This is the start of the stable review cycle for the 5.4.262 release. There are 159 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 26 Nov 2023 17:19:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.262-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.4.262-rc1
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: bogus EBUSY when deleting flowtable after flush (for 5.4)
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: disable toggling dormant table state more than once
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: fix table flag updates
Pablo Neira Ayuso pablo@netfilter.org netfilter: nftables: update table flags from the commit phase
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: double hook unregistration in netns path
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: unregister flowtable hooks on netns exit
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: fix memleak when more than 255 elements expired
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
Florian Westphal fw@strlen.de netfilter: nf_tables: defer gc run if previous batch is still pending
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: use correct lock to protect gc_list
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: GC transaction race with abort path
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: GC transaction race with netns dismantle
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: remove busy mark and gc batch API
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_set_hash: mark set element as dead when deleting from packet path
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: adapt set backend to use GC transaction API
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: GC transaction API to avoid race with control plane
Florian Westphal fw@strlen.de netfilter: nf_tables: don't skip expired elements during walk
Florian Westphal fw@strlen.de netfilter: nft_set_rbtree: fix overlap expiration walk
Florian Westphal fw@strlen.de netfilter: nft_set_rbtree: fix null deref on element insertion
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_set_rbtree: Switch to node list walk for overlap detection
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: drop map element references from preparation phase
Pablo Neira Ayuso pablo@netfilter.org netfilter: nftables: rename set element data activation/deactivation functions
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: pass context to nft_set_destroy()
Steven Rostedt (Google) rostedt@goodmis.org tracing: Have trace_event_file have ref counters
Christian König christian.koenig@amd.com drm/amdgpu: fix error handling in amdgpu_bo_list_get()
Kemeng Shi shikemeng@huaweicloud.com ext4: remove gdb backup copy for meta bg in setup_new_flex_group_blocks
Zhang Yi yi.zhang@huawei.com ext4: correct the start block of counting reserved clusters
Kemeng Shi shikemeng@huaweicloud.com ext4: correct return value of ext4_convert_meta_bg
Kemeng Shi shikemeng@huaweicloud.com ext4: correct offset of gdb backup in non meta_bg group to update_backups
Max Kellermann max.kellermann@ionos.com ext4: apply umask if ACL support is disabled
Heiner Kallweit hkallweit1@gmail.com Revert "net: r8169: Disable multicast filter for RTL8168H and RTL8107E"
Mahmoud Adam mngyadam@amazon.com nfsd: fix file memleak on client_opens_release
Vikash Garodia quic_vgarodia@quicinc.com media: venus: hfi: add checks to handle capabilities from firmware
Vikash Garodia quic_vgarodia@quicinc.com media: venus: hfi: fix the check to handle session buffer requirement
Vikash Garodia quic_vgarodia@quicinc.com media: venus: hfi_parser: Add check to keep the number of codecs within range
Sean Young sean@mess.org media: sharp: fix sharp encoding
Sean Young sean@mess.org media: lirc: drop trailing space from scancode transmit
Heiner Kallweit hkallweit1@gmail.com i2c: i801: fix potential race in i801_block_transaction_byte_by_byte
Alexander Sverdlin alexander.sverdlin@siemens.com net: dsa: lan9303: consequently nested-lock physical MDIO
Johnathan Mantey johnathanx.mantey@intel.com Revert ncsi: Propagate carrier gain/loss events to the NCSI controller
Guan Wentao guanwentao@uniontech.com Bluetooth: btusb: Add 0bda:b85b for Fn-Link RTL8852BE
Masum Reza masumrezarock100@gmail.com Bluetooth: btusb: Add RTW8852BE device 13d3:3570 to device tables
Larry Finger Larry.Finger@lwfinger.net bluetooth: Add device 13d3:3571 to device tables
Larry Finger Larry.Finger@lwfinger.net bluetooth: Add device 0bda:887b to device tables
Artem Lukyanov dukzcry@ya.ru Bluetooth: btusb: Add Realtek RTL8852BE support ID 0x0cb8:0xc559
Joseph Hwang josephsih@chromium.org Bluetooth: btusb: add Realtek 8822CE to usb_device_id table
Alain Michaud alainm@chromium.org Bluetooth: btusb: Add flag to define wideband speech capability
Pavel Krasavin pkrasavin@imaqliq.com tty: serial: meson: fix hard LOCKUP on crtscts mode
Lad Prabhakar prabhakar.mahadev-lad.rj@bp.renesas.com serial: meson: Use platform_get_irq() to get the interrupt
Neil Armstrong narmstrong@baylibre.com tty: serial: meson: retrieve port FIFO size from DT
Colin Ian King colin.king@canonical.com serial: meson: remove redundant initialization of variable id
Dmitry Safonov 0x7f454c46@gmail.com tty/serial: Migrate meson_uart to use has_sysrq
Chandradeep Dey codesigning@chandradeepdey.com ALSA: hda/realtek - Enable internal speaker of ASUS K6500ZC
Takashi Iwai tiwai@suse.de ALSA: info: Fix potential deadlock at disconnection
Helge Deller deller@gmx.de parisc/power: Fix power soft-off when running on qemu
Helge Deller deller@gmx.de parisc/pgtable: Do not drop upper 5 address bits of physical address
Helge Deller deller@gmx.de parisc: Prevent booting 64-bit kernels on PA1.x machines
Joshua Yeong joshua.yeong@starfivetech.com i3c: master: cdns: Fix reading status register
Zi Yan ziy@nvidia.com mm/cma: use nth_page() in place of direct struct page manipulation
Heiko Carstens hca@linux.ibm.com s390/cmma: fix handling of swapper_pg_dir and invalid_pg_dir
Heiko Carstens hca@linux.ibm.com s390/cmma: fix initial kernel address space page table walk
Alain Volmat alain.volmat@foss.st.com dmaengine: stm32-mdma: correct desc prep when channel running
Sanjuán García, Jorge Jorge.SanjuanGarcia@duagon.com mcb: fix error handling for different scenarios when parsing
Benjamin Bara benjamin.bara@skidata.com i2c: core: Run atomic i2c xfer when !preemptible
Benjamin Bara benjamin.bara@skidata.com kernel/reboot: emergency_restart: Set correct system_state
Eric Biggers ebiggers@google.com quota: explicitly forbid quota files from being encrypted
Zhihao Cheng chengzhihao1@huawei.com jbd2: fix potential data lost in recovering journal raced with synchronizing fs bdev
Josef Bacik josef@toxicpanda.com btrfs: don't arbitrarily slow down delalloc if we're committing
Brian Geffon bgeffon@google.com PM: hibernate: Clean up sync_read handling in snapshot_write_next()
Brian Geffon bgeffon@google.com PM: hibernate: Use __get_safe_page() rather than touching the list
Dan Carpenter dan.carpenter@linaro.org mmc: vub300: fix an error code
Kathiravan Thirumoorthy quic_kathirav@quicinc.com clk: qcom: ipq8074: drop the CLK_SET_RATE_PARENT flag from PLL clocks
Helge Deller deller@gmx.de parisc/power: Add power soft-off when running on qemu
Helge Deller deller@gmx.de parisc/pdc: Add width field to struct pdc_model
Uwe Kleine-König u.kleine-koenig@pengutronix.de PCI: keystone: Don't discard .probe() callback
Uwe Kleine-König u.kleine-koenig@pengutronix.de PCI: keystone: Don't discard .remove() callback
Herve Codina herve.codina@bootlin.com genirq/generic_chip: Make irq_remove_generic_chip() irqdomain aware
Rong Chen rong.chen@amlogic.com mmc: meson-gx: Remove setting of CMD_CFG_ERROR
Werner Sembach wse@tuxedocomputers.com ACPI: resource: Do IRQ override on TongFang GMxXGxx
Lukas Wunner lukas@wunner.de PCI/sysfs: Protect driver's D3cold preference from user space
David Woodhouse dwmw@amazon.co.uk hvc/xen: fix error path in xen_hvc_init() to always register frontend driver
Paul Moore paul@paul-moore.com audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare()
Paul Moore paul@paul-moore.com audit: don't take task_lock() in audit_exe_compare() code path
Maciej S. Szmigiero maciej.szmigiero@oracle.com KVM: x86: Ignore MSR_AMD64_TW_CFG access
Nicolas Saenz Julienne nsaenz@amazon.com KVM: x86: hyper-v: Don't auto-enable stimer on write from user-space
Pu Wen puwen@hygon.cn x86/cpu/hygon: Fix the CPU topology evaluation for real
Chandrakanth patil chandrakanth.patil@broadcom.com scsi: megaraid_sas: Increase register read retry rount from 3 to 30 for selected registers
Shung-Hsi Yu shung-hsi.yu@suse.com bpf: Fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END
Kees Cook keescook@chromium.org randstruct: Fix gcc-plugin performance mode to stay in group
Vikash Garodia quic_vgarodia@quicinc.com media: venus: hfi: add checks to perform sanity on queue pointers
Anastasia Belova abelova@astralinux.ru cifs: spnego: add ';' in HOST_KEY_LEN
Zhang Rui rui.zhang@intel.com tools/power/turbostat: Fix a knl bug
Vlad Buslov vladbu@nvidia.com macvlan: Don't propagate promisc change to lower dev in passthru
Rahul Rameshbabu rrameshbabu@nvidia.com net/mlx5e: Check return value of snprintf writing to fw_version buffer for representors
Leon Romanovsky leonro@nvidia.com net/mlx5_core: Clean driver version and name
Dust Li dust.li@linux.alibaba.com net/mlx5e: fix double free of encap_header
Baruch Siach baruch@tkos.co.il net: stmmac: fix rx budget limit check
Jose Abreu Jose.Abreu@synopsys.com net: stmmac: Rework stmmac_rx()
Linkui Xiao xiaolinkui@kylinos.cn netfilter: nf_conntrack_bridge: initialize err to 0
Linus Walleij linus.walleij@linaro.org net: ethernet: cortina: Fix MTU max setting
Linus Walleij linus.walleij@linaro.org net: ethernet: cortina: Handle large frames
Linus Walleij linus.walleij@linaro.org net: ethernet: cortina: Fix max RX frame define
Eric Dumazet edumazet@google.com bonding: stop the device in bond_setup_by_slave()
Eric Dumazet edumazet@google.com ptp: annotate data-race around q->head and q->tail
Juergen Gross jgross@suse.com xen/events: fix delayed eoi list handling
Willem de Bruijn willemb@google.com ppp: limit MRU to 64K
Shigeru Yoshida syoshida@redhat.com tipc: Fix kernel-infoleak due to uninitialized TLV value
Yonglong Liu liuyonglong@huawei.com net: hns3: fix variable may not initialized problem in hns3_init_mac_addr()
Shigeru Yoshida syoshida@redhat.com tty: Fix uninit-value access in ppp_sync_receive()
Eric Dumazet edumazet@google.com ipvlan: add ipvlan_route_v6_outbound() helper
Olga Kornievskaia kolga@netapp.com NFSv4.1: fix SP4_MACH_CRED protection for pnfs IO
Ian Rogers irogers@google.com perf hist: Add missing puts to hist__account_cycles
Kan Liang kan.liang@linux.intel.com perf tools: Add hw_idx in struct branch_stack
Miri Korenblit miriam.rachel.korenblit@intel.com wifi: iwlwifi: Use FW rate for non-data frames
Dan Carpenter dan.carpenter@linaro.org pwm: Fix double shift bug
Tony Lindgren tony@atomide.com ASoC: ti: omap-mcbsp: Fix runtime PM underflow warnings
Douglas Anderson dianders@chromium.org kgdb: Flush console before entering kgdb on panic
Wayne Lin wayne.lin@amd.com drm/amd/display: Avoid NULL dereference of timing generator
Ilpo Järvinen ilpo.jarvinen@linux.intel.com media: cobalt: Use FIELD_GET() to extract Link Width
Bob Peterson rpeterso@redhat.com gfs2: ignore negated quota changes
Hans Verkuil hverkuil-cisco@xs4all.nl media: vivid: avoid integer overflow
Rajeshwar R Shinde coolrrsh@gmail.com media: gspca: cpia1: shift-out-of-bounds in set_flicker
Axel Lin axel.lin@ingics.com i2c: sun6i-p2wi: Prevent potential division by zero
Hardik Gajjar hgajjar@de.adit-jv.com usb: gadget: f_ncm: Always set current gadget in ncm_bind()
Yi Yang yiyang13@huawei.com tty: vcc: Add check for kstrdup() in vcc_probe()
Jiri Kosina jkosina@suse.cz HID: Add quirk for Dell Pro Wireless Keyboard and Mouse KM5221W
Wenchao Hao haowenchao2@huawei.com scsi: libfc: Fix potential NULL pointer dereference in fc_lport_ptp_setup()
Ilpo Järvinen ilpo.jarvinen@linux.intel.com atm: iphase: Do PCI error checks on own line
Ilpo Järvinen ilpo.jarvinen@linux.intel.com PCI: tegra194: Use FIELD_GET()/FIELD_PREP() with Link Width fields
Cezary Rojewski cezary.rojewski@intel.com ALSA: hda: Fix possible null-ptr-deref when assigning a stream
Vincent Whitchurch vincent.whitchurch@axis.com ARM: 9320/1: fix stack depot IRQ stack filter
Manas Ghandat ghandatmanas@gmail.com jfs: fix array-index-out-of-bounds in diAlloc
Manas Ghandat ghandatmanas@gmail.com jfs: fix array-index-out-of-bounds in dbFindLeaf
Juntong Deng juntong.deng@outlook.com fs/jfs: Add validity check for db_maxag and db_agpref
Juntong Deng juntong.deng@outlook.com fs/jfs: Add check for negative db_l2nbperpage
Ilpo Järvinen ilpo.jarvinen@linux.intel.com RDMA/hfi1: Use FIELD_GET() to extract Link Width
Lu Jialin lujialin4@huawei.com crypto: pcrypt - Fix hungtask for PADATA_RESET
zhujun2 zhujun2@cmss.chinamobile.com selftests/efivarfs: create-read: fix a resource leak
Qu Huang qu.huang@linux.dev drm/amdgpu: Fix a null pointer access when the smc_rreg pointer is NULL
Mario Limonciello mario.limonciello@amd.com drm/amd: Fix UBSAN array-index-out-of-bounds for Polaris and Tonga
Mario Limonciello mario.limonciello@amd.com drm/amd: Fix UBSAN array-index-out-of-bounds for SMU7
baozhu.liu lucas.liu@siengine.com drm/komeda: drop all currently held locks if deadlock happens
Olli Asikainen olli.asikainen@gmail.com platform/x86: thinkpad_acpi: Add battery quirk for Thinkpad X120e
ZhengHan Wang wzhmmmmm@gmail.com Bluetooth: Fix double free in hci_conn_cleanup
Douglas Anderson dianders@chromium.org wifi: ath10k: Don't touch the CE interrupt registers after power up
Eric Dumazet edumazet@google.com net: annotate data-races around sk->sk_dst_pending_confirm
Eric Dumazet edumazet@google.com net: annotate data-races around sk->sk_tx_queue_mapping
Dmitry Antipov dmantipov@yandex.ru wifi: ath10k: fix clang-specific fortify warning
Dmitry Antipov dmantipov@yandex.ru wifi: ath9k: fix clang-specific fortify warnings
Ping-Ke Shih pkshih@realtek.com wifi: mac80211: don't return unset power in ieee80211_get_tx_power()
Dmitry Antipov dmantipov@yandex.ru wifi: mac80211_hwsim: fix clang-specific fortify warning
Mike Rapoport (IBM) rppt@kernel.org x86/mm: Drop the 4 MB restriction on minimal NUMA node memory size
Ronald Wahl ronald.wahl@raritan.com clocksource/drivers/timer-atmel-tcb: Fix initialization on SAM9 hardware
Jacky Bai ping.bai@nxp.com clocksource/drivers/timer-imx-gpt: Fix potential memory leak
Shuai Xue xueshuai@linux.alibaba.com perf/core: Bail out early if the request AUX area is out of bound
John Stultz jstultz@google.com locking/ww_mutex/test: Fix potential workqueue corruption
-------------
Diffstat:
Makefile | 4 +- arch/arm/include/asm/exception.h | 4 - arch/parisc/include/uapi/asm/pdc.h | 1 + arch/parisc/kernel/entry.S | 7 +- arch/parisc/kernel/head.S | 5 +- arch/s390/mm/page-states.c | 19 +- arch/x86/include/asm/msr-index.h | 1 + arch/x86/include/asm/numa.h | 7 - arch/x86/kernel/cpu/hygon.c | 8 +- arch/x86/kvm/hyperv.c | 10 +- arch/x86/kvm/x86.c | 2 + arch/x86/mm/numa.c | 7 - crypto/pcrypt.c | 4 + drivers/acpi/resource.c | 12 + drivers/atm/iphase.c | 20 +- drivers/bluetooth/btusb.c | 35 +- drivers/clk/qcom/gcc-ipq8074.c | 6 - drivers/clocksource/timer-atmel-tcb.c | 1 + drivers/clocksource/timer-imx-gpt.c | 18 +- drivers/dma/stm32-mdma.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 6 + drivers/gpu/drm/amd/display/dc/core/dc_stream.c | 4 +- drivers/gpu/drm/amd/include/pptable.h | 4 +- drivers/gpu/drm/amd/powerplay/hwmgr/pptable_v1_0.h | 16 +- .../drm/arm/display/komeda/komeda_pipeline_state.c | 9 +- drivers/hid/hid-ids.h | 1 + drivers/hid/hid-quirks.c | 1 + drivers/i2c/busses/i2c-i801.c | 19 +- drivers/i2c/busses/i2c-sun6i-p2wi.c | 5 + drivers/i2c/i2c-core.h | 2 +- drivers/i3c/master/i3c-master-cdns.c | 6 +- drivers/infiniband/hw/hfi1/pcie.c | 9 +- drivers/mcb/mcb-core.c | 1 + drivers/mcb/mcb-parse.c | 2 +- drivers/media/pci/cobalt/cobalt-driver.c | 11 +- drivers/media/platform/qcom/venus/hfi_msgs.c | 2 +- drivers/media/platform/qcom/venus/hfi_parser.c | 15 + drivers/media/platform/qcom/venus/hfi_venus.c | 10 + drivers/media/platform/vivid/vivid-rds-gen.c | 2 +- drivers/media/rc/ir-sharp-decoder.c | 8 +- drivers/media/rc/lirc_dev.c | 6 +- drivers/media/usb/gspca/cpia1.c | 3 + drivers/mmc/host/meson-gx-mmc.c | 1 - drivers/mmc/host/vub300.c | 1 + drivers/net/bonding/bond_main.c | 6 + drivers/net/dsa/lan9303_mdio.c | 4 +- drivers/net/ethernet/cortina/gemini.c | 45 +- drivers/net/ethernet/cortina/gemini.h | 4 +- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 2 +- .../net/ethernet/mellanox/mlx5/core/en/tc_tun.c | 10 +- .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 4 +- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 13 +- .../ethernet/mellanox/mlx5/core/ipoib/ethtool.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/main.c | 10 +- .../net/ethernet/mellanox/mlx5/core/mlx5_core.h | 3 - drivers/net/ethernet/realtek/r8169_main.c | 4 +- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 148 +++--- drivers/net/ipvlan/ipvlan_core.c | 41 +- drivers/net/macvlan.c | 2 +- drivers/net/ppp/ppp_synctty.c | 6 +- drivers/net/wireless/ath/ath10k/debug.c | 2 +- drivers/net/wireless/ath/ath10k/snoc.c | 18 +- drivers/net/wireless/ath/ath9k/debug.c | 2 +- drivers/net/wireless/ath/ath9k/htc_drv_debug.c | 2 +- drivers/net/wireless/intel/iwlwifi/mvm/tx.c | 14 +- drivers/net/wireless/mac80211_hwsim.c | 2 +- drivers/parisc/power.c | 16 +- drivers/pci/controller/dwc/pci-keystone.c | 8 +- drivers/pci/controller/dwc/pcie-tegra194.c | 9 +- drivers/pci/pci-acpi.c | 2 +- drivers/pci/pci-sysfs.c | 5 +- drivers/platform/x86/thinkpad_acpi.c | 1 + drivers/ptp/ptp_chardev.c | 3 +- drivers/ptp/ptp_clock.c | 5 +- drivers/ptp/ptp_private.h | 8 +- drivers/ptp/ptp_sysfs.c | 3 +- drivers/scsi/libfc/fc_lport.c | 6 + drivers/scsi/megaraid/megaraid_sas_base.c | 4 +- drivers/tty/hvc/hvc_xen.c | 5 +- drivers/tty/serial/meson_uart.c | 38 +- drivers/tty/vcc.c | 16 +- drivers/usb/gadget/function/f_ncm.c | 27 +- drivers/xen/events/events_base.c | 4 +- fs/btrfs/delalloc-space.c | 3 - fs/cifs/cifs_spnego.c | 4 +- fs/ext4/acl.h | 5 + fs/ext4/extents_status.c | 4 +- fs/ext4/resize.c | 19 +- fs/gfs2/quota.c | 11 + fs/jbd2/recovery.c | 8 + fs/jfs/jfs_dmap.c | 23 +- fs/jfs/jfs_imap.c | 5 +- fs/nfs/nfs4proc.c | 5 +- fs/nfsd/nfs4state.c | 2 +- fs/quota/dquot.c | 14 + include/linux/mlx5/driver.h | 2 + include/linux/pwm.h | 4 +- include/linux/trace_events.h | 4 + include/net/netfilter/nf_tables.h | 129 ++---- include/net/sock.h | 26 +- include/uapi/linux/netfilter/nf_tables.h | 1 + kernel/audit_watch.c | 9 +- kernel/bpf/verifier.c | 7 +- kernel/debug/debug_core.c | 3 + kernel/events/ring_buffer.c | 6 + kernel/irq/generic-chip.c | 25 +- kernel/locking/test-ww_mutex.c | 20 +- kernel/padata.c | 2 +- kernel/power/snapshot.c | 16 +- kernel/reboot.c | 1 + kernel/trace/trace.c | 15 + kernel/trace/trace.h | 3 + kernel/trace/trace_events.c | 39 +- kernel/trace/trace_events_filter.c | 3 + mm/cma.c | 2 +- net/bluetooth/hci_conn.c | 6 +- net/bluetooth/hci_sysfs.c | 23 +- net/bridge/netfilter/nf_conntrack_bridge.c | 2 +- net/core/sock.c | 2 +- net/ipv4/tcp_output.c | 2 +- net/mac80211/cfg.c | 4 + net/ncsi/ncsi-aen.c | 5 - net/netfilter/nf_tables_api.c | 512 +++++++++++++++++---- net/netfilter/nft_chain_filter.c | 3 + net/netfilter/nft_set_bitmap.c | 5 +- net/netfilter/nft_set_hash.c | 110 +++-- net/netfilter/nft_set_rbtree.c | 375 ++++++++++++--- net/tipc/netlink_compat.c | 1 + scripts/gcc-plugins/randomize_layout_plugin.c | 11 +- sound/core/info.c | 21 +- sound/hda/hdac_stream.c | 6 +- sound/pci/hda/patch_realtek.c | 1 + sound/soc/ti/omap-mcbsp.c | 6 +- tools/perf/builtin-script.c | 70 +-- tools/perf/tests/sample-parsing.c | 7 +- tools/perf/util/branch.h | 22 + tools/perf/util/cs-etm.c | 2 + tools/perf/util/event.h | 1 + tools/perf/util/evsel.c | 5 + tools/perf/util/evsel.h | 5 + tools/perf/util/hist.c | 13 +- tools/perf/util/intel-pt.c | 2 + tools/perf/util/machine.c | 35 +- .../util/scripting-engines/trace-event-python.c | 30 +- tools/perf/util/session.c | 8 +- tools/perf/util/synthetic-events.c | 6 +- tools/power/x86/turbostat/turbostat.c | 2 +- tools/testing/selftests/efivarfs/create-read.c | 2 + 150 files changed, 1790 insertions(+), 761 deletions(-)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: John Stultz jstultz@google.com
[ Upstream commit bccdd808902f8c677317cec47c306e42b93b849e ]
In some cases running with the test-ww_mutex code, I was seeing odd behavior where sometimes it seemed flush_workqueue was returning before all the work threads were finished.
Often this would cause strange crashes as the mutexes would be freed while they were being used.
Looking at the code, there is a lifetime problem as the controlling thread that spawns the work allocates the "struct stress" structures that are passed to the workqueue threads. Then when the workqueue threads are finished, they free the stress struct that was passed to them.
Unfortunately the workqueue work_struct node is in the stress struct. Which means the work_struct is freed before the work thread returns and while flush_workqueue is waiting.
It seems like a better idea to have the controlling thread both allocate and free the stress structures, so that we can be sure we don't corrupt the workqueue by freeing the structure prematurely.
So this patch reworks the test to do so, and with this change I no longer see the early flush_workqueue returns.
Signed-off-by: John Stultz jstultz@google.com Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lore.kernel.org/r/20230922043616.19282-3-jstultz@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/locking/test-ww_mutex.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c index 3e82f449b4ff7..da36997d8742c 100644 --- a/kernel/locking/test-ww_mutex.c +++ b/kernel/locking/test-ww_mutex.c @@ -426,7 +426,6 @@ static void stress_inorder_work(struct work_struct *work) } while (!time_after(jiffies, stress->timeout));
kfree(order); - kfree(stress); }
struct reorder_lock { @@ -491,7 +490,6 @@ static void stress_reorder_work(struct work_struct *work) list_for_each_entry_safe(ll, ln, &locks, link) kfree(ll); kfree(order); - kfree(stress); }
static void stress_one_work(struct work_struct *work) @@ -512,8 +510,6 @@ static void stress_one_work(struct work_struct *work) break; } } while (!time_after(jiffies, stress->timeout)); - - kfree(stress); }
#define STRESS_INORDER BIT(0) @@ -524,15 +520,24 @@ static void stress_one_work(struct work_struct *work) static int stress(int nlocks, int nthreads, unsigned int flags) { struct ww_mutex *locks; - int n; + struct stress *stress_array; + int n, count;
locks = kmalloc_array(nlocks, sizeof(*locks), GFP_KERNEL); if (!locks) return -ENOMEM;
+ stress_array = kmalloc_array(nthreads, sizeof(*stress_array), + GFP_KERNEL); + if (!stress_array) { + kfree(locks); + return -ENOMEM; + } + for (n = 0; n < nlocks; n++) ww_mutex_init(&locks[n], &ww_class);
+ count = 0; for (n = 0; nthreads; n++) { struct stress *stress; void (*fn)(struct work_struct *work); @@ -556,9 +561,7 @@ static int stress(int nlocks, int nthreads, unsigned int flags) if (!fn) continue;
- stress = kmalloc(sizeof(*stress), GFP_KERNEL); - if (!stress) - break; + stress = &stress_array[count++];
INIT_WORK(&stress->work, fn); stress->locks = locks; @@ -573,6 +576,7 @@ static int stress(int nlocks, int nthreads, unsigned int flags)
for (n = 0; n < nlocks; n++) ww_mutex_destroy(&locks[n]); + kfree(stress_array); kfree(locks);
return 0;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuai Xue xueshuai@linux.alibaba.com
[ Upstream commit 54aee5f15b83437f23b2b2469bcf21bdd9823916 ]
When perf-record with a large AUX area, e.g 4GB, it fails with:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1 failed to mmap with 12 (Cannot allocate memory)
and it reveals a WARNING with __alloc_pages():
------------[ cut here ]------------ WARNING: CPU: 44 PID: 17573 at mm/page_alloc.c:5568 __alloc_pages+0x1ec/0x248 Call trace: __alloc_pages+0x1ec/0x248 __kmalloc_large_node+0xc0/0x1f8 __kmalloc_node+0x134/0x1e8 rb_alloc_aux+0xe0/0x298 perf_mmap+0x440/0x660 mmap_region+0x308/0x8a8 do_mmap+0x3c0/0x528 vm_mmap_pgoff+0xf4/0x1b8 ksys_mmap_pgoff+0x18c/0x218 __arm64_sys_mmap+0x38/0x58 invoke_syscall+0x50/0x128 el0_svc_common.constprop.0+0x58/0x188 do_el0_svc+0x34/0x50 el0_svc+0x34/0x108 el0t_64_sync_handler+0xb8/0xc0 el0t_64_sync+0x1a4/0x1a8
'rb->aux_pages' allocated by kcalloc() is a pointer array which is used to maintains AUX trace pages. The allocated page for this array is physically contiguous (and virtually contiguous) with an order of 0..MAX_ORDER. If the size of pointer array crosses the limitation set by MAX_ORDER, it reveals a WARNING.
So bail out early with -ENOMEM if the request AUX area is out of bound, e.g.:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1 failed to mmap with 12 (Cannot allocate memory)
Signed-off-by: Shuai Xue xueshuai@linux.alibaba.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/events/ring_buffer.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c index ffb59a4ef4ff3..fb3edb2f8ac93 100644 --- a/kernel/events/ring_buffer.c +++ b/kernel/events/ring_buffer.c @@ -653,6 +653,12 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event, max_order--; }
+ /* + * kcalloc_node() is unable to allocate buffer if the size is larger + * than: PAGE_SIZE << MAX_ORDER; directly bail out in this case. + */ + if (get_order((unsigned long)nr_pages * sizeof(void *)) > MAX_ORDER) + return -ENOMEM; rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL, node); if (!rb->aux_pages)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jacky Bai ping.bai@nxp.com
[ Upstream commit 8051a993ce222a5158bccc6ac22ace9253dd71cb ]
Fix coverity Issue CID 250382: Resource leak (RESOURCE_LEAK). Add kfree when error return.
Signed-off-by: Jacky Bai ping.bai@nxp.com Reviewed-by: Peng Fan peng.fan@nxp.com Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org Link: https://lore.kernel.org/r/20231009083922.1942971-1-ping.bai@nxp.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clocksource/timer-imx-gpt.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/clocksource/timer-imx-gpt.c b/drivers/clocksource/timer-imx-gpt.c index 706c0d0ff56cc..268c09417fa21 100644 --- a/drivers/clocksource/timer-imx-gpt.c +++ b/drivers/clocksource/timer-imx-gpt.c @@ -460,12 +460,16 @@ static int __init mxc_timer_init_dt(struct device_node *np, enum imx_gpt_type t return -ENOMEM;
imxtm->base = of_iomap(np, 0); - if (!imxtm->base) - return -ENXIO; + if (!imxtm->base) { + ret = -ENXIO; + goto err_kfree; + }
imxtm->irq = irq_of_parse_and_map(np, 0); - if (imxtm->irq <= 0) - return -EINVAL; + if (imxtm->irq <= 0) { + ret = -EINVAL; + goto err_kfree; + }
imxtm->clk_ipg = of_clk_get_by_name(np, "ipg");
@@ -478,11 +482,15 @@ static int __init mxc_timer_init_dt(struct device_node *np, enum imx_gpt_type t
ret = _mxc_timer_init(imxtm); if (ret) - return ret; + goto err_kfree;
initialized = 1;
return 0; + +err_kfree: + kfree(imxtm); + return ret; }
static int __init imx1_timer_init_dt(struct device_node *np)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ronald Wahl ronald.wahl@raritan.com
[ Upstream commit 6d3bc4c02d59996d1d3180d8ed409a9d7d5900e0 ]
On SAM9 hardware two cascaded 16 bit timers are used to form a 32 bit high resolution timer that is used as scheduler clock when the kernel has been configured that way (CONFIG_ATMEL_CLOCKSOURCE_TCB).
The driver initially triggers a reset-to-zero of the two timers but this reset is only performed on the next rising clock. For the first timer this is ok - it will be in the next 60ns (16MHz clock). For the chained second timer this will only happen after the first timer overflows, i.e. after 2^16 clocks (~4ms with a 16MHz clock). So with other words the scheduler clock resets to 0 after the first 2^16 clock cycles.
It looks like that the scheduler does not like this and behaves wrongly over its lifetime, e.g. some tasks are scheduled with a long delay. Why that is and if there are additional requirements for this behaviour has not been further analysed.
There is a simple fix for resetting the second timer as well when the first timer is reset and this is to set the ATMEL_TC_ASWTRG_SET bit in the Channel Mode register (CMR) of the first timer. This will also rise the TIOA line (clock input of the second timer) when a software trigger respective SYNC is issued.
Signed-off-by: Ronald Wahl ronald.wahl@raritan.com Acked-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org Link: https://lore.kernel.org/r/20231007161803.31342-1-rwahl@gmx.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clocksource/timer-atmel-tcb.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/clocksource/timer-atmel-tcb.c b/drivers/clocksource/timer-atmel-tcb.c index 7427b07495a89..906c1bfdccad3 100644 --- a/drivers/clocksource/timer-atmel-tcb.c +++ b/drivers/clocksource/timer-atmel-tcb.c @@ -310,6 +310,7 @@ static void __init tcb_setup_dual_chan(struct atmel_tc *tc, int mck_divisor_idx) writel(mck_divisor_idx /* likely divide-by-8 */ | ATMEL_TC_WAVE | ATMEL_TC_WAVESEL_UP /* free-run */ + | ATMEL_TC_ASWTRG_SET /* TIOA0 rises at software trigger */ | ATMEL_TC_ACPA_SET /* TIOA0 rises at 0 */ | ATMEL_TC_ACPC_CLEAR, /* (duty cycle 50%) */ tcaddr + ATMEL_TC_REG(0, CMR));
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mike Rapoport (IBM) rppt@kernel.org
[ Upstream commit a1e2b8b36820d8c91275f207e77e91645b7c6836 ]
Qi Zheng reported crashes in a production environment and provided a simplified example as a reproducer:
| For example, if we use Qemu to start a two NUMA node kernel, | one of the nodes has 2M memory (less than NODE_MIN_SIZE), | and the other node has 2G, then we will encounter the | following panic: | | BUG: kernel NULL pointer dereference, address: 0000000000000000 | <...> | RIP: 0010:_raw_spin_lock_irqsave+0x22/0x40 | <...> | Call Trace: | <TASK> | deactivate_slab() | bootstrap() | kmem_cache_init() | start_kernel() | secondary_startup_64_no_verify()
The crashes happen because of inconsistency between the nodemask that has nodes with less than 4MB as memoryless, and the actual memory fed into the core mm.
The commit:
9391a3f9c7f1 ("[PATCH] x86_64: Clear more state when ignoring empty node in SRAT parsing")
... that introduced minimal size of a NUMA node does not explain why a node size cannot be less than 4MB and what boot failures this restriction might fix.
Fixes have been submitted to the core MM code to tighten up the memory topologies it accepts and to not crash on weird input:
mm: page_alloc: skip memoryless nodes entirely mm: memory_hotplug: drop memoryless node from fallback lists
Andrew has accepted them into the -mm tree, but there are no stable SHA1's yet.
This patch drops the limitation for minimal node size on x86:
- which works around the crash without the fixes to the core MM. - makes x86 topologies less weird, - removes an arbitrary and undocumented limitation on NUMA topologies.
[ mingo: Improved changelog clarity. ]
Reported-by: Qi Zheng zhengqi.arch@bytedance.com Tested-by: Mario Casquero mcasquer@redhat.com Signed-off-by: Mike Rapoport (IBM) rppt@kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Acked-by: David Hildenbrand david@redhat.com Acked-by: Michal Hocko mhocko@suse.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Rik van Riel riel@surriel.com Link: https://lore.kernel.org/r/ZS+2qqjEO5/867br@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/numa.h | 7 ------- arch/x86/mm/numa.c | 7 ------- 2 files changed, 14 deletions(-)
diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h index bbfde3d2662f4..4bcd9d0c7bee7 100644 --- a/arch/x86/include/asm/numa.h +++ b/arch/x86/include/asm/numa.h @@ -11,13 +11,6 @@
#define NR_NODE_MEMBLKS (MAX_NUMNODES*2)
-/* - * Too small node sizes may confuse the VM badly. Usually they - * result from BIOS bugs. So dont recognize nodes as standalone - * NUMA entities that have less than this amount of RAM listed: - */ -#define NODE_MIN_SIZE (4*1024*1024) - extern int numa_off;
/* diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c index 67c617c4a7f20..7316dca7e846a 100644 --- a/arch/x86/mm/numa.c +++ b/arch/x86/mm/numa.c @@ -581,13 +581,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi) if (start >= end) continue;
- /* - * Don't confuse VM with a node that doesn't have the - * minimum amount of memory: - */ - if (end && (end - start) < NODE_MIN_SIZE) - continue; - alloc_node_data(nid); }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Antipov dmantipov@yandex.ru
[ Upstream commit cbaccdc42483c65016f1bae89128c08dc17cfb2a ]
When compiling with clang 16.0.6 and CONFIG_FORTIFY_SOURCE=y, I've noticed the following (somewhat confusing due to absence of an actual source code location):
In file included from drivers/net/wireless/virtual/mac80211_hwsim.c:18: In file included from ./include/linux/slab.h:16: In file included from ./include/linux/gfp.h:7: In file included from ./include/linux/mmzone.h:8: In file included from ./include/linux/spinlock.h:56: In file included from ./include/linux/preempt.h:79: In file included from ./arch/x86/include/asm/preempt.h:9: In file included from ./include/linux/thread_info.h:60: In file included from ./arch/x86/include/asm/thread_info.h:53: In file included from ./arch/x86/include/asm/cpufeature.h:5: In file included from ./arch/x86/include/asm/processor.h:23: In file included from ./arch/x86/include/asm/msr.h:11: In file included from ./arch/x86/include/asm/cpumask.h:5: In file included from ./include/linux/cpumask.h:12: In file included from ./include/linux/bitmap.h:11: In file included from ./include/linux/string.h:254: ./include/linux/fortify-string.h:592:4: warning: call to '__read_overflow2_field' declared with 'warning' attribute: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning] __read_overflow2_field(q_size_field, size);
The compiler actually complains on 'mac80211_hwsim_get_et_strings()' where fortification logic inteprets call to 'memcpy()' as an attempt to copy the whole 'mac80211_hwsim_gstrings_stats' array from its first member and so issues an overread warning. This warning may be silenced by passing an address of the whole array and not the first member to 'memcpy()'.
Signed-off-by: Dmitry Antipov dmantipov@yandex.ru Link: https://lore.kernel.org/r/20230829094140.234636-1-dmantipov@yandex.ru Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/mac80211_hwsim.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c index a21739b2f44e6..634e8c1e71cca 100644 --- a/drivers/net/wireless/mac80211_hwsim.c +++ b/drivers/net/wireless/mac80211_hwsim.c @@ -2323,7 +2323,7 @@ static void mac80211_hwsim_get_et_strings(struct ieee80211_hw *hw, u32 sset, u8 *data) { if (sset == ETH_SS_STATS) - memcpy(data, *mac80211_hwsim_gstrings_stats, + memcpy(data, mac80211_hwsim_gstrings_stats, sizeof(mac80211_hwsim_gstrings_stats)); }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ping-Ke Shih pkshih@realtek.com
[ Upstream commit e160ab85166e77347d0cbe5149045cb25e83937f ]
We can get a UBSAN warning if ieee80211_get_tx_power() returns the INT_MIN value mac80211 internally uses for "unset power level".
UBSAN: signed-integer-overflow in net/wireless/nl80211.c:3816:5 -2147483648 * 100 cannot be represented in type 'int' CPU: 0 PID: 20433 Comm: insmod Tainted: G WC OE Call Trace: dump_stack+0x74/0x92 ubsan_epilogue+0x9/0x50 handle_overflow+0x8d/0xd0 __ubsan_handle_mul_overflow+0xe/0x10 nl80211_send_iface+0x688/0x6b0 [cfg80211] [...] cfg80211_register_wdev+0x78/0xb0 [cfg80211] cfg80211_netdev_notifier_call+0x200/0x620 [cfg80211] [...] ieee80211_if_add+0x60e/0x8f0 [mac80211] ieee80211_register_hw+0xda5/0x1170 [mac80211]
In this case, simply return an error instead, to indicate that no data is available.
Cc: Zong-Zhe Yang kevin_yang@realtek.com Signed-off-by: Ping-Ke Shih pkshih@realtek.com Link: https://lore.kernel.org/r/20230203023636.4418-1-pkshih@realtek.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/cfg.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index 9e3bff5aaf8b8..6428c0d371458 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -2581,6 +2581,10 @@ static int ieee80211_get_tx_power(struct wiphy *wiphy, else *dbm = sdata->vif.bss_conf.txpower;
+ /* INT_MIN indicates no power level was set yet */ + if (*dbm == INT_MIN) + return -EINVAL; + return 0; }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Antipov dmantipov@yandex.ru
[ Upstream commit 95f97fe0ac974467ab4da215985a32b2fdf48af0 ]
When compiling with clang 16.0.6 and CONFIG_FORTIFY_SOURCE=y, I've noticed the following (somewhat confusing due to absence of an actual source code location):
In file included from drivers/net/wireless/ath/ath9k/debug.c:17: In file included from ./include/linux/slab.h:16: In file included from ./include/linux/gfp.h:7: In file included from ./include/linux/mmzone.h:8: In file included from ./include/linux/spinlock.h:56: In file included from ./include/linux/preempt.h:79: In file included from ./arch/x86/include/asm/preempt.h:9: In file included from ./include/linux/thread_info.h:60: In file included from ./arch/x86/include/asm/thread_info.h:53: In file included from ./arch/x86/include/asm/cpufeature.h:5: In file included from ./arch/x86/include/asm/processor.h:23: In file included from ./arch/x86/include/asm/msr.h:11: In file included from ./arch/x86/include/asm/cpumask.h:5: In file included from ./include/linux/cpumask.h:12: In file included from ./include/linux/bitmap.h:11: In file included from ./include/linux/string.h:254: ./include/linux/fortify-string.h:592:4: warning: call to '__read_overflow2_field' declared with 'warning' attribute: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning] __read_overflow2_field(q_size_field, size);
In file included from drivers/net/wireless/ath/ath9k/htc_drv_debug.c:17: In file included from drivers/net/wireless/ath/ath9k/htc.h:20: In file included from ./include/linux/module.h:13: In file included from ./include/linux/stat.h:19: In file included from ./include/linux/time.h:60: In file included from ./include/linux/time32.h:13: In file included from ./include/linux/timex.h:67: In file included from ./arch/x86/include/asm/timex.h:5: In file included from ./arch/x86/include/asm/processor.h:23: In file included from ./arch/x86/include/asm/msr.h:11: In file included from ./arch/x86/include/asm/cpumask.h:5: In file included from ./include/linux/cpumask.h:12: In file included from ./include/linux/bitmap.h:11: In file included from ./include/linux/string.h:254: ./include/linux/fortify-string.h:592:4: warning: call to '__read_overflow2_field' declared with 'warning' attribute: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning] __read_overflow2_field(q_size_field, size);
The compiler actually complains on 'ath9k_get_et_strings()' and 'ath9k_htc_get_et_strings()' due to the same reason: fortification logic inteprets call to 'memcpy()' as an attempt to copy the whole array from it's first member and so issues an overread warning. These warnings may be silenced by passing an address of the whole array and not the first member to 'memcpy()'.
Signed-off-by: Dmitry Antipov dmantipov@yandex.ru Acked-by: Toke Høiland-Jørgensen toke@toke.dk Signed-off-by: Kalle Valo quic_kvalo@quicinc.com Link: https://lore.kernel.org/r/20230829093856.234584-1-dmantipov@yandex.ru Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath9k/debug.c | 2 +- drivers/net/wireless/ath/ath9k/htc_drv_debug.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c index 859a865c59950..8d98347e0ddff 100644 --- a/drivers/net/wireless/ath/ath9k/debug.c +++ b/drivers/net/wireless/ath/ath9k/debug.c @@ -1284,7 +1284,7 @@ void ath9k_get_et_strings(struct ieee80211_hw *hw, u32 sset, u8 *data) { if (sset == ETH_SS_STATS) - memcpy(data, *ath9k_gstrings_stats, + memcpy(data, ath9k_gstrings_stats, sizeof(ath9k_gstrings_stats)); }
diff --git a/drivers/net/wireless/ath/ath9k/htc_drv_debug.c b/drivers/net/wireless/ath/ath9k/htc_drv_debug.c index c55aab01fff5d..e79bbcd3279af 100644 --- a/drivers/net/wireless/ath/ath9k/htc_drv_debug.c +++ b/drivers/net/wireless/ath/ath9k/htc_drv_debug.c @@ -428,7 +428,7 @@ void ath9k_htc_get_et_strings(struct ieee80211_hw *hw, u32 sset, u8 *data) { if (sset == ETH_SS_STATS) - memcpy(data, *ath9k_htc_gstrings_stats, + memcpy(data, ath9k_htc_gstrings_stats, sizeof(ath9k_htc_gstrings_stats)); }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Antipov dmantipov@yandex.ru
[ Upstream commit cb4c132ebfeac5962f7258ffc831caa0c4dada1a ]
When compiling with clang 16.0.6 and CONFIG_FORTIFY_SOURCE=y, I've noticed the following (somewhat confusing due to absence of an actual source code location):
In file included from drivers/net/wireless/ath/ath10k/debug.c:8: In file included from ./include/linux/module.h:13: In file included from ./include/linux/stat.h:19: In file included from ./include/linux/time.h:60: In file included from ./include/linux/time32.h:13: In file included from ./include/linux/timex.h:67: In file included from ./arch/x86/include/asm/timex.h:5: In file included from ./arch/x86/include/asm/processor.h:23: In file included from ./arch/x86/include/asm/msr.h:11: In file included from ./arch/x86/include/asm/cpumask.h:5: In file included from ./include/linux/cpumask.h:12: In file included from ./include/linux/bitmap.h:11: In file included from ./include/linux/string.h:254: ./include/linux/fortify-string.h:592:4: warning: call to '__read_overflow2_field' declared with 'warning' attribute: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning] __read_overflow2_field(q_size_field, size);
The compiler actually complains on 'ath10k_debug_get_et_strings()' where fortification logic inteprets call to 'memcpy()' as an attempt to copy the whole 'ath10k_gstrings_stats' array from it's first member and so issues an overread warning. This warning may be silenced by passing an address of the whole array and not the first member to 'memcpy()'.
Signed-off-by: Dmitry Antipov dmantipov@yandex.ru Acked-by: Jeff Johnson quic_jjohnson@quicinc.com Signed-off-by: Kalle Valo quic_kvalo@quicinc.com Link: https://lore.kernel.org/r/20230829093652.234537-1-dmantipov@yandex.ru Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath10k/debug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c index 04c50a26a4f47..34db968c4bd0b 100644 --- a/drivers/net/wireless/ath/ath10k/debug.c +++ b/drivers/net/wireless/ath/ath10k/debug.c @@ -1138,7 +1138,7 @@ void ath10k_debug_get_et_strings(struct ieee80211_hw *hw, u32 sset, u8 *data) { if (sset == ETH_SS_STATS) - memcpy(data, *ath10k_gstrings_stats, + memcpy(data, ath10k_gstrings_stats, sizeof(ath10k_gstrings_stats)); }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 0bb4d124d34044179b42a769a0c76f389ae973b6 ]
This field can be read or written without socket lock being held.
Add annotations to avoid load-store tearing.
Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/sock.h | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h index f73ef7087a187..b021c8912e2cf 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1782,21 +1782,33 @@ static inline void sk_tx_queue_set(struct sock *sk, int tx_queue) /* sk_tx_queue_mapping accept only upto a 16-bit value */ if (WARN_ON_ONCE((unsigned short)tx_queue >= USHRT_MAX)) return; - sk->sk_tx_queue_mapping = tx_queue; + /* Paired with READ_ONCE() in sk_tx_queue_get() and + * other WRITE_ONCE() because socket lock might be not held. + */ + WRITE_ONCE(sk->sk_tx_queue_mapping, tx_queue); }
#define NO_QUEUE_MAPPING USHRT_MAX
static inline void sk_tx_queue_clear(struct sock *sk) { - sk->sk_tx_queue_mapping = NO_QUEUE_MAPPING; + /* Paired with READ_ONCE() in sk_tx_queue_get() and + * other WRITE_ONCE() because socket lock might be not held. + */ + WRITE_ONCE(sk->sk_tx_queue_mapping, NO_QUEUE_MAPPING); }
static inline int sk_tx_queue_get(const struct sock *sk) { - if (sk && sk->sk_tx_queue_mapping != NO_QUEUE_MAPPING) - return sk->sk_tx_queue_mapping; + if (sk) { + /* Paired with WRITE_ONCE() in sk_tx_queue_clear() + * and sk_tx_queue_set(). + */ + int val = READ_ONCE(sk->sk_tx_queue_mapping);
+ if (val != NO_QUEUE_MAPPING) + return val; + } return -1; }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit eb44ad4e635132754bfbcb18103f1dcb7058aedd ]
This field can be read or written without socket lock being held.
Add annotations to avoid load-store tearing.
Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/sock.h | 6 +++--- net/core/sock.c | 2 +- net/ipv4/tcp_output.c | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h index b021c8912e2cf..5293f2b65fb55 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1941,7 +1941,7 @@ static inline void dst_negative_advice(struct sock *sk) if (ndst != dst) { rcu_assign_pointer(sk->sk_dst_cache, ndst); sk_tx_queue_clear(sk); - sk->sk_dst_pending_confirm = 0; + WRITE_ONCE(sk->sk_dst_pending_confirm, 0); } } } @@ -1952,7 +1952,7 @@ __sk_dst_set(struct sock *sk, struct dst_entry *dst) struct dst_entry *old_dst;
sk_tx_queue_clear(sk); - sk->sk_dst_pending_confirm = 0; + WRITE_ONCE(sk->sk_dst_pending_confirm, 0); old_dst = rcu_dereference_protected(sk->sk_dst_cache, lockdep_sock_is_held(sk)); rcu_assign_pointer(sk->sk_dst_cache, dst); @@ -1965,7 +1965,7 @@ sk_dst_set(struct sock *sk, struct dst_entry *dst) struct dst_entry *old_dst;
sk_tx_queue_clear(sk); - sk->sk_dst_pending_confirm = 0; + WRITE_ONCE(sk->sk_dst_pending_confirm, 0); old_dst = xchg((__force struct dst_entry **)&sk->sk_dst_cache, dst); dst_release(old_dst); } diff --git a/net/core/sock.c b/net/core/sock.c index 9979cd602dfac..2c3c5df139345 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -545,7 +545,7 @@ struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie)
if (dst && dst->obsolete && dst->ops->check(dst, cookie) == NULL) { sk_tx_queue_clear(sk); - sk->sk_dst_pending_confirm = 0; + WRITE_ONCE(sk->sk_dst_pending_confirm, 0); RCU_INIT_POINTER(sk->sk_dst_cache, NULL); dst_release(dst); return NULL; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 0107436860171..1dce05bfa3005 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1103,7 +1103,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, skb_set_hash_from_sk(skb, sk); refcount_add(skb->truesize, &sk->sk_wmem_alloc);
- skb_set_dst_pending_confirm(skb, sk->sk_dst_pending_confirm); + skb_set_dst_pending_confirm(skb, READ_ONCE(sk->sk_dst_pending_confirm));
/* Build TCP header and checksum it. */ th = (struct tcphdr *)skb->data;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Douglas Anderson dianders@chromium.org
[ Upstream commit 170c75d43a77dc937c58f07ecf847ba1b42ab74e ]
As talked about in commit d66d24ac300c ("ath10k: Keep track of which interrupts fired, don't poll them"), if we access the copy engine register at a bad time then ath10k can go boom. However, it's not necessarily easy to know when it's safe to access them.
The ChromeOS test labs saw a crash that looked like this at shutdown/reboot time (on a chromeos-5.15 kernel, but likely the problem could also reproduce upstream):
Internal error: synchronous external abort: 96000010 [#1] PREEMPT SMP ... CPU: 4 PID: 6168 Comm: reboot Not tainted 5.15.111-lockdep-19350-g1d624fe6758f #1 010b9b233ab055c27c6dc88efb0be2f4e9e86f51 Hardware name: Google Kingoftown (DT) ... pc : ath10k_snoc_read32+0x50/0x74 [ath10k_snoc] lr : ath10k_snoc_read32+0x24/0x74 [ath10k_snoc] ... Call trace: ath10k_snoc_read32+0x50/0x74 [ath10k_snoc ...] ath10k_ce_disable_interrupt+0x190/0x65c [ath10k_core ...] ath10k_ce_disable_interrupts+0x8c/0x120 [ath10k_core ...] ath10k_snoc_hif_stop+0x78/0x660 [ath10k_snoc ...] ath10k_core_stop+0x13c/0x1ec [ath10k_core ...] ath10k_halt+0x398/0x5b0 [ath10k_core ...] ath10k_stop+0xfc/0x1a8 [ath10k_core ...] drv_stop+0x148/0x6b4 [mac80211 ...] ieee80211_stop_device+0x70/0x80 [mac80211 ...] ieee80211_do_stop+0x10d8/0x15b0 [mac80211 ...] ieee80211_stop+0x144/0x1a0 [mac80211 ...] __dev_close_many+0x1e8/0x2c0 dev_close_many+0x198/0x33c dev_close+0x140/0x210 cfg80211_shutdown_all_interfaces+0xc8/0x1e0 [cfg80211 ...] ieee80211_remove_interfaces+0x118/0x5c4 [mac80211 ...] ieee80211_unregister_hw+0x64/0x1f4 [mac80211 ...] ath10k_mac_unregister+0x4c/0xf0 [ath10k_core ...] ath10k_core_unregister+0x80/0xb0 [ath10k_core ...] ath10k_snoc_free_resources+0xb8/0x1ec [ath10k_snoc ...] ath10k_snoc_shutdown+0x98/0xd0 [ath10k_snoc ...] platform_shutdown+0x7c/0xa0 device_shutdown+0x3e0/0x58c kernel_restart_prepare+0x68/0xa0 kernel_restart+0x28/0x7c
Though there's no known way to reproduce the problem, it makes sense that it would be the same issue where we're trying to access copy engine registers when it's not allowed.
Let's fix this by changing how we "disable" the interrupts. Instead of tweaking the copy engine registers we'll just use disable_irq() and enable_irq(). Then we'll configure the interrupts once at power up time.
Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.2.2.c10-00754-QCAHLSWMTPL-1
Signed-off-by: Douglas Anderson dianders@chromium.org Signed-off-by: Kalle Valo quic_kvalo@quicinc.com Link: https://lore.kernel.org/r/20230630151842.1.If764ede23c4e09a43a842771c2ddf996... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath10k/snoc.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c index b6762fe2efe26..29d52f7b4336d 100644 --- a/drivers/net/wireless/ath/ath10k/snoc.c +++ b/drivers/net/wireless/ath/ath10k/snoc.c @@ -821,12 +821,20 @@ static void ath10k_snoc_hif_get_default_pipe(struct ath10k *ar,
static inline void ath10k_snoc_irq_disable(struct ath10k *ar) { - ath10k_ce_disable_interrupts(ar); + struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar); + int id; + + for (id = 0; id < CE_COUNT_MAX; id++) + disable_irq(ar_snoc->ce_irqs[id].irq_line); }
static inline void ath10k_snoc_irq_enable(struct ath10k *ar) { - ath10k_ce_enable_interrupts(ar); + struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar); + int id; + + for (id = 0; id < CE_COUNT_MAX; id++) + enable_irq(ar_snoc->ce_irqs[id].irq_line); }
static void ath10k_snoc_rx_pipe_cleanup(struct ath10k_snoc_pipe *snoc_pipe) @@ -1042,6 +1050,8 @@ static int ath10k_snoc_hif_power_up(struct ath10k *ar, goto err_free_rri; }
+ ath10k_ce_enable_interrupts(ar); + return 0;
err_free_rri: @@ -1196,8 +1206,8 @@ static int ath10k_snoc_request_irq(struct ath10k *ar)
for (id = 0; id < CE_COUNT_MAX; id++) { ret = request_irq(ar_snoc->ce_irqs[id].irq_line, - ath10k_snoc_per_engine_handler, 0, - ce_name[id], ar); + ath10k_snoc_per_engine_handler, + IRQF_NO_AUTOEN, ce_name[id], ar); if (ret) { ath10k_err(ar, "failed to register IRQ handler for CE %d: %d",
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: ZhengHan Wang wzhmmmmm@gmail.com
[ Upstream commit a85fb91e3d728bdfc80833167e8162cce8bc7004 ]
syzbot reports a slab use-after-free in hci_conn_hash_flush [1]. After releasing an object using hci_conn_del_sysfs in the hci_conn_cleanup function, releasing the same object again using the hci_dev_put and hci_conn_put functions causes a double free. Here's a simplified flow:
hci_conn_del_sysfs: hci_dev_put put_device kobject_put kref_put kobject_release kobject_cleanup kfree_const kfree(name)
hci_dev_put: ... kfree(name)
hci_conn_put: put_device ... kfree(name)
This patch drop the hci_dev_put and hci_conn_put function call in hci_conn_cleanup function, because the object is freed in hci_conn_del_sysfs function.
This patch also fixes the refcounting in hci_conn_add_sysfs() and hci_conn_del_sysfs() to take into account device_add() failures.
This fixes CVE-2023-28464.
Link: https://syzkaller.appspot.com/bug?id=1bb51491ca5df96a5f724899d1dbb87afda6141... [1]
Signed-off-by: ZhengHan Wang wzhmmmmm@gmail.com Co-developed-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_conn.c | 6 ++---- net/bluetooth/hci_sysfs.c | 23 ++++++++++++----------- 2 files changed, 14 insertions(+), 15 deletions(-)
diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c index afdc0afa8ee7d..e129b7fb6540a 100644 --- a/net/bluetooth/hci_conn.c +++ b/net/bluetooth/hci_conn.c @@ -125,13 +125,11 @@ static void hci_conn_cleanup(struct hci_conn *conn) if (hdev->notify) hdev->notify(hdev, HCI_NOTIFY_CONN_DEL);
- hci_conn_del_sysfs(conn); - debugfs_remove_recursive(conn->debugfs);
- hci_dev_put(hdev); + hci_conn_del_sysfs(conn);
- hci_conn_put(conn); + hci_dev_put(hdev); }
static void le_scan_cleanup(struct work_struct *work) diff --git a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c index ccd2c377bf83c..266112c960ee8 100644 --- a/net/bluetooth/hci_sysfs.c +++ b/net/bluetooth/hci_sysfs.c @@ -33,7 +33,7 @@ void hci_conn_init_sysfs(struct hci_conn *conn) { struct hci_dev *hdev = conn->hdev;
- BT_DBG("conn %p", conn); + bt_dev_dbg(hdev, "conn %p", conn);
conn->dev.type = &bt_link; conn->dev.class = bt_class; @@ -46,27 +46,30 @@ void hci_conn_add_sysfs(struct hci_conn *conn) { struct hci_dev *hdev = conn->hdev;
- BT_DBG("conn %p", conn); + bt_dev_dbg(hdev, "conn %p", conn);
if (device_is_registered(&conn->dev)) return;
dev_set_name(&conn->dev, "%s:%d", hdev->name, conn->handle);
- if (device_add(&conn->dev) < 0) { + if (device_add(&conn->dev) < 0) bt_dev_err(hdev, "failed to register connection device"); - return; - } - - hci_dev_hold(hdev); }
void hci_conn_del_sysfs(struct hci_conn *conn) { struct hci_dev *hdev = conn->hdev;
- if (!device_is_registered(&conn->dev)) + bt_dev_dbg(hdev, "conn %p", conn); + + if (!device_is_registered(&conn->dev)) { + /* If device_add() has *not* succeeded, use *only* put_device() + * to drop the reference count. + */ + put_device(&conn->dev); return; + }
while (1) { struct device *dev; @@ -78,9 +81,7 @@ void hci_conn_del_sysfs(struct hci_conn *conn) put_device(dev); }
- device_del(&conn->dev); - - hci_dev_put(hdev); + device_unregister(&conn->dev); }
static void bt_host_release(struct device *dev)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Olli Asikainen olli.asikainen@gmail.com
[ Upstream commit 916646758aea81a143ce89103910f715ed923346 ]
Thinkpad X120e also needs this battery quirk.
Signed-off-by: Olli Asikainen olli.asikainen@gmail.com Link: https://lore.kernel.org/r/20231024190922.2742-1-olli.asikainen@gmail.com Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/thinkpad_acpi.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c index 5d114088c88fb..f0d6bb567d1dc 100644 --- a/drivers/platform/x86/thinkpad_acpi.c +++ b/drivers/platform/x86/thinkpad_acpi.c @@ -9699,6 +9699,7 @@ static const struct tpacpi_quirk battery_quirk_table[] __initconst = { * Individual addressing is broken on models that expose the * primary battery as BAT1. */ + TPACPI_Q_LNV('8', 'F', true), /* Thinkpad X120e */ TPACPI_Q_LNV('J', '7', true), /* B5400 */ TPACPI_Q_LNV('J', 'I', true), /* Thinkpad 11e */ TPACPI_Q_LNV3('R', '0', 'B', true), /* Thinkpad 11e gen 3 */
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: baozhu.liu lucas.liu@siengine.com
[ Upstream commit 19ecbe8325a2a7ffda5ff4790955b84eaccba49f ]
If komeda_pipeline_unbound_components() returns -EDEADLK, it means that a deadlock happened in the locking context. Currently, komeda is not dealing with the deadlock properly,producing the following output when CONFIG_DEBUG_WW_MUTEX_SLOWPATH is enabled:
------------[ cut here ]------------ [ 26.103984] WARNING: CPU: 2 PID: 345 at drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c:1248 komeda_release_unclaimed_resources+0x13c/0x170 [ 26.117453] Modules linked in: [ 26.120511] CPU: 2 PID: 345 Comm: composer@2.1-se Kdump: loaded Tainted: G W 5.10.110-SE-SDK1.8-dirty #16 [ 26.131374] Hardware name: Siengine Se1000 Evaluation board (DT) [ 26.137379] pstate: 20400009 (nzCv daif +PAN -UAO -TCO BTYPE=--) [ 26.143385] pc : komeda_release_unclaimed_resources+0x13c/0x170 [ 26.149301] lr : komeda_release_unclaimed_resources+0xbc/0x170 [ 26.155130] sp : ffff800017b8b8d0 [ 26.158442] pmr_save: 000000e0 [ 26.161493] x29: ffff800017b8b8d0 x28: ffff000cf2f96200 [ 26.166805] x27: ffff000c8f5a8800 x26: 0000000000000000 [ 26.172116] x25: 0000000000000038 x24: ffff8000116a0140 [ 26.177428] x23: 0000000000000038 x22: ffff000cf2f96200 [ 26.182739] x21: ffff000cfc300300 x20: ffff000c8ab77080 [ 26.188051] x19: 0000000000000003 x18: 0000000000000000 [ 26.193362] x17: 0000000000000000 x16: 0000000000000000 [ 26.198672] x15: b400e638f738ba38 x14: 0000000000000000 [ 26.203983] x13: 0000000106400a00 x12: 0000000000000000 [ 26.209294] x11: 0000000000000000 x10: 0000000000000000 [ 26.214604] x9 : ffff800012f80000 x8 : ffff000ca3308000 [ 26.219915] x7 : 0000000ff3000000 x6 : ffff80001084034c [ 26.225226] x5 : ffff800017b8bc40 x4 : 000000000000000f [ 26.230536] x3 : ffff000ca3308000 x2 : 0000000000000000 [ 26.235847] x1 : 0000000000000000 x0 : ffffffffffffffdd [ 26.241158] Call trace: [ 26.243604] komeda_release_unclaimed_resources+0x13c/0x170 [ 26.249175] komeda_crtc_atomic_check+0x68/0xf0 [ 26.253706] drm_atomic_helper_check_planes+0x138/0x1f4 [ 26.258929] komeda_kms_check+0x284/0x36c [ 26.262939] drm_atomic_check_only+0x40c/0x714 [ 26.267381] drm_atomic_nonblocking_commit+0x1c/0x60 [ 26.272344] drm_mode_atomic_ioctl+0xa3c/0xb8c [ 26.276787] drm_ioctl_kernel+0xc4/0x120 [ 26.280708] drm_ioctl+0x268/0x534 [ 26.284109] __arm64_sys_ioctl+0xa8/0xf0 [ 26.288030] el0_svc_common.constprop.0+0x80/0x240 [ 26.292817] do_el0_svc+0x24/0x90 [ 26.296132] el0_svc+0x20/0x30 [ 26.299185] el0_sync_handler+0xe8/0xf0 [ 26.303018] el0_sync+0x1a4/0x1c0 [ 26.306330] irq event stamp: 0 [ 26.309384] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [ 26.315650] hardirqs last disabled at (0): [<ffff800010056d34>] copy_process+0x5d0/0x183c [ 26.323825] softirqs last enabled at (0): [<ffff800010056d34>] copy_process+0x5d0/0x183c [ 26.331997] softirqs last disabled at (0): [<0000000000000000>] 0x0 [ 26.338261] ---[ end trace 20ae984fa860184a ]--- [ 26.343021] ------------[ cut here ]------------ [ 26.347646] WARNING: CPU: 3 PID: 345 at drivers/gpu/drm/drm_modeset_lock.c:228 drm_modeset_drop_locks+0x84/0x90 [ 26.357727] Modules linked in: [ 26.360783] CPU: 3 PID: 345 Comm: composer@2.1-se Kdump: loaded Tainted: G W 5.10.110-SE-SDK1.8-dirty #16 [ 26.371645] Hardware name: Siengine Se1000 Evaluation board (DT) [ 26.377647] pstate: 20400009 (nzCv daif +PAN -UAO -TCO BTYPE=--) [ 26.383649] pc : drm_modeset_drop_locks+0x84/0x90 [ 26.388351] lr : drm_mode_atomic_ioctl+0x860/0xb8c [ 26.393137] sp : ffff800017b8bb10 [ 26.396447] pmr_save: 000000e0 [ 26.399497] x29: ffff800017b8bb10 x28: 0000000000000001 [ 26.404807] x27: 0000000000000038 x26: 0000000000000002 [ 26.410115] x25: ffff000cecbefa00 x24: ffff000cf2f96200 [ 26.415423] x23: 0000000000000001 x22: 0000000000000018 [ 26.420731] x21: 0000000000000001 x20: ffff800017b8bc10 [ 26.426039] x19: 0000000000000000 x18: 0000000000000000 [ 26.431347] x17: 0000000002e8bf2c x16: 0000000002e94c6b [ 26.436655] x15: 0000000002ea48b9 x14: ffff8000121f0300 [ 26.441963] x13: 0000000002ee2ca8 x12: ffff80001129cae0 [ 26.447272] x11: ffff800012435000 x10: ffff000ed46b5e88 [ 26.452580] x9 : ffff000c9935e600 x8 : 0000000000000000 [ 26.457888] x7 : 000000008020001e x6 : 000000008020001f [ 26.463196] x5 : ffff80001085fbe0 x4 : fffffe0033a59f20 [ 26.468504] x3 : 000000008020001e x2 : 0000000000000000 [ 26.473813] x1 : 0000000000000000 x0 : ffff000c8f596090 [ 26.479122] Call trace: [ 26.481566] drm_modeset_drop_locks+0x84/0x90 [ 26.485918] drm_mode_atomic_ioctl+0x860/0xb8c [ 26.490359] drm_ioctl_kernel+0xc4/0x120 [ 26.494278] drm_ioctl+0x268/0x534 [ 26.497677] __arm64_sys_ioctl+0xa8/0xf0 [ 26.501598] el0_svc_common.constprop.0+0x80/0x240 [ 26.506384] do_el0_svc+0x24/0x90 [ 26.509697] el0_svc+0x20/0x30 [ 26.512748] el0_sync_handler+0xe8/0xf0 [ 26.516580] el0_sync+0x1a4/0x1c0 [ 26.519891] irq event stamp: 0 [ 26.522943] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [ 26.529207] hardirqs last disabled at (0): [<ffff800010056d34>] copy_process+0x5d0/0x183c [ 26.537379] softirqs last enabled at (0): [<ffff800010056d34>] copy_process+0x5d0/0x183c [ 26.545550] softirqs last disabled at (0): [<0000000000000000>] 0x0 [ 26.551812] ---[ end trace 20ae984fa860184b ]---
According to the call trace information,it can be located to be WARN_ON(IS_ERR(c_st)) in the komeda_pipeline_unbound_components function; Then follow the function. komeda_pipeline_unbound_components -> komeda_component_get_state_and_set_user -> komeda_pipeline_get_state_and_set_crtc -> komeda_pipeline_get_state ->drm_atomic_get_private_obj_state -> drm_atomic_get_private_obj_state -> drm_modeset_lock
komeda_pipeline_unbound_components -> komeda_component_get_state_and_set_user -> komeda_component_get_state -> drm_atomic_get_private_obj_state -> drm_modeset_lock
ret = drm_modeset_lock(&obj->lock, state->acquire_ctx); if (ret) return ERR_PTR(ret); Here it return -EDEADLK.
deal with the deadlock as suggested by [1], using the function drm_modeset_backoff(). [1] https://docs.kernel.org/gpu/drm-kms.html?highlight=kms#kms-locking
Therefore, handling this problem can be solved by adding return -EDEADLK back to the drm_modeset_backoff processing flow in the drm_mode_atomic_ioctl function.
Signed-off-by: baozhu.liu lucas.liu@siengine.com Signed-off-by: menghui.huang menghui.huang@siengine.com Reviewed-by: Liviu Dudau liviu.dudau@arm.com Signed-off-by: Liviu Dudau liviu.dudau@arm.com Link: https://patchwork.freedesktop.org/patch/msgid/20230804013117.6870-1-menghui.... Signed-off-by: Sasha Levin sashal@kernel.org --- .../gpu/drm/arm/display/komeda/komeda_pipeline_state.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c b/drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c index b848270e0a1f4..31527fb66b5c5 100644 --- a/drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c +++ b/drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c @@ -1171,7 +1171,7 @@ int komeda_build_display_data_flow(struct komeda_crtc *kcrtc, return 0; }
-static void +static int komeda_pipeline_unbound_components(struct komeda_pipeline *pipe, struct komeda_pipeline_state *new) { @@ -1190,8 +1190,12 @@ komeda_pipeline_unbound_components(struct komeda_pipeline *pipe, c = komeda_pipeline_get_component(pipe, id); c_st = komeda_component_get_state_and_set_user(c, drm_st, NULL, new->crtc); + if (PTR_ERR(c_st) == -EDEADLK) + return -EDEADLK; WARN_ON(IS_ERR(c_st)); } + + return 0; }
/* release unclaimed pipeline resource */ @@ -1213,9 +1217,8 @@ int komeda_release_unclaimed_resources(struct komeda_pipeline *pipe, if (WARN_ON(IS_ERR_OR_NULL(st))) return -EINVAL;
- komeda_pipeline_unbound_components(pipe, st); + return komeda_pipeline_unbound_components(pipe, st);
- return 0; }
void komeda_pipeline_disable(struct komeda_pipeline *pipe,
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
[ Upstream commit 760efbca74a405dc439a013a5efaa9fadc95a8c3 ]
For pptable structs that use flexible array sizes, use flexible arrays.
Suggested-by: Felix Held felix.held@amd.com Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2874 Signed-off-by: Mario Limonciello mario.limonciello@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/include/pptable.h | 4 ++-- drivers/gpu/drm/amd/powerplay/hwmgr/pptable_v1_0.h | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/include/pptable.h b/drivers/gpu/drm/amd/include/pptable.h index 0b6a057e0a4c4..5aac8d545bdc6 100644 --- a/drivers/gpu/drm/amd/include/pptable.h +++ b/drivers/gpu/drm/amd/include/pptable.h @@ -78,7 +78,7 @@ typedef struct _ATOM_PPLIB_THERMALCONTROLLER typedef struct _ATOM_PPLIB_STATE { UCHAR ucNonClockStateIndex; - UCHAR ucClockStateIndices[1]; // variable-sized + UCHAR ucClockStateIndices[]; // variable-sized } ATOM_PPLIB_STATE;
@@ -473,7 +473,7 @@ typedef struct _ATOM_PPLIB_STATE_V2 /** * Driver will read the first ucNumDPMLevels in this array */ - UCHAR clockInfoIndex[1]; + UCHAR clockInfoIndex[]; } ATOM_PPLIB_STATE_V2;
typedef struct _StateArray{ diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/pptable_v1_0.h b/drivers/gpu/drm/amd/powerplay/hwmgr/pptable_v1_0.h index 1e870f58dd12a..d5a4a08c6d392 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/pptable_v1_0.h +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/pptable_v1_0.h @@ -179,7 +179,7 @@ typedef struct _ATOM_Tonga_MCLK_Dependency_Record { typedef struct _ATOM_Tonga_MCLK_Dependency_Table { UCHAR ucRevId; UCHAR ucNumEntries; /* Number of entries. */ - ATOM_Tonga_MCLK_Dependency_Record entries[1]; /* Dynamically allocate entries. */ + ATOM_Tonga_MCLK_Dependency_Record entries[]; /* Dynamically allocate entries. */ } ATOM_Tonga_MCLK_Dependency_Table;
typedef struct _ATOM_Tonga_SCLK_Dependency_Record { @@ -194,7 +194,7 @@ typedef struct _ATOM_Tonga_SCLK_Dependency_Record { typedef struct _ATOM_Tonga_SCLK_Dependency_Table { UCHAR ucRevId; UCHAR ucNumEntries; /* Number of entries. */ - ATOM_Tonga_SCLK_Dependency_Record entries[1]; /* Dynamically allocate entries. */ + ATOM_Tonga_SCLK_Dependency_Record entries[]; /* Dynamically allocate entries. */ } ATOM_Tonga_SCLK_Dependency_Table;
typedef struct _ATOM_Polaris_SCLK_Dependency_Record {
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
[ Upstream commit 0f0e59075b5c22f1e871fbd508d6e4f495048356 ]
For pptable structs that use flexible array sizes, use flexible arrays.
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036742 Signed-off-by: Mario Limonciello mario.limonciello@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/powerplay/hwmgr/pptable_v1_0.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/pptable_v1_0.h b/drivers/gpu/drm/amd/powerplay/hwmgr/pptable_v1_0.h index d5a4a08c6d392..0c61e2bc14cde 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/pptable_v1_0.h +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/pptable_v1_0.h @@ -164,7 +164,7 @@ typedef struct _ATOM_Tonga_State { typedef struct _ATOM_Tonga_State_Array { UCHAR ucRevId; UCHAR ucNumEntries; /* Number of entries. */ - ATOM_Tonga_State entries[1]; /* Dynamically allocate entries. */ + ATOM_Tonga_State entries[]; /* Dynamically allocate entries. */ } ATOM_Tonga_State_Array;
typedef struct _ATOM_Tonga_MCLK_Dependency_Record { @@ -210,7 +210,7 @@ typedef struct _ATOM_Polaris_SCLK_Dependency_Record { typedef struct _ATOM_Polaris_SCLK_Dependency_Table { UCHAR ucRevId; UCHAR ucNumEntries; /* Number of entries. */ - ATOM_Polaris_SCLK_Dependency_Record entries[1]; /* Dynamically allocate entries. */ + ATOM_Polaris_SCLK_Dependency_Record entries[]; /* Dynamically allocate entries. */ } ATOM_Polaris_SCLK_Dependency_Table;
typedef struct _ATOM_Tonga_PCIE_Record { @@ -222,7 +222,7 @@ typedef struct _ATOM_Tonga_PCIE_Record { typedef struct _ATOM_Tonga_PCIE_Table { UCHAR ucRevId; UCHAR ucNumEntries; /* Number of entries. */ - ATOM_Tonga_PCIE_Record entries[1]; /* Dynamically allocate entries. */ + ATOM_Tonga_PCIE_Record entries[]; /* Dynamically allocate entries. */ } ATOM_Tonga_PCIE_Table;
typedef struct _ATOM_Polaris10_PCIE_Record { @@ -235,7 +235,7 @@ typedef struct _ATOM_Polaris10_PCIE_Record { typedef struct _ATOM_Polaris10_PCIE_Table { UCHAR ucRevId; UCHAR ucNumEntries; /* Number of entries. */ - ATOM_Polaris10_PCIE_Record entries[1]; /* Dynamically allocate entries. */ + ATOM_Polaris10_PCIE_Record entries[]; /* Dynamically allocate entries. */ } ATOM_Polaris10_PCIE_Table;
@@ -252,7 +252,7 @@ typedef struct _ATOM_Tonga_MM_Dependency_Record { typedef struct _ATOM_Tonga_MM_Dependency_Table { UCHAR ucRevId; UCHAR ucNumEntries; /* Number of entries. */ - ATOM_Tonga_MM_Dependency_Record entries[1]; /* Dynamically allocate entries. */ + ATOM_Tonga_MM_Dependency_Record entries[]; /* Dynamically allocate entries. */ } ATOM_Tonga_MM_Dependency_Table;
typedef struct _ATOM_Tonga_Voltage_Lookup_Record { @@ -265,7 +265,7 @@ typedef struct _ATOM_Tonga_Voltage_Lookup_Record { typedef struct _ATOM_Tonga_Voltage_Lookup_Table { UCHAR ucRevId; UCHAR ucNumEntries; /* Number of entries. */ - ATOM_Tonga_Voltage_Lookup_Record entries[1]; /* Dynamically allocate entries. */ + ATOM_Tonga_Voltage_Lookup_Record entries[]; /* Dynamically allocate entries. */ } ATOM_Tonga_Voltage_Lookup_Table;
typedef struct _ATOM_Tonga_Fan_Table {
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Huang qu.huang@linux.dev
[ Upstream commit 5104fdf50d326db2c1a994f8b35dcd46e63ae4ad ]
In certain types of chips, such as VEGA20, reading the amdgpu_regs_smc file could result in an abnormal null pointer access when the smc_rreg pointer is NULL. Below are the steps to reproduce this issue and the corresponding exception log:
1. Navigate to the directory: /sys/kernel/debug/dri/0 2. Execute command: cat amdgpu_regs_smc 3. Exception Log:: [4005007.702554] BUG: kernel NULL pointer dereference, address: 0000000000000000 [4005007.702562] #PF: supervisor instruction fetch in kernel mode [4005007.702567] #PF: error_code(0x0010) - not-present page [4005007.702570] PGD 0 P4D 0 [4005007.702576] Oops: 0010 [#1] SMP NOPTI [4005007.702581] CPU: 4 PID: 62563 Comm: cat Tainted: G OE 5.15.0-43-generic #46-Ubunt u [4005007.702590] RIP: 0010:0x0 [4005007.702598] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [4005007.702600] RSP: 0018:ffffa82b46d27da0 EFLAGS: 00010206 [4005007.702605] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa82b46d27e68 [4005007.702609] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9940656e0000 [4005007.702612] RBP: ffffa82b46d27dd8 R08: 0000000000000000 R09: ffff994060c07980 [4005007.702615] R10: 0000000000020000 R11: 0000000000000000 R12: 00007f5e06753000 [4005007.702618] R13: ffff9940656e0000 R14: ffffa82b46d27e68 R15: 00007f5e06753000 [4005007.702622] FS: 00007f5e0755b740(0000) GS:ffff99479d300000(0000) knlGS:0000000000000000 [4005007.702626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [4005007.702629] CR2: ffffffffffffffd6 CR3: 00000003253fc000 CR4: 00000000003506e0 [4005007.702633] Call Trace: [4005007.702636] <TASK> [4005007.702640] amdgpu_debugfs_regs_smc_read+0xb0/0x120 [amdgpu] [4005007.703002] full_proxy_read+0x5c/0x80 [4005007.703011] vfs_read+0x9f/0x1a0 [4005007.703019] ksys_read+0x67/0xe0 [4005007.703023] __x64_sys_read+0x19/0x20 [4005007.703028] do_syscall_64+0x5c/0xc0 [4005007.703034] ? do_user_addr_fault+0x1e3/0x670 [4005007.703040] ? exit_to_user_mode_prepare+0x37/0xb0 [4005007.703047] ? irqentry_exit_to_user_mode+0x9/0x20 [4005007.703052] ? irqentry_exit+0x19/0x30 [4005007.703057] ? exc_page_fault+0x89/0x160 [4005007.703062] ? asm_exc_page_fault+0x8/0x30 [4005007.703068] entry_SYSCALL_64_after_hwframe+0x44/0xae [4005007.703075] RIP: 0033:0x7f5e07672992 [4005007.703079] Code: c0 e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5 1d 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 e c 28 48 89 54 24 [4005007.703083] RSP: 002b:00007ffe03097898 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [4005007.703088] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f5e07672992 [4005007.703091] RDX: 0000000000020000 RSI: 00007f5e06753000 RDI: 0000000000000003 [4005007.703094] RBP: 00007f5e06753000 R08: 00007f5e06752010 R09: 00007f5e06752010 [4005007.703096] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000022000 [4005007.703099] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 [4005007.703105] </TASK> [4005007.703107] Modules linked in: nf_tables libcrc32c nfnetlink algif_hash af_alg binfmt_misc nls_ iso8859_1 ipmi_ssif ast intel_rapl_msr intel_rapl_common drm_vram_helper drm_ttm_helper amd64_edac t tm edac_mce_amd kvm_amd ccp mac_hid k10temp kvm acpi_ipmi ipmi_si rapl sch_fq_codel ipmi_devintf ipm i_msghandler msr parport_pc ppdev lp parport mtd pstore_blk efi_pstore ramoops pstore_zone reed_solo mon ip_tables x_tables autofs4 ib_uverbs ib_core amdgpu(OE) amddrm_ttm_helper(OE) amdttm(OE) iommu_v 2 amd_sched(OE) amdkcl(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm igb ahci xhci_pci libahci i2c_piix4 i2c_algo_bit xhci_pci_renesas dca [4005007.703184] CR2: 0000000000000000 [4005007.703188] ---[ end trace ac65a538d240da39 ]--- [4005007.800865] RIP: 0010:0x0 [4005007.800871] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [4005007.800874] RSP: 0018:ffffa82b46d27da0 EFLAGS: 00010206 [4005007.800878] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa82b46d27e68 [4005007.800881] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9940656e0000 [4005007.800883] RBP: ffffa82b46d27dd8 R08: 0000000000000000 R09: ffff994060c07980 [4005007.800886] R10: 0000000000020000 R11: 0000000000000000 R12: 00007f5e06753000 [4005007.800888] R13: ffff9940656e0000 R14: ffffa82b46d27e68 R15: 00007f5e06753000 [4005007.800891] FS: 00007f5e0755b740(0000) GS:ffff99479d300000(0000) knlGS:0000000000000000 [4005007.800895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [4005007.800898] CR2: ffffffffffffffd6 CR3: 00000003253fc000 CR4: 00000000003506e0
Signed-off-by: Qu Huang qu.huang@linux.dev Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c index a9a81e55777bf..d81034023144a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c @@ -392,6 +392,9 @@ static ssize_t amdgpu_debugfs_regs_smc_read(struct file *f, char __user *buf, ssize_t result = 0; int r;
+ if (!adev->smc_rreg) + return -EPERM; + if (size & 0x3 || *pos & 0x3) return -EINVAL;
@@ -431,6 +434,9 @@ static ssize_t amdgpu_debugfs_regs_smc_write(struct file *f, const char __user * ssize_t result = 0; int r;
+ if (!adev->smc_wreg) + return -EPERM; + if (size & 0x3 || *pos & 0x3) return -EINVAL;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: zhujun2 zhujun2@cmss.chinamobile.com
[ Upstream commit 3f6f8a8c5e11a9b384a36df4f40f0c9a653b6975 ]
The opened file should be closed in main(), otherwise resource leak will occur that this problem was discovered by code reading
Signed-off-by: zhujun2 zhujun2@cmss.chinamobile.com Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/efivarfs/create-read.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/efivarfs/create-read.c b/tools/testing/selftests/efivarfs/create-read.c index 9674a19396a32..7bc7af4eb2c17 100644 --- a/tools/testing/selftests/efivarfs/create-read.c +++ b/tools/testing/selftests/efivarfs/create-read.c @@ -32,8 +32,10 @@ int main(int argc, char **argv) rc = read(fd, buf, sizeof(buf)); if (rc != 0) { fprintf(stderr, "Reading a new var should return EOF\n"); + close(fd); return EXIT_FAILURE; }
+ close(fd); return EXIT_SUCCESS; }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lu Jialin lujialin4@huawei.com
[ Upstream commit 8f4f68e788c3a7a696546291258bfa5fdb215523 ]
We found a hungtask bug in test_aead_vec_cfg as follows:
INFO: task cryptomgr_test:391009 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Call trace: __switch_to+0x98/0xe0 __schedule+0x6c4/0xf40 schedule+0xd8/0x1b4 schedule_timeout+0x474/0x560 wait_for_common+0x368/0x4e0 wait_for_completion+0x20/0x30 wait_for_completion+0x20/0x30 test_aead_vec_cfg+0xab4/0xd50 test_aead+0x144/0x1f0 alg_test_aead+0xd8/0x1e0 alg_test+0x634/0x890 cryptomgr_test+0x40/0x70 kthread+0x1e0/0x220 ret_from_fork+0x10/0x18 Kernel panic - not syncing: hung_task: blocked tasks
For padata_do_parallel, when the return err is 0 or -EBUSY, it will call wait_for_completion(&wait->completion) in test_aead_vec_cfg. In normal case, aead_request_complete() will be called in pcrypt_aead_serial and the return err is 0 for padata_do_parallel. But, when pinst->flags is PADATA_RESET, the return err is -EBUSY for padata_do_parallel, and it won't call aead_request_complete(). Therefore, test_aead_vec_cfg will hung at wait_for_completion(&wait->completion), which will cause hungtask.
The problem comes as following: (padata_do_parallel) | rcu_read_lock_bh(); | err = -EINVAL; | (padata_replace) | pinst->flags |= PADATA_RESET; err = -EBUSY | if (pinst->flags & PADATA_RESET) | rcu_read_unlock_bh() | return err
In order to resolve the problem, we replace the return err -EBUSY with -EAGAIN, which means parallel_data is changing, and the caller should call it again.
v3: remove retry and just change the return err. v2: introduce padata_try_do_parallel() in pcrypt_aead_encrypt and pcrypt_aead_decrypt to solve the hungtask.
Signed-off-by: Lu Jialin lujialin4@huawei.com Signed-off-by: Guo Zihua guozihua@huawei.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Sasha Levin sashal@kernel.org --- crypto/pcrypt.c | 4 ++++ kernel/padata.c | 2 +- 2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c index 276d2fd9e911c..63e64164900e8 100644 --- a/crypto/pcrypt.c +++ b/crypto/pcrypt.c @@ -118,6 +118,8 @@ static int pcrypt_aead_encrypt(struct aead_request *req) err = padata_do_parallel(ictx->psenc, padata, &ctx->cb_cpu); if (!err) return -EINPROGRESS; + if (err == -EBUSY) + return -EAGAIN;
return err; } @@ -165,6 +167,8 @@ static int pcrypt_aead_decrypt(struct aead_request *req) err = padata_do_parallel(ictx->psdec, padata, &ctx->cb_cpu); if (!err) return -EINPROGRESS; + if (err == -EBUSY) + return -EAGAIN;
return err; } diff --git a/kernel/padata.c b/kernel/padata.c index 92a4867e8adc7..a544da60014c0 100644 --- a/kernel/padata.c +++ b/kernel/padata.c @@ -130,7 +130,7 @@ int padata_do_parallel(struct padata_shell *ps, *cb_cpu = cpu; }
- err = -EBUSY; + err = -EBUSY; if ((pinst->flags & PADATA_RESET)) goto out;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
[ Upstream commit 8bf7187d978610b9e327a3d92728c8864a575ebd ]
Use FIELD_GET() to extract PCIe Negotiated Link Width field instead of custom masking and shifting, and remove extract_width() which only wraps that FIELD_GET().
Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/r/20230919125648.1920-2-ilpo.jarvinen@linux.intel.co... Reviewed-by: Jonathan Cameron Jonathan.Cameron@huawei.com Reviewed-by: Dean Luick dean.luick@cornelisnetworks.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/hfi1/pcie.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/pcie.c b/drivers/infiniband/hw/hfi1/pcie.c index 61362bd6d3ced..111705e6609c9 100644 --- a/drivers/infiniband/hw/hfi1/pcie.c +++ b/drivers/infiniband/hw/hfi1/pcie.c @@ -45,6 +45,7 @@ * */
+#include <linux/bitfield.h> #include <linux/pci.h> #include <linux/io.h> #include <linux/delay.h> @@ -261,12 +262,6 @@ static u32 extract_speed(u16 linkstat) return speed; }
-/* return the PCIe link speed from the given link status */ -static u32 extract_width(u16 linkstat) -{ - return (linkstat & PCI_EXP_LNKSTA_NLW) >> PCI_EXP_LNKSTA_NLW_SHIFT; -} - /* read the link status and set dd->{lbus_width,lbus_speed,lbus_info} */ static void update_lbus_info(struct hfi1_devdata *dd) { @@ -279,7 +274,7 @@ static void update_lbus_info(struct hfi1_devdata *dd) return; }
- dd->lbus_width = extract_width(linkstat); + dd->lbus_width = FIELD_GET(PCI_EXP_LNKSTA_NLW, linkstat); dd->lbus_speed = extract_speed(linkstat); snprintf(dd->lbus_info, sizeof(dd->lbus_info), "PCIe,%uMHz,x%u", dd->lbus_speed, dd->lbus_width);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juntong Deng juntong.deng@outlook.com
[ Upstream commit 525b861a008143048535011f3816d407940f4bfa ]
l2nbperpage is log2(number of blks per page), and the minimum legal value should be 0, not negative.
In the case of l2nbperpage being negative, an error will occur when subsequently used as shift exponent.
Syzbot reported this bug:
UBSAN: shift-out-of-bounds in fs/jfs/jfs_dmap.c:799:12 shift exponent -16777216 is negative
Reported-by: syzbot+debee9ab7ae2b34b0307@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=debee9ab7ae2b34b0307 Signed-off-by: Juntong Deng juntong.deng@outlook.com Signed-off-by: Dave Kleikamp dave.kleikamp@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/jfs/jfs_dmap.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/jfs/jfs_dmap.c b/fs/jfs/jfs_dmap.c index a785c747a8cbb..495a1c6e5fd46 100644 --- a/fs/jfs/jfs_dmap.c +++ b/fs/jfs/jfs_dmap.c @@ -180,7 +180,8 @@ int dbMount(struct inode *ipbmap) bmp->db_nfree = le64_to_cpu(dbmp_le->dn_nfree);
bmp->db_l2nbperpage = le32_to_cpu(dbmp_le->dn_l2nbperpage); - if (bmp->db_l2nbperpage > L2PSIZE - L2MINBLOCKSIZE) { + if (bmp->db_l2nbperpage > L2PSIZE - L2MINBLOCKSIZE || + bmp->db_l2nbperpage < 0) { err = -EINVAL; goto err_release_metapage; }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juntong Deng juntong.deng@outlook.com
[ Upstream commit 64933ab7b04881c6c18b21ff206c12278341c72e ]
Both db_maxag and db_agpref are used as the index of the db_agfree array, but there is currently no validity check for db_maxag and db_agpref, which can lead to errors.
The following is related bug reported by Syzbot:
UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dmap.c:639:20 index 7936 is out of range for type 'atomic_t[128]'
Add checking that the values of db_maxag and db_agpref are valid indexes for the db_agfree array.
Reported-by: syzbot+38e876a8aa44b7115c76@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=38e876a8aa44b7115c76 Signed-off-by: Juntong Deng juntong.deng@outlook.com Signed-off-by: Dave Kleikamp dave.kleikamp@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/jfs/jfs_dmap.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/fs/jfs/jfs_dmap.c b/fs/jfs/jfs_dmap.c index 495a1c6e5fd46..b23b219b20aab 100644 --- a/fs/jfs/jfs_dmap.c +++ b/fs/jfs/jfs_dmap.c @@ -195,6 +195,12 @@ int dbMount(struct inode *ipbmap) bmp->db_maxlevel = le32_to_cpu(dbmp_le->dn_maxlevel); bmp->db_maxag = le32_to_cpu(dbmp_le->dn_maxag); bmp->db_agpref = le32_to_cpu(dbmp_le->dn_agpref); + if (bmp->db_maxag >= MAXAG || bmp->db_maxag < 0 || + bmp->db_agpref >= MAXAG || bmp->db_agpref < 0) { + err = -EINVAL; + goto err_release_metapage; + } + bmp->db_aglevel = le32_to_cpu(dbmp_le->dn_aglevel); bmp->db_agheight = le32_to_cpu(dbmp_le->dn_agheight); bmp->db_agwidth = le32_to_cpu(dbmp_le->dn_agwidth);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Manas Ghandat ghandatmanas@gmail.com
[ Upstream commit 22cad8bc1d36547cdae0eef316c47d917ce3147c ]
Currently while searching for dmtree_t for sufficient free blocks there is an array out of bounds while getting element in tp->dm_stree. To add the required check for out of bound we first need to determine the type of dmtree. Thus added an extra parameter to dbFindLeaf so that the type of tree can be determined and the required check can be applied.
Reported-by: syzbot+aea1ad91e854d0a83e04@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=aea1ad91e854d0a83e04 Signed-off-by: Manas Ghandat ghandatmanas@gmail.com Signed-off-by: Dave Kleikamp dave.kleikamp@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/jfs/jfs_dmap.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/jfs/jfs_dmap.c b/fs/jfs/jfs_dmap.c index b23b219b20aab..ea330ce921b1a 100644 --- a/fs/jfs/jfs_dmap.c +++ b/fs/jfs/jfs_dmap.c @@ -87,7 +87,7 @@ static int dbAllocCtl(struct bmap * bmp, s64 nblocks, int l2nb, s64 blkno, static int dbExtend(struct inode *ip, s64 blkno, s64 nblocks, s64 addnblocks); static int dbFindBits(u32 word, int l2nb); static int dbFindCtl(struct bmap * bmp, int l2nb, int level, s64 * blkno); -static int dbFindLeaf(dmtree_t * tp, int l2nb, int *leafidx); +static int dbFindLeaf(dmtree_t *tp, int l2nb, int *leafidx, bool is_ctl); static int dbFreeBits(struct bmap * bmp, struct dmap * dp, s64 blkno, int nblocks); static int dbFreeDmap(struct bmap * bmp, struct dmap * dp, s64 blkno, @@ -1785,7 +1785,7 @@ static int dbFindCtl(struct bmap * bmp, int l2nb, int level, s64 * blkno) * dbFindLeaf() returns the index of the leaf at which * free space was found. */ - rc = dbFindLeaf((dmtree_t *) dcp, l2nb, &leafidx); + rc = dbFindLeaf((dmtree_t *) dcp, l2nb, &leafidx, true);
/* release the buffer. */ @@ -2032,7 +2032,7 @@ dbAllocDmapLev(struct bmap * bmp, * free space. if sufficient free space is found, dbFindLeaf() * returns the index of the leaf at which free space was found. */ - if (dbFindLeaf((dmtree_t *) & dp->tree, l2nb, &leafidx)) + if (dbFindLeaf((dmtree_t *) &dp->tree, l2nb, &leafidx, false)) return -ENOSPC;
if (leafidx < 0) @@ -2992,14 +2992,18 @@ static void dbAdjTree(dmtree_t * tp, int leafno, int newval) * leafidx - return pointer to be set to the index of the leaf * describing at least l2nb free blocks if sufficient * free blocks are found. + * is_ctl - determines if the tree is of type ctl * * RETURN VALUES: * 0 - success * -ENOSPC - insufficient free blocks. */ -static int dbFindLeaf(dmtree_t * tp, int l2nb, int *leafidx) +static int dbFindLeaf(dmtree_t *tp, int l2nb, int *leafidx, bool is_ctl) { int ti, n = 0, k, x = 0; + int max_size; + + max_size = is_ctl ? CTLTREESIZE : TREESIZE;
/* first check the root of the tree to see if there is * sufficient free space. @@ -3020,6 +3024,8 @@ static int dbFindLeaf(dmtree_t * tp, int l2nb, int *leafidx) /* sufficient free space found. move to the next * level (or quit if this is the last level). */ + if (x + n > max_size) + return -ENOSPC; if (l2nb <= tp->dmt_stree[x + n]) break; }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Manas Ghandat ghandatmanas@gmail.com
[ Upstream commit 05d9ea1ceb62a55af6727a69269a4fd310edf483 ]
Currently there is not check against the agno of the iag while allocating new inodes to avoid fragmentation problem. Added the check which is required.
Reported-by: syzbot+79d792676d8ac050949f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=79d792676d8ac050949f Signed-off-by: Manas Ghandat ghandatmanas@gmail.com Signed-off-by: Dave Kleikamp dave.kleikamp@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/jfs/jfs_imap.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/jfs/jfs_imap.c b/fs/jfs/jfs_imap.c index 67c67604b8c85..14f918a4831d3 100644 --- a/fs/jfs/jfs_imap.c +++ b/fs/jfs/jfs_imap.c @@ -1322,7 +1322,7 @@ diInitInode(struct inode *ip, int iagno, int ino, int extno, struct iag * iagp) int diAlloc(struct inode *pip, bool dir, struct inode *ip) { int rc, ino, iagno, addext, extno, bitno, sword; - int nwords, rem, i, agno; + int nwords, rem, i, agno, dn_numag; u32 mask, inosmap, extsmap; struct inode *ipimap; struct metapage *mp; @@ -1358,6 +1358,9 @@ int diAlloc(struct inode *pip, bool dir, struct inode *ip)
/* get the ag number of this iag */ agno = BLKTOAG(JFS_IP(pip)->agstart, JFS_SBI(pip->i_sb)); + dn_numag = JFS_SBI(pip->i_sb)->bmap->db_numag; + if (agno < 0 || agno > dn_numag) + return -EIO;
if (atomic_read(&JFS_SBI(pip->i_sb)->bmap->db_active[agno])) { /*
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vincent Whitchurch vincent.whitchurch@axis.com
[ Upstream commit b0150014878c32197cfa66e3e2f79e57f66babc0 ]
Place IRQ handlers such as gic_handle_irq() in the irqentry section even if FUNCTION_GRAPH_TRACER is not enabled. Without this, the stack depot's filter_irq_stacks() does not correctly filter out IRQ stacks in those configurations, which hampers deduplication and eventually leads to "Stack depot reached limit capacity" splats with KASAN.
A similar fix was done for arm64 in commit f6794950f0e5ba37e3bbed ("arm64: set __exception_irq_entry with __irq_entry as a default").
Link: https://lore.kernel.org/r/20230803-arm-irqentry-v1-1-8aad8e260b1c@axis.com
Signed-off-by: Vincent Whitchurch vincent.whitchurch@axis.com Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/include/asm/exception.h | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/arch/arm/include/asm/exception.h b/arch/arm/include/asm/exception.h index 58e039a851af0..3c82975d46db3 100644 --- a/arch/arm/include/asm/exception.h +++ b/arch/arm/include/asm/exception.h @@ -10,10 +10,6 @@
#include <linux/interrupt.h>
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER #define __exception_irq_entry __irq_entry -#else -#define __exception_irq_entry -#endif
#endif /* __ASM_ARM_EXCEPTION_H */
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cezary Rojewski cezary.rojewski@intel.com
[ Upstream commit f93dc90c2e8ed664985e366aa6459ac83cdab236 ]
While AudioDSP drivers assign streams exclusively of HOST or LINK type, nothing blocks a user to attempt to assign a COUPLED stream. As supplied substream instance may be a stub, what is the case when code-loading, such scenario ends with null-ptr-deref.
Signed-off-by: Cezary Rojewski cezary.rojewski@intel.com Link: https://lore.kernel.org/r/20231006102857.749143-2-cezary.rojewski@intel.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/hda/hdac_stream.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/sound/hda/hdac_stream.c b/sound/hda/hdac_stream.c index 2beb94828729d..f810f401c1de8 100644 --- a/sound/hda/hdac_stream.c +++ b/sound/hda/hdac_stream.c @@ -313,8 +313,10 @@ struct hdac_stream *snd_hdac_stream_assign(struct hdac_bus *bus, struct hdac_stream *res = NULL;
/* make a non-zero unique key for the substream */ - int key = (substream->pcm->device << 16) | (substream->number << 2) | - (substream->stream + 1); + int key = (substream->number << 2) | (substream->stream + 1); + + if (substream->pcm) + key |= (substream->pcm->device << 16);
spin_lock_irq(&bus->reg_lock); list_for_each_entry(azx_dev, &bus->stream_list, list) {
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
[ Upstream commit 759574abd78e3b47ec45bbd31a64e8832cf73f97 ]
Use FIELD_GET() to extract PCIe Negotiated Link Width field instead of custom masking and shifting.
Similarly, change custom code that misleadingly used PCI_EXP_LNKSTA_NLW_SHIFT to prepare value for PCI_EXP_LNKCAP write to use FIELD_PREP() with correct field define (PCI_EXP_LNKCAP_MLW).
Link: https://lore.kernel.org/r/20230919125648.1920-5-ilpo.jarvinen@linux.intel.co... Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/controller/dwc/pcie-tegra194.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c index 120d64c1a27fd..1cf94854c44fd 100644 --- a/drivers/pci/controller/dwc/pcie-tegra194.c +++ b/drivers/pci/controller/dwc/pcie-tegra194.c @@ -7,6 +7,7 @@ * Author: Vidya Sagar vidyas@nvidia.com */
+#include <linux/bitfield.h> #include <linux/clk.h> #include <linux/debugfs.h> #include <linux/delay.h> @@ -321,8 +322,7 @@ static void apply_bad_link_workaround(struct pcie_port *pp) */ val = dw_pcie_readw_dbi(pci, pcie->pcie_cap_base + PCI_EXP_LNKSTA); if (val & PCI_EXP_LNKSTA_LBMS) { - current_link_width = (val & PCI_EXP_LNKSTA_NLW) >> - PCI_EXP_LNKSTA_NLW_SHIFT; + current_link_width = FIELD_GET(PCI_EXP_LNKSTA_NLW, val); if (pcie->init_link_width > current_link_width) { dev_warn(pci->dev, "PCIe link is bad, width reduced\n"); val = dw_pcie_readw_dbi(pci, pcie->pcie_cap_base + @@ -596,8 +596,7 @@ static void tegra_pcie_enable_system_interrupts(struct pcie_port *pp)
val_w = dw_pcie_readw_dbi(&pcie->pci, pcie->pcie_cap_base + PCI_EXP_LNKSTA); - pcie->init_link_width = (val_w & PCI_EXP_LNKSTA_NLW) >> - PCI_EXP_LNKSTA_NLW_SHIFT; + pcie->init_link_width = FIELD_GET(PCI_EXP_LNKSTA_NLW, val_w);
val_w = dw_pcie_readw_dbi(&pcie->pci, pcie->pcie_cap_base + PCI_EXP_LNKCTL); @@ -773,7 +772,7 @@ static void tegra_pcie_prepare_host(struct pcie_port *pp) /* Configure Max lane width from DT */ val = dw_pcie_readl_dbi(pci, pcie->pcie_cap_base + PCI_EXP_LNKCAP); val &= ~PCI_EXP_LNKCAP_MLW; - val |= (pcie->num_lanes << PCI_EXP_LNKSTA_NLW_SHIFT); + val |= FIELD_PREP(PCI_EXP_LNKCAP_MLW, pcie->num_lanes); dw_pcie_writel_dbi(pci, pcie->pcie_cap_base + PCI_EXP_LNKCAP, val);
config_gen3_gen4_eq_presets(pcie);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
[ Upstream commit c28742447ca9879b52fbaf022ad844f0ffcd749c ]
In get_esi() PCI errors are checked inside line-split "if" conditions (in addition to the file not following the coding style). To make the code in get_esi() more readable, fix the coding style and use the usual error handling pattern with a separate variable.
In addition, initialization of 'error' variable at declaration is not needed.
No functional changes intended.
Link: https://lore.kernel.org/r/20230911125354.25501-4-ilpo.jarvinen@linux.intel.c... Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/atm/iphase.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/drivers/atm/iphase.c b/drivers/atm/iphase.c index 46990352b5d3f..bfc889367d5e3 100644 --- a/drivers/atm/iphase.c +++ b/drivers/atm/iphase.c @@ -2290,19 +2290,21 @@ static int get_esi(struct atm_dev *dev) static int reset_sar(struct atm_dev *dev) { IADEV *iadev; - int i, error = 1; + int i, error; unsigned int pci[64]; iadev = INPH_IA_DEV(dev); - for(i=0; i<64; i++) - if ((error = pci_read_config_dword(iadev->pci, - i*4, &pci[i])) != PCIBIOS_SUCCESSFUL) - return error; + for (i = 0; i < 64; i++) { + error = pci_read_config_dword(iadev->pci, i * 4, &pci[i]); + if (error != PCIBIOS_SUCCESSFUL) + return error; + } writel(0, iadev->reg+IPHASE5575_EXT_RESET); - for(i=0; i<64; i++) - if ((error = pci_write_config_dword(iadev->pci, - i*4, pci[i])) != PCIBIOS_SUCCESSFUL) - return error; + for (i = 0; i < 64; i++) { + error = pci_write_config_dword(iadev->pci, i * 4, pci[i]); + if (error != PCIBIOS_SUCCESSFUL) + return error; + } udelay(5); return 0; }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wenchao Hao haowenchao2@huawei.com
[ Upstream commit 4df105f0ce9f6f30cda4e99f577150d23f0c9c5f ]
fc_lport_ptp_setup() did not check the return value of fc_rport_create() which can return NULL and would cause a NULL pointer dereference. Address this issue by checking return value of fc_rport_create() and log error message on fc_rport_create() failed.
Signed-off-by: Wenchao Hao haowenchao2@huawei.com Link: https://lore.kernel.org/r/20231011130350.819571-1-haowenchao2@huawei.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/libfc/fc_lport.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/scsi/libfc/fc_lport.c b/drivers/scsi/libfc/fc_lport.c index 9399e1455d597..97087eef05dbc 100644 --- a/drivers/scsi/libfc/fc_lport.c +++ b/drivers/scsi/libfc/fc_lport.c @@ -238,6 +238,12 @@ static void fc_lport_ptp_setup(struct fc_lport *lport, } mutex_lock(&lport->disc.disc_mutex); lport->ptp_rdata = fc_rport_create(lport, remote_fid); + if (!lport->ptp_rdata) { + printk(KERN_WARNING "libfc: Failed to setup lport 0x%x\n", + lport->port_id); + mutex_unlock(&lport->disc.disc_mutex); + return; + } kref_get(&lport->ptp_rdata->kref); lport->ptp_rdata->ids.port_name = remote_wwpn; lport->ptp_rdata->ids.node_name = remote_wwnn;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Kosina jkosina@suse.cz
[ Upstream commit 62cc9c3cb3ec1bf31cc116146185ed97b450836a ]
This device needs ALWAYS_POLL quirk, otherwise it keeps reconnecting indefinitely.
Reported-by: Robert Ayrapetyan robert.ayrapetyan@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-ids.h | 1 + drivers/hid/hid-quirks.c | 1 + 2 files changed, 2 insertions(+)
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index 64842926aff64..182068bf28c0a 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -350,6 +350,7 @@
#define USB_VENDOR_ID_DELL 0x413c #define USB_DEVICE_ID_DELL_PIXART_USB_OPTICAL_MOUSE 0x301a +#define USB_DEVICE_ID_DELL_PRO_WIRELESS_KM5221W 0x4503
#define USB_VENDOR_ID_DELORME 0x1163 #define USB_DEVICE_ID_DELORME_EARTHMATE 0x0100 diff --git a/drivers/hid/hid-quirks.c b/drivers/hid/hid-quirks.c index 83c3322fcf187..fae784df084d5 100644 --- a/drivers/hid/hid-quirks.c +++ b/drivers/hid/hid-quirks.c @@ -66,6 +66,7 @@ static const struct hid_device_id hid_quirks[] = { { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_STRAFE), HID_QUIRK_NO_INIT_REPORTS | HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_CREATIVELABS, USB_DEVICE_ID_CREATIVE_SB_OMNI_SURROUND_51), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_DELL, USB_DEVICE_ID_DELL_PIXART_USB_OPTICAL_MOUSE), HID_QUIRK_ALWAYS_POLL }, + { HID_USB_DEVICE(USB_VENDOR_ID_DELL, USB_DEVICE_ID_DELL_PRO_WIRELESS_KM5221W), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_DMI, USB_DEVICE_ID_DMI_ENC), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_DRACAL_RAPHNET, USB_DEVICE_ID_RAPHNET_2NES2SNES), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_DRACAL_RAPHNET, USB_DEVICE_ID_RAPHNET_4NES4SNES), HID_QUIRK_MULTI_INPUT },
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yi Yang yiyang13@huawei.com
[ Upstream commit d81ffb87aaa75f842cd7aa57091810353755b3e6 ]
Add check for the return value of kstrdup() and return the error, if it fails in order to avoid NULL pointer dereference.
Signed-off-by: Yi Yang yiyang13@huawei.com Reviewed-by: Jiri Slaby jirislaby@kernel.org Link: https://lore.kernel.org/r/20230904035220.48164-1-yiyang13@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/vcc.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/tty/vcc.c b/drivers/tty/vcc.c index 9ffd42e333b83..6b2d35ac6e3b3 100644 --- a/drivers/tty/vcc.c +++ b/drivers/tty/vcc.c @@ -587,18 +587,22 @@ static int vcc_probe(struct vio_dev *vdev, const struct vio_device_id *id) return -ENOMEM;
name = kstrdup(dev_name(&vdev->dev), GFP_KERNEL); + if (!name) { + rv = -ENOMEM; + goto free_port; + }
rv = vio_driver_init(&port->vio, vdev, VDEV_CONSOLE_CON, vcc_versions, ARRAY_SIZE(vcc_versions), NULL, name); if (rv) - goto free_port; + goto free_name;
port->vio.debug = vcc_dbg_vio; vcc_ldc_cfg.debug = vcc_dbg_ldc;
rv = vio_ldc_alloc(&port->vio, &vcc_ldc_cfg, port); if (rv) - goto free_port; + goto free_name;
spin_lock_init(&port->lock);
@@ -632,6 +636,11 @@ static int vcc_probe(struct vio_dev *vdev, const struct vio_device_id *id) goto unreg_tty; } port->domain = kstrdup(domain, GFP_KERNEL); + if (!port->domain) { + rv = -ENOMEM; + goto unreg_tty; + } +
mdesc_release(hp);
@@ -661,8 +670,9 @@ static int vcc_probe(struct vio_dev *vdev, const struct vio_device_id *id) vcc_table_remove(port->index); free_ldc: vio_ldc_free(&port->vio); -free_port: +free_name: kfree(name); +free_port: kfree(port);
return rv;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hardik Gajjar hgajjar@de.adit-jv.com
[ Upstream commit a04224da1f3424b2c607b12a3bd1f0e302fb8231 ]
Previously, gadget assignment to the net device occurred exclusively during the initial binding attempt.
Nevertheless, the gadget pointer could change during bind/unbind cycles due to various conditions, including the unloading/loading of the UDC device driver or the detachment/reconnection of an OTG-capable USB hub device.
This patch relocates the gether_set_gadget() function out from ncm_opts->bound condition check, ensuring that the correct gadget is assigned during each bind request.
The provided logs demonstrate the consistency of ncm_opts throughout the power cycle, while the gadget may change.
* OTG hub connected during boot up and assignment of gadget and ncm_opts pointer
[ 2.366301] usb 2-1.5: New USB device found, idVendor=2996, idProduct=0105 [ 2.366304] usb 2-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ 2.366306] usb 2-1.5: Product: H2H Bridge [ 2.366308] usb 2-1.5: Manufacturer: Aptiv [ 2.366309] usb 2-1.5: SerialNumber: 13FEB2021 [ 2.427989] usb 2-1.5: New USB device found, VID=2996, PID=0105 [ 2.428959] dabridge 2-1.5:1.0: dabridge 2-4 total endpoints=5, 0000000093a8d681 [ 2.429710] dabridge 2-1.5:1.0: P(0105) D(22.06.22) F(17.3.16) H(1.1) high-speed [ 2.429714] dabridge 2-1.5:1.0: Hub 2-2 P(0151) V(06.87) [ 2.429956] dabridge 2-1.5:1.0: All downstream ports in host mode
[ 2.430093] gadget 000000003c414d59 ------> gadget pointer
* NCM opts and associated gadget pointer during First ncm_bind
[ 34.763929] NCM opts 00000000aa304ac9 [ 34.763930] NCM gadget 000000003c414d59
* OTG capable hub disconnecte or assume driver unload.
[ 97.203114] usb 2-1: USB disconnect, device number 2 [ 97.203118] usb 2-1.1: USB disconnect, device number 3 [ 97.209217] usb 2-1.5: USB disconnect, device number 4 [ 97.230990] dabr_udc deleted
* Reconnect the OTG hub or load driver assaign new gadget pointer.
[ 111.534035] usb 2-1.1: New USB device found, idVendor=2996, idProduct=0120, bcdDevice= 6.87 [ 111.534038] usb 2-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ 111.534040] usb 2-1.1: Product: Vendor [ 111.534041] usb 2-1.1: Manufacturer: Aptiv [ 111.534042] usb 2-1.1: SerialNumber: Superior [ 111.535175] usb 2-1.1: New USB device found, VID=2996, PID=0120 [ 111.610995] usb 2-1.5: new high-speed USB device number 8 using xhci-hcd [ 111.630052] usb 2-1.5: New USB device found, idVendor=2996, idProduct=0105, bcdDevice=21.02 [ 111.630055] usb 2-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ 111.630057] usb 2-1.5: Product: H2H Bridge [ 111.630058] usb 2-1.5: Manufacturer: Aptiv [ 111.630059] usb 2-1.5: SerialNumber: 13FEB2021 [ 111.687464] usb 2-1.5: New USB device found, VID=2996, PID=0105 [ 111.690375] dabridge 2-1.5:1.0: dabridge 2-8 total endpoints=5, 000000000d87c961 [ 111.691172] dabridge 2-1.5:1.0: P(0105) D(22.06.22) F(17.3.16) H(1.1) high-speed [ 111.691176] dabridge 2-1.5:1.0: Hub 2-6 P(0151) V(06.87) [ 111.691646] dabridge 2-1.5:1.0: All downstream ports in host mode
[ 111.692298] gadget 00000000dc72f7a9 --------> new gadget ptr on connect
* NCM opts and associated gadget pointer during second ncm_bind
[ 113.271786] NCM opts 00000000aa304ac9 -----> same opts ptr used during first bind [ 113.271788] NCM gadget 00000000dc72f7a9 ----> however new gaget ptr, that will not set in net_device due to ncm_opts->bound = true
Signed-off-by: Hardik Gajjar hgajjar@de.adit-jv.com Link: https://lore.kernel.org/r/20231020153324.82794-1-hgajjar@de.adit-jv.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/gadget/function/f_ncm.c | 27 +++++++++++---------------- 1 file changed, 11 insertions(+), 16 deletions(-)
diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c index 8d23a870b7b7f..2ef2464a50432 100644 --- a/drivers/usb/gadget/function/f_ncm.c +++ b/drivers/usb/gadget/function/f_ncm.c @@ -1435,7 +1435,7 @@ static int ncm_bind(struct usb_configuration *c, struct usb_function *f) struct usb_composite_dev *cdev = c->cdev; struct f_ncm *ncm = func_to_ncm(f); struct usb_string *us; - int status; + int status = 0; struct usb_ep *ep; struct f_ncm_opts *ncm_opts;
@@ -1453,22 +1453,17 @@ static int ncm_bind(struct usb_configuration *c, struct usb_function *f) f->os_desc_table[0].os_desc = &ncm_opts->ncm_os_desc; }
- /* - * in drivers/usb/gadget/configfs.c:configfs_composite_bind() - * configurations are bound in sequence with list_for_each_entry, - * in each configuration its functions are bound in sequence - * with list_for_each_entry, so we assume no race condition - * with regard to ncm_opts->bound access - */ - if (!ncm_opts->bound) { - mutex_lock(&ncm_opts->lock); - gether_set_gadget(ncm_opts->net, cdev->gadget); + mutex_lock(&ncm_opts->lock); + gether_set_gadget(ncm_opts->net, cdev->gadget); + if (!ncm_opts->bound) status = gether_register_netdev(ncm_opts->net); - mutex_unlock(&ncm_opts->lock); - if (status) - goto fail; - ncm_opts->bound = true; - } + mutex_unlock(&ncm_opts->lock); + + if (status) + goto fail; + + ncm_opts->bound = true; + us = usb_gstrings_attach(cdev, ncm_strings, ARRAY_SIZE(ncm_string_defs)); if (IS_ERR(us)) {
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Axel Lin axel.lin@ingics.com
[ Upstream commit 5ac61d26b8baff5b2e5a9f3dc1ef63297e4b53e7 ]
Make sure we don't OOPS in case clock-frequency is set to 0 in a DT. The variable set here is later used as a divisor.
Signed-off-by: Axel Lin axel.lin@ingics.com Acked-by: Boris Brezillon boris.brezillon@free-electrons.com Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-sun6i-p2wi.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/i2c/busses/i2c-sun6i-p2wi.c b/drivers/i2c/busses/i2c-sun6i-p2wi.c index 7c07ce116e384..540c33f4e3500 100644 --- a/drivers/i2c/busses/i2c-sun6i-p2wi.c +++ b/drivers/i2c/busses/i2c-sun6i-p2wi.c @@ -202,6 +202,11 @@ static int p2wi_probe(struct platform_device *pdev) return -EINVAL; }
+ if (clk_freq == 0) { + dev_err(dev, "clock-frequency is set to 0 in DT\n"); + return -EINVAL; + } + if (of_get_child_count(np) > 1) { dev_err(dev, "P2WI only supports one slave device\n"); return -EINVAL;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rajeshwar R Shinde coolrrsh@gmail.com
[ Upstream commit 099be1822d1f095433f4b08af9cc9d6308ec1953 ]
Syzkaller reported the following issue: UBSAN: shift-out-of-bounds in drivers/media/usb/gspca/cpia1.c:1031:27 shift exponent 245 is too large for 32-bit type 'int'
When the value of the variable "sd->params.exposure.gain" exceeds the number of bits in an integer, a shift-out-of-bounds error is reported. It is triggered because the variable "currentexp" cannot be left-shifted by more than the number of bits in an integer. In order to avoid invalid range during left-shift, the conditional expression is added.
Reported-by: syzbot+e27f3dbdab04e43b9f73@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/20230818164522.12806-1-coolrrsh@gmail.com Link: https://syzkaller.appspot.com/bug?extid=e27f3dbdab04e43b9f73 Signed-off-by: Rajeshwar R Shinde coolrrsh@gmail.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/usb/gspca/cpia1.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/media/usb/gspca/cpia1.c b/drivers/media/usb/gspca/cpia1.c index d93d384286c16..de945e13c7c6b 100644 --- a/drivers/media/usb/gspca/cpia1.c +++ b/drivers/media/usb/gspca/cpia1.c @@ -18,6 +18,7 @@
#include <linux/input.h> #include <linux/sched/signal.h> +#include <linux/bitops.h>
#include "gspca.h"
@@ -1027,6 +1028,8 @@ static int set_flicker(struct gspca_dev *gspca_dev, int on, int apply) sd->params.exposure.expMode = 2; sd->exposure_status = EXPOSURE_NORMAL; } + if (sd->params.exposure.gain >= BITS_PER_TYPE(currentexp)) + return -EINVAL; currentexp = currentexp << sd->params.exposure.gain; sd->params.exposure.gain = 0; /* round down current exposure to nearest value */
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans Verkuil hverkuil-cisco@xs4all.nl
[ Upstream commit 4567ebf8e8f9546b373e78e3b7d584cc30b62028 ]
Fixes these compiler warnings:
drivers/media/test-drivers/vivid/vivid-rds-gen.c: In function 'vivid_rds_gen_fill': drivers/media/test-drivers/vivid/vivid-rds-gen.c:147:56: warning: '.' directive output may be truncated writing 1 byte into a region of size between 0 and 3 [-Wformat-truncation=] 147 | snprintf(rds->psname, sizeof(rds->psname), "%6d.%1d", | ^ drivers/media/test-drivers/vivid/vivid-rds-gen.c:147:52: note: directive argument in the range [0, 9] 147 | snprintf(rds->psname, sizeof(rds->psname), "%6d.%1d", | ^~~~~~~~~ drivers/media/test-drivers/vivid/vivid-rds-gen.c:147:9: note: 'snprintf' output between 9 and 12 bytes into a destination of size 9 147 | snprintf(rds->psname, sizeof(rds->psname), "%6d.%1d", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 148 | freq / 16, ((freq & 0xf) * 10) / 16); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Acked-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/platform/vivid/vivid-rds-gen.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/media/platform/vivid/vivid-rds-gen.c b/drivers/media/platform/vivid/vivid-rds-gen.c index b5b104ee64c99..c57771119a34b 100644 --- a/drivers/media/platform/vivid/vivid-rds-gen.c +++ b/drivers/media/platform/vivid/vivid-rds-gen.c @@ -145,7 +145,7 @@ void vivid_rds_gen_fill(struct vivid_rds_gen *rds, unsigned freq, rds->ta = alt; rds->ms = true; snprintf(rds->psname, sizeof(rds->psname), "%6d.%1d", - freq / 16, ((freq & 0xf) * 10) / 16); + (freq / 16) % 1000000, (((freq & 0xf) * 10) / 16) % 10); if (alt) strscpy(rds->radiotext, " The Radio Data System can switch between different Radio Texts ",
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bob Peterson rpeterso@redhat.com
[ Upstream commit 4c6a08125f2249531ec01783a5f4317d7342add5 ]
When lots of quota changes are made, there may be cases in which an inode's quota information is increased and then decreased, such as when blocks are added to a file, then deleted from it. If the timing is right, function do_qc can add pending quota changes to a transaction, then later, another call to do_qc can negate those changes, resulting in a net gain of 0. The quota_change information is recorded in the qc buffer (and qd element of the inode as well). The buffer is added to the transaction by the first call to do_qc, but a subsequent call changes the value from non-zero back to zero. At that point it's too late to remove the buffer_head from the transaction. Later, when the quota sync code is called, the zero-change qd element is discovered and flagged as an assert warning. If the fs is mounted with errors=panic, the kernel will panic.
This is usually seen when files are truncated and the quota changes are negated by punch_hole/truncate which uses gfs2_quota_hold and gfs2_quota_unhold rather than block allocations that use gfs2_quota_lock and gfs2_quota_unlock which automatically do quota sync.
This patch solves the problem by adding a check to qd_check_sync such that net-zero quota changes already added to the transaction are no longer deemed necessary to be synced, and skipped.
In this case references are taken for the qd and the slot from do_qc so those need to be put. The normal sequence of events for a normal non-zero quota change is as follows:
gfs2_quota_change do_qc qd_hold slot_hold
Later, when the changes are to be synced:
gfs2_quota_sync qd_fish qd_check_sync gets qd ref via lockref_get_not_dead do_sync do_qc(QC_SYNC) qd_put lockref_put_or_lock qd_unlock qd_put lockref_put_or_lock
In the net-zero change case, we add a check to qd_check_sync so it puts the qd and slot references acquired in gfs2_quota_change and skip the unneeded sync.
Signed-off-by: Bob Peterson rpeterso@redhat.com Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/gfs2/quota.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c index cbee745169b8f..ce3d65787e016 100644 --- a/fs/gfs2/quota.c +++ b/fs/gfs2/quota.c @@ -431,6 +431,17 @@ static int qd_check_sync(struct gfs2_sbd *sdp, struct gfs2_quota_data *qd, (sync_gen && (qd->qd_sync_gen >= *sync_gen))) return 0;
+ /* + * If qd_change is 0 it means a pending quota change was negated. + * We should not sync it, but we still have a qd reference and slot + * reference taken by gfs2_quota_change -> do_qc that need to be put. + */ + if (!qd->qd_change && test_and_clear_bit(QDF_CHANGE, &qd->qd_flags)) { + slot_put(qd); + qd_put(qd); + return 0; + } + if (!lockref_get_not_dead(&qd->qd_lockref)) return 0;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
[ Upstream commit f301fedbeecfdce91cb898d6fa5e62f269801fee ]
Use FIELD_GET() to extract PCIe Negotiated and Maximum Link Width fields instead of custom masking and shifting.
Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Reviewed-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/pci/cobalt/cobalt-driver.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/media/pci/cobalt/cobalt-driver.c b/drivers/media/pci/cobalt/cobalt-driver.c index 1bd8bbe57a30e..1f230b14cbfdd 100644 --- a/drivers/media/pci/cobalt/cobalt-driver.c +++ b/drivers/media/pci/cobalt/cobalt-driver.c @@ -8,6 +8,7 @@ * All rights reserved. */
+#include <linux/bitfield.h> #include <linux/delay.h> #include <media/i2c/adv7604.h> #include <media/i2c/adv7842.h> @@ -210,17 +211,17 @@ void cobalt_pcie_status_show(struct cobalt *cobalt) pcie_capability_read_word(pci_dev, PCI_EXP_LNKSTA, &stat); cobalt_info("PCIe link capability 0x%08x: %s per lane and %u lanes\n", capa, get_link_speed(capa), - (capa & PCI_EXP_LNKCAP_MLW) >> 4); + FIELD_GET(PCI_EXP_LNKCAP_MLW, capa)); cobalt_info("PCIe link control 0x%04x\n", ctrl); cobalt_info("PCIe link status 0x%04x: %s per lane and %u lanes\n", stat, get_link_speed(stat), - (stat & PCI_EXP_LNKSTA_NLW) >> 4); + FIELD_GET(PCI_EXP_LNKSTA_NLW, stat));
/* Bus */ pcie_capability_read_dword(pci_bus_dev, PCI_EXP_LNKCAP, &capa); cobalt_info("PCIe bus link capability 0x%08x: %s per lane and %u lanes\n", capa, get_link_speed(capa), - (capa & PCI_EXP_LNKCAP_MLW) >> 4); + FIELD_GET(PCI_EXP_LNKCAP_MLW, capa));
/* Slot */ pcie_capability_read_dword(pci_dev, PCI_EXP_SLTCAP, &capa); @@ -239,7 +240,7 @@ static unsigned pcie_link_get_lanes(struct cobalt *cobalt) if (!pci_is_pcie(pci_dev)) return 0; pcie_capability_read_word(pci_dev, PCI_EXP_LNKSTA, &link); - return (link & PCI_EXP_LNKSTA_NLW) >> 4; + return FIELD_GET(PCI_EXP_LNKSTA_NLW, link); }
static unsigned pcie_bus_link_get_lanes(struct cobalt *cobalt) @@ -250,7 +251,7 @@ static unsigned pcie_bus_link_get_lanes(struct cobalt *cobalt) if (!pci_is_pcie(pci_dev)) return 0; pcie_capability_read_dword(pci_dev, PCI_EXP_LNKCAP, &link); - return (link & PCI_EXP_LNKCAP_MLW) >> 4; + return FIELD_GET(PCI_EXP_LNKCAP_MLW, link); }
static void msi_config_show(struct cobalt *cobalt, struct pci_dev *pci_dev)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wayne Lin wayne.lin@amd.com
[ Upstream commit b1904ed480cee3f9f4036ea0e36d139cb5fee2d6 ]
[Why & How] Check whether assigned timing generator is NULL or not before accessing its funcs to prevent NULL dereference.
Reviewed-by: Jun Lei jun.lei@amd.com Acked-by: Hersen Wu hersenxs.wu@amd.com Signed-off-by: Wayne Lin wayne.lin@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/core/dc_stream.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_stream.c b/drivers/gpu/drm/amd/display/dc/core/dc_stream.c index bb09243758fe3..71b10b45a9b9e 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_stream.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_stream.c @@ -492,7 +492,7 @@ uint32_t dc_stream_get_vblank_counter(const struct dc_stream_state *stream) for (i = 0; i < MAX_PIPES; i++) { struct timing_generator *tg = res_ctx->pipe_ctx[i].stream_res.tg;
- if (res_ctx->pipe_ctx[i].stream != stream) + if (res_ctx->pipe_ctx[i].stream != stream || !tg) continue;
return tg->funcs->get_frame_count(tg); @@ -551,7 +551,7 @@ bool dc_stream_get_scanoutpos(const struct dc_stream_state *stream, for (i = 0; i < MAX_PIPES; i++) { struct timing_generator *tg = res_ctx->pipe_ctx[i].stream_res.tg;
- if (res_ctx->pipe_ctx[i].stream != stream) + if (res_ctx->pipe_ctx[i].stream != stream || !tg) continue;
tg->funcs->get_scanoutpos(tg,
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Douglas Anderson dianders@chromium.org
[ Upstream commit dd712d3d45807db9fcae28a522deee85c1f2fde6 ]
When entering kdb/kgdb on a kernel panic, it was be observed that the console isn't flushed before the `kdb` prompt came up. Specifically, when using the buddy lockup detector on arm64 and running: echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
I could see: [ 26.161099] lkdtm: Performing direct entry HARDLOCKUP [ 32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6 [ 32.552865] Sending NMI from CPU 5 to CPUs 6: [ 32.557359] NMI backtrace for cpu 6 ... [backtrace for cpu 6] ... [ 32.558353] NMI backtrace for cpu 5 ... [backtrace for cpu 5] ... [ 32.867471] Sending NMI from CPU 5 to CPUs 0-4,7: [ 32.872321] NMI backtrace forP cpuANC: Hard LOCKUP
Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry [5]kdb>
As you can see, backtraces for the other CPUs start printing and get interleaved with the kdb PANIC print.
Let's replicate the commands to flush the console in the kdb panic entry point to avoid this.
Signed-off-by: Douglas Anderson dianders@chromium.org Link: https://lore.kernel.org/r/20230822131945.1.I5b460ae8f954e4c4f628a373d6e74713... Signed-off-by: Daniel Thompson daniel.thompson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/debug/debug_core.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c index f88611fadb195..1ab2e97034868 100644 --- a/kernel/debug/debug_core.c +++ b/kernel/debug/debug_core.c @@ -945,6 +945,9 @@ void kgdb_panic(const char *msg) if (panic_timeout) return;
+ debug_locks_off(); + console_flush_on_panic(CONSOLE_FLUSH_PENDING); + if (dbg_kdb_mode) kdb_printf("PANIC: %s\n", msg);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tony Lindgren tony@atomide.com
[ Upstream commit fbb74e56378d8306f214658e3d525a8b3f000c5a ]
We need to check for an active device as otherwise we get warnings for some mcbsp instances for "Runtime PM usage count underflow!".
Reported-by: Andreas Kemnade andreas@kemnade.info Signed-off-by: Tony Lindgren tony@atomide.com Link: https://lore.kernel.org/r/20231030052340.13415-1-tony@atomide.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/ti/omap-mcbsp.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/sound/soc/ti/omap-mcbsp.c b/sound/soc/ti/omap-mcbsp.c index 3273b317fa3b9..3e8ed05f3ebd8 100644 --- a/sound/soc/ti/omap-mcbsp.c +++ b/sound/soc/ti/omap-mcbsp.c @@ -74,7 +74,8 @@ static int omap2_mcbsp_set_clks_src(struct omap_mcbsp *mcbsp, u8 fck_src_id) return -EINVAL; }
- pm_runtime_put_sync(mcbsp->dev); + if (mcbsp->active) + pm_runtime_put_sync(mcbsp->dev);
r = clk_set_parent(mcbsp->fclk, fck_src); if (r) { @@ -84,7 +85,8 @@ static int omap2_mcbsp_set_clks_src(struct omap_mcbsp *mcbsp, u8 fck_src_id) return r; }
- pm_runtime_get_sync(mcbsp->dev); + if (mcbsp->active) + pm_runtime_get_sync(mcbsp->dev);
clk_put(fck_src);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit d27abbfd4888d79dd24baf50e774631046ac4732 ]
These enums are passed to set/test_bit(). The set/test_bit() functions take a bit number instead of a shifted value. Passing a shifted value is a double shift bug like doing BIT(BIT(1)). The double shift bug doesn't cause a problem here because we are only checking 0 and 1 but if the value was 5 or above then it can lead to a buffer overflow.
Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Reviewed-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Reviewed-by: Sam Protsenko semen.protsenko@linaro.org Signed-off-by: Thierry Reding thierry.reding@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/pwm.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/pwm.h b/include/linux/pwm.h index b2c9c460947d1..d1c26f5174e53 100644 --- a/include/linux/pwm.h +++ b/include/linux/pwm.h @@ -44,8 +44,8 @@ struct pwm_args { };
enum { - PWMF_REQUESTED = 1 << 0, - PWMF_EXPORTED = 1 << 1, + PWMF_REQUESTED = 0, + PWMF_EXPORTED = 1, };
/*
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miri Korenblit miriam.rachel.korenblit@intel.com
[ Upstream commit 499d02790495958506a64f37ceda7e97345a50a8 ]
Currently we are setting the rate in the tx cmd for mgmt frames (e.g. during connection establishment). This was problematic when sending mgmt frames in eSR mode, as we don't know what link this frame will be sent on (This is decided by the FW), so we don't know what is the lowest rate. Fix this by not setting the rate in tx cmd and rely on FW to choose the right one. Set rate only for injected frames with fixed rate, or when no sta is given. Also set for important frames (EAPOL etc.) the High Priority flag.
Fixes: 055b22e770dd ("iwlwifi: mvm: Set Tx rate and flags when there is not station") Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Signed-off-by: Gregory Greenman gregory.greenman@intel.com Link: https://lore.kernel.org/r/20230913145231.6c7e59620ee0.I6eaed3ccdd6dd62b9e664... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/mvm/tx.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c index 9a81ce299d0d1..fbcd46aedade3 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c @@ -529,16 +529,20 @@ iwl_mvm_set_tx_params(struct iwl_mvm *mvm, struct sk_buff *skb, flags |= IWL_TX_FLAGS_ENCRYPT_DIS;
/* - * For data packets rate info comes from the fw. Only - * set rate/antenna during connection establishment or in case - * no station is given. + * For data and mgmt packets rate info comes from the fw. Only + * set rate/antenna for injected frames with fixed rate, or + * when no sta is given. */ - if (!sta || !ieee80211_is_data(hdr->frame_control) || - mvmsta->sta_state < IEEE80211_STA_AUTHORIZED) { + if (unlikely(!sta || + info->control.flags & IEEE80211_TX_CTRL_RATE_INJECT)) { flags |= IWL_TX_FLAGS_CMD_RATE; rate_n_flags = iwl_mvm_get_tx_rate_n_flags(mvm, info, sta, hdr->frame_control); + } else if (!ieee80211_is_data(hdr->frame_control) || + mvmsta->sta_state < IEEE80211_STA_AUTHORIZED) { + /* These are important frames */ + flags |= IWL_TX_FLAGS_HIGH_PRI; }
if (mvm->trans->trans_cfg->device_family >=
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kan Liang kan.liang@linux.intel.com
[ Upstream commit 42bbabed09ce6208026648a71a45b4394c74585a ]
The low level index of raw branch records for the most recent branch can be recorded in a sample with PERF_SAMPLE_BRANCH_HW_INDEX branch_sample_type. Extend struct branch_stack to support it.
However, if the PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and entries[] will be output by kernel. The pointer of entries[] could be wrong, since the output format is different with new struct branch_stack. Add a variable no_hw_idx in struct perf_sample to indicate whether the hw_idx is output. Add get_branch_entry() to return corresponding pointer of entries[0].
To make dummy branch sample consistent as new branch sample, add hw_idx in struct dummy_branch_stack for cs-etm and intel-pt.
Apply the new struct branch_stack for synthetic events as well.
Extend test case sample-parsing to support new struct branch_stack.
Committer notes:
Renamed get_branch_entries() to perf_sample__branch_entries() to have proper namespacing and pave the way for this to be moved to libperf, eventually.
Add 'static' to that inline as it is in a header.
Add 'hw_idx' to 'struct dummy_branch_stack' in cs-etm.c to fix the build on arm64.
Signed-off-by: Kan Liang kan.liang@linux.intel.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Alexey Budankov alexey.budankov@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Jiri Olsa jolsa@redhat.com Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Michael Ellerman mpe@ellerman.id.au Cc: Namhyung Kim namhyung@kernel.org Cc: Pavel Gerasimov pavel.gerasimov@intel.com Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi Bangoria ravi.bangoria@linux.ibm.com Cc: Stephane Eranian eranian@google.com Cc: Vitaly Slobodskoy vitaly.slobodskoy@intel.com Link: http://lore.kernel.org/lkml/20200228163011.19358-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Stable-dep-of: c1149037f65b ("perf hist: Add missing puts to hist__account_cycles") Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/builtin-script.c | 70 ++++++++++--------- tools/perf/tests/sample-parsing.c | 7 +- tools/perf/util/branch.h | 22 ++++++ tools/perf/util/cs-etm.c | 2 + tools/perf/util/event.h | 1 + tools/perf/util/evsel.c | 5 ++ tools/perf/util/evsel.h | 5 ++ tools/perf/util/hist.c | 3 +- tools/perf/util/intel-pt.c | 2 + tools/perf/util/machine.c | 35 +++++----- .../scripting-engines/trace-event-python.c | 30 ++++---- tools/perf/util/session.c | 8 ++- tools/perf/util/synthetic-events.c | 6 +- 13 files changed, 125 insertions(+), 71 deletions(-)
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index bbf1f2d3387e3..bb64dbfe043a5 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -735,6 +735,7 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample, struct perf_event_attr *attr, FILE *fp) { struct branch_stack *br = sample->branch_stack; + struct branch_entry *entries = perf_sample__branch_entries(sample); struct addr_location alf, alt; u64 i, from, to; int printed = 0; @@ -743,8 +744,8 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample, return 0;
for (i = 0; i < br->nr; i++) { - from = br->entries[i].from; - to = br->entries[i].to; + from = entries[i].from; + to = entries[i].to;
if (PRINT_FIELD(DSO)) { memset(&alf, 0, sizeof(alf)); @@ -768,10 +769,10 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample, }
printed += fprintf(fp, "/%c/%c/%c/%d ", - mispred_str( br->entries + i), - br->entries[i].flags.in_tx? 'X' : '-', - br->entries[i].flags.abort? 'A' : '-', - br->entries[i].flags.cycles); + mispred_str(entries + i), + entries[i].flags.in_tx ? 'X' : '-', + entries[i].flags.abort ? 'A' : '-', + entries[i].flags.cycles); }
return printed; @@ -782,6 +783,7 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample, struct perf_event_attr *attr, FILE *fp) { struct branch_stack *br = sample->branch_stack; + struct branch_entry *entries = perf_sample__branch_entries(sample); struct addr_location alf, alt; u64 i, from, to; int printed = 0; @@ -793,8 +795,8 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
memset(&alf, 0, sizeof(alf)); memset(&alt, 0, sizeof(alt)); - from = br->entries[i].from; - to = br->entries[i].to; + from = entries[i].from; + to = entries[i].to;
thread__find_symbol_fb(thread, sample->cpumode, from, &alf); thread__find_symbol_fb(thread, sample->cpumode, to, &alt); @@ -813,10 +815,10 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample, printed += fprintf(fp, ")"); } printed += fprintf(fp, "/%c/%c/%c/%d ", - mispred_str( br->entries + i), - br->entries[i].flags.in_tx? 'X' : '-', - br->entries[i].flags.abort? 'A' : '-', - br->entries[i].flags.cycles); + mispred_str(entries + i), + entries[i].flags.in_tx ? 'X' : '-', + entries[i].flags.abort ? 'A' : '-', + entries[i].flags.cycles); }
return printed; @@ -827,6 +829,7 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample, struct perf_event_attr *attr, FILE *fp) { struct branch_stack *br = sample->branch_stack; + struct branch_entry *entries = perf_sample__branch_entries(sample); struct addr_location alf, alt; u64 i, from, to; int printed = 0; @@ -838,8 +841,8 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
memset(&alf, 0, sizeof(alf)); memset(&alt, 0, sizeof(alt)); - from = br->entries[i].from; - to = br->entries[i].to; + from = entries[i].from; + to = entries[i].to;
if (thread__find_map_fb(thread, sample->cpumode, from, &alf) && !alf.map->dso->adjust_symbols) @@ -862,10 +865,10 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample, printed += fprintf(fp, ")"); } printed += fprintf(fp, "/%c/%c/%c/%d ", - mispred_str(br->entries + i), - br->entries[i].flags.in_tx ? 'X' : '-', - br->entries[i].flags.abort ? 'A' : '-', - br->entries[i].flags.cycles); + mispred_str(entries + i), + entries[i].flags.in_tx ? 'X' : '-', + entries[i].flags.abort ? 'A' : '-', + entries[i].flags.cycles); }
return printed; @@ -1011,6 +1014,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample, struct machine *machine, FILE *fp) { struct branch_stack *br = sample->branch_stack; + struct branch_entry *entries = perf_sample__branch_entries(sample); u64 start, end; int i, insn, len, nr, ilen, printed = 0; struct perf_insn x; @@ -1031,31 +1035,31 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample, printed += fprintf(fp, "%c", '\n');
/* Handle first from jump, of which we don't know the entry. */ - len = grab_bb(buffer, br->entries[nr-1].from, - br->entries[nr-1].from, + len = grab_bb(buffer, entries[nr-1].from, + entries[nr-1].from, machine, thread, &x.is64bit, &x.cpumode, false); if (len > 0) { - printed += ip__fprintf_sym(br->entries[nr - 1].from, thread, + printed += ip__fprintf_sym(entries[nr - 1].from, thread, x.cpumode, x.cpu, &lastsym, attr, fp); - printed += ip__fprintf_jump(br->entries[nr - 1].from, &br->entries[nr - 1], + printed += ip__fprintf_jump(entries[nr - 1].from, &entries[nr - 1], &x, buffer, len, 0, fp, &total_cycles); if (PRINT_FIELD(SRCCODE)) - printed += print_srccode(thread, x.cpumode, br->entries[nr - 1].from); + printed += print_srccode(thread, x.cpumode, entries[nr - 1].from); }
/* Print all blocks */ for (i = nr - 2; i >= 0; i--) { - if (br->entries[i].from || br->entries[i].to) + if (entries[i].from || entries[i].to) pr_debug("%d: %" PRIx64 "-%" PRIx64 "\n", i, - br->entries[i].from, - br->entries[i].to); - start = br->entries[i + 1].to; - end = br->entries[i].from; + entries[i].from, + entries[i].to); + start = entries[i + 1].to; + end = entries[i].from;
len = grab_bb(buffer, start, end, machine, thread, &x.is64bit, &x.cpumode, false); /* Patch up missing kernel transfers due to ring filters */ if (len == -ENXIO && i > 0) { - end = br->entries[--i].from; + end = entries[--i].from; pr_debug("\tpatching up to %" PRIx64 "-%" PRIx64 "\n", start, end); len = grab_bb(buffer, start, end, machine, thread, &x.is64bit, &x.cpumode, false); } @@ -1068,7 +1072,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
printed += ip__fprintf_sym(ip, thread, x.cpumode, x.cpu, &lastsym, attr, fp); if (ip == end) { - printed += ip__fprintf_jump(ip, &br->entries[i], &x, buffer + off, len - off, ++insn, fp, + printed += ip__fprintf_jump(ip, &entries[i], &x, buffer + off, len - off, ++insn, fp, &total_cycles); if (PRINT_FIELD(SRCCODE)) printed += print_srccode(thread, x.cpumode, ip); @@ -1092,9 +1096,9 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample, * Hit the branch? In this case we are already done, and the target * has not been executed yet. */ - if (br->entries[0].from == sample->ip) + if (entries[0].from == sample->ip) goto out; - if (br->entries[0].flags.abort) + if (entries[0].flags.abort) goto out;
/* @@ -1105,7 +1109,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample, * between final branch and sample. When this happens just * continue walking after the last TO until we hit a branch. */ - start = br->entries[0].to; + start = entries[0].to; end = sample->ip; if (end < start) { /* Missing jump. Scan 128 bytes for the next branch */ diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c index 2f76d4a9de860..6da067d339429 100644 --- a/tools/perf/tests/sample-parsing.c +++ b/tools/perf/tests/sample-parsing.c @@ -99,6 +99,7 @@ static bool samples_same(const struct perf_sample *s1,
if (type & PERF_SAMPLE_BRANCH_STACK) { COMP(branch_stack->nr); + COMP(branch_stack->hw_idx); for (i = 0; i < s1->branch_stack->nr; i++) MCOMP(branch_stack->entries[i]); } @@ -177,7 +178,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format) u64 data[64]; } branch_stack = { /* 1 branch_entry */ - .data = {1, 211, 212, 213}, + .data = {1, -1ULL, 211, 212, 213}, }; u64 regs[64]; const u32 raw_data[] = {0x12345678, 0x0a0b0c0d, 0x11020304, 0x05060708, 0 }; @@ -198,6 +199,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format) .transaction = 112, .raw_data = (void *)raw_data, .callchain = &callchain.callchain, + .no_hw_idx = false, .branch_stack = &branch_stack.branch_stack, .user_regs = { .abi = PERF_SAMPLE_REGS_ABI_64, @@ -230,6 +232,9 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format) if (sample_type & PERF_SAMPLE_REGS_INTR) evsel.core.attr.sample_regs_intr = sample_regs;
+ if (sample_type & PERF_SAMPLE_BRANCH_STACK) + evsel.core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX; + for (i = 0; i < sizeof(regs); i++) *(i + (u8 *)regs) = i & 0xfe;
diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h index 88e00d268f6f2..154a05cd03af5 100644 --- a/tools/perf/util/branch.h +++ b/tools/perf/util/branch.h @@ -12,6 +12,7 @@ #include <linux/stddef.h> #include <linux/perf_event.h> #include <linux/types.h> +#include "event.h"
struct branch_flags { u64 mispred:1; @@ -39,9 +40,30 @@ struct branch_entry {
struct branch_stack { u64 nr; + u64 hw_idx; struct branch_entry entries[0]; };
+/* + * The hw_idx is only available when PERF_SAMPLE_BRANCH_HW_INDEX is applied. + * Otherwise, the output format of a sample with branch stack is + * struct branch_stack { + * u64 nr; + * struct branch_entry entries[0]; + * } + * Check whether the hw_idx is available, + * and return the corresponding pointer of entries[0]. + */ +static inline struct branch_entry *perf_sample__branch_entries(struct perf_sample *sample) +{ + u64 *entry = (u64 *)sample->branch_stack; + + entry++; + if (sample->no_hw_idx) + return (struct branch_entry *)entry; + return (struct branch_entry *)(++entry); +} + struct branch_type_stat { bool branch_to; u64 counts[PERF_BR_MAX]; diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index f5a9cb4088080..f9cc15f93c4a7 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1192,6 +1192,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq, union perf_event *event = tidq->event_buf; struct dummy_branch_stack { u64 nr; + u64 hw_idx; struct branch_entry entries; } dummy_bs; u64 ip; @@ -1222,6 +1223,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq, if (etm->synth_opts.last_branch) { dummy_bs = (struct dummy_branch_stack){ .nr = 1, + .hw_idx = -1ULL, .entries = { .from = sample.ip, .to = sample.addr, diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h index a0a0c91cde4a6..47d1d0b78be10 100644 --- a/tools/perf/util/event.h +++ b/tools/perf/util/event.h @@ -134,6 +134,7 @@ struct perf_sample { u16 insn_len; u8 cpumode; u16 misc; + bool no_hw_idx; /* No hw_idx collected in branch_stack */ char insn[MAX_INSN]; void *raw_data; struct ip_callchain *callchain; diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 9dd9e3f4ef591..ee0ed7067cdb0 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -2126,7 +2126,12 @@ int perf_evsel__parse_sample(struct evsel *evsel, union perf_event *event,
if (data->branch_stack->nr > max_branch_nr) return -EFAULT; + sz = data->branch_stack->nr * sizeof(struct branch_entry); + if (perf_evsel__has_branch_hw_idx(evsel)) + sz += sizeof(u64); + else + data->no_hw_idx = true; OVERFLOW_CHECK(array, sz, max_size); array = (void *)array + sz; } diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h index ddc5ee6f6592b..ae2c5c22357ad 100644 --- a/tools/perf/util/evsel.h +++ b/tools/perf/util/evsel.h @@ -382,6 +382,11 @@ static inline bool perf_evsel__has_branch_callstack(const struct evsel *evsel) return evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK; }
+static inline bool perf_evsel__has_branch_hw_idx(const struct evsel *evsel) +{ + return evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX; +} + static inline bool evsel__has_callchain(const struct evsel *evsel) { return (evsel->core.attr.sample_type & PERF_SAMPLE_CALLCHAIN) != 0; diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index 7b6eaf5e0bda5..151b9e43c88f9 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -2572,9 +2572,10 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al, struct perf_sample *sample, bool nonany_branch_mode) { struct branch_info *bi; + struct branch_entry *entries = perf_sample__branch_entries(sample);
/* If we have branch cycles always annotate them. */ - if (bs && bs->nr && bs->entries[0].flags.cycles) { + if (bs && bs->nr && entries[0].flags.cycles) { int i;
bi = sample__resolve_bstack(sample, al); diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c index b40832419a279..94f11cfe02364 100644 --- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -1278,6 +1278,7 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq) struct perf_sample sample = { .ip = 0, }; struct dummy_branch_stack { u64 nr; + u64 hw_idx; struct branch_entry entries; } dummy_bs;
@@ -1299,6 +1300,7 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq) if (pt->synth_opts.last_branch && sort__mode == SORT_MODE__BRANCH) { dummy_bs = (struct dummy_branch_stack){ .nr = 1, + .hw_idx = -1ULL, .entries = { .from = sample.ip, .to = sample.addr, diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 8c3addc2e9e1e..0046ca19ca1a4 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -2082,15 +2082,16 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample, { unsigned int i; const struct branch_stack *bs = sample->branch_stack; + struct branch_entry *entries = perf_sample__branch_entries(sample); struct branch_info *bi = calloc(bs->nr, sizeof(struct branch_info));
if (!bi) return NULL;
for (i = 0; i < bs->nr; i++) { - ip__resolve_ams(al->thread, &bi[i].to, bs->entries[i].to); - ip__resolve_ams(al->thread, &bi[i].from, bs->entries[i].from); - bi[i].flags = bs->entries[i].flags; + ip__resolve_ams(al->thread, &bi[i].to, entries[i].to); + ip__resolve_ams(al->thread, &bi[i].from, entries[i].from); + bi[i].flags = entries[i].flags; } return bi; } @@ -2186,6 +2187,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread, /* LBR only affects the user callchain */ if (i != chain_nr) { struct branch_stack *lbr_stack = sample->branch_stack; + struct branch_entry *entries = perf_sample__branch_entries(sample); int lbr_nr = lbr_stack->nr, j, k; bool branch; struct branch_flags *flags; @@ -2211,31 +2213,29 @@ static int resolve_lbr_callchain_sample(struct thread *thread, ip = chain->ips[j]; else if (j > i + 1) { k = j - i - 2; - ip = lbr_stack->entries[k].from; + ip = entries[k].from; branch = true; - flags = &lbr_stack->entries[k].flags; + flags = &entries[k].flags; } else { - ip = lbr_stack->entries[0].to; + ip = entries[0].to; branch = true; - flags = &lbr_stack->entries[0].flags; - branch_from = - lbr_stack->entries[0].from; + flags = &entries[0].flags; + branch_from = entries[0].from; } } else { if (j < lbr_nr) { k = lbr_nr - j - 1; - ip = lbr_stack->entries[k].from; + ip = entries[k].from; branch = true; - flags = &lbr_stack->entries[k].flags; + flags = &entries[k].flags; } else if (j > lbr_nr) ip = chain->ips[i + 1 - (j - lbr_nr)]; else { - ip = lbr_stack->entries[0].to; + ip = entries[0].to; branch = true; - flags = &lbr_stack->entries[0].flags; - branch_from = - lbr_stack->entries[0].from; + flags = &entries[0].flags; + branch_from = entries[0].from; } }
@@ -2282,6 +2282,7 @@ static int thread__resolve_callchain_sample(struct thread *thread, int max_stack) { struct branch_stack *branch = sample->branch_stack; + struct branch_entry *entries = perf_sample__branch_entries(sample); struct ip_callchain *chain = sample->callchain; int chain_nr = 0; u8 cpumode = PERF_RECORD_MISC_USER; @@ -2329,7 +2330,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
for (i = 0; i < nr; i++) { if (callchain_param.order == ORDER_CALLEE) { - be[i] = branch->entries[i]; + be[i] = entries[i];
if (chain == NULL) continue; @@ -2348,7 +2349,7 @@ static int thread__resolve_callchain_sample(struct thread *thread, be[i].from >= chain->ips[first_call] - 8) first_call++; } else - be[i] = branch->entries[branch->nr - i - 1]; + be[i] = entries[branch->nr - i - 1]; }
memset(iter, 0, sizeof(struct iterations) * nr); diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c index 3b02c3f1b2895..2bdd10c4c2460 100644 --- a/tools/perf/util/scripting-engines/trace-event-python.c +++ b/tools/perf/util/scripting-engines/trace-event-python.c @@ -464,6 +464,7 @@ static PyObject *python_process_brstack(struct perf_sample *sample, struct thread *thread) { struct branch_stack *br = sample->branch_stack; + struct branch_entry *entries = perf_sample__branch_entries(sample); PyObject *pylist; u64 i;
@@ -484,28 +485,28 @@ static PyObject *python_process_brstack(struct perf_sample *sample, Py_FatalError("couldn't create Python dictionary");
pydict_set_item_string_decref(pyelem, "from", - PyLong_FromUnsignedLongLong(br->entries[i].from)); + PyLong_FromUnsignedLongLong(entries[i].from)); pydict_set_item_string_decref(pyelem, "to", - PyLong_FromUnsignedLongLong(br->entries[i].to)); + PyLong_FromUnsignedLongLong(entries[i].to)); pydict_set_item_string_decref(pyelem, "mispred", - PyBool_FromLong(br->entries[i].flags.mispred)); + PyBool_FromLong(entries[i].flags.mispred)); pydict_set_item_string_decref(pyelem, "predicted", - PyBool_FromLong(br->entries[i].flags.predicted)); + PyBool_FromLong(entries[i].flags.predicted)); pydict_set_item_string_decref(pyelem, "in_tx", - PyBool_FromLong(br->entries[i].flags.in_tx)); + PyBool_FromLong(entries[i].flags.in_tx)); pydict_set_item_string_decref(pyelem, "abort", - PyBool_FromLong(br->entries[i].flags.abort)); + PyBool_FromLong(entries[i].flags.abort)); pydict_set_item_string_decref(pyelem, "cycles", - PyLong_FromUnsignedLongLong(br->entries[i].flags.cycles)); + PyLong_FromUnsignedLongLong(entries[i].flags.cycles));
thread__find_map_fb(thread, sample->cpumode, - br->entries[i].from, &al); + entries[i].from, &al); dsoname = get_dsoname(al.map); pydict_set_item_string_decref(pyelem, "from_dsoname", _PyUnicode_FromString(dsoname));
thread__find_map_fb(thread, sample->cpumode, - br->entries[i].to, &al); + entries[i].to, &al); dsoname = get_dsoname(al.map); pydict_set_item_string_decref(pyelem, "to_dsoname", _PyUnicode_FromString(dsoname)); @@ -561,6 +562,7 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample, struct thread *thread) { struct branch_stack *br = sample->branch_stack; + struct branch_entry *entries = perf_sample__branch_entries(sample); PyObject *pylist; u64 i; char bf[512]; @@ -581,22 +583,22 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample, Py_FatalError("couldn't create Python dictionary");
thread__find_symbol_fb(thread, sample->cpumode, - br->entries[i].from, &al); + entries[i].from, &al); get_symoff(al.sym, &al, true, bf, sizeof(bf)); pydict_set_item_string_decref(pyelem, "from", _PyUnicode_FromString(bf));
thread__find_symbol_fb(thread, sample->cpumode, - br->entries[i].to, &al); + entries[i].to, &al); get_symoff(al.sym, &al, true, bf, sizeof(bf)); pydict_set_item_string_decref(pyelem, "to", _PyUnicode_FromString(bf));
- get_br_mspred(&br->entries[i].flags, bf, sizeof(bf)); + get_br_mspred(&entries[i].flags, bf, sizeof(bf)); pydict_set_item_string_decref(pyelem, "pred", _PyUnicode_FromString(bf));
- if (br->entries[i].flags.in_tx) { + if (entries[i].flags.in_tx) { pydict_set_item_string_decref(pyelem, "in_tx", _PyUnicode_FromString("X")); } else { @@ -604,7 +606,7 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample, _PyUnicode_FromString("-")); }
- if (br->entries[i].flags.abort) { + if (entries[i].flags.abort) { pydict_set_item_string_decref(pyelem, "abort", _PyUnicode_FromString("A")); } else { diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 01e15b445cb58..2f08e590c03b1 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -1003,6 +1003,7 @@ static void callchain__lbr_callstack_printf(struct perf_sample *sample) { struct ip_callchain *callchain = sample->callchain; struct branch_stack *lbr_stack = sample->branch_stack; + struct branch_entry *entries = perf_sample__branch_entries(sample); u64 kernel_callchain_nr = callchain->nr; unsigned int i;
@@ -1039,10 +1040,10 @@ static void callchain__lbr_callstack_printf(struct perf_sample *sample) i, callchain->ips[i]);
printf("..... %2d: %016" PRIx64 "\n", - (int)(kernel_callchain_nr), lbr_stack->entries[0].to); + (int)(kernel_callchain_nr), entries[0].to); for (i = 0; i < lbr_stack->nr; i++) printf("..... %2d: %016" PRIx64 "\n", - (int)(i + kernel_callchain_nr + 1), lbr_stack->entries[i].from); + (int)(i + kernel_callchain_nr + 1), entries[i].from); } }
@@ -1064,6 +1065,7 @@ static void callchain__printf(struct evsel *evsel,
static void branch_stack__printf(struct perf_sample *sample, bool callstack) { + struct branch_entry *entries = perf_sample__branch_entries(sample); uint64_t i;
printf("%s: nr:%" PRIu64 "\n", @@ -1071,7 +1073,7 @@ static void branch_stack__printf(struct perf_sample *sample, bool callstack) sample->branch_stack->nr);
for (i = 0; i < sample->branch_stack->nr; i++) { - struct branch_entry *e = &sample->branch_stack->entries[i]; + struct branch_entry *e = &entries[i];
if (!callstack) { printf("..... %2"PRIu64": %016" PRIx64 " -> %016" PRIx64 " %hu cycles %s%s%s%s %x\n", diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c index 807cbca403a7d..e5fbece642d3c 100644 --- a/tools/perf/util/synthetic-events.c +++ b/tools/perf/util/synthetic-events.c @@ -1183,7 +1183,8 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
if (type & PERF_SAMPLE_BRANCH_STACK) { sz = sample->branch_stack->nr * sizeof(struct branch_entry); - sz += sizeof(u64); + /* nr, hw_idx */ + sz += 2 * sizeof(u64); result += sz; }
@@ -1339,7 +1340,8 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
if (type & PERF_SAMPLE_BRANCH_STACK) { sz = sample->branch_stack->nr * sizeof(struct branch_entry); - sz += sizeof(u64); + /* nr, hw_idx */ + sz += 2 * sizeof(u64); memcpy(array, sample->branch_stack, sz); array = (void *)array + sz; }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ian Rogers irogers@google.com
[ Upstream commit c1149037f65bcf0334886180ebe3d5efcf214912 ]
Caught using reference count checking on perf top with "--call-graph=lbr". After this no memory leaks were detected.
Fixes: 57849998e2cd ("perf report: Add processing for cycle histograms") Signed-off-by: Ian Rogers irogers@google.com Cc: K Prateek Nayak kprateek.nayak@amd.com Cc: Ravi Bangoria ravi.bangoria@amd.com Cc: Sandipan Das sandipan.das@amd.com Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: German Gomez german.gomez@arm.com Cc: James Clark james.clark@arm.com Cc: Nick Terrell terrelln@fb.com Cc: Sean Christopherson seanjc@google.com Cc: Changbin Du changbin.du@huawei.com Cc: liuwenyu liuwenyu7@huawei.com Cc: Yang Jihong yangjihong1@huawei.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Miguel Ojeda ojeda@kernel.org Cc: Song Liu song@kernel.org Cc: Leo Yan leo.yan@linaro.org Cc: Kajol Jain kjain@linux.ibm.com Cc: Andi Kleen ak@linux.intel.com Cc: Kan Liang kan.liang@linux.intel.com Cc: Athira Rajeev atrajeev@linux.vnet.ibm.com Cc: Yanteng Si siyanteng@loongson.cn Cc: Liam Howlett liam.howlett@oracle.com Cc: Paolo Bonzini pbonzini@redhat.com Link: https://lore.kernel.org/r/20231024222353.3024098-6-irogers@google.com Signed-off-by: Namhyung Kim namhyung@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/hist.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index 151b9e43c88f9..9a02c1fd83493 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -2576,8 +2576,6 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
/* If we have branch cycles always annotate them. */ if (bs && bs->nr && entries[0].flags.cycles) { - int i; - bi = sample__resolve_bstack(sample, al); if (bi) { struct addr_map_symbol *prev = NULL; @@ -2592,12 +2590,18 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al, * Note that perf stores branches reversed from * program order! */ - for (i = bs->nr - 1; i >= 0; i--) { + for (int i = bs->nr - 1; i >= 0; i--) { addr_map_symbol__account_cycles(&bi[i].from, nonany_branch_mode ? NULL : prev, bi[i].flags.cycles); prev = &bi[i].to; } + for (unsigned int i = 0; i < bs->nr; i++) { + map__put(bi[i].to.ms.map); + maps__put(bi[i].to.ms.maps); + map__put(bi[i].from.ms.map); + maps__put(bi[i].from.ms.maps); + } free(bi); } }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Olga Kornievskaia kolga@netapp.com
[ Upstream commit 5cc7688bae7f0757c39c1d3dfdd827b724061067 ]
If the client is doing pnfs IO and Kerberos is configured and EXCHANGEID successfully negotiated SP4_MACH_CRED and WRITE/COMMIT are on the list of state protected operations, then we need to make sure to choose the DS's rpc_client structure instead of the MDS's one.
Fixes: fb91fb0ee7b2 ("NFS: Move call to nfs4_state_protect_write() to nfs4_write_setup()") Signed-off-by: Olga Kornievskaia kolga@netapp.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4proc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index c41d149626047..b7529656b4307 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -5369,7 +5369,7 @@ static void nfs4_proc_write_setup(struct nfs_pgio_header *hdr,
msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_WRITE]; nfs4_init_sequence(&hdr->args.seq_args, &hdr->res.seq_res, 0, 0); - nfs4_state_protect_write(server->nfs_client, clnt, msg, hdr); + nfs4_state_protect_write(hdr->ds_clp ? hdr->ds_clp : server->nfs_client, clnt, msg, hdr); }
static void nfs4_proc_commit_rpc_prepare(struct rpc_task *task, struct nfs_commit_data *data) @@ -5410,7 +5410,8 @@ static void nfs4_proc_commit_setup(struct nfs_commit_data *data, struct rpc_mess data->res.server = server; msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_COMMIT]; nfs4_init_sequence(&data->args.seq_args, &data->res.seq_res, 1, 0); - nfs4_state_protect(server->nfs_client, NFS_SP4_MACH_CRED_COMMIT, clnt, msg); + nfs4_state_protect(data->ds_clp ? data->ds_clp : server->nfs_client, + NFS_SP4_MACH_CRED_COMMIT, clnt, msg); }
static int _nfs4_proc_commit(struct file *dst, struct nfs_commitargs *args,
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 18f039428c7df183b09c69ebf10ffd4e521035d2 ]
Inspired by syzbot reports using a stack of multiple ipvlan devices.
Reduce stack size needed in ipvlan_process_v6_outbound() by moving the flowi6 struct used for the route lookup in an non inlined helper. ipvlan_route_v6_outbound() needs 120 bytes on the stack, immediately reclaimed.
Also make sure ipvlan_process_v4_outbound() is not inlined.
We might also have to lower MAX_NEST_DEV, because only syzbot uses setups with more than four stacked devices.
BUG: TASK stack guard page was hit at ffffc9000e803ff8 (stack is ffffc9000e804000..ffffc9000e808000) stack guard page: 0000 [#1] SMP KASAN CPU: 0 PID: 13442 Comm: syz-executor.4 Not tainted 6.1.52-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023 RIP: 0010:kasan_check_range+0x4/0x2a0 mm/kasan/generic.c:188 Code: 48 01 c6 48 89 c7 e8 db 4e c1 03 31 c0 5d c3 cc 0f 0b eb 02 0f 0b b8 ea ff ff ff 5d c3 cc 00 00 cc cc 00 00 cc cc 55 48 89 e5 <41> 57 41 56 41 55 41 54 53 b0 01 48 85 f6 0f 84 a4 01 00 00 48 89 RSP: 0018:ffffc9000e804000 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff817e5bf2 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff887c6568 RBP: ffffc9000e804000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: dffffc0000000001 R12: 1ffff92001d0080c R13: dffffc0000000000 R14: ffffffff87e6b100 R15: 0000000000000000 FS: 00007fd0c55826c0(0000) GS:ffff8881f6800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc9000e803ff8 CR3: 0000000170ef7000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <#DF> </#DF> <TASK> [<ffffffff81f281d1>] __kasan_check_read+0x11/0x20 mm/kasan/shadow.c:31 [<ffffffff817e5bf2>] instrument_atomic_read include/linux/instrumented.h:72 [inline] [<ffffffff817e5bf2>] _test_bit include/asm-generic/bitops/instrumented-non-atomic.h:141 [inline] [<ffffffff817e5bf2>] cpumask_test_cpu include/linux/cpumask.h:506 [inline] [<ffffffff817e5bf2>] cpu_online include/linux/cpumask.h:1092 [inline] [<ffffffff817e5bf2>] trace_lock_acquire include/trace/events/lock.h:24 [inline] [<ffffffff817e5bf2>] lock_acquire+0xe2/0x590 kernel/locking/lockdep.c:5632 [<ffffffff8563221e>] rcu_lock_acquire+0x2e/0x40 include/linux/rcupdate.h:306 [<ffffffff8561464d>] rcu_read_lock include/linux/rcupdate.h:747 [inline] [<ffffffff8561464d>] ip6_pol_route+0x15d/0x1440 net/ipv6/route.c:2221 [<ffffffff85618120>] ip6_pol_route_output+0x50/0x80 net/ipv6/route.c:2606 [<ffffffff856f65b5>] pol_lookup_func include/net/ip6_fib.h:584 [inline] [<ffffffff856f65b5>] fib6_rule_lookup+0x265/0x620 net/ipv6/fib6_rules.c:116 [<ffffffff85618009>] ip6_route_output_flags_noref+0x2d9/0x3a0 net/ipv6/route.c:2638 [<ffffffff8561821a>] ip6_route_output_flags+0xca/0x340 net/ipv6/route.c:2651 [<ffffffff838bd5a3>] ip6_route_output include/net/ip6_route.h:100 [inline] [<ffffffff838bd5a3>] ipvlan_process_v6_outbound drivers/net/ipvlan/ipvlan_core.c:473 [inline] [<ffffffff838bd5a3>] ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:529 [inline] [<ffffffff838bd5a3>] ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:602 [inline] [<ffffffff838bd5a3>] ipvlan_queue_xmit+0xc33/0x1be0 drivers/net/ipvlan/ipvlan_core.c:677 [<ffffffff838c2909>] ipvlan_start_xmit+0x49/0x100 drivers/net/ipvlan/ipvlan_main.c:229 [<ffffffff84d03900>] netdev_start_xmit include/linux/netdevice.h:4966 [inline] [<ffffffff84d03900>] xmit_one net/core/dev.c:3644 [inline] [<ffffffff84d03900>] dev_hard_start_xmit+0x320/0x980 net/core/dev.c:3660 [<ffffffff84d080e2>] __dev_queue_xmit+0x16b2/0x3370 net/core/dev.c:4324 [<ffffffff855ce4cd>] dev_queue_xmit include/linux/netdevice.h:3067 [inline] [<ffffffff855ce4cd>] neigh_hh_output include/net/neighbour.h:529 [inline] [<ffffffff855ce4cd>] neigh_output include/net/neighbour.h:543 [inline] [<ffffffff855ce4cd>] ip6_finish_output2+0x160d/0x1ae0 net/ipv6/ip6_output.c:139 [<ffffffff855b8616>] __ip6_finish_output net/ipv6/ip6_output.c:200 [inline] [<ffffffff855b8616>] ip6_finish_output+0x6c6/0xb10 net/ipv6/ip6_output.c:211 [<ffffffff855b7e3c>] NF_HOOK_COND include/linux/netfilter.h:298 [inline] [<ffffffff855b7e3c>] ip6_output+0x2bc/0x3d0 net/ipv6/ip6_output.c:232 [<ffffffff8575d27f>] dst_output include/net/dst.h:444 [inline] [<ffffffff8575d27f>] ip6_local_out+0x10f/0x140 net/ipv6/output_core.c:161 [<ffffffff838bdae4>] ipvlan_process_v6_outbound drivers/net/ipvlan/ipvlan_core.c:483 [inline] [<ffffffff838bdae4>] ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:529 [inline] [<ffffffff838bdae4>] ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:602 [inline] [<ffffffff838bdae4>] ipvlan_queue_xmit+0x1174/0x1be0 drivers/net/ipvlan/ipvlan_core.c:677 [<ffffffff838c2909>] ipvlan_start_xmit+0x49/0x100 drivers/net/ipvlan/ipvlan_main.c:229 [<ffffffff84d03900>] netdev_start_xmit include/linux/netdevice.h:4966 [inline] [<ffffffff84d03900>] xmit_one net/core/dev.c:3644 [inline] [<ffffffff84d03900>] dev_hard_start_xmit+0x320/0x980 net/core/dev.c:3660 [<ffffffff84d080e2>] __dev_queue_xmit+0x16b2/0x3370 net/core/dev.c:4324 [<ffffffff855ce4cd>] dev_queue_xmit include/linux/netdevice.h:3067 [inline] [<ffffffff855ce4cd>] neigh_hh_output include/net/neighbour.h:529 [inline] [<ffffffff855ce4cd>] neigh_output include/net/neighbour.h:543 [inline] [<ffffffff855ce4cd>] ip6_finish_output2+0x160d/0x1ae0 net/ipv6/ip6_output.c:139 [<ffffffff855b8616>] __ip6_finish_output net/ipv6/ip6_output.c:200 [inline] [<ffffffff855b8616>] ip6_finish_output+0x6c6/0xb10 net/ipv6/ip6_output.c:211 [<ffffffff855b7e3c>] NF_HOOK_COND include/linux/netfilter.h:298 [inline] [<ffffffff855b7e3c>] ip6_output+0x2bc/0x3d0 net/ipv6/ip6_output.c:232 [<ffffffff8575d27f>] dst_output include/net/dst.h:444 [inline] [<ffffffff8575d27f>] ip6_local_out+0x10f/0x140 net/ipv6/output_core.c:161 [<ffffffff838bdae4>] ipvlan_process_v6_outbound drivers/net/ipvlan/ipvlan_core.c:483 [inline] [<ffffffff838bdae4>] ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:529 [inline] [<ffffffff838bdae4>] ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:602 [inline] [<ffffffff838bdae4>] ipvlan_queue_xmit+0x1174/0x1be0 drivers/net/ipvlan/ipvlan_core.c:677 [<ffffffff838c2909>] ipvlan_start_xmit+0x49/0x100 drivers/net/ipvlan/ipvlan_main.c:229 [<ffffffff84d03900>] netdev_start_xmit include/linux/netdevice.h:4966 [inline] [<ffffffff84d03900>] xmit_one net/core/dev.c:3644 [inline] [<ffffffff84d03900>] dev_hard_start_xmit+0x320/0x980 net/core/dev.c:3660 [<ffffffff84d080e2>] __dev_queue_xmit+0x16b2/0x3370 net/core/dev.c:4324 [<ffffffff855ce4cd>] dev_queue_xmit include/linux/netdevice.h:3067 [inline] [<ffffffff855ce4cd>] neigh_hh_output include/net/neighbour.h:529 [inline] [<ffffffff855ce4cd>] neigh_output include/net/neighbour.h:543 [inline] [<ffffffff855ce4cd>] ip6_finish_output2+0x160d/0x1ae0 net/ipv6/ip6_output.c:139 [<ffffffff855b8616>] __ip6_finish_output net/ipv6/ip6_output.c:200 [inline] [<ffffffff855b8616>] ip6_finish_output+0x6c6/0xb10 net/ipv6/ip6_output.c:211 [<ffffffff855b7e3c>] NF_HOOK_COND include/linux/netfilter.h:298 [inline] [<ffffffff855b7e3c>] ip6_output+0x2bc/0x3d0 net/ipv6/ip6_output.c:232 [<ffffffff8575d27f>] dst_output include/net/dst.h:444 [inline] [<ffffffff8575d27f>] ip6_local_out+0x10f/0x140 net/ipv6/output_core.c:161 [<ffffffff838bdae4>] ipvlan_process_v6_outbound drivers/net/ipvlan/ipvlan_core.c:483 [inline] [<ffffffff838bdae4>] ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:529 [inline] [<ffffffff838bdae4>] ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:602 [inline] [<ffffffff838bdae4>] ipvlan_queue_xmit+0x1174/0x1be0 drivers/net/ipvlan/ipvlan_core.c:677 [<ffffffff838c2909>] ipvlan_start_xmit+0x49/0x100 drivers/net/ipvlan/ipvlan_main.c:229 [<ffffffff84d03900>] netdev_start_xmit include/linux/netdevice.h:4966 [inline] [<ffffffff84d03900>] xmit_one net/core/dev.c:3644 [inline] [<ffffffff84d03900>] dev_hard_start_xmit+0x320/0x980 net/core/dev.c:3660 [<ffffffff84d080e2>] __dev_queue_xmit+0x16b2/0x3370 net/core/dev.c:4324 [<ffffffff855ce4cd>] dev_queue_xmit include/linux/netdevice.h:3067 [inline] [<ffffffff855ce4cd>] neigh_hh_output include/net/neighbour.h:529 [inline] [<ffffffff855ce4cd>] neigh_output include/net/neighbour.h:543 [inline] [<ffffffff855ce4cd>] ip6_finish_output2+0x160d/0x1ae0 net/ipv6/ip6_output.c:139 [<ffffffff855b8616>] __ip6_finish_output net/ipv6/ip6_output.c:200 [inline] [<ffffffff855b8616>] ip6_finish_output+0x6c6/0xb10 net/ipv6/ip6_output.c:211 [<ffffffff855b7e3c>] NF_HOOK_COND include/linux/netfilter.h:298 [inline] [<ffffffff855b7e3c>] ip6_output+0x2bc/0x3d0 net/ipv6/ip6_output.c:232 [<ffffffff8575d27f>] dst_output include/net/dst.h:444 [inline] [<ffffffff8575d27f>] ip6_local_out+0x10f/0x140 net/ipv6/output_core.c:161 [<ffffffff838bdae4>] ipvlan_process_v6_outbound drivers/net/ipvlan/ipvlan_core.c:483 [inline] [<ffffffff838bdae4>] ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:529 [inline] [<ffffffff838bdae4>] ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:602 [inline] [<ffffffff838bdae4>] ipvlan_queue_xmit+0x1174/0x1be0 drivers/net/ipvlan/ipvlan_core.c:677 [<ffffffff838c2909>] ipvlan_start_xmit+0x49/0x100 drivers/net/ipvlan/ipvlan_main.c:229 [<ffffffff84d03900>] netdev_start_xmit include/linux/netdevice.h:4966 [inline] [<ffffffff84d03900>] xmit_one net/core/dev.c:3644 [inline] [<ffffffff84d03900>] dev_hard_start_xmit+0x320/0x980 net/core/dev.c:3660 [<ffffffff84d080e2>] __dev_queue_xmit+0x16b2/0x3370 net/core/dev.c:4324 [<ffffffff84d4a65e>] dev_queue_xmit include/linux/netdevice.h:3067 [inline] [<ffffffff84d4a65e>] neigh_resolve_output+0x64e/0x750 net/core/neighbour.c:1560 [<ffffffff855ce503>] neigh_output include/net/neighbour.h:545 [inline] [<ffffffff855ce503>] ip6_finish_output2+0x1643/0x1ae0 net/ipv6/ip6_output.c:139 [<ffffffff855b8616>] __ip6_finish_output net/ipv6/ip6_output.c:200 [inline] [<ffffffff855b8616>] ip6_finish_output+0x6c6/0xb10 net/ipv6/ip6_output.c:211 [<ffffffff855b7e3c>] NF_HOOK_COND include/linux/netfilter.h:298 [inline] [<ffffffff855b7e3c>] ip6_output+0x2bc/0x3d0 net/ipv6/ip6_output.c:232 [<ffffffff855b9ce4>] dst_output include/net/dst.h:444 [inline] [<ffffffff855b9ce4>] NF_HOOK include/linux/netfilter.h:309 [inline] [<ffffffff855b9ce4>] ip6_xmit+0x11a4/0x1b20 net/ipv6/ip6_output.c:352 [<ffffffff8597984e>] sctp_v6_xmit+0x9ae/0x1230 net/sctp/ipv6.c:250 [<ffffffff8594623e>] sctp_packet_transmit+0x25de/0x2bc0 net/sctp/output.c:653 [<ffffffff858f5142>] sctp_packet_singleton+0x202/0x310 net/sctp/outqueue.c:783 [<ffffffff858ea411>] sctp_outq_flush_ctrl net/sctp/outqueue.c:914 [inline] [<ffffffff858ea411>] sctp_outq_flush+0x661/0x3d40 net/sctp/outqueue.c:1212 [<ffffffff858f02f9>] sctp_outq_uncork+0x79/0xb0 net/sctp/outqueue.c:764 [<ffffffff8589f060>] sctp_side_effects net/sctp/sm_sideeffect.c:1199 [inline] [<ffffffff8589f060>] sctp_do_sm+0x55c0/0x5c30 net/sctp/sm_sideeffect.c:1170 [<ffffffff85941567>] sctp_primitive_ASSOCIATE+0x97/0xc0 net/sctp/primitive.c:73 [<ffffffff859408b2>] sctp_sendmsg_to_asoc+0xf62/0x17b0 net/sctp/socket.c:1839 [<ffffffff85910b5e>] sctp_sendmsg+0x212e/0x33b0 net/sctp/socket.c:2029 [<ffffffff8544d559>] inet_sendmsg+0x149/0x310 net/ipv4/af_inet.c:849 [<ffffffff84c6c4d2>] sock_sendmsg_nosec net/socket.c:716 [inline] [<ffffffff84c6c4d2>] sock_sendmsg net/socket.c:736 [inline] [<ffffffff84c6c4d2>] ____sys_sendmsg+0x572/0x8c0 net/socket.c:2504 [<ffffffff84c6ca91>] ___sys_sendmsg net/socket.c:2558 [inline] [<ffffffff84c6ca91>] __sys_sendmsg+0x271/0x360 net/socket.c:2587 [<ffffffff84c6cbff>] __do_sys_sendmsg net/socket.c:2596 [inline] [<ffffffff84c6cbff>] __se_sys_sendmsg net/socket.c:2594 [inline] [<ffffffff84c6cbff>] __x64_sys_sendmsg+0x7f/0x90 net/socket.c:2594 [<ffffffff85b32553>] do_syscall_x64 arch/x86/entry/common.c:51 [inline] [<ffffffff85b32553>] do_syscall_64+0x53/0x80 arch/x86/entry/common.c:84 [<ffffffff85c00087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Cc: Mahesh Bandewar maheshb@google.com Cc: Willem de Bruijn willemb@google.com Reviewed-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ipvlan/ipvlan_core.c | 41 +++++++++++++++++++------------- 1 file changed, 25 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c index b5a61b16a7eab..bfea28bd45027 100644 --- a/drivers/net/ipvlan/ipvlan_core.c +++ b/drivers/net/ipvlan/ipvlan_core.c @@ -412,7 +412,7 @@ struct ipvl_addr *ipvlan_addr_lookup(struct ipvl_port *port, void *lyr3h, return addr; }
-static int ipvlan_process_v4_outbound(struct sk_buff *skb) +static noinline_for_stack int ipvlan_process_v4_outbound(struct sk_buff *skb) { const struct iphdr *ip4h = ip_hdr(skb); struct net_device *dev = skb->dev; @@ -454,13 +454,11 @@ static int ipvlan_process_v4_outbound(struct sk_buff *skb) }
#if IS_ENABLED(CONFIG_IPV6) -static int ipvlan_process_v6_outbound(struct sk_buff *skb) + +static noinline_for_stack int +ipvlan_route_v6_outbound(struct net_device *dev, struct sk_buff *skb) { const struct ipv6hdr *ip6h = ipv6_hdr(skb); - struct net_device *dev = skb->dev; - struct net *net = dev_net(dev); - struct dst_entry *dst; - int err, ret = NET_XMIT_DROP; struct flowi6 fl6 = { .flowi6_oif = dev->ifindex, .daddr = ip6h->daddr, @@ -470,27 +468,38 @@ static int ipvlan_process_v6_outbound(struct sk_buff *skb) .flowi6_mark = skb->mark, .flowi6_proto = ip6h->nexthdr, }; + struct dst_entry *dst; + int err;
- dst = ip6_route_output(net, NULL, &fl6); - if (dst->error) { - ret = dst->error; + dst = ip6_route_output(dev_net(dev), NULL, &fl6); + err = dst->error; + if (err) { dst_release(dst); - goto err; + return err; } skb_dst_set(skb, dst); + return 0; +} + +static int ipvlan_process_v6_outbound(struct sk_buff *skb) +{ + struct net_device *dev = skb->dev; + int err, ret = NET_XMIT_DROP; + + err = ipvlan_route_v6_outbound(dev, skb); + if (unlikely(err)) { + DEV_STATS_INC(dev, tx_errors); + kfree_skb(skb); + return err; + }
memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
- err = ip6_local_out(net, skb->sk, skb); + err = ip6_local_out(dev_net(dev), skb->sk, skb); if (unlikely(net_xmit_eval(err))) DEV_STATS_INC(dev, tx_errors); else ret = NET_XMIT_SUCCESS; - goto out; -err: - DEV_STATS_INC(dev, tx_errors); - kfree_skb(skb); -out: return ret; } #else
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shigeru Yoshida syoshida@redhat.com
[ Upstream commit 719639853d88071dfdfd8d9971eca9c283ff314c ]
KMSAN reported the following uninit-value access issue:
===================================================== BUG: KMSAN: uninit-value in ppp_sync_input drivers/net/ppp/ppp_synctty.c:690 [inline] BUG: KMSAN: uninit-value in ppp_sync_receive+0xdc9/0xe70 drivers/net/ppp/ppp_synctty.c:334 ppp_sync_input drivers/net/ppp/ppp_synctty.c:690 [inline] ppp_sync_receive+0xdc9/0xe70 drivers/net/ppp/ppp_synctty.c:334 tiocsti+0x328/0x450 drivers/tty/tty_io.c:2295 tty_ioctl+0x808/0x1920 drivers/tty/tty_io.c:2694 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl+0x211/0x400 fs/ioctl.c:857 __x64_sys_ioctl+0x97/0xe0 fs/ioctl.c:857 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b
Uninit was created at: __alloc_pages+0x75d/0xe80 mm/page_alloc.c:4591 __alloc_pages_node include/linux/gfp.h:238 [inline] alloc_pages_node include/linux/gfp.h:261 [inline] __page_frag_cache_refill+0x9a/0x2c0 mm/page_alloc.c:4691 page_frag_alloc_align+0x91/0x5d0 mm/page_alloc.c:4722 page_frag_alloc include/linux/gfp.h:322 [inline] __netdev_alloc_skb+0x215/0x6d0 net/core/skbuff.c:728 netdev_alloc_skb include/linux/skbuff.h:3225 [inline] dev_alloc_skb include/linux/skbuff.h:3238 [inline] ppp_sync_input drivers/net/ppp/ppp_synctty.c:669 [inline] ppp_sync_receive+0x237/0xe70 drivers/net/ppp/ppp_synctty.c:334 tiocsti+0x328/0x450 drivers/tty/tty_io.c:2295 tty_ioctl+0x808/0x1920 drivers/tty/tty_io.c:2694 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl+0x211/0x400 fs/ioctl.c:857 __x64_sys_ioctl+0x97/0xe0 fs/ioctl.c:857 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b
CPU: 0 PID: 12950 Comm: syz-executor.1 Not tainted 6.6.0-14500-g1c41041124bd #10 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014 =====================================================
ppp_sync_input() checks the first 2 bytes of the data are PPP_ALLSTATIONS and PPP_UI. However, if the data length is 1 and the first byte is PPP_ALLSTATIONS, an access to an uninitialized value occurs when checking PPP_UI. This patch resolves this issue by checking the data length.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Shigeru Yoshida syoshida@redhat.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ppp/ppp_synctty.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ppp/ppp_synctty.c b/drivers/net/ppp/ppp_synctty.c index 0f338752c38b9..d5af6b06a66a4 100644 --- a/drivers/net/ppp/ppp_synctty.c +++ b/drivers/net/ppp/ppp_synctty.c @@ -698,7 +698,7 @@ ppp_sync_input(struct syncppp *ap, const unsigned char *buf,
/* strip address/control field if present */ p = skb->data; - if (p[0] == PPP_ALLSTATIONS && p[1] == PPP_UI) { + if (skb->len >= 2 && p[0] == PPP_ALLSTATIONS && p[1] == PPP_UI) { /* chop off address/control */ if (skb->len < 3) goto err;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yonglong Liu liuyonglong@huawei.com
[ Upstream commit dbd2f3b20c6ae425665b6975d766e3653d453e73 ]
When a VF is calling hns3_init_mac_addr(), get_mac_addr() may return fail, then the value of mac_addr_temp is not initialized.
Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Yonglong Liu liuyonglong@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index ffd1018d43fbe..d09cc10b3517f 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -3773,7 +3773,7 @@ static int hns3_init_mac_addr(struct net_device *netdev, bool init) { struct hns3_nic_priv *priv = netdev_priv(netdev); struct hnae3_handle *h = priv->ae_handle; - u8 mac_addr_temp[ETH_ALEN]; + u8 mac_addr_temp[ETH_ALEN] = {0}; int ret = 0;
if (h->ae_algo->ops->get_mac_addr && init) {
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shigeru Yoshida syoshida@redhat.com
[ Upstream commit fb317eb23b5ee4c37b0656a9a52a3db58d9dd072 ]
KMSAN reported the following kernel-infoleak issue:
===================================================== BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline] BUG: KMSAN: kernel-infoleak in copy_to_user_iter lib/iov_iter.c:24 [inline] BUG: KMSAN: kernel-infoleak in iterate_ubuf include/linux/iov_iter.h:29 [inline] BUG: KMSAN: kernel-infoleak in iterate_and_advance2 include/linux/iov_iter.h:245 [inline] BUG: KMSAN: kernel-infoleak in iterate_and_advance include/linux/iov_iter.h:271 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x4ec/0x2bc0 lib/iov_iter.c:186 instrument_copy_to_user include/linux/instrumented.h:114 [inline] copy_to_user_iter lib/iov_iter.c:24 [inline] iterate_ubuf include/linux/iov_iter.h:29 [inline] iterate_and_advance2 include/linux/iov_iter.h:245 [inline] iterate_and_advance include/linux/iov_iter.h:271 [inline] _copy_to_iter+0x4ec/0x2bc0 lib/iov_iter.c:186 copy_to_iter include/linux/uio.h:197 [inline] simple_copy_to_iter net/core/datagram.c:532 [inline] __skb_datagram_iter.5+0x148/0xe30 net/core/datagram.c:420 skb_copy_datagram_iter+0x52/0x210 net/core/datagram.c:546 skb_copy_datagram_msg include/linux/skbuff.h:3960 [inline] netlink_recvmsg+0x43d/0x1630 net/netlink/af_netlink.c:1967 sock_recvmsg_nosec net/socket.c:1044 [inline] sock_recvmsg net/socket.c:1066 [inline] __sys_recvfrom+0x476/0x860 net/socket.c:2246 __do_sys_recvfrom net/socket.c:2264 [inline] __se_sys_recvfrom net/socket.c:2260 [inline] __x64_sys_recvfrom+0x130/0x200 net/socket.c:2260 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b
Uninit was created at: slab_post_alloc_hook+0x103/0x9e0 mm/slab.h:768 slab_alloc_node mm/slub.c:3478 [inline] kmem_cache_alloc_node+0x5f7/0xb50 mm/slub.c:3523 kmalloc_reserve+0x13c/0x4a0 net/core/skbuff.c:560 __alloc_skb+0x2fd/0x770 net/core/skbuff.c:651 alloc_skb include/linux/skbuff.h:1286 [inline] tipc_tlv_alloc net/tipc/netlink_compat.c:156 [inline] tipc_get_err_tlv+0x90/0x5d0 net/tipc/netlink_compat.c:170 tipc_nl_compat_recv+0x1042/0x15d0 net/tipc/netlink_compat.c:1324 genl_family_rcv_msg_doit net/netlink/genetlink.c:972 [inline] genl_family_rcv_msg net/netlink/genetlink.c:1052 [inline] genl_rcv_msg+0x1220/0x12c0 net/netlink/genetlink.c:1067 netlink_rcv_skb+0x4a4/0x6a0 net/netlink/af_netlink.c:2545 genl_rcv+0x41/0x60 net/netlink/genetlink.c:1076 netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline] netlink_unicast+0xf4b/0x1230 net/netlink/af_netlink.c:1368 netlink_sendmsg+0x1242/0x1420 net/netlink/af_netlink.c:1910 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] ____sys_sendmsg+0x997/0xd60 net/socket.c:2588 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2642 __sys_sendmsg net/socket.c:2671 [inline] __do_sys_sendmsg net/socket.c:2680 [inline] __se_sys_sendmsg net/socket.c:2678 [inline] __x64_sys_sendmsg+0x2fa/0x4a0 net/socket.c:2678 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b
Bytes 34-35 of 36 are uninitialized Memory access of size 36 starts at ffff88802d464a00 Data copied to user address 00007ff55033c0a0
CPU: 0 PID: 30322 Comm: syz-executor.0 Not tainted 6.6.0-14500-g1c41041124bd #10 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014 =====================================================
tipc_add_tlv() puts TLV descriptor and value onto `skb`. This size is calculated with TLV_SPACE() macro. It adds the size of struct tlv_desc and the length of TLV value passed as an argument, and aligns the result to a multiple of TLV_ALIGNTO, i.e., a multiple of 4 bytes.
If the size of struct tlv_desc plus the length of TLV value is not aligned, the current implementation leaves the remaining bytes uninitialized. This is the cause of the above kernel-infoleak issue.
This patch resolves this issue by clearing data up to an aligned size.
Fixes: d0796d1ef63d ("tipc: convert legacy nl bearer dump to nl compat") Signed-off-by: Shigeru Yoshida syoshida@redhat.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/tipc/netlink_compat.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/tipc/netlink_compat.c b/net/tipc/netlink_compat.c index bef28e900b3ed..5c61b8ee7fc09 100644 --- a/net/tipc/netlink_compat.c +++ b/net/tipc/netlink_compat.c @@ -101,6 +101,7 @@ static int tipc_add_tlv(struct sk_buff *skb, u16 type, void *data, u16 len) return -EMSGSIZE;
skb_put(skb, TLV_SPACE(len)); + memset(tlv, 0, TLV_SPACE(len)); tlv->tlv_type = htons(type); tlv->tlv_len = htons(TLV_LENGTH(len)); if (len && data)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Willem de Bruijn willemb@google.com
[ Upstream commit c0a2a1b0d631fc460d830f52d06211838874d655 ]
ppp_sync_ioctl allows setting device MRU, but does not sanity check this input.
Limit to a sane upper bound of 64KB.
No implementation I could find generates larger than 64KB frames. RFC 2823 mentions an upper bound of PPP over SDL of 64KB based on the 16-bit length field. Other protocols will be smaller, such as PPPoE (9KB jumbo frame) and PPPoA (18190 maximum CPCS-SDU size, RFC 2364). PPTP and L2TP encapsulate in IP.
Syzbot managed to trigger alloc warning in __alloc_pages:
if (WARN_ON_ONCE_GFP(order > MAX_ORDER, gfp))
WARNING: CPU: 1 PID: 37 at mm/page_alloc.c:4544 __alloc_pages+0x3ab/0x4a0 mm/page_alloc.c:4544
__alloc_skb+0x12b/0x330 net/core/skbuff.c:651 __netdev_alloc_skb+0x72/0x3f0 net/core/skbuff.c:715 netdev_alloc_skb include/linux/skbuff.h:3225 [inline] dev_alloc_skb include/linux/skbuff.h:3238 [inline] ppp_sync_input drivers/net/ppp/ppp_synctty.c:669 [inline] ppp_sync_receive+0xff/0x680 drivers/net/ppp/ppp_synctty.c:334 tty_ldisc_receive_buf+0x14c/0x180 drivers/tty/tty_buffer.c:390 tty_port_default_receive_buf+0x70/0xb0 drivers/tty/tty_port.c:37 receive_buf drivers/tty/tty_buffer.c:444 [inline] flush_to_ldisc+0x261/0x780 drivers/tty/tty_buffer.c:494 process_one_work+0x884/0x15c0 kernel/workqueue.c:2630
With call
ioctl$PPPIOCSMRU1(r1, 0x40047452, &(0x7f0000000100)=0x5e6417a8)
Similar code exists in other drivers that implement ppp_channel_ops ioctl PPPIOCSMRU. Those might also be in scope. Notably excluded from this are pppol2tp_ioctl and pppoe_ioctl.
This code goes back to the start of git history.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+6177e1f90d92583bcc58@syzkaller.appspotmail.com Signed-off-by: Willem de Bruijn willemb@google.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ppp/ppp_synctty.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/net/ppp/ppp_synctty.c b/drivers/net/ppp/ppp_synctty.c index d5af6b06a66a4..55641e01192dd 100644 --- a/drivers/net/ppp/ppp_synctty.c +++ b/drivers/net/ppp/ppp_synctty.c @@ -463,6 +463,10 @@ ppp_sync_ioctl(struct ppp_channel *chan, unsigned int cmd, unsigned long arg) case PPPIOCSMRU: if (get_user(val, (int __user *) argp)) break; + if (val > U16_MAX) { + err = -EINVAL; + break; + } if (val < PPP_MRU) val = PPP_MRU; ap->mru = val;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juergen Gross jgross@suse.com
[ Upstream commit 47d970204054f859f35a2237baa75c2d84fcf436 ]
When delaying eoi handling of events, the related elements are queued into the percpu lateeoi list. In case the list isn't empty, the elements should be sorted by the time when eoi handling is to happen.
Unfortunately a new element will never be queued at the start of the list, even if it has a handling time lower than all other list elements.
Fix that by handling that case the same way as for an empty list.
Fixes: e99502f76271 ("xen/events: defer eoi in case of excessive number of events") Reported-by: Jan Beulich jbeulich@suse.com Signed-off-by: Juergen Gross jgross@suse.com Reviewed-by: Oleksandr Tyshchenko oleksandr_tyshchenko@epam.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/xen/events/events_base.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c index 230e77f9637cd..91806dc1236de 100644 --- a/drivers/xen/events/events_base.c +++ b/drivers/xen/events/events_base.c @@ -491,7 +491,9 @@ static void lateeoi_list_add(struct irq_info *info)
spin_lock_irqsave(&eoi->eoi_list_lock, flags);
- if (list_empty(&eoi->eoi_list)) { + elem = list_first_entry_or_null(&eoi->eoi_list, struct irq_info, + eoi_list); + if (!elem || info->eoi_time < elem->eoi_time) { list_add(&info->eoi_list, &eoi->eoi_list); mod_delayed_work_on(info->eoi_cpu, system_wq, &eoi->delayed, delay);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 73bde5a3294853947252cd9092a3517c7cb0cd2d ]
As I was working on a syzbot report, I found that KCSAN would probably complain that reading q->head or q->tail without barriers could lead to invalid results.
Add corresponding READ_ONCE() and WRITE_ONCE() to avoid load-store tearing.
Fixes: d94ba80ebbea ("ptp: Added a brand new class driver for ptp clocks.") Signed-off-by: Eric Dumazet edumazet@google.com Acked-by: Richard Cochran richardcochran@gmail.com Link: https://lore.kernel.org/r/20231109174859.3995880-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ptp/ptp_chardev.c | 3 ++- drivers/ptp/ptp_clock.c | 5 +++-- drivers/ptp/ptp_private.h | 8 ++++++-- drivers/ptp/ptp_sysfs.c | 3 ++- 4 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c index 9d72ab593f13f..87bd6c072ac2f 100644 --- a/drivers/ptp/ptp_chardev.c +++ b/drivers/ptp/ptp_chardev.c @@ -443,7 +443,8 @@ ssize_t ptp_read(struct posix_clock *pc,
for (i = 0; i < cnt; i++) { event[i] = queue->buf[queue->head]; - queue->head = (queue->head + 1) % PTP_MAX_TIMESTAMPS; + /* Paired with READ_ONCE() in queue_cnt() */ + WRITE_ONCE(queue->head, (queue->head + 1) % PTP_MAX_TIMESTAMPS); }
spin_unlock_irqrestore(&queue->lock, flags); diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c index eedf067ee8e35..a6ff02a02cab1 100644 --- a/drivers/ptp/ptp_clock.c +++ b/drivers/ptp/ptp_clock.c @@ -55,10 +55,11 @@ static void enqueue_external_timestamp(struct timestamp_event_queue *queue, dst->t.sec = seconds; dst->t.nsec = remainder;
+ /* Both WRITE_ONCE() are paired with READ_ONCE() in queue_cnt() */ if (!queue_free(queue)) - queue->head = (queue->head + 1) % PTP_MAX_TIMESTAMPS; + WRITE_ONCE(queue->head, (queue->head + 1) % PTP_MAX_TIMESTAMPS);
- queue->tail = (queue->tail + 1) % PTP_MAX_TIMESTAMPS; + WRITE_ONCE(queue->tail, (queue->tail + 1) % PTP_MAX_TIMESTAMPS);
spin_unlock_irqrestore(&queue->lock, flags); } diff --git a/drivers/ptp/ptp_private.h b/drivers/ptp/ptp_private.h index 6b97155148f11..d2cb956706763 100644 --- a/drivers/ptp/ptp_private.h +++ b/drivers/ptp/ptp_private.h @@ -55,9 +55,13 @@ struct ptp_clock { * that a writer might concurrently increment the tail does not * matter, since the queue remains nonempty nonetheless. */ -static inline int queue_cnt(struct timestamp_event_queue *q) +static inline int queue_cnt(const struct timestamp_event_queue *q) { - int cnt = q->tail - q->head; + /* + * Paired with WRITE_ONCE() in enqueue_external_timestamp(), + * ptp_read(), extts_fifo_show(). + */ + int cnt = READ_ONCE(q->tail) - READ_ONCE(q->head); return cnt < 0 ? PTP_MAX_TIMESTAMPS + cnt : cnt; }
diff --git a/drivers/ptp/ptp_sysfs.c b/drivers/ptp/ptp_sysfs.c index 8cd59e8481631..8d52815e05b31 100644 --- a/drivers/ptp/ptp_sysfs.c +++ b/drivers/ptp/ptp_sysfs.c @@ -78,7 +78,8 @@ static ssize_t extts_fifo_show(struct device *dev, qcnt = queue_cnt(queue); if (qcnt) { event = queue->buf[queue->head]; - queue->head = (queue->head + 1) % PTP_MAX_TIMESTAMPS; + /* Paired with READ_ONCE() in queue_cnt() */ + WRITE_ONCE(queue->head, (queue->head + 1) % PTP_MAX_TIMESTAMPS); } spin_unlock_irqrestore(&queue->lock, flags);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 3cffa2ddc4d3fcf70cde361236f5a614f81a09b2 ]
Commit 9eed321cde22 ("net: lapbether: only support ethernet devices") has been able to keep syzbot away from net/lapb, until today.
In the following splat [1], the issue is that a lapbether device has been created on a bonding device without members. Then adding a non ARPHRD_ETHER member forced the bonding master to change its type.
The fix is to make sure we call dev_close() in bond_setup_by_slave() so that the potential linked lapbether devices (or any other devices having assumptions on the physical device) are removed.
A similar bug has been addressed in commit 40baec225765 ("bonding: fix panic on non-ARPHRD_ETHER enslave failure")
[1] skbuff: skb_under_panic: text:ffff800089508810 len:44 put:40 head:ffff0000c78e7c00 data:ffff0000c78e7bea tail:0x16 end:0x140 dev:bond0 kernel BUG at net/core/skbuff.c:192 ! Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 6007 Comm: syz-executor383 Not tainted 6.6.0-rc3-syzkaller-gbf6547d8715b #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023 pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : skb_panic net/core/skbuff.c:188 [inline] pc : skb_under_panic+0x13c/0x140 net/core/skbuff.c:202 lr : skb_panic net/core/skbuff.c:188 [inline] lr : skb_under_panic+0x13c/0x140 net/core/skbuff.c:202 sp : ffff800096a06aa0 x29: ffff800096a06ab0 x28: ffff800096a06ba0 x27: dfff800000000000 x26: ffff0000ce9b9b50 x25: 0000000000000016 x24: ffff0000c78e7bea x23: ffff0000c78e7c00 x22: 000000000000002c x21: 0000000000000140 x20: 0000000000000028 x19: ffff800089508810 x18: ffff800096a06100 x17: 0000000000000000 x16: ffff80008a629a3c x15: 0000000000000001 x14: 1fffe00036837a32 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000201 x10: 0000000000000000 x9 : cb50b496c519aa00 x8 : cb50b496c519aa00 x7 : 0000000000000001 x6 : 0000000000000001 x5 : ffff800096a063b8 x4 : ffff80008e280f80 x3 : ffff8000805ad11c x2 : 0000000000000001 x1 : 0000000100000201 x0 : 0000000000000086 Call trace: skb_panic net/core/skbuff.c:188 [inline] skb_under_panic+0x13c/0x140 net/core/skbuff.c:202 skb_push+0xf0/0x108 net/core/skbuff.c:2446 ip6gre_header+0xbc/0x738 net/ipv6/ip6_gre.c:1384 dev_hard_header include/linux/netdevice.h:3136 [inline] lapbeth_data_transmit+0x1c4/0x298 drivers/net/wan/lapbether.c:257 lapb_data_transmit+0x8c/0xb0 net/lapb/lapb_iface.c:447 lapb_transmit_buffer+0x178/0x204 net/lapb/lapb_out.c:149 lapb_send_control+0x220/0x320 net/lapb/lapb_subr.c:251 __lapb_disconnect_request+0x9c/0x17c net/lapb/lapb_iface.c:326 lapb_device_event+0x288/0x4e0 net/lapb/lapb_iface.c:492 notifier_call_chain+0x1a4/0x510 kernel/notifier.c:93 raw_notifier_call_chain+0x3c/0x50 kernel/notifier.c:461 call_netdevice_notifiers_info net/core/dev.c:1970 [inline] call_netdevice_notifiers_extack net/core/dev.c:2008 [inline] call_netdevice_notifiers net/core/dev.c:2022 [inline] __dev_close_many+0x1b8/0x3c4 net/core/dev.c:1508 dev_close_many+0x1e0/0x470 net/core/dev.c:1559 dev_close+0x174/0x250 net/core/dev.c:1585 lapbeth_device_event+0x2e4/0x958 drivers/net/wan/lapbether.c:466 notifier_call_chain+0x1a4/0x510 kernel/notifier.c:93 raw_notifier_call_chain+0x3c/0x50 kernel/notifier.c:461 call_netdevice_notifiers_info net/core/dev.c:1970 [inline] call_netdevice_notifiers_extack net/core/dev.c:2008 [inline] call_netdevice_notifiers net/core/dev.c:2022 [inline] __dev_close_many+0x1b8/0x3c4 net/core/dev.c:1508 dev_close_many+0x1e0/0x470 net/core/dev.c:1559 dev_close+0x174/0x250 net/core/dev.c:1585 bond_enslave+0x2298/0x30cc drivers/net/bonding/bond_main.c:2332 bond_do_ioctl+0x268/0xc64 drivers/net/bonding/bond_main.c:4539 dev_ifsioc+0x754/0x9ac dev_ioctl+0x4d8/0xd34 net/core/dev_ioctl.c:786 sock_do_ioctl+0x1d4/0x2d0 net/socket.c:1217 sock_ioctl+0x4e8/0x834 net/socket.c:1322 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl fs/ioctl.c:857 [inline] __arm64_sys_ioctl+0x14c/0x1c8 fs/ioctl.c:857 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155 el0_svc+0x58/0x16c arch/arm64/kernel/entry-common.c:678 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:696 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591 Code: aa1803e6 aa1903e7 a90023f5 94785b8b (d4210000)
Fixes: 872254dd6b1f ("net/bonding: Enable bonding to enslave non ARPHRD_ETHER") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Acked-by: Jay Vosburgh jay.vosburgh@canonical.com Reviewed-by: Hangbin Liu liuhangbin@gmail.com Link: https://lore.kernel.org/r/20231109180102.4085183-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index afd327e88cf5e..bb1c6743222e5 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1144,6 +1144,10 @@ static void bond_compute_features(struct bonding *bond) static void bond_setup_by_slave(struct net_device *bond_dev, struct net_device *slave_dev) { + bool was_up = !!(bond_dev->flags & IFF_UP); + + dev_close(bond_dev); + bond_dev->header_ops = slave_dev->header_ops;
bond_dev->type = slave_dev->type; @@ -1158,6 +1162,8 @@ static void bond_setup_by_slave(struct net_device *bond_dev, bond_dev->flags &= ~(IFF_BROADCAST | IFF_MULTICAST); bond_dev->flags |= (IFF_POINTOPOINT | IFF_NOARP); } + if (was_up) + dev_open(bond_dev, NULL); }
/* On bonding slaves other than the currently active slave, suppress
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Walleij linus.walleij@linaro.org
[ Upstream commit 510e35fb931ffc3b100e5d5ae4595cd3beca9f1a ]
Enumerator 3 is 1548 bytes according to the datasheet. Not 1542.
Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet") Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: Linus Walleij linus.walleij@linaro.org Reviewed-by: Vladimir Oltean olteanv@gmail.com Link: https://lore.kernel.org/r/20231109-gemini-largeframe-fix-v4-1-6e611528db08@l... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/cortina/gemini.c | 4 ++-- drivers/net/ethernet/cortina/gemini.h | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c index a8a8b77c1611e..fbb50a0602832 100644 --- a/drivers/net/ethernet/cortina/gemini.c +++ b/drivers/net/ethernet/cortina/gemini.c @@ -432,8 +432,8 @@ static const struct gmac_max_framelen gmac_maxlens[] = { .val = CONFIG0_MAXLEN_1536, }, { - .max_l3_len = 1542, - .val = CONFIG0_MAXLEN_1542, + .max_l3_len = 1548, + .val = CONFIG0_MAXLEN_1548, }, { .max_l3_len = 9212, diff --git a/drivers/net/ethernet/cortina/gemini.h b/drivers/net/ethernet/cortina/gemini.h index 9fdf77d5eb374..99efb11557436 100644 --- a/drivers/net/ethernet/cortina/gemini.h +++ b/drivers/net/ethernet/cortina/gemini.h @@ -787,7 +787,7 @@ union gmac_config0 { #define CONFIG0_MAXLEN_1536 0 #define CONFIG0_MAXLEN_1518 1 #define CONFIG0_MAXLEN_1522 2 -#define CONFIG0_MAXLEN_1542 3 +#define CONFIG0_MAXLEN_1548 3 #define CONFIG0_MAXLEN_9k 4 /* 9212 */ #define CONFIG0_MAXLEN_10k 5 /* 10236 */ #define CONFIG0_MAXLEN_1518__6 6
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Walleij linus.walleij@linaro.org
[ Upstream commit d4d0c5b4d279bfe3585fbd806efefd3e51c82afa ]
The Gemini ethernet controller provides hardware checksumming for frames up to 1514 bytes including ethernet headers but not FCS.
If we start sending bigger frames (after first bumping up the MTU on both interfaces sending and receiving the frames), truncated packets start to appear on the target such as in this tcpdump resulting from ping -s 1474:
23:34:17.241983 14:d6:4d:a8:3c:4f (oui Unknown) > bc:ae:c5:6b:a8:3d (oui Unknown), ethertype IPv4 (0x0800), length 1514: truncated-ip - 2 bytes missing! (tos 0x0, ttl 64, id 32653, offset 0, flags [DF], proto ICMP (1), length 1502) OpenWrt.lan > Fecusia: ICMP echo request, id 1672, seq 50, length 1482
If we bypass the hardware checksumming and provide a software fallback, everything starts working fine up to the max TX MTU of 2047 bytes, for example ping -s2000 192.168.1.2:
00:44:29.587598 bc:ae:c5:6b:a8:3d (oui Unknown) > 14:d6:4d:a8:3c:4f (oui Unknown), ethertype IPv4 (0x0800), length 2042: (tos 0x0, ttl 64, id 51828, offset 0, flags [none], proto ICMP (1), length 2028) Fecusia > OpenWrt.lan: ICMP echo reply, id 1683, seq 4, length 2008
The bit enabling to bypass hardware checksum (or any of the "TSS" bits) are undocumented in the hardware reference manual. The entire hardware checksum unit appears undocumented. The conclusion that we need to use the "bypass" bit was found by trial-and-error.
Since no hardware checksum will happen, we slot in a software checksum fallback.
Check for the condition where we need to compute checksum on the skb with either hardware or software using == CHECKSUM_PARTIAL instead of != CHECKSUM_NONE which is an incomplete check according to <linux/skbuff.h>.
On the D-Link DIR-685 router this fixes a bug on the conduit interface to the RTL8366RB DSA switch: as the switch needs to add space for its tag it increases the MTU on the conduit interface to 1504 and that means that when the router sends packages of 1500 bytes these get an extra 4 bytes of DSA tag and the transfer fails because of the erroneous hardware checksumming, affecting such basic functionality as the LuCI web interface.
Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet") Signed-off-by: Linus Walleij linus.walleij@linaro.org Reviewed-by: Vladimir Oltean olteanv@gmail.com Link: https://lore.kernel.org/r/20231109-gemini-largeframe-fix-v4-2-6e611528db08@l... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/cortina/gemini.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c index fbb50a0602832..ce1ada712af69 100644 --- a/drivers/net/ethernet/cortina/gemini.c +++ b/drivers/net/ethernet/cortina/gemini.c @@ -1152,6 +1152,7 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb, dma_addr_t mapping; unsigned short mtu; void *buffer; + int ret;
mtu = ETH_HLEN; mtu += netdev->mtu; @@ -1166,9 +1167,30 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb, word3 |= mtu; }
- if (skb->ip_summed != CHECKSUM_NONE) { + if (skb->len >= ETH_FRAME_LEN) { + /* Hardware offloaded checksumming isn't working on frames + * bigger than 1514 bytes. A hypothesis about this is that the + * checksum buffer is only 1518 bytes, so when the frames get + * bigger they get truncated, or the last few bytes get + * overwritten by the FCS. + * + * Just use software checksumming and bypass on bigger frames. + */ + if (skb->ip_summed == CHECKSUM_PARTIAL) { + ret = skb_checksum_help(skb); + if (ret) + return ret; + } + word1 |= TSS_BYPASS_BIT; + } else if (skb->ip_summed == CHECKSUM_PARTIAL) { int tcp = 0;
+ /* We do not switch off the checksumming on non TCP/UDP + * frames: as is shown from tests, the checksumming engine + * is smart enough to see that a frame is not actually TCP + * or UDP and then just pass it through without any changes + * to the frame. + */ if (skb->protocol == htons(ETH_P_IP)) { word1 |= TSS_IP_CHKSUM_BIT; tcp = ip_hdr(skb)->protocol == IPPROTO_TCP;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Walleij linus.walleij@linaro.org
[ Upstream commit dc6c0bfbaa947dd7976e30e8c29b10c868b6fa42 ]
The RX max frame size is over 10000 for the Gemini ethernet, but the TX max frame size is actually just 2047 (0x7ff after checking the datasheet). Reflect this in what we offer to Linux, cap the MTU at the TX max frame minus ethernet headers.
We delete the code disabling the hardware checksum for large MTUs as netdev->mtu can no longer be larger than netdev->max_mtu meaning the if()-clause in gmac_fix_features() is never true.
Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet") Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: Linus Walleij linus.walleij@linaro.org Reviewed-by: Vladimir Oltean olteanv@gmail.com Link: https://lore.kernel.org/r/20231109-gemini-largeframe-fix-v4-3-6e611528db08@l... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/cortina/gemini.c | 17 ++++------------- drivers/net/ethernet/cortina/gemini.h | 2 +- 2 files changed, 5 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c index ce1ada712af69..4bcdb48b0e9cc 100644 --- a/drivers/net/ethernet/cortina/gemini.c +++ b/drivers/net/ethernet/cortina/gemini.c @@ -2015,15 +2015,6 @@ static int gmac_change_mtu(struct net_device *netdev, int new_mtu) return 0; }
-static netdev_features_t gmac_fix_features(struct net_device *netdev, - netdev_features_t features) -{ - if (netdev->mtu + ETH_HLEN + VLAN_HLEN > MTU_SIZE_BIT_MASK) - features &= ~GMAC_OFFLOAD_FEATURES; - - return features; -} - static int gmac_set_features(struct net_device *netdev, netdev_features_t features) { @@ -2244,7 +2235,6 @@ static const struct net_device_ops gmac_351x_ops = { .ndo_set_mac_address = gmac_set_mac_address, .ndo_get_stats64 = gmac_get_stats64, .ndo_change_mtu = gmac_change_mtu, - .ndo_fix_features = gmac_fix_features, .ndo_set_features = gmac_set_features, };
@@ -2498,11 +2488,12 @@ static int gemini_ethernet_port_probe(struct platform_device *pdev)
netdev->hw_features = GMAC_OFFLOAD_FEATURES; netdev->features |= GMAC_OFFLOAD_FEATURES | NETIF_F_GRO; - /* We can handle jumbo frames up to 10236 bytes so, let's accept - * payloads of 10236 bytes minus VLAN and ethernet header + /* We can receive jumbo frames up to 10236 bytes but only + * transmit 2047 bytes so, let's accept payloads of 2047 + * bytes minus VLAN and ethernet header */ netdev->min_mtu = ETH_MIN_MTU; - netdev->max_mtu = 10236 - VLAN_ETH_HLEN; + netdev->max_mtu = MTU_SIZE_BIT_MASK - VLAN_ETH_HLEN;
port->freeq_refill = 0; netif_napi_add(netdev, &port->napi, gmac_napi_poll, diff --git a/drivers/net/ethernet/cortina/gemini.h b/drivers/net/ethernet/cortina/gemini.h index 99efb11557436..24bb989981f23 100644 --- a/drivers/net/ethernet/cortina/gemini.h +++ b/drivers/net/ethernet/cortina/gemini.h @@ -502,7 +502,7 @@ union gmac_txdesc_3 { #define SOF_BIT 0x80000000 #define EOF_BIT 0x40000000 #define EOFIE_BIT BIT(29) -#define MTU_SIZE_BIT_MASK 0x1fff +#define MTU_SIZE_BIT_MASK 0x7ff /* Max MTU 2047 bytes */
/* GMAC Tx Descriptor */ struct gmac_txdesc {
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linkui Xiao xiaolinkui@kylinos.cn
[ Upstream commit a44af08e3d4d7566eeea98d7a29fe06e7b9de944 ]
K2CI reported a problem:
consume_skb(skb); return err; [nf_br_ip_fragment() error] uninitialized symbol 'err'.
err is not initialized, because returning 0 is expected, initialize err to 0.
Fixes: 3c171f496ef5 ("netfilter: bridge: add connection tracking system") Reported-by: k2ci kernel-bot@kylinos.cn Signed-off-by: Linkui Xiao xiaolinkui@kylinos.cn Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bridge/netfilter/nf_conntrack_bridge.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c index fdbed31585553..d14b2dbbd1dfb 100644 --- a/net/bridge/netfilter/nf_conntrack_bridge.c +++ b/net/bridge/netfilter/nf_conntrack_bridge.c @@ -36,7 +36,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk, ktime_t tstamp = skb->tstamp; struct ip_frag_state state; struct iphdr *iph; - int err; + int err = 0;
/* for offloaded checksums cleanup checksum before fragmentation */ if (skb->ip_summed == CHECKSUM_PARTIAL &&
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jose Abreu Jose.Abreu@synopsys.com
[ Upstream commit 88ebe2cf7f3fc9da95e0f06483fd58da3e67e675 ]
This looks over-engineered. Let's use some helpers to get the buffer length and hereby simplify the stmmac_rx() function. No performance drop was seen with the new implementation.
Signed-off-by: Jose Abreu Jose.Abreu@synopsys.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: fa02de9e7588 ("net: stmmac: fix rx budget limit check") Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/stmicro/stmmac/stmmac_main.c | 146 +++++++++++------- 1 file changed, 94 insertions(+), 52 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 6a3b0f76d9729..e521ab508f030 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -3440,6 +3440,55 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv, u32 queue) stmmac_set_rx_tail_ptr(priv, priv->ioaddr, rx_q->rx_tail_addr, queue); }
+static unsigned int stmmac_rx_buf1_len(struct stmmac_priv *priv, + struct dma_desc *p, + int status, unsigned int len) +{ + int ret, coe = priv->hw->rx_csum; + unsigned int plen = 0, hlen = 0; + + /* Not first descriptor, buffer is always zero */ + if (priv->sph && len) + return 0; + + /* First descriptor, get split header length */ + ret = stmmac_get_rx_header_len(priv, p, &hlen); + if (priv->sph && hlen) { + priv->xstats.rx_split_hdr_pkt_n++; + return hlen; + } + + /* First descriptor, not last descriptor and not split header */ + if (status & rx_not_ls) + return priv->dma_buf_sz; + + plen = stmmac_get_rx_frame_len(priv, p, coe); + + /* First descriptor and last descriptor and not split header */ + return min_t(unsigned int, priv->dma_buf_sz, plen); +} + +static unsigned int stmmac_rx_buf2_len(struct stmmac_priv *priv, + struct dma_desc *p, + int status, unsigned int len) +{ + int coe = priv->hw->rx_csum; + unsigned int plen = 0; + + /* Not split header, buffer is not available */ + if (!priv->sph) + return 0; + + /* Not last descriptor */ + if (status & rx_not_ls) + return priv->dma_buf_sz; + + plen = stmmac_get_rx_frame_len(priv, p, coe); + + /* Last descriptor */ + return plen - len; +} + /** * stmmac_rx - manage the receive process * @priv: driver private structure @@ -3469,11 +3518,10 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue) stmmac_display_ring(priv, rx_head, DMA_RX_SIZE, true); } while (count < limit) { - unsigned int hlen = 0, prev_len = 0; + unsigned int buf1_len = 0, buf2_len = 0; enum pkt_hash_types hash_type; struct stmmac_rx_buffer *buf; struct dma_desc *np, *p; - unsigned int sec_len; int entry; u32 hash;
@@ -3492,7 +3540,8 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue) break;
read_again: - sec_len = 0; + buf1_len = 0; + buf2_len = 0; entry = next_entry; buf = &rx_q->buf_pool[entry];
@@ -3517,7 +3566,6 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue) np = rx_q->dma_rx + next_entry;
prefetch(np); - prefetch(page_address(buf->page));
if (priv->extend_desc) stmmac_rx_extended_status(priv, &priv->dev->stats, @@ -3534,69 +3582,61 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue) goto read_again; if (unlikely(error)) { dev_kfree_skb(skb); + skb = NULL; count++; continue; }
/* Buffer is good. Go on. */
- if (likely(status & rx_not_ls)) { - len += priv->dma_buf_sz; - } else { - prev_len = len; - len = stmmac_get_rx_frame_len(priv, p, coe); - - /* ACS is set; GMAC core strips PAD/FCS for IEEE 802.3 - * Type frames (LLC/LLC-SNAP) - * - * llc_snap is never checked in GMAC >= 4, so this ACS - * feature is always disabled and packets need to be - * stripped manually. - */ - if (unlikely(priv->synopsys_id >= DWMAC_CORE_4_00) || - unlikely(status != llc_snap)) - len -= ETH_FCS_LEN; + prefetch(page_address(buf->page)); + if (buf->sec_page) + prefetch(page_address(buf->sec_page)); + + buf1_len = stmmac_rx_buf1_len(priv, p, status, len); + len += buf1_len; + buf2_len = stmmac_rx_buf2_len(priv, p, status, len); + len += buf2_len; + + /* ACS is set; GMAC core strips PAD/FCS for IEEE 802.3 + * Type frames (LLC/LLC-SNAP) + * + * llc_snap is never checked in GMAC >= 4, so this ACS + * feature is always disabled and packets need to be + * stripped manually. + */ + if (unlikely(priv->synopsys_id >= DWMAC_CORE_4_00) || + unlikely(status != llc_snap)) { + if (buf2_len) + buf2_len -= ETH_FCS_LEN; + else + buf1_len -= ETH_FCS_LEN; + + len -= ETH_FCS_LEN; }
if (!skb) { - int ret = stmmac_get_rx_header_len(priv, p, &hlen); - - if (priv->sph && !ret && (hlen > 0)) { - sec_len = len; - if (!(status & rx_not_ls)) - sec_len = sec_len - hlen; - len = hlen; - - prefetch(page_address(buf->sec_page)); - priv->xstats.rx_split_hdr_pkt_n++; - } - - skb = napi_alloc_skb(&ch->rx_napi, len); + skb = napi_alloc_skb(&ch->rx_napi, buf1_len); if (!skb) { priv->dev->stats.rx_dropped++; count++; - continue; + goto drain_data; }
- dma_sync_single_for_cpu(priv->device, buf->addr, len, - DMA_FROM_DEVICE); + dma_sync_single_for_cpu(priv->device, buf->addr, + buf1_len, DMA_FROM_DEVICE); skb_copy_to_linear_data(skb, page_address(buf->page), - len); - skb_put(skb, len); + buf1_len); + skb_put(skb, buf1_len);
/* Data payload copied into SKB, page ready for recycle */ page_pool_recycle_direct(rx_q->page_pool, buf->page); buf->page = NULL; - } else { - unsigned int buf_len = len - prev_len; - - if (likely(status & rx_not_ls)) - buf_len = priv->dma_buf_sz; - + } else if (buf1_len) { dma_sync_single_for_cpu(priv->device, buf->addr, - buf_len, DMA_FROM_DEVICE); + buf1_len, DMA_FROM_DEVICE); skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, - buf->page, 0, buf_len, + buf->page, 0, buf1_len, priv->dma_buf_sz);
/* Data payload appended into SKB */ @@ -3604,22 +3644,23 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue) buf->page = NULL; }
- if (sec_len > 0) { + if (buf2_len) { dma_sync_single_for_cpu(priv->device, buf->sec_addr, - sec_len, DMA_FROM_DEVICE); + buf2_len, DMA_FROM_DEVICE); skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, - buf->sec_page, 0, sec_len, + buf->sec_page, 0, buf2_len, priv->dma_buf_sz);
- len += sec_len; - /* Data payload appended into SKB */ page_pool_release_page(rx_q->page_pool, buf->sec_page); buf->sec_page = NULL; }
+drain_data: if (likely(status & rx_not_ls)) goto read_again; + if (!skb) + continue;
/* Got entire packet into SKB. Finish it. */
@@ -3637,13 +3678,14 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
skb_record_rx_queue(skb, queue); napi_gro_receive(&ch->rx_napi, skb); + skb = NULL;
priv->dev->stats.rx_packets++; priv->dev->stats.rx_bytes += len; count++; }
- if (status & rx_not_ls) { + if (status & rx_not_ls || skb) { rx_q->state_saved = true; rx_q->state.skb = skb; rx_q->state.error = error;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baruch Siach baruch@tkos.co.il
[ Upstream commit fa02de9e75889915b554eda1964a631fd019973b ]
The while loop condition verifies 'count < limit'. Neither value change before the 'count >= limit' check. As is this check is dead code. But code inspection reveals a code path that modifies 'count' and then goto 'drain_data' and back to 'read_again'. So there is a need to verify count value sanity after 'read_again'.
Move 'read_again' up to fix the count limit check.
Fixes: ec222003bd94 ("net: stmmac: Prepare to add Split Header support") Signed-off-by: Baruch Siach baruch@tkos.co.il Reviewed-by: Serge Semin fancer.lancer@gmail.com Link: https://lore.kernel.org/r/d9486296c3b6b12ab3a0515fcd47d56447a07bfc.169989737... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index e521ab508f030..4eaa65e8d58f2 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -3536,10 +3536,10 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue) len = 0; }
+read_again: if (count >= limit) break;
-read_again: buf1_len = 0; buf2_len = 0; entry = next_entry;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dust Li dust.li@linux.alibaba.com
[ Upstream commit 6f9b1a0731662648949a1c0587f6acb3b7f8acf1 ]
When mlx5_packet_reformat_alloc() fails, the encap_header allocated in mlx5e_tc_tun_create_header_ipv4{6} will be released within it. However, e->encap_header is already set to the previously freed encap_header before mlx5_packet_reformat_alloc(). As a result, the later mlx5e_encap_put() will free e->encap_header again, causing a double free issue.
mlx5e_encap_put() --> mlx5e_encap_dealloc() --> kfree(e->encap_header)
This happens when cmd: MLX5_CMD_OP_ALLOC_PACKET_REFORMAT_CONTEXT fail.
This patch fix it by not setting e->encap_header until mlx5_packet_reformat_alloc() success.
Fixes: d589e785baf5e ("net/mlx5e: Allow concurrent creation of encap entries") Reported-by: Cruz Zhao cruzzhao@linux.alibaba.com Reported-by: Tianchen Ding dtcccc@linux.alibaba.com Signed-off-by: Dust Li dust.li@linux.alibaba.com Reviewed-by: Wojciech Drewek wojciech.drewek@intel.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c index 362f01bc8372e..5a4bee5253ec1 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c @@ -290,9 +290,6 @@ int mlx5e_tc_tun_create_header_ipv4(struct mlx5e_priv *priv, if (err) goto destroy_neigh_entry;
- e->encap_size = ipv4_encap_size; - e->encap_header = encap_header; - if (!(nud_state & NUD_VALID)) { neigh_event_send(n, NULL); /* the encap entry will be made valid on neigh update event @@ -309,6 +306,8 @@ int mlx5e_tc_tun_create_header_ipv4(struct mlx5e_priv *priv, goto destroy_neigh_entry; }
+ e->encap_size = ipv4_encap_size; + e->encap_header = encap_header; e->flags |= MLX5_ENCAP_ENTRY_VALID; mlx5e_rep_queue_neigh_stats_work(netdev_priv(out_dev)); neigh_release(n); @@ -408,9 +407,6 @@ int mlx5e_tc_tun_create_header_ipv6(struct mlx5e_priv *priv, if (err) goto destroy_neigh_entry;
- e->encap_size = ipv6_encap_size; - e->encap_header = encap_header; - if (!(nud_state & NUD_VALID)) { neigh_event_send(n, NULL); /* the encap entry will be made valid on neigh update event @@ -428,6 +424,8 @@ int mlx5e_tc_tun_create_header_ipv6(struct mlx5e_priv *priv, goto destroy_neigh_entry; }
+ e->encap_size = ipv6_encap_size; + e->encap_header = encap_header; e->flags |= MLX5_ENCAP_ENTRY_VALID; mlx5e_rep_queue_neigh_stats_work(netdev_priv(out_dev)); neigh_release(n);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Leon Romanovsky leonro@nvidia.com
[ Upstream commit 17a7612b99e66d2539341ab4f888f970c2c7f76d ]
Remove exposed driver version as it was done in other drivers, so module version will work correctly by displaying the kernel version for which it is compiled.
And move mlx5_core module name to general include, so auxiliary drivers will be able to use it as a basis for a name in their device ID tables.
Reviewed-by: Parav Pandit parav@nvidia.com Reviewed-by: Roi Dayan roid@nvidia.com Signed-off-by: Leon Romanovsky leonro@nvidia.com Stable-dep-of: 1b2bd0c0264f ("net/mlx5e: Check return value of snprintf writing to fw_version buffer for representors") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 4 +--- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 1 - .../net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/main.c | 10 ++++++---- drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h | 3 --- include/linux/mlx5/driver.h | 2 ++ 7 files changed, 11 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c index d63ce3feb65ca..6e763699d5043 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c @@ -55,7 +55,7 @@ mlx5_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req, u32 running_fw, stored_fw; int err;
- err = devlink_info_driver_name_put(req, DRIVER_NAME); + err = devlink_info_driver_name_put(req, KBUILD_MODNAME); if (err) return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index e92cc60eade3f..18e0cb02aee18 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -40,9 +40,7 @@ void mlx5e_ethtool_get_drvinfo(struct mlx5e_priv *priv, { struct mlx5_core_dev *mdev = priv->mdev;
- strlcpy(drvinfo->driver, DRIVER_NAME, sizeof(drvinfo->driver)); - strlcpy(drvinfo->version, DRIVER_VERSION, - sizeof(drvinfo->version)); + strlcpy(drvinfo->driver, KBUILD_MODNAME, sizeof(drvinfo->driver)); snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version), "%d.%d.%04d (%.16s)", fw_rev_maj(mdev), fw_rev_min(mdev), fw_rev_sub(mdev), diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index f448a139e222e..e150d9fbd2ce1 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -74,7 +74,6 @@ static void mlx5e_rep_get_drvinfo(struct net_device *dev,
strlcpy(drvinfo->driver, mlx5e_rep_driver_name, sizeof(drvinfo->driver)); - strlcpy(drvinfo->version, UTS_RELEASE, sizeof(drvinfo->version)); snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version), "%d.%d.%04d (%.16s)", fw_rev_maj(mdev), fw_rev_min(mdev), diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c index 90cb50fe17fd9..f7f8098879843 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c @@ -39,7 +39,7 @@ static void mlx5i_get_drvinfo(struct net_device *dev, struct mlx5e_priv *priv = mlx5i_epriv(dev);
mlx5e_ethtool_get_drvinfo(priv, drvinfo); - strlcpy(drvinfo->driver, DRIVER_NAME "[ib_ipoib]", + strlcpy(drvinfo->driver, KBUILD_MODNAME "[ib_ipoib]", sizeof(drvinfo->driver)); }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index ff9ac7cffc321..a183613420d27 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -75,7 +75,6 @@ MODULE_AUTHOR("Eli Cohen eli@mellanox.com"); MODULE_DESCRIPTION("Mellanox 5th generation network adapters (ConnectX series) core driver"); MODULE_LICENSE("Dual BSD/GPL"); -MODULE_VERSION(DRIVER_VERSION);
unsigned int mlx5_core_debug_mask; module_param_named(debug_mask, mlx5_core_debug_mask, uint, 0644); @@ -222,7 +221,7 @@ static void mlx5_set_driver_version(struct mlx5_core_dev *dev) strncat(string, ",", remaining_size);
remaining_size = max_t(int, 0, driver_ver_sz - strlen(string)); - strncat(string, DRIVER_NAME, remaining_size); + strncat(string, KBUILD_MODNAME, remaining_size);
remaining_size = max_t(int, 0, driver_ver_sz - strlen(string)); strncat(string, ",", remaining_size); @@ -307,7 +306,7 @@ static int request_bar(struct pci_dev *pdev) return -ENODEV; }
- err = pci_request_regions(pdev, DRIVER_NAME); + err = pci_request_regions(pdev, KBUILD_MODNAME); if (err) dev_err(&pdev->dev, "Couldn't get PCI resources, aborting\n");
@@ -1618,7 +1617,7 @@ void mlx5_recover_device(struct mlx5_core_dev *dev) }
static struct pci_driver mlx5_core_driver = { - .name = DRIVER_NAME, + .name = KBUILD_MODNAME, .id_table = mlx5_core_pci_table, .probe = init_one, .remove = remove_one, @@ -1644,6 +1643,9 @@ static int __init mlx5_init(void) { int err;
+ WARN_ONCE(strcmp(MLX5_ADEV_NAME, KBUILD_MODNAME), + "mlx5_core name not in sync with kernel module name"); + get_random_bytes(&sw_owner_id, sizeof(sw_owner_id));
mlx5_core_verify_params(); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h index b100489dc85c8..e053a17e0c7ae 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h @@ -43,9 +43,6 @@ #include <linux/mlx5/fs.h> #include <linux/mlx5/driver.h>
-#define DRIVER_NAME "mlx5_core" -#define DRIVER_VERSION "5.0-0" - extern uint mlx5_core_debug_mask;
#define mlx5_core_dbg(__dev, format, ...) \ diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 3a19b9202a12d..18fd0a030584c 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -56,6 +56,8 @@ #include <linux/ptp_clock_kernel.h> #include <net/devlink.h>
+#define MLX5_ADEV_NAME "mlx5_core" + enum { MLX5_BOARD_ID_LEN = 64, };
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rahul Rameshbabu rrameshbabu@nvidia.com
[ Upstream commit 1b2bd0c0264febcd8d47209079a6671c38e6558b ]
Treat the operation as an error case when the return value is equivalent to the size of the name buffer. Failed to write null terminator to the name buffer, making the string malformed and should not be used. Provide a string with only the firmware version when forming the string with the board id fails. This logic for representors is identical to normal flow with ethtool.
Without check, will trigger -Wformat-truncation with W=1.
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c: In function 'mlx5e_rep_get_drvinfo': drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:78:31: warning: '%.16s' directive output may be truncated writing up to 16 bytes into a region of size between 13 and 22 [-Wformat-truncation=] 78 | "%d.%d.%04d (%.16s)", | ^~~~~ drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:77:9: note: 'snprintf' output between 12 and 37 bytes into a destination of size 32 77 | snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version), | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 78 | "%d.%d.%04d (%.16s)", | ~~~~~~~~~~~~~~~~~~~~~ 79 | fw_rev_maj(mdev), fw_rev_min(mdev), | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 80 | fw_rev_sub(mdev), mdev->board_id); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fixes: cf83c8fdcd47 ("net/mlx5e: Add missing ethtool driver info for representors") Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... Signed-off-by: Rahul Rameshbabu rrameshbabu@nvidia.com Reviewed-by: Dragos Tatulea dtatulea@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Link: https://lore.kernel.org/r/20231114215846.5902-16-saeed@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index e150d9fbd2ce1..ed37cc7c9ae00 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -71,13 +71,17 @@ static void mlx5e_rep_get_drvinfo(struct net_device *dev, { struct mlx5e_priv *priv = netdev_priv(dev); struct mlx5_core_dev *mdev = priv->mdev; + int count;
strlcpy(drvinfo->driver, mlx5e_rep_driver_name, sizeof(drvinfo->driver)); - snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version), - "%d.%d.%04d (%.16s)", - fw_rev_maj(mdev), fw_rev_min(mdev), - fw_rev_sub(mdev), mdev->board_id); + count = snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version), + "%d.%d.%04d (%.16s)", fw_rev_maj(mdev), + fw_rev_min(mdev), fw_rev_sub(mdev), mdev->board_id); + if (count == sizeof(drvinfo->fw_version)) + snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version), + "%d.%d.%04d", fw_rev_maj(mdev), + fw_rev_min(mdev), fw_rev_sub(mdev)); }
static void mlx5e_uplink_rep_get_drvinfo(struct net_device *dev,
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vlad Buslov vladbu@nvidia.com
[ Upstream commit 7e1caeace0418381f36b3aa8403dfd82fc57fc53 ]
Macvlan device in passthru mode sets its lower device promiscuous mode according to its MACVLAN_FLAG_NOPROMISC flag instead of synchronizing it to its own promiscuity setting. However, macvlan_change_rx_flags() function doesn't check the mode before propagating such changes to the lower device which can cause net_device->promiscuity counter overflow as illustrated by reproduction example [0] and resulting dmesg log [1]. Fix the issue by first verifying the mode in macvlan_change_rx_flags() function before propagating promiscuous mode change to the lower device.
[0]: ip link add macvlan1 link enp8s0f0 type macvlan mode passthru ip link set macvlan1 promisc on ip l set dev macvlan1 up ip link set macvlan1 promisc off ip l set dev macvlan1 down ip l set dev macvlan1 up
[1]: [ 5156.281724] macvlan1: entered promiscuous mode [ 5156.285467] mlx5_core 0000:08:00.0 enp8s0f0: entered promiscuous mode [ 5156.287639] macvlan1: left promiscuous mode [ 5156.288339] mlx5_core 0000:08:00.0 enp8s0f0: left promiscuous mode [ 5156.290907] mlx5_core 0000:08:00.0 enp8s0f0: entered promiscuous mode [ 5156.317197] mlx5_core 0000:08:00.0 enp8s0f0: promiscuity touches roof, set promiscuity failed. promiscuity feature of device might be broken.
Fixes: efdbd2b30caa ("macvlan: Propagate promiscuity setting to lower devices.") Reviewed-by: Gal Pressman gal@nvidia.com Signed-off-by: Vlad Buslov vladbu@nvidia.com Reviewed-by: Jiri Pirko jiri@nvidia.com Link: https://lore.kernel.org/r/20231114175915.1649154-1-vladbu@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/macvlan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index 545d181453504..46398b06676c0 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -765,7 +765,7 @@ static void macvlan_change_rx_flags(struct net_device *dev, int change) if (dev->flags & IFF_UP) { if (change & IFF_ALLMULTI) dev_set_allmulti(lowerdev, dev->flags & IFF_ALLMULTI ? 1 : -1); - if (change & IFF_PROMISC) + if (!macvlan_passthru(vlan->port) && change & IFF_PROMISC) dev_set_promiscuity(lowerdev, dev->flags & IFF_PROMISC ? 1 : -1);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Rui rui.zhang@intel.com
[ Upstream commit 137f01b3529d292a68d22e9681e2f903c768f790 ]
MSR_KNL_CORE_C6_RESIDENCY should be evaluated only if 1. this is KNL platform AND 2. need to get C6 residency or need to calculate C1 residency
Fix the broken logic introduced by commit 1e9042b9c8d4 ("tools/power turbostat: Fix CPU%C1 display value").
Fixes: 1e9042b9c8d4 ("tools/power turbostat: Fix CPU%C1 display value") Signed-off-by: Zhang Rui rui.zhang@intel.com Reviewed-by: Len Brown len.brown@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/power/x86/turbostat/turbostat.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c index 8bf6b01b35608..d4235d1ab912c 100644 --- a/tools/power/x86/turbostat/turbostat.c +++ b/tools/power/x86/turbostat/turbostat.c @@ -1881,7 +1881,7 @@ int get_counters(struct thread_data *t, struct core_data *c, struct pkg_data *p) if ((DO_BIC(BIC_CPU_c6) || soft_c1_residency_display(BIC_CPU_c6)) && !do_knl_cstates) { if (get_msr(cpu, MSR_CORE_C6_RESIDENCY, &c->c6)) return -7; - } else if (do_knl_cstates || soft_c1_residency_display(BIC_CPU_c6)) { + } else if (do_knl_cstates && soft_c1_residency_display(BIC_CPU_c6)) { if (get_msr(cpu, MSR_KNL_CORE_C6_RESIDENCY, &c->c6)) return -7; }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anastasia Belova abelova@astralinux.ru
[ Upstream commit ff31ba19d732efb9aca3633935d71085e68d5076 ]
"host=" should start with ';' (as in cifs_get_spnego_key) So its length should be 6.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Reviewed-by: Paulo Alcantara (SUSE) pc@manguebit.com Fixes: 7c9c3760b3a5 ("[CIFS] add constants for string lengths of keynames in SPNEGO upcall string") Signed-off-by: Anastasia Belova abelova@astralinux.ru Co-developed-by: Ekaterina Esina eesina@astralinux.ru Signed-off-by: Ekaterina Esina eesina@astralinux.ru Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cifs/cifs_spnego.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/cifs/cifs_spnego.c b/fs/cifs/cifs_spnego.c index 7f01c6e607918..6eb65988321fc 100644 --- a/fs/cifs/cifs_spnego.c +++ b/fs/cifs/cifs_spnego.c @@ -76,8 +76,8 @@ struct key_type cifs_spnego_key_type = { * strlen(";sec=ntlmsspi") */ #define MAX_MECH_STR_LEN 13
-/* strlen of "host=" */ -#define HOST_KEY_LEN 5 +/* strlen of ";host=" */ +#define HOST_KEY_LEN 6
/* strlen of ";ip4=" or ";ip6=" */ #define IP_KEY_LEN 5
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vikash Garodia quic_vgarodia@quicinc.com
commit 5e538fce33589da6d7cb2de1445b84d3a8a692f7 upstream.
Read and write pointers are used to track the packet index in the memory shared between video driver and firmware. There is a possibility of OOB access if the read or write pointer goes beyond the queue memory size. Add checks for the read and write pointer to avoid OOB access.
Cc: stable@vger.kernel.org Fixes: d96d3f30c0f2 ("[media] media: venus: hfi: add Venus HFI files") Signed-off-by: Vikash Garodia quic_vgarodia@quicinc.com Signed-off-by: Stanimir Varbanov stanimir.k.varbanov@gmail.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/platform/qcom/venus/hfi_venus.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/drivers/media/platform/qcom/venus/hfi_venus.c +++ b/drivers/media/platform/qcom/venus/hfi_venus.c @@ -206,6 +206,11 @@ static int venus_write_queue(struct venu
new_wr_idx = wr_idx + dwords; wr_ptr = (u32 *)(queue->qmem.kva + (wr_idx << 2)); + + if (wr_ptr < (u32 *)queue->qmem.kva || + wr_ptr > (u32 *)(queue->qmem.kva + queue->qmem.size - sizeof(*wr_ptr))) + return -EINVAL; + if (new_wr_idx < qsize) { memcpy(wr_ptr, packet, dwords << 2); } else { @@ -273,6 +278,11 @@ static int venus_read_queue(struct venus }
rd_ptr = (u32 *)(queue->qmem.kva + (rd_idx << 2)); + + if (rd_ptr < (u32 *)queue->qmem.kva || + rd_ptr > (u32 *)(queue->qmem.kva + queue->qmem.size - sizeof(*rd_ptr))) + return -EINVAL; + dwords = *rd_ptr >> 2; if (!dwords) return -EINVAL;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
commit 381fdb73d1e2a48244de7260550e453d1003bb8e upstream.
The performance mode of the gcc-plugin randstruct was shuffling struct members outside of the cache-line groups. Limit the range to the specified group indexes.
Cc: linux-hardening@vger.kernel.org Cc: stable@vger.kernel.org Reported-by: Lukas Loidolt e1634039@student.tuwien.ac.at Closes: https://lore.kernel.org/all/f3ca77f0-e414-4065-83a5-ae4c4d25545d@student.tuw... Fixes: 313dd1b62921 ("gcc-plugins: Add the randstruct plugin") Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- scripts/gcc-plugins/randomize_layout_plugin.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
--- a/scripts/gcc-plugins/randomize_layout_plugin.c +++ b/scripts/gcc-plugins/randomize_layout_plugin.c @@ -209,12 +209,14 @@ static void partition_struct(tree *field
static void performance_shuffle(tree *newtree, unsigned long length, ranctx *prng_state) { - unsigned long i, x; + unsigned long i, x, index; struct partition_group size_group[length]; unsigned long num_groups = 0; unsigned long randnum;
partition_struct(newtree, length, (struct partition_group *)&size_group, &num_groups); + + /* FIXME: this group shuffle is currently a no-op. */ for (i = num_groups - 1; i > 0; i--) { struct partition_group tmp; randnum = ranval(prng_state) % (i + 1); @@ -224,11 +226,14 @@ static void performance_shuffle(tree *ne }
for (x = 0; x < num_groups; x++) { - for (i = size_group[x].start + size_group[x].length - 1; i > size_group[x].start; i--) { + for (index = size_group[x].length - 1; index > 0; index--) { tree tmp; + + i = size_group[x].start + index; if (DECL_BIT_FIELD_TYPE(newtree[i])) continue; - randnum = ranval(prng_state) % (i + 1); + randnum = ranval(prng_state) % (index + 1); + randnum += size_group[x].start; // we could handle this case differently if desired if (DECL_BIT_FIELD_TYPE(newtree[randnum])) continue;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shung-Hsi Yu shung-hsi.yu@suse.com
commit 291d044fd51f8484066300ee42afecf8c8db7b3a upstream.
BPF_END and BPF_NEG has a different specification for the source bit in the opcode compared to other ALU/ALU64 instructions, and is either reserved or use to specify the byte swap endianness. In both cases the source bit does not encode source operand location, and src_reg is a reserved field.
backtrack_insn() currently does not differentiate BPF_END and BPF_NEG from other ALU/ALU64 instructions, which leads to r0 being incorrectly marked as precise when processing BPF_ALU | BPF_TO_BE | BPF_END instructions. This commit teaches backtrack_insn() to correctly mark precision for such case.
While precise tracking of BPF_NEG and other BPF_END instructions are correct and does not need fixing, this commit opt to process all BPF_NEG and BPF_END instructions within the same if-clause to better align with current convention used in the verifier (e.g. check_alu_op).
Fixes: b5dc0163d8fd ("bpf: precise scalar_value tracking") Cc: stable@vger.kernel.org Reported-by: Mohamed Mahmoud mmahmoud@redhat.com Closes: https://lore.kernel.org/r/87jzrrwptf.fsf@toke.dk Tested-by: Toke Høiland-Jørgensen toke@redhat.com Tested-by: Tao Lyu tao.lyu@epfl.ch Acked-by: Eduard Zingerman eddyz87@gmail.com Signed-off-by: Shung-Hsi Yu shung-hsi.yu@suse.com Link: https://lore.kernel.org/r/20231102053913.12004-2-shung-hsi.yu@suse.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1469,7 +1469,12 @@ static int backtrack_insn(struct bpf_ver if (class == BPF_ALU || class == BPF_ALU64) { if (!(*reg_mask & dreg)) return 0; - if (opcode == BPF_MOV) { + if (opcode == BPF_END || opcode == BPF_NEG) { + /* sreg is reserved and unused + * dreg still need precision before this insn + */ + return 0; + } else if (opcode == BPF_MOV) { if (BPF_SRC(insn->code) == BPF_X) { /* dreg = sreg * dreg needs precision after this insn
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chandrakanth patil chandrakanth.patil@broadcom.com
commit 8e3ed9e786511ad800c33605ed904b9de49323cf upstream.
In BMC environments with concurrent access to multiple registers, certain registers occasionally yield a value of 0 even after 3 retries due to hardware errata. As a fix, we have extended the retry count from 3 to 30.
The same errata applies to the mpt3sas driver, and a similar patch has been accepted. Please find more details in the mpt3sas patch reference link.
Link: https://lore.kernel.org/r/20230829090020.5417-2-ranjan.kumar@broadcom.com Fixes: 272652fcbf1a ("scsi: megaraid_sas: add retry logic in megasas_readl") Cc: stable@vger.kernel.org Signed-off-by: Chandrakanth patil chandrakanth.patil@broadcom.com Signed-off-by: Sumit Saxena sumit.saxena@broadcom.com Link: https://lore.kernel.org/r/20231003110021.168862-2-chandrakanth.patil@broadco... Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/megaraid/megaraid_sas_base.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/scsi/megaraid/megaraid_sas_base.c +++ b/drivers/scsi/megaraid/megaraid_sas_base.c @@ -252,13 +252,13 @@ u32 megasas_readl(struct megasas_instanc * Fusion registers could intermittently return all zeroes. * This behavior is transient in nature and subsequent reads will * return valid value. As a workaround in driver, retry readl for - * upto three times until a non-zero value is read. + * up to thirty times until a non-zero value is read. */ if (instance->adapter_type == AERO_SERIES) { do { ret_val = readl(addr); i++; - } while (ret_val == 0 && i < 3); + } while (ret_val == 0 && i < 30); return ret_val; } else { return readl(addr);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pu Wen puwen@hygon.cn
commit ee545b94d39a00c93dc98b1dbcbcf731d2eadeb4 upstream.
Hygon processors with a model ID > 3 have CPUID leaf 0xB correctly populated and don't need the fixed package ID shift workaround. The fixup is also incorrect when running in a guest.
Fixes: e0ceeae708ce ("x86/CPU/hygon: Fix phys_proc_id calculation logic for multi-die processors") Signed-off-by: Pu Wen puwen@hygon.cn Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/tencent_594804A808BD93A4EBF50A994F228E3A7F07@qq.co... Link: https://lore.kernel.org/r/20230814085112.089607918@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/hygon.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/cpu/hygon.c +++ b/arch/x86/kernel/cpu/hygon.c @@ -88,8 +88,12 @@ static void hygon_get_topology(struct cp if (!err) c->x86_coreid_bits = get_count_order(c->x86_max_cores);
- /* Socket ID is ApicId[6] for these processors. */ - c->phys_proc_id = c->apicid >> APICID_SOCKET_ID_BIT; + /* + * Socket ID is ApicId[6] for the processors with model <= 0x3 + * when running on host. + */ + if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) && c->x86_model <= 0x3) + c->phys_proc_id = c->apicid >> APICID_SOCKET_ID_BIT;
cacheinfo_hygon_init_llc_id(c, cpu); } else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nicolas Saenz Julienne nsaenz@amazon.com
commit d6800af51c76b6dae20e6023bbdc9b3da3ab5121 upstream.
Don't apply the stimer's counter side effects when modifying its value from user-space, as this may trigger spurious interrupts.
For example: - The stimer is configured in auto-enable mode. - The stimer's count is set and the timer enabled. - The stimer expires, an interrupt is injected. - The VM is live migrated. - The stimer config and count are deserialized, auto-enable is ON, the stimer is re-enabled. - The stimer expires right away, and injects an unwarranted interrupt.
Cc: stable@vger.kernel.org Fixes: 1f4b34f825e8 ("kvm/x86: Hyper-V SynIC timers") Signed-off-by: Nicolas Saenz Julienne nsaenz@amazon.com Reviewed-by: Vitaly Kuznetsov vkuznets@redhat.com Link: https://lore.kernel.org/r/20231017155101.40677-1-nsaenz@amazon.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/hyperv.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/arch/x86/kvm/hyperv.c +++ b/arch/x86/kvm/hyperv.c @@ -555,10 +555,12 @@ static int stimer_set_count(struct kvm_v
stimer_cleanup(stimer); stimer->count = count; - if (stimer->count == 0) - stimer->config.enable = 0; - else if (stimer->config.auto_enable) - stimer->config.enable = 1; + if (!host) { + if (stimer->count == 0) + stimer->config.enable = 0; + else if (stimer->config.auto_enable) + stimer->config.enable = 1; + }
if (stimer->config.enable) stimer_mark_pending(stimer, false);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej S. Szmigiero maciej.szmigiero@oracle.com
commit 2770d4722036d6bd24bcb78e9cd7f6e572077d03 upstream.
Hyper-V enabled Windows Server 2022 KVM VM cannot be started on Zen1 Ryzen since it crashes at boot with SYSTEM_THREAD_EXCEPTION_NOT_HANDLED + STATUS_PRIVILEGED_INSTRUCTION (in other words, because of an unexpected #GP in the guest kernel).
This is because Windows tries to set bit 8 in MSR_AMD64_TW_CFG and can't handle receiving a #GP when doing so.
Give this MSR the same treatment that commit 2e32b7190641 ("x86, kvm: Add MSR_AMD64_BU_CFG2 to the list of ignored MSRs") gave MSR_AMD64_BU_CFG2 under justification that this MSR is baremetal-relevant only. Although apparently it was then needed for Linux guests, not Windows as in this case.
With this change, the aforementioned guest setup is able to finish booting successfully.
This issue can be reproduced either on a Summit Ridge Ryzen (with just "-cpu host") or on a Naples EPYC (with "-cpu host,stepping=1" since EPYC is ordinarily stepping 2).
Alternatively, userspace could solve the problem by using MSR filters, but forcing every userspace to define a filter isn't very friendly and doesn't add much, if any, value. The only potential hiccup is if one of these "baremetal-only" MSRs ever requires actual emulation and/or has F/M/S specific behavior. But if that happens, then KVM can still punt *that* handling to userspace since userspace MSR filters "win" over KVM's default handling.
Signed-off-by: Maciej S. Szmigiero maciej.szmigiero@oracle.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1ce85d9c7c9e9632393816cf19c902e0a3f411f1.169773140... [sean: call out MSR filtering alternative] Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/msr-index.h | 1 + arch/x86/kvm/x86.c | 2 ++ 2 files changed, 3 insertions(+)
--- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -469,6 +469,7 @@ #define MSR_AMD64_OSVW_STATUS 0xc0010141 #define MSR_AMD64_LS_CFG 0xc0011020 #define MSR_AMD64_DC_CFG 0xc0011022 +#define MSR_AMD64_TW_CFG 0xc0011023
#define MSR_AMD64_DE_CFG 0xc0011029 #define MSR_AMD64_DE_CFG_LFENCE_SERIALIZE_BIT 1 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2720,6 +2720,7 @@ int kvm_set_msr_common(struct kvm_vcpu * case MSR_AMD64_PATCH_LOADER: case MSR_AMD64_BU_CFG2: case MSR_AMD64_DC_CFG: + case MSR_AMD64_TW_CFG: case MSR_F15H_EX_CFG: break;
@@ -3029,6 +3030,7 @@ int kvm_get_msr_common(struct kvm_vcpu * case MSR_AMD64_BU_CFG2: case MSR_IA32_PERF_CTL: case MSR_AMD64_DC_CFG: + case MSR_AMD64_TW_CFG: case MSR_F15H_EX_CFG: msr_info->data = 0; break;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul Moore paul@paul-moore.com
commit 47846d51348dd62e5231a83be040981b17c955fa upstream.
The get_task_exe_file() function locks the given task with task_lock() which when used inside audit_exe_compare() can cause deadlocks on systems that generate audit records when the task_lock() is held. We resolve this problem with two changes: ignoring those cases where the task being audited is not the current task, and changing our approach to obtaining the executable file struct to not require task_lock().
With the intent of the audit exe filter being to filter on audit events generated by processes started by the specified executable, it makes sense that we would only want to use the exe filter on audit records associated with the currently executing process, e.g. @current. If we are asked to filter records using a non-@current task_struct we can safely ignore the exe filter without negatively impacting the admin's expectations for the exe filter.
Knowing that we only have to worry about filtering the currently executing task in audit_exe_compare() we can do away with the task_lock() and call get_mm_exe_file() with @current->mm directly.
Cc: stable@vger.kernel.org Fixes: 5efc244346f9 ("audit: fix exe_file access in audit_exe_compare") Reported-by: Andreas Steinmetz anstein99@googlemail.com Reviewed-by: John Johansen john.johanse@canonical.com Reviewed-by: Mateusz Guzik mjguzik@gmail.com Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/audit_watch.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/kernel/audit_watch.c +++ b/kernel/audit_watch.c @@ -542,11 +542,18 @@ int audit_exe_compare(struct task_struct unsigned long ino; dev_t dev;
- exe_file = get_task_exe_file(tsk); + /* only do exe filtering if we are recording @current events/records */ + if (tsk != current) + return 0; + + if (WARN_ON_ONCE(!current->mm)) + return 0; + exe_file = get_mm_exe_file(current->mm); if (!exe_file) return 0; ino = file_inode(exe_file)->i_ino; dev = file_inode(exe_file)->i_sb->s_dev; fput(exe_file); + return audit_mark_compare(mark, ino, dev); }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul Moore paul@paul-moore.com
commit 969d90ec212bae4b45bf9d21d7daa30aa6cf055e upstream.
eBPF can end up calling into the audit code from some odd places, and some of these places don't have @current set properly so we end up tripping the `WARN_ON_ONCE(!current->mm)` near the top of `audit_exe_compare()`. While the basic `!current->mm` check is good, the `WARN_ON_ONCE()` results in some scary console messages so let's drop that and just do the regular `!current->mm` check to avoid problems.
Cc: stable@vger.kernel.org Fixes: 47846d51348d ("audit: don't take task_lock() in audit_exe_compare() code path") Reported-by: Artem Savkov asavkov@redhat.com Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/audit_watch.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/audit_watch.c +++ b/kernel/audit_watch.c @@ -546,7 +546,7 @@ int audit_exe_compare(struct task_struct if (tsk != current) return 0;
- if (WARN_ON_ONCE(!current->mm)) + if (!current->mm) return 0; exe_file = get_mm_exe_file(current->mm); if (!exe_file)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Woodhouse dwmw@amazon.co.uk
commit 2704c9a5593f4a47620c12dad78838ca62b52f48 upstream.
The xen_hvc_init() function should always register the frontend driver, even when there's no primary console — as there may be secondary consoles. (Qemu can always add secondary consoles, but only the toolstack can add the primary because it's special.)
Signed-off-by: David Woodhouse dwmw@amazon.co.uk Reviewed-by: Juergen Gross jgross@suse.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231020161529.355083-3-dwmw2@infradead.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/hvc/hvc_xen.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/tty/hvc/hvc_xen.c +++ b/drivers/tty/hvc/hvc_xen.c @@ -587,7 +587,7 @@ static int __init xen_hvc_init(void) ops = &dom0_hvc_ops; r = xen_initial_domain_console_init(); if (r < 0) - return r; + goto register_fe; info = vtermno_to_xencons(HVC_COOKIE); } else { ops = &domU_hvc_ops; @@ -596,7 +596,7 @@ static int __init xen_hvc_init(void) else r = xen_pv_console_init(); if (r < 0) - return r; + goto register_fe;
info = vtermno_to_xencons(HVC_COOKIE); info->irq = bind_evtchn_to_irq_lateeoi(info->evtchn); @@ -621,6 +621,7 @@ static int __init xen_hvc_init(void) }
r = 0; + register_fe: #ifdef CONFIG_HVC_XEN_FRONTEND r = xenbus_register_frontend(&xencons_driver); #endif
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lukas Wunner lukas@wunner.de
commit 70b70a4307cccebe91388337b1c85735ce4de6ff upstream.
struct pci_dev contains two flags which govern whether the device may suspend to D3cold:
* no_d3cold provides an opt-out for drivers (e.g. if a device is known to not wake from D3cold)
* d3cold_allowed provides an opt-out for user space (default is true, user space may set to false)
Since commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend"), the user space setting overwrites the driver setting. Essentially user space is trusted to know better than the driver whether D3cold is working.
That feels unsafe and wrong. Assume that the change was introduced inadvertently and do not overwrite no_d3cold when d3cold_allowed is modified. Instead, consider d3cold_allowed in addition to no_d3cold when choosing a suspend state for the device.
That way, user space may opt out of D3cold if the driver hasn't, but it may no longer force an opt in if the driver has opted out.
Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend") Link: https://lore.kernel.org/r/b8a7f4af2b73f6b506ad8ddee59d747cbf834606.169502536... Signed-off-by: Lukas Wunner lukas@wunner.de Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Mika Westerberg mika.westerberg@linux.intel.com Reviewed-by: Mario Limonciello mario.limonciello@amd.com Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/pci-acpi.c | 2 +- drivers/pci/pci-sysfs.c | 5 +---- 2 files changed, 2 insertions(+), 5 deletions(-)
--- a/drivers/pci/pci-acpi.c +++ b/drivers/pci/pci-acpi.c @@ -909,7 +909,7 @@ static pci_power_t acpi_pci_choose_state { int acpi_state, d_max;
- if (pdev->no_d3cold) + if (pdev->no_d3cold || !pdev->d3cold_allowed) d_max = ACPI_STATE_D3_HOT; else d_max = ACPI_STATE_D3_COLD; --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -517,10 +517,7 @@ static ssize_t d3cold_allowed_store(stru return -EINVAL;
pdev->d3cold_allowed = !!val; - if (pdev->d3cold_allowed) - pci_d3cold_enable(pdev); - else - pci_d3cold_disable(pdev); + pci_bridge_d3_update(pdev);
pm_runtime_resume(dev);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Werner Sembach wse@tuxedocomputers.com
commit 0da9eccde3270b832c059ad618bf66e510c75d33 upstream.
The TongFang GMxXGxx/TUXEDO Stellaris/Pollaris Gen5 needs IRQ overriding for the keyboard to work.
Adding an entry for this laptop to the override_table makes the internal keyboard functional.
Signed-off-by: Werner Sembach wse@tuxedocomputers.com Cc: All applicable stable@vger.kernel.org Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/resource.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
--- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -443,6 +443,18 @@ static const struct dmi_system_id asus_l }, }, { + /* TongFang GMxXGxx/TUXEDO Polaris 15 Gen5 AMD */ + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "GMxXGxx"), + }, + }, + { + /* TongFang GM6XGxX/TUXEDO Stellaris 16 Gen5 AMD */ + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "GM6XGxX"), + }, + }, + { .ident = "Asus ExpertBook B2502", .matches = { DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rong Chen rong.chen@amlogic.com
commit 57925e16c9f7d18012bcf45bfa658f92c087981a upstream.
For the t7 and older SoC families, the CMD_CFG_ERROR has no effect. Starting from SoC family C3, setting this bit without SG LINK data address will cause the controller to generate an IRQ and stop working.
To fix it, don't set the bit CMD_CFG_ERROR anymore.
Fixes: 18f92bc02f17 ("mmc: meson-gx: make sure the descriptor is stopped on errors") Signed-off-by: Rong Chen rong.chen@amlogic.com Reviewed-by: Jerome Brunet jbrunet@baylibre.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231026073156.2868310-1-rong.chen@amlogic.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/meson-gx-mmc.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/mmc/host/meson-gx-mmc.c +++ b/drivers/mmc/host/meson-gx-mmc.c @@ -803,7 +803,6 @@ static void meson_mmc_start_cmd(struct m
cmd_cfg |= FIELD_PREP(CMD_CFG_CMD_INDEX_MASK, cmd->opcode); cmd_cfg |= CMD_CFG_OWNER; /* owned by CPU */ - cmd_cfg |= CMD_CFG_ERROR; /* stop in case of error */
meson_mmc_set_response_bits(cmd, &cmd_cfg);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Herve Codina herve.codina@bootlin.com
commit 5e7afb2eb7b2a7c81e9f608cbdf74a07606fd1b5 upstream.
irq_remove_generic_chip() calculates the Linux interrupt number for removing the handler and interrupt chip based on gc::irq_base as a linear function of the bit positions of set bits in the @msk argument.
When the generic chip is present in an irq domain, i.e. created with a call to irq_alloc_domain_generic_chips(), gc::irq_base contains not the base Linux interrupt number. It contains the base hardware interrupt for this chip. It is set to 0 for the first chip in the domain, 0 + N for the next chip, where $N is the number of hardware interrupts per chip.
That means the Linux interrupt number cannot be calculated based on gc::irq_base for irqdomain based chips without a domain map lookup, which is currently missing.
Rework the code to take the irqdomain case into account and calculate the Linux interrupt number by a irqdomain lookup of the domain specific hardware interrupt number.
[ tglx: Massage changelog. Reshuffle the logic and add a proper comment. ]
Fixes: cfefd21e693d ("genirq: Add chip suspend and resume callbacks") Signed-off-by: Herve Codina herve.codina@bootlin.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231024150335.322282-1-herve.codina@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/irq/generic-chip.c | 25 +++++++++++++++++++------ 1 file changed, 19 insertions(+), 6 deletions(-)
--- a/kernel/irq/generic-chip.c +++ b/kernel/irq/generic-chip.c @@ -537,21 +537,34 @@ EXPORT_SYMBOL_GPL(irq_setup_alt_chip); void irq_remove_generic_chip(struct irq_chip_generic *gc, u32 msk, unsigned int clr, unsigned int set) { - unsigned int i = gc->irq_base; + unsigned int i, virq;
raw_spin_lock(&gc_lock); list_del(&gc->list); raw_spin_unlock(&gc_lock);
- for (; msk; msk >>= 1, i++) { + for (i = 0; msk; msk >>= 1, i++) { if (!(msk & 0x01)) continue;
+ /* + * Interrupt domain based chips store the base hardware + * interrupt number in gc::irq_base. Otherwise gc::irq_base + * contains the base Linux interrupt number. + */ + if (gc->domain) { + virq = irq_find_mapping(gc->domain, gc->irq_base + i); + if (!virq) + continue; + } else { + virq = gc->irq_base + i; + } + /* Remove handler first. That will mask the irq line */ - irq_set_handler(i, NULL); - irq_set_chip(i, &no_irq_chip); - irq_set_chip_data(i, NULL); - irq_modify_status(i, clr, set); + irq_set_handler(virq, NULL); + irq_set_chip(virq, &no_irq_chip); + irq_set_chip_data(virq, NULL); + irq_modify_status(virq, clr, set); } } EXPORT_SYMBOL_GPL(irq_remove_generic_chip);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Uwe Kleine-König u.kleine-koenig@pengutronix.de
commit 200bddbb3f5202bbce96444fdc416305de14f547 upstream.
With CONFIG_PCIE_KEYSTONE=y and ks_pcie_remove() marked with __exit, the function is discarded from the driver. In this case a bound device can still get unbound, e.g via sysfs. Then no cleanup code is run resulting in resource leaks or worse.
The right thing to do is do always have the remove callback available. Note that this driver cannot be compiled as a module, so ks_pcie_remove() was always discarded before this change and modpost couldn't warn about this issue. Furthermore the __ref annotation also prevents a warning.
Fixes: 0c4ffcfe1fbc ("PCI: keystone: Add TI Keystone PCIe driver") Link: https://lore.kernel.org/r/20231001170254.2506508-4-u.kleine-koenig@pengutron... Signed-off-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Signed-off-by: Bjorn Helgaas bhelgaas@google.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/dwc/pci-keystone.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/pci/controller/dwc/pci-keystone.c +++ b/drivers/pci/controller/dwc/pci-keystone.c @@ -1407,7 +1407,7 @@ err_link: return ret; }
-static int __exit ks_pcie_remove(struct platform_device *pdev) +static int ks_pcie_remove(struct platform_device *pdev) { struct keystone_pcie *ks_pcie = platform_get_drvdata(pdev); struct device_link **link = ks_pcie->link; @@ -1425,7 +1425,7 @@ static int __exit ks_pcie_remove(struct
static struct platform_driver ks_pcie_driver __refdata = { .probe = ks_pcie_probe, - .remove = __exit_p(ks_pcie_remove), + .remove = ks_pcie_remove, .driver = { .name = "keystone-pcie", .of_match_table = of_match_ptr(ks_pcie_of_match),
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Uwe Kleine-König u.kleine-koenig@pengutronix.de
commit 7994db905c0fd692cf04c527585f08a91b560144 upstream.
The __init annotation makes the ks_pcie_probe() function disappear after booting completes. However a device can also be bound later. In that case, we try to call ks_pcie_probe(), but the backing memory is likely already overwritten.
The right thing to do is do always have the probe callback available. Note that the (wrong) __refdata annotation prevented this issue to be noticed by modpost.
Fixes: 0c4ffcfe1fbc ("PCI: keystone: Add TI Keystone PCIe driver") Link: https://lore.kernel.org/r/20231001170254.2506508-5-u.kleine-koenig@pengutron... Signed-off-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Signed-off-by: Bjorn Helgaas bhelgaas@google.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/dwc/pci-keystone.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/pci/controller/dwc/pci-keystone.c +++ b/drivers/pci/controller/dwc/pci-keystone.c @@ -1181,7 +1181,7 @@ static const struct of_device_id ks_pcie { }, };
-static int __init ks_pcie_probe(struct platform_device *pdev) +static int ks_pcie_probe(struct platform_device *pdev) { const struct dw_pcie_host_ops *host_ops; const struct dw_pcie_ep_ops *ep_ops; @@ -1423,7 +1423,7 @@ static int ks_pcie_remove(struct platfor return 0; }
-static struct platform_driver ks_pcie_driver __refdata = { +static struct platform_driver ks_pcie_driver = { .probe = ks_pcie_probe, .remove = ks_pcie_remove, .driver = {
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Helge Deller deller@gmx.de
commit 6240553b52c475d9fc9674de0521b77e692f3764 upstream.
PDC2.0 specifies the additional PSW-bit field.
Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/include/uapi/asm/pdc.h | 1 + 1 file changed, 1 insertion(+)
--- a/arch/parisc/include/uapi/asm/pdc.h +++ b/arch/parisc/include/uapi/asm/pdc.h @@ -465,6 +465,7 @@ struct pdc_model { /* for PDC_MODEL */ unsigned long arch_rev; unsigned long pot_key; unsigned long curr_key; + unsigned long width; /* default of PSW_W bit (1=enabled) */ };
struct pdc_cache_cf { /* for PDC_CACHE (I/D-caches) */
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Helge Deller deller@gmx.de
commit d0c219472980d15f5cbc5c8aec736848bda3f235 upstream.
Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # v6.0+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/parisc/power.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-)
--- a/drivers/parisc/power.c +++ b/drivers/parisc/power.c @@ -192,6 +192,14 @@ static struct notifier_block parisc_pani .priority = INT_MAX, };
+/* qemu soft power-off function */ +static int qemu_power_off(struct sys_off_data *data) +{ + /* this turns the system off via SeaBIOS */ + *(int *)data->cb_data = 0; + pdc_soft_power_button(1); + return NOTIFY_DONE; +}
static int __init power_init(void) { @@ -221,7 +229,13 @@ static int __init power_init(void) soft_power_reg); }
- power_task = kthread_run(kpowerswd, (void*)soft_power_reg, KTHREAD_NAME); + power_task = NULL; + if (running_on_qemu && soft_power_reg) + register_sys_off_handler(SYS_OFF_MODE_POWER_OFF, SYS_OFF_PRIO_DEFAULT, + qemu_power_off, (void *)soft_power_reg); + else + power_task = kthread_run(kpowerswd, (void*)soft_power_reg, + KTHREAD_NAME); if (IS_ERR(power_task)) { printk(KERN_ERR DRIVER_NAME ": thread creation failed. Driver not loaded.\n"); pdc_soft_power_button(0);
On 11/24/23 18:55, Greg Kroah-Hartman wrote:
5.4-stable review patch. If anyone has any objections, please let me know.
Please drop this patch from all stable kernels < 6.0. It depends on code which was added in 5.19...
Thanks, Helge
From: Helge Deller deller@gmx.de
commit d0c219472980d15f5cbc5c8aec736848bda3f235 upstream.
Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # v6.0+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
drivers/parisc/power.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-)
--- a/drivers/parisc/power.c +++ b/drivers/parisc/power.c @@ -192,6 +192,14 @@ static struct notifier_block parisc_pani .priority = INT_MAX, };
+/* qemu soft power-off function */ +static int qemu_power_off(struct sys_off_data *data) +{
- /* this turns the system off via SeaBIOS */
- *(int *)data->cb_data = 0;
- pdc_soft_power_button(1);
- return NOTIFY_DONE;
+}
static int __init power_init(void) { @@ -221,7 +229,13 @@ static int __init power_init(void) soft_power_reg); }
- power_task = kthread_run(kpowerswd, (void*)soft_power_reg, KTHREAD_NAME);
- power_task = NULL;
- if (running_on_qemu && soft_power_reg)
register_sys_off_handler(SYS_OFF_MODE_POWER_OFF, SYS_OFF_PRIO_DEFAULT,
qemu_power_off, (void *)soft_power_reg);
- else
power_task = kthread_run(kpowerswd, (void*)soft_power_reg,
if (IS_ERR(power_task)) { printk(KERN_ERR DRIVER_NAME ": thread creation failed. Driver not loaded.\n"); pdc_soft_power_button(0);KTHREAD_NAME);
On Fri, Nov 24, 2023 at 08:47:38PM +0100, Helge Deller wrote:
On 11/24/23 18:55, Greg Kroah-Hartman wrote:
5.4-stable review patch. If anyone has any objections, please let me know.
Please drop this patch from all stable kernels < 6.0. It depends on code which was added in 5.19...
Now dropped, thanks.
greg k-h
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kathiravan Thirumoorthy quic_kathirav@quicinc.com
commit e641a070137dd959932c7c222e000d9d941167a2 upstream.
GPLL, NSS crypto PLL clock rates are fixed and shouldn't be scaled based on the request from dependent clocks. Doing so will result in the unexpected behaviour. So drop the CLK_SET_RATE_PARENT flag from the PLL clocks.
Cc: stable@vger.kernel.org Fixes: b8e7e519625f ("clk: qcom: ipq8074: add remaining PLL’s") Signed-off-by: Kathiravan Thirumoorthy quic_kathirav@quicinc.com Link: https://lore.kernel.org/r/20230913-gpll_cleanup-v2-1-c8ceb1a37680@quicinc.co... Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/clk/qcom/gcc-ipq8074.c | 6 ------ 1 file changed, 6 deletions(-)
--- a/drivers/clk/qcom/gcc-ipq8074.c +++ b/drivers/clk/qcom/gcc-ipq8074.c @@ -423,7 +423,6 @@ static struct clk_fixed_factor gpll0_out }, .num_parents = 1, .ops = &clk_fixed_factor_ops, - .flags = CLK_SET_RATE_PARENT, }, };
@@ -470,7 +469,6 @@ static struct clk_alpha_pll_postdiv gpll }, .num_parents = 1, .ops = &clk_alpha_pll_postdiv_ro_ops, - .flags = CLK_SET_RATE_PARENT, }, };
@@ -503,7 +501,6 @@ static struct clk_alpha_pll_postdiv gpll }, .num_parents = 1, .ops = &clk_alpha_pll_postdiv_ro_ops, - .flags = CLK_SET_RATE_PARENT, }, };
@@ -537,7 +534,6 @@ static struct clk_alpha_pll_postdiv gpll }, .num_parents = 1, .ops = &clk_alpha_pll_postdiv_ro_ops, - .flags = CLK_SET_RATE_PARENT, }, };
@@ -551,7 +547,6 @@ static struct clk_fixed_factor gpll6_out }, .num_parents = 1, .ops = &clk_fixed_factor_ops, - .flags = CLK_SET_RATE_PARENT, }, };
@@ -616,7 +611,6 @@ static struct clk_alpha_pll_postdiv nss_ }, .num_parents = 1, .ops = &clk_alpha_pll_postdiv_ro_ops, - .flags = CLK_SET_RATE_PARENT, }, };
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
commit b44f9da81783fda72632ef9b0d05ea3f3ca447a5 upstream.
This error path should return -EINVAL instead of success.
Fixes: 88095e7b473a ("mmc: Add new VUB300 USB-to-SD/SDIO/MMC driver") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/0769d30c-ad80-421b-bf5d-7d6f5d85604e@moroto.mounta... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/vub300.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/mmc/host/vub300.c +++ b/drivers/mmc/host/vub300.c @@ -2318,6 +2318,7 @@ static int vub300_probe(struct usb_inter vub300->read_only = (0x0010 & vub300->system_port_status.port_flags) ? 1 : 0; } else { + retval = -EINVAL; goto error5; } usb_set_intfdata(interface, vub300);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Brian Geffon bgeffon@google.com
commit f0c7183008b41e92fa676406d87f18773724b48b upstream.
We found at least one situation where the safe pages list was empty and get_buffer() would gladly try to use a NULL pointer.
Signed-off-by: Brian Geffon bgeffon@google.com Fixes: 8357376d3df2 ("[PATCH] swsusp: Improve handling of highmem") Cc: All applicable stable@vger.kernel.org Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/power/snapshot.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/kernel/power/snapshot.c +++ b/kernel/power/snapshot.c @@ -2377,8 +2377,9 @@ static void *get_highmem_page_buffer(str pbe->copy_page = tmp; } else { /* Copy of the page will be stored in normal memory */ - kaddr = safe_pages_list; - safe_pages_list = safe_pages_list->next; + kaddr = __get_safe_page(ca->gfp_mask); + if (!kaddr) + return ERR_PTR(-ENOMEM); pbe->copy_page = virt_to_page(kaddr); } pbe->next = highmem_pblist; @@ -2558,8 +2559,9 @@ static void *get_buffer(struct memory_bi return ERR_PTR(-ENOMEM); } pbe->orig_address = page_address(page); - pbe->address = safe_pages_list; - safe_pages_list = safe_pages_list->next; + pbe->address = __get_safe_page(ca->gfp_mask); + if (!pbe->address) + return ERR_PTR(-ENOMEM); pbe->next = restore_pblist; restore_pblist = pbe; return pbe->address;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Brian Geffon bgeffon@google.com
commit d08970df1980476f27936e24d452550f3e9e92e1 upstream.
In snapshot_write_next(), sync_read is set and unset in three different spots unnecessiarly. As a result there is a subtle bug where the first page after the meta data has been loaded unconditionally sets sync_read to 0. If this first PFN was actually a highmem page, then the returned buffer will be the global "buffer," and the page needs to be loaded synchronously.
That is, I'm not sure we can always assume the following to be safe:
handle->buffer = get_buffer(&orig_bm, &ca); handle->sync_read = 0;
Because get_buffer() can call get_highmem_page_buffer() which can return 'buffer'.
The easiest way to address this is just set sync_read before snapshot_write_next() returns if handle->buffer == buffer.
Signed-off-by: Brian Geffon bgeffon@google.com Fixes: 8357376d3df2 ("[PATCH] swsusp: Improve handling of highmem") Cc: All applicable stable@vger.kernel.org [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/power/snapshot.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)
--- a/kernel/power/snapshot.c +++ b/kernel/power/snapshot.c @@ -2592,8 +2592,6 @@ int snapshot_write_next(struct snapshot_ if (handle->cur > 1 && handle->cur > nr_meta_pages + nr_copy_pages) return 0;
- handle->sync_read = 1; - if (!handle->cur) { if (!buffer) /* This makes the buffer be freed by swsusp_free() */ @@ -2634,7 +2632,6 @@ int snapshot_write_next(struct snapshot_ memory_bm_position_reset(&orig_bm); restore_pblist = NULL; handle->buffer = get_buffer(&orig_bm, &ca); - handle->sync_read = 0; if (IS_ERR(handle->buffer)) return PTR_ERR(handle->buffer); } @@ -2646,9 +2643,8 @@ int snapshot_write_next(struct snapshot_ handle->buffer = get_buffer(&orig_bm, &ca); if (IS_ERR(handle->buffer)) return PTR_ERR(handle->buffer); - if (handle->buffer != buffer) - handle->sync_read = 0; } + handle->sync_read = (handle->buffer == buffer); handle->cur++; return PAGE_SIZE; }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik josef@toxicpanda.com
commit 11aeb97b45ad2e0040cbb2a589bc403152526345 upstream.
We have a random schedule_timeout() if the current transaction is committing, which seems to be a holdover from the original delalloc reservation code.
Remove this, we have the proper flushing stuff, we shouldn't be hoping for random timing things to make everything work. This just induces latency for no reason.
CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Josef Bacik josef@toxicpanda.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/delalloc-space.c | 3 --- 1 file changed, 3 deletions(-)
--- a/fs/btrfs/delalloc-space.c +++ b/fs/btrfs/delalloc-space.c @@ -324,9 +324,6 @@ int btrfs_delalloc_reserve_metadata(stru } else { if (current->journal_info) flush = BTRFS_RESERVE_FLUSH_LIMIT; - - if (btrfs_transaction_in_commit(fs_info)) - schedule_timeout(1); }
if (delalloc_lock)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhihao Cheng chengzhihao1@huawei.com
commit 61187fce8600e8ef90e601be84f9d0f3222c1206 upstream.
JBD2 makes sure journal data is fallen on fs device by sync_blockdev(), however, other process could intercept the EIO information from bdev's mapping, which leads journal recovering successful even EIO occurs during data written back to fs device.
We found this problem in our product, iscsi + multipath is chosen for block device of ext4. Unstable network may trigger kpartx to rescan partitions in device mapper layer. Detailed process is shown as following:
mount kpartx irq jbd2_journal_recover do_one_pass memcpy(nbh->b_data, obh->b_data) // copy data to fs dev from journal mark_buffer_dirty // mark bh dirty vfs_read generic_file_read_iter // dio filemap_write_and_wait_range __filemap_fdatawrite_range do_writepages block_write_full_folio submit_bh_wbc >> EIO occurs in disk << end_buffer_async_write mark_buffer_write_io_error mapping_set_error set_bit(AS_EIO, &mapping->flags) // set! filemap_check_errors test_and_clear_bit(AS_EIO, &mapping->flags) // clear! err2 = sync_blockdev filemap_write_and_wait filemap_check_errors test_and_clear_bit(AS_EIO, &mapping->flags) // false err2 = 0
Filesystem is mounted successfully even data from journal is failed written into disk, and ext4/ocfs2 could become corrupted.
Fix it by comparing the wb_err state in fs block device before recovering and after recovering.
A reproducer can be found in the kernel bugzilla referenced below.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217888 Cc: stable@vger.kernel.org Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20230919012525.1783108-1-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/jbd2/recovery.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/fs/jbd2/recovery.c +++ b/fs/jbd2/recovery.c @@ -247,6 +247,8 @@ int jbd2_journal_recover(journal_t *jour journal_superblock_t * sb;
struct recovery_info info; + errseq_t wb_err; + struct address_space *mapping;
memset(&info, 0, sizeof(info)); sb = journal->j_superblock; @@ -264,6 +266,9 @@ int jbd2_journal_recover(journal_t *jour return 0; }
+ wb_err = 0; + mapping = journal->j_fs_dev->bd_inode->i_mapping; + errseq_check_and_advance(&mapping->wb_err, &wb_err); err = do_one_pass(journal, &info, PASS_SCAN); if (!err) err = do_one_pass(journal, &info, PASS_REVOKE); @@ -284,6 +289,9 @@ int jbd2_journal_recover(journal_t *jour err2 = sync_blockdev(journal->j_fs_dev); if (!err) err = err2; + err2 = errseq_check_and_advance(&mapping->wb_err, &wb_err); + if (!err) + err = err2; /* Make sure all replayed data is on permanent storage */ if (journal->j_flags & JBD2_BARRIER) { err2 = blkdev_issue_flush(journal->j_fs_dev, GFP_KERNEL, NULL);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Biggers ebiggers@google.com
commit d3cc1b0be258191d6360c82ea158c2972f8d3991 upstream.
Since commit d7e7b9af104c ("fscrypt: stop using keyrings subsystem for fscrypt_master_key"), xfstest generic/270 causes a WARNING when run on f2fs with test_dummy_encryption in the mount options:
$ kvm-xfstests -c f2fs/encrypt generic/270 [...] WARNING: CPU: 1 PID: 2453 at fs/crypto/keyring.c:240 fscrypt_destroy_keyring+0x1f5/0x260
The cause of the WARNING is that not all encrypted inodes have been evicted before fscrypt_destroy_keyring() is called, which violates an assumption. This happens because the test uses an external quota file, which gets automatically encrypted due to test_dummy_encryption.
Encryption of quota files has never really been supported. On ext4, ext4_quota_read() does not decrypt the data, so encrypted quota files are always considered invalid on ext4. On f2fs, f2fs_quota_read() uses the pagecache, so trying to use an encrypted quota file gets farther, resulting in the issue described above being possible. But this was never intended to be possible, and there is no use case for it.
Therefore, make the quota support layer explicitly reject using IS_ENCRYPTED inodes when quotaon is attempted.
Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Jan Kara jack@suse.cz Message-Id: 20230905003227.326998-1-ebiggers@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/quota/dquot.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
--- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -2388,6 +2388,20 @@ static int vfs_setup_quota_inode(struct if (sb_has_quota_loaded(sb, type)) return -EBUSY;
+ /* + * Quota files should never be encrypted. They should be thought of as + * filesystem metadata, not user data. New-style internal quota files + * cannot be encrypted by users anyway, but old-style external quota + * files could potentially be incorrectly created in an encrypted + * directory, hence this explicit check. Some reasons why encrypted + * quota files don't work include: (1) some filesystems that support + * encryption don't handle it in their quota_read and quota_write, and + * (2) cleaning up encrypted quota files at unmount would need special + * consideration, as quota files are cleaned up later than user files. + */ + if (IS_ENCRYPTED(inode)) + return -EINVAL; + dqopt->files[type] = igrab(inode); if (!dqopt->files[type]) return -EIO;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Bara benjamin.bara@skidata.com
commit 60466c067927abbcaff299845abd4b7069963139 upstream.
As the emergency restart does not call kernel_restart_prepare(), the system_state stays in SYSTEM_RUNNING.
Since bae1d3a05a8b, this hinders i2c_in_atomic_xfer_mode() from becoming active, and therefore might lead to avoidable warnings in the restart handlers, e.g.:
[ 12.667612] WARNING: CPU: 1 PID: 1 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x33c/0x6b0 [ 12.676926] Voluntary context switch within RCU read-side critical section! ... [ 12.742376] schedule_timeout from wait_for_completion_timeout+0x90/0x114 [ 12.749179] wait_for_completion_timeout from tegra_i2c_wait_completion+0x40/0x70 ... [ 12.994527] atomic_notifier_call_chain from machine_restart+0x34/0x58 [ 13.001050] machine_restart from panic+0x2a8/0x32c
Avoid these by setting the correct system_state.
Fixes: bae1d3a05a8b ("i2c: core: remove use of in_atomic()") Cc: stable@vger.kernel.org # v5.2+ Reviewed-by: Dmitry Osipenko dmitry.osipenko@collabora.com Tested-by: Nishanth Menon nm@ti.com Signed-off-by: Benjamin Bara benjamin.bara@skidata.com Link: https://lore.kernel.org/r/20230327-tegra-pmic-reboot-v7-1-18699d5dcd76@skida... Signed-off-by: Lee Jones lee@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/reboot.c | 1 + 1 file changed, 1 insertion(+)
--- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -64,6 +64,7 @@ EXPORT_SYMBOL_GPL(pm_power_off_prepare); void emergency_restart(void) { kmsg_dump(KMSG_DUMP_EMERG); + system_state = SYSTEM_RESTART; machine_emergency_restart(); } EXPORT_SYMBOL_GPL(emergency_restart);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Bara benjamin.bara@skidata.com
commit aa49c90894d06e18a1ee7c095edbd2f37c232d02 upstream.
Since bae1d3a05a8b, i2c transfers are non-atomic if preemption is disabled. However, non-atomic i2c transfers require preemption (e.g. in wait_for_completion() while waiting for the DMA).
panic() calls preempt_disable_notrace() before calling emergency_restart(). Therefore, if an i2c device is used for the restart, the xfer should be atomic. This avoids warnings like:
[ 12.667612] WARNING: CPU: 1 PID: 1 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x33c/0x6b0 [ 12.676926] Voluntary context switch within RCU read-side critical section! ... [ 12.742376] schedule_timeout from wait_for_completion_timeout+0x90/0x114 [ 12.749179] wait_for_completion_timeout from tegra_i2c_wait_completion+0x40/0x70 ... [ 12.994527] atomic_notifier_call_chain from machine_restart+0x34/0x58 [ 13.001050] machine_restart from panic+0x2a8/0x32c
Use !preemptible() instead, which is basically the same check as pre-v5.2.
Fixes: bae1d3a05a8b ("i2c: core: remove use of in_atomic()") Cc: stable@vger.kernel.org # v5.2+ Suggested-by: Dmitry Osipenko dmitry.osipenko@collabora.com Acked-by: Wolfram Sang wsa@kernel.org Reviewed-by: Dmitry Osipenko dmitry.osipenko@collabora.com Tested-by: Nishanth Menon nm@ti.com Signed-off-by: Benjamin Bara benjamin.bara@skidata.com Link: https://lore.kernel.org/r/20230327-tegra-pmic-reboot-v7-2-18699d5dcd76@skida... Signed-off-by: Lee Jones lee@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/i2c/i2c-core.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/i2c/i2c-core.h +++ b/drivers/i2c/i2c-core.h @@ -29,7 +29,7 @@ int i2c_dev_irq_from_resources(const str */ static inline bool i2c_in_atomic_xfer_mode(void) { - return system_state > SYSTEM_RUNNING && irqs_disabled(); + return system_state > SYSTEM_RUNNING && !preemptible(); }
static inline int __i2c_lock_bus_helper(struct i2c_adapter *adap)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sanjuán García, Jorge Jorge.SanjuanGarcia@duagon.com
commit 63ba2d07b4be72b94216d20561f43e1150b25d98 upstream.
chameleon_parse_gdd() may fail for different reasons and end up in the err tag. Make sure we at least always free the mcb_device allocated with mcb_alloc_dev().
If mcb_device_register() fails, make sure to give up the reference in the same place the device was added.
Fixes: 728ac3389296 ("mcb: mcb-parse: fix error handing in chameleon_parse_gdd()") Cc: stable stable@kernel.org Reviewed-by: Jose Javier Rodriguez Barbarin JoseJavier.Rodriguez@duagon.com Signed-off-by: Jorge Sanjuan Garcia jorge.sanjuangarcia@duagon.com Link: https://lore.kernel.org/r/20231019141434.57971-2-jorge.sanjuangarcia@duagon.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mcb/mcb-core.c | 1 + drivers/mcb/mcb-parse.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/mcb/mcb-core.c +++ b/drivers/mcb/mcb-core.c @@ -248,6 +248,7 @@ int mcb_device_register(struct mcb_bus * return 0;
out: + put_device(&dev->dev);
return ret; } --- a/drivers/mcb/mcb-parse.c +++ b/drivers/mcb/mcb-parse.c @@ -106,7 +106,7 @@ static int chameleon_parse_gdd(struct mc return 0;
err: - put_device(&mdev->dev); + mcb_free_dev(mdev);
return ret; }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alain Volmat alain.volmat@foss.st.com
commit 03f25d53b145bc2f7ccc82fc04e4482ed734f524 upstream.
In case of the prep descriptor while the channel is already running, the CCR register value stored into the channel could already have its EN bit set. This would lead to a bad transfer since, at start transfer time, enabling the channel while other registers aren't yet properly set. To avoid this, ensure to mask the CCR_EN bit when storing the ccr value into the mdma channel structure.
Fixes: a4ffb13c8946 ("dmaengine: Add STM32 MDMA driver") Signed-off-by: Alain Volmat alain.volmat@foss.st.com Signed-off-by: Amelie Delaunay amelie.delaunay@foss.st.com Cc: stable@vger.kernel.org Tested-by: Alain Volmat alain.volmat@foss.st.com Link: https://lore.kernel.org/r/20231009082450.452877-1-amelie.delaunay@foss.st.co... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/stm32-mdma.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/dma/stm32-mdma.c +++ b/drivers/dma/stm32-mdma.c @@ -510,7 +510,7 @@ static int stm32_mdma_set_xfer_param(str src_maxburst = chan->dma_config.src_maxburst; dst_maxburst = chan->dma_config.dst_maxburst;
- ccr = stm32_mdma_read(dmadev, STM32_MDMA_CCR(chan->id)); + ccr = stm32_mdma_read(dmadev, STM32_MDMA_CCR(chan->id)) & ~STM32_MDMA_CCR_EN; ctcr = stm32_mdma_read(dmadev, STM32_MDMA_CTCR(chan->id)); ctbr = stm32_mdma_read(dmadev, STM32_MDMA_CTBR(chan->id));
@@ -938,7 +938,7 @@ stm32_mdma_prep_dma_memcpy(struct dma_ch if (!desc) return NULL;
- ccr = stm32_mdma_read(dmadev, STM32_MDMA_CCR(chan->id)); + ccr = stm32_mdma_read(dmadev, STM32_MDMA_CCR(chan->id)) & ~STM32_MDMA_CCR_EN; ctcr = stm32_mdma_read(dmadev, STM32_MDMA_CTCR(chan->id)); ctbr = stm32_mdma_read(dmadev, STM32_MDMA_CTBR(chan->id)); cbndtr = stm32_mdma_read(dmadev, STM32_MDMA_CBNDTR(chan->id));
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heiko Carstens hca@linux.ibm.com
commit 16ba44826a04834d3eeeda4b731c2ea3481062b7 upstream.
If the cmma no-dat feature is available the kernel page tables are walked to identify and mark all pages which are used for address translation (all region, segment, and page tables). In a subsequent loop all other pages are marked as "no-dat" pages with the ESSA instruction.
This information is visible to the hypervisor, so that the hypervisor can optimize purging of guest TLB entries. The initial loop however does not cover the complete kernel address space. This can result in pages being marked as not being used for dynamic address translation, even though they are. In turn guest TLB entries incorrectly may not be purged.
Fix this by adjusting the end address of the kernel address range being walked.
Cc: stable@vger.kernel.org Reviewed-by: Claudio Imbrenda imbrenda@linux.ibm.com Reviewed-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/mm/page-states.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
--- a/arch/s390/mm/page-states.c +++ b/arch/s390/mm/page-states.c @@ -161,15 +161,22 @@ static void mark_kernel_p4d(pgd_t *pgd,
static void mark_kernel_pgd(void) { - unsigned long addr, next; + unsigned long addr, next, max_addr; struct page *page; pgd_t *pgd; int i;
addr = 0; + /* + * Figure out maximum virtual address accessible with the + * kernel ASCE. This is required to keep the page table walker + * from accessing non-existent entries. + */ + max_addr = (S390_lowcore.kernel_asce.val & _ASCE_TYPE_MASK) >> 2; + max_addr = 1UL << (max_addr * 11 + 31); pgd = pgd_offset_k(addr); do { - next = pgd_addr_end(addr, MODULES_END); + next = pgd_addr_end(addr, max_addr); if (pgd_none(*pgd)) continue; if (!pgd_folded(*pgd)) { @@ -178,7 +185,7 @@ static void mark_kernel_pgd(void) set_bit(PG_arch_1, &page[i].flags); } mark_kernel_p4d(pgd, addr, next); - } while (pgd++, addr = next, addr != MODULES_END); + } while (pgd++, addr = next, addr != max_addr); }
void __init cmma_init_nodat(void)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heiko Carstens hca@linux.ibm.com
commit 84bb41d5df48868055d159d9247b80927f1f70f9 upstream.
If the cmma no-dat feature is available the kernel page tables are walked to identify and mark all pages which are used for address translation (all region, segment, and page tables). In a subsequent loop all other pages are marked as "no-dat" pages with the ESSA instruction.
This information is visible to the hypervisor, so that the hypervisor can optimize purging of guest TLB entries. All pages used for swapper_pg_dir and invalid_pg_dir are incorrectly marked as no-dat, which in turn can result in incorrect guest TLB flushes.
Fix this by marking those pages correctly as being used for DAT.
Cc: stable@vger.kernel.org Reviewed-by: Claudio Imbrenda imbrenda@linux.ibm.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/mm/page-states.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/arch/s390/mm/page-states.c +++ b/arch/s390/mm/page-states.c @@ -198,6 +198,12 @@ void __init cmma_init_nodat(void) return; /* Mark pages used in kernel page tables */ mark_kernel_pgd(); + page = virt_to_page(&swapper_pg_dir); + for (i = 0; i < 4; i++) + set_bit(PG_arch_1, &page[i].flags); + page = virt_to_page(&invalid_pg_dir); + for (i = 0; i < 4; i++) + set_bit(PG_arch_1, &page[i].flags);
/* Set all kernel pages not used for page tables to stable/no-dat */ for_each_memblock(memory, reg) {
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zi Yan ziy@nvidia.com
commit 2e7cfe5cd5b6b0b98abf57a3074885979e187c1c upstream.
Patch series "Use nth_page() in place of direct struct page manipulation", v3.
On SPARSEMEM without VMEMMAP, struct page is not guaranteed to be contiguous, since each memory section's memmap might be allocated independently. hugetlb pages can go beyond a memory section size, thus direct struct page manipulation on hugetlb pages/subpages might give wrong struct page. Kernel provides nth_page() to do the manipulation properly. Use that whenever code can see hugetlb pages.
This patch (of 5):
When dealing with hugetlb pages, manipulating struct page pointers directly can get to wrong struct page, since struct page is not guaranteed to be contiguous on SPARSEMEM without VMEMMAP. Use nth_page() to handle it properly.
Without the fix, page_kasan_tag_reset() could reset wrong page tags, causing a wrong kasan result. No related bug is reported. The fix comes from code inspection.
Link: https://lkml.kernel.org/r/20230913201248.452081-1-zi.yan@sent.com Link: https://lkml.kernel.org/r/20230913201248.452081-2-zi.yan@sent.com Fixes: 2813b9c02962 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc") Signed-off-by: Zi Yan ziy@nvidia.com Reviewed-by: Muchun Song songmuchun@bytedance.com Cc: David Hildenbrand david@redhat.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Mike Kravetz mike.kravetz@oracle.com Cc: Mike Rapoport (IBM) rppt@kernel.org Cc: Thomas Bogendoerfer tsbogend@alpha.franken.de Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/cma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/cma.c +++ b/mm/cma.c @@ -481,7 +481,7 @@ struct page *cma_alloc(struct cma *cma, */ if (page) { for (i = 0; i < count; i++) - page_kasan_tag_reset(page + i); + page_kasan_tag_reset(nth_page(page, i)); }
if (ret && !no_warn) {
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Joshua Yeong joshua.yeong@starfivetech.com
commit 4bd8405257da717cd556f99e5fb68693d12c9766 upstream.
IBIR_DEPTH and CMDR_DEPTH should read from status0 instead of status1.
Cc: stable@vger.kernel.org Fixes: 603f2bee2c54 ("i3c: master: Add driver for Cadence IP") Signed-off-by: Joshua Yeong joshua.yeong@starfivetech.com Reviewed-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/r/20230913031743.11439-2-joshua.yeong@starfivetech.c... Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/i3c/master/i3c-master-cdns.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/i3c/master/i3c-master-cdns.c +++ b/drivers/i3c/master/i3c-master-cdns.c @@ -189,7 +189,7 @@ #define SLV_STATUS1_HJ_DIS BIT(18) #define SLV_STATUS1_MR_DIS BIT(17) #define SLV_STATUS1_PROT_ERR BIT(16) -#define SLV_STATUS1_DA(x) (((s) & GENMASK(15, 9)) >> 9) +#define SLV_STATUS1_DA(s) (((s) & GENMASK(15, 9)) >> 9) #define SLV_STATUS1_HAS_DA BIT(8) #define SLV_STATUS1_DDR_RX_FULL BIT(7) #define SLV_STATUS1_DDR_TX_FULL BIT(6) @@ -1580,13 +1580,13 @@ static int cdns_i3c_master_probe(struct /* Device ID0 is reserved to describe this master. */ master->maxdevs = CONF_STATUS0_DEVS_NUM(val); master->free_rr_slots = GENMASK(master->maxdevs, 1); + master->caps.ibirfifodepth = CONF_STATUS0_IBIR_DEPTH(val); + master->caps.cmdrfifodepth = CONF_STATUS0_CMDR_DEPTH(val);
val = readl(master->regs + CONF_STATUS1); master->caps.cmdfifodepth = CONF_STATUS1_CMD_DEPTH(val); master->caps.rxfifodepth = CONF_STATUS1_RX_DEPTH(val); master->caps.txfifodepth = CONF_STATUS1_TX_DEPTH(val); - master->caps.ibirfifodepth = CONF_STATUS0_IBIR_DEPTH(val); - master->caps.cmdrfifodepth = CONF_STATUS0_CMDR_DEPTH(val);
spin_lock_init(&master->ibi.lock); master->ibi.num_slots = CONF_STATUS1_IBI_HW_RES(val);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Helge Deller deller@gmx.de
commit a406b8b424fa01f244c1aab02ba186258448c36b upstream.
Bail out early with error message when trying to boot a 64-bit kernel on 32-bit machines. This fixes the previous commit to include the check for true 64-bit kernels as well.
Signed-off-by: Helge Deller deller@gmx.de Fixes: 591d2108f3abc ("parisc: Add runtime check to prevent PA2.0 kernels on PA1.x machines") Cc: stable@vger.kernel.org # v6.0+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/kernel/head.S | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/arch/parisc/kernel/head.S +++ b/arch/parisc/kernel/head.S @@ -69,9 +69,8 @@ $bss_loop: stw,ma %arg2,4(%r1) stw,ma %arg3,4(%r1)
-#if !defined(CONFIG_64BIT) && defined(CONFIG_PA20) - /* This 32-bit kernel was compiled for PA2.0 CPUs. Check current CPU - * and halt kernel if we detect a PA1.x CPU. */ +#if defined(CONFIG_PA20) + /* check for 64-bit capable CPU as required by current kernel */ ldi 32,%r10 mtctl %r10,%cr11 .level 2.0
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Helge Deller deller@gmx.de
commit 166b0110d1ee53290bd11618df6e3991c117495a upstream.
When calculating the pfn for the iitlbt/idtlbt instruction, do not drop the upper 5 address bits. This doesn't seem to have an effect on physical hardware which uses less physical address bits, but in qemu the missing bits are visible.
Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/kernel/entry.S | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/arch/parisc/kernel/entry.S +++ b/arch/parisc/kernel/entry.S @@ -511,13 +511,13 @@ * to a CPU TLB 4k PFN (4k => 12 bits to shift) */ #define PAGE_ADD_SHIFT (PAGE_SHIFT-12) #define PAGE_ADD_HUGE_SHIFT (REAL_HPAGE_SHIFT-12) + #define PFN_START_BIT (63-ASM_PFN_PTE_SHIFT+(63-58)-PAGE_ADD_SHIFT)
/* Drop prot bits and convert to page addr for iitlbt and idtlbt */ .macro convert_for_tlb_insert20 pte,tmp #ifdef CONFIG_HUGETLB_PAGE copy \pte,\tmp - extrd,u \tmp,(63-ASM_PFN_PTE_SHIFT)+(63-58)+PAGE_ADD_SHIFT,\ - 64-PAGE_SHIFT-PAGE_ADD_SHIFT,\pte + extrd,u \tmp,PFN_START_BIT,PFN_START_BIT+1,\pte
depdi _PAGE_SIZE_ENCODING_DEFAULT,63,\ (63-58)+PAGE_ADD_SHIFT,\pte @@ -525,8 +525,7 @@ depdi _HUGE_PAGE_SIZE_ENCODING_DEFAULT,63,\ (63-58)+PAGE_ADD_HUGE_SHIFT,\pte #else /* Huge pages disabled */ - extrd,u \pte,(63-ASM_PFN_PTE_SHIFT)+(63-58)+PAGE_ADD_SHIFT,\ - 64-PAGE_SHIFT-PAGE_ADD_SHIFT,\pte + extrd,u \pte,PFN_START_BIT,PFN_START_BIT+1,\pte depdi _PAGE_SIZE_ENCODING_DEFAULT,63,\ (63-58)+PAGE_ADD_SHIFT,\pte #endif
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Helge Deller deller@gmx.de
commit 6ad6e15a9c46b8f0932cd99724f26f3db4db1cdf upstream.
Firmware returns the physical address of the power switch, so need to use gsc_writel() instead of direct memory access.
Fixes: d0c219472980 ("parisc/power: Add power soft-off when running on qemu") Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # v6.0+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/parisc/power.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/parisc/power.c +++ b/drivers/parisc/power.c @@ -196,7 +196,7 @@ static struct notifier_block parisc_pani static int qemu_power_off(struct sys_off_data *data) { /* this turns the system off via SeaBIOS */ - *(int *)data->cb_data = 0; + gsc_writel(0, (unsigned long) data->cb_data); pdc_soft_power_button(1); return NOTIFY_DONE; }
On 11/24/23 18:55, Greg Kroah-Hartman wrote:
5.4-stable review patch. If anyone has any objections, please let me know.
Please drop this patch from all stable kernels < 6.0. It depends on code which was added in 5.19...
Thanks, Helge
From: Helge Deller deller@gmx.de
commit 6ad6e15a9c46b8f0932cd99724f26f3db4db1cdf upstream.
Firmware returns the physical address of the power switch, so need to use gsc_writel() instead of direct memory access.
Fixes: d0c219472980 ("parisc/power: Add power soft-off when running on qemu") Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # v6.0+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
drivers/parisc/power.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/parisc/power.c +++ b/drivers/parisc/power.c @@ -196,7 +196,7 @@ static struct notifier_block parisc_pani static int qemu_power_off(struct sys_off_data *data) { /* this turns the system off via SeaBIOS */
- *(int *)data->cb_data = 0;
- gsc_writel(0, (unsigned long) data->cb_data); pdc_soft_power_button(1); return NOTIFY_DONE; }
On Fri, Nov 24, 2023 at 08:48:17PM +0100, Helge Deller wrote:
On 11/24/23 18:55, Greg Kroah-Hartman wrote:
5.4-stable review patch. If anyone has any objections, please let me know.
Please drop this patch from all stable kernels < 6.0. It depends on code which was added in 5.19...
Now dropped, thanks.
greg k-h
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit c7a60651953359f98dbf24b43e1bf561e1573ed4 upstream.
As reported recently, ALSA core info helper may cause a deadlock at the forced device disconnection during the procfs operation.
The proc_remove() (that is called from the snd_card_disconnect() helper) has a synchronization of the pending procfs accesses via wait_for_completion(). Meanwhile, ALSA procfs helper takes the global mutex_lock(&info_mutex) at both the proc_open callback and snd_card_info_disconnect() helper. Since the proc_open can't finish due to the mutex lock, wait_for_completion() never returns, either, hence it deadlocks.
TASK#1 TASK#2 proc_reg_open() takes use_pde() snd_info_text_entry_open() snd_card_disconnect() snd_info_card_disconnect() takes mutex_lock(&info_mutex) proc_remove() wait_for_completion(unused_pde) ... waiting task#1 closes mutex_lock(&info_mutex) => DEADLOCK
This patch is a workaround for avoiding the deadlock scenario above.
The basic strategy is to move proc_remove() call outside the mutex lock. proc_remove() can work gracefully without extra locking, and it can delete the tree recursively alone. So, we call proc_remove() at snd_info_card_disconnection() at first, then delete the rest resources recursively within the info_mutex lock.
After the change, the function snd_info_disconnect() doesn't do disconnection by itself any longer, but it merely clears the procfs pointer. So rename the function to snd_info_clear_entries() for avoiding confusion.
The similar change is applied to snd_info_free_entry(), too. Since the proc_remove() is called only conditionally with the non-NULL entry->p, it's skipped after the snd_info_clear_entries() call.
Reported-by: Shinhyung Kang s47.kang@samsung.com Closes: https://lore.kernel.org/r/664457955.21699345385931.JavaMail.epsvc@epcpadp4 Reviewed-by: Jaroslav Kysela perex@perex.cz Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231109141954.4283-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/core/info.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-)
--- a/sound/core/info.c +++ b/sound/core/info.c @@ -57,7 +57,7 @@ struct snd_info_private_data { };
static int snd_info_version_init(void); -static void snd_info_disconnect(struct snd_info_entry *entry); +static void snd_info_clear_entries(struct snd_info_entry *entry);
/*
@@ -572,11 +572,16 @@ void snd_info_card_disconnect(struct snd { if (!card) return; - mutex_lock(&info_mutex); + proc_remove(card->proc_root_link); - card->proc_root_link = NULL; if (card->proc_root) - snd_info_disconnect(card->proc_root); + proc_remove(card->proc_root->p); + + mutex_lock(&info_mutex); + if (card->proc_root) + snd_info_clear_entries(card->proc_root); + card->proc_root_link = NULL; + card->proc_root = NULL; mutex_unlock(&info_mutex); }
@@ -748,15 +753,14 @@ struct snd_info_entry *snd_info_create_c } EXPORT_SYMBOL(snd_info_create_card_entry);
-static void snd_info_disconnect(struct snd_info_entry *entry) +static void snd_info_clear_entries(struct snd_info_entry *entry) { struct snd_info_entry *p;
if (!entry->p) return; list_for_each_entry(p, &entry->children, list) - snd_info_disconnect(p); - proc_remove(entry->p); + snd_info_clear_entries(p); entry->p = NULL; }
@@ -773,8 +777,9 @@ void snd_info_free_entry(struct snd_info if (!entry) return; if (entry->p) { + proc_remove(entry->p); mutex_lock(&info_mutex); - snd_info_disconnect(entry); + snd_info_clear_entries(entry); mutex_unlock(&info_mutex); }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chandradeep Dey codesigning@chandradeepdey.com
commit 713f040cd22285fcc506f40a0d259566e6758c3c upstream.
Apply the already existing quirk chain ALC294_FIXUP_ASUS_SPK to enable the internal speaker of ASUS K6500ZC.
Signed-off-by: Chandradeep Dey codesigning@chandradeepdey.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/NizcVHQ--3-9@chandradeepdey.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -8242,6 +8242,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x1043, 0x10a1, "ASUS UX391UA", ALC294_FIXUP_ASUS_SPK), SND_PCI_QUIRK(0x1043, 0x10c0, "ASUS X540SA", ALC256_FIXUP_ASUS_MIC), SND_PCI_QUIRK(0x1043, 0x10d0, "ASUS X540LA/X540LJ", ALC255_FIXUP_ASUS_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1043, 0x10d3, "ASUS K6500ZC", ALC294_FIXUP_ASUS_SPK), SND_PCI_QUIRK(0x1043, 0x115d, "Asus 1015E", ALC269_FIXUP_LIMIT_INT_MIC_BOOST), SND_PCI_QUIRK(0x1043, 0x11c0, "ASUS X556UR", ALC255_FIXUP_ASUS_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1043, 0x125e, "ASUS Q524UQK", ALC255_FIXUP_ASUS_MIC_NO_PRESENCE),
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Safonov dima@arista.com
[ Upstream commit dca3ac8d3bc9436eb5fd35b80cdcad762fbfa518 ]
The SUPPORT_SYSRQ ifdeffery is not nice as: - May create misunderstanding about sizeof(struct uart_port) between different objects - Prevents moving functions from serial_core.h - Reduces readability (well, it's ifdeffery - it's hard to follow)
In order to remove SUPPORT_SYSRQ, has_sysrq variable has been added. Initialise it in driver's probe and remove ifdeffery.
Cc: Kevin Hilman khilman@baylibre.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-amlogic@lists.infradead.org Signed-off-by: Dmitry Safonov dima@arista.com Link: https://lore.kernel.org/r/20191213000657.931618-22-dima@arista.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Stable-dep-of: 2a1d728f20ed ("tty: serial: meson: fix hard LOCKUP on crtscts mode") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/serial/meson_uart.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/tty/serial/meson_uart.c b/drivers/tty/serial/meson_uart.c index 849ce8c1ef392..4c3616cc00833 100644 --- a/drivers/tty/serial/meson_uart.c +++ b/drivers/tty/serial/meson_uart.c @@ -5,10 +5,6 @@ * Copyright (C) 2014 Carlo Caione carlo@caione.org */
-#if defined(CONFIG_SERIAL_MESON_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ) -#define SUPPORT_SYSRQ -#endif - #include <linux/clk.h> #include <linux/console.h> #include <linux/delay.h> @@ -716,6 +712,7 @@ static int meson_uart_probe(struct platform_device *pdev) port->mapsize = resource_size(res_mem); port->irq = res_irq->start; port->flags = UPF_BOOT_AUTOCONF | UPF_LOW_LATENCY; + port->has_sysrq = IS_ENABLED(CONFIG_SERIAL_MESON_CONSOLE); port->dev = &pdev->dev; port->line = pdev->id; port->type = PORT_MESON;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Colin Ian King colin.king@canonical.com
[ Upstream commit 021212f5335229ed12e3d31f9b7d30bd3bb66f7d ]
The variable id being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Since id is just being used in a for-loop inside a local scope, move the declaration of id to that scope.
Reviewed-by: Kevin Hilman khilman@baylibre.com Reviewed-by: Martin Blumenstingl martin.blumenstingl@googlemail.com Signed-off-by: Colin Ian King colin.king@canonical.com Addresses-Coverity: ("Unused value") Link: https://lore.kernel.org/r/20210426101106.9122-1-colin.king@canonical.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Stable-dep-of: 2a1d728f20ed ("tty: serial: meson: fix hard LOCKUP on crtscts mode") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/serial/meson_uart.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/serial/meson_uart.c b/drivers/tty/serial/meson_uart.c index 4c3616cc00833..6a74a31231ebf 100644 --- a/drivers/tty/serial/meson_uart.c +++ b/drivers/tty/serial/meson_uart.c @@ -664,12 +664,13 @@ static int meson_uart_probe(struct platform_device *pdev) struct resource *res_mem, *res_irq; struct uart_port *port; int ret = 0; - int id = -1;
if (pdev->dev.of_node) pdev->id = of_alias_get_id(pdev->dev.of_node, "serial");
if (pdev->id < 0) { + int id; + for (id = AML_UART_PORT_OFFSET; id < AML_UART_PORT_NUM; id++) { if (!meson_ports[id]) { pdev->id = id;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Neil Armstrong narmstrong@baylibre.com
[ Upstream commit 27d44e05d7b85d9d4cfe0a3c0663ea49752ece93 ]
Now the DT bindings has a property to get the FIFO size for a particular port, retrieve it and use to setup the FIFO interrupts threshold.
Reviewed-by: Kevin Hilman khilman@baylibre.com Signed-off-by: Neil Armstrong narmstrong@baylibre.com Link: https://lore.kernel.org/r/20210518075833.3736038-3-narmstrong@baylibre.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Stable-dep-of: 2a1d728f20ed ("tty: serial: meson: fix hard LOCKUP on crtscts mode") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/serial/meson_uart.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/serial/meson_uart.c b/drivers/tty/serial/meson_uart.c index 6a74a31231ebf..7563fd215d816 100644 --- a/drivers/tty/serial/meson_uart.c +++ b/drivers/tty/serial/meson_uart.c @@ -663,6 +663,7 @@ static int meson_uart_probe(struct platform_device *pdev) { struct resource *res_mem, *res_irq; struct uart_port *port; + u32 fifosize = 64; /* Default is 64, 128 for EE UART_0 */ int ret = 0;
if (pdev->dev.of_node) @@ -690,6 +691,8 @@ static int meson_uart_probe(struct platform_device *pdev) if (!res_irq) return -ENODEV;
+ of_property_read_u32(pdev->dev.of_node, "fifo-size", &fifosize); + if (meson_ports[pdev->id]) { dev_err(&pdev->dev, "port %d already allocated\n", pdev->id); return -EBUSY; @@ -719,7 +722,7 @@ static int meson_uart_probe(struct platform_device *pdev) port->type = PORT_MESON; port->x_char = 0; port->ops = &meson_uart_ops; - port->fifosize = 64; + port->fifosize = fifosize;
meson_ports[pdev->id] = port; platform_set_drvdata(pdev, port);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lad Prabhakar prabhakar.mahadev-lad.rj@bp.renesas.com
[ Upstream commit 5b68061983471470d4109bac776145245f06bc09 ]
platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static allocation of IRQ resources in DT core code, this causes an issue when using hierarchical interrupt domains using "interrupts" property in the node as this bypasses the hierarchical setup and messes up the irq chaining.
In preparation for removal of static setup of IRQ resource from DT core code use platform_get_irq().
Signed-off-by: Lad Prabhakar prabhakar.mahadev-lad.rj@bp.renesas.com Link: https://lore.kernel.org/r/20211224142917.6966-5-prabhakar.mahadev-lad.rj@bp.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Stable-dep-of: 2a1d728f20ed ("tty: serial: meson: fix hard LOCKUP on crtscts mode") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/serial/meson_uart.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/tty/serial/meson_uart.c b/drivers/tty/serial/meson_uart.c index 7563fd215d816..a193cbc78ebc0 100644 --- a/drivers/tty/serial/meson_uart.c +++ b/drivers/tty/serial/meson_uart.c @@ -661,10 +661,11 @@ static int meson_uart_probe_clocks(struct platform_device *pdev,
static int meson_uart_probe(struct platform_device *pdev) { - struct resource *res_mem, *res_irq; + struct resource *res_mem; struct uart_port *port; u32 fifosize = 64; /* Default is 64, 128 for EE UART_0 */ int ret = 0; + int irq;
if (pdev->dev.of_node) pdev->id = of_alias_get_id(pdev->dev.of_node, "serial"); @@ -687,9 +688,9 @@ static int meson_uart_probe(struct platform_device *pdev) if (!res_mem) return -ENODEV;
- res_irq = platform_get_resource(pdev, IORESOURCE_IRQ, 0); - if (!res_irq) - return -ENODEV; + irq = platform_get_irq(pdev, 0); + if (irq < 0) + return irq;
of_property_read_u32(pdev->dev.of_node, "fifo-size", &fifosize);
@@ -714,7 +715,7 @@ static int meson_uart_probe(struct platform_device *pdev) port->iotype = UPIO_MEM; port->mapbase = res_mem->start; port->mapsize = resource_size(res_mem); - port->irq = res_irq->start; + port->irq = irq; port->flags = UPF_BOOT_AUTOCONF | UPF_LOW_LATENCY; port->has_sysrq = IS_ENABLED(CONFIG_SERIAL_MESON_CONSOLE); port->dev = &pdev->dev;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Krasavin pkrasavin@imaqliq.com
[ Upstream commit 2a1d728f20edeee7f26dc307ed9df4e0d23947ab ]
There might be hard lockup if we set crtscts mode on port without RTS/CTS configured:
# stty -F /dev/ttyAML6 crtscts; echo 1 > /dev/ttyAML6; echo 2 > /dev/ttyAML6 [ 95.890386] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 95.890857] rcu: 3-...0: (201 ticks this GP) idle=e33c/1/0x4000000000000000 softirq=5844/5846 fqs=4984 [ 95.900212] rcu: (detected by 2, t=21016 jiffies, g=7753, q=296 ncpus=4) [ 95.906972] Task dump for CPU 3: [ 95.910178] task:bash state:R running task stack:0 pid:205 ppid:1 flags:0x00000202 [ 95.920059] Call trace: [ 95.922485] __switch_to+0xe4/0x168 [ 95.925951] 0xffffff8003477508 [ 95.974379] watchdog: Watchdog detected hard LOCKUP on cpu 3 [ 95.974424] Modules linked in: 88x2cs(O) rtc_meson_vrtc
Possible solution would be to not allow to setup crtscts on such port.
Tested on S905X3 based board.
Fixes: ff7693d079e5 ("ARM: meson: serial: add MesonX SoC on-chip uart driver") Cc: stable@vger.kernel.org Signed-off-by: Pavel Krasavin pkrasavin@imaqliq.com Reviewed-by: Neil Armstrong neil.armstrong@linaro.org Reviewed-by: Dmitry Rokosov ddrokosov@salutedevices.com
v6: stable tag added v5: https://lore.kernel.org/lkml/OF43DA36FF.2BD3BB21-ON00258A47.005A8125-00258A4... added missed Reviewed-by tags, Fixes tag added according to Dmitry and Neil notes v4: https://lore.kernel.org/lkml/OF55521400.7512350F-ON00258A47.003F7254-00258A4... More correct patch subject according to Jiri's note v3: https://lore.kernel.org/lkml/OF6CF5FFA0.CCFD0E8E-ON00258A46.00549EDF-00258A4... "From:" line added to the mail v2: https://lore.kernel.org/lkml/OF950BEF72.7F425944-ON00258A46.00488A76-00258A4... braces for single statement removed according to Dmitry's note v1: https://lore.kernel.org/lkml/OF28B2B8C9.5BC0CD28-ON00258A46.0037688F-00258A4... Link: https://lore.kernel.org/r/OF66360032.51C36182-ON00258A48.003F656B-00258A48.0...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/serial/meson_uart.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/tty/serial/meson_uart.c b/drivers/tty/serial/meson_uart.c index a193cbc78ebc0..60c0a4079e093 100644 --- a/drivers/tty/serial/meson_uart.c +++ b/drivers/tty/serial/meson_uart.c @@ -367,10 +367,14 @@ static void meson_uart_set_termios(struct uart_port *port, else val |= AML_UART_STOP_BIT_1SB;
- if (cflags & CRTSCTS) - val &= ~AML_UART_TWO_WIRE_EN; - else + if (cflags & CRTSCTS) { + if (port->flags & UPF_HARD_FLOW) + val &= ~AML_UART_TWO_WIRE_EN; + else + termios->c_cflag &= ~CRTSCTS; + } else { val |= AML_UART_TWO_WIRE_EN; + }
writel(val, port->membase + AML_UART_CONTROL);
@@ -666,6 +670,7 @@ static int meson_uart_probe(struct platform_device *pdev) u32 fifosize = 64; /* Default is 64, 128 for EE UART_0 */ int ret = 0; int irq; + bool has_rtscts;
if (pdev->dev.of_node) pdev->id = of_alias_get_id(pdev->dev.of_node, "serial"); @@ -693,6 +698,7 @@ static int meson_uart_probe(struct platform_device *pdev) return irq;
of_property_read_u32(pdev->dev.of_node, "fifo-size", &fifosize); + has_rtscts = of_property_read_bool(pdev->dev.of_node, "uart-has-rtscts");
if (meson_ports[pdev->id]) { dev_err(&pdev->dev, "port %d already allocated\n", pdev->id); @@ -717,6 +723,8 @@ static int meson_uart_probe(struct platform_device *pdev) port->mapsize = resource_size(res_mem); port->irq = irq; port->flags = UPF_BOOT_AUTOCONF | UPF_LOW_LATENCY; + if (has_rtscts) + port->flags |= UPF_HARD_FLOW; port->has_sysrq = IS_ENABLED(CONFIG_SERIAL_MESON_CONSOLE); port->dev = &pdev->dev; port->line = pdev->id;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alain Michaud alainm@chromium.org
[ Upstream commit 3e4e3f73b9f4944ebd8100dbe107f2325aa79c6d ]
This change adds a new flag to define a controller's wideband speech capability. This is required since no reliable over HCI mechanism exists to query the controller and driver's compatibility with wideband speech.
Signed-off-by: Alain Michaud alainm@chromium.org Signed-off-by: Marcel Holtmann marcel@holtmann.org Stable-dep-of: da06ff1f585e ("Bluetooth: btusb: Add 0bda:b85b for Fn-Link RTL8852BE") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btusb.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index 79f77315854f4..c42324ae8eeff 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -57,6 +57,7 @@ static struct usb_driver btusb_driver; #define BTUSB_IFNUM_2 0x80000 #define BTUSB_CW6622 0x100000 #define BTUSB_MEDIATEK 0x200000 +#define BTUSB_WIDEBAND_SPEECH 0x400000
static const struct usb_device_id btusb_table[] = { /* Generic Bluetooth USB device */ @@ -332,15 +333,21 @@ static const struct usb_device_id blacklist_table[] = { { USB_DEVICE(0x1286, 0x204e), .driver_info = BTUSB_MARVELL },
/* Intel Bluetooth devices */ - { USB_DEVICE(0x8087, 0x0025), .driver_info = BTUSB_INTEL_NEW }, - { USB_DEVICE(0x8087, 0x0026), .driver_info = BTUSB_INTEL_NEW }, - { USB_DEVICE(0x8087, 0x0029), .driver_info = BTUSB_INTEL_NEW }, + { USB_DEVICE(0x8087, 0x0025), .driver_info = BTUSB_INTEL_NEW | + BTUSB_WIDEBAND_SPEECH }, + { USB_DEVICE(0x8087, 0x0026), .driver_info = BTUSB_INTEL_NEW | + BTUSB_WIDEBAND_SPEECH }, + { USB_DEVICE(0x8087, 0x0029), .driver_info = BTUSB_INTEL_NEW | + BTUSB_WIDEBAND_SPEECH }, { USB_DEVICE(0x8087, 0x07da), .driver_info = BTUSB_CSR }, { USB_DEVICE(0x8087, 0x07dc), .driver_info = BTUSB_INTEL }, { USB_DEVICE(0x8087, 0x0a2a), .driver_info = BTUSB_INTEL }, - { USB_DEVICE(0x8087, 0x0a2b), .driver_info = BTUSB_INTEL_NEW }, - { USB_DEVICE(0x8087, 0x0aa7), .driver_info = BTUSB_INTEL }, - { USB_DEVICE(0x8087, 0x0aaa), .driver_info = BTUSB_INTEL_NEW }, + { USB_DEVICE(0x8087, 0x0a2b), .driver_info = BTUSB_INTEL_NEW | + BTUSB_WIDEBAND_SPEECH }, + { USB_DEVICE(0x8087, 0x0aa7), .driver_info = BTUSB_INTEL | + BTUSB_WIDEBAND_SPEECH }, + { USB_DEVICE(0x8087, 0x0aaa), .driver_info = BTUSB_INTEL_NEW | + BTUSB_WIDEBAND_SPEECH },
/* Other Intel Bluetooth devices */ { USB_VENDOR_AND_INTERFACE_INFO(0x8087, 0xe0, 0x01, 0x01),
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Joseph Hwang josephsih@chromium.org
[ Upstream commit 33bfd94a05abb5a63e323dd1454bc580d4bf992c ]
This patch adds the Realtek 8822CE controller to the usb_device_id table to support the wideband speech capability.
Signed-off-by: Joseph Hwang josephsih@chromium.org Reviewed-by: Alain Michaud alainm@chromium.org Signed-off-by: Marcel Holtmann marcel@holtmann.org Stable-dep-of: da06ff1f585e ("Bluetooth: btusb: Add 0bda:b85b for Fn-Link RTL8852BE") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btusb.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index c42324ae8eeff..854bf20353d3b 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -353,6 +353,10 @@ static const struct usb_device_id blacklist_table[] = { { USB_VENDOR_AND_INTERFACE_INFO(0x8087, 0xe0, 0x01, 0x01), .driver_info = BTUSB_IGNORE },
+ /* Realtek 8822CE Bluetooth devices */ + { USB_DEVICE(0x0bda, 0xb00c), .driver_info = BTUSB_REALTEK | + BTUSB_WIDEBAND_SPEECH }, + /* Realtek Bluetooth devices */ { USB_VENDOR_AND_INTERFACE_INFO(0x0bda, 0xe0, 0x01, 0x01), .driver_info = BTUSB_REALTEK },
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Artem Lukyanov dukzcry@ya.ru
[ Upstream commit 393b4916b7b5b94faf5c6a7c68df1c62d17e4f38 ]
Add the support ID(0x0cb8, 0xc559) to usb_device_id table for Realtek RTL8852BE.
The device info from /sys/kernel/debug/usb/devices as below.
T: Bus=03 Lev=01 Prnt=01 Port=02 Cnt=01 Dev#= 2 Spd=12 MxCh= 0 D: Ver= 1.00 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=0cb8 ProdID=c559 Rev= 0.00 S: Manufacturer=Realtek S: Product=Bluetooth Radio S: SerialNumber=00e04c000001 C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms
Signed-off-by: Artem Lukyanov dukzcry@ya.ru Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Stable-dep-of: da06ff1f585e ("Bluetooth: btusb: Add 0bda:b85b for Fn-Link RTL8852BE") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btusb.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index 854bf20353d3b..ae7f984bd62c7 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -357,6 +357,10 @@ static const struct usb_device_id blacklist_table[] = { { USB_DEVICE(0x0bda, 0xb00c), .driver_info = BTUSB_REALTEK | BTUSB_WIDEBAND_SPEECH },
+ /* Realtek 8852BE Bluetooth devices */ + { USB_DEVICE(0x0cb8, 0xc559), .driver_info = BTUSB_REALTEK | + BTUSB_WIDEBAND_SPEECH }, + /* Realtek Bluetooth devices */ { USB_VENDOR_AND_INTERFACE_INFO(0x0bda, 0xe0, 0x01, 0x01), .driver_info = BTUSB_REALTEK },
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Larry Finger Larry.Finger@lwfinger.net
[ Upstream commit 730a1d1a93a3e30c3723f87af97a8517334b2203 ]
This device is part of a Realtek RTW8852BE chip.
The device table entry is as follows:
T: Bus=03 Lev=01 Prnt=01 Port=12 Cnt=02 Dev#= 3 Spd=12 MxCh= 0 D: Ver= 1.00 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=0bda ProdID=887b Rev= 0.00 S: Manufacturer=Realtek S: Product=Bluetooth Radio S: SerialNumber=00e04c000001 C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms
Signed-off-by: Larry Finger Larry.Finger@lwfinger.net Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Stable-dep-of: da06ff1f585e ("Bluetooth: btusb: Add 0bda:b85b for Fn-Link RTL8852BE") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btusb.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index ae7f984bd62c7..cef6cee5a1540 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -360,6 +360,8 @@ static const struct usb_device_id blacklist_table[] = { /* Realtek 8852BE Bluetooth devices */ { USB_DEVICE(0x0cb8, 0xc559), .driver_info = BTUSB_REALTEK | BTUSB_WIDEBAND_SPEECH }, + { USB_DEVICE(0x0bda, 0x887b), .driver_info = BTUSB_REALTEK | + BTUSB_WIDEBAND_SPEECH },
/* Realtek Bluetooth devices */ { USB_VENDOR_AND_INTERFACE_INFO(0x0bda, 0xe0, 0x01, 0x01),
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Larry Finger Larry.Finger@lwfinger.net
[ Upstream commit 069f534247bb6db4f8c2c2ea8e9155abf495c37e ]
This device is part of a Realtek RTW8852BE chip. The device table is as follows:
T: Bus=03 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 2 Spd=12 MxCh= 0 D: Ver= 1.00 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=13d3 ProdID=3571 Rev= 0.00 S: Manufacturer=Realtek S: Product=Bluetooth Radio S: SerialNumber=00e04c000001 C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms
Signed-off-by: Larry Finger Larry.Finger@lwfinger.net Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Stable-dep-of: da06ff1f585e ("Bluetooth: btusb: Add 0bda:b85b for Fn-Link RTL8852BE") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btusb.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index cef6cee5a1540..04912b1080dc6 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -362,6 +362,8 @@ static const struct usb_device_id blacklist_table[] = { BTUSB_WIDEBAND_SPEECH }, { USB_DEVICE(0x0bda, 0x887b), .driver_info = BTUSB_REALTEK | BTUSB_WIDEBAND_SPEECH }, + { USB_DEVICE(0x13d3, 0x3571), .driver_info = BTUSB_REALTEK | + BTUSB_WIDEBAND_SPEECH },
/* Realtek Bluetooth devices */ { USB_VENDOR_AND_INTERFACE_INFO(0x0bda, 0xe0, 0x01, 0x01),
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masum Reza masumrezarock100@gmail.com
[ Upstream commit 02be109d3a405dbc4d53fb4b4473d7a113548088 ]
This device is used in TP-Link TX20E WiFi+Bluetooth adapter.
Relevant information in /sys/kernel/debug/usb/devices about the Bluetooth device is listed as the below.
T: Bus=01 Lev=01 Prnt=01 Port=08 Cnt=01 Dev#= 2 Spd=12 MxCh= 0 D: Ver= 1.00 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=13d3 ProdID=3570 Rev= 0.00 S: Manufacturer=Realtek S: Product=Bluetooth Radio S: SerialNumber=00e04c000001 C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms
Signed-off-by: Masum Reza masumrezarock100@gmail.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Stable-dep-of: da06ff1f585e ("Bluetooth: btusb: Add 0bda:b85b for Fn-Link RTL8852BE") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btusb.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index 04912b1080dc6..3ea4870b08b32 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -362,6 +362,8 @@ static const struct usb_device_id blacklist_table[] = { BTUSB_WIDEBAND_SPEECH }, { USB_DEVICE(0x0bda, 0x887b), .driver_info = BTUSB_REALTEK | BTUSB_WIDEBAND_SPEECH }, + { USB_DEVICE(0x13d3, 0x3570), .driver_info = BTUSB_REALTEK | + BTUSB_WIDEBAND_SPEECH }, { USB_DEVICE(0x13d3, 0x3571), .driver_info = BTUSB_REALTEK | BTUSB_WIDEBAND_SPEECH },
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guan Wentao guanwentao@uniontech.com
[ Upstream commit da06ff1f585ea784c79f80e7fab0e0c4ebb49c1c ]
Add PID/VID 0bda:b85b for Realtek RTL8852BE USB bluetooth part. The PID/VID was reported by the patch last year. [1] Some SBCs like rockpi 5B A8 module contains the device. And it`s founded in website. [2] [3]
Here is the device tables in /sys/kernel/debug/usb/devices .
T: Bus=07 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 2 Spd=12 MxCh= 0 D: Ver= 1.00 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=0bda ProdID=b85b Rev= 0.00 S: Manufacturer=Realtek S: Product=Bluetooth Radio S: SerialNumber=00e04c000001 C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms
Link: https://lore.kernel.org/all/20220420052402.19049-1-tangmeng@uniontech.com/ [1] Link: https://forum.radxa.com/t/bluetooth-on-ubuntu/13051/4 [2] Link: https://ubuntuforums.org/showthread.php?t=2489527 [3]
Cc: stable@vger.kernel.org Signed-off-by: Meng Tang tangmeng@uniontech.com Signed-off-by: Guan Wentao guanwentao@uniontech.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btusb.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index 3ea4870b08b32..dbba6a09e51e4 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -362,6 +362,8 @@ static const struct usb_device_id blacklist_table[] = { BTUSB_WIDEBAND_SPEECH }, { USB_DEVICE(0x0bda, 0x887b), .driver_info = BTUSB_REALTEK | BTUSB_WIDEBAND_SPEECH }, + { USB_DEVICE(0x0bda, 0xb85b), .driver_info = BTUSB_REALTEK | + BTUSB_WIDEBAND_SPEECH }, { USB_DEVICE(0x13d3, 0x3570), .driver_info = BTUSB_REALTEK | BTUSB_WIDEBAND_SPEECH }, { USB_DEVICE(0x13d3, 0x3571), .driver_info = BTUSB_REALTEK |
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johnathan Mantey johnathanx.mantey@intel.com
commit 9e2e7efbbbff69d8340abb56d375dd79d1f5770f upstream.
This reverts commit 3780bb29311eccb7a1c9641032a112eed237f7e3.
The cited commit introduced unwanted behavior.
The intent for the commit was to be able to detect carrier loss/gain for just the NIC connected to the BMC. The unwanted effect is a carrier loss for auxiliary paths also causes the BMC to lose carrier. The BMC never regains carrier despite the secondary NIC regaining a link.
This change, when merged, needs to be backported to stable kernels. 5.4-stable, 5.10-stable, 5.15-stable, 6.1-stable, 6.5-stable
Fixes: 3780bb29311e ("ncsi: Propagate carrier gain/loss events to the NCSI controller") CC: stable@vger.kernel.org Signed-off-by: Johnathan Mantey johnathanx.mantey@intel.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ncsi/ncsi-aen.c | 5 ----- 1 file changed, 5 deletions(-)
--- a/net/ncsi/ncsi-aen.c +++ b/net/ncsi/ncsi-aen.c @@ -89,11 +89,6 @@ static int ncsi_aen_handler_lsc(struct n if ((had_link == has_link) || chained) return 0;
- if (had_link) - netif_carrier_off(ndp->ndev.dev); - else - netif_carrier_on(ndp->ndev.dev); - if (!ndp->multi_package && !nc->package->multi_channel) { if (had_link) { ndp->flags |= NCSI_DEV_RESHUFFLE;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Sverdlin alexander.sverdlin@siemens.com
commit 5a22fbcc10f3f7d94c5d88afbbffa240a3677057 upstream.
When LAN9303 is MDIO-connected two callchains exist into mdio->bus->write():
1. switch ports 1&2 ("physical" PHYs):
virtual (switch-internal) MDIO bus (lan9303_switch_ops->phy_{read|write})-> lan9303_mdio_phy_{read|write} -> mdiobus_{read|write}_nested
2. LAN9303 virtual PHY:
virtual MDIO bus (lan9303_phy_{read|write}) -> lan9303_virt_phy_reg_{read|write} -> regmap -> lan9303_mdio_{read|write}
If the latter functions just take mutex_lock(&sw_dev->device->bus->mdio_lock) it triggers a LOCKDEP false-positive splat. It's false-positive because the first mdio_lock in the second callchain above belongs to virtual MDIO bus, the second mdio_lock belongs to physical MDIO bus.
Consequent annotation in lan9303_mdio_{read|write} as nested lock (similar to lan9303_mdio_phy_{read|write}, it's the same physical MDIO bus) prevents the following splat:
WARNING: possible circular locking dependency detected 5.15.71 #1 Not tainted ------------------------------------------------------ kworker/u4:3/609 is trying to acquire lock: ffff000011531c68 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}, at: regmap_lock_mutex but task is already holding lock: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&bus->mdio_lock){+.+.}-{3:3}: lock_acquire __mutex_lock mutex_lock_nested lan9303_mdio_read _regmap_read regmap_read lan9303_probe lan9303_mdio_probe mdio_probe really_probe __driver_probe_device driver_probe_device __device_attach_driver bus_for_each_drv __device_attach device_initial_probe bus_probe_device deferred_probe_work_func process_one_work worker_thread kthread ret_from_fork -> #0 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}: __lock_acquire lock_acquire.part.0 lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read get_phy_device mdiobus_scan __mdiobus_register dsa_register_switch lan9303_probe lan9303_mdio_probe mdio_probe really_probe __driver_probe_device driver_probe_device __device_attach_driver bus_for_each_drv __device_attach device_initial_probe bus_probe_device deferred_probe_work_func process_one_work worker_thread kthread ret_from_fork other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&bus->mdio_lock); lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock); lock(&bus->mdio_lock); lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock); *** DEADLOCK *** 5 locks held by kworker/u4:3/609: #0: ffff000002842938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work #1: ffff80000bacbd60 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work #2: ffff000007645178 (&dev->mutex){....}-{3:3}, at: __device_attach #3: ffff8000096e6e78 (dsa2_mutex){+.+.}-{3:3}, at: dsa_register_switch #4: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read stack backtrace: CPU: 1 PID: 609 Comm: kworker/u4:3 Not tainted 5.15.71 #1 Workqueue: events_unbound deferred_probe_work_func Call trace: dump_backtrace show_stack dump_stack_lvl dump_stack print_circular_bug check_noncircular __lock_acquire lock_acquire.part.0 lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read get_phy_device mdiobus_scan __mdiobus_register dsa_register_switch lan9303_probe lan9303_mdio_probe ...
Cc: stable@vger.kernel.org Fixes: dc7005831523 ("net: dsa: LAN9303: add MDIO managed mode support") Signed-off-by: Alexander Sverdlin alexander.sverdlin@siemens.com Reviewed-by: Andrew Lunn andrew@lunn.ch Link: https://lore.kernel.org/r/20231027065741.534971-1-alexander.sverdlin@siemens... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/dsa/lan9303_mdio.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/dsa/lan9303_mdio.c +++ b/drivers/net/dsa/lan9303_mdio.c @@ -32,7 +32,7 @@ static int lan9303_mdio_write(void *ctx, struct lan9303_mdio *sw_dev = (struct lan9303_mdio *)ctx;
reg <<= 2; /* reg num to offset */ - mutex_lock(&sw_dev->device->bus->mdio_lock); + mutex_lock_nested(&sw_dev->device->bus->mdio_lock, MDIO_MUTEX_NESTED); lan9303_mdio_real_write(sw_dev->device, reg, val & 0xffff); lan9303_mdio_real_write(sw_dev->device, reg + 2, (val >> 16) & 0xffff); mutex_unlock(&sw_dev->device->bus->mdio_lock); @@ -50,7 +50,7 @@ static int lan9303_mdio_read(void *ctx, struct lan9303_mdio *sw_dev = (struct lan9303_mdio *)ctx;
reg <<= 2; /* reg num to offset */ - mutex_lock(&sw_dev->device->bus->mdio_lock); + mutex_lock_nested(&sw_dev->device->bus->mdio_lock, MDIO_MUTEX_NESTED); *val = lan9303_mdio_real_read(sw_dev->device, reg); *val |= (lan9303_mdio_real_read(sw_dev->device, reg + 2) << 16); mutex_unlock(&sw_dev->device->bus->mdio_lock);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heiner Kallweit hkallweit1@gmail.com
commit f78ca48a8ba9cdec96e8839351e49eec3233b177 upstream.
Currently we set SMBHSTCNT_LAST_BYTE only after the host has started receiving the last byte. If we get e.g. preempted before setting SMBHSTCNT_LAST_BYTE, the host may be finished with receiving the byte before SMBHSTCNT_LAST_BYTE is set. Therefore change the code to set SMBHSTCNT_LAST_BYTE before writing SMBHSTSTS_BYTE_DONE for the byte before the last byte. Now the code is also consistent with what we do in i801_isr_byte_done().
Reported-by: Jean Delvare jdelvare@suse.com Closes: https://lore.kernel.org/linux-i2c/20230828152747.09444625@endymion.delvare/ Cc: stable@vger.kernel.org Acked-by: Andi Shyti andi.shyti@kernel.org Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Reviewed-by: Jean Delvare jdelvare@suse.de Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/i2c/busses/i2c-i801.c | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-)
--- a/drivers/i2c/busses/i2c-i801.c +++ b/drivers/i2c/busses/i2c-i801.c @@ -723,15 +723,11 @@ static int i801_block_transaction_byte_b return i801_check_post(priv, status); }
- for (i = 1; i <= len; i++) { - if (i == len && read_write == I2C_SMBUS_READ) - smbcmd |= SMBHSTCNT_LAST_BYTE; - outb_p(smbcmd, SMBHSTCNT(priv)); - - if (i == 1) - outb_p(inb(SMBHSTCNT(priv)) | SMBHSTCNT_START, - SMBHSTCNT(priv)); + if (len == 1 && read_write == I2C_SMBUS_READ) + smbcmd |= SMBHSTCNT_LAST_BYTE; + outb_p(smbcmd | SMBHSTCNT_START, SMBHSTCNT(priv));
+ for (i = 1; i <= len; i++) { status = i801_wait_byte_done(priv); if (status) goto exit; @@ -754,9 +750,12 @@ static int i801_block_transaction_byte_b data->block[0] = len; }
- /* Retrieve/store value in SMBBLKDAT */ - if (read_write == I2C_SMBUS_READ) + if (read_write == I2C_SMBUS_READ) { data->block[i] = inb_p(SMBBLKDAT(priv)); + if (i == len - 1) + outb_p(smbcmd | SMBHSTCNT_LAST_BYTE, SMBHSTCNT(priv)); + } + if (read_write == I2C_SMBUS_WRITE && i+1 <= len) outb_p(data->block[i+1], SMBBLKDAT(priv));
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Young sean@mess.org
commit c8a489f820179fb12251e262b50303c29de991ac upstream.
When transmitting, infrared drivers expect an odd number of samples; iow without a trailing space. No problems have been observed so far, so this is just belt and braces.
Fixes: 9b6192589be7 ("media: lirc: implement scancode sending") Cc: stable@vger.kernel.org Signed-off-by: Sean Young sean@mess.org Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/rc/lirc_dev.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/media/rc/lirc_dev.c +++ b/drivers/media/rc/lirc_dev.c @@ -292,7 +292,11 @@ static ssize_t ir_lirc_transmit_ir(struc if (ret < 0) goto out_kfree_raw;
- count = ret; + /* drop trailing space */ + if (!(ret % 2)) + count = ret - 1; + else + count = ret;
txbuf = kmalloc_array(count, sizeof(unsigned int), GFP_KERNEL); if (!txbuf) {
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Young sean@mess.org
commit 4f7efc71891462ab7606da7039f480d7c1584a13 upstream.
The Sharp protocol[1] encoding has incorrect timings for bit space.
[1] https://www.sbprojects.net/knowledge/ir/sharp.php
Fixes: d35afc5fe097 ("[media] rc: ir-sharp-decoder: Add encode capability") Cc: stable@vger.kernel.org Reported-by: Joe Ferner joe.m.ferner@gmail.com Closes: https://sourceforge.net/p/lirc/mailman/message/38604507/ Signed-off-by: Sean Young sean@mess.org Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/rc/ir-sharp-decoder.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/media/rc/ir-sharp-decoder.c +++ b/drivers/media/rc/ir-sharp-decoder.c @@ -15,7 +15,9 @@ #define SHARP_UNIT 40000 /* ns */ #define SHARP_BIT_PULSE (8 * SHARP_UNIT) /* 320us */ #define SHARP_BIT_0_PERIOD (25 * SHARP_UNIT) /* 1ms (680us space) */ -#define SHARP_BIT_1_PERIOD (50 * SHARP_UNIT) /* 2ms (1680ms space) */ +#define SHARP_BIT_1_PERIOD (50 * SHARP_UNIT) /* 2ms (1680us space) */ +#define SHARP_BIT_0_SPACE (17 * SHARP_UNIT) /* 680us space */ +#define SHARP_BIT_1_SPACE (42 * SHARP_UNIT) /* 1680us space */ #define SHARP_ECHO_SPACE (1000 * SHARP_UNIT) /* 40 ms */ #define SHARP_TRAILER_SPACE (125 * SHARP_UNIT) /* 5 ms (even longer) */
@@ -168,8 +170,8 @@ static const struct ir_raw_timings_pd ir .header_pulse = 0, .header_space = 0, .bit_pulse = SHARP_BIT_PULSE, - .bit_space[0] = SHARP_BIT_0_PERIOD, - .bit_space[1] = SHARP_BIT_1_PERIOD, + .bit_space[0] = SHARP_BIT_0_SPACE, + .bit_space[1] = SHARP_BIT_1_SPACE, .trailer_pulse = SHARP_BIT_PULSE, .trailer_space = SHARP_ECHO_SPACE, .msb_first = 1,
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vikash Garodia quic_vgarodia@quicinc.com
commit 0768a9dd809ef52440b5df7dce5a1c1c7e97abbd upstream.
Supported codec bitmask is populated from the payload from venus firmware. There is a possible case when all the bits in the codec bitmask is set. In such case, core cap for decoder is filled and MAX_CODEC_NUM is utilized. Now while filling the caps for encoder, it can lead to access the caps array beyong 32 index. Hence leading to OOB write. The fix counts the supported encoder and decoder. If the count is more than max, then it skips accessing the caps.
Cc: stable@vger.kernel.org Fixes: 1a73374a04e5 ("media: venus: hfi_parser: add common capability parser") Signed-off-by: Vikash Garodia quic_vgarodia@quicinc.com Signed-off-by: Stanimir Varbanov stanimir.k.varbanov@gmail.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/platform/qcom/venus/hfi_parser.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/media/platform/qcom/venus/hfi_parser.c +++ b/drivers/media/platform/qcom/venus/hfi_parser.c @@ -19,6 +19,9 @@ static void init_codecs(struct venus_cor struct venus_caps *caps = core->caps, *cap; unsigned long bit;
+ if (hweight_long(core->dec_codecs) + hweight_long(core->enc_codecs) > MAX_CODEC_NUM) + return; + for_each_set_bit(bit, &core->dec_codecs, MAX_CODEC_NUM) { cap = &caps[core->codecs_count++]; cap->codec = BIT(bit);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vikash Garodia quic_vgarodia@quicinc.com
commit b18e36dfd6c935da60a971310374f3dfec3c82e1 upstream.
Buffer requirement, for different buffer type, comes from video firmware. While copying these requirements, there is an OOB possibility when the payload from firmware is more than expected size. Fix the check to avoid the OOB possibility.
Cc: stable@vger.kernel.org Fixes: 09c2845e8fe4 ("[media] media: venus: hfi: add Host Firmware Interface (HFI)") Reviewed-by: Nathan Hebert nhebert@chromium.org Signed-off-by: Vikash Garodia quic_vgarodia@quicinc.com Signed-off-by: Stanimir Varbanov stanimir.k.varbanov@gmail.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/platform/qcom/venus/hfi_msgs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/media/platform/qcom/venus/hfi_msgs.c +++ b/drivers/media/platform/qcom/venus/hfi_msgs.c @@ -350,7 +350,7 @@ session_get_prop_buf_req(struct hfi_msg_ memcpy(&bufreq[idx], buf_req, sizeof(*bufreq)); idx++;
- if (idx > HFI_BUFFER_TYPE_MAX) + if (idx >= HFI_BUFFER_TYPE_MAX) return HFI_ERR_SESSION_INVALID_PARAMETER;
req_bytes -= sizeof(struct hfi_buffer_requirements);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vikash Garodia quic_vgarodia@quicinc.com
commit 8d0b89398b7ebc52103e055bf36b60b045f5258f upstream.
The hfi parser, parses the capabilities received from venus firmware and copies them to core capabilities. Consider below api, for example, fill_caps - In this api, caps in core structure gets updated with the number of capabilities received in firmware data payload. If the same api is called multiple times, there is a possibility of copying beyond the max allocated size in core caps. Similar possibilities in fill_raw_fmts and fill_profile_level functions.
Cc: stable@vger.kernel.org Fixes: 1a73374a04e5 ("media: venus: hfi_parser: add common capability parser") Signed-off-by: Vikash Garodia quic_vgarodia@quicinc.com Signed-off-by: Stanimir Varbanov stanimir.k.varbanov@gmail.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/platform/qcom/venus/hfi_parser.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
--- a/drivers/media/platform/qcom/venus/hfi_parser.c +++ b/drivers/media/platform/qcom/venus/hfi_parser.c @@ -89,6 +89,9 @@ static void fill_profile_level(struct ve { const struct hfi_profile_level *pl = data;
+ if (cap->num_pl + num >= HFI_MAX_PROFILE_COUNT) + return; + memcpy(&cap->pl[cap->num_pl], pl, num * sizeof(*pl)); cap->num_pl += num; } @@ -114,6 +117,9 @@ fill_caps(struct venus_caps *cap, const { const struct hfi_capability *caps = data;
+ if (cap->num_caps + num >= MAX_CAP_ENTRIES) + return; + memcpy(&cap->caps[cap->num_caps], caps, num * sizeof(*caps)); cap->num_caps += num; } @@ -140,6 +146,9 @@ static void fill_raw_fmts(struct venus_c { const struct raw_formats *formats = fmts;
+ if (cap->num_fmts + num_fmts >= MAX_FMT_ENTRIES) + return; + memcpy(&cap->fmts[cap->num_fmts], formats, num_fmts * sizeof(*formats)); cap->num_fmts += num_fmts; } @@ -162,6 +171,9 @@ parse_raw_formats(struct venus_core *cor rawfmts[i].buftype = fmt->buffer_type; i++;
+ if (i >= MAX_FMT_ENTRIES) + return; + if (pinfo->num_planes > MAX_PLANES) break;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mahmoud Adam mngyadam@amazon.com
commit bc1b5acb40201a0746d68a7d7cfc141899937f4f upstream.
seq_release should be called to free the allocated seq_file
Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Mahmoud Adam mngyadam@amazon.com Reviewed-by: Jeff Layton jlayton@kernel.org Fixes: 78599c42ae3c ("nfsd4: add file to display list of client's opens") Reviewed-by: NeilBrown neilb@suse.de Tested-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfs4state.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2571,7 +2571,7 @@ static int client_opens_release(struct i
/* XXX: alternatively, we could get/drop in seq start/stop */ drop_client(clp); - return 0; + return seq_release(inode, file); }
static const struct file_operations client_states_fops = {
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heiner Kallweit hkallweit1@gmail.com
commit 6a26310273c323380da21eb23fcfd50e31140913 upstream.
This reverts commit efa5f1311c4998e9e6317c52bc5ee93b3a0f36df.
I couldn't reproduce the reported issue. What I did, based on a pcap packet log provided by the reporter: - Used same chip version (RTL8168h) - Set MAC address to the one used on the reporters system - Replayed the EAPOL unicast packet that, according to the reporter, was filtered out by the mc filter. The packet was properly received.
Therefore the root cause of the reported issue seems to be somewhere else. Disabling mc filtering completely for the most common chip version is a quite big hammer. Therefore revert the change and wait for further analysis results from the reporter.
Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/realtek/r8169_main.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -4292,9 +4292,7 @@ static void rtl_set_rx_mode(struct net_d rx_mode &= ~AcceptMulticast; } else if (netdev_mc_count(dev) > MC_FILTER_LIMIT || dev->flags & IFF_ALLMULTI || - tp->mac_version == RTL_GIGA_MAC_VER_35 || - tp->mac_version == RTL_GIGA_MAC_VER_46 || - tp->mac_version == RTL_GIGA_MAC_VER_48) { + tp->mac_version == RTL_GIGA_MAC_VER_35) { /* accept all multicasts */ } else if (netdev_mc_empty(dev)) { rx_mode &= ~AcceptMulticast;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Max Kellermann max.kellermann@ionos.com
commit 484fd6c1de13b336806a967908a927cc0356e312 upstream.
The function ext4_init_acl() calls posix_acl_create() which is responsible for applying the umask. But without CONFIG_EXT4_FS_POSIX_ACL, ext4_init_acl() is an empty inline function, and nobody applies the umask.
This fixes a bug which causes the umask to be ignored with O_TMPFILE on ext4:
https://github.com/MusicPlayerDaemon/MPD/issues/558 https://bugs.gentoo.org/show_bug.cgi?id=686142#c3 https://bugzilla.kernel.org/show_bug.cgi?id=203625
Reviewed-by: "J. Bruce Fields" bfields@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Max Kellermann max.kellermann@ionos.com Link: https://lore.kernel.org/r/20230919081824.1096619-1-max.kellermann@ionos.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/acl.h | 5 +++++ 1 file changed, 5 insertions(+)
--- a/fs/ext4/acl.h +++ b/fs/ext4/acl.h @@ -67,6 +67,11 @@ extern int ext4_init_acl(handle_t *, str static inline int ext4_init_acl(handle_t *handle, struct inode *inode, struct inode *dir) { + /* usually, the umask is applied by posix_acl_create(), but if + ext4 ACL support is disabled at compile time, we need to do + it here, because posix_acl_create() will never be called */ + inode->i_mode &= ~current_umask(); + return 0; } #endif /* CONFIG_EXT4_FS_POSIX_ACL */
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kemeng Shi shikemeng@huaweicloud.com
commit 31f13421c004a420c0e9d288859c9ea9259ea0cc upstream.
Commit 0aeaa2559d6d5 ("ext4: fix corruption when online resizing a 1K bigalloc fs") found that primary superblock's offset in its group is not equal to offset of backup superblock in its group when block size is 1K and bigalloc is enabled. As group descriptor blocks are right after superblock, we can't pass block number of gdb to update_backups for the same reason.
The root casue of the issue above is that leading 1K padding block is count as data block offset for primary block while backup block has no padding block offset in its group.
Remove padding data block count to fix the issue for gdb backups.
For meta_bg case, update_backups treat blk_off as block number, do no conversion in this case.
Signed-off-by: Kemeng Shi shikemeng@huaweicloud.com Reviewed-by: Theodore Ts'o tytso@mit.edu Link: https://lore.kernel.org/r/20230826174712.4059355-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/resize.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -1565,6 +1565,8 @@ exit_journal: int gdb_num_end = ((group + flex_gd->count - 1) / EXT4_DESC_PER_BLOCK(sb)); int meta_bg = ext4_has_feature_meta_bg(sb); + sector_t padding_blocks = meta_bg ? 0 : sbi->s_sbh->b_blocknr - + ext4_group_first_block_no(sb, 0); sector_t old_gdb = 0;
update_backups(sb, ext4_group_first_block_no(sb, 0), @@ -1576,8 +1578,8 @@ exit_journal: gdb_num); if (old_gdb == gdb_bh->b_blocknr) continue; - update_backups(sb, gdb_bh->b_blocknr, gdb_bh->b_data, - gdb_bh->b_size, meta_bg); + update_backups(sb, gdb_bh->b_blocknr - padding_blocks, + gdb_bh->b_data, gdb_bh->b_size, meta_bg); old_gdb = gdb_bh->b_blocknr; } }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kemeng Shi shikemeng@huaweicloud.com
commit 48f1551592c54f7d8e2befc72a99ff4e47f7dca0 upstream.
Avoid to ignore error in "err".
Signed-off-by: Kemeng Shi shikemeng@huaweicloud.com Link: https://lore.kernel.org/r/20230826174712.4059355-4-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/resize.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -1940,9 +1940,7 @@ static int ext4_convert_meta_bg(struct s
errout: ret = ext4_journal_stop(handle); - if (!err) - err = ret; - return ret; + return err ? err : ret;
invalid_resize_inode: ext4_error(sb, "corrupted/inconsistent resize inode");
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Yi yi.zhang@huawei.com
commit 40ea98396a3659062267d1fe5f99af4f7e4f05e3 upstream.
When big allocate feature is enabled, we need to count and update reserved clusters before removing a delayed only extent_status entry. {init|count|get}_rsvd() have already done this, but the start block number of this counting isn't correct in the following case.
lblk end | | v v ------------------------- | | orig_es ------------------------- ^ ^ len1 is 0 | len2 |
If the start block of the orig_es entry founded is bigger than lblk, we passed lblk as start block to count_rsvd(), but the length is correct, finally, the range to be counted is offset. This patch fix this by passing the start blocks to 'orig_es->lblk + len1'.
Signed-off-by: Zhang Yi yi.zhang@huawei.com Cc: stable@kernel.org Link: https://lore.kernel.org/r/20230824092619.1327976-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/extents_status.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/ext4/extents_status.c +++ b/fs/ext4/extents_status.c @@ -1348,8 +1348,8 @@ retry: } } if (count_reserved) - count_rsvd(inode, lblk, orig_es.es_len - len1 - len2, - &orig_es, &rc); + count_rsvd(inode, orig_es.es_lblk + len1, + orig_es.es_len - len1 - len2, &orig_es, &rc); goto out_get_reserved; }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kemeng Shi shikemeng@huaweicloud.com
commit 40dd7953f4d606c280074f10d23046b6812708ce upstream.
Wrong check of gdb backup in meta bg as following: first_group is the first group of meta_bg which contains target group, so target group is always >= first_group. We check if target group has gdb backup by comparing first_group with [group + 1] and [group + EXT4_DESC_PER_BLOCK(sb) - 1]. As group >= first_group, then [group + N] is
first_group. So no copy of gdb backup in meta bg is done in
setup_new_flex_group_blocks.
No need to do gdb backup copy in meta bg from setup_new_flex_group_blocks as we always copy updated gdb block to backups at end of ext4_flex_group_add as following:
ext4_flex_group_add /* no gdb backup copy for meta bg any more */ setup_new_flex_group_blocks
/* update current group number */ ext4_update_super sbi->s_groups_count += flex_gd->count;
/* * if group in meta bg contains backup is added, the primary gdb block * of the meta bg will be copy to backup in new added group here. */ for (; gdb_num <= gdb_num_end; gdb_num++) update_backups(...)
In summary, we can remove wrong gdb backup copy code in setup_new_flex_group_blocks.
Signed-off-by: Kemeng Shi shikemeng@huaweicloud.com Reviewed-by: Theodore Ts'o tytso@mit.edu Link: https://lore.kernel.org/r/20230826174712.4059355-5-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/resize.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
--- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -572,13 +572,8 @@ static int setup_new_flex_group_blocks(s if (meta_bg == 0 && !ext4_bg_has_super(sb, group)) goto handle_itb;
- if (meta_bg == 1) { - ext4_group_t first_group; - first_group = ext4_meta_bg_first_group(sb, group); - if (first_group != group + 1 && - first_group != group + EXT4_DESC_PER_BLOCK(sb) - 1) - goto handle_itb; - } + if (meta_bg == 1) + goto handle_itb;
block = start + ext4_bg_has_super(sb, group); /* Copy all of the GDT blocks into the backup in this group */
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian König christian.koenig@amd.com
commit 12f76050d8d4d10dab96333656b821bd4620d103 upstream.
We should not leak the pointer where we couldn't grab the reference on to the caller because it can be that the error handling still tries to put the reference then.
Signed-off-by: Christian König christian.koenig@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c @@ -178,6 +178,7 @@ int amdgpu_bo_list_get(struct amdgpu_fpr }
rcu_read_unlock(); + *result = NULL; return -ENOENT; }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt (Google) rostedt@goodmis.org
commit bb32500fb9b78215e4ef6ee8b4345c5f5d7eafb4 upstream.
The following can crash the kernel:
# cd /sys/kernel/tracing # echo 'p:sched schedule' > kprobe_events # exec 5>>events/kprobes/sched/enable # > kprobe_events # exec 5>&-
The above commands:
1. Change directory to the tracefs directory 2. Create a kprobe event (doesn't matter what one) 3. Open bash file descriptor 5 on the enable file of the kprobe event 4. Delete the kprobe event (removes the files too) 5. Close the bash file descriptor 5
The above causes a crash!
BUG: kernel NULL pointer dereference, address: 0000000000000028 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty #186 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 RIP: 0010:tracing_release_file_tr+0xc/0x50
What happens here is that the kprobe event creates a trace_event_file "file" descriptor that represents the file in tracefs to the event. It maintains state of the event (is it enabled for the given instance?). Opening the "enable" file gets a reference to the event "file" descriptor via the open file descriptor. When the kprobe event is deleted, the file is also deleted from the tracefs system which also frees the event "file" descriptor.
But as the tracefs file is still opened by user space, it will not be totally removed until the final dput() is called on it. But this is not true with the event "file" descriptor that is already freed. If the user does a write to or simply closes the file descriptor it will reference the event "file" descriptor that was just freed, causing a use-after-free bug.
To solve this, add a ref count to the event "file" descriptor as well as a new flag called "FREED". The "file" will not be freed until the last reference is released. But the FREE flag will be set when the event is removed to prevent any more modifications to that event from happening, even if there's still a reference to the event "file" descriptor.
Link: https://lore.kernel.org/linux-trace-kernel/20231031000031.1e705592@gandalf.l... Link: https://lore.kernel.org/linux-trace-kernel/20231031122453.7a48b923@gandalf.l...
Cc: stable@vger.kernel.org Cc: Mark Rutland mark.rutland@arm.com Fixes: f5ca233e2e66d ("tracing: Increase trace array ref count on enable and filter files") Reported-by: Beau Belgrave beaub@linux.microsoft.com Tested-by: Beau Belgrave beaub@linux.microsoft.com Reviewed-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/trace_events.h | 4 +++ kernel/trace/trace.c | 15 ++++++++++++++ kernel/trace/trace.h | 3 ++ kernel/trace/trace_events.c | 39 ++++++++++++++++++++++++------------- kernel/trace/trace_events_filter.c | 3 ++ 5 files changed, 51 insertions(+), 13 deletions(-)
--- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -341,6 +341,7 @@ enum { EVENT_FILE_FL_TRIGGER_COND_BIT, EVENT_FILE_FL_PID_FILTER_BIT, EVENT_FILE_FL_WAS_ENABLED_BIT, + EVENT_FILE_FL_FREED_BIT, };
/* @@ -357,6 +358,7 @@ enum { * TRIGGER_COND - When set, one or more triggers has an associated filter * PID_FILTER - When set, the event is filtered based on pid * WAS_ENABLED - Set when enabled to know to clear trace on module removal + * FREED - File descriptor is freed, all fields should be considered invalid */ enum { EVENT_FILE_FL_ENABLED = (1 << EVENT_FILE_FL_ENABLED_BIT), @@ -370,6 +372,7 @@ enum { EVENT_FILE_FL_TRIGGER_COND = (1 << EVENT_FILE_FL_TRIGGER_COND_BIT), EVENT_FILE_FL_PID_FILTER = (1 << EVENT_FILE_FL_PID_FILTER_BIT), EVENT_FILE_FL_WAS_ENABLED = (1 << EVENT_FILE_FL_WAS_ENABLED_BIT), + EVENT_FILE_FL_FREED = (1 << EVENT_FILE_FL_FREED_BIT), };
struct trace_event_file { @@ -398,6 +401,7 @@ struct trace_event_file { * caching and such. Which is mostly OK ;-) */ unsigned long flags; + atomic_t ref; /* ref count for opened files */ atomic_t sm_ref; /* soft-mode reference counter */ atomic_t tm_ref; /* trigger-mode reference counter */ }; --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4257,6 +4257,20 @@ int tracing_open_file_tr(struct inode *i if (ret) return ret;
+ mutex_lock(&event_mutex); + + /* Fail if the file is marked for removal */ + if (file->flags & EVENT_FILE_FL_FREED) { + trace_array_put(file->tr); + ret = -ENODEV; + } else { + event_file_get(file); + } + + mutex_unlock(&event_mutex); + if (ret) + return ret; + filp->private_data = inode->i_private;
return 0; @@ -4267,6 +4281,7 @@ int tracing_release_file_tr(struct inode struct trace_event_file *file = inode->i_private;
trace_array_put(file->tr); + event_file_put(file);
return 0; } --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1696,6 +1696,9 @@ extern int register_event_command(struct extern int unregister_event_command(struct event_command *cmd); extern int register_trigger_hist_enable_disable_cmds(void);
+extern void event_file_get(struct trace_event_file *file); +extern void event_file_put(struct trace_event_file *file); + /** * struct event_trigger_ops - callbacks for trace event triggers * --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -698,21 +698,33 @@ static void remove_subsystem(struct trac } }
+void event_file_get(struct trace_event_file *file) +{ + atomic_inc(&file->ref); +} + +void event_file_put(struct trace_event_file *file) +{ + if (WARN_ON_ONCE(!atomic_read(&file->ref))) { + if (file->flags & EVENT_FILE_FL_FREED) + kmem_cache_free(file_cachep, file); + return; + } + + if (atomic_dec_and_test(&file->ref)) { + /* Count should only go to zero when it is freed */ + if (WARN_ON_ONCE(!(file->flags & EVENT_FILE_FL_FREED))) + return; + kmem_cache_free(file_cachep, file); + } +} + static void remove_event_file_dir(struct trace_event_file *file) { struct dentry *dir = file->dir; - struct dentry *child; - - if (dir) { - spin_lock(&dir->d_lock); /* probably unneeded */ - list_for_each_entry(child, &dir->d_subdirs, d_child) { - if (d_really_is_positive(child)) /* probably unneeded */ - d_inode(child)->i_private = NULL; - } - spin_unlock(&dir->d_lock);
+ if (dir) tracefs_remove_recursive(dir); - }
list_del(&file->list); remove_subsystem(file->system); @@ -1033,7 +1045,7 @@ event_enable_read(struct file *filp, cha flags = file->flags; mutex_unlock(&event_mutex);
- if (!file) + if (!file || flags & EVENT_FILE_FL_FREED) return -ENODEV;
if (flags & EVENT_FILE_FL_ENABLED && @@ -1071,7 +1083,7 @@ event_enable_write(struct file *filp, co ret = -ENODEV; mutex_lock(&event_mutex); file = event_file_data(filp); - if (likely(file)) + if (likely(file && !(file->flags & EVENT_FILE_FL_FREED))) ret = ftrace_event_enable_disable(file, val); mutex_unlock(&event_mutex); break; @@ -1340,7 +1352,7 @@ event_filter_read(struct file *filp, cha
mutex_lock(&event_mutex); file = event_file_data(filp); - if (file) + if (file && !(file->flags & EVENT_FILE_FL_FREED)) print_event_filter(file, s); mutex_unlock(&event_mutex);
@@ -2264,6 +2276,7 @@ trace_create_new_event(struct trace_even atomic_set(&file->tm_ref, 0); INIT_LIST_HEAD(&file->triggers); list_add(&file->list, &tr->events); + event_file_get(file);
return file; } --- a/kernel/trace/trace_events_filter.c +++ b/kernel/trace/trace_events_filter.c @@ -1800,6 +1800,9 @@ int apply_event_filter(struct trace_even struct event_filter *filter = NULL; int err;
+ if (file->flags & EVENT_FILE_FL_FREED) + return -ENODEV; + if (!strcmp(strstrip(filter_string), "0")) { filter_disable(file); filter = event_filter(file);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit 0c2a85edd143162b3a698f31e94bf8cdc041da87 upstream.
The patch that adds support for stateful expressions in set definitions require this.
Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -3852,7 +3852,7 @@ err1: return err; }
-static void nft_set_destroy(struct nft_set *set) +static void nft_set_destroy(const struct nft_ctx *ctx, struct nft_set *set) { if (WARN_ON(set->use > 0)) return; @@ -4024,7 +4024,7 @@ EXPORT_SYMBOL_GPL(nf_tables_deactivate_s void nf_tables_destroy_set(const struct nft_ctx *ctx, struct nft_set *set) { if (list_empty(&set->bindings) && nft_set_is_anonymous(set)) - nft_set_destroy(set); + nft_set_destroy(ctx, set); } EXPORT_SYMBOL_GPL(nf_tables_destroy_set);
@@ -6715,7 +6715,7 @@ static void nft_commit_release(struct nf nf_tables_rule_destroy(&trans->ctx, nft_trans_rule(trans)); break; case NFT_MSG_DELSET: - nft_set_destroy(nft_trans_set(trans)); + nft_set_destroy(&trans->ctx, nft_trans_set(trans)); break; case NFT_MSG_DELSETELEM: nf_tables_set_elem_destroy(&trans->ctx, @@ -7176,7 +7176,7 @@ static void nf_tables_abort_release(stru nf_tables_rule_destroy(&trans->ctx, nft_trans_rule(trans)); break; case NFT_MSG_NEWSET: - nft_set_destroy(nft_trans_set(trans)); + nft_set_destroy(&trans->ctx, nft_trans_set(trans)); break; case NFT_MSG_NEWSETELEM: nft_set_elem_destroy(nft_trans_elem_set(trans), @@ -7951,7 +7951,7 @@ static void __nft_release_table(struct n list_for_each_entry_safe(set, ns, &table->sets, list) { list_del(&set->list); nft_use_dec(&table->use); - nft_set_destroy(set); + nft_set_destroy(&ctx, set); } list_for_each_entry_safe(obj, ne, &table->objects, list) { nft_obj_del(obj);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit f8bb7889af58d8e74d2d61c76b1418230f1610fa upstream.
Rename:
- nft_set_elem_activate() to nft_set_elem_data_activate(). - nft_set_elem_deactivate() to nft_set_elem_data_deactivate().
To prepare for updates in the set element infrastructure to add support for the special catch-all element.
Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -4602,8 +4602,8 @@ void nft_set_elem_destroy(const struct n } EXPORT_SYMBOL_GPL(nft_set_elem_destroy);
-/* Only called from commit path, nft_set_elem_deactivate() already deals with - * the refcounting from the preparation phase. +/* Only called from commit path, nft_setelem_data_deactivate() already deals + * with the refcounting from the preparation phase. */ static void nf_tables_set_elem_destroy(const struct nft_ctx *ctx, const struct nft_set *set, void *elem) @@ -4919,9 +4919,9 @@ void nft_data_hold(const struct nft_data } }
-static void nft_set_elem_activate(const struct net *net, - const struct nft_set *set, - struct nft_set_elem *elem) +static void nft_setelem_data_activate(const struct net *net, + const struct nft_set *set, + struct nft_set_elem *elem) { const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
@@ -4931,9 +4931,9 @@ static void nft_set_elem_activate(const nft_use_inc_restore(&(*nft_set_ext_obj(ext))->use); }
-static void nft_set_elem_deactivate(const struct net *net, - const struct nft_set *set, - struct nft_set_elem *elem) +static void nft_setelem_data_deactivate(const struct net *net, + const struct nft_set *set, + struct nft_set_elem *elem) { const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
@@ -5000,7 +5000,7 @@ static int nft_del_setelem(struct nft_ct kfree(elem.priv); elem.priv = priv;
- nft_set_elem_deactivate(ctx->net, set, &elem); + nft_setelem_data_deactivate(ctx->net, set, &elem);
nft_trans_elem(trans) = elem; nft_trans_commit_list_add_tail(ctx->net, trans); @@ -5034,7 +5034,7 @@ static int nft_flush_set(const struct nf } set->ndeact++;
- nft_set_elem_deactivate(ctx->net, set, elem); + nft_setelem_data_deactivate(ctx->net, set, elem); nft_trans_elem_set(trans) = set; nft_trans_elem(trans) = *elem; nft_trans_commit_list_add_tail(ctx->net, trans); @@ -7277,7 +7277,7 @@ static int __nf_tables_abort(struct net case NFT_MSG_DELSETELEM: te = (struct nft_trans_elem *)trans->data;
- nft_set_elem_activate(net, te->set, &te->elem); + nft_setelem_data_activate(net, te->set, &te->elem); te->set->ops->activate(net, te->set, &te->elem); te->set->ndeact--;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit 628bd3e49cba1c066228e23d71a852c23e26da73 upstream.
set .destroy callback releases the references to other objects in maps. This is very late and it results in spurious EBUSY errors. Drop refcount from the preparation phase instead, update set backend not to drop reference counter from set .destroy path.
Exceptions: NFT_TRANS_PREPARE_ERROR does not require to drop the reference counter because the transaction abort path releases the map references for each element since the set is unbound. The abort path also deals with releasing reference counter for new elements added to unbound sets.
Fixes: 591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/netfilter/nf_tables.h | 5 +- net/netfilter/nf_tables_api.c | 89 ++++++++++++++++++++++++++++++++++---- net/netfilter/nft_set_bitmap.c | 5 +- net/netfilter/nft_set_hash.c | 23 +++++++-- net/netfilter/nft_set_rbtree.c | 5 +- 5 files changed, 108 insertions(+), 19 deletions(-)
--- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -371,7 +371,8 @@ struct nft_set_ops { int (*init)(const struct nft_set *set, const struct nft_set_desc *desc, const struct nlattr * const nla[]); - void (*destroy)(const struct nft_set *set); + void (*destroy)(const struct nft_ctx *ctx, + const struct nft_set *set); void (*gc_init)(const struct nft_set *set);
unsigned int elemsize; @@ -665,6 +666,8 @@ void *nft_set_elem_init(const struct nft u64 timeout, u64 expiration, gfp_t gfp); void nft_set_elem_destroy(const struct nft_set *set, void *elem, bool destroy_expr); +void nf_tables_set_elem_destroy(const struct nft_ctx *ctx, + const struct nft_set *set, void *elem);
/** * struct nft_set_gc_batch_head - nf_tables set garbage collection batch --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -403,6 +403,31 @@ static int nft_trans_set_add(const struc return 0; }
+static void nft_setelem_data_deactivate(const struct net *net, + const struct nft_set *set, + struct nft_set_elem *elem); + +static int nft_mapelem_deactivate(const struct nft_ctx *ctx, + struct nft_set *set, + const struct nft_set_iter *iter, + struct nft_set_elem *elem) +{ + nft_setelem_data_deactivate(ctx->net, set, elem); + + return 0; +} + +static void nft_map_deactivate(const struct nft_ctx *ctx, struct nft_set *set) +{ + struct nft_set_iter iter = { + .genmask = nft_genmask_next(ctx->net), + .fn = nft_mapelem_deactivate, + }; + + set->ops->walk(ctx, set, &iter); + WARN_ON_ONCE(iter.err); +} + static int nft_delset(const struct nft_ctx *ctx, struct nft_set *set) { int err; @@ -411,6 +436,9 @@ static int nft_delset(const struct nft_c if (err < 0) return err;
+ if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT)) + nft_map_deactivate(ctx, set); + nft_deactivate_next(ctx->net, set); nft_use_dec(&ctx->table->use);
@@ -3840,7 +3868,7 @@ static int nf_tables_newset(struct net * return 0;
err4: - ops->destroy(set); + ops->destroy(&ctx, set); err3: kfree(set->name); err2: @@ -3857,7 +3885,7 @@ static void nft_set_destroy(const struct if (WARN_ON(set->use > 0)) return;
- set->ops->destroy(set); + set->ops->destroy(ctx, set); module_put(to_set_type(set->ops)->owner); kfree(set->name); kvfree(set); @@ -3981,10 +4009,39 @@ static void nf_tables_unbind_set(const s } }
+static void nft_setelem_data_activate(const struct net *net, + const struct nft_set *set, + struct nft_set_elem *elem); + +static int nft_mapelem_activate(const struct nft_ctx *ctx, + struct nft_set *set, + const struct nft_set_iter *iter, + struct nft_set_elem *elem) +{ + nft_setelem_data_activate(ctx->net, set, elem); + + return 0; +} + +static void nft_map_activate(const struct nft_ctx *ctx, struct nft_set *set) +{ + struct nft_set_iter iter = { + .genmask = nft_genmask_next(ctx->net), + .fn = nft_mapelem_activate, + }; + + set->ops->walk(ctx, set, &iter); + WARN_ON_ONCE(iter.err); +} + void nf_tables_activate_set(const struct nft_ctx *ctx, struct nft_set *set) { - if (nft_set_is_anonymous(set)) + if (nft_set_is_anonymous(set)) { + if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT)) + nft_map_activate(ctx, set); + nft_clear(ctx->net, set); + }
nft_use_inc_restore(&set->use); } @@ -4005,13 +4062,20 @@ void nf_tables_deactivate_set(const stru nft_use_dec(&set->use); break; case NFT_TRANS_PREPARE: - if (nft_set_is_anonymous(set)) - nft_deactivate_next(ctx->net, set); + if (nft_set_is_anonymous(set)) { + if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT)) + nft_map_deactivate(ctx, set);
+ nft_deactivate_next(ctx->net, set); + } nft_use_dec(&set->use); return; case NFT_TRANS_ABORT: case NFT_TRANS_RELEASE: + if (nft_set_is_anonymous(set) && + set->flags & (NFT_SET_MAP | NFT_SET_OBJECT)) + nft_map_deactivate(ctx, set); + nft_use_dec(&set->use); /* fall through */ default: @@ -4574,6 +4638,7 @@ void *nft_set_elem_init(const struct nft return elem; }
+/* Drop references and destroy. Called from gc, dynset and abort path. */ void nft_set_elem_destroy(const struct nft_set *set, void *elem, bool destroy_expr) { @@ -4602,11 +4667,11 @@ void nft_set_elem_destroy(const struct n } EXPORT_SYMBOL_GPL(nft_set_elem_destroy);
-/* Only called from commit path, nft_setelem_data_deactivate() already deals - * with the refcounting from the preparation phase. +/* Destroy element. References have been already dropped in the preparation + * path via nft_setelem_data_deactivate(). */ -static void nf_tables_set_elem_destroy(const struct nft_ctx *ctx, - const struct nft_set *set, void *elem) +void nf_tables_set_elem_destroy(const struct nft_ctx *ctx, + const struct nft_set *set, void *elem) { struct nft_set_ext *ext = nft_set_elem_ext(set, elem);
@@ -4614,6 +4679,7 @@ static void nf_tables_set_elem_destroy(c nf_tables_expr_destroy(ctx, nft_set_ext_expr(ext)); kfree(elem); } +EXPORT_SYMBOL_GPL(nf_tables_set_elem_destroy);
static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set, const struct nlattr *attr, u32 nlmsg_flags) @@ -7263,6 +7329,8 @@ static int __nf_tables_abort(struct net case NFT_MSG_DELSET: nft_use_inc_restore(&trans->ctx.table->use); nft_clear(trans->ctx.net, nft_trans_set(trans)); + if (nft_trans_set(trans)->flags & (NFT_SET_MAP | NFT_SET_OBJECT)) + nft_map_activate(&trans->ctx, nft_trans_set(trans)); nft_trans_destroy(trans); break; case NFT_MSG_NEWSETELEM: @@ -7951,6 +8019,9 @@ static void __nft_release_table(struct n list_for_each_entry_safe(set, ns, &table->sets, list) { list_del(&set->list); nft_use_dec(&table->use); + if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT)) + nft_map_deactivate(&ctx, set); + nft_set_destroy(&ctx, set); } list_for_each_entry_safe(obj, ne, &table->objects, list) { --- a/net/netfilter/nft_set_bitmap.c +++ b/net/netfilter/nft_set_bitmap.c @@ -270,13 +270,14 @@ static int nft_bitmap_init(const struct return 0; }
-static void nft_bitmap_destroy(const struct nft_set *set) +static void nft_bitmap_destroy(const struct nft_ctx *ctx, + const struct nft_set *set) { struct nft_bitmap *priv = nft_set_priv(set); struct nft_bitmap_elem *be, *n;
list_for_each_entry_safe(be, n, &priv->list, head) - nft_set_elem_destroy(set, be, true); + nf_tables_set_elem_destroy(ctx, set, be); }
static bool nft_bitmap_estimate(const struct nft_set_desc *desc, u32 features, --- a/net/netfilter/nft_set_hash.c +++ b/net/netfilter/nft_set_hash.c @@ -380,19 +380,31 @@ static int nft_rhash_init(const struct n return 0; }
+struct nft_rhash_ctx { + const struct nft_ctx ctx; + const struct nft_set *set; +}; + static void nft_rhash_elem_destroy(void *ptr, void *arg) { - nft_set_elem_destroy(arg, ptr, true); + struct nft_rhash_ctx *rhash_ctx = arg; + + nf_tables_set_elem_destroy(&rhash_ctx->ctx, rhash_ctx->set, ptr); }
-static void nft_rhash_destroy(const struct nft_set *set) +static void nft_rhash_destroy(const struct nft_ctx *ctx, + const struct nft_set *set) { struct nft_rhash *priv = nft_set_priv(set); + struct nft_rhash_ctx rhash_ctx = { + .ctx = *ctx, + .set = set, + };
cancel_delayed_work_sync(&priv->gc_work); rcu_barrier(); rhashtable_free_and_destroy(&priv->ht, nft_rhash_elem_destroy, - (void *)set); + (void *)&rhash_ctx); }
/* Number of buckets is stored in u32, so cap our result to 1U<<31 */ @@ -621,7 +633,8 @@ static int nft_hash_init(const struct nf return 0; }
-static void nft_hash_destroy(const struct nft_set *set) +static void nft_hash_destroy(const struct nft_ctx *ctx, + const struct nft_set *set) { struct nft_hash *priv = nft_set_priv(set); struct nft_hash_elem *he; @@ -631,7 +644,7 @@ static void nft_hash_destroy(const struc for (i = 0; i < priv->buckets; i++) { hlist_for_each_entry_safe(he, next, &priv->table[i], node) { hlist_del_rcu(&he->node); - nft_set_elem_destroy(set, he, true); + nf_tables_set_elem_destroy(ctx, set, he); } } } --- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -480,7 +480,8 @@ static int nft_rbtree_init(const struct return 0; }
-static void nft_rbtree_destroy(const struct nft_set *set) +static void nft_rbtree_destroy(const struct nft_ctx *ctx, + const struct nft_set *set) { struct nft_rbtree *priv = nft_set_priv(set); struct nft_rbtree_elem *rbe; @@ -491,7 +492,7 @@ static void nft_rbtree_destroy(const str while ((node = priv->root.rb_node) != NULL) { rb_erase(node, &priv->root); rbe = rb_entry(node, struct nft_rbtree_elem, node); - nft_set_elem_destroy(set, rbe, true); + nf_tables_set_elem_destroy(ctx, set, rbe); } }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit c9e6978e2725a7d4b6cd23b2facd3f11422c0643 upstream.
...instead of a tree descent, which became overly complicated in an attempt to cover cases where expired or inactive elements would affect comparisons with the new element being inserted.
Further, it turned out that it's probably impossible to cover all those cases, as inactive nodes might entirely hide subtrees consisting of a complete interval plus a node that makes the current insertion not overlap.
To speed up the overlap check, descent the tree to find a greater element that is closer to the key value to insert. Then walk down the node list for overlap detection. Starting the overlap check from rb_first() unconditionally is slow, it takes 10 times longer due to the full linear traversal of the list.
Moreover, perform garbage collection of expired elements when walking down the node list to avoid bogus overlap reports.
For the insertion operation itself, this essentially reverts back to the implementation before commit 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion"), except that cases of complete overlap are already handled in the overlap detection phase itself, which slightly simplifies the loop to find the insertion point.
Based on initial patch from Stefano Brivio, including text from the original patch description too.
Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Reviewed-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nft_set_rbtree.c | 223 ++++++++++++++++++++++++++++++++++++----- 1 file changed, 198 insertions(+), 25 deletions(-)
--- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -38,10 +38,12 @@ static bool nft_rbtree_interval_start(co return !nft_rbtree_interval_end(rbe); }
-static bool nft_rbtree_equal(const struct nft_set *set, const void *this, - const struct nft_rbtree_elem *interval) +static int nft_rbtree_cmp(const struct nft_set *set, + const struct nft_rbtree_elem *e1, + const struct nft_rbtree_elem *e2) { - return memcmp(this, nft_set_ext_key(&interval->ext), set->klen) == 0; + return memcmp(nft_set_ext_key(&e1->ext), nft_set_ext_key(&e2->ext), + set->klen); }
static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set, @@ -52,7 +54,6 @@ static bool __nft_rbtree_lookup(const st const struct nft_rbtree_elem *rbe, *interval = NULL; u8 genmask = nft_genmask_cur(net); const struct rb_node *parent; - const void *this; int d;
parent = rcu_dereference_raw(priv->root.rb_node); @@ -62,12 +63,11 @@ static bool __nft_rbtree_lookup(const st
rbe = rb_entry(parent, struct nft_rbtree_elem, node);
- this = nft_set_ext_key(&rbe->ext); - d = memcmp(this, key, set->klen); + d = memcmp(nft_set_ext_key(&rbe->ext), key, set->klen); if (d < 0) { parent = rcu_dereference_raw(parent->rb_left); if (interval && - nft_rbtree_equal(set, this, interval) && + !nft_rbtree_cmp(set, rbe, interval) && nft_rbtree_interval_end(rbe) && nft_rbtree_interval_start(interval)) continue; @@ -214,43 +214,216 @@ static void *nft_rbtree_get(const struct return rbe; }
+static int nft_rbtree_gc_elem(const struct nft_set *__set, + struct nft_rbtree *priv, + struct nft_rbtree_elem *rbe) +{ + struct nft_set *set = (struct nft_set *)__set; + struct rb_node *prev = rb_prev(&rbe->node); + struct nft_rbtree_elem *rbe_prev; + struct nft_set_gc_batch *gcb; + + gcb = nft_set_gc_batch_check(set, NULL, GFP_ATOMIC); + if (!gcb) + return -ENOMEM; + + /* search for expired end interval coming before this element. */ + do { + rbe_prev = rb_entry(prev, struct nft_rbtree_elem, node); + if (nft_rbtree_interval_end(rbe_prev)) + break; + + prev = rb_prev(prev); + } while (prev != NULL); + + rb_erase(&rbe_prev->node, &priv->root); + rb_erase(&rbe->node, &priv->root); + atomic_sub(2, &set->nelems); + + nft_set_gc_batch_add(gcb, rbe); + nft_set_gc_batch_complete(gcb); + + return 0; +} + +static bool nft_rbtree_update_first(const struct nft_set *set, + struct nft_rbtree_elem *rbe, + struct rb_node *first) +{ + struct nft_rbtree_elem *first_elem; + + first_elem = rb_entry(first, struct nft_rbtree_elem, node); + /* this element is closest to where the new element is to be inserted: + * update the first element for the node list path. + */ + if (nft_rbtree_cmp(set, rbe, first_elem) < 0) + return true; + + return false; +} + static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set, struct nft_rbtree_elem *new, struct nft_set_ext **ext) { + struct nft_rbtree_elem *rbe, *rbe_le = NULL, *rbe_ge = NULL; + struct rb_node *node, *parent, **p, *first = NULL; struct nft_rbtree *priv = nft_set_priv(set); u8 genmask = nft_genmask_next(net); - struct nft_rbtree_elem *rbe; - struct rb_node *parent, **p; - int d; + int d, err;
+ /* Descend the tree to search for an existing element greater than the + * key value to insert that is greater than the new element. This is the + * first element to walk the ordered elements to find possible overlap. + */ parent = NULL; p = &priv->root.rb_node; while (*p != NULL) { parent = *p; rbe = rb_entry(parent, struct nft_rbtree_elem, node); - d = memcmp(nft_set_ext_key(&rbe->ext), - nft_set_ext_key(&new->ext), - set->klen); - if (d < 0) + d = nft_rbtree_cmp(set, rbe, new); + + if (d < 0) { p = &parent->rb_left; - else if (d > 0) + } else if (d > 0) { + if (!first || + nft_rbtree_update_first(set, rbe, first)) + first = &rbe->node; + p = &parent->rb_right; - else { - if (nft_rbtree_interval_end(rbe) && - nft_rbtree_interval_start(new)) { + } else { + if (nft_rbtree_interval_end(rbe)) p = &parent->rb_left; - } else if (nft_rbtree_interval_start(rbe) && - nft_rbtree_interval_end(new)) { + else p = &parent->rb_right; - } else if (nft_set_elem_active(&rbe->ext, genmask)) { - *ext = &rbe->ext; - return -EEXIST; - } else { - p = &parent->rb_left; + } + } + + if (!first) + first = rb_first(&priv->root); + + /* Detect overlap by going through the list of valid tree nodes. + * Values stored in the tree are in reversed order, starting from + * highest to lowest value. + */ + for (node = first; node != NULL; node = rb_next(node)) { + rbe = rb_entry(node, struct nft_rbtree_elem, node); + + if (!nft_set_elem_active(&rbe->ext, genmask)) + continue; + + /* perform garbage collection to avoid bogus overlap reports. */ + if (nft_set_elem_expired(&rbe->ext)) { + err = nft_rbtree_gc_elem(set, priv, rbe); + if (err < 0) + return err; + + continue; + } + + d = nft_rbtree_cmp(set, rbe, new); + if (d == 0) { + /* Matching end element: no need to look for an + * overlapping greater or equal element. + */ + if (nft_rbtree_interval_end(rbe)) { + rbe_le = rbe; + break; + } + + /* first element that is greater or equal to key value. */ + if (!rbe_ge) { + rbe_ge = rbe; + continue; + } + + /* this is a closer more or equal element, update it. */ + if (nft_rbtree_cmp(set, rbe_ge, new) != 0) { + rbe_ge = rbe; + continue; } + + /* element is equal to key value, make sure flags are + * the same, an existing more or equal start element + * must not be replaced by more or equal end element. + */ + if ((nft_rbtree_interval_start(new) && + nft_rbtree_interval_start(rbe_ge)) || + (nft_rbtree_interval_end(new) && + nft_rbtree_interval_end(rbe_ge))) { + rbe_ge = rbe; + continue; + } + } else if (d > 0) { + /* annotate element greater than the new element. */ + rbe_ge = rbe; + continue; + } else if (d < 0) { + /* annotate element less than the new element. */ + rbe_le = rbe; + break; } } + + /* - new start element matching existing start element: full overlap + * reported as -EEXIST, cleared by caller if NLM_F_EXCL is not given. + */ + if (rbe_ge && !nft_rbtree_cmp(set, new, rbe_ge) && + nft_rbtree_interval_start(rbe_ge) == nft_rbtree_interval_start(new)) { + *ext = &rbe_ge->ext; + return -EEXIST; + } + + /* - new end element matching existing end element: full overlap + * reported as -EEXIST, cleared by caller if NLM_F_EXCL is not given. + */ + if (rbe_le && !nft_rbtree_cmp(set, new, rbe_le) && + nft_rbtree_interval_end(rbe_le) == nft_rbtree_interval_end(new)) { + *ext = &rbe_le->ext; + return -EEXIST; + } + + /* - new start element with existing closest, less or equal key value + * being a start element: partial overlap, reported as -ENOTEMPTY. + * Anonymous sets allow for two consecutive start element since they + * are constant, skip them to avoid bogus overlap reports. + */ + if (!nft_set_is_anonymous(set) && rbe_le && + nft_rbtree_interval_start(rbe_le) && nft_rbtree_interval_start(new)) + return -ENOTEMPTY; + + /* - new end element with existing closest, less or equal key value + * being a end element: partial overlap, reported as -ENOTEMPTY. + */ + if (rbe_le && + nft_rbtree_interval_end(rbe_le) && nft_rbtree_interval_end(new)) + return -ENOTEMPTY; + + /* - new end element with existing closest, greater or equal key value + * being an end element: partial overlap, reported as -ENOTEMPTY + */ + if (rbe_ge && + nft_rbtree_interval_end(rbe_ge) && nft_rbtree_interval_end(new)) + return -ENOTEMPTY; + + /* Accepted element: pick insertion point depending on key value */ + parent = NULL; + p = &priv->root.rb_node; + while (*p != NULL) { + parent = *p; + rbe = rb_entry(parent, struct nft_rbtree_elem, node); + d = nft_rbtree_cmp(set, rbe, new); + + if (d < 0) + p = &parent->rb_left; + else if (d > 0) + p = &parent->rb_right; + else if (nft_rbtree_interval_end(rbe)) + p = &parent->rb_left; + else + p = &parent->rb_right; + } + rb_link_node_rcu(&new->node, parent, p); rb_insert_color(&new->node, &priv->root); return 0;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
commit 61ae320a29b0540c16931816299eb86bf2b66c08 upstream.
There is no guarantee that rb_prev() will not return NULL in nft_rbtree_gc_elem():
general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] nft_add_set_elem+0x14b0/0x2990 nf_tables_newsetelem+0x528/0xb30
Furthermore, there is a possible use-after-free while iterating, 'node' can be free'd so we need to cache the next value to use.
Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nft_set_rbtree.c | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-)
--- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -220,7 +220,7 @@ static int nft_rbtree_gc_elem(const stru { struct nft_set *set = (struct nft_set *)__set; struct rb_node *prev = rb_prev(&rbe->node); - struct nft_rbtree_elem *rbe_prev; + struct nft_rbtree_elem *rbe_prev = NULL; struct nft_set_gc_batch *gcb;
gcb = nft_set_gc_batch_check(set, NULL, GFP_ATOMIC); @@ -228,17 +228,21 @@ static int nft_rbtree_gc_elem(const stru return -ENOMEM;
/* search for expired end interval coming before this element. */ - do { + while (prev) { rbe_prev = rb_entry(prev, struct nft_rbtree_elem, node); if (nft_rbtree_interval_end(rbe_prev)) break;
prev = rb_prev(prev); - } while (prev != NULL); + } + + if (rbe_prev) { + rb_erase(&rbe_prev->node, &priv->root); + atomic_dec(&set->nelems); + }
- rb_erase(&rbe_prev->node, &priv->root); rb_erase(&rbe->node, &priv->root); - atomic_sub(2, &set->nelems); + atomic_dec(&set->nelems);
nft_set_gc_batch_add(gcb, rbe); nft_set_gc_batch_complete(gcb); @@ -267,7 +271,7 @@ static int __nft_rbtree_insert(const str struct nft_set_ext **ext) { struct nft_rbtree_elem *rbe, *rbe_le = NULL, *rbe_ge = NULL; - struct rb_node *node, *parent, **p, *first = NULL; + struct rb_node *node, *next, *parent, **p, *first = NULL; struct nft_rbtree *priv = nft_set_priv(set); u8 genmask = nft_genmask_next(net); int d, err; @@ -306,7 +310,9 @@ static int __nft_rbtree_insert(const str * Values stored in the tree are in reversed order, starting from * highest to lowest value. */ - for (node = first; node != NULL; node = rb_next(node)) { + for (node = first; node != NULL; node = next) { + next = rb_next(node); + rbe = rb_entry(node, struct nft_rbtree_elem, node);
if (!nft_set_elem_active(&rbe->ext, genmask))
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
commit f718863aca469a109895cb855e6b81fff4827d71 upstream.
The lazy gc on insert that should remove timed-out entries fails to release the other half of the interval, if any.
Can be reproduced with tests/shell/testcases/sets/0044interval_overlap_0 in nftables.git and kmemleak enabled kernel.
Second bug is the use of rbe_prev vs. prev pointer. If rbe_prev() returns NULL after at least one iteration, rbe_prev points to element that is not an end interval, hence it should not be removed.
Lastly, check the genmask of the end interval if this is active in the current generation.
Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nft_set_rbtree.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-)
--- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -216,29 +216,37 @@ static void *nft_rbtree_get(const struct
static int nft_rbtree_gc_elem(const struct nft_set *__set, struct nft_rbtree *priv, - struct nft_rbtree_elem *rbe) + struct nft_rbtree_elem *rbe, + u8 genmask) { struct nft_set *set = (struct nft_set *)__set; struct rb_node *prev = rb_prev(&rbe->node); - struct nft_rbtree_elem *rbe_prev = NULL; + struct nft_rbtree_elem *rbe_prev; struct nft_set_gc_batch *gcb;
gcb = nft_set_gc_batch_check(set, NULL, GFP_ATOMIC); if (!gcb) return -ENOMEM;
- /* search for expired end interval coming before this element. */ + /* search for end interval coming before this element. + * end intervals don't carry a timeout extension, they + * are coupled with the interval start element. + */ while (prev) { rbe_prev = rb_entry(prev, struct nft_rbtree_elem, node); - if (nft_rbtree_interval_end(rbe_prev)) + if (nft_rbtree_interval_end(rbe_prev) && + nft_set_elem_active(&rbe_prev->ext, genmask)) break;
prev = rb_prev(prev); }
- if (rbe_prev) { + if (prev) { + rbe_prev = rb_entry(prev, struct nft_rbtree_elem, node); + rb_erase(&rbe_prev->node, &priv->root); atomic_dec(&set->nelems); + nft_set_gc_batch_add(gcb, rbe_prev); }
rb_erase(&rbe->node, &priv->root); @@ -320,7 +328,7 @@ static int __nft_rbtree_insert(const str
/* perform garbage collection to avoid bogus overlap reports. */ if (nft_set_elem_expired(&rbe->ext)) { - err = nft_rbtree_gc_elem(set, priv, rbe); + err = nft_rbtree_gc_elem(set, priv, rbe, genmask); if (err < 0) return err;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
commit 24138933b97b055d486e8064b4a1721702442a9b upstream.
There is an asymmetry between commit/abort and preparation phase if the following conditions are met:
1. set is a verdict map ("1.2.3.4 : jump foo") 2. timeouts are enabled
In this case, following sequence is problematic:
1. element E in set S refers to chain C 2. userspace requests removal of set S 3. kernel does a set walk to decrement chain->use count for all elements from preparation phase 4. kernel does another set walk to remove elements from the commit phase (or another walk to do a chain->use increment for all elements from abort phase)
If E has already expired in 1), it will be ignored during list walk, so its use count won't have been changed.
Then, when set is culled, ->destroy callback will zap the element via nf_tables_set_elem_destroy(), but this function is only safe for elements that have been deactivated earlier from the preparation phase: lack of earlier deactivate removes the element but leaks the chain use count, which results in a WARN splat when the chain gets removed later, plus a leak of the nft_chain structure.
Update pipapo_get() not to skip expired elements, otherwise flush command reports bogus ENOENT errors.
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Fixes: 9d0982927e79 ("netfilter: nft_hash: add support for timeouts") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 4 ++++ net/netfilter/nft_set_hash.c | 2 -- net/netfilter/nft_set_rbtree.c | 2 -- 3 files changed, 4 insertions(+), 4 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -4258,8 +4258,12 @@ static int nf_tables_dump_setelem(const const struct nft_set_iter *iter, struct nft_set_elem *elem) { + const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv); struct nft_set_dump_args *args;
+ if (nft_set_elem_expired(ext)) + return 0; + args = container_of(iter, struct nft_set_dump_args, iter); return nf_tables_fill_setelem(args->skb, set, elem); } --- a/net/netfilter/nft_set_hash.c +++ b/net/netfilter/nft_set_hash.c @@ -277,8 +277,6 @@ static void nft_rhash_walk(const struct
if (iter->count < iter->skip) goto cont; - if (nft_set_elem_expired(&he->ext)) - goto cont; if (!nft_set_elem_active(&he->ext, iter->genmask)) goto cont;
--- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -553,8 +553,6 @@ static void nft_rbtree_walk(const struct
if (iter->count < iter->skip) goto cont; - if (nft_set_elem_expired(&rbe->ext)) - goto cont; if (!nft_set_elem_active(&rbe->ext, iter->genmask)) goto cont;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit 5f68718b34a531a556f2f50300ead2862278da26 upstream.
The set types rhashtable and rbtree use a GC worker to reclaim memory.
From system work queue, in periodic intervals, a scan of the table is
done.
The major caveat here is that the nft transaction mutex is not held. This causes a race between control plane and GC when they attempt to delete the same element.
We cannot grab the netlink mutex from the work queue, because the control plane has to wait for the GC work queue in case the set is to be removed, so we get following deadlock:
cpu 1 cpu2 GC work transaction comes in , lock nft mutex `acquire nft mutex // BLOCKS transaction asks to remove the set set destruction calls cancel_work_sync()
cancel_work_sync will now block forever, because it is waiting for the mutex the caller already owns.
This patch adds a new API that deals with garbage collection in two steps:
1) Lockless GC of expired elements sets on the NFT_SET_ELEM_DEAD_BIT so they are not visible via lookup. Annotate current GC sequence in the GC transaction. Enqueue GC transaction work as soon as it is full. If ruleset is updated, then GC transaction is aborted and retried later.
2) GC work grabs the mutex. If GC sequence has changed then this GC transaction lost race with control plane, abort it as it contains stale references to objects and let GC try again later. If the ruleset is intact, then this GC transaction deactivates and removes the elements and it uses call_rcu() to destroy elements.
Note that no elements are removed from GC lockless path, the _DEAD bit is set and pointers are collected. GC catchall does not remove the elements anymore too. There is a new set->dead flag that is set on to abort the GC transaction to deal with set->ops->destroy() path which removes the remaining elements in the set from commit_release, where no mutex is held.
To deal with GC when mutex is held, which allows safe deactivate and removal, add sync GC API which releases the set element object via call_rcu(). This is used by rbtree and pipapo backends which also perform garbage collection from control plane path.
Since element removal from sets can happen from control plane and element garbage collection/timeout, it is necessary to keep the set structure alive until all elements have been deactivated and destroyed.
We cannot do a cancel_work_sync or flush_work in nft_set_destroy because its called with the transaction mutex held, but the aforementioned async work queue might be blocked on the very mutex that nft_set_destroy() callchain is sitting on.
This gives us the choice of ABBA deadlock or UaF.
To avoid both, add set->refs refcount_t member. The GC API can then increment the set refcount and release it once the elements have been free'd.
Set backends are adapted to use the GC transaction API in a follow up patch entitled:
("netfilter: nf_tables: use gc transaction API in set backends")
This is joint work with Florian Westphal.
Fixes: cfed7e1b1f8e ("netfilter: nf_tables: add set garbage collection helpers") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/netfilter/nf_tables.h | 61 ++++++++++ net/netfilter/nf_tables_api.c | 225 ++++++++++++++++++++++++++++++++++++-- 2 files changed, 276 insertions(+), 10 deletions(-)
--- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -402,6 +402,7 @@ void nft_unregister_set(struct nft_set_t * * @list: table set list node * @bindings: list of set bindings + * @refs: internal refcounting for async set destruction * @table: table this set belongs to * @net: netnamespace this set belongs to * @name: name of the set @@ -428,6 +429,7 @@ void nft_unregister_set(struct nft_set_t struct nft_set { struct list_head list; struct list_head bindings; + refcount_t refs; struct nft_table *table; possible_net_t net; char *name; @@ -446,7 +448,8 @@ struct nft_set { unsigned char *udata; /* runtime data below here */ const struct nft_set_ops *ops ____cacheline_aligned; - u16 flags:14, + u16 flags:13, + dead:1, genmask:2; u8 klen; u8 dlen; @@ -1386,6 +1389,32 @@ static inline void nft_set_elem_clear_bu clear_bit(NFT_SET_ELEM_BUSY_BIT, word); }
+#define NFT_SET_ELEM_DEAD_MASK (1 << 3) + +#if defined(__LITTLE_ENDIAN_BITFIELD) +#define NFT_SET_ELEM_DEAD_BIT 3 +#elif defined(__BIG_ENDIAN_BITFIELD) +#define NFT_SET_ELEM_DEAD_BIT (BITS_PER_LONG - BITS_PER_BYTE + 3) +#else +#error +#endif + +static inline void nft_set_elem_dead(struct nft_set_ext *ext) +{ + unsigned long *word = (unsigned long *)ext; + + BUILD_BUG_ON(offsetof(struct nft_set_ext, genmask) != 0); + set_bit(NFT_SET_ELEM_DEAD_BIT, word); +} + +static inline int nft_set_elem_is_dead(const struct nft_set_ext *ext) +{ + unsigned long *word = (unsigned long *)ext; + + BUILD_BUG_ON(offsetof(struct nft_set_ext, genmask) != 0); + return test_bit(NFT_SET_ELEM_DEAD_BIT, word); +} + /** * struct nft_trans - nf_tables object update in transaction * @@ -1490,6 +1519,35 @@ struct nft_trans_flowtable { #define nft_trans_flowtable(trans) \ (((struct nft_trans_flowtable *)trans->data)->flowtable)
+#define NFT_TRANS_GC_BATCHCOUNT 256 + +struct nft_trans_gc { + struct list_head list; + struct net *net; + struct nft_set *set; + u32 seq; + u8 count; + void *priv[NFT_TRANS_GC_BATCHCOUNT]; + struct rcu_head rcu; +}; + +struct nft_trans_gc *nft_trans_gc_alloc(struct nft_set *set, + unsigned int gc_seq, gfp_t gfp); +void nft_trans_gc_destroy(struct nft_trans_gc *trans); + +struct nft_trans_gc *nft_trans_gc_queue_async(struct nft_trans_gc *gc, + unsigned int gc_seq, gfp_t gfp); +void nft_trans_gc_queue_async_done(struct nft_trans_gc *gc); + +struct nft_trans_gc *nft_trans_gc_queue_sync(struct nft_trans_gc *gc, gfp_t gfp); +void nft_trans_gc_queue_sync_done(struct nft_trans_gc *trans); + +void nft_trans_gc_elem_add(struct nft_trans_gc *gc, void *priv); + +void nft_setelem_data_deactivate(const struct net *net, + const struct nft_set *set, + struct nft_set_elem *elem); + int __init nft_chain_filter_init(void); void nft_chain_filter_fini(void);
@@ -1510,6 +1568,7 @@ struct nftables_pernet { struct mutex commit_mutex; unsigned int base_seq; u8 validate_state; + unsigned int gc_seq; };
#endif /* _NET_NF_TABLES_H */ --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -26,12 +26,15 @@ #define NFT_MODULE_AUTOLOAD_LIMIT (MODULE_NAME_LEN - sizeof("nft-expr-255-"))
unsigned int nf_tables_net_id __read_mostly; +EXPORT_SYMBOL_GPL(nf_tables_net_id);
static LIST_HEAD(nf_tables_expressions); static LIST_HEAD(nf_tables_objects); static LIST_HEAD(nf_tables_flowtables); static LIST_HEAD(nf_tables_destroy_list); +static LIST_HEAD(nf_tables_gc_list); static DEFINE_SPINLOCK(nf_tables_destroy_list_lock); +static DEFINE_SPINLOCK(nf_tables_gc_list_lock); static u64 table_handle;
enum { @@ -88,6 +91,9 @@ static void nft_validate_state_update(st static void nf_tables_trans_destroy_work(struct work_struct *w); static DECLARE_WORK(trans_destroy_work, nf_tables_trans_destroy_work);
+static void nft_trans_gc_work(struct work_struct *work); +static DECLARE_WORK(trans_gc_work, nft_trans_gc_work); + static void nft_ctx_init(struct nft_ctx *ctx, struct net *net, const struct sk_buff *skb, @@ -403,10 +409,6 @@ static int nft_trans_set_add(const struc return 0; }
-static void nft_setelem_data_deactivate(const struct net *net, - const struct nft_set *set, - struct nft_set_elem *elem); - static int nft_mapelem_deactivate(const struct nft_ctx *ctx, struct nft_set *set, const struct nft_set_iter *iter, @@ -3838,6 +3840,7 @@ static int nf_tables_newset(struct net * }
INIT_LIST_HEAD(&set->bindings); + refcount_set(&set->refs, 1); set->table = table; write_pnet(&set->net, net); set->ops = ops; @@ -3880,6 +3883,14 @@ err1: return err; }
+static void nft_set_put(struct nft_set *set) +{ + if (refcount_dec_and_test(&set->refs)) { + kfree(set->name); + kvfree(set); + } +} + static void nft_set_destroy(const struct nft_ctx *ctx, struct nft_set *set) { if (WARN_ON(set->use > 0)) @@ -3887,8 +3898,7 @@ static void nft_set_destroy(const struct
set->ops->destroy(ctx, set); module_put(to_set_type(set->ops)->owner); - kfree(set->name); - kvfree(set); + nft_set_put(set); }
static int nf_tables_delset(struct net *net, struct sock *nlsk, @@ -5001,9 +5011,9 @@ static void nft_setelem_data_activate(co nft_use_inc_restore(&(*nft_set_ext_obj(ext))->use); }
-static void nft_setelem_data_deactivate(const struct net *net, - const struct nft_set *set, - struct nft_set_elem *elem) +void nft_setelem_data_deactivate(const struct net *net, + const struct nft_set *set, + struct nft_set_elem *elem) { const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
@@ -5012,6 +5022,7 @@ static void nft_setelem_data_deactivate( if (nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF)) nft_use_dec(&(*nft_set_ext_obj(ext))->use); } +EXPORT_SYMBOL_GPL(nft_setelem_data_deactivate);
static int nft_del_setelem(struct nft_ctx *ctx, struct nft_set *set, const struct nlattr *attr) @@ -6964,6 +6975,186 @@ static void nft_chain_del(struct nft_cha list_del_rcu(&chain->list); }
+static void nft_trans_gc_setelem_remove(struct nft_ctx *ctx, + struct nft_trans_gc *trans) +{ + void **priv = trans->priv; + unsigned int i; + + for (i = 0; i < trans->count; i++) { + struct nft_set_elem elem = { + .priv = priv[i], + }; + + nft_setelem_data_deactivate(ctx->net, trans->set, &elem); + trans->set->ops->remove(trans->net, trans->set, &elem); + } +} + +void nft_trans_gc_destroy(struct nft_trans_gc *trans) +{ + nft_set_put(trans->set); + put_net(trans->net); + kfree(trans); +} +EXPORT_SYMBOL_GPL(nft_trans_gc_destroy); + +static void nft_trans_gc_trans_free(struct rcu_head *rcu) +{ + struct nft_set_elem elem = {}; + struct nft_trans_gc *trans; + struct nft_ctx ctx = {}; + unsigned int i; + + trans = container_of(rcu, struct nft_trans_gc, rcu); + ctx.net = read_pnet(&trans->set->net); + + for (i = 0; i < trans->count; i++) { + elem.priv = trans->priv[i]; + atomic_dec(&trans->set->nelems); + + nf_tables_set_elem_destroy(&ctx, trans->set, elem.priv); + } + + nft_trans_gc_destroy(trans); +} + +static bool nft_trans_gc_work_done(struct nft_trans_gc *trans) +{ + struct nftables_pernet *nft_net; + struct nft_ctx ctx = {}; + + nft_net = net_generic(trans->net, nf_tables_net_id); + + mutex_lock(&nft_net->commit_mutex); + + /* Check for race with transaction, otherwise this batch refers to + * stale objects that might not be there anymore. Skip transaction if + * set has been destroyed from control plane transaction in case gc + * worker loses race. + */ + if (READ_ONCE(nft_net->gc_seq) != trans->seq || trans->set->dead) { + mutex_unlock(&nft_net->commit_mutex); + return false; + } + + ctx.net = trans->net; + ctx.table = trans->set->table; + + nft_trans_gc_setelem_remove(&ctx, trans); + mutex_unlock(&nft_net->commit_mutex); + + return true; +} + +static void nft_trans_gc_work(struct work_struct *work) +{ + struct nft_trans_gc *trans, *next; + LIST_HEAD(trans_gc_list); + + spin_lock(&nf_tables_destroy_list_lock); + list_splice_init(&nf_tables_gc_list, &trans_gc_list); + spin_unlock(&nf_tables_destroy_list_lock); + + list_for_each_entry_safe(trans, next, &trans_gc_list, list) { + list_del(&trans->list); + if (!nft_trans_gc_work_done(trans)) { + nft_trans_gc_destroy(trans); + continue; + } + call_rcu(&trans->rcu, nft_trans_gc_trans_free); + } +} + +struct nft_trans_gc *nft_trans_gc_alloc(struct nft_set *set, + unsigned int gc_seq, gfp_t gfp) +{ + struct net *net = read_pnet(&set->net); + struct nft_trans_gc *trans; + + trans = kzalloc(sizeof(*trans), gfp); + if (!trans) + return NULL; + + refcount_inc(&set->refs); + trans->set = set; + trans->net = get_net(net); + trans->seq = gc_seq; + + return trans; +} +EXPORT_SYMBOL_GPL(nft_trans_gc_alloc); + +void nft_trans_gc_elem_add(struct nft_trans_gc *trans, void *priv) +{ + trans->priv[trans->count++] = priv; +} +EXPORT_SYMBOL_GPL(nft_trans_gc_elem_add); + +static void nft_trans_gc_queue_work(struct nft_trans_gc *trans) +{ + spin_lock(&nf_tables_gc_list_lock); + list_add_tail(&trans->list, &nf_tables_gc_list); + spin_unlock(&nf_tables_gc_list_lock); + + schedule_work(&trans_gc_work); +} + +static int nft_trans_gc_space(struct nft_trans_gc *trans) +{ + return NFT_TRANS_GC_BATCHCOUNT - trans->count; +} + +struct nft_trans_gc *nft_trans_gc_queue_async(struct nft_trans_gc *gc, + unsigned int gc_seq, gfp_t gfp) +{ + if (nft_trans_gc_space(gc)) + return gc; + + nft_trans_gc_queue_work(gc); + + return nft_trans_gc_alloc(gc->set, gc_seq, gfp); +} +EXPORT_SYMBOL_GPL(nft_trans_gc_queue_async); + +void nft_trans_gc_queue_async_done(struct nft_trans_gc *trans) +{ + if (trans->count == 0) { + nft_trans_gc_destroy(trans); + return; + } + + nft_trans_gc_queue_work(trans); +} +EXPORT_SYMBOL_GPL(nft_trans_gc_queue_async_done); + +struct nft_trans_gc *nft_trans_gc_queue_sync(struct nft_trans_gc *gc, gfp_t gfp) +{ + if (WARN_ON_ONCE(!lockdep_commit_lock_is_held(gc->net))) + return NULL; + + if (nft_trans_gc_space(gc)) + return gc; + + call_rcu(&gc->rcu, nft_trans_gc_trans_free); + + return nft_trans_gc_alloc(gc->set, 0, gfp); +} +EXPORT_SYMBOL_GPL(nft_trans_gc_queue_sync); + +void nft_trans_gc_queue_sync_done(struct nft_trans_gc *trans) +{ + WARN_ON_ONCE(!lockdep_commit_lock_is_held(trans->net)); + + if (trans->count == 0) { + nft_trans_gc_destroy(trans); + return; + } + + call_rcu(&trans->rcu, nft_trans_gc_trans_free); +} +EXPORT_SYMBOL_GPL(nft_trans_gc_queue_sync_done); + static void nf_tables_module_autoload_cleanup(struct net *net) { struct nftables_pernet *nft_net = net_generic(net, nf_tables_net_id); @@ -7018,6 +7209,7 @@ static int nf_tables_commit(struct net * struct nft_trans_elem *te; struct nft_chain *chain; struct nft_table *table; + unsigned int gc_seq; int err;
if (list_empty(&nft_net->commit_list)) { @@ -7074,6 +7266,10 @@ static int nf_tables_commit(struct net * while (++nft_net->base_seq == 0) ;
+ /* Bump gc counter, it becomes odd, this is the busy mark. */ + gc_seq = READ_ONCE(nft_net->gc_seq); + WRITE_ONCE(nft_net->gc_seq, ++gc_seq); + /* step 3. Start new generation, rules_gen_X now in use. */ net->nft.gencursor = nft_gencursor_next(net);
@@ -7151,6 +7347,7 @@ static int nf_tables_commit(struct net * nft_trans_destroy(trans); break; case NFT_MSG_DELSET: + nft_trans_set(trans)->dead = 1; list_del_rcu(&nft_trans_set(trans)->list); nf_tables_set_notify(&trans->ctx, nft_trans_set(trans), NFT_MSG_DELSET, GFP_KERNEL); @@ -7212,6 +7409,8 @@ static int nf_tables_commit(struct net * }
nf_tables_gen_notify(net, skb, NFT_MSG_NEWGEN); + + WRITE_ONCE(nft_net->gc_seq, ++gc_seq); nf_tables_commit_release(net);
return 0; @@ -8064,6 +8263,7 @@ static int __net_init nf_tables_init_net mutex_init(&nft_net->commit_mutex); nft_net->base_seq = 1; nft_net->validate_state = NFT_VALIDATE_SKIP; + nft_net->gc_seq = 0;
return 0; } @@ -8090,10 +8290,16 @@ static void __net_exit nf_tables_exit_ne WARN_ON_ONCE(!list_empty(&nft_net->module_list)); }
+static void nf_tables_exit_batch(struct list_head *net_exit_list) +{ + flush_work(&trans_gc_work); +} + static struct pernet_operations nf_tables_net_ops = { .init = nf_tables_init_net, .pre_exit = nf_tables_pre_exit_net, .exit = nf_tables_exit_net, + .exit_batch = nf_tables_exit_batch, .id = &nf_tables_net_id, .size = sizeof(struct nftables_pernet), }; @@ -8158,6 +8364,7 @@ static void __exit nf_tables_module_exit nft_chain_filter_fini(); nft_chain_route_fini(); unregister_pernet_subsys(&nf_tables_net_ops); + cancel_work_sync(&trans_gc_work); cancel_work_sync(&trans_destroy_work); rcu_barrier(); rhltable_destroy(&nft_objname_ht);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit f6c383b8c31a93752a52697f8430a71dcbc46adf upstream.
Use the GC transaction API to replace the old and buggy gc API and the busy mark approach.
No set elements are removed from async garbage collection anymore, instead the _DEAD bit is set on so the set element is not visible from lookup path anymore. Async GC enqueues transaction work that might be aborted and retried later.
rbtree and pipapo set backends does not set on the _DEAD bit from the sync GC path since this runs in control plane path where mutex is held. In this case, set elements are deactivated, removed and then released via RCU callback, sync GC never fails.
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Fixes: 9d0982927e79 ("netfilter: nft_hash: add support for timeouts") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nft_set_hash.c | 77 +++++++++++++++------- net/netfilter/nft_set_rbtree.c | 142 +++++++++++++++++++++++++---------------- 2 files changed, 142 insertions(+), 77 deletions(-)
--- a/net/netfilter/nft_set_hash.c +++ b/net/netfilter/nft_set_hash.c @@ -17,6 +17,9 @@ #include <linux/netfilter.h> #include <linux/netfilter/nf_tables.h> #include <net/netfilter/nf_tables_core.h> +#include <net/netns/generic.h> + +extern unsigned int nf_tables_net_id;
/* We target a hash table size of 4, element hint is 75% of final size */ #define NFT_RHASH_ELEMENT_HINT 3 @@ -59,6 +62,8 @@ static inline int nft_rhash_cmp(struct r
if (memcmp(nft_set_ext_key(&he->ext), x->key, x->set->klen)) return 1; + if (nft_set_elem_is_dead(&he->ext)) + return 1; if (nft_set_elem_expired(&he->ext)) return 1; if (!nft_set_elem_active(&he->ext, x->genmask)) @@ -187,7 +192,6 @@ static void nft_rhash_activate(const str struct nft_rhash_elem *he = elem->priv;
nft_set_elem_change_active(net, set, &he->ext); - nft_set_elem_clear_busy(&he->ext); }
static bool nft_rhash_flush(const struct net *net, @@ -195,12 +199,9 @@ static bool nft_rhash_flush(const struct { struct nft_rhash_elem *he = priv;
- if (!nft_set_elem_mark_busy(&he->ext) || - !nft_is_active(net, &he->ext)) { - nft_set_elem_change_active(net, set, &he->ext); - return true; - } - return false; + nft_set_elem_change_active(net, set, &he->ext); + + return true; }
static void *nft_rhash_deactivate(const struct net *net, @@ -217,9 +218,8 @@ static void *nft_rhash_deactivate(const
rcu_read_lock(); he = rhashtable_lookup(&priv->ht, &arg, nft_rhash_params); - if (he != NULL && - !nft_rhash_flush(net, set, he)) - he = NULL; + if (he) + nft_set_elem_change_active(net, set, &he->ext);
rcu_read_unlock();
@@ -295,49 +295,77 @@ cont:
static void nft_rhash_gc(struct work_struct *work) { + struct nftables_pernet *nft_net; struct nft_set *set; struct nft_rhash_elem *he; struct nft_rhash *priv; - struct nft_set_gc_batch *gcb = NULL; struct rhashtable_iter hti; + struct nft_trans_gc *gc; + struct net *net; + u32 gc_seq;
priv = container_of(work, struct nft_rhash, gc_work.work); set = nft_set_container_of(priv); + net = read_pnet(&set->net); + nft_net = net_generic(net, nf_tables_net_id); + gc_seq = READ_ONCE(nft_net->gc_seq); + + gc = nft_trans_gc_alloc(set, gc_seq, GFP_KERNEL); + if (!gc) + goto done;
rhashtable_walk_enter(&priv->ht, &hti); rhashtable_walk_start(&hti);
while ((he = rhashtable_walk_next(&hti))) { if (IS_ERR(he)) { - if (PTR_ERR(he) != -EAGAIN) - break; + if (PTR_ERR(he) != -EAGAIN) { + nft_trans_gc_destroy(gc); + gc = NULL; + goto try_later; + } continue; }
+ /* Ruleset has been updated, try later. */ + if (READ_ONCE(nft_net->gc_seq) != gc_seq) { + nft_trans_gc_destroy(gc); + gc = NULL; + goto try_later; + } + + if (nft_set_elem_is_dead(&he->ext)) + goto dead_elem; + if (nft_set_ext_exists(&he->ext, NFT_SET_EXT_EXPR)) { struct nft_expr *expr = nft_set_ext_expr(&he->ext);
if (expr->ops->gc && expr->ops->gc(read_pnet(&set->net), expr)) - goto gc; + goto needs_gc_run; } if (!nft_set_elem_expired(&he->ext)) continue; -gc: - if (nft_set_elem_mark_busy(&he->ext)) - continue;
- gcb = nft_set_gc_batch_check(set, gcb, GFP_ATOMIC); - if (gcb == NULL) - break; - rhashtable_remove_fast(&priv->ht, &he->node, nft_rhash_params); - atomic_dec(&set->nelems); - nft_set_gc_batch_add(gcb, he); +needs_gc_run: + nft_set_elem_dead(&he->ext); +dead_elem: + gc = nft_trans_gc_queue_async(gc, gc_seq, GFP_ATOMIC); + if (!gc) + goto try_later; + + nft_trans_gc_elem_add(gc, he); } + +try_later: + /* catchall list iteration requires rcu read side lock. */ rhashtable_walk_stop(&hti); rhashtable_walk_exit(&hti);
- nft_set_gc_batch_complete(gcb); + if (gc) + nft_trans_gc_queue_async_done(gc); + +done: queue_delayed_work(system_power_efficient_wq, &priv->gc_work, nft_set_gc_interval(set)); } @@ -400,7 +428,6 @@ static void nft_rhash_destroy(const stru };
cancel_delayed_work_sync(&priv->gc_work); - rcu_barrier(); rhashtable_free_and_destroy(&priv->ht, nft_rhash_elem_destroy, (void *)&rhash_ctx); } --- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -14,6 +14,9 @@ #include <linux/netfilter.h> #include <linux/netfilter/nf_tables.h> #include <net/netfilter/nf_tables_core.h> +#include <net/netns/generic.h> + +extern unsigned int nf_tables_net_id;
struct nft_rbtree { struct rb_root root; @@ -46,6 +49,12 @@ static int nft_rbtree_cmp(const struct n set->klen); }
+static bool nft_rbtree_elem_expired(const struct nft_rbtree_elem *rbe) +{ + return nft_set_elem_expired(&rbe->ext) || + nft_set_elem_is_dead(&rbe->ext); +} + static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set, const u32 *key, const struct nft_set_ext **ext, unsigned int seq) @@ -80,7 +89,7 @@ static bool __nft_rbtree_lookup(const st continue; }
- if (nft_set_elem_expired(&rbe->ext)) + if (nft_rbtree_elem_expired(rbe)) return false;
if (nft_rbtree_interval_end(rbe)) { @@ -98,7 +107,7 @@ static bool __nft_rbtree_lookup(const st
if (set->flags & NFT_SET_INTERVAL && interval != NULL && nft_set_elem_active(&interval->ext, genmask) && - !nft_set_elem_expired(&interval->ext) && + !nft_rbtree_elem_expired(interval) && nft_rbtree_interval_start(interval)) { *ext = &interval->ext; return true; @@ -214,6 +223,18 @@ static void *nft_rbtree_get(const struct return rbe; }
+static void nft_rbtree_gc_remove(struct net *net, struct nft_set *set, + struct nft_rbtree *priv, + struct nft_rbtree_elem *rbe) +{ + struct nft_set_elem elem = { + .priv = rbe, + }; + + nft_setelem_data_deactivate(net, set, &elem); + rb_erase(&rbe->node, &priv->root); +} + static int nft_rbtree_gc_elem(const struct nft_set *__set, struct nft_rbtree *priv, struct nft_rbtree_elem *rbe, @@ -221,11 +242,12 @@ static int nft_rbtree_gc_elem(const stru { struct nft_set *set = (struct nft_set *)__set; struct rb_node *prev = rb_prev(&rbe->node); + struct net *net = read_pnet(&set->net); struct nft_rbtree_elem *rbe_prev; - struct nft_set_gc_batch *gcb; + struct nft_trans_gc *gc;
- gcb = nft_set_gc_batch_check(set, NULL, GFP_ATOMIC); - if (!gcb) + gc = nft_trans_gc_alloc(set, 0, GFP_ATOMIC); + if (!gc) return -ENOMEM;
/* search for end interval coming before this element. @@ -243,17 +265,28 @@ static int nft_rbtree_gc_elem(const stru
if (prev) { rbe_prev = rb_entry(prev, struct nft_rbtree_elem, node); + nft_rbtree_gc_remove(net, set, priv, rbe_prev);
- rb_erase(&rbe_prev->node, &priv->root); - atomic_dec(&set->nelems); - nft_set_gc_batch_add(gcb, rbe_prev); + /* There is always room in this trans gc for this element, + * memory allocation never actually happens, hence, the warning + * splat in such case. No need to set NFT_SET_ELEM_DEAD_BIT, + * this is synchronous gc which never fails. + */ + gc = nft_trans_gc_queue_sync(gc, GFP_ATOMIC); + if (WARN_ON_ONCE(!gc)) + return -ENOMEM; + + nft_trans_gc_elem_add(gc, rbe_prev); }
- rb_erase(&rbe->node, &priv->root); - atomic_dec(&set->nelems); + nft_rbtree_gc_remove(net, set, priv, rbe); + gc = nft_trans_gc_queue_sync(gc, GFP_ATOMIC); + if (WARN_ON_ONCE(!gc)) + return -ENOMEM; + + nft_trans_gc_elem_add(gc, rbe);
- nft_set_gc_batch_add(gcb, rbe); - nft_set_gc_batch_complete(gcb); + nft_trans_gc_queue_sync_done(gc);
return 0; } @@ -481,7 +514,6 @@ static void nft_rbtree_activate(const st struct nft_rbtree_elem *rbe = elem->priv;
nft_set_elem_change_active(net, set, &rbe->ext); - nft_set_elem_clear_busy(&rbe->ext); }
static bool nft_rbtree_flush(const struct net *net, @@ -489,12 +521,9 @@ static bool nft_rbtree_flush(const struc { struct nft_rbtree_elem *rbe = priv;
- if (!nft_set_elem_mark_busy(&rbe->ext) || - !nft_is_active(net, &rbe->ext)) { - nft_set_elem_change_active(net, set, &rbe->ext); - return true; - } - return false; + nft_set_elem_change_active(net, set, &rbe->ext); + + return true; }
static void *nft_rbtree_deactivate(const struct net *net, @@ -571,26 +600,40 @@ cont:
static void nft_rbtree_gc(struct work_struct *work) { - struct nft_rbtree_elem *rbe, *rbe_end = NULL, *rbe_prev = NULL; - struct nft_set_gc_batch *gcb = NULL; + struct nft_rbtree_elem *rbe, *rbe_end = NULL; + struct nftables_pernet *nft_net; struct nft_rbtree *priv; + struct nft_trans_gc *gc; struct rb_node *node; struct nft_set *set; + unsigned int gc_seq; struct net *net; - u8 genmask;
priv = container_of(work, struct nft_rbtree, gc_work.work); set = nft_set_container_of(priv); net = read_pnet(&set->net); - genmask = nft_genmask_cur(net); + nft_net = net_generic(net, nf_tables_net_id); + gc_seq = READ_ONCE(nft_net->gc_seq); + + gc = nft_trans_gc_alloc(set, gc_seq, GFP_KERNEL); + if (!gc) + goto done;
write_lock_bh(&priv->lock); write_seqcount_begin(&priv->count); for (node = rb_first(&priv->root); node != NULL; node = rb_next(node)) { + + /* Ruleset has been updated, try later. */ + if (READ_ONCE(nft_net->gc_seq) != gc_seq) { + nft_trans_gc_destroy(gc); + gc = NULL; + goto try_later; + } + rbe = rb_entry(node, struct nft_rbtree_elem, node);
- if (!nft_set_elem_active(&rbe->ext, genmask)) - continue; + if (nft_set_elem_is_dead(&rbe->ext)) + goto dead_elem;
/* elements are reversed in the rbtree for historical reasons, * from highest to lowest value, that is why end element is @@ -600,43 +643,38 @@ static void nft_rbtree_gc(struct work_st rbe_end = rbe; continue; } + if (!nft_set_elem_expired(&rbe->ext)) continue;
- if (nft_set_elem_mark_busy(&rbe->ext)) { - rbe_end = NULL; + nft_set_elem_dead(&rbe->ext); + + if (!rbe_end) continue; - }
- if (rbe_prev) { - rb_erase(&rbe_prev->node, &priv->root); - rbe_prev = NULL; - } - gcb = nft_set_gc_batch_check(set, gcb, GFP_ATOMIC); - if (!gcb) - break; + nft_set_elem_dead(&rbe_end->ext);
- atomic_dec(&set->nelems); - nft_set_gc_batch_add(gcb, rbe); - rbe_prev = rbe; - - if (rbe_end) { - atomic_dec(&set->nelems); - nft_set_gc_batch_add(gcb, rbe_end); - rb_erase(&rbe_end->node, &priv->root); - rbe_end = NULL; - } - node = rb_next(node); - if (!node) - break; + gc = nft_trans_gc_queue_async(gc, gc_seq, GFP_ATOMIC); + if (!gc) + goto try_later; + + nft_trans_gc_elem_add(gc, rbe_end); + rbe_end = NULL; +dead_elem: + gc = nft_trans_gc_queue_async(gc, gc_seq, GFP_ATOMIC); + if (!gc) + goto try_later; + + nft_trans_gc_elem_add(gc, rbe); } - if (rbe_prev) - rb_erase(&rbe_prev->node, &priv->root); + +try_later: write_seqcount_end(&priv->count); write_unlock_bh(&priv->lock);
- nft_set_gc_batch_complete(gcb); - + if (gc) + nft_trans_gc_queue_async_done(gc); +done: queue_delayed_work(system_power_efficient_wq, &priv->gc_work, nft_set_gc_interval(set)); }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit c92db3030492b8ad1d0faace7a93bbcf53850d0c upstream.
Set on the NFT_SET_ELEM_DEAD_BIT flag on this element, instead of performing element removal which might race with an ongoing transaction. Enable gc when dynamic flag is set on since dynset deletion requires garbage collection after this patch.
Fixes: d0a8d877da97 ("netfilter: nft_dynset: support for element deletion") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nft_set_hash.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/net/netfilter/nft_set_hash.c +++ b/net/netfilter/nft_set_hash.c @@ -251,7 +251,9 @@ static bool nft_rhash_delete(const struc if (he == NULL) return false;
- return rhashtable_remove_fast(&priv->ht, &he->node, nft_rhash_params) == 0; + nft_set_elem_dead(&he->ext); + + return true; }
static void nft_rhash_walk(const struct nft_ctx *ctx, struct nft_set *set, @@ -400,7 +402,7 @@ static int nft_rhash_init(const struct n return err;
INIT_DEFERRABLE_WORK(&priv->gc_work, nft_rhash_gc); - if (set->flags & NFT_SET_TIMEOUT) + if (set->flags & (NFT_SET_TIMEOUT | NFT_SET_EVAL)) nft_rhash_gc_init(set);
return 0;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit a2dd0233cbc4d8a0abb5f64487487ffc9265beb5 upstream.
Ditch it, it has been replace it by the GC transaction API and it has no clients anymore.
Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/netfilter/nf_tables.h | 97 +------------------------------------- net/netfilter/nf_tables_api.c | 28 ---------- 2 files changed, 5 insertions(+), 120 deletions(-)
--- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -672,62 +672,6 @@ void nft_set_elem_destroy(const struct n void nf_tables_set_elem_destroy(const struct nft_ctx *ctx, const struct nft_set *set, void *elem);
-/** - * struct nft_set_gc_batch_head - nf_tables set garbage collection batch - * - * @rcu: rcu head - * @set: set the elements belong to - * @cnt: count of elements - */ -struct nft_set_gc_batch_head { - struct rcu_head rcu; - const struct nft_set *set; - unsigned int cnt; -}; - -#define NFT_SET_GC_BATCH_SIZE ((PAGE_SIZE - \ - sizeof(struct nft_set_gc_batch_head)) / \ - sizeof(void *)) - -/** - * struct nft_set_gc_batch - nf_tables set garbage collection batch - * - * @head: GC batch head - * @elems: garbage collection elements - */ -struct nft_set_gc_batch { - struct nft_set_gc_batch_head head; - void *elems[NFT_SET_GC_BATCH_SIZE]; -}; - -struct nft_set_gc_batch *nft_set_gc_batch_alloc(const struct nft_set *set, - gfp_t gfp); -void nft_set_gc_batch_release(struct rcu_head *rcu); - -static inline void nft_set_gc_batch_complete(struct nft_set_gc_batch *gcb) -{ - if (gcb != NULL) - call_rcu(&gcb->head.rcu, nft_set_gc_batch_release); -} - -static inline struct nft_set_gc_batch * -nft_set_gc_batch_check(const struct nft_set *set, struct nft_set_gc_batch *gcb, - gfp_t gfp) -{ - if (gcb != NULL) { - if (gcb->head.cnt + 1 < ARRAY_SIZE(gcb->elems)) - return gcb; - nft_set_gc_batch_complete(gcb); - } - return nft_set_gc_batch_alloc(set, gfp); -} - -static inline void nft_set_gc_batch_add(struct nft_set_gc_batch *gcb, - void *elem) -{ - gcb->elems[gcb->head.cnt++] = elem; -} - struct nft_expr_ops; /** * struct nft_expr_type - nf_tables expression type @@ -1354,47 +1298,12 @@ static inline void nft_set_elem_change_a
#endif /* IS_ENABLED(CONFIG_NF_TABLES) */
-/* - * We use a free bit in the genmask field to indicate the element - * is busy, meaning it is currently being processed either by - * the netlink API or GC. - * - * Even though the genmask is only a single byte wide, this works - * because the extension structure if fully constant once initialized, - * so there are no non-atomic write accesses unless it is already - * marked busy. - */ -#define NFT_SET_ELEM_BUSY_MASK (1 << 2) - -#if defined(__LITTLE_ENDIAN_BITFIELD) -#define NFT_SET_ELEM_BUSY_BIT 2 -#elif defined(__BIG_ENDIAN_BITFIELD) -#define NFT_SET_ELEM_BUSY_BIT (BITS_PER_LONG - BITS_PER_BYTE + 2) -#else -#error -#endif - -static inline int nft_set_elem_mark_busy(struct nft_set_ext *ext) -{ - unsigned long *word = (unsigned long *)ext; - - BUILD_BUG_ON(offsetof(struct nft_set_ext, genmask) != 0); - return test_and_set_bit(NFT_SET_ELEM_BUSY_BIT, word); -} - -static inline void nft_set_elem_clear_busy(struct nft_set_ext *ext) -{ - unsigned long *word = (unsigned long *)ext; - - clear_bit(NFT_SET_ELEM_BUSY_BIT, word); -} - -#define NFT_SET_ELEM_DEAD_MASK (1 << 3) +#define NFT_SET_ELEM_DEAD_MASK (1 << 2)
#if defined(__LITTLE_ENDIAN_BITFIELD) -#define NFT_SET_ELEM_DEAD_BIT 3 +#define NFT_SET_ELEM_DEAD_BIT 2 #elif defined(__BIG_ENDIAN_BITFIELD) -#define NFT_SET_ELEM_DEAD_BIT (BITS_PER_LONG - BITS_PER_BYTE + 3) +#define NFT_SET_ELEM_DEAD_BIT (BITS_PER_LONG - BITS_PER_BYTE + 2) #else #error #endif --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -4879,7 +4879,8 @@ static int nft_add_set_elem(struct nft_c if (trans == NULL) goto err4;
- ext->genmask = nft_genmask_cur(ctx->net) | NFT_SET_ELEM_BUSY_MASK; + ext->genmask = nft_genmask_cur(ctx->net); + err = set->ops->insert(ctx->net, set, &elem, &ext2); if (err) { if (err == -EEXIST) { @@ -5172,31 +5173,6 @@ static int nf_tables_delsetelem(struct n return err; }
-void nft_set_gc_batch_release(struct rcu_head *rcu) -{ - struct nft_set_gc_batch *gcb; - unsigned int i; - - gcb = container_of(rcu, struct nft_set_gc_batch, head.rcu); - for (i = 0; i < gcb->head.cnt; i++) - nft_set_elem_destroy(gcb->head.set, gcb->elems[i], true); - kfree(gcb); -} -EXPORT_SYMBOL_GPL(nft_set_gc_batch_release); - -struct nft_set_gc_batch *nft_set_gc_batch_alloc(const struct nft_set *set, - gfp_t gfp) -{ - struct nft_set_gc_batch *gcb; - - gcb = kzalloc(sizeof(*gcb), gfp); - if (gcb == NULL) - return gcb; - gcb->head.set = set; - return gcb; -} -EXPORT_SYMBOL_GPL(nft_set_gc_batch_alloc); - /* * Stateful objects */
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit 6a33d8b73dfac0a41f3877894b38082bd0c9a5bc upstream.
Netlink event path is missing a synchronization point with GC transactions. Add GC sequence number update to netns release path and netlink event path, any GC transaction losing race will be discarded.
Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 29 +++++++++++++++++++++++++---- 1 file changed, 25 insertions(+), 4 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7178,6 +7178,22 @@ static void nf_tables_commit_release(str mutex_unlock(&nft_net->commit_mutex); }
+static unsigned int nft_gc_seq_begin(struct nftables_pernet *nft_net) +{ + unsigned int gc_seq; + + /* Bump gc counter, it becomes odd, this is the busy mark. */ + gc_seq = READ_ONCE(nft_net->gc_seq); + WRITE_ONCE(nft_net->gc_seq, ++gc_seq); + + return gc_seq; +} + +static void nft_gc_seq_end(struct nftables_pernet *nft_net, unsigned int gc_seq) +{ + WRITE_ONCE(nft_net->gc_seq, ++gc_seq); +} + static int nf_tables_commit(struct net *net, struct sk_buff *skb) { struct nftables_pernet *nft_net = net_generic(net, nf_tables_net_id); @@ -7242,9 +7258,7 @@ static int nf_tables_commit(struct net * while (++nft_net->base_seq == 0) ;
- /* Bump gc counter, it becomes odd, this is the busy mark. */ - gc_seq = READ_ONCE(nft_net->gc_seq); - WRITE_ONCE(nft_net->gc_seq, ++gc_seq); + gc_seq = nft_gc_seq_begin(nft_net);
/* step 3. Start new generation, rules_gen_X now in use. */ net->nft.gencursor = nft_gencursor_next(net); @@ -7386,7 +7400,7 @@ static int nf_tables_commit(struct net *
nf_tables_gen_notify(net, skb, NFT_MSG_NEWGEN);
- WRITE_ONCE(nft_net->gc_seq, ++gc_seq); + nft_gc_seq_end(nft_net, gc_seq); nf_tables_commit_release(net);
return 0; @@ -8256,11 +8270,18 @@ static void __net_exit nf_tables_pre_exi static void __net_exit nf_tables_exit_net(struct net *net) { struct nftables_pernet *nft_net = net_generic(net, nf_tables_net_id); + unsigned int gc_seq;
mutex_lock(&nft_net->commit_mutex); + + gc_seq = nft_gc_seq_begin(nft_net); + if (!list_empty(&nft_net->commit_list)) __nf_tables_abort(net, NFNL_ABORT_NONE); __nft_release_tables(net); + + nft_gc_seq_end(nft_net, gc_seq); + mutex_unlock(&nft_net->commit_mutex); WARN_ON_ONCE(!list_empty(&nft_net->tables)); WARN_ON_ONCE(!list_empty(&nft_net->module_list));
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit 02c6c24402bf1c1e986899c14ba22a10b510916b upstream.
Use maybe_get_net() since GC workqueue might race with netns exit path.
Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7052,9 +7052,14 @@ struct nft_trans_gc *nft_trans_gc_alloc( if (!trans) return NULL;
+ trans->net = maybe_get_net(net); + if (!trans->net) { + kfree(trans); + return NULL; + } + refcount_inc(&set->refs); trans->set = set; - trans->net = get_net(net); trans->seq = gc_seq;
return trans;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit 720344340fb9be2765bbaab7b292ece0a4570eae upstream.
Abort path is missing a synchronization point with GC transactions. Add GC sequence number hence any GC transaction losing race will be discarded.
Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7597,7 +7597,12 @@ static int nf_tables_abort(struct net *n enum nfnl_abort_action action) { struct nftables_pernet *nft_net = net_generic(net, nf_tables_net_id); - int ret = __nf_tables_abort(net, action); + unsigned int gc_seq; + int ret; + + gc_seq = nft_gc_seq_begin(nft_net); + ret = __nf_tables_abort(net, action); + nft_gc_seq_end(nft_net, gc_seq);
mutex_unlock(&nft_net->commit_mutex);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit 8357bc946a2abc2a10ca40e5a2105d2b4c57515e upstream.
Use nf_tables_gc_list_lock spinlock, not nf_tables_destroy_list_lock to protect the gc_list.
Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7028,9 +7028,9 @@ static void nft_trans_gc_work(struct wor struct nft_trans_gc *trans, *next; LIST_HEAD(trans_gc_list);
- spin_lock(&nf_tables_destroy_list_lock); + spin_lock(&nf_tables_gc_list_lock); list_splice_init(&nf_tables_gc_list, &trans_gc_list); - spin_unlock(&nf_tables_destroy_list_lock); + spin_unlock(&nf_tables_gc_list_lock);
list_for_each_entry_safe(trans, next, &trans_gc_list, list) { list_del(&trans->list);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
commit 8e51830e29e12670b4c10df070a4ea4c9593e961 upstream.
Don't queue more gc work, else we may queue the same elements multiple times.
If an element is flagged as dead, this can mean that either the previous gc request was invalidated/discarded by a transaction or that the previous request is still pending in the system work queue.
The latter will happen if the gc interval is set to a very low value, e.g. 1ms, and system work queue is backlogged.
The sets refcount is 1 if no previous gc requeusts are queued, so add a helper for this and skip gc run if old requests are pending.
Add a helper for this and skip the gc run in this case.
Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/netfilter/nf_tables.h | 5 +++++ net/netfilter/nft_set_hash.c | 3 +++ net/netfilter/nft_set_rbtree.c | 3 +++ 3 files changed, 11 insertions(+)
--- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -467,6 +467,11 @@ static inline void *nft_set_priv(const s return (void *)set->data; }
+static inline bool nft_set_gc_is_pending(const struct nft_set *s) +{ + return refcount_read(&s->refs) != 1; +} + static inline struct nft_set *nft_set_container_of(const void *priv) { return (void *)priv - offsetof(struct nft_set, data); --- a/net/netfilter/nft_set_hash.c +++ b/net/netfilter/nft_set_hash.c @@ -312,6 +312,9 @@ static void nft_rhash_gc(struct work_str nft_net = net_generic(net, nf_tables_net_id); gc_seq = READ_ONCE(nft_net->gc_seq);
+ if (nft_set_gc_is_pending(set)) + goto done; + gc = nft_trans_gc_alloc(set, gc_seq, GFP_KERNEL); if (!gc) goto done; --- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -615,6 +615,9 @@ static void nft_rbtree_gc(struct work_st nft_net = net_generic(net, nf_tables_net_id); gc_seq = READ_ONCE(nft_net->gc_seq);
+ if (nft_set_gc_is_pending(set)) + goto done; + gc = nft_trans_gc_alloc(set, gc_seq, GFP_KERNEL); if (!gc) goto done;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit 2ee52ae94baabf7ee09cf2a8d854b990dac5d0e4 upstream.
New elements in this transaction might expired before such transaction ends. Skip sync GC for such elements otherwise commit path might walk over an already released object. Once transaction is finished, async GC will collect such expired element.
Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nft_set_rbtree.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -314,6 +314,7 @@ static int __nft_rbtree_insert(const str struct nft_rbtree_elem *rbe, *rbe_le = NULL, *rbe_ge = NULL; struct rb_node *node, *next, *parent, **p, *first = NULL; struct nft_rbtree *priv = nft_set_priv(set); + u8 cur_genmask = nft_genmask_cur(net); u8 genmask = nft_genmask_next(net); int d, err;
@@ -359,8 +360,11 @@ static int __nft_rbtree_insert(const str if (!nft_set_elem_active(&rbe->ext, genmask)) continue;
- /* perform garbage collection to avoid bogus overlap reports. */ - if (nft_set_elem_expired(&rbe->ext)) { + /* perform garbage collection to avoid bogus overlap reports + * but skip new elements in this transaction. + */ + if (nft_set_elem_expired(&rbe->ext) && + nft_set_elem_active(&rbe->ext, cur_genmask)) { err = nft_rbtree_gc_elem(set, priv, rbe, genmask); if (err < 0) return err;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit 96b33300fba880ec0eafcf3d82486f3463b4b6da upstream.
rbtree GC does not modify the datastructure, instead it collects expired elements and it enqueues a GC transaction. Use a read spinlock instead to avoid data contention while GC worker is running.
Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nft_set_rbtree.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -626,8 +626,7 @@ static void nft_rbtree_gc(struct work_st if (!gc) goto done;
- write_lock_bh(&priv->lock); - write_seqcount_begin(&priv->count); + read_lock_bh(&priv->lock); for (node = rb_first(&priv->root); node != NULL; node = rb_next(node)) {
/* Ruleset has been updated, try later. */ @@ -676,8 +675,7 @@ dead_elem: }
try_later: - write_seqcount_end(&priv->count); - write_unlock_bh(&priv->lock); + read_unlock_bh(&priv->lock);
if (gc) nft_trans_gc_queue_async_done(gc);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit b079155faae94e9b3ab9337e82100a914ebb4e8d upstream.
Skip GC run if iterator rewinds to the beginning with EAGAIN, otherwise GC might collect the same element more than once.
Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nft_set_hash.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
--- a/net/netfilter/nft_set_hash.c +++ b/net/netfilter/nft_set_hash.c @@ -324,12 +324,9 @@ static void nft_rhash_gc(struct work_str
while ((he = rhashtable_walk_next(&hti))) { if (IS_ERR(he)) { - if (PTR_ERR(he) != -EAGAIN) { - nft_trans_gc_destroy(gc); - gc = NULL; - goto try_later; - } - continue; + nft_trans_gc_destroy(gc); + gc = NULL; + goto try_later; }
/* Ruleset has been updated, try later. */
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit cf5000a7787cbc10341091d37245a42c119d26c5 upstream.
When more than 255 elements expired we're supposed to switch to a new gc container structure.
This never happens: u8 type will wrap before reaching the boundary and nft_trans_gc_space() always returns true.
This means we recycle the initial gc container structure and lose track of the elements that came before.
While at it, don't deref 'gc' after we've passed it to call_rcu.
Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Reported-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/netfilter/nf_tables.h | 2 +- net/netfilter/nf_tables_api.c | 10 ++++++++-- 2 files changed, 9 insertions(+), 3 deletions(-)
--- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -1440,7 +1440,7 @@ struct nft_trans_gc { struct net *net; struct nft_set *set; u32 seq; - u8 count; + u16 count; void *priv[NFT_TRANS_GC_BATCHCOUNT]; struct rcu_head rcu; }; --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7089,12 +7089,15 @@ static int nft_trans_gc_space(struct nft struct nft_trans_gc *nft_trans_gc_queue_async(struct nft_trans_gc *gc, unsigned int gc_seq, gfp_t gfp) { + struct nft_set *set; + if (nft_trans_gc_space(gc)) return gc;
+ set = gc->set; nft_trans_gc_queue_work(gc);
- return nft_trans_gc_alloc(gc->set, gc_seq, gfp); + return nft_trans_gc_alloc(set, gc_seq, gfp); } EXPORT_SYMBOL_GPL(nft_trans_gc_queue_async);
@@ -7111,15 +7114,18 @@ EXPORT_SYMBOL_GPL(nft_trans_gc_queue_asy
struct nft_trans_gc *nft_trans_gc_queue_sync(struct nft_trans_gc *gc, gfp_t gfp) { + struct nft_set *set; + if (WARN_ON_ONCE(!lockdep_commit_lock_is_held(gc->net))) return NULL;
if (nft_trans_gc_space(gc)) return gc;
+ set = gc->set; call_rcu(&gc->rcu, nft_trans_gc_trans_free);
- return nft_trans_gc_alloc(gc->set, 0, gfp); + return nft_trans_gc_alloc(set, 0, gfp); } EXPORT_SYMBOL_GPL(nft_trans_gc_queue_sync);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit 6069da443bf65f513bb507bb21e2f87cfb1ad0b6 upstream.
Unregister flowtable hooks before they are releases via nf_tables_flowtable_destroy() otherwise hook core reports UAF.
BUG: KASAN: use-after-free in nf_hook_entries_grow+0x5a7/0x700 net/netfilter/core.c:142 net/netfilter/core.c:142 Read of size 4 at addr ffff8880736f7438 by task syz-executor579/3666
CPU: 0 PID: 3666 Comm: syz-executor579 Not tainted 5.16.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] __dump_stack lib/dump_stack.c:88 [inline] lib/dump_stack.c:106 dump_stack_lvl+0x1dc/0x2d8 lib/dump_stack.c:106 lib/dump_stack.c:106 print_address_description+0x65/0x380 mm/kasan/report.c:247 mm/kasan/report.c:247 __kasan_report mm/kasan/report.c:433 [inline] __kasan_report mm/kasan/report.c:433 [inline] mm/kasan/report.c:450 kasan_report+0x19a/0x1f0 mm/kasan/report.c:450 mm/kasan/report.c:450 nf_hook_entries_grow+0x5a7/0x700 net/netfilter/core.c:142 net/netfilter/core.c:142 __nf_register_net_hook+0x27e/0x8d0 net/netfilter/core.c:429 net/netfilter/core.c:429 nf_register_net_hook+0xaa/0x180 net/netfilter/core.c:571 net/netfilter/core.c:571 nft_register_flowtable_net_hooks+0x3c5/0x730 net/netfilter/nf_tables_api.c:7232 net/netfilter/nf_tables_api.c:7232 nf_tables_newflowtable+0x2022/0x2cf0 net/netfilter/nf_tables_api.c:7430 net/netfilter/nf_tables_api.c:7430 nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline] nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline] net/netfilter/nfnetlink.c:652 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] net/netfilter/nfnetlink.c:652 nfnetlink_rcv+0x10e6/0x2550 net/netfilter/nfnetlink.c:652 net/netfilter/nfnetlink.c:652
__nft_release_hook() calls nft_unregister_flowtable_net_hooks() which only unregisters the hooks, then after RCU grace period, it is guaranteed that no packets add new entries to the flowtable (no flow offload rules and flowtable hooks are reachable from packet path), so it is safe to call nf_flow_table_free() which cleans up the remaining entries from the flowtable (both software and hardware) and it unbinds the flow_block.
Fixes: ff4bf2f42a40 ("netfilter: nf_tables: add nft_unregister_flowtable_hook()") Reported-by: syzbot+e918523f77e62790d6d9@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -8186,16 +8186,24 @@ int __nft_release_basechain(struct nft_c } EXPORT_SYMBOL_GPL(__nft_release_basechain);
+static void __nft_release_hook(struct net *net, struct nft_table *table) +{ + struct nft_flowtable *flowtable; + struct nft_chain *chain; + + list_for_each_entry(chain, &table->chains, list) + nf_tables_unregister_hook(net, table, chain); + list_for_each_entry(flowtable, &table->flowtables, list) + nft_unregister_flowtable_net_hooks(net, flowtable); +} + static void __nft_release_hooks(struct net *net) { struct nftables_pernet *nft_net = net_generic(net, nf_tables_net_id); struct nft_table *table; - struct nft_chain *chain;
- list_for_each_entry(table, &nft_net->tables, list) { - list_for_each_entry(chain, &table->chains, list) - nf_tables_unregister_hook(net, table, chain); - } + list_for_each_entry(table, &nft_net->tables, list) + __nft_release_hook(net, table); }
static void __nft_release_table(struct net *net, struct nft_table *table)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit f9a43007d3f7ba76d5e7f9421094f00f2ef202f8 upstream.
__nft_release_hooks() is called from pre_netns exit path which unregisters the hooks, then the NETDEV_UNREGISTER event is triggered which unregisters the hooks again.
[ 565.221461] WARNING: CPU: 18 PID: 193 at net/netfilter/core.c:495 __nf_unregister_net_hook+0x247/0x270 [...] [ 565.246890] CPU: 18 PID: 193 Comm: kworker/u64:1 Tainted: G E 5.18.0-rc7+ #27 [ 565.253682] Workqueue: netns cleanup_net [ 565.257059] RIP: 0010:__nf_unregister_net_hook+0x247/0x270 [...] [ 565.297120] Call Trace: [ 565.300900] <TASK> [ 565.304683] nf_tables_flowtable_event+0x16a/0x220 [nf_tables] [ 565.308518] raw_notifier_call_chain+0x63/0x80 [ 565.312386] unregister_netdevice_many+0x54f/0xb50
Unregister and destroy netdev hook from netns pre_exit via kfree_rcu so the NETDEV_UNREGISTER path see unregistered hooks.
Fixes: 767d1216bff8 ("netfilter: nftables: fix possible UAF over chains from packet path in netns") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 34 +++++++++++++++++++++++++++------- net/netfilter/nft_chain_filter.c | 3 +++ 2 files changed, 30 insertions(+), 7 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -219,9 +219,10 @@ static int nf_tables_register_hook(struc return nf_register_net_hook(net, ops); }
-static void nf_tables_unregister_hook(struct net *net, - const struct nft_table *table, - struct nft_chain *chain) +static void __nf_tables_unregister_hook(struct net *net, + const struct nft_table *table, + struct nft_chain *chain, + bool release_netdev) { const struct nft_base_chain *basechain; const struct nf_hook_ops *ops; @@ -236,6 +237,16 @@ static void nf_tables_unregister_hook(st return basechain->type->ops_unregister(net, ops);
nf_unregister_net_hook(net, ops); + if (release_netdev && + table->family == NFPROTO_NETDEV) + nft_base_chain(chain)->ops.dev = NULL; +} + +static void nf_tables_unregister_hook(struct net *net, + const struct nft_table *table, + struct nft_chain *chain) +{ + __nf_tables_unregister_hook(net, table, chain, false); }
static int nft_trans_table_add(struct nft_ctx *ctx, int msg_type) @@ -5997,8 +6008,9 @@ nft_flowtable_type_get(struct net *net, return ERR_PTR(-ENOENT); }
-static void nft_unregister_flowtable_net_hooks(struct net *net, - struct nft_flowtable *flowtable) +static void __nft_unregister_flowtable_net_hooks(struct net *net, + struct nft_flowtable *flowtable, + bool release_netdev) { int i;
@@ -6007,9 +6019,17 @@ static void nft_unregister_flowtable_net continue;
nf_unregister_net_hook(net, &flowtable->ops[i]); + if (release_netdev) + flowtable->ops[i].dev = NULL; } }
+static void nft_unregister_flowtable_net_hooks(struct net *net, + struct nft_flowtable *flowtable) +{ + __nft_unregister_flowtable_net_hooks(net, flowtable, false); +} + static int nf_tables_newflowtable(struct net *net, struct sock *nlsk, struct sk_buff *skb, const struct nlmsghdr *nlh, @@ -8192,9 +8212,9 @@ static void __nft_release_hook(struct ne struct nft_chain *chain;
list_for_each_entry(chain, &table->chains, list) - nf_tables_unregister_hook(net, table, chain); + __nf_tables_unregister_hook(net, table, chain, true); list_for_each_entry(flowtable, &table->flowtables, list) - nft_unregister_flowtable_net_hooks(net, flowtable); + __nft_unregister_flowtable_net_hooks(net, flowtable, true); }
static void __nft_release_hooks(struct net *net) --- a/net/netfilter/nft_chain_filter.c +++ b/net/netfilter/nft_chain_filter.c @@ -296,6 +296,9 @@ static void nft_netdev_event(unsigned lo if (strcmp(basechain->dev_name, dev->name) != 0) return;
+ if (!basechain->ops.dev) + return; + /* UNREGISTER events are also happpening on netns exit. * * Altough nf_tables core releases all tables/chains, only
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit 0ce7cf4127f14078ca598ba9700d813178a59409 upstream.
Do not update table flags from the preparation phase. Store the flags update into the transaction, then update the flags from the commit phase.
Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/netfilter/nf_tables.h | 9 ++++++--- net/netfilter/nf_tables_api.c | 31 ++++++++++++++++--------------- 2 files changed, 22 insertions(+), 18 deletions(-)
--- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -1392,13 +1392,16 @@ struct nft_trans_chain {
struct nft_trans_table { bool update; - bool enable; + u8 state; + u32 flags; };
#define nft_trans_table_update(trans) \ (((struct nft_trans_table *)trans->data)->update) -#define nft_trans_table_enable(trans) \ - (((struct nft_trans_table *)trans->data)->enable) +#define nft_trans_table_state(trans) \ + (((struct nft_trans_table *)trans->data)->state) +#define nft_trans_table_flags(trans) \ + (((struct nft_trans_table *)trans->data)->flags)
struct nft_trans_elem { struct nft_set *set; --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -893,6 +893,12 @@ static void nf_tables_table_disable(stru nft_table_disable(net, table, 0); }
+enum { + NFT_TABLE_STATE_UNCHANGED = 0, + NFT_TABLE_STATE_DORMANT, + NFT_TABLE_STATE_WAKEUP +}; + static int nf_tables_updtable(struct nft_ctx *ctx) { struct nft_trans *trans; @@ -916,19 +922,17 @@ static int nf_tables_updtable(struct nft
if ((flags & NFT_TABLE_F_DORMANT) && !(ctx->table->flags & NFT_TABLE_F_DORMANT)) { - nft_trans_table_enable(trans) = false; + nft_trans_table_state(trans) = NFT_TABLE_STATE_DORMANT; } else if (!(flags & NFT_TABLE_F_DORMANT) && ctx->table->flags & NFT_TABLE_F_DORMANT) { - ctx->table->flags &= ~NFT_TABLE_F_DORMANT; ret = nf_tables_table_enable(ctx->net, ctx->table); if (ret >= 0) - nft_trans_table_enable(trans) = true; - else - ctx->table->flags |= NFT_TABLE_F_DORMANT; + nft_trans_table_state(trans) = NFT_TABLE_STATE_WAKEUP; } if (ret < 0) goto err;
+ nft_trans_table_flags(trans) = flags; nft_trans_table_update(trans) = true; nft_trans_commit_list_add_tail(ctx->net, trans); return 0; @@ -7298,11 +7302,10 @@ static int nf_tables_commit(struct net * switch (trans->msg_type) { case NFT_MSG_NEWTABLE: if (nft_trans_table_update(trans)) { - if (!nft_trans_table_enable(trans)) { - nf_tables_table_disable(net, - trans->ctx.table); - trans->ctx.table->flags |= NFT_TABLE_F_DORMANT; - } + if (nft_trans_table_state(trans) == NFT_TABLE_STATE_DORMANT) + nf_tables_table_disable(net, trans->ctx.table); + + trans->ctx.table->flags = nft_trans_table_flags(trans); } else { nft_clear(net, trans->ctx.table); } @@ -7497,11 +7500,9 @@ static int __nf_tables_abort(struct net switch (trans->msg_type) { case NFT_MSG_NEWTABLE: if (nft_trans_table_update(trans)) { - if (nft_trans_table_enable(trans)) { - nf_tables_table_disable(net, - trans->ctx.table); - trans->ctx.table->flags |= NFT_TABLE_F_DORMANT; - } + if (nft_trans_table_state(trans) == NFT_TABLE_STATE_WAKEUP) + nf_tables_table_disable(net, trans->ctx.table); + nft_trans_destroy(trans); } else { list_del_rcu(&trans->ctx.table->list);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit 179d9ba5559a756f4322583388b3213fe4e391b0 upstream.
The dormant flag need to be updated from the preparation phase, otherwise, two consecutive requests to dorm a table in the same batch might try to remove the same hooks twice, resulting in the following warning:
hook not found, pf 3 num 0 WARNING: CPU: 0 PID: 334 at net/netfilter/core.c:480 __nf_unregister_net_hook+0x1eb/0x610 net/netfilter/core.c:480 Modules linked in: CPU: 0 PID: 334 Comm: kworker/u4:5 Not tainted 5.12.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net RIP: 0010:__nf_unregister_net_hook+0x1eb/0x610 net/netfilter/core.c:480
This patch is a partial revert of 0ce7cf4127f1 ("netfilter: nftables: update table flags from the commit phase") to restore the previous behaviour.
However, there is still another problem: A batch containing a series of dorm-wakeup-dorm table and vice-versa also trigger the warning above since hook unregistration happens from the preparation phase, while hook registration occurs from the commit phase.
To fix this problem, this patch adds two internal flags to annotate the original dormant flag status which are __NFT_TABLE_F_WAS_DORMANT and __NFT_TABLE_F_WAS_AWAKEN, to restore it from the abort path.
The __NFT_TABLE_F_UPDATE bitmask allows to handle the dormant flag update with one single transaction.
Reported-by: syzbot+7ad5cd1615f2d89c6e7e@syzkaller.appspotmail.com Fixes: 0ce7cf4127f1 ("netfilter: nftables: update table flags from the commit phase") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/netfilter/nf_tables.h | 6 --- include/uapi/linux/netfilter/nf_tables.h | 1 net/netfilter/nf_tables_api.c | 59 +++++++++++++++++++++---------- 3 files changed, 41 insertions(+), 25 deletions(-)
--- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -1392,16 +1392,10 @@ struct nft_trans_chain {
struct nft_trans_table { bool update; - u8 state; - u32 flags; };
#define nft_trans_table_update(trans) \ (((struct nft_trans_table *)trans->data)->update) -#define nft_trans_table_state(trans) \ - (((struct nft_trans_table *)trans->data)->state) -#define nft_trans_table_flags(trans) \ - (((struct nft_trans_table *)trans->data)->flags)
struct nft_trans_elem { struct nft_set *set; --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -162,6 +162,7 @@ enum nft_hook_attributes { enum nft_table_flags { NFT_TABLE_F_DORMANT = 0x1, }; +#define NFT_TABLE_F_MASK (NFT_TABLE_F_DORMANT)
/** * enum nft_table_attributes - nf_tables table netlink attributes --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -701,7 +701,8 @@ static int nf_tables_fill_table_info(str goto nla_put_failure;
if (nla_put_string(skb, NFTA_TABLE_NAME, table->name) || - nla_put_be32(skb, NFTA_TABLE_FLAGS, htonl(table->flags)) || + nla_put_be32(skb, NFTA_TABLE_FLAGS, + htonl(table->flags & NFT_TABLE_F_MASK)) || nla_put_be32(skb, NFTA_TABLE_USE, htonl(table->use)) || nla_put_be64(skb, NFTA_TABLE_HANDLE, cpu_to_be64(table->handle), NFTA_TABLE_PAD)) @@ -890,20 +891,22 @@ err:
static void nf_tables_table_disable(struct net *net, struct nft_table *table) { + table->flags &= ~NFT_TABLE_F_DORMANT; nft_table_disable(net, table, 0); + table->flags |= NFT_TABLE_F_DORMANT; }
-enum { - NFT_TABLE_STATE_UNCHANGED = 0, - NFT_TABLE_STATE_DORMANT, - NFT_TABLE_STATE_WAKEUP -}; +#define __NFT_TABLE_F_INTERNAL (NFT_TABLE_F_MASK + 1) +#define __NFT_TABLE_F_WAS_DORMANT (__NFT_TABLE_F_INTERNAL << 0) +#define __NFT_TABLE_F_WAS_AWAKEN (__NFT_TABLE_F_INTERNAL << 1) +#define __NFT_TABLE_F_UPDATE (__NFT_TABLE_F_WAS_DORMANT | \ + __NFT_TABLE_F_WAS_AWAKEN)
static int nf_tables_updtable(struct nft_ctx *ctx) { struct nft_trans *trans; u32 flags; - int ret = 0; + int ret;
if (!ctx->nla[NFTA_TABLE_FLAGS]) return 0; @@ -922,21 +925,27 @@ static int nf_tables_updtable(struct nft
if ((flags & NFT_TABLE_F_DORMANT) && !(ctx->table->flags & NFT_TABLE_F_DORMANT)) { - nft_trans_table_state(trans) = NFT_TABLE_STATE_DORMANT; + ctx->table->flags |= NFT_TABLE_F_DORMANT; + if (!(ctx->table->flags & __NFT_TABLE_F_UPDATE)) + ctx->table->flags |= __NFT_TABLE_F_WAS_AWAKEN; } else if (!(flags & NFT_TABLE_F_DORMANT) && ctx->table->flags & NFT_TABLE_F_DORMANT) { - ret = nf_tables_table_enable(ctx->net, ctx->table); - if (ret >= 0) - nft_trans_table_state(trans) = NFT_TABLE_STATE_WAKEUP; + ctx->table->flags &= ~NFT_TABLE_F_DORMANT; + if (!(ctx->table->flags & __NFT_TABLE_F_UPDATE)) { + ret = nf_tables_table_enable(ctx->net, ctx->table); + if (ret < 0) + goto err_register_hooks; + + ctx->table->flags |= __NFT_TABLE_F_WAS_DORMANT; + } } - if (ret < 0) - goto err;
- nft_trans_table_flags(trans) = flags; nft_trans_table_update(trans) = true; nft_trans_commit_list_add_tail(ctx->net, trans); + return 0; -err: + +err_register_hooks: nft_trans_destroy(trans); return ret; } @@ -7302,10 +7311,14 @@ static int nf_tables_commit(struct net * switch (trans->msg_type) { case NFT_MSG_NEWTABLE: if (nft_trans_table_update(trans)) { - if (nft_trans_table_state(trans) == NFT_TABLE_STATE_DORMANT) + if (!(trans->ctx.table->flags & __NFT_TABLE_F_UPDATE)) { + nft_trans_destroy(trans); + break; + } + if (trans->ctx.table->flags & NFT_TABLE_F_DORMANT) nf_tables_table_disable(net, trans->ctx.table);
- trans->ctx.table->flags = nft_trans_table_flags(trans); + trans->ctx.table->flags &= ~__NFT_TABLE_F_UPDATE; } else { nft_clear(net, trans->ctx.table); } @@ -7500,9 +7513,17 @@ static int __nf_tables_abort(struct net switch (trans->msg_type) { case NFT_MSG_NEWTABLE: if (nft_trans_table_update(trans)) { - if (nft_trans_table_state(trans) == NFT_TABLE_STATE_WAKEUP) + if (!(trans->ctx.table->flags & __NFT_TABLE_F_UPDATE)) { + nft_trans_destroy(trans); + break; + } + if (trans->ctx.table->flags & __NFT_TABLE_F_WAS_DORMANT) { nf_tables_table_disable(net, trans->ctx.table); - + trans->ctx.table->flags |= NFT_TABLE_F_DORMANT; + } else if (trans->ctx.table->flags & __NFT_TABLE_F_WAS_AWAKEN) { + trans->ctx.table->flags &= ~NFT_TABLE_F_DORMANT; + } + trans->ctx.table->flags &= ~__NFT_TABLE_F_UPDATE; nft_trans_destroy(trans); } else { list_del_rcu(&trans->ctx.table->list);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit c9bd26513b3a11b3adb3c2ed8a31a01a87173ff1 upstream.
nft -f -<<EOF add table ip t add table ip t { flags dormant; } add chain ip t c { type filter hook input priority 0; } add table ip t EOF
Triggers a splat from nf core on next table delete because we lose track of right hook register state:
WARNING: CPU: 2 PID: 1597 at net/netfilter/core.c:501 __nf_unregister_net_hook RIP: 0010:__nf_unregister_net_hook+0x41b/0x570 nf_unregister_net_hook+0xb4/0xf0 __nf_tables_unregister_hook+0x160/0x1d0 [..]
The above should have table in *active* state, but in fact no hooks were registered.
Reject on/off/on games rather than attempting to fix this.
Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates") Reported-by: "Lee, Cherie-Anne" cherie.lee@starlabs.sg Cc: Bing-Jhong Billy Jheng billy@starlabs.sg Cc: info@starlabs.sg Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -918,6 +918,10 @@ static int nf_tables_updtable(struct nft if (flags == ctx->table->flags) return 0;
+ /* No dormant off/on/off/on games in single transaction */ + if (ctx->table->flags & __NFT_TABLE_F_UPDATE) + return -EINVAL; + trans = nft_trans_alloc(ctx, NFT_MSG_NEWTABLE, sizeof(struct nft_trans_table)); if (trans == NULL)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
3f0465a9ef02 ("netfilter: nf_tables: dynamically allocate hooks per net_device in flowtables") reworks flowtable support to allow for dynamic allocation of hooks, which implicitly fixes the following bogus EBUSY in transaction:
delete flowtable add flowtable # same flowtable with same devices, it hits EBUSY
This patch does not exist in any tree, but it fixes this issue for -stable Linux kernel 5.4
Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -6132,6 +6132,9 @@ static int nf_tables_newflowtable(struct continue;
list_for_each_entry(ft, &table->flowtables, list) { + if (!nft_is_active_next(net, ft)) + continue; + for (k = 0; k < ft->ops_len; k++) { if (!ft->ops[k].dev) continue;
On Sat, 25 Nov 2023 at 00:52, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.4.262 release. There are 159 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 26 Nov 2023 17:19:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.262-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
As Daniel replied on 4.19 build failures, Following build warning / errors occurred on arm and arm64 on the stable-rc linux.5.4.y and linux-4.19.y.
tty/serial: Migrate meson_uart to use has_sysrq [ Upstream commit dca3ac8d3bc9436eb5fd35b80cdcad762fbfa518 ]
drivers/tty/serial/meson_uart.c: In function 'meson_uart_probe': drivers/tty/serial/meson_uart.c:728:13: error: 'struct uart_port' has no member named 'has_sysrq' 728 | port->has_sysrq = IS_ENABLED(CONFIG_SERIAL_MESON_CONSOLE); | ^~
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
-- Linaro LKFT https://lkft.linaro.org
On Sat, Nov 25, 2023 at 01:09:43AM +0530, Naresh Kamboju wrote:
On Sat, 25 Nov 2023 at 00:52, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.4.262 release. There are 159 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 26 Nov 2023 17:19:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.262-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
As Daniel replied on 4.19 build failures, Following build warning / errors occurred on arm and arm64 on the stable-rc linux.5.4.y and linux-4.19.y.
tty/serial: Migrate meson_uart to use has_sysrq [ Upstream commit dca3ac8d3bc9436eb5fd35b80cdcad762fbfa518 ]
drivers/tty/serial/meson_uart.c: In function 'meson_uart_probe': drivers/tty/serial/meson_uart.c:728:13: error: 'struct uart_port' has no member named 'has_sysrq' 728 | port->has_sysrq = IS_ENABLED(CONFIG_SERIAL_MESON_CONSOLE); | ^~
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Now fixed, thanks.
greg k-h
linux-stable-mirror@lists.linaro.org