This is the start of the stable review cycle for the 6.1.45 release. There are 127 patches in this series; all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.45-rc1....
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.1.45-rc1
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "drm/i915: Disable DC states for all commits"
Lijo Lazar lijo.lazar@amd.com drm/amdgpu: Use apt name for FW reserved region
Luben Tuikov luben.tuikov@amd.com drm/amdgpu: Remove unnecessary domain argument
Tong Liu01 Tong.Liu01@amd.com drm/amdgpu: add vram reservation based on vram_usagebyfirmware_v2_2
Mark Brown broonie@kernel.org arm64/ptrace: Don't enable SVE when setting streaming SVE
Namjae Jeon linkinjeon@kernel.org exfat: check if filename entries exceeds max filename length
Chao Yu chao@kernel.org f2fs: don't reset unchangable mount option in f2fs_remount()
Yangtao Li frank.li@vivo.com f2fs: fix to set flush_merge opt and show noflush_merge
Sean Christopherson seanjc@google.com selftests/rseq: Play nice with binaries statically linked against glibc 2.35+
Peichen Huang PeiChen.Huang@amd.com drm/amd/display: skip CLEAR_PAYLOAD_ID_TABLE if device mst_en is 0
Rodrigo Siqueira Rodrigo.Siqueira@amd.com drm/amd/display: Ensure that planes are in the same order
Alexander Stein alexander.stein@ew.tq-group.com drm/imx/ipuv3: Fix front porch adjustment upon hactive aligning
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com powerpc/mm/altmap: Fix altmap boundary check
Christophe JAILLET christophe.jaillet@wanadoo.fr mtd: rawnand: fsl_upm: Fix an off-by one test in fun_exec_op()
Johan Jonker jbx6244@gmail.com mtd: rawnand: rockchip: Align hwecc vs. raw page helper layouts
Johan Jonker jbx6244@gmail.com mtd: rawnand: rockchip: fix oobfree offset and description
Roger Quadros rogerq@kernel.org mtd: rawnand: omap_elm: Fix incorrect type in assignment
Pavel Begunkov asml.silence@gmail.com io_uring: annotate offset timeout races
Chao Yu chao@kernel.org f2fs: fix to do sanity check on direct node in truncate_dnode()
Filipe Manana fdmanana@suse.com btrfs: remove BUG_ON()'s in add_new_free_space()
Jan Kara jack@suse.cz ext2: Drop fragment support
Jan Kara jack@suse.cz fs: Protect reconfiguration of sb read-write from racing writes
Alan Stern stern@rowland.harvard.edu net: usbnet: Fix WARNING in usbnet_start_xmit/usb_submit_urb
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp debugobjects: Recheck debug_objects_enabled before reporting
Sungwoo Kim iam@sung-woo.kim Bluetooth: L2CAP: Fix use-after-free in l2cap_sock_ready_cb
Prince Kumar Maurya princekumarmaurya06@gmail.com fs/sysv: Null check to prevent null-ptr-deref bug
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp fs/ntfs3: Use __GFP_NOWARN allocation at ntfs_load_attr_list()
Roman Gushchin roman.gushchin@linux.dev mm: kmem: fix a NULL pointer dereference in obj_stock_flush_required()
Linus Torvalds torvalds@linux-foundation.org file: reinstate f_pos locking optimization for regular files
Hou Tao houtao1@huawei.com bpf, cpumap: Make sure kthread is running before map update returns
Geert Uytterhoeven geert+renesas@glider.be clk: imx93: Propagate correct error in imx93_clocks_probe()
Andi Shyti andi.shyti@linux.intel.com drm/i915/gt: Cleanup aux invalidation registers
Janusz Krzysztofik janusz.krzysztofik@linux.intel.com drm/i915: Fix premature release of request's reusable memory
Guchun Chen guchun.chen@amd.com drm/ttm: check null pointer before accessing when swapping
Aleksa Sarai cyphar@cyphar.com open: make RESOLVE_CACHED correctly test for O_TMPFILE
Mark Brown broonie@kernel.org arm64/fpsimd: Sync FPSIMD state with SVE for SME only systems
Mark Brown broonie@kernel.org arm64/fpsimd: Clear SME state in the target task when setting the VL
Mark Brown broonie@kernel.org arm64/fpsimd: Sync and zero pad FPSIMD state for streaming SVE
Naveen N Rao naveen@kernel.org powerpc/ftrace: Create a dummy stackframe to fix stack unwind
Jiri Olsa jolsa@kernel.org bpf: Disable preemption in bpf_event_output
Ilya Dryomov idryomov@gmail.com rbd: prevent busy loop when requesting exclusive lock
Michael Kelley mikelley@microsoft.com x86/hyperv: Disable IBT when hypercall page lacks ENDBR instruction
Paul Fertser fercerpav@gmail.com wifi: mt76: mt7615: do not advertise 5 GHz on first phy of MT7615D (DBDC)
Laszlo Ersek lersek@redhat.com net: tap_open(): set sk_uid from current_fsuid()
Laszlo Ersek lersek@redhat.com net: tun_chr_open(): set sk_uid from current_fsuid()
Dinh Nguyen dinguyen@kernel.org arm64: dts: stratix10: fix incorrect I2C property for SCL signal
Jiri Olsa jolsa@kernel.org bpf: Disable preemption in bpf_perf_event_output
Arseniy Krasnov AVKrasnov@sberdevices.ru mtd: rawnand: meson: fix OOB available bytes for ECC
Olivier Maignial olivier.maignial@hotmail.fr mtd: spinand: toshiba: Fix ecc_get_status
Sungjong Seo sj1557.seo@samsung.com exfat: release s_lock before calling dir_emit()
gaoming gaoming20@hihonor.com exfat: use kvmalloc_array/kvfree instead of kmalloc_array/kfree
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org firmware: arm_scmi: Drop OF node reference in the transport channel setup
Xiubo Li xiubli@redhat.com ceph: defer stopping mdsc delayed_work
Ross Maynard bids.7405@bigpond.com USB: zaurus: Add ID for A-300/B-500/C-700
Ilya Dryomov idryomov@gmail.com libceph: fix potential hang in ceph_osdc_notify()
Michael Kelley mikelley@microsoft.com scsi: storvsc: Limit max_sectors for virtual Fibre Channel devices
Steffen Maier maier@linux.ibm.com scsi: zfcp: Defer fc_rport blocking until after ADISC response
Boqun Feng boqun.feng@gmail.com rust: allocator: Prevent mis-aligned allocation
Eric Dumazet edumazet@google.com tcp_metrics: fix data-race in tcpm_suck_dst() vs fastopen
Eric Dumazet edumazet@google.com tcp_metrics: annotate data-races around tm->tcpm_net
Eric Dumazet edumazet@google.com tcp_metrics: annotate data-races around tm->tcpm_vals[]
Eric Dumazet edumazet@google.com tcp_metrics: annotate data-races around tm->tcpm_lock
Eric Dumazet edumazet@google.com tcp_metrics: annotate data-races around tm->tcpm_stamp
Eric Dumazet edumazet@google.com tcp_metrics: fix addr_same() helper
Jonas Gorski jonas.gorski@bisdn.de prestera: fix fallback to previous version on same major version
Jianbo Liu jianbol@nvidia.com net/mlx5: fs_core: Skip the FTs in the same FS_TYPE_PRIO_CHAINS fs_prio
Jianbo Liu jianbol@nvidia.com net/mlx5: fs_core: Make find_closest_ft more generic
Benjamin Poirier bpoirier@nvidia.com vxlan: Fix nexthop hash size
Yue Haibing yuehaibing@huawei.com ip6mr: Fix skb_under_panic in ip6mr_cache_report()
Alexandra Winter wintera@linux.ibm.com s390/qeth: Don't call dev_close/dev_open (DOWN/UP)
Lin Ma linma@zju.edu.cn net: dcb: choose correct policy to parse DCB_ATTR_BCN
Michael Chan michael.chan@broadcom.com bnxt_en: Fix max_mtu setting for multi-buf XDP
Somnath Kotur somnath.kotur@broadcom.com bnxt_en: Fix page pool logic for page size >= 64K
Mark Brown broonie@kernel.org net: netsec: Ignore 'phy-mode' on SynQuacer in DT mode
Yuanjun Gong ruc_gongyuanjun@163.com net: korina: handle clk prepare error in korina_probe()
Dan Carpenter dan.carpenter@linaro.org net: ll_temac: fix error checking of irq_of_parse_and_map()
Tomas Glozar tglozar@redhat.com bpf: sockmap: Remove preempt_disable in sock_map_sk_acquire
valis sec@valis.email net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free
valis sec@valis.email net/sched: cls_fw: No longer copy tcf_result on update to avoid use-after-free
valis sec@valis.email net/sched: cls_u32: No longer copy tcf_result on update to avoid use-after-free
Hou Tao houtao1@huawei.com bpf, cpumap: Handle skb as well when clean up ptr_ring
Rafal Rogalski rafalx.rogalski@intel.com ice: Fix RDMA VSI removal during queue rebuild
Kuniyuki Iwashima kuniyu@amazon.com net/sched: taprio: Limit TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME to INT_MAX.
Eric Dumazet edumazet@google.com net: annotate data-races around sk->sk_priority
Eric Dumazet edumazet@google.com net: add missing data-race annotation for sk_ll_usec
Eric Dumazet edumazet@google.com net: add missing data-race annotations around sk->sk_peek_off
Eric Dumazet edumazet@google.com net: annotate data-races around sk->sk_mark
Eric Dumazet edumazet@google.com net: add missing READ_ONCE(sk->sk_rcvbuf) annotation
Eric Dumazet edumazet@google.com net: add missing READ_ONCE(sk->sk_sndbuf) annotation
Eric Dumazet edumazet@google.com net: add missing READ_ONCE(sk->sk_rcvlowat) annotation
Eric Dumazet edumazet@google.com net: annotate data-races around sk->sk_max_pacing_rate
Eric Dumazet edumazet@google.com net: annotate data-race around sk->sk_txrehash
Eric Dumazet edumazet@google.com net: annotate data-races around sk->sk_reserved_mem
Konstantin Khorenko khorenko@virtuozzo.com qed: Fix scheduling in a tasklet while getting stats
Chengfeng Ye dg573847474@gmail.com mISDN: hfcpci: Fix potential deadlock on &hc->lock
Jamal Hadi Salim jhs@mojatatu.com net: sched: cls_u32: Fix match key mis-addressing
Georg Müller georgmueller@gmx.net perf test uprobe_from_different_cu: Skip if there is no gcc
Yuanjun Gong ruc_gongyuanjun@163.com net: dsa: fix value check in bcm_sf2_sw_probe()
Lin Ma linma@zju.edu.cn rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length
Lin Ma linma@zju.edu.cn bpf: Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing
Jianbo Liu jianbol@nvidia.com net/mlx5e: Move representor neigh cleanup to profile cleanup_tx
Amir Tzin amirtz@nvidia.com net/mlx5e: Fix crash moving to switchdev mode when ntuple offload is set
Yuanjun Gong ruc_gongyuanjun@163.com net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer()
Zhengchao Shao shaozhengchao@huawei.com net/mlx5: fix potential memory leak in mlx5e_init_rep_rx
Zhengchao Shao shaozhengchao@huawei.com net/mlx5: DR, fix memory leak in mlx5dr_cmd_create_reformat_ctx
Zhengchao Shao shaozhengchao@huawei.com net/mlx5e: fix double free in macsec_fs_tx_create_crypto_table_groups
Ilan Peer ilan.peer@intel.com wifi: cfg80211: Fix return value in scan logic
Gao Xiang xiang@kernel.org erofs: fix wrong primary bvec selection on deduplicated extents
Heiko Carstens hca@linux.ibm.com KVM: s390: fix sthyi error handling
ndesaulniers@google.com ndesaulniers@google.com word-at-a-time: use the same return type for has_zero regardless of endianness
Cristian Marussi cristian.marussi@arm.com firmware: arm_scmi: Fix chan_free cleanup on SMC
Yury Norov yury.norov@gmail.com lib/bitmap: workaround const_eval test build failure
Punit Agrawal punit.agrawal@bytedance.com firmware: smccc: Fix use of uninitialised results structure
Benjamin Gaignard benjamin.gaignard@collabora.com arm64: dts: freescale: Fix VPU G2 clock
Hugo Villeneuve hvilleneuve@dimonoff.com arm64: dts: imx8mn-var-som: add missing pull-up for onboard PHY reset pinmux
Yashwanth Varakala y.varakala@phytec.de arm64: dts: phycore-imx8mm: Correction in gpio-line-names
Yashwanth Varakala y.varakala@phytec.de arm64: dts: phycore-imx8mm: Label typo-fix of VPU
Tim Harvey tharvey@gateworks.com arm64: dts: imx8mm-venice-gw7904: disable disp_blk_ctrl
Tim Harvey tharvey@gateworks.com arm64: dts: imx8mm-venice-gw7903: disable disp_blk_ctrl
Robin Murphy robin.murphy@arm.com iommu/arm-smmu-v3: Document nesting-related errata
Robin Murphy robin.murphy@arm.com iommu/arm-smmu-v3: Add explicit feature for nesting
Robin Murphy robin.murphy@arm.com iommu/arm-smmu-v3: Document MMU-700 erratum 2812531
Robin Murphy robin.murphy@arm.com iommu/arm-smmu-v3: Work around MMU-600 erratum 1076982
Alex Elder elder@linaro.org net: ipa: only reset hashed tables when supported
Shay Drory shayd@nvidia.com net/mlx5: Free irqs only on shutdown callback
Peter Zijlstra peterz@infradead.org perf: Fix function pointer case
Jens Axboe axboe@kernel.dk io_uring: gate iowait schedule on having pending requests
-------------
Diffstat:
Documentation/arm64/silicon-errata.rst | 4 + Makefile | 4 +- .../boot/dts/altera/socfpga_stratix10_socdk.dts | 2 +- .../dts/altera/socfpga_stratix10_socdk_nand.dts | 2 +- .../dts/freescale/imx8mm-phyboard-polis-rdk.dts | 2 +- .../boot/dts/freescale/imx8mm-phycore-som.dtsi | 4 +- .../boot/dts/freescale/imx8mm-venice-gw7903.dts | 4 + .../boot/dts/freescale/imx8mm-venice-gw7904.dts | 4 + arch/arm64/boot/dts/freescale/imx8mn-var-som.dtsi | 2 +- arch/arm64/boot/dts/freescale/imx8mq.dtsi | 2 +- arch/arm64/kernel/fpsimd.c | 9 +- arch/arm64/kernel/ptrace.c | 8 +- arch/powerpc/include/asm/word-at-a-time.h | 2 +- arch/powerpc/kernel/trace/ftrace_mprofile.S | 9 +- arch/powerpc/mm/init_64.c | 3 +- arch/s390/kernel/sthyi.c | 6 +- arch/s390/kvm/intercept.c | 9 +- arch/x86/hyperv/hv_init.c | 21 +++++ drivers/block/rbd.c | 28 +++--- drivers/clk/imx/clk-imx93.c | 2 +- drivers/firmware/arm_scmi/mailbox.c | 4 +- drivers/firmware/arm_scmi/smc.c | 21 +++-- drivers/firmware/smccc/soc_id.c | 5 +- drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c | 104 +++++++++++++++----- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +- drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 89 +++++++++++++---- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 8 +- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 1 - drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 15 +++ drivers/gpu/drm/amd/display/dc/core/dc_link.c | 5 +- drivers/gpu/drm/amd/include/atomfirmware.h | 63 +++++++++++-- drivers/gpu/drm/i915/display/intel_display.c | 28 +----- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 8 +- drivers/gpu/drm/i915/gt/intel_gt_regs.h | 16 ++-- drivers/gpu/drm/i915/gt/intel_lrc.c | 6 +- drivers/gpu/drm/i915/i915_active.c | 99 +++++++++++++------ drivers/gpu/drm/i915/i915_request.c | 11 +++ drivers/gpu/drm/imx/ipuv3-crtc.c | 2 +- drivers/gpu/drm/ttm/ttm_bo.c | 3 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 ++++++++++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 8 ++ 
drivers/isdn/hardware/mISDN/hfcpci.c | 10 +- drivers/mtd/nand/raw/fsl_upm.c | 2 +- drivers/mtd/nand/raw/meson_nand.c | 3 +- drivers/mtd/nand/raw/omap_elm.c | 24 ++--- drivers/mtd/nand/raw/rockchip-nand-controller.c | 45 +++++---- drivers/mtd/nand/spi/toshiba.c | 4 +- drivers/net/dsa/bcm_sf2.c | 8 +- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 59 +++++++----- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 6 +- drivers/net/ethernet/intel/ice/ice_main.c | 18 ++++ drivers/net/ethernet/korina.c | 3 +- .../net/ethernet/marvell/prestera/prestera_pci.c | 3 +- .../mellanox/mlx5/core/en_accel/ipsec_rxtx.c | 4 +- .../mellanox/mlx5/core/en_accel/macsec_fs.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 10 ++ drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 20 ++-- drivers/net/ethernet/mellanox/mlx5/core/eq.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 105 ++++++++++++++++----- drivers/net/ethernet/mellanox/mlx5/core/mlx5_irq.h | 1 + drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 29 ++++++ .../ethernet/mellanox/mlx5/core/steering/dr_cmd.c | 5 +- drivers/net/ethernet/qlogic/qed/qed_dev_api.h | 16 ++++ drivers/net/ethernet/qlogic/qed/qed_fcoe.c | 19 +++- drivers/net/ethernet/qlogic/qed/qed_fcoe.h | 17 +++- drivers/net/ethernet/qlogic/qed/qed_hw.c | 26 ++++- drivers/net/ethernet/qlogic/qed/qed_iscsi.c | 19 +++- drivers/net/ethernet/qlogic/qed/qed_iscsi.h | 8 +- drivers/net/ethernet/qlogic/qed/qed_l2.c | 19 +++- drivers/net/ethernet/qlogic/qed/qed_l2.h | 24 +++++ drivers/net/ethernet/qlogic/qed/qed_main.c | 6 +- drivers/net/ethernet/socionext/netsec.c | 11 +++ drivers/net/ethernet/xilinx/ll_temac_main.c | 12 ++- drivers/net/ipa/ipa_table.c | 26 ++--- drivers/net/tap.c | 2 +- drivers/net/tun.c | 2 +- drivers/net/usb/cdc_ether.c | 21 +++++ drivers/net/usb/usbnet.c | 6 ++ drivers/net/usb/zaurus.c | 21 +++++ drivers/net/wireless/mediatek/mt76/mt7615/eeprom.c | 6 +- drivers/s390/net/qeth_core.h | 1 - drivers/s390/net/qeth_core_main.c | 2 - 
drivers/s390/net/qeth_l2_main.c | 9 +- drivers/s390/net/qeth_l3_main.c | 8 +- drivers/s390/scsi/zfcp_fc.c | 6 +- drivers/scsi/storvsc_drv.c | 4 + fs/btrfs/block-group.c | 51 ++++++---- fs/btrfs/block-group.h | 4 +- fs/btrfs/free-space-tree.c | 24 +++-- fs/ceph/mds_client.c | 4 +- fs/ceph/mds_client.h | 5 + fs/ceph/super.c | 10 ++ fs/erofs/zdata.c | 7 +- fs/exfat/balloc.c | 6 +- fs/exfat/dir.c | 36 +++---- fs/ext2/ext2.h | 12 --- fs/ext2/super.c | 23 +---- fs/f2fs/f2fs.h | 1 - fs/f2fs/file.c | 5 - fs/f2fs/node.c | 14 ++- fs/f2fs/super.c | 43 ++++++--- fs/file.c | 18 +++- fs/ntfs3/attrlist.c | 4 +- fs/open.c | 2 +- fs/super.c | 11 ++- fs/sysv/itree.c | 4 + include/asm-generic/word-at-a-time.h | 2 +- include/linux/f2fs_fs.h | 1 + include/net/inet_sock.h | 7 +- include/net/ip.h | 2 +- include/net/route.h | 4 +- include/net/vxlan.h | 4 +- io_uring/io_uring.c | 23 +++-- io_uring/timeout.c | 2 +- kernel/bpf/cpumap.c | 35 ++++--- kernel/events/core.c | 8 +- kernel/trace/bpf_trace.c | 17 +++- lib/Makefile | 6 ++ lib/debugobjects.c | 9 ++ lib/test_bitmap.c | 8 +- mm/memcontrol.c | 19 ++-- net/bluetooth/l2cap_sock.c | 2 + net/ceph/osd_client.c | 20 ++-- net/core/bpf_sk_storage.c | 5 +- net/core/rtnetlink.c | 8 +- net/core/sock.c | 45 +++++---- net/core/sock_map.c | 2 - net/dcb/dcbnl.c | 2 +- net/dccp/ipv6.c | 4 +- net/ipv4/inet_diag.c | 4 +- net/ipv4/ip_output.c | 8 +- net/ipv4/ip_sockglue.c | 2 +- net/ipv4/raw.c | 2 +- net/ipv4/route.c | 4 +- net/ipv4/tcp_ipv4.c | 4 +- net/ipv4/tcp_metrics.c | 70 +++++++++----- net/ipv6/ip6mr.c | 2 +- net/ipv6/ping.c | 2 +- net/ipv6/raw.c | 6 +- net/ipv6/route.c | 7 +- net/ipv6/tcp_ipv6.c | 9 +- net/ipv6/udp.c | 4 +- net/l2tp/l2tp_ip6.c | 2 +- net/mptcp/sockopt.c | 2 +- net/netfilter/nft_socket.c | 2 +- net/netfilter/xt_socket.c | 4 +- net/packet/af_packet.c | 12 +-- net/sched/cls_fw.c | 1 - net/sched/cls_route.c | 1 - net/sched/cls_u32.c | 57 +++++++++-- net/sched/sch_taprio.c | 15 ++- net/smc/af_smc.c | 2 +- net/unix/af_unix.c | 2 +- 
net/wireless/scan.c | 2 +- net/xdp/xsk.c | 2 +- net/xfrm/xfrm_policy.c | 2 +- rust/bindings/bindings_helper.h | 1 + rust/kernel/allocator.rs | 74 ++++++++++++--- .../tests/shell/test_uprobe_from_different_cu.sh | 8 +- tools/testing/selftests/rseq/rseq.c | 28 ++++-- .../tc-testing/tc-tests/qdiscs/taprio.json | 25 +++++ 162 files changed, 1576 insertions(+), 647 deletions(-)
From: Jens Axboe axboe@kernel.dk
Commit 7b72d661f1f2f950ab8c12de7e2bc48bdac8ed69 upstream.
A previous commit made all cqring waits marked as iowait, as a way to improve performance for short schedules with pending IO. However, for use cases that have a special reaper thread that does nothing but wait on events on the ring, this causes a cosmetic issue where we now have one core marked as being "busy" with 100% iowait.
While this isn't a grave issue, it is confusing to users. Rather than always mark us as being in iowait, gate setting of current->in_iowait to 1 by whether or not the waiting task has pending requests.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/io-uring/CAMEGJJ2RxopfNQ7GNLhr7X9=bHXKo+G5OOe0LUq=+U...
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217699
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217700
Reported-by: Oleksandr Natalenko oleksandr@natalenko.name
Reported-by: Phil Elwell phil@raspberrypi.com
Tested-by: Andres Freund andres@anarazel.de
Fixes: 8a796565cec3 ("io_uring: Use io_schedule* in cqring wait")
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 io_uring/io_uring.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2349,12 +2349,21 @@ int io_run_task_work_sig(struct io_ring_
 	return 0;
 }
 
+static bool current_pending_io(void)
+{
+	struct io_uring_task *tctx = current->io_uring;
+
+	if (!tctx)
+		return false;
+	return percpu_counter_read_positive(&tctx->inflight);
+}
+
 /* when returns >0, the caller should retry */
 static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 					  struct io_wait_queue *iowq,
 					  ktime_t *timeout)
 {
-	int token, ret;
+	int io_wait, ret;
 	unsigned long check_cq;
 
 	/* make sure we run task_work before checking for signals */
@@ -2372,15 +2381,17 @@ static inline int io_cqring_wait_schedul
 	}
 
 	/*
-	 * Use io_schedule_prepare/finish, so cpufreq can take into account
-	 * that the task is waiting for IO - turns out to be important for low
-	 * QD IO.
+	 * Mark us as being in io_wait if we have pending requests, so cpufreq
+	 * can take into account that the task is waiting for IO - turns out
+	 * to be important for low QD IO.
 	 */
-	token = io_schedule_prepare();
+	io_wait = current->in_iowait;
+	if (current_pending_io())
+		current->in_iowait = 1;
 	ret = 1;
 	if (!schedule_hrtimeout(timeout, HRTIMER_MODE_ABS))
 		ret = -ETIME;
-	io_schedule_finish(token);
+	current->in_iowait = io_wait;
 	return ret;
 }
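The gating described in this patch can be modelled in user space. What follows is a minimal sketch, not kernel code: struct model_task, model_pending_io() and model_wait() are hypothetical stand-ins for current, current_pending_io() and the wait path, and a plain counter is assumed in place of the percpu inflight counter.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for the task fields the patch touches. */
struct model_task {
	bool in_iowait;	/* models current->in_iowait */
	long inflight;	/* models the per-task count of in-flight requests */
};

/* Mirrors the idea of current_pending_io(): are any requests outstanding? */
static bool model_pending_io(const struct model_task *t)
{
	return t->inflight > 0;
}

/*
 * Save the flag, raise it only when requests are pending, "sleep", then
 * restore the saved value. Returns the flag value that was visible while
 * the task would have been asleep.
 */
static bool model_wait(struct model_task *t)
{
	bool saved = t->in_iowait;
	bool seen_while_waiting;

	if (model_pending_io(t))
		t->in_iowait = true;

	seen_while_waiting = t->in_iowait;	/* the real code sleeps here */

	t->in_iowait = saved;
	return seen_while_waiting;
}
```

In this model a reaper-style task with nothing in flight is never seen in iowait, while a task with pending requests is, and in both cases the flag returns to its previous value after the wait.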
From: Peter Zijlstra peterz@infradead.org
commit 1af6239d1d3e61d33fd2f0ba53d3d1a67cc50574 upstream.
With the advent of CFI it is no longer acceptable to cast function pointers.
The robot complains thusly:
kernel-events-core.c:warning:cast-from-int-(-)(struct-perf_cpu_pmu_context-)-to-remote_function_f-(aka-int-(-)(void-)-)-converts-to-incompatible-function-type
Reported-by: kernel test robot lkp@intel.com
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Signed-off-by: Cixi Geng cixi.geng1@unisoc.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/events/core.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1133,6 +1133,11 @@ static int perf_mux_hrtimer_restart(stru
 	return 0;
 }
 
+static int perf_mux_hrtimer_restart_ipi(void *arg)
+{
+	return perf_mux_hrtimer_restart(arg);
+}
+
 void perf_pmu_disable(struct pmu *pmu)
 {
 	int *count = this_cpu_ptr(pmu->pmu_disable_count);
@@ -11155,8 +11160,7 @@ perf_event_mux_interval_ms_store(struct
 		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
 		cpuctx->hrtimer_interval = ns_to_ktime(NSEC_PER_MSEC * timer);
 
-		cpu_function_call(cpu,
-			(remote_function_f)perf_mux_hrtimer_restart, cpuctx);
+		cpu_function_call(cpu, perf_mux_hrtimer_restart_ipi, cpuctx);
 	}
 	cpus_read_unlock();
 	mutex_unlock(&mux_interval_mutex);
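The pattern used by this fix, a thin wrapper whose signature matches the callback type exactly in place of a function pointer cast, can be sketched outside the kernel. All names below (model_ctx, model_remote_fn, model_restart, model_restart_ipi, model_function_call) are hypothetical and only illustrate the shape of the change.

```c
#include <stddef.h>

/* Hypothetical context type, standing in for the per-CPU PMU context. */
struct model_ctx {
	int restarts;
};

/* Callback type taking an untyped argument, like an IPI-style callback. */
typedef int (*model_remote_fn)(void *);

/* The typed worker; its signature does NOT match model_remote_fn. */
static int model_restart(struct model_ctx *ctx)
{
	ctx->restarts++;
	return 0;
}

/*
 * Wrapper with exactly the callback's signature. The void * to
 * struct pointer conversion happens on the argument, so no function
 * pointer is ever cast to an incompatible type.
 */
static int model_restart_ipi(void *arg)
{
	return model_restart(arg);
}

/* Stand-in for the dispatcher that invokes the callback. */
static int model_function_call(model_remote_fn fn, void *info)
{
	return fn(info);
}
```

Calling model_function_call(model_restart_ipi, &ctx) keeps the indirect call type-correct, which is what a CFI-style check verifies at the call site.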
From: Shay Drory shayd@nvidia.com
commit 9c2d08010963a61a171e8cb2852d3ce015b60cb4 upstream.
Whenever a shutdown is invoked, free irqs only and keep mlx5_irq synthetic wrapper intact in order to avoid use-after-free on system shutdown.
for example: ================================================================== BUG: KASAN: use-after-free in _find_first_bit+0x66/0x80 Read of size 8 at addr ffff88823fc0d318 by task kworker/u192:0/13608
CPU: 25 PID: 13608 Comm: kworker/u192:0 Tainted: G B W O 6.1.21-cloudflare-kasan-2023.3.21 #1 Hardware name: GIGABYTE R162-R2-GEN0/MZ12-HD2-CD, BIOS R14 05/03/2021 Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x34/0x48 print_report+0x170/0x473 ? _find_first_bit+0x66/0x80 kasan_report+0xad/0x130 ? _find_first_bit+0x66/0x80 _find_first_bit+0x66/0x80 mlx5e_open_channels+0x3c5/0x3a10 [mlx5_core] ? console_unlock+0x2fa/0x430 ? _raw_spin_lock_irqsave+0x8d/0xf0 ? _raw_spin_unlock_irqrestore+0x42/0x80 ? preempt_count_add+0x7d/0x150 ? __wake_up_klogd.part.0+0x7d/0xc0 ? vprintk_emit+0xfe/0x2c0 ? mlx5e_trigger_napi_sched+0x40/0x40 [mlx5_core] ? dev_attr_show.cold+0x35/0x35 ? devlink_health_do_dump.part.0+0x174/0x340 ? devlink_health_report+0x504/0x810 ? mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core] ? mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core] ? process_one_work+0x680/0x1050 mlx5e_safe_switch_params+0x156/0x220 [mlx5_core] ? mlx5e_switch_priv_channels+0x310/0x310 [mlx5_core] ? mlx5_eq_poll_irq_disabled+0xb6/0x100 [mlx5_core] mlx5e_tx_reporter_timeout_recover+0x123/0x240 [mlx5_core] ? __mutex_unlock_slowpath.constprop.0+0x2b0/0x2b0 devlink_health_reporter_recover+0xa6/0x1f0 devlink_health_report+0x2f7/0x810 ? vsnprintf+0x854/0x15e0 mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core] ? mlx5e_reporter_tx_err_cqe+0x1a0/0x1a0 [mlx5_core] ? mlx5e_tx_reporter_timeout_dump+0x50/0x50 [mlx5_core] ? mlx5e_tx_reporter_dump_sq+0x260/0x260 [mlx5_core] ? newidle_balance+0x9b7/0xe30 ? psi_group_change+0x6a7/0xb80 ? mutex_lock+0x96/0xf0 ? __mutex_lock_slowpath+0x10/0x10 mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core] process_one_work+0x680/0x1050 worker_thread+0x5a0/0xeb0 ? process_one_work+0x1050/0x1050 kthread+0x2a2/0x340 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK>
Freed by task 1: kasan_save_stack+0x23/0x50 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 ____kasan_slab_free+0x169/0x1d0 slab_free_freelist_hook+0xd2/0x190 __kmem_cache_free+0x1a1/0x2f0 irq_pool_free+0x138/0x200 [mlx5_core] mlx5_irq_table_destroy+0xf6/0x170 [mlx5_core] mlx5_core_eq_free_irqs+0x74/0xf0 [mlx5_core] shutdown+0x194/0x1aa [mlx5_core] pci_device_shutdown+0x75/0x120 device_shutdown+0x35c/0x620 kernel_restart+0x60/0xa0 __do_sys_reboot+0x1cb/0x2c0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x4b/0xb5
The buggy address belongs to the object at ffff88823fc0d300 which belongs to the cache kmalloc-192 of size 192 The buggy address is located 24 bytes inside of 192-byte region [ffff88823fc0d300, ffff88823fc0d3c0)
The buggy address belongs to the physical page: page:0000000010139587 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x23fc0c head:0000000010139587 order:1 compound_mapcount:0 compound_pincount:0 flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004ca00 raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff88823fc0d200: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88823fc0d280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff88823fc0d300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff88823fc0d380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff88823fc0d400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== general protection fault, probably for non-canonical address 0xdffffc005c40d7ac: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: probably user-memory-access in range [0x00000002e206bd60-0x00000002e206bd67] CPU: 25 PID: 13608 Comm: kworker/u192:0 Tainted: G B W O 6.1.21-cloudflare-kasan-2023.3.21 #1 Hardware name: GIGABYTE R162-R2-GEN0/MZ12-HD2-CD, BIOS R14 05/03/2021 Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core] RIP: 0010:__alloc_pages+0x141/0x5c0 Call Trace: <TASK> ? sysvec_apic_timer_interrupt+0xa0/0xc0 ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? __alloc_pages_slowpath.constprop.0+0x1ec0/0x1ec0 ? _raw_spin_unlock_irqrestore+0x3d/0x80 __kmalloc_large_node+0x80/0x120 ? kvmalloc_node+0x4e/0x170 __kmalloc_node+0xd4/0x150 kvmalloc_node+0x4e/0x170 mlx5e_open_channels+0x631/0x3a10 [mlx5_core] ? console_unlock+0x2fa/0x430 ? _raw_spin_lock_irqsave+0x8d/0xf0 ? _raw_spin_unlock_irqrestore+0x42/0x80 ? preempt_count_add+0x7d/0x150 ? __wake_up_klogd.part.0+0x7d/0xc0 ? vprintk_emit+0xfe/0x2c0 ? mlx5e_trigger_napi_sched+0x40/0x40 [mlx5_core] ? dev_attr_show.cold+0x35/0x35 ? devlink_health_do_dump.part.0+0x174/0x340 ? devlink_health_report+0x504/0x810 ? mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core] ? mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core] ? process_one_work+0x680/0x1050 mlx5e_safe_switch_params+0x156/0x220 [mlx5_core] ? mlx5e_switch_priv_channels+0x310/0x310 [mlx5_core] ? mlx5_eq_poll_irq_disabled+0xb6/0x100 [mlx5_core] mlx5e_tx_reporter_timeout_recover+0x123/0x240 [mlx5_core] ? __mutex_unlock_slowpath.constprop.0+0x2b0/0x2b0 devlink_health_reporter_recover+0xa6/0x1f0 devlink_health_report+0x2f7/0x810 ? vsnprintf+0x854/0x15e0 mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core] ? mlx5e_reporter_tx_err_cqe+0x1a0/0x1a0 [mlx5_core] ? 
mlx5e_tx_reporter_timeout_dump+0x50/0x50 [mlx5_core] ? mlx5e_tx_reporter_dump_sq+0x260/0x260 [mlx5_core] ? newidle_balance+0x9b7/0xe30 ? psi_group_change+0x6a7/0xb80 ? mutex_lock+0x96/0xf0 ? __mutex_lock_slowpath+0x10/0x10 mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core] process_one_work+0x680/0x1050 worker_thread+0x5a0/0xeb0 ? process_one_work+0x1050/0x1050 kthread+0x2a2/0x340 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> ---[ end trace 0000000000000000 ]--- RIP: 0010:__alloc_pages+0x141/0x5c0 Code: e0 39 a3 96 89 e9 b8 22 01 32 01 83 e1 0f 48 89 fa 01 c9 48 c1 ea 03 d3 f8 83 e0 03 89 44 24 6c 48 b8 00 00 00 00 00 fc ff df <80> 3c 02 00 0f 85 fc 03 00 00 89 e8 4a 8b 14 f5 e0 39 a3 96 4c 89 RSP: 0018:ffff888251f0f438 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 1ffff1104a3e1e8b RCX: 0000000000000000 RDX: 000000005c40d7ac RSI: 0000000000000003 RDI: 00000002e206bd60 RBP: 0000000000052dc0 R08: ffff8882b0044218 R09: ffff8882b0045e8a R10: fffffbfff300fefc R11: ffff888167af4000 R12: 0000000000000003 R13: 0000000000000000 R14: 00000000696c7070 R15: ffff8882373f4380 FS: 0000000000000000(0000) GS:ffff88bf2be80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005641d031eee8 CR3: 0000002e7ca14000 CR4: 0000000000350ee0 Kernel panic - not syncing: Fatal exception Kernel Offset: 0x11000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception ]---]
Reported-by: Frederick Lawler fred@cloudflare.com
Link: https://lore.kernel.org/netdev/be5b9271-7507-19c5-ded1-fa78f1980e69@cloudfla...
Signed-off-by: Shay Drory shayd@nvidia.com
Signed-off-by: Saeed Mahameed saeedm@nvidia.com
[hardik: Refer to the irqn member of the mlx5_irq struct, instead of
 the msi_map, since we don't have upstream v6.4 commit 235a25fe28de
 ("net/mlx5: Modify struct mlx5_irq to use struct msi_map")].
[hardik: Refer to the pf_pool member of the mlx5_irq_table struct,
 instead of pcif_pool, since we don't have upstream v6.4 commit
 8bebfd767909 ("net/mlx5: Improve naming of pci function vectors")].
Signed-off-by: Hardik Garg hargar@linux.microsoft.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c       |    2 -
 drivers/net/ethernet/mellanox/mlx5/core/mlx5_irq.h |    1
 drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c  |   29 +++++++++++++++++++++
 3 files changed, 31 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -1061,7 +1061,7 @@ void mlx5_core_eq_free_irqs(struct mlx5_
 	mutex_lock(&table->lock); /* sync with create/destroy_async_eq */
 	if (!mlx5_core_is_sf(dev))
 		clear_rmap(dev);
-	mlx5_irq_table_destroy(dev);
+	mlx5_irq_table_free_irqs(dev);
 	mutex_unlock(&table->lock);
 }
 
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_irq.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_irq.h
@@ -14,6 +14,7 @@ int mlx5_irq_table_init(struct mlx5_core
 void mlx5_irq_table_cleanup(struct mlx5_core_dev *dev);
 int mlx5_irq_table_create(struct mlx5_core_dev *dev);
 void mlx5_irq_table_destroy(struct mlx5_core_dev *dev);
+void mlx5_irq_table_free_irqs(struct mlx5_core_dev *dev);
 int mlx5_irq_table_get_num_comp(struct mlx5_irq_table *table);
 int mlx5_irq_table_get_sfs_vec(struct mlx5_irq_table *table);
 struct mlx5_irq_table *mlx5_irq_table_get(struct mlx5_core_dev *dev);
--- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
@@ -591,6 +591,24 @@ static void irq_pools_destroy(struct mlx
 	irq_pool_free(table->pf_pool);
 }
 
+static void mlx5_irq_pool_free_irqs(struct mlx5_irq_pool *pool)
+{
+	struct mlx5_irq *irq;
+	unsigned long index;
+
+	xa_for_each(&pool->irqs, index, irq)
+		free_irq(irq->irqn, &irq->nh);
+}
+
+static void mlx5_irq_pools_free_irqs(struct mlx5_irq_table *table)
+{
+	if (table->sf_ctrl_pool) {
+		mlx5_irq_pool_free_irqs(table->sf_comp_pool);
+		mlx5_irq_pool_free_irqs(table->sf_ctrl_pool);
+	}
+	mlx5_irq_pool_free_irqs(table->pf_pool);
+}
+
 /* irq_table API */
 
 int mlx5_irq_table_init(struct mlx5_core_dev *dev)
@@ -670,6 +688,17 @@ void mlx5_irq_table_destroy(struct mlx5_
 	pci_free_irq_vectors(dev->pdev);
 }
 
+void mlx5_irq_table_free_irqs(struct mlx5_core_dev *dev)
+{
+	struct mlx5_irq_table *table = dev->priv.irq_table;
+
+	if (mlx5_core_is_sf(dev))
+		return;
+
+	mlx5_irq_pools_free_irqs(table);
+	pci_free_irq_vectors(dev->pdev);
+}
+
 int mlx5_irq_table_get_sfs_vec(struct mlx5_irq_table *table)
 {
 	if (table->sf_comp_pool)
From: Alex Elder elder@linaro.org
commit e11ec2b868af2b351c6c1e2e50eb711cc5423a10 upstream.
Last year, the code that manages GSI channel transactions switched from using spinlock-protected linked lists to using indexes into the ring buffer used for a channel. Recently, Google reported seeing transaction reference count underflows occasionally during shutdown.
Doug Anderson found a way to reproduce the issue reliably, and bisected the issue to the commit that eliminated the linked lists and the lock. The root cause was ultimately determined to be related to unused transactions being committed as part of the modem shutdown cleanup activity. Unused transactions are not normally expected (except in error cases).
The modem uses some ranges of IPA-resident memory, and whenever it shuts down we zero those ranges. In ipa_filter_reset_table() a transaction is allocated to zero modem filter table entries. If hashing is not supported, hashed table memory should not be zeroed. But currently nothing prevents that, and the result is an unused transaction. Something similar occurs when we zero routing table entries for the modem.
By preventing any attempt to clear hashed tables when hashing is not supported, the reference count underflow is avoided in this case.
Note that there likely remains an issue with properly freeing unused transactions (if they occur due to errors). This patch addresses only the underflows that Google originally reported.
Cc: stable@vger.kernel.org # 6.1.x
Fixes: d338ae28d8a8 ("net: ipa: kill all other transaction lists")
Tested-by: Douglas Anderson dianders@chromium.org
Signed-off-by: Alex Elder elder@linaro.org
Link: https://lore.kernel.org/r/20230724224055.1688854-1-elder@linaro.org
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Alex Elder elder@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ipa/ipa_table.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

--- a/drivers/net/ipa/ipa_table.c
+++ b/drivers/net/ipa/ipa_table.c
@@ -311,16 +311,15 @@ static int ipa_filter_reset(struct ipa *
 	if (ret)
 		return ret;
 
-	ret = ipa_filter_reset_table(ipa, IPA_MEM_V4_FILTER_HASHED, modem);
-	if (ret)
+	ret = ipa_filter_reset_table(ipa, IPA_MEM_V6_FILTER, modem);
+	if (ret || !ipa_table_hash_support(ipa))
 		return ret;
 
-	ret = ipa_filter_reset_table(ipa, IPA_MEM_V6_FILTER, modem);
+	ret = ipa_filter_reset_table(ipa, IPA_MEM_V4_FILTER_HASHED, modem);
 	if (ret)
 		return ret;
-	ret = ipa_filter_reset_table(ipa, IPA_MEM_V6_FILTER_HASHED, modem);
 
-	return ret;
+	return ipa_filter_reset_table(ipa, IPA_MEM_V6_FILTER_HASHED, modem);
 }
 
 /* The AP routes and modem routes are each contiguous within the
@@ -329,11 +328,12 @@ static int ipa_filter_reset(struct ipa *
  * */
 static int ipa_route_reset(struct ipa *ipa, bool modem)
 {
+	bool hash_support = ipa_table_hash_support(ipa);
 	struct gsi_trans *trans;
 	u16 first;
 	u16 count;
 
-	trans = ipa_cmd_trans_alloc(ipa, 4);
+	trans = ipa_cmd_trans_alloc(ipa, hash_support ? 4 : 2);
 	if (!trans) {
 		dev_err(&ipa->pdev->dev,
 			"no transaction for %s route reset\n",
@@ -350,12 +350,14 @@ static int ipa_route_reset(struct ipa *i
 	}
 
 	ipa_table_reset_add(trans, false, first, count, IPA_MEM_V4_ROUTE);
-	ipa_table_reset_add(trans, false, first, count,
-			    IPA_MEM_V4_ROUTE_HASHED);
-
 	ipa_table_reset_add(trans, false, first, count, IPA_MEM_V6_ROUTE);
-	ipa_table_reset_add(trans, false, first, count,
-			    IPA_MEM_V6_ROUTE_HASHED);
+
+	if (hash_support) {
+		ipa_table_reset_add(trans, false, first, count,
+				    IPA_MEM_V4_ROUTE_HASHED);
+		ipa_table_reset_add(trans, false, first, count,
+				    IPA_MEM_V6_ROUTE_HASHED);
+	}
 
 	gsi_trans_commit_wait(trans);
 
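Reviewer note: the sizing rule in the ipa_route_reset() change can be modelled outside the kernel. The following is a minimal user-space sketch, not driver code. `struct model_trans`, `model_reset_add()` and `model_route_reset()` are hypothetical names invented for illustration; the only thing taken from the patch is the rule "reserve 4 commands when hashing is supported, otherwise 2, and add exactly that many".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a GSI transaction: reserved vs. used slots. */
struct model_trans {
	unsigned int reserved;	/* commands the transaction was sized for */
	unsigned int used;	/* commands actually added */
};

/* Add one command; adding more than was reserved is a modelling bug. */
static void model_reset_add(struct model_trans *trans)
{
	assert(trans->used < trans->reserved);
	trans->used++;
}

/* Mirror the patched flow: 4 commands with hashing, 2 without. */
static struct model_trans model_route_reset(bool hash_support)
{
	struct model_trans trans = {
		.reserved = hash_support ? 4 : 2,
		.used = 0,
	};

	model_reset_add(&trans);		/* IPv4 route table */
	model_reset_add(&trans);		/* IPv6 route table */
	if (hash_support) {
		model_reset_add(&trans);	/* IPv4 hashed route table */
		model_reset_add(&trans);	/* IPv6 hashed route table */
	}

	return trans;
}
```

In this model every reserved slot is filled in both configurations, which is the property the patch relies on to avoid committing a transaction sized for commands that are never added.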
From: Robin Murphy robin.murphy@arm.com
commit f322e8af35c7f23a8c08b595c38d6c855b2d836f upstream
MMU-600 versions prior to r1p0 fail to correctly generate a WFE wakeup event when the command queue transitions from full to non-full. We can easily work around this by simply hiding the SEV capability such that we fall back to polling for space in the queue - since MMU-600 implements MSIs we wouldn't expect to need SEV for sync completion either, so this should have little to no impact.
Signed-off-by: Robin Murphy robin.murphy@arm.com
Reviewed-by: Nicolin Chen nicolinc@nvidia.com
Tested-by: Nicolin Chen nicolinc@nvidia.com
Link: https://lore.kernel.org/r/08adbe3d01024d8382a478325f73b56851f76e49.168373125...
Signed-off-by: Will Deacon will@kernel.org
Signed-off-by: Easwar Hariharan eahariha@linux.microsoft.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 Documentation/arm64/silicon-errata.rst      | 2 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 ++++++++++++++++++++++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 6 +++++
 3 files changed, 37 insertions(+)

--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -141,6 +141,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | MMU-500         | #841119,826419  | N/A                         |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | MMU-600         | #1076982        | N/A                         |
++----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
 | Broadcom       | Brahma-B53      | N/A             | ARM64_ERRATUM_845719        |
 +----------------+-----------------+-----------------+-----------------------------+
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3410,6 +3410,33 @@ static int arm_smmu_device_reset(struct
 	return 0;
 }
 
+#define IIDR_IMPLEMENTER_ARM		0x43b
+#define IIDR_PRODUCTID_ARM_MMU_600	0x483
+
+static void arm_smmu_device_iidr_probe(struct arm_smmu_device *smmu)
+{
+	u32 reg;
+	unsigned int implementer, productid, variant, revision;
+
+	reg = readl_relaxed(smmu->base + ARM_SMMU_IIDR);
+	implementer = FIELD_GET(IIDR_IMPLEMENTER, reg);
+	productid = FIELD_GET(IIDR_PRODUCTID, reg);
+	variant = FIELD_GET(IIDR_VARIANT, reg);
+	revision = FIELD_GET(IIDR_REVISION, reg);
+
+	switch (implementer) {
+	case IIDR_IMPLEMENTER_ARM:
+		switch (productid) {
+		case IIDR_PRODUCTID_ARM_MMU_600:
+			/* Arm erratum 1076982 */
+			if (variant == 0 && revision <= 2)
+				smmu->features &= ~ARM_SMMU_FEAT_SEV;
+			break;
+		}
+		break;
+	}
+}
+
 static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
 	u32 reg;
@@ -3615,6 +3642,8 @@ static int arm_smmu_device_hw_probe(stru
 
 	smmu->ias = max(smmu->ias, smmu->oas);
 
+	arm_smmu_device_iidr_probe(smmu);
+
 	if (arm_smmu_sva_supported(smmu))
 		smmu->features |= ARM_SMMU_FEAT_SVA;
 
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -69,6 +69,12 @@
 #define IDR5_VAX			GENMASK(11, 10)
 #define IDR5_VAX_52_BIT			1
 
+#define ARM_SMMU_IIDR			0x18
+#define IIDR_PRODUCTID			GENMASK(31, 20)
+#define IIDR_VARIANT			GENMASK(19, 16)
+#define IIDR_REVISION			GENMASK(15, 12)
+#define IIDR_IMPLEMENTER		GENMASK(11, 0)
+
 #define ARM_SMMU_CR0			0x20
 #define CR0_ATSCHK			(1 << 4)
 #define CR0_CMDQEN			(1 << 3)
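Reviewer note: the IIDR decode in this patch is easy to check in isolation. Below is a small user-space sketch, assuming only the field layout and ID values quoted in the diff (PRODUCTID[31:20], VARIANT[19:16], REVISION[15:12], IMPLEMENTER[11:0], implementer 0x43b, product 0x483). The helpers `iidr_field()`, `make_iidr()` and `mmu600_hides_sev()` are hypothetical names for illustration and do not exist in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IIDR_IMPLEMENTER_ARM		0x43bu
#define IIDR_PRODUCTID_ARM_MMU_600	0x483u

/* Extract bits [hi:lo] of reg; only used here with fields narrower than 32 bits. */
static uint32_t iidr_field(uint32_t reg, unsigned int hi, unsigned int lo)
{
	return (reg >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

/* Build a register value from its four fields (illustration/test helper). */
static uint32_t make_iidr(uint32_t productid, uint32_t variant,
			  uint32_t revision, uint32_t implementer)
{
	return (productid << 20) | (variant << 16) | (revision << 12) | implementer;
}

/* Same condition as the erratum 1076982 check in the patch. */
static bool mmu600_hides_sev(uint32_t iidr)
{
	uint32_t implementer = iidr_field(iidr, 11, 0);
	uint32_t productid = iidr_field(iidr, 31, 20);
	uint32_t variant = iidr_field(iidr, 19, 16);
	uint32_t revision = iidr_field(iidr, 15, 12);

	return implementer == IIDR_IMPLEMENTER_ARM &&
	       productid == IIDR_PRODUCTID_ARM_MMU_600 &&
	       variant == 0 && revision <= 2;
}
```

For example, `mmu600_hides_sev(make_iidr(0x483, 0, 1, 0x43b))` is true in this model, while a value with variant 1 is not, matching the `variant == 0 && revision <= 2` condition in the diff.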
From: Robin Murphy robin.murphy@arm.com
commit 309a15cb16bb075da1c99d46fb457db6a1a2669e upstream
To work around MMU-700 erratum 2812531 we need to ensure that certain sequences of commands cannot be issued without an intervening sync. In practice this falls out of our current command-batching machinery anyway - each batch only contains a single type of invalidation command, and ends with a sync. The only exception is when a batch is sufficiently large to need issuing across multiple command queue slots, wherein the earlier slots will not contain a sync and thus may in theory interleave with another batch being issued in parallel to create an affected sequence across the slot boundary.
Since MMU-700 supports range invalidate commands and thus we will prefer to use them (which also happens to avoid conditions for other errata), I'm not entirely sure it's even possible for a single high-level invalidate call to generate a batch of more than 63 commands, but for the sake of robustness and documentation, wire up an option to enforce that a sync is always inserted for every slot issued.
The other aspect is that the relative order of DVM commands cannot be controlled, so DVM cannot be used. Again that is already the status quo, but since we have at least defined ARM_SMMU_FEAT_BTM, we can explicitly disable it for documentation purposes even if it's not wired up anywhere yet.
Signed-off-by: Robin Murphy robin.murphy@arm.com
Reviewed-by: Nicolin Chen nicolinc@nvidia.com
Link: https://lore.kernel.org/r/330221cdfd0003cd51b6c04e7ff3566741ad8374.168373125...
Signed-off-by: Will Deacon will@kernel.org
Signed-off-by: Easwar Hariharan eahariha@linux.microsoft.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 Documentation/arm64/silicon-errata.rst      | 2 ++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 12 ++++++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
 3 files changed, 15 insertions(+)

--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -143,6 +143,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | MMU-600         | #1076982        | N/A                         |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | MMU-700         | #2812531        | N/A                         |
++----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
 | Broadcom       | Brahma-B53      | N/A             | ARM64_ERRATUM_845719        |
 +----------------+-----------------+-----------------+-----------------------------+
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -882,6 +882,12 @@ static void arm_smmu_cmdq_batch_add(stru
 {
 	int index;
 
+	if (cmds->num == CMDQ_BATCH_ENTRIES - 1 &&
+	    (smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC)) {
+		arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, true);
+		cmds->num = 0;
+	}
+
 	if (cmds->num == CMDQ_BATCH_ENTRIES) {
 		arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, false);
 		cmds->num = 0;
@@ -3412,6 +3418,7 @@ static int arm_smmu_device_reset(struct
 
 #define IIDR_IMPLEMENTER_ARM		0x43b
 #define IIDR_PRODUCTID_ARM_MMU_600	0x483
+#define IIDR_PRODUCTID_ARM_MMU_700	0x487
 
 static void arm_smmu_device_iidr_probe(struct arm_smmu_device *smmu)
 {
@@ -3432,6 +3439,11 @@ static void arm_smmu_device_iidr_probe(s
 			if (variant == 0 && revision <= 2)
 				smmu->features &= ~ARM_SMMU_FEAT_SEV;
 			break;
+		case IIDR_PRODUCTID_ARM_MMU_700:
+			/* Arm erratum 2812531 */
+			smmu->features &= ~ARM_SMMU_FEAT_BTM;
+			smmu->options |= ARM_SMMU_OPT_CMDQ_FORCE_SYNC;
+			break;
 		}
 		break;
 	}
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -650,6 +650,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
 #define ARM_SMMU_OPT_PAGE0_REGS_ONLY	(1 << 1)
 #define ARM_SMMU_OPT_MSIPOLL		(1 << 2)
+#define ARM_SMMU_OPT_CMDQ_FORCE_SYNC	(1 << 3)
 	u32				options;
 
 	struct arm_smmu_cmdq		cmdq;
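Reviewer note: the effect of ARM_SMMU_OPT_CMDQ_FORCE_SYNC on batching can be seen with a small counting model. This is a user-space sketch only. `MODEL_BATCH_ENTRIES` is assumed to be 64 (consistent with the "more than 63 commands" wording in the commit message, but not confirmed by the diff), and `struct model_batch`, `model_issue()` and `model_batch_add()` are hypothetical names that only count flushes instead of issuing commands.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_BATCH_ENTRIES 64	/* assumption: stands in for CMDQ_BATCH_ENTRIES */

struct model_batch {
	unsigned int num;		/* commands currently queued */
	unsigned int flushes;		/* times the batch was issued */
	unsigned int synced_flushes;	/* flushes issued with sync == true */
	bool force_sync;		/* models ARM_SMMU_OPT_CMDQ_FORCE_SYNC */
};

/* Stand-in for issuing the command list: count it and empty the batch. */
static void model_issue(struct model_batch *b, bool sync)
{
	b->flushes++;
	if (sync)
		b->synced_flushes++;
	b->num = 0;
}

/* Same control flow as the patched arm_smmu_cmdq_batch_add(). */
static void model_batch_add(struct model_batch *b)
{
	/* With the option set, issue one entry early and request a sync. */
	if (b->num == MODEL_BATCH_ENTRIES - 1 && b->force_sync)
		model_issue(b, true);

	if (b->num == MODEL_BATCH_ENTRIES)
		model_issue(b, false);

	b->num++;
}
```

Feeding 200 commands through this model gives three intermediate flushes either way; with `force_sync` set all three carry a sync, and without it none do. The final partial batch is left queued in both cases, as in the driver, where the submit path issues it.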
From: Robin Murphy robin.murphy@arm.com
commit 1d9777b9f3d55b4b6faf186ba4f1d6fb560c0523 upstream
In certain cases we may want to refuse to allow nested translation even when both stages are implemented, so let's add an explicit feature for nesting support which we can control in its own right. For now this merely serves as documentation, but it means a nice convenient check will be ready and waiting for the future nesting code.
Signed-off-by: Robin Murphy robin.murphy@arm.com
Reviewed-by: Nicolin Chen nicolinc@nvidia.com
Link: https://lore.kernel.org/r/136c3f4a3a84cc14a5a1978ace57dfd3ed67b688.168373125...
Signed-off-by: Will Deacon will@kernel.org
Signed-off-by: Easwar Hariharan eahariha@linux.microsoft.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 ++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
 2 files changed, 5 insertions(+)

--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3654,6 +3654,10 @@ static int arm_smmu_device_hw_probe(stru
 
 	smmu->ias = max(smmu->ias, smmu->oas);
 
+	if ((smmu->features & ARM_SMMU_FEAT_TRANS_S1) &&
+	    (smmu->features & ARM_SMMU_FEAT_TRANS_S2))
+		smmu->features |= ARM_SMMU_FEAT_NESTING;
+
 	arm_smmu_device_iidr_probe(smmu);
 
 	if (arm_smmu_sva_supported(smmu))
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -645,6 +645,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_BTM		(1 << 16)
 #define ARM_SMMU_FEAT_SVA		(1 << 17)
 #define ARM_SMMU_FEAT_E2H		(1 << 18)
+#define ARM_SMMU_FEAT_NESTING		(1 << 19)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
From: Robin Murphy robin.murphy@arm.com
commit 0bfbfc526c70606bf0fad302e4821087cbecfaf4 upstream
Both MMU-600 and MMU-700 have similar errata around TLB invalidation while both stages of translation are active, which will need some consideration once nesting support is implemented. For now, though, it's very easy to make our implicit lack of nesting support explicit for those cases, so they're less likely to be missed in future.
Signed-off-by: Robin Murphy robin.murphy@arm.com
Reviewed-by: Nicolin Chen nicolinc@nvidia.com
Link: https://lore.kernel.org/r/696da78d32bb4491f898f11b0bb4d850a8aa7c6a.168373125...
Signed-off-by: Will Deacon will@kernel.org
Signed-off-by: Easwar Hariharan eahariha@linux.microsoft.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 Documentation/arm64/silicon-errata.rst      | 4 ++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 5 +++++
 2 files changed, 7 insertions(+), 2 deletions(-)

--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -141,9 +141,9 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | MMU-500         | #841119,826419  | N/A                         |
 +----------------+-----------------+-----------------+-----------------------------+
-| ARM            | MMU-600         | #1076982        | N/A                         |
+| ARM            | MMU-600         | #1076982,1209401| N/A                         |
 +----------------+-----------------+-----------------+-----------------------------+
-| ARM            | MMU-700         | #2812531        | N/A                         |
+| ARM            | MMU-700         | #2268618,2812531| N/A                         |
 +----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
 | Broadcom       | Brahma-B53      | N/A             | ARM64_ERRATUM_845719        |
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3438,11 +3438,16 @@ static void arm_smmu_device_iidr_probe(s
 			/* Arm erratum 1076982 */
 			if (variant == 0 && revision <= 2)
 				smmu->features &= ~ARM_SMMU_FEAT_SEV;
+			/* Arm erratum 1209401 */
+			if (variant < 2)
+				smmu->features &= ~ARM_SMMU_FEAT_NESTING;
 			break;
 		case IIDR_PRODUCTID_ARM_MMU_700:
 			/* Arm erratum 2812531 */
 			smmu->features &= ~ARM_SMMU_FEAT_BTM;
 			smmu->options |= ARM_SMMU_OPT_CMDQ_FORCE_SYNC;
+			/* Arm errata 2268618, 2812531 */
+			smmu->features &= ~ARM_SMMU_FEAT_NESTING;
 			break;
 		}
 		break;
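Reviewer note: taken together, the previous patch and this one first derive the nesting feature from the two translation stages and then strip it again on affected parts. A user-space sketch of that ordering follows. The `MODEL_FEAT_*` bit positions are arbitrary placeholders rather than the driver's values, and `enum model_product` and `model_probe_features()` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit positions; the driver's actual values differ. */
#define MODEL_FEAT_TRANS_S1	(1u << 0)
#define MODEL_FEAT_TRANS_S2	(1u << 1)
#define MODEL_FEAT_NESTING	(1u << 2)

enum model_product { MODEL_OTHER, MODEL_MMU_600, MODEL_MMU_700 };

static uint32_t model_probe_features(uint32_t features,
				     enum model_product product,
				     unsigned int variant)
{
	/* Nesting is only advertised when both stages are implemented. */
	if ((features & MODEL_FEAT_TRANS_S1) && (features & MODEL_FEAT_TRANS_S2))
		features |= MODEL_FEAT_NESTING;

	/* The erratum checks run afterwards and can remove it again. */
	switch (product) {
	case MODEL_MMU_600:
		if (variant < 2)	/* erratum 1209401 condition from the diff */
			features &= ~MODEL_FEAT_NESTING;
		break;
	case MODEL_MMU_700:		/* unconditional in the diff */
		features &= ~MODEL_FEAT_NESTING;
		break;
	default:
		break;
	}

	return features;
}
```

The order matters: because the IIDR probe runs after the feature is derived, an affected implementation never ends up with the nesting bit set, even though both stages are present.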
From: Tim Harvey tharvey@gateworks.com
[ Upstream commit 3e7d3c5e13b05dda9db92d98803a626378e75438 ]
The GW7903 does not connect the VDD_MIPI power rails thus MIPI is disabled. However we must also disable disp_blk_ctrl as it uses the pgc_mipi power domain and without it being disabled imx8m-blk-ctrl will fail to probe:

imx8m-blk-ctrl 32e28000.blk-ctrl: error -ETIMEDOUT: failed to attach power domain "mipi-dsi"
imx8m-blk-ctrl: probe of 32e28000.blk-ctrl failed with error -110

Fixes: a72ba91e5bc7 ("arm64: dts: imx: Add i.mx8mm Gateworks gw7903 dts support")
Signed-off-by: Tim Harvey tharvey@gateworks.com
Signed-off-by: Shawn Guo shawnguo@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/arm64/boot/dts/freescale/imx8mm-venice-gw7903.dts | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/imx8mm-venice-gw7903.dts b/arch/arm64/boot/dts/freescale/imx8mm-venice-gw7903.dts
index 8e861b920d09e..7c9b60f4da922 100644
--- a/arch/arm64/boot/dts/freescale/imx8mm-venice-gw7903.dts
+++ b/arch/arm64/boot/dts/freescale/imx8mm-venice-gw7903.dts
@@ -559,6 +559,10 @@
 	status = "okay";
 };
 
+&disp_blk_ctrl {
+	status = "disabled";
+};
+
 &pgc_mipi {
 	status = "disabled";
 };
From: Tim Harvey tharvey@gateworks.com
[ Upstream commit f7a0b57524cf811ac06257a5099f1b7c19ee7310 ]
The GW7904 does not connect the VDD_MIPI power rails thus MIPI is disabled. However we must also disable disp_blk_ctrl as it uses the pgc_mipi power domain and without it being disabled imx8m-blk-ctrl will fail to probe:

imx8m-blk-ctrl 32e28000.blk-ctrl: error -ETIMEDOUT: failed to attach power domain "mipi-dsi"
imx8m-blk-ctrl: probe of 32e28000.blk-ctrl failed with error -110

Fixes: b999bdaf0597 ("arm64: dts: imx: Add i.mx8mm Gateworks gw7904 dts support")
Signed-off-by: Tim Harvey tharvey@gateworks.com
Signed-off-by: Shawn Guo shawnguo@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/arm64/boot/dts/freescale/imx8mm-venice-gw7904.dts | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/imx8mm-venice-gw7904.dts b/arch/arm64/boot/dts/freescale/imx8mm-venice-gw7904.dts
index a67771d021464..46a07dfc0086c 100644
--- a/arch/arm64/boot/dts/freescale/imx8mm-venice-gw7904.dts
+++ b/arch/arm64/boot/dts/freescale/imx8mm-venice-gw7904.dts
@@ -617,6 +617,10 @@
 	status = "okay";
 };
 
+&disp_blk_ctrl {
+	status = "disabled";
+};
+
 &pgc_mipi {
 	status = "disabled";
 };
From: Yashwanth Varakala y.varakala@phytec.de
[ Upstream commit cddeefc1663294fb74b31ff5029a83c0e819ff3a ]
Corrected the label of the VPU regulator node (buck 3) from reg_vdd_gpu to reg_vdd_vpu.
Fixes: ae6847f26ac9 ("arm64: dts: freescale: Add phyBOARD-Polis-i.MX8MM support")
Signed-off-by: Yashwanth Varakala y.varakala@phytec.de
Signed-off-by: Cem Tenruh c.tenruh@phytec.de
Signed-off-by: Shawn Guo shawnguo@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/arm64/boot/dts/freescale/imx8mm-phycore-som.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/freescale/imx8mm-phycore-som.dtsi b/arch/arm64/boot/dts/freescale/imx8mm-phycore-som.dtsi
index 995b44efb1b65..3e5e7d861882f 100644
--- a/arch/arm64/boot/dts/freescale/imx8mm-phycore-som.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx8mm-phycore-som.dtsi
@@ -210,7 +210,7 @@
 				};
 			};
 
-			reg_vdd_gpu: buck3 {
+			reg_vdd_vpu: buck3 {
 				regulator-always-on;
 				regulator-boot-on;
 				regulator-max-microvolt = <1000000>;
From: Yashwanth Varakala y.varakala@phytec.de
[ Upstream commit 1ef0aa137a96c5f0564f2db0c556a4f0f60ce8f5 ]
Remove unused nINT_ETHPHY entry from gpio-line-names in gpio1 nodes of phyCORE-i.MX8MM and phyBOARD-Polis-i.MX8MM devicetrees.
Fixes: ae6847f26ac9 ("arm64: dts: freescale: Add phyBOARD-Polis-i.MX8MM support")
Signed-off-by: Yashwanth Varakala y.varakala@phytec.de
Signed-off-by: Cem Tenruh c.tenruh@phytec.de
Signed-off-by: Shawn Guo shawnguo@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/arm64/boot/dts/freescale/imx8mm-phyboard-polis-rdk.dts | 2 +-
 arch/arm64/boot/dts/freescale/imx8mm-phycore-som.dtsi | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/freescale/imx8mm-phyboard-polis-rdk.dts b/arch/arm64/boot/dts/freescale/imx8mm-phyboard-polis-rdk.dts index 4a3df2b77b0be..6720ddf597839 100644 --- a/arch/arm64/boot/dts/freescale/imx8mm-phyboard-polis-rdk.dts +++ b/arch/arm64/boot/dts/freescale/imx8mm-phyboard-polis-rdk.dts @@ -141,7 +141,7 @@ };
&gpio1 { - gpio-line-names = "nINT_ETHPHY", "LED_RED", "WDOG_INT", "X_RTC_INT", + gpio-line-names = "", "LED_RED", "WDOG_INT", "X_RTC_INT", "", "", "", "RESET_ETHPHY", "CAN_nINT", "CAN_EN", "nENABLE_FLATLINK", "", "USB_OTG_VBUS_EN", "", "LED_GREEN", "LED_BLUE"; diff --git a/arch/arm64/boot/dts/freescale/imx8mm-phycore-som.dtsi b/arch/arm64/boot/dts/freescale/imx8mm-phycore-som.dtsi index 3e5e7d861882f..9d9b103c79c77 100644 --- a/arch/arm64/boot/dts/freescale/imx8mm-phycore-som.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mm-phycore-som.dtsi @@ -111,7 +111,7 @@ };
&gpio1 { - gpio-line-names = "nINT_ETHPHY", "", "WDOG_INT", "X_RTC_INT", + gpio-line-names = "", "", "WDOG_INT", "X_RTC_INT", "", "", "", "RESET_ETHPHY", "", "", "nENABLE_FLATLINK"; };
From: Hugo Villeneuve hvilleneuve@dimonoff.com
[ Upstream commit 253be5b53c2792fb4384f8005b05421e6f040ee3 ]
For SOMs with an onboard PHY, the RESET_N pull-up resistor is currently deactivated in the pinmux configuration. When the pinmux code selects the GPIO function for this pin, with a default direction of input, this prevents the RESET_N pin from being taken to the proper 3.3V level (deasserted), and this results in the PHY not being detected since it is held in reset.

Taken from the RESET_N pin description in the ADIN13000 datasheet:
    This pin requires a 1K pull-up resistor to AVDD_3P3.
Activate the pull-up resistor to fix the issue.
Fixes: ade0176dd8a0 ("arm64: dts: imx8mn-var-som: Add Variscite VAR-SOM-MX8MN System on Module")
Signed-off-by: Hugo Villeneuve hvilleneuve@dimonoff.com
Reviewed-by: Fabio Estevam festevam@gmail.com
Signed-off-by: Shawn Guo shawnguo@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/arm64/boot/dts/freescale/imx8mn-var-som.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/freescale/imx8mn-var-som.dtsi b/arch/arm64/boot/dts/freescale/imx8mn-var-som.dtsi
index d053ef302fb82..faafefe562e4b 100644
--- a/arch/arm64/boot/dts/freescale/imx8mn-var-som.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx8mn-var-som.dtsi
@@ -351,7 +351,7 @@
 			MX8MN_IOMUXC_ENET_RXC_ENET1_RGMII_RXC		0x91
 			MX8MN_IOMUXC_ENET_RX_CTL_ENET1_RGMII_RX_CTL	0x91
 			MX8MN_IOMUXC_ENET_TX_CTL_ENET1_RGMII_TX_CTL	0x1f
-			MX8MN_IOMUXC_GPIO1_IO09_GPIO1_IO9		0x19
+			MX8MN_IOMUXC_GPIO1_IO09_GPIO1_IO9		0x159
 		>;
 	};
 
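Reviewer note: the one-value change from 0x19 to 0x159 is easier to review as a bit difference. The sketch below assumes the usual i.MX8M Mini/Nano pad control layout, with bit 6 as the pull select (1 = pull-up) and bit 8 as the pull enable. Those two bit meanings are an assumption here and should be checked against the SoC reference manual; the `MODEL_PAD_CTL_*` macros and `pad_bits_changed()` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed pad control bits; verify against the reference manual. */
#define MODEL_PAD_CTL_PUE	(1u << 6)	/* pull select: 1 = pull-up */
#define MODEL_PAD_CTL_PE	(1u << 8)	/* pull enable */

/* Return the set of bits that differ between two pad settings. */
static uint32_t pad_bits_changed(uint32_t old_val, uint32_t new_val)
{
	return old_val ^ new_val;
}
```

Under that assumption, `pad_bits_changed(0x19, 0x159)` is 0x140, i.e. exactly the two pull-related bits, and the low bits of the setting are left unchanged.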
From: Benjamin Gaignard benjamin.gaignard@collabora.com
[ Upstream commit b27bfc5103c72f84859bd32731b6a09eafdeda05 ]
Set the VPU G2 clock to 300MHz as described in the documentation. This fixes pixel errors occurring with large-resolution (>= 2560x1600) HEVC test streams when using the postprocessor to produce NV12.
Fixes: 4ac7e4a81272 ("arm64: dts: imx8mq: Enable both G1 and G2 VPU's with vpu-blk-ctrl")
Signed-off-by: Benjamin Gaignard benjamin.gaignard@collabora.com
Signed-off-by: Shawn Guo shawnguo@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/arm64/boot/dts/freescale/imx8mq.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/freescale/imx8mq.dtsi b/arch/arm64/boot/dts/freescale/imx8mq.dtsi
index 4724ed0cbff94..bf8f02c1535c1 100644
--- a/arch/arm64/boot/dts/freescale/imx8mq.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx8mq.dtsi
@@ -756,7 +756,7 @@
 							 <&clk IMX8MQ_SYS1_PLL_800M>,
 							 <&clk IMX8MQ_VPU_PLL>;
 				assigned-clock-rates = <600000000>,
-						       <600000000>,
+						       <300000000>,
 						       <800000000>,
 						       <0>;
 			};
From: Punit Agrawal punit.agrawal@bytedance.com
[ Upstream commit d05799d7b4a39fa71c65aa277128ac7c843ffcdc ]
Commit 35727af2b15d ("irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4") moved the initialisation of the SoC version to arm_smccc_version_init() but forgot to update the results structure and its usage.
Fix the use of the uninitialised results structure and update the error strings.
Fixes: 35727af2b15d ("irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4")
Signed-off-by: Punit Agrawal punit.agrawal@bytedance.com
Cc: Sudeep Holla sudeep.holla@arm.com
Cc: Marc Zyngier maz@kernel.org
Cc: Vikram Sethi vsethi@nvidia.com
Cc: Shanker Donthineni sdonthineni@nvidia.com
Acked-by: Marc Zyngier maz@kernel.org
Link: https://lore.kernel.org/r/20230717171702.424253-1-punit.agrawal@bytedance.co...
Signed-off-by: Sudeep Holla sudeep.holla@arm.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/firmware/smccc/soc_id.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/firmware/smccc/soc_id.c b/drivers/firmware/smccc/soc_id.c
index 890eb454599a3..1990263fbba0e 100644
--- a/drivers/firmware/smccc/soc_id.c
+++ b/drivers/firmware/smccc/soc_id.c
@@ -34,7 +34,6 @@ static struct soc_device_attribute *soc_dev_attr;
 
 static int __init smccc_soc_init(void)
 {
-	struct arm_smccc_res res;
 	int soc_id_rev, soc_id_version;
 	static char soc_id_str[20], soc_id_rev_str[12];
 	static char soc_id_jep106_id_str[12];
@@ -49,13 +48,13 @@ static int __init smccc_soc_init(void)
 	}
 
 	if (soc_id_version < 0) {
-		pr_err("ARCH_SOC_ID(0) returned error: %lx\n", res.a0);
+		pr_err("Invalid SoC Version: %x\n", soc_id_version);
 		return -EINVAL;
 	}
 
 	soc_id_rev = arm_smccc_get_soc_id_revision();
 	if (soc_id_rev < 0) {
-		pr_err("ARCH_SOC_ID(1) returned error: %lx\n", res.a0);
+		pr_err("Invalid SoC Revision: %x\n", soc_id_rev);
 		return -EINVAL;
 	}
 
From: Yury Norov yury.norov@gmail.com
[ Upstream commit 2356d198d2b4ddec24efea98271cb3be230bc787 ]
When building with Clang, and when KASAN and GCOV_PROFILE_ALL are both enabled, the test fails to build [1]:
lib/test_bitmap.c:920:2: error: call to '__compiletime_assert_239' declared with 'error' attribute: BUILD_BUG_ON failed: !__builtin_constant_p(res)
        BUILD_BUG_ON(!__builtin_constant_p(res));
        ^
include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON'
        BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
        ^
include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                    ^
include/linux/compiler_types.h:352:2: note: expanded from macro 'compiletime_assert'
        _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
        ^
include/linux/compiler_types.h:340:2: note: expanded from macro '_compiletime_assert'
        __compiletime_assert(condition, msg, prefix, suffix)
        ^
include/linux/compiler_types.h:333:4: note: expanded from macro '__compiletime_assert'
                        prefix ## suffix();                             \
                        ^
<scratch space>:185:1: note: expanded from here
__compiletime_assert_239
Originally it was attributed to s390, which now appears to be wrong. The issue is not related to the bitmap code itself, but it breaks the build for a given configuration.

Disabling the const_eval test under that config may potentially hide other bugs. Instead, work around it by disabling GCOV for test_bitmap until the compiler is fixed.
[1] https://github.com/ClangBuiltLinux/linux/issues/1874
Reported-by: kernel test robot lkp@intel.com
Closes: https://lore.kernel.org/oe-kbuild-all/202307171254.yFcH97ej-lkp@intel.com/
Fixes: dc34d5036692 ("lib: test_bitmap: add compile-time optimization/evaluations assertions")
Co-developed-by: Nathan Chancellor nathan@kernel.org
Signed-off-by: Nathan Chancellor nathan@kernel.org
Signed-off-by: Yury Norov yury.norov@gmail.com
Reviewed-by: Nick Desaulniers ndesaulniers@google.com
Reviewed-by: Alexander Lobakin aleksander.lobakin@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 lib/Makefile      | 6 ++++++
 lib/test_bitmap.c | 8 ++++----
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/lib/Makefile b/lib/Makefile
index 59bd7c2f793a7..5ffe72ec99797 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -81,8 +81,14 @@ obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_key_base.o
 obj-$(CONFIG_TEST_DYNAMIC_DEBUG) += test_dynamic_debug.o
 obj-$(CONFIG_TEST_PRINTF) += test_printf.o
 obj-$(CONFIG_TEST_SCANF) += test_scanf.o
+
 obj-$(CONFIG_TEST_BITMAP) += test_bitmap.o
 obj-$(CONFIG_TEST_STRSCPY) += test_strscpy.o
+ifeq ($(CONFIG_CC_IS_CLANG)$(CONFIG_KASAN),yy)
+# FIXME: Clang breaks test_bitmap_const_eval when KASAN and GCOV are enabled
+GCOV_PROFILE_test_bitmap.o := n
+endif
+
 obj-$(CONFIG_TEST_UUID) += test_uuid.o
 obj-$(CONFIG_TEST_XARRAY) += test_xarray.o
 obj-$(CONFIG_TEST_MAPLE_TREE) += test_maple_tree.o
diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index a8005ad3bd589..37a9108c4f588 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -1149,6 +1149,10 @@ static void __init test_bitmap_print_buf(void)
 	}
 }
 
+/*
+ * FIXME: Clang breaks compile-time evaluations when KASAN and GCOV are enabled.
+ * To workaround it, GCOV is force-disabled in Makefile for this configuration.
+ */
 static void __init test_bitmap_const_eval(void)
 {
 	DECLARE_BITMAP(bitmap, BITS_PER_LONG);
@@ -1174,11 +1178,7 @@ static void __init test_bitmap_const_eval(void)
 	 * the compiler is fixed.
 	 */
 	bitmap_clear(bitmap, 0, BITS_PER_LONG);
-#if defined(__s390__) && defined(__clang__)
-	if (!const_test_bit(7, bitmap))
-#else
 	if (!test_bit(7, bitmap))
-#endif
 		bitmap_set(bitmap, 5, 2);
 
 	/* Equals to `unsigned long bitopvar = BIT(20)` */
From: Cristian Marussi cristian.marussi@arm.com
[ Upstream commit d1ff11d7ad8704f8d615f6446041c221b2d2ec4d ]
SCMI transport based on SMC can optionally use an additional IRQ to signal message completion. The associated interrupt handler is currently allocated using devres but on shutdown the core SCMI stack will call .chan_free() well before any managed cleanup is invoked by devres. As a consequence, the arrival of a late reply to an in-flight pending transaction could still trigger the interrupt handler well after the SCMI core has cleaned up the channels, with unpleasant results.
Inhibit further message processing on the IRQ path by explicitly freeing the IRQ inside .chan_free() callback itself.
Fixes: dd820ee21d5e ("firmware: arm_scmi: Augment SMC/HVC to allow optional interrupt")
Reported-by: Bjorn Andersson andersson@kernel.org
Signed-off-by: Cristian Marussi cristian.marussi@arm.com
Link: https://lore.kernel.org/r/20230719173533.2739319-1-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla sudeep.holla@arm.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/firmware/arm_scmi/smc.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/firmware/arm_scmi/smc.c b/drivers/firmware/arm_scmi/smc.c
index 87a7b13cf868b..dc383d874ee3a 100644
--- a/drivers/firmware/arm_scmi/smc.c
+++ b/drivers/firmware/arm_scmi/smc.c
@@ -23,6 +23,7 @@
 /**
  * struct scmi_smc - Structure representing a SCMI smc transport
  *
+ * @irq: An optional IRQ for completion
  * @cinfo: SCMI channel info
  * @shmem: Transmit/Receive shared memory area
  * @shmem_lock: Lock to protect access to Tx/Rx shared memory area.
@@ -33,6 +34,7 @@
  */
 
 struct scmi_smc {
+	int irq;
 	struct scmi_chan_info *cinfo;
 	struct scmi_shared_mem __iomem *shmem;
 	/* Protect access to shmem area */
@@ -106,7 +108,7 @@ static int smc_chan_setup(struct scmi_chan_info *cinfo, struct device *dev,
 	struct resource res;
 	struct device_node *np;
 	u32 func_id;
-	int ret, irq;
+	int ret;
 
 	if (!tx)
 		return -ENODEV;
@@ -142,11 +144,10 @@ static int smc_chan_setup(struct scmi_chan_info *cinfo, struct device *dev,
 	 * completion of a message is signaled by an interrupt rather than by
 	 * the return of the SMC call.
 	 */
-	irq = of_irq_get_byname(cdev->of_node, "a2p");
-	if (irq > 0) {
-		ret = devm_request_irq(dev, irq, smc_msg_done_isr,
-				       IRQF_NO_SUSPEND,
-				       dev_name(dev), scmi_info);
+	scmi_info->irq = of_irq_get_byname(cdev->of_node, "a2p");
+	if (scmi_info->irq > 0) {
+		ret = request_irq(scmi_info->irq, smc_msg_done_isr,
+				  IRQF_NO_SUSPEND, dev_name(dev), scmi_info);
 		if (ret) {
 			dev_err(dev, "failed to setup SCMI smc irq\n");
 			return ret;
@@ -168,6 +169,10 @@ static int smc_chan_free(int id, void *p, void *data)
 	struct scmi_chan_info *cinfo = p;
 	struct scmi_smc *scmi_info = cinfo->transport_info;
 
+	/* Ignore any possible further reception on the IRQ path */
+	if (scmi_info->irq > 0)
+		free_irq(scmi_info->irq, scmi_info);
+
 	cinfo->transport_info = NULL;
 	scmi_info->cinfo = NULL;
 
From: ndesaulniers@google.com ndesaulniers@google.com
[ Upstream commit 79e8328e5acbe691bbde029a52c89d70dcbc22f3 ]
Compiling big-endian targets with Clang produces the diagnostic:
fs/namei.c:2173:13: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical]
        } while (!(has_zero(a, &adata, &constants) | has_zero(b, &bdata, &constants)));
                   ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                   ||
fs/namei.c:2173:13: note: cast one or both operands to int to silence this warning
It appears that when has_zero was introduced, two definitions were produced with different signatures (in particular different return types).
Looking at the usage in hash_name() in fs/namei.c, I suspect that has_zero() is meant to be invoked twice per while loop iteration; using logical-or would short-circuit the second call and not update `bdata` whenever the first call on `a` already returned non-zero. So I think it's preferable to always return an unsigned long rather than a bool, instead of updating the while loop in hash_name() to use a logical-or rather than a bitwise-or.
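The difference between the two operators can be sketched in userspace C. This is only an illustration under stated assumptions: demo_has_zero(), scan_bitwise() and scan_logical() are hypothetical names, and the zero-byte test is the classic bit trick rather than the kernel's per-architecture has_zero().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for has_zero(): stores its input in *data as a
 * side effect and returns non-zero if any byte of val is zero.
 */
static unsigned long demo_has_zero(unsigned long val, unsigned long *data)
{
	const unsigned long ones = ~0UL / 0xff;   /* 0x0101...01 */
	const unsigned long highs = ones * 0x80;  /* 0x8080...80 */

	*data = val;
	return (val - ones) & ~val & highs;
}

/* Bitwise OR: both calls always run, so *bdata is always written. */
static bool scan_bitwise(unsigned long a, unsigned long b,
			 unsigned long *adata, unsigned long *bdata)
{
	return (demo_has_zero(a, adata) | demo_has_zero(b, bdata)) != 0;
}

/* Logical OR: the second call is skipped once the first is non-zero. */
static bool scan_logical(unsigned long a, unsigned long b,
			 unsigned long *adata, unsigned long *bdata)
{
	return demo_has_zero(a, adata) || demo_has_zero(b, bdata);
}
```

With an `a` that contains a zero byte, scan_bitwise() still writes `*bdata`, while scan_logical() leaves it at its previous value.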
[ Also changed powerpc version to do the same - Linus ]
Link: https://github.com/ClangBuiltLinux/linux/issues/1832 Link: https://lore.kernel.org/lkml/20230801-bitwise-v1-1-799bec468dc4@google.com/ Fixes: 36126f8f2ed8 ("word-at-a-time: make the interfaces truly generic") Debugged-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Nick Desaulniers ndesaulniers@google.com Acked-by: Heiko Carstens hca@linux.ibm.com Cc: Arnd Bergmann arnd@arndb.de Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/word-at-a-time.h | 2 +- include/asm-generic/word-at-a-time.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/word-at-a-time.h b/arch/powerpc/include/asm/word-at-a-time.h index 46c31fb8748d5..30a12d2086871 100644 --- a/arch/powerpc/include/asm/word-at-a-time.h +++ b/arch/powerpc/include/asm/word-at-a-time.h @@ -34,7 +34,7 @@ static inline long find_zero(unsigned long mask) return leading_zero_bits >> 3; }
-static inline bool has_zero(unsigned long val, unsigned long *data, const struct word_at_a_time *c) +static inline unsigned long has_zero(unsigned long val, unsigned long *data, const struct word_at_a_time *c) { unsigned long rhs = val | c->low_bits; *data = rhs; diff --git a/include/asm-generic/word-at-a-time.h b/include/asm-generic/word-at-a-time.h index 20c93f08c9933..95a1d214108a5 100644 --- a/include/asm-generic/word-at-a-time.h +++ b/include/asm-generic/word-at-a-time.h @@ -38,7 +38,7 @@ static inline long find_zero(unsigned long mask) return (mask >> 8) ? byte : byte + 1; }
-static inline bool has_zero(unsigned long val, unsigned long *data, const struct word_at_a_time *c) +static inline unsigned long has_zero(unsigned long val, unsigned long *data, const struct word_at_a_time *c) { unsigned long rhs = val | c->low_bits; *data = rhs;
From: Heiko Carstens hca@linux.ibm.com
[ Upstream commit 0c02cc576eac161601927b41634f80bfd55bfa9e ]
Commit 9fb6c9b3fea1 ("s390/sthyi: add cache to store hypervisor info") added cache handling for store hypervisor info. This also changed the possible return code for sthyi_fill().
Instead of only returning a condition code like the sthyi instruction would do, it can now also return a negative error value (-ENOMEM). handle_sthyi() was not changed accordingly. In case of an error, the negative error value would be incorrectly injected into the guest PSW.
Add proper error handling to prevent this, and update the comment which describes the possible return values of sthyi_fill().
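Why the variable type matters can be sketched in plain C. The names below (demo_fill(), old_handle(), new_handle()) are hypothetical stand-ins and not the s390 code; the sketch only assumes a fill function that returns either a condition code or a negative errno value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for sthyi_fill(): condition code or -errno. */
static int demo_fill(int fail)
{
	return fail ? -ENOMEM : 0;
}

/*
 * Old shape: the result lands in an unsigned 64-bit variable, so the
 * negative error wraps to a huge value and is handed on as if it were
 * a condition code.
 */
static int old_handle(int fail, uint64_t *guest_cc)
{
	uint64_t cc = (uint64_t)(int64_t)demo_fill(fail);

	*guest_cc = cc;
	return 0;
}

/* New shape: a signed cc lets the error be caught and propagated. */
static int new_handle(int fail, uint64_t *guest_cc)
{
	int cc = demo_fill(fail);

	if (cc < 0)
		return cc;	/* nothing is passed on to the guest */

	*guest_cc = (uint64_t)cc;
	return 0;
}
```

In the failing case old_handle() reports success and stores a wrapped -ENOMEM, whereas new_handle() returns the error and leaves the guest-visible value untouched.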
Fixes: 9fb6c9b3fea1 ("s390/sthyi: add cache to store hypervisor info") Reviewed-by: Christian Borntraeger borntraeger@linux.ibm.com Link: https://lore.kernel.org/r/20230727182939.2050744-1-hca@linux.ibm.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/kernel/sthyi.c | 6 +++--- arch/s390/kvm/intercept.c | 9 ++++++--- 2 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/arch/s390/kernel/sthyi.c b/arch/s390/kernel/sthyi.c index 4d141e2c132e5..2ea7f208f0e73 100644 --- a/arch/s390/kernel/sthyi.c +++ b/arch/s390/kernel/sthyi.c @@ -459,9 +459,9 @@ static int sthyi_update_cache(u64 *rc) * * Fills the destination with system information returned by the STHYI * instruction. The data is generated by emulation or execution of STHYI, - * if available. The return value is the condition code that would be - * returned, the rc parameter is the return code which is passed in - * register R2 + 1. + * if available. The return value is either a negative error value or + * the condition code that would be returned, the rc parameter is the + * return code which is passed in register R2 + 1. */ int sthyi_fill(void *dst, u64 *rc) { diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c index ee7478a601442..b37bb960bfaf0 100644 --- a/arch/s390/kvm/intercept.c +++ b/arch/s390/kvm/intercept.c @@ -389,8 +389,8 @@ static int handle_partial_execution(struct kvm_vcpu *vcpu) */ int handle_sthyi(struct kvm_vcpu *vcpu) { - int reg1, reg2, r = 0; - u64 code, addr, cc = 0, rc = 0; + int reg1, reg2, cc = 0, r = 0; + u64 code, addr, rc = 0; struct sthyi_sctns *sctns = NULL;
if (!test_kvm_facility(vcpu->kvm, 74)) @@ -421,7 +421,10 @@ int handle_sthyi(struct kvm_vcpu *vcpu) return -ENOMEM;
cc = sthyi_fill(sctns, &rc); - + if (cc < 0) { + free_page((unsigned long)sctns); + return cc; + } out: if (!cc) { if (kvm_s390_pv_cpu_is_protected(vcpu)) {
From: Gao Xiang hsiangkao@linux.alibaba.com
[ Upstream commit 94c43de73521d8ed7ebcfc6191d9dace1cbf7caa ]
When handling deduplicated compressed data, there can be multiple decompressed extents pointing to the same compressed data in one shot.
In such cases, the bvecs which belong to the longest extent will be selected as the primary bvecs for real decompressors to decode and the other duplicated bvecs will be directly copied from the primary bvecs.
Previously, only relative offsets of the longest extent were checked to decompress the primary bvecs. On rare occasions, it can be incorrect if there are several extents with the same start relative offset. As a result, some short bvecs could be selected for decompression and then cause data corruption.
For example, as Shijie Sun reported off-list, considering the following extents of a file:
 117:   903345..  915250 |   11905 :  385024..  389120 |  4096
...
 119:   919729..  930323 |   10594 :  385024..  389120 |  4096
...
 124:   968881..  980786 |   11905 :  385024..  389120 |  4096
The start relative offset is the same: 2225, but extent 119 (919729.. 930323) is shorter than the others.
Let's restrict the bvec length in addition to the start offset if bvecs are not full.
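A rough userspace sketch of the tightened condition follows. The struct, DEMO_PAGE_SIZE and demo_is_primary() are hypothetical simplifications; the sketch assumes that `offset` is relative to the extent start, that `end` is the end position inside the page, and that the pcluster length equals the longest extent (11905 in the report above).

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PAGE_SIZE 4096u

/* Hypothetical, simplified view of the fields the patch looks at. */
struct demo_bvec {
	unsigned int offset;	/* start, relative to the extent start */
	unsigned int end;	/* end position within the page */
};

/*
 * A bvec may serve as a primary (directly decompressed) page only if it
 * starts on a page boundary AND either fills the whole page or reaches
 * the end of the pcluster's decompressed length.
 */
static bool demo_is_primary(const struct demo_bvec *bvec,
			    unsigned int pageofs_out,
			    unsigned int pcl_length)
{
	if ((bvec->offset + pageofs_out) & (DEMO_PAGE_SIZE - 1))
		return false;

	return bvec->end == DEMO_PAGE_SIZE ||
	       bvec->offset + bvec->end == pcl_length;
}
```

With pageofs_out = 2225, the last page of the short extent (offset 10063, end 531) is rejected because 10063 + 531 = 10594 is short of 11905, while the same page of a long extent (end 1842) is accepted.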
Reported-by: Shijie Sun sunshijie@xiaomi.com Fixes: 5c2a64252c5d ("erofs: introduce partial-referenced pclusters") Tested-by: Shijie Sun sunshijie@xiaomi.com Reviewed-by: Yue Hu huyue2@coolpad.com Reviewed-by: Chao Yu chao@kernel.org Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20230719065459.60083-1-hsiangkao@linux.alibaba.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/erofs/zdata.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c index 533e612b6a486..361f3c29897e8 100644 --- a/fs/erofs/zdata.c +++ b/fs/erofs/zdata.c @@ -989,10 +989,11 @@ static void z_erofs_do_decompressed_bvec(struct z_erofs_decompress_backend *be, struct z_erofs_bvec *bvec) { struct z_erofs_bvec_item *item; + unsigned int pgnr;
- if (!((bvec->offset + be->pcl->pageofs_out) & ~PAGE_MASK)) { - unsigned int pgnr; - + if (!((bvec->offset + be->pcl->pageofs_out) & ~PAGE_MASK) && + (bvec->end == PAGE_SIZE || + bvec->offset + bvec->end == be->pcl->length)) { pgnr = (bvec->offset + be->pcl->pageofs_out) >> PAGE_SHIFT; DBG_BUGON(pgnr >= be->nr_pages); if (!be->decompressed_pages[pgnr]) {
From: Ilan Peer ilan.peer@intel.com
[ Upstream commit fd7f08d92fcd7cc3eca0dd6c853f722a4c6176df ]
The reporter noticed a warning when running iwlwifi:
WARNING: CPU: 8 PID: 659 at mm/page_alloc.c:4453 __alloc_pages+0x329/0x340
As cfg80211_parse_colocated_ap() is not expected to return a negative value, return 0 instead of a negative value if cfg80211_calc_short_ssid() fails.
Fixes: c8cb5b854b40f ("nl80211/cfg80211: support 6 GHz scanning") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217675 Signed-off-by: Ilan Peer ilan.peer@intel.com Signed-off-by: Kalle Valo kvalo@kernel.org Link: https://lore.kernel.org/r/20230723201043.3007430-1-ilan.peer@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/scan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/wireless/scan.c b/net/wireless/scan.c index efe9283e98935..e5c1510c098fd 100644 --- a/net/wireless/scan.c +++ b/net/wireless/scan.c @@ -643,7 +643,7 @@ static int cfg80211_parse_colocated_ap(const struct cfg80211_bss_ies *ies,
ret = cfg80211_calc_short_ssid(ies, &ssid_elem, &s_ssid_tmp); if (ret) - return ret; + return 0;
/* RNR IE may contain more than one NEIGHBOR_AP_INFO */ while (pos + sizeof(*ap_info) <= end) {
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit aeb660171b0663847fa04806a96302ac6112ad26 ]
In function macsec_fs_tx_create_crypto_table_groups(), when the ft->g memory is successfully allocated but the 'in' memory fails to be allocated, the memory pointed to by ft->g is released once. Then, in function macsec_fs_tx_create(), macsec_fs_tx_destroy() is called and releases the memory pointed to by ft->g again. This causes a double free.
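The pattern behind the one-line fix can be sketched in userspace C, with free() standing in for kfree(). All names here (demo_table, demo_create_groups(), demo_destroy()) are hypothetical and only mirror the shape of the error path described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the flow table that owns the group array. */
struct demo_table {
	int *g;
};

/* Mirrors the error path: free ft->g and clear the pointer. */
static int demo_create_groups(struct demo_table *ft, int fail_second_alloc)
{
	void *in;

	ft->g = calloc(4, sizeof(*ft->g));
	if (!ft->g)
		return -ENOMEM;

	in = fail_second_alloc ? NULL : malloc(64);
	if (!in) {
		free(ft->g);
		ft->g = NULL;	/* without this, demo_destroy() frees it again */
		return -ENOMEM;
	}

	free(in);
	return 0;
}

/* Common teardown, also run by the caller after a failed create. */
static void demo_destroy(struct demo_table *ft)
{
	free(ft->g);	/* free(NULL) is a no-op, as is kfree(NULL) */
	ft->g = NULL;
}
```

Because the pointer is cleared on the failure path, the caller's teardown becomes harmless even though it runs unconditionally.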
Fixes: e467b283ffd5 ("net/mlx5e: Add MACsec TX steering rules") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Reviewed-by: Simon Horman simon.horman@corigine.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c index 5b658a5588c64..6ecf0bf2366ad 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c @@ -160,6 +160,7 @@ static int macsec_fs_tx_create_crypto_table_groups(struct mlx5e_flow_table *ft)
if (!in) { kfree(ft->g); + ft->g = NULL; return -ENOMEM; }
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit 5dd77585dd9d0e03dd1bceb95f0269a7eaf6b936 ]
When mlx5_cmd_exec() fails in mlx5dr_cmd_create_reformat_ctx(), the memory pointed to by 'in' is not released, which causes a memory leak. Move the memory release after mlx5_cmd_exec() so that it is done on both the success and the error path.
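The single-exit cleanup shape used by the fix can be sketched as follows. This is a userspace illustration with hypothetical names (demo_create_ctx(), demo_live_allocs); the failing command is simulated by a flag rather than a real mlx5_cmd_exec() call.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Counts buffers that are allocated but not yet freed (demo only). */
static int demo_live_allocs;

static int demo_create_ctx(int exec_fails, unsigned int *id)
{
	void *in;
	int err;

	in = malloc(128);
	if (!in)
		return -ENOMEM;
	demo_live_allocs++;

	err = exec_fails ? -EIO : 0;	/* stand-in for the command call */
	if (err)
		goto err_free_in;

	*id = 42;			/* stand-in for reading the reply */

err_free_in:
	/* Reached on success and on failure, so 'in' never leaks. */
	free(in);
	demo_live_allocs--;
	return err;
}
```

Both outcomes fall through the same label, which is why the success path no longer needs its own free.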
Fixes: 1d9186476e12 ("net/mlx5: DR, Add direct rule command utilities") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/steering/dr_cmd.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_cmd.c index 84364691a3791..d7b1a230b59e8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_cmd.c @@ -538,11 +538,12 @@ int mlx5dr_cmd_create_reformat_ctx(struct mlx5_core_dev *mdev,
err = mlx5_cmd_exec(mdev, in, inlen, out, sizeof(out)); if (err) - return err; + goto err_free_in;
*reformat_id = MLX5_GET(alloc_packet_reformat_context_out, out, packet_reformat_id); - kvfree(in);
+err_free_in: + kvfree(in); return err; }
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit c6cf0b6097bf1bf1b2a89b521e9ecd26b581a93a ]
The memory pointed to by the priv->rx_res pointer is not freed in the error path of mlx5e_init_rep_rx, which can lead to a memory leak. Fix by freeing the memory in the error path, thereby making the error path identical to mlx5e_cleanup_rep_rx().
Fixes: af8bbf730068 ("net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Reviewed-by: Simon Horman simon.horman@corigine.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index 9bd1a93a512d4..ff0c025db1402 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -912,7 +912,7 @@ static int mlx5e_init_rep_rx(struct mlx5e_priv *priv) err = mlx5e_open_drop_rq(priv, &priv->drop_rq); if (err) { mlx5_core_err(mdev, "open drop rq failed, %d\n", err); - return err; + goto err_rx_res_free; }
err = mlx5e_rx_res_init(priv->rx_res, priv->mdev, 0, @@ -946,6 +946,7 @@ static int mlx5e_init_rep_rx(struct mlx5e_priv *priv) mlx5e_rx_res_destroy(priv->rx_res); err_close_drop_rq: mlx5e_close_drop_rq(&priv->drop_rq); +err_rx_res_free: mlx5e_rx_res_free(priv->rx_res); priv->rx_res = NULL; err_free_fs:
From: Yuanjun Gong ruc_gongyuanjun@163.com
[ Upstream commit e5bcb7564d3bd0c88613c76963c5349be9c511c5 ]
mlx5e_ipsec_remove_trailer() should return an error code if function pskb_trim() returns an unexpected value.
Fixes: 2ac9cfe78223 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path") Signed-off-by: Yuanjun Gong ruc_gongyuanjun@163.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c index 6859f1c1a8319..c4a84f0a3b733 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c @@ -58,7 +58,9 @@ static int mlx5e_ipsec_remove_trailer(struct sk_buff *skb, struct xfrm_state *x)
trailer_len = alen + plen + 2;
- pskb_trim(skb, skb->len - trailer_len); + ret = pskb_trim(skb, skb->len - trailer_len); + if (unlikely(ret)) + return ret; if (skb->protocol == htons(ETH_P_IP)) { ipv4hdr->tot_len = htons(ntohs(ipv4hdr->tot_len) - trailer_len); ip_send_check(ipv4hdr);
From: Amir Tzin amirtz@nvidia.com
[ Upstream commit 3ec43c1b082a8804472430e1253544d75f4b540e ]
Moving to switchdev mode with ntuple offload on causes the kernel to crash since fs->arfs is freed during nic profile cleanup flow.
Ntuple offload is not supported in switchdev mode and it is already unset by mlx5 fix feature ndo in switchdev mode. Verify fs->arfs is valid before disabling it.
trace: [] RIP: 0010:_raw_spin_lock_bh+0x17/0x30 [] arfs_del_rules+0x44/0x1a0 [mlx5_core] [] mlx5e_arfs_disable+0xe/0x20 [mlx5_core] [] mlx5e_handle_feature+0x3d/0xb0 [mlx5_core] [] ? __rtnl_unlock+0x25/0x50 [] mlx5e_set_features+0xfe/0x160 [mlx5_core] [] __netdev_update_features+0x278/0xa50 [] ? netdev_run_todo+0x5e/0x2a0 [] netdev_update_features+0x22/0x70 [] ? _cond_resched+0x15/0x30 [] mlx5e_attach_netdev+0x12a/0x1e0 [mlx5_core] [] mlx5e_netdev_attach_profile+0xa1/0xc0 [mlx5_core] [] mlx5e_netdev_change_profile+0x77/0xe0 [mlx5_core] [] mlx5e_vport_rep_load+0x1ed/0x290 [mlx5_core] [] mlx5_esw_offloads_rep_load+0x88/0xd0 [mlx5_core] [] esw_offloads_load_rep.part.38+0x31/0x50 [mlx5_core] [] esw_offloads_enable+0x6c5/0x710 [mlx5_core] [] mlx5_eswitch_enable_locked+0x1bb/0x290 [mlx5_core] [] mlx5_devlink_eswitch_mode_set+0x14f/0x320 [mlx5_core] [] devlink_nl_cmd_eswitch_set_doit+0x94/0x120 [] genl_family_rcv_msg_doit.isra.17+0x113/0x150 [] genl_family_rcv_msg+0xb7/0x170 [] ? devlink_nl_cmd_port_split_doit+0x100/0x100 [] genl_rcv_msg+0x47/0xa0 [] ? genl_family_rcv_msg+0x170/0x170 [] netlink_rcv_skb+0x4c/0x130 [] genl_rcv+0x24/0x40 [] netlink_unicast+0x19a/0x230 [] netlink_sendmsg+0x204/0x3d0 [] sock_sendmsg+0x50/0x60
Fixes: 90b22b9bcd24 ("net/mlx5e: Disable Rx ntuple offload for uplink representor") Signed-off-by: Amir Tzin amirtz@nvidia.com Reviewed-by: Aya Levin ayal@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c index 0ae1865086ff1..dc0a0a27ac84a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c @@ -136,6 +136,16 @@ static void arfs_del_rules(struct mlx5e_flow_steering *fs);
int mlx5e_arfs_disable(struct mlx5e_flow_steering *fs) { + /* Moving to switchdev mode, fs->arfs is freed by mlx5e_nic_profile + * cleanup_rx callback and it is not recreated when + * mlx5e_uplink_rep_profile is loaded as mlx5e_create_flow_steering() + * is not called by the uplink_rep profile init_rx callback. Thus, if + * ntuple is set, moving to switchdev flow will enter this function + * with fs->arfs nullified. + */ + if (!mlx5e_fs_get_arfs(fs)) + return 0; + arfs_del_rules(fs);
return arfs_disable(fs);
From: Jianbo Liu jianbol@nvidia.com
[ Upstream commit d03b6e6f31820b84f7449cca022047f36c42bc3f ]
For IP tunnel encapsulation in ECMP (Equal-Cost Multipath) mode, as the flow is duplicated to the peer eswitch, the related neighbour information on the peer uplink representor is created as well.
In the cited commit, eswitch devcom unpair is moved to uplink unload API, specifically the profile->cleanup_tx. If there is an encap rule offloaded in ECMP mode, when one eswitch does unpair (because of unloading the driver, for instance), and the peer rule from the peer eswitch is going to be deleted, the use-after-free error is triggered while accessing neigh info, as it is already cleaned up in uplink's profile->disable, which is before its profile->cleanup_tx.
To fix this issue, move the neigh cleanup to profile's cleanup_tx callback, and after mlx5e_cleanup_uplink_rep_tx is called. The neigh init is moved to init_tx for symmetry.
[ 2453.376299] BUG: KASAN: slab-use-after-free in mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
[ 2453.379125] Read of size 4 at addr ffff888127af9008 by task modprobe/2496

[ 2453.381542] CPU: 7 PID: 2496 Comm: modprobe Tainted: G B 6.4.0-rc7+ #15
[ 2453.383386] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 2453.384335] Call Trace:
[ 2453.384625] <TASK>
[ 2453.384891] dump_stack_lvl+0x33/0x50
[ 2453.385285] print_report+0xc2/0x610
[ 2453.385667] ? __virt_addr_valid+0xb1/0x130
[ 2453.386091] ? mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
[ 2453.386757] kasan_report+0xae/0xe0
[ 2453.387123] ? mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
[ 2453.387798] mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
[ 2453.388465] mlx5e_rep_encap_entry_detach+0xa6/0xe0 [mlx5_core]
[ 2453.389111] mlx5e_encap_dealloc+0xa7/0x100 [mlx5_core]
[ 2453.389706] mlx5e_tc_tun_encap_dests_unset+0x61/0xb0 [mlx5_core]
[ 2453.390361] mlx5_free_flow_attr_actions+0x11e/0x340 [mlx5_core]
[ 2453.391015] ? complete_all+0x43/0xd0
[ 2453.391398] ? free_flow_post_acts+0x38/0x120 [mlx5_core]
[ 2453.392004] mlx5e_tc_del_fdb_flow+0x4ae/0x690 [mlx5_core]
[ 2453.392618] mlx5e_tc_del_fdb_peers_flow+0x308/0x370 [mlx5_core]
[ 2453.393276] mlx5e_tc_clean_fdb_peer_flows+0xf5/0x140 [mlx5_core]
[ 2453.393925] mlx5_esw_offloads_unpair+0x86/0x540 [mlx5_core]
[ 2453.394546] ? mlx5_esw_offloads_set_ns_peer.isra.0+0x180/0x180 [mlx5_core]
[ 2453.395268] ? down_write+0xaa/0x100
[ 2453.395652] mlx5_esw_offloads_devcom_event+0x203/0x530 [mlx5_core]
[ 2453.396317] mlx5_devcom_send_event+0xbb/0x190 [mlx5_core]
[ 2453.396917] mlx5_esw_offloads_devcom_cleanup+0xb0/0xd0 [mlx5_core]
[ 2453.397582] mlx5e_tc_esw_cleanup+0x42/0x120 [mlx5_core]
[ 2453.398182] mlx5e_rep_tc_cleanup+0x15/0x30 [mlx5_core]
[ 2453.398768] mlx5e_cleanup_rep_tx+0x6c/0x80 [mlx5_core]
[ 2453.399367] mlx5e_detach_netdev+0xee/0x120 [mlx5_core]
[ 2453.399957] mlx5e_netdev_change_profile+0x84/0x170 [mlx5_core]
[ 2453.400598] mlx5e_vport_rep_unload+0xe0/0xf0 [mlx5_core]
[ 2453.403781] mlx5_eswitch_unregister_vport_reps+0x15e/0x190 [mlx5_core]
[ 2453.404479] ? mlx5_eswitch_register_vport_reps+0x200/0x200 [mlx5_core]
[ 2453.405170] ? up_write+0x39/0x60
[ 2453.405529] ? kernfs_remove_by_name_ns+0xb7/0xe0
[ 2453.405985] auxiliary_bus_remove+0x2e/0x40
[ 2453.406405] device_release_driver_internal+0x243/0x2d0
[ 2453.406900] ? kobject_put+0x42/0x2d0
[ 2453.407284] bus_remove_device+0x128/0x1d0
[ 2453.407687] device_del+0x240/0x550
[ 2453.408053] ? waiting_for_supplier_show+0xe0/0xe0
[ 2453.408511] ? kobject_put+0xfa/0x2d0
[ 2453.408889] ? __kmem_cache_free+0x14d/0x280
[ 2453.409310] mlx5_rescan_drivers_locked.part.0+0xcd/0x2b0 [mlx5_core]
[ 2453.409973] mlx5_unregister_device+0x40/0x50 [mlx5_core]
[ 2453.410561] mlx5_uninit_one+0x3d/0x110 [mlx5_core]
[ 2453.411111] remove_one+0x89/0x130 [mlx5_core]
[ 2453.411628] pci_device_remove+0x59/0xf0
[ 2453.412026] device_release_driver_internal+0x243/0x2d0
[ 2453.412511] ? parse_option_str+0x14/0x90
[ 2453.412915] driver_detach+0x7b/0xf0
[ 2453.413289] bus_remove_driver+0xb5/0x160
[ 2453.413685] pci_unregister_driver+0x3f/0xf0
[ 2453.414104] mlx5_cleanup+0xc/0x20 [mlx5_core]
Fixes: 2be5bd42a5bb ("net/mlx5: Handle pairing of E-switch via uplink un/load APIs") Signed-off-by: Jianbo Liu jianbol@nvidia.com Reviewed-by: Vlad Buslov vladbu@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/en_rep.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index ff0c025db1402..bd895ef341a0b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -1040,6 +1040,10 @@ static int mlx5e_init_rep_tx(struct mlx5e_priv *priv) return err; }
+ err = mlx5e_rep_neigh_init(rpriv); + if (err) + goto err_neigh_init; + if (rpriv->rep->vport == MLX5_VPORT_UPLINK) { err = mlx5e_init_uplink_rep_tx(rpriv); if (err) @@ -1056,6 +1060,8 @@ static int mlx5e_init_rep_tx(struct mlx5e_priv *priv) if (rpriv->rep->vport == MLX5_VPORT_UPLINK) mlx5e_cleanup_uplink_rep_tx(rpriv); err_init_tx: + mlx5e_rep_neigh_cleanup(rpriv); +err_neigh_init: mlx5e_destroy_tises(priv); return err; } @@ -1069,22 +1075,17 @@ static void mlx5e_cleanup_rep_tx(struct mlx5e_priv *priv) if (rpriv->rep->vport == MLX5_VPORT_UPLINK) mlx5e_cleanup_uplink_rep_tx(rpriv);
+ mlx5e_rep_neigh_cleanup(rpriv); mlx5e_destroy_tises(priv); }
static void mlx5e_rep_enable(struct mlx5e_priv *priv) { - struct mlx5e_rep_priv *rpriv = priv->ppriv; - mlx5e_set_netdev_mtu_boundaries(priv); - mlx5e_rep_neigh_init(rpriv); }
static void mlx5e_rep_disable(struct mlx5e_priv *priv) { - struct mlx5e_rep_priv *rpriv = priv->ppriv; - - mlx5e_rep_neigh_cleanup(rpriv); }
static int mlx5e_update_rep_rx(struct mlx5e_priv *priv) @@ -1119,7 +1120,6 @@ static int uplink_rep_async_event(struct notifier_block *nb, unsigned long event
static void mlx5e_uplink_rep_enable(struct mlx5e_priv *priv) { - struct mlx5e_rep_priv *rpriv = priv->ppriv; struct net_device *netdev = priv->netdev; struct mlx5_core_dev *mdev = priv->mdev; u16 max_mtu; @@ -1139,7 +1139,6 @@ static void mlx5e_uplink_rep_enable(struct mlx5e_priv *priv) mlx5_notifier_register(mdev, &priv->events_nb); mlx5e_dcbnl_initialize(priv); mlx5e_dcbnl_init_app(priv); - mlx5e_rep_neigh_init(rpriv); mlx5e_rep_bridge_init(priv);
netdev->wanted_features |= NETIF_F_HW_TC; @@ -1154,7 +1153,6 @@ static void mlx5e_uplink_rep_enable(struct mlx5e_priv *priv)
static void mlx5e_uplink_rep_disable(struct mlx5e_priv *priv) { - struct mlx5e_rep_priv *rpriv = priv->ppriv; struct mlx5_core_dev *mdev = priv->mdev;
rtnl_lock(); @@ -1164,7 +1162,6 @@ static void mlx5e_uplink_rep_disable(struct mlx5e_priv *priv) rtnl_unlock();
mlx5e_rep_bridge_cleanup(priv); - mlx5e_rep_neigh_cleanup(rpriv); mlx5e_dcbnl_delete_app(priv); mlx5_notifier_unregister(mdev, &priv->events_nb); mlx5e_rep_tc_disable(priv);
From: Lin Ma linma@zju.edu.cn
[ Upstream commit bcc29b7f5af6797702c2306a7aacb831fc5ce9cb ]
The nla_for_each_nested parsing in function bpf_sk_storage_diag_alloc does not check the length of the nested attribute. This can lead to an out-of-attribute read and allow a malformed nlattr (e.g., length 0) to be viewed as a 4 byte integer.
This patch adds an additional check when the nlattr is getting counted. This makes sure the latter nla_get_u32 can access the attributes with the correct length.
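The counting-time validation can be sketched in userspace C. The attribute representation below (struct demo_attr, DEMO_REQ_MAP_FD, demo_count_map_fds()) is a hypothetical simplification and not the kernel's struct nlattr API; it only shows the idea of rejecting a wrongly sized payload before anything later reads it as a u32.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_REQ_MAP_FD 1	/* hypothetical attribute type */

/* Hypothetical, flattened view of one nested attribute. */
struct demo_attr {
	uint16_t type;
	uint16_t payload_len;
};

/*
 * Count map-fd attributes, refusing any whose payload is not exactly
 * four bytes. Returns the count, or -EINVAL for a malformed attribute.
 */
static int demo_count_map_fds(const struct demo_attr *attrs, size_t n)
{
	int nr_maps = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (attrs[i].type != DEMO_REQ_MAP_FD)
			continue;
		if (attrs[i].payload_len != sizeof(uint32_t))
			return -EINVAL;
		nr_maps++;
	}

	return nr_maps;
}
```

Attributes of other types are skipped without a length check here, matching the scope of the patch, which only validates the attribute it later reads.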
Fixes: 1ed4d92458a9 ("bpf: INET_DIAG support in bpf_sk_storage") Suggested-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Lin Ma linma@zju.edu.cn Reviewed-by: Jakub Kicinski kuba@kernel.org Link: https://lore.kernel.org/r/20230725023330.422856-1-linma@zju.edu.cn Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/bpf_sk_storage.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c index 94374d529ea42..ad01b1bea52e4 100644 --- a/net/core/bpf_sk_storage.c +++ b/net/core/bpf_sk_storage.c @@ -531,8 +531,11 @@ bpf_sk_storage_diag_alloc(const struct nlattr *nla_stgs) return ERR_PTR(-EPERM);
nla_for_each_nested(nla, nla_stgs, rem) { - if (nla_type(nla) == SK_DIAG_BPF_STORAGE_REQ_MAP_FD) + if (nla_type(nla) == SK_DIAG_BPF_STORAGE_REQ_MAP_FD) { + if (nla_len(nla) != sizeof(u32)) + return ERR_PTR(-EINVAL); nr_maps++; + } }
diag = kzalloc(struct_size(diag, maps, nr_maps), GFP_KERNEL);
From: Lin Ma linma@zju.edu.cn
[ Upstream commit d73ef2d69c0dba5f5a1cb9600045c873bab1fb7f ]
There are 9 ndo_bridge_setlink handlers in the current kernel in total, which are 1) bnxt_bridge_setlink, 2) be_ndo_bridge_setlink, 3) i40e_ndo_bridge_setlink, 4) ice_bridge_setlink, 5) ixgbe_ndo_bridge_setlink, 6) mlx5e_bridge_setlink, 7) nfp_net_bridge_setlink, 8) qeth_l2_bridge_setlink, 9) br_setlink.
By investigating the code, we find that 1-7 parse and use nlattr IFLA_BRIDGE_MODE but 3 and 4 forget to do the nla_len check. This can lead to an out-of-attribute read and allow a malformed nlattr (e.g., length 0) to be viewed as a 2 byte integer.
To avoid such issues, now and for other ndo_bridge_setlink handlers in the future, this patch adds the nla_len check in rtnl_bridge_setlink and returns an error early if the length mismatches. To make it work, the break is removed from the parsing for IFLA_BRIDGE_FLAGS so that nla_for_each_nested iterates over every attribute.
Fixes: b1edc14a3fbf ("ice: Implement ice_bridge_getlink and ice_bridge_setlink") Fixes: 51616018dd1b ("i40e: Add support for getlink, setlink ndo ops") Suggested-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Lin Ma linma@zju.edu.cn Acked-by: Nikolay Aleksandrov razor@blackwall.org Reviewed-by: Hangbin Liu liuhangbin@gmail.com Link: https://lore.kernel.org/r/20230726075314.1059224-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/rtnetlink.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 5625ed30a06f3..2758b3f7c0214 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -5030,13 +5030,17 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh, br_spec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_AF_SPEC); if (br_spec) { nla_for_each_nested(attr, br_spec, rem) { - if (nla_type(attr) == IFLA_BRIDGE_FLAGS) { + if (nla_type(attr) == IFLA_BRIDGE_FLAGS && !have_flags) { if (nla_len(attr) < sizeof(flags)) return -EINVAL;
have_flags = true; flags = nla_get_u16(attr); - break; + } + + if (nla_type(attr) == IFLA_BRIDGE_MODE) { + if (nla_len(attr) < sizeof(u16)) + return -EINVAL; } } }
From: Yuanjun Gong ruc_gongyuanjun@163.com
[ Upstream commit dadc5b86cc9459581f37fe755b431adc399ea393 ]
In bcm_sf2_sw_probe(), check the return value of clk_prepare_enable() and return the error code if clk_prepare_enable() returns an unexpected value.
Fixes: e9ec5c3bd238 ("net: dsa: bcm_sf2: request and handle clocks") Signed-off-by: Yuanjun Gong ruc_gongyuanjun@163.com Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Link: https://lore.kernel.org/r/20230726170506.16547-1-ruc_gongyuanjun@163.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/bcm_sf2.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c index cde253d27bd08..72374b066f64a 100644 --- a/drivers/net/dsa/bcm_sf2.c +++ b/drivers/net/dsa/bcm_sf2.c @@ -1436,7 +1436,9 @@ static int bcm_sf2_sw_probe(struct platform_device *pdev) if (IS_ERR(priv->clk)) return PTR_ERR(priv->clk);
- clk_prepare_enable(priv->clk); + ret = clk_prepare_enable(priv->clk); + if (ret) + return ret;
priv->clk_mdiv = devm_clk_get_optional(&pdev->dev, "sw_switch_mdiv"); if (IS_ERR(priv->clk_mdiv)) { @@ -1444,7 +1446,9 @@ static int bcm_sf2_sw_probe(struct platform_device *pdev) goto out_clk; }
- clk_prepare_enable(priv->clk_mdiv); + ret = clk_prepare_enable(priv->clk_mdiv); + if (ret) + goto out_clk;
ret = bcm_sf2_sw_rst(priv); if (ret) {
From: Georg Müller georgmueller@gmx.net
[ Upstream commit 98ce8e4a9dcfb448b30a2d7a16190f4a00382377 ]
Without gcc, the test will fail.
On cleanup, ignore probe removal errors. Otherwise, in case of an error adding the probe, the temporary directory is not removed.
Fixes: 56cbeacf14353057 ("perf probe: Add test for regression introduced by switch to die_get_decl_file()") Signed-off-by: Georg Müller georgmueller@gmx.net Acked-by: Ian Rogers irogers@google.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Georg Müller georgmueller@gmx.net Cc: Ingo Molnar mingo@redhat.com Cc: Jiri Olsa jolsa@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Link: https://lore.kernel.org/r/20230728151812.454806-2-georgmueller@gmx.net Link: https://lore.kernel.org/r/CAP-5=fUP6UuLgRty3t2=fQsQi3k4hDMz415vWdp1x88QMvZ8u... Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/tests/shell/test_uprobe_from_different_cu.sh | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/tools/perf/tests/shell/test_uprobe_from_different_cu.sh b/tools/perf/tests/shell/test_uprobe_from_different_cu.sh index 00d2e0e2e0c28..319f36ebb9a40 100644 --- a/tools/perf/tests/shell/test_uprobe_from_different_cu.sh +++ b/tools/perf/tests/shell/test_uprobe_from_different_cu.sh @@ -4,6 +4,12 @@
set -e
+# skip if there's no gcc +if ! [ -x "$(command -v gcc)" ]; then + echo "failed: no gcc compiler" + exit 2 +fi + temp_dir=$(mktemp -d /tmp/perf-uprobe-different-cu-sh.XXXXXXXXXX)
cleanup() @@ -11,7 +17,7 @@ cleanup() trap - EXIT TERM INT if [[ "${temp_dir}" =~ ^/tmp/perf-uprobe-different-cu-sh.*$ ]]; then echo "--- Cleaning up ---" - perf probe -x ${temp_dir}/testfile -d foo + perf probe -x ${temp_dir}/testfile -d foo || true rm -f "${temp_dir}/"* rmdir "${temp_dir}" fi
From: Jamal Hadi Salim jhs@mojatatu.com
[ Upstream commit e68409db995380d1badacba41ff24996bd396171 ]
A match entry is uniquely identified with an "address" or "path" in the form of: hashtable ID(12b):bucketid(8b):nodeid(12b).
When creating table match entries, the hash table id, bucket id and node id (match entry id) must each either be specified by the user or be given a reasonable in-kernel default. The in-kernel default for a table id is 0x800 (the omnipresent root table); for bucketid it is 0x0. Prior to this fix there was no default for a nodeid, i.e. the code assumed that the user passed the correct nodeid, and if the user passed a nodeid of 0 (as Mingi Cho did) then that is what was used. But a nodeid of 0 is reserved for identifying the table. This is not a problem until we dump. The dump code notices that the nodeid is zero, assumes it is referencing a table, and therefore treats the object as table struct tc_u_hnode instead of what was created, i.e. match entry struct tc_u_knode.
Mingi does the equivalent of: tc filter add dev dummy0 parent 10: prio 1 handle 0x1000 \ protocol ip u32 match ip src 10.0.0.1/32 classid 10:1 action ok
Essentially specifying a table id 0, bucketid 1 and nodeid of zero. Tableid 0 is remapped to the default of 0x800. Bucketid 1 is ignored and defaults to 0x00. Nodeid was assumed to be what Mingi passed: 0x000.
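As a rough illustration of the htid(12b):bucketid(8b):nodeid(12b) layout, the following userspace sketch splits a 32-bit handle into its three fields. The struct and function names are hypothetical and exist only for this example; the shifts and masks are an assumption derived from the field widths stated above, not a copy of the kernel helpers.

```c
#include <stdint.h>

/* Hypothetical container for the three fields of a u32 filter handle. */
struct u32_handle_parts {
	uint32_t htid;   /* top 12 bits: hash table id */
	uint32_t bucket; /* middle 8 bits: bucket id */
	uint32_t node;   /* low 12 bits: node (match entry) id */
};

/* Split a handle according to htid(12b):bucketid(8b):nodeid(12b). */
static struct u32_handle_parts split_handle(uint32_t handle)
{
	struct u32_handle_parts p;

	p.htid = (handle >> 20) & 0xFFFu;
	p.bucket = (handle >> 12) & 0xFFu;
	p.node = handle & 0xFFFu;
	return p;
}
```

Under these assumptions, split_handle(0x1000) yields table id 0, bucket 1 and node 0, which matches the description of the reproducer above.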
dumping before fix shows: ~$ tc filter ls dev dummy0 parent 10: filter protocol ip pref 1 u32 chain 0 filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1 filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor -30591
Note that the last line reports a table instead of a match entry (you can tell this because it says "ht divisor..."). As a result of reporting the wrong data type (misinterpreting struct tc_u_knode as struct tc_u_hnode) the divisor is reported with a value of -30591. Mingi identified this as part of the heap address (physmap_base is 0xffff8880 (-30591 - 1)).
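The arithmetic behind "(-30591 - 1)" can be checked with a small sketch. It assumes the leaked value is viewed as a 32-bit quantity; the helper name is made up for this example.

```c
#include <stdint.h>

/*
 * Reinterpret a signed 32-bit divisor as an unsigned 32-bit word.
 * Conversion to an unsigned type is well defined in C: the value is
 * reduced modulo 2^32.
 */
static uint32_t divisor_as_word(int32_t divisor)
{
	return (uint32_t)divisor;
}
```

With that view, -30591 corresponds to 0xffff8881, and -30591 - 1 corresponds to 0xffff8880, the value quoted for physmap_base.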
The fix is to ensure that when table entry matches are added and no nodeid is specified (i.e nodeid == 0) then we get the next available nodeid from the table's pool.
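The rule can be sketched in userspace as follows. This is a simplified model, not the kernel code: the toy_* names are hypothetical, and the pool here simply hands out the lowest free non-zero id, which is an assumption made for brevity and is not the allocation policy of the real per-table ID pool.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NODE_MASK 0xFFFu
#define TOY_MAX_NODES 4096u

/* Hypothetical stand-in for a hash table and its pool of node ids. */
struct toy_table {
	uint32_t htid;             /* htid:bucketid bits, node bits zero */
	bool used[TOY_MAX_NODES];  /* used[0] is never handed out */
};

/* Return the lowest free non-zero nodeid, or 0 if the pool is exhausted. */
static uint32_t toy_alloc_nodeid(struct toy_table *t)
{
	for (uint32_t id = 1; id < TOY_MAX_NODES; id++) {
		if (!t->used[id]) {
			t->used[id] = true;
			return id;
		}
	}
	return 0;
}

/*
 * Build the final handle for a new match entry. A nodeid of 0 in the
 * user-supplied handle means "not specified", so one is taken from the
 * pool. Returns 0 on failure (id already in use or pool exhausted).
 */
static uint32_t toy_final_handle(struct toy_table *t, uint32_t user_handle)
{
	uint32_t node = user_handle & TOY_NODE_MASK;

	if (node == 0) {
		node = toy_alloc_nodeid(t);
		if (node == 0)
			return 0;
	} else {
		if (t->used[node])
			return 0;
		t->used[node] = true;
	}
	return t->htid | node;
}
```

The point of the sketch is that a match entry never ends up with a node id of 0, so a dump cannot mistake it for a table.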
After the fix, this is what the dump shows: $ tc filter ls dev dummy0 parent 10: filter protocol ip pref 1 u32 chain 0 filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1 filter protocol ip pref 1 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 10:1 not_in_hw match 0a000001/ffffffff at 12 action order 1: gact action pass random type none pass val 0 index 1 ref 1 bind 1
Reported-by: Mingi Cho mgcho.minic@gmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com Link: https://lore.kernel.org/r/20230726135151.416917-1-jhs@mojatatu.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/cls_u32.c | 56 ++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 50 insertions(+), 6 deletions(-)
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 1280736a7b92e..0e3bb1d65be1c 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -1022,18 +1022,62 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, return -EINVAL; }
+ /* At this point, we need to derive the new handle that will be used to + * uniquely map the identity of this table match entry. The + * identity of the entry that we need to construct is 32 bits made of: + * htid(12b):bucketid(8b):node/entryid(12b) + * + * At this point _we have the table(ht)_ in which we will insert this + * entry. We carry the table's id in variable "htid". + * Note that earlier code picked the ht selection either by a) the user + * providing the htid specified via TCA_U32_HASH attribute or b) when + * no such attribute is passed then the root ht, is default to at ID + * 0x[800][00][000]. Rule: the root table has a single bucket with ID 0. + * If OTOH the user passed us the htid, they may also pass a bucketid of + * choice. 0 is fine. For example a user htid is 0x[600][01][000] it is + * indicating hash bucketid of 1. Rule: the entry/node ID _cannot_ be + * passed via the htid, so even if it was non-zero it will be ignored. + * + * We may also have a handle, if the user passed one. The handle also + * carries the same addressing of htid(12b):bucketid(8b):node/entryid(12b). + * Rule: the bucketid on the handle is ignored even if one was passed; + * rather the value on "htid" is always assumed to be the bucketid. + */ if (handle) { + /* Rule: The htid from handle and tableid from htid must match */ if (TC_U32_HTID(handle) && TC_U32_HTID(handle ^ htid)) { NL_SET_ERR_MSG_MOD(extack, "Handle specified hash table address mismatch"); return -EINVAL; } - handle = htid | TC_U32_NODE(handle); - err = idr_alloc_u32(&ht->handle_idr, NULL, &handle, handle, - GFP_KERNEL); - if (err) - return err; - } else + /* Ok, so far we have a valid htid(12b):bucketid(8b) but we + * need to finalize the table entry identification with the last + * part - the node/entryid(12b)). Rule: Nodeid _cannot be 0_ for + * entries. Rule: nodeid of 0 is reserved only for tables(see + * earlier code which processes TC_U32_DIVISOR attribute). 
+ * Rule: The nodeid can only be derived from the handle (and not + * htid). + * Rule: if the handle specified zero for the node id example + * 0x60000000, then pick a new nodeid from the pool of IDs + * this hash table has been allocating from. + * If OTOH it is specified (i.e for example the user passed a + * handle such as 0x60000123), then we use it generate our final + * handle which is used to uniquely identify the match entry. + */ + if (!TC_U32_NODE(handle)) { + handle = gen_new_kid(ht, htid); + } else { + handle = htid | TC_U32_NODE(handle); + err = idr_alloc_u32(&ht->handle_idr, NULL, &handle, + handle, GFP_KERNEL); + if (err) + return err; + } + } else { + /* The user did not give us a handle; lets just generate one + * from the table's pool of nodeids. + */ handle = gen_new_kid(ht, htid); + }
if (tb[TCA_U32_SEL] == NULL) { NL_SET_ERR_MSG_MOD(extack, "Selector not specified");
From: Chengfeng Ye dg573847474@gmail.com
[ Upstream commit 56c6be35fcbed54279df0a2c9e60480a61841d6f ]
As &hc->lock is acquired by both the timer _hfcpci_softirq() and the hardirq hfcpci_int(), the timer should disable irqs before acquiring the lock; otherwise a deadlock could happen if the timer is preempted by the hard irq.
Possible deadlock scenario: hfcpci_softirq() (timer) -> _hfcpci_softirq() -> spin_lock(&hc->lock); <irq interruption> -> hfcpci_int() -> spin_lock(&hc->lock); (deadlock here)
This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock.
The tentative patch fixes the potential deadlock by using spin_lock_irq() in the timer.
Fixes: b36b654a7e82 ("mISDN: Create /sys/class/mISDN") Signed-off-by: Chengfeng Ye dg573847474@gmail.com Link: https://lore.kernel.org/r/20230727085619.7419-1-dg573847474@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/isdn/hardware/mISDN/hfcpci.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/isdn/hardware/mISDN/hfcpci.c b/drivers/isdn/hardware/mISDN/hfcpci.c index c0331b2680108..fe391de1aba32 100644 --- a/drivers/isdn/hardware/mISDN/hfcpci.c +++ b/drivers/isdn/hardware/mISDN/hfcpci.c @@ -839,7 +839,7 @@ hfcpci_fill_fifo(struct bchannel *bch) *z1t = cpu_to_le16(new_z1); /* now send data */ if (bch->tx_idx < bch->tx_skb->len) return; - dev_kfree_skb(bch->tx_skb); + dev_kfree_skb_any(bch->tx_skb); if (get_next_bframe(bch)) goto next_t_frame; return; @@ -895,7 +895,7 @@ hfcpci_fill_fifo(struct bchannel *bch) } bz->za[new_f1].z1 = cpu_to_le16(new_z1); /* for next buffer */ bz->f1 = new_f1; /* next frame */ - dev_kfree_skb(bch->tx_skb); + dev_kfree_skb_any(bch->tx_skb); get_next_bframe(bch); }
@@ -1119,7 +1119,7 @@ tx_birq(struct bchannel *bch) if (bch->tx_skb && bch->tx_idx < bch->tx_skb->len) hfcpci_fill_fifo(bch); else { - dev_kfree_skb(bch->tx_skb); + dev_kfree_skb_any(bch->tx_skb); if (get_next_bframe(bch)) hfcpci_fill_fifo(bch); } @@ -2277,7 +2277,7 @@ _hfcpci_softirq(struct device *dev, void *unused) return 0;
if (hc->hw.int_m2 & HFCPCI_IRQ_ENABLE) { - spin_lock(&hc->lock); + spin_lock_irq(&hc->lock); bch = Sel_BCS(hc, hc->hw.bswapped ? 2 : 1); if (bch && bch->state == ISDN_P_B_RAW) { /* B1 rx&tx */ main_rec_hfcpci(bch); @@ -2288,7 +2288,7 @@ _hfcpci_softirq(struct device *dev, void *unused) main_rec_hfcpci(bch); tx_birq(bch); } - spin_unlock(&hc->lock); + spin_unlock_irq(&hc->lock); } return 0; }
From: Konstantin Khorenko khorenko@virtuozzo.com
[ Upstream commit e346e231b42bcae6822a6326acfb7b741e9e6026 ]
Here we hit a situation where a tasklet called usleep_range() in the PTT acquire logic, which triggers the "scheduling while atomic" BUG().
BUG: scheduling while atomic: swapper/24/0/0x00000100
[<ffffffffb41c6199>] schedule+0x29/0x70 [<ffffffffb41c5512>] schedule_hrtimeout_range_clock+0xb2/0x150 [<ffffffffb41c55c3>] schedule_hrtimeout_range+0x13/0x20 [<ffffffffb41c3bcf>] usleep_range+0x4f/0x70 [<ffffffffc08d3e58>] qed_ptt_acquire+0x38/0x100 [qed] [<ffffffffc08eac48>] _qed_get_vport_stats+0x458/0x580 [qed] [<ffffffffc08ead8c>] qed_get_vport_stats+0x1c/0xd0 [qed] [<ffffffffc08dffd3>] qed_get_protocol_stats+0x93/0x100 [qed] qed_mcp_send_protocol_stats case MFW_DRV_MSG_GET_LAN_STATS: case MFW_DRV_MSG_GET_FCOE_STATS: case MFW_DRV_MSG_GET_ISCSI_STATS: case MFW_DRV_MSG_GET_RDMA_STATS: [<ffffffffc08e36d8>] qed_mcp_handle_events+0x2d8/0x890 [qed] qed_int_assertion qed_int_attentions [<ffffffffc08d9490>] qed_int_sp_dpc+0xa50/0xdc0 [qed] [<ffffffffb3aa7623>] tasklet_action+0x83/0x140 [<ffffffffb41d9125>] __do_softirq+0x125/0x2bb [<ffffffffb41d560c>] call_softirq+0x1c/0x30 [<ffffffffb3a30645>] do_softirq+0x65/0xa0 [<ffffffffb3aa78d5>] irq_exit+0x105/0x110 [<ffffffffb41d8996>] do_IRQ+0x56/0xf0
Fix this by making the caller state whether it may be running in atomic context when getting stats from the QED driver. Based on the context provided, the QED driver decides whether or not it may schedule out when acquiring the PTT BAR window.
We faced the BUG_ON() while getting vport stats, but according to the code the same issue could happen for FCoE and iSCSI statistics as well, so fix them too.
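One detail worth checking is that the retry constants this patch adds to qed_hw.c give both modes the same minimum wait budget. The sketch below copies the four constant values from that hunk; the helper function is hypothetical and only multiplies the retry count by the per-iteration delay.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the qed_hw.c hunk of this patch. */
#define QED_BAR_ACQUIRE_TIMEOUT_USLEEP_CNT	1000
#define QED_BAR_ACQUIRE_TIMEOUT_USLEEP		1000	/* microseconds */
#define QED_BAR_ACQUIRE_TIMEOUT_UDELAY_CNT	100000
#define QED_BAR_ACQUIRE_TIMEOUT_UDELAY		10	/* microseconds */

/*
 * Hypothetical helper: lower bound, in microseconds, on how long the
 * acquire loop waits before giving up. The sleeping path may wait
 * longer, since usleep_range() is given an upper bound of twice the
 * minimum.
 */
static uint64_t ptt_min_wait_us(bool is_atomic)
{
	if (is_atomic)
		return (uint64_t)QED_BAR_ACQUIRE_TIMEOUT_UDELAY_CNT *
		       QED_BAR_ACQUIRE_TIMEOUT_UDELAY;

	return (uint64_t)QED_BAR_ACQUIRE_TIMEOUT_USLEEP_CNT *
	       QED_BAR_ACQUIRE_TIMEOUT_USLEEP;
}
```

Both paths come out at 1,000,000 microseconds, so the atomic path busy-waits in much smaller steps without shortening the overall timeout.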
Fixes: 6c75424612a7 ("qed: Add support for NCSI statistics.") Fixes: 1e128c81290a ("qed: Add support for hardware offloaded FCoE.") Fixes: 2f2b2614e893 ("qed: Provide iSCSI statistics to management") Cc: Sudarsana Kalluru skalluru@marvell.com Cc: David Miller davem@davemloft.net Cc: Manish Chopra manishc@marvell.com
Signed-off-by: Konstantin Khorenko khorenko@virtuozzo.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qlogic/qed/qed_dev_api.h | 16 ++++++++++++ drivers/net/ethernet/qlogic/qed/qed_fcoe.c | 19 ++++++++++---- drivers/net/ethernet/qlogic/qed/qed_fcoe.h | 17 ++++++++++-- drivers/net/ethernet/qlogic/qed/qed_hw.c | 26 ++++++++++++++++--- drivers/net/ethernet/qlogic/qed/qed_iscsi.c | 19 ++++++++++---- drivers/net/ethernet/qlogic/qed/qed_iscsi.h | 8 ++++-- drivers/net/ethernet/qlogic/qed/qed_l2.c | 19 ++++++++++---- drivers/net/ethernet/qlogic/qed/qed_l2.h | 24 +++++++++++++++++ drivers/net/ethernet/qlogic/qed/qed_main.c | 6 ++--- 9 files changed, 128 insertions(+), 26 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev_api.h b/drivers/net/ethernet/qlogic/qed/qed_dev_api.h index f8682356d0cf4..94d4f9413ab7a 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev_api.h +++ b/drivers/net/ethernet/qlogic/qed/qed_dev_api.h @@ -193,6 +193,22 @@ void qed_hw_remove(struct qed_dev *cdev); */ struct qed_ptt *qed_ptt_acquire(struct qed_hwfn *p_hwfn);
+/** + * qed_ptt_acquire_context(): Allocate a PTT window honoring the context + * atomicy. + * + * @p_hwfn: HW device data. + * @is_atomic: Hint from the caller - if the func can sleep or not. + * + * Context: The function should not sleep in case is_atomic == true. + * Return: struct qed_ptt. + * + * Should be called at the entry point to the driver + * (at the beginning of an exported function). + */ +struct qed_ptt *qed_ptt_acquire_context(struct qed_hwfn *p_hwfn, + bool is_atomic); + /** * qed_ptt_release(): Release PTT Window. * diff --git a/drivers/net/ethernet/qlogic/qed/qed_fcoe.c b/drivers/net/ethernet/qlogic/qed/qed_fcoe.c index 3764190b948eb..04602ac947087 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_fcoe.c +++ b/drivers/net/ethernet/qlogic/qed/qed_fcoe.c @@ -693,13 +693,14 @@ static void _qed_fcoe_get_pstats(struct qed_hwfn *p_hwfn, }
static int qed_fcoe_get_stats(struct qed_hwfn *p_hwfn, - struct qed_fcoe_stats *p_stats) + struct qed_fcoe_stats *p_stats, + bool is_atomic) { struct qed_ptt *p_ptt;
memset(p_stats, 0, sizeof(*p_stats));
- p_ptt = qed_ptt_acquire(p_hwfn); + p_ptt = qed_ptt_acquire_context(p_hwfn, is_atomic);
if (!p_ptt) { DP_ERR(p_hwfn, "Failed to acquire ptt\n"); @@ -973,19 +974,27 @@ static int qed_fcoe_destroy_conn(struct qed_dev *cdev, QED_SPQ_MODE_EBLOCK, NULL); }
+static int qed_fcoe_stats_context(struct qed_dev *cdev, + struct qed_fcoe_stats *stats, + bool is_atomic) +{ + return qed_fcoe_get_stats(QED_AFFIN_HWFN(cdev), stats, is_atomic); +} + static int qed_fcoe_stats(struct qed_dev *cdev, struct qed_fcoe_stats *stats) { - return qed_fcoe_get_stats(QED_AFFIN_HWFN(cdev), stats); + return qed_fcoe_stats_context(cdev, stats, false); }
void qed_get_protocol_stats_fcoe(struct qed_dev *cdev, - struct qed_mcp_fcoe_stats *stats) + struct qed_mcp_fcoe_stats *stats, + bool is_atomic) { struct qed_fcoe_stats proto_stats;
/* Retrieve FW statistics */ memset(&proto_stats, 0, sizeof(proto_stats)); - if (qed_fcoe_stats(cdev, &proto_stats)) { + if (qed_fcoe_stats_context(cdev, &proto_stats, is_atomic)) { DP_VERBOSE(cdev, QED_MSG_STORAGE, "Failed to collect FCoE statistics\n"); return; diff --git a/drivers/net/ethernet/qlogic/qed/qed_fcoe.h b/drivers/net/ethernet/qlogic/qed/qed_fcoe.h index 19c85adf4ceb1..214e8299ecb4e 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_fcoe.h +++ b/drivers/net/ethernet/qlogic/qed/qed_fcoe.h @@ -28,8 +28,20 @@ int qed_fcoe_alloc(struct qed_hwfn *p_hwfn); void qed_fcoe_setup(struct qed_hwfn *p_hwfn);
void qed_fcoe_free(struct qed_hwfn *p_hwfn); +/** + * qed_get_protocol_stats_fcoe(): Fills provided statistics + * struct with statistics. + * + * @cdev: Qed dev pointer. + * @stats: Points to struct that will be filled with statistics. + * @is_atomic: Hint from the caller - if the func can sleep or not. + * + * Context: The function should not sleep in case is_atomic == true. + * Return: Void. + */ void qed_get_protocol_stats_fcoe(struct qed_dev *cdev, - struct qed_mcp_fcoe_stats *stats); + struct qed_mcp_fcoe_stats *stats, + bool is_atomic); #else /* CONFIG_QED_FCOE */ static inline int qed_fcoe_alloc(struct qed_hwfn *p_hwfn) { @@ -40,7 +52,8 @@ static inline void qed_fcoe_setup(struct qed_hwfn *p_hwfn) {} static inline void qed_fcoe_free(struct qed_hwfn *p_hwfn) {}
static inline void qed_get_protocol_stats_fcoe(struct qed_dev *cdev, - struct qed_mcp_fcoe_stats *stats) + struct qed_mcp_fcoe_stats *stats, + bool is_atomic) { } #endif /* CONFIG_QED_FCOE */ diff --git a/drivers/net/ethernet/qlogic/qed/qed_hw.c b/drivers/net/ethernet/qlogic/qed/qed_hw.c index 554f30b0cfd5e..6263f847b6b92 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hw.c +++ b/drivers/net/ethernet/qlogic/qed/qed_hw.c @@ -23,7 +23,10 @@ #include "qed_reg_addr.h" #include "qed_sriov.h"
-#define QED_BAR_ACQUIRE_TIMEOUT 1000 +#define QED_BAR_ACQUIRE_TIMEOUT_USLEEP_CNT 1000 +#define QED_BAR_ACQUIRE_TIMEOUT_USLEEP 1000 +#define QED_BAR_ACQUIRE_TIMEOUT_UDELAY_CNT 100000 +#define QED_BAR_ACQUIRE_TIMEOUT_UDELAY 10
/* Invalid values */ #define QED_BAR_INVALID_OFFSET (cpu_to_le32(-1)) @@ -84,12 +87,22 @@ void qed_ptt_pool_free(struct qed_hwfn *p_hwfn) }
struct qed_ptt *qed_ptt_acquire(struct qed_hwfn *p_hwfn) +{ + return qed_ptt_acquire_context(p_hwfn, false); +} + +struct qed_ptt *qed_ptt_acquire_context(struct qed_hwfn *p_hwfn, bool is_atomic) { struct qed_ptt *p_ptt; - unsigned int i; + unsigned int i, count; + + if (is_atomic) + count = QED_BAR_ACQUIRE_TIMEOUT_UDELAY_CNT; + else + count = QED_BAR_ACQUIRE_TIMEOUT_USLEEP_CNT;
/* Take the free PTT from the list */ - for (i = 0; i < QED_BAR_ACQUIRE_TIMEOUT; i++) { + for (i = 0; i < count; i++) { spin_lock_bh(&p_hwfn->p_ptt_pool->lock);
if (!list_empty(&p_hwfn->p_ptt_pool->free_list)) { @@ -105,7 +118,12 @@ struct qed_ptt *qed_ptt_acquire(struct qed_hwfn *p_hwfn) }
spin_unlock_bh(&p_hwfn->p_ptt_pool->lock); - usleep_range(1000, 2000); + + if (is_atomic) + udelay(QED_BAR_ACQUIRE_TIMEOUT_UDELAY); + else + usleep_range(QED_BAR_ACQUIRE_TIMEOUT_USLEEP, + QED_BAR_ACQUIRE_TIMEOUT_USLEEP * 2); }
DP_NOTICE(p_hwfn, "PTT acquire timeout - failed to allocate PTT\n"); diff --git a/drivers/net/ethernet/qlogic/qed/qed_iscsi.c b/drivers/net/ethernet/qlogic/qed/qed_iscsi.c index 511ab214eb9c8..980e7289b4814 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_iscsi.c +++ b/drivers/net/ethernet/qlogic/qed/qed_iscsi.c @@ -999,13 +999,14 @@ static void _qed_iscsi_get_pstats(struct qed_hwfn *p_hwfn, }
static int qed_iscsi_get_stats(struct qed_hwfn *p_hwfn, - struct qed_iscsi_stats *stats) + struct qed_iscsi_stats *stats, + bool is_atomic) { struct qed_ptt *p_ptt;
memset(stats, 0, sizeof(*stats));
- p_ptt = qed_ptt_acquire(p_hwfn); + p_ptt = qed_ptt_acquire_context(p_hwfn, is_atomic); if (!p_ptt) { DP_ERR(p_hwfn, "Failed to acquire ptt\n"); return -EAGAIN; @@ -1336,9 +1337,16 @@ static int qed_iscsi_destroy_conn(struct qed_dev *cdev, QED_SPQ_MODE_EBLOCK, NULL); }
+static int qed_iscsi_stats_context(struct qed_dev *cdev, + struct qed_iscsi_stats *stats, + bool is_atomic) +{ + return qed_iscsi_get_stats(QED_AFFIN_HWFN(cdev), stats, is_atomic); +} + static int qed_iscsi_stats(struct qed_dev *cdev, struct qed_iscsi_stats *stats) { - return qed_iscsi_get_stats(QED_AFFIN_HWFN(cdev), stats); + return qed_iscsi_stats_context(cdev, stats, false); }
static int qed_iscsi_change_mac(struct qed_dev *cdev, @@ -1358,13 +1366,14 @@ static int qed_iscsi_change_mac(struct qed_dev *cdev, }
void qed_get_protocol_stats_iscsi(struct qed_dev *cdev, - struct qed_mcp_iscsi_stats *stats) + struct qed_mcp_iscsi_stats *stats, + bool is_atomic) { struct qed_iscsi_stats proto_stats;
/* Retrieve FW statistics */ memset(&proto_stats, 0, sizeof(proto_stats)); - if (qed_iscsi_stats(cdev, &proto_stats)) { + if (qed_iscsi_stats_context(cdev, &proto_stats, is_atomic)) { DP_VERBOSE(cdev, QED_MSG_STORAGE, "Failed to collect ISCSI statistics\n"); return; diff --git a/drivers/net/ethernet/qlogic/qed/qed_iscsi.h b/drivers/net/ethernet/qlogic/qed/qed_iscsi.h index dec2b00259d42..974cb8d26608c 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_iscsi.h +++ b/drivers/net/ethernet/qlogic/qed/qed_iscsi.h @@ -39,11 +39,14 @@ void qed_iscsi_free(struct qed_hwfn *p_hwfn); * * @cdev: Qed dev pointer. * @stats: Points to struct that will be filled with statistics. + * @is_atomic: Hint from the caller - if the func can sleep or not. * + * Context: The function should not sleep in case is_atomic == true. * Return: Void. */ void qed_get_protocol_stats_iscsi(struct qed_dev *cdev, - struct qed_mcp_iscsi_stats *stats); + struct qed_mcp_iscsi_stats *stats, + bool is_atomic); #else /* IS_ENABLED(CONFIG_QED_ISCSI) */ static inline int qed_iscsi_alloc(struct qed_hwfn *p_hwfn) { @@ -56,7 +59,8 @@ static inline void qed_iscsi_free(struct qed_hwfn *p_hwfn) {}
static inline void qed_get_protocol_stats_iscsi(struct qed_dev *cdev, - struct qed_mcp_iscsi_stats *stats) {} + struct qed_mcp_iscsi_stats *stats, + bool is_atomic) {} #endif /* IS_ENABLED(CONFIG_QED_ISCSI) */
#endif diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c b/drivers/net/ethernet/qlogic/qed/qed_l2.c index 7776d3bdd459a..970b9aabbc3d7 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_l2.c +++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c @@ -1863,7 +1863,8 @@ static void __qed_get_vport_stats(struct qed_hwfn *p_hwfn, }
static void _qed_get_vport_stats(struct qed_dev *cdev, - struct qed_eth_stats *stats) + struct qed_eth_stats *stats, + bool is_atomic) { u8 fw_vport = 0; int i; @@ -1872,10 +1873,11 @@ static void _qed_get_vport_stats(struct qed_dev *cdev,
for_each_hwfn(cdev, i) { struct qed_hwfn *p_hwfn = &cdev->hwfns[i]; - struct qed_ptt *p_ptt = IS_PF(cdev) ? qed_ptt_acquire(p_hwfn) - : NULL; + struct qed_ptt *p_ptt; bool b_get_port_stats;
+ p_ptt = IS_PF(cdev) ? qed_ptt_acquire_context(p_hwfn, is_atomic) + : NULL; if (IS_PF(cdev)) { /* The main vport index is relative first */ if (qed_fw_vport(p_hwfn, 0, &fw_vport)) { @@ -1900,6 +1902,13 @@ static void _qed_get_vport_stats(struct qed_dev *cdev, }
void qed_get_vport_stats(struct qed_dev *cdev, struct qed_eth_stats *stats) +{ + qed_get_vport_stats_context(cdev, stats, false); +} + +void qed_get_vport_stats_context(struct qed_dev *cdev, + struct qed_eth_stats *stats, + bool is_atomic) { u32 i;
@@ -1908,7 +1917,7 @@ void qed_get_vport_stats(struct qed_dev *cdev, struct qed_eth_stats *stats) return; }
- _qed_get_vport_stats(cdev, stats); + _qed_get_vport_stats(cdev, stats, is_atomic);
if (!cdev->reset_stats) return; @@ -1960,7 +1969,7 @@ void qed_reset_vport_stats(struct qed_dev *cdev) if (!cdev->reset_stats) { DP_INFO(cdev, "Reset stats not allocated\n"); } else { - _qed_get_vport_stats(cdev, cdev->reset_stats); + _qed_get_vport_stats(cdev, cdev->reset_stats, false); cdev->reset_stats->common.link_change_count = 0; } } diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.h b/drivers/net/ethernet/qlogic/qed/qed_l2.h index a538cf478c14e..2d2f82c785ad2 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_l2.h +++ b/drivers/net/ethernet/qlogic/qed/qed_l2.h @@ -249,8 +249,32 @@ qed_sp_eth_rx_queues_update(struct qed_hwfn *p_hwfn, enum spq_mode comp_mode, struct qed_spq_comp_cb *p_comp_data);
+/** + * qed_get_vport_stats(): Fills provided statistics + * struct with statistics. + * + * @cdev: Qed dev pointer. + * @stats: Points to struct that will be filled with statistics. + * + * Return: Void. + */ void qed_get_vport_stats(struct qed_dev *cdev, struct qed_eth_stats *stats);
+/** + * qed_get_vport_stats_context(): Fills provided statistics + * struct with statistics. + * + * @cdev: Qed dev pointer. + * @stats: Points to struct that will be filled with statistics. + * @is_atomic: Hint from the caller - if the func can sleep or not. + * + * Context: The function should not sleep in case is_atomic == true. + * Return: Void. + */ +void qed_get_vport_stats_context(struct qed_dev *cdev, + struct qed_eth_stats *stats, + bool is_atomic); + void qed_reset_vport_stats(struct qed_dev *cdev);
/** diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index c91898be7c030..25d9c254288b5 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -3101,7 +3101,7 @@ void qed_get_protocol_stats(struct qed_dev *cdev,
switch (type) { case QED_MCP_LAN_STATS: - qed_get_vport_stats(cdev, &eth_stats); + qed_get_vport_stats_context(cdev, &eth_stats, true); stats->lan_stats.ucast_rx_pkts = eth_stats.common.rx_ucast_pkts; stats->lan_stats.ucast_tx_pkts = @@ -3109,10 +3109,10 @@ void qed_get_protocol_stats(struct qed_dev *cdev, stats->lan_stats.fcs_err = -1; break; case QED_MCP_FCOE_STATS: - qed_get_protocol_stats_fcoe(cdev, &stats->fcoe_stats); + qed_get_protocol_stats_fcoe(cdev, &stats->fcoe_stats, true); break; case QED_MCP_ISCSI_STATS: - qed_get_protocol_stats_iscsi(cdev, &stats->iscsi_stats); + qed_get_protocol_stats_iscsi(cdev, &stats->iscsi_stats, true); break; default: DP_VERBOSE(cdev, QED_MSG_SP,
From: Eric Dumazet edumazet@google.com
[ Upstream commit fe11fdcb4207907d80cda2e73777465d68131e66 ]
sk_getsockopt() runs locklessly. This means sk->sk_reserved_mem can be read while other threads are changing its value.
Add missing annotations where they are needed.
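For readers unfamiliar with the annotation pattern, here is a userspace approximation. The SKETCH_* macros are simplified stand-ins written for this example (a volatile access through a GCC/Clang __typeof__ cast), not the kernel's READ_ONCE()/WRITE_ONCE() definitions, and struct toy_sock is hypothetical.

```c
/* Simplified stand-ins: force a single volatile load or store. */
#define SKETCH_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define SKETCH_WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical socket with only the field of interest. */
struct toy_sock {
	int reserved_mem;
};

/* Writer side: in the real code this runs with the socket locked. */
static void toy_reserve(struct toy_sock *sk, int bytes)
{
	SKETCH_WRITE_ONCE(sk->reserved_mem, sk->reserved_mem + bytes);
}

static void toy_release(struct toy_sock *sk, int bytes)
{
	SKETCH_WRITE_ONCE(sk->reserved_mem, sk->reserved_mem - bytes);
}

/* Lockless reader, analogous to the SO_RESERVE_MEM getsockopt case. */
static int toy_getsockopt_reserved(struct toy_sock *sk)
{
	return SKETCH_READ_ONCE(sk->reserved_mem);
}
```

The annotated store and load mark the accesses as intentionally racy and keep the compiler from tearing or repeating them; they do not add any ordering or locking.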
Fixes: 2bb2f5fb21b0 ("net: add new socket option SO_RESERVE_MEM") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Wei Wang weiwan@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/core/sock.c b/net/core/sock.c index 0c1baa5517f11..9483820833c5b 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -996,7 +996,7 @@ static void sock_release_reserved_memory(struct sock *sk, int bytes) bytes = round_down(bytes, PAGE_SIZE);
WARN_ON(bytes > sk->sk_reserved_mem); - sk->sk_reserved_mem -= bytes; + WRITE_ONCE(sk->sk_reserved_mem, sk->sk_reserved_mem - bytes); sk_mem_reclaim(sk); }
@@ -1033,7 +1033,8 @@ static int sock_reserve_memory(struct sock *sk, int bytes) } sk->sk_forward_alloc += pages << PAGE_SHIFT;
- sk->sk_reserved_mem += pages << PAGE_SHIFT; + WRITE_ONCE(sk->sk_reserved_mem, + sk->sk_reserved_mem + (pages << PAGE_SHIFT));
return 0; } @@ -1922,7 +1923,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname, break;
case SO_RESERVE_MEM: - v.val = sk->sk_reserved_mem; + v.val = READ_ONCE(sk->sk_reserved_mem); break;
case SO_TXREHASH:
From: Eric Dumazet edumazet@google.com
[ Upstream commit c76a0328899bbe226f8adeb88b8da9e4167bd316 ]
sk_getsockopt() runs locklessly. This means sk->sk_txrehash can be read while other threads are changing its value.
Other locations were handled in commit cb6cd2cec799 ("tcp: Change SYN ACK retransmit behaviour to account for rehash")
Fixes: 26859240e4ee ("txhash: Add socket option to control TX hash rethink behavior") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Akhmat Karakotov hmukos@yandex-team.ru Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/core/sock.c b/net/core/sock.c index 9483820833c5b..77abd69c56dde 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1521,7 +1521,9 @@ int sk_setsockopt(struct sock *sk, int level, int optname, } if ((u8)val == SOCK_TXREHASH_DEFAULT) val = READ_ONCE(sock_net(sk)->core.sysctl_txrehash); - /* Paired with READ_ONCE() in tcp_rtx_synack() */ + /* Paired with READ_ONCE() in tcp_rtx_synack() + * and sk_getsockopt(). + */ WRITE_ONCE(sk->sk_txrehash, (u8)val); break;
@@ -1927,7 +1929,8 @@ int sk_getsockopt(struct sock *sk, int level, int optname, break;
case SO_TXREHASH: - v.val = sk->sk_txrehash; + /* Paired with WRITE_ONCE() in sk_setsockopt() */ + v.val = READ_ONCE(sk->sk_txrehash); break;
default:
From: Eric Dumazet edumazet@google.com
[ Upstream commit ea7f45ef77b39e72244d282e47f6cb1ef4135cd2 ]
sk_getsockopt() runs locklessly. This means sk->sk_max_pacing_rate can be read while other threads are changing its value.
Fixes: 62748f32d501 ("net: introduce SO_MAX_PACING_RATE") Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/net/core/sock.c b/net/core/sock.c index 77abd69c56dde..86e88de1238b1 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1426,7 +1426,8 @@ int sk_setsockopt(struct sock *sk, int level, int optname, cmpxchg(&sk->sk_pacing_status, SK_PACING_NONE, SK_PACING_NEEDED); - sk->sk_max_pacing_rate = ulval; + /* Pairs with READ_ONCE() from sk_getsockopt() */ + WRITE_ONCE(sk->sk_max_pacing_rate, ulval); sk->sk_pacing_rate = min(sk->sk_pacing_rate, ulval); break; } @@ -1852,12 +1853,14 @@ int sk_getsockopt(struct sock *sk, int level, int optname, #endif
case SO_MAX_PACING_RATE: + /* The READ_ONCE() pair with the WRITE_ONCE() in sk_setsockopt() */ if (sizeof(v.ulval) != sizeof(v.val) && len >= sizeof(v.ulval)) { lv = sizeof(v.ulval); - v.ulval = sk->sk_max_pacing_rate; + v.ulval = READ_ONCE(sk->sk_max_pacing_rate); } else { /* 32bit version */ - v.val = min_t(unsigned long, sk->sk_max_pacing_rate, ~0U); + v.val = min_t(unsigned long, ~0U, + READ_ONCE(sk->sk_max_pacing_rate)); } break;
From: Eric Dumazet edumazet@google.com
[ Upstream commit e6d12bdb435d23ff6c1890c852d85408a2f496ee ]
In a prior commit, I forgot to change sk_getsockopt() when reading sk->sk_rcvlowat locklessly.
Fixes: eac66402d1c3 ("net: annotate sk->sk_rcvlowat lockless reads") Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/sock.c b/net/core/sock.c index 86e88de1238b1..1b1fe67b94d4f 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1717,7 +1717,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname, break;
case SO_RCVLOWAT: - v.val = sk->sk_rcvlowat; + v.val = READ_ONCE(sk->sk_rcvlowat); break;
case SO_SNDLOWAT:
From: Eric Dumazet edumazet@google.com
[ Upstream commit 74bc084327c643499474ba75df485607da37dd6e ]
In a prior commit, I forgot to change sk_getsockopt() when reading sk->sk_sndbuf locklessly.
Fixes: e292f05e0df7 ("tcp: annotate sk->sk_sndbuf lockless reads") Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/sock.c b/net/core/sock.c index 1b1fe67b94d4f..04306ccdf9081 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1624,7 +1624,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname, break;
case SO_SNDBUF: - v.val = sk->sk_sndbuf; + v.val = READ_ONCE(sk->sk_sndbuf); break;
case SO_RCVBUF:
From: Eric Dumazet edumazet@google.com
[ Upstream commit b4b553253091cafe9ec38994acf42795e073bef5 ]
In a prior commit, I forgot to change sk_getsockopt() when reading sk->sk_rcvbuf locklessly.
Fixes: ebb3b78db7bf ("tcp: annotate sk->sk_rcvbuf lockless reads") Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/sock.c b/net/core/sock.c index 04306ccdf9081..1a2ec2c4cfe26 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1628,7 +1628,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname, break;
case SO_RCVBUF: - v.val = sk->sk_rcvbuf; + v.val = READ_ONCE(sk->sk_rcvbuf); break;
case SO_REUSEADDR:
From: Eric Dumazet edumazet@google.com
[ Upstream commit 3c5b4d69c358a9275a8de98f87caf6eda644b086 ]
sk->sk_mark is often read while another thread could change the value.
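The inet_request_mark() change in this patch also shows why the value is loaded once into a local: the test and the return then see the same snapshot. A userspace sketch of that shape follows. SKETCH_READ_ONCE is a simplified stand-in, struct toy_sock is hypothetical, and folding the fwmark_accept setting into the socket is an assumption made to keep the example self-contained.

```c
#include <stdint.h>

/* Simplified stand-in: force a single volatile load. */
#define SKETCH_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))

/* Hypothetical socket; fwmark_accept is a per-netns sysctl in the kernel. */
struct toy_sock {
	uint32_t mark;
	int fwmark_accept;
};

/*
 * Load the mark once, then use the local copy for both the test and the
 * return value, so a concurrent writer cannot make them disagree.
 */
static uint32_t toy_request_mark(struct toy_sock *sk, uint32_t skb_mark)
{
	uint32_t mark = SKETCH_READ_ONCE(sk->mark);

	if (!mark && SKETCH_READ_ONCE(sk->fwmark_accept))
		return skb_mark;

	return mark;
}
```

Reading sk->mark twice, as the old code did, could test one value and return another if a writer changed it in between.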
Fixes: 4a19ec5800fc ("[NET]: Introducing socket mark socket option.") Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/inet_sock.h | 7 ++++--- include/net/ip.h | 2 +- include/net/route.h | 4 ++-- net/core/sock.c | 4 ++-- net/dccp/ipv6.c | 4 ++-- net/ipv4/inet_diag.c | 4 ++-- net/ipv4/ip_output.c | 4 ++-- net/ipv4/route.c | 4 ++-- net/ipv4/tcp_ipv4.c | 2 +- net/ipv6/ping.c | 2 +- net/ipv6/raw.c | 4 ++-- net/ipv6/route.c | 7 ++++--- net/ipv6/tcp_ipv6.c | 6 +++--- net/ipv6/udp.c | 4 ++-- net/l2tp/l2tp_ip6.c | 2 +- net/mptcp/sockopt.c | 2 +- net/netfilter/nft_socket.c | 2 +- net/netfilter/xt_socket.c | 4 ++-- net/packet/af_packet.c | 6 +++--- net/smc/af_smc.c | 2 +- net/xdp/xsk.c | 2 +- net/xfrm/xfrm_policy.c | 2 +- 22 files changed, 41 insertions(+), 39 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index 51857117ac099..c8ef3b881f03d 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -107,11 +107,12 @@ static inline struct inet_request_sock *inet_rsk(const struct request_sock *sk)
static inline u32 inet_request_mark(const struct sock *sk, struct sk_buff *skb) { - if (!sk->sk_mark && - READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_fwmark_accept)) + u32 mark = READ_ONCE(sk->sk_mark); + + if (!mark && READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_fwmark_accept)) return skb->mark;
- return sk->sk_mark; + return mark; }
static inline int inet_request_bound_dev_if(const struct sock *sk, diff --git a/include/net/ip.h b/include/net/ip.h index 83a1a9bc3ceb1..530e7257e4389 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -93,7 +93,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm, { ipcm_init(ipcm);
- ipcm->sockc.mark = inet->sk.sk_mark; + ipcm->sockc.mark = READ_ONCE(inet->sk.sk_mark); ipcm->sockc.tsflags = inet->sk.sk_tsflags; ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if); ipcm->addr = inet->inet_saddr; diff --git a/include/net/route.h b/include/net/route.h index fe00b0a2e4759..af8431b25f800 100644 --- a/include/net/route.h +++ b/include/net/route.h @@ -171,7 +171,7 @@ static inline struct rtable *ip_route_output_ports(struct net *net, struct flowi __be16 dport, __be16 sport, __u8 proto, __u8 tos, int oif) { - flowi4_init_output(fl4, oif, sk ? sk->sk_mark : 0, tos, + flowi4_init_output(fl4, oif, sk ? READ_ONCE(sk->sk_mark) : 0, tos, RT_SCOPE_UNIVERSE, proto, sk ? inet_sk_flowi_flags(sk) : 0, daddr, saddr, dport, sport, sock_net_uid(net, sk)); @@ -304,7 +304,7 @@ static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst, if (inet_sk(sk)->transparent) flow_flags |= FLOWI_FLAG_ANYSRC;
- flowi4_init_output(fl4, oif, sk->sk_mark, ip_sock_rt_tos(sk), + flowi4_init_output(fl4, oif, READ_ONCE(sk->sk_mark), ip_sock_rt_tos(sk), ip_sock_rt_scope(sk), protocol, flow_flags, dst, src, dport, sport, sk->sk_uid); } diff --git a/net/core/sock.c b/net/core/sock.c index 1a2ec2c4cfe26..4dd13d34e4740 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -977,7 +977,7 @@ EXPORT_SYMBOL(sock_set_rcvbuf); static void __sock_set_mark(struct sock *sk, u32 val) { if (val != sk->sk_mark) { - sk->sk_mark = val; + WRITE_ONCE(sk->sk_mark, val); sk_dst_reset(sk); } } @@ -1796,7 +1796,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname, return security_socket_getpeersec_stream(sock, optval.user, optlen.user, len);
case SO_MARK: - v.val = sk->sk_mark; + v.val = READ_ONCE(sk->sk_mark); break;
case SO_RCVMARK: diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index c0fd8f5f3b94e..b51ce6f8ceba0 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -237,8 +237,8 @@ static int dccp_v6_send_response(const struct sock *sk, struct request_sock *req opt = ireq->ipv6_opt; if (!opt) opt = rcu_dereference(np->opt); - err = ip6_xmit(sk, skb, &fl6, sk->sk_mark, opt, np->tclass, - sk->sk_priority); + err = ip6_xmit(sk, skb, &fl6, READ_ONCE(sk->sk_mark), opt, + np->tclass, sk->sk_priority); rcu_read_unlock(); err = net_xmit_eval(err); } diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index b812eb36f0e36..f7426926a1041 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -150,7 +150,7 @@ int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb, } #endif
- if (net_admin && nla_put_u32(skb, INET_DIAG_MARK, sk->sk_mark)) + if (net_admin && nla_put_u32(skb, INET_DIAG_MARK, READ_ONCE(sk->sk_mark))) goto errout;
if (ext & (1 << (INET_DIAG_CLASS_ID - 1)) || @@ -799,7 +799,7 @@ int inet_diag_bc_sk(const struct nlattr *bc, struct sock *sk) entry.ifindex = sk->sk_bound_dev_if; entry.userlocks = sk_fullsock(sk) ? sk->sk_userlocks : 0; if (sk_fullsock(sk)) - entry.mark = sk->sk_mark; + entry.mark = READ_ONCE(sk->sk_mark); else if (sk->sk_state == TCP_NEW_SYN_RECV) entry.mark = inet_rsk(inet_reqsk(sk))->ir_mark; else if (sk->sk_state == TCP_TIME_WAIT) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 7b4ab545c06e0..99d8cdbfd9ab5 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -184,7 +184,7 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk,
skb->priority = sk->sk_priority; if (!skb->mark) - skb->mark = sk->sk_mark; + skb->mark = READ_ONCE(sk->sk_mark);
/* Send it out. */ return ip_local_out(net, skb->sk, skb); @@ -527,7 +527,7 @@ int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl,
/* TODO : should we use skb->sk here instead of sk ? */ skb->priority = sk->sk_priority; - skb->mark = sk->sk_mark; + skb->mark = READ_ONCE(sk->sk_mark);
res = ip_local_out(net, sk, skb); rcu_read_unlock(); diff --git a/net/ipv4/route.c b/net/ipv4/route.c index cd1fa9f70f1a1..51bd9a50a1d1d 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -518,7 +518,7 @@ static void __build_flow_key(const struct net *net, struct flowi4 *fl4, const struct inet_sock *inet = inet_sk(sk);
oif = sk->sk_bound_dev_if; - mark = sk->sk_mark; + mark = READ_ONCE(sk->sk_mark); tos = ip_sock_rt_tos(sk); scope = ip_sock_rt_scope(sk); prot = inet->hdrincl ? IPPROTO_RAW : sk->sk_protocol; @@ -552,7 +552,7 @@ static void build_sk_flow_key(struct flowi4 *fl4, const struct sock *sk) inet_opt = rcu_dereference(inet->inet_opt); if (inet_opt && inet_opt->opt.srr) daddr = inet_opt->opt.faddr; - flowi4_init_output(fl4, sk->sk_bound_dev_if, sk->sk_mark, + flowi4_init_output(fl4, sk->sk_bound_dev_if, READ_ONCE(sk->sk_mark), ip_sock_rt_tos(sk) & IPTOS_RT_MASK, ip_sock_rt_scope(sk), inet->hdrincl ? IPPROTO_RAW : sk->sk_protocol, diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 9a8d59e9303a0..23b4f93afb28d 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -931,7 +931,7 @@ static void tcp_v4_send_ack(const struct sock *sk, ctl_sk = this_cpu_read(ipv4_tcp_sk); sock_net_set(ctl_sk, net); ctl_sk->sk_mark = (sk->sk_state == TCP_TIME_WAIT) ? - inet_twsk(sk)->tw_mark : sk->sk_mark; + inet_twsk(sk)->tw_mark : READ_ONCE(sk->sk_mark); ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ? inet_twsk(sk)->tw_priority : sk->sk_priority; transmit_time = tcp_transmit_time(sk); diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c index 4651aaf70db4f..4d5a27dd9a4b2 100644 --- a/net/ipv6/ping.c +++ b/net/ipv6/ping.c @@ -120,7 +120,7 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
ipcm6_init_sk(&ipc6, np); ipc6.sockc.tsflags = sk->sk_tsflags; - ipc6.sockc.mark = sk->sk_mark; + ipc6.sockc.mark = READ_ONCE(sk->sk_mark);
fl6.flowi6_oif = oif;
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 33852fc38ad91..e8675e5b5d00b 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -772,12 +772,12 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) */ memset(&fl6, 0, sizeof(fl6));
- fl6.flowi6_mark = sk->sk_mark; + fl6.flowi6_mark = READ_ONCE(sk->sk_mark); fl6.flowi6_uid = sk->sk_uid;
ipcm6_init(&ipc6); ipc6.sockc.tsflags = sk->sk_tsflags; - ipc6.sockc.mark = sk->sk_mark; + ipc6.sockc.mark = fl6.flowi6_mark;
if (sin6) { if (addr_len < SIN6_LEN_RFC2133) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 0b060cb8681f0..960ab43a49c46 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -2952,7 +2952,8 @@ void ip6_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, __be32 mtu) if (!oif && skb->dev) oif = l3mdev_master_ifindex(skb->dev);
- ip6_update_pmtu(skb, sock_net(sk), mtu, oif, sk->sk_mark, sk->sk_uid); + ip6_update_pmtu(skb, sock_net(sk), mtu, oif, READ_ONCE(sk->sk_mark), + sk->sk_uid);
dst = __sk_dst_get(sk); if (!dst || !dst->obsolete || @@ -3173,8 +3174,8 @@ void ip6_redirect_no_header(struct sk_buff *skb, struct net *net, int oif)
void ip6_sk_redirect(struct sk_buff *skb, struct sock *sk) { - ip6_redirect(skb, sock_net(sk), sk->sk_bound_dev_if, sk->sk_mark, - sk->sk_uid); + ip6_redirect(skb, sock_net(sk), sk->sk_bound_dev_if, + READ_ONCE(sk->sk_mark), sk->sk_uid); } EXPORT_SYMBOL_GPL(ip6_sk_redirect);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index d9253aa764fae..039aa51390aed 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -567,8 +567,8 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst, opt = ireq->ipv6_opt; if (!opt) opt = rcu_dereference(np->opt); - err = ip6_xmit(sk, skb, fl6, skb->mark ? : sk->sk_mark, opt, - tclass, sk->sk_priority); + err = ip6_xmit(sk, skb, fl6, skb->mark ? : READ_ONCE(sk->sk_mark), + opt, tclass, sk->sk_priority); rcu_read_unlock(); err = net_xmit_eval(err); } @@ -943,7 +943,7 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32 if (sk->sk_state == TCP_TIME_WAIT) mark = inet_twsk(sk)->tw_mark; else - mark = sk->sk_mark; + mark = READ_ONCE(sk->sk_mark); skb_set_delivery_time(buff, tcp_transmit_time(sk), true); } if (txhash) { diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 04f1d696503cd..27348172b25b9 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -622,7 +622,7 @@ int __udp6_lib_err(struct sk_buff *skb, struct inet6_skb_parm *opt, if (type == NDISC_REDIRECT) { if (tunnel) { ip6_redirect(skb, sock_net(sk), inet6_iif(skb), - sk->sk_mark, sk->sk_uid); + READ_ONCE(sk->sk_mark), sk->sk_uid); } else { ip6_sk_redirect(skb, sk); } @@ -1350,7 +1350,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) ipcm6_init(&ipc6); ipc6.gso_size = READ_ONCE(up->gso_size); ipc6.sockc.tsflags = sk->sk_tsflags; - ipc6.sockc.mark = sk->sk_mark; + ipc6.sockc.mark = READ_ONCE(sk->sk_mark);
/* destination address check */ if (sin6) { diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c index 5137ea1861ce2..bce4132b0a5c8 100644 --- a/net/l2tp/l2tp_ip6.c +++ b/net/l2tp/l2tp_ip6.c @@ -519,7 +519,7 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) /* Get and verify the address */ memset(&fl6, 0, sizeof(fl6));
- fl6.flowi6_mark = sk->sk_mark; + fl6.flowi6_mark = READ_ONCE(sk->sk_mark); fl6.flowi6_uid = sk->sk_uid;
ipcm6_init(&ipc6); diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c index 696ba398d699a..937bd4c556151 100644 --- a/net/mptcp/sockopt.c +++ b/net/mptcp/sockopt.c @@ -102,7 +102,7 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in break; case SO_MARK: if (READ_ONCE(ssk->sk_mark) != sk->sk_mark) { - ssk->sk_mark = sk->sk_mark; + WRITE_ONCE(ssk->sk_mark, sk->sk_mark); sk_dst_reset(ssk); } break; diff --git a/net/netfilter/nft_socket.c b/net/netfilter/nft_socket.c index 49a5348a6a14f..777561b71fcbd 100644 --- a/net/netfilter/nft_socket.c +++ b/net/netfilter/nft_socket.c @@ -107,7 +107,7 @@ static void nft_socket_eval(const struct nft_expr *expr, break; case NFT_SOCKET_MARK: if (sk_fullsock(sk)) { - *dest = sk->sk_mark; + *dest = READ_ONCE(sk->sk_mark); } else { regs->verdict.code = NFT_BREAK; return; diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c index 7013f55f05d1e..76e01f292aaff 100644 --- a/net/netfilter/xt_socket.c +++ b/net/netfilter/xt_socket.c @@ -77,7 +77,7 @@ socket_match(const struct sk_buff *skb, struct xt_action_param *par,
if (info->flags & XT_SOCKET_RESTORESKMARK && !wildcard && transparent && sk_fullsock(sk)) - pskb->mark = sk->sk_mark; + pskb->mark = READ_ONCE(sk->sk_mark);
if (sk != skb->sk) sock_gen_put(sk); @@ -138,7 +138,7 @@ socket_mt6_v1_v2_v3(const struct sk_buff *skb, struct xt_action_param *par)
if (info->flags & XT_SOCKET_RESTORESKMARK && !wildcard && transparent && sk_fullsock(sk)) - pskb->mark = sk->sk_mark; + pskb->mark = READ_ONCE(sk->sk_mark);
if (sk != skb->sk) sock_gen_put(sk); diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 6ab9d5b543387..30a28c1ff928a 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2053,7 +2053,7 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg, skb->protocol = proto; skb->dev = dev; skb->priority = sk->sk_priority; - skb->mark = sk->sk_mark; + skb->mark = READ_ONCE(sk->sk_mark); skb->tstamp = sockc.transmit_time;
skb_setup_tx_timestamp(skb, sockc.tsflags); @@ -2576,7 +2576,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb, skb->protocol = proto; skb->dev = dev; skb->priority = po->sk.sk_priority; - skb->mark = po->sk.sk_mark; + skb->mark = READ_ONCE(po->sk.sk_mark); skb->tstamp = sockc->transmit_time; skb_setup_tx_timestamp(skb, sockc->tsflags); skb_zcopy_set_nouarg(skb, ph.raw); @@ -2978,7 +2978,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len) goto out_unlock;
sockcm_init(&sockc, sk); - sockc.mark = sk->sk_mark; + sockc.mark = READ_ONCE(sk->sk_mark); if (msg->msg_controllen) { err = sock_cmsg_send(sk, msg, &sockc); if (unlikely(err)) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 02d1daae77397..5ae0a54a823b5 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -447,7 +447,7 @@ static void smc_copy_sock_settings(struct sock *nsk, struct sock *osk, nsk->sk_rcvbuf = osk->sk_rcvbuf; nsk->sk_sndtimeo = osk->sk_sndtimeo; nsk->sk_rcvtimeo = osk->sk_rcvtimeo; - nsk->sk_mark = osk->sk_mark; + nsk->sk_mark = READ_ONCE(osk->sk_mark); nsk->sk_priority = osk->sk_priority; nsk->sk_rcvlowat = osk->sk_rcvlowat; nsk->sk_bound_dev_if = osk->sk_bound_dev_if; diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 371d269d22fa0..22bf10ffbf2d1 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -504,7 +504,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
skb->dev = dev; skb->priority = xs->sk.sk_priority; - skb->mark = xs->sk.sk_mark; + skb->mark = READ_ONCE(xs->sk.sk_mark); skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr; skb->destructor = xsk_destruct_skb;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 7b1b93584bdbe..e65de78cb61bf 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -2174,7 +2174,7 @@ static struct xfrm_policy *xfrm_sk_policy_lookup(const struct sock *sk, int dir,
match = xfrm_selector_match(&pol->selector, fl, family); if (match) { - if ((sk->sk_mark & pol->mark.m) != pol->mark.v || + if ((READ_ONCE(sk->sk_mark) & pol->mark.m) != pol->mark.v || pol->if_id != if_id) { pol = NULL; goto out;
From: Eric Dumazet edumazet@google.com
[ Upstream commit 11695c6e966b0ec7ed1d16777d294cef865a5c91 ]
sk_getsockopt() runs locklessly, thus we need to annotate the read of sk->sk_peek_off.
While we are at it, add corresponding annotations to sk_set_peek_off() and unix_set_peek_off().
Fixes: b9bb53f3836f ("sock: convert sk_peek_offset functions to WRITE_ONCE")
Signed-off-by: Eric Dumazet edumazet@google.com
Cc: Willem de Bruijn willemb@google.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/core/sock.c    | 4 ++--
 net/unix/af_unix.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 4dd13d34e4740..61bbe6263f98e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1815,7 +1815,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
 		if (!sock->ops->set_peek_off)
 			return -EOPNOTSUPP;
 
-		v.val = sk->sk_peek_off;
+		v.val = READ_ONCE(sk->sk_peek_off);
 		break;
 	case SO_NOFCS:
 		v.val = sock_flag(sk, SOCK_NOFCS);
@@ -3119,7 +3119,7 @@ EXPORT_SYMBOL(__sk_mem_reclaim);
 
 int sk_set_peek_off(struct sock *sk, int val)
 {
-	sk->sk_peek_off = val;
+	WRITE_ONCE(sk->sk_peek_off, val);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(sk_set_peek_off);
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 5b19b6c53a2cb..78fa620a63981 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -779,7 +779,7 @@ static int unix_set_peek_off(struct sock *sk, int val)
 	if (mutex_lock_interruptible(&u->iolock))
 		return -EINTR;
 
-	sk->sk_peek_off = val;
+	WRITE_ONCE(sk->sk_peek_off, val);
 	mutex_unlock(&u->iolock);
 
 	return 0;
From: Eric Dumazet edumazet@google.com
[ Upstream commit e5f0d2dd3c2faa671711dac6d3ff3cef307bcfe3 ]
In a prior commit I forgot that sk_getsockopt() reads sk->sk_ll_usec without holding a lock.
Fixes: 0dbffbb5335a ("net: annotate data race around sk_ll_usec")
Signed-off-by: Eric Dumazet edumazet@google.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/core/sock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 61bbe6263f98e..ff52d51dfe2c5 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1845,7 +1845,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	case SO_BUSY_POLL:
-		v.val = sk->sk_ll_usec;
+		v.val = READ_ONCE(sk->sk_ll_usec);
 		break;
 	case SO_PREFER_BUSY_POLL:
 		v.val = READ_ONCE(sk->sk_prefer_busy_poll);
From: Eric Dumazet edumazet@google.com
[ Upstream commit 8bf43be799d4b242ea552a14db10456446be843e ]
sk_getsockopt() runs locklessly. This means sk->sk_priority can be read while other threads are changing its value.
Other reads also happen without the socket lock being held.
Add missing annotations where needed.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet edumazet@google.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/core/sock.c        | 6 +++---
 net/ipv4/ip_output.c   | 4 ++--
 net/ipv4/ip_sockglue.c | 2 +-
 net/ipv4/raw.c         | 2 +-
 net/ipv4/tcp_ipv4.c    | 2 +-
 net/ipv6/raw.c         | 2 +-
 net/ipv6/tcp_ipv6.c    | 3 ++-
 net/packet/af_packet.c | 6 +++---
 8 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/net/core/sock.c b/net/core/sock.c index ff52d51dfe2c5..3b5304f084ef3 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -800,7 +800,7 @@ EXPORT_SYMBOL(sock_no_linger); void sock_set_priority(struct sock *sk, u32 priority) { lock_sock(sk); - sk->sk_priority = priority; + WRITE_ONCE(sk->sk_priority, priority); release_sock(sk); } EXPORT_SYMBOL(sock_set_priority); @@ -1203,7 +1203,7 @@ int sk_setsockopt(struct sock *sk, int level, int optname, if ((val >= 0 && val <= 6) || sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) || sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) - sk->sk_priority = val; + WRITE_ONCE(sk->sk_priority, val); else ret = -EPERM; break; @@ -1670,7 +1670,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname, break;
case SO_PRIORITY: - v.val = sk->sk_priority; + v.val = READ_ONCE(sk->sk_priority); break;
case SO_LINGER: diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 99d8cdbfd9ab5..acfe58d2f1dd7 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -182,7 +182,7 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk, ip_options_build(skb, &opt->opt, daddr, rt); }
- skb->priority = sk->sk_priority; + skb->priority = READ_ONCE(sk->sk_priority); if (!skb->mark) skb->mark = READ_ONCE(sk->sk_mark);
@@ -526,7 +526,7 @@ int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl, skb_shinfo(skb)->gso_segs ?: 1);
/* TODO : should we use skb->sk here instead of sk ? */ - skb->priority = sk->sk_priority; + skb->priority = READ_ONCE(sk->sk_priority); skb->mark = READ_ONCE(sk->sk_mark);
res = ip_local_out(net, sk, skb); diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index a7fd035b5b4f9..63aa52becd880 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -591,7 +591,7 @@ void __ip_sock_set_tos(struct sock *sk, int val) } if (inet_sk(sk)->tos != val) { inet_sk(sk)->tos = val; - sk->sk_priority = rt_tos2priority(val); + WRITE_ONCE(sk->sk_priority, rt_tos2priority(val)); sk_dst_reset(sk); } } diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 86197634dcf5d..639aa5abda9dd 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -346,7 +346,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, goto error; skb_reserve(skb, hlen);
- skb->priority = sk->sk_priority; + skb->priority = READ_ONCE(sk->sk_priority); skb->mark = sockc->mark; skb->tstamp = sockc->transmit_time; skb_dst_set(skb, &rt->dst); diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 23b4f93afb28d..08921b96f9728 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -933,7 +933,7 @@ static void tcp_v4_send_ack(const struct sock *sk, ctl_sk->sk_mark = (sk->sk_state == TCP_TIME_WAIT) ? inet_twsk(sk)->tw_mark : READ_ONCE(sk->sk_mark); ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ? - inet_twsk(sk)->tw_priority : sk->sk_priority; + inet_twsk(sk)->tw_priority : READ_ONCE(sk->sk_priority); transmit_time = tcp_transmit_time(sk); ip_send_unicast_reply(ctl_sk, skb, &TCP_SKB_CB(skb)->header.h4.opt, diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index e8675e5b5d00b..df3abd9e5237c 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -612,7 +612,7 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length, skb_reserve(skb, hlen);
skb->protocol = htons(ETH_P_IPV6); - skb->priority = sk->sk_priority; + skb->priority = READ_ONCE(sk->sk_priority); skb->mark = sockc->mark; skb->tstamp = sockc->transmit_time;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 039aa51390aed..4bdd356bb5c46 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1132,7 +1132,8 @@ static void tcp_v6_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb, tcp_time_stamp_raw() + tcp_rsk(req)->ts_off, READ_ONCE(req->ts_recent), sk->sk_bound_dev_if, tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->saddr, l3index), - ipv6_get_dsfield(ipv6_hdr(skb)), 0, sk->sk_priority, + ipv6_get_dsfield(ipv6_hdr(skb)), 0, + READ_ONCE(sk->sk_priority), READ_ONCE(tcp_rsk(req)->txhash)); }
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 30a28c1ff928a..1681068400733 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2052,7 +2052,7 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
skb->protocol = proto; skb->dev = dev; - skb->priority = sk->sk_priority; + skb->priority = READ_ONCE(sk->sk_priority); skb->mark = READ_ONCE(sk->sk_mark); skb->tstamp = sockc.transmit_time;
@@ -2575,7 +2575,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
skb->protocol = proto; skb->dev = dev; - skb->priority = po->sk.sk_priority; + skb->priority = READ_ONCE(po->sk.sk_priority); skb->mark = READ_ONCE(po->sk.sk_mark); skb->tstamp = sockc->transmit_time; skb_setup_tx_timestamp(skb, sockc->tsflags); @@ -3052,7 +3052,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
skb->protocol = proto; skb->dev = dev; - skb->priority = sk->sk_priority; + skb->priority = READ_ONCE(sk->sk_priority); skb->mark = sockc.mark; skb->tstamp = sockc.transmit_time;
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit e739718444f7bf2fa3d70d101761ad83056ca628 ]
syzkaller found zero division error [0] in div_s64_rem() called from get_cycle_time_elapsed(), where sched->cycle_time is the divisor.
We have tests in parse_taprio_schedule() so that cycle_time will never be 0, and actually cycle_time is not 0 in get_cycle_time_elapsed().
The problem is that the types of divisor are different; cycle_time is s64, but the argument of div_s64_rem() is s32.
syzkaller fed the following input, and 0x100000000 becomes 0 once it is truncated to s32.
@TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME={0xc, 0x8, 0x100000000}
cycle_time is stored as s64 so that it can be cast to ktime_t, so let's keep that type and cap cycle_time at INT_MAX instead.
While at it, we prevent overflow in setup_txtime() and add another test in parse_taprio_schedule() to check if cycle_time overflows.
Also, we add a new tdc test case for this issue.
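The truncation and the added range check can be reproduced in a small userspace sketch. The helper names (`toy_low32()`, `toy_cycle_time_valid()`, `toy_time_elapsed()`) are hypothetical; only the 0x100000000 input and the INT_MAX limit come from this patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Keep only the low 32 bits, as happens when an s64 value is handed to
 * a parameter declared s32 (the divisor of div_s64_rem() in this case).
 * Going through uint32_t makes the result well defined whenever the low
 * 32 bits fit in int32_t, which holds for the values used below.
 */
static int32_t toy_low32(int64_t v)
{
	return (int32_t)(uint32_t)v;
}

/* Same bounds as the check added to parse_taprio_schedule(). */
static int toy_cycle_time_valid(int64_t cycle)
{
	return cycle >= 0 && cycle <= INT_MAX;
}

/*
 * Toy counterpart of get_cycle_time_elapsed(): the divisor is narrowed
 * to 32 bits first, so it must already be known to be in range and
 * non-zero, otherwise this would divide by zero.
 */
static int64_t toy_time_elapsed(int64_t since_start, int64_t cycle)
{
	assert(cycle > 0 && toy_cycle_time_valid(cycle));
	return since_start % toy_low32(cycle);
}
```

With the syzkaller value, `toy_low32(0x100000000)` is 0 while the s64 itself is non-zero, which is why a "cycle_time != 0" test on the s64 did not protect the division.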
[0]:
divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 1 PID: 103 Comm: kworker/1:3 Not tainted 6.5.0-rc1-00330-g60cc1f7d0605 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:div_s64_rem include/linux/math64.h:42 [inline]
RIP: 0010:get_cycle_time_elapsed net/sched/sch_taprio.c:223 [inline]
RIP: 0010:find_entry_to_transmit+0x252/0x7e0 net/sched/sch_taprio.c:344
Code: 3c 02 00 0f 85 5e 05 00 00 48 8b 4c 24 08 4d 8b bd 40 01 00 00 48 8b 7c 24 48 48 89 c8 4c 29 f8 48 63 f7 48 99 48 89 74 24 70 <48> f7 fe 48 29 d1 48 8d 04 0f 49 89 cc 48 89 44 24 20 49 8d 85 10
RSP: 0018:ffffc90000acf260 EFLAGS: 00010206
RAX: 177450e0347560cf RBX: 0000000000000000 RCX: 177450e0347560cf
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000100000000
RBP: 0000000000000056 R08: 0000000000000000 R09: ffffed10020a0934
R10: ffff8880105049a7 R11: ffff88806cf3a520 R12: ffff888010504800
R13: ffff88800c00d800 R14: ffff8880105049a0 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0edf84f0e8 CR3: 000000000d73c002 CR4: 0000000000770ee0
PKRU: 55555554
Call Trace:
 <TASK>
 get_packet_txtime net/sched/sch_taprio.c:508 [inline]
 taprio_enqueue_one+0x900/0xff0 net/sched/sch_taprio.c:577
 taprio_enqueue+0x378/0xae0 net/sched/sch_taprio.c:658
 dev_qdisc_enqueue+0x46/0x170 net/core/dev.c:3732
 __dev_xmit_skb net/core/dev.c:3821 [inline]
 __dev_queue_xmit+0x1b2f/0x3000 net/core/dev.c:4169
 dev_queue_xmit include/linux/netdevice.h:3088 [inline]
 neigh_resolve_output net/core/neighbour.c:1552 [inline]
 neigh_resolve_output+0x4a7/0x780 net/core/neighbour.c:1532
 neigh_output include/net/neighbour.h:544 [inline]
 ip6_finish_output2+0x924/0x17d0 net/ipv6/ip6_output.c:135
 __ip6_finish_output+0x620/0xaa0 net/ipv6/ip6_output.c:196
 ip6_finish_output net/ipv6/ip6_output.c:207 [inline]
 NF_HOOK_COND include/linux/netfilter.h:292 [inline]
 ip6_output+0x206/0x410 net/ipv6/ip6_output.c:228
 dst_output include/net/dst.h:458 [inline]
 NF_HOOK.constprop.0+0xea/0x260 include/linux/netfilter.h:303
 ndisc_send_skb+0x872/0xe80 net/ipv6/ndisc.c:508
 ndisc_send_ns+0xb5/0x130 net/ipv6/ndisc.c:666
 addrconf_dad_work+0xc14/0x13f0 net/ipv6/addrconf.c:4175
 process_one_work+0x92c/0x13a0 kernel/workqueue.c:2597
 worker_thread+0x60f/0x1240 kernel/workqueue.c:2748
 kthread+0x2fe/0x3f0 kernel/kthread.c:389
 ret_from_fork+0x2c/0x50 arch/x86/entry/entry_64.S:308
 </TASK>
Modules linked in:
Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode")
Reported-by: syzkaller syzkaller@googlegroups.com
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com
Co-developed-by: Eric Dumazet edumazet@google.com
Co-developed-by: Pedro Tammela pctammela@mojatatu.com
Acked-by: Vinicius Costa Gomes vinicius.gomes@intel.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/sched/sch_taprio.c                        | 15 +++++++++--
 .../tc-testing/tc-tests/qdiscs/taprio.json    | 25 +++++++++++++++++++
 2 files changed, 38 insertions(+), 2 deletions(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index a274a9332f333..8d5eebb2dd1b1 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -769,6 +769,11 @@ static const struct nla_policy taprio_tc_policy[TCA_TAPRIO_TC_ENTRY_MAX + 1] = { [TCA_TAPRIO_TC_ENTRY_MAX_SDU] = { .type = NLA_U32 }, };
+static struct netlink_range_validation_signed taprio_cycle_time_range = { + .min = 0, + .max = INT_MAX, +}; + static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = { [TCA_TAPRIO_ATTR_PRIOMAP] = { .len = sizeof(struct tc_mqprio_qopt) @@ -777,7 +782,8 @@ static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = { [TCA_TAPRIO_ATTR_SCHED_BASE_TIME] = { .type = NLA_S64 }, [TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY] = { .type = NLA_NESTED }, [TCA_TAPRIO_ATTR_SCHED_CLOCKID] = { .type = NLA_S32 }, - [TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME] = { .type = NLA_S64 }, + [TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME] = + NLA_POLICY_FULL_RANGE_SIGNED(NLA_S64, &taprio_cycle_time_range), [TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION] = { .type = NLA_S64 }, [TCA_TAPRIO_ATTR_FLAGS] = { .type = NLA_U32 }, [TCA_TAPRIO_ATTR_TXTIME_DELAY] = { .type = NLA_U32 }, @@ -913,6 +919,11 @@ static int parse_taprio_schedule(struct taprio_sched *q, struct nlattr **tb, return -EINVAL; }
+ if (cycle < 0 || cycle > INT_MAX) { + NL_SET_ERR_MSG(extack, "'cycle_time' is too big"); + return -EINVAL; + } + new->cycle_time = cycle; }
@@ -1110,7 +1121,7 @@ static void setup_txtime(struct taprio_sched *q, struct sched_gate_list *sched, ktime_t base) { struct sched_entry *entry; - u32 interval = 0; + u64 interval = 0;
list_for_each_entry(entry, &sched->entries, list) { entry->next_txtime = ktime_add_ns(base, interval); diff --git a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json index a44455372646a..08d4861c2e782 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json +++ b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json @@ -131,5 +131,30 @@ "teardown": [ "echo "1" > /sys/bus/netdevsim/del_device" ] + }, + { + "id": "3e1e", + "name": "Add taprio Qdisc with an invalid cycle-time", + "category": [ + "qdisc", + "taprio" + ], + "plugins": { + "requires": "nsPlugin" + }, + "setup": [ + "echo "1 1 8" > /sys/bus/netdevsim/new_device", + "$TC qdisc add dev $ETH root handle 1: taprio num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@0 1@0 base-time 1000000000 sched-entry S 01 300000 flags 0x1 clockid CLOCK_TAI cycle-time 4294967296 || /bin/true", + "$IP link set dev $ETH up", + "$IP addr add 10.10.10.10/24 dev $ETH" + ], + "cmdUnderTest": "/bin/true", + "expExitCode": "0", + "verifyCmd": "$TC qdisc show dev $ETH", + "matchPattern": "qdisc taprio 1: root refcnt", + "matchCount": "0", + "teardown": [ + "echo "1" > /sys/bus/netdevsim/del_device" + ] } ]
From: Rafal Rogalski rafalx.rogalski@intel.com
[ Upstream commit 4b31fd4d77ffa430d0b74ba1885ea0a41594f202 ]
During qdisc create/delete, it is necessary to rebuild the queue of VSIs. An error occurred because the VSIs created by RDMA were still active.
Add a check for whether RDMA is active. If it is, disallow qdisc changes and write a message to the system log.
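The control flow of the added check (take the lock, refuse with -EBUSY if a driver is bound, release on a single exit path) can be sketched as follows. Everything here is a hypothetical userspace model: `toy_adev`, `toy_pf`, `toy_lock()` and `toy_setup_mqprio()` are illustrative names, and the two kernel locks (adev_mutex and the device lock) are collapsed into one counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the auxiliary (RDMA) device. */
struct toy_adev {
	bool driver_bound;	/* stands in for pf->adev->dev.driver != NULL */
	int lock_depth;		/* stands in for adev_mutex plus device_lock() */
};

/* Hypothetical model of the PF; adev may be NULL when RDMA is unused. */
struct toy_pf {
	struct toy_adev *adev;
	int qdisc_changes;	/* counts how often the qdisc was reconfigured */
};

static void toy_lock(struct toy_adev *a)
{
	a->lock_depth++;
}

static void toy_unlock(struct toy_adev *a)
{
	assert(a->lock_depth > 0);
	a->lock_depth--;
}

/* Shaped like the TC_SETUP_QDISC_MQPRIO branch after the patch. */
static int toy_setup_mqprio(struct toy_pf *pf)
{
	bool locked = false;
	int err;

	if (pf->adev) {
		toy_lock(pf->adev);
		locked = true;
		if (pf->adev->driver_bound) {
			err = -EBUSY;	/* RDMA is active: refuse the change */
			goto adev_unlock;
		}
	}

	pf->qdisc_changes++;	/* stands in for ice_setup_tc_mqprio_qdisc() */
	err = 0;

adev_unlock:
	if (locked)
		toy_unlock(pf->adev);
	return err;
}
```

Holding the lock across the reconfiguration, not just across the check, is what keeps a driver from binding between the test and the qdisc change.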
Fixes: 348048e724a0 ("ice: Implement iidc operations")
Signed-off-by: Rafal Rogalski rafalx.rogalski@intel.com
Signed-off-by: Mateusz Palczewski mateusz.palczewski@intel.com
Signed-off-by: Kamil Maziarz kamil.maziarz@intel.com
Tested-by: Bharathi Sreenivas bharathi.sreenivas@intel.com
Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com
Reviewed-by: Leon Romanovsky leonro@nvidia.com
Link: https://lore.kernel.org/r/20230728171243.2446101-1-anthony.l.nguyen@intel.co...
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/intel/ice/ice_main.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 8f77088900e94..a771e597795d3 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -8777,6 +8777,7 @@ ice_setup_tc(struct net_device *netdev, enum tc_setup_type type,
 {
 	struct ice_netdev_priv *np = netdev_priv(netdev);
 	struct ice_pf *pf = np->vsi->back;
+	bool locked = false;
 	int err;
 
 	switch (type) {
@@ -8786,10 +8787,27 @@ ice_setup_tc(struct net_device *netdev, enum tc_setup_type type,
 						  ice_setup_tc_block_cb,
 						  np, np, true);
 	case TC_SETUP_QDISC_MQPRIO:
+		if (pf->adev) {
+			mutex_lock(&pf->adev_mutex);
+			device_lock(&pf->adev->dev);
+			locked = true;
+			if (pf->adev->dev.driver) {
+				netdev_err(netdev, "Cannot change qdisc when RDMA is active\n");
+				err = -EBUSY;
+				goto adev_unlock;
+			}
+		}
+
 		/* setup traffic classifier for receive side */
 		mutex_lock(&pf->tc_mutex);
 		err = ice_setup_tc_mqprio_qdisc(netdev, type_data);
 		mutex_unlock(&pf->tc_mutex);
+
+adev_unlock:
+		if (locked) {
+			device_unlock(&pf->adev->dev);
+			mutex_unlock(&pf->adev_mutex);
+		}
 		return err;
 	default:
 		return -EOPNOTSUPP;
From: Hou Tao houtao1@huawei.com
[ Upstream commit 7c62b75cd1a792e14b037fa4f61f9b18914e7de1 ]
The following warning was reported when running xdp_redirect_cpu with both skb-mode and stress-mode enabled:
------------[ cut here ]------------
Incorrect XDP memory type (-2128176192) usage
WARNING: CPU: 7 PID: 1442 at net/core/xdp.c:405
Modules linked in:
CPU: 7 PID: 1442 Comm: kworker/7:0 Tainted: G 6.5.0-rc2+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Workqueue: events __cpu_map_entry_free
RIP: 0010:__xdp_return+0x1e4/0x4a0
......
Call Trace:
 <TASK>
 ? show_regs+0x65/0x70
 ? __warn+0xa5/0x240
 ? __xdp_return+0x1e4/0x4a0
 ......
 xdp_return_frame+0x4d/0x150
 __cpu_map_entry_free+0xf9/0x230
 process_one_work+0x6b0/0xb80
 worker_thread+0x96/0x720
 kthread+0x1a5/0x1f0
 ret_from_fork+0x3a/0x70
 ret_from_fork_asm+0x1b/0x30
 </TASK>
The warning has two causes. First, the kthread cpu_map_kthread_run() is stopped prematurely. Second, __cpu_map_ring_cleanup() doesn't handle skb mode and treats skbs in ptr_ring as XDP frames.

The prematurely stopped kthread will be fixed by the preceding patch, so ptr_ring will be empty when __cpu_map_ring_cleanup() is called. But as the comment in __cpu_map_ring_cleanup() says, handle and free skbs in ptr_ring as well to "catch any broken behaviour gracefully".
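The cleanup loop in the diff distinguishes skbs from XDP frames by bit 0 of the pointer stored in the ring. A minimal userspace sketch of that low-bit tagging follows; `toy_tag_skb()`, `toy_classify()` and the `toy_kind` enum are hypothetical names, and the sketch assumes objects are at least 2-byte aligned so that bit 0 is free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_kind { TOY_XDP_FRAME, TOY_SKB };

/* Mark a pointer as "skb" by setting bit 0 before it enters the ring. */
static void *toy_tag_skb(void *skb)
{
	assert(((uintptr_t)skb & 1u) == 0);	/* bit 0 must be unused */
	return (void *)((uintptr_t)skb | 1u);
}

/*
 * Consumer side, shaped like the patched __cpu_map_ring_cleanup():
 * test bit 0, clear it to recover the real address, and report which
 * release function the caller would have to use.
 */
static enum toy_kind toy_classify(void *ptr, void **obj)
{
	uintptr_t bits = (uintptr_t)ptr;

	if (bits & 1u) {
		*obj = (void *)(bits & ~(uintptr_t)1);
		return TOY_SKB;		/* kernel: kfree_skb() */
	}
	*obj = ptr;
	return TOY_XDP_FRAME;		/* kernel: xdp_return_frame() */
}
```

Passing a tagged skb pointer straight to the XDP release path is what produced the "Incorrect XDP memory type" warning: the address is off by one and the memory type read from it is garbage.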
Fixes: 11941f8a8536 ("bpf: cpumap: Implement generic cpumap")
Signed-off-by: Hou Tao houtao1@huawei.com
Acked-by: Jesper Dangaard Brouer hawk@kernel.org
Link: https://lore.kernel.org/r/20230729095107.1722450-3-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau martin.lau@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 kernel/bpf/cpumap.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 09141351d5457..e5888d401d799 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -134,11 +134,17 @@ static void __cpu_map_ring_cleanup(struct ptr_ring *ring)
 	 * invoked cpu_map_kthread_stop(). Catch any broken behaviour
 	 * gracefully and warn once.
 	 */
-	struct xdp_frame *xdpf;
+	void *ptr;

-	while ((xdpf = ptr_ring_consume(ring)))
-		if (WARN_ON_ONCE(xdpf))
-			xdp_return_frame(xdpf);
+	while ((ptr = ptr_ring_consume(ring))) {
+		WARN_ON_ONCE(1);
+		if (unlikely(__ptr_test_bit(0, &ptr))) {
+			__ptr_clear_bit(0, &ptr);
+			kfree_skb(ptr);
+			continue;
+		}
+		xdp_return_frame(ptr);
+	}
 }

 static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
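
The cleanup loop in the cpumap patch tells skbs apart from XDP frames by testing bit 0 of the pointer taken from the ring, which only works because the stored objects are at least 2-byte aligned. A minimal userspace sketch of that tagging idea is below; tag_skb(), is_tagged_skb() and untag() are hypothetical names used for illustration, not the kernel's __ptr_test_bit()/__ptr_clear_bit() helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers that model tagging a pointer in its lowest bit. */

/* Mark a pointer as "skb" by setting bit 0; the object must be aligned
 * so that bit 0 of its address is always clear. */
static inline void *tag_skb(void *p)
{
	assert(((uintptr_t)p & 1u) == 0);
	return (void *)((uintptr_t)p | 1u);
}

/* True if the pointer carries the "skb" tag. */
static inline bool is_tagged_skb(const void *p)
{
	return ((uintptr_t)p & 1u) != 0;
}

/* Strip the tag to recover the original pointer before using or freeing it. */
static inline void *untag(void *p)
{
	return (void *)((uintptr_t)p & ~(uintptr_t)1);
}
```

A consumer would test is_tagged_skb() first and pick the matching free routine, which mirrors the branch the patch adds.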
From: valis sec@valis.email
[ Upstream commit 3044b16e7c6fe5d24b1cdbcf1bd0a9d92d1ebd81 ]
When u32_change() is called on an existing filter, the whole tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class, as tcf_unbind_filter() is always called on the old instance in the success path, decreasing filter_cnt of the still referenced class and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Fixes: de5df63228fc ("net: sched: cls_u32 changes to knode must appear atomic to readers")
Reported-by: valis sec@valis.email
Reported-by: M A Ramdhan ramdhan@starlabs.sg
Signed-off-by: valis sec@valis.email
Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com
Reviewed-by: Victor Nogueira victor@mojatatu.com
Reviewed-by: Pedro Tammela pctammela@mojatatu.com
Reviewed-by: M A Ramdhan ramdhan@starlabs.sg
Link: https://lore.kernel.org/r/20230729123202.72406-2-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/sched/cls_u32.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 0e3bb1d65be1c..ba93e2a6bdbb4 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -824,7 +824,6 @@ static struct tc_u_knode *u32_init_knode(struct net *net, struct tcf_proto *tp,

 	new->ifindex = n->ifindex;
 	new->fshift = n->fshift;
-	new->res = n->res;
 	new->flags = n->flags;
 	RCU_INIT_POINTER(new->ht_down, ht);
From: valis sec@valis.email
[ Upstream commit 76e42ae831991c828cffa8c37736ebfb831ad5ec ]
When fw_change() is called on an existing filter, the whole tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class, as tcf_unbind_filter() is always called on the old instance in the success path, decreasing filter_cnt of the still referenced class and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Fixes: e35a8ee5993b ("net: sched: fw use RCU")
Reported-by: valis sec@valis.email
Reported-by: Bing-Jhong Billy Jheng billy@starlabs.sg
Signed-off-by: valis sec@valis.email
Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com
Reviewed-by: Victor Nogueira victor@mojatatu.com
Reviewed-by: Pedro Tammela pctammela@mojatatu.com
Reviewed-by: M A Ramdhan ramdhan@starlabs.sg
Link: https://lore.kernel.org/r/20230729123202.72406-3-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/sched/cls_fw.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index 1212b057b129c..6160ef7d646ac 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -265,7 +265,6 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
 			return -ENOBUFS;

 		fnew->id = f->id;
-		fnew->res = f->res;
 		fnew->ifindex = f->ifindex;
 		fnew->tp = f->tp;
From: valis sec@valis.email
[ Upstream commit b80b829e9e2c1b3f7aae34855e04d8f6ecaf13c8 ]
When route4_change() is called on an existing filter, the whole tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class, as tcf_unbind_filter() is always called on the old instance in the success path, decreasing filter_cnt of the still referenced class and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Fixes: 1109c00547fc ("net: sched: RCU cls_route")
Reported-by: valis sec@valis.email
Reported-by: Bing-Jhong Billy Jheng billy@starlabs.sg
Signed-off-by: valis sec@valis.email
Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com
Reviewed-by: Victor Nogueira victor@mojatatu.com
Reviewed-by: Pedro Tammela pctammela@mojatatu.com
Reviewed-by: M A Ramdhan ramdhan@starlabs.sg
Link: https://lore.kernel.org/r/20230729123202.72406-4-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/sched/cls_route.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
index 9e43b929d4ca4..306188bf2d1ff 100644
--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -511,7 +511,6 @@ static int route4_change(struct net *net, struct sk_buff *in_skb,
 	if (fold) {
 		f->id = fold->id;
 		f->iif = fold->iif;
-		f->res = fold->res;
 		f->handle = fold->handle;

 		f->tp = fold->tp;
From: Tomas Glozar tglozar@redhat.com
[ Upstream commit 13d2618b48f15966d1adfe1ff6a1985f5eef40ba ]
Disabling preemption in sock_map_sk_acquire conflicts with GFP_ATOMIC allocation later in sk_psock_init_link on PREEMPT_RT kernels, since GFP_ATOMIC might sleep on RT (see bpf: Make BPF and PREEMPT_RT co-exist patchset notes for details).
This causes calling bpf_map_update_elem on BPF_MAP_TYPE_SOCKMAP maps to BUG (sleeping function called from invalid context) on RT kernels.
preempt_disable was introduced together with lock_sk and rcu_read_lock in commit 99ba2b5aba24e ("bpf: sockhash, disallow bpf_tcp_close and update in parallel"), probably to match disabled migration of BPF programs, and is no longer necessary.
Remove preempt_disable to fix BUG in sock_map_update_common on RT.
Signed-off-by: Tomas Glozar tglozar@redhat.com
Reviewed-by: Jakub Sitnicki jakub@cloudflare.com
Link: https://lore.kernel.org/all/20200224140131.461979697@linutronix.de/
Fixes: 99ba2b5aba24 ("bpf: sockhash, disallow bpf_tcp_close and update in parallel")
Reviewed-by: John Fastabend john.fastabend@gmail.com
Link: https://lore.kernel.org/r/20230728064411.305576-1-tglozar@redhat.com
Signed-off-by: Paolo Abeni pabeni@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/core/sock_map.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index d382672018928..c84e5073c0b66 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -117,7 +117,6 @@ static void sock_map_sk_acquire(struct sock *sk)
 	__acquires(&sk->sk_lock.slock)
 {
 	lock_sock(sk);
-	preempt_disable();
 	rcu_read_lock();
 }

@@ -125,7 +124,6 @@ static void sock_map_sk_release(struct sock *sk)
 	__releases(&sk->sk_lock.slock)
 {
 	rcu_read_unlock();
-	preempt_enable();
 	release_sock(sk);
 }
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit ef45e8400f5bb66b03cc949f76c80e2a118447de ]
Most kernel functions return negative error codes but some irq functions return zero on error. In this code, irq_of_parse_and_map() returns zero on error and platform_get_irq() returns negative error codes. We need to handle both cases appropriately.
Fixes: 8425c41d1ef7 ("net: ll_temac: Extend support to non-device-tree platforms")
Signed-off-by: Dan Carpenter dan.carpenter@linaro.org
Acked-by: Esben Haabendal esben@geanix.com
Reviewed-by: Yang Yingliang yangyingliang@huawei.com
Reviewed-by: Harini Katakam harini.katakam@amd.com
Link: https://lore.kernel.org/r/3d0aef75-06e0-45a5-a2a6-2cc4738d4143@moroto.mounta...
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/xilinx/ll_temac_main.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/xilinx/ll_temac_main.c b/drivers/net/ethernet/xilinx/ll_temac_main.c
index 1066420d6a83a..6bf5e341c3c11 100644
--- a/drivers/net/ethernet/xilinx/ll_temac_main.c
+++ b/drivers/net/ethernet/xilinx/ll_temac_main.c
@@ -1568,12 +1568,16 @@ static int temac_probe(struct platform_device *pdev)
 	}

 	/* Error handle returned DMA RX and TX interrupts */
-	if (lp->rx_irq < 0)
-		return dev_err_probe(&pdev->dev, lp->rx_irq,
+	if (lp->rx_irq <= 0) {
+		rc = lp->rx_irq ?: -EINVAL;
+		return dev_err_probe(&pdev->dev, rc,
 				     "could not get DMA RX irq\n");
-	if (lp->tx_irq < 0)
-		return dev_err_probe(&pdev->dev, lp->tx_irq,
+	}
+	if (lp->tx_irq <= 0) {
+		rc = lp->tx_irq ?: -EINVAL;
+		return dev_err_probe(&pdev->dev, rc,
 				     "could not get DMA TX irq\n");
+	}

 	if (temac_np) {
 		/* Retrieve the MAC address */
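
The ll_temac patch folds two failure conventions (zero on error, or a negative errno) into one check. As a rough standalone illustration, the sketch below does the same in portable C; irq_to_errno() is a hypothetical helper name, and the plain conditional stands in for the GNU `?:` shorthand used in the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: turn an IRQ lookup result into 0 or a negative errno.
 * A positive value is a valid IRQ number; zero and negative values both mean
 * the lookup failed, but only the negative ones already carry an error code. */
static int irq_to_errno(int irq)
{
	if (irq > 0)
		return 0;                 /* valid IRQ, nothing to report */
	return irq ? irq : -EINVAL;   /* same result as `irq ?: -EINVAL` */
}
```

With this shape a caller can treat any non-zero return as the error to propagate, regardless of which lookup function produced the IRQ value.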
From: Yuanjun Gong ruc_gongyuanjun@163.com
[ Upstream commit 0b6291ad1940c403734312d0e453e8dac9148f69 ]
In korina_probe(), the return value of clk_prepare_enable() should be checked since it might fail. Use devm_clk_get_optional_enabled() instead of devm_clk_get_optional() followed by clk_prepare_enable() so that the error is handled automatically.
Fixes: e4cd854ec487 ("net: korina: Get mdio input clock via common clock framework")
Signed-off-by: Yuanjun Gong ruc_gongyuanjun@163.com
Link: https://lore.kernel.org/r/20230731090535.21416-1-ruc_gongyuanjun@163.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/korina.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 2b9335cb4bb3a..8537578e1cf1d 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -1302,11 +1302,10 @@ static int korina_probe(struct platform_device *pdev)
 	else if (of_get_ethdev_address(pdev->dev.of_node, dev) < 0)
 		eth_hw_addr_random(dev);

-	clk = devm_clk_get_optional(&pdev->dev, "mdioclk");
+	clk = devm_clk_get_optional_enabled(&pdev->dev, "mdioclk");
 	if (IS_ERR(clk))
 		return PTR_ERR(clk);
 	if (clk) {
-		clk_prepare_enable(clk);
 		lp->mii_clock_freq = clk_get_rate(clk);
 	} else {
 		lp->mii_clock_freq = 200000000; /* max possible input clk */
From: Mark Brown broonie@kernel.org
[ Upstream commit f3bb7759a924713bc54d15f6d0d70733b5935fad ]
As documented in acd7aaf51b20 ("netsec: ignore 'phy-mode' device property on ACPI systems") the SocioNext SynQuacer platform ships with firmware defining the PHY mode as RGMII even though the physical configuration of the PHY is for TX and RX delays. Since bbc4d71d63549bc ("net: phy: realtek: fix rtl8211e rx/tx delay config") this has caused misconfiguration of the PHY, rendering the network unusable.
This was worked around for ACPI by ignoring the phy-mode property, but the system is also used with DT. For DT, force a working PHY mode instead when running on a SynQuacer: besides the standard EDK2 firmware with DT, some of these systems use u-boot and might not initialise the PHY if not netbooting. Newer firmware images for at least EDK2 are available from Linaro, so print a warning when doing this.
Fixes: 533dd11a12f6 ("net: socionext: Add Synquacer NetSec driver")
Signed-off-by: Mark Brown broonie@kernel.org
Acked-by: Ard Biesheuvel ardb@kernel.org
Acked-by: Ilias Apalodimas ilias.apalodimas@linaro.org
Reviewed-by: Andrew Lunn andrew@lunn.ch
Link: https://lore.kernel.org/r/20230731-synquacer-net-v3-1-944be5f06428@kernel.or...
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/socionext/netsec.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index 9b46579b5a103..b130e978366c1 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -1851,6 +1851,17 @@ static int netsec_of_probe(struct platform_device *pdev,
 		return err;
 	}

+	/*
+	 * SynQuacer is physically configured with TX and RX delays
+	 * but the standard firmware claimed otherwise for a long
+	 * time, ignore it.
+	 */
+	if (of_machine_is_compatible("socionext,developer-box") &&
+	    priv->phy_interface != PHY_INTERFACE_MODE_RGMII_ID) {
+		dev_warn(&pdev->dev, "Outdated firmware reports incorrect PHY mode, overriding\n");
+		priv->phy_interface = PHY_INTERFACE_MODE_RGMII_ID;
+	}
+
 	priv->phy_np = of_parse_phandle(pdev->dev.of_node, "phy-handle", 0);
 	if (!priv->phy_np) {
 		dev_err(&pdev->dev, "missing required property 'phy-handle'\n");
From: Somnath Kotur somnath.kotur@broadcom.com
[ Upstream commit f6974b4c2d8e1062b5a52228ee47293c15b4ee1e ]
The RXBD length field on all bnxt chips is 16-bit and so we cannot support a full page when the native page size is 64K or greater. The non-XDP (non page pool) code path has logic to handle this but the XDP page pool code path does not handle this. Add the missing logic to use page_pool_dev_alloc_frag() to allocate 32K chunks if the page size is 64K or greater.
Fixes: 9f4b28301ce6 ("bnxt: XDP multibuffer enablement")
Link: https://lore.kernel.org/netdev/20230728231829.235716-2-michael.chan@broadcom...
Reviewed-by: Andy Gospodarek andrew.gospodarek@broadcom.com
Signed-off-by: Somnath Kotur somnath.kotur@broadcom.com
Signed-off-by: Michael Chan michael.chan@broadcom.com
Link: https://lore.kernel.org/r/20230731142043.58855-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 42 ++++++++++++-------
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |  6 +--
 2 files changed, 29 insertions(+), 19 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 6469fb8a42a89..9bd18c2b10bc6 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -721,17 +721,24 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping, struct bnxt_rx_ring_info *rxr, + unsigned int *offset, gfp_t gfp) { struct device *dev = &bp->pdev->dev; struct page *page;
- page = page_pool_dev_alloc_pages(rxr->page_pool); + if (PAGE_SIZE > BNXT_RX_PAGE_SIZE) { + page = page_pool_dev_alloc_frag(rxr->page_pool, offset, + BNXT_RX_PAGE_SIZE); + } else { + page = page_pool_dev_alloc_pages(rxr->page_pool); + *offset = 0; + } if (!page) return NULL;
- *mapping = dma_map_page_attrs(dev, page, 0, PAGE_SIZE, bp->rx_dir, - DMA_ATTR_WEAK_ORDERING); + *mapping = dma_map_page_attrs(dev, page, *offset, BNXT_RX_PAGE_SIZE, + bp->rx_dir, DMA_ATTR_WEAK_ORDERING); if (dma_mapping_error(dev, *mapping)) { page_pool_recycle_direct(rxr->page_pool, page); return NULL; @@ -771,15 +778,16 @@ int bnxt_alloc_rx_data(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, dma_addr_t mapping;
if (BNXT_RX_PAGE_MODE(bp)) { + unsigned int offset; struct page *page = - __bnxt_alloc_rx_page(bp, &mapping, rxr, gfp); + __bnxt_alloc_rx_page(bp, &mapping, rxr, &offset, gfp);
if (!page) return -ENOMEM;
mapping += bp->rx_dma_offset; rx_buf->data = page; - rx_buf->data_ptr = page_address(page) + bp->rx_offset; + rx_buf->data_ptr = page_address(page) + offset + bp->rx_offset; } else { u8 *data = __bnxt_alloc_rx_frag(bp, &mapping, gfp);
@@ -839,7 +847,7 @@ static inline int bnxt_alloc_rx_page(struct bnxt *bp, unsigned int offset = 0;
if (BNXT_RX_PAGE_MODE(bp)) { - page = __bnxt_alloc_rx_page(bp, &mapping, rxr, gfp); + page = __bnxt_alloc_rx_page(bp, &mapping, rxr, &offset, gfp);
if (!page) return -ENOMEM; @@ -986,15 +994,15 @@ static struct sk_buff *bnxt_rx_multi_page_skb(struct bnxt *bp, return NULL; } dma_addr -= bp->rx_dma_offset; - dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, PAGE_SIZE, bp->rx_dir, - DMA_ATTR_WEAK_ORDERING); - skb = build_skb(page_address(page), PAGE_SIZE); + dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, BNXT_RX_PAGE_SIZE, + bp->rx_dir, DMA_ATTR_WEAK_ORDERING); + skb = build_skb(data_ptr - bp->rx_offset, BNXT_RX_PAGE_SIZE); if (!skb) { page_pool_recycle_direct(rxr->page_pool, page); return NULL; } skb_mark_for_recycle(skb); - skb_reserve(skb, bp->rx_dma_offset); + skb_reserve(skb, bp->rx_offset); __skb_put(skb, len);
return skb; @@ -1020,8 +1028,8 @@ static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp, return NULL; } dma_addr -= bp->rx_dma_offset; - dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, PAGE_SIZE, bp->rx_dir, - DMA_ATTR_WEAK_ORDERING); + dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, BNXT_RX_PAGE_SIZE, + bp->rx_dir, DMA_ATTR_WEAK_ORDERING);
if (unlikely(!payload)) payload = eth_get_headlen(bp->dev, data_ptr, len); @@ -1034,7 +1042,7 @@ static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp,
skb_mark_for_recycle(skb); off = (void *)data_ptr - page_address(page); - skb_add_rx_frag(skb, 0, page, off, len, PAGE_SIZE); + skb_add_rx_frag(skb, 0, page, off, len, BNXT_RX_PAGE_SIZE); memcpy(skb->data - NET_IP_ALIGN, data_ptr - NET_IP_ALIGN, payload + NET_IP_ALIGN);
@@ -1169,7 +1177,7 @@ static struct sk_buff *bnxt_rx_agg_pages_skb(struct bnxt *bp,
skb->data_len += total_frag_len; skb->len += total_frag_len; - skb->truesize += PAGE_SIZE * agg_bufs; + skb->truesize += BNXT_RX_PAGE_SIZE * agg_bufs; return skb; }
@@ -2972,8 +2980,8 @@ static void bnxt_free_one_rx_ring_skbs(struct bnxt *bp, int ring_nr) rx_buf->data = NULL; if (BNXT_RX_PAGE_MODE(bp)) { mapping -= bp->rx_dma_offset; - dma_unmap_page_attrs(&pdev->dev, mapping, PAGE_SIZE, - bp->rx_dir, + dma_unmap_page_attrs(&pdev->dev, mapping, + BNXT_RX_PAGE_SIZE, bp->rx_dir, DMA_ATTR_WEAK_ORDERING); page_pool_recycle_direct(rxr->page_pool, data); } else { @@ -3241,6 +3249,8 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp, pp.nid = dev_to_node(&bp->pdev->dev); pp.dev = &bp->pdev->dev; pp.dma_dir = DMA_BIDIRECTIONAL; + if (PAGE_SIZE > BNXT_RX_PAGE_SIZE) + pp.flags |= PP_FLAG_PAGE_FRAG;
rxr->page_pool = page_pool_create(&pp); if (IS_ERR(rxr->page_pool)) { diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c index 36d5202c0aeec..aa56db138d6b5 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c @@ -180,8 +180,8 @@ void bnxt_xdp_buff_init(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons, u8 *data_ptr, unsigned int len, struct xdp_buff *xdp) { + u32 buflen = BNXT_RX_PAGE_SIZE; struct bnxt_sw_rx_bd *rx_buf; - u32 buflen = PAGE_SIZE; struct pci_dev *pdev; dma_addr_t mapping; u32 offset; @@ -297,7 +297,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons, rx_buf = &rxr->rx_buf_ring[cons]; mapping = rx_buf->mapping - bp->rx_dma_offset; dma_unmap_page_attrs(&pdev->dev, mapping, - PAGE_SIZE, bp->rx_dir, + BNXT_RX_PAGE_SIZE, bp->rx_dir, DMA_ATTR_WEAK_ORDERING);
/* if we are unable to allocate a new buffer, abort and reuse */ @@ -478,7 +478,7 @@ bnxt_xdp_build_skb(struct bnxt *bp, struct sk_buff *skb, u8 num_frags, } xdp_update_skb_shared_info(skb, num_frags, sinfo->xdp_frags_size, - PAGE_SIZE * sinfo->nr_frags, + BNXT_RX_PAGE_SIZE * sinfo->nr_frags, xdp_buff_is_frag_pfmemalloc(xdp)); return skb; }
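
The bnxt commit message hinges on a 16-bit length field being unable to describe a 64K buffer. The sketch below works through that arithmetic in plain C; RX_CHUNK_MAX, rx_buf_size() and fits_u16() are hypothetical names chosen for this illustration and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* 32K, the chunk size the commit message says is used on 64K-page systems. */
#define RX_CHUNK_MAX (1u << 15)

/* Hypothetical helper: buffer size to post for a given native page size.
 * Pages up to 32K are used whole; larger pages are carved into 32K chunks. */
static uint32_t rx_buf_size(uint32_t page_size)
{
	return page_size > RX_CHUNK_MAX ? RX_CHUNK_MAX : page_size;
}

/* True if a length survives a round trip through a 16-bit field. */
static int fits_u16(uint32_t len)
{
	return (uint16_t)len == len;
}
```

A 64K length (65536) truncates to 0 in 16 bits, while the 32K chunk (32768) is still representable, which is why the smaller chunk size is needed.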
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit 08450ea98ae98d5a35145b675b76db616046ea11 ]
The existing code does not allow the MTU to be set to the maximum even after an XDP program supporting multiple buffers is attached. Fix it to set the netdev->max_mtu to the maximum value if the attached XDP program supports multiple buffers, regardless of the current MTU value.
Also use a local variable dev instead of repeatedly using bp->dev.
Fixes: 1dc4c557bfed ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff")
Reviewed-by: Somnath Kotur somnath.kotur@broadcom.com
Reviewed-by: Ajit Khaparde ajit.khaparde@broadcom.com
Reviewed-by: Andy Gospodarek andrew.gospodarek@broadcom.com
Signed-off-by: Michael Chan michael.chan@broadcom.com
Link: https://lore.kernel.org/r/20230731142043.58855-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 9bd18c2b10bc6..969db3c45d176 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4027,26 +4027,29 @@ void bnxt_set_ring_params(struct bnxt *bp)
  */
 int bnxt_set_rx_skb_mode(struct bnxt *bp, bool page_mode)
 {
+	struct net_device *dev = bp->dev;
+
 	if (page_mode) {
 		bp->flags &= ~BNXT_FLAG_AGG_RINGS;
 		bp->flags |= BNXT_FLAG_RX_PAGE_MODE;

-		if (bp->dev->mtu > BNXT_MAX_PAGE_MODE_MTU) {
+		if (bp->xdp_prog->aux->xdp_has_frags)
+			dev->max_mtu = min_t(u16, bp->max_mtu, BNXT_MAX_MTU);
+		else
+			dev->max_mtu =
+				min_t(u16, bp->max_mtu, BNXT_MAX_PAGE_MODE_MTU);
+		if (dev->mtu > BNXT_MAX_PAGE_MODE_MTU) {
 			bp->flags |= BNXT_FLAG_JUMBO;
 			bp->rx_skb_func = bnxt_rx_multi_page_skb;
-			bp->dev->max_mtu =
-				min_t(u16, bp->max_mtu, BNXT_MAX_MTU);
 		} else {
 			bp->flags |= BNXT_FLAG_NO_AGG_RINGS;
 			bp->rx_skb_func = bnxt_rx_page_skb;
-			bp->dev->max_mtu =
-				min_t(u16, bp->max_mtu, BNXT_MAX_PAGE_MODE_MTU);
 		}
 		bp->rx_dir = DMA_BIDIRECTIONAL;
 		/* Disable LRO or GRO_HW */
-		netdev_update_features(bp->dev);
+		netdev_update_features(dev);
 	} else {
-		bp->dev->max_mtu = bp->max_mtu;
+		dev->max_mtu = bp->max_mtu;
 		bp->flags &= ~BNXT_FLAG_RX_PAGE_MODE;
 		bp->rx_dir = DMA_FROM_DEVICE;
 		bp->rx_skb_func = bnxt_rx_skb;
From: Lin Ma linma@zju.edu.cn
[ Upstream commit 31d49ba033095f6e8158c60f69714a500922e0c3 ]
dcbnl_bcn_setcfg() uses the wrong policy to parse tb[DCB_ATTR_BCN], an error introduced in commit 859ee3c43812 ("DCB: Add support for DCB BCN"). Please see the comments in the code below:

static int dcbnl_bcn_setcfg(...)
{
  ...
  ret = nla_parse_nested_deprecated(..., dcbnl_pfc_up_nest, .. )
  // !!! dcbnl_pfc_up_nest for attributes
  //  DCB_PFC_UP_ATTR_0 to DCB_PFC_UP_ATTR_ALL in enum dcbnl_pfc_up_attrs
  ...
  for (i = DCB_BCN_ATTR_RP_0; i <= DCB_BCN_ATTR_RP_7; i++) {
  // !!! DCB_BCN_ATTR_RP_0 to DCB_BCN_ATTR_RP_7 in enum dcbnl_bcn_attrs
    ...
    value_byte = nla_get_u8(data[i]);
    ...
  }
  ...
  for (i = DCB_BCN_ATTR_BCNA_0; i <= DCB_BCN_ATTR_RI; i++) {
  // !!! DCB_BCN_ATTR_BCNA_0 to DCB_BCN_ATTR_RI in enum dcbnl_bcn_attrs
  ...
    value_int = nla_get_u32(data[i]);
  ...
  }
  ...
}

That is, nla_parse_nested_deprecated() uses the dcbnl_pfc_up_nest policy, which describes the attributes defined in dcbnl_pfc_up_attrs, but the access code that follows fetches each nlattr as a dcbnl_bcn_attrs attribute. Looking up the nla_policy associated with dcbnl_bcn_attrs shows that the beginning parts of the two policies are the "same".

static const struct nla_policy dcbnl_pfc_up_nest[...] = {
        [DCB_PFC_UP_ATTR_0]   = {.type = NLA_U8},
        [DCB_PFC_UP_ATTR_1]   = {.type = NLA_U8},
        [DCB_PFC_UP_ATTR_2]   = {.type = NLA_U8},
        [DCB_PFC_UP_ATTR_3]   = {.type = NLA_U8},
        [DCB_PFC_UP_ATTR_4]   = {.type = NLA_U8},
        [DCB_PFC_UP_ATTR_5]   = {.type = NLA_U8},
        [DCB_PFC_UP_ATTR_6]   = {.type = NLA_U8},
        [DCB_PFC_UP_ATTR_7]   = {.type = NLA_U8},
        [DCB_PFC_UP_ATTR_ALL] = {.type = NLA_FLAG},
};

static const struct nla_policy dcbnl_bcn_nest[...] = {
        [DCB_BCN_ATTR_RP_0]         = {.type = NLA_U8},
        [DCB_BCN_ATTR_RP_1]         = {.type = NLA_U8},
        [DCB_BCN_ATTR_RP_2]         = {.type = NLA_U8},
        [DCB_BCN_ATTR_RP_3]         = {.type = NLA_U8},
        [DCB_BCN_ATTR_RP_4]         = {.type = NLA_U8},
        [DCB_BCN_ATTR_RP_5]         = {.type = NLA_U8},
        [DCB_BCN_ATTR_RP_6]         = {.type = NLA_U8},
        [DCB_BCN_ATTR_RP_7]         = {.type = NLA_U8},
        [DCB_BCN_ATTR_RP_ALL]       = {.type = NLA_FLAG},
        // from here is somewhat different
        [DCB_BCN_ATTR_BCNA_0]       = {.type = NLA_U32},
        ...
        [DCB_BCN_ATTR_ALL]          = {.type = NLA_FLAG},
};

Therefore, the current code is buggy: nla_parse_nested_deprecated() could read past the end of dcbnl_pfc_up_nest and use the adjacent nla_policy to parse attributes from DCB_BCN_ATTR_BCNA_0 onwards.
Hence use the correct policy dcbnl_bcn_nest to parse the nested tb[DCB_ATTR_BCN] TLV.
Fixes: 859ee3c43812 ("DCB: Add support for DCB BCN")
Signed-off-by: Lin Ma linma@zju.edu.cn
Reviewed-by: Simon Horman horms@kernel.org
Link: https://lore.kernel.org/r/20230801013248.87240-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/dcb/dcbnl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index dc4fb699b56c3..d2981e89d3638 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -946,7 +946,7 @@ static int dcbnl_bcn_setcfg(struct net_device *netdev, struct nlmsghdr *nlh,
 		return -EOPNOTSUPP;

 	ret = nla_parse_nested_deprecated(data, DCB_BCN_ATTR_MAX,
-					  tb[DCB_ATTR_BCN], dcbnl_pfc_up_nest,
+					  tb[DCB_ATTR_BCN], dcbnl_bcn_nest,
 					  NULL);
 	if (ret)
 		return ret;
From: Alexandra Winter wintera@linux.ibm.com
[ Upstream commit 1cfef80d4c2b2c599189f36f36320b205d9447d9 ]
dev_close() and dev_open() are issued to change the interface state to DOWN or UP (dev->flags IFF_UP). When the netdev is set DOWN it loses e.g. its IPv6 addresses and routes. We don't want this in cases of device recovery (triggered by hardware or software) or when the qeth device is set offline.

Setting a qeth device offline or online and device recovery actions call netif_device_detach() and/or netif_device_attach(). That will reset or set the LOWER_UP indication, i.e. change the dev->state bit __LINK_STATE_PRESENT. That is enough to e.g. cause bond failovers, and still preserves the interface settings that are handled by the network stack.
Don't call dev_open() nor dev_close() from the qeth device driver. Let the network stack handle this.
Fixes: d4560150cb47 ("s390/qeth: call dev_close() during recovery")
Signed-off-by: Alexandra Winter wintera@linux.ibm.com
Reviewed-by: Wenjia Zhang wenjia@linux.ibm.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/s390/net/qeth_core.h      | 1 -
 drivers/s390/net/qeth_core_main.c | 2 --
 drivers/s390/net/qeth_l2_main.c   | 9 ++++++---
 drivers/s390/net/qeth_l3_main.c   | 8 +++++---
 4 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h index 1d195429753dd..613eab7297046 100644 --- a/drivers/s390/net/qeth_core.h +++ b/drivers/s390/net/qeth_core.h @@ -716,7 +716,6 @@ struct qeth_card_info { u16 chid; u8 ids_valid:1; /* cssid,iid,chid */ u8 dev_addr_is_registered:1; - u8 open_when_online:1; u8 promisc_mode:1; u8 use_v1_blkt:1; u8 is_vm_nic:1; diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index 8bd9fd51208c9..ae4b6d24bc902 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -5371,8 +5371,6 @@ int qeth_set_offline(struct qeth_card *card, const struct qeth_discipline *disc, qeth_clear_ipacmd_list(card);
rtnl_lock(); - card->info.open_when_online = card->dev->flags & IFF_UP; - dev_close(card->dev); netif_device_detach(card->dev); netif_carrier_off(card->dev); rtnl_unlock(); diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c index c6ded3fdd715c..9ef2118fc7a2a 100644 --- a/drivers/s390/net/qeth_l2_main.c +++ b/drivers/s390/net/qeth_l2_main.c @@ -2387,9 +2387,12 @@ static int qeth_l2_set_online(struct qeth_card *card, bool carrier_ok) qeth_enable_hw_features(dev); qeth_l2_enable_brport_features(card);
- if (card->info.open_when_online) { - card->info.open_when_online = 0; - dev_open(dev, NULL); + if (netif_running(dev)) { + local_bh_disable(); + napi_schedule(&card->napi); + /* kick-start the NAPI softirq: */ + local_bh_enable(); + qeth_l2_set_rx_mode(dev); } rtnl_unlock(); } diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c index d8487a10cd555..c0f30cefec102 100644 --- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -2017,9 +2017,11 @@ static int qeth_l3_set_online(struct qeth_card *card, bool carrier_ok) netif_device_attach(dev); qeth_enable_hw_features(dev);
- if (card->info.open_when_online) { - card->info.open_when_online = 0; - dev_open(dev, NULL); + if (netif_running(dev)) { + local_bh_disable(); + napi_schedule(&card->napi); + /* kick-start the NAPI softirq: */ + local_bh_enable(); } rtnl_unlock(); }
From: Yue Haibing yuehaibing@huawei.com
[ Upstream commit 30e0191b16e8a58e4620fa3e2839ddc7b9d4281c ]
skbuff: skb_under_panic: text:ffffffff88771f69 len:56 put:-4 head:ffff88805f86a800 data:ffff887f5f86a850 tail:0x88 end:0x2c0 dev:pim6reg
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:192!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 2 PID: 22968 Comm: kworker/2:11 Not tainted 6.5.0-rc3-00044-g0a8db05b571a #236
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:skb_panic+0x152/0x1d0
Call Trace:
 <TASK>
 skb_push+0xc4/0xe0
 ip6mr_cache_report+0xd69/0x19b0
 reg_vif_xmit+0x406/0x690
 dev_hard_start_xmit+0x17e/0x6e0
 __dev_queue_xmit+0x2d6a/0x3d20
 vlan_dev_hard_start_xmit+0x3ab/0x5c0
 dev_hard_start_xmit+0x17e/0x6e0
 __dev_queue_xmit+0x2d6a/0x3d20
 neigh_connected_output+0x3ed/0x570
 ip6_finish_output2+0x5b5/0x1950
 ip6_finish_output+0x693/0x11c0
 ip6_output+0x24b/0x880
 NF_HOOK.constprop.0+0xfd/0x530
 ndisc_send_skb+0x9db/0x1400
 ndisc_send_rs+0x12a/0x6c0
 addrconf_dad_completed+0x3c9/0xea0
 addrconf_dad_work+0x849/0x1420
 process_one_work+0xa22/0x16e0
 worker_thread+0x679/0x10c0
 ret_from_fork+0x28/0x60
 ret_from_fork_asm+0x11/0x20

When a vlan device is set up on dev pim6reg, a DAD NS packet may be sent via reg_vif_xmit():

reg_vif_xmit()
    ip6mr_cache_report()
        skb_push(skb, -skb_network_offset(pkt)); //skb_network_offset(pkt) is 4

And skb_push() is declared as:

	void *skb_push(struct sk_buff *skb, unsigned int len);
		skb->data -= len;
		//0xffff88805f86a84c - 0xfffffffc = 0xffff887f5f86a850

so skb->data is set to 0xffff887f5f86a850, which is an invalid memory address, and skb_push() fails.
Fixes: 14fb64e1f449 ("[IPV6] MROUTE: Support PIM-SM (SSM).")
Signed-off-by: Yue Haibing yuehaibing@huawei.com
Reviewed-by: Eric Dumazet edumazet@google.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/ipv6/ip6mr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index facdc78a43e5c..27fb5479988af 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1073,7 +1073,7 @@ static int ip6mr_cache_report(const struct mr_table *mrt, struct sk_buff *pkt,
 		   And all this only to mangle msg->im6_msgtype and
 		   to set msg->im6_mbz to "mbz" :-)
 		 */
-		skb_push(skb, -skb_network_offset(pkt));
+		__skb_pull(skb, skb_network_offset(pkt));

 		skb_push(skb, sizeof(*msg));
 		skb_reset_transport_header(skb);
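
The address arithmetic quoted in the ip6mr commit message can be checked with a small standalone model. The sketch below assumes a 32-bit unsigned int and 64-bit addresses; push_data() and pull_data() are hypothetical stand-ins that model only the pointer adjustment, not the real skb_push()/__skb_pull() implementations.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `skb->data -= len` where len has type unsigned int.
 * A negative int argument is converted to a large unsigned value first. */
static uint64_t push_data(uint64_t data, unsigned int len)
{
	return data - len;
}

/* Model of the fixed call: move data forward by the positive offset. */
static uint64_t pull_data(uint64_t data, unsigned int len)
{
	return data + len;
}
```

Passing -4 as the length yields 0xfffffffc after conversion, so the data address moves back by almost 4 GiB instead of forward by 4 bytes, which matches the bogus data value in the skb_under_panic line.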
From: Benjamin Poirier bpoirier@nvidia.com
[ Upstream commit 0756384fb1bd38adb2ebcfd1307422f433a1d772 ]
The nexthop code expects a 31 bit hash, such as what is returned by fib_multipath_hash() and rt6_multipath_hash(). Passing the 32 bit hash returned by skb_get_hash() can lead to problems related to the fact that 'int hash' is a negative number when the MSB is set.
In the case of hash threshold nexthop groups, nexthop_select_path_hthr() will disproportionately select the first nexthop group entry. In the case of resilient nexthop groups, nexthop_select_path_res() may do an out of bounds access in nh_buckets[], for example:
    hash = -912054133
    num_nh_buckets = 2
    bucket_index = 65535
which leads to the following panic:
BUG: unable to handle page fault for address: ffffc900025910c8
PGD 100000067 P4D 100000067 PUD 10026b067 PMD 0
Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI
CPU: 4 PID: 856 Comm: kworker/4:3 Not tainted 6.5.0-rc2+ #34
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:nexthop_select_path+0x197/0xbf0
Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85
RSP: 0018:ffff88810c36f260 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8
RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219
R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0
R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900
FS: 0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc900025910c8 CR3: 0000000129d00000 CR4: 0000000000750ee0
PKRU: 55555554
Call Trace:
<TASK>
? __die+0x23/0x70
? page_fault_oops+0x1ee/0x5c0
? __pfx_is_prefetch.constprop.0+0x10/0x10
? __pfx_page_fault_oops+0x10/0x10
? search_bpf_extables+0xfe/0x1c0
? fixup_exception+0x3b/0x470
? exc_page_fault+0xf6/0x110
? asm_exc_page_fault+0x26/0x30
? nexthop_select_path+0x197/0xbf0
? nexthop_select_path+0x197/0xbf0
? lock_is_held_type+0xe7/0x140
vxlan_xmit+0x5b2/0x2340
? __lock_acquire+0x92b/0x3370
? __pfx_vxlan_xmit+0x10/0x10
? __pfx___lock_acquire+0x10/0x10
? __pfx_register_lock_class+0x10/0x10
? skb_network_protocol+0xce/0x2d0
? dev_hard_start_xmit+0xca/0x350
? __pfx_vxlan_xmit+0x10/0x10
dev_hard_start_xmit+0xca/0x350
__dev_queue_xmit+0x513/0x1e20
? __pfx___dev_queue_xmit+0x10/0x10
? __pfx_lock_release+0x10/0x10
? mark_held_locks+0x44/0x90
? skb_push+0x4c/0x80
? eth_header+0x81/0xe0
? __pfx_eth_header+0x10/0x10
? neigh_resolve_output+0x215/0x310
? ip6_finish_output2+0x2ba/0xc90
ip6_finish_output2+0x2ba/0xc90
? lock_release+0x236/0x3e0
? ip6_mtu+0xbb/0x240
? __pfx_ip6_finish_output2+0x10/0x10
? find_held_lock+0x83/0xa0
? lock_is_held_type+0xe7/0x140
ip6_finish_output+0x1ee/0x780
ip6_output+0x138/0x460
? __pfx_ip6_output+0x10/0x10
? __pfx___lock_acquire+0x10/0x10
? __pfx_ip6_finish_output+0x10/0x10
NF_HOOK.constprop.0+0xc0/0x420
? __pfx_NF_HOOK.constprop.0+0x10/0x10
? ndisc_send_skb+0x2c0/0x960
? __pfx_lock_release+0x10/0x10
? __local_bh_enable_ip+0x93/0x110
? lock_is_held_type+0xe7/0x140
ndisc_send_skb+0x4be/0x960
? __pfx_ndisc_send_skb+0x10/0x10
? mark_held_locks+0x65/0x90
? find_held_lock+0x83/0xa0
ndisc_send_ns+0xb0/0x110
? __pfx_ndisc_send_ns+0x10/0x10
addrconf_dad_work+0x631/0x8e0
? lock_acquire+0x180/0x3f0
? __pfx_addrconf_dad_work+0x10/0x10
? mark_held_locks+0x24/0x90
process_one_work+0x582/0x9c0
? __pfx_process_one_work+0x10/0x10
? __pfx_do_raw_spin_lock+0x10/0x10
? mark_held_locks+0x24/0x90
worker_thread+0x93/0x630
? __kthread_parkme+0xdc/0x100
? __pfx_worker_thread+0x10/0x10
kthread+0x1a5/0x1e0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x34/0x60
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
RIP: 0000:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Modules linked in:
CR2: ffffc900025910c8
---[ end trace 0000000000000000 ]---
RIP: 0010:nexthop_select_path+0x197/0xbf0
Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85
RSP: 0018:ffff88810c36f260 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8
RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219
R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0
R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900
FS: 0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000129d00000 CR4: 0000000000750ee0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x2ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Fix this problem by ensuring the MSB of hash is 0 using a right shift - the same approach used in fib_multipath_hash() and rt6_multipath_hash().
Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries")
Signed-off-by: Benjamin Poirier bpoirier@nvidia.com
Reviewed-by: Ido Schimmel idosch@nvidia.com
Reviewed-by: Simon Horman horms@kernel.org
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/net/vxlan.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 03bcc1ef0d61e..a46ec889acb73 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -548,12 +548,12 @@ static inline void vxlan_flag_attr_error(int attrtype,
 }

 static inline bool vxlan_fdb_nh_path_select(struct nexthop *nh,
-					    int hash,
+					    u32 hash,
 					    struct vxlan_rdst *rdst)
 {
 	struct fib_nh_common *nhc;

-	nhc = nexthop_path_fdb_result(nh, hash);
+	nhc = nexthop_path_fdb_result(nh, hash >> 1);
 	if (unlikely(!nhc))
 		return false;
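Illustration only, not part of the patch: a minimal userspace sketch of why clearing the MSB matters once the hash is handed to code that treats it as a signed int. The helper names and the modulo-based bucket selection are assumptions made for this example; they are not the kernel's nexthop code, and 0xfffef608 is used purely as a sample value with the top bit set.

```c
#include <stdint.h>

/*
 * Hypothetical consumer that receives the hash as a signed int and
 * derives a bucket index from it. With a negative hash the C remainder
 * is negative too, so the index would fall outside the table.
 */
static int bucket_from_signed_hash(int hash, int num_buckets)
{
	return hash % num_buckets;
}

/*
 * Mirrors the idea of the fix: keep the hash unsigned and shift it
 * right by one so the MSB is 0 before it is ever seen as an int.
 */
static int bucket_from_u32_hash(uint32_t hash, int num_buckets)
{
	return bucket_from_signed_hash((int)(hash >> 1), num_buckets);
}
```

With the shift in place, any 32-bit input maps to an index in the range 0..num_buckets-1.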
From: Jianbo Liu jianbol@nvidia.com
[ Upstream commit 618d28a535a0582617465d14e05f3881736a2962 ]
Since find_closest_ft_recursive() does the actual search for the closest FT, the first parameter of find_closest_ft() can be changed from fs_prio to fs_node. This extends the function to find the closest FT for nodes of any type: not only prios, but also sub namespaces.
Signed-off-by: Jianbo Liu jianbol@nvidia.com
Signed-off-by: Leon Romanovsky leonro@nvidia.com
Link: https://lore.kernel.org/r/d3962c2b443ec8dde7a740dc742a1f052d5e256c.169080394...
Signed-off-by: Jakub Kicinski kuba@kernel.org
Stable-dep-of: c635ca45a7a2 ("net/mlx5: fs_core: Skip the FTs in the same FS_TYPE_PRIO_CHAINS fs_prio")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 .../net/ethernet/mellanox/mlx5/core/fs_core.c | 29 +++++++++----------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index d53749248fa09..73ef771d6a4a4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -876,18 +876,17 @@ static struct mlx5_flow_table *find_closest_ft_recursive(struct fs_node *root,
 	return ft;
 }

-/* If reverse is false then return the first flow table in next priority of
- * prio in the tree, else return the last flow table in the previous priority
- * of prio in the tree.
+/* If reverse is false then return the first flow table next to the passed node
+ * in the tree, else return the last flow table before the node in the tree.
  */
-static struct mlx5_flow_table *find_closest_ft(struct fs_prio *prio, bool reverse)
+static struct mlx5_flow_table *find_closest_ft(struct fs_node *node, bool reverse)
 {
 	struct mlx5_flow_table *ft = NULL;
 	struct fs_node *curr_node;
 	struct fs_node *parent;

-	parent = prio->node.parent;
-	curr_node = &prio->node;
+	parent = node->parent;
+	curr_node = node;
 	while (!ft && parent) {
 		ft = find_closest_ft_recursive(parent, &curr_node->list, reverse);
 		curr_node = parent;
@@ -897,15 +896,15 @@ static struct mlx5_flow_table *find_closest_ft(struct fs_prio *prio, bool revers
 }

 /* Assuming all the tree is locked by mutex chain lock */
-static struct mlx5_flow_table *find_next_chained_ft(struct fs_prio *prio)
+static struct mlx5_flow_table *find_next_chained_ft(struct fs_node *node)
 {
-	return find_closest_ft(prio, false);
+	return find_closest_ft(node, false);
 }

 /* Assuming all the tree is locked by mutex chain lock */
-static struct mlx5_flow_table *find_prev_chained_ft(struct fs_prio *prio)
+static struct mlx5_flow_table *find_prev_chained_ft(struct fs_node *node)
 {
-	return find_closest_ft(prio, true);
+	return find_closest_ft(node, true);
 }

 static struct mlx5_flow_table *find_next_fwd_ft(struct mlx5_flow_table *ft,
@@ -917,7 +916,7 @@ static struct mlx5_flow_table *find_next_fwd_ft(struct mlx5_flow_table *ft,
 	next_ns = flow_act->action & MLX5_FLOW_CONTEXT_ACTION_FWD_NEXT_NS;
 	fs_get_obj(prio, next_ns ? ft->ns->node.parent : ft->node.parent);

-	return find_next_chained_ft(prio);
+	return find_next_chained_ft(&prio->node);
 }

 static int connect_fts_in_prio(struct mlx5_core_dev *dev,
@@ -948,7 +947,7 @@ static int connect_prev_fts(struct mlx5_core_dev *dev,
 {
 	struct mlx5_flow_table *prev_ft;

-	prev_ft = find_prev_chained_ft(prio);
+	prev_ft = find_prev_chained_ft(&prio->node);
 	if (prev_ft) {
 		struct fs_prio *prev_prio;

@@ -1094,7 +1093,7 @@ static int connect_flow_table(struct mlx5_core_dev *dev, struct mlx5_flow_table
 	if (err)
 		return err;

-	next_ft = first_ft ? first_ft : find_next_chained_ft(prio);
+	next_ft = first_ft ? first_ft : find_next_chained_ft(&prio->node);
 	err = connect_fwd_rules(dev, ft, next_ft);
 	if (err)
 		return err;
@@ -1169,7 +1168,7 @@ static struct mlx5_flow_table *__mlx5_create_flow_table(struct mlx5_flow_namespa

 	tree_init_node(&ft->node, del_hw_flow_table, del_sw_flow_table);
 	next_ft = unmanaged ? ft_attr->next_ft :
-			      find_next_chained_ft(fs_prio);
+			      find_next_chained_ft(&fs_prio->node);
 	ft->def_miss_action = ns->def_miss_action;
 	ft->ns = ns;
 	err = root->cmds->create_flow_table(root, ft, ft_attr, next_ft);
@@ -2163,7 +2162,7 @@ static struct mlx5_flow_table *find_next_ft(struct mlx5_flow_table *ft)

 	if (!list_is_last(&ft->node.list, &prio->node.children))
 		return list_next_entry(ft, node.list);
-	return find_next_chained_ft(prio);
+	return find_next_chained_ft(&prio->node);
 }

 static int update_root_ft_destroy(struct mlx5_flow_table *ft)
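Illustration only, not part of the patch: the refactoring above relies on every tree object embedding a common node, so a helper that takes the generic node works for prios and namespaces alike. The sketch below shows that pattern in isolation; all type and function names are hypothetical and much simpler than the mlx5 structures.

```c
#include <stddef.h>

enum node_type_sketch { NODE_PRIO, NODE_NAMESPACE };

/* Generic tree node embedded in every specialised object. */
struct node_sketch {
	enum node_type_sketch type;
	struct node_sketch *parent;
};

struct prio_sketch {
	struct node_sketch node;	/* embedded generic node */
	unsigned int num_levels;
};

struct namespace_sketch {
	struct node_sketch node;	/* embedded generic node */
};

/*
 * Because the helper only needs the generic node, callers pass
 * &prio->node or &ns->node and the same walk serves both types.
 */
static int node_depth(const struct node_sketch *n)
{
	int depth = 0;

	for (; n->parent; n = n->parent)
		depth++;
	return depth;
}
```

This is the same shape as the call-site changes in the diff, where callers switch from passing prio to passing &prio->node.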
From: Jianbo Liu jianbol@nvidia.com
[ Upstream commit c635ca45a7a2023904a1f851e99319af7b87017d ]
In the cited commit, a new type of fs_prio, FS_TYPE_PRIO_CHAINS, was added to support multiple parallel namespaces for multi-chains. When searching for the next or previous flow table to connect for a new table, all the flow tables under an fs_node of this type are skipped unconditionally.
As this search function is also used to find a new root table when the old one is being deleted, it skips the entire FS_TYPE_PRIO_CHAINS fs_node next to the old root. However, the new root table should be chosen from that fs_node if it contains any table. Fix this by skipping only the flow tables in the same FS_TYPE_PRIO_CHAINS fs_node when finding the closest FT for an fs_node.
In addition, connect all the FTs of the previous priority of prio, because there may be multiple prevs now that this fs_prio type exists. The next FT should also be chosen from the first flow table next to the prio in the same FS_TYPE_PRIO_CHAINS fs_prio, if this prio is the first child.
Fixes: 328edb499f99 ("net/mlx5: Split FDB fast path prio to multiple namespaces")
Signed-off-by: Jianbo Liu jianbol@nvidia.com
Reviewed-by: Paul Blakey paulb@nvidia.com
Signed-off-by: Leon Romanovsky leonro@nvidia.com
Link: https://lore.kernel.org/r/7a95754df479e722038996c97c97b062b372591f.169080394...
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 .../net/ethernet/mellanox/mlx5/core/fs_core.c | 80 +++++++++++++++++--
 1 file changed, 72 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 73ef771d6a4a4..e6674118bc428 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -860,7 +860,7 @@ static struct mlx5_flow_table *find_closest_ft_recursive(struct fs_node *root,
 	struct fs_node *iter = list_entry(start, struct fs_node, list);
 	struct mlx5_flow_table *ft = NULL;

-	if (!root || root->type == FS_TYPE_PRIO_CHAINS)
+	if (!root)
 		return NULL;

 	list_for_each_advance_continue(iter, &root->children, reverse) {
@@ -876,19 +876,42 @@ static struct mlx5_flow_table *find_closest_ft_recursive(struct fs_node *root,
 	return ft;
 }

+static struct fs_node *find_prio_chains_parent(struct fs_node *parent,
+					       struct fs_node **child)
+{
+	struct fs_node *node = NULL;
+
+	while (parent && parent->type != FS_TYPE_PRIO_CHAINS) {
+		node = parent;
+		parent = parent->parent;
+	}
+
+	if (child)
+		*child = node;
+
+	return parent;
+}
+
 /* If reverse is false then return the first flow table next to the passed node
  * in the tree, else return the last flow table before the node in the tree.
+ * If skip is true, skip the flow tables in the same prio_chains prio.
  */
-static struct mlx5_flow_table *find_closest_ft(struct fs_node *node, bool reverse)
+static struct mlx5_flow_table *find_closest_ft(struct fs_node *node, bool reverse,
+					       bool skip)
 {
+	struct fs_node *prio_chains_parent = NULL;
 	struct mlx5_flow_table *ft = NULL;
 	struct fs_node *curr_node;
 	struct fs_node *parent;

+	if (skip)
+		prio_chains_parent = find_prio_chains_parent(node, NULL);
 	parent = node->parent;
 	curr_node = node;
 	while (!ft && parent) {
-		ft = find_closest_ft_recursive(parent, &curr_node->list, reverse);
+		if (parent != prio_chains_parent)
+			ft = find_closest_ft_recursive(parent, &curr_node->list,
+						       reverse);
 		curr_node = parent;
 		parent = curr_node->parent;
 	}
@@ -898,13 +921,13 @@ static struct mlx5_flow_table *find_closest_ft(struct fs_node *node, bool revers
 /* Assuming all the tree is locked by mutex chain lock */
 static struct mlx5_flow_table *find_next_chained_ft(struct fs_node *node)
 {
-	return find_closest_ft(node, false);
+	return find_closest_ft(node, false, true);
 }

 /* Assuming all the tree is locked by mutex chain lock */
 static struct mlx5_flow_table *find_prev_chained_ft(struct fs_node *node)
 {
-	return find_closest_ft(node, true);
+	return find_closest_ft(node, true, true);
 }

 static struct mlx5_flow_table *find_next_fwd_ft(struct mlx5_flow_table *ft,
@@ -940,21 +963,55 @@ static int connect_fts_in_prio(struct mlx5_core_dev *dev,
 	return 0;
 }

+static struct mlx5_flow_table *find_closet_ft_prio_chains(struct fs_node *node,
+							   struct fs_node *parent,
+							   struct fs_node **child,
+							   bool reverse)
+{
+	struct mlx5_flow_table *ft;
+
+	ft = find_closest_ft(node, reverse, false);
+
+	if (ft && parent == find_prio_chains_parent(&ft->node, child))
+		return ft;
+
+	return NULL;
+}
+
 /* Connect flow tables from previous priority of prio to ft */
 static int connect_prev_fts(struct mlx5_core_dev *dev,
 			    struct mlx5_flow_table *ft,
 			    struct fs_prio *prio)
 {
+	struct fs_node *prio_parent, *parent = NULL, *child, *node;
 	struct mlx5_flow_table *prev_ft;
+	int err = 0;
+
+	prio_parent = find_prio_chains_parent(&prio->node, &child);
+
+	/* return directly if not under the first sub ns of prio_chains prio */
+	if (prio_parent && !list_is_first(&child->list, &prio_parent->children))
+		return 0;

 	prev_ft = find_prev_chained_ft(&prio->node);
-	if (prev_ft) {
+	while (prev_ft) {
 		struct fs_prio *prev_prio;

 		fs_get_obj(prev_prio, prev_ft->node.parent);
-		return connect_fts_in_prio(dev, prev_prio, ft);
+		err = connect_fts_in_prio(dev, prev_prio, ft);
+		if (err)
+			break;
+
+		if (!parent) {
+			parent = find_prio_chains_parent(&prev_prio->node, &child);
+			if (!parent)
+				break;
+		}
+
+		node = child;
+		prev_ft = find_closet_ft_prio_chains(node, parent, &child, true);
 	}
-	return 0;
+	return err;
 }

 static int update_root_ft_create(struct mlx5_flow_table *ft, struct fs_prio
@@ -2156,12 +2213,19 @@ EXPORT_SYMBOL(mlx5_del_flow_rules);
 /* Assuming prio->node.children(flow tables) is sorted by level */
 static struct mlx5_flow_table *find_next_ft(struct mlx5_flow_table *ft)
 {
+	struct fs_node *prio_parent, *child;
 	struct fs_prio *prio;

 	fs_get_obj(prio, ft->node.parent);

 	if (!list_is_last(&ft->node.list, &prio->node.children))
 		return list_next_entry(ft, node.list);
+
+	prio_parent = find_prio_chains_parent(&prio->node, &child);
+
+	if (prio_parent && list_is_first(&child->list, &prio_parent->children))
+		return find_closest_ft(&prio->node, false, false);
+
 	return find_next_chained_ft(&prio->node);
 }
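Illustration only, not part of the patch: the ancestor walk done by find_prio_chains_parent() can be exercised outside the driver. The sketch below copies the loop from the diff onto a stripped-down node type; the _sketch names and the three-value enum are assumptions for this example.

```c
#include <stddef.h>

enum fs_type_sketch { TYPE_NAMESPACE, TYPE_PRIO, TYPE_PRIO_CHAINS };

struct fs_node_sketch {
	enum fs_type_sketch type;
	struct fs_node_sketch *parent;
};

/*
 * Walk up from 'parent' until a PRIO_CHAINS node is found.
 * Returns that node, or NULL if there is none on the path.
 * If 'child' is non-NULL it receives the last node visited before
 * the walk stopped, i.e. the direct child of the returned node.
 */
static struct fs_node_sketch *
find_prio_chains_parent_sketch(struct fs_node_sketch *parent,
			       struct fs_node_sketch **child)
{
	struct fs_node_sketch *node = NULL;

	while (parent && parent->type != TYPE_PRIO_CHAINS) {
		node = parent;
		parent = parent->parent;
	}

	if (child)
		*child = node;

	return parent;
}
```

The child out-parameter is what lets connect_prev_fts() and find_next_ft() in the diff ask whether the prio sits under the first sub namespace of the chains prio.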
From: Jonas Gorski jonas.gorski@bisdn.de
[ Upstream commit b755c25fbcd568821a3bb0e0d5c2daa5fcb00bba ]
When both supported and previous version have the same major version, and the firmwares are missing, the driver ends in a loop requesting the same (previous) version over and over again:
[ 76.327413] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.1.img firmware, fall-back to previous 4.0 version
[ 76.339802] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.352162] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.364502] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.376848] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.389183] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.401522] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.413860] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.426199] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
...
Fix this by inverting the check so that it tests that we aren't yet at the previous version, and also check the minor version.
This also catches the case where both versions are the same, as it was after commit bb5dbf2cc64d ("net: marvell: prestera: add firmware v4.0 support").
With this fix applied:
[ 88.499622] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.1.img firmware, fall-back to previous 4.0 version
[ 88.511995] Prestera DX 0000:01:00.0: failed to request previous firmware: mrvl/prestera/mvsw_prestera_fw-v4.0.img
[ 88.522403] Prestera DX: probe of 0000:01:00.0 failed with error -2
Fixes: 47f26018a414 ("net: marvell: prestera: try to load previous fw version")
Signed-off-by: Jonas Gorski jonas.gorski@bisdn.de
Acked-by: Elad Nachman enachman@marvell.com
Reviewed-by: Jesse Brandeburg jesse.brandeburg@intel.com
Acked-by: Taras Chornyi taras.chornyi@plvision.eu
Link: https://lore.kernel.org/r/20230802092357.163944-1-jonas.gorski@bisdn.de
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/marvell/prestera/prestera_pci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/prestera/prestera_pci.c b/drivers/net/ethernet/marvell/prestera/prestera_pci.c
index 59470d99f5228..a37dbbda8de39 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_pci.c
+++ b/drivers/net/ethernet/marvell/prestera/prestera_pci.c
@@ -702,7 +702,8 @@ static int prestera_fw_get(struct prestera_fw *fw)

 	err = request_firmware_direct(&fw->bin, fw_path, fw->dev.dev);
 	if (err) {
-		if (ver_maj == PRESTERA_SUPP_FW_MAJ_VER) {
+		if (ver_maj != PRESTERA_PREV_FW_MAJ_VER ||
+		    ver_min != PRESTERA_PREV_FW_MIN_VER) {
 			ver_maj = PRESTERA_PREV_FW_MAJ_VER;
 			ver_min = PRESTERA_PREV_FW_MIN_VER;
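Illustration only, not part of the patch: a small userspace model of the retry logic, assuming (as the log above suggests) that the supported version is 4.1, the previous version is 4.0, and no firmware image is present. The macro values, function names, and the max_tries cap are all hypothetical; the cap exists only so the old predicate can be observed without spinning forever.

```c
#include <stdbool.h>

/* Assumed versions, taken from the log lines quoted in the commit. */
#define SUPP_MAJ_SKETCH 4
#define SUPP_MIN_SKETCH 1
#define PREV_MAJ_SKETCH 4
#define PREV_MIN_SKETCH 0

/* Old check: true whenever the major version equals the supported one. */
static bool should_fall_back_old(int maj, int min)
{
	(void)min;
	return maj == SUPP_MAJ_SKETCH;
}

/* New check: true only while we are not yet at the previous version. */
static bool should_fall_back_new(int maj, int min)
{
	return maj != PREV_MAJ_SKETCH || min != PREV_MIN_SKETCH;
}

/* Count firmware requests when every request fails. */
static int count_requests(bool (*should_fall_back)(int, int), int max_tries)
{
	int maj = SUPP_MAJ_SKETCH, min = SUPP_MIN_SKETCH;
	int tries = 0;

	while (tries < max_tries) {
		tries++;			/* request fails: image missing */
		if (!should_fall_back(maj, min))
			break;			/* give up, report the error */
		maj = PREV_MAJ_SKETCH;
		min = PREV_MIN_SKETCH;
	}
	return tries;
}
```

Under these assumptions the old predicate keeps retrying until the cap is hit, while the new one stops after two requests (4.1, then 4.0), matching the two log lines shown for the fixed kernel.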
From: Eric Dumazet edumazet@google.com
[ Upstream commit e6638094d7af6c7b9dcca05ad009e79e31b4f670 ]
Because v4 and v6 families use separate inetpeer trees (respectively net->ipv4.peers and net->ipv6.peers), inetpeer_addr_cmp(a, b) assumes a & b share the same family.
tcp_metrics use a common hash table, where entries can have different families.
We must therefore make sure to not call inetpeer_addr_cmp() if the families do not match.
Fixes: d39d14ffa24c ("net: Add helper function to compare inetpeer addresses")
Signed-off-by: Eric Dumazet edumazet@google.com
Reviewed-by: David Ahern dsahern@kernel.org
Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com
Link: https://lore.kernel.org/r/20230802131500.1478140-2-edumazet@google.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/ipv4/tcp_metrics.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index 82f4575f9cd90..c4daf0aa2d4d9 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -78,7 +78,7 @@ static void tcp_metric_set(struct tcp_metrics_block *tm,
 static bool addr_same(const struct inetpeer_addr *a,
 		      const struct inetpeer_addr *b)
 {
-	return inetpeer_addr_cmp(a, b) == 0;
+	return (a->family == b->family) && !inetpeer_addr_cmp(a, b);
 }

 struct tcpm_hash_bucket {
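Illustration only, not part of the patch: a simplified model of why a comparison that assumes matching families can report a false match in a mixed table. The struct layout, the family constants, and the word counts are assumptions for this sketch and do not mirror struct inetpeer_addr exactly.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { FAM_V4_SKETCH = 2, FAM_V6_SKETCH = 10 };

struct addr_sketch {
	uint32_t words[4];	/* a v4 address uses words[0] only */
	uint16_t family;
};

/*
 * Compares only as many words as a's family needs, i.e. it silently
 * assumes that b has the same family as a.
 */
static int addr_cmp_same_family(const struct addr_sketch *a,
				const struct addr_sketch *b)
{
	size_t n = (a->family == FAM_V4_SKETCH) ? 1 : 4;

	return memcmp(a->words, b->words, n * sizeof(uint32_t));
}

/* Shape of the fix: check the family before delegating. */
static bool addr_same_sketch(const struct addr_sketch *a,
			     const struct addr_sketch *b)
{
	return a->family == b->family && !addr_cmp_same_family(a, b);
}
```

In this model a v4 entry and a v6 entry whose first word happens to coincide compare equal under the family-blind helper, but not under the guarded one.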
From: Eric Dumazet edumazet@google.com
[ Upstream commit 949ad62a5d5311d36fce2e14fe5fed3f936da51c ]
tm->tcpm_stamp can be read or written locklessly.
Add needed READ_ONCE()/WRITE_ONCE() to document this.
Also constify tcpm_check_stamp() dst argument.
Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Eric Dumazet edumazet@google.com
Reviewed-by: David Ahern dsahern@kernel.org
Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com
Link: https://lore.kernel.org/r/20230802131500.1478140-3-edumazet@google.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/ipv4/tcp_metrics.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index c4daf0aa2d4d9..8386165887963 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -97,7 +97,7 @@ static void tcpm_suck_dst(struct tcp_metrics_block *tm,
 	u32 msval;
 	u32 val;

-	tm->tcpm_stamp = jiffies;
+	WRITE_ONCE(tm->tcpm_stamp, jiffies);

 	val = 0;
 	if (dst_metric_locked(dst, RTAX_RTT))
@@ -131,9 +131,15 @@ static void tcpm_suck_dst(struct tcp_metrics_block *tm,

 #define TCP_METRICS_TIMEOUT (60 * 60 * HZ)

-static void tcpm_check_stamp(struct tcp_metrics_block *tm, struct dst_entry *dst)
+static void tcpm_check_stamp(struct tcp_metrics_block *tm,
+			     const struct dst_entry *dst)
 {
-	if (tm && unlikely(time_after(jiffies, tm->tcpm_stamp + TCP_METRICS_TIMEOUT)))
+	unsigned long limit;
+
+	if (!tm)
+		return;
+	limit = READ_ONCE(tm->tcpm_stamp) + TCP_METRICS_TIMEOUT;
+	if (unlikely(time_after(jiffies, limit)))
 		tcpm_suck_dst(tm, dst, false);
 }

@@ -174,7 +180,8 @@ static struct tcp_metrics_block *tcpm_new(struct dst_entry *dst,
 		oldest = deref_locked(tcp_metrics_hash[hash].chain);
 		for (tm = deref_locked(oldest->tcpm_next); tm;
 		     tm = deref_locked(tm->tcpm_next)) {
-			if (time_before(tm->tcpm_stamp, oldest->tcpm_stamp))
+			if (time_before(READ_ONCE(tm->tcpm_stamp),
+					READ_ONCE(oldest->tcpm_stamp)))
 				oldest = tm;
 		}
 		tm = oldest;
@@ -434,7 +441,7 @@ void tcp_update_metrics(struct sock *sk)
 					       tp->reordering);
 		}
 	}
-	tm->tcpm_stamp = jiffies;
+	WRITE_ONCE(tm->tcpm_stamp, jiffies);
 out_unlock:
 	rcu_read_unlock();
 }
@@ -647,7 +654,7 @@ static int tcp_metrics_fill_info(struct sk_buff *msg,
 	}

 	if (nla_put_msecs(msg, TCP_METRICS_ATTR_AGE,
-			  jiffies - tm->tcpm_stamp,
+			  jiffies - READ_ONCE(tm->tcpm_stamp),
 			  TCP_METRICS_ATTR_PAD) < 0)
 		goto nla_put_failure;
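Illustration only, not part of the patch: a userspace approximation of the annotated timestamp check. The _SKETCH macros below are simplified stand-ins built on volatile accesses and the GCC/Clang __typeof__ extension; they are an assumption for this example and not the kernel's READ_ONCE()/WRITE_ONCE() definitions. The timeout value and struct are likewise hypothetical.

```c
/* Simplified stand-ins: force a single volatile load or store. */
#define READ_ONCE_SKETCH(x)	(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

#define TIMEOUT_SKETCH 3600UL	/* hypothetical timeout in ticks */

struct metrics_sketch {
	unsigned long stamp;	/* last refresh, may be accessed locklessly */
};

/* Wraparound-safe "a is later than b" on a free-running counter. */
static int time_after_sketch(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Read the stamp exactly once, then compare against 'now'. */
static int metrics_expired(struct metrics_sketch *m, unsigned long now)
{
	unsigned long limit = READ_ONCE_SKETCH(m->stamp) + TIMEOUT_SKETCH;

	return time_after_sketch(now, limit);
}

static void metrics_touch(struct metrics_sketch *m, unsigned long now)
{
	WRITE_ONCE_SKETCH(m->stamp, now);
}
```

Loading the stamp into a local before the comparison, as tcpm_check_stamp() does after the patch, ensures the value used to compute the limit cannot change halfway through the expression.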
From: Eric Dumazet edumazet@google.com
[ Upstream commit 285ce119a3c6c4502585936650143e54c8692788 ]
tm->tcpm_lock can be read or written locklessly.
Add needed READ_ONCE()/WRITE_ONCE() to document this.
Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Eric Dumazet edumazet@google.com
Reviewed-by: David Ahern dsahern@kernel.org
Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com
Link: https://lore.kernel.org/r/20230802131500.1478140-4-edumazet@google.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/ipv4/tcp_metrics.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index 8386165887963..131fa30049691 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -59,7 +59,8 @@ static inline struct net *tm_net(struct tcp_metrics_block *tm)
 static bool tcp_metric_locked(struct tcp_metrics_block *tm,
 			      enum tcp_metric_index idx)
 {
-	return tm->tcpm_lock & (1 << idx);
+	/* Paired with WRITE_ONCE() in tcpm_suck_dst() */
+	return READ_ONCE(tm->tcpm_lock) & (1 << idx);
 }

 static u32 tcp_metric_get(struct tcp_metrics_block *tm,
@@ -110,7 +111,8 @@ static void tcpm_suck_dst(struct tcp_metrics_block *tm,
 		val |= 1 << TCP_METRIC_CWND;
 	if (dst_metric_locked(dst, RTAX_REORDERING))
 		val |= 1 << TCP_METRIC_REORDERING;
-	tm->tcpm_lock = val;
+	/* Paired with READ_ONCE() in tcp_metric_locked() */
+	WRITE_ONCE(tm->tcpm_lock, val);

 	msval = dst_metric_raw(dst, RTAX_RTT);
 	tm->tcpm_vals[TCP_METRIC_RTT] = msval * USEC_PER_MSEC;
From: Eric Dumazet edumazet@google.com
[ Upstream commit 8c4d04f6b443869d25e59822f7cec88d647028a9 ]
tm->tcpm_vals[] values can be read or written locklessly.
Add needed READ_ONCE()/WRITE_ONCE() to document this, and force use of tcp_metric_get() and tcp_metric_set().
Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Eric Dumazet edumazet@google.com
Reviewed-by: David Ahern dsahern@kernel.org
Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/ipv4/tcp_metrics.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index 131fa30049691..fd4ab7a51cef2 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -63,17 +63,19 @@ static bool tcp_metric_locked(struct tcp_metrics_block *tm,
 	return READ_ONCE(tm->tcpm_lock) & (1 << idx);
 }

-static u32 tcp_metric_get(struct tcp_metrics_block *tm,
+static u32 tcp_metric_get(const struct tcp_metrics_block *tm,
 			  enum tcp_metric_index idx)
 {
-	return tm->tcpm_vals[idx];
+	/* Paired with WRITE_ONCE() in tcp_metric_set() */
+	return READ_ONCE(tm->tcpm_vals[idx]);
 }

 static void tcp_metric_set(struct tcp_metrics_block *tm,
 			   enum tcp_metric_index idx,
 			   u32 val)
 {
-	tm->tcpm_vals[idx] = val;
+	/* Paired with READ_ONCE() in tcp_metric_get() */
+	WRITE_ONCE(tm->tcpm_vals[idx], val);
 }

 static bool addr_same(const struct inetpeer_addr *a,
@@ -115,13 +117,16 @@ static void tcpm_suck_dst(struct tcp_metrics_block *tm,
 	WRITE_ONCE(tm->tcpm_lock, val);

 	msval = dst_metric_raw(dst, RTAX_RTT);
-	tm->tcpm_vals[TCP_METRIC_RTT] = msval * USEC_PER_MSEC;
+	tcp_metric_set(tm, TCP_METRIC_RTT, msval * USEC_PER_MSEC);

 	msval = dst_metric_raw(dst, RTAX_RTTVAR);
-	tm->tcpm_vals[TCP_METRIC_RTTVAR] = msval * USEC_PER_MSEC;
-	tm->tcpm_vals[TCP_METRIC_SSTHRESH] = dst_metric_raw(dst, RTAX_SSTHRESH);
-	tm->tcpm_vals[TCP_METRIC_CWND] = dst_metric_raw(dst, RTAX_CWND);
-	tm->tcpm_vals[TCP_METRIC_REORDERING] = dst_metric_raw(dst, RTAX_REORDERING);
+	tcp_metric_set(tm, TCP_METRIC_RTTVAR, msval * USEC_PER_MSEC);
+	tcp_metric_set(tm, TCP_METRIC_SSTHRESH,
+		       dst_metric_raw(dst, RTAX_SSTHRESH));
+	tcp_metric_set(tm, TCP_METRIC_CWND,
+		       dst_metric_raw(dst, RTAX_CWND));
+	tcp_metric_set(tm, TCP_METRIC_REORDERING,
+		       dst_metric_raw(dst, RTAX_REORDERING));
 	if (fastopen_clear) {
 		tm->tcpm_fastopen.mss = 0;
 		tm->tcpm_fastopen.syn_loss = 0;
@@ -667,7 +672,7 @@ static int tcp_metrics_fill_info(struct sk_buff *msg,
 		if (!nest)
 			goto nla_put_failure;
 		for (i = 0; i < TCP_METRIC_MAX_KERNEL + 1; i++) {
-			u32 val = tm->tcpm_vals[i];
+			u32 val = tcp_metric_get(tm, i);

 			if (!val)
 				continue;
From: Eric Dumazet edumazet@google.com
[ Upstream commit d5d986ce42c71a7562d32c4e21e026b0f87befec ]
tm->tcpm_net can be read or written locklessly.
Instead of changing write_pnet() and read_pnet() and potentially hurting performance, add the needed READ_ONCE()/WRITE_ONCE() in tm_net() and tcpm_new().
Fixes: 849e8a0ca8d5 ("tcp_metrics: Add a field tcpm_net and verify it matches on lookup")
Signed-off-by: Eric Dumazet edumazet@google.com
Reviewed-by: David Ahern dsahern@kernel.org
Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com
Link: https://lore.kernel.org/r/20230802131500.1478140-6-edumazet@google.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/ipv4/tcp_metrics.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index fd4ab7a51cef2..4fd274836a48f 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -40,7 +40,7 @@ struct tcp_fastopen_metrics {

 struct tcp_metrics_block {
 	struct tcp_metrics_block __rcu *tcpm_next;
-	possible_net_t tcpm_net;
+	struct net *tcpm_net;
 	struct inetpeer_addr tcpm_saddr;
 	struct inetpeer_addr tcpm_daddr;
 	unsigned long tcpm_stamp;
@@ -51,9 +51,10 @@ struct tcp_metrics_block {
 	struct rcu_head rcu_head;
 };

-static inline struct net *tm_net(struct tcp_metrics_block *tm)
+static inline struct net *tm_net(const struct tcp_metrics_block *tm)
 {
-	return read_pnet(&tm->tcpm_net);
+	/* Paired with the WRITE_ONCE() in tcpm_new() */
+	return READ_ONCE(tm->tcpm_net);
 }

 static bool tcp_metric_locked(struct tcp_metrics_block *tm,
@@ -197,7 +198,9 @@ static struct tcp_metrics_block *tcpm_new(struct dst_entry *dst,
 		if (!tm)
 			goto out_unlock;
 	}
-	write_pnet(&tm->tcpm_net, net);
+	/* Paired with the READ_ONCE() in tm_net() */
+	WRITE_ONCE(tm->tcpm_net, net);
+
 	tm->tcpm_saddr = *saddr;
 	tm->tcpm_daddr = *daddr;
From: Eric Dumazet edumazet@google.com
[ Upstream commit ddf251fa2bc1d3699eec0bae6ed0bc373b8fda79 ]
Whenever tcpm_new() reclaims an old entry, tcpm_suck_dst() would overwrite data that could be read from tcp_fastopen_cache_get() or tcp_metrics_fill_info().
We need to acquire fastopen_seqlock to maintain consistency.
For newly allocated objects, tcpm_new() can switch to kzalloc() to avoid an extra fastopen_seqlock acquisition.
Fixes: 1fe4c481ba63 ("net-tcp: Fast Open client - cookie cache")
Signed-off-by: Eric Dumazet edumazet@google.com
Cc: Yuchung Cheng ycheng@google.com
Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com
Link: https://lore.kernel.org/r/20230802131500.1478140-7-edumazet@google.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/ipv4/tcp_metrics.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index 4fd274836a48f..99ac5efe244d3 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -93,6 +93,7 @@ static struct tcpm_hash_bucket *tcp_metrics_hash __read_mostly;
 static unsigned int tcp_metrics_hash_log __read_mostly;

 static DEFINE_SPINLOCK(tcp_metrics_lock);
+static DEFINE_SEQLOCK(fastopen_seqlock);

 static void tcpm_suck_dst(struct tcp_metrics_block *tm,
 			  const struct dst_entry *dst,
@@ -129,11 +130,13 @@ static void tcpm_suck_dst(struct tcp_metrics_block *tm,
 	tcp_metric_set(tm, TCP_METRIC_REORDERING,
 		       dst_metric_raw(dst, RTAX_REORDERING));
 	if (fastopen_clear) {
+		write_seqlock(&fastopen_seqlock);
 		tm->tcpm_fastopen.mss = 0;
 		tm->tcpm_fastopen.syn_loss = 0;
 		tm->tcpm_fastopen.try_exp = 0;
 		tm->tcpm_fastopen.cookie.exp = false;
 		tm->tcpm_fastopen.cookie.len = 0;
+		write_sequnlock(&fastopen_seqlock);
 	}
 }

@@ -194,7 +197,7 @@ static struct tcp_metrics_block *tcpm_new(struct dst_entry *dst,
 		}
 		tm = oldest;
 	} else {
-		tm = kmalloc(sizeof(*tm), GFP_ATOMIC);
+		tm = kzalloc(sizeof(*tm), GFP_ATOMIC);
 		if (!tm)
 			goto out_unlock;
 	}
@@ -204,7 +207,7 @@ static struct tcp_metrics_block *tcpm_new(struct dst_entry *dst,
 	tm->tcpm_saddr = *saddr;
 	tm->tcpm_daddr = *daddr;

-	tcpm_suck_dst(tm, dst, true);
+	tcpm_suck_dst(tm, dst, reclaim);

 	if (likely(!reclaim)) {
 		tm->tcpm_next = tcp_metrics_hash[hash].chain;
@@ -556,8 +559,6 @@ bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst)
 	return ret;
 }

-static DEFINE_SEQLOCK(fastopen_seqlock);
-
 void tcp_fastopen_cache_get(struct sock *sk, u16 *mss,
 			    struct tcp_fastopen_cookie *cookie)
 {
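Illustration only, not part of the patch: a minimal sequence-counter sketch in C11 atomics showing the reader/writer protocol that a seqlock provides. It deliberately uses sequentially consistent atomics for simplicity and assumes writers are serialized by the caller; the kernel's seqlock_t bundles its own spinlock and uses cheaper barriers. All names here are hypothetical.

```c
#include <stdatomic.h>

struct fastopen_sketch {
	atomic_uint seq;	/* even: stable, odd: write in progress */
	atomic_uint mss;
	atomic_uint syn_loss;
};

/* Writer side: bump to odd, update the fields, bump back to even. */
static void fastopen_clear(struct fastopen_sketch *f)
{
	atomic_fetch_add(&f->seq, 1);
	atomic_store(&f->mss, 0);
	atomic_store(&f->syn_loss, 0);
	atomic_fetch_add(&f->seq, 1);
}

/*
 * Reader side: retry if a write was in progress when we started or
 * if the sequence changed while we were copying the fields out.
 */
static void fastopen_read(struct fastopen_sketch *f,
			  unsigned int *mss, unsigned int *syn_loss)
{
	unsigned int start;

	do {
		start = atomic_load(&f->seq);
		*mss = atomic_load(&f->mss);
		*syn_loss = atomic_load(&f->syn_loss);
	} while ((start & 1) || start != atomic_load(&f->seq));
}
```

A reader that overlaps a clear therefore never returns a mix of old and new field values, which is the consistency the commit message asks for when an entry is reclaimed.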
From: Boqun Feng boqun.feng@gmail.com
commit b3d8aa84bbfe9b58ccc5332cacf8ea17200af310 upstream.
Currently the rust allocator simply passes the size of the type Layout to krealloc(), and in theory the alignment requirement from the type Layout may be larger than the guarantee provided by SLAB, which means the allocated object is mis-aligned.
Fix this by rounding the allocation size up to the next power of two, for which SLAB always guarantees a size-aligned allocation. Because Rust guarantees that the original size is a multiple of the alignment and that the alignment is a power of two, the alignment requirement is then satisfied.
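Illustration only, not part of the patch: the size adjustment described above, written as plain C arithmetic so the numbers can be checked by hand. The minimum-alignment constant of 8 is an assumption for this sketch (the real value is ARCH_SLAB_MINALIGN and depends on the architecture), and the helper names are hypothetical.

```c
#include <stddef.h>

#define SLAB_MINALIGN_SKETCH 8	/* assumed; really ARCH_SLAB_MINALIGN */

/* Round size up to a multiple of align (align must be a power of two). */
static size_t pad_to_align(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* Smallest power of two >= n; assumes n <= SIZE_MAX / 2 + 1. */
static size_t next_power_of_two(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Size to request so that a size-aligned power-of-two allocation
 * also satisfies the requested alignment.
 */
static size_t alloc_size_for(size_t size, size_t align)
{
	size_t padded = pad_to_align(size, align);

	if (align > SLAB_MINALIGN_SKETCH)
		padded = next_power_of_two(padded);
	return padded;
}
```

For example, a 48-byte layout with 16-byte alignment is requested as 64 bytes; since 64 is a power of two and a multiple of 16, a 64-byte-aligned object is also 16-byte aligned. A 24-byte layout with 8-byte alignment stays at 24 bytes because the assumed slab minimum already covers it.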
Suggested-by: Vlastimil Babka vbabka@suse.cz
Co-developed-by: "Andreas Hindborg (Samsung)" nmi@metaspace.dk
Signed-off-by: "Andreas Hindborg (Samsung)" nmi@metaspace.dk
Signed-off-by: Boqun Feng boqun.feng@gmail.com
Cc: stable@vger.kernel.org # v6.1+
Acked-by: Vlastimil Babka vbabka@suse.cz
Fixes: 247b365dc8dc ("rust: add `kernel` crate")
Link: https://github.com/Rust-for-Linux/linux/issues/974
Link: https://lore.kernel.org/r/20230730012905.643822-2-boqun.feng@gmail.com
[ Applied rewording of comment as discussed in the mailing list. ]
Signed-off-by: Miguel Ojeda ojeda@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 rust/bindings/bindings_helper.h | 1
 rust/kernel/allocator.rs | 74 +++++++++++++++++++++++++++++++---------
 2 files changed, 60 insertions(+), 15 deletions(-)

--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -9,5 +9,6 @@
 #include <linux/slab.h>

 /* `bindgen` gets confused at certain things. */
+const size_t BINDINGS_ARCH_SLAB_MINALIGN = ARCH_SLAB_MINALIGN;
 const gfp_t BINDINGS_GFP_KERNEL = GFP_KERNEL;
 const gfp_t BINDINGS___GFP_ZERO = __GFP_ZERO;
--- a/rust/kernel/allocator.rs
+++ b/rust/kernel/allocator.rs
@@ -9,6 +9,36 @@ use crate::bindings;

 struct KernelAllocator;

+/// Calls `krealloc` with a proper size to alloc a new object aligned to `new_layout`'s alignment.
+///
+/// # Safety
+///
+/// - `ptr` can be either null or a pointer which has been allocated by this allocator.
+/// - `new_layout` must have a non-zero size.
+unsafe fn krealloc_aligned(ptr: *mut u8, new_layout: Layout, flags: bindings::gfp_t) -> *mut u8 {
+    // Customized layouts from `Layout::from_size_align()` can have size < align, so pad first.
+    let layout = new_layout.pad_to_align();
+
+    let mut size = layout.size();
+
+    if layout.align() > bindings::BINDINGS_ARCH_SLAB_MINALIGN {
+        // The alignment requirement exceeds the slab guarantee, thus try to enlarge the size
+        // to use the "power-of-two" size/alignment guarantee (see comments in `kmalloc()` for
+        // more information).
+        //
+        // Note that `layout.size()` (after padding) is guaranteed to be a multiple of
+        // `layout.align()`, so `next_power_of_two` gives enough alignment guarantee.
+        size = size.next_power_of_two();
+    }
+
+    // SAFETY:
+    // - `ptr` is either null or a pointer returned from a previous `k{re}alloc()` by the
+    //   function safety requirement.
+    // - `size` is greater than 0 since it's either a `layout.size()` (which cannot be zero
+    //   according to the function safety requirement) or a result from `next_power_of_two()`.
+    unsafe { bindings::krealloc(ptr as *const core::ffi::c_void, size, flags) as *mut u8 }
+}
+
 unsafe impl GlobalAlloc for KernelAllocator {
     unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
         // `krealloc()` is used instead of `kmalloc()` because the latter is
@@ -30,10 +60,20 @@ static ALLOCATOR: KernelAllocator = Kern
 // to extract the object file that has them from the archive. For the moment,
 // let's generate them ourselves instead.
// +// Note: Although these are *safe* functions, they are called by the compiler +// with parameters that obey the same `GlobalAlloc` function safety +// requirements: size and align should form a valid layout, and size is +// greater than 0. +// // Note that `#[no_mangle]` implies exported too, nowadays. #[no_mangle] -fn __rust_alloc(size: usize, _align: usize) -> *mut u8 { - unsafe { bindings::krealloc(core::ptr::null(), size, bindings::GFP_KERNEL) as *mut u8 } +fn __rust_alloc(size: usize, align: usize) -> *mut u8 { + // SAFETY: See assumption above. + let layout = unsafe { Layout::from_size_align_unchecked(size, align) }; + + // SAFETY: `ptr::null_mut()` is null, per assumption above the size of `layout` is greater + // than 0. + unsafe { krealloc_aligned(ptr::null_mut(), layout, bindings::GFP_KERNEL) } }
#[no_mangle] @@ -42,23 +82,27 @@ fn __rust_dealloc(ptr: *mut u8, _size: u }
#[no_mangle] -fn __rust_realloc(ptr: *mut u8, _old_size: usize, _align: usize, new_size: usize) -> *mut u8 { - unsafe { - bindings::krealloc( - ptr as *const core::ffi::c_void, - new_size, - bindings::GFP_KERNEL, - ) as *mut u8 - } +fn __rust_realloc(ptr: *mut u8, _old_size: usize, align: usize, new_size: usize) -> *mut u8 { + // SAFETY: See assumption above. + let new_layout = unsafe { Layout::from_size_align_unchecked(new_size, align) }; + + // SAFETY: Per assumption above, `ptr` is allocated by `__rust_*` before, and the size of + // `new_layout` is greater than 0. + unsafe { krealloc_aligned(ptr, new_layout, bindings::GFP_KERNEL) } }
#[no_mangle] -fn __rust_alloc_zeroed(size: usize, _align: usize) -> *mut u8 { +fn __rust_alloc_zeroed(size: usize, align: usize) -> *mut u8 { + // SAFETY: See assumption above. + let layout = unsafe { Layout::from_size_align_unchecked(size, align) }; + + // SAFETY: `ptr::null_mut()` is null, per assumption above the size of `layout` is greater + // than 0. unsafe { - bindings::krealloc( - core::ptr::null(), - size, + krealloc_aligned( + ptr::null_mut(), + layout, bindings::GFP_KERNEL | bindings::__GFP_ZERO, - ) as *mut u8 + ) } }
From: Steffen Maier maier@linux.ibm.com
commit e65851989001c0c9ba9177564b13b38201c0854c upstream.
Storage devices are free to send RSCNs, e.g. for internal state changes. If this happens on all connected paths, zfcp risks temporarily losing all paths at the same time. This places strong requirements on the multipath configuration, such as "no_path_retry queue".
Avoid such situations by deferring fc_rport blocking until after the ADISC response, when any actual state change of the remote port has become clear. The already existing port recovery triggers explicitly block the fc_rport. The triggers are: on ADISC reject or timeout (typical cable pull case), and on ADISC indicating that the remote port has changed its WWPN or that the port is meanwhile no longer open.
As a side effect, this also removes a confusing direct function call to another work item function zfcp_scsi_rport_work() instead of scheduling that other work item. It was probably done that way to have the rport block side effect immediate and synchronous to the caller.
Fixes: a2fa0aede07c ("[SCSI] zfcp: Block FC transport rports early on errors")
Cc: stable@vger.kernel.org #v2.6.30+
Reviewed-by: Benjamin Block bblock@linux.ibm.com
Reviewed-by: Fedor Loshakov loshakov@linux.ibm.com
Signed-off-by: Steffen Maier maier@linux.ibm.com
Link: https://lore.kernel.org/r/20230724145156.3920244-1-maier@linux.ibm.com
Signed-off-by: Martin K. Petersen martin.petersen@oracle.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/s390/scsi/zfcp_fc.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)
--- a/drivers/s390/scsi/zfcp_fc.c +++ b/drivers/s390/scsi/zfcp_fc.c @@ -534,8 +534,7 @@ static void zfcp_fc_adisc_handler(void *
/* re-init to undo drop from zfcp_fc_adisc() */ port->d_id = ntoh24(adisc_resp->adisc_port_id); - /* port is good, unblock rport without going through erp */ - zfcp_scsi_schedule_rport_register(port); + /* port is still good, nothing to do */ out: atomic_andnot(ZFCP_STATUS_PORT_LINK_TEST, &port->status); put_device(&port->dev); @@ -595,9 +594,6 @@ void zfcp_fc_link_test_work(struct work_ int retval;
set_worker_desc("zadisc%16llx", port->wwpn); /* < WORKER_DESC_LEN=24 */ - get_device(&port->dev); - port->rport_task = RPORT_DEL; - zfcp_scsi_rport_work(&port->rport_work);
/* only issue one test command at one time per port */ if (atomic_read(&port->status) & ZFCP_STATUS_PORT_LINK_TEST)
From: Michael Kelley mikelley@microsoft.com
commit 010c1e1c5741365dbbf44a5a5bb9f30192875c4c upstream.
The Hyper-V host is queried to get the max transfer size that it supports, and this value is used to set max_sectors for the synthetic SCSI controller. However, this max transfer size may be too large for virtual Fibre Channel devices, which are limited to 512 Kbytes. If a larger transfer size is used with a vFC device, Hyper-V always returns an error, and storvsc logs a message like this where the SRB status and SCSI status are both zero:
hv_storvsc <GUID>: tag#197 cmd 0x8a status: scsi 0x0 srb 0x0 hv 0xc0000001
Add logic to limit the max transfer size to 512 Kbytes for vFC devices.
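The clamping order can be sketched in standalone C as follows. This is an illustration, not the driver code: `max_sectors_for()` is a hypothetical helper, and the 4096-byte page size is an assumption standing in for HV_HYP_PAGE_SIZE.

```c
#include <stdbool.h>
#include <stdint.h>

#define HYP_PAGE_SIZE     4096u            /* assumed 4 KiB Hyper-V page */
#define FC_MAX_XFER_SIZE  (512u * 1024u)   /* 512 Kbytes, as in the patch */

/* Hypothetical helper mirroring the clamping order in storvsc_probe(). */
static uint32_t max_sectors_for(uint32_t host_max_bytes, bool is_fc)
{
	/* Round down to a whole number of pages. */
	uint32_t bytes = host_max_bytes - (host_max_bytes % HYP_PAGE_SIZE);

	/* vFC devices are additionally capped at 512 Kbytes. */
	if (is_fc && bytes > FC_MAX_XFER_SIZE)
		bytes = FC_MAX_XFER_SIZE;

	return bytes >> 9;	/* convert to 512-byte sectors */
}
```

For a host that reports 8 MiB, this yields 16384 sectors for a non-FC controller and 1024 sectors for a vFC device.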
Fixes: 1d3e0980782f ("scsi: storvsc: Correct reporting of Hyper-V I/O size limits")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley mikelley@microsoft.com
Link: https://lore.kernel.org/r/1689887102-32806-1-git-send-email-mikelley@microso...
Signed-off-by: Martin K. Petersen martin.petersen@oracle.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/scsi/storvsc_drv.c | 4 ++++
 1 file changed, 4 insertions(+)
--- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -365,6 +365,7 @@ static void storvsc_on_channel_callback( #define STORVSC_FC_MAX_LUNS_PER_TARGET 255 #define STORVSC_FC_MAX_TARGETS 128 #define STORVSC_FC_MAX_CHANNELS 8 +#define STORVSC_FC_MAX_XFER_SIZE ((u32)(512 * 1024))
#define STORVSC_IDE_MAX_LUNS_PER_TARGET 64 #define STORVSC_IDE_MAX_TARGETS 1 @@ -2002,6 +2003,9 @@ static int storvsc_probe(struct hv_devic * protecting it from any weird value. */ max_xfer_bytes = round_down(stor_device->max_transfer_bytes, HV_HYP_PAGE_SIZE); + if (is_fc) + max_xfer_bytes = min(max_xfer_bytes, STORVSC_FC_MAX_XFER_SIZE); + /* max_hw_sectors_kb */ host->max_sectors = max_xfer_bytes >> 9; /*
From: Ilya Dryomov idryomov@gmail.com
commit e6e2843230799230fc5deb8279728a7218b0d63c upstream.
If the cluster becomes unavailable, ceph_osdc_notify() may hang even with osd_request_timeout option set because linger_notify_finish_wait() waits for MWatchNotify NOTIFY_COMPLETE message with no associated OSD request in flight -- it's completely asynchronous.
Introduce an additional timeout, derived from the specified notify timeout. While at it, switch both waits to killable which is more correct.
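How the result of the timed, killable wait maps to a return code can be sketched in userspace C. `notify_wait_result()` is a hypothetical name, and `-EINTR` in the usage below is only a stand-in for the kernel-internal error returned when a fatal signal interrupts the wait.

```c
#include <errno.h>

/*
 * `left` follows the wait_for_completion_killable_timeout() convention:
 * negative = interrupted, 0 = timed out,
 * positive = completed with that many jiffies to spare.
 */
static long notify_wait_result(long left, int finish_error)
{
	if (left < 0)
		return left;		/* propagate the interruption error */
	if (left == 0)
		return -ETIMEDOUT;	/* no NOTIFY_COMPLETE arrived in time */
	return finish_error;		/* completed: report the notify status */
}
```

The three branches are equivalent to the `left = left ?: -ETIMEDOUT` form in the patch, just spelled out case by case.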
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov idryomov@gmail.com
Reviewed-by: Dongsheng Yang dongsheng.yang@easystack.cn
Reviewed-by: Xiubo Li xiubli@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/ceph/osd_client.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)
--- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -3334,17 +3334,24 @@ static int linger_reg_commit_wait(struct int ret;
dout("%s lreq %p linger_id %llu\n", __func__, lreq, lreq->linger_id); - ret = wait_for_completion_interruptible(&lreq->reg_commit_wait); + ret = wait_for_completion_killable(&lreq->reg_commit_wait); return ret ?: lreq->reg_commit_error; }
-static int linger_notify_finish_wait(struct ceph_osd_linger_request *lreq) +static int linger_notify_finish_wait(struct ceph_osd_linger_request *lreq, + unsigned long timeout) { - int ret; + long left;
dout("%s lreq %p linger_id %llu\n", __func__, lreq, lreq->linger_id); - ret = wait_for_completion_interruptible(&lreq->notify_finish_wait); - return ret ?: lreq->notify_finish_error; + left = wait_for_completion_killable_timeout(&lreq->notify_finish_wait, + ceph_timeout_jiffies(timeout)); + if (left <= 0) + left = left ?: -ETIMEDOUT; + else + left = lreq->notify_finish_error; /* completed */ + + return left; }
/* @@ -4896,7 +4903,8 @@ int ceph_osdc_notify(struct ceph_osd_cli linger_submit(lreq); ret = linger_reg_commit_wait(lreq); if (!ret) - ret = linger_notify_finish_wait(lreq); + ret = linger_notify_finish_wait(lreq, + msecs_to_jiffies(2 * timeout * MSEC_PER_SEC)); else dout("lreq %p failed to initiate notify %d\n", lreq, ret);
From: Ross Maynard bids.7405@bigpond.com
commit b99225b4fe297d07400f9e2332ecd7347b224f8d upstream.
The SL-A300, B500/5600, and C700 devices no longer auto-load because of "usbnet: Remove over-broad module alias from zaurus." This patch adds IDs for those 3 devices.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217632
Fixes: 16adf5d07987 ("usbnet: Remove over-broad module alias from zaurus.")
Signed-off-by: Ross Maynard bids.7405@bigpond.com
Cc: stable@vger.kernel.org
Acked-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Reviewed-by: Andrew Lunn andrew@lunn.ch
Link: https://lore.kernel.org/r/69b5423b-2013-9fc9-9569-58e707d9bafb@bigpond.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/usb/cdc_ether.c | 21 +++++++++++++++++++++
 drivers/net/usb/zaurus.c | 21 +++++++++++++++++++++
 2 files changed, 42 insertions(+)
--- a/drivers/net/usb/cdc_ether.c +++ b/drivers/net/usb/cdc_ether.c @@ -618,6 +618,13 @@ static const struct usb_device_id produc .match_flags = USB_DEVICE_ID_MATCH_INT_INFO | USB_DEVICE_ID_MATCH_DEVICE, .idVendor = 0x04DD, + .idProduct = 0x8005, /* A-300 */ + ZAURUS_FAKE_INTERFACE, + .driver_info = 0, +}, { + .match_flags = USB_DEVICE_ID_MATCH_INT_INFO + | USB_DEVICE_ID_MATCH_DEVICE, + .idVendor = 0x04DD, .idProduct = 0x8006, /* B-500/SL-5600 */ ZAURUS_MASTER_INTERFACE, .driver_info = 0, @@ -625,11 +632,25 @@ static const struct usb_device_id produc .match_flags = USB_DEVICE_ID_MATCH_INT_INFO | USB_DEVICE_ID_MATCH_DEVICE, .idVendor = 0x04DD, + .idProduct = 0x8006, /* B-500/SL-5600 */ + ZAURUS_FAKE_INTERFACE, + .driver_info = 0, +}, { + .match_flags = USB_DEVICE_ID_MATCH_INT_INFO + | USB_DEVICE_ID_MATCH_DEVICE, + .idVendor = 0x04DD, .idProduct = 0x8007, /* C-700 */ ZAURUS_MASTER_INTERFACE, .driver_info = 0, }, { .match_flags = USB_DEVICE_ID_MATCH_INT_INFO + | USB_DEVICE_ID_MATCH_DEVICE, + .idVendor = 0x04DD, + .idProduct = 0x8007, /* C-700 */ + ZAURUS_FAKE_INTERFACE, + .driver_info = 0, +}, { + .match_flags = USB_DEVICE_ID_MATCH_INT_INFO | USB_DEVICE_ID_MATCH_DEVICE, .idVendor = 0x04DD, .idProduct = 0x9031, /* C-750 C-760 */ --- a/drivers/net/usb/zaurus.c +++ b/drivers/net/usb/zaurus.c @@ -289,11 +289,25 @@ static const struct usb_device_id produc .match_flags = USB_DEVICE_ID_MATCH_INT_INFO | USB_DEVICE_ID_MATCH_DEVICE, .idVendor = 0x04DD, + .idProduct = 0x8005, /* A-300 */ + ZAURUS_FAKE_INTERFACE, + .driver_info = (unsigned long)&bogus_mdlm_info, +}, { + .match_flags = USB_DEVICE_ID_MATCH_INT_INFO + | USB_DEVICE_ID_MATCH_DEVICE, + .idVendor = 0x04DD, .idProduct = 0x8006, /* B-500/SL-5600 */ ZAURUS_MASTER_INTERFACE, .driver_info = ZAURUS_PXA_INFO, }, { .match_flags = USB_DEVICE_ID_MATCH_INT_INFO + | USB_DEVICE_ID_MATCH_DEVICE, + .idVendor = 0x04DD, + .idProduct = 0x8006, /* B-500/SL-5600 */ + ZAURUS_FAKE_INTERFACE, + .driver_info = (unsigned long)&bogus_mdlm_info, 
+}, { + .match_flags = USB_DEVICE_ID_MATCH_INT_INFO | USB_DEVICE_ID_MATCH_DEVICE, .idVendor = 0x04DD, .idProduct = 0x8007, /* C-700 */ @@ -301,6 +315,13 @@ static const struct usb_device_id produc .driver_info = ZAURUS_PXA_INFO, }, { .match_flags = USB_DEVICE_ID_MATCH_INT_INFO + | USB_DEVICE_ID_MATCH_DEVICE, + .idVendor = 0x04DD, + .idProduct = 0x8007, /* C-700 */ + ZAURUS_FAKE_INTERFACE, + .driver_info = (unsigned long)&bogus_mdlm_info, +}, { + .match_flags = USB_DEVICE_ID_MATCH_INT_INFO | USB_DEVICE_ID_MATCH_DEVICE, .idVendor = 0x04DD, .idProduct = 0x9031, /* C-750 C-760 */
From: Xiubo Li xiubli@redhat.com
commit e7e607bd00481745550389a29ecabe33e13d67cf upstream.
Flushing the dirty buffer may take a long time if the cluster is overloaded or if there is a network issue. So we should keep pinging the MDSs periodically to keep the session alive; otherwise the MDS will blocklist the kclient.
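The staged shutdown can be sketched as a small predicate. The enum values are the ones the patch introduces; `delayed_work_should_run()` is a hypothetical helper that only illustrates the comparison, not the actual delayed_work() body.

```c
#include <stdbool.h>

/* Stage values as introduced by the patch; 0 means "not stopping". */
enum {
	CEPH_MDSC_STOPPING_BEGIN = 1,
	CEPH_MDSC_STOPPING_FLUSHED = 2,
};

/* Hypothetical predicate: should the periodic keepalive work still run? */
static bool delayed_work_should_run(int stopping)
{
	/* Keep pinging the MDSs until the dirty data has been flushed. */
	return stopping < CEPH_MDSC_STOPPING_FLUSHED;
}
```

Before the change, any non-zero `stopping` value stopped the periodic work, so no keepalives were sent during a long flush.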
Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/61843
Signed-off-by: Xiubo Li xiubli@redhat.com
Reviewed-by: Milind Changire mchangir@redhat.com
Signed-off-by: Ilya Dryomov idryomov@gmail.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/ceph/mds_client.c | 4 ++--
 fs/ceph/mds_client.h | 5 +++++
 fs/ceph/super.c | 10 ++++++++++
 3 files changed, 17 insertions(+), 2 deletions(-)
--- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -4758,7 +4758,7 @@ static void delayed_work(struct work_str
dout("mdsc delayed_work\n");
- if (mdsc->stopping) + if (mdsc->stopping >= CEPH_MDSC_STOPPING_FLUSHED) return;
mutex_lock(&mdsc->mutex); @@ -4937,7 +4937,7 @@ void send_flush_mdlog(struct ceph_mds_se void ceph_mdsc_pre_umount(struct ceph_mds_client *mdsc) { dout("pre_umount\n"); - mdsc->stopping = 1; + mdsc->stopping = CEPH_MDSC_STOPPING_BEGIN;
ceph_mdsc_iterate_sessions(mdsc, send_flush_mdlog, true); ceph_mdsc_iterate_sessions(mdsc, lock_unlock_session, false); --- a/fs/ceph/mds_client.h +++ b/fs/ceph/mds_client.h @@ -380,6 +380,11 @@ struct cap_wait { int want; };
+enum { + CEPH_MDSC_STOPPING_BEGIN = 1, + CEPH_MDSC_STOPPING_FLUSHED = 2, +}; + /* * mds client state */ --- a/fs/ceph/super.c +++ b/fs/ceph/super.c @@ -1374,6 +1374,16 @@ static void ceph_kill_sb(struct super_bl ceph_mdsc_pre_umount(fsc->mdsc); flush_fs_workqueues(fsc);
+ /* + * Though the kill_anon_super() will finally trigger the + * sync_filesystem() anyway, we still need to do it here + * and then bump the stage of shutdown to stop the work + * queue as earlier as possible. + */ + sync_filesystem(s); + + fsc->mdsc->stopping = CEPH_MDSC_STOPPING_FLUSHED; + kill_anon_super(s);
fsc->client->extra_mon_dispatch = NULL;
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
commit da042eb4f061a0b54aedadcaa15391490c48e1ad upstream.
The OF node reference obtained from of_parse_phandle() should be dropped if the node is not compatible with arm,scmi-shmem.
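The reference-counting rule behind this fix can be shown with a toy model. Nothing here is kernel API: `struct toy_node`, `toy_node_get()`, `toy_node_put()` and `toy_chan_setup()` are hypothetical stand-ins for the device tree node, of_parse_phandle(), of_node_put() and the channel setup functions.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a refcounted device tree node (not the kernel's struct). */
struct toy_node {
	int refcount;
	const char *compatible;
};

/* Take a reference, as a phandle lookup would. */
static struct toy_node *toy_node_get(struct toy_node *n)
{
	if (n)
		n->refcount++;
	return n;
}

/* Drop a reference; NULL is tolerated. */
static void toy_node_put(struct toy_node *n)
{
	if (n)
		n->refcount--;
}

/* Returns 0 on success, -1 if the node is missing or incompatible. */
static int toy_chan_setup(struct toy_node *shmem_src)
{
	struct toy_node *shmem = toy_node_get(shmem_src);

	if (!shmem || strcmp(shmem->compatible, "arm,scmi-shmem") != 0) {
		toy_node_put(shmem);	/* the put that the fix adds */
		return -1;
	}

	/* ... use the node ... */
	toy_node_put(shmem);
	return 0;
}
```

Every exit path drops the reference it took, so the count returns to its initial value whether or not the node is compatible.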
Fixes: 507cd4d2c5eb ("firmware: arm_scmi: Add compatibility checks for shmem node")
Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
Reviewed-by: Cristian Marussi cristian.marussi@arm.com
Link: https://lore.kernel.org/r/20230719061652.8850-1-krzysztof.kozlowski@linaro.o...
Signed-off-by: Sudeep Holla sudeep.holla@arm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/firmware/arm_scmi/mailbox.c | 4 +++-
 drivers/firmware/arm_scmi/smc.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/firmware/arm_scmi/mailbox.c +++ b/drivers/firmware/arm_scmi/mailbox.c @@ -106,8 +106,10 @@ static int mailbox_chan_setup(struct scm return -ENOMEM;
shmem = of_parse_phandle(cdev->of_node, "shmem", idx); - if (!of_device_is_compatible(shmem, "arm,scmi-shmem")) + if (!of_device_is_compatible(shmem, "arm,scmi-shmem")) { + of_node_put(shmem); return -ENXIO; + }
ret = of_address_to_resource(shmem, 0, &res); of_node_put(shmem); --- a/drivers/firmware/arm_scmi/smc.c +++ b/drivers/firmware/arm_scmi/smc.c @@ -118,8 +118,10 @@ static int smc_chan_setup(struct scmi_ch return -ENOMEM;
np = of_parse_phandle(cdev->of_node, "shmem", 0); - if (!of_device_is_compatible(np, "arm,scmi-shmem")) + if (!of_device_is_compatible(np, "arm,scmi-shmem")) { + of_node_put(np); return -ENXIO; + }
ret = of_address_to_resource(np, 0, &res); of_node_put(np);
From: gaoming gaoming20@hihonor.com
commit daf60d6cca26e50d65dac374db92e58de745ad26 upstream.
The call stack shown below is from a scenario on a Linux 4.19 kernel. A memory allocation failed where the exfat fs uses kmalloc_array, due to system memory fragmentation, and as a result the inserted u-disk was not recognized. Devices such as u-disks using the exfat file system are pluggable and may be inserted into the system at any time. However, long-running systems cannot guarantee the availability of physically contiguous memory. Therefore, it's necessary to address this issue.
Binder:2632_6: page allocation failure: order:4, mode:0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null)
Call trace:
[242178.097582] dump_backtrace+0x0/0x4
[242178.097589] dump_stack+0xf4/0x134
[242178.097598] warn_alloc+0xd8/0x144
[242178.097603] __alloc_pages_nodemask+0x1364/0x1384
[242178.097608] kmalloc_order+0x2c/0x510
[242178.097612] kmalloc_order_trace+0x40/0x16c
[242178.097618] __kmalloc+0x360/0x408
[242178.097624] load_alloc_bitmap+0x160/0x284
[242178.097628] exfat_fill_super+0xa3c/0xe7c
[242178.097635] mount_bdev+0x2e8/0x3a0
[242178.097638] exfat_fs_mount+0x40/0x50
[242178.097643] mount_fs+0x138/0x2e8
[242178.097649] vfs_kern_mount+0x90/0x270
[242178.097655] do_mount+0x798/0x173c
[242178.097659] ksys_mount+0x114/0x1ac
[242178.097665] __arm64_sys_mount+0x24/0x34
[242178.097671] el0_svc_common+0xb8/0x1b8
[242178.097676] el0_svc_handler+0x74/0x90
[242178.097681] el0_svc+0x8/0x340
By analyzing the exfat code, we found that physically contiguous memory is not required here, so using kvmalloc_array solves this problem.
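The "order:4" in the failure message is the page order of the request. A small sketch of that calculation follows; `page_order()` is a hypothetical helper, and the 4 KiB page size used in the usage note is an assumption about the affected system.

```c
#include <stddef.h>

/*
 * Smallest order such that (page_size << order) >= bytes.
 * `bytes` must be non-zero and well below SIZE_MAX.
 */
static unsigned int page_order(size_t bytes, size_t page_size)
{
	unsigned int order = 0;
	size_t span = page_size;

	while (span < bytes) {
		span <<= 1;
		order++;
	}
	return order;
}
```

With 4 KiB pages, any request between 32 KiB + 1 byte and 64 KiB needs order 4, that is, 16 physically contiguous pages, which a fragmented system may be unable to provide. kvmalloc_array() can fall back to virtually contiguous memory in that case.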
Cc: stable@vger.kernel.org
Signed-off-by: gaoming gaoming20@hihonor.com
Signed-off-by: Namjae Jeon linkinjeon@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/exfat/balloc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/exfat/balloc.c +++ b/fs/exfat/balloc.c @@ -69,7 +69,7 @@ static int exfat_allocate_bitmap(struct } sbi->map_sectors = ((need_map_size - 1) >> (sb->s_blocksize_bits)) + 1; - sbi->vol_amap = kmalloc_array(sbi->map_sectors, + sbi->vol_amap = kvmalloc_array(sbi->map_sectors, sizeof(struct buffer_head *), GFP_KERNEL); if (!sbi->vol_amap) return -ENOMEM; @@ -84,7 +84,7 @@ static int exfat_allocate_bitmap(struct while (j < i) brelse(sbi->vol_amap[j++]);
- kfree(sbi->vol_amap); + kvfree(sbi->vol_amap); sbi->vol_amap = NULL; return -EIO; } @@ -138,7 +138,7 @@ void exfat_free_bitmap(struct exfat_sb_i for (i = 0; i < sbi->map_sectors; i++) __brelse(sbi->vol_amap[i]);
- kfree(sbi->vol_amap); + kvfree(sbi->vol_amap); }
int exfat_set_bitmap(struct inode *inode, unsigned int clu, bool sync)
From: Sungjong Seo sj1557.seo@samsung.com
commit ff84772fd45d486e4fc78c82e2f70ce5333543e6 upstream.
There is a potential deadlock reported by syzbot as below:
======================================================
WARNING: possible circular locking dependency detected
6.4.0-next-20230707-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor330/5073 is trying to acquire lock:
ffff8880218527a0 (&mm->mmap_lock){++++}-{3:3}, at: mmap_read_lock_killable include/linux/mmap_lock.h:151 [inline]
ffff8880218527a0 (&mm->mmap_lock){++++}-{3:3}, at: get_mmap_lock_carefully mm/memory.c:5293 [inline]
ffff8880218527a0 (&mm->mmap_lock){++++}-{3:3}, at: lock_mm_and_find_vma+0x369/0x510 mm/memory.c:5344

but task is already holding lock:
ffff888019f760e0 (&sbi->s_lock){+.+.}-{3:3}, at: exfat_iterate+0x117/0xb50 fs/exfat/dir.c:232
which lock already depends on the new lock.
Chain exists of: &mm->mmap_lock --> mapping.invalidate_lock#3 --> &sbi->s_lock
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&sbi->s_lock);
                               lock(mapping.invalidate_lock#3);
                               lock(&sbi->s_lock);
  rlock(&mm->mmap_lock);
Let's try to avoid the above potential deadlock condition by moving dir_emit*() out of sbi->s_lock coverage.
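The resulting loop shape, where the lock is taken per entry and dropped before anything is handed to the caller, can be sketched with a toy model. All names here are hypothetical: the `locked` flag stands in for sbi->s_lock and `toy_emit()` stands in for dir_emit(), which may fault on the user buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy directory guarded by a flag that stands in for sbi->s_lock. */
struct toy_dir {
	bool locked;
	const char *const *names;
	size_t count;
};

/* Stand-in for dir_emit(): may fault, so it must run unlocked. */
static bool toy_emit(const struct toy_dir *dir, const char *name)
{
	assert(!dir->locked);
	return name != NULL;
}

/* Returns the number of entries emitted. */
static size_t toy_iterate(struct toy_dir *dir)
{
	size_t pos = 0;

	for (;;) {
		const char *name;

		dir->locked = true;		/* take the lock per entry */
		if (pos >= dir->count) {
			dir->locked = false;	/* end of directory */
			break;
		}
		name = dir->names[pos];
		dir->locked = false;		/* drop it before emitting */

		if (!toy_emit(dir, name))
			break;			/* lock already released */
		pos++;
	}
	return pos;
}
```

Because the emit step never runs with the lock held, a page fault taken there cannot nest mmap_lock inside the directory lock.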
Fixes: ca06197382bd ("exfat: add directory operations")
Cc: stable@vger.kernel.org #v5.7+
Reported-by: syzbot+1741a5d9b79989c10bdc@syzkaller.appspotmail.com
Link: https://lore.kernel.org/lkml/00000000000078ee7e060066270b@google.com/T/#u
Tested-by: syzbot+1741a5d9b79989c10bdc@syzkaller.appspotmail.com
Signed-off-by: Sungjong Seo sj1557.seo@samsung.com
Signed-off-by: Namjae Jeon linkinjeon@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/exfat/dir.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)
--- a/fs/exfat/dir.c +++ b/fs/exfat/dir.c @@ -210,7 +210,10 @@ static void exfat_free_namebuf(struct ex exfat_init_namebuf(nb); }
-/* skip iterating emit_dots when dir is empty */ +/* + * Before calling dir_emit*(), sbi->s_lock should be released + * because page fault can occur in dir_emit*(). + */ #define ITER_POS_FILLED_DOTS (2) static int exfat_iterate(struct file *file, struct dir_context *ctx) { @@ -225,11 +228,10 @@ static int exfat_iterate(struct file *fi int err = 0, fake_offset = 0;
exfat_init_namebuf(nb); - mutex_lock(&EXFAT_SB(sb)->s_lock);
cpos = ctx->pos; if (!dir_emit_dots(file, ctx)) - goto unlock; + goto out;
if (ctx->pos == ITER_POS_FILLED_DOTS) { cpos = 0; @@ -241,16 +243,18 @@ static int exfat_iterate(struct file *fi /* name buffer should be allocated before use */ err = exfat_alloc_namebuf(nb); if (err) - goto unlock; + goto out; get_new: + mutex_lock(&EXFAT_SB(sb)->s_lock); + if (ei->flags == ALLOC_NO_FAT_CHAIN && cpos >= i_size_read(inode)) goto end_of_dir;
err = exfat_readdir(inode, &cpos, &de); if (err) { /* - * At least we tried to read a sector. Move cpos to next sector - * position (should be aligned). + * At least we tried to read a sector. + * Move cpos to next sector position (should be aligned). */ if (err == -EIO) { cpos += 1 << (sb->s_blocksize_bits); @@ -273,16 +277,10 @@ get_new: inum = iunique(sb, EXFAT_ROOT_INO); }
- /* - * Before calling dir_emit(), sb_lock should be released. - * Because page fault can occur in dir_emit() when the size - * of buffer given from user is larger than one page size. - */ mutex_unlock(&EXFAT_SB(sb)->s_lock); if (!dir_emit(ctx, nb->lfn, strlen(nb->lfn), inum, (de.attr & ATTR_SUBDIR) ? DT_DIR : DT_REG)) - goto out_unlocked; - mutex_lock(&EXFAT_SB(sb)->s_lock); + goto out; ctx->pos = cpos; goto get_new;
@@ -290,9 +288,8 @@ end_of_dir: if (!cpos && fake_offset) cpos = ITER_POS_FILLED_DOTS; ctx->pos = cpos; -unlock: mutex_unlock(&EXFAT_SB(sb)->s_lock); -out_unlocked: +out: /* * To improve performance, free namebuf after unlock sb_lock. * If namebuf is not allocated, this function do nothing
From: Olivier Maignial olivier.maignial@hotmail.fr
commit 8544cda94dae6be3f1359539079c68bb731428b1 upstream.
Reading ECC status is failing.
tx58cxgxsxraix_ecc_get_status() uses an on-stack buffer for the SPINAND_GET_FEATURE_OP() output. Such a buffer is not suitable for the DMA needs of spi-mem.
Fix this by using the spi-mem operations dedicated buffer spinand->scratchbuf.
See spinand->scratchbuf: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/incl... spi_mem_check_op(): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/driv...
Fixes: 10949af1681d ("mtd: spinand: Add initial support for Toshiba TC58CVG2S0H")
Cc: stable@vger.kernel.org
Signed-off-by: Olivier Maignial olivier.maignial@hotmail.fr
Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com
Link: https://lore.kernel.org/linux-mtd/DB4P250MB1032553D05FBE36DEE0D311EFE23A@DB4...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/mtd/nand/spi/toshiba.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/mtd/nand/spi/toshiba.c +++ b/drivers/mtd/nand/spi/toshiba.c @@ -73,7 +73,7 @@ static int tx58cxgxsxraix_ecc_get_status { struct nand_device *nand = spinand_to_nand(spinand); u8 mbf = 0; - struct spi_mem_op op = SPINAND_GET_FEATURE_OP(0x30, &mbf); + struct spi_mem_op op = SPINAND_GET_FEATURE_OP(0x30, spinand->scratchbuf);
switch (status & STATUS_ECC_MASK) { case STATUS_ECC_NO_BITFLIPS: @@ -92,7 +92,7 @@ static int tx58cxgxsxraix_ecc_get_status if (spi_mem_exec_op(spinand->spimem, &op)) return nanddev_get_ecc_conf(nand)->strength;
- mbf >>= 4; + mbf = *(spinand->scratchbuf) >> 4;
if (WARN_ON(mbf > nanddev_get_ecc_conf(nand)->strength || !mbf)) return nanddev_get_ecc_conf(nand)->strength;
From: Arseniy Krasnov AVKrasnov@sberdevices.ru
commit 7e6b04f9238eab0f684fafd158c1f32ea65b9eaa upstream.
It is incorrect to calculate the number of OOB bytes for the ECC engine using some "already known" ECC step size (1024 bytes here). The number of such bytes for the ECC engine must be the whole OOB except the 2 bytes for the bad block marker, while the proper ECC step size and strength will be selected by the ECC logic.
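The difference between the two calculations is easy to see with concrete numbers. The helpers below are hypothetical and simply restate the two expressions from the diff; the page geometry in the usage note is an illustrative assumption, not taken from the patch.

```c
/* Old calculation: assumed a fixed 1024-byte ECC step. */
static int oob_for_ecc_old(int writesize, int oobsize)
{
	int nsectors = writesize / 1024;

	return oobsize - 2 * nsectors;
}

/* New calculation: reserve only the 2-byte bad block marker. */
static int oob_for_ecc_new(int oobsize)
{
	return oobsize - 2;
}
```

For an assumed chip with a 4096-byte page and 256 OOB bytes, the old formula offered 248 bytes to the ECC configuration logic, while the new one offers 254.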
Fixes: 8fae856c5350 ("mtd: rawnand: meson: add support for Amlogic NAND flash controller")
Cc: Stable@vger.kernel.org
Signed-off-by: Arseniy Krasnov AVKrasnov@sberdevices.ru
Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com
Link: https://lore.kernel.org/linux-mtd/20230705065211.293500-1-AVKrasnov@sberdevi...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/mtd/nand/raw/meson_nand.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/mtd/nand/raw/meson_nand.c +++ b/drivers/mtd/nand/raw/meson_nand.c @@ -1184,7 +1184,6 @@ static int meson_nand_attach_chip(struct struct meson_nfc *nfc = nand_get_controller_data(nand); struct meson_nfc_nand_chip *meson_chip = to_meson_nand(nand); struct mtd_info *mtd = nand_to_mtd(nand); - int nsectors = mtd->writesize / 1024; int ret;
if (!mtd->name) { @@ -1202,7 +1201,7 @@ static int meson_nand_attach_chip(struct nand->options |= NAND_NO_SUBPAGE_WRITE;
ret = nand_ecc_choose_conf(nand, nfc->data->ecc_caps, - mtd->oobsize - 2 * nsectors); + mtd->oobsize - 2); if (ret) { dev_err(nfc->dev, "failed to ECC init\n"); return -EINVAL;
From: Jiri Olsa jolsa@kernel.org
commit f2c67a3e60d1071b65848efaa8c3b66c363dd025 upstream.
The nesting protection in bpf_perf_event_output relies on disabled preemption, which is guaranteed for kprobes and tracepoints.
However, bpf_perf_event_output can also be called from uprobes context through the bpf_prog_run_array_sleepable function, which disables migration but keeps preemption enabled.
This can cause a task to be preempted by another one inside the nesting protection, eventually leading to two tasks using the same perf_sample_data buffer and causing crashes like:
kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
BUG: unable to handle page fault for address: ffffffff82be3eea
...
Call Trace:
? __die+0x1f/0x70
? page_fault_oops+0x176/0x4d0
? exc_page_fault+0x132/0x230
? asm_exc_page_fault+0x22/0x30
? perf_output_sample+0x12b/0x910
? perf_event_output+0xd0/0x1d0
? bpf_perf_event_output+0x162/0x1d0
? bpf_prog_c6271286d9a4c938_krava1+0x76/0x87
? __uprobe_perf_func+0x12b/0x540
? uprobe_dispatcher+0x2c4/0x430
? uprobe_notify_resume+0x2da/0xce0
? atomic_notifier_call_chain+0x7b/0x110
? exit_to_user_mode_prepare+0x13e/0x290
? irqentry_exit_to_user_mode+0x5/0x30
? asm_exc_int3+0x35/0x40
Fix this by disabling preemption in bpf_perf_event_output.
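The nesting protection itself can be modelled in a few lines of C. This is a toy sketch with hypothetical names (`struct toy_cpu_state`, `toy_enter()`, `toy_exit()`); the depth of 3 is an assumption about the number of per-CPU sample buffers.

```c
#include <stddef.h>

#define MAX_NEST 3	/* assumed number of per-CPU sample buffers */

/* Toy stand-in for one CPU's nesting counter and sample buffers. */
struct toy_cpu_state {
	int nest_level;
	int buffers[MAX_NEST];
};

/* Claim the buffer for the current nesting level, or NULL if too deep. */
static int *toy_enter(struct toy_cpu_state *cpu)
{
	int level = ++cpu->nest_level;

	if (level > MAX_NEST)
		return NULL;	/* caller must still call toy_exit() */
	return &cpu->buffers[level - 1];
}

/* Release the current nesting level. */
static void toy_exit(struct toy_cpu_state *cpu)
{
	cpu->nest_level--;
}
```

The scheme hands out a distinct buffer per nesting level, but only if nothing else runs on the same CPU between enter and exit. If a task is preempted after leaving the protected section's counter at some level and a second task then enters and exits, both can end up at the same level and share a buffer, which is why the patch brackets the section with preempt_disable() and preempt_enable().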
Cc: stable@vger.kernel.org
Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
Acked-by: Hou Tao houtao1@huawei.com
Signed-off-by: Jiri Olsa jolsa@kernel.org
Link: https://lore.kernel.org/r/20230725084206.580930-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov ast@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/trace/bpf_trace.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -662,8 +662,7 @@ static DEFINE_PER_CPU(int, bpf_trace_nes BPF_CALL_5(bpf_perf_event_output, struct pt_regs *, regs, struct bpf_map *, map, u64, flags, void *, data, u64, size) { - struct bpf_trace_sample_data *sds = this_cpu_ptr(&bpf_trace_sds); - int nest_level = this_cpu_inc_return(bpf_trace_nest_level); + struct bpf_trace_sample_data *sds; struct perf_raw_record raw = { .frag = { .size = size, @@ -671,7 +670,11 @@ BPF_CALL_5(bpf_perf_event_output, struct }, }; struct perf_sample_data *sd; - int err; + int nest_level, err; + + preempt_disable(); + sds = this_cpu_ptr(&bpf_trace_sds); + nest_level = this_cpu_inc_return(bpf_trace_nest_level);
if (WARN_ON_ONCE(nest_level > ARRAY_SIZE(sds->sds))) { err = -EBUSY; @@ -690,9 +693,9 @@ BPF_CALL_5(bpf_perf_event_output, struct sd->sample_flags |= PERF_SAMPLE_RAW;
err = __bpf_perf_event_output(regs, map, flags, sd); - out: this_cpu_dec(bpf_trace_nest_level); + preempt_enable(); return err; }
From: Dinh Nguyen dinguyen@kernel.org
commit db66795f61354c373ecdadbdae1ed253a96c47cb upstream.
The correct dts property for the SCL falling time is "i2c-scl-falling-time-ns".
Fixes: c8da1d15b8a4 ("arm64: dts: stratix10: i2c clock running out of spec")
Cc: stable@vger.kernel.org
Signed-off-by: Dinh Nguyen dinguyen@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/boot/dts/altera/socfpga_stratix10_socdk.dts | 2 +-
 arch/arm64/boot/dts/altera/socfpga_stratix10_socdk_nand.dts | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk.dts +++ b/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk.dts @@ -128,7 +128,7 @@ status = "okay"; clock-frequency = <100000>; i2c-sda-falling-time-ns = <890>; /* hcnt */ - i2c-sdl-falling-time-ns = <890>; /* lcnt */ + i2c-scl-falling-time-ns = <890>; /* lcnt */
adc@14 { compatible = "lltc,ltc2497"; --- a/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk_nand.dts +++ b/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk_nand.dts @@ -141,7 +141,7 @@ status = "okay"; clock-frequency = <100000>; i2c-sda-falling-time-ns = <890>; /* hcnt */ - i2c-sdl-falling-time-ns = <890>; /* lcnt */ + i2c-scl-falling-time-ns = <890>; /* lcnt */
adc@14 { compatible = "lltc,ltc2497";
From: Laszlo Ersek lersek@redhat.com
commit 9bc3047374d5bec163e83e743709e23753376f0c upstream.
Commit a096ccca6e50 initializes the "sk_uid" field in the protocol socket (struct sock) from the "/dev/net/tun" device node's owner UID. Per original commit 86741ec25462 ("net: core: Add a UID field to struct sock.", 2016-11-04), that's wrong: the idea is to cache the UID of the userspace process that creates the socket. Commit 86741ec25462 mentions socket() and accept(); with "tun", the action that creates the socket is open("/dev/net/tun").
Therefore the device node's owner UID is irrelevant. In most cases, "/dev/net/tun" will be owned by root, so in practice, commit a096ccca6e50 has no observable effect:
- before, "sk_uid" would be zero, due to undefined behavior (CVE-2023-1076),
- after, "sk_uid" would be zero, due to "/dev/net/tun" being owned by root.
What matters is the (fs)UID of the process performing the open(), so cache that in "sk_uid".
Cc: Eric Dumazet edumazet@google.com
Cc: Lorenzo Colitti lorenzo@google.com
Cc: Paolo Abeni pabeni@redhat.com
Cc: Pietro Borrello borrello@diag.uniroma1.it
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: a096ccca6e50 ("tun: tun_chr_open(): correctly initialize socket uid")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435
Signed-off-by: Laszlo Ersek lersek@redhat.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/tun.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -3457,7 +3457,7 @@ static int tun_chr_open(struct inode *in tfile->socket.file = file; tfile->socket.ops = &tun_socket_ops;
- sock_init_data_uid(&tfile->socket, &tfile->sk, inode->i_uid); + sock_init_data_uid(&tfile->socket, &tfile->sk, current_fsuid());
tfile->sk.sk_write_space = tun_sock_write_space; tfile->sk.sk_sndbuf = INT_MAX;
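The distinction the commit message draws, between the device node's owner and the process that opens it, can be observed from userspace. The following is a minimal sketch, not kernel code: the `demo_` helper names are hypothetical, and it relies on the Linux-specific `setfsuid()` call, which returns the previous filesystem UID and leaves it unchanged when passed an invalid value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/fsuid.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Owner UID of a filesystem node, or (uid_t)-1 if stat() fails. */
static uid_t demo_node_owner_uid(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) != 0)
		return (uid_t)-1;
	return st.st_uid;
}

/*
 * Filesystem UID of the calling process. setfsuid() returns the previous
 * fsuid; passing the invalid value -1 leaves the fsuid unchanged, so this
 * only reads it.
 */
static uid_t demo_caller_fsuid(void)
{
	return (uid_t)setfsuid((uid_t)-1);
}
```

On a typical system one would expect `demo_node_owner_uid("/dev/net/tun")` to be 0 (root) while `demo_caller_fsuid()` is the UID of whichever user runs the program; the patch makes "sk_uid" follow the latter.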
From: Laszlo Ersek lersek@redhat.com
commit 5c9241f3ceab3257abe2923a59950db0dc8bb737 upstream.
Commit 66b2c338adce initializes the "sk_uid" field in the protocol socket (struct sock) from the "/dev/tapX" device node's owner UID. Per original commit 86741ec25462 ("net: core: Add a UID field to struct sock.", 2016-11-04), that's wrong: the idea is to cache the UID of the userspace process that creates the socket. Commit 86741ec25462 mentions socket() and accept(); with "tap", the action that creates the socket is open("/dev/tapX").
Therefore the device node's owner UID is irrelevant. In most cases, "/dev/tapX" will be owned by root, so in practice, commit 66b2c338adce has no observable effect:
- before, "sk_uid" would be zero, due to undefined behavior (CVE-2023-1076),
- after, "sk_uid" would be zero, due to "/dev/tapX" being owned by root.
What matters is the (fs)UID of the process performing the open(), so cache that in "sk_uid".
Cc: Eric Dumazet edumazet@google.com
Cc: Lorenzo Colitti lorenzo@google.com
Cc: Paolo Abeni pabeni@redhat.com
Cc: Pietro Borrello borrello@diag.uniroma1.it
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 66b2c338adce ("tap: tap_open(): correctly initialize socket uid")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435
Signed-off-by: Laszlo Ersek lersek@redhat.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/tap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -533,7 +533,7 @@ static int tap_open(struct inode *inode, q->sock.state = SS_CONNECTED; q->sock.file = file; q->sock.ops = &tap_socket_ops; - sock_init_data_uid(&q->sock, &q->sk, inode->i_uid); + sock_init_data_uid(&q->sock, &q->sk, current_fsuid()); q->sk.sk_write_space = tap_sock_write_space; q->sk.sk_destruct = tap_sock_destruct; q->flags = IFF_VNET_HDR | IFF_NO_PI | IFF_TAP;
From: Paul Fertser fercerpav@gmail.com
commit 421033deb91521aa6a9255e495cb106741a52275 upstream.
On DBDC devices the first (internal) phy is only capable of using the 2.4 GHz band, and the 5 GHz band is exposed via a separate phy object, so avoid the false advertising.
Reported-by: Rani Hod rani.hod@gmail.com
Closes: https://github.com/openwrt/openwrt/pull/12361
Fixes: 7660a1bd0c22 ("mt76: mt7615: register ext_phy if DBDC is detected")
Cc: stable@vger.kernel.org
Signed-off-by: Paul Fertser fercerpav@gmail.com
Reviewed-by: Simon Horman simon.horman@corigine.com
Acked-by: Felix Fietkau nbd@nbd.name
Signed-off-by: Kalle Valo kvalo@kernel.org
Link: https://lore.kernel.org/r/20230605073408.8699-1-fercerpav@gmail.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/wireless/mediatek/mt76/mt7615/eeprom.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/net/wireless/mediatek/mt76/mt7615/eeprom.c +++ b/drivers/net/wireless/mediatek/mt76/mt7615/eeprom.c @@ -123,12 +123,12 @@ mt7615_eeprom_parse_hw_band_cap(struct m case MT_EE_5GHZ: dev->mphy.cap.has_5ghz = true; break; - case MT_EE_2GHZ: - dev->mphy.cap.has_2ghz = true; - break; case MT_EE_DBDC: dev->dbdc_support = true; fallthrough; + case MT_EE_2GHZ: + dev->mphy.cap.has_2ghz = true; + break; default: dev->mphy.cap.has_2ghz = true; dev->mphy.cap.has_5ghz = true;
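The effect of moving the DBDC case so that it falls through into the 2.4 GHz case rather than into the default case can be sketched in standalone C. The enum, struct and function names below are hypothetical stand-ins for the driver's types, not the driver's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's EEPROM band-selection values. */
enum demo_band_sel {
	DEMO_EE_DUAL_BAND,
	DEMO_EE_5GHZ,
	DEMO_EE_2GHZ,
	DEMO_EE_DBDC,
};

struct demo_caps {
	bool has_2ghz;
	bool has_5ghz;
	bool dbdc_support;
};

/* Mirrors the case ordering after the patch. */
static struct demo_caps demo_parse_band_cap(enum demo_band_sel sel)
{
	struct demo_caps caps = { false, false, false };

	switch (sel) {
	case DEMO_EE_5GHZ:
		caps.has_5ghz = true;
		break;
	case DEMO_EE_DBDC:
		caps.dbdc_support = true;
		/* fall through: the first phy of a DBDC device is 2.4 GHz only */
	case DEMO_EE_2GHZ:
		caps.has_2ghz = true;
		break;
	default:
		caps.has_2ghz = true;
		caps.has_5ghz = true;
		break;
	}
	return caps;
}
```

With the old ordering the DBDC case fell through into `default` and set both band flags; with the new ordering it sets only `has_2ghz` on the first phy.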
From: Michael Kelley mikelley@microsoft.com
commit d5ace2a776442d80674eff9ed42e737f7dd95056 upstream.
On hardware that supports Indirect Branch Tracking (IBT), Hyper-V VMs with ConfigVersion 9.3 or later support IBT in the guest. However, current versions of Hyper-V have a bug in that there's not an ENDBR64 instruction at the beginning of the hypercall page. Since hypercalls are made with an indirect call to the hypercall page, all hypercall attempts fail with an exception and Linux panics.
A Hyper-V fix is in progress to add ENDBR64. But guard against the Linux panic by clearing X86_FEATURE_IBT if the hypercall page doesn't start with ENDBR. The VM will boot and run without IBT.
If future Linux 32-bit kernels were to support IBT, additional hypercall page hackery would be needed to make IBT work for such kernels in a Hyper-V VM.
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley mikelley@microsoft.com
Link: https://lore.kernel.org/r/1690001476-98594-1-git-send-email-mikelley@microso...
Signed-off-by: Wei Liu wei.liu@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/hyperv/hv_init.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
--- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -14,6 +14,7 @@ #include <asm/apic.h> #include <asm/desc.h> #include <asm/sev.h> +#include <asm/ibt.h> #include <asm/hypervisor.h> #include <asm/hyperv-tlfs.h> #include <asm/mshyperv.h> @@ -468,6 +469,26 @@ void __init hyperv_init(void) }
/* + * Some versions of Hyper-V that provide IBT in guest VMs have a bug + * in that there's no ENDBR64 instruction at the entry to the + * hypercall page. Because hypercalls are invoked via an indirect call + * to the hypercall page, all hypercall attempts fail when IBT is + * enabled, and Linux panics. For such buggy versions, disable IBT. + * + * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall + * page, so if future Linux kernel versions enable IBT for 32-bit + * builds, additional hypercall page hackery will be required here + * to provide an ENDBR32. + */ +#ifdef CONFIG_X86_KERNEL_IBT + if (cpu_feature_enabled(X86_FEATURE_IBT) && + *(u32 *)hv_hypercall_pg != gen_endbr()) { + setup_clear_cpu_cap(X86_FEATURE_IBT); + pr_warn("Hyper-V: Disabling IBT because of Hyper-V bug\n"); + } +#endif + + /* * hyperv_init() is called before LAPIC is initialized: see * apic_intr_mode_init() -> x86_platform.apic_post_init() and * apic_bsp_setup() -> setup_local_APIC(). The direct-mode STIMER
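The check in this patch amounts to comparing the first four bytes of the hypercall page with the ENDBR64 opcode. A minimal userspace sketch of that comparison follows; the `demo_` names are hypothetical, and the byte sequence f3 0f 1e fa is the architectural encoding of ENDBR64. The kernel compares a `u32` against `gen_endbr()`; a byte-wise `memcmp()` is used here only to keep the sketch independent of endianness.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* ENDBR64 is encoded as the four bytes f3 0f 1e fa. */
static const unsigned char demo_endbr64[4] = { 0xf3, 0x0f, 0x1e, 0xfa };

/* True if the code at @entry begins with an ENDBR64 instruction. */
static bool demo_starts_with_endbr64(const void *entry)
{
	assert(entry != NULL);
	return memcmp(entry, demo_endbr64, sizeof(demo_endbr64)) == 0;
}
```

When a check of this kind fails, the patch clears X86_FEATURE_IBT so that indirect calls into the page no longer fault.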
From: Ilya Dryomov idryomov@gmail.com
commit 9d01e07fd1bfb4daae156ab528aa196f5ac2b2bc upstream.
Due to rbd_try_acquire_lock() effectively swallowing all but EBLOCKLISTED error from rbd_try_lock() ("request lock anyway") and rbd_request_lock() returning ETIMEDOUT error not only for an actual notify timeout but also when the lock owner doesn't respond, a busy loop inside of rbd_acquire_lock() between rbd_try_acquire_lock() and rbd_request_lock() is possible.
Requesting the lock on EBUSY error (returned by get_lock_owner_info() if an incompatible lock or invalid lock owner is detected) makes very little sense. The same goes for ETIMEDOUT error (might pop up pretty much anywhere if osd_request_timeout option is set) and many others.
Just fail I/O requests on rbd_dev->acquiring_list immediately on any error from rbd_try_lock().
Cc: stable@vger.kernel.org # 588159009d5b: rbd: retrieve and check lock owner twice before blocklisting
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov idryomov@gmail.com
Reviewed-by: Dongsheng Yang dongsheng.yang@easystack.cn
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/block/rbd.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)
--- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -3676,7 +3676,7 @@ static int rbd_lock(struct rbd_device *r ret = ceph_cls_lock(osdc, &rbd_dev->header_oid, &rbd_dev->header_oloc, RBD_LOCK_NAME, CEPH_CLS_LOCK_EXCLUSIVE, cookie, RBD_LOCK_TAG, "", 0); - if (ret) + if (ret && ret != -EEXIST) return ret;
__rbd_lock(rbd_dev, cookie); @@ -3879,7 +3879,7 @@ static struct ceph_locker *get_lock_owne &rbd_dev->header_oloc, RBD_LOCK_NAME, &lock_type, &lock_tag, &lockers, &num_lockers); if (ret) { - rbd_warn(rbd_dev, "failed to retrieve lockers: %d", ret); + rbd_warn(rbd_dev, "failed to get header lockers: %d", ret); return ERR_PTR(ret); }
@@ -3941,8 +3941,10 @@ static int find_watcher(struct rbd_devic ret = ceph_osdc_list_watchers(osdc, &rbd_dev->header_oid, &rbd_dev->header_oloc, &watchers, &num_watchers); - if (ret) + if (ret) { + rbd_warn(rbd_dev, "failed to get watchers: %d", ret); return ret; + }
sscanf(locker->id.cookie, RBD_LOCK_COOKIE_PREFIX " %llu", &cookie); for (i = 0; i < num_watchers; i++) { @@ -3986,8 +3988,12 @@ static int rbd_try_lock(struct rbd_devic locker = refreshed_locker = NULL;
ret = rbd_lock(rbd_dev); - if (ret != -EBUSY) + if (!ret) + goto out; + if (ret != -EBUSY) { + rbd_warn(rbd_dev, "failed to lock header: %d", ret); goto out; + }
/* determine if the current lock holder is still alive */ locker = get_lock_owner_info(rbd_dev); @@ -4090,11 +4096,8 @@ static int rbd_try_acquire_lock(struct r
ret = rbd_try_lock(rbd_dev); if (ret < 0) { - rbd_warn(rbd_dev, "failed to lock header: %d", ret); - if (ret == -EBLOCKLISTED) - goto out; - - ret = 1; /* request lock anyway */ + rbd_warn(rbd_dev, "failed to acquire lock: %d", ret); + goto out; } if (ret > 0) { up_write(&rbd_dev->lock_rwsem); @@ -6628,12 +6631,11 @@ static int rbd_add_acquire_lock(struct r cancel_delayed_work_sync(&rbd_dev->lock_dwork); if (!ret) ret = -ETIMEDOUT; - }
- if (ret) { - rbd_warn(rbd_dev, "failed to acquire exclusive lock: %ld", ret); - return ret; + rbd_warn(rbd_dev, "failed to acquire lock: %ld", ret); } + if (ret) + return ret;
/* * The lock may have been released by now, unless automatic lock
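The change in error policy inside rbd_try_acquire_lock() can be summarised with a small sketch. It is a simplification under stated assumptions: the `demo_` names are hypothetical, `DEMO_EBLOCKLISTED` is mapped to `ESHUTDOWN` purely for illustration, and the return convention (negative: fail queued I/O, 0: lock acquired, 1: go on to request the lock from its owner) is inferred from the diff above.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in only; EBLOCKLISTED itself is a kernel-internal definition. */
#define DEMO_EBLOCKLISTED ESHUTDOWN

/* Before the patch: every error except "blocklisted" was swallowed. */
static int demo_old_policy(int try_lock_ret)
{
	if (try_lock_ret < 0 && try_lock_ret != -DEMO_EBLOCKLISTED)
		return 1;	/* "request lock anyway" */
	return try_lock_ret;
}

/* After the patch: any error is passed up and fails the queued I/O. */
static int demo_new_policy(int try_lock_ret)
{
	return try_lock_ret;
}
```

Under the old policy an error such as -ETIMEDOUT turned into another lock request, which could itself time out and loop; under the new policy the same error ends the attempt.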
From: Jiri Olsa jolsa@kernel.org
commit d62cc390c2e99ae267ffe4b8d7e2e08b6c758c32 upstream.
We received a report [1] of a kernel crash, which is caused by using the nesting protection without disabling preemption.
bpf_event_output() can be called by programs executed by the bpf_prog_run_array_cg() function, which disables migration but keeps preemption enabled.
This can cause a task to be preempted by another one inside the nesting protection, eventually leading to two tasks using the same perf_sample_data buffer and causing crashes like:
  BUG: kernel NULL pointer dereference, address: 0000000000000001
  #PF: supervisor instruction fetch in kernel mode
  #PF: error_code(0x0010) - not-present page
  ...
  ? perf_output_sample+0x12a/0x9a0
  ? finish_task_switch.isra.0+0x81/0x280
  ? perf_event_output+0x66/0xa0
  ? bpf_event_output+0x13a/0x190
  ? bpf_event_output_data+0x22/0x40
  ? bpf_prog_dfc84bbde731b257_cil_sock4_connect+0x40a/0xacb
  ? xa_load+0x87/0xe0
  ? __cgroup_bpf_run_filter_sock_addr+0xc1/0x1a0
  ? release_sock+0x3e/0x90
  ? sk_setsockopt+0x1a1/0x12f0
  ? udp_pre_connect+0x36/0x50
  ? inet_dgram_connect+0x93/0xa0
  ? __sys_connect+0xb4/0xe0
  ? udp_setsockopt+0x27/0x40
  ? __pfx_udp_push_pending_frames+0x10/0x10
  ? __sys_setsockopt+0xdf/0x1a0
  ? __x64_sys_connect+0xf/0x20
  ? do_syscall_64+0x3a/0x90
  ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
Fix this by disabling preemption in bpf_event_output().
[1] https://github.com/cilium/cilium/issues/26756

Cc: stable@vger.kernel.org
Reported-by: Oleg "livelace" Popov o.popov@livelace.ru
Closes: https://github.com/cilium/cilium/issues/26756
Fixes: 2a916f2f546c ("bpf: Use migrate_disable/enable in array macros and cgroup/lirc code.")
Acked-by: Hou Tao houtao1@huawei.com
Signed-off-by: Jiri Olsa jolsa@kernel.org
Link: https://lore.kernel.org/r/20230725084206.580930-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov ast@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/trace/bpf_trace.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -720,7 +720,6 @@ static DEFINE_PER_CPU(struct bpf_trace_s u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy) { - int nest_level = this_cpu_inc_return(bpf_event_output_nest_level); struct perf_raw_frag frag = { .copy = ctx_copy, .size = ctx_size, @@ -737,8 +736,12 @@ u64 bpf_event_output(struct bpf_map *map }; struct perf_sample_data *sd; struct pt_regs *regs; + int nest_level; u64 ret;
+ preempt_disable(); + nest_level = this_cpu_inc_return(bpf_event_output_nest_level); + if (WARN_ON_ONCE(nest_level > ARRAY_SIZE(bpf_misc_sds.sds))) { ret = -EBUSY; goto out; @@ -754,6 +757,7 @@ u64 bpf_event_output(struct bpf_map *map ret = __bpf_perf_event_output(regs, map, flags, sd); out: this_cpu_dec(bpf_event_output_nest_level); + preempt_enable(); return ret; }
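The nesting protection that this patch makes safe can be illustrated with a single-threaded userspace sketch. All names are hypothetical, the depth of 3 is an assumption of the sketch, and the plain globals stand in for one CPU's per-CPU data. What the sketch cannot show is the race itself: in the kernel, a task preempted between the counter increment and the decrement lets a second task on the same CPU observe a stale level, which is why the patch brackets the section with preempt_disable() and preempt_enable().

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_NEST 3		/* assumed nesting depth for this sketch */

struct demo_sample {
	char payload[64];
};

/* Stand-ins for one CPU's per-CPU buffers and nesting counter. */
static struct demo_sample demo_bufs[DEMO_MAX_NEST];
static int demo_nest_level;

/*
 * Claim the buffer for the current nesting level, or return NULL when
 * nested too deeply. The caller must call demo_buf_put() in both cases,
 * mirroring the "goto out" path in the kernel function.
 */
static struct demo_sample *demo_buf_get(void)
{
	int level = ++demo_nest_level;

	if (level > DEMO_MAX_NEST)
		return NULL;
	return &demo_bufs[level - 1];
}

/* Release the most recently claimed nesting level. */
static void demo_buf_put(void)
{
	assert(demo_nest_level > 0);
	demo_nest_level--;
}
```

Each nesting level gets its own buffer only as long as increments and decrements on one CPU are strictly nested, which is the property that disabling preemption restores.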
From: Naveen N Rao naveen@kernel.org
commit 41a506ef71eb38d94fe133f565c87c3e06ccc072 upstream.
With ppc64 -mprofile-kernel and ppc32 -pg, profiling instructions to call into ftrace are emitted right at function entry. The instruction sequence used is minimal to reduce overhead. Crucially, a stackframe is not created for the function being traced. This breaks stack unwinding since the function being traced does not have a stackframe for itself. As such, it never shows up in the backtrace:
  /sys/kernel/debug/tracing # echo 1 > /proc/sys/kernel/stack_tracer_enabled
  /sys/kernel/debug/tracing # cat stack_trace
          Depth    Size   Location    (17 entries)
          -----    ----   --------
    0)     4144      32   ftrace_call+0x4/0x44
    1)     4112     432   get_page_from_freelist+0x26c/0x1ad0
    2)     3680     496   __alloc_pages+0x290/0x1280
    3)     3184     336   __folio_alloc+0x34/0x90
    4)     2848     176   vma_alloc_folio+0xd8/0x540
    5)     2672     272   __handle_mm_fault+0x700/0x1cc0
    6)     2400     208   handle_mm_fault+0xf0/0x3f0
    7)     2192      80   ___do_page_fault+0x3e4/0xbe0
    8)     2112     160   do_page_fault+0x30/0xc0
    9)     1952     256   data_access_common_virt+0x210/0x220
   10)     1696     400   0xc00000000f16b100
   11)     1296     384   load_elf_binary+0x804/0x1b80
   12)      912     208   bprm_execve+0x2d8/0x7e0
   13)      704      64   do_execveat_common+0x1d0/0x2f0
   14)      640     160   sys_execve+0x54/0x70
   15)      480      64   system_call_exception+0x138/0x350
   16)      416     416   system_call_common+0x160/0x2c4
Fix this by having ftrace create a dummy stackframe for the function being traced. With this, backtraces now capture the function being traced:
  /sys/kernel/debug/tracing # cat stack_trace
          Depth    Size   Location    (17 entries)
          -----    ----   --------
    0)     3888      32   _raw_spin_trylock+0x8/0x70
    1)     3856     576   get_page_from_freelist+0x26c/0x1ad0
    2)     3280      64   __alloc_pages+0x290/0x1280
    3)     3216     336   __folio_alloc+0x34/0x90
    4)     2880     176   vma_alloc_folio+0xd8/0x540
    5)     2704     416   __handle_mm_fault+0x700/0x1cc0
    6)     2288      96   handle_mm_fault+0xf0/0x3f0
    7)     2192      48   ___do_page_fault+0x3e4/0xbe0
    8)     2144     192   do_page_fault+0x30/0xc0
    9)     1952     608   data_access_common_virt+0x210/0x220
   10)     1344      16   0xc0000000334bbb50
   11)     1328     416   load_elf_binary+0x804/0x1b80
   12)      912      64   bprm_execve+0x2d8/0x7e0
   13)      848     176   do_execveat_common+0x1d0/0x2f0
   14)      672     192   sys_execve+0x54/0x70
   15)      480      64   system_call_exception+0x138/0x350
   16)      416     416   system_call_common+0x160/0x2c4
This results in two additional stores in the ftrace entry code, but produces reliable backtraces.
Fixes: 153086644fd1 ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI")
Cc: stable@vger.kernel.org
Signed-off-by: Naveen N Rao naveen@kernel.org
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://msgid.link/20230621051349.759567-1-naveen@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/kernel/trace/ftrace_mprofile.S | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/arch/powerpc/kernel/trace/ftrace_mprofile.S +++ b/arch/powerpc/kernel/trace/ftrace_mprofile.S @@ -33,6 +33,9 @@ * and then arrange for the ftrace function to be called. */ .macro ftrace_regs_entry allregs + /* Create a minimal stack frame for representing B */ + PPC_STLU r1, -STACK_FRAME_MIN_SIZE(r1) + /* Create our stack frame + pt_regs */ PPC_STLU r1,-SWITCH_FRAME_SIZE(r1)
@@ -42,7 +45,7 @@
#ifdef CONFIG_PPC64 /* Save the original return address in A's stack frame */ - std r0, LRSAVE+SWITCH_FRAME_SIZE(r1) + std r0, LRSAVE+SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE(r1) /* Ok to continue? */ lbz r3, PACA_FTRACE_ENABLED(r13) cmpdi r3, 0 @@ -77,6 +80,8 @@ mflr r7 /* Save it as pt_regs->nip */ PPC_STL r7, _NIP(r1) + /* Also save it in B's stackframe header for proper unwind */ + PPC_STL r7, LRSAVE+SWITCH_FRAME_SIZE(r1) /* Save the read LR in pt_regs->link */ PPC_STL r0, _LINK(r1)
@@ -142,7 +147,7 @@ #endif
/* Pop our stack frame */ - addi r1, r1, SWITCH_FRAME_SIZE + addi r1, r1, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE
#ifdef CONFIG_LIVEPATCH_64 /* Based on the cmpd above, if the NIP was altered handle livepatch */
From: Mark Brown broonie@kernel.org
commit 69af56ae56a48a2522aad906c4461c6c7c092737 upstream.
We have a function sve_sync_from_fpsimd_zeropad() which is used by the ptrace code to update the SVE state when the user writes to the FPSIMD register set. Currently this checks that the task has SVE enabled, but this will miss updates for tasks which have streaming SVE enabled if SVE has not been enabled for the thread. Also do the conversion if the task has streaming SVE enabled.
Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Signed-off-by: Mark Brown broonie@kernel.org
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-ssve-no-sve-v1-3-49df214...
Signed-off-by: Catalin Marinas catalin.marinas@arm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/kernel/fpsimd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -791,7 +791,8 @@ void sve_sync_from_fpsimd_zeropad(struct void *sst = task->thread.sve_state; struct user_fpsimd_state const *fst = &task->thread.uw.fpsimd_state;
- if (!test_tsk_thread_flag(task, TIF_SVE)) + if (!test_tsk_thread_flag(task, TIF_SVE) && + !thread_sm_enabled(&task->thread)) return;
vq = sve_vq_from_vl(thread_get_cur_vl(&task->thread));
From: Mark Brown broonie@kernel.org
commit c9bb40b7f786662e33d71afe236442b0b61f0446 upstream.
When setting SME vector lengths we clear TIF_SME to reenable SME traps, doing a reallocation of the backing storage on next use. We do this using clear_thread_flag() which operates on the current thread, meaning that when setting the vector length via ptrace we may both not force traps for the target task and force a spurious flush of any SME state that the tracing task may have.
Clear the flag in the target task.
Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Reported-by: David Spickett David.Spickett@arm.com
Signed-off-by: Mark Brown broonie@kernel.org
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-tif-sme-v1-1-88312fd6fbf...
Signed-off-by: Catalin Marinas catalin.marinas@arm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/kernel/fpsimd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -864,7 +864,7 @@ int vec_set_vector_length(struct task_st */ task->thread.svcr &= ~(SVCR_SM_MASK | SVCR_ZA_MASK); - clear_thread_flag(TIF_SME); + clear_tsk_thread_flag(task, TIF_SME); free_sme = true; } }
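The difference between clearing a flag on the running task and clearing it on a named task is easy to lose in a one-line diff. The sketch below models it in userspace; the `demo_` types and helpers are hypothetical stand-ins, not the kernel's thread-flag API.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_TIF_SME (1u << 0)	/* hypothetical flag bit */

struct demo_task {
	unsigned int flags;
};

/* Stand-in for the kernel's notion of the currently running task. */
static struct demo_task *demo_current;

/* Clears the flag on whichever task is running: wrong for a ptrace target. */
static void demo_clear_thread_flag(unsigned int flag)
{
	assert(demo_current != NULL);
	demo_current->flags &= ~flag;
}

/* Clears the flag on the task that was passed in. */
static void demo_clear_tsk_thread_flag(struct demo_task *task, unsigned int flag)
{
	assert(task != NULL);
	task->flags &= ~flag;
}
```

When a tracer sets the vector length of a tracee, the first helper would modify the tracer and leave the tracee untouched, which matches the two symptoms the commit message describes.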
From: Mark Brown broonie@kernel.org
commit 507ea5dd92d23fcf10e4d1a68a443c86a49753ed upstream.
Currently we guard FPSIMD/SVE state conversions with a check for the system supporting SVE, but SME-only systems may need to sync streaming mode SVE state, so add a check for SME support too. These functions are only used by the ptrace code.
Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Signed-off-by: Mark Brown broonie@kernel.org
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-ssve-no-sve-v1-2-49df214...
Signed-off-by: Catalin Marinas catalin.marinas@arm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/kernel/fpsimd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -634,7 +634,7 @@ static void fpsimd_to_sve(struct task_st void *sst = task->thread.sve_state; struct user_fpsimd_state const *fst = &task->thread.uw.fpsimd_state;
- if (!system_supports_sve()) + if (!system_supports_sve() && !system_supports_sme()) return;
vq = sve_vq_from_vl(thread_get_cur_vl(&task->thread)); @@ -660,7 +660,7 @@ static void sve_to_fpsimd(struct task_st unsigned int i; __uint128_t const *p;
- if (!system_supports_sve()) + if (!system_supports_sve() && !system_supports_sme()) return;
vl = thread_get_cur_vl(&task->thread);
From: Aleksa Sarai cyphar@cyphar.com
commit a0fc452a5d7fed986205539259df1d60546f536c upstream.
O_TMPFILE is actually __O_TMPFILE|O_DIRECTORY. This means that the old fast-path check for RESOLVE_CACHED would reject all users passing O_DIRECTORY with -EAGAIN, when in fact the intended test was to check for __O_TMPFILE.
Cc: stable@vger.kernel.org # v5.12+
Fixes: 99668f618062 ("fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED")
Signed-off-by: Aleksa Sarai cyphar@cyphar.com
Message-Id: 20230806-resolve_cached-o_tmpfile-v1-1-7ba16308465e@cyphar.com
Signed-off-by: Christian Brauner brauner@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/open.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/open.c +++ b/fs/open.c @@ -1233,7 +1233,7 @@ inline int build_open_flags(const struct lookup_flags |= LOOKUP_IN_ROOT; if (how->resolve & RESOLVE_CACHED) { /* Don't bother even trying for create/truncate/tmpfile open */ - if (flags & (O_TRUNC | O_CREAT | O_TMPFILE)) + if (flags & (O_TRUNC | O_CREAT | __O_TMPFILE)) return -EAGAIN; lookup_flags |= LOOKUP_CACHED; }
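The flag arithmetic behind this fix can be checked in isolation. The sketch below uses hypothetical `DEMO_` constants carrying the values from the kernel's asm-generic fcntl.h (some architectures use different values), so that it does not depend on how a particular libc defines O_TMPFILE.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as in asm-generic/fcntl.h; names are hypothetical stand-ins. */
#define DEMO_O_CREAT		00000100u
#define DEMO_O_TRUNC		00001000u
#define DEMO_O_DIRECTORY	00200000u
#define DEMO_RAW_TMPFILE	020000000u	/* the kernel's __O_TMPFILE */
#define DEMO_O_TMPFILE		(DEMO_RAW_TMPFILE | DEMO_O_DIRECTORY)

/* Old fast-path test: also matches a plain O_DIRECTORY open. */
static bool demo_rejects_old(unsigned int flags)
{
	return (flags & (DEMO_O_TRUNC | DEMO_O_CREAT | DEMO_O_TMPFILE)) != 0;
}

/* New fast-path test: matches only the tmpfile bit itself. */
static bool demo_rejects_new(unsigned int flags)
{
	return (flags & (DEMO_O_TRUNC | DEMO_O_CREAT | DEMO_RAW_TMPFILE)) != 0;
}
```

Because the composite O_TMPFILE mask contains the O_DIRECTORY bit, any bitwise AND against it is non-zero for a directory open, which is how RESOLVE_CACHED ended up returning -EAGAIN for O_DIRECTORY users.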
From: Guchun Chen guchun.chen@amd.com
commit 2dedcf414bb01b8d966eb445db1d181d92304fb2 upstream.
Add a check to avoid null pointer dereference as below:
[ 90.002283] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 90.002292] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[ 90.002346] ? exc_general_protection+0x159/0x240
[ 90.002352] ? asm_exc_general_protection+0x26/0x30
[ 90.002357] ? ttm_bo_evict_swapout_allowable+0x322/0x5e0 [ttm]
[ 90.002365] ? ttm_bo_evict_swapout_allowable+0x42e/0x5e0 [ttm]
[ 90.002373] ttm_bo_swapout+0x134/0x7f0 [ttm]
[ 90.002383] ? __pfx_ttm_bo_swapout+0x10/0x10 [ttm]
[ 90.002391] ? lock_acquire+0x44d/0x4f0
[ 90.002398] ? ttm_device_swapout+0xa5/0x260 [ttm]
[ 90.002412] ? lock_acquired+0x355/0xa00
[ 90.002416] ? do_raw_spin_trylock+0xb6/0x190
[ 90.002421] ? __pfx_lock_acquired+0x10/0x10
[ 90.002426] ? ttm_global_swapout+0x25/0x210 [ttm]
[ 90.002442] ttm_device_swapout+0x198/0x260 [ttm]
[ 90.002456] ? __pfx_ttm_device_swapout+0x10/0x10 [ttm]
[ 90.002472] ttm_global_swapout+0x75/0x210 [ttm]
[ 90.002486] ttm_tt_populate+0x187/0x3f0 [ttm]
[ 90.002501] ttm_bo_handle_move_mem+0x437/0x590 [ttm]
[ 90.002517] ttm_bo_validate+0x275/0x430 [ttm]
[ 90.002530] ? __pfx_ttm_bo_validate+0x10/0x10 [ttm]
[ 90.002544] ? kasan_save_stack+0x33/0x60
[ 90.002550] ? kasan_set_track+0x25/0x30
[ 90.002554] ? __kasan_kmalloc+0x8f/0xa0
[ 90.002558] ? amdgpu_gtt_mgr_new+0x81/0x420 [amdgpu]
[ 90.003023] ? ttm_resource_alloc+0xf6/0x220 [ttm]
[ 90.003038] amdgpu_bo_pin_restricted+0x2dd/0x8b0 [amdgpu]
[ 90.003210] ? __x64_sys_ioctl+0x131/0x1a0
[ 90.003210] ? do_syscall_64+0x60/0x90
Fixes: a2848d08742c ("drm/ttm: never consider pinned BOs for eviction&swap")
Tested-by: Mikhail Gavrilov mikhail.v.gavrilov@gmail.com
Signed-off-by: Guchun Chen guchun.chen@amd.com
Reviewed-by: Alex Deucher alexander.deucher@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20230724024229.1118444-1-guchu...
Signed-off-by: Christian König christian.koenig@amd.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/ttm/ttm_bo.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -552,7 +552,8 @@ static bool ttm_bo_evict_swapout_allowab
if (bo->pin_count) { *locked = false; - *busy = false; + if (busy) + *busy = false; return false; }
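The fix follows the usual pattern for an optional output parameter: write through the pointer only when the caller supplied one. A minimal sketch of that pattern follows. The names are hypothetical and the function body is reduced to the pinned-buffer branch; that some callers pass NULL for `busy` is inferred from the patch and the trace above, not from the full source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_bo {
	unsigned int pin_count;
};

/*
 * Returns whether the buffer may be evicted or swapped out. @locked is a
 * mandatory output; @busy is optional and may be NULL.
 */
static bool demo_evict_allowable(const struct demo_bo *bo, bool *locked,
				 bool *busy)
{
	assert(bo != NULL && locked != NULL);

	if (bo->pin_count) {
		*locked = false;
		if (busy)	/* guard added by the patch */
			*busy = false;
		return false;
	}

	*locked = true;
	if (busy)
		*busy = false;
	return true;
}
```

Without the guard, the pinned branch dereferences a NULL `busy`, which is consistent with the KASAN null-ptr-deref report in the commit message.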
From: Janusz Krzysztofik janusz.krzysztofik@linux.intel.com
commit a337b64f0d5717248a0c894e2618e658e6a9de9f upstream.
Infinite waits for completion of GPU activity have been observed in CI, mostly inside __i915_active_wait(), triggered by igt@gem_barrier_race or igt@perf@stress-open-close. Root cause analysis, based on ftrace dumps generated with a lot of extra trace_printk() calls added to the code, revealed loops of request dependencies being accidentally built, preventing the requests from being processed, each waiting for completion of another one's activity.
After we substitute a new request for a last active one tracked on a timeline, we set up a dependency of our new request to wait on completion of current activity of that previous one. While doing that, we must take care to keep the old request in memory until we have used its attributes for setting up that await dependency. Otherwise we may set up the await dependency on an unrelated request that already reuses the memory previously allocated to the old one, which has already been released. Combined with perf adding consecutive kernel context remote requests to different user context timelines, unresolvable loops of await dependencies can be built, leading to infinite waits.
We obtain a pointer to the previous request to wait upon when we substitute it with a pointer to our new request in an active tracker, e.g. in intel_timeline.last_request. In some processing paths we protect that old request from being freed before we use it by getting a reference to it under RCU protection, but in others, e.g. __i915_request_commit() -> __i915_request_add_to_timeline() -> __i915_request_ensure_ordering(), we don't. But anyway, since the requests' memory is SLAB_TYPESAFE_BY_RCU, that RCU protection is not sufficient against reuse of memory.
We could protect i915_request's memory from being prematurely reused by calling its release function via call_rcu() and using rcu_read_lock() consistently, as proposed in v1. However, that approach leads to a significant (up to 10 times) increase in SLAB utilization by the i915_request SLAB cache. Another potential approach is to take a reference to the previous active fence.
When updating an active fence tracker, we first lock the new fence, substitute a pointer of the current active fence with the new one, then we lock the substituted fence. With this approach, there is a time window after the substitution and before the lock when the request can be concurrently released by an interrupt handler and its memory reused, then we may happen to lock and return a new, unrelated request.
Always get a reference to the current active fence first, before replacing it with a new one. Having it protected from premature release and reuse, lock it and then replace with the new one but only if not yet signalled via a potential concurrent interrupt nor replaced with another one by a potential concurrent thread, otherwise retry, starting from getting a reference to the new current one. Adjust users to not get a reference to the previous active fence themselves and always put the reference got by __i915_active_fence_set() when no longer needed.
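The "take a reference first, then replace only if unchanged, otherwise retry" scheme described above can be sketched with C11 atomics. This is a deliberately reduced model under stated assumptions: the `demo_` names are hypothetical, the fence locks and the signalled check are omitted, and the plain load-then-increment in `demo_fence_get()` is not safe against a concurrent free on its own (the kernel relies on its own RCU-aware helper for that step).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_fence {
	atomic_int refcount;
};

/* Take a reference if @f is non-NULL; returns @f. */
static struct demo_fence *demo_fence_get(struct demo_fence *f)
{
	if (f)
		atomic_fetch_add(&f->refcount, 1);
	return f;
}

/* Drop a reference if @f is non-NULL. */
static void demo_fence_put(struct demo_fence *f)
{
	if (f)
		atomic_fetch_sub(&f->refcount, 1);
}

/*
 * Install @next in @slot. Returns the previous occupant with a reference
 * held (the caller must put it), or NULL if the slot was empty.
 */
static struct demo_fence *
demo_fence_set(_Atomic(struct demo_fence *) *slot, struct demo_fence *next)
{
	struct demo_fence *prev = demo_fence_get(atomic_load(slot));
	struct demo_fence *expected = prev;

	if (prev == next)
		return prev;

	/* Replace only if the slot still holds the fence we referenced. */
	while (!atomic_compare_exchange_strong(slot, &expected, next)) {
		demo_fence_put(prev);
		prev = demo_fence_get(atomic_load(slot));
		expected = prev;
	}
	return prev;
}
```

The point of the ordering is that the returned pointer is pinned by a reference before it is published to the caller, so it cannot be freed and reused by an unrelated request while the caller sets up its await dependency.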
v3: Fix lockdep splat reports and other issues caused by incorrect use of try_cmpxchg() (use (cmpxchg() != prev) instead)
v2: Protect request's memory by getting a reference to it in favor of delegating its release to call_rcu() (Chris)
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8211
Fixes: df9f85d8582e ("drm/i915: Serialise i915_active_fence_set() with itself")
Suggested-by: Chris Wilson chris@chris-wilson.co.uk
Signed-off-by: Janusz Krzysztofik janusz.krzysztofik@linux.intel.com
Cc: stable@vger.kernel.org # v5.6+
Reviewed-by: Andi Shyti andi.shyti@linux.intel.com
Signed-off-by: Andi Shyti andi.shyti@linux.intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20230720093543.832147-2-janusz...
(cherry picked from commit 946e047a3d88d46d15b5c5af0414098e12b243f7)
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/i915/i915_active.c | 99 +++++++++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_request.c | 11 ++++
 2 files changed, 81 insertions(+), 29 deletions(-)
--- a/drivers/gpu/drm/i915/i915_active.c +++ b/drivers/gpu/drm/i915/i915_active.c @@ -449,8 +449,11 @@ int i915_active_add_request(struct i915_ } } while (unlikely(is_barrier(active)));
- if (!__i915_active_fence_set(active, fence)) + fence = __i915_active_fence_set(active, fence); + if (!fence) __i915_active_acquire(ref); + else + dma_fence_put(fence);
out: i915_active_release(ref); @@ -469,13 +472,9 @@ __i915_active_set_fence(struct i915_acti return NULL; }
- rcu_read_lock(); prev = __i915_active_fence_set(active, fence); - if (prev) - prev = dma_fence_get_rcu(prev); - else + if (!prev) __i915_active_acquire(ref); - rcu_read_unlock();
return prev; } @@ -1019,10 +1018,11 @@ void i915_request_add_active_barriers(st * * Records the new @fence as the last active fence along its timeline in * this active tracker, moving the tracking callbacks from the previous - * fence onto this one. Returns the previous fence (if not already completed), - * which the caller must ensure is executed before the new fence. To ensure - * that the order of fences within the timeline of the i915_active_fence is - * understood, it should be locked by the caller. + * fence onto this one. Gets and returns a reference to the previous fence + * (if not already completed), which the caller must put after making sure + * that it is executed before the new fence. To ensure that the order of + * fences within the timeline of the i915_active_fence is understood, it + * should be locked by the caller. */ struct dma_fence * __i915_active_fence_set(struct i915_active_fence *active, @@ -1031,7 +1031,23 @@ __i915_active_fence_set(struct i915_acti struct dma_fence *prev; unsigned long flags;
- if (fence == rcu_access_pointer(active->fence)) + /* + * In case of fences embedded in i915_requests, their memory is + * SLAB_FAILSAFE_BY_RCU, then it can be reused right after release + * by new requests. Then, there is a risk of passing back a pointer + * to a new, completely unrelated fence that reuses the same memory + * while tracked under a different active tracker. Combined with i915 + * perf open/close operations that build await dependencies between + * engine kernel context requests and user requests from different + * timelines, this can lead to dependency loops and infinite waits. + * + * As a countermeasure, we try to get a reference to the active->fence + * first, so if we succeed and pass it back to our user then it is not + * released and potentially reused by an unrelated request before the + * user has a chance to set up an await dependency on it. + */ + prev = i915_active_fence_get(active); + if (fence == prev) return fence;
GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)); @@ -1040,27 +1056,56 @@ __i915_active_fence_set(struct i915_acti * Consider that we have two threads arriving (A and B), with * C already resident as the active->fence. * - * A does the xchg first, and so it sees C or NULL depending - * on the timing of the interrupt handler. If it is NULL, the - * previous fence must have been signaled and we know that - * we are first on the timeline. If it is still present, - * we acquire the lock on that fence and serialise with the interrupt - * handler, in the process removing it from any future interrupt - * callback. A will then wait on C before executing (if present). - * - * As B is second, it sees A as the previous fence and so waits for - * it to complete its transition and takes over the occupancy for - * itself -- remembering that it needs to wait on A before executing. + * Both A and B have got a reference to C or NULL, depending on the + * timing of the interrupt handler. Let's assume that if A has got C + * then it has locked C first (before B). * * Note the strong ordering of the timeline also provides consistent * nesting rules for the fence->lock; the inner lock is always the * older lock. */ spin_lock_irqsave(fence->lock, flags); - prev = xchg(__active_fence_slot(active), fence); - if (prev) { - GEM_BUG_ON(prev == fence); + if (prev) spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING); + + /* + * A does the cmpxchg first, and so it sees C or NULL, as before, or + * something else, depending on the timing of other threads and/or + * interrupt handler. If not the same as before then A unlocks C if + * applicable and retries, starting from an attempt to get a new + * active->fence. Meanwhile, B follows the same path as A. + * Once A succeeds with cmpxch, B fails again, retires, gets A from + * active->fence, locks it as soon as A completes, and possibly + * succeeds with cmpxchg. 
+ */ + while (cmpxchg(__active_fence_slot(active), prev, fence) != prev) { + if (prev) { + spin_unlock(prev->lock); + dma_fence_put(prev); + } + spin_unlock_irqrestore(fence->lock, flags); + + prev = i915_active_fence_get(active); + GEM_BUG_ON(prev == fence); + + spin_lock_irqsave(fence->lock, flags); + if (prev) + spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING); + } + + /* + * If prev is NULL then the previous fence must have been signaled + * and we know that we are first on the timeline. If it is still + * present then, having the lock on that fence already acquired, we + * serialise with the interrupt handler, in the process of removing it + * from any future interrupt callback. A will then wait on C before + * executing (if present). + * + * As B is second, it sees A as the previous fence and so waits for + * it to complete its transition and takes over the occupancy for + * itself -- remembering that it needs to wait on A before executing. + */ + if (prev) { __list_del_entry(&active->cb.node); spin_unlock(prev->lock); /* serialise with prev->cb_list */ } @@ -1077,11 +1122,7 @@ int i915_active_fence_set(struct i915_ac int err = 0;
/* Must maintain timeline ordering wrt previous active requests */ - rcu_read_lock(); fence = __i915_active_fence_set(active, &rq->fence); - if (fence) /* but the previous fence may not belong to that timeline! */ - fence = dma_fence_get_rcu(fence); - rcu_read_unlock(); if (fence) { err = i915_request_await_dma_fence(rq, fence); dma_fence_put(fence); --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1647,6 +1647,11 @@ __i915_request_ensure_parallel_ordering(
request_to_parent(rq)->parallel.last_rq = i915_request_get(rq);
+ /* + * Users have to put a reference potentially got by + * __i915_active_fence_set() to the returned request + * when no longer needed + */ return to_request(__i915_active_fence_set(&timeline->last_request, &rq->fence)); } @@ -1693,6 +1698,10 @@ __i915_request_ensure_ordering(struct i9 0); }
+ /* + * Users have to put the reference to prev potentially got + * by __i915_active_fence_set() when no longer needed + */ return prev; }
@@ -1736,6 +1745,8 @@ __i915_request_add_to_timeline(struct i9 prev = __i915_request_ensure_ordering(rq, timeline); else prev = __i915_request_ensure_parallel_ordering(rq, timeline); + if (prev) + i915_request_put(prev);
/* * Make sure that no request gazumped us - if it was allocated after
From: Andi Shyti andi.shyti@linux.intel.com
commit d14560ac1b595aa2e792365e91fea6aeaee66c2b upstream.
Fix the 'NV' definition postfix that is supposed to be INV.
Take the chance to also order the registers properly by address and name the GEN12_GFX_CCS_AUX_INV address GEN12_CCS_AUX_INV, like all the other similar registers.
Also remove the VD1, VD3 and VE1 registers, which don't exist, and add BCS0 and CCS0.
Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: stable@vger.kernel.org # v5.8+ Reviewed-by: Nirmoy Das nirmoy.das@intel.com Reviewed-by: Andrzej Hajda andrzej.hajda@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230725001950.1014671-2-andi.... (cherry picked from commit 2f0b927d3ca3440445975ebde27f3df1c3ed6f76) Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 8 ++++---- drivers/gpu/drm/i915/gt/intel_gt_regs.h | 16 ++++++++-------- drivers/gpu/drm/i915/gt/intel_lrc.c | 6 +++--- 3 files changed, 15 insertions(+), 15 deletions(-)
--- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c @@ -256,8 +256,8 @@ int gen12_emit_flush_rcs(struct i915_req
if (!HAS_FLAT_CCS(rq->engine->i915)) { /* hsdes: 1809175790 */ - cs = gen12_emit_aux_table_inv(rq->engine->gt, - cs, GEN12_GFX_CCS_AUX_NV); + cs = gen12_emit_aux_table_inv(rq->engine->gt, cs, + GEN12_CCS_AUX_INV); }
*cs++ = preparser_disable(false); @@ -317,10 +317,10 @@ int gen12_emit_flush_xcs(struct i915_req if (aux_inv) { /* hsdes: 1809175790 */ if (rq->engine->class == VIDEO_DECODE_CLASS) cs = gen12_emit_aux_table_inv(rq->engine->gt, - cs, GEN12_VD0_AUX_NV); + cs, GEN12_VD0_AUX_INV); else cs = gen12_emit_aux_table_inv(rq->engine->gt, - cs, GEN12_VE0_AUX_NV); + cs, GEN12_VE0_AUX_INV); }
if (mode & EMIT_INVALIDATE) --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -301,9 +301,11 @@ #define GEN8_PRIVATE_PAT_HI _MMIO(0x40e0 + 4) #define GEN10_PAT_INDEX(index) _MMIO(0x40e0 + (index) * 4) #define BSD_HWS_PGA_GEN7 _MMIO(0x4180) -#define GEN12_GFX_CCS_AUX_NV _MMIO(0x4208) -#define GEN12_VD0_AUX_NV _MMIO(0x4218) -#define GEN12_VD1_AUX_NV _MMIO(0x4228) + +#define GEN12_CCS_AUX_INV _MMIO(0x4208) +#define GEN12_VD0_AUX_INV _MMIO(0x4218) +#define GEN12_VE0_AUX_INV _MMIO(0x4238) +#define GEN12_BCS0_AUX_INV _MMIO(0x4248)
#define GEN8_RTCR _MMIO(0x4260) #define GEN8_M1TCR _MMIO(0x4264) @@ -311,14 +313,12 @@ #define GEN8_BTCR _MMIO(0x426c) #define GEN8_VTCR _MMIO(0x4270)
-#define GEN12_VD2_AUX_NV _MMIO(0x4298) -#define GEN12_VD3_AUX_NV _MMIO(0x42a8) -#define GEN12_VE0_AUX_NV _MMIO(0x4238) - #define BLT_HWS_PGA_GEN7 _MMIO(0x4280)
-#define GEN12_VE1_AUX_NV _MMIO(0x42b8) +#define GEN12_VD2_AUX_INV _MMIO(0x4298) +#define GEN12_CCS0_AUX_INV _MMIO(0x42c8) #define AUX_INV REG_BIT(0) + #define VEBOX_HWS_PGA_GEN7 _MMIO(0x4380)
#define GEN12_AUX_ERR_DBG _MMIO(0x43f4) --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -1299,7 +1299,7 @@ gen12_emit_indirect_ctx_rcs(const struct /* hsdes: 1809175790 */ if (!HAS_FLAT_CCS(ce->engine->i915)) cs = gen12_emit_aux_table_inv(ce->engine->gt, - cs, GEN12_GFX_CCS_AUX_NV); + cs, GEN12_CCS_AUX_INV);
/* Wa_16014892111 */ if (IS_DG2(ce->engine->i915)) @@ -1326,10 +1326,10 @@ gen12_emit_indirect_ctx_xcs(const struct if (!HAS_FLAT_CCS(ce->engine->i915)) { if (ce->engine->class == VIDEO_DECODE_CLASS) cs = gen12_emit_aux_table_inv(ce->engine->gt, - cs, GEN12_VD0_AUX_NV); + cs, GEN12_VD0_AUX_INV); else if (ce->engine->class == VIDEO_ENHANCEMENT_CLASS) cs = gen12_emit_aux_table_inv(ce->engine->gt, - cs, GEN12_VE0_AUX_NV); + cs, GEN12_VE0_AUX_INV); }
return cs;
From: Geert Uytterhoeven geert+renesas@glider.be
commit a29b2fccf5f2689a9637be85ff1f51c834c6fb33 upstream.
smatch reports:
drivers/clk/imx/clk-imx93.c:294 imx93_clocks_probe() error: uninitialized symbol 'base'.
Indeed, in case of an error, the wrong (yet uninitialized) variable is converted to an error code and returned. Fix this by propagating the error code in the correct variable.
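The rule behind this fix is that the error code has to be decoded from the same pointer that was just tested. The following is a minimal userspace sketch of that pattern, not the driver code: err_ptr(), is_err() and ptr_err() are simplified stand-ins for the kernel's ERR_PTR(), IS_ERR() and PTR_ERR(), and demo_iomap() and demo_probe() are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define DEMO_MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline int is_err(const void *ptr)
{
	/* Error pointers live in the last DEMO_MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Hypothetical mapping helper: returns an error pointer on request. */
static void *demo_iomap(int should_fail)
{
	static char region[16];

	return should_fail ? err_ptr(-ENOMEM) : (void *)region;
}

/* Returns 0 on success or the negative errno carried by the failed pointer. */
static int demo_probe(int fail_second)
{
	void *base, *anatop_base;

	base = demo_iomap(0);
	if (is_err(base))
		return (int)ptr_err(base);

	anatop_base = demo_iomap(fail_second);
	if (is_err(anatop_base))
		return (int)ptr_err(anatop_base); /* decode the pointer that was tested */

	return 0;
}
```

With this shape, demo_probe(1) reports -ENOMEM from the second mapping instead of reading an unrelated variable.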
Fixes: e02ba11b45764705 ("clk: imx93: fix memory leak and missing unwind goto in imx93_clocks_probe") Reported-by: Dan Carpenter dan.carpenter@linaro.org Closes: https://lore.kernel.org/all/9c2acd81-3ad8-485d-819e-9e4201277831@kadam.mount... Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/all/202306161533.4YDmL22b-lkp@intel.com/ Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://lore.kernel.org/r/20230711150812.3562221-1-geert+renesas@glider.be Reviewed-by: Peng Fan peng.fan@nxp.com Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/clk/imx/clk-imx93.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/clk/imx/clk-imx93.c +++ b/drivers/clk/imx/clk-imx93.c @@ -288,7 +288,7 @@ static int imx93_clocks_probe(struct pla anatop_base = devm_of_iomap(dev, np, 0, NULL); of_node_put(np); if (WARN_ON(IS_ERR(anatop_base))) { - ret = PTR_ERR(base); + ret = PTR_ERR(anatop_base); goto unregister_hws; }
From: Hou Tao houtao1@huawei.com
commit 640a604585aa30f93e39b17d4d6ba69fcb1e66c9 upstream.
The following warning was reported when running stress-mode enabled xdp_redirect_cpu with some RT threads:
------------[ cut here ]------------ WARNING: CPU: 4 PID: 65 at kernel/bpf/cpumap.c:135 CPU: 4 PID: 65 Comm: kworker/4:1 Not tainted 6.5.0-rc2+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) Workqueue: events cpu_map_kthread_stop RIP: 0010:put_cpu_map_entry+0xda/0x220 ...... Call Trace: <TASK> ? show_regs+0x65/0x70 ? __warn+0xa5/0x240 ...... ? put_cpu_map_entry+0xda/0x220 cpu_map_kthread_stop+0x41/0x60 process_one_work+0x6b0/0xb80 worker_thread+0x96/0x720 kthread+0x1a5/0x1f0 ret_from_fork+0x3a/0x70 ret_from_fork_asm+0x1b/0x30 </TASK>
The root cause is the same as commit 436901649731 ("bpf: cpumap: Fix memory leak in cpu_map_update_elem"). The kthread is stopped prematurely by kthread_stop() in cpu_map_kthread_stop(), and kthread() doesn't call cpu_map_kthread_run() at all, but the XDP program has already queued some frames or skbs into the ptr_ring. So when __cpu_map_ring_cleanup() checks the ptr_ring, it finds that it was not emptied and reports a warning.
An alternative fix is to use __cpu_map_ring_cleanup() to drop these pending frames or skbs when kthread_stop() returns -EINTR, but that may confuse the user, because these frames or skbs have been handled correctly by the XDP program. So instead of dropping these frames or skbs, just make sure the per-cpu kthread is running before __cpu_map_entry_alloc() returns.
After applying the fix, the error handling for kthread_stop() becomes unnecessary because it will always return 0, so just remove it.
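The "wait until the worker is really running before returning" idea can be sketched in userspace with POSIX threads. This is only an analogue under stated assumptions: struct demo_completion imitates the kernel's struct completion with a mutex and a condition variable, and demo_entry, demo_worker() and demo_entry_start() are hypothetical names, not part of the cpumap code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace analogue of a kernel struct completion. */
struct demo_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void demo_init_completion(struct demo_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void demo_complete(struct demo_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void demo_wait_for_completion(struct demo_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

struct demo_entry {
	pthread_t thread;
	struct demo_completion running;
};

static void *demo_worker(void *data)
{
	struct demo_entry *e = data;

	/* First action of the worker: announce that it is running. */
	demo_complete(&e->running);
	return NULL;
}

/* Returns 0 only after the worker has started executing its function. */
static int demo_entry_start(struct demo_entry *e)
{
	int err;

	demo_init_completion(&e->running);
	err = pthread_create(&e->thread, NULL, demo_worker, e);
	if (err)
		return err;

	demo_wait_for_completion(&e->running);
	return 0;
}
```

Because demo_entry_start() blocks until the worker has signalled, a later stop request can no longer arrive before the worker function has begun, which is the property the fix relies on.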
Fixes: 6710e1126934 ("bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP") Signed-off-by: Hou Tao houtao1@huawei.com Reviewed-by: Pu Lehui pulehui@huawei.com Acked-by: Jesper Dangaard Brouer hawk@kernel.org Link: https://lore.kernel.org/r/20230729095107.1722450-2-houtao@huaweicloud.com Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/cpumap.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-)
--- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -26,6 +26,7 @@ #include <linux/workqueue.h> #include <linux/kthread.h> #include <linux/capability.h> +#include <linux/completion.h> #include <trace/events/xdp.h> #include <linux/btf_ids.h>
@@ -71,6 +72,7 @@ struct bpf_cpu_map_entry { struct rcu_head rcu;
struct work_struct kthread_stop_wq; + struct completion kthread_running; };
struct bpf_cpu_map { @@ -164,7 +166,6 @@ static void put_cpu_map_entry(struct bpf static void cpu_map_kthread_stop(struct work_struct *work) { struct bpf_cpu_map_entry *rcpu; - int err;
rcpu = container_of(work, struct bpf_cpu_map_entry, kthread_stop_wq);
@@ -174,14 +175,7 @@ static void cpu_map_kthread_stop(struct rcu_barrier();
/* kthread_stop will wake_up_process and wait for it to complete */ - err = kthread_stop(rcpu->kthread); - if (err) { - /* kthread_stop may be called before cpu_map_kthread_run - * is executed, so we need to release the memory related - * to rcpu. - */ - put_cpu_map_entry(rcpu); - } + kthread_stop(rcpu->kthread); }
static void cpu_map_bpf_prog_run_skb(struct bpf_cpu_map_entry *rcpu, @@ -309,11 +303,11 @@ static int cpu_map_bpf_prog_run(struct b return nframes; }
- static int cpu_map_kthread_run(void *data) { struct bpf_cpu_map_entry *rcpu = data;
+ complete(&rcpu->kthread_running); set_current_state(TASK_INTERRUPTIBLE);
/* When kthread gives stop order, then rcpu have been disconnected @@ -478,6 +472,7 @@ __cpu_map_entry_alloc(struct bpf_map *ma goto free_ptr_ring;
/* Setup kthread */ + init_completion(&rcpu->kthread_running); rcpu->kthread = kthread_create_on_node(cpu_map_kthread_run, rcpu, numa, "cpumap/%d/map:%d", cpu, map->id); @@ -491,6 +486,12 @@ __cpu_map_entry_alloc(struct bpf_map *ma kthread_bind(rcpu->kthread, cpu); wake_up_process(rcpu->kthread);
+ /* Make sure kthread has been running, so kthread_stop() will not + * stop the kthread prematurely and all pending frames or skbs + * will be handled by the kthread before kthread_stop() returns. + */ + wait_for_completion(&rcpu->kthread_running); + return rcpu;
free_prog:
From: Linus Torvalds torvalds@linux-foundation.org
commit 797964253d358cf8d705614dda394dbe30120223 upstream.
In commit 20ea1e7d13c1 ("file: always lock position for FMODE_ATOMIC_POS") we ended up always taking the file pos lock, because pidfd_getfd() could get a reference to the file even when it didn't have an elevated file count due to threading or other sharing cases.
But Mateusz Guzik reports that the extra locking is actually measurable, so let's re-introduce the optimization, and only force the locking for directory traversal.
Directories need the lock for correctness reasons, while regular files only need it for "POSIX semantics". Since pidfd_getfd() is about debuggers etc special things that are _way_ outside of POSIX, we can relax the rules for that case.
Reported-by: Mateusz Guzik mjguzik@gmail.com Cc: Christian Brauner brauner@kernel.org Link: https://lore.kernel.org/linux-fsdevel/20230803095311.ijpvhx3fyrbkasul@f/ Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/file.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
--- a/fs/file.c +++ b/fs/file.c @@ -1036,12 +1036,28 @@ unsigned long __fdget_raw(unsigned int f return __fget_light(fd, 0); }
+/* + * Try to avoid f_pos locking. We only need it if the + * file is marked for FMODE_ATOMIC_POS, and it can be + * accessed multiple ways. + * + * Always do it for directories, because pidfd_getfd() + * can make a file accessible even if it otherwise would + * not be, and for directories this is a correctness + * issue, not a "POSIX requirement". + */ +static inline bool file_needs_f_pos_lock(struct file *file) +{ + return (file->f_mode & FMODE_ATOMIC_POS) && + (file_count(file) > 1 || S_ISDIR(file_inode(file)->i_mode)); +} + unsigned long __fdget_pos(unsigned int fd) { unsigned long v = __fdget(fd); struct file *file = (struct file *)(v & ~3);
- if (file && (file->f_mode & FMODE_ATOMIC_POS)) { + if (file && file_needs_f_pos_lock(file)) { v |= FDPUT_POS_UNLOCK; mutex_lock(&file->f_pos_lock); }
From: Roman Gushchin roman.gushchin@linux.dev
commit 3b8abb3239530c423c0b97e42af7f7e856e1ee96 upstream.
KCSAN found an issue in obj_stock_flush_required(): stock->cached_objcg can be reset between the check and dereference:
================================================================== BUG: KCSAN: data-race in drain_all_stock / drain_obj_stock
write to 0xffff888237c2a2f8 of 8 bytes by task 19625 on cpu 0: drain_obj_stock+0x408/0x4e0 mm/memcontrol.c:3306 refill_obj_stock+0x9c/0x1e0 mm/memcontrol.c:3340 obj_cgroup_uncharge+0xe/0x10 mm/memcontrol.c:3408 memcg_slab_free_hook mm/slab.h:587 [inline] __cache_free mm/slab.c:3373 [inline] __do_kmem_cache_free mm/slab.c:3577 [inline] kmem_cache_free+0x105/0x280 mm/slab.c:3602 __d_free fs/dcache.c:298 [inline] dentry_free fs/dcache.c:375 [inline] __dentry_kill+0x422/0x4a0 fs/dcache.c:621 dentry_kill+0x8d/0x1e0 dput+0x118/0x1f0 fs/dcache.c:913 __fput+0x3bf/0x570 fs/file_table.c:329 ____fput+0x15/0x20 fs/file_table.c:349 task_work_run+0x123/0x160 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop+0xcf/0xe0 kernel/entry/common.c:171 exit_to_user_mode_prepare+0x6a/0xa0 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x26/0x140 kernel/entry/common.c:296 do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff888237c2a2f8 of 8 bytes by task 19632 on cpu 1: obj_stock_flush_required mm/memcontrol.c:3319 [inline] drain_all_stock+0x174/0x2a0 mm/memcontrol.c:2361 try_charge_memcg+0x6d0/0xd10 mm/memcontrol.c:2703 try_charge mm/memcontrol.c:2837 [inline] mem_cgroup_charge_skmem+0x51/0x140 mm/memcontrol.c:7290 sock_reserve_memory+0xb1/0x390 net/core/sock.c:1025 sk_setsockopt+0x800/0x1e70 net/core/sock.c:1525 udp_lib_setsockopt+0x99/0x6c0 net/ipv4/udp.c:2692 udp_setsockopt+0x73/0xa0 net/ipv4/udp.c:2817 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3668 __sys_setsockopt+0x1c3/0x230 net/socket.c:2271 __do_sys_setsockopt net/socket.c:2282 [inline] __se_sys_setsockopt net/socket.c:2279 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2279 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0xffff8881382d52c0 -> 0xffff888138893740
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 19632 Comm: syz-executor.0 Not tainted 6.3.0-rc2-syzkaller-00387-g534293368afa #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
Fix it by using READ_ONCE()/WRITE_ONCE() for all accesses to stock->cached_objcg.
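The check-then-dereference problem and its fix can be illustrated outside the kernel. The sketch below uses C11 relaxed atomics as the closest portable analogue of READ_ONCE()/WRITE_ONCE(); that substitution is an assumption of this example, and demo_objcg, demo_stock, demo_set_cached() and demo_flush_required() are hypothetical names. It shows only the single-snapshot pattern and does not model object lifetime.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_objcg {
	int memcg_id;
};

struct demo_stock {
	_Atomic(struct demo_objcg *) cached_objcg;
};

/* Writer side: publish the new pointer with a single store. */
static void demo_set_cached(struct demo_stock *stock, struct demo_objcg *objcg)
{
	atomic_store_explicit(&stock->cached_objcg, objcg, memory_order_relaxed);
}

/*
 * Reader side: load the pointer exactly once into a local variable, then
 * test and dereference that local copy. A concurrent writer can no longer
 * swap in NULL between the check and the dereference.
 */
static bool demo_flush_required(struct demo_stock *stock, int root_id)
{
	struct demo_objcg *objcg =
		atomic_load_explicit(&stock->cached_objcg, memory_order_relaxed);

	if (objcg)
		return objcg->memcg_id == root_id;
	return false;
}
```

The key design point is that the reader never touches stock->cached_objcg a second time after the initial load.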
Link: https://lkml.kernel.org/r/20230502160839.361544-1-roman.gushchin@linux.dev Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API") Signed-off-by: Roman Gushchin roman.gushchin@linux.dev Reported-by: syzbot+774c29891415ab0fd29d@syzkaller.appspotmail.com Reported-by: Dmitry Vyukov dvyukov@google.com Link: https://lore.kernel.org/linux-mm/CACT4Y+ZfucZhM60YPphWiCLJr6+SGFhT+jjm8k1P-a... Reviewed-by: Yosry Ahmed yosryahmed@google.com Acked-by: Shakeel Butt shakeelb@google.com Reviewed-by: Dmitry Vyukov dvyukov@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memcontrol.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-)
--- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3165,12 +3165,12 @@ void mod_objcg_state(struct obj_cgroup * * accumulating over a page of vmstat data or when pgdat or idx * changes. */ - if (stock->cached_objcg != objcg) { + if (READ_ONCE(stock->cached_objcg) != objcg) { old = drain_obj_stock(stock); obj_cgroup_get(objcg); stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes) ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0; - stock->cached_objcg = objcg; + WRITE_ONCE(stock->cached_objcg, objcg); stock->cached_pgdat = pgdat; } else if (stock->cached_pgdat != pgdat) { /* Flush the existing cached vmstat data */ @@ -3224,7 +3224,7 @@ static bool consume_obj_stock(struct obj local_lock_irqsave(&memcg_stock.stock_lock, flags);
stock = this_cpu_ptr(&memcg_stock); - if (objcg == stock->cached_objcg && stock->nr_bytes >= nr_bytes) { + if (objcg == READ_ONCE(stock->cached_objcg) && stock->nr_bytes >= nr_bytes) { stock->nr_bytes -= nr_bytes; ret = true; } @@ -3236,7 +3236,7 @@ static bool consume_obj_stock(struct obj
static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock) { - struct obj_cgroup *old = stock->cached_objcg; + struct obj_cgroup *old = READ_ONCE(stock->cached_objcg);
if (!old) return NULL; @@ -3289,7 +3289,7 @@ static struct obj_cgroup *drain_obj_stoc stock->cached_pgdat = NULL; }
- stock->cached_objcg = NULL; + WRITE_ONCE(stock->cached_objcg, NULL); /* * The `old' objects needs to be released by the caller via * obj_cgroup_put() outside of memcg_stock_pcp::stock_lock. @@ -3300,10 +3300,11 @@ static struct obj_cgroup *drain_obj_stoc static bool obj_stock_flush_required(struct memcg_stock_pcp *stock, struct mem_cgroup *root_memcg) { + struct obj_cgroup *objcg = READ_ONCE(stock->cached_objcg); struct mem_cgroup *memcg;
- if (stock->cached_objcg) { - memcg = obj_cgroup_memcg(stock->cached_objcg); + if (objcg) { + memcg = obj_cgroup_memcg(objcg); if (memcg && mem_cgroup_is_descendant(memcg, root_memcg)) return true; } @@ -3322,10 +3323,10 @@ static void refill_obj_stock(struct obj_ local_lock_irqsave(&memcg_stock.stock_lock, flags);
stock = this_cpu_ptr(&memcg_stock); - if (stock->cached_objcg != objcg) { /* reset if necessary */ + if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */ old = drain_obj_stock(stock); obj_cgroup_get(objcg); - stock->cached_objcg = objcg; + WRITE_ONCE(stock->cached_objcg, objcg); stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes) ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0; allow_uncharge = true; /* Allow uncharge when objcg changes */
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit ea303f72d70ce2f0b0aa94ab127085289768c5a6 upstream.
syzbot is reporting a too-large allocation in ntfs_load_attr_list(), because a crafted filesystem can have a huge data_size.
Reported-by: syzbot syzbot+89dbb3a789a5b9711793@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=89dbb3a789a5b9711793 Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ntfs3/attrlist.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/ntfs3/attrlist.c +++ b/fs/ntfs3/attrlist.c @@ -52,7 +52,7 @@ int ntfs_load_attr_list(struct ntfs_inod
if (!attr->non_res) { lsize = le32_to_cpu(attr->res.data_size); - le = kmalloc(al_aligned(lsize), GFP_NOFS); + le = kmalloc(al_aligned(lsize), GFP_NOFS | __GFP_NOWARN); if (!le) { err = -ENOMEM; goto out; @@ -80,7 +80,7 @@ int ntfs_load_attr_list(struct ntfs_inod if (err < 0) goto out;
- le = kmalloc(al_aligned(lsize), GFP_NOFS); + le = kmalloc(al_aligned(lsize), GFP_NOFS | __GFP_NOWARN); if (!le) { err = -ENOMEM; goto out;
From: Prince Kumar Maurya princekumarmaurya06@gmail.com
commit ea2b62f305893992156a798f665847e0663c9f41 upstream.
sb_getblk(inode->i_sb, parent) can return a NULL pointer, and taking the buffer lock on it leads to a null-ptr-deref bug.
Reported-by: syzbot+aad58150cbc64ba41bdc@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=aad58150cbc64ba41bdc Signed-off-by: Prince Kumar Maurya princekumarmaurya06@gmail.com Message-Id: 20230531013141.19487-1-princekumarmaurya06@gmail.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/sysv/itree.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/fs/sysv/itree.c +++ b/fs/sysv/itree.c @@ -145,6 +145,10 @@ static int alloc_branch(struct inode *in */ parent = block_to_cpu(SYSV_SB(inode->i_sb), branch[n-1].key); bh = sb_getblk(inode->i_sb, parent); + if (!bh) { + sysv_free_block(inode->i_sb, branch[n].key); + break; + } lock_buffer(bh); memset(bh->b_data, 0, blocksize); branch[n].bh = bh;
From: Sungwoo Kim iam@sung-woo.kim
commit 1728137b33c00d5a2b5110ed7aafb42e7c32e4a1 upstream.
l2cap_sock_release(sk) frees sk. However, sk's children are still alive and point to the address of the already freed sk. To fix this, l2cap_sock_release(sk) also cleans up sk's children.
================================================================== BUG: KASAN: use-after-free in l2cap_sock_ready_cb+0xb7/0x100 net/bluetooth/l2cap_sock.c:1650 Read of size 8 at addr ffff888104617aa8 by task kworker/u3:0/276
CPU: 0 PID: 276 Comm: kworker/u3:0 Not tainted 6.2.0-00001-gef397bd4d5fb-dirty #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: hci2 hci_rx_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x72/0x95 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:306 [inline] print_report+0x175/0x478 mm/kasan/report.c:417 kasan_report+0xb1/0x130 mm/kasan/report.c:517 l2cap_sock_ready_cb+0xb7/0x100 net/bluetooth/l2cap_sock.c:1650 l2cap_chan_ready+0x10e/0x1e0 net/bluetooth/l2cap_core.c:1386 l2cap_config_req+0x753/0x9f0 net/bluetooth/l2cap_core.c:4480 l2cap_bredr_sig_cmd net/bluetooth/l2cap_core.c:5739 [inline] l2cap_sig_channel net/bluetooth/l2cap_core.c:6509 [inline] l2cap_recv_frame+0xe2e/0x43c0 net/bluetooth/l2cap_core.c:7788 l2cap_recv_acldata+0x6ed/0x7e0 net/bluetooth/l2cap_core.c:8506 hci_acldata_packet net/bluetooth/hci_core.c:3813 [inline] hci_rx_work+0x66e/0xbc0 net/bluetooth/hci_core.c:4048 process_one_work+0x4ea/0x8e0 kernel/workqueue.c:2289 worker_thread+0x364/0x8e0 kernel/workqueue.c:2436 kthread+0x1b9/0x200 kernel/kthread.c:376 ret_from_fork+0x2c/0x50 arch/x86/entry/entry_64.S:308 </TASK>
Allocated by task 288: kasan_save_stack+0x22/0x50 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 ____kasan_kmalloc mm/kasan/common.c:374 [inline] __kasan_kmalloc+0x82/0x90 mm/kasan/common.c:383 kasan_kmalloc include/linux/kasan.h:211 [inline] __do_kmalloc_node mm/slab_common.c:968 [inline] __kmalloc+0x5a/0x140 mm/slab_common.c:981 kmalloc include/linux/slab.h:584 [inline] sk_prot_alloc+0x113/0x1f0 net/core/sock.c:2040 sk_alloc+0x36/0x3c0 net/core/sock.c:2093 l2cap_sock_alloc.constprop.0+0x39/0x1c0 net/bluetooth/l2cap_sock.c:1852 l2cap_sock_create+0x10d/0x220 net/bluetooth/l2cap_sock.c:1898 bt_sock_create+0x183/0x290 net/bluetooth/af_bluetooth.c:132 __sock_create+0x226/0x380 net/socket.c:1518 sock_create net/socket.c:1569 [inline] __sys_socket_create net/socket.c:1606 [inline] __sys_socket_create net/socket.c:1591 [inline] __sys_socket+0x112/0x200 net/socket.c:1639 __do_sys_socket net/socket.c:1652 [inline] __se_sys_socket net/socket.c:1650 [inline] __x64_sys_socket+0x40/0x50 net/socket.c:1650 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc
Freed by task 288: kasan_save_stack+0x22/0x50 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:523 ____kasan_slab_free mm/kasan/common.c:236 [inline] ____kasan_slab_free mm/kasan/common.c:200 [inline] __kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244 kasan_slab_free include/linux/kasan.h:177 [inline] slab_free_hook mm/slub.c:1781 [inline] slab_free_freelist_hook mm/slub.c:1807 [inline] slab_free mm/slub.c:3787 [inline] __kmem_cache_free+0x88/0x1f0 mm/slub.c:3800 sk_prot_free net/core/sock.c:2076 [inline] __sk_destruct+0x347/0x430 net/core/sock.c:2168 sk_destruct+0x9c/0xb0 net/core/sock.c:2183 __sk_free+0x82/0x220 net/core/sock.c:2194 sk_free+0x7c/0xa0 net/core/sock.c:2205 sock_put include/net/sock.h:1991 [inline] l2cap_sock_kill+0x256/0x2b0 net/bluetooth/l2cap_sock.c:1257 l2cap_sock_release+0x1a7/0x220 net/bluetooth/l2cap_sock.c:1428 __sock_release+0x80/0x150 net/socket.c:650 sock_close+0x19/0x30 net/socket.c:1368 __fput+0x17a/0x5c0 fs/file_table.c:320 task_work_run+0x132/0x1c0 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x113/0x120 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:296 do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x72/0xdc
The buggy address belongs to the object at ffff888104617800 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 680 bytes inside of 1024-byte region [ffff888104617800, ffff888104617c00)
The buggy address belongs to the physical page: page:00000000dbca6a80 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888104614000 pfn:0x104614 head:00000000dbca6a80 order:2 compound_mapcount:0 subpages_mapcount:0 compound_pincount:0 flags: 0x200000000010200(slab|head|node=0|zone=2) raw: 0200000000010200 ffff888100041dc0 ffffea0004212c10 ffffea0004234b10 raw: ffff888104614000 0000000000080002 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff888104617980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888104617a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888104617a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff888104617b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888104617b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ==================================================================
Ack: This bug is found by FuzzBT with a modified Syzkaller. Other contributors are Ruoyu Wu and Hui Peng. Signed-off-by: Sungwoo Kim iam@sung-woo.kim Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/l2cap_sock.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/bluetooth/l2cap_sock.c +++ b/net/bluetooth/l2cap_sock.c @@ -46,6 +46,7 @@ static const struct proto_ops l2cap_sock static void l2cap_sock_init(struct sock *sk, struct sock *parent); static struct sock *l2cap_sock_alloc(struct net *net, struct socket *sock, int proto, gfp_t prio, int kern); +static void l2cap_sock_cleanup_listen(struct sock *parent);
bool l2cap_is_socket(struct socket *sock) { @@ -1415,6 +1416,7 @@ static int l2cap_sock_release(struct soc if (!sk) return 0;
+ l2cap_sock_cleanup_listen(sk); bt_sock_unlink(&l2cap_sk_list, sk);
err = l2cap_sock_shutdown(sock, SHUT_RDWR);
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit 8b64d420fe2450f82848178506d3e3a0bd195539 upstream.
syzbot is reporting a false positive ODEBUG message immediately after ODEBUG was disabled due to OOM.
[ 1062.309646][T22911] ODEBUG: Out of memory. ODEBUG disabled [ 1062.886755][ T5171] ------------[ cut here ]------------ [ 1062.892770][ T5171] ODEBUG: assert_init not available (active state 0) object: ffffc900056afb20 object type: timer_list hint: process_timeout+0x0/0x40
  CPU 0 [ T5171]                          CPU 1 [T22911]
  --------------                          --------------
  debug_object_assert_init() {
    if (!debug_objects_enabled)
      return;
    db = get_bucket(addr);
                                          lookup_object_or_alloc() {
                                            debug_objects_enabled = 0;
                                            return NULL;
                                          }
                                          debug_objects_oom() {
                                            pr_warn("Out of memory. ODEBUG disabled\n");
                                            // all buckets get emptied here, and
                                          }
    lookup_object_or_alloc(addr, db, descr, false, true) {
      // this bucket is already empty.
      return ERR_PTR(-ENOENT);
    }
    // Emits false positive warning.
    debug_print_object(&o, "assert_init");
  }
Recheck debug_objects_enabled in debug_print_object() to avoid that.
Reported-by: syzbot syzbot+7937ba6a50bdd00fffdf@syzkaller.appspotmail.com Suggested-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://lore.kernel.org/r/492fe2ae-5141-d548-ebd5-62f5fe2e57f7@I-love.SAKURA... Closes: https://syzkaller.appspot.com/bug?extid=7937ba6a50bdd00fffdf Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- lib/debugobjects.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/lib/debugobjects.c +++ b/lib/debugobjects.c @@ -498,6 +498,15 @@ static void debug_print_object(struct de const struct debug_obj_descr *descr = obj->descr; static int limit;
+ /* + * Don't report if lookup_object_or_alloc() by the current thread + * failed because lookup_object_or_alloc()/debug_objects_oom() by a + * concurrent thread turned off debug_objects_enabled and cleared + * the hash buckets. + */ + if (!debug_objects_enabled) + return; + if (limit < 5 && descr != descr_test) { void *hint = descr->debug_hint ? descr->debug_hint(obj->object) : NULL;
From: Alan Stern stern@rowland.harvard.edu
commit 5e1627cb43ddf1b24b92eb26f8d958a3f5676ccb upstream.
The syzbot fuzzer identified a problem in the usbnet driver:
usb 1-1: BOGUS urb xfer, pipe 3 != type 1 WARNING: CPU: 0 PID: 754 at drivers/usb/core/urb.c:504 usb_submit_urb+0xed6/0x1880 drivers/usb/core/urb.c:504 Modules linked in: CPU: 0 PID: 754 Comm: kworker/0:2 Not tainted 6.4.0-rc7-syzkaller-00014-g692b7dc87ca6 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023 Workqueue: mld mld_ifc_work RIP: 0010:usb_submit_urb+0xed6/0x1880 drivers/usb/core/urb.c:504 Code: 7c 24 18 e8 2c b4 5b fb 48 8b 7c 24 18 e8 42 07 f0 fe 41 89 d8 44 89 e1 4c 89 ea 48 89 c6 48 c7 c7 a0 c9 fc 8a e8 5a 6f 23 fb <0f> 0b e9 58 f8 ff ff e8 fe b3 5b fb 48 81 c5 c0 05 00 00 e9 84 f7 RSP: 0018:ffffc9000463f568 EFLAGS: 00010086 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 RDX: ffff88801eb28000 RSI: ffffffff814c03b7 RDI: 0000000000000001 RBP: ffff8881443b7190 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000003 R13: ffff88802a77cb18 R14: 0000000000000003 R15: ffff888018262500 FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000556a99c15a18 CR3: 0000000028c71000 CR4: 0000000000350ef0 Call Trace: <TASK> usbnet_start_xmit+0xfe5/0x2190 drivers/net/usb/usbnet.c:1453 __netdev_start_xmit include/linux/netdevice.h:4918 [inline] netdev_start_xmit include/linux/netdevice.h:4932 [inline] xmit_one net/core/dev.c:3578 [inline] dev_hard_start_xmit+0x187/0x700 net/core/dev.c:3594 ...
This bug is caused by the fact that usbnet trusts the bulk endpoint addresses its probe routine receives in the driver_info structure, and it does not check to see that these endpoints actually exist and have the expected type and directions.
The fix is simply to add such a check.
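The kind of check described above can be sketched in userspace as follows. This is only an illustration under stated assumptions: `struct ep_desc`, `find_ep()` and `check_bulk_endpoints()` are hypothetical names over a simplified descriptor table, not the kernel helper the patch calls; the constants follow the USB endpoint descriptor encoding (direction in bit 7 of the address, transfer type in the low two attribute bits).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define USB_DIR_IN                 0x80 /* direction bit of bEndpointAddress */
#define USB_ENDPOINT_XFERTYPE_MASK 0x03
#define USB_ENDPOINT_XFER_BULK     0x02

/* Hypothetical, simplified endpoint descriptor (illustration only). */
struct ep_desc {
	uint8_t bEndpointAddress; /* endpoint number | direction bit */
	uint8_t bmAttributes;     /* low two bits: transfer type */
};

/* Find the descriptor whose address (number and direction) matches. */
static const struct ep_desc *find_ep(const struct ep_desc *eps, size_t n,
				     uint8_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (eps[i].bEndpointAddress == addr)
			return &eps[i];
	return NULL;
}

/*
 * Return true only if every address in the 0-terminated list names an
 * existing endpoint whose transfer type is bulk.
 */
static bool check_bulk_endpoints(const struct ep_desc *eps, size_t n,
				 const uint8_t *ep_addrs)
{
	for (; *ep_addrs; ep_addrs++) {
		const struct ep_desc *ep = find_ep(eps, n, *ep_addrs);

		if (!ep)
			return false;
		if ((ep->bmAttributes & USB_ENDPOINT_XFERTYPE_MASK) !=
		    USB_ENDPOINT_XFER_BULK)
			return false;
	}
	return true;
}
```

Because the direction bit is part of the address being matched, an endpoint with the right number but the wrong direction is treated as missing.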
Reported-and-tested-by: syzbot+63ee658b9a100ffadbe2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-usb/000000000000a56e9105d0cec021@google.com/
Signed-off-by: Alan Stern stern@rowland.harvard.edu
CC: Oliver Neukum oneukum@suse.com
Link: https://lore.kernel.org/r/ea152b6d-44df-4f8a-95c6-4db51143dcc1@rowland.harva...
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/usb/usbnet.c | 6 ++++++
 1 file changed, 6 insertions(+)

--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1770,6 +1770,10 @@ usbnet_probe (struct usb_interface *udev
 	} else if (!info->in || !info->out)
 		status = usbnet_get_endpoints (dev, udev);
 	else {
+		u8 ep_addrs[3] = {
+			info->in + USB_DIR_IN, info->out + USB_DIR_OUT, 0
+		};
+
 		dev->in = usb_rcvbulkpipe (xdev, info->in);
 		dev->out = usb_sndbulkpipe (xdev, info->out);
 		if (!(info->flags & FLAG_NO_SETINT))
@@ -1779,6 +1783,8 @@ usbnet_probe (struct usb_interface *udev
 		else
 			status = 0;
 
+		if (status == 0 && !usb_check_bulk_endpoints(udev, ep_addrs))
+			status = -EINVAL;
 	}
 	if (status >= 0 && dev->status)
 		status = init_status (dev, udev);
From: Jan Kara jack@suse.cz
commit c541dce86c537714b6761a79a969c1623dfa222b upstream.
The reconfigure / remount code takes a lot of effort to protect the filesystem's reconfiguration code from racing writes when remounting read-only. However, when remounting a read-only filesystem read-write, userspace writes can start immediately once we clear the SB_RDONLY flag. This is inconvenient, for example, for ext4, because we need to do some writes to the filesystem (such as preparation of quota files) before we can take userspace writes, so we are clearing the SB_RDONLY flag before we are fully ready to accept userspace writes, and syzbot has found a way to exploit this [1]. Also, as far as I'm reading the code, the filesystem remount code was protected from racing writes in the legacy mount path by the mount's MNT_READONLY flag, so this is a relatively new problem. It is actually fairly easy to protect remount read-write from racing writes using the sb->s_readonly_remount flag, so let's just do that instead of having to work around these races in the filesystem code.
[1] https://lore.kernel.org/all/00000000000006a0df05f6667499@google.com/T/
Signed-off-by: Jan Kara jack@suse.cz
Message-Id: 20230615113848.8439-1-jack@suse.cz
Signed-off-by: Christian Brauner brauner@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/super.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/fs/super.c
+++ b/fs/super.c
@@ -904,6 +904,7 @@ int reconfigure_super(struct fs_context
 	struct super_block *sb = fc->root->d_sb;
 	int retval;
 	bool remount_ro = false;
+	bool remount_rw = false;
 	bool force = fc->sb_flags & SB_FORCE;
 
 	if (fc->sb_flags_mask & ~MS_RMT_MASK)
@@ -921,7 +922,7 @@ int reconfigure_super(struct fs_context
 		    bdev_read_only(sb->s_bdev))
 			return -EACCES;
 #endif
-
+		remount_rw = !(fc->sb_flags & SB_RDONLY) && sb_rdonly(sb);
 		remount_ro = (fc->sb_flags & SB_RDONLY) && !sb_rdonly(sb);
 	}
 
@@ -951,6 +952,14 @@ int reconfigure_super(struct fs_context
 			if (retval)
 				return retval;
 		}
+	} else if (remount_rw) {
+		/*
+		 * We set s_readonly_remount here to protect filesystem's
+		 * reconfigure code from writes from userspace until
+		 * reconfigure finishes.
+		 */
+		sb->s_readonly_remount = 1;
+		smp_wmb();
 	}
 
 	if (fc->ops->reconfigure) {
From: Jan Kara jack@suse.cz
commit 404615d7f1dcd4cca200e9a7a9df3a1dcae1dd62 upstream.
Ext2 has fields in the superblock reserved for subblock allocation support. However, that support never landed. Drop the code, which has been dead for many years.
Reported-by: syzbot+af5e10f73dbff48f70af@syzkaller.appspotmail.com
Signed-off-by: Jan Kara jack@suse.cz
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/ext2/ext2.h  | 12 ------------
 fs/ext2/super.c | 23 ++++-------------------
 2 files changed, 4 insertions(+), 31 deletions(-)
--- a/fs/ext2/ext2.h +++ b/fs/ext2/ext2.h @@ -70,10 +70,7 @@ struct mb_cache; * second extended-fs super-block data in memory */ struct ext2_sb_info { - unsigned long s_frag_size; /* Size of a fragment in bytes */ - unsigned long s_frags_per_block;/* Number of fragments per block */ unsigned long s_inodes_per_block;/* Number of inodes per block */ - unsigned long s_frags_per_group;/* Number of fragments in a group */ unsigned long s_blocks_per_group;/* Number of blocks in a group */ unsigned long s_inodes_per_group;/* Number of inodes in a group */ unsigned long s_itb_per_group; /* Number of inode table blocks per group */ @@ -189,15 +186,6 @@ static inline struct ext2_sb_info *EXT2_ #define EXT2_FIRST_INO(s) (EXT2_SB(s)->s_first_ino)
/* - * Macro-instructions used to manage fragments - */ -#define EXT2_MIN_FRAG_SIZE 1024 -#define EXT2_MAX_FRAG_SIZE 4096 -#define EXT2_MIN_FRAG_LOG_SIZE 10 -#define EXT2_FRAG_SIZE(s) (EXT2_SB(s)->s_frag_size) -#define EXT2_FRAGS_PER_BLOCK(s) (EXT2_SB(s)->s_frags_per_block) - -/* * Structure of a blocks group descriptor */ struct ext2_group_desc --- a/fs/ext2/super.c +++ b/fs/ext2/super.c @@ -668,10 +668,9 @@ static int ext2_setup_super (struct supe es->s_max_mnt_count = cpu_to_le16(EXT2_DFL_MAX_MNT_COUNT); le16_add_cpu(&es->s_mnt_count, 1); if (test_opt (sb, DEBUG)) - ext2_msg(sb, KERN_INFO, "%s, %s, bs=%lu, fs=%lu, gc=%lu, " + ext2_msg(sb, KERN_INFO, "%s, %s, bs=%lu, gc=%lu, " "bpg=%lu, ipg=%lu, mo=%04lx]", EXT2FS_VERSION, EXT2FS_DATE, sb->s_blocksize, - sbi->s_frag_size, sbi->s_groups_count, EXT2_BLOCKS_PER_GROUP(sb), EXT2_INODES_PER_GROUP(sb), @@ -1012,14 +1011,7 @@ static int ext2_fill_super(struct super_ } }
- sbi->s_frag_size = EXT2_MIN_FRAG_SIZE << - le32_to_cpu(es->s_log_frag_size); - if (sbi->s_frag_size == 0) - goto cantfind_ext2; - sbi->s_frags_per_block = sb->s_blocksize / sbi->s_frag_size; - sbi->s_blocks_per_group = le32_to_cpu(es->s_blocks_per_group); - sbi->s_frags_per_group = le32_to_cpu(es->s_frags_per_group); sbi->s_inodes_per_group = le32_to_cpu(es->s_inodes_per_group);
sbi->s_inodes_per_block = sb->s_blocksize / EXT2_INODE_SIZE(sb); @@ -1045,11 +1037,10 @@ static int ext2_fill_super(struct super_ goto failed_mount; }
- if (sb->s_blocksize != sbi->s_frag_size) { + if (es->s_log_frag_size != es->s_log_block_size) { ext2_msg(sb, KERN_ERR, - "error: fragsize %lu != blocksize %lu" - "(not supported yet)", - sbi->s_frag_size, sb->s_blocksize); + "error: fragsize log %u != blocksize log %u", + le32_to_cpu(es->s_log_frag_size), sb->s_blocksize_bits); goto failed_mount; }
@@ -1066,12 +1057,6 @@ static int ext2_fill_super(struct super_ sbi->s_blocks_per_group, sbi->s_inodes_per_group + 3); goto failed_mount; } - if (sbi->s_frags_per_group > sb->s_blocksize * 8) { - ext2_msg(sb, KERN_ERR, - "error: #fragments per group too big: %lu", - sbi->s_frags_per_group); - goto failed_mount; - } if (sbi->s_inodes_per_group < sbi->s_inodes_per_block || sbi->s_inodes_per_group > sb->s_blocksize * 8) { ext2_msg(sb, KERN_ERR,
From: Filipe Manana fdmanana@suse.com
commit d8ccbd21918fd7fa6ce3226cffc22c444228e8ad upstream.
At add_new_free_space() we have these BUG_ON()'s that are there to deal with any failure to add free space to the in memory free space cache. Such failures are mostly -ENOMEM that should be very rare. However there's no need to have these BUG_ON()'s, we can just return any error to the caller and all callers and their upper call chain are already dealing with errors.
So just make add_new_free_space() return any errors, while removing the BUG_ON()'s, and returning the total amount of added free space to an optional u64 pointer argument.
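The calling convention described above (status as the return value, byte count through an optional out pointer) can be sketched like this. It is a simplified userspace model, not the btrfs code: `add_range()` is a hypothetical stand-in for the helper that may fail with -ENOMEM, and `add_free_space()` with its `budget` and `chunk` parameters exists only for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the helper that can fail: it succeeds
 * `*budget` times and then returns -ENOMEM.
 */
static int add_range(int *budget, uint64_t start, uint64_t size)
{
	(void)start;
	(void)size;
	if (*budget <= 0)
		return -ENOMEM;
	(*budget)--;
	return 0;
}

/*
 * Add [start, end) in pieces of at most `chunk` bytes (chunk must be
 * non-zero). Returns 0 or a negative errno. The number of bytes added
 * is stored in *total_added_ret when the caller passes a non-NULL
 * pointer; callers that do not need the count pass NULL.
 */
static int add_free_space(int *budget, uint64_t start, uint64_t end,
			  uint64_t chunk, uint64_t *total_added_ret)
{
	if (total_added_ret)
		*total_added_ret = 0;

	while (start < end) {
		uint64_t size = end - start < chunk ? end - start : chunk;
		int ret = add_range(budget, start, size);

		if (ret)
			return ret; /* propagate the error instead of BUG_ON() */
		if (total_added_ret)
			*total_added_ret += size;
		start += size;
	}
	return 0;
}
```

On failure the out parameter still holds what was added before the error, which matches the order of operations in the patch: the counter is only bumped after each successful add.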
Reported-by: syzbot+3ba856e07b7127889d8c@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/000000000000e9cb8305ff4e8327@google.com/
Signed-off-by: Filipe Manana fdmanana@suse.com
Reviewed-by: David Sterba dsterba@suse.com
Signed-off-by: David Sterba dsterba@suse.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/btrfs/block-group.c     | 51 ++++++++++++++++++++++++++++++---------------
 fs/btrfs/block-group.h     |  4 +--
 fs/btrfs/free-space-tree.c | 24 +++++++++++++++------
 3 files changed, 53 insertions(+), 26 deletions(-)
--- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -494,12 +494,16 @@ static void fragment_free_space(struct b * used yet since their free space will be released as soon as the transaction * commits. */ -u64 add_new_free_space(struct btrfs_block_group *block_group, u64 start, u64 end) +int add_new_free_space(struct btrfs_block_group *block_group, u64 start, u64 end, + u64 *total_added_ret) { struct btrfs_fs_info *info = block_group->fs_info; - u64 extent_start, extent_end, size, total_added = 0; + u64 extent_start, extent_end, size; int ret;
+ if (total_added_ret) + *total_added_ret = 0; + while (start < end) { ret = find_first_extent_bit(&info->excluded_extents, start, &extent_start, &extent_end, @@ -512,10 +516,12 @@ u64 add_new_free_space(struct btrfs_bloc start = extent_end + 1; } else if (extent_start > start && extent_start < end) { size = extent_start - start; - total_added += size; ret = btrfs_add_free_space_async_trimmed(block_group, start, size); - BUG_ON(ret); /* -ENOMEM or logic error */ + if (ret) + return ret; + if (total_added_ret) + *total_added_ret += size; start = extent_end + 1; } else { break; @@ -524,13 +530,15 @@ u64 add_new_free_space(struct btrfs_bloc
if (start < end) { size = end - start; - total_added += size; ret = btrfs_add_free_space_async_trimmed(block_group, start, size); - BUG_ON(ret); /* -ENOMEM or logic error */ + if (ret) + return ret; + if (total_added_ret) + *total_added_ret += size; }
- return total_added; + return 0; }
static int load_extent_tree_free(struct btrfs_caching_control *caching_ctl) @@ -637,8 +645,13 @@ next:
if (key.type == BTRFS_EXTENT_ITEM_KEY || key.type == BTRFS_METADATA_ITEM_KEY) { - total_found += add_new_free_space(block_group, last, - key.objectid); + u64 space_added; + + ret = add_new_free_space(block_group, last, key.objectid, + &space_added); + if (ret) + goto out; + total_found += space_added; if (key.type == BTRFS_METADATA_ITEM_KEY) last = key.objectid + fs_info->nodesize; @@ -653,11 +666,10 @@ next: } path->slots[0]++; } - ret = 0; - - total_found += add_new_free_space(block_group, last, - block_group->start + block_group->length);
+ ret = add_new_free_space(block_group, last, + block_group->start + block_group->length, + NULL); out: btrfs_free_path(path); return ret; @@ -2101,9 +2113,11 @@ static int read_one_block_group(struct b btrfs_free_excluded_extents(cache); } else if (cache->used == 0) { cache->cached = BTRFS_CACHE_FINISHED; - add_new_free_space(cache, cache->start, - cache->start + cache->length); + ret = add_new_free_space(cache, cache->start, + cache->start + cache->length, NULL); btrfs_free_excluded_extents(cache); + if (ret) + goto error; }
ret = btrfs_add_block_group_cache(info, cache); @@ -2529,9 +2543,12 @@ struct btrfs_block_group *btrfs_make_blo return ERR_PTR(ret); }
- add_new_free_space(cache, chunk_offset, chunk_offset + size); - + ret = add_new_free_space(cache, chunk_offset, chunk_offset + size, NULL); btrfs_free_excluded_extents(cache); + if (ret) { + btrfs_put_block_group(cache); + return ERR_PTR(ret); + }
/* * Ensure the corresponding space_info object is created and --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -284,8 +284,8 @@ int btrfs_cache_block_group(struct btrfs void btrfs_put_caching_control(struct btrfs_caching_control *ctl); struct btrfs_caching_control *btrfs_get_caching_control( struct btrfs_block_group *cache); -u64 add_new_free_space(struct btrfs_block_group *block_group, - u64 start, u64 end); +int add_new_free_space(struct btrfs_block_group *block_group, + u64 start, u64 end, u64 *total_added_ret); struct btrfs_trans_handle *btrfs_start_trans_remove_block_group( struct btrfs_fs_info *fs_info, const u64 chunk_offset); --- a/fs/btrfs/free-space-tree.c +++ b/fs/btrfs/free-space-tree.c @@ -1510,9 +1510,13 @@ static int load_free_space_bitmaps(struc if (prev_bit == 0 && bit == 1) { extent_start = offset; } else if (prev_bit == 1 && bit == 0) { - total_found += add_new_free_space(block_group, - extent_start, - offset); + u64 space_added; + + ret = add_new_free_space(block_group, extent_start, + offset, &space_added); + if (ret) + goto out; + total_found += space_added; if (total_found > CACHING_CTL_WAKE_UP) { total_found = 0; wake_up(&caching_ctl->wait); @@ -1524,8 +1528,9 @@ static int load_free_space_bitmaps(struc } } if (prev_bit == 1) { - total_found += add_new_free_space(block_group, extent_start, - end); + ret = add_new_free_space(block_group, extent_start, end, NULL); + if (ret) + goto out; extent_count++; }
@@ -1564,6 +1569,8 @@ static int load_free_space_extents(struc end = block_group->start + block_group->length;
while (1) { + u64 space_added; + ret = btrfs_next_item(root, path); if (ret < 0) goto out; @@ -1578,8 +1585,11 @@ static int load_free_space_extents(struc ASSERT(key.type == BTRFS_FREE_SPACE_EXTENT_KEY); ASSERT(key.objectid < end && key.objectid + key.offset <= end);
- total_found += add_new_free_space(block_group, key.objectid, - key.objectid + key.offset); + ret = add_new_free_space(block_group, key.objectid, + key.objectid + key.offset, &space_added); + if (ret) + goto out; + total_found += space_added; if (total_found > CACHING_CTL_WAKE_UP) { total_found = 0; wake_up(&caching_ctl->wait);
From: Chao Yu chao@kernel.org
commit a6ec83786ab9f13f25fb18166dee908845713a95 upstream.
syzbot reports the bug below:
BUG: KASAN: slab-use-after-free in f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574
Read of size 4 at addr ffff88802a25c000 by task syz-executor148/5000

CPU: 1 PID: 5000 Comm: syz-executor148 Not tainted 6.4.0-rc7-syzkaller-00041-ge660abd551f1 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
 print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:351
 print_report mm/kasan/report.c:462 [inline]
 kasan_report+0x11c/0x130 mm/kasan/report.c:572
 f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574
 truncate_dnode+0x229/0x2e0 fs/f2fs/node.c:944
 f2fs_truncate_inode_blocks+0x64b/0xde0 fs/f2fs/node.c:1154
 f2fs_do_truncate_blocks+0x4ac/0xf30 fs/f2fs/file.c:721
 f2fs_truncate_blocks+0x7b/0x300 fs/f2fs/file.c:749
 f2fs_truncate.part.0+0x4a5/0x630 fs/f2fs/file.c:799
 f2fs_truncate include/linux/fs.h:825 [inline]
 f2fs_setattr+0x1738/0x2090 fs/f2fs/file.c:1006
 notify_change+0xb2c/0x1180 fs/attr.c:483
 do_truncate+0x143/0x200 fs/open.c:66
 handle_truncate fs/namei.c:3295 [inline]
 do_open fs/namei.c:3640 [inline]
 path_openat+0x2083/0x2750 fs/namei.c:3791
 do_filp_open+0x1ba/0x410 fs/namei.c:3818
 do_sys_openat2+0x16d/0x4c0 fs/open.c:1356
 do_sys_open fs/open.c:1372 [inline]
 __do_sys_creat fs/open.c:1448 [inline]
 __se_sys_creat fs/open.c:1442 [inline]
 __x64_sys_creat+0xcd/0x120 fs/open.c:1442
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
The root cause is that inodeA references inodeB via inodeB's ino. Once inodeA is truncated, it calls truncate_dnode() to truncate data blocks in inodeB's node page; it traverses mapping data from node->i.i_addr[0] to node->i.i_addr[ADDRS_PER_BLOCK() - 1], resulting in an out-of-bounds access.
This patch adds a sanity check on the dnode page in truncate_dnode() to avoid triggering this issue. When the check fails, the newly introduced ERROR_INVALID_NODE_REFERENCE error is recorded in the superblock, so that fsck can later detect the issue and try to repair it.
Also, it removes f2fs_truncate_data_blocks() as a cleanup, since the function has only one caller, and uses f2fs_truncate_data_blocks_range() instead.
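The sanity check on the dnode page can be modelled roughly as below. This is a sketch under assumptions, not the f2fs implementation: `struct node_footer`, `is_inode_block()` and `dnode_reference_ok()` are hypothetical names, and the model assumes that an inode block is the node whose nid equals its owner's ino.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified node footer (illustration only). */
struct node_footer {
	uint32_t nid; /* node id of this node block */
	uint32_t ino; /* inode number that owns this node block */
};

/* Assumption: an inode block is the node whose nid equals its owner's ino. */
static bool is_inode_block(const struct node_footer *f)
{
	return f->nid == f->ino;
}

/*
 * Return true if the node may be truncated as a direct node of
 * `owner_ino`: it must not be an inode block, and it must belong to
 * the inode that is being truncated.
 */
static bool dnode_reference_ok(const struct node_footer *f, uint32_t owner_ino)
{
	if (is_inode_block(f))
		return false;
	return f->ino == owner_ino;
}
```

In the scenario from the report, the referenced page is another inode's block, so the first test already rejects it before any address array is walked.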
Reported-and-tested-by: syzbot+12cb4425b22169b52036@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000f3038a05fef867f8@google...
Signed-off-by: Chao Yu chao@kernel.org
Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/f2fs/f2fs.h          |  1 -
 fs/f2fs/file.c          |  5 -----
 fs/f2fs/node.c          | 14 ++++++++++++--
 include/linux/f2fs_fs.h |  1 +
 4 files changed, 13 insertions(+), 8 deletions(-)
--- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -3431,7 +3431,6 @@ static inline bool __is_valid_data_blkad * file.c */ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync); -void f2fs_truncate_data_blocks(struct dnode_of_data *dn); int f2fs_do_truncate_blocks(struct inode *inode, u64 from, bool lock); int f2fs_truncate_blocks(struct inode *inode, u64 from, bool lock); int f2fs_truncate(struct inode *inode); --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -628,11 +628,6 @@ void f2fs_truncate_data_blocks_range(str dn->ofs_in_node, nr_free); }
-void f2fs_truncate_data_blocks(struct dnode_of_data *dn) -{ - f2fs_truncate_data_blocks_range(dn, ADDRS_PER_BLOCK(dn->inode)); -} - static int truncate_partial_data_page(struct inode *inode, u64 from, bool cache_only) { --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -923,6 +923,7 @@ static int truncate_node(struct dnode_of
static int truncate_dnode(struct dnode_of_data *dn) { + struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode); struct page *page; int err;
@@ -930,16 +931,25 @@ static int truncate_dnode(struct dnode_o return 1;
/* get direct node */ - page = f2fs_get_node_page(F2FS_I_SB(dn->inode), dn->nid); + page = f2fs_get_node_page(sbi, dn->nid); if (PTR_ERR(page) == -ENOENT) return 1; else if (IS_ERR(page)) return PTR_ERR(page);
+ if (IS_INODE(page) || ino_of_node(page) != dn->inode->i_ino) { + f2fs_err(sbi, "incorrect node reference, ino: %lu, nid: %u, ino_of_node: %u", + dn->inode->i_ino, dn->nid, ino_of_node(page)); + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_handle_error(sbi, ERROR_INVALID_NODE_REFERENCE); + f2fs_put_page(page, 1); + return -EFSCORRUPTED; + } + /* Make dnode_of_data for parameter */ dn->node_page = page; dn->ofs_in_node = 0; - f2fs_truncate_data_blocks(dn); + f2fs_truncate_data_blocks_range(dn, ADDRS_PER_BLOCK(dn->inode)); err = truncate_node(dn); if (err) { f2fs_put_page(page, 1); --- a/include/linux/f2fs_fs.h +++ b/include/linux/f2fs_fs.h @@ -104,6 +104,7 @@ enum f2fs_error { ERROR_INCONSISTENT_SIT, ERROR_CORRUPTED_VERITY_XATTR, ERROR_CORRUPTED_XATTR, + ERROR_INVALID_NODE_REFERENCE, ERROR_MAX, };
From: Pavel Begunkov asml.silence@gmail.com
commit 5498bf28d8f2bd63a46ad40f4427518615fb793f upstream.
It's racy to read ->cached_cq_tail without taking proper measures (usually grabbing ->completion_lock), as timeout requests with CQE offsets do. However, they have never had good semantics for when they start counting. Annotate the racy reads with data_race().
Reported-by: syzbot+cb265db2f3f3468ef436@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov asml.silence@gmail.com
Link: https://lore.kernel.org/r/4de3685e185832a92a572df2be2c735d2e21a83d.168450605...
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 io_uring/timeout.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -545,7 +545,7 @@ int io_timeout(struct io_kiocb *req, uns
 		goto add;
 	}
 
-	tail = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts);
+	tail = data_race(ctx->cached_cq_tail) - atomic_read(&ctx->cq_timeouts);
 	timeout->target_seq = tail + off;
 
 	/* Update the last seq here in case io_flush_timeouts() hasn't.
From: Roger Quadros rogerq@kernel.org
[ Upstream commit d8403b9eeee66d5dd81ecb9445800b108c267ce3 ]
Once the ECC word endianness is converted to BE32, we force cast it to u32 so we can use elm_write_reg() which in turn uses writel().
Fixes the sparse warnings below:
drivers/mtd/nand/raw/omap_elm.c:180:37: sparse: expected unsigned int [usertype] val
drivers/mtd/nand/raw/omap_elm.c:180:37: sparse: got restricted __be32 [usertype]
drivers/mtd/nand/raw/omap_elm.c:185:37: sparse: expected unsigned int [usertype] val
drivers/mtd/nand/raw/omap_elm.c:185:37: sparse: got restricted __be32 [usertype]
drivers/mtd/nand/raw/omap_elm.c:190:37: sparse: expected unsigned int [usertype] val
drivers/mtd/nand/raw/omap_elm.c:190:37: sparse: got restricted __be32 [usertype]
drivers/mtd/nand/raw/omap_elm.c:200:40: sparse: sparse: restricted __be32 degrades to integer
drivers/mtd/nand/raw/omap_elm.c:206:39: sparse: sparse: restricted __be32 degrades to integer
drivers/mtd/nand/raw/omap_elm.c:210:37: sparse: expected unsigned int [assigned] [usertype] val
drivers/mtd/nand/raw/omap_elm.c:210:37: sparse: got restricted __be32 [usertype]
drivers/mtd/nand/raw/omap_elm.c:213:37: sparse: expected unsigned int [assigned] [usertype] val
drivers/mtd/nand/raw/omap_elm.c:213:37: sparse: got restricted __be32 [usertype]
drivers/mtd/nand/raw/omap_elm.c:216:37: sparse: expected unsigned int [assigned] [usertype] val
drivers/mtd/nand/raw/omap_elm.c:216:37: sparse: got restricted __be32 [usertype]
drivers/mtd/nand/raw/omap_elm.c:219:37: sparse: expected unsigned int [assigned] [usertype] val
drivers/mtd/nand/raw/omap_elm.c:219:37: sparse: got restricted __be32 [usertype]
drivers/mtd/nand/raw/omap_elm.c:222:37: sparse: expected unsigned int [assigned] [usertype] val
drivers/mtd/nand/raw/omap_elm.c:222:37: sparse: got restricted __be32 [usertype]
drivers/mtd/nand/raw/omap_elm.c:225:37: sparse: expected unsigned int [assigned] [usertype] val
drivers/mtd/nand/raw/omap_elm.c:225:37: sparse: got restricted __be32 [usertype]
drivers/mtd/nand/raw/omap_elm.c:228:39: sparse: sparse: restricted __be32 degrades to integer
Fixes: bf22433575ef ("mtd: devices: elm: Add support for ELM error correction")
Reported-by: kernel test robot lkp@intel.com
Closes: https://lore.kernel.org/oe-kbuild-all/202306212211.WDXokuWh-lkp@intel.com/
Signed-off-by: Roger Quadros rogerq@kernel.org
Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com
Link: https://lore.kernel.org/linux-mtd/20230624184021.7740-1-rogerq@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/mtd/nand/raw/omap_elm.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/drivers/mtd/nand/raw/omap_elm.c b/drivers/mtd/nand/raw/omap_elm.c index 4796a48e1012a..22d37fc37e98a 100644 --- a/drivers/mtd/nand/raw/omap_elm.c +++ b/drivers/mtd/nand/raw/omap_elm.c @@ -177,17 +177,17 @@ static void elm_load_syndrome(struct elm_info *info, switch (info->bch_type) { case BCH8_ECC: /* syndrome fragment 0 = ecc[9-12B] */ - val = cpu_to_be32(*(u32 *) &ecc[9]); + val = (__force u32)cpu_to_be32(*(u32 *)&ecc[9]); elm_write_reg(info, offset, val);
/* syndrome fragment 1 = ecc[5-8B] */ offset += 4; - val = cpu_to_be32(*(u32 *) &ecc[5]); + val = (__force u32)cpu_to_be32(*(u32 *)&ecc[5]); elm_write_reg(info, offset, val);
/* syndrome fragment 2 = ecc[1-4B] */ offset += 4; - val = cpu_to_be32(*(u32 *) &ecc[1]); + val = (__force u32)cpu_to_be32(*(u32 *)&ecc[1]); elm_write_reg(info, offset, val);
/* syndrome fragment 3 = ecc[0B] */ @@ -197,35 +197,35 @@ static void elm_load_syndrome(struct elm_info *info, break; case BCH4_ECC: /* syndrome fragment 0 = ecc[20-52b] bits */ - val = (cpu_to_be32(*(u32 *) &ecc[3]) >> 4) | + val = ((__force u32)cpu_to_be32(*(u32 *)&ecc[3]) >> 4) | ((ecc[2] & 0xf) << 28); elm_write_reg(info, offset, val);
/* syndrome fragment 1 = ecc[0-20b] bits */ offset += 4; - val = cpu_to_be32(*(u32 *) &ecc[0]) >> 12; + val = (__force u32)cpu_to_be32(*(u32 *)&ecc[0]) >> 12; elm_write_reg(info, offset, val); break; case BCH16_ECC: - val = cpu_to_be32(*(u32 *) &ecc[22]); + val = (__force u32)cpu_to_be32(*(u32 *)&ecc[22]); elm_write_reg(info, offset, val); offset += 4; - val = cpu_to_be32(*(u32 *) &ecc[18]); + val = (__force u32)cpu_to_be32(*(u32 *)&ecc[18]); elm_write_reg(info, offset, val); offset += 4; - val = cpu_to_be32(*(u32 *) &ecc[14]); + val = (__force u32)cpu_to_be32(*(u32 *)&ecc[14]); elm_write_reg(info, offset, val); offset += 4; - val = cpu_to_be32(*(u32 *) &ecc[10]); + val = (__force u32)cpu_to_be32(*(u32 *)&ecc[10]); elm_write_reg(info, offset, val); offset += 4; - val = cpu_to_be32(*(u32 *) &ecc[6]); + val = (__force u32)cpu_to_be32(*(u32 *)&ecc[6]); elm_write_reg(info, offset, val); offset += 4; - val = cpu_to_be32(*(u32 *) &ecc[2]); + val = (__force u32)cpu_to_be32(*(u32 *)&ecc[2]); elm_write_reg(info, offset, val); offset += 4; - val = cpu_to_be32(*(u32 *) &ecc[0]) >> 16; + val = (__force u32)cpu_to_be32(*(u32 *)&ecc[0]) >> 16; elm_write_reg(info, offset, val); break; default:
From: Johan Jonker jbx6244@gmail.com
[ Upstream commit d0ca3b92b7a6f42841ea9da8492aaf649db79780 ]
Rockchip boot blocks are written per 4 x 512 byte sectors per page. Each page with boot blocks must have a page address (PA) pointer in OOB to the next page.
The currently advertised free OOB area starts at offset 6, as if the 4 PA bytes were located right after the BBM. This is wrong, as the PA bytes are located right before the ECC bytes.
Fix the layout by allowing access to all bytes between the BBM and the PA bytes instead of reserving 4 bytes right after the BBM.
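The effect of the layout change can be checked with a little arithmetic. The sketch below is illustrative only: `struct oob_region` and the helper names are hypothetical, and it assumes a 4-byte NFC_SYS_DATA_SIZE, 2 bytes reserved for the BBM at offset 0, and PA bytes occupying the last 4 bytes of the metadata area.

```c
#include <stdbool.h>

#define NFC_SYS_DATA_SIZE 4 /* assumed: OOB bytes handled per ECC step */
#define BBM_SIZE          2 /* assumed: bytes reserved for the BBM at offset 0 */

/* Hypothetical description of the advertised free OOB region. */
struct oob_region {
	unsigned int offset;
	unsigned int length;
};

/* Layout before the fix: skips 4 bytes right after the BBM. */
static struct oob_region free_region_old(unsigned int metadata_size)
{
	struct oob_region r = {
		.offset = NFC_SYS_DATA_SIZE + BBM_SIZE,
		.length = metadata_size - NFC_SYS_DATA_SIZE - BBM_SIZE,
	};
	return r;
}

/* Layout after the fix: starts right after the BBM, same length. */
static struct oob_region free_region_new(unsigned int metadata_size)
{
	struct oob_region r = {
		.offset = BBM_SIZE,
		.length = metadata_size - NFC_SYS_DATA_SIZE - BBM_SIZE,
	};
	return r;
}

/* True if the region reaches into the last 4 metadata bytes (PA0..PA3). */
static bool overlaps_pa(struct oob_region r, unsigned int metadata_size)
{
	return r.offset + r.length > metadata_size - NFC_SYS_DATA_SIZE;
}
```

With a 32-byte metadata area, the old region spans bytes 6..31 and covers the PA bytes, while the new region spans bytes 2..27 and stops just before them.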
This change breaks existing jffs2 users.
Fixes: 058e0e847d54 ("mtd: rawnand: rockchip: NFC driver for RK3308, RK2928 and others")
Signed-off-by: Johan Jonker jbx6244@gmail.com
Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com
Link: https://lore.kernel.org/linux-mtd/d202f12d-188c-20e8-f2c2-9cc874ad4d22@gmail...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/mtd/nand/raw/rockchip-nand-controller.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/drivers/mtd/nand/raw/rockchip-nand-controller.c b/drivers/mtd/nand/raw/rockchip-nand-controller.c index f133985cc053a..9070dafae9db8 100644 --- a/drivers/mtd/nand/raw/rockchip-nand-controller.c +++ b/drivers/mtd/nand/raw/rockchip-nand-controller.c @@ -562,9 +562,10 @@ static int rk_nfc_write_page_raw(struct nand_chip *chip, const u8 *buf, * BBM OOB1 OOB2 OOB3 |......| PA0 PA1 PA2 PA3 * * The rk_nfc_ooblayout_free() function already has reserved - * these 4 bytes with: + * these 4 bytes together with 2 bytes for BBM + * by reducing it's length: * - * oob_region->offset = NFC_SYS_DATA_SIZE + 2; + * oob_region->length = rknand->metadata_size - NFC_SYS_DATA_SIZE - 2; */ if (!i) memcpy(rk_nfc_oob_ptr(chip, i), @@ -933,12 +934,8 @@ static int rk_nfc_ooblayout_free(struct mtd_info *mtd, int section, if (section) return -ERANGE;
- /* - * The beginning of the OOB area stores the reserved data for the NFC, - * the size of the reserved data is NFC_SYS_DATA_SIZE bytes. - */ oob_region->length = rknand->metadata_size - NFC_SYS_DATA_SIZE - 2; - oob_region->offset = NFC_SYS_DATA_SIZE + 2; + oob_region->offset = 2;
return 0; }
From: Johan Jonker jbx6244@gmail.com
[ Upstream commit ea690ad78dd611e3906df5b948a516000b05c1cb ]
Currently, read/write_page_hwecc() and read/write_page_raw() are not aligned: there is a mismatch in the OOB bytes which are not read/written at the same offset in both cases (raw vs. hwecc).
This is a real problem when relying on the presence of the Page Addresses (PA) when using the NAND chip as a boot device, as the BootROM expects additional data in the OOB area at specific locations.
Rockchip boot blocks are written per 4 x 512 byte sectors per page. Each page with boot blocks must have a page address (PA) pointer in OOB to the next page. Pages are written in a pattern depending on the NAND chip ID.
Generate boot block page address and pattern for hwecc in user space and copy PA data to/from the already reserved last 4 bytes before ECC in the chip->oob_poi data layout.
Align the different helpers. This change breaks existing jffs2 users.
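The alignment between the hwecc and raw helpers comes down to which 4-byte group of chip->oob_poi each ECC step uses. A minimal sketch of that mapping follows; `oob_offset_for_step()` and `pack_oob()` are hypothetical helper names, and the 4-byte NFC_SYS_DATA_SIZE is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define NFC_SYS_DATA_SIZE 4 /* assumed: OOB bytes handled per ECC step */

/*
 * Byte offset into the oob_poi metadata area used by ECC step `i`.
 * Step 0 takes the last 4-byte group (the PA bytes); step i > 0 takes
 * group i - 1, so the groups are rotated by one and none is lost.
 */
static size_t oob_offset_for_step(unsigned int i, unsigned int steps)
{
	if (i == 0)
		return (size_t)(steps - 1) * NFC_SYS_DATA_SIZE;
	return (size_t)(i - 1) * NFC_SYS_DATA_SIZE;
}

/* Pack 4 OOB bytes into one little-endian 32-bit word for the controller. */
static uint32_t pack_oob(const uint8_t *oob)
{
	return (uint32_t)oob[0] | (uint32_t)oob[1] << 8 |
	       (uint32_t)oob[2] << 16 | (uint32_t)oob[3] << 24;
}
```

Using the same mapping on both the read and the write side is what keeps the two paths consistent: whatever a write places in step 0 is what a read copies back to the last group.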
Fixes: 058e0e847d54 ("mtd: rawnand: rockchip: NFC driver for RK3308, RK2928 and others")
Signed-off-by: Johan Jonker jbx6244@gmail.com
Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com
Link: https://lore.kernel.org/linux-mtd/5e782c08-862b-51ae-47ff-3299940928ca@gmail...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 .../mtd/nand/raw/rockchip-nand-controller.c | 34 ++++++++++++-------
 1 file changed, 21 insertions(+), 13 deletions(-)
diff --git a/drivers/mtd/nand/raw/rockchip-nand-controller.c b/drivers/mtd/nand/raw/rockchip-nand-controller.c index 9070dafae9db8..c9c4e9ffcae18 100644 --- a/drivers/mtd/nand/raw/rockchip-nand-controller.c +++ b/drivers/mtd/nand/raw/rockchip-nand-controller.c @@ -598,7 +598,7 @@ static int rk_nfc_write_page_hwecc(struct nand_chip *chip, const u8 *buf, int pages_per_blk = mtd->erasesize / mtd->writesize; int ret = 0, i, boot_rom_mode = 0; dma_addr_t dma_data, dma_oob; - u32 reg; + u32 tmp; u8 *oob;
nand_prog_page_begin_op(chip, page, 0, NULL, 0); @@ -625,6 +625,13 @@ static int rk_nfc_write_page_hwecc(struct nand_chip *chip, const u8 *buf, * * 0xFF 0xFF 0xFF 0xFF | BBM OOB1 OOB2 OOB3 | ... * + * The code here just swaps the first 4 bytes with the last + * 4 bytes without losing any data. + * + * The chip->oob_poi data layout: + * + * BBM OOB1 OOB2 OOB3 |......| PA0 PA1 PA2 PA3 + * * Configure the ECC algorithm supported by the boot ROM. */ if ((page < (pages_per_blk * rknand->boot_blks)) && @@ -635,21 +642,17 @@ static int rk_nfc_write_page_hwecc(struct nand_chip *chip, const u8 *buf, }
for (i = 0; i < ecc->steps; i++) { - if (!i) { - reg = 0xFFFFFFFF; - } else { + if (!i) + oob = chip->oob_poi + (ecc->steps - 1) * NFC_SYS_DATA_SIZE; + else oob = chip->oob_poi + (i - 1) * NFC_SYS_DATA_SIZE; - reg = oob[0] | oob[1] << 8 | oob[2] << 16 | - oob[3] << 24; - }
- if (!i && boot_rom_mode) - reg = (page & (pages_per_blk - 1)) * 4; + tmp = oob[0] | oob[1] << 8 | oob[2] << 16 | oob[3] << 24;
if (nfc->cfg->type == NFC_V9) - nfc->oob_buf[i] = reg; + nfc->oob_buf[i] = tmp; else - nfc->oob_buf[i * (oob_step / 4)] = reg; + nfc->oob_buf[i * (oob_step / 4)] = tmp; }
dma_data = dma_map_single(nfc->dev, (void *)nfc->page_buf, @@ -812,12 +815,17 @@ static int rk_nfc_read_page_hwecc(struct nand_chip *chip, u8 *buf, int oob_on, goto timeout_err; }
- for (i = 1; i < ecc->steps; i++) { - oob = chip->oob_poi + (i - 1) * NFC_SYS_DATA_SIZE; + for (i = 0; i < ecc->steps; i++) { + if (!i) + oob = chip->oob_poi + (ecc->steps - 1) * NFC_SYS_DATA_SIZE; + else + oob = chip->oob_poi + (i - 1) * NFC_SYS_DATA_SIZE; + if (nfc->cfg->type == NFC_V9) tmp = nfc->oob_buf[i]; else tmp = nfc->oob_buf[i * (oob_step / 4)]; + *oob++ = (u8)tmp; *oob++ = (u8)(tmp >> 8); *oob++ = (u8)(tmp >> 16);
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit c6abce60338aa2080973cd95be0aedad528bb41f ]
'op->cs' is copied into 'fun->mchip_number', which is used to index the 'mchip_offsets' and 'rnb_gpio' arrays. These arrays have NAND_MAX_CHIPS elements, so the index must be below this limit.
Fix the sanity check so that it also rejects the NAND_MAX_CHIPS value itself, which would otherwise lead to out-of-bounds accesses.
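A small standalone sketch of the bounds check, for illustration. The array contents and the `chip_offset()` helper are made up, and the value 8 for NAND_MAX_CHIPS is an assumption; only the comparison operator is the point.

```c
#include <errno.h>

#define NAND_MAX_CHIPS 8 /* assumed value, for illustration */

/* Made-up per-chip offsets; the array has exactly NAND_MAX_CHIPS entries. */
static const unsigned int mchip_offsets[NAND_MAX_CHIPS] = {
	0x00, 0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70,
};

/*
 * Valid indices are 0 .. NAND_MAX_CHIPS - 1, so cs == NAND_MAX_CHIPS
 * must be rejected as well; a '>' test would let it through.
 */
static int chip_offset(unsigned int cs, unsigned int *offset)
{
	if (cs >= NAND_MAX_CHIPS)
		return -EINVAL;
	*offset = mchip_offsets[cs];
	return 0;
}
```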
Fixes: 54309d657767 ("mtd: rawnand: fsl_upm: Implement exec_op()")
Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr
Reviewed-by: Dan Carpenter dan.carpenter@linaro.org
Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com
Link: https://lore.kernel.org/linux-mtd/cd01cba1c7eda58bdabaae174c78c067325803d2.1...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/mtd/nand/raw/fsl_upm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mtd/nand/raw/fsl_upm.c b/drivers/mtd/nand/raw/fsl_upm.c
index b3cc427100a22..636e65328bb32 100644
--- a/drivers/mtd/nand/raw/fsl_upm.c
+++ b/drivers/mtd/nand/raw/fsl_upm.c
@@ -135,7 +135,7 @@ static int fun_exec_op(struct nand_chip *chip, const struct nand_operation *op,
 	unsigned int i;
 	int ret;
 
-	if (op->cs > NAND_MAX_CHIPS)
+	if (op->cs >= NAND_MAX_CHIPS)
 		return -EINVAL;
 
 	if (check_only)
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
[ Upstream commit 6722b25712054c0f903b839b8f5088438dd04df3 ]
altmap->free includes the entire free space from which altmap blocks can be allocated. So when checking whether the kernel is doing altmap block free, compute the boundary correctly, otherwise memory hotunplug can fail.
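As a rough sketch of the boundary change, assuming a hypothetical struct that only mirrors the field names used in the hunk (it is not the kernel's struct vmem_altmap):

```c
/* Hypothetical mirror of the fields referenced in the hunk. */
struct demo_altmap {
	unsigned long base_pfn;
	unsigned long reserve;
	unsigned long free;
	unsigned long alloc;
	unsigned long align;
};

/* Old boundary: adds alloc and align on top of free. */
static unsigned long alt_end_old(const struct demo_altmap *a)
{
	return a->base_pfn + a->reserve + a->free + a->alloc + a->align;
}

/*
 * New boundary: free already covers the whole space that altmap blocks
 * can be allocated from, so nothing else is added.
 */
static unsigned long alt_end_new(const struct demo_altmap *a)
{
	return a->base_pfn + a->reserve + a->free;
}
```

With the old formula the computed end lies beyond the real altmap range by alloc + align pfns, so pages just past the altmap would be misclassified as altmap-backed.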
Fixes: 9ef34630a461 ("powerpc/mm: Fallback to RAM if the altmap is unusable") Signed-off-by: "Aneesh Kumar K.V" aneesh.kumar@linux.ibm.com Reviewed-by: David Hildenbrand david@redhat.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20230724181320.471386-1-aneesh.kumar@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/mm/init_64.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c index fe1b83020e0df..0ec5b45b1e86a 100644 --- a/arch/powerpc/mm/init_64.c +++ b/arch/powerpc/mm/init_64.c @@ -314,8 +314,7 @@ void __ref vmemmap_free(unsigned long start, unsigned long end, start = ALIGN_DOWN(start, page_size); if (altmap) { alt_start = altmap->base_pfn; - alt_end = altmap->base_pfn + altmap->reserve + - altmap->free + altmap->alloc + altmap->align; + alt_end = altmap->base_pfn + altmap->reserve + altmap->free; }
pr_debug("vmemmap_free %lx...%lx\n", start, end);
From: Alexander Stein alexander.stein@ew.tq-group.com
[ Upstream commit ee31742bf17636da1304af77b2cb1c29b5dda642 ]
When hactive is not aligned to 8 pixels, it is aligned accordingly and hfront porch needs to be reduced the same amount. Unfortunately the front porch is set to the difference rather than reducing it. There are some Samsung TVs which can't cope with a front porch of instead of 70.
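The intended arithmetic can be sketched as below. The struct and function names are hypothetical, and rounding hactive up to the next multiple of 8 is an assumption made for this illustration; only the "-=" adjustment is taken from the patch:

```c
/* Hypothetical subset of the timing fields touched by the fix. */
struct demo_mode {
	unsigned int hactive;
	unsigned int hfront_porch;
};

/*
 * Align hactive to 8 pixels (rounding up is assumed here) and shrink the
 * front porch by the number of pixels that were added, so the total line
 * length stays the same.
 */
static void demo_align_hactive(struct demo_mode *m)
{
	unsigned int new_hactive = (m->hactive + 7u) & ~7u;

	m->hfront_porch -= new_hactive - m->hactive;
	m->hactive = new_hactive;
}
```

For a 1366 pixel mode with a front porch of 70, this yields hactive 1368 and a front porch of 68, whereas assigning the difference would have produced a front porch of 2.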
Fixes: 94dfec48fca7 ("drm/imx: Add 8 pixel alignment fix") Signed-off-by: Alexander Stein alexander.stein@ew.tq-group.com Reviewed-by: Philipp Zabel p.zabel@pengutronix.de Link: https://lore.kernel.org/r/20230515072137.116211-1-alexander.stein@ew.tq-grou... [p.zabel@pengutronix.de: Fixed subject] Signed-off-by: Philipp Zabel p.zabel@pengutronix.de Link: https://patchwork.freedesktop.org/patch/msgid/20230515072137.116211-1-alexan... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/imx/ipuv3-crtc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/imx/ipuv3-crtc.c b/drivers/gpu/drm/imx/ipuv3-crtc.c index 5f26090b0c985..89585b31b985e 100644 --- a/drivers/gpu/drm/imx/ipuv3-crtc.c +++ b/drivers/gpu/drm/imx/ipuv3-crtc.c @@ -310,7 +310,7 @@ static void ipu_crtc_mode_set_nofb(struct drm_crtc *crtc) dev_warn(ipu_crtc->dev, "8-pixel align hactive %d -> %d\n", sig_cfg.mode.hactive, new_hactive);
- sig_cfg.mode.hfront_porch = new_hactive - sig_cfg.mode.hactive; + sig_cfg.mode.hfront_porch -= new_hactive - sig_cfg.mode.hactive; sig_cfg.mode.hactive = new_hactive; }
From: Rodrigo Siqueira Rodrigo.Siqueira@amd.com
commit bb46a6a9bab134b9d15043ea8fa9d6c276e938b8 upstream.
The function dc_update_planes_and_stream handles multiple cases where DC needs to remove and add planes in the commit tail phase. After Linux started to use this function, some of the IGT kms_plane started to fail; one good example to illustrate why the new sequence regressed IGT is the subtest plane-position-hole which has the following diagram as a template:
+--------------------+ +-----------------------+ | +-----+ | | +-----+ | | | | | | | +-----+ | | | +--+ | ==> | | | | | | | |__| | | +-|---+ | | | | | +-----+ | | | | | +--------------------+ +-----------------------+ (a) Final image (b) Composed image
IGT expects image (a) as the final result of two plane compositions as described in figure (b). After the migration to the new sequence, the last plane order is changed, and DC generates the following image:
+---------------------+ | +-----+ | | | | | | | | | | +-----+ | | | +---------------------+
Notice that the image generated by DC is different because the small square that should be composed on top of the primary plane is below the primary plane. For this reason, the CRC will mismatch with the expected value. Since the function dc_add_all_planes_for_stream re-appends all the new planes back to the dc_validation_set, this commit ensures that the original sequence is maintained. After this change, all CI tests in all ASICs start to pass again.
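The reordering itself is a plain in-place array reversal. A minimal sketch on an int array (demo_reverse is a hypothetical name; the patch applies the same two-index swap to struct dc_surface_update elements):

```c
/*
 * Reverse an array in place by swapping from both ends towards the
 * middle. Arrays of length 0 or 1 are left untouched because the loop
 * condition i < j is false from the start.
 */
static void demo_reverse(int *items, int count)
{
	int i, j, tmp;

	for (i = 0, j = count - 1; i < j; i++, j--) {
		tmp = items[i];
		items[i] = items[j];
		items[j] = tmp;
	}
}
```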
Reviewed-by: Harry Wentland Harry.Wentland@amd.com Acked-by: Qingqing Zhuo qingqing.zhuo@amd.com Suggested-by: Melissa Wen mwen@igalia.com Signed-off-by: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -351,6 +351,19 @@ static inline bool is_dc_timing_adjust_n return false; }
+static inline void reverse_planes_order(struct dc_surface_update *array_of_surface_update, + int planes_count) +{ + int i, j; + struct dc_surface_update surface_updates_temp; + + for (i = 0, j = planes_count - 1; i < j; i++, j--) { + surface_updates_temp = array_of_surface_update[i]; + array_of_surface_update[i] = array_of_surface_update[j]; + array_of_surface_update[j] = surface_updates_temp; + } +} + /** * update_planes_and_stream_adapter() - Send planes to be updated in DC * @@ -367,6 +380,8 @@ static inline bool update_planes_and_str struct dc_stream_update *stream_update, struct dc_surface_update *array_of_surface_update) { + reverse_planes_order(array_of_surface_update, planes_count); + /* * Previous frame finished and HW is ready for optimization. */
From: Peichen Huang PeiChen.Huang@amd.com
commit a1c9a1e27022d13c70a14c4faeab6ce293ad043b upstream.
[Why] Some docks and MST monitors don't like to receive CLEAR_PAYLOAD_ID_TABLE when mst_en is set to 0. It doesn't make sense to send it from the source side, either.
[How] Don't send CLEAR_PAYLOAD_ID_TABLE if mst_en is 0
Reviewed-by: George Shen George.Shen@amd.com Acked-by: Qingqing Zhuo qingqing.zhuo@amd.com Signed-off-by: Peichen Huang PeiChen.Huang@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com [ 6.1.y doesn't have the file rename from 54618888d1ea7 ("drm/amd/display: break down dc_link.c") ] Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/core/dc_link.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c @@ -2092,6 +2092,7 @@ static enum dc_status enable_link_dp_mst struct pipe_ctx *pipe_ctx) { struct dc_link *link = pipe_ctx->stream->link; + unsigned char mstm_cntl;
/* sink signal type after MST branch is MST. Multiple MST sinks * share one link. Link DP PHY is enable or training only once. @@ -2100,7 +2101,9 @@ static enum dc_status enable_link_dp_mst return DC_OK;
/* clear payload table */ - dm_helpers_dp_mst_clear_payload_allocation_table(link->ctx, link); + core_link_read_dpcd(link, DP_MSTM_CTRL, &mstm_cntl, 1); + if (mstm_cntl & DP_MST_EN) + dm_helpers_dp_mst_clear_payload_allocation_table(link->ctx, link);
/* to make sure the pending down rep can be processed * before enabling the link
From: Sean Christopherson seanjc@google.com
[ Upstream commit 3bcbc20942db5d738221cca31a928efc09827069 ]
To allow running rseq and KVM's rseq selftests as statically linked binaries, initialize the various "trampoline" pointers to point directly at the expected glibc symbols, and skip the dlsym() lookups if the rseq size is non-zero, i.e. the binary is statically linked *and* the libc registered its own rseq.
Define weak versions of the symbols so as not to break linking against libc versions that don't support rseq in any capacity.
The KVM selftests in particular are often statically linked so that they can be run on targets with very limited runtime environments, i.e. test machines.
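The weak-definition pattern can be sketched as follows, assuming GCC or Clang. All names are hypothetical: demo_rseq_size stands in for a libc-provided symbol, and demo_runtime_lookup() stands in for the dlsym() fallback so that the sketch does not depend on libdl:

```c
#include <stddef.h>

/*
 * Weak definition: if some other object provides a strong definition of
 * this symbol, the linker uses that one; otherwise this zero-initialized
 * copy is used and linking still succeeds.
 */
__attribute__((weak)) unsigned int demo_rseq_size;

/* Start out pointing directly at the (possibly libc-provided) symbol. */
static const unsigned int *demo_size_p = &demo_rseq_size;

/* Hypothetical stand-in for a dlsym(RTLD_NEXT, ...) lookup; finds nothing. */
static const unsigned int *demo_runtime_lookup(void)
{
	return NULL;
}

/* Returns 1 if a non-zero size is visible, i.e. "libc owns rseq". */
static int demo_libc_owns_rseq(void)
{
	/* Only fall back to the runtime lookup if the direct symbol is zero. */
	if (!*demo_size_p)
		demo_size_p = demo_runtime_lookup();

	return demo_size_p && *demo_size_p != 0;
}
```

In this sketch nothing provides a strong definition, so the weak copy stays zero, the fallback runs, and the function reports that libc has not registered rseq.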
Fixes: 233e667e1ae3 ("selftests/rseq: Uplift rseq selftests for compatibility with glibc-2.35") Cc: Aaron Lewis aaronlewis@google.com Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20230721223352.2333911-1-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/rseq/rseq.c | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c index 4177f9507bbee..b736a5169aad0 100644 --- a/tools/testing/selftests/rseq/rseq.c +++ b/tools/testing/selftests/rseq/rseq.c @@ -32,9 +32,17 @@ #include "../kselftest.h" #include "rseq.h"
-static const ptrdiff_t *libc_rseq_offset_p; -static const unsigned int *libc_rseq_size_p; -static const unsigned int *libc_rseq_flags_p; +/* + * Define weak versions to play nice with binaries that are statically linked + * against a libc that doesn't support registering its own rseq. + */ +__weak ptrdiff_t __rseq_offset; +__weak unsigned int __rseq_size; +__weak unsigned int __rseq_flags; + +static const ptrdiff_t *libc_rseq_offset_p = &__rseq_offset; +static const unsigned int *libc_rseq_size_p = &__rseq_size; +static const unsigned int *libc_rseq_flags_p = &__rseq_flags;
/* Offset from the thread pointer to the rseq area. */ ptrdiff_t rseq_offset; @@ -108,9 +116,17 @@ int rseq_unregister_current_thread(void) static __attribute__((constructor)) void rseq_init(void) { - libc_rseq_offset_p = dlsym(RTLD_NEXT, "__rseq_offset"); - libc_rseq_size_p = dlsym(RTLD_NEXT, "__rseq_size"); - libc_rseq_flags_p = dlsym(RTLD_NEXT, "__rseq_flags"); + /* + * If the libc's registered rseq size isn't already valid, it may be + * because the binary is dynamically linked and not necessarily due to + * libc not having registered a restartable sequence. Try to find the + * symbols if that's the case. + */ + if (!*libc_rseq_size_p) { + libc_rseq_offset_p = dlsym(RTLD_NEXT, "__rseq_offset"); + libc_rseq_size_p = dlsym(RTLD_NEXT, "__rseq_size"); + libc_rseq_flags_p = dlsym(RTLD_NEXT, "__rseq_flags"); + } if (libc_rseq_size_p && libc_rseq_offset_p && libc_rseq_flags_p && *libc_rseq_size_p != 0) { /* rseq registration owned by glibc */
From: Yangtao Li frank.li@vivo.com
[ Upstream commit 967eaad1fed5f6335ea97a47d45214744dc57925 ]
Some minor modifications to flush_merge and related parameters:
1. The FLUSH_MERGE opt is set by default only in non-ro mode.
2. When ro and merge are set at the same time, an error is reported.
3. Display noflush_merge mount opt.
Suggested-by: Chao Yu chao@kernel.org Signed-off-by: Yangtao Li frank.li@vivo.com Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Stable-dep-of: 458c15dfbce6 ("f2fs: don't reset unchangable mount option in f2fs_remount()") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/f2fs/super.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index b6dad389fa144..36bb1c969e8bb 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -1347,6 +1347,12 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount) return -EINVAL; }
+ if ((f2fs_sb_has_readonly(sbi) || f2fs_readonly(sbi->sb)) && + test_opt(sbi, FLUSH_MERGE)) { + f2fs_err(sbi, "FLUSH_MERGE not compatible with readonly mode"); + return -EINVAL; + } + if (f2fs_sb_has_readonly(sbi) && !f2fs_readonly(sbi->sb)) { f2fs_err(sbi, "Allow to mount readonly mode only"); return -EROFS; @@ -1933,8 +1939,10 @@ static int f2fs_show_options(struct seq_file *seq, struct dentry *root) seq_puts(seq, ",inline_dentry"); else seq_puts(seq, ",noinline_dentry"); - if (!f2fs_readonly(sbi->sb) && test_opt(sbi, FLUSH_MERGE)) + if (test_opt(sbi, FLUSH_MERGE)) seq_puts(seq, ",flush_merge"); + else + seq_puts(seq, ",noflush_merge"); if (test_opt(sbi, NOBARRIER)) seq_puts(seq, ",nobarrier"); if (test_opt(sbi, FASTBOOT)) @@ -2063,7 +2071,8 @@ static void default_options(struct f2fs_sb_info *sbi) set_opt(sbi, MERGE_CHECKPOINT); F2FS_OPTION(sbi).unusable_cap = 0; sbi->sb->s_flags |= SB_LAZYTIME; - set_opt(sbi, FLUSH_MERGE); + if (!f2fs_sb_has_readonly(sbi) && !f2fs_readonly(sbi->sb)) + set_opt(sbi, FLUSH_MERGE); if (f2fs_hw_support_discard(sbi) || f2fs_hw_should_discard(sbi)) set_opt(sbi, DISCARD); if (f2fs_sb_has_blkzoned(sbi)) {
From: Chao Yu chao@kernel.org
[ Upstream commit 458c15dfbce62c35fefd9ca637b20a051309c9f1 ]
syzbot reports a bug as below:
general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] PREEMPT SMP KASAN RIP: 0010:__lock_acquire+0x69/0x2000 kernel/locking/lockdep.c:4942 Call Trace: lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5691 __raw_write_lock include/linux/rwlock_api_smp.h:209 [inline] _raw_write_lock+0x2e/0x40 kernel/locking/spinlock.c:300 __drop_extent_tree+0x3ac/0x660 fs/f2fs/extent_cache.c:1100 f2fs_drop_extent_tree+0x17/0x30 fs/f2fs/extent_cache.c:1116 f2fs_insert_range+0x2d5/0x3c0 fs/f2fs/file.c:1664 f2fs_fallocate+0x4e4/0x6d0 fs/f2fs/file.c:1838 vfs_fallocate+0x54b/0x6b0 fs/open.c:324 ksys_fallocate fs/open.c:347 [inline] __do_sys_fallocate fs/open.c:355 [inline] __se_sys_fallocate fs/open.c:353 [inline] __x64_sys_fallocate+0xbd/0x100 fs/open.c:353 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
The root cause is a race condition, as below:
- since it tries to remount a rw filesystem, do_remount won't call sb_prepare_remount_readonly to block fallocate, so there may be a race between remount and fallocate.
- in f2fs_remount(), default_options() resets the mount options to their defaults and then updates them based on the result of parse_options(), so there is a window in which the race can happen.
Thread A                          Thread B
- f2fs_fill_super
 - parse_options
  - clear_opt(READ_EXTENT_CACHE)

- f2fs_remount
 - default_options
  - set_opt(READ_EXTENT_CACHE)
                                  - f2fs_fallocate
                                   - f2fs_insert_range
                                    - f2fs_drop_extent_tree
                                     - __drop_extent_tree
                                      - __may_extent_tree
                                       - test_opt(READ_EXTENT_CACHE) return true
                                      - write_lock(&et->lock) access NULL pointer
 - parse_options
  - clear_opt(READ_EXTENT_CACHE)
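The shape of the fix can be sketched with a plain bitmask. The option bits and function name below are hypothetical and only model the idea that options which cannot change on remount are initialized at first mount only:

```c
#include <stdbool.h>

/* Hypothetical option bits for illustration. */
#define DEMO_OPT_READ_EXTENT_CACHE (1u << 0)
#define DEMO_OPT_INLINE_DATA       (1u << 1)

/*
 * Apply defaults. Options that cannot be changed by a remount are only
 * set on the initial mount, so a remount never flips them back on, even
 * briefly, while other threads may be testing them.
 */
static void demo_default_options(unsigned int *opts, bool remount)
{
	if (!remount)
		*opts |= DEMO_OPT_READ_EXTENT_CACHE;

	*opts |= DEMO_OPT_INLINE_DATA;
}
```

If the extent cache option was cleared at mount time, a later call with remount set to true leaves it cleared, so there is no window in which a concurrent reader sees it set.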
Cc: stable@vger.kernel.org Reported-by: syzbot+d015b6c2fbb5c383bf08@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/20230522124203.3838360-1-chao@kerne... Signed-off-by: Chao Yu chao@kernel.org Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/f2fs/super.c | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 36bb1c969e8bb..ff47aad636e5b 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -2040,9 +2040,22 @@ static int f2fs_show_options(struct seq_file *seq, struct dentry *root) return 0; }
-static void default_options(struct f2fs_sb_info *sbi) +static void default_options(struct f2fs_sb_info *sbi, bool remount) { /* init some FS parameters */ + if (!remount) { + set_opt(sbi, READ_EXTENT_CACHE); + clear_opt(sbi, DISABLE_CHECKPOINT); + + if (f2fs_hw_support_discard(sbi) || f2fs_hw_should_discard(sbi)) + set_opt(sbi, DISCARD); + + if (f2fs_sb_has_blkzoned(sbi)) + F2FS_OPTION(sbi).discard_unit = DISCARD_UNIT_SECTION; + else + F2FS_OPTION(sbi).discard_unit = DISCARD_UNIT_BLOCK; + } + if (f2fs_sb_has_readonly(sbi)) F2FS_OPTION(sbi).active_logs = NR_CURSEG_RO_TYPE; else @@ -2065,23 +2078,16 @@ static void default_options(struct f2fs_sb_info *sbi) set_opt(sbi, INLINE_XATTR); set_opt(sbi, INLINE_DATA); set_opt(sbi, INLINE_DENTRY); - set_opt(sbi, READ_EXTENT_CACHE); set_opt(sbi, NOHEAP); - clear_opt(sbi, DISABLE_CHECKPOINT); set_opt(sbi, MERGE_CHECKPOINT); F2FS_OPTION(sbi).unusable_cap = 0; sbi->sb->s_flags |= SB_LAZYTIME; if (!f2fs_sb_has_readonly(sbi) && !f2fs_readonly(sbi->sb)) set_opt(sbi, FLUSH_MERGE); - if (f2fs_hw_support_discard(sbi) || f2fs_hw_should_discard(sbi)) - set_opt(sbi, DISCARD); - if (f2fs_sb_has_blkzoned(sbi)) { + if (f2fs_sb_has_blkzoned(sbi)) F2FS_OPTION(sbi).fs_mode = FS_MODE_LFS; - F2FS_OPTION(sbi).discard_unit = DISCARD_UNIT_SECTION; - } else { + else F2FS_OPTION(sbi).fs_mode = FS_MODE_ADAPTIVE; - F2FS_OPTION(sbi).discard_unit = DISCARD_UNIT_BLOCK; - }
#ifdef CONFIG_F2FS_FS_XATTR set_opt(sbi, XATTR_USER); @@ -2253,7 +2259,7 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) clear_sbi_flag(sbi, SBI_NEED_SB_WRITE); }
- default_options(sbi); + default_options(sbi, true);
/* parse mount options */ err = parse_options(sb, data, true); @@ -4150,7 +4156,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) sbi->s_chksum_seed = f2fs_chksum(sbi, ~0, raw_super->uuid, sizeof(raw_super->uuid));
- default_options(sbi); + default_options(sbi, false); /* parse mount options */ options = kstrdup((const char *)data, GFP_KERNEL); if (data && !options) {
From: Namjae Jeon linkinjeon@kernel.org
[ Upstream commit d42334578eba1390859012ebb91e1e556d51db49 ]
exfat_extract_uni_name copies characters from a given file name entry into the 'uniname' variable. This variable is actually defined on the stack of the exfat_readdir() function. According to the definition of the 'exfat_uni_name' type, the file name should be limited to 255 characters (+ null terminator space), but the exfat_get_uniname_from_ext_entry() function can write more characters because there is no check whether the filename entries exceed the max filename length. This patch adds a check so that filename characters are not copied once the max filename length is exceeded.
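The added length check can be sketched in isolation as below. This is not the exfat code: the helper names are hypothetical, and the constants (15 characters per name entry, 255 characters maximum) are assumptions chosen to match the limit described above:

```c
#include <stddef.h>

#define DEMO_CHUNK_LEN 15   /* characters per name entry (assumed) */
#define DEMO_MAX_NAME  255  /* maximum name length (assumed) */

/* Copy one chunk, stopping at a NUL; terminate and return the length. */
static unsigned int demo_extract_chunk(const unsigned short *chunk,
				       unsigned short *out)
{
	unsigned int len = 0;

	while (len < DEMO_CHUNK_LEN && chunk[len] != 0) {
		out[len] = chunk[len];
		len++;
	}
	out[len] = 0;
	return len;
}

/*
 * Concatenate up to nchunks chunks into out, which must have room for
 * DEMO_MAX_NAME + 1 characters. Stop on a short chunk (end of name) or
 * as soon as the running total reaches the maximum, so no write ever
 * lands beyond out[DEMO_MAX_NAME].
 */
static unsigned int demo_collect_name(const unsigned short *entries,
				      size_t nchunks, unsigned short *out)
{
	unsigned int total = 0;
	size_t i;

	out[0] = 0;
	for (i = 0; i < nchunks; i++) {
		unsigned int len;

		len = demo_extract_chunk(entries + i * DEMO_CHUNK_LEN, out);
		total += len;
		if (len != DEMO_CHUNK_LEN || total >= DEMO_MAX_NAME)
			break;
		out += DEMO_CHUNK_LEN;
	}
	return total;
}
```

Without the `total >= DEMO_MAX_NAME` test, an input with more than 17 full chunks would keep advancing `out` and write past the end of the destination buffer.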
Cc: stable@vger.kernel.org Cc: Yuezhang Mo Yuezhang.Mo@sony.com Reported-by: Maxim Suhanov dfirblog@gmail.com Reviewed-by: Sungjong Seo sj1557.seo@samsung.com Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/exfat/dir.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c index 78de6f67f882d..51b03b0dd5f75 100644 --- a/fs/exfat/dir.c +++ b/fs/exfat/dir.c @@ -34,6 +34,7 @@ static void exfat_get_uniname_from_ext_entry(struct super_block *sb, { int i; struct exfat_entry_set_cache *es; + unsigned int uni_len = 0, len;
es = exfat_get_dentry_set(sb, p_dir, entry, ES_ALL_ENTRIES); if (!es) @@ -52,7 +53,10 @@ static void exfat_get_uniname_from_ext_entry(struct super_block *sb, if (exfat_get_entry_type(ep) != TYPE_EXTEND) break;
- exfat_extract_uni_name(ep, uniname); + len = exfat_extract_uni_name(ep, uniname); + uni_len += len; + if (len != EXFAT_FILE_NAME_LEN || uni_len >= MAX_NAME_LENGTH) + break; uniname += EXFAT_FILE_NAME_LEN; }
@@ -1024,7 +1028,8 @@ int exfat_find_dir_entry(struct super_block *sb, struct exfat_inode_info *ei, if (entry_type == TYPE_EXTEND) { unsigned short entry_uniname[16], unichar;
- if (step != DIRENT_STEP_NAME) { + if (step != DIRENT_STEP_NAME || + name_len >= MAX_NAME_LENGTH) { step = DIRENT_STEP_FILE; continue; }
From: Mark Brown broonie@kernel.org
commit 045aecdfcb2e060db142d83a0f4082380c465d2c upstream.
Systems which implement SME without also implementing SVE are architecturally valid but were not initially supported by the kernel. Unfortunately, we missed one issue in the ptrace code.
The SVE register setting code is shared between SVE and streaming mode SVE. When we set full SVE register state we currently enable TIF_SVE unconditionally. In the case where streaming SVE is being configured on a system that supports vanilla SVE this is not an issue, since we always initialise enough state for both vector lengths, but on a system which only supports SME it will result in us attempting to restore the SVE vector length after having set streaming SVE registers.
Fix this by making the enabling of SVE conditional on setting SVE vector state. If we set streaming SVE state and SVE was not already enabled this will result in a SVE access trap on next use of normal SVE, this will cause us to flush our register state but this is fine since the only way to trigger a SVE access trap would be to exit streaming mode which will cause the in register state to be flushed anyway.
Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers") Signed-off-by: Mark Brown broonie@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-ssve-no-sve-v1-1-49df214... Signed-off-by: Catalin Marinas catalin.marinas@arm.com [Fix up backport -- broonie] Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/ptrace.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c @@ -937,11 +937,13 @@ static int sve_set_common(struct task_st /* * Ensure target->thread.sve_state is up to date with target's * FPSIMD regs, so that a short copyin leaves trailing - * registers unmodified. Always enable SVE even if going into - * streaming mode. + * registers unmodified. Only enable SVE if we are + * configuring normal SVE, a system with streaming SVE may not + * have normal SVE. */ fpsimd_sync_to_sve(target); - set_tsk_thread_flag(target, TIF_SVE); + if (type == ARM64_VEC_SVE) + set_tsk_thread_flag(target, TIF_SVE);
BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header)); start = SVE_PT_SVE_OFFSET;
From: Tong Liu01 Tong.Liu01@amd.com
commit 4864f2ee9ee2acf4a1009b58fbc62f17fa086d4e upstream
Move the TMR region from the top of FB to 2MB for FFBM, so the TMR region must be reserved first to make sure the TMR can be allocated at 2MB.
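A rough sketch of how a v2_2-style start address could be decoded. Every name and constant below is an assumption made for illustration: the two flag bits are taken to sit above bit 30 (matching the "<< 30" in the hunk), the "no reservation" flag value is assumed to be 1, and the arithmetic is widened to 64 bits here, which the hunk does not do:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: top two bits are operation flags, the rest is KB. */
#define DEMO_FLAGS_SHIFT           30
#define DEMO_FLAGS_MASK            0xC0000000u
#define DEMO_NEEDS_NO_RESERVATION  0x1u

struct demo_region {
	uint64_t start_bytes;
	uint64_t size_bytes;
	bool reserve;
};

/*
 * Decode a start address and size given in KB. If the "no reservation"
 * flag is clear, strip the flag bits and convert KB to bytes (<< 10).
 */
static struct demo_region demo_decode_region(uint32_t start_in_kb,
					     uint32_t size_in_kb)
{
	struct demo_region r = { 0, 0, false };

	if ((start_in_kb & (DEMO_NEEDS_NO_RESERVATION << DEMO_FLAGS_SHIFT)) == 0) {
		r.start_bytes = (uint64_t)(start_in_kb & ~DEMO_FLAGS_MASK) << 10;
		r.size_bytes = (uint64_t)size_in_kb << 10;
		r.reserve = true;
	}
	return r;
}
```

A start address of 2048 KB decodes to a byte offset of 2 MB, which corresponds to the placement the commit message describes.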
Signed-off-by: Tong Liu01 Tong.Liu01@amd.com Acked-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c | 104 +++++++++++++++++------ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 51 +++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 5 + drivers/gpu/drm/amd/include/atomfirmware.h | 63 +++++++++++-- 4 files changed, 191 insertions(+), 32 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c @@ -101,39 +101,97 @@ void amdgpu_atomfirmware_scratch_regs_in } }
+static int amdgpu_atomfirmware_allocate_fb_v2_1(struct amdgpu_device *adev, + struct vram_usagebyfirmware_v2_1 *fw_usage, int *usage_bytes) +{ + uint32_t start_addr, fw_size, drv_size; + + start_addr = le32_to_cpu(fw_usage->start_address_in_kb); + fw_size = le16_to_cpu(fw_usage->used_by_firmware_in_kb); + drv_size = le16_to_cpu(fw_usage->used_by_driver_in_kb); + + DRM_DEBUG("atom firmware v2_1 requested %08x %dkb fw %dkb drv\n", + start_addr, + fw_size, + drv_size); + + if ((start_addr & ATOM_VRAM_OPERATION_FLAGS_MASK) == + (uint32_t)(ATOM_VRAM_BLOCK_SRIOV_MSG_SHARE_RESERVATION << + ATOM_VRAM_OPERATION_FLAGS_SHIFT)) { + /* Firmware request VRAM reservation for SR-IOV */ + adev->mman.fw_vram_usage_start_offset = (start_addr & + (~ATOM_VRAM_OPERATION_FLAGS_MASK)) << 10; + adev->mman.fw_vram_usage_size = fw_size << 10; + /* Use the default scratch size */ + *usage_bytes = 0; + } else { + *usage_bytes = drv_size << 10; + } + return 0; +} + +static int amdgpu_atomfirmware_allocate_fb_v2_2(struct amdgpu_device *adev, + struct vram_usagebyfirmware_v2_2 *fw_usage, int *usage_bytes) +{ + uint32_t fw_start_addr, fw_size, drv_start_addr, drv_size; + + fw_start_addr = le32_to_cpu(fw_usage->fw_region_start_address_in_kb); + fw_size = le16_to_cpu(fw_usage->used_by_firmware_in_kb); + + drv_start_addr = le32_to_cpu(fw_usage->driver_region0_start_address_in_kb); + drv_size = le32_to_cpu(fw_usage->used_by_driver_region0_in_kb); + + DRM_DEBUG("atom requested fw start at %08x %dkb and drv start at %08x %dkb\n", + fw_start_addr, + fw_size, + drv_start_addr, + drv_size); + + if ((fw_start_addr & (ATOM_VRAM_BLOCK_NEEDS_NO_RESERVATION << 30)) == 0) { + /* Firmware request VRAM reservation for SR-IOV */ + adev->mman.fw_vram_usage_start_offset = (fw_start_addr & + (~ATOM_VRAM_OPERATION_FLAGS_MASK)) << 10; + adev->mman.fw_vram_usage_size = fw_size << 10; + } + + if ((drv_start_addr & (ATOM_VRAM_BLOCK_NEEDS_NO_RESERVATION << 30)) == 0) { + /* driver request VRAM reservation for SR-IOV */ + 
adev->mman.drv_vram_usage_start_offset = (drv_start_addr & + (~ATOM_VRAM_OPERATION_FLAGS_MASK)) << 10; + adev->mman.drv_vram_usage_size = drv_size << 10; + } + + *usage_bytes = 0; + return 0; +} + int amdgpu_atomfirmware_allocate_fb_scratch(struct amdgpu_device *adev) { struct atom_context *ctx = adev->mode_info.atom_context; int index = get_index_into_master_table(atom_master_list_of_data_tables_v2_1, vram_usagebyfirmware); - struct vram_usagebyfirmware_v2_1 *firmware_usage; - uint32_t start_addr, size; + struct vram_usagebyfirmware_v2_1 *fw_usage_v2_1; + struct vram_usagebyfirmware_v2_2 *fw_usage_v2_2; uint16_t data_offset; + uint8_t frev, crev; int usage_bytes = 0;
- if (amdgpu_atom_parse_data_header(ctx, index, NULL, NULL, NULL, &data_offset)) { - firmware_usage = (struct vram_usagebyfirmware_v2_1 *)(ctx->bios + data_offset); - DRM_DEBUG("atom firmware requested %08x %dkb fw %dkb drv\n", - le32_to_cpu(firmware_usage->start_address_in_kb), - le16_to_cpu(firmware_usage->used_by_firmware_in_kb), - le16_to_cpu(firmware_usage->used_by_driver_in_kb)); - - start_addr = le32_to_cpu(firmware_usage->start_address_in_kb); - size = le16_to_cpu(firmware_usage->used_by_firmware_in_kb); - - if ((uint32_t)(start_addr & ATOM_VRAM_OPERATION_FLAGS_MASK) == - (uint32_t)(ATOM_VRAM_BLOCK_SRIOV_MSG_SHARE_RESERVATION << - ATOM_VRAM_OPERATION_FLAGS_SHIFT)) { - /* Firmware request VRAM reservation for SR-IOV */ - adev->mman.fw_vram_usage_start_offset = (start_addr & - (~ATOM_VRAM_OPERATION_FLAGS_MASK)) << 10; - adev->mman.fw_vram_usage_size = size << 10; - /* Use the default scratch size */ - usage_bytes = 0; - } else { - usage_bytes = le16_to_cpu(firmware_usage->used_by_driver_in_kb) << 10; + if (amdgpu_atom_parse_data_header(ctx, index, NULL, &frev, &crev, &data_offset)) { + if (frev == 2 && crev == 1) { + fw_usage_v2_1 = + (struct vram_usagebyfirmware_v2_1 *)(ctx->bios + data_offset); + amdgpu_atomfirmware_allocate_fb_v2_1(adev, + fw_usage_v2_1, + &usage_bytes); + } else if (frev >= 2 && crev >= 2) { + fw_usage_v2_2 = + (struct vram_usagebyfirmware_v2_2 *)(ctx->bios + data_offset); + amdgpu_atomfirmware_allocate_fb_v2_2(adev, + fw_usage_v2_2, + &usage_bytes); } } + ctx->scratch_size_bytes = 0; if (usage_bytes == 0) usage_bytes = 20 * 1024; --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -1537,6 +1537,23 @@ static void amdgpu_ttm_fw_reserve_vram_f NULL, &adev->mman.fw_vram_usage_va); }
+/* + * Driver Reservation functions + */ +/** + * amdgpu_ttm_drv_reserve_vram_fini - free drv reserved vram + * + * @adev: amdgpu_device pointer + * + * free drv reserved vram if it has been reserved. + */ +static void amdgpu_ttm_drv_reserve_vram_fini(struct amdgpu_device *adev) +{ + amdgpu_bo_free_kernel(&adev->mman.drv_vram_usage_reserved_bo, + NULL, + NULL); +} + /** * amdgpu_ttm_fw_reserve_vram_init - create bo vram reservation from fw * @@ -1563,6 +1580,31 @@ static int amdgpu_ttm_fw_reserve_vram_in &adev->mman.fw_vram_usage_va); }
+/** + * amdgpu_ttm_drv_reserve_vram_init - create bo vram reservation from driver + * + * @adev: amdgpu_device pointer + * + * create bo vram reservation from drv. + */ +static int amdgpu_ttm_drv_reserve_vram_init(struct amdgpu_device *adev) +{ + uint64_t vram_size = adev->gmc.visible_vram_size; + + adev->mman.drv_vram_usage_reserved_bo = NULL; + + if (adev->mman.drv_vram_usage_size == 0 || + adev->mman.drv_vram_usage_size > vram_size) + return 0; + + return amdgpu_bo_create_kernel_at(adev, + adev->mman.drv_vram_usage_start_offset, + adev->mman.drv_vram_usage_size, + AMDGPU_GEM_DOMAIN_VRAM, + &adev->mman.drv_vram_usage_reserved_bo, + NULL); +} + /* * Memoy training reservation functions */ @@ -1731,6 +1773,14 @@ int amdgpu_ttm_init(struct amdgpu_device }
/* + *The reserved vram for driver must be pinned to the specified + *place on the VRAM, so reserve it early. + */ + r = amdgpu_ttm_drv_reserve_vram_init(adev); + if (r) + return r; + + /* * only NAVI10 and onwards ASIC support for IP discovery. * If IP discovery enabled, a block of memory should be * reserved for IP discovey. @@ -1855,6 +1905,7 @@ void amdgpu_ttm_fini(struct amdgpu_devic amdgpu_bo_free_kernel(&adev->mman.sdma_access_bo, NULL, &adev->mman.sdma_access_ptr); amdgpu_ttm_fw_reserve_vram_fini(adev); + amdgpu_ttm_drv_reserve_vram_fini(adev);
if (drm_dev_enter(adev_to_drm(adev), &idx)) {
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h @@ -86,6 +86,11 @@ struct amdgpu_mman { struct amdgpu_bo *fw_vram_usage_reserved_bo; void *fw_vram_usage_va;
+ /* driver VRAM reservation */ + u64 drv_vram_usage_start_offset; + u64 drv_vram_usage_size; + struct amdgpu_bo *drv_vram_usage_reserved_bo; + /* PAGE_SIZE'd BO for process memory r/w over SDMA. */ struct amdgpu_bo *sdma_access_bo; void *sdma_access_ptr; --- a/drivers/gpu/drm/amd/include/atomfirmware.h +++ b/drivers/gpu/drm/amd/include/atomfirmware.h @@ -705,20 +705,65 @@ struct atom_gpio_pin_lut_v2_1 };
-/*
-  ***************************************************************************
-  Data Table vram_usagebyfirmware structure
-  ***************************************************************************
-*/
+/*
+ * VBIOS/PRE-OS always reserve a FB region at the top of frame buffer. driver should not write
+ * access that region. driver can allocate their own reservation region as long as it does not
+ * overlap firwmare's reservation region.
+ * if (pre-NV1X) atom data table firmwareInfoTable version < 3.3:
+ * in this case, atom data table vram_usagebyfirmwareTable version always <= 2.1
+ *   if VBIOS/UEFI GOP is posted:
+ *     VBIOS/UEFIGOP update used_by_firmware_in_kb = total reserved size by VBIOS
+ *     update start_address_in_kb = total_mem_size_in_kb - used_by_firmware_in_kb;
+ *     ( total_mem_size_in_kb = reg(CONFIG_MEMSIZE)<<10)
+ *     driver can allocate driver reservation region under firmware reservation,
+ *     used_by_driver_in_kb = driver reservation size
+ *     driver reservation start address = (start_address_in_kb - used_by_driver_in_kb)
+ *     Comment1[hchan]: There is only one reservation at the beginning of the FB reserved by
+ *     host driver. Host driver would overwrite the table with the following
+ *     used_by_firmware_in_kb = total reserved size for pf-vf info exchange and
+ *     set SRIOV_MSG_SHARE_RESERVATION mask start_address_in_kb = 0
+ *   else there is no VBIOS reservation region:
+ *     driver must allocate driver reservation region at top of FB.
+ *     driver set used_by_driver_in_kb = driver reservation size
+ *     driver reservation start address = (total_mem_size_in_kb - used_by_driver_in_kb)
+ *     same as Comment1
+ * else (NV1X and after):
+ *   if VBIOS/UEFI GOP is posted:
+ *     VBIOS/UEFIGOP update:
+ *       used_by_firmware_in_kb = atom_firmware_Info_v3_3.fw_reserved_size_in_kb;
+ *       start_address_in_kb = total_mem_size_in_kb - used_by_firmware_in_kb;
+ *       (total_mem_size_in_kb = reg(CONFIG_MEMSIZE)<<10)
+ *   if vram_usagebyfirmwareTable version <= 2.1:
+ *     driver can allocate driver reservation region under firmware reservation,
+ *     driver set used_by_driver_in_kb = driver reservation size
+ *     driver reservation start address = start_address_in_kb - used_by_driver_in_kb
+ *     same as Comment1
+ *   else driver can:
+ *     allocate it reservation any place as long as it does overlap pre-OS FW reservation area
+ *     set used_by_driver_region0_in_kb = driver reservation size
+ *     set driver_region0_start_address_in_kb = driver reservation region start address
+ *     Comment2[hchan]: Host driver can set used_by_firmware_in_kb and start_address_in_kb to
+ *     zero as the reservation for VF as it doesn’t exist. And Host driver should also
+ *     update atom_firmware_Info table to remove the same VBIOS reservation as well.
+ */

 struct vram_usagebyfirmware_v2_1 {
-  struct atom_common_table_header table_header;
-  uint32_t start_address_in_kb;
-  uint16_t used_by_firmware_in_kb;
-  uint16_t used_by_driver_in_kb;
+	struct atom_common_table_header table_header;
+	uint32_t start_address_in_kb;
+	uint16_t used_by_firmware_in_kb;
+	uint16_t used_by_driver_in_kb;
 };

+struct vram_usagebyfirmware_v2_2 {
+	struct atom_common_table_header table_header;
+	uint32_t fw_region_start_address_in_kb;
+	uint16_t used_by_firmware_in_kb;
+	uint16_t reserved;
+	uint32_t driver_region0_start_address_in_kb;
+	uint32_t used_by_driver_region0_in_kb;
+	uint32_t reserved32[7];
+};

 /* ***************************************************************************
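
The address arithmetic described in the comment of the hunk above can be sketched in plain C. This is a minimal userspace illustration under stated assumptions, not driver code: struct fb_region_kb and the three helper names are hypothetical, and the sketch reads the v2.2 branch as requiring that the driver region does not overlap the pre-OS firmware region.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type for illustration; not part of amdgpu. */
struct fb_region_kb {
	uint32_t start_kb;
	uint32_t size_kb;
};

/*
 * Firmware region at the top of the frame buffer:
 * start_address_in_kb = total_mem_size_in_kb - used_by_firmware_in_kb
 */
static struct fb_region_kb fw_region_top(uint32_t total_kb, uint32_t fw_kb)
{
	assert(fw_kb <= total_kb);
	return (struct fb_region_kb){ .start_kb = total_kb - fw_kb,
				      .size_kb = fw_kb };
}

/*
 * Table version <= 2.1: the driver region sits directly under the
 * firmware region, so its start is start_address_in_kb - used_by_driver_in_kb.
 */
static struct fb_region_kb drv_region_below(struct fb_region_kb fw,
					    uint32_t drv_kb)
{
	assert(drv_kb <= fw.start_kb);
	return (struct fb_region_kb){ .start_kb = fw.start_kb - drv_kb,
				      .size_kb = drv_kb };
}

/*
 * Table version 2.2: the driver may pick its own start address, so the
 * only check sketched here is that two half-open ranges do not intersect.
 */
static int fb_regions_overlap(struct fb_region_kb a, struct fb_region_kb b)
{
	uint64_t a_end = (uint64_t)a.start_kb + a.size_kb;
	uint64_t b_end = (uint64_t)b.start_kb + b.size_kb;

	return a.start_kb < b_end && b.start_kb < a_end;
}
```

With an 8192 KB frame buffer and a 512 KB firmware reservation, fw_region_top() places the firmware region at 7680 KB, and a 256 KB driver region placed by drv_region_below() starts at 7424 KB and ends exactly where the firmware region begins.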
From: Luben Tuikov luben.tuikov@amd.com
commit 3273f11675ef11959d25a56df3279f712bcd41b7 upstream
Remove the "domain" argument to amdgpu_bo_create_kernel_at() since this function takes an "offset" argument which is the offset off of VRAM, and as such allocation always takes place in VRAM. Thus, the "domain" argument is unnecessary.
Cc: Alex Deucher Alexander.Deucher@amd.com
Cc: Christian König christian.koenig@amd.com
Cc: AMD Graphics amd-gfx@lists.freedesktop.org
Signed-off-by: Luben Tuikov luben.tuikov@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
Signed-off-by: Mario Limonciello mario.limonciello@amd.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    |  7 -------
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c   |  1 -
 4 files changed, 6 insertions(+), 14 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -347,17 +347,16 @@ int amdgpu_bo_create_kernel(struct amdgp
  * @adev: amdgpu device object
  * @offset: offset of the BO
  * @size: size of the BO
- * @domain: where to place it
  * @bo_ptr: used to initialize BOs in structures
  * @cpu_addr: optional CPU address mapping
  *
- * Creates a kernel BO at a specific offset in the address space of the domain.
+ * Creates a kernel BO at a specific offset in VRAM.
  *
  * Returns:
  * 0 on success, negative error code otherwise.
  */
 int amdgpu_bo_create_kernel_at(struct amdgpu_device *adev,
-			       uint64_t offset, uint64_t size, uint32_t domain,
+			       uint64_t offset, uint64_t size,
 			       struct amdgpu_bo **bo_ptr, void **cpu_addr)
 {
 	struct ttm_operation_ctx ctx = { false, false };
@@ -367,8 +366,9 @@ int amdgpu_bo_create_kernel_at(struct am
 	offset &= PAGE_MASK;
 	size = ALIGN(size, PAGE_SIZE);

-	r = amdgpu_bo_create_reserved(adev, size, PAGE_SIZE, domain, bo_ptr,
-				      NULL, cpu_addr);
+	r = amdgpu_bo_create_reserved(adev, size, PAGE_SIZE,
+				      AMDGPU_GEM_DOMAIN_VRAM, bo_ptr, NULL,
+				      cpu_addr);
 	if (r)
 		return r;

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -284,7 +284,7 @@ int amdgpu_bo_create_kernel(struct amdgp
 			    u32 domain, struct amdgpu_bo **bo_ptr,
 			    u64 *gpu_addr, void **cpu_addr);
 int amdgpu_bo_create_kernel_at(struct amdgpu_device *adev,
-			       uint64_t offset, uint64_t size, uint32_t domain,
+			       uint64_t offset, uint64_t size,
 			       struct amdgpu_bo **bo_ptr, void **cpu_addr);
 int amdgpu_bo_create_user(struct amdgpu_device *adev,
 			  struct amdgpu_bo_param *bp,
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1575,7 +1575,6 @@ static int amdgpu_ttm_fw_reserve_vram_in
 	return amdgpu_bo_create_kernel_at(adev,
 					  adev->mman.fw_vram_usage_start_offset,
 					  adev->mman.fw_vram_usage_size,
-					  AMDGPU_GEM_DOMAIN_VRAM,
 					  &adev->mman.fw_vram_usage_reserved_bo,
 					  &adev->mman.fw_vram_usage_va);
 }
@@ -1600,7 +1599,6 @@ static int amdgpu_ttm_drv_reserve_vram_i
 	return amdgpu_bo_create_kernel_at(adev,
 					  adev->mman.drv_vram_usage_start_offset,
 					  adev->mman.drv_vram_usage_size,
-					  AMDGPU_GEM_DOMAIN_VRAM,
 					  &adev->mman.drv_vram_usage_reserved_bo,
 					  NULL);
 }
@@ -1681,7 +1679,6 @@ static int amdgpu_ttm_reserve_tmr(struct
 		ret = amdgpu_bo_create_kernel_at(adev,
 					 ctx->c2p_train_data_offset,
 					 ctx->train_data_size,
-					 AMDGPU_GEM_DOMAIN_VRAM,
 					 &ctx->c2p_bo,
 					 NULL);
 		if (ret) {
@@ -1695,7 +1692,6 @@ static int amdgpu_ttm_reserve_tmr(struct
 	ret = amdgpu_bo_create_kernel_at(adev,
 				adev->gmc.real_vram_size - adev->mman.discovery_tmr_size,
 				adev->mman.discovery_tmr_size,
-				AMDGPU_GEM_DOMAIN_VRAM,
 				&adev->mman.discovery_memory,
 				NULL);
 	if (ret) {
@@ -1796,21 +1792,18 @@ int amdgpu_ttm_init(struct amdgpu_device
 	 * avoid display artifacts while transitioning between pre-OS
 	 * and driver.  */
 	r = amdgpu_bo_create_kernel_at(adev, 0, adev->mman.stolen_vga_size,
-				       AMDGPU_GEM_DOMAIN_VRAM,
 				       &adev->mman.stolen_vga_memory,
 				       NULL);
 	if (r)
 		return r;
 	r = amdgpu_bo_create_kernel_at(adev, adev->mman.stolen_vga_size,
 				       adev->mman.stolen_extended_size,
-				       AMDGPU_GEM_DOMAIN_VRAM,
 				       &adev->mman.stolen_extended_memory,
 				       NULL);
 	if (r)
 		return r;
 	r = amdgpu_bo_create_kernel_at(adev, adev->mman.stolen_reserved_offset,
 				       adev->mman.stolen_reserved_size,
-				       AMDGPU_GEM_DOMAIN_VRAM,
 				       &adev->mman.stolen_reserved_memory,
 				       NULL);
 	if (r)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -391,7 +391,6 @@ static void amdgpu_virt_ras_reserve_bps(
 		 */
 		if (amdgpu_bo_create_kernel_at(adev, bp << AMDGPU_GPU_PAGE_SHIFT,
 					       AMDGPU_GPU_PAGE_SIZE,
-					       AMDGPU_GEM_DOMAIN_VRAM,
 					       &bo, NULL))
 			DRM_DEBUG("RAS WARN: reserve vram for retired page %llx fail\n", bp);
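
For readers unfamiliar with the two alignment lines kept as context in amdgpu_bo_create_kernel_at() (offset &= PAGE_MASK; size = ALIGN(size, PAGE_SIZE);), here is a minimal userspace sketch of the same arithmetic. It assumes a 4 KiB page; the EX_ macros, struct ex_vram_range and ex_page_align_range() are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB page for illustration; the kernel's PAGE_SIZE is per-arch. */
#define EX_PAGE_SIZE 4096ULL
#define EX_PAGE_MASK (~(EX_PAGE_SIZE - 1))
/* Round x up to a multiple of a, where a is a power of two. */
#define EX_ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

struct ex_vram_range {
	uint64_t offset;
	uint64_t size;
};

/* Round the offset down and the size up to a page boundary. */
static struct ex_vram_range ex_page_align_range(uint64_t offset, uint64_t size)
{
	struct ex_vram_range r;

	/* The mask trick only works for power-of-two page sizes. */
	assert((EX_PAGE_SIZE & (EX_PAGE_SIZE - 1)) == 0);

	r.offset = offset & EX_PAGE_MASK;
	r.size = EX_ALIGN(size, EX_PAGE_SIZE);
	return r;
}
```

For example, an offset of 0x1234 with a size of 100 bytes becomes offset 0x1000 and size 4096, while an already aligned request such as offset 0x2000 and size 8192 is left unchanged.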
From: Lijo Lazar lijo.lazar@amd.com
commit db3b5cb64a9ca301d14ed027e470834316720e42 upstream
Use the generic term fw_reserved_memory for FW reserve region. This region may also hold discovery TMR in addition to other reserve regions. This region size could be larger than discovery tmr size, hence don't change the discovery tmr size based on this.
Signed-off-by: Lijo Lazar lijo.lazar@amd.com
Reviewed-by: Le Ma le.ma@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
[ This change fixes reading IP discovery from debugfs.
  It needed to be hand modified because:
  * GC 9.4.3 support isn't introduced in older kernels until
    228ce176434b ("drm/amdgpu: Handle VRAM dependencies on GFXIP9.4.3")
  * It also changed because of 58ab2c08d708 (drm/amdgpu: use VRAM|GTT
    for a bunch of kernel allocations) not being present.
  Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2748 ]
Signed-off-by: Mario Limonciello mario.limonciello@amd.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 33 +++++++++++++++++++--------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  3 +-
 2 files changed, 21 insertions(+), 15 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1625,14 +1625,15 @@ static int amdgpu_ttm_training_reserve_v
 	return 0;
 }

-static void amdgpu_ttm_training_data_block_init(struct amdgpu_device *adev)
+static void amdgpu_ttm_training_data_block_init(struct amdgpu_device *adev,
+						uint32_t reserve_size)
 {
 	struct psp_memory_training_context *ctx = &adev->psp.mem_train_ctx;

 	memset(ctx, 0, sizeof(*ctx));

 	ctx->c2p_train_data_offset =
-		ALIGN((adev->gmc.mc_vram_size - adev->mman.discovery_tmr_size - SZ_1M), SZ_1M);
+		ALIGN((adev->gmc.mc_vram_size - reserve_size - SZ_1M), SZ_1M);
 	ctx->p2c_train_data_offset =
 		(adev->gmc.mc_vram_size - GDDR6_MEM_TRAINING_OFFSET);
 	ctx->train_data_size =
@@ -1650,9 +1651,10 @@ static void amdgpu_ttm_training_data_blo
  */
 static int amdgpu_ttm_reserve_tmr(struct amdgpu_device *adev)
 {
-	int ret;
 	struct psp_memory_training_context *ctx = &adev->psp.mem_train_ctx;
 	bool mem_train_support = false;
+	uint32_t reserve_size = 0;
+	int ret;

 	if (!amdgpu_sriov_vf(adev)) {
 		if (amdgpu_atomfirmware_mem_training_supported(adev))
@@ -1668,14 +1670,15 @@ static int amdgpu_ttm_reserve_tmr(struct
 	 * Otherwise, fallback to legacy approach to check and reserve tmr block for ip
 	 * discovery data and G6 memory training data respectively
 	 */
-	adev->mman.discovery_tmr_size =
-		amdgpu_atomfirmware_get_fw_reserved_fb_size(adev);
-	if (!adev->mman.discovery_tmr_size)
-		adev->mman.discovery_tmr_size = DISCOVERY_TMR_OFFSET;
+	if (adev->bios)
+		reserve_size =
+			amdgpu_atomfirmware_get_fw_reserved_fb_size(adev);
+	if (!reserve_size)
+		reserve_size = DISCOVERY_TMR_OFFSET;

 	if (mem_train_support) {
 		/* reserve vram for mem train according to TMR location */
-		amdgpu_ttm_training_data_block_init(adev);
+		amdgpu_ttm_training_data_block_init(adev, reserve_size);
 		ret = amdgpu_bo_create_kernel_at(adev,
 					 ctx->c2p_train_data_offset,
 					 ctx->train_data_size,
@@ -1690,13 +1693,14 @@ static int amdgpu_ttm_reserve_tmr(struct
 	}

 	ret = amdgpu_bo_create_kernel_at(adev,
-				adev->gmc.real_vram_size - adev->mman.discovery_tmr_size,
-				adev->mman.discovery_tmr_size,
-				&adev->mman.discovery_memory,
+				adev->gmc.real_vram_size - reserve_size,
+				reserve_size,
+				&adev->mman.fw_reserved_memory,
 				NULL);
 	if (ret) {
 		DRM_ERROR("alloc tmr failed(%d)!\n", ret);
-		amdgpu_bo_free_kernel(&adev->mman.discovery_memory, NULL, NULL);
+		amdgpu_bo_free_kernel(&adev->mman.fw_reserved_memory,
+				      NULL, NULL);
 		return ret;
 	}

@@ -1890,8 +1894,9 @@ void amdgpu_ttm_fini(struct amdgpu_devic
 	/* return the stolen vga memory back to VRAM */
 	amdgpu_bo_free_kernel(&adev->mman.stolen_vga_memory, NULL, NULL);
 	amdgpu_bo_free_kernel(&adev->mman.stolen_extended_memory, NULL, NULL);
-	/* return the IP Discovery TMR memory back to VRAM */
-	amdgpu_bo_free_kernel(&adev->mman.discovery_memory, NULL, NULL);
+	/* return the FW reserved memory back to VRAM */
+	amdgpu_bo_free_kernel(&adev->mman.fw_reserved_memory, NULL,
+			      NULL);
 	if (adev->mman.stolen_reserved_size)
 		amdgpu_bo_free_kernel(&adev->mman.stolen_reserved_memory,
 				      NULL, NULL);
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -78,7 +78,8 @@ struct amdgpu_mman {
 	/* discovery */
 	uint8_t *discovery_bin;
 	uint32_t discovery_tmr_size;
-	struct amdgpu_bo *discovery_memory;
+	/* fw reserved memory */
+	struct amdgpu_bo *fw_reserved_memory;

 	/* firmware VRAM reservation */
 	u64 fw_vram_usage_start_offset;
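
The size selection and offset arithmetic introduced by this patch can be followed in isolation with a small userspace sketch. Everything prefixed ex_ or EX_ is hypothetical; in particular EX_DEFAULT_RESERVE is only a placeholder for the driver's DISCOVERY_TMR_OFFSET fallback, whose real value is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define EX_SZ_1M (1ULL << 20)
/* Round x up to a multiple of a, where a is a power of two. */
#define EX_ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))
/* Placeholder fallback size; an assumption, not the driver's constant. */
#define EX_DEFAULT_RESERVE (64ULL << 10)

/*
 * Mirrors the patched logic: only trust the firmware-reported size when a
 * BIOS image is present, and fall back to the default when it is zero.
 */
static uint64_t ex_pick_reserve_size(int have_bios, uint64_t fw_reported)
{
	uint64_t reserve_size = 0;

	if (have_bios)
		reserve_size = fw_reported;
	if (!reserve_size)
		reserve_size = EX_DEFAULT_RESERVE;
	return reserve_size;
}

/* ALIGN((mc_vram_size - reserve_size - SZ_1M), SZ_1M) from the patch. */
static uint64_t ex_c2p_offset(uint64_t mc_vram_size, uint64_t reserve_size)
{
	assert(mc_vram_size >= reserve_size + EX_SZ_1M);
	return EX_ALIGN(mc_vram_size - reserve_size - EX_SZ_1M, EX_SZ_1M);
}

/* The FW reserved BO starts at real_vram_size - reserve_size. */
static uint64_t ex_fw_reserved_start(uint64_t real_vram_size,
				     uint64_t reserve_size)
{
	assert(reserve_size <= real_vram_size);
	return real_vram_size - reserve_size;
}
```

With 8 GiB of VRAM and the 64 KiB placeholder, ex_c2p_offset() rounds 8 GiB - 1 MiB - 64 KiB up to 8 GiB - 1 MiB, and the reserved region starts at 8 GiB - 64 KiB.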
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit 0fc6fea41c7122aa5f2088117f50144b507e13d7 which is commit a2b6e99d8a623544f3bdccd28ee35b9c1b00daa5 upstream.
It is reported to cause regression issues, so it should be reverted from the 6.1.y tree for now.
Reported-by: Thorsten Leemhuis regressions@leemhuis.info
Link: https://lore.kernel.org/r/f0870e8f-0c66-57fd-f95d-18d014a11939@leemhuis.info
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8419
Cc: Manasi Navare navaremanasi@google.com
Cc: Drew Davenport ddavenport@chromium.org
Cc: Jouni Högander jouni.hogander@intel.com
Cc: Imre Deak imre.deak@intel.com
Cc: Ville Syrjälä ville.syrjala@linux.intel.com
Cc: Jani Nikula jani.nikula@intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/i915/display/intel_display.c | 28 ++--------------------------
 1 file changed, 3 insertions(+), 25 deletions(-)
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -7123,8 +7123,6 @@ static void intel_update_crtc(struct int

 	intel_fbc_update(state, crtc);

-	drm_WARN_ON(&i915->drm, !intel_display_power_is_enabled(i915, POWER_DOMAIN_DC_OFF));
-
 	if (!modeset &&
 	    (new_crtc_state->uapi.color_mgmt_changed ||
 	     new_crtc_state->update_pipe))
@@ -7501,28 +7499,8 @@ static void intel_atomic_commit_tail(str
 	drm_atomic_helper_wait_for_dependencies(&state->base);
 	drm_dp_mst_atomic_wait_for_dependencies(&state->base);

-	/*
-	 * During full modesets we write a lot of registers, wait
-	 * for PLLs, etc. Doing that while DC states are enabled
-	 * is not a good idea.
-	 *
-	 * During fastsets and other updates we also need to
-	 * disable DC states due to the following scenario:
-	 * 1. DC5 exit and PSR exit happen
-	 * 2. Some or all _noarm() registers are written
-	 * 3. Due to some long delay PSR is re-entered
-	 * 4. DC5 entry -> DMC saves the already written new
-	 *    _noarm() registers and the old not yet written
-	 *    _arm() registers
-	 * 5. DC5 exit -> DMC restores a mixture of old and
-	 *    new register values and arms the update
-	 * 6. PSR exit -> hardware latches a mixture of old and
-	 *    new register values -> corrupted frame, or worse
-	 * 7. New _arm() registers are finally written
-	 * 8. Hardware finally latches a complete set of new
-	 *    register values, and subsequent frames will be OK again
-	 */
-	wakeref = intel_display_power_get(dev_priv, POWER_DOMAIN_DC_OFF);
+	if (state->modeset)
+		wakeref = intel_display_power_get(dev_priv, POWER_DOMAIN_MODESET);

 	intel_atomic_prepare_plane_clear_colors(state);

@@ -7661,8 +7639,8 @@ static void intel_atomic_commit_tail(str
 		 * the culprit.
 		 */
 		intel_uncore_arm_unclaimed_mmio_detection(&dev_priv->uncore);
+		intel_display_power_put(dev_priv, POWER_DOMAIN_MODESET, wakeref);
 	}
-	intel_display_power_put(dev_priv, POWER_DOMAIN_DC_OFF, wakeref);
 	intel_runtime_pm_put(&dev_priv->runtime_pm, state->wakeref);

 	/*
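
The control-flow effect of this revert is that the power reference is again taken and released only for full modesets, with the get and the put guarded by the same condition. A toy userspace model of that pairing is sketched below; struct ex_power and the ex_ functions are hypothetical and do not model the i915 power-domain API beyond a simple reference count.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reference counter standing in for a display power domain. */
struct ex_power {
	int modeset_refs;
};

/* Take a reference; the returned cookie is nonzero while held. */
static int ex_power_get(struct ex_power *p)
{
	return ++p->modeset_refs;
}

/* Drop a reference previously returned by ex_power_get(). */
static void ex_power_put(struct ex_power *p, int wakeref)
{
	assert(wakeref > 0);
	assert(p->modeset_refs > 0);
	p->modeset_refs--;
}

/*
 * Shape of the commit path after the revert: the reference is held only
 * when modeset is true, and released under the same condition.
 * held_during_commit reports the count seen while the commit body runs.
 */
static void ex_commit_tail(struct ex_power *p, bool modeset,
			   int *held_during_commit)
{
	int wakeref = 0;

	if (modeset)
		wakeref = ex_power_get(p);

	*held_during_commit = p->modeset_refs;

	if (modeset)
		ex_power_put(p, wakeref);
}
```

In this model a fastset (modeset == false) runs with no reference held, which corresponds to the behaviour the reverted commit had changed by taking the reference unconditionally.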
On Wed, Aug 09, 2023 at 12:39:47PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.45 release. There are 127 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.45-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
For RCU, Tested-by: Joel Fernandes (Google) joel@joelfernandes.org
thanks,
- Joel
Hello,
On 2023-08-09T12:39:47+02:00 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.1.45 release. There are 127 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.45-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
This rc kernel passes DAMON functionality test[1] on my test machine. Attaching the test results summary below. Please note that I retrieved the kernel from linux-stable-rc tree[2].
Tested-by: SeongJae Park sj@kernel.org
[1] https://github.com/awslabs/damon-tests/tree/next/corr
[2] 02a4c6c322d1 ("Linux 6.1.45-rc1")
Thanks, SJ
[...]
---
ok 1 selftests: damon: debugfs_attrs.sh
ok 2 selftests: damon: debugfs_schemes.sh
ok 3 selftests: damon: debugfs_target_ids.sh
ok 4 selftests: damon: debugfs_empty_targets.sh
ok 5 selftests: damon: debugfs_huge_count_read_write.sh
ok 6 selftests: damon: debugfs_duplicate_context_creation.sh
ok 7 selftests: damon: sysfs.sh
ok 1 selftests: damon-tests: kunit.sh
ok 2 selftests: damon-tests: huge_count_read_write.sh
ok 3 selftests: damon-tests: buffer_overflow.sh
ok 4 selftests: damon-tests: rm_contexts.sh
ok 5 selftests: damon-tests: record_null_deref.sh
ok 6 selftests: damon-tests: dbgfs_target_ids_read_before_terminate_race.sh
ok 7 selftests: damon-tests: dbgfs_target_ids_pid_leak.sh
ok 8 selftests: damon-tests: damo_tests.sh
ok 9 selftests: damon-tests: masim-record.sh
ok 10 selftests: damon-tests: build_i386.sh
ok 11 selftests: damon-tests: build_m68k.sh
ok 12 selftests: damon-tests: build_arm64.sh
ok 13 selftests: damon-tests: build_i386_idle_flag.sh
ok 14 selftests: damon-tests: build_i386_highpte.sh
ok 15 selftests: damon-tests: build_nomemcg.sh
PASS
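A result list like the one above can be tallied mechanically. The following is only a minimal Python sketch, assuming each result line has the form "ok N selftests: SUITE: NAME" (or "not ok ..." for a failure) as in the summary above; the names RESULT_RE and summarize are made up for this illustration and are not part of damon-tests.

```python
import re
from collections import Counter

# Matches result lines such as "ok 7 selftests: damon: sysfs.sh".
# The "not ok" form is an assumption; no failing line appears in the report.
RESULT_RE = re.compile(r"^(ok|not ok) (\d+) selftests: ([\w-]+): (\S+)$")


def summarize(lines):
    """Count results per (suite, status) pair, ignoring non-result lines."""
    counts = Counter()
    for line in lines:
        match = RESULT_RE.match(line.strip())
        if match is None:
            continue  # e.g. the trailing "PASS" line
        status, _number, suite, _name = match.groups()
        counts[(suite, status)] += 1
    return counts


sample = [
    "ok 7 selftests: damon: sysfs.sh",
    "ok 1 selftests: damon-tests: kunit.sh",
    "ok 15 selftests: damon-tests: build_nomemcg.sh",
    "PASS",
]
result = summarize(sample)
print(dict(result))
```

Feeding the full summary through summarize() would give one counter entry per suite, which makes a missing or failing test easier to spot than reading the list by eye.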
On Wed, Aug 09, 2023 at 12:39:47PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.45 release. There are 127 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Successfully compiled and installed bindeb-pkgs on my computer (Acer Aspire E15, Intel Core i3 Haswell). No noticeable regressions.
Tested-by: Bagas Sanjaya bagasdotme@gmail.com
Hi Greg
On Wed, Aug 9, 2023 at 7:54 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.1.45 release. There are 127 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.45-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
6.1.45-rc1 tested.
Build successfully completed. Boot successfully completed. No dmesg regressions. Video output normal. Sound output normal.
Lenovo ThinkPad X1 Carbon Gen10(Intel i7-1260P(x86_64) arch linux)
Thanks
Tested-by: Takeshi Ogasawara takeshi.ogasawara@futuring-girl.com
On 8/9/23 3:39 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.45 release. There are 127 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.45-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
On 8/9/23 03:39, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.45 release. There are 127 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000. Anything received after that time might be too late.
Building loongarch:defconfig ... failed
--------------
Error log:
<stdin>:569:2: warning: #warning syscall fstat not implemented [-Wcpp]
arch/loongarch/kernel/setup.c: In function 'arch_cpu_finalize_init':
arch/loongarch/kernel/setup.c:86:9: error: implicit declaration of function 'alternative_instructions'
Actually introduced in v6.1.44 with commit 08e86d42e2c9 ("loongarch/cpu: Switch to arch_cpu_finalize_init()"). Alternative instruction support was only introduced for loongarch in v6.2 with commit 19e5eb15b00c ("LoongArch: Add alternative runtime patching mechanism").
Guenter
On Thu, Aug 10, 2023 at 03:15:28AM -0700, Guenter Roeck wrote:
Actually introduced in v6.1.44 with commit 08e86d42e2c9 ("loongarch/cpu: Switch to arch_cpu_finalize_init()"). Alternative instruction support was only introduced for loongarch in v6.2 with commit 19e5eb15b00c ("LoongArch: Add alternative runtime patching mechanism").
Thanks for the report, I'll fix this after this release.
greg k-h
On Wed, Aug 09, 2023 at 12:39:47PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.45 release. There are 127 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Tested-by: Conor Dooley conor.dooley@microchip.com
Thanks, Conor.
On Wed, Aug 09, 2023 at 12:39:47PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.45 release. There are 127 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000. Anything received after that time might be too late.
Build results:
	total: 157 pass: 153 fail: 4
Failed builds:
	loongarch:defconfig
	loongarch:allnoconfig
	loongarch:tinyconfig
	loongarch:allmodconfig
Qemu test results:
	total: 521 pass: 513 fail: 8
Failed tests:
	arm:fuji-bmc:aspeed_g5_defconfig:notests:mem1G:mtd128,0,8,1:net,nic:aspeed-bmc-facebook-fuji:rootfs
	arm:bletchley-bmc,fmc-model=mt25qu02g,spi-model=mt25qu02g:aspeed_g5_defconfig:notests:mem1G:mtd256:net,nic:aspeed-bmc-facebook-bletchley:rootfs
	<all loongarch>
loongarch failures as already reported, introduced with v6.1.44.
The failed arm tests crash in f2fs (again - previously reported against the v6.1.43 release candidate).
[ 6.685458] 8<--- cut here ---
[ 6.685593] Unable to handle kernel NULL pointer dereference at virtual address 00000034
[ 6.685725] [00000034] *pgd=00000000
[ 6.686010] Internal error: Oops: 5 [#1] SMP ARM
[ 6.686209] CPU: 0 PID: 194 Comm: seedrng Not tainted 6.1.45-rc1-00128-g02a4c6c322d1 #1
[ 6.686350] Hardware name: Generic DT based system
[ 6.686467] PC is at f2fs_issue_flush+0x160/0x210
[ 6.686821] LR is at f2fs_do_sync_file+0x7c8/0xaa8
Guenter
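The build and qemu totals in this report can be sanity-checked with simple arithmetic (pass + fail should equal total). The snippet below is a hedged Python sketch of that check, assuming the "total: N pass: N fail: N" wording used in the report; SUMMARY_RE, parse_summaries and is_consistent are illustrative names, not part of any test infrastructure mentioned in the thread.

```python
import re

# Matches the "total: N pass: N fail: N" triples used in the report.
SUMMARY_RE = re.compile(r"total: (\d+) pass: (\d+) fail: (\d+)")


def parse_summaries(text):
    """Return every (total, passed, failed) triple found in text."""
    return [tuple(int(g) for g in m.groups()) for m in SUMMARY_RE.finditer(text)]


def is_consistent(total, passed, failed):
    """A summary is consistent when passes and failures add up to the total."""
    return passed + failed == total


report = (
    "Build results: total: 157 pass: 153 fail: 4 "
    "Qemu test results: total: 521 pass: 513 fail: 8"
)
summaries = parse_summaries(report)
for total, passed, failed in summaries:
    print(total, passed, failed, is_consistent(total, passed, failed))
```

Both triples from the report add up (153 + 4 = 157 and 513 + 8 = 521), so the four failed builds and eight failed qemu tests account for every non-passing result.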
On Thu, Aug 10, 2023 at 09:14:43AM -0700, Guenter Roeck wrote:
The failed arm tests crash in f2fs (again - previously reported against the v6.1.43 release candidate).
Odd that you are the only one seeing this f2fs report, does no one else use f2fs on 6.1 systems?
thanks,
greg k-h
On 8/11/23 03:07, Greg Kroah-Hartman wrote:
Odd that you are the only one seeing this f2fs report, does no one else use f2fs on 6.1 systems?
It would appear that none of the CI systems does.
Guenter
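For anyone trying to reproduce the f2fs crash discussed in this subthread, the "PC is at" and "LR is at" lines of the oops give the faulting location as symbol+offset/size. The following Python sketch only splits such a line into its parts; LOC_RE and parse_location are names invented for this example. Resolving the result to a source line would still need a matching vmlinux, for instance with the kernel tree's scripts/faddr2line, which nobody in the thread reports having run.

```python
import re

# "symbol+0xOFFSET/0xSIZE": offset into the function and the function's size.
LOC_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_location(line):
    """Return (symbol, offset, size) from an oops line, or None if absent."""
    match = LOC_RE.search(line)
    if match is None:
        return None
    return (
        match.group("symbol"),
        int(match.group("offset"), 16),
        int(match.group("size"), 16),
    )


pc = parse_location("PC is at f2fs_issue_flush+0x160/0x210")
lr = parse_location("LR is at f2fs_do_sync_file+0x7c8/0xaa8")
print(pc)
print(lr)
```

The offset and size are hexadecimal byte counts, so the PC value says the fault happened 0x160 bytes into a 0x210-byte f2fs_issue_flush().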
On Wed, Aug 09, 2023 at 12:39:47PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.45 release. There are 127 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
For Rust, build tested and booted to see the trivial Rust sample output in the kernel log:
Tested-by: Miguel Ojeda ojeda@kernel.org # Rust
Cheers, Miguel
On Wed, 9 Aug 2023 at 16:21, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.1.45 release. There are 127 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.45-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
Building Linux stable-rc 6.1 for x86_64 with clang-17 failed with the following warnings / errors.
Regressions found on x86_64:
- build/clang-nightly-lkftconfig-kselftest
- build/clang-nightly-x86_64_defconfig
- build/clang-nightly-lkftconfig
- build/clang-lkftconfig
- build/clang-nightly-allmodconfig
Build errors:
-----
ld.lld: error: ./arch/x86/kernel/vmlinux.lds:193: at least one side of the expression must be absolute
make[2]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
upstream report:
-----
- https://lore.kernel.org/llvm/CA+G9fYsdUeNu-gwbs0+T6XHi4hYYk=Y9725-wFhZ7gJMsp...
Proposed fix patch:
-----
[PATCH] x86/srso: fix build breakage for LD=ld.lld
- https://lore.kernel.org/lkml/20230809-gds-v1-1-eaac90b0cbcc@google.com/T/
This patch has yet to be backported and Cc'd to stable.
-- Linaro LKFT https://lkft.linaro.org
On Fri, Aug 11, 2023 at 08:52:01AM +0530, Naresh Kamboju wrote:
Build errors:
ld.lld: error: ./arch/x86/kernel/vmlinux.lds:193: at least one side of the expression must be absolute
make[2]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
upstream report:
- https://lore.kernel.org/llvm/CA+G9fYsdUeNu-gwbs0+T6XHi4hYYk=Y9725-wFhZ7gJMspLDRA@mail.gmail.com/
Proposed fix patch:
[PATCH] x86/srso: fix build breakage for LD=ld.lld
This patch has yet to be backported and Cc'd to stable.
It's now in -tip, I would expect it to make -rc6:
https://git.kernel.org/tip/cbe8ded48b939b9d55d2c5589ab56caa7b530709
It should have had 'Cc: stable@vger.kernel.org' but I hope the Fixes: tag alone will ensure it gets picked up once it hits mainline, especially since there are other fixes that will come in that pull.
Cheers, Nathan
On 8/10/23 20:22, Naresh Kamboju wrote:
On Wed, 9 Aug 2023 at 16:21, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.1.45 release. There are 127 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.45-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
Building Linux stable rc 6.1 for x86_64 with clang-17 failed due to the following warnings / errors.
Regressions found on x86_64:
- build/clang-nightly-lkftconfig-kselftest
- build/clang-nightly-x86_64_defconfig
- build/clang-nightly-lkftconfig
- build/clang-lkftconfig
- build/clang-nightly-allmodconfig
Build errors:
ld.lld: error: ./arch/x86/kernel/vmlinux.lds:193: at least one side of the expression must be absolute
make[2]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
We see that with v5.10.y, v5.15.y, and v6.1.y when building ChromeOS images with clang/lld. There are additional problems with LTO and the built-in assembler. See https://www.linuxquestions.org/questions/slackware-14/error-building-kernel-... for a summary.
As far as I can see none of those problems has been fixed in the upstream kernel.
Guenter
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
upstream report:
- https://lore.kernel.org/llvm/CA+G9fYsdUeNu-gwbs0+T6XHi4hYYk=Y9725-wFhZ7gJMspLDRA@mail.gmail.com/
Proposed fix patch:
[PATCH] x86/srso: fix build breakage for LD=ld.lld
This patch has yet to be backported and Cc'd to stable.
-- Linaro LKFT https://lkft.linaro.org
On Thu, Aug 10, 2023 at 08:41:42PM -0700, Guenter Roeck wrote:
On 8/10/23 20:22, Naresh Kamboju wrote:
Build errors:
ld.lld: error: ./arch/x86/kernel/vmlinux.lds:193: at least one side of the expression must be absolute
make[2]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
We see that with v5.10.y, v5.15.y, and v6.1.y when building ChromeOS images with clang/lld. There are additional problems with LTO and the built-in assembler. See https://www.linuxquestions.org/questions/slackware-14/error-building-kernel-... for a summary.
Yup, we have issues open for all of those:
https://github.com/ClangBuiltLinux/linux/issues/1907
https://github.com/ClangBuiltLinux/linux/issues/1909
https://github.com/ClangBuiltLinux/linux/issues/1911
1907 is fixed in -tip and I am sure it will make -rc6 [1].
1909 is fixed with [2] but it is sitting in x86/core (i.e., slated for next merge window). I am guessing at the time it was picked up, it was not fixing a noticeable issue, which is obviously not the case now. Nick reached out to the -tip folks on IRC to inquire about getting that applied to a branch that is going to Linus soon, as it is more of a process issue since it has conflicts with SRSO and a separate issue that was pointed out post-acceptance (which I addressed and pushed to [3] for testing). I never saw a response there (which is understandable, it is a busy time...) so looping the -tip folks in now, just to make sure it does not get lost (apologies if this is noise).
1911 is still being investigated (some additional eyes on it would not hurt).
[1]: https://git.kernel.org/tip/cbe8ded48b939b9d55d2c5589ab56caa7b530709
[2]: https://git.kernel.org/tip/973ab2d61f33dc85212c486e624af348c4eeb5c9
[3]: https://github.com/ClangBuiltLinux/linux/commit/150c42407f87463c27a2459e0684...
As far as I can see none of those problems has been fixed in the upstream kernel.
Indeed, embargos are fun... :)
Cheers, Nathan
On Thu, Aug 10, 2023 at 09:13:39PM -0700, Nathan Chancellor wrote:
1911 is still being investigated (some additional eyes on it would not hurt).
I'm hoping that we can take this one:
https://lore.kernel.org/r/20230809072200.543939260@infradead.org
which should resolve this issue, right?
On Sun, Aug 13, 2023 at 01:02:54PM +0200, Borislav Petkov wrote:
On Thu, Aug 10, 2023 at 09:13:39PM -0700, Nathan Chancellor wrote:
1911 is still being investigated (some additional eyes on it would not hurt).
I'm hoping that we can take this one:
https://lore.kernel.org/r/20230809072200.543939260@infradead.org
which should resolve this issue, right?
Yes, it does, at least for mainline and 6.4. The backport to 6.1 seems hairy (due to a lack of call depth tracking, methinks). It may be worth taking Nick's change there for simplicity's sake but I'll let y'all make that decision.
Cheers, Nathan
On Wed, 09 Aug 2023 12:39:47 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.45 release. There are 127 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.45-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v6.1:
    11 builds: 11 pass, 0 fail
    28 boots: 28 pass, 0 fail
    130 tests: 130 pass, 0 fail

Linux version: 6.1.45-rc1-g02a4c6c322d1
Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Thierry Reding treding@nvidia.com
On Wed, Aug 09, 2023 at 12:39:47PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.45 release. There are 127 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Nothing looking amiss from here..
Tested-by: Conor Dooley conor.dooley@microchip.com
Thanks, Conor.