This is the start of the stable review cycle for the 6.1.107 release. There are 321 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.107-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.1.107-rc1
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Input: MT - limit max slots
Paolo Abeni pabeni@redhat.com selftests: net: more strict check in net_helper
Yuri Benditovich yuri.benditovich@daynix.com net: change maximum number of UDP segments to 128
Dave Kleikamp dave.kleikamp@oracle.com Revert "jfs: fix shift-out-of-bounds in dbJoin"
Jesse Brandeburg jesse.brandeburg@intel.com ice: fix W=1 headers mismatch
Felix Fietkau nbd@nbd.name udp: fix receiving fraglist GSO packets
Andreas Gruenbacher agruenba@redhat.com gfs2: Remove freeze_go_demote_ok
Andreas Gruenbacher agruenba@redhat.com gfs2: Remove LM_FLAG_PRIORITY flag
Andreas Gruenbacher agruenba@redhat.com gfs2: don't withdraw if init_threads() got interrupted
Andreas Gruenbacher agruenba@redhat.com gfs2: Fix another freeze/thaw hang
Felix Fietkau nbd@nbd.name wifi: cfg80211: fix receiving mesh packets without RFC1042 header
Felix Fietkau nbd@nbd.name wifi: mac80211: fix potential null pointer dereference
Felix Fietkau nbd@nbd.name wifi: mac80211: drop bogus static keywords in A-MSDU rx
Felix Fietkau nbd@nbd.name wifi: mac80211: fix receiving mesh packets in forwarding=0 networks
Felix Fietkau nbd@nbd.name wifi: mac80211: fix flow dissection for forwarded packets
Felix Fietkau nbd@nbd.name wifi: mac80211: fix mesh forwarding
Felix Fietkau nbd@nbd.name wifi: mac80211: fix mesh path discovery based on unicast packets
Johannes Berg johannes.berg@intel.com wifi: mac80211: add documentation for amsdu_mesh_control
Willem de Bruijn willemb@google.com net: drop bad gso csum_start and offset in virtio_net_hdr
Willem de Bruijn willemb@google.com net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation
Yan Zhai yan@cloudflare.com gso: fix dodgy bit handling for GSO_UDP_L4
Andrew Melnychenko andrew@daynix.com udp: allow header check for dodgy GSO_UDP_L4 packets.
Jan Höppner hoeppner@linux.ibm.com Revert "s390/dasd: Establish DMA alignment"
Li RongQing lirongqing@baidu.com KVM: x86: fire timer when it is migrated and expired, and in oneshot mode
Boyuan Zhang boyuan.zhang@amd.com drm/amdgpu/vcn: not pause dpg for unified queue
Boyuan Zhang boyuan.zhang@amd.com drm/amdgpu/vcn: identify unified queue in sw init
Lee, Chun-Yi joeyli.kernel@gmail.com Bluetooth: hci_ldisc: check HCI_UART_PROTO_READY flag in HCIUARTGETPROTO
Trond Myklebust trond.myklebust@hammerspace.com nfsd: Fix a regression in nfsd_setattr()
NeilBrown neilb@suse.de nfsd: don't call locks_release_private() twice concurrently
Jeff Layton jlayton@kernel.org nfsd: drop the nfsd_put helper
NeilBrown neilb@suse.de nfsd: call nfsd_last_thread() before final nfsd_put()
NeilBrown neilb@suse.de NFSD: simplify error paths in nfsd_svc()
NeilBrown neilb@suse.de nfsd: separate nfsd_last_thread() from nfsd_put()
NeilBrown neilb@suse.de nfsd: Simplify code around svc_exit_thread() call in nfsd()
Zi Yan ziy@nvidia.com mm/numa: no task_numa_fault() call if PTE is changed
Zi Yan ziy@nvidia.com mm/numa: no task_numa_fault() call if PMD is changed
Hailong Liu hailong.liu@oppo.com mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0
Takashi Iwai tiwai@suse.de ALSA: timer: Relax start tick time check for slave timer elements
Javier Carrasco javier.carrasco.cruz@gmail.com hwmon: (ltc2992) Fix memory leak in ltc2992_parse_dt()
Eric Dumazet edumazet@google.com tcp: do not export tcp_twsk_purge()
Alex Hung alex.hung@amd.com Revert "drm/amd/display: Validate hw_points_num before using it"
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "usb: gadget: uvc: cleanup request when not in correct state"
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: only decrement add_addr_accepted for MPJ req
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: re-using ID of unused flushed subflows
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: re-using ID of unused removed subflows
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: re-using ID of unused removed ADD_ADDR
Peng Fan peng.fan@nxp.com pmdomain: imx: wait SSAR when i.MX93 power domain on
Ben Whitten ben.whitten@gmail.com mmc: dw_mmc: allow biu and ciu clocks to defer
Marc Zyngier maz@kernel.org KVM: arm64: Make ICC_*SGI*_EL1 undef in the absence of a vGICv3
Nikolay Kuratov kniv@yandex-team.ru cxgb4: add forgotten u64 ivlan cast before shift
Werner Sembach wse@tuxedocomputers.com Input: i8042 - use new forcenorestore quirk to replace old buggy quirk combination
Werner Sembach wse@tuxedocomputers.com Input: i8042 - add forcenorestore quirk to leave controller untouched even on s3
Siarhei Vishniakou svv@google.com HID: microsoft: Add rumble support to latest xbox controllers
Jason Gerecke jason.gerecke@wacom.com HID: wacom: Defer calculation of resolution until resolution_code is known
Jiaxun Yang jiaxun.yang@flygoat.com MIPS: Loongson64: Set timer mode in cpu-probe
Candice Li candice.li@amd.com drm/amdgpu: Validate TA binary size
Namjae Jeon linkinjeon@kernel.org ksmbd: the buffer of smb2 query dir response has at least 1 byte
Chaotian Jing chaotian.jing@mediatek.com scsi: core: Fix the return value of scsi_logical_block_count()
Griffin Kroah-Hartman griffin@kroah.com Bluetooth: MGMT: Add error handling to pair_device()
Dan Carpenter dan.carpenter@linaro.org mmc: mmc_test: Fix NULL dereference on allocation failure
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: cleanup FB if dpu_format_populate_layout fails
Abhinav Kumar quic_abhinavk@quicinc.com drm/msm/dp: reset the link phy params before link training
Abhinav Kumar quic_abhinavk@quicinc.com drm/msm/dp: fix the max supported bpp logic
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: don't play tricks with debug macros
Sean Anderson sean.anderson@linux.dev net: xilinx: axienet: Fix dangling multicast addresses
Sean Anderson sean.anderson@linux.dev net: xilinx: axienet: Always disable promiscuous mode
Bharat Bhushan bbhushan2@marvell.com octeontx2-af: Fix CPT AF register offset calculation
Pablo Neira Ayuso pablo@netfilter.org netfilter: flowtable: validate vlan header
Eric Dumazet edumazet@google.com ipv6: prevent possible UAF in ip6_xmit()
Eric Dumazet edumazet@google.com ipv6: fix possible UAF in ip6_finish_output2()
Eric Dumazet edumazet@google.com ipv6: prevent UAF in ip6_send_skb()
Stephen Hemminger stephen@networkplumber.org netem: fix return value if duplicate enqueue fails
Joseph Huang Joseph.Huang@garmin.com net: dsa: mv88e6xxx: Fix out-of-bound access
Dan Carpenter dan.carpenter@linaro.org dpaa2-switch: Fix error checking in dpaa2_switch_seed_bp()
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: fix ICE_LAST_OFFSET formula
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: fix page reuse when PAGE_SIZE is over 8k
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: Pull out next_to_clean bump out of ice_put_rx_buf()
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: Store page count inside ice_rx_buf
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: Add xdp_buff to ice_rx_ring struct
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: Prepare legacy-rx for upcoming XDP multi-buffer support
Nikolay Aleksandrov razor@blackwall.org bonding: fix xfrm state handling when clearing active slave
Nikolay Aleksandrov razor@blackwall.org bonding: fix xfrm real_dev null pointer dereference
Nikolay Aleksandrov razor@blackwall.org bonding: fix null pointer deref in bond_ipsec_offload_ok
Nikolay Aleksandrov razor@blackwall.org bonding: fix bond_ipsec_offload_ok return type
Thomas Bogendoerfer tbogendoerfer@suse.de ip6_tunnel: Fix broken GRO
Sebastian Andrzej Siewior bigeasy@linutronix.de netfilter: nft_counter: Synchronize nft_counter_reset() against reader.
Sebastian Andrzej Siewior bigeasy@linutronix.de netfilter: nft_counter: Disable BH in nft_counter_offload_stats().
Kuniyuki Iwashima kuniyu@amazon.com kcm: Serialise kcm_sendmsg() for the same socket.
Jeremy Kerr jk@codeconstruct.com.au net: mctp: test: Use correct skb for route input check
Florian Westphal fw@strlen.de tcp: prevent concurrent execution of tcp_sk_exit_batch
Eric Dumazet edumazet@google.com tcp/dccp: do not care about families in inet_twsk_purge()
Eric Dumazet edumazet@google.com tcp/dccp: bypass empty buckets in inet_twsk_purge()
Hangbin Liu liuhangbin@gmail.com selftests: udpgro: report error when receive failed
Lucas Karpinski lkarpins@redhat.com selftests/net: synchronize udpgro tests' tx and rx connection
Simon Horman horms@kernel.org tc-testing: don't access non-existent variable on exception
Vladimir Oltean vladimir.oltean@nxp.com net: mscc: ocelot: serialize access to the injection/extraction groups
Vladimir Oltean vladimir.oltean@nxp.com net: mscc: ocelot: fix QoS class for injected packets with "ocelot-8021q"
Vladimir Oltean vladimir.oltean@nxp.com net: mscc: ocelot: use ocelot_xmit_get_vlan_info() also for FDMA and register injection
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: tag_ocelot: call only the relevant portion of __skb_vlan_pop() on TX
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: tag_ocelot: do not rely on skb_mac_header() for VLAN xmit
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: SMP: Fix assumption of Central always being Initiator
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_core: Fix LE quote calculation
Lang Yu Lang.Yu@amd.com drm/amdkfd: reserve the BO before validating it
Maximilian Luz luzmaximilian@gmail.com platform/surface: aggregator: Fix warning when controller is destroyed in probe
Rodrigo Siqueira Rodrigo.Siqueira@amd.com drm/amd/display: Adjust cursor position
Filipe Manana fdmanana@suse.com btrfs: send: allow cloning non-aligned extent if it ends at i_size
David Sterba dsterba@suse.com btrfs: replace sb::s_blocksize by fs_info::sectorsize
Long Li longli@microsoft.com net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings
Mikulas Patocka mpatocka@redhat.com dm suspend: return -ERESTARTSYS instead of -EINTR
Breno Leitao leitao@debian.org i2c: tegra: Do not mark ACPI devices as irq safe
Michał Mirosław mirq-linux@rere.qmqm.pl i2c: tegra: allow VI support to be compiled out
Michał Mirosław mirq-linux@rere.qmqm.pl i2c: tegra: allow DVC support to be compiled out
Aurelien Jarno aurelien@aurel32.net media: solo6x10: replace max(a, min(b, c)) by clamp(b, a, c)
Eric Dumazet edumazet@google.com gtp: pull network headers in gtp_dev_xmit()
Phil Chang phil.chang@mediatek.com hrtimer: Prevent queuing of hrtimer without a function callback
Jesse Zhang jesse.zhang@amd.com drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent
Sagi Grimberg sagi@grimberg.me nvmet-rdma: fix possible bad dereference when freeing rsps
Baokun Li libaokun1@huawei.com ext4: set the type of max_zeroout to unsigned int to avoid overflow
Guanrui Huang guanrui.huang@linux.alibaba.com irqchip/gic-v3-its: Remove BUG_ON in its_vpe_irq_domain_alloc
Abdulrasaq Lawani abdulrasaqolawani@gmail.com fbdev: offb: replace of_node_put with __free(device_node)
Krishna Kurapati quic_kriskura@quicinc.com usb: dwc3: core: Skip setting event buffers for host only controllers
Gergo Koteles soyer@irl.hu platform/x86: lg-laptop: fix %s null argument warning
Adrian Hunter adrian.hunter@intel.com clocksource: Make watchdog and suspend-timing multiplication overflow safe
Biju Das biju.das.jz@bp.renesas.com irqchip/renesas-rzg2l: Do not set TIEN and TINT source at the same time
Alexander Gordeev agordeev@linux.ibm.com s390/iucv: fix receive buffer virtual vs physical address confusion
Oreoluwa Babatunde quic_obabatun@quicinc.com openrisc: Call setup_memory() earlier in the init sequence
NeilBrown neilb@suse.de NFS: avoid infinite loop in pnfs_update_layout.
Hannes Reinecke hare@suse.de nvmet-tcp: do not continue for invalid icreq
Jian Shen shenjian15@huawei.com net: hns3: add checking for vf id of mailbox
Alexandre Belloni alexandre.belloni@bootlin.com rtc: nct3018y: fix possible NULL dereference
Richard Fitzgerald rf@opensource.cirrus.com firmware: cirrus: cs_dsp: Initialize debugfs_root to invalid
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: bnep: Fix out-of-bound access
Keith Busch kbusch@kernel.org nvme: clear caller pointer on identify failure
Uwe Kleine-König u.kleine-koenig@pengutronix.de usb: gadget: fsl: Increase size of name buffer for endpoints
Zhiguo Niu zhiguo.niu@unisoc.com f2fs: fix to do sanity check in update_sit_entry
David Sterba dsterba@suse.com btrfs: delete pointless BUG_ON check on quota root in btrfs_qgroup_account_extent()
David Sterba dsterba@suse.com btrfs: change BUG_ON to assertion in tree_move_down()
David Sterba dsterba@suse.com btrfs: send: handle unexpected data in header buffer in begin_cmd()
David Sterba dsterba@suse.com btrfs: handle invalid root reference found in may_destroy_subvol()
David Sterba dsterba@suse.com btrfs: tests: allocate dummy fs_info and root in test_find_delalloc()
David Sterba dsterba@suse.com btrfs: change BUG_ON to assertion when checking for delayed_node root
David Sterba dsterba@suse.com btrfs: delayed-inode: drop pointless BUG_ON in __btrfs_remove_delayed_item()
Michael Ellerman mpe@ellerman.id.au powerpc/boot: Only free if realloc() succeeds
Li zeming zeming@nfschina.com powerpc/boot: Handle allocation failure in simple_realloc()
Helge Deller deller@gmx.de parisc: Use irq_enter_rcu() to fix warning at kernel/context_tracking.c:367
Christophe Kerello christophe.kerello@foss.st.com memory: stm32-fmc2-ebi: check regmap_read return value
Kees Cook keescook@chromium.org x86: Increase brk randomness entropy for 64-bit systems
Li Nan linan122@huawei.com md: clean up invalid BUG_ON in md_ioctl
Eric Dumazet edumazet@google.com netlink: hold nlk->cb_mutex longer in __netlink_dump_start()
Martin Blumenstingl martin.blumenstingl@googlemail.com clocksource/drivers/arm_global_timer: Guard against division by zero
Stefan Hajnoczi stefanha@redhat.com virtiofs: forbid newlines in tags
Costa Shulyupin costa.shul@redhat.com hrtimer: Select housekeeping CPU during migration
Erico Nunes nunes.erico@gmail.com drm/lima: set gp bus_stop bit before hard reset
Kees Cook keescook@chromium.org net/sun3_82586: Avoid reading past buffer in debug output
Philipp Stanner pstanner@redhat.com media: drivers/media/dvb-core: copy user arrays safely
Justin Tee justin.tee@broadcom.com scsi: lpfc: Initialize status local variable in lpfc_sli4_repost_sgl_list()
Max Filippov jcmvbkbc@gmail.com fs: binfmt_elf_efpic: don't use missing interpreter's properties
Hans Verkuil hverkuil-cisco@xs4all.nl media: pci: cx23885: check cx23885_vdev_init() return
Neel Natu neelnatu@google.com kernfs: fix false-positive WARN(nr_mmapped) in kernfs_drain_open_files
Jan Kara jack@suse.cz quota: Remove BUG_ON from dqget()
Al Viro viro@zeniv.linux.org.uk fuse: fix UAF in rcu pathwalks
Al Viro viro@zeniv.linux.org.uk afs: fix __afs_break_callback() / afs_drop_open_mmap() race
Baokun Li libaokun1@huawei.com ext4: do not trim the group with corrupted block bitmap
Daniel Wagner dwagner@suse.de nvmet-trace: avoid dereferencing pointer too early
Andreas Gruenbacher agruenba@redhat.com gfs2: Refcounting fix in gfs2_thaw_super
Zijun Hu quic_zijuhu@quicinc.com Bluetooth: hci_conn: Check non NULL function before calling for HFP offload
Andy Yan andy.yan@rock-chips.com drm/rockchip: vop2: clear afbc en and transform bit for cluster window at linear mode
Kees Cook keescook@chromium.org hwmon: (pc87360) Bounds check data->innr usage
Bard Liao yung-chuan.liao@linux.intel.com ASoC: SOF: ipc4: check return value of snd_sof_ipc_msg_data
Kunwu Chan chentao@kylinos.cn powerpc/xics: Check return value of kasprintf in icp_native_map_one_cpu
Ashish Mhetre amhetre@nvidia.com memory: tegra: Skip SID programming if SID registers aren't set
Rob Clark robdclark@chromium.org drm/msm: Reduce fallout of fence signaling vs reclaim hangs
Li Lingfeng lilingfeng3@huawei.com block: Fix lockdep warning in blk_mq_mark_tag_wait
Samuel Holland samuel.holland@sifive.com arm64: Fix KASAN random tag seed initialization
Masahiro Yamada masahiroy@kernel.org rust: fix the default format for CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT
Masahiro Yamada masahiroy@kernel.org rust: suppress error messages from CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT
Miguel Ojeda ojeda@kernel.org rust: work around `bindgen` 0.69.0 issue
Miguel Ojeda ojeda@kernel.org kbuild: rust_is_available: handle failures calling `$RUSTC`/`$BINDGEN`
Miguel Ojeda ojeda@kernel.org kbuild: rust_is_available: normalize version matching
Antoniu Miclaus antoniu.miclaus@analog.com hwmon: (ltc2992) Avoid division by zero
Chengfeng Ye dg573847474@gmail.com IB/hfi1: Fix potential deadlock on &irq_src_lock and &dd->uctxt_lock
Gustavo A. R. Silva gustavoars@kernel.org clk: visconti: Add bounds-checking coverage for struct visconti_pll_provider
Mukesh Sisodiya mukesh.sisodiya@intel.com wifi: iwlwifi: fw: Fix debugfs command sending
Miri Korenblit miriam.rachel.korenblit@intel.com wifi: iwlwifi: abort scan when rfkill on but device enabled
Andreas Gruenbacher agruenba@redhat.com gfs2: setattr_chown: Add missing initialization
Mike Christie michael.christie@oracle.com scsi: spi: Fix sshdr use
Hans Verkuil hverkuil-cisco@xs4all.nl media: qcom: venus: fix incorrect return value
Mikko Perttunen mperttunen@nvidia.com drm/tegra: Zero-initialize iosys_map
Christian Brauner brauner@kernel.org binfmt_misc: cleanup on filesystem umount
Yu Kuai yukuai3@huawei.com md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log'
Chengfeng Ye dg573847474@gmail.com media: s5p-mfc: Fix potential deadlock on condlock
Chengfeng Ye dg573847474@gmail.com staging: ks7010: disable bh on tx_dev_lock
Alex Hung alex.hung@amd.com drm/amd/display: Validate hw_points_num before using it
Michael Grzeschik m.grzeschik@pengutronix.de usb: gadget: uvc: cleanup request when not in correct state
David Lechner dlechner@baylibre.com staging: iio: resolver: ad2s1210: fix use before initialization
Hans Verkuil hverkuil-cisco@xs4all.nl media: radio-isa: use dev_name to fill in bus_info
Philip Yang Philip.Yang@amd.com drm/amdkfd: Move dma unmapping after TLB flush
Jarkko Nikula jarkko.nikula@linux.intel.com i3c: mipi-i3c-hci: Do not unmap region not mapped for transfer
Jarkko Nikula jarkko.nikula@linux.intel.com i3c: mipi-i3c-hci: Remove BUG() when Ring Abort request times out
Tomi Valkeinen tomi.valkeinen@ideasonboard.com drm/bridge: tc358768: Attempt to fix DSI horizontal timings
Heiko Carstens hca@linux.ibm.com s390/smp,mcck: fix early IPI handling
Zhu Yanjun yanjun.zhu@linux.dev RDMA/rtrs: Fix the problem of variable not initialized fully
Wolfram Sang wsa+renesas@sang-engineering.com i2c: riic: avoid potential division by zero
Kamalesh Babulal kamalesh.babulal@oracle.com cgroup: Avoid extra dereference in css_populate_dir()
Jeff Johnson quic_jjohnson@quicinc.com wifi: cw1200: Avoid processing an invalid TIM IE
Paul E. McKenney paulmck@kernel.org rcu: Eliminate rcu_gp_slow_unregister() false positive
Zhen Lei thunder.leizhen@huawei.com rcu: Dump memory object info if callback function is invalid
Johannes Berg johannes.berg@intel.com wifi: mac80211: fix BA session teardown race
Johannes Berg johannes.berg@intel.com wifi: cfg80211: check wiphy mutex is held for wdev mutex
Rand Deeb rand.sec96@gmail.com ssb: Fix division by zero issue in ssb_calc_clock_rate
Lee Jones lee@kernel.org drm/amd/amdgpu/imu_v11_0: Increase buffer size to ensure all possible values can be stored
Parsa Poorshikhian parsa.poorsh@gmail.com ALSA: hda/realtek: Fix noise from speakers on Lenovo IdeaPad 3 15IAU7
Jie Wang wangjie125@huawei.com net: hns3: fix a deadlock problem when config TC during resetting
Peiyang Wang wangpeiyang1@huawei.com net: hns3: use the user's cfg after reset
Jie Wang wangjie125@huawei.com net: hns3: fix wrong use of semaphore up
Phil Sutter phil@nwl.cc netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
Phil Sutter phil@nwl.cc netfilter: nf_tables: Introduce nf_tables_getobj_single
Phil Sutter phil@nwl.cc netfilter: nf_tables: Carry reset boolean in nft_obj_dump_ctx
Phil Sutter phil@nwl.cc netfilter: nf_tables: nft_obj_filter fits into cb->ctx
Phil Sutter phil@nwl.cc netfilter: nf_tables: Carry s_idx in nft_obj_dump_ctx
Phil Sutter phil@nwl.cc netfilter: nf_tables: A better name for nft_obj_filter
Phil Sutter phil@nwl.cc netfilter: nf_tables: Unconditionally allocate nft_obj_filter
Phil Sutter phil@nwl.cc netfilter: nf_tables: Drop pointless memset in nf_tables_dump_obj
Phil Sutter phil@nwl.cc netfilter: nf_tables: Audit log dump reset after the fact
Florian Westphal fw@strlen.de netfilter: nf_queue: drop packets with cloned unconfirmed conntracks
Donald Hunter donald.hunter@gmail.com netfilter: flowtable: initialise extack before use
Tom Hughes tom@compton.nu netfilter: allow ipv6 fragments to arrive on different devices
Eugene Syromiatnikov esyr@redhat.com mptcp: correct MPTCP_SUBFLOW_ATTR_SSN_OFFSET reserved size
David Thompson davthompson@nvidia.com mlxbf_gige: disable RX filters until RX path initialized
Yue Haibing yuehaibing@huawei.com mlxbf_gige: Remove two unused function declarations
Pawel Dembicki paweldembicki@gmail.com net: dsa: vsc73xx: check busy flag in MDIO operations
Pawel Dembicki paweldembicki@gmail.com net: dsa: vsc73xx: use read_poll_timeout instead delay loop
Pawel Dembicki paweldembicki@gmail.com net: dsa: vsc73xx: pass value in phy_write operation
Radhey Shyam Pandey radhey.shyam.pandey@amd.com net: axienet: Fix register defines comment description
Dan Carpenter dan.carpenter@linaro.org atm: idt77252: prevent use after free in dequeue_rx()
Cosmin Ratiu cratiu@nvidia.com net/mlx5e: Correctly report errors for ethtool rx flows
Dragos Tatulea dtatulea@nvidia.com net/mlx5e: Take state lock during tx timeout reporter
Faizal Rahim faizal.abdul.rahim@linux.intel.com igc: Fix packet still tx after gate close by reducing i226 MAC retry buffer
Muhammad Husaini Zulkifli muhammad.husaini.zulkifli@intel.com igc: Correct the launchtime offset
Takashi Iwai tiwai@suse.de ALSA: usb: Fix UBSAN warning in parse_audio_unit()
Konstantin Komarov almaz.alexandrovich@paragon-software.com fs/ntfs3: Do copy_to_user out of run_lock
Pei Li peili.dev@gmail.com jfs: Fix shift-out-of-bounds in dbDiscardAG
Edward Adam Davis eadavis@qq.com jfs: fix null ptr deref in dtInsertEntry
Willem de Bruijn willemb@google.com fou: remove warn in gue_gro_receive on unsupported protocol
yunshui jiangyunshui@kylinos.cn bpf, net: Use DEV_STAT_INC()
Jan Kara jack@suse.cz udf: Fix bogus checksum computation in udf_rename()
Jan Kara jack@suse.cz ext4: do not create EA inode under buffer lock
Jan Kara jack@suse.cz ext4: fold quota accounting into ext4_xattr_inode_lookup_create()
Li Zhong floridsleeves@gmail.com ext4: check the return value of ext4_xattr_inode_dec_ref()
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: RFCOMM: Fix not validating setsockopt user input
Alexei Starovoitov ast@kernel.org bpf: Avoid kfree_rcu() under lock in bpf_lpm_trie.
Kees Cook keescook@chromium.org bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
Donald Hunter donald.hunter@gmail.com docs/bpf: Document BPF_MAP_TYPE_LPM_TRIE map
Johannes Berg johannes.berg@intel.com wifi: cfg80211: check A-MSDU format more carefully
Felix Fietkau nbd@nbd.name wifi: mac80211: add a workaround for receiving non-standard mesh A-MSDU
Felix Fietkau nbd@nbd.name wifi: mac80211: fix receiving A-MSDU frames on mesh interfaces
Felix Fietkau nbd@nbd.name wifi: mac80211: remove mesh forwarding congestion check
Felix Fietkau nbd@nbd.name wifi: cfg80211: factor out bridge tunnel / RFC1042 header check
Felix Fietkau nbd@nbd.name wifi: cfg80211: move A-MSDU check in ieee80211_data_to_8023_exthdr
Felix Fietkau nbd@nbd.name wifi: mac80211: fix and simplify unencrypted drop check for mesh
Gavrilov Ilia Ilia.Gavrilov@infotecs.ru pppoe: Fix memory leak in pppoe_sendmsg()
Dmitry Antipov dmantipov@yandex.ru net: sctp: fix skb leak in sctp_inq_free()
Allison Henderson allison.henderson@oracle.com net:rds: Fix possible deadlock in rds_message_put
Jan Kara jack@suse.cz quota: Detect loops in quota tree
Gao Xiang xiang@kernel.org erofs: avoid debugging output for (de)compressed data
Edward Adam Davis eadavis@qq.com reiserfs: fix uninit-value in comp_keys
Phillip Lougher phillip@squashfs.org.uk Squashfs: fix variable overflow triggered by sysbot
Lizhi Xu lizhi.xu@windriver.com squashfs: squashfs_read_data need to check if the length is 0
Manas Ghandat ghandatmanas@gmail.com jfs: fix shift-out-of-bounds in dbJoin
Jakub Kicinski kuba@kernel.org net: don't dump stack on queue timeout
Yajun Deng yajun.deng@linux.dev net: sched: Print msecs when transmit queue time out
Johannes Berg johannes.berg@intel.com wifi: mac80211: fix change_address deadlock during unregister
Johannes Berg johannes.berg@intel.com wifi: mac80211: take wiphy lock for MAC addr change
Ying Hsu yinghsu@chromium.org Bluetooth: Fix hci_link_tx_to RCU lock usage
Andreas Gruenbacher agruenba@redhat.com gfs2: Stop using gfs2_make_fs_ro for withdraw
Andreas Gruenbacher agruenba@redhat.com gfs2: Rework freeze / thaw logic
Andreas Gruenbacher agruenba@redhat.com gfs2: Rename SDF_{FS_FROZEN => FREEZE_INITIATOR}
Andreas Gruenbacher agruenba@redhat.com gfs2: Rename gfs2_freeze_lock{ => _shared }
Andreas Gruenbacher agruenba@redhat.com gfs2: Rename the {freeze,thaw}_super callbacks
Andreas Gruenbacher agruenba@redhat.com gfs2: Rename remaining "transaction" glock references
Kees Cook keescook@chromium.org pid: Replace struct pid 1-element array with flex-array
Thomas Gleixner tglx@linutronix.de posix-timers: Ensure timer ID search-loop limit is valid
Andrii Nakryiko andrii@kernel.org bpf: drop unnecessary user-triggerable WARN_ONCE in verifierl log
Andrii Nakryiko andrii@kernel.org bpf: Split off basic BPF verifier log into separate file
Ivan Orlov ivan.orlov0322@gmail.com mm: khugepaged: fix kernel BUG in hpage_collapse_scan_file()
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field
Ivan Orlov ivan.orlov0322@gmail.com 9P FS: Fix wild-memory-access write in v9fs_get_acl
Theodore Ts'o tytso@mit.edu ext4, jbd2: add an optimized bmap for the journal inode
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: prevent WARNING in nilfs_dat_commit_end()
Leon Hwang leon.hwang@linux.dev bpf: Fix updating attached freplace prog in prog_array map
Claudio Imbrenda imbrenda@linux.ibm.com s390/uv: Panic for set and remove shared access UVC errors
Alex Deucher alexander.deucher@amd.com drm/amdgpu/jpeg2: properly set atomics vmid field
Al Viro viro@zeniv.linux.org.uk memcg_write_event_control(): fix a user-triggerable oops
Bas Nieuwenhuizen bas@basnieuwenhuizen.nl drm/amdgpu: Actually check flags for all context ops.
Qu Wenruo wqu@suse.com btrfs: tree-checker: add dev extent item checks
Naohiro Aota naohiro.aota@wdc.com btrfs: zoned: properly take lock to read/update block group's zoned variables
Waiman Long longman@redhat.com mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu
Zhen Lei thunder.leizhen@huawei.com selinux: fix potential counting error in avc_add_xperms_decision()
Max Kellermann max.kellermann@ionos.com fs/netfs/fscache_cookie: add missing "n_accesses" check
Dan Carpenter dan.carpenter@linaro.org rtla/osnoise: Prevent NULL dereference in error handling
Andi Shyti andi.shyti@kernel.org i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume
Al Viro viro@zeniv.linux.org.uk fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE
Alexander Lobakin aleksander.lobakin@intel.com bitmap: introduce generic optimized bitmap_size()
Alexander Lobakin aleksander.lobakin@intel.com btrfs: rename bitmap_set_bits() -> btrfs_bitmap_set_bits()
Alexander Lobakin aleksander.lobakin@intel.com s390/cio: rename bitmap_size() -> idset_bitmap_size()
Alexander Lobakin aleksander.lobakin@intel.com fs/ntfs3: add prefix to bitmap_size() and use BITS_TO_U64()
Zhihao Cheng chengzhihao1@huawei.com vfs: Don't evict inode under the inode lru traversing context
Mikulas Patocka mpatocka@redhat.com dm persistent data: fix memory allocation failure
Khazhismel Kumykov khazhy@google.com dm resume: don't return EINVAL when signalled
Haibo Xu haibo1.xu@intel.com arm64: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE
Nam Cao namcao@linutronix.de riscv: change XIP's kernel_map.size to be size of the entire kernel
Stefan Haberland sth@linux.ibm.com s390/dasd: fix error recovery leading to data corruption on ESE devices
Mika Westerberg mika.westerberg@linux.intel.com thunderbolt: Mark XDomain as unplugged when router is removed
Mathias Nyman mathias.nyman@linux.intel.com xhci: Fix Panther point NULL pointer deref at full-speed re-enumeration
Juan José Arboleda soyjuanarbol@gmail.com ALSA: usb-audio: Support Yamaha P-125 quirk entry
Lianqin Hu hulianqin@vivo.com ALSA: usb-audio: Add delay quirk for VIVO USB-C-XE710 HEADSET
Eli Billauer eli.billauer@gmail.com char: xillybus: Check USB endpoints when probing device
Eli Billauer eli.billauer@gmail.com char: xillybus: Refine workqueue handling
Eli Billauer eli.billauer@gmail.com char: xillybus: Don't destroy workqueue from work item running on it
Jann Horn jannh@google.com fuse: Initialize beyond-EOF page contents before setting uptodate
Mathieu Othacehe othacehe@gnu.org tty: atmel_serial: use the correct RTS flag.
-------------
Diffstat:
Documentation/bpf/map_lpm_trie.rst | 181 ++++++++++ Documentation/filesystems/gfs2-glocks.rst | 3 +- Makefile | 4 +- arch/arm64/kernel/acpi_numa.c | 2 +- arch/arm64/kernel/setup.c | 3 - arch/arm64/kernel/smp.c | 2 + arch/arm64/kvm/sys_regs.c | 6 + arch/arm64/kvm/vgic/vgic.h | 7 + arch/mips/kernel/cpu-probe.c | 4 + arch/openrisc/kernel/setup.c | 6 +- arch/parisc/kernel/irq.c | 4 +- arch/powerpc/boot/simple_alloc.c | 7 +- arch/powerpc/sysdev/xics/icp-native.c | 2 + arch/riscv/mm/init.c | 4 +- arch/s390/include/asm/uv.h | 5 +- arch/s390/kernel/early.c | 12 +- arch/s390/kernel/smp.c | 4 +- arch/x86/kernel/process.c | 5 +- arch/x86/kvm/lapic.c | 8 +- block/blk-mq-tag.c | 5 +- drivers/atm/idt77252.c | 9 +- drivers/bluetooth/hci_ldisc.c | 3 +- drivers/char/xillybus/xillyusb.c | 42 ++- drivers/clk/visconti/pll.c | 6 +- drivers/clocksource/arm_global_timer.c | 11 +- drivers/firmware/cirrus/cs_dsp.c | 7 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 40 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 8 + drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c | 3 + drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 53 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 6 +- drivers/gpu/drm/amd/amdgpu/imu_v11_0.c | 2 +- drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 24 +- .../drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 2 +- drivers/gpu/drm/bridge/tc358768.c | 215 ++++++++++-- drivers/gpu/drm/lima/lima_gp.c | 12 + drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h | 14 +- drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c | 3 + drivers/gpu/drm/msm/dp/dp_ctrl.c | 2 + drivers/gpu/drm/msm/dp/dp_panel.c | 19 +- drivers/gpu/drm/msm/msm_gem_shrinker.c | 2 +- drivers/gpu/drm/rockchip/rockchip_drm_vop2.c | 5 + drivers/gpu/drm/tegra/gem.c | 2 +- drivers/hid/hid-ids.h | 10 +- drivers/hid/hid-microsoft.c | 11 +- drivers/hid/wacom_wac.c | 4 +- drivers/hwmon/ltc2992.c | 8 +- drivers/hwmon/pc87360.c | 6 +- drivers/i2c/busses/i2c-qcom-geni.c | 4 +- drivers/i2c/busses/i2c-riic.c | 2 +- drivers/i2c/busses/i2c-tegra.c | 41 ++- drivers/i3c/master/mipi-i3c-hci/dma.c | 5 +- drivers/infiniband/hw/hfi1/chip.c | 5 +- drivers/infiniband/ulp/rtrs/rtrs.c | 2 +- drivers/input/input-mt.c | 3 + drivers/input/serio/i8042-acpipnpio.h | 20 +- drivers/input/serio/i8042.c | 10 +- drivers/irqchip/irq-gic-v3-its.c | 2 - drivers/irqchip/irq-renesas-rzg2l.c | 5 +- drivers/md/dm-clone-metadata.c | 5 - drivers/md/dm-ioctl.c | 22 +- drivers/md/dm.c | 4 +- drivers/md/md.c | 5 - drivers/md/persistent-data/dm-space-map-metadata.c | 4 +- drivers/md/raid5-cache.c | 47 +-- drivers/media/dvb-core/dvb_frontend.c | 12 +- drivers/media/pci/cx23885/cx23885-video.c | 8 + drivers/media/pci/solo6x10/solo6x10-offsets.h | 10 +- drivers/media/platform/qcom/venus/pm_helpers.c | 2 +- .../media/platform/samsung/s5p-mfc/s5p_mfc_enc.c | 2 +- drivers/media/radio/radio-isa.c | 2 +- drivers/memory/stm32-fmc2-ebi.c | 122 +++++-- drivers/memory/tegra/tegra186.c | 3 + drivers/mmc/core/mmc_test.c | 9 +- drivers/mmc/host/dw_mmc.c | 8 + drivers/net/bonding/bond_main.c | 21 +- drivers/net/bonding/bond_options.c | 2 +- drivers/net/dsa/mv88e6xxx/global1_atu.c | 3 +- drivers/net/dsa/ocelot/felix.c | 11 + drivers/net/dsa/vitesse-vsc73xx-core.c | 69 +++- drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 3 +- .../net/ethernet/freescale/dpaa2/dpaa2-switch.c | 7 +- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 3 + .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 28 +- .../net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 7 +- .../ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c | 3 + .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 4 +- drivers/net/ethernet/i825xx/sun3_82586.c | 2 +- drivers/net/ethernet/intel/ice/ice_base.c | 4 +- drivers/net/ethernet/intel/ice/ice_lib.c | 8 +- drivers/net/ethernet/intel/ice/ice_main.c | 10 +- drivers/net/ethernet/intel/ice/ice_txrx.c | 117 +++---- drivers/net/ethernet/intel/ice/ice_txrx.h | 6 +- drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 1 + drivers/net/ethernet/intel/igc/igc_defines.h | 15 + drivers/net/ethernet/intel/igc/igc_main.c | 7 + drivers/net/ethernet/intel/igc/igc_regs.h | 1 + drivers/net/ethernet/intel/igc/igc_tsn.c | 64 ++++ drivers/net/ethernet/intel/igc/igc_tsn.h | 1 + .../net/ethernet/marvell/octeontx2/af/rvu_cpt.c | 23 +- .../ethernet/mellanox/mlx5/core/en/reporter_tx.c | 2 + .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 2 +- .../net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h | 9 +- .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c | 10 + .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h | 2 + .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c | 50 ++- drivers/net/ethernet/microsoft/mana/mana.h | 1 + drivers/net/ethernet/microsoft/mana/mana_en.c | 22 +- drivers/net/ethernet/mscc/ocelot.c | 91 ++++- drivers/net/ethernet/mscc/ocelot_fdma.c | 3 +- drivers/net/ethernet/mscc/ocelot_vsc7514.c | 4 + drivers/net/ethernet/xilinx/xilinx_axienet.h | 17 +- drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 25 +- drivers/net/gtp.c | 3 + drivers/net/ppp/pppoe.c | 23 +- drivers/net/wireless/intel/iwlwifi/fw/debugfs.c | 6 +- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 2 +- .../net/wireless/marvell/mwifiex/11n_rxreorder.c | 2 +- drivers/net/wireless/st/cw1200/txrx.c | 2 +- drivers/nvme/host/core.c | 5 +- drivers/nvme/target/rdma.c | 16 +- drivers/nvme/target/tcp.c | 1 + drivers/nvme/target/trace.c | 6 +- drivers/nvme/target/trace.h | 28 +- drivers/platform/surface/aggregator/controller.c | 3 +- drivers/platform/x86/lg-laptop.c | 2 +- drivers/rtc/rtc-nct3018y.c | 6 +- drivers/s390/block/dasd.c | 36 +- drivers/s390/block/dasd_3990_erp.c | 10 +- drivers/s390/block/dasd_diag.c | 1 - drivers/s390/block/dasd_eckd.c | 58 ++- drivers/s390/block/dasd_int.h | 2 +- drivers/s390/cio/idset.c | 12 +- drivers/scsi/lpfc/lpfc_sli.c | 2 +- drivers/scsi/scsi_transport_spi.c | 4 +- drivers/soc/imx/imx93-pd.c | 5 +- drivers/ssb/main.c | 2 +- drivers/staging/iio/resolver/ad2s1210.c | 7 +- drivers/staging/ks7010/ks7010_sdio.c | 4 +- drivers/thunderbolt/switch.c | 1 + drivers/tty/serial/atmel_serial.c | 2 +- drivers/usb/dwc3/core.c | 13 + drivers/usb/gadget/udc/fsl_udc_core.c | 2 +- drivers/usb/host/xhci.c | 8 +- drivers/video/fbdev/offb.c | 3 +- fs/9p/xattr.c | 8 +- fs/afs/file.c | 8 +- fs/binfmt_elf_fdpic.c | 2 +- fs/binfmt_misc.c | 216 +++++++++--- fs/btrfs/delayed-inode.c | 4 +- fs/btrfs/disk-io.c | 2 + fs/btrfs/extent_io.c | 4 +- fs/btrfs/free-space-cache.c | 22 +- fs/btrfs/inode.c | 11 +- fs/btrfs/ioctl.c | 2 +- fs/btrfs/qgroup.c | 2 - fs/btrfs/reflink.c | 6 +- fs/btrfs/send.c | 63 +++- fs/btrfs/super.c | 2 +- fs/btrfs/tests/extent-io-tests.c | 28 +- fs/btrfs/tree-checker.c | 69 ++++ fs/erofs/decompressor.c | 8 +- fs/ext4/extents.c | 3 +- fs/ext4/mballoc.c | 3 + fs/ext4/super.c | 23 ++ fs/ext4/xattr.c | 158 ++++----- fs/f2fs/segment.c | 5 +- fs/file.c | 28 +- fs/fscache/cookie.c | 4 + fs/fuse/cuse.c | 3 +- fs/fuse/dev.c | 6 +- fs/fuse/fuse_i.h | 1 + fs/fuse/inode.c | 15 +- fs/fuse/virtio_fs.c | 10 + fs/gfs2/glock.c | 27 +- fs/gfs2/glock.h | 9 - fs/gfs2/glops.c | 66 ++-- fs/gfs2/incore.h | 2 +- fs/gfs2/inode.c | 2 +- fs/gfs2/lock_dlm.c | 5 - fs/gfs2/log.c | 2 - fs/gfs2/ops_fstype.c | 13 +- fs/gfs2/recovery.c | 28 +- fs/gfs2/super.c | 197 ++++++++--- fs/gfs2/super.h | 1 + fs/gfs2/sys.c | 2 +- fs/gfs2/util.c | 53 +-- fs/gfs2/util.h | 5 +- fs/inode.c | 39 ++- fs/jbd2/journal.c | 9 +- fs/jfs/jfs_dmap.c | 2 + fs/jfs/jfs_dtree.c | 2 + fs/kernfs/file.c | 8 +- fs/nfs/pnfs.c | 8 + fs/nfsd/nfs4proc.c | 4 + fs/nfsd/nfs4state.c | 2 +- fs/nfsd/nfsctl.c | 32 +- fs/nfsd/nfsd.h | 3 +- fs/nfsd/nfssvc.c | 85 ++--- fs/nfsd/vfs.c | 6 +- fs/nilfs2/btree.c | 1 + fs/nilfs2/dat.c | 11 + fs/nilfs2/direct.c | 1 + fs/ntfs3/bitmap.c | 4 +- fs/ntfs3/frecord.c | 75 +++- fs/ntfs3/fsntfs.c | 2 +- fs/ntfs3/index.c | 11 +- fs/ntfs3/ntfs_fs.h | 4 +- fs/ntfs3/super.c | 2 +- fs/quota/dquot.c | 5 +- fs/quota/quota_tree.c | 128 +++++-- fs/quota/quota_v2.c | 15 +- fs/reiserfs/stree.c | 2 +- fs/smb/server/smb2pdu.c | 3 +- fs/squashfs/block.c | 2 +- fs/squashfs/file.c | 3 +- fs/squashfs/file_direct.c | 6 +- fs/udf/namei.c | 1 - include/linux/bitmap.h | 20 +- include/linux/bpf_verifier.h | 23 +- include/linux/cpumask.h | 2 +- include/linux/dsa/ocelot.h | 47 +++ include/linux/fs.h | 5 + include/linux/if_vlan.h | 21 ++ include/linux/jbd2.h | 8 + include/linux/pid.h | 2 +- include/linux/sched/signal.h | 2 +- include/linux/sunrpc/svc.h | 13 - include/linux/udp.h | 2 +- include/linux/virtio_net.h | 35 +- include/net/cfg80211.h | 40 ++- include/net/inet_timewait_sock.h | 2 +- include/net/kcm.h | 1 + include/net/tcp.h | 2 +- include/scsi/scsi_cmnd.h | 2 +- include/soc/mscc/ocelot.h | 12 +- include/trace/events/huge_memory.h | 3 +- include/uapi/linux/bpf.h | 19 +- init/Kconfig | 7 +- kernel/bpf/Makefile | 3 +- kernel/bpf/log.c | 82 +++++ kernel/bpf/lpm_trie.c | 33 +- kernel/bpf/verifier.c | 69 ---- kernel/cgroup/cgroup.c | 4 +- kernel/pid.c | 7 +- kernel/pid_namespace.c | 2 +- kernel/rcu/rcu.h | 7 + kernel/rcu/srcutiny.c | 1 + kernel/rcu/srcutree.c | 1 + kernel/rcu/tasks.h | 1 + kernel/rcu/tiny.c | 1 + kernel/rcu/tree.c | 3 +- kernel/time/clocksource.c | 42 ++- kernel/time/hrtimer.c | 5 +- kernel/time/posix-timers.c | 31 +- lib/math/prime_numbers.c | 2 - mm/huge_memory.c | 30 +- mm/khugepaged.c | 20 ++ mm/memcontrol.c | 7 +- mm/memory-failure.c | 20 +- mm/memory.c | 29 +- mm/vmalloc.c | 11 +- net/bluetooth/bnep/core.c | 3 +- net/bluetooth/hci_conn.c | 11 +- net/bluetooth/hci_core.c | 24 +- net/bluetooth/mgmt.c | 4 + net/bluetooth/rfcomm/sock.c | 14 +- net/bluetooth/smp.c | 146 ++++---- net/bridge/br_netfilter_hooks.c | 6 +- net/core/filter.c | 8 +- net/core/skbuff.c | 8 +- net/dccp/ipv4.c | 2 +- net/dccp/ipv6.c | 6 - net/dsa/tag_ocelot.c | 37 +- net/ipv4/fou.c | 2 +- net/ipv4/inet_timewait_sock.c | 16 +- net/ipv4/tcp_ipv4.c | 16 +- net/ipv4/tcp_minisocks.c | 7 +- net/ipv4/tcp_offload.c | 3 + net/ipv4/udp_offload.c | 18 +- net/ipv6/ip6_output.c | 10 + net/ipv6/ip6_tunnel.c | 12 +- net/ipv6/netfilter/nf_conntrack_reasm.c | 4 + net/ipv6/tcp_ipv6.c | 6 - net/iucv/iucv.c | 3 +- net/kcm/kcmsock.c | 4 + net/mac80211/agg-tx.c | 6 +- net/mac80211/debugfs_netdev.c | 3 - net/mac80211/driver-ops.c | 3 - net/mac80211/ieee80211_i.h | 1 - net/mac80211/iface.c | 27 +- net/mac80211/rx.c | 387 +++++++++++---------- net/mac80211/sta_info.c | 17 + net/mac80211/sta_info.h | 3 + net/mctp/test/route-test.c | 2 +- net/mptcp/diag.c | 2 +- net/mptcp/pm_netlink.c | 31 +- net/netfilter/nf_flow_table_inet.c | 3 + net/netfilter/nf_flow_table_ip.c | 3 + net/netfilter/nf_flow_table_offload.c | 2 +- net/netfilter/nf_tables_api.c | 225 +++++++----- net/netfilter/nfnetlink_queue.c | 35 +- net/netfilter/nft_counter.c | 9 +- net/netlink/af_netlink.c | 13 +- net/rds/recv.c | 13 +- net/sched/sch_generic.c | 11 +- net/sched/sch_netem.c | 47 ++- net/sctp/inqueue.c | 14 +- net/wireless/core.h | 8 +- net/wireless/util.c | 195 +++++++---- samples/bpf/map_perf_test_user.c | 2 +- samples/bpf/xdp_router_ipv4_user.c | 2 +- scripts/rust_is_available.sh | 41 ++- security/selinux/avc.c | 2 +- sound/core/timer.c | 2 +- sound/pci/hda/patch_realtek.c | 1 - sound/soc/sof/ipc4.c | 9 +- sound/usb/mixer.c | 7 + sound/usb/quirks-table.h | 1 + sound/usb/quirks.c | 2 + tools/include/linux/bitmap.h | 7 +- tools/include/uapi/linux/bpf.h | 19 +- tools/testing/selftests/bpf/progs/map_ptr_kern.c | 2 +- tools/testing/selftests/bpf/test_lpm_map.c | 18 +- tools/testing/selftests/core/close_range_test.c | 35 ++ tools/testing/selftests/net/net_helper.sh | 25 ++ tools/testing/selftests/net/udpgro.sh | 57 +-- tools/testing/selftests/net/udpgro_bench.sh | 5 +- tools/testing/selftests/net/udpgro_frglist.sh | 5 +- tools/testing/selftests/net/udpgso.c | 2 +- tools/testing/selftests/tc-testing/tdc.py | 1 - tools/tracing/rtla/src/osnoise_top.c | 11 +- 335 files changed, 4050 insertions(+), 2027 deletions(-)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathieu Othacehe othacehe@gnu.org
commit c9f6613b16123989f2c3bd04b1d9b2365d6914e7 upstream.
In RS485 mode, the RTS pin is driven high by hardware when the transmitter is operating. This behaviour cannot be changed. This means that the driver should claim that it supports SER_RS485_RTS_ON_SEND and not SER_RS485_RTS_AFTER_SEND.
Otherwise, when configuring the port with the SER_RS485_RTS_ON_SEND, one get the following warning:
kern.warning kernel: atmel_usart_serial atmel_usart_serial.2.auto: ttyS1 (1): invalid RTS setting, using RTS_AFTER_SEND instead
which is contradictory with what's really happening.
Signed-off-by: Mathieu Othacehe othacehe@gnu.org Cc: stable stable@kernel.org Tested-by: Alexander Dahl ada@thorsis.com Fixes: af47c491e3c7 ("serial: atmel: Fill in rs485_supported") Link: https://lore.kernel.org/r/20240808060637.19886-1-othacehe@gnu.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/atmel_serial.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/tty/serial/atmel_serial.c +++ b/drivers/tty/serial/atmel_serial.c @@ -2539,7 +2539,7 @@ static const struct uart_ops atmel_pops };
static const struct serial_rs485 atmel_rs485_supported = { - .flags = SER_RS485_ENABLED | SER_RS485_RTS_AFTER_SEND | SER_RS485_RX_DURING_TX, + .flags = SER_RS485_ENABLED | SER_RS485_RTS_ON_SEND | SER_RS485_RX_DURING_TX, .delay_rts_before_send = 1, .delay_rts_after_send = 1, };
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jann Horn jannh@google.com
commit 3c0da3d163eb32f1f91891efaade027fa9b245b9 upstream.
fuse_notify_store(), unlike fuse_do_readpage(), does not enable page zeroing (because it can be used to change partial page contents).
So fuse_notify_store() must be more careful to fully initialize page contents (including parts of the page that are beyond end-of-file) before marking the page uptodate.
The current code can leave beyond-EOF page contents uninitialized, which makes these uninitialized page contents visible to userspace via mmap().
This is an information leak, but only affects systems which do not enable init-on-alloc (via CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y or the corresponding kernel command line parameter).
Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2574 Cc: stable@kernel.org Fixes: a1d75f258230 ("fuse: add store request") Signed-off-by: Jann Horn jannh@google.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/fuse/dev.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -1615,9 +1615,11 @@ static int fuse_notify_store(struct fuse
this_num = min_t(unsigned, num, PAGE_SIZE - offset); err = fuse_copy_page(cs, &page, offset, this_num, 0); - if (!err && offset == 0 && - (this_num == PAGE_SIZE || file_size == end)) + if (!PageUptodate(page) && !err && offset == 0 && + (this_num == PAGE_SIZE || file_size == end)) { + zero_user_segment(page, this_num, PAGE_SIZE); SetPageUptodate(page); + } unlock_page(page); put_page(page);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eli Billauer eli.billauer@gmail.com
commit ccbde4b128ef9c73d14d0d7817d68ef795f6d131 upstream.
Triggered by a kref decrement, destroy_workqueue() may be called from within a work item for destroying its own workqueue. This illegal situation is averted by adding a module-global workqueue for exclusive use of the offending work item. Other work items continue to be queued on per-device workqueues to ensure performance.
Reported-by: syzbot+91dbdfecdd3287734d8e@syzkaller.appspotmail.com Cc: stable stable@kernel.org Closes: https://lore.kernel.org/lkml/0000000000000ab25a061e1dfe9f@google.com/ Signed-off-by: Eli Billauer eli.billauer@gmail.com Link: https://lore.kernel.org/r/20240801121126.60183-1-eli.billauer@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/xillybus/xillyusb.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-)
--- a/drivers/char/xillybus/xillyusb.c +++ b/drivers/char/xillybus/xillyusb.c @@ -50,6 +50,7 @@ MODULE_LICENSE("GPL v2"); static const char xillyname[] = "xillyusb";
static unsigned int fifo_buf_order; +static struct workqueue_struct *wakeup_wq;
#define USB_VENDOR_ID_XILINX 0x03fd #define USB_VENDOR_ID_ALTERA 0x09fb @@ -561,10 +562,6 @@ static void cleanup_dev(struct kref *kre * errors if executed. The mechanism relies on that xdev->error is assigned * a non-zero value by report_io_error() prior to queueing wakeup_all(), * which prevents bulk_in_work() from calling process_bulk_in(). - * - * The fact that wakeup_all() and bulk_in_work() are queued on the same - * workqueue makes their concurrent execution very unlikely, however the - * kernel's API doesn't seem to ensure this strictly. */
static void wakeup_all(struct work_struct *work) @@ -619,7 +616,7 @@ static void report_io_error(struct xilly
if (do_once) { kref_get(&xdev->kref); /* xdev is used by work item */ - queue_work(xdev->workq, &xdev->wakeup_workitem); + queue_work(wakeup_wq, &xdev->wakeup_workitem); } }
@@ -2242,6 +2239,10 @@ static int __init xillyusb_init(void) { int rc = 0;
+ wakeup_wq = alloc_workqueue(xillyname, 0, 0); + if (!wakeup_wq) + return -ENOMEM; + if (LOG2_INITIAL_FIFO_BUF_SIZE > PAGE_SHIFT) fifo_buf_order = LOG2_INITIAL_FIFO_BUF_SIZE - PAGE_SHIFT; else @@ -2249,11 +2250,16 @@ static int __init xillyusb_init(void)
rc = usb_register(&xillyusb_driver);
+ if (rc) + destroy_workqueue(wakeup_wq); + return rc; }
static void __exit xillyusb_exit(void) { + destroy_workqueue(wakeup_wq); + usb_deregister(&xillyusb_driver); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eli Billauer eli.billauer@gmail.com
commit ad899c301c880766cc709aad277991b3ab671b66 upstream.
As the wakeup work item now runs on a separate workqueue, it needs to be flushed separately along with flushing the device's workqueue.
Also, move the destroy_workqueue() call to the end of the exit method, so that deinitialization is done in the opposite order of initialization.
Fixes: ccbde4b128ef ("char: xillybus: Don't destroy workqueue from work item running on it") Cc: stable stable@kernel.org Signed-off-by: Eli Billauer eli.billauer@gmail.com Link: https://lore.kernel.org/r/20240816070200.50695-1-eli.billauer@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/xillybus/xillyusb.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/char/xillybus/xillyusb.c +++ b/drivers/char/xillybus/xillyusb.c @@ -2079,9 +2079,11 @@ static int xillyusb_discovery(struct usb * just after responding with the IDT, there is no reason for any * work item to be running now. To be sure that xdev->channels * is updated on anything that might run in parallel, flush the - * workqueue, which rarely does anything. + * device's workqueue and the wakeup work item. This rarely + * does anything. */ flush_workqueue(xdev->workq); + flush_work(&xdev->wakeup_workitem);
xdev->num_channels = num_channels;
@@ -2258,9 +2260,9 @@ static int __init xillyusb_init(void)
static void __exit xillyusb_exit(void) { - destroy_workqueue(wakeup_wq); - usb_deregister(&xillyusb_driver); + + destroy_workqueue(wakeup_wq); }
module_init(xillyusb_init);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eli Billauer eli.billauer@gmail.com
commit 2374bf7558de915edc6ec8cb10ec3291dfab9594 upstream.
Ensure, as the driver probes the device, that all endpoints that the driver may attempt to access exist and are of the correct type.
All XillyUSB devices must have a Bulk IN and Bulk OUT endpoint at address 1. This is verified in xillyusb_setup_base_eps().
On top of that, a XillyUSB device may have additional Bulk OUT endpoints. The information about these endpoints' addresses is deduced from a data structure (the IDT) that the driver fetches from the device while probing it. These endpoints are checked in setup_channels().
A XillyUSB device never has more than one IN endpoint, as all data towards the host is multiplexed in this single Bulk IN endpoint. This is why setup_channels() only checks OUT endpoints.
Reported-by: syzbot+eac39cba052f2e750dbe@syzkaller.appspotmail.com Cc: stable stable@kernel.org Closes: https://lore.kernel.org/all/0000000000001d44a6061f7a54ee@google.com/T/ Fixes: a53d1202aef1 ("char: xillybus: Add driver for XillyUSB (Xillybus variant for USB)"). Signed-off-by: Eli Billauer eli.billauer@gmail.com Link: https://lore.kernel.org/r/20240816070200.50695-2-eli.billauer@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/xillybus/xillyusb.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-)
--- a/drivers/char/xillybus/xillyusb.c +++ b/drivers/char/xillybus/xillyusb.c @@ -1889,6 +1889,13 @@ static const struct file_operations xill
static int xillyusb_setup_base_eps(struct xillyusb_dev *xdev) { + struct usb_device *udev = xdev->udev; + + /* Verify that device has the two fundamental bulk in/out endpoints */ + if (usb_pipe_type_check(udev, usb_sndbulkpipe(udev, MSG_EP_NUM)) || + usb_pipe_type_check(udev, usb_rcvbulkpipe(udev, IN_EP_NUM))) + return -ENODEV; + xdev->msg_ep = endpoint_alloc(xdev, MSG_EP_NUM | USB_DIR_OUT, bulk_out_work, 1, 2); if (!xdev->msg_ep) @@ -1918,14 +1925,15 @@ static int setup_channels(struct xillyus __le16 *chandesc, int num_channels) { - struct xillyusb_channel *chan; + struct usb_device *udev = xdev->udev; + struct xillyusb_channel *chan, *new_channels; int i;
chan = kcalloc(num_channels, sizeof(*chan), GFP_KERNEL); if (!chan) return -ENOMEM;
- xdev->channels = chan; + new_channels = chan;
for (i = 0; i < num_channels; i++, chan++) { unsigned int in_desc = le16_to_cpu(*chandesc++); @@ -1954,6 +1962,15 @@ static int setup_channels(struct xillyus */
if ((out_desc & 0x80) && i < 14) { /* Entry is valid */ + if (usb_pipe_type_check(udev, + usb_sndbulkpipe(udev, i + 2))) { + dev_err(xdev->dev, + "Missing BULK OUT endpoint %d\n", + i + 2); + kfree(new_channels); + return -ENODEV; + } + chan->writable = 1; chan->out_synchronous = !!(out_desc & 0x40); chan->out_seekable = !!(out_desc & 0x20); @@ -1963,6 +1980,7 @@ static int setup_channels(struct xillyus } }
+ xdev->channels = new_channels; return 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lianqin Hu hulianqin@vivo.com
commit 004eb8ba776ccd3e296ea6f78f7ae7985b12824e upstream.
Audio control requests that sets sampling frequency sometimes fail on this card. Adding delay between control messages eliminates that problem.
Signed-off-by: Lianqin Hu hulianqin@vivo.com Cc: stable@vger.kernel.org Link: https://patch.msgid.link/TYUPR06MB6217FF67076AF3E49E12C877D2842@TYUPR06MB621... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/quirks.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/usb/quirks.c +++ b/sound/usb/quirks.c @@ -2179,6 +2179,8 @@ static const struct usb_audio_quirk_flag QUIRK_FLAG_GENERIC_IMPLICIT_FB), DEVICE_FLG(0x2b53, 0x0031, /* Fiero SC-01 (firmware v1.1.0) */ QUIRK_FLAG_GENERIC_IMPLICIT_FB), + DEVICE_FLG(0x2d95, 0x8021, /* VIVO USB-C-XE710 HEADSET */ + QUIRK_FLAG_CTL_MSG_DELAY_1M), DEVICE_FLG(0x30be, 0x0101, /* Schiit Hel */ QUIRK_FLAG_IGNORE_CTL_ERROR), DEVICE_FLG(0x413c, 0xa506, /* Dell AE515 sound bar */
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juan José Arboleda soyjuanarbol@gmail.com
commit c286f204ce6ba7b48e3dcba53eda7df8eaa64dd9 upstream.
This patch adds a USB quirk for the Yamaha P-125 digital piano.
Signed-off-by: Juan José Arboleda soyjuanarbol@gmail.com Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20240813161053.70256-1-soyjuanarbol@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/quirks-table.h | 1 + 1 file changed, 1 insertion(+)
--- a/sound/usb/quirks-table.h +++ b/sound/usb/quirks-table.h @@ -273,6 +273,7 @@ YAMAHA_DEVICE(0x105a, NULL), YAMAHA_DEVICE(0x105b, NULL), YAMAHA_DEVICE(0x105c, NULL), YAMAHA_DEVICE(0x105d, NULL), +YAMAHA_DEVICE(0x1718, "P-125"), { USB_DEVICE(0x0499, 0x1503), .driver_info = (unsigned long) & (const struct snd_usb_audio_quirk) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathias Nyman mathias.nyman@linux.intel.com
commit af8e119f52e9c13e556be9e03f27957554a84656 upstream.
re-enumerating full-speed devices after a failed address device command can trigger a NULL pointer dereference.
Full-speed devices may need to reconfigure the endpoint 0 Max Packet Size value during enumeration. Usb core calls usb_ep0_reinit() in this case, which ends up calling xhci_configure_endpoint().
On Panther point xHC the xhci_configure_endpoint() function will additionally check and reserve bandwidth in software. Other hosts do this in hardware
If xHC address device command fails then a new xhci_virt_device structure is allocated as part of re-enabling the slot, but the bandwidth table pointers are not set up properly here. This triggers the NULL pointer dereference the next time usb_ep0_reinit() is called and xhci_configure_endpoint() tries to check and reserve bandwidth
[46710.713538] usb 3-1: new full-speed USB device number 5 using xhci_hcd [46710.713699] usb 3-1: Device not responding to setup address. [46710.917684] usb 3-1: Device not responding to setup address. [46711.125536] usb 3-1: device not accepting address 5, error -71 [46711.125594] BUG: kernel NULL pointer dereference, address: 0000000000000008 [46711.125600] #PF: supervisor read access in kernel mode [46711.125603] #PF: error_code(0x0000) - not-present page [46711.125606] PGD 0 P4D 0 [46711.125610] Oops: Oops: 0000 [#1] PREEMPT SMP PTI [46711.125615] CPU: 1 PID: 25760 Comm: kworker/1:2 Not tainted 6.10.3_2 #1 [46711.125620] Hardware name: Gigabyte Technology Co., Ltd. [46711.125623] Workqueue: usb_hub_wq hub_event [usbcore] [46711.125668] RIP: 0010:xhci_reserve_bandwidth (drivers/usb/host/xhci.c
Fix this by making sure bandwidth table pointers are set up correctly after a failed address device command, and additionally by avoiding checking for bandwidth in cases like this where no actual endpoints are added or removed, i.e. only context for default control endpoint 0 is evaluated.
Reported-by: Karel Balej balejk@matfyz.cz Closes: https://lore.kernel.org/linux-usb/D3CKQQAETH47.1MUO22RTCH2O3@matfyz.cz/ Cc: stable@vger.kernel.org Fixes: 651aaf36a7d7 ("usb: xhci: Handle USB transaction error on address command") Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20240815141117.2702314-2-mathias.nyman@linux.intel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -2971,7 +2971,7 @@ static int xhci_configure_endpoint(struc xhci->num_active_eps); return -ENOMEM; } - if ((xhci->quirks & XHCI_SW_BW_CHECKING) && + if ((xhci->quirks & XHCI_SW_BW_CHECKING) && !ctx_change && xhci_reserve_bandwidth(xhci, virt_dev, command->in_ctx)) { if ((xhci->quirks & XHCI_EP_LIMIT_QUIRK)) xhci_free_host_resources(xhci, ctrl_ctx); @@ -4313,8 +4313,10 @@ static int xhci_setup_device(struct usb_ mutex_unlock(&xhci->mutex); ret = xhci_disable_slot(xhci, udev->slot_id); xhci_free_virt_device(xhci, udev->slot_id); - if (!ret) - xhci_alloc_dev(hcd, udev); + if (!ret) { + if (xhci_alloc_dev(hcd, udev) == 1) + xhci_setup_addressable_virt_dev(xhci, udev); + } kfree(command->completion); kfree(command); return -EPROTO;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mika Westerberg mika.westerberg@linux.intel.com
commit e2006140ad2e01a02ed0aff49cc2ae3ceeb11f8d upstream.
I noticed that when we do discrete host router NVM upgrade and it gets hot-removed from the PCIe side as a result of NVM firmware authentication, if there is another host connected with enabled paths we hang in tearing them down. This is due to fact that the Thunderbolt networking driver also tries to cleanup the paths and ends up blocking in tb_disconnect_xdomain_paths() waiting for the domain lock.
However, at this point we already cleaned the paths in tb_stop() so there is really no need for tb_disconnect_xdomain_paths() to do that anymore. Furthermore it already checks if the XDomain is unplugged and bails out early so take advantage of that and mark the XDomain as unplugged when we remove the parent router.
Cc: stable@vger.kernel.org Signed-off-by: Mika Westerberg mika.westerberg@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/thunderbolt/switch.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/thunderbolt/switch.c +++ b/drivers/thunderbolt/switch.c @@ -3086,6 +3086,7 @@ void tb_switch_remove(struct tb_switch * tb_switch_remove(port->remote->sw); port->remote = NULL; } else if (port->xdomain) { + port->xdomain->is_unplugged = true; tb_xdomain_remove(port->xdomain); port->xdomain = NULL; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Haberland sth@linux.ibm.com
commit 7db4042336580dfd75cb5faa82c12cd51098c90b upstream.
Extent Space Efficient (ESE) or thin provisioned volumes need to be formatted on demand during usual IO processing.
The dasd_ese_needs_format function checks for error codes that signal the non existence of a proper track format.
The check for incorrect length is to imprecise since other error cases leading to transport of insufficient data also have this flag set. This might lead to data corruption in certain error cases for example during a storage server warmstart.
Fix by removing the check for incorrect length and replacing by explicitly checking for invalid track format in transport mode.
Also remove the check for file protected since this is not a valid ESE handling case.
Cc: stable@vger.kernel.org # 5.3+ Fixes: 5e2b17e712cf ("s390/dasd: Add dynamic formatting support for ESE volumes") Reviewed-by: Jan Hoeppner hoeppner@linux.ibm.com Signed-off-by: Stefan Haberland sth@linux.ibm.com Link: https://lore.kernel.org/r/20240812125733.126431-3-sth@linux.ibm.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/block/dasd.c | 36 +++++++++++++++--------- drivers/s390/block/dasd_3990_erp.c | 10 +----- drivers/s390/block/dasd_eckd.c | 55 ++++++++++++++++--------------------- drivers/s390/block/dasd_int.h | 2 - 4 files changed, 50 insertions(+), 53 deletions(-)
--- a/drivers/s390/block/dasd.c +++ b/drivers/s390/block/dasd.c @@ -1597,9 +1597,15 @@ static int dasd_ese_needs_format(struct if (!sense) return 0;
- return !!(sense[1] & SNS1_NO_REC_FOUND) || - !!(sense[1] & SNS1_FILE_PROTECTED) || - scsw_cstat(&irb->scsw) == SCHN_STAT_INCORR_LEN; + if (sense[1] & SNS1_NO_REC_FOUND) + return 1; + + if ((sense[1] & SNS1_INV_TRACK_FORMAT) && + scsw_is_tm(&irb->scsw) && + !(sense[2] & SNS2_ENV_DATA_PRESENT)) + return 1; + + return 0; }
static int dasd_ese_oos_cond(u8 *sense) @@ -1620,7 +1626,7 @@ void dasd_int_handler(struct ccw_device struct dasd_device *device; unsigned long now; int nrf_suppressed = 0; - int fp_suppressed = 0; + int it_suppressed = 0; struct request *req; u8 *sense = NULL; int expires; @@ -1675,8 +1681,9 @@ void dasd_int_handler(struct ccw_device */ sense = dasd_get_sense(irb); if (sense) { - fp_suppressed = (sense[1] & SNS1_FILE_PROTECTED) && - test_bit(DASD_CQR_SUPPRESS_FP, &cqr->flags); + it_suppressed = (sense[1] & SNS1_INV_TRACK_FORMAT) && + !(sense[2] & SNS2_ENV_DATA_PRESENT) && + test_bit(DASD_CQR_SUPPRESS_IT, &cqr->flags); nrf_suppressed = (sense[1] & SNS1_NO_REC_FOUND) && test_bit(DASD_CQR_SUPPRESS_NRF, &cqr->flags);
@@ -1691,7 +1698,7 @@ void dasd_int_handler(struct ccw_device return; } } - if (!(fp_suppressed || nrf_suppressed)) + if (!(it_suppressed || nrf_suppressed)) device->discipline->dump_sense_dbf(device, irb, "int");
if (device->features & DASD_FEATURE_ERPLOG) @@ -2452,14 +2459,17 @@ retry: rc = 0; list_for_each_entry_safe(cqr, n, ccw_queue, blocklist) { /* - * In some cases the 'File Protected' or 'Incorrect Length' - * error might be expected and error recovery would be - * unnecessary in these cases. Check if the according suppress - * bit is set. + * In some cases certain errors might be expected and + * error recovery would be unnecessary in these cases. + * Check if the according suppress bit is set. */ sense = dasd_get_sense(&cqr->irb); - if (sense && sense[1] & SNS1_FILE_PROTECTED && - test_bit(DASD_CQR_SUPPRESS_FP, &cqr->flags)) + if (sense && (sense[1] & SNS1_INV_TRACK_FORMAT) && + !(sense[2] & SNS2_ENV_DATA_PRESENT) && + test_bit(DASD_CQR_SUPPRESS_IT, &cqr->flags)) + continue; + if (sense && (sense[1] & SNS1_NO_REC_FOUND) && + test_bit(DASD_CQR_SUPPRESS_NRF, &cqr->flags)) continue; if (scsw_cstat(&cqr->irb.scsw) == 0x40 && test_bit(DASD_CQR_SUPPRESS_IL, &cqr->flags)) --- a/drivers/s390/block/dasd_3990_erp.c +++ b/drivers/s390/block/dasd_3990_erp.c @@ -1406,14 +1406,8 @@ dasd_3990_erp_file_prot(struct dasd_ccw_
struct dasd_device *device = erp->startdev;
- /* - * In some cases the 'File Protected' error might be expected and - * log messages shouldn't be written then. - * Check if the according suppress bit is set. - */ - if (!test_bit(DASD_CQR_SUPPRESS_FP, &erp->flags)) - dev_err(&device->cdev->dev, - "Accessing the DASD failed because of a hardware error\n"); + dev_err(&device->cdev->dev, + "Accessing the DASD failed because of a hardware error\n");
return dasd_3990_erp_cleanup(erp, DASD_CQR_FAILED);
--- a/drivers/s390/block/dasd_eckd.c +++ b/drivers/s390/block/dasd_eckd.c @@ -2288,6 +2288,7 @@ dasd_eckd_analysis_ccw(struct dasd_devic cqr->status = DASD_CQR_FILLED; /* Set flags to suppress output for expected errors */ set_bit(DASD_CQR_SUPPRESS_NRF, &cqr->flags); + set_bit(DASD_CQR_SUPPRESS_IT, &cqr->flags);
return cqr; } @@ -2569,7 +2570,6 @@ dasd_eckd_build_check_tcw(struct dasd_de cqr->buildclk = get_tod_clock(); cqr->status = DASD_CQR_FILLED; /* Set flags to suppress output for expected errors */ - set_bit(DASD_CQR_SUPPRESS_FP, &cqr->flags); set_bit(DASD_CQR_SUPPRESS_IL, &cqr->flags);
return cqr; @@ -4145,8 +4145,6 @@ static struct dasd_ccw_req *dasd_eckd_bu
/* Set flags to suppress output for expected errors */ if (dasd_eckd_is_ese(basedev)) { - set_bit(DASD_CQR_SUPPRESS_FP, &cqr->flags); - set_bit(DASD_CQR_SUPPRESS_IL, &cqr->flags); set_bit(DASD_CQR_SUPPRESS_NRF, &cqr->flags); }
@@ -4648,9 +4646,8 @@ static struct dasd_ccw_req *dasd_eckd_bu
/* Set flags to suppress output for expected errors */ if (dasd_eckd_is_ese(basedev)) { - set_bit(DASD_CQR_SUPPRESS_FP, &cqr->flags); - set_bit(DASD_CQR_SUPPRESS_IL, &cqr->flags); set_bit(DASD_CQR_SUPPRESS_NRF, &cqr->flags); + set_bit(DASD_CQR_SUPPRESS_IT, &cqr->flags); }
return cqr; @@ -5821,36 +5818,32 @@ static void dasd_eckd_dump_sense(struct { u8 *sense = dasd_get_sense(irb);
- if (scsw_is_tm(&irb->scsw)) { - /* - * In some cases the 'File Protected' or 'Incorrect Length' - * error might be expected and log messages shouldn't be written - * then. Check if the according suppress bit is set. - */ - if (sense && (sense[1] & SNS1_FILE_PROTECTED) && - test_bit(DASD_CQR_SUPPRESS_FP, &req->flags)) - return; - if (scsw_cstat(&irb->scsw) == 0x40 && - test_bit(DASD_CQR_SUPPRESS_IL, &req->flags)) - return; + /* + * In some cases certain errors might be expected and + * log messages shouldn't be written then. + * Check if the according suppress bit is set. + */ + if (sense && (sense[1] & SNS1_INV_TRACK_FORMAT) && + !(sense[2] & SNS2_ENV_DATA_PRESENT) && + test_bit(DASD_CQR_SUPPRESS_IT, &req->flags)) + return;
- dasd_eckd_dump_sense_tcw(device, req, irb); - } else { - /* - * In some cases the 'Command Reject' or 'No Record Found' - * error might be expected and log messages shouldn't be - * written then. Check if the according suppress bit is set. - */ - if (sense && sense[0] & SNS0_CMD_REJECT && - test_bit(DASD_CQR_SUPPRESS_CR, &req->flags)) - return; + if (sense && sense[0] & SNS0_CMD_REJECT && + test_bit(DASD_CQR_SUPPRESS_CR, &req->flags)) + return;
- if (sense && sense[1] & SNS1_NO_REC_FOUND && - test_bit(DASD_CQR_SUPPRESS_NRF, &req->flags)) - return; + if (sense && sense[1] & SNS1_NO_REC_FOUND && + test_bit(DASD_CQR_SUPPRESS_NRF, &req->flags)) + return;
+ if (scsw_cstat(&irb->scsw) == 0x40 && + test_bit(DASD_CQR_SUPPRESS_IL, &req->flags)) + return; + + if (scsw_is_tm(&irb->scsw)) + dasd_eckd_dump_sense_tcw(device, req, irb); + else dasd_eckd_dump_sense_ccw(device, req, irb); - } }
static int dasd_eckd_reload_device(struct dasd_device *device) --- a/drivers/s390/block/dasd_int.h +++ b/drivers/s390/block/dasd_int.h @@ -225,7 +225,7 @@ struct dasd_ccw_req { * The following flags are used to suppress output of certain errors. */ #define DASD_CQR_SUPPRESS_NRF 4 /* Suppress 'No Record Found' error */ -#define DASD_CQR_SUPPRESS_FP 5 /* Suppress 'File Protected' error*/ +#define DASD_CQR_SUPPRESS_IT 5 /* Suppress 'Invalid Track' error*/ #define DASD_CQR_SUPPRESS_IL 6 /* Suppress 'Incorrect Length' error */ #define DASD_CQR_SUPPRESS_CR 7 /* Suppress 'Command Reject' error */
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nam Cao namcao@linutronix.de
commit 57d76bc51fd80824bcc0c84a5b5ec944f1b51edd upstream.
With XIP kernel, kernel_map.size is set to be only the size of data part of the kernel. This is inconsistent with "normal" kernel, who sets it to be the size of the entire kernel.
More importantly, XIP kernel fails to boot if CONFIG_DEBUG_VIRTUAL is enabled, because there are checks on virtual addresses with the assumption that kernel_map.size is the size of the entire kernel (these checks are in arch/riscv/mm/physaddr.c).
Change XIP's kernel_map.size to be the size of the entire kernel.
Signed-off-by: Nam Cao namcao@linutronix.de Cc: stable@vger.kernel.org # v6.1+ Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Link: https://lore.kernel.org/r/20240508191917.2892064-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/mm/init.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -816,7 +816,7 @@ static void __init create_kernel_page_ta PMD_SIZE, PAGE_KERNEL_EXEC);
/* Map the data in RAM */ - end_va = kernel_map.virt_addr + XIP_OFFSET + kernel_map.size; + end_va = kernel_map.virt_addr + kernel_map.size; for (va = kernel_map.virt_addr + XIP_OFFSET; va < end_va; va += PMD_SIZE) create_pgd_mapping(pgdir, va, kernel_map.phys_addr + (va - (kernel_map.virt_addr + XIP_OFFSET)), @@ -947,7 +947,7 @@ asmlinkage void __init setup_vm(uintptr_
phys_ram_base = CONFIG_PHYS_RAM_BASE; kernel_map.phys_addr = (uintptr_t)CONFIG_PHYS_RAM_BASE; - kernel_map.size = (uintptr_t)(&_end) - (uintptr_t)(&_sdata); + kernel_map.size = (uintptr_t)(&_end) - (uintptr_t)(&_start);
kernel_map.va_kernel_xip_pa_offset = kernel_map.virt_addr - kernel_map.xiprom; #else
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Haibo Xu haibo1.xu@intel.com
commit a21dcf0ea8566ebbe011c79d6ed08cdfea771de3 upstream.
Currently, only acpi_early_node_map[0] was initialized to NUMA_NO_NODE. To ensure all the values were properly initialized, switch to initialize all of them to NUMA_NO_NODE.
Fixes: e18962491696 ("arm64: numa: rework ACPI NUMA initialization") Cc: stable@vger.kernel.org # 4.19.x Reported-by: Andrew Jones ajones@ventanamicro.com Suggested-by: Andrew Jones ajones@ventanamicro.com Signed-off-by: Haibo Xu haibo1.xu@intel.com Reviewed-by: Anshuman Khandual anshuman.khandual@arm.com Reviewed-by: Sunil V L sunilvl@ventanamicro.com Reviewed-by: Andrew Jones ajones@ventanamicro.com Acked-by: Catalin Marinas catalin.marinas@arm.com Acked-by: Lorenzo Pieralisi lpieralisi@kernel.org Reviewed-by: Hanjun Guo guohanjun@huawei.com Link: https://lore.kernel.org/r/853d7f74aa243f6f5999e203246f0d1ae92d2b61.172282842... Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/acpi_numa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm64/kernel/acpi_numa.c +++ b/arch/arm64/kernel/acpi_numa.c @@ -27,7 +27,7 @@
#include <asm/numa.h>
-static int acpi_early_node_map[NR_CPUS] __initdata = { NUMA_NO_NODE }; +static int acpi_early_node_map[NR_CPUS] __initdata = { [0 ... NR_CPUS - 1] = NUMA_NO_NODE };
int __init acpi_numa_get_nid(unsigned int cpu) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Khazhismel Kumykov khazhy@google.com
commit 7a636b4f03af9d541205f69e373672e7b2b60a8a upstream.
If the dm_resume method is called on a device that is not suspended, the method will suspend the device briefly, before resuming it (so that the table will be swapped).
However, there was a bug that the return value of dm_suspended_md was not checked. dm_suspended_md may return an error when it is interrupted by a signal. In this case, do_resume would call dm_swap_table, which would return -EINVAL.
This commit fixes the logic, so that error returned by dm_suspend is checked and the resume operation is undone.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Khazhismel Kumykov khazhy@google.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-ioctl.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-)
--- a/drivers/md/dm-ioctl.c +++ b/drivers/md/dm-ioctl.c @@ -1156,8 +1156,26 @@ static int do_resume(struct dm_ioctl *pa suspend_flags &= ~DM_SUSPEND_LOCKFS_FLAG; if (param->flags & DM_NOFLUSH_FLAG) suspend_flags |= DM_SUSPEND_NOFLUSH_FLAG; - if (!dm_suspended_md(md)) - dm_suspend(md, suspend_flags); + if (!dm_suspended_md(md)) { + r = dm_suspend(md, suspend_flags); + if (r) { + down_write(&_hash_lock); + hc = dm_get_mdptr(md); + if (hc && !hc->new_map) { + hc->new_map = new_map; + new_map = NULL; + } else { + r = -ENXIO; + } + up_write(&_hash_lock); + if (new_map) { + dm_sync_table(md); + dm_table_destroy(new_map); + } + dm_put(md); + return r; + } + }
old_size = dm_get_size(md); old_map = dm_swap_table(md, new_map);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mikulas Patocka mpatocka@redhat.com
commit faada2174c08662ae98b439c69efe3e79382c538 upstream.
kmalloc is unreliable when allocating more than 8 pages of memory. It may fail when there is plenty of free memory but the memory is fragmented. Zdenek Kabelac observed such failure in his tests.
This commit changes kmalloc to kvmalloc - kvmalloc will fall back to vmalloc if the large allocation fails.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Reported-by: Zdenek Kabelac zkabelac@redhat.com Reviewed-by: Mike Snitzer snitzer@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/persistent-data/dm-space-map-metadata.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/md/persistent-data/dm-space-map-metadata.c +++ b/drivers/md/persistent-data/dm-space-map-metadata.c @@ -274,7 +274,7 @@ static void sm_metadata_destroy(struct d { struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm);
- kfree(smm); + kvfree(smm); }
static int sm_metadata_get_nr_blocks(struct dm_space_map *sm, dm_block_t *count) @@ -768,7 +768,7 @@ struct dm_space_map *dm_sm_metadata_init { struct sm_metadata *smm;
- smm = kmalloc(sizeof(*smm), GFP_KERNEL); + smm = kvmalloc(sizeof(*smm), GFP_KERNEL); if (!smm) return ERR_PTR(-ENOMEM);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhihao Cheng chengzhihao1@huawei.com
commit 2a0629834cd82f05d424bbc193374f9a43d1f87d upstream.
The inode reclaiming process(See function prune_icache_sb) collects all reclaimable inodes and mark them with I_FREEING flag at first, at that time, other processes will be stuck if they try getting these inodes (See function find_inode_fast), then the reclaiming process destroy the inodes by function dispose_list(). Some filesystems(eg. ext4 with ea_inode feature, ubifs with xattr) may do inode lookup in the inode evicting callback function, if the inode lookup is operated under the inode lru traversing context, deadlock problems may happen.
Case 1: In function ext4_evict_inode(), the ea inode lookup could happen if ea_inode feature is enabled, the lookup process will be stuck under the evicting context like this:
1. File A has inode i_reg and an ea inode i_ea 2. getfattr(A, xattr_buf) // i_ea is added into lru // lru->i_ea 3. Then, following three processes running like this:
PA PB echo 2 > /proc/sys/vm/drop_caches shrink_slab prune_dcache_sb // i_reg is added into lru, lru->i_ea->i_reg prune_icache_sb list_lru_walk_one inode_lru_isolate i_ea->i_state |= I_FREEING // set inode state inode_lru_isolate __iget(i_reg) spin_unlock(&i_reg->i_lock) spin_unlock(lru_lock) rm file A i_reg->nlink = 0 iput(i_reg) // i_reg->nlink is 0, do evict ext4_evict_inode ext4_xattr_delete_inode ext4_xattr_inode_dec_ref_all ext4_xattr_inode_iget ext4_iget(i_ea->i_ino) iget_locked find_inode_fast __wait_on_freeing_inode(i_ea) ----→ AA deadlock dispose_list // cannot be executed by prune_icache_sb wake_up_bit(&i_ea->i_state)
Case 2: In deleted inode writing function ubifs_jnl_write_inode(), file deleting process holds BASEHD's wbuf->io_mutex while getting the xattr inode, which could race with inode reclaiming process(The reclaiming process could try locking BASEHD's wbuf->io_mutex in inode evicting function), then an ABBA deadlock problem would happen as following:
1. File A has inode ia and a xattr(with inode ixa), regular file B has inode ib and a xattr. 2. getfattr(A, xattr_buf) // ixa is added into lru // lru->ixa 3. Then, following three processes running like this:
PA PB PC echo 2 > /proc/sys/vm/drop_caches shrink_slab prune_dcache_sb // ib and ia are added into lru, lru->ixa->ib->ia prune_icache_sb list_lru_walk_one inode_lru_isolate ixa->i_state |= I_FREEING // set inode state inode_lru_isolate __iget(ib) spin_unlock(&ib->i_lock) spin_unlock(lru_lock) rm file B ib->nlink = 0 rm file A iput(ia) ubifs_evict_inode(ia) ubifs_jnl_delete_inode(ia) ubifs_jnl_write_inode(ia) make_reservation(BASEHD) // Lock wbuf->io_mutex ubifs_iget(ixa->i_ino) iget_locked find_inode_fast __wait_on_freeing_inode(ixa) | iput(ib) // ib->nlink is 0, do evict | ubifs_evict_inode | ubifs_jnl_delete_inode(ib) ↓ ubifs_jnl_write_inode ABBA deadlock ←-----make_reservation(BASEHD) dispose_list // cannot be executed by prune_icache_sb wake_up_bit(&ixa->i_state)
Fix the possible deadlock by using new inode state flag I_LRU_ISOLATING to pin the inode in memory while inode_lru_isolate() reclaims its pages instead of using ordinary inode reference. This way inode deletion cannot be triggered from inode_lru_isolate() thus avoiding the deadlock. evict() is made to wait for I_LRU_ISOLATING to be cleared before proceeding with inode cleanup.
Link: https://lore.kernel.org/all/37c29c42-7685-d1f0-067d-63582ffac405@huaweicloud... Link: https://bugzilla.kernel.org/show_bug.cgi?id=219022 Fixes: e50e5129f384 ("ext4: xattr-in-inode support") Fixes: 7959cf3a7506 ("ubifs: journal: Handle xattrs like files") Cc: stable@vger.kernel.org Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Link: https://lore.kernel.org/r/20240809031628.1069873-1-chengzhihao@huaweicloud.c... Reviewed-by: Jan Kara jack@suse.cz Suggested-by: Jan Kara jack@suse.cz Suggested-by: Mateusz Guzik mjguzik@gmail.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/inode.c | 39 +++++++++++++++++++++++++++++++++++++-- include/linux/fs.h | 5 +++++ 2 files changed, 42 insertions(+), 2 deletions(-)
--- a/fs/inode.c +++ b/fs/inode.c @@ -486,6 +486,39 @@ static void inode_lru_list_del(struct in this_cpu_dec(nr_unused); }
+static void inode_pin_lru_isolating(struct inode *inode) +{ + lockdep_assert_held(&inode->i_lock); + WARN_ON(inode->i_state & (I_LRU_ISOLATING | I_FREEING | I_WILL_FREE)); + inode->i_state |= I_LRU_ISOLATING; +} + +static void inode_unpin_lru_isolating(struct inode *inode) +{ + spin_lock(&inode->i_lock); + WARN_ON(!(inode->i_state & I_LRU_ISOLATING)); + inode->i_state &= ~I_LRU_ISOLATING; + smp_mb(); + wake_up_bit(&inode->i_state, __I_LRU_ISOLATING); + spin_unlock(&inode->i_lock); +} + +static void inode_wait_for_lru_isolating(struct inode *inode) +{ + spin_lock(&inode->i_lock); + if (inode->i_state & I_LRU_ISOLATING) { + DEFINE_WAIT_BIT(wq, &inode->i_state, __I_LRU_ISOLATING); + wait_queue_head_t *wqh; + + wqh = bit_waitqueue(&inode->i_state, __I_LRU_ISOLATING); + spin_unlock(&inode->i_lock); + __wait_on_bit(wqh, &wq, bit_wait, TASK_UNINTERRUPTIBLE); + spin_lock(&inode->i_lock); + WARN_ON(inode->i_state & I_LRU_ISOLATING); + } + spin_unlock(&inode->i_lock); +} + /** * inode_sb_list_add - add inode to the superblock list of inodes * @inode: inode to add @@ -654,6 +687,8 @@ static void evict(struct inode *inode)
inode_sb_list_del(inode);
+ inode_wait_for_lru_isolating(inode); + /* * Wait for flusher thread to be done with the inode so that filesystem * does not start destroying it while writeback is still running. Since @@ -855,7 +890,7 @@ static enum lru_status inode_lru_isolate * be under pressure before the cache inside the highmem zone. */ if (inode_has_buffers(inode) || !mapping_empty(&inode->i_data)) { - __iget(inode); + inode_pin_lru_isolating(inode); spin_unlock(&inode->i_lock); spin_unlock(lru_lock); if (remove_inode_buffers(inode)) { @@ -868,7 +903,7 @@ static enum lru_status inode_lru_isolate if (current->reclaim_state) current->reclaim_state->reclaimed_slab += reap; } - iput(inode); + inode_unpin_lru_isolating(inode); spin_lock(lru_lock); return LRU_RETRY; } --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2507,6 +2507,9 @@ static inline void kiocb_clone(struct ki * * I_PINNING_FSCACHE_WB Inode is pinning an fscache object for writeback. * + * I_LRU_ISOLATING Inode is pinned being isolated from LRU without holding + * i_count. + * * Q: What is the difference between I_WILL_FREE and I_FREEING? */ #define I_DIRTY_SYNC (1 << 0) @@ -2530,6 +2533,8 @@ static inline void kiocb_clone(struct ki #define I_DONTCACHE (1 << 16) #define I_SYNC_QUEUED (1 << 17) #define I_PINNING_FSCACHE_WB (1 << 18) +#define __I_LRU_ISOLATING 19 +#define I_LRU_ISOLATING (1 << __I_LRU_ISOLATING)
#define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC) #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Lobakin aleksander.lobakin@intel.com
commit 3f5ef5109f6a054ce58b3bec7214ed76c9cc269f upstream.
bitmap_size() is a pretty generic name and one may want to use it for a generic bitmap API function. At the same time, its logic is NTFS-specific, as it aligns to the sizeof(u64), not the sizeof(long) (although it uses ideologically right ALIGN() instead of division). Add the prefix 'ntfs3_' used for that FS (not just 'ntfs_' to not mix it with the legacy module) and use generic BITS_TO_U64() while at it.
Suggested-by: Yury Norov yury.norov@gmail.com # BITS_TO_U64() Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Yury Norov yury.norov@gmail.com Signed-off-by: Alexander Lobakin aleksander.lobakin@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ntfs3/bitmap.c | 4 ++-- fs/ntfs3/fsntfs.c | 2 +- fs/ntfs3/index.c | 11 ++++++----- fs/ntfs3/ntfs_fs.h | 4 ++-- fs/ntfs3/super.c | 2 +- 5 files changed, 12 insertions(+), 11 deletions(-)
--- a/fs/ntfs3/bitmap.c +++ b/fs/ntfs3/bitmap.c @@ -656,7 +656,7 @@ int wnd_init(struct wnd_bitmap *wnd, str wnd->total_zeroes = nbits; wnd->extent_max = MINUS_ONE_T; wnd->zone_bit = wnd->zone_end = 0; - wnd->nwnd = bytes_to_block(sb, bitmap_size(nbits)); + wnd->nwnd = bytes_to_block(sb, ntfs3_bitmap_size(nbits)); wnd->bits_last = nbits & (wbits - 1); if (!wnd->bits_last) wnd->bits_last = wbits; @@ -1320,7 +1320,7 @@ int wnd_extend(struct wnd_bitmap *wnd, s return -EINVAL;
/* Align to 8 byte boundary. */ - new_wnd = bytes_to_block(sb, bitmap_size(new_bits)); + new_wnd = bytes_to_block(sb, ntfs3_bitmap_size(new_bits)); new_last = new_bits & (wbits - 1); if (!new_last) new_last = wbits; --- a/fs/ntfs3/fsntfs.c +++ b/fs/ntfs3/fsntfs.c @@ -493,7 +493,7 @@ static int ntfs_extend_mft(struct ntfs_s ni->mi.dirty = true;
/* Step 2: Resize $MFT::BITMAP. */ - new_bitmap_bytes = bitmap_size(new_mft_total); + new_bitmap_bytes = ntfs3_bitmap_size(new_mft_total);
err = attr_set_size(ni, ATTR_BITMAP, NULL, 0, &sbi->mft.bitmap.run, new_bitmap_bytes, &new_bitmap_bytes, true, NULL); --- a/fs/ntfs3/index.c +++ b/fs/ntfs3/index.c @@ -1454,8 +1454,8 @@ static int indx_create_allocate(struct n
alloc->nres.valid_size = alloc->nres.data_size = cpu_to_le64(data_size);
- err = ni_insert_resident(ni, bitmap_size(1), ATTR_BITMAP, in->name, - in->name_len, &bitmap, NULL, NULL); + err = ni_insert_resident(ni, ntfs3_bitmap_size(1), ATTR_BITMAP, + in->name, in->name_len, &bitmap, NULL, NULL); if (err) goto out2;
@@ -1516,8 +1516,9 @@ static int indx_add_allocate(struct ntfs if (bmp) { /* Increase bitmap. */ err = attr_set_size(ni, ATTR_BITMAP, in->name, in->name_len, - &indx->bitmap_run, bitmap_size(bit + 1), - NULL, true, NULL); + &indx->bitmap_run, + ntfs3_bitmap_size(bit + 1), NULL, true, + NULL); if (err) goto out1; } @@ -2080,7 +2081,7 @@ static int indx_shrink(struct ntfs_index if (err) return err;
- bpb = bitmap_size(bit); + bpb = ntfs3_bitmap_size(bit); if (bpb * 8 == nbits) return 0;
--- a/fs/ntfs3/ntfs_fs.h +++ b/fs/ntfs3/ntfs_fs.h @@ -951,9 +951,9 @@ static inline bool run_is_empty(struct r }
/* NTFS uses quad aligned bitmaps. */ -static inline size_t bitmap_size(size_t bits) +static inline size_t ntfs3_bitmap_size(size_t bits) { - return ALIGN((bits + 7) >> 3, 8); + return BITS_TO_U64(bits) * sizeof(u64); }
#define _100ns2seconds 10000000 --- a/fs/ntfs3/super.c +++ b/fs/ntfs3/super.c @@ -1108,7 +1108,7 @@ static int ntfs_fill_super(struct super_
/* Check bitmap boundary. */ tt = sbi->used.bitmap.nbits; - if (inode->i_size < bitmap_size(tt)) { + if (inode->i_size < ntfs3_bitmap_size(tt)) { err = -EINVAL; goto put_inode_out; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Lobakin aleksander.lobakin@intel.com
commit c1023f5634b9bfcbfff0dc200245309e3cde9b54 upstream.
bitmap_size() is a pretty generic name and one may want to use it for a generic bitmap API function. At the same time, its logic is not "generic", i.e. it's not just `nbits -> size of bitmap in bytes` converter as it would be expected from its name. Add the prefix 'idset_' used throughout the file where the function resides.
Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Acked-by: Peter Oberparleiter oberpar@linux.ibm.com Signed-off-by: Alexander Lobakin aleksander.lobakin@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/cio/idset.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/drivers/s390/cio/idset.c +++ b/drivers/s390/cio/idset.c @@ -16,7 +16,7 @@ struct idset { unsigned long bitmap[]; };
-static inline unsigned long bitmap_size(int num_ssid, int num_id) +static inline unsigned long idset_bitmap_size(int num_ssid, int num_id) { return BITS_TO_LONGS(num_ssid * num_id) * sizeof(unsigned long); } @@ -25,11 +25,12 @@ static struct idset *idset_new(int num_s { struct idset *set;
- set = vmalloc(sizeof(struct idset) + bitmap_size(num_ssid, num_id)); + set = vmalloc(sizeof(struct idset) + + idset_bitmap_size(num_ssid, num_id)); if (set) { set->num_ssid = num_ssid; set->num_id = num_id; - memset(set->bitmap, 0, bitmap_size(num_ssid, num_id)); + memset(set->bitmap, 0, idset_bitmap_size(num_ssid, num_id)); } return set; } @@ -41,7 +42,8 @@ void idset_free(struct idset *set)
void idset_fill(struct idset *set) { - memset(set->bitmap, 0xff, bitmap_size(set->num_ssid, set->num_id)); + memset(set->bitmap, 0xff, + idset_bitmap_size(set->num_ssid, set->num_id)); }
static inline void idset_add(struct idset *set, int ssid, int id)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Lobakin aleksander.lobakin@intel.com
commit 4ca532d64648d4776d15512caed3efea05ca7195 upstream.
bitmap_set_bits() does not start with the FS' prefix and may collide with a new generic helper one day. It operates with the FS-specific types, so there's no change those two could do the same thing. Just add the prefix to exclude such possible conflict.
Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Acked-by: David Sterba dsterba@suse.com Reviewed-by: Yury Norov yury.norov@gmail.com Signed-off-by: Alexander Lobakin aleksander.lobakin@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/free-space-cache.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -1894,9 +1894,9 @@ static inline void bitmap_clear_bits(str ctl->free_space -= bytes; }
-static void bitmap_set_bits(struct btrfs_free_space_ctl *ctl, - struct btrfs_free_space *info, u64 offset, - u64 bytes) +static void btrfs_bitmap_set_bits(struct btrfs_free_space_ctl *ctl, + struct btrfs_free_space *info, u64 offset, + u64 bytes) { unsigned long start, count, end; int extent_delta = 1; @@ -2232,7 +2232,7 @@ static u64 add_bytes_to_bitmap(struct bt
bytes_to_set = min(end - offset, bytes);
- bitmap_set_bits(ctl, info, offset, bytes_to_set); + btrfs_bitmap_set_bits(ctl, info, offset, bytes_to_set);
return bytes_to_set;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Lobakin aleksander.lobakin@intel.com
commit a37fbe666c016fd89e4460d0ebfcea05baba46dc upstream.
The number of times yet another open coded `BITS_TO_LONGS(nbits) * sizeof(long)` can be spotted is huge. Some generic helper is long overdue.
Add one, bitmap_size(), but with one detail. BITS_TO_LONGS() uses DIV_ROUND_UP(). The latter works well when both divident and divisor are compile-time constants or when the divisor is not a pow-of-2. When it is however, the compilers sometimes tend to generate suboptimal code (GCC 13):
48 83 c0 3f add $0x3f,%rax 48 c1 e8 06 shr $0x6,%rax 48 8d 14 c5 00 00 00 00 lea 0x0(,%rax,8),%rdx
%BITS_PER_LONG is always a pow-2 (either 32 or 64), but GCC still does full division of `nbits + 63` by it and then multiplication by 8. Instead of BITS_TO_LONGS(), use ALIGN() and then divide by 8. GCC:
8d 50 3f lea 0x3f(%rax),%edx c1 ea 03 shr $0x3,%edx 81 e2 f8 ff ff 1f and $0x1ffffff8,%edx
Now it shifts `nbits + 63` by 3 positions (IOW performs fast division by 8) and then masks bits[2:0]. bloat-o-meter:
add/remove: 0/0 grow/shrink: 20/133 up/down: 156/-773 (-617)
Clang does it better and generates the same code before/after starting from -O1, except that with the ALIGN() approach it uses %edx and thus still saves some bytes:
add/remove: 0/0 grow/shrink: 9/133 up/down: 18/-538 (-520)
Note that we can't expand DIV_ROUND_UP() by adding a check and using this approach there, as it's used in array declarations where expressions are not allowed. Add this helper to tools/ as well.
Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Acked-by: Yury Norov yury.norov@gmail.com Signed-off-by: Alexander Lobakin aleksander.lobakin@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-clone-metadata.c | 5 ----- drivers/s390/cio/idset.c | 2 +- include/linux/bitmap.h | 8 +++++--- include/linux/cpumask.h | 2 +- lib/math/prime_numbers.c | 2 -- tools/include/linux/bitmap.h | 7 ++++--- 6 files changed, 11 insertions(+), 15 deletions(-)
--- a/drivers/md/dm-clone-metadata.c +++ b/drivers/md/dm-clone-metadata.c @@ -465,11 +465,6 @@ static void __destroy_persistent_data_st
/*---------------------------------------------------------------------------*/
-static size_t bitmap_size(unsigned long nr_bits) -{ - return BITS_TO_LONGS(nr_bits) * sizeof(long); -} - static int __dirty_map_init(struct dirty_map *dmap, unsigned long nr_words, unsigned long nr_regions) { --- a/drivers/s390/cio/idset.c +++ b/drivers/s390/cio/idset.c @@ -18,7 +18,7 @@ struct idset {
static inline unsigned long idset_bitmap_size(int num_ssid, int num_id) { - return BITS_TO_LONGS(num_ssid * num_id) * sizeof(unsigned long); + return bitmap_size(size_mul(num_ssid, num_id)); }
static struct idset *idset_new(int num_ssid, int num_id) --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -237,9 +237,11 @@ extern int bitmap_print_list_to_buf(char #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1))) #define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
+#define bitmap_size(nbits) (ALIGN(nbits, BITS_PER_LONG) / BITS_PER_BYTE) + static inline void bitmap_zero(unsigned long *dst, unsigned int nbits) { - unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); + unsigned int len = bitmap_size(nbits);
if (small_const_nbits(nbits)) *dst = 0; @@ -249,7 +251,7 @@ static inline void bitmap_zero(unsigned
static inline void bitmap_fill(unsigned long *dst, unsigned int nbits) { - unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); + unsigned int len = bitmap_size(nbits);
if (small_const_nbits(nbits)) *dst = ~0UL; @@ -260,7 +262,7 @@ static inline void bitmap_fill(unsigned static inline void bitmap_copy(unsigned long *dst, const unsigned long *src, unsigned int nbits) { - unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); + unsigned int len = bitmap_size(nbits);
if (small_const_nbits(nbits)) *dst = *src; --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -769,7 +769,7 @@ static inline int cpulist_parse(const ch */ static inline unsigned int cpumask_size(void) { - return BITS_TO_LONGS(nr_cpumask_bits) * sizeof(long); + return bitmap_size(nr_cpumask_bits); }
/* --- a/lib/math/prime_numbers.c +++ b/lib/math/prime_numbers.c @@ -6,8 +6,6 @@ #include <linux/prime_numbers.h> #include <linux/slab.h>
-#define bitmap_size(nbits) (BITS_TO_LONGS(nbits) * sizeof(unsigned long)) - struct primes { struct rcu_head rcu; unsigned long last, sz; --- a/tools/include/linux/bitmap.h +++ b/tools/include/linux/bitmap.h @@ -25,13 +25,14 @@ bool __bitmap_intersects(const unsigned #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1))) #define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
+#define bitmap_size(nbits) (ALIGN(nbits, BITS_PER_LONG) / BITS_PER_BYTE) + static inline void bitmap_zero(unsigned long *dst, unsigned int nbits) { if (small_const_nbits(nbits)) *dst = 0UL; else { - int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); - memset(dst, 0, len); + memset(dst, 0, bitmap_size(nbits)); } }
@@ -117,7 +118,7 @@ static inline int test_and_clear_bit(int */ static inline unsigned long *bitmap_zalloc(int nbits) { - return calloc(1, BITS_TO_LONGS(nbits) * sizeof(unsigned long)); + return calloc(1, bitmap_size(nbits)); }
/*
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Al Viro viro@zeniv.linux.org.uk
commit 9a2fa1472083580b6c66bdaf291f591e1170123a upstream.
copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, *if* all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied.
For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to.
The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe.
Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor #128, despite #64 being observably not open.
The minimally invasive fix would be to deal with that in dup_fd(). If this proves to add measurable overhead, we can go that way, but let's try to fix copy_fd_bitmaps() first.
* new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size). * make copy_fd_bitmaps() take the bitmap size in words, rather than bits; it's 'count' argument is always a multiple of BITS_PER_LONG, so we are not losing any information, and that way we can use the same helper for all three bitmaps - compiler will see that count is a multiple of BITS_PER_LONG for the large ones, so it'll generate plain memcpy()+memset().
Reproducer added to tools/testing/selftests/core/close_range_test.c
Cc: stable@vger.kernel.org Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/file.c | 28 ++++++++----------- include/linux/bitmap.h | 12 ++++++++ tools/testing/selftests/core/close_range_test.c | 35 ++++++++++++++++++++++++ 3 files changed, 59 insertions(+), 16 deletions(-)
--- a/fs/file.c +++ b/fs/file.c @@ -46,27 +46,23 @@ static void free_fdtable_rcu(struct rcu_ #define BITBIT_NR(nr) BITS_TO_LONGS(BITS_TO_LONGS(nr)) #define BITBIT_SIZE(nr) (BITBIT_NR(nr) * sizeof(long))
+#define fdt_words(fdt) ((fdt)->max_fds / BITS_PER_LONG) // words in ->open_fds /* * Copy 'count' fd bits from the old table to the new table and clear the extra * space if any. This does not copy the file pointers. Called with the files * spinlock held for write. */ -static void copy_fd_bitmaps(struct fdtable *nfdt, struct fdtable *ofdt, - unsigned int count) +static inline void copy_fd_bitmaps(struct fdtable *nfdt, struct fdtable *ofdt, + unsigned int copy_words) { - unsigned int cpy, set; + unsigned int nwords = fdt_words(nfdt);
- cpy = count / BITS_PER_BYTE; - set = (nfdt->max_fds - count) / BITS_PER_BYTE; - memcpy(nfdt->open_fds, ofdt->open_fds, cpy); - memset((char *)nfdt->open_fds + cpy, 0, set); - memcpy(nfdt->close_on_exec, ofdt->close_on_exec, cpy); - memset((char *)nfdt->close_on_exec + cpy, 0, set); - - cpy = BITBIT_SIZE(count); - set = BITBIT_SIZE(nfdt->max_fds) - cpy; - memcpy(nfdt->full_fds_bits, ofdt->full_fds_bits, cpy); - memset((char *)nfdt->full_fds_bits + cpy, 0, set); + bitmap_copy_and_extend(nfdt->open_fds, ofdt->open_fds, + copy_words * BITS_PER_LONG, nwords * BITS_PER_LONG); + bitmap_copy_and_extend(nfdt->close_on_exec, ofdt->close_on_exec, + copy_words * BITS_PER_LONG, nwords * BITS_PER_LONG); + bitmap_copy_and_extend(nfdt->full_fds_bits, ofdt->full_fds_bits, + copy_words, nwords); }
/* @@ -84,7 +80,7 @@ static void copy_fdtable(struct fdtable memcpy(nfdt->fd, ofdt->fd, cpy); memset((char *)nfdt->fd + cpy, 0, set);
- copy_fd_bitmaps(nfdt, ofdt, ofdt->max_fds); + copy_fd_bitmaps(nfdt, ofdt, fdt_words(ofdt)); }
/* @@ -374,7 +370,7 @@ struct files_struct *dup_fd(struct files open_files = sane_fdtable_size(old_fdt, max_fds); }
- copy_fd_bitmaps(new_fdt, old_fdt, open_files); + copy_fd_bitmaps(new_fdt, old_fdt, open_files / BITS_PER_LONG);
old_fds = old_fdt->fd; new_fds = new_fdt->fd; --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -281,6 +281,18 @@ static inline void bitmap_copy_clear_tai dst[nbits / BITS_PER_LONG] &= BITMAP_LAST_WORD_MASK(nbits); }
+static inline void bitmap_copy_and_extend(unsigned long *to, + const unsigned long *from, + unsigned int count, unsigned int size) +{ + unsigned int copy = BITS_TO_LONGS(count); + + memcpy(to, from, copy * sizeof(long)); + if (count % BITS_PER_LONG) + to[copy - 1] &= BITMAP_LAST_WORD_MASK(count); + memset(to + copy, 0, bitmap_size(size) - copy * sizeof(long)); +} + /* * On 32-bit systems bitmaps are represented as u32 arrays internally. On LE64 * machines the order of hi and lo parts of numbers match the bitmap structure. --- a/tools/testing/selftests/core/close_range_test.c +++ b/tools/testing/selftests/core/close_range_test.c @@ -563,4 +563,39 @@ TEST(close_range_cloexec_unshare_syzbot) EXPECT_EQ(close(fd3), 0); }
+TEST(close_range_bitmap_corruption) +{ + pid_t pid; + int status; + struct __clone_args args = { + .flags = CLONE_FILES, + .exit_signal = SIGCHLD, + }; + + /* get the first 128 descriptors open */ + for (int i = 2; i < 128; i++) + EXPECT_GE(dup2(0, i), 0); + + /* get descriptor table shared */ + pid = sys_clone3(&args, sizeof(args)); + ASSERT_GE(pid, 0); + + if (pid == 0) { + /* unshare and truncate descriptor table down to 64 */ + if (sys_close_range(64, ~0U, CLOSE_RANGE_UNSHARE)) + exit(EXIT_FAILURE); + + ASSERT_EQ(fcntl(64, F_GETFD), -1); + /* ... and verify that the range 64..127 is not + stuck "fully used" according to secondary bitmap */ + EXPECT_EQ(dup(0), 64) + exit(EXIT_FAILURE); + exit(EXIT_SUCCESS); + } + + EXPECT_EQ(waitpid(pid, &status, 0), pid); + EXPECT_EQ(true, WIFEXITED(status)); + EXPECT_EQ(0, WEXITSTATUS(status)); +} + TEST_HARNESS_MAIN
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andi Shyti andi.shyti@kernel.org
commit 4e91fa1ef3ce6290b4c598e54b5eb6cf134fbec8 upstream.
Add the missing geni_icc_disable() call before returning in the geni_i2c_runtime_resume() function.
Commit 9ba48db9f77c ("i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume") by Gaosheng missed disabling the interconnect in one case.
Fixes: bf225ed357c6 ("i2c: i2c-qcom-geni: Add interconnect support") Cc: Gaosheng Cui cuigaosheng1@huawei.com Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Andi Shyti andi.shyti@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/i2c/busses/i2c-qcom-geni.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/i2c/busses/i2c-qcom-geni.c +++ b/drivers/i2c/busses/i2c-qcom-geni.c @@ -990,8 +990,10 @@ static int __maybe_unused geni_i2c_runti return ret;
ret = clk_prepare_enable(gi2c->core_clk); - if (ret) + if (ret) { + geni_icc_disable(&gi2c->se); return ret; + }
ret = geni_se_resources_on(&gi2c->se); if (ret) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
commit 90574d2a675947858b47008df8d07f75ea50d0d0 upstream.
If the "tool->data" allocation fails then there is no need to call osnoise_free_top() and, in fact, doing so will lead to a NULL dereference.
Cc: stable@vger.kernel.org Cc: John Kacur jkacur@redhat.com Cc: "Luis Claudio R. Goncalves" lgoncalv@redhat.com Cc: Clark Williams williams@redhat.com Fixes: 1eceb2fc2ca5 ("rtla/osnoise: Add osnoise top mode") Link: https://lore.kernel.org/f964ed1f-64d2-4fde-ad3e-708331f8f358@stanley.mountai... Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/tracing/rtla/src/osnoise_top.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)
--- a/tools/tracing/rtla/src/osnoise_top.c +++ b/tools/tracing/rtla/src/osnoise_top.c @@ -520,8 +520,10 @@ struct osnoise_tool *osnoise_init_top(st return NULL;
tool->data = osnoise_alloc_top(nr_cpus); - if (!tool->data) - goto out_err; + if (!tool->data) { + osnoise_destroy_tool(tool); + return NULL; + }
tool->params = params;
@@ -529,11 +531,6 @@ struct osnoise_tool *osnoise_init_top(st osnoise_top_handler, NULL);
return tool; - -out_err: - osnoise_free_top(tool->data); - osnoise_destroy_tool(tool); - return NULL; }
static int stop_tracing;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Max Kellermann max.kellermann@ionos.com
commit f71aa06398aabc2e3eaac25acdf3d62e0094ba70 upstream.
This fixes a NULL pointer dereference bug due to a data race which looks like this:
BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 33 PID: 16573 Comm: kworker/u97:799 Not tainted 6.8.7-cm4all1-hp+ #43 Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018 Workqueue: events_unbound netfs_rreq_write_to_cache_work RIP: 0010:cachefiles_prepare_write+0x30/0xa0 Code: 57 41 56 45 89 ce 41 55 49 89 cd 41 54 49 89 d4 55 53 48 89 fb 48 83 ec 08 48 8b 47 08 48 83 7f 10 00 48 89 34 24 48 8b 68 20 <48> 8b 45 08 4c 8b 38 74 45 49 8b 7f 50 e8 4e a9 b0 ff 48 8b 73 10 RSP: 0018:ffffb4e78113bde0 EFLAGS: 00010286 RAX: ffff976126be6d10 RBX: ffff97615cdb8438 RCX: 0000000000020000 RDX: ffff97605e6c4c68 RSI: ffff97605e6c4c60 RDI: ffff97615cdb8438 RBP: 0000000000000000 R08: 0000000000278333 R09: 0000000000000001 R10: ffff97605e6c4600 R11: 0000000000000001 R12: ffff97605e6c4c68 R13: 0000000000020000 R14: 0000000000000001 R15: ffff976064fe2c00 FS: 0000000000000000(0000) GS:ffff9776dfd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000005942c002 CR4: 00000000001706f0 Call Trace: <TASK> ? __die+0x1f/0x70 ? page_fault_oops+0x15d/0x440 ? search_module_extables+0xe/0x40 ? fixup_exception+0x22/0x2f0 ? exc_page_fault+0x5f/0x100 ? asm_exc_page_fault+0x22/0x30 ? cachefiles_prepare_write+0x30/0xa0 netfs_rreq_write_to_cache_work+0x135/0x2e0 process_one_work+0x137/0x2c0 worker_thread+0x2e9/0x400 ? __pfx_worker_thread+0x10/0x10 kthread+0xcc/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> Modules linked in: CR2: 0000000000000008 ---[ end trace 0000000000000000 ]---
This happened because fscache_cookie_state_machine() was slow and was still running while another process invoked fscache_unuse_cookie(); this led to a fscache_cookie_lru_do_one() call, setting the FSCACHE_COOKIE_DO_LRU_DISCARD flag, which was picked up by fscache_cookie_state_machine(), withdrawing the cookie via cachefiles_withdraw_cookie(), clearing cookie->cache_priv.
At the same time, yet another process invoked cachefiles_prepare_write(), which found a NULL pointer in this code line:
struct cachefiles_object *object = cachefiles_cres_object(cres);
The next line crashes, obviously:
struct cachefiles_cache *cache = object->volume->cache;
During cachefiles_prepare_write(), the "n_accesses" counter is non-zero (via fscache_begin_operation()). The cookie must not be withdrawn until it drops to zero.
The counter is checked by fscache_cookie_state_machine() before switching to FSCACHE_COOKIE_STATE_RELINQUISHING and FSCACHE_COOKIE_STATE_WITHDRAWING (in "case FSCACHE_COOKIE_STATE_FAILED"), but not for FSCACHE_COOKIE_STATE_LRU_DISCARDING ("case FSCACHE_COOKIE_STATE_ACTIVE").
This patch adds the missing check. With a non-zero access counter, the function returns and the next fscache_end_cookie_access() call will queue another fscache_cookie_state_machine() call to handle the still-pending FSCACHE_COOKIE_DO_LRU_DISCARD.
Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning") Signed-off-by: Max Kellermann max.kellermann@ionos.com Signed-off-by: David Howells dhowells@redhat.com Link: https://lore.kernel.org/r/20240729162002.3436763-2-dhowells@redhat.com cc: Jeff Layton jlayton@kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: stable@vger.kernel.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/fscache/cookie.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -741,6 +741,10 @@ again_locked: spin_lock(&cookie->lock); } if (test_bit(FSCACHE_COOKIE_DO_LRU_DISCARD, &cookie->flags)) { + if (atomic_read(&cookie->n_accesses) != 0) + /* still being accessed: postpone it */ + break; + __fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_LRU_DISCARDING); wake = true;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhen Lei thunder.leizhen@huawei.com
commit 379d9af3f3da2da1bbfa67baf1820c72a080d1f1 upstream.
The count increases only when a node is successfully added to the linked list.
Cc: stable@vger.kernel.org Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls") Signed-off-by: Zhen Lei thunder.leizhen@huawei.com Acked-by: Stephen Smalley stephen.smalley.work@gmail.com Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/selinux/avc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/security/selinux/avc.c +++ b/security/selinux/avc.c @@ -332,12 +332,12 @@ static int avc_add_xperms_decision(struc { struct avc_xperms_decision_node *dest_xpd;
- node->ae.xp_node->xp.len++; dest_xpd = avc_xperms_decision_alloc(src->used); if (!dest_xpd) return -ENOMEM; avc_copy_xperms_decision(&dest_xpd->xpd, src); list_add(&dest_xpd->xpd_list, &node->ae.xp_node->xpd_head); + node->ae.xp_node->xp.len++; return 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Waiman Long longman@redhat.com
commit d75abd0d0bc29e6ebfebbf76d11b4067b35844af upstream.
The memory_failure_cpu structure is a per-cpu structure. Access to its content requires the use of get_cpu_var() to lock in the current CPU and disable preemption. The use of a regular spinlock_t for locking purpose is fine for a non-RT kernel.
Since the integration of RT spinlock support into the v5.15 kernel, a spinlock_t in a RT kernel becomes a sleeping lock and taking a sleeping lock in a preemption disabled context is illegal resulting in the following kind of warning.
[12135.732244] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 [12135.732248] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 270076, name: kworker/0:0 [12135.732252] preempt_count: 1, expected: 0 [12135.732255] RCU nest depth: 2, expected: 2 : [12135.732420] Hardware name: Dell Inc. PowerEdge R640/0HG0J8, BIOS 2.10.2 02/24/2021 [12135.732423] Workqueue: kacpi_notify acpi_os_execute_deferred [12135.732433] Call Trace: [12135.732436] <TASK> [12135.732450] dump_stack_lvl+0x57/0x81 [12135.732461] __might_resched.cold+0xf4/0x12f [12135.732479] rt_spin_lock+0x4c/0x100 [12135.732491] memory_failure_queue+0x40/0xe0 [12135.732503] ghes_do_memory_failure+0x53/0x390 [12135.732516] ghes_do_proc.constprop.0+0x229/0x3e0 [12135.732575] ghes_proc+0xf9/0x1a0 [12135.732591] ghes_notify_hed+0x6a/0x150 [12135.732602] notifier_call_chain+0x43/0xb0 [12135.732626] blocking_notifier_call_chain+0x43/0x60 [12135.732637] acpi_ev_notify_dispatch+0x47/0x70 [12135.732648] acpi_os_execute_deferred+0x13/0x20 [12135.732654] process_one_work+0x41f/0x500 [12135.732695] worker_thread+0x192/0x360 [12135.732715] kthread+0x111/0x140 [12135.732733] ret_from_fork+0x29/0x50 [12135.732779] </TASK>
Fix it by using a raw_spinlock_t for locking instead.
Also move the pr_err() out of the lock critical section and after put_cpu_ptr() to avoid indeterminate latency and the possibility of sleep with this call.
[longman@redhat.com: don't hold percpu ref across pr_err(), per Miaohe] Link: https://lkml.kernel.org/r/20240807181130.1122660-1-longman@redhat.com Link: https://lkml.kernel.org/r/20240806164107.1044956-1-longman@redhat.com Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant") Signed-off-by: Waiman Long longman@redhat.com Acked-by: Miaohe Lin linmiaohe@huawei.com Cc: "Huang, Ying" ying.huang@intel.com Cc: Juri Lelli juri.lelli@redhat.com Cc: Len Brown len.brown@intel.com Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memory-failure.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-)
--- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -2208,7 +2208,7 @@ struct memory_failure_entry { struct memory_failure_cpu { DECLARE_KFIFO(fifo, struct memory_failure_entry, MEMORY_FAILURE_FIFO_SIZE); - spinlock_t lock; + raw_spinlock_t lock; struct work_struct work; };
@@ -2234,20 +2234,22 @@ void memory_failure_queue(unsigned long { struct memory_failure_cpu *mf_cpu; unsigned long proc_flags; + bool buffer_overflow; struct memory_failure_entry entry = { .pfn = pfn, .flags = flags, };
mf_cpu = &get_cpu_var(memory_failure_cpu); - spin_lock_irqsave(&mf_cpu->lock, proc_flags); - if (kfifo_put(&mf_cpu->fifo, entry)) + raw_spin_lock_irqsave(&mf_cpu->lock, proc_flags); + buffer_overflow = !kfifo_put(&mf_cpu->fifo, entry); + if (!buffer_overflow) schedule_work_on(smp_processor_id(), &mf_cpu->work); - else + raw_spin_unlock_irqrestore(&mf_cpu->lock, proc_flags); + put_cpu_var(memory_failure_cpu); + if (buffer_overflow) pr_err("buffer overflow when queuing memory failure at %#lx\n", pfn); - spin_unlock_irqrestore(&mf_cpu->lock, proc_flags); - put_cpu_var(memory_failure_cpu); } EXPORT_SYMBOL_GPL(memory_failure_queue);
@@ -2260,9 +2262,9 @@ static void memory_failure_work_func(str
mf_cpu = container_of(work, struct memory_failure_cpu, work); for (;;) { - spin_lock_irqsave(&mf_cpu->lock, proc_flags); + raw_spin_lock_irqsave(&mf_cpu->lock, proc_flags); gotten = kfifo_get(&mf_cpu->fifo, &entry); - spin_unlock_irqrestore(&mf_cpu->lock, proc_flags); + raw_spin_unlock_irqrestore(&mf_cpu->lock, proc_flags); if (!gotten) break; if (entry.flags & MF_SOFT_OFFLINE) @@ -2292,7 +2294,7 @@ static int __init memory_failure_init(vo
for_each_possible_cpu(cpu) { mf_cpu = &per_cpu(memory_failure_cpu, cpu); - spin_lock_init(&mf_cpu->lock); + raw_spin_lock_init(&mf_cpu->lock); INIT_KFIFO(mf_cpu->fifo); INIT_WORK(&mf_cpu->work, memory_failure_work_func); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Naohiro Aota naohiro.aota@wdc.com
commit e30729d4bd4001881be4d1ad4332a5d4985398f8 upstream.
__btrfs_add_free_space_zoned() references and modifies bg's alloc_offset, ro, and zone_unusable, but without taking the lock. It is mostly safe because they monotonically increase (at least for now) and this function is mostly called by a transaction commit, which is serialized by itself.
Still, taking the lock is a safer and correct option and I'm going to add a change to reset zone_unusable while a block group is still alive. So, add locking around the operations.
Fixes: 169e0da91a21 ("btrfs: zoned: track unusable bytes for zones") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Naohiro Aota naohiro.aota@wdc.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/free-space-cache.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-)
--- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2677,15 +2677,16 @@ static int __btrfs_add_free_space_zoned( u64 offset = bytenr - block_group->start; u64 to_free, to_unusable; int bg_reclaim_threshold = 0; - bool initial = ((size == block_group->length) && (block_group->alloc_offset == 0)); + bool initial; u64 reclaimable_unusable;
- WARN_ON(!initial && offset + size > block_group->zone_capacity); + spin_lock(&block_group->lock);
+ initial = ((size == block_group->length) && (block_group->alloc_offset == 0)); + WARN_ON(!initial && offset + size > block_group->zone_capacity); if (!initial) bg_reclaim_threshold = READ_ONCE(sinfo->bg_reclaim_threshold);
- spin_lock(&ctl->tree_lock); if (!used) to_free = size; else if (initial) @@ -2698,7 +2699,9 @@ static int __btrfs_add_free_space_zoned( to_free = offset + size - block_group->alloc_offset; to_unusable = size - to_free;
+ spin_lock(&ctl->tree_lock); ctl->free_space += to_free; + spin_unlock(&ctl->tree_lock); /* * If the block group is read-only, we should account freed space into * bytes_readonly. @@ -2707,11 +2710,8 @@ static int __btrfs_add_free_space_zoned( block_group->zone_unusable += to_unusable; WARN_ON(block_group->zone_unusable > block_group->length); } - spin_unlock(&ctl->tree_lock); if (!used) { - spin_lock(&block_group->lock); block_group->alloc_offset -= size; - spin_unlock(&block_group->lock); }
reclaimable_unusable = block_group->zone_unusable - @@ -2726,6 +2726,8 @@ static int __btrfs_add_free_space_zoned( btrfs_mark_bg_to_reclaim(block_group); }
+ spin_unlock(&block_group->lock); + return 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
commit 008e2512dc5696ab2dc5bf264e98a9fe9ceb830e upstream.
[REPORT] There is a corruption report that btrfs refused to mount a fs that has overlapping dev extents:
BTRFS error (device sdc): dev extent devid 4 physical offset 14263979671552 overlap with previous dev extent end 14263980982272 BTRFS error (device sdc): failed to verify dev extents against chunks: -117 BTRFS error (device sdc): open_ctree failed
[CAUSE] The direct cause is very obvious, there is a bad dev extent item with incorrect length.
With btrfs check reporting two overlapping extents, the second one shows some clue on the cause:
ERROR: dev extent devid 4 offset 14263979671552 len 6488064 overlap with previous dev extent end 14263980982272 ERROR: dev extent devid 13 offset 2257707008000 len 6488064 overlap with previous dev extent end 2257707270144 ERROR: errors found in extent allocation tree or chunk allocation
The second one looks like a bitflip happened during new chunk allocation: hex(2257707008000) = 0x20da9d30000 hex(2257707270144) = 0x20da9d70000 diff = 0x00000040000
So it looks like a bitflip happened during new dev extent allocation, resulting the second overlap.
Currently we only do the dev-extent verification at mount time, but if the corruption is caused by memory bitflip, we really want to catch it before writing the corruption to the storage.
Furthermore the dev extent items has the following key definition:
(<device id> DEV_EXTENT <physical offset>)
Thus we can not just rely on the generic key order check to make sure there is no overlapping.
[ENHANCEMENT] Introduce dedicated dev extent checks, including:
- Fixed member checks * chunk_tree should always be BTRFS_CHUNK_TREE_OBJECTID (3) * chunk_objectid should always be BTRFS_FIRST_CHUNK_CHUNK_TREE_OBJECTID (256)
- Alignment checks * chunk_offset should be aligned to sectorsize * length should be aligned to sectorsize * key.offset should be aligned to sectorsize
- Overlap checks If the previous key is also a dev-extent item, with the same device id, make sure we do not overlap with the previous dev extent.
Reported: Stefan N stefannnau@gmail.com Link: https://lore.kernel.org/linux-btrfs/CA+W5K0rSO3koYTo=nzxxTm1-Pdu1HYgVxEpgJ=a... CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Anand Jain anand.jain@oracle.com Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/tree-checker.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 69 insertions(+)
--- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -1613,6 +1613,72 @@ static int check_inode_ref(struct extent return 0; }
+static int check_dev_extent_item(const struct extent_buffer *leaf, + const struct btrfs_key *key, + int slot, + struct btrfs_key *prev_key) +{ + struct btrfs_dev_extent *de; + const u32 sectorsize = leaf->fs_info->sectorsize; + + de = btrfs_item_ptr(leaf, slot, struct btrfs_dev_extent); + /* Basic fixed member checks. */ + if (unlikely(btrfs_dev_extent_chunk_tree(leaf, de) != + BTRFS_CHUNK_TREE_OBJECTID)) { + generic_err(leaf, slot, + "invalid dev extent chunk tree id, has %llu expect %llu", + btrfs_dev_extent_chunk_tree(leaf, de), + BTRFS_CHUNK_TREE_OBJECTID); + return -EUCLEAN; + } + if (unlikely(btrfs_dev_extent_chunk_objectid(leaf, de) != + BTRFS_FIRST_CHUNK_TREE_OBJECTID)) { + generic_err(leaf, slot, + "invalid dev extent chunk objectid, has %llu expect %llu", + btrfs_dev_extent_chunk_objectid(leaf, de), + BTRFS_FIRST_CHUNK_TREE_OBJECTID); + return -EUCLEAN; + } + /* Alignment check. */ + if (unlikely(!IS_ALIGNED(key->offset, sectorsize))) { + generic_err(leaf, slot, + "invalid dev extent key.offset, has %llu not aligned to %u", + key->offset, sectorsize); + return -EUCLEAN; + } + if (unlikely(!IS_ALIGNED(btrfs_dev_extent_chunk_offset(leaf, de), + sectorsize))) { + generic_err(leaf, slot, + "invalid dev extent chunk offset, has %llu not aligned to %u", + btrfs_dev_extent_chunk_objectid(leaf, de), + sectorsize); + return -EUCLEAN; + } + if (unlikely(!IS_ALIGNED(btrfs_dev_extent_length(leaf, de), + sectorsize))) { + generic_err(leaf, slot, + "invalid dev extent length, has %llu not aligned to %u", + btrfs_dev_extent_length(leaf, de), sectorsize); + return -EUCLEAN; + } + /* Overlap check with previous dev extent. */ + if (slot && prev_key->objectid == key->objectid && + prev_key->type == key->type) { + struct btrfs_dev_extent *prev_de; + u64 prev_len; + + prev_de = btrfs_item_ptr(leaf, slot - 1, struct btrfs_dev_extent); + prev_len = btrfs_dev_extent_length(leaf, prev_de); + if (unlikely(prev_key->offset + prev_len > key->offset)) { + generic_err(leaf, slot, + "dev extent overlap, prev offset %llu len %llu current offset %llu", + prev_key->objectid, prev_len, key->offset); + return -EUCLEAN; + } + } + return 0; +} + /* * Common point to switch the item-specific validation. */ @@ -1648,6 +1714,9 @@ static int check_leaf_item(struct extent case BTRFS_DEV_ITEM_KEY: ret = check_dev_item(leaf, key, slot); break; + case BTRFS_DEV_EXTENT_KEY: + ret = check_dev_extent_item(leaf, key, slot, prev_key); + break; case BTRFS_INODE_ITEM_KEY: ret = check_inode_item(leaf, key, slot); break;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl
commit 0573a1e2ea7e35bff08944a40f1adf2bb35cea61 upstream.
Missing validation ...
Checked libdrm and it clears all the structs, so we should be safe to just check everything.
Signed-off-by: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit c6b86421f1f9ddf9d706f2453159813ee39d0cf9) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c @@ -656,16 +656,24 @@ int amdgpu_ctx_ioctl(struct drm_device *
switch (args->in.op) { case AMDGPU_CTX_OP_ALLOC_CTX: + if (args->in.flags) + return -EINVAL; r = amdgpu_ctx_alloc(adev, fpriv, filp, priority, &id); args->out.alloc.ctx_id = id; break; case AMDGPU_CTX_OP_FREE_CTX: + if (args->in.flags) + return -EINVAL; r = amdgpu_ctx_free(fpriv, id); break; case AMDGPU_CTX_OP_QUERY_STATE: + if (args->in.flags) + return -EINVAL; r = amdgpu_ctx_query(adev, fpriv, id, &args->out); break; case AMDGPU_CTX_OP_QUERY_STATE2: + if (args->in.flags) + return -EINVAL; r = amdgpu_ctx_query2(adev, fpriv, id, &args->out); break; case AMDGPU_CTX_OP_GET_STABLE_PSTATE:
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Al Viro viro@zeniv.linux.org.uk
commit 046667c4d3196938e992fba0dfcde570aa85cd0e upstream.
we are *not* guaranteed that anything past the terminating NUL is mapped (let alone initialized with anything sane).
Fixes: 0dea116876ee ("cgroup: implement eventfd-based generic API for notifications") Cc: stable@vger.kernel.org Cc: Andrew Morton akpm@linux-foundation.org Acked-by: Michal Hocko mhocko@suse.com Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memcontrol.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4857,9 +4857,12 @@ static ssize_t memcg_write_event_control buf = endp + 1;
cfd = simple_strtoul(buf, &endp, 10); - if ((*endp != ' ') && (*endp != '\0')) + if (*endp == '\0') + buf = endp; + else if (*endp == ' ') + buf = endp + 1; + else return -EINVAL; - buf = endp + 1;
event = kzalloc(sizeof(*event), GFP_KERNEL); if (!event)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
commit e414a304f2c5368a84f03ad34d29b89f965a33c9 upstream.
This needs to be set as well if the IB uses atomics.
Reviewed-by: Leo Liu leo.liu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 35c628774e50b3784c59e8ca7973f03bcb067132) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c @@ -541,11 +541,11 @@ void jpeg_v2_0_dec_ring_emit_ib(struct a
amdgpu_ring_write(ring, PACKETJ(mmUVD_LMI_JRBC_IB_VMID_INTERNAL_OFFSET, 0, 0, PACKETJ_TYPE0)); - amdgpu_ring_write(ring, (vmid | (vmid << 4))); + amdgpu_ring_write(ring, (vmid | (vmid << 4) | (vmid << 8)));
amdgpu_ring_write(ring, PACKETJ(mmUVD_LMI_JPEG_VMID_INTERNAL_OFFSET, 0, 0, PACKETJ_TYPE0)); - amdgpu_ring_write(ring, (vmid | (vmid << 4))); + amdgpu_ring_write(ring, (vmid | (vmid << 4) | (vmid << 8)));
amdgpu_ring_write(ring, PACKETJ(mmUVD_LMI_JRBC_IB_64BIT_BAR_LOW_INTERNAL_OFFSET, 0, 0, PACKETJ_TYPE0));
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Claudio Imbrenda imbrenda@linux.ibm.com
[ Upstream commit cff59d8631e1409ffdd22d9d717e15810181b32c ]
The return value uv_set_shared() and uv_remove_shared() (which are wrappers around the share() function) is not always checked. The system integrity of a protected guest depends on the Share and Unshare UVCs being successful. This means that any caller that fails to check the return value will compromise the security of the protected guest.
No code path that would lead to such violation of the security guarantees is currently exercised, since all the areas that are shared never get unshared during the lifetime of the system. This might change and become an issue in the future.
The Share and Unshare UVCs can only fail in case of hypervisor misbehaviour (either a bug or malicious behaviour). In such cases there is no reasonable way forward, and the system needs to panic.
This patch replaces the return at the end of the share() function with a panic, to guarantee system integrity.
Fixes: 5abb9351dfd9 ("s390/uv: introduce guest side ultravisor code") Signed-off-by: Claudio Imbrenda imbrenda@linux.ibm.com Reviewed-by: Christian Borntraeger borntraeger@linux.ibm.com Reviewed-by: Steffen Eiden seiden@linux.ibm.com Reviewed-by: Janosch Frank frankja@linux.ibm.com Link: https://lore.kernel.org/r/20240801112548.85303-1-imbrenda@linux.ibm.com Message-ID: 20240801112548.85303-1-imbrenda@linux.ibm.com [frankja@linux.ibm.com: Fixed up patch subject] Signed-off-by: Janosch Frank frankja@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/include/asm/uv.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h index be3ef9dd69726..6abcb46a8dfe2 100644 --- a/arch/s390/include/asm/uv.h +++ b/arch/s390/include/asm/uv.h @@ -387,7 +387,10 @@ static inline int share(unsigned long addr, u16 cmd)
if (!uv_call(0, (u64)&uvcb)) return 0; - return -EINVAL; + pr_err("%s UVC failed (rc: 0x%x, rrc: 0x%x), possible hypervisor bug.\n", + uvcb.header.cmd == UVC_CMD_SET_SHARED_ACCESS ? "Share" : "Unshare", + uvcb.header.rc, uvcb.header.rrc); + panic("System security cannot be guaranteed unless the system panics now.\n"); }
/*
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Leon Hwang leon.hwang@linux.dev
[ Upstream commit fdad456cbcca739bae1849549c7a999857c56f88 ]
The commit f7866c358733 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") fixed a NULL pointer dereference panic, but didn't fix the issue that fails to update attached freplace prog to prog_array map.
Since commit 1c123c567fb1 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog and its target prog are able to tail call each other.
And the commit 3aac1ead5eb6 ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach") sets prog->aux->dst_prog as NULL after attaching freplace prog to its target prog.
After loading freplace the prog_array's owner type is BPF_PROG_TYPE_SCHED_CLS. Then, after attaching freplace its prog->aux->dst_prog is NULL. Then, while updating freplace in prog_array the bpf_prog_map_compatible() incorrectly returns false because resolve_prog_type() returns BPF_PROG_TYPE_EXT instead of BPF_PROG_TYPE_SCHED_CLS. After this patch the resolve_prog_type() returns BPF_PROG_TYPE_SCHED_CLS and update to prog_array can succeed.
Fixes: f7866c358733 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") Cc: Toke Høiland-Jørgensen toke@redhat.com Cc: Martin KaFai Lau martin.lau@kernel.org Acked-by: Yonghong Song yonghong.song@linux.dev Signed-off-by: Leon Hwang leon.hwang@linux.dev Link: https://lore.kernel.org/r/20240728114612.48486-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/bpf_verifier.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 6a524c5462a6f..131adc98080b8 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -645,8 +645,8 @@ static inline u32 type_flag(u32 type) /* only use after check_attach_btf_id() */ static inline enum bpf_prog_type resolve_prog_type(const struct bpf_prog *prog) { - return (prog->type == BPF_PROG_TYPE_EXT && prog->aux->dst_prog) ? - prog->aux->dst_prog->type : prog->type; + return (prog->type == BPF_PROG_TYPE_EXT && prog->aux->saved_dst_prog_type) ? + prog->aux->saved_dst_prog_type : prog->type; }
static inline bool bpf_prog_check_recur(const struct bpf_prog *prog)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryusuke Konishi konishi.ryusuke@gmail.com
[ Upstream commit 602ce7b8e1343b19c0cf93a3dd1926838ac5a1cc ]
If nilfs2 reads a corrupted disk image and its DAT metadata file contains invalid lifetime data for a virtual block number, a kernel warning can be generated by the WARN_ON check in nilfs_dat_commit_end() and can panic if the kernel is booted with panic_on_warn.
This patch avoids the issue with a sanity check that treats it as an error.
Since error return is not allowed in the execution phase of nilfs_dat_commit_end(), this inserts that sanity check in nilfs_dat_prepare_end(), which prepares for nilfs_dat_commit_end().
As the error code, -EINVAL is returned to notify bmap layer of the metadata corruption. When the bmap layer sees this code, it handles the abnormal situation and replaces the return code with -EIO as it should.
Link: https://lkml.kernel.org/r/000000000000154d2c05e9ec7df6@google.com Link: https://lkml.kernel.org/r/20230127132202.6083-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Reported-by: syzbot+cbff7a52b6f99059e67f@syzkaller.appspotmail.com Tested-by: Ryusuke Konishi konishi.ryusuke@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nilfs2/dat.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c index 242cc36bf1e97..351010828d883 100644 --- a/fs/nilfs2/dat.c +++ b/fs/nilfs2/dat.c @@ -158,6 +158,7 @@ void nilfs_dat_commit_start(struct inode *dat, struct nilfs_palloc_req *req, int nilfs_dat_prepare_end(struct inode *dat, struct nilfs_palloc_req *req) { struct nilfs_dat_entry *entry; + __u64 start; sector_t blocknr; void *kaddr; int ret; @@ -169,6 +170,7 @@ int nilfs_dat_prepare_end(struct inode *dat, struct nilfs_palloc_req *req) kaddr = kmap_atomic(req->pr_entry_bh->b_page); entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr, req->pr_entry_bh, kaddr); + start = le64_to_cpu(entry->de_start); blocknr = le64_to_cpu(entry->de_blocknr); kunmap_atomic(kaddr);
@@ -179,6 +181,15 @@ int nilfs_dat_prepare_end(struct inode *dat, struct nilfs_palloc_req *req) return ret; } } + if (unlikely(start > nilfs_mdt_cno(dat))) { + nilfs_err(dat->i_sb, + "vblocknr = %llu has abnormal lifetime: start cno (= %llu) > current cno (= %llu)", + (unsigned long long)req->pr_entry_nr, + (unsigned long long)start, + (unsigned long long)nilfs_mdt_cno(dat)); + nilfs_dat_abort_entry(dat, req); + return -EINVAL; + }
return 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o tytso@mit.edu
[ Upstream commit 62913ae96de747091c4dacd06d158e7729c1a76d ]
The generic bmap() function exported by the VFS takes locks and does checks that are not necessary for the journal inode. So allow the file system to set a journal-optimized bmap function in journal->j_bmap.
Reported-by: syzbot+9543479984ae9e576000@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=e4aaa78795e490421c79f76ec3679006c8ff4cf... Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/super.c | 23 +++++++++++++++++++++++ fs/jbd2/journal.c | 9 ++++++--- include/linux/jbd2.h | 8 ++++++++ 3 files changed, 37 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 274542d869d0c..3db39758486e9 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5752,6 +5752,28 @@ static struct inode *ext4_get_journal_inode(struct super_block *sb, return journal_inode; }
+static int ext4_journal_bmap(journal_t *journal, sector_t *block) +{ + struct ext4_map_blocks map; + int ret; + + if (journal->j_inode == NULL) + return 0; + + map.m_lblk = *block; + map.m_len = 1; + ret = ext4_map_blocks(NULL, journal->j_inode, &map, 0); + if (ret <= 0) { + ext4_msg(journal->j_inode->i_sb, KERN_CRIT, + "journal bmap failed: block %llu ret %d\n", + *block, ret); + jbd2_journal_abort(journal, ret ? ret : -EIO); + return ret; + } + *block = map.m_pblk; + return 0; +} + static journal_t *ext4_get_journal(struct super_block *sb, unsigned int journal_inum) { @@ -5772,6 +5794,7 @@ static journal_t *ext4_get_journal(struct super_block *sb, return NULL; } journal->j_private = sb; + journal->j_bmap = ext4_journal_bmap; ext4_init_journal_params(sb, journal); return journal; } diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index c8d59f7c47453..d3d3ea439d29b 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -971,10 +971,13 @@ int jbd2_journal_bmap(journal_t *journal, unsigned long blocknr, { int err = 0; unsigned long long ret; - sector_t block = 0; + sector_t block = blocknr;
- if (journal->j_inode) { - block = blocknr; + if (journal->j_bmap) { + err = journal->j_bmap(journal, &block); + if (err == 0) + *retp = block; + } else if (journal->j_inode) { ret = bmap(journal->j_inode, &block);
if (ret || !block) { diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index e301d323108d1..5bf7ada754d79 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -1302,6 +1302,14 @@ struct journal_s struct buffer_head *bh, enum passtype pass, int off, tid_t expected_commit_id); + + /** + * @j_bmap: + * + * Bmap function that should be used instead of the generic + * VFS bmap function. + */ + int (*j_bmap)(struct journal_s *journal, sector_t *block); };
#define jbd2_might_wait_for_commit(j) \
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Orlov ivan.orlov0322@gmail.com
[ Upstream commit 707823e7f22f3864ddc7d85e8e9b614afe4f1b16 ]
KASAN reported the following issue: [ 36.825817][ T5923] BUG: KASAN: wild-memory-access in v9fs_get_acl+0x1a4/0x390 [ 36.827479][ T5923] Write of size 4 at addr 9fffeb37f97f1c00 by task syz-executor798/5923 [ 36.829303][ T5923] [ 36.829846][ T5923] CPU: 0 PID: 5923 Comm: syz-executor798 Not tainted 6.2.0-syzkaller-18302-g596b6b709632 #0 [ 36.832110][ T5923] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023 [ 36.834464][ T5923] Call trace: [ 36.835196][ T5923] dump_backtrace+0x1c8/0x1f4 [ 36.836229][ T5923] show_stack+0x2c/0x3c [ 36.837100][ T5923] dump_stack_lvl+0xd0/0x124 [ 36.838103][ T5923] print_report+0xe4/0x4c0 [ 36.839068][ T5923] kasan_report+0xd4/0x130 [ 36.840052][ T5923] kasan_check_range+0x264/0x2a4 [ 36.841199][ T5923] __kasan_check_write+0x2c/0x3c [ 36.842216][ T5923] v9fs_get_acl+0x1a4/0x390 [ 36.843232][ T5923] v9fs_mount+0x77c/0xa5c [ 36.844163][ T5923] legacy_get_tree+0xd4/0x16c [ 36.845173][ T5923] vfs_get_tree+0x90/0x274 [ 36.846137][ T5923] do_new_mount+0x25c/0x8c8 [ 36.847066][ T5923] path_mount+0x590/0xe58 [ 36.848147][ T5923] __arm64_sys_mount+0x45c/0x594 [ 36.849273][ T5923] invoke_syscall+0x98/0x2c0 [ 36.850421][ T5923] el0_svc_common+0x138/0x258 [ 36.851397][ T5923] do_el0_svc+0x64/0x198 [ 36.852398][ T5923] el0_svc+0x58/0x168 [ 36.853224][ T5923] el0t_64_sync_handler+0x84/0xf0 [ 36.854293][ T5923] el0t_64_sync+0x190/0x194
Calling '__v9fs_get_acl' method in 'v9fs_get_acl' creates the following chain of function calls:
__v9fs_get_acl v9fs_fid_get_acl v9fs_fid_xattr_get p9_client_xattrwalk
Function p9_client_xattrwalk accepts a pointer to u64-typed variable attr_size and puts some u64 value into it. However, after the executing the p9_client_xattrwalk, in some circumstances we assign the value of u64-typed variable 'attr_size' to the variable 'retval', which we will return. However, the type of 'retval' is ssize_t, and if the value of attr_size is larger than SSIZE_MAX, we will face the signed type overflow. If the overflow occurs, the result of v9fs_fid_xattr_get may be negative, but not classified as an error. When we try to allocate an acl with 'broken' size we receive an error, but don't process it. When we try to free this acl, we face the 'wild-memory-access' error (because it wasn't allocated).
This patch will add new condition to the 'v9fs_fid_xattr_get' function, so it will return an EOVERFLOW error if the 'attr_size' is larger than SSIZE_MAX.
In this version of the patch I simplified the condition.
In previous (v2) version of the patch I removed explicit type conversion and added separate condition to check the possible overflow and return an error (in v1 version I've just modified the existing condition).
Tested via syzkaller.
Suggested-by: Christian Schoenebeck linux_oss@crudebyte.com Reported-by: syzbot+cb1d16facb3cc90de5fb@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=fbbef66d9e4d096242f3617de5d14d12705b465... Signed-off-by: Ivan Orlov ivan.orlov0322@gmail.com Reviewed-by: Christian Schoenebeck linux_oss@crudebyte.com Signed-off-by: Eric Van Hensbergen ericvh@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/9p/xattr.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/9p/xattr.c b/fs/9p/xattr.c index 3b9aa61de8c2d..2aac0e8c4835e 100644 --- a/fs/9p/xattr.c +++ b/fs/9p/xattr.c @@ -34,10 +34,12 @@ ssize_t v9fs_fid_xattr_get(struct p9_fid *fid, const char *name, return retval; } if (attr_size > buffer_size) { - if (!buffer_size) /* request to get the attr_size */ - retval = attr_size; - else + if (buffer_size) retval = -ERANGE; + else if (attr_size > SSIZE_MAX) + retval = -EOVERFLOW; + else /* request to get the attr_size */ + retval = attr_size; } else { iov_iter_truncate(&to, attr_size); retval = p9_client_read(attr_fid, 0, &to, &err);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
[ Upstream commit 7397031622e05ca206e2d674ec199d6bb66fc9ba ]
nilfs_btree_assign_p() and nilfs_direct_assign_p() are not initializing "struct nilfs_binfo_dat"->bi_pad field, causing uninit-value reports when being passed to CRC function.
Link: https://lkml.kernel.org/r/20230326152146.15872-1-konishi.ryusuke@gmail.com Reported-by: syzbot syzbot+048585f3f4227bb2b49b@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=048585f3f4227bb2b49b Reported-by: Dipanjan Das mail.dipanjan.das@gmail.com Link: https://lkml.kernel.org/r/CANX2M5bVbzRi6zH3PTcNE_31TzerstOXUa9Bay4E6y6dX23_p... Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Cc: Alexander Potapenko glider@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nilfs2/btree.c | 1 + fs/nilfs2/direct.c | 1 + 2 files changed, 2 insertions(+)
diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c index bd24a33fc72e1..42617080a8384 100644 --- a/fs/nilfs2/btree.c +++ b/fs/nilfs2/btree.c @@ -2224,6 +2224,7 @@ static int nilfs_btree_assign_p(struct nilfs_bmap *btree, /* on-disk format */ binfo->bi_dat.bi_blkoff = cpu_to_le64(key); binfo->bi_dat.bi_level = level; + memset(binfo->bi_dat.bi_pad, 0, sizeof(binfo->bi_dat.bi_pad));
return 0; } diff --git a/fs/nilfs2/direct.c b/fs/nilfs2/direct.c index 8f802f7b0840b..893ab36824cc2 100644 --- a/fs/nilfs2/direct.c +++ b/fs/nilfs2/direct.c @@ -319,6 +319,7 @@ static int nilfs_direct_assign_p(struct nilfs_bmap *direct,
binfo->bi_dat.bi_blkoff = cpu_to_le64(key); binfo->bi_dat.bi_level = 0; + memset(binfo->bi_dat.bi_pad, 0, sizeof(binfo->bi_dat.bi_pad));
return 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Orlov ivan.orlov0322@gmail.com
[ Upstream commit 2ce0bdfebc74f6cbd4e97a4e767d505a81c38cf2 ]
Syzkaller reported the following issue:
kernel BUG at mm/khugepaged.c:1823! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 5097 Comm: syz-executor220 Not tainted 6.2.0-syzkaller-13154-g857f1268a591 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/16/2023 RIP: 0010:collapse_file mm/khugepaged.c:1823 [inline] RIP: 0010:hpage_collapse_scan_file+0x67c8/0x7580 mm/khugepaged.c:2233 Code: 00 00 89 de e8 c9 66 a3 ff 31 ff 89 de e8 c0 66 a3 ff 45 84 f6 0f 85 28 0d 00 00 e8 22 64 a3 ff e9 dc f7 ff ff e8 18 64 a3 ff <0f> 0b f3 0f 1e fa e8 0d 64 a3 ff e9 93 f6 ff ff f3 0f 1e fa 4c 89 RSP: 0018:ffffc90003dff4e0 EFLAGS: 00010093 RAX: ffffffff81e95988 RBX: 00000000000001c1 RCX: ffff8880205b3a80 RDX: 0000000000000000 RSI: 00000000000001c0 RDI: 00000000000001c1 RBP: ffffc90003dff830 R08: ffffffff81e90e67 R09: fffffbfff1a433c3 R10: 0000000000000000 R11: dffffc0000000001 R12: 0000000000000000 R13: ffffc90003dff6c0 R14: 00000000000001c0 R15: 0000000000000000 FS: 00007fdbae5ee700(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdbae6901e0 CR3: 000000007b2dd000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> madvise_collapse+0x721/0xf50 mm/khugepaged.c:2693 madvise_vma_behavior mm/madvise.c:1086 [inline] madvise_walk_vmas mm/madvise.c:1260 [inline] do_madvise+0x9e5/0x4680 mm/madvise.c:1439 __do_sys_madvise mm/madvise.c:1452 [inline] __se_sys_madvise mm/madvise.c:1450 [inline] __x64_sys_madvise+0xa5/0xb0 mm/madvise.c:1450 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
The xas_store() call during page cache scanning can potentially translate 'xas' into the error state (with the reproducer provided by the syzkaller the error code is -ENOMEM). However, there are no further checks after the 'xas_store', and the next call of 'xas_next' at the start of the scanning cycle doesn't increase the xa_index, and the issue occurs.
This patch will add the xarray state error checking after the xas_store() and the corresponding result error code.
Tested via syzbot.
[akpm@linux-foundation.org: update include/trace/events/huge_memory.h's SCAN_STATUS] Link: https://lkml.kernel.org/r/20230329145330.23191-1-ivan.orlov0322@gmail.com Link: https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959f... Signed-off-by: Ivan Orlov ivan.orlov0322@gmail.com Reported-by: syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com Tested-by: Zach O'Keefe zokeefe@google.com Cc: Yang Shi shy828301@gmail.com Cc: Himadri Pandya himadrispandya@gmail.com Cc: Ivan Orlov ivan.orlov0322@gmail.com Cc: Shuah Khan skhan@linuxfoundation.org Cc: Song Liu songliubraving@fb.com Cc: Rik van Riel riel@surriel.com Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Johannes Weiner hannes@cmpxchg.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/trace/events/huge_memory.h | 3 ++- mm/khugepaged.c | 20 ++++++++++++++++++++ 2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h index 760455dfa8600..01591e7995235 100644 --- a/include/trace/events/huge_memory.h +++ b/include/trace/events/huge_memory.h @@ -36,7 +36,8 @@ EM( SCAN_ALLOC_HUGE_PAGE_FAIL, "alloc_huge_page_failed") \ EM( SCAN_CGROUP_CHARGE_FAIL, "ccgroup_charge_failed") \ EM( SCAN_TRUNCATED, "truncated") \ - EMe(SCAN_PAGE_HAS_PRIVATE, "page_has_private") \ + EM( SCAN_PAGE_HAS_PRIVATE, "page_has_private") \ + EMe(SCAN_STORE_FAILED, "store_failed")
#undef EM #undef EMe diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 65bd0b105266a..085fca1fa27af 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -55,6 +55,7 @@ enum scan_result { SCAN_CGROUP_CHARGE_FAIL, SCAN_TRUNCATED, SCAN_PAGE_HAS_PRIVATE, + SCAN_STORE_FAILED, };
#define CREATE_TRACE_POINTS @@ -1840,6 +1841,15 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, goto xa_locked; } xas_store(&xas, hpage); + if (xas_error(&xas)) { + /* revert shmem_charge performed + * in the previous condition + */ + mapping->nrpages--; + shmem_uncharge(mapping->host, 1); + result = SCAN_STORE_FAILED; + goto xa_locked; + } nr_none++; continue; } @@ -1991,6 +2001,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
/* Finally, replace with the new page. */ xas_store(&xas, hpage); + /* We can't get an ENOMEM here (because the allocation happened before) + * but let's check for errors (XArray implementation can be + * changed in the future) + */ + WARN_ON_ONCE(xas_error(&xas)); continue; out_unlock: unlock_page(page); @@ -2028,6 +2043,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, /* Join all the small entries into a single multi-index entry */ xas_set_order(&xas, start, HPAGE_PMD_ORDER); xas_store(&xas, hpage); + /* Here we can't get an ENOMEM (because entries were + * previously allocated) But let's check for errors + * (XArray implementation can be changed in the future) + */ + WARN_ON_ONCE(xas_error(&xas)); xa_locked: xas_unlock_irq(&xas); xa_unlocked:
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrii Nakryiko andrii@kernel.org
[ Upstream commit 4294a0a7ab6282c3d92f03de84e762dda993c93d ]
kernel/bpf/verifier.c file is large and growing larger all the time. So it's good to start splitting off more or less self-contained parts into separate files to keep source code size (somewhat) somewhat under control.
This patch is a one step in this direction, moving some of BPF verifier log routines into a separate kernel/bpf/log.c. Right now it's most low-level and isolated routines to append data to log, reset log to previous position, etc. Eventually we could probably move verifier state printing logic here as well, but this patch doesn't attempt to do that yet.
Subsequent patches will add more logic to verifier log management, so having basics in a separate file will make sure verifier.c doesn't grow more with new changes.
Signed-off-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Lorenz Bauer lmb@isovalent.com Link: https://lore.kernel.org/bpf/20230406234205.323208-2-andrii@kernel.org Stable-dep-of: cff36398bd4c ("bpf: drop unnecessary user-triggerable WARN_ONCE in verifierl log") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/bpf_verifier.h | 19 +++----- kernel/bpf/Makefile | 3 +- kernel/bpf/log.c | 85 ++++++++++++++++++++++++++++++++++++ kernel/bpf/verifier.c | 69 ----------------------------- 4 files changed, 94 insertions(+), 82 deletions(-) create mode 100644 kernel/bpf/log.c
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 131adc98080b8..33b073deb8c17 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -445,11 +445,6 @@ struct bpf_verifier_log { u32 len_total; };
-static inline bool bpf_verifier_log_full(const struct bpf_verifier_log *log) -{ - return log->len_used >= log->len_total - 1; -} - #define BPF_LOG_LEVEL1 1 #define BPF_LOG_LEVEL2 2 #define BPF_LOG_STATS 4 @@ -459,6 +454,11 @@ static inline bool bpf_verifier_log_full(const struct bpf_verifier_log *log) #define BPF_LOG_MIN_ALIGNMENT 8U #define BPF_LOG_ALIGNMENT 40U
+static inline bool bpf_verifier_log_full(const struct bpf_verifier_log *log) +{ + return log->len_used >= log->len_total - 1; +} + static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log) { return log && @@ -466,13 +466,6 @@ static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log) log->level == BPF_LOG_KERNEL); }
-static inline bool -bpf_verifier_log_attr_valid(const struct bpf_verifier_log *log) -{ - return log->len_total >= 128 && log->len_total <= UINT_MAX >> 2 && - log->level && log->ubuf && !(log->level & ~BPF_LOG_MASK); -} - #define BPF_MAX_SUBPROGS 256
struct bpf_subprog_info { @@ -556,12 +549,14 @@ struct bpf_verifier_env { char type_str_buf[TYPE_STR_BUF_LEN]; };
+bool bpf_verifier_log_attr_valid(const struct bpf_verifier_log *log); __printf(2, 0) void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt, va_list args); __printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env, const char *fmt, ...); __printf(2, 3) void bpf_log(struct bpf_verifier_log *log, const char *fmt, ...); +void bpf_vlog_reset(struct bpf_verifier_log *log, u32 new_pos);
static inline struct bpf_func_state *cur_func(struct bpf_verifier_env *env) { diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index 341c94f208f4c..5b86ea9f09c46 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -6,7 +6,8 @@ cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse endif CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
-obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o +obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o +obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c new file mode 100644 index 0000000000000..920061e38d2e1 --- /dev/null +++ b/kernel/bpf/log.c @@ -0,0 +1,85 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com + * Copyright (c) 2016 Facebook + * Copyright (c) 2018 Covalent IO, Inc. http://covalent.io + */ +#include <uapi/linux/btf.h> +#include <linux/kernel.h> +#include <linux/types.h> +#include <linux/bpf.h> +#include <linux/bpf_verifier.h> + +bool bpf_verifier_log_attr_valid(const struct bpf_verifier_log *log) +{ + return log->len_total >= 128 && log->len_total <= UINT_MAX >> 2 && + log->level && log->ubuf && !(log->level & ~BPF_LOG_MASK); +} + +void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt, + va_list args) +{ + unsigned int n; + + n = vscnprintf(log->kbuf, BPF_VERIFIER_TMP_LOG_SIZE, fmt, args); + + WARN_ONCE(n >= BPF_VERIFIER_TMP_LOG_SIZE - 1, + "verifier log line truncated - local buffer too short\n"); + + if (log->level == BPF_LOG_KERNEL) { + bool newline = n > 0 && log->kbuf[n - 1] == '\n'; + + pr_err("BPF: %s%s", log->kbuf, newline ? "" : "\n"); + return; + } + + n = min(log->len_total - log->len_used - 1, n); + log->kbuf[n] = '\0'; + if (!copy_to_user(log->ubuf + log->len_used, log->kbuf, n + 1)) + log->len_used += n; + else + log->ubuf = NULL; +} + +void bpf_vlog_reset(struct bpf_verifier_log *log, u32 new_pos) +{ + char zero = 0; + + if (!bpf_verifier_log_needed(log)) + return; + + log->len_used = new_pos; + if (put_user(zero, log->ubuf + new_pos)) + log->ubuf = NULL; +} + +/* log_level controls verbosity level of eBPF verifier. + * bpf_verifier_log_write() is used to dump the verification trace to the log, + * so the user can figure out what's wrong with the program + */ +__printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env, + const char *fmt, ...) +{ + va_list args; + + if (!bpf_verifier_log_needed(&env->log)) + return; + + va_start(args, fmt); + bpf_verifier_vlog(&env->log, fmt, args); + va_end(args); +} +EXPORT_SYMBOL_GPL(bpf_verifier_log_write); + +__printf(2, 3) void bpf_log(struct bpf_verifier_log *log, + const char *fmt, ...) +{ + va_list args; + + if (!bpf_verifier_log_needed(log)) + return; + + va_start(args, fmt); + bpf_verifier_vlog(log, fmt, args); + va_end(args); +} +EXPORT_SYMBOL_GPL(bpf_log); diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 8973d3c9597ce..4efa50eb07d72 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -291,61 +291,6 @@ find_linfo(const struct bpf_verifier_env *env, u32 insn_off) return &linfo[i - 1]; }
-void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt, - va_list args) -{ - unsigned int n; - - n = vscnprintf(log->kbuf, BPF_VERIFIER_TMP_LOG_SIZE, fmt, args); - - WARN_ONCE(n >= BPF_VERIFIER_TMP_LOG_SIZE - 1, - "verifier log line truncated - local buffer too short\n"); - - if (log->level == BPF_LOG_KERNEL) { - bool newline = n > 0 && log->kbuf[n - 1] == '\n'; - - pr_err("BPF: %s%s", log->kbuf, newline ? "" : "\n"); - return; - } - - n = min(log->len_total - log->len_used - 1, n); - log->kbuf[n] = '\0'; - if (!copy_to_user(log->ubuf + log->len_used, log->kbuf, n + 1)) - log->len_used += n; - else - log->ubuf = NULL; -} - -static void bpf_vlog_reset(struct bpf_verifier_log *log, u32 new_pos) -{ - char zero = 0; - - if (!bpf_verifier_log_needed(log)) - return; - - log->len_used = new_pos; - if (put_user(zero, log->ubuf + new_pos)) - log->ubuf = NULL; -} - -/* log_level controls verbosity level of eBPF verifier. - * bpf_verifier_log_write() is used to dump the verification trace to the log, - * so the user can figure out what's wrong with the program - */ -__printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env, - const char *fmt, ...) -{ - va_list args; - - if (!bpf_verifier_log_needed(&env->log)) - return; - - va_start(args, fmt); - bpf_verifier_vlog(&env->log, fmt, args); - va_end(args); -} -EXPORT_SYMBOL_GPL(bpf_verifier_log_write); - __printf(2, 3) static void verbose(void *private_data, const char *fmt, ...) { struct bpf_verifier_env *env = private_data; @@ -359,20 +304,6 @@ __printf(2, 3) static void verbose(void *private_data, const char *fmt, ...) va_end(args); }
-__printf(2, 3) void bpf_log(struct bpf_verifier_log *log, - const char *fmt, ...) -{ - va_list args; - - if (!bpf_verifier_log_needed(log)) - return; - - va_start(args, fmt); - bpf_verifier_vlog(log, fmt, args); - va_end(args); -} -EXPORT_SYMBOL_GPL(bpf_log); - static const char *ltrim(const char *s) { while (isspace(*s))
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrii Nakryiko andrii@kernel.org
[ Upstream commit cff36398bd4c7d322d424433db437f3c3391c491 ]
It's trivial for user to trigger "verifier log line truncated" warning, as verifier has a fixed-sized buffer of 1024 bytes (as of now), and there are at least two pieces of user-provided information that can be output through this buffer, and both can be arbitrarily sized by user: - BTF names; - BTF.ext source code lines strings.
Verifier log buffer should be properly sized for typical verifier state output. But it's sort-of expected that this buffer won't be long enough in some circumstances. So let's drop the check. In any case code will work correctly, at worst truncating a part of a single line output.
Reported-by: syzbot+8b2a08dfbd25fd933d75@syzkaller.appspotmail.com Signed-off-by: Andrii Nakryiko andrii@kernel.org Link: https://lore.kernel.org/r/20230516180409.3549088-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/log.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c index 920061e38d2e1..cd1b7113fbfd0 100644 --- a/kernel/bpf/log.c +++ b/kernel/bpf/log.c @@ -22,9 +22,6 @@ void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt,
n = vscnprintf(log->kbuf, BPF_VERIFIER_TMP_LOG_SIZE, fmt, args);
- WARN_ONCE(n >= BPF_VERIFIER_TMP_LOG_SIZE - 1, - "verifier log line truncated - local buffer too short\n"); - if (log->level == BPF_LOG_KERNEL) { bool newline = n > 0 && log->kbuf[n - 1] == '\n';
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
[ Upstream commit 8ce8849dd1e78dadcee0ec9acbd259d239b7069f ]
posix_timer_add() tries to allocate a posix timer ID by starting from the cached ID which was stored by the last successful allocation.
This is done in a loop searching the ID space for a free slot one by one. The loop has to terminate when the search wrapped around to the starting point.
But that's racy vs. establishing the starting point. That is read out lockless, which leads to the following problem:
CPU0 CPU1 posix_timer_add() start = sig->posix_timer_id; lock(hash_lock); ... posix_timer_add() if (++sig->posix_timer_id < 0) start = sig->posix_timer_id; sig->posix_timer_id = 0;
So CPU1 can observe a negative start value, i.e. -1, and the loop break never happens because the condition can never be true:
if (sig->posix_timer_id == start) break;
While this is unlikely to ever turn into an endless loop as the ID space is huge (INT_MAX), the racy read of the start value caught the attention of KCSAN and Dmitry unearthed that incorrectness.
Rewrite it so that all id operations are under the hash lock.
Reported-by: syzbot+5c54bd3eb218bb595aa9@syzkaller.appspotmail.com Reported-by: Dmitry Vyukov dvyukov@google.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Frederic Weisbecker frederic@kernel.org Link: https://lore.kernel.org/r/87bkhzdn6g.ffs@tglx Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/sched/signal.h | 2 +- kernel/time/posix-timers.c | 31 ++++++++++++++++++------------- 2 files changed, 19 insertions(+), 14 deletions(-)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 20099268fa257..669e8cff40c74 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -135,7 +135,7 @@ struct signal_struct { #ifdef CONFIG_POSIX_TIMERS
/* POSIX.1b Interval Timers */ - int posix_timer_id; + unsigned int next_posix_timer_id; struct list_head posix_timers;
/* ITIMER_REAL timer for the process */ diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c index ed3c4a9543982..2d6cf93ca370a 100644 --- a/kernel/time/posix-timers.c +++ b/kernel/time/posix-timers.c @@ -140,25 +140,30 @@ static struct k_itimer *posix_timer_by_id(timer_t id) static int posix_timer_add(struct k_itimer *timer) { struct signal_struct *sig = current->signal; - int first_free_id = sig->posix_timer_id; struct hlist_head *head; - int ret = -ENOENT; + unsigned int cnt, id;
- do { + /* + * FIXME: Replace this by a per signal struct xarray once there is + * a plan to handle the resulting CRIU regression gracefully. + */ + for (cnt = 0; cnt <= INT_MAX; cnt++) { spin_lock(&hash_lock); - head = &posix_timers_hashtable[hash(sig, sig->posix_timer_id)]; - if (!__posix_timers_find(head, sig, sig->posix_timer_id)) { + id = sig->next_posix_timer_id; + + /* Write the next ID back. Clamp it to the positive space */ + sig->next_posix_timer_id = (id + 1) & INT_MAX; + + head = &posix_timers_hashtable[hash(sig, id)]; + if (!__posix_timers_find(head, sig, id)) { hlist_add_head_rcu(&timer->t_hash, head); - ret = sig->posix_timer_id; + spin_unlock(&hash_lock); + return id; } - if (++sig->posix_timer_id < 0) - sig->posix_timer_id = 0; - if ((sig->posix_timer_id == first_free_id) && (ret == -ENOENT)) - /* Loop over all possible ids completed */ - ret = -EAGAIN; spin_unlock(&hash_lock); - } while (ret == -ENOENT); - return ret; + } + /* POSIX return code when no timer ID could be allocated */ + return -EAGAIN; }
static inline void unlock_timer(struct k_itimer *timr, unsigned long flags)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
[ Upstream commit b69f0aeb068980af983d399deafc7477cec8bc04 ]
For pid namespaces, struct pid uses a dynamically sized array member, "numbers". This was implemented using the ancient 1-element fake flexible array, which has been deprecated for decades.
Replace it with a C99 flexible array, refactor the array size calculations to use struct_size(), and address elements via indexes. Note that the static initializer (which defines a single element) works as-is, and requires no special handling.
Without this, CONFIG_UBSAN_BOUNDS (and potentially CONFIG_FORTIFY_SOURCE) will trigger bounds checks:
https://lore.kernel.org/lkml/20230517-bushaltestelle-super-e223978c1ba6@brau...
Cc: Christian Brauner brauner@kernel.org Cc: Jan Kara jack@suse.cz Cc: Jeff Xu jeffxu@google.com Cc: Andreas Gruenbacher agruenba@redhat.com Cc: Daniel Verkamp dverkamp@chromium.org Cc: "Paul E. McKenney" paulmck@kernel.org Cc: Jeff Xu jeffxu@google.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Boqun Feng boqun.feng@gmail.com Cc: Luis Chamberlain mcgrof@kernel.org Cc: Frederic Weisbecker frederic@kernel.org Reported-by: syzbot+ac3b41786a2d0565b6d5@syzkaller.appspotmail.com [brauner: dropped unrelated changes and remove 0 with NULL cast] Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/pid.h | 2 +- kernel/pid.c | 7 +++++-- kernel/pid_namespace.c | 2 +- 3 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/include/linux/pid.h b/include/linux/pid.h index 343abf22092e6..bf3af54de6165 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -67,7 +67,7 @@ struct pid /* wait queue for pidfd notifications */ wait_queue_head_t wait_pidfd; struct rcu_head rcu; - struct upid numbers[1]; + struct upid numbers[]; };
extern struct pid init_struct_pid; diff --git a/kernel/pid.c b/kernel/pid.c index 3fbc5e46b7217..74834c04a0818 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -661,8 +661,11 @@ void __init pid_idr_init(void)
idr_init(&init_pid_ns.idr);
- init_pid_ns.pid_cachep = KMEM_CACHE(pid, - SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT); + init_pid_ns.pid_cachep = kmem_cache_create("pid", + struct_size((struct pid *)NULL, numbers, 1), + __alignof__(struct pid), + SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT, + NULL); }
static struct file *__pidfd_fget(struct task_struct *task, int fd) diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index 1daadbefcee3a..a575fabf697eb 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -47,7 +47,7 @@ static struct kmem_cache *create_pid_cachep(unsigned int level) return kc;
snprintf(name, sizeof(name), "pid_%u", level + 1); - len = sizeof(struct pid) + level * sizeof(struct upid); + len = struct_size((struct pid *)NULL, numbers, level + 1); mutex_lock(&pid_caches_mutex); /* Name collision forces to do allocation under mutex. */ if (!*pkc)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Gruenbacher agruenba@redhat.com
[ Upstream commit af1abe11466f1a6cb6ba22ee0d815c21c3559947 ]
The transaction glock was repurposed to serve as the new freeze glock years ago. Don't refer to it as the transaction glock anymore.
Also, to be more precise, call it the "freeze glock" instead of the "freeze lock". Ditto for the journal glock.
Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Stable-dep-of: f66af88e3321 ("gfs2: Stop using gfs2_make_fs_ro for withdraw") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/gfs2/glock.c | 4 ++-- fs/gfs2/ops_fstype.c | 2 +- fs/gfs2/recovery.c | 8 ++++---- fs/gfs2/super.c | 2 +- fs/gfs2/util.c | 2 +- 5 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index 95353982e643a..be05c43b89a59 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -146,8 +146,8 @@ static void gfs2_glock_dealloc(struct rcu_head *rcu) * * We need to allow some glocks to be enqueued, dequeued, promoted, and demoted * when we're withdrawn. For example, to maintain metadata integrity, we should - * disallow the use of inode and rgrp glocks when withdrawn. Other glocks, like - * iopen or the transaction glocks may be safely used because none of their + * disallow the use of inode and rgrp glocks when withdrawn. Other glocks like + * the iopen or freeze glock may be safely used because none of their * metadata goes through the journal. So in general, we should disallow all * glocks that are journaled, and allow all the others. One exception is: * we need to allow our active journal to be promoted and demoted so others diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index c0cf1d2d0ef5b..c7f6208ad98c0 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -434,7 +434,7 @@ static int init_locking(struct gfs2_sbd *sdp, struct gfs2_holder *mount_gh, error = gfs2_glock_get(sdp, GFS2_FREEZE_LOCK, &gfs2_freeze_glops, CREATE, &sdp->sd_freeze_gl); if (error) { - fs_err(sdp, "can't create transaction glock: %d\n", error); + fs_err(sdp, "can't create freeze glock: %d\n", error); goto fail_rename; }
diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c index 2bb085a72e8ee..d8e522f389aa7 100644 --- a/fs/gfs2/recovery.c +++ b/fs/gfs2/recovery.c @@ -420,10 +420,10 @@ void gfs2_recover_func(struct work_struct *work) if (sdp->sd_args.ar_spectator) goto fail; if (jd->jd_jid != sdp->sd_lockstruct.ls_jid) { - fs_info(sdp, "jid=%u: Trying to acquire journal lock...\n", + fs_info(sdp, "jid=%u: Trying to acquire journal glock...\n", jd->jd_jid); jlocked = 1; - /* Acquire the journal lock so we can do recovery */ + /* Acquire the journal glock so we can do recovery */
error = gfs2_glock_nq_num(sdp, jd->jd_jid, &gfs2_journal_glops, LM_ST_EXCLUSIVE, @@ -465,10 +465,10 @@ void gfs2_recover_func(struct work_struct *work) ktime_ms_delta(t_jhd, t_jlck));
if (!(head.lh_flags & GFS2_LOG_HEAD_UNMOUNT)) { - fs_info(sdp, "jid=%u: Acquiring the transaction lock...\n", + fs_info(sdp, "jid=%u: Acquiring the freeze glock...\n", jd->jd_jid);
- /* Acquire a shared hold on the freeze lock */ + /* Acquire a shared hold on the freeze glock */
error = gfs2_freeze_lock(sdp, &thaw_gh, LM_FLAG_PRIORITY); if (error) diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index 6107cd680176c..c87fafbe710a6 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -463,7 +463,7 @@ static int gfs2_write_inode(struct inode *inode, struct writeback_control *wbc) * @flags: The type of dirty * * Unfortunately it can be called under any combination of inode - * glock and transaction lock, so we have to check carefully. + * glock and freeze glock, so we have to check carefully. * * At the moment this deals only with atime - it should be possible * to expand that role in future, once a review of the locking has diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c index 48c69aa60cd17..86d1415932a43 100644 --- a/fs/gfs2/util.c +++ b/fs/gfs2/util.c @@ -107,7 +107,7 @@ int gfs2_freeze_lock(struct gfs2_sbd *sdp, struct gfs2_holder *freeze_gh, error = gfs2_glock_nq_init(sdp->sd_freeze_gl, LM_ST_SHARED, flags, freeze_gh); if (error && error != GLR_TRYFAILED) - fs_err(sdp, "can't lock the freeze lock: %d\n", error); + fs_err(sdp, "can't lock the freeze glock: %d\n", error); return error; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Gruenbacher agruenba@redhat.com
[ Upstream commit 097cca525adf10f35c9dac037155564f1b1a688b ]
Rename gfs2_freeze to gfs2_freeze_super and gfs2_unfreeze to gfs2_thaw_super to match the names of the corresponding super operations.
Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Stable-dep-of: f66af88e3321 ("gfs2: Stop using gfs2_make_fs_ro for withdraw") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/gfs2/super.c | 12 ++++++------ fs/gfs2/util.c | 2 +- 2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index c87fafbe710a6..d7b3a982552cf 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -682,12 +682,12 @@ void gfs2_freeze_func(struct work_struct *work) }
/** - * gfs2_freeze - prevent further writes to the filesystem + * gfs2_freeze_super - prevent further writes to the filesystem * @sb: the VFS structure for the filesystem * */
-static int gfs2_freeze(struct super_block *sb) +static int gfs2_freeze_super(struct super_block *sb) { struct gfs2_sbd *sdp = sb->s_fs_info; int error; @@ -727,12 +727,12 @@ static int gfs2_freeze(struct super_block *sb) }
/** - * gfs2_unfreeze - reallow writes to the filesystem + * gfs2_thaw_super - reallow writes to the filesystem * @sb: the VFS structure for the filesystem * */
-static int gfs2_unfreeze(struct super_block *sb) +static int gfs2_thaw_super(struct super_block *sb) { struct gfs2_sbd *sdp = sb->s_fs_info;
@@ -1499,8 +1499,8 @@ const struct super_operations gfs2_super_ops = { .evict_inode = gfs2_evict_inode, .put_super = gfs2_put_super, .sync_fs = gfs2_sync_fs, - .freeze_super = gfs2_freeze, - .thaw_super = gfs2_unfreeze, + .freeze_super = gfs2_freeze_super, + .thaw_super = gfs2_thaw_super, .statfs = gfs2_statfs, .drop_inode = gfs2_drop_inode, .show_options = gfs2_show_options, diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c index 86d1415932a43..11cc59ac64fdc 100644 --- a/fs/gfs2/util.c +++ b/fs/gfs2/util.c @@ -188,7 +188,7 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp) sdp->sd_jinode_gh.gh_flags |= GL_NOCACHE; gfs2_glock_dq(&sdp->sd_jinode_gh); if (test_bit(SDF_FS_FROZEN, &sdp->sd_flags)) { - /* Make sure gfs2_unfreeze works if partially-frozen */ + /* Make sure gfs2_thaw_super works if partially-frozen */ flush_work(&sdp->sd_freeze_work); atomic_set(&sdp->sd_freeze_state, SFS_FROZEN); thaw_super(sdp->sd_vfs);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Gruenbacher agruenba@redhat.com
[ Upstream commit e392edd5d52a6742595ecaf8270c1af3e96b9a38 ]
Rename gfs2_freeze_lock to gfs2_freeze_lock_shared to make it a bit more obvious that this function establishes the "thawed" state of the freeze glock.
Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Stable-dep-of: f66af88e3321 ("gfs2: Stop using gfs2_make_fs_ro for withdraw") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/gfs2/ops_fstype.c | 4 ++-- fs/gfs2/recovery.c | 2 +- fs/gfs2/super.c | 2 +- fs/gfs2/util.c | 10 +++++----- fs/gfs2/util.h | 5 +++-- 5 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index c7f6208ad98c0..e427fb7fbe998 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -1266,7 +1266,7 @@ static int gfs2_fill_super(struct super_block *sb, struct fs_context *fc) } }
- error = gfs2_freeze_lock(sdp, &freeze_gh, 0); + error = gfs2_freeze_lock_shared(sdp, &freeze_gh, 0); if (error) goto fail_per_node;
@@ -1587,7 +1587,7 @@ static int gfs2_reconfigure(struct fs_context *fc) if ((sb->s_flags ^ fc->sb_flags) & SB_RDONLY) { struct gfs2_holder freeze_gh;
- error = gfs2_freeze_lock(sdp, &freeze_gh, 0); + error = gfs2_freeze_lock_shared(sdp, &freeze_gh, 0); if (error) return -EINVAL;
diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c index d8e522f389aa7..61ef07da40b22 100644 --- a/fs/gfs2/recovery.c +++ b/fs/gfs2/recovery.c @@ -470,7 +470,7 @@ void gfs2_recover_func(struct work_struct *work)
/* Acquire a shared hold on the freeze glock */
- error = gfs2_freeze_lock(sdp, &thaw_gh, LM_FLAG_PRIORITY); + error = gfs2_freeze_lock_shared(sdp, &thaw_gh, LM_FLAG_PRIORITY); if (error) goto fail_gunlock_ji;
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index d7b3a982552cf..cb05332e473bd 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -662,7 +662,7 @@ void gfs2_freeze_func(struct work_struct *work) struct super_block *sb = sdp->sd_vfs;
atomic_inc(&sb->s_active); - error = gfs2_freeze_lock(sdp, &freeze_gh, 0); + error = gfs2_freeze_lock_shared(sdp, &freeze_gh, 0); if (error) { gfs2_assert_withdraw(sdp, 0); } else { diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c index 11cc59ac64fdc..1195ea08f9ca4 100644 --- a/fs/gfs2/util.c +++ b/fs/gfs2/util.c @@ -93,13 +93,13 @@ int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd, }
/** - * gfs2_freeze_lock - hold the freeze glock + * gfs2_freeze_lock_shared - hold the freeze glock * @sdp: the superblock * @freeze_gh: pointer to the requested holder * @caller_flags: any additional flags needed by the caller */ -int gfs2_freeze_lock(struct gfs2_sbd *sdp, struct gfs2_holder *freeze_gh, - int caller_flags) +int gfs2_freeze_lock_shared(struct gfs2_sbd *sdp, struct gfs2_holder *freeze_gh, + int caller_flags) { int flags = LM_FLAG_NOEXP | GL_EXACT | caller_flags; int error; @@ -157,8 +157,8 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp) gfs2_holder_mark_uninitialized(&freeze_gh); if (sdp->sd_freeze_gl && !gfs2_glock_is_locked_by_me(sdp->sd_freeze_gl)) { - ret = gfs2_freeze_lock(sdp, &freeze_gh, - log_write_allowed ? 0 : LM_FLAG_TRY); + ret = gfs2_freeze_lock_shared(sdp, &freeze_gh, + log_write_allowed ? 0 : LM_FLAG_TRY); if (ret == GLR_TRYFAILED) ret = 0; } diff --git a/fs/gfs2/util.h b/fs/gfs2/util.h index 78ec190f4155b..3291e33e81e97 100644 --- a/fs/gfs2/util.h +++ b/fs/gfs2/util.h @@ -149,8 +149,9 @@ int gfs2_io_error_i(struct gfs2_sbd *sdp, const char *function,
extern int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd, bool verbose); -extern int gfs2_freeze_lock(struct gfs2_sbd *sdp, - struct gfs2_holder *freeze_gh, int caller_flags); +extern int gfs2_freeze_lock_shared(struct gfs2_sbd *sdp, + struct gfs2_holder *freeze_gh, + int caller_flags); extern void gfs2_freeze_unlock(struct gfs2_holder *freeze_gh);
#define gfs2_io_error(sdp) \
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Gruenbacher agruenba@redhat.com
[ Upstream commit cad1e15804a83afd9a5c1d95a428d60d1f9c0340 ]
Rename the SDF_FS_FROZEN flag to SDF_FREEZE_INITIATOR to indicate more clearly that the node that has this flag set is the initiator of the freeze.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com Stable-dep-of: f66af88e3321 ("gfs2: Stop using gfs2_make_fs_ro for withdraw") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/gfs2/incore.h | 2 +- fs/gfs2/super.c | 8 ++++---- fs/gfs2/sys.c | 2 +- fs/gfs2/util.c | 2 +- 4 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index d09d9892cd055..113aeb5877027 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -600,7 +600,7 @@ enum { SDF_RORECOVERY = 7, /* read only recovery */ SDF_SKIP_DLM_UNLOCK = 8, SDF_FORCE_AIL_FLUSH = 9, - SDF_FS_FROZEN = 10, + SDF_FREEZE_INITIATOR = 10, SDF_WITHDRAWING = 11, /* Will withdraw eventually */ SDF_WITHDRAW_IN_PROG = 12, /* Withdraw is in progress */ SDF_REMOTE_WITHDRAW = 13, /* Performing remote recovery */ diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index cb05332e473bd..cdfbfda046945 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -676,8 +676,8 @@ void gfs2_freeze_func(struct work_struct *work) gfs2_freeze_unlock(&freeze_gh); } deactivate_super(sb); - clear_bit_unlock(SDF_FS_FROZEN, &sdp->sd_flags); - wake_up_bit(&sdp->sd_flags, SDF_FS_FROZEN); + clear_bit_unlock(SDF_FREEZE_INITIATOR, &sdp->sd_flags); + wake_up_bit(&sdp->sd_flags, SDF_FREEZE_INITIATOR); return; }
@@ -720,7 +720,7 @@ static int gfs2_freeze_super(struct super_block *sb) fs_err(sdp, "retrying...\n"); msleep(1000); } - set_bit(SDF_FS_FROZEN, &sdp->sd_flags); + set_bit(SDF_FREEZE_INITIATOR, &sdp->sd_flags); out: mutex_unlock(&sdp->sd_freeze_mutex); return error; @@ -745,7 +745,7 @@ static int gfs2_thaw_super(struct super_block *sb)
gfs2_freeze_unlock(&sdp->sd_freeze_gh); mutex_unlock(&sdp->sd_freeze_mutex); - return wait_on_bit(&sdp->sd_flags, SDF_FS_FROZEN, TASK_INTERRUPTIBLE); + return wait_on_bit(&sdp->sd_flags, SDF_FREEZE_INITIATOR, TASK_INTERRUPTIBLE); }
/** diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c index d87ea98cf5350..e1fa76d4a7c22 100644 --- a/fs/gfs2/sys.c +++ b/fs/gfs2/sys.c @@ -110,7 +110,7 @@ static ssize_t status_show(struct gfs2_sbd *sdp, char *buf) test_bit(SDF_RORECOVERY, &f), test_bit(SDF_SKIP_DLM_UNLOCK, &f), test_bit(SDF_FORCE_AIL_FLUSH, &f), - test_bit(SDF_FS_FROZEN, &f), + test_bit(SDF_FREEZE_INITIATOR, &f), test_bit(SDF_WITHDRAWING, &f), test_bit(SDF_WITHDRAW_IN_PROG, &f), test_bit(SDF_REMOTE_WITHDRAW, &f), diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c index 1195ea08f9ca4..ebf87fb7b3bf5 100644 --- a/fs/gfs2/util.c +++ b/fs/gfs2/util.c @@ -187,7 +187,7 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp) } sdp->sd_jinode_gh.gh_flags |= GL_NOCACHE; gfs2_glock_dq(&sdp->sd_jinode_gh); - if (test_bit(SDF_FS_FROZEN, &sdp->sd_flags)) { + if (test_bit(SDF_FREEZE_INITIATOR, &sdp->sd_flags)) { /* Make sure gfs2_thaw_super works if partially-frozen */ flush_work(&sdp->sd_freeze_work); atomic_set(&sdp->sd_freeze_state, SFS_FROZEN);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Gruenbacher agruenba@redhat.com
[ Upstream commit b77b4a4815a9651d1d6e07b8e6548eee9531a5eb ]
So far, at mount time, gfs2 would take the freeze glock in shared mode and then immediately drop it again, turning it into a cached glock that can be reclaimed at any time. To freeze the filesystem cluster-wide, the node initiating the freeze would take the freeze glock in exclusive mode, which would cause the freeze glock's freeze_go_sync() callback to run on each node. There, gfs2 would freeze the filesystem and schedule gfs2_freeze_func() to run. gfs2_freeze_func() would re-acquire the freeze glock in shared mode, thaw the filesystem, and drop the freeze glock again. The initiating node would keep the freeze glock held in exclusive mode. To thaw the filesystem, the initiating node would drop the freeze glock again, which would allow gfs2_freeze_func() to resume on all nodes, leaving the filesystem in the thawed state.
It turns out that in freeze_go_sync(), we cannot reliably and safely freeze the filesystem. This is primarily because the final unmount of a filesystem takes a write lock on the s_umount rw semaphore before calling into gfs2_put_super(), and freeze_go_sync() needs to call freeze_super() which also takes a write lock on the same semaphore, causing a deadlock. We could work around this by trying to take an active reference on the super block first, which would prevent unmount from running at the same time. But that can fail, and freeze_go_sync() isn't actually allowed to fail.
To get around this, this patch changes the freeze glock locking scheme as follows:
At mount time, each node takes the freeze glock in shared mode. To freeze a filesystem, the initiating node first freezes the filesystem locally and then drops and re-acquires the freeze glock in exclusive mode. All other nodes notice that there is contention on the freeze glock in their go_callback callbacks, and they schedule gfs2_freeze_func() to run. There, they freeze the filesystem locally and drop and re-acquire the freeze glock before re-thawing the filesystem. This is happening outside of the glock state engine, so there, we are allowed to fail.
From a cluster point of view, taking and immediately dropping a glock is
indistinguishable from taking the glock and only dropping it upon contention, so this new scheme is compatible with the old one.
Thanks to Li Dong lidong@vivo.com for reporting a locking bug in gfs2_freeze_func() in a previous version of this commit.
Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Stable-dep-of: f66af88e3321 ("gfs2: Stop using gfs2_make_fs_ro for withdraw") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/gfs2/glops.c | 52 +++++-------- fs/gfs2/log.c | 2 - fs/gfs2/ops_fstype.c | 5 +- fs/gfs2/recovery.c | 24 +++--- fs/gfs2/super.c | 172 +++++++++++++++++++++++++++++++++---------- fs/gfs2/super.h | 1 + fs/gfs2/util.c | 32 +++----- 7 files changed, 178 insertions(+), 110 deletions(-)
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c index 91a542b9d81e8..089b3d811e43d 100644 --- a/fs/gfs2/glops.c +++ b/fs/gfs2/glops.c @@ -555,47 +555,33 @@ static void inode_go_dump(struct seq_file *seq, struct gfs2_glock *gl, }
/** - * freeze_go_sync - promote/demote the freeze glock + * freeze_go_callback - A cluster node is requesting a freeze * @gl: the glock + * @remote: true if this came from a different cluster node */
-static int freeze_go_sync(struct gfs2_glock *gl) +static void freeze_go_callback(struct gfs2_glock *gl, bool remote) { - int error = 0; struct gfs2_sbd *sdp = gl->gl_name.ln_sbd; + struct super_block *sb = sdp->sd_vfs; + + if (!remote || + gl->gl_state != LM_ST_SHARED || + gl->gl_demote_state != LM_ST_UNLOCKED) + return;
/* - * We need to check gl_state == LM_ST_SHARED here and not gl_req == - * LM_ST_EXCLUSIVE. That's because when any node does a freeze, - * all the nodes should have the freeze glock in SH mode and they all - * call do_xmote: One for EX and the others for UN. They ALL must - * freeze locally, and they ALL must queue freeze work. The freeze_work - * calls freeze_func, which tries to reacquire the freeze glock in SH, - * effectively waiting for the thaw on the node who holds it in EX. - * Once thawed, the work func acquires the freeze glock in - * SH and everybody goes back to thawed. + * Try to get an active super block reference to prevent racing with + * unmount (see trylock_super()). But note that unmount isn't the only + * place where a write lock on s_umount is taken, and we can fail here + * because of things like remount as well. */ - if (gl->gl_state == LM_ST_SHARED && !gfs2_withdrawn(sdp) && - !test_bit(SDF_NORECOVERY, &sdp->sd_flags)) { - atomic_set(&sdp->sd_freeze_state, SFS_STARTING_FREEZE); - error = freeze_super(sdp->sd_vfs); - if (error) { - fs_info(sdp, "GFS2: couldn't freeze filesystem: %d\n", - error); - if (gfs2_withdrawn(sdp)) { - atomic_set(&sdp->sd_freeze_state, SFS_UNFROZEN); - return 0; - } - gfs2_assert_withdraw(sdp, 0); - } - queue_work(gfs2_freeze_wq, &sdp->sd_freeze_work); - if (test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags)) - gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_FREEZE | - GFS2_LFC_FREEZE_GO_SYNC); - else /* read-only mounts */ - atomic_set(&sdp->sd_freeze_state, SFS_FROZEN); + if (down_read_trylock(&sb->s_umount)) { + atomic_inc(&sb->s_active); + up_read(&sb->s_umount); + if (!queue_work(gfs2_freeze_wq, &sdp->sd_freeze_work)) + deactivate_super(sb); } - return 0; }
/** @@ -760,9 +746,9 @@ const struct gfs2_glock_operations gfs2_rgrp_glops = { };
const struct gfs2_glock_operations gfs2_freeze_glops = { - .go_sync = freeze_go_sync, .go_xmote_bh = freeze_go_xmote_bh, .go_demote_ok = freeze_go_demote_ok, + .go_callback = freeze_go_callback, .go_type = LM_TYPE_NONDISK, .go_flags = GLOF_NONDISK, }; diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c index e021d5f50c231..8fd8bb8604869 100644 --- a/fs/gfs2/log.c +++ b/fs/gfs2/log.c @@ -1136,8 +1136,6 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags) if (flags & (GFS2_LOG_HEAD_FLUSH_SHUTDOWN | GFS2_LOG_HEAD_FLUSH_FREEZE)) gfs2_log_shutdown(sdp); - if (flags & GFS2_LOG_HEAD_FLUSH_FREEZE) - atomic_set(&sdp->sd_freeze_state, SFS_FROZEN); }
out_end: diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index e427fb7fbe998..8299113858ce4 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -1143,7 +1143,6 @@ static int gfs2_fill_super(struct super_block *sb, struct fs_context *fc) int silent = fc->sb_flags & SB_SILENT; struct gfs2_sbd *sdp; struct gfs2_holder mount_gh; - struct gfs2_holder freeze_gh; int error;
sdp = init_sbd(sb); @@ -1266,15 +1265,15 @@ static int gfs2_fill_super(struct super_block *sb, struct fs_context *fc) } }
- error = gfs2_freeze_lock_shared(sdp, &freeze_gh, 0); + error = gfs2_freeze_lock_shared(sdp, &sdp->sd_freeze_gh, 0); if (error) goto fail_per_node;
if (!sb_rdonly(sb)) error = gfs2_make_fs_rw(sdp);
- gfs2_freeze_unlock(&freeze_gh); if (error) { + gfs2_freeze_unlock(&sdp->sd_freeze_gh); if (sdp->sd_quotad_process) kthread_stop(sdp->sd_quotad_process); sdp->sd_quotad_process = NULL; diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c index 61ef07da40b22..afeda936e2beb 100644 --- a/fs/gfs2/recovery.c +++ b/fs/gfs2/recovery.c @@ -404,7 +404,7 @@ void gfs2_recover_func(struct work_struct *work) struct gfs2_inode *ip = GFS2_I(jd->jd_inode); struct gfs2_sbd *sdp = GFS2_SB(jd->jd_inode); struct gfs2_log_header_host head; - struct gfs2_holder j_gh, ji_gh, thaw_gh; + struct gfs2_holder j_gh, ji_gh; ktime_t t_start, t_jlck, t_jhd, t_tlck, t_rep; int ro = 0; unsigned int pass; @@ -465,14 +465,14 @@ void gfs2_recover_func(struct work_struct *work) ktime_ms_delta(t_jhd, t_jlck));
if (!(head.lh_flags & GFS2_LOG_HEAD_UNMOUNT)) { - fs_info(sdp, "jid=%u: Acquiring the freeze glock...\n", - jd->jd_jid); - - /* Acquire a shared hold on the freeze glock */ + mutex_lock(&sdp->sd_freeze_mutex);
- error = gfs2_freeze_lock_shared(sdp, &thaw_gh, LM_FLAG_PRIORITY); - if (error) + if (atomic_read(&sdp->sd_freeze_state) != SFS_UNFROZEN) { + mutex_unlock(&sdp->sd_freeze_mutex); + fs_warn(sdp, "jid=%u: Can't replay: filesystem " + "is frozen\n", jd->jd_jid); goto fail_gunlock_ji; + }
if (test_bit(SDF_RORECOVERY, &sdp->sd_flags)) { ro = 1; @@ -496,7 +496,7 @@ void gfs2_recover_func(struct work_struct *work) fs_warn(sdp, "jid=%u: Can't replay: read-only block " "device\n", jd->jd_jid); error = -EROFS; - goto fail_gunlock_thaw; + goto fail_gunlock_nofreeze; }
t_tlck = ktime_get(); @@ -514,7 +514,7 @@ void gfs2_recover_func(struct work_struct *work) lops_after_scan(jd, error, pass); if (error) { up_read(&sdp->sd_log_flush_lock); - goto fail_gunlock_thaw; + goto fail_gunlock_nofreeze; } }
@@ -522,7 +522,7 @@ void gfs2_recover_func(struct work_struct *work) clean_journal(jd, &head); up_read(&sdp->sd_log_flush_lock);
- gfs2_freeze_unlock(&thaw_gh); + mutex_unlock(&sdp->sd_freeze_mutex); t_rep = ktime_get(); fs_info(sdp, "jid=%u: Journal replayed in %lldms [jlck:%lldms, " "jhead:%lldms, tlck:%lldms, replay:%lldms]\n", @@ -543,8 +543,8 @@ void gfs2_recover_func(struct work_struct *work) fs_info(sdp, "jid=%u: Done\n", jd->jd_jid); goto done;
-fail_gunlock_thaw: - gfs2_freeze_unlock(&thaw_gh); +fail_gunlock_nofreeze: + mutex_unlock(&sdp->sd_freeze_mutex); fail_gunlock_ji: if (jlocked) { gfs2_glock_dq_uninit(&ji_gh); diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index cdfbfda046945..1a888b9c3d110 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -332,7 +332,12 @@ static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp) struct lfcc *lfcc; LIST_HEAD(list); struct gfs2_log_header_host lh; - int error; + int error, error2; + + /* + * Grab all the journal glocks in SH mode. We are *probably* doing + * that to prevent recovery. + */
list_for_each_entry(jd, &sdp->sd_jindex_list, jd_list) { lfcc = kmalloc(sizeof(struct lfcc), GFP_KERNEL); @@ -349,11 +354,13 @@ static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp) list_add(&lfcc->list, &list); }
+ gfs2_freeze_unlock(&sdp->sd_freeze_gh); + error = gfs2_glock_nq_init(sdp->sd_freeze_gl, LM_ST_EXCLUSIVE, LM_FLAG_NOEXP | GL_NOPID, &sdp->sd_freeze_gh); if (error) - goto out; + goto relock_shared;
list_for_each_entry(jd, &sdp->sd_jindex_list, jd_list) { error = gfs2_jdesc_check(jd); @@ -368,8 +375,14 @@ static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp) } }
- if (error) - gfs2_freeze_unlock(&sdp->sd_freeze_gh); + if (!error) + goto out; /* success */ + + gfs2_freeze_unlock(&sdp->sd_freeze_gh); + +relock_shared: + error2 = gfs2_freeze_lock_shared(sdp, &sdp->sd_freeze_gh, 0); + gfs2_assert_withdraw(sdp, !error2);
out: while (!list_empty(&list)) { @@ -600,6 +613,8 @@ static void gfs2_put_super(struct super_block *sb)
/* Release stuff */
+ gfs2_freeze_unlock(&sdp->sd_freeze_gh); + iput(sdp->sd_jindex); iput(sdp->sd_statfs_inode); iput(sdp->sd_rindex); @@ -654,31 +669,82 @@ static int gfs2_sync_fs(struct super_block *sb, int wait) return sdp->sd_log_error; }
-void gfs2_freeze_func(struct work_struct *work) +static int gfs2_freeze_locally(struct gfs2_sbd *sdp) { - int error; - struct gfs2_holder freeze_gh; - struct gfs2_sbd *sdp = container_of(work, struct gfs2_sbd, sd_freeze_work); struct super_block *sb = sdp->sd_vfs; + int error;
- atomic_inc(&sb->s_active); - error = gfs2_freeze_lock_shared(sdp, &freeze_gh, 0); - if (error) { - gfs2_assert_withdraw(sdp, 0); - } else { - atomic_set(&sdp->sd_freeze_state, SFS_UNFROZEN); - error = thaw_super(sb); - if (error) { - fs_info(sdp, "GFS2: couldn't thaw filesystem: %d\n", - error); - gfs2_assert_withdraw(sdp, 0); + atomic_set(&sdp->sd_freeze_state, SFS_STARTING_FREEZE); + + error = freeze_super(sb); + if (error) + goto fail; + + if (test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags)) { + gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_FREEZE | + GFS2_LFC_FREEZE_GO_SYNC); + if (gfs2_withdrawn(sdp)) { + thaw_super(sb); + error = -EIO; + goto fail; } - gfs2_freeze_unlock(&freeze_gh); } + return 0; + +fail: + atomic_set(&sdp->sd_freeze_state, SFS_UNFROZEN); + return error; +} + +static int gfs2_do_thaw(struct gfs2_sbd *sdp) +{ + struct super_block *sb = sdp->sd_vfs; + int error; + + error = gfs2_freeze_lock_shared(sdp, &sdp->sd_freeze_gh, 0); + if (error) + goto fail; + error = thaw_super(sb); + if (!error) + return 0; + +fail: + fs_info(sdp, "GFS2: couldn't thaw filesystem: %d\n", error); + gfs2_assert_withdraw(sdp, 0); + return error; +} + +void gfs2_freeze_func(struct work_struct *work) +{ + struct gfs2_sbd *sdp = container_of(work, struct gfs2_sbd, sd_freeze_work); + struct super_block *sb = sdp->sd_vfs; + int error; + + mutex_lock(&sdp->sd_freeze_mutex); + error = -EBUSY; + if (atomic_read(&sdp->sd_freeze_state) != SFS_UNFROZEN) + goto freeze_failed; + + error = gfs2_freeze_locally(sdp); + if (error) + goto freeze_failed; + + gfs2_freeze_unlock(&sdp->sd_freeze_gh); + atomic_set(&sdp->sd_freeze_state, SFS_FROZEN); + + error = gfs2_do_thaw(sdp); + if (error) + goto out; + + atomic_set(&sdp->sd_freeze_state, SFS_UNFROZEN); + goto out; + +freeze_failed: + fs_info(sdp, "GFS2: couldn't freeze filesystem: %d\n", error); + +out: + mutex_unlock(&sdp->sd_freeze_mutex); deactivate_super(sb); - clear_bit_unlock(SDF_FREEZE_INITIATOR, &sdp->sd_flags); - wake_up_bit(&sdp->sd_flags, SDF_FREEZE_INITIATOR); - return; }
/** @@ -692,21 +758,27 @@ static int gfs2_freeze_super(struct super_block *sb) struct gfs2_sbd *sdp = sb->s_fs_info; int error;
- mutex_lock(&sdp->sd_freeze_mutex); - if (atomic_read(&sdp->sd_freeze_state) != SFS_UNFROZEN) { - error = -EBUSY; + if (!mutex_trylock(&sdp->sd_freeze_mutex)) + return -EBUSY; + error = -EBUSY; + if (atomic_read(&sdp->sd_freeze_state) != SFS_UNFROZEN) goto out; - }
for (;;) { - if (gfs2_withdrawn(sdp)) { - error = -EINVAL; + error = gfs2_freeze_locally(sdp); + if (error) { + fs_info(sdp, "GFS2: couldn't freeze filesystem: %d\n", + error); goto out; }
error = gfs2_lock_fs_check_clean(sdp); if (!error) - break; + break; /* success */ + + error = gfs2_do_thaw(sdp); + if (error) + goto out;
if (error == -EBUSY) fs_err(sdp, "waiting for recovery before freeze\n"); @@ -720,8 +792,12 @@ static int gfs2_freeze_super(struct super_block *sb) fs_err(sdp, "retrying...\n"); msleep(1000); } - set_bit(SDF_FREEZE_INITIATOR, &sdp->sd_flags); + out: + if (!error) { + set_bit(SDF_FREEZE_INITIATOR, &sdp->sd_flags); + atomic_set(&sdp->sd_freeze_state, SFS_FROZEN); + } mutex_unlock(&sdp->sd_freeze_mutex); return error; } @@ -735,17 +811,39 @@ static int gfs2_freeze_super(struct super_block *sb) static int gfs2_thaw_super(struct super_block *sb) { struct gfs2_sbd *sdp = sb->s_fs_info; + int error;
- mutex_lock(&sdp->sd_freeze_mutex); - if (atomic_read(&sdp->sd_freeze_state) != SFS_FROZEN || - !gfs2_holder_initialized(&sdp->sd_freeze_gh)) { - mutex_unlock(&sdp->sd_freeze_mutex); - return -EINVAL; + if (!mutex_trylock(&sdp->sd_freeze_mutex)) + return -EBUSY; + error = -EINVAL; + if (!test_bit(SDF_FREEZE_INITIATOR, &sdp->sd_flags)) + goto out; + + gfs2_freeze_unlock(&sdp->sd_freeze_gh); + + error = gfs2_do_thaw(sdp); + + if (!error) { + clear_bit(SDF_FREEZE_INITIATOR, &sdp->sd_flags); + atomic_set(&sdp->sd_freeze_state, SFS_UNFROZEN); } +out: + mutex_unlock(&sdp->sd_freeze_mutex); + return error; +} + +void gfs2_thaw_freeze_initiator(struct super_block *sb) +{ + struct gfs2_sbd *sdp = sb->s_fs_info; + + mutex_lock(&sdp->sd_freeze_mutex); + if (!test_bit(SDF_FREEZE_INITIATOR, &sdp->sd_flags)) + goto out;
gfs2_freeze_unlock(&sdp->sd_freeze_gh); + +out: mutex_unlock(&sdp->sd_freeze_mutex); - return wait_on_bit(&sdp->sd_flags, SDF_FREEZE_INITIATOR, TASK_INTERRUPTIBLE); }
/** diff --git a/fs/gfs2/super.h b/fs/gfs2/super.h index 58d13fd77aed5..bba58629bc458 100644 --- a/fs/gfs2/super.h +++ b/fs/gfs2/super.h @@ -46,6 +46,7 @@ extern void gfs2_statfs_change_out(const struct gfs2_statfs_change_host *sc, extern void update_statfs(struct gfs2_sbd *sdp, struct buffer_head *m_bh); extern int gfs2_statfs_sync(struct super_block *sb, int type); extern void gfs2_freeze_func(struct work_struct *work); +extern void gfs2_thaw_freeze_initiator(struct super_block *sb);
extern void free_local_statfs_inodes(struct gfs2_sbd *sdp); extern struct inode *find_local_statfs_inode(struct gfs2_sbd *sdp, diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c index ebf87fb7b3bf5..d4cc8667a5b72 100644 --- a/fs/gfs2/util.c +++ b/fs/gfs2/util.c @@ -124,7 +124,6 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp) struct gfs2_inode *ip; struct gfs2_glock *i_gl; u64 no_formal_ino; - int log_write_allowed = test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags); int ret = 0; int tries;
@@ -152,24 +151,18 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp) */ clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags); if (!sb_rdonly(sdp->sd_vfs)) { - struct gfs2_holder freeze_gh; - - gfs2_holder_mark_uninitialized(&freeze_gh); - if (sdp->sd_freeze_gl && - !gfs2_glock_is_locked_by_me(sdp->sd_freeze_gl)) { - ret = gfs2_freeze_lock_shared(sdp, &freeze_gh, - log_write_allowed ? 0 : LM_FLAG_TRY); - if (ret == GLR_TRYFAILED) - ret = 0; - } - if (!ret) - gfs2_make_fs_ro(sdp); + bool locked = mutex_trylock(&sdp->sd_freeze_mutex); + + gfs2_make_fs_ro(sdp); + + if (locked) + mutex_unlock(&sdp->sd_freeze_mutex); + /* * Dequeue any pending non-system glock holders that can no * longer be granted because the file system is withdrawn. */ gfs2_gl_dq_holders(sdp); - gfs2_freeze_unlock(&freeze_gh); }
if (sdp->sd_lockstruct.ls_ops->lm_lock == NULL) { /* lock_nolock */ @@ -187,15 +180,8 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp) } sdp->sd_jinode_gh.gh_flags |= GL_NOCACHE; gfs2_glock_dq(&sdp->sd_jinode_gh); - if (test_bit(SDF_FREEZE_INITIATOR, &sdp->sd_flags)) { - /* Make sure gfs2_thaw_super works if partially-frozen */ - flush_work(&sdp->sd_freeze_work); - atomic_set(&sdp->sd_freeze_state, SFS_FROZEN); - thaw_super(sdp->sd_vfs); - } else { - wait_on_bit(&i_gl->gl_flags, GLF_DEMOTE, - TASK_UNINTERRUPTIBLE); - } + gfs2_thaw_freeze_initiator(sdp->sd_vfs); + wait_on_bit(&i_gl->gl_flags, GLF_DEMOTE, TASK_UNINTERRUPTIBLE);
/* * holder_uninit to force glock_put, to force dlm to let go
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Gruenbacher agruenba@redhat.com
[ Upstream commit f66af88e33212b57ea86da2c5d66c0d9d5c46344 ]
[ 81.372851][ T5532] CPU: 1 PID: 5532 Comm: syz-executor.0 Not tainted 6.2.0-rc1-syzkaller-dirty #0 [ 81.382080][ T5532] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023 [ 81.392343][ T5532] Call Trace: [ 81.395654][ T5532] <TASK> [ 81.398603][ T5532] dump_stack_lvl+0x1b1/0x290 [ 81.418421][ T5532] gfs2_assert_warn_i+0x19a/0x2e0 [ 81.423480][ T5532] gfs2_quota_cleanup+0x4c6/0x6b0 [ 81.428611][ T5532] gfs2_make_fs_ro+0x517/0x610 [ 81.457802][ T5532] gfs2_withdraw+0x609/0x1540 [ 81.481452][ T5532] gfs2_inode_refresh+0xb2d/0xf60 [ 81.506658][ T5532] gfs2_instantiate+0x15e/0x220 [ 81.511504][ T5532] gfs2_glock_wait+0x1d9/0x2a0 [ 81.516352][ T5532] do_sync+0x485/0xc80 [ 81.554943][ T5532] gfs2_quota_sync+0x3da/0x8b0 [ 81.559738][ T5532] gfs2_sync_fs+0x49/0xb0 [ 81.564063][ T5532] sync_filesystem+0xe8/0x220 [ 81.568740][ T5532] generic_shutdown_super+0x6b/0x310 [ 81.574112][ T5532] kill_block_super+0x79/0xd0 [ 81.578779][ T5532] deactivate_locked_super+0xa7/0xf0 [ 81.584064][ T5532] cleanup_mnt+0x494/0x520 [ 81.593753][ T5532] task_work_run+0x243/0x300 [ 81.608837][ T5532] exit_to_user_mode_loop+0x124/0x150 [ 81.614232][ T5532] exit_to_user_mode_prepare+0xb2/0x140 [ 81.619820][ T5532] syscall_exit_to_user_mode+0x26/0x60 [ 81.625287][ T5532] do_syscall_64+0x49/0xb0 [ 81.629710][ T5532] entry_SYSCALL_64_after_hwframe+0x63/0xcd
In this backtrace, gfs2_quota_sync() takes quota data references and then calls do_sync(). Function do_sync() encounters filesystem corruption and withdraws the filesystem, which (among other things) calls gfs2_quota_cleanup(). Function gfs2_quota_cleanup() wrongly assumes that nobody is holding any quota data references anymore, and destroys all quota data objects. When gfs2_quota_sync() then resumes and dereferences the quota data objects it is holding, those objects are no longer there.
Function gfs2_quota_cleanup() deals with resource deallocation and can easily be delayed until gfs2_put_super() in the case of a filesystem withdraw. In fact, most of the other work gfs2_make_fs_ro() does is unnecessary during a withdraw as well, so change signal_our_withdraw() to skip gfs2_make_fs_ro() and perform the necessary steps directly instead.
Thanks to Edward Adam Davis eadavis@sina.com for the initial patches.
Link: https://lore.kernel.org/all/0000000000002b5e2405f14e860f@google.com Reported-by: syzbot+3f6a670108ce43356017@syzkaller.appspotmail.com Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/gfs2/super.c | 9 ++------- fs/gfs2/util.c | 19 ++++++++++++++++++- 2 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index 1a888b9c3d110..f9b47df485d17 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -563,15 +563,8 @@ void gfs2_make_fs_ro(struct gfs2_sbd *sdp) gfs2_log_is_empty(sdp), HZ * 5); gfs2_assert_warn(sdp, gfs2_log_is_empty(sdp)); - } else { - wait_event_timeout(sdp->sd_log_waitq, - gfs2_log_is_empty(sdp), - HZ * 5); } gfs2_quota_cleanup(sdp); - - if (!log_write_allowed) - sdp->sd_vfs->s_flags |= SB_RDONLY; }
/** @@ -607,6 +600,8 @@ static void gfs2_put_super(struct super_block *sb) } else { gfs2_quota_cleanup(sdp); } + if (gfs2_withdrawn(sdp)) + gfs2_quota_cleanup(sdp); WARN_ON(gfs2_withdrawing(sdp));
/* At this point, we're through modifying the disk */ diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c index d4cc8667a5b72..30b8821c54ad4 100644 --- a/fs/gfs2/util.c +++ b/fs/gfs2/util.c @@ -9,6 +9,7 @@ #include <linux/spinlock.h> #include <linux/completion.h> #include <linux/buffer_head.h> +#include <linux/kthread.h> #include <linux/crc32.h> #include <linux/gfs2_ondisk.h> #include <linux/delay.h> @@ -153,7 +154,23 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp) if (!sb_rdonly(sdp->sd_vfs)) { bool locked = mutex_trylock(&sdp->sd_freeze_mutex);
- gfs2_make_fs_ro(sdp); + if (sdp->sd_quotad_process && + current != sdp->sd_quotad_process) { + kthread_stop(sdp->sd_quotad_process); + sdp->sd_quotad_process = NULL; + } + + if (sdp->sd_logd_process && + current != sdp->sd_logd_process) { + kthread_stop(sdp->sd_logd_process); + sdp->sd_logd_process = NULL; + } + + wait_event_timeout(sdp->sd_log_waitq, + gfs2_log_is_empty(sdp), + HZ * 5); + + sdp->sd_vfs->s_flags |= SB_RDONLY;
if (locked) mutex_unlock(&sdp->sd_freeze_mutex);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ying Hsu yinghsu@chromium.org
[ Upstream commit c7eaf80bfb0c8cef852cce9501b95dd5a6bddcb9 ]
Syzbot found a bug "BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580". It is because hci_link_tx_to holds an RCU read lock and calls hci_disconnect which would hold a mutex lock since the commit a13f316e90fd ("Bluetooth: hci_conn: Consolidate code for aborting connections"). Here's an example call trace:
__dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xfc/0x174 lib/dump_stack.c:106 ___might_sleep+0x4a9/0x4d3 kernel/sched/core.c:9663 __mutex_lock_common kernel/locking/mutex.c:576 [inline] __mutex_lock+0xc7/0x6e7 kernel/locking/mutex.c:732 hci_cmd_sync_queue+0x3a/0x287 net/bluetooth/hci_sync.c:388 hci_abort_conn+0x2cd/0x2e4 net/bluetooth/hci_conn.c:1812 hci_disconnect+0x207/0x237 net/bluetooth/hci_conn.c:244 hci_link_tx_to net/bluetooth/hci_core.c:3254 [inline] __check_timeout net/bluetooth/hci_core.c:3419 [inline] __check_timeout+0x310/0x361 net/bluetooth/hci_core.c:3399 hci_sched_le net/bluetooth/hci_core.c:3602 [inline] hci_tx_work+0xe8f/0x12d0 net/bluetooth/hci_core.c:3652 process_one_work+0x75c/0xba1 kernel/workqueue.c:2310 worker_thread+0x5b2/0x73a kernel/workqueue.c:2457 kthread+0x2f7/0x30b kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
This patch releases RCU read lock before calling hci_disconnect and reacquires it afterward to fix the bug.
Fixes: a13f316e90fd ("Bluetooth: hci_conn: Consolidate code for aborting connections") Signed-off-by: Ying Hsu yinghsu@chromium.org Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_core.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 398a324657697..cf164ec9899c3 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -3419,7 +3419,12 @@ static void hci_link_tx_to(struct hci_dev *hdev, __u8 type) if (c->type == type && c->sent) { bt_dev_err(hdev, "killing stalled connection %pMR", &c->dst); + /* hci_disconnect might sleep, so, we have to release + * the RCU read lock before calling it. + */ + rcu_read_unlock(); hci_disconnect(c, HCI_ERROR_REMOTE_USER_TERM); + rcu_read_lock(); } }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit a26787aa13974fb0b3fb42bfeb4256c1b686e305 ]
We want to ensure everything holds the wiphy lock, so also extend that to the MAC change callback.
Signed-off-by: Johannes Berg johannes.berg@intel.com Stable-dep-of: 74a7c93f45ab ("wifi: mac80211: fix change_address deadlock during unregister") Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/iface.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c index e00e1bf0f754a..408ee5afc9ae7 100644 --- a/net/mac80211/iface.c +++ b/net/mac80211/iface.c @@ -251,9 +251,9 @@ static int ieee80211_can_powered_addr_change(struct ieee80211_sub_if_data *sdata return ret; }
-static int ieee80211_change_mac(struct net_device *dev, void *addr) +static int _ieee80211_change_mac(struct ieee80211_sub_if_data *sdata, + void *addr) { - struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev); struct ieee80211_local *local = sdata->local; struct sockaddr *sa = addr; bool check_dup = true; @@ -278,7 +278,7 @@ static int ieee80211_change_mac(struct net_device *dev, void *addr)
if (live) drv_remove_interface(local, sdata); - ret = eth_mac_addr(dev, sa); + ret = eth_mac_addr(sdata->dev, sa);
if (ret == 0) { memcpy(sdata->vif.addr, sa->sa_data, ETH_ALEN); @@ -294,6 +294,19 @@ static int ieee80211_change_mac(struct net_device *dev, void *addr) return ret; }
+static int ieee80211_change_mac(struct net_device *dev, void *addr) +{ + struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev); + struct ieee80211_local *local = sdata->local; + int ret; + + wiphy_lock(local->hw.wiphy); + ret = _ieee80211_change_mac(sdata, addr); + wiphy_unlock(local->hw.wiphy); + + return ret; +} + static inline int identical_mac_addr_allowed(int type1, int type2) { return type1 == NL80211_IFTYPE_MONITOR ||
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit 74a7c93f45abba538914a65dd2ef2ea7cf7150e2 ]
When using e.g. bonding, and doing a sequence such as
# iw wlan0 set type __ap # ip link add name bond1 type bond # ip link set wlan0 master bond1 # iw wlan0 interface del
we deadlock, since the wlan0 interface removal will cause bonding to reset the MAC address of wlan0.
The locking would be somewhat difficult to fix, but since this only happens during removal, we can simply ignore the MAC address change at this time.
Reported-by: syzbot+25b3a0b24216651bc2af@syzkaller.appspotmail.com Signed-off-by: Johannes Berg johannes.berg@intel.com Link: https://lore.kernel.org/r/20231012123447.9f9d7fd1f237.Ic3a5ef4391b670941a69c... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/iface.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c index 408ee5afc9ae7..6a9d81e9069c9 100644 --- a/net/mac80211/iface.c +++ b/net/mac80211/iface.c @@ -300,6 +300,14 @@ static int ieee80211_change_mac(struct net_device *dev, void *addr) struct ieee80211_local *local = sdata->local; int ret;
+ /* + * This happens during unregistration if there's a bond device + * active (maybe other cases?) and we must get removed from it. + * But we really don't care anymore if it's not registered now. + */ + if (!dev->ieee80211_ptr->registered) + return 0; + wiphy_lock(local->hw.wiphy); ret = _ieee80211_change_mac(sdata, addr); wiphy_unlock(local->hw.wiphy);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yajun Deng yajun.deng@linux.dev
[ Upstream commit 2f0f9465ad9fa9c93f30009184c10da0f504f313 ]
The kernel will print several warnings in a short period of time when it stalls. Like this:
First warning: [ 7100.097547] ------------[ cut here ]------------ [ 7100.097550] NETDEV WATCHDOG: eno2 (xxx): transmit queue 8 timed out [ 7100.097571] WARNING: CPU: 8 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x260/0x270 ...
Second warning: [ 7147.756952] rcu: INFO: rcu_preempt self-detected stall on CPU [ 7147.756958] rcu: 24-....: (59999 ticks this GP) idle=546/1/0x400000000000000 softirq=367 3137/3673146 fqs=13844 [ 7147.756960] (t=60001 jiffies g=4322709 q=133381) [ 7147.756962] NMI backtrace for cpu 24 ...
We calculate that the transmit queue start stall should occur before 7095s according to watchdog_timeo, the rcu start stall at 7087s. These two times are close together, it is difficult to confirm which happened first.
To let users know the exact time the stall started, print msecs when the transmit queue time out.
Signed-off-by: Yajun Deng yajun.deng@linux.dev Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: e316dd1cf135 ("net: don't dump stack on queue timeout") Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_generic.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 7053c0292c335..4023c955036b1 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -502,7 +502,7 @@ static void dev_watchdog(struct timer_list *t) if (netif_device_present(dev) && netif_running(dev) && netif_carrier_ok(dev)) { - int some_queue_timedout = 0; + unsigned int timedout_ms = 0; unsigned int i; unsigned long trans_start;
@@ -514,16 +514,16 @@ static void dev_watchdog(struct timer_list *t) if (netif_xmit_stopped(txq) && time_after(jiffies, (trans_start + dev->watchdog_timeo))) { - some_queue_timedout = 1; + timedout_ms = jiffies_to_msecs(jiffies - trans_start); atomic_long_inc(&txq->trans_timeout); break; } }
- if (unlikely(some_queue_timedout)) { + if (unlikely(timedout_ms)) { trace_net_dev_xmit_timeout(dev, i); - WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit queue %u timed out\n", - dev->name, netdev_drivername(dev), i); + WARN_ONCE(1, "NETDEV WATCHDOG: %s (%s): transmit queue %u timed out %u ms\n", + dev->name, netdev_drivername(dev), i, timedout_ms); netif_freeze_queues(dev); dev->netdev_ops->ndo_tx_timeout(dev, i); netif_unfreeze_queues(dev);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit e316dd1cf1358ff9c44b37c7be273a7dc4349986 ]
The top syzbot report for networking (#14 for the entire kernel) is the queue timeout splat. We kept it around for a long time, because in real life it provides pretty strong signal that something is wrong with the driver or the device.
Removing it is also likely to break monitoring for those who track it as a kernel warning.
Nevertheless, WARN()ings are best suited for catching kernel programming bugs. If a Tx queue gets starved due to a pause storm, priority configuration, or other weirdness - that's obviously a problem, but not a problem we can fix at the kernel level.
Bite the bullet and convert the WARN() to a print.
Before:
NETDEV WATCHDOG: eni1np1 (netdevsim): transmit queue 0 timed out 1975 ms WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x39e/0x3b0 [... completely pointless stack trace of a timer follows ...]
Now:
netdevsim netdevsim1 eni1np1: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 1769 ms
Alternatively we could mark the drivers which syzbot has learned to abuse as "print-instead-of-WARN" selectively.
Reported-by: syzbot+d55372214aff0faa1f1f@syzkaller.appspotmail.com Reviewed-by: Jiri Pirko jiri@nvidia.com Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: Jamal Hadi Salim jhs@mojatatu.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_generic.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 4023c955036b1..6ab9359c1706f 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -522,8 +522,9 @@ static void dev_watchdog(struct timer_list *t)
if (unlikely(timedout_ms)) { trace_net_dev_xmit_timeout(dev, i); - WARN_ONCE(1, "NETDEV WATCHDOG: %s (%s): transmit queue %u timed out %u ms\n", - dev->name, netdev_drivername(dev), i, timedout_ms); + netdev_crit(dev, "NETDEV WATCHDOG: CPU: %d: transmit queue %u timed out %u ms\n", + raw_smp_processor_id(), + i, timedout_ms); netif_freeze_queues(dev); dev->netdev_ops->ndo_tx_timeout(dev, i); netif_unfreeze_queues(dev);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Manas Ghandat ghandatmanas@gmail.com
[ Upstream commit cca974daeb6c43ea971f8ceff5a7080d7d49ee30 ]
Currently while joining the leaf in a buddy system there is shift out of bound error in calculation of BUDSIZE. Added the required check to the BUDSIZE and fixed the documentation as well.
Reported-by: syzbot+411debe54d318eaed386@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=411debe54d318eaed386 Signed-off-by: Manas Ghandat ghandatmanas@gmail.com Signed-off-by: Dave Kleikamp dave.kleikamp@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/jfs/jfs_dmap.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/jfs/jfs_dmap.c b/fs/jfs/jfs_dmap.c index 4462274e325ac..8d064c9e9605d 100644 --- a/fs/jfs/jfs_dmap.c +++ b/fs/jfs/jfs_dmap.c @@ -2763,7 +2763,9 @@ static int dbBackSplit(dmtree_t *tp, int leafno, bool is_ctl) * leafno - the number of the leaf to be updated. * newval - the new value for the leaf. * - * RETURN VALUES: none + * RETURN VALUES: + * 0 - success + * -EIO - i/o error */ static int dbJoin(dmtree_t *tp, int leafno, int newval, bool is_ctl) { @@ -2790,6 +2792,10 @@ static int dbJoin(dmtree_t *tp, int leafno, int newval, bool is_ctl) * get the buddy size (number of words covered) of * the new value. */ + + if ((newval - tp->dmt_budmin) > BUDMIN) + return -EIO; + budsz = BUDSIZE(newval, tp->dmt_budmin);
/* try to join.
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lizhi Xu lizhi.xu@windriver.com
[ Upstream commit eb66b8abae98f869c224f7c852b685ae02144564 ]
When the length passed in is 0, the pagemap_scan_test_walk() caller should bail. This error causes at least a WARN_ON().
Link: https://lkml.kernel.org/r/20231116031352.40853-1-lizhi.xu@windriver.com Reported-by: syzbot+32d3767580a1ea339a81@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/0000000000000526f2060a30a085@google.com Signed-off-by: Lizhi Xu lizhi.xu@windriver.com Reviewed-by: Phillip Lougher phillip@squashfs.org.uk Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/squashfs/block.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/squashfs/block.c b/fs/squashfs/block.c index 833aca92301f0..45ea5d62cef42 100644 --- a/fs/squashfs/block.c +++ b/fs/squashfs/block.c @@ -198,7 +198,7 @@ int squashfs_read_data(struct super_block *sb, u64 index, int length, TRACE("Block @ 0x%llx, %scompressed size %d\n", index - 2, compressed ? "" : "un", length); } - if (length < 0 || length > output->length || + if (length <= 0 || length > output->length || (index + length) > msblk->bytes_used) { res = -EIO; goto out;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phillip Lougher phillip@squashfs.org.uk
[ Upstream commit 12427de9439d68b8e96ba6f50b601ef15f437612 ]
Sysbot reports a slab out of bounds write in squashfs_readahead().
This is ultimately caused by a file reporting an (infeasibly) large file size (1407374883553280 bytes) with the minimum block size of 4K.
This causes variable overflow.
Link: https://lkml.kernel.org/r/20231113160901.6444-1-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher phillip@squashfs.org.uk Reported-by: syzbot+604424eb051c2f696163@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000b1fda20609ede0d1@google.com/ Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/squashfs/file.c | 3 ++- fs/squashfs/file_direct.c | 6 +++--- 2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c index 8ba8c4c507707..e8df6430444b0 100644 --- a/fs/squashfs/file.c +++ b/fs/squashfs/file.c @@ -544,7 +544,8 @@ static void squashfs_readahead(struct readahead_control *ractl) struct squashfs_page_actor *actor; unsigned int nr_pages = 0; struct page **pages; - int i, file_end = i_size_read(inode) >> msblk->block_log; + int i; + loff_t file_end = i_size_read(inode) >> msblk->block_log; unsigned int max_pages = 1UL << shift;
readahead_expand(ractl, start, (len | mask) + 1); diff --git a/fs/squashfs/file_direct.c b/fs/squashfs/file_direct.c index f1ccad519e28c..763a3f7a75f6d 100644 --- a/fs/squashfs/file_direct.c +++ b/fs/squashfs/file_direct.c @@ -26,10 +26,10 @@ int squashfs_readpage_block(struct page *target_page, u64 block, int bsize, struct inode *inode = target_page->mapping->host; struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
- int file_end = (i_size_read(inode) - 1) >> PAGE_SHIFT; + loff_t file_end = (i_size_read(inode) - 1) >> PAGE_SHIFT; int mask = (1 << (msblk->block_log - PAGE_SHIFT)) - 1; - int start_index = target_page->index & ~mask; - int end_index = start_index | mask; + loff_t start_index = target_page->index & ~mask; + loff_t end_index = start_index | mask; int i, n, pages, bytes, res = -ENOMEM; struct page **page; struct squashfs_page_actor *actor;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Edward Adam Davis eadavis@qq.com
[ Upstream commit dd8f87f21dc3da2eaf46e7401173f935b90b13a8 ]
The cpu_key was not initialized in reiserfs_delete_solid_item(), which triggered this issue.
Reported-and-tested-by: syzbot+b3b14fb9f8a14c5d0267@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis eadavis@qq.com Link: https://lore.kernel.org/r/tencent_9EA7E746DE92DBC66049A62EDF6ED64CA706@qq.co... Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/reiserfs/stree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c index 84c12a1947b22..6ecf772919688 100644 --- a/fs/reiserfs/stree.c +++ b/fs/reiserfs/stree.c @@ -1409,7 +1409,7 @@ void reiserfs_delete_solid_item(struct reiserfs_transaction_handle *th, INITIALIZE_PATH(path); int item_len = 0; int tb_init = 0; - struct cpu_key cpu_key; + struct cpu_key cpu_key = {}; int retval; int quota_cut_bytes = 0;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gao Xiang hsiangkao@linux.alibaba.com
[ Upstream commit 496530c7c1dfc159d59a75ae00b572f570710c53 ]
Syzbot reported a KMSAN warning, erofs: (device loop0): z_erofs_lz4_decompress_mem: failed to decompress -12 in[46, 4050] out[917] ===================================================== BUG: KMSAN: uninit-value in hex_dump_to_buffer+0xae9/0x10f0 lib/hexdump.c:194 .. print_hex_dump+0x13d/0x3e0 lib/hexdump.c:276 z_erofs_lz4_decompress_mem fs/erofs/decompressor.c:252 [inline] z_erofs_lz4_decompress+0x257e/0x2a70 fs/erofs/decompressor.c:311 z_erofs_decompress_pcluster fs/erofs/zdata.c:1290 [inline] z_erofs_decompress_queue+0x338c/0x6460 fs/erofs/zdata.c:1372 z_erofs_runqueue+0x36cd/0x3830 z_erofs_read_folio+0x435/0x810 fs/erofs/zdata.c:1843
The root cause is that the printed decompressed buffer may be filled incompletely due to decompression failure. Since they were once only used for debugging, get rid of them now.
Reported-and-tested-by: syzbot+6c746eea496f34b3161d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/000000000000321c24060d7cfa1c@google.com Reviewed-by: Yue Hu huyue2@coolpad.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20231227151903.2900413-1-hsiangkao@linux.alibaba.c... Signed-off-by: Sasha Levin sashal@kernel.org --- fs/erofs/decompressor.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c index 1eefa4411e066..708bf142b1888 100644 --- a/fs/erofs/decompressor.c +++ b/fs/erofs/decompressor.c @@ -248,15 +248,9 @@ static int z_erofs_lz4_decompress_mem(struct z_erofs_lz4_decompress_ctx *ctx, if (ret != rq->outputsize) { erofs_err(rq->sb, "failed to decompress %d in[%u, %u] out[%u]", ret, rq->inputsize, inputmargin, rq->outputsize); - - print_hex_dump(KERN_DEBUG, "[ in]: ", DUMP_PREFIX_OFFSET, - 16, 1, src + inputmargin, rq->inputsize, true); - print_hex_dump(KERN_DEBUG, "[out]: ", DUMP_PREFIX_OFFSET, - 16, 1, out, rq->outputsize, true); - if (ret >= 0) memset(out + ret, 0, rq->outputsize - ret); - ret = -EIO; + ret = -EFSCORRUPTED; } else { ret = 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Kara jack@suse.cz
[ Upstream commit a898cb621ac589b0b9e959309689a027e765aa12 ]
Syzbot has found that when it creates corrupted quota files where the quota tree contains a loop, we will deadlock when tryling to insert a dquot. Add loop detection into functions traversing the quota tree.
Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- fs/quota/quota_tree.c | 128 +++++++++++++++++++++++++++++++----------- fs/quota/quota_v2.c | 15 +++-- 2 files changed, 105 insertions(+), 38 deletions(-)
diff --git a/fs/quota/quota_tree.c b/fs/quota/quota_tree.c index 0f1493e0f6d05..254f6359b287f 100644 --- a/fs/quota/quota_tree.c +++ b/fs/quota/quota_tree.c @@ -21,6 +21,12 @@ MODULE_AUTHOR("Jan Kara"); MODULE_DESCRIPTION("Quota trie support"); MODULE_LICENSE("GPL");
+/* + * Maximum quota tree depth we support. Only to limit recursion when working + * with the tree. + */ +#define MAX_QTREE_DEPTH 6 + #define __QUOTA_QT_PARANOIA
static int __get_index(struct qtree_mem_dqinfo *info, qid_t id, int depth) @@ -327,27 +333,36 @@ static uint find_free_dqentry(struct qtree_mem_dqinfo *info,
/* Insert reference to structure into the trie */ static int do_insert_tree(struct qtree_mem_dqinfo *info, struct dquot *dquot, - uint *treeblk, int depth) + uint *blks, int depth) { char *buf = kmalloc(info->dqi_usable_bs, GFP_NOFS); int ret = 0, newson = 0, newact = 0; __le32 *ref; uint newblk; + int i;
if (!buf) return -ENOMEM; - if (!*treeblk) { + if (!blks[depth]) { ret = get_free_dqblk(info); if (ret < 0) goto out_buf; - *treeblk = ret; + for (i = 0; i < depth; i++) + if (ret == blks[i]) { + quota_error(dquot->dq_sb, + "Free block already used in tree: block %u", + ret); + ret = -EIO; + goto out_buf; + } + blks[depth] = ret; memset(buf, 0, info->dqi_usable_bs); newact = 1; } else { - ret = read_blk(info, *treeblk, buf); + ret = read_blk(info, blks[depth], buf); if (ret < 0) { quota_error(dquot->dq_sb, "Can't read tree quota " - "block %u", *treeblk); + "block %u", blks[depth]); goto out_buf; } } @@ -357,8 +372,20 @@ static int do_insert_tree(struct qtree_mem_dqinfo *info, struct dquot *dquot, info->dqi_blocks - 1); if (ret) goto out_buf; - if (!newblk) + if (!newblk) { newson = 1; + } else { + for (i = 0; i <= depth; i++) + if (newblk == blks[i]) { + quota_error(dquot->dq_sb, + "Cycle in quota tree detected: block %u index %u", + blks[depth], + get_index(info, dquot->dq_id, depth)); + ret = -EIO; + goto out_buf; + } + } + blks[depth + 1] = newblk; if (depth == info->dqi_qtree_depth - 1) { #ifdef __QUOTA_QT_PARANOIA if (newblk) { @@ -370,16 +397,16 @@ static int do_insert_tree(struct qtree_mem_dqinfo *info, struct dquot *dquot, goto out_buf; } #endif - newblk = find_free_dqentry(info, dquot, &ret); + blks[depth + 1] = find_free_dqentry(info, dquot, &ret); } else { - ret = do_insert_tree(info, dquot, &newblk, depth+1); + ret = do_insert_tree(info, dquot, blks, depth + 1); } if (newson && ret >= 0) { ref[get_index(info, dquot->dq_id, depth)] = - cpu_to_le32(newblk); - ret = write_blk(info, *treeblk, buf); + cpu_to_le32(blks[depth + 1]); + ret = write_blk(info, blks[depth], buf); } else if (newact && ret < 0) { - put_free_dqblk(info, buf, *treeblk); + put_free_dqblk(info, buf, blks[depth]); } out_buf: kfree(buf); @@ -390,7 +417,7 @@ static int do_insert_tree(struct qtree_mem_dqinfo *info, struct dquot *dquot, static inline int dq_insert_tree(struct qtree_mem_dqinfo *info, struct dquot *dquot) { - int tmp = QT_TREEOFF; + uint blks[MAX_QTREE_DEPTH] = { QT_TREEOFF };
#ifdef __QUOTA_QT_PARANOIA if (info->dqi_blocks <= QT_TREEOFF) { @@ -398,7 +425,11 @@ static inline int dq_insert_tree(struct qtree_mem_dqinfo *info, return -EIO; } #endif - return do_insert_tree(info, dquot, &tmp, 0); + if (info->dqi_qtree_depth >= MAX_QTREE_DEPTH) { + quota_error(dquot->dq_sb, "Quota tree depth too big!"); + return -EIO; + } + return do_insert_tree(info, dquot, blks, 0); }
/* @@ -511,19 +542,20 @@ static int free_dqentry(struct qtree_mem_dqinfo *info, struct dquot *dquot,
/* Remove reference to dquot from tree */ static int remove_tree(struct qtree_mem_dqinfo *info, struct dquot *dquot, - uint *blk, int depth) + uint *blks, int depth) { char *buf = kmalloc(info->dqi_usable_bs, GFP_NOFS); int ret = 0; uint newblk; __le32 *ref = (__le32 *)buf; + int i;
if (!buf) return -ENOMEM; - ret = read_blk(info, *blk, buf); + ret = read_blk(info, blks[depth], buf); if (ret < 0) { quota_error(dquot->dq_sb, "Can't read quota data block %u", - *blk); + blks[depth]); goto out_buf; } newblk = le32_to_cpu(ref[get_index(info, dquot->dq_id, depth)]); @@ -532,29 +564,38 @@ static int remove_tree(struct qtree_mem_dqinfo *info, struct dquot *dquot, if (ret) goto out_buf;
+ for (i = 0; i <= depth; i++) + if (newblk == blks[i]) { + quota_error(dquot->dq_sb, + "Cycle in quota tree detected: block %u index %u", + blks[depth], + get_index(info, dquot->dq_id, depth)); + ret = -EIO; + goto out_buf; + } if (depth == info->dqi_qtree_depth - 1) { ret = free_dqentry(info, dquot, newblk); - newblk = 0; + blks[depth + 1] = 0; } else { - ret = remove_tree(info, dquot, &newblk, depth+1); + blks[depth + 1] = newblk; + ret = remove_tree(info, dquot, blks, depth + 1); } - if (ret >= 0 && !newblk) { - int i; + if (ret >= 0 && !blks[depth + 1]) { ref[get_index(info, dquot->dq_id, depth)] = cpu_to_le32(0); /* Block got empty? */ for (i = 0; i < (info->dqi_usable_bs >> 2) && !ref[i]; i++) ; /* Don't put the root block into the free block list */ if (i == (info->dqi_usable_bs >> 2) - && *blk != QT_TREEOFF) { - put_free_dqblk(info, buf, *blk); - *blk = 0; + && blks[depth] != QT_TREEOFF) { + put_free_dqblk(info, buf, blks[depth]); + blks[depth] = 0; } else { - ret = write_blk(info, *blk, buf); + ret = write_blk(info, blks[depth], buf); if (ret < 0) quota_error(dquot->dq_sb, "Can't write quota tree block %u", - *blk); + blks[depth]); } } out_buf: @@ -565,11 +606,15 @@ static int remove_tree(struct qtree_mem_dqinfo *info, struct dquot *dquot, /* Delete dquot from tree */ int qtree_delete_dquot(struct qtree_mem_dqinfo *info, struct dquot *dquot) { - uint tmp = QT_TREEOFF; + uint blks[MAX_QTREE_DEPTH] = { QT_TREEOFF };
if (!dquot->dq_off) /* Even not allocated? */ return 0; - return remove_tree(info, dquot, &tmp, 0); + if (info->dqi_qtree_depth >= MAX_QTREE_DEPTH) { + quota_error(dquot->dq_sb, "Quota tree depth too big!"); + return -EIO; + } + return remove_tree(info, dquot, blks, 0); } EXPORT_SYMBOL(qtree_delete_dquot);
@@ -613,18 +658,20 @@ static loff_t find_block_dqentry(struct qtree_mem_dqinfo *info,
/* Find entry for given id in the tree */ static loff_t find_tree_dqentry(struct qtree_mem_dqinfo *info, - struct dquot *dquot, uint blk, int depth) + struct dquot *dquot, uint *blks, int depth) { char *buf = kmalloc(info->dqi_usable_bs, GFP_NOFS); loff_t ret = 0; __le32 *ref = (__le32 *)buf; + uint blk; + int i;
if (!buf) return -ENOMEM; - ret = read_blk(info, blk, buf); + ret = read_blk(info, blks[depth], buf); if (ret < 0) { quota_error(dquot->dq_sb, "Can't read quota tree block %u", - blk); + blks[depth]); goto out_buf; } ret = 0; @@ -636,8 +683,19 @@ static loff_t find_tree_dqentry(struct qtree_mem_dqinfo *info, if (ret) goto out_buf;
+ /* Check for cycles in the tree */ + for (i = 0; i <= depth; i++) + if (blk == blks[i]) { + quota_error(dquot->dq_sb, + "Cycle in quota tree detected: block %u index %u", + blks[depth], + get_index(info, dquot->dq_id, depth)); + ret = -EIO; + goto out_buf; + } + blks[depth + 1] = blk; if (depth < info->dqi_qtree_depth - 1) - ret = find_tree_dqentry(info, dquot, blk, depth+1); + ret = find_tree_dqentry(info, dquot, blks, depth + 1); else ret = find_block_dqentry(info, dquot, blk); out_buf: @@ -649,7 +707,13 @@ static loff_t find_tree_dqentry(struct qtree_mem_dqinfo *info, static inline loff_t find_dqentry(struct qtree_mem_dqinfo *info, struct dquot *dquot) { - return find_tree_dqentry(info, dquot, QT_TREEOFF, 0); + uint blks[MAX_QTREE_DEPTH] = { QT_TREEOFF }; + + if (info->dqi_qtree_depth >= MAX_QTREE_DEPTH) { + quota_error(dquot->dq_sb, "Quota tree depth too big!"); + return -EIO; + } + return find_tree_dqentry(info, dquot, blks, 0); }
int qtree_read_dquot(struct qtree_mem_dqinfo *info, struct dquot *dquot) diff --git a/fs/quota/quota_v2.c b/fs/quota/quota_v2.c index b1467f3921c28..6921d40645a7e 100644 --- a/fs/quota/quota_v2.c +++ b/fs/quota/quota_v2.c @@ -166,14 +166,17 @@ static int v2_read_file_info(struct super_block *sb, int type) i_size_read(sb_dqopt(sb)->files[type])); goto out_free; } - if (qinfo->dqi_free_blk >= qinfo->dqi_blocks) { - quota_error(sb, "Free block number too big (%u >= %u).", - qinfo->dqi_free_blk, qinfo->dqi_blocks); + if (qinfo->dqi_free_blk && (qinfo->dqi_free_blk <= QT_TREEOFF || + qinfo->dqi_free_blk >= qinfo->dqi_blocks)) { + quota_error(sb, "Free block number %u out of range (%u, %u).", + qinfo->dqi_free_blk, QT_TREEOFF, qinfo->dqi_blocks); goto out_free; } - if (qinfo->dqi_free_entry >= qinfo->dqi_blocks) { - quota_error(sb, "Block with free entry too big (%u >= %u).", - qinfo->dqi_free_entry, qinfo->dqi_blocks); + if (qinfo->dqi_free_entry && (qinfo->dqi_free_entry <= QT_TREEOFF || + qinfo->dqi_free_entry >= qinfo->dqi_blocks)) { + quota_error(sb, "Block with free entry %u out of range (%u, %u).", + qinfo->dqi_free_entry, QT_TREEOFF, + qinfo->dqi_blocks); goto out_free; } ret = 0;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Allison Henderson allison.henderson@oracle.com
[ Upstream commit f1acf1ac84d2ae97b7889b87223c1064df850069 ]
Functions rds_still_queued and rds_clear_recv_queue lock a given socket in order to safely iterate over the incoming rds messages. However calling rds_inc_put while under this lock creates a potential deadlock. rds_inc_put may eventually call rds_message_purge, which will lock m_rs_lock. This is the incorrect locking order since m_rs_lock is meant to be locked before the socket. To fix this, we move the message item to a local list or variable that wont need rs_recv_lock protection. Then we can safely call rds_inc_put on any item stored locally after rs_recv_lock is released.
Fixes: bdbe6fbc6a2f ("RDS: recv.c") Reported-by: syzbot+f9db6ff27b9bfdcfeca0@syzkaller.appspotmail.com Reported-by: syzbot+dcd73ff9291e6d34b3ab@syzkaller.appspotmail.com Signed-off-by: Allison Henderson allison.henderson@oracle.com Link: https://lore.kernel.org/r/20240209022854.200292-1-allison.henderson@oracle.c... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/rds/recv.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/net/rds/recv.c b/net/rds/recv.c index 5b426dc3634d1..a316180d3c32e 100644 --- a/net/rds/recv.c +++ b/net/rds/recv.c @@ -424,6 +424,7 @@ static int rds_still_queued(struct rds_sock *rs, struct rds_incoming *inc, struct sock *sk = rds_rs_to_sk(rs); int ret = 0; unsigned long flags; + struct rds_incoming *to_drop = NULL;
write_lock_irqsave(&rs->rs_recv_lock, flags); if (!list_empty(&inc->i_item)) { @@ -434,11 +435,14 @@ static int rds_still_queued(struct rds_sock *rs, struct rds_incoming *inc, -be32_to_cpu(inc->i_hdr.h_len), inc->i_hdr.h_dport); list_del_init(&inc->i_item); - rds_inc_put(inc); + to_drop = inc; } } write_unlock_irqrestore(&rs->rs_recv_lock, flags);
+ if (to_drop) + rds_inc_put(to_drop); + rdsdebug("inc %p rs %p still %d dropped %d\n", inc, rs, ret, drop); return ret; } @@ -757,16 +761,21 @@ void rds_clear_recv_queue(struct rds_sock *rs) struct sock *sk = rds_rs_to_sk(rs); struct rds_incoming *inc, *tmp; unsigned long flags; + LIST_HEAD(to_drop);
write_lock_irqsave(&rs->rs_recv_lock, flags); list_for_each_entry_safe(inc, tmp, &rs->rs_recv_queue, i_item) { rds_recv_rcvbuf_delta(rs, sk, inc->i_conn->c_lcong, -be32_to_cpu(inc->i_hdr.h_len), inc->i_hdr.h_dport); + list_move(&inc->i_item, &to_drop); + } + write_unlock_irqrestore(&rs->rs_recv_lock, flags); + + list_for_each_entry_safe(inc, tmp, &to_drop, i_item) { list_del_init(&inc->i_item); rds_inc_put(inc); } - write_unlock_irqrestore(&rs->rs_recv_lock, flags); }
/*
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Antipov dmantipov@yandex.ru
[ Upstream commit 4e45170d9acc2d5ae8f545bf3f2f67504a361338 ]
In case of GSO, 'chunk->skb' pointer may point to an entry from fraglist created in 'sctp_packet_gso_append()'. To avoid freeing random fraglist entry (and so undefined behavior and/or memory leak), introduce 'sctp_inq_chunk_free()' helper to ensure that 'chunk->skb' is set to 'chunk->head_skb' (i.e. fraglist head) before calling 'sctp_chunk_free()', and use the aforementioned helper in 'sctp_inq_pop()' as well.
Reported-by: syzbot+8bb053b5d63595ab47db@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?id=0d8351bbe54fd04a492c2daab0164138db00804... Fixes: 90017accff61 ("sctp: Add GSO support") Suggested-by: Xin Long lucien.xin@gmail.com Signed-off-by: Dmitry Antipov dmantipov@yandex.ru Acked-by: Xin Long lucien.xin@gmail.com Link: https://lore.kernel.org/r/20240214082224.10168-1-dmantipov@yandex.ru Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/sctp/inqueue.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/net/sctp/inqueue.c b/net/sctp/inqueue.c index 7182c5a450fb5..5c16521818058 100644 --- a/net/sctp/inqueue.c +++ b/net/sctp/inqueue.c @@ -38,6 +38,14 @@ void sctp_inq_init(struct sctp_inq *queue) INIT_WORK(&queue->immediate, NULL); }
+/* Properly release the chunk which is being worked on. */ +static inline void sctp_inq_chunk_free(struct sctp_chunk *chunk) +{ + if (chunk->head_skb) + chunk->skb = chunk->head_skb; + sctp_chunk_free(chunk); +} + /* Release the memory associated with an SCTP inqueue. */ void sctp_inq_free(struct sctp_inq *queue) { @@ -53,7 +61,7 @@ void sctp_inq_free(struct sctp_inq *queue) * free it as well. */ if (queue->in_progress) { - sctp_chunk_free(queue->in_progress); + sctp_inq_chunk_free(queue->in_progress); queue->in_progress = NULL; } } @@ -130,9 +138,7 @@ struct sctp_chunk *sctp_inq_pop(struct sctp_inq *queue) goto new_skb; }
- if (chunk->head_skb) - chunk->skb = chunk->head_skb; - sctp_chunk_free(chunk); + sctp_inq_chunk_free(chunk); chunk = queue->in_progress = NULL; } else { /* Nothing to do. Next chunk in the packet, please. */
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gavrilov Ilia Ilia.Gavrilov@infotecs.ru
[ Upstream commit dc34ebd5c018b0edf47f39d11083ad8312733034 ]
syzbot reports a memory leak in pppoe_sendmsg [1].
The problem is in the pppoe_recvmsg() function that handles errors in the wrong order. For the skb_recv_datagram() function, check the pointer to skb for NULL first, and then check the 'error' variable, because the skb_recv_datagram() function can set 'error' to -EAGAIN in a loop but return a correct pointer to socket buffer after a number of attempts, though 'error' remains set to -EAGAIN.
skb_recv_datagram __skb_recv_datagram // Loop. if (err == -EAGAIN) then // go to the next loop iteration __skb_try_recv_datagram // if (skb != NULL) then return 'skb' // else if a signal is received then // return -EAGAIN
Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with Syzkaller.
Link: https://syzkaller.appspot.com/bug?extid=6bdfd184eac7709e5cc9 [1]
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+6bdfd184eac7709e5cc9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6bdfd184eac7709e5cc9 Signed-off-by: Gavrilov Ilia Ilia.Gavrilov@infotecs.ru Reviewed-by: Guillaume Nault gnault@redhat.com Link: https://lore.kernel.org/r/20240214085814.3894917-1-Ilia.Gavrilov@infotecs.ru Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ppp/pppoe.c | 23 +++++++++-------------- 1 file changed, 9 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c index ce2cbb5903d7b..c6f44af35889d 100644 --- a/drivers/net/ppp/pppoe.c +++ b/drivers/net/ppp/pppoe.c @@ -1007,26 +1007,21 @@ static int pppoe_recvmsg(struct socket *sock, struct msghdr *m, struct sk_buff *skb; int error = 0;
- if (sk->sk_state & PPPOX_BOUND) { - error = -EIO; - goto end; - } + if (sk->sk_state & PPPOX_BOUND) + return -EIO;
skb = skb_recv_datagram(sk, flags, &error); - if (error < 0) - goto end; + if (!skb) + return error;
- if (skb) { - total_len = min_t(size_t, total_len, skb->len); - error = skb_copy_datagram_msg(skb, 0, m, total_len); - if (error == 0) { - consume_skb(skb); - return total_len; - } + total_len = min_t(size_t, total_len, skb->len); + error = skb_copy_datagram_msg(skb, 0, m, total_len); + if (error == 0) { + consume_skb(skb); + return total_len; }
kfree_skb(skb); -end: return error; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
[ Upstream commit 94b9b9de05b62ac54d8766caa9865fb4d82cc47e ]
ieee80211_drop_unencrypted is called from ieee80211_rx_h_mesh_fwding and ieee80211_frame_allowed.
Since ieee80211_rx_h_mesh_fwding can forward packets for other mesh nodes and is called earlier, it needs to check the decryptions status and if the packet is using the control protocol on its own, instead of deferring to the later call from ieee80211_frame_allowed.
Because of that, ieee80211_drop_unencrypted has a mesh specific check that skips over the mesh header in order to check the payload protocol. This code is invalid when called from ieee80211_frame_allowed, since that happens after the 802.11->802.3 conversion.
Fix this by moving the mesh specific check directly into ieee80211_rx_h_mesh_fwding.
Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://lore.kernel.org/r/20221201135730.19723-1-nbd@nbd.name Signed-off-by: Johannes Berg johannes.berg@intel.com Stable-dep-of: 9ad797485692 ("wifi: cfg80211: check A-MSDU format more carefully") Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/rx.c | 38 ++++++++++---------------------------- 1 file changed, 10 insertions(+), 28 deletions(-)
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index c4c80037df91d..b68a9200403e7 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -2408,7 +2408,6 @@ static int ieee80211_802_1x_port_control(struct ieee80211_rx_data *rx)
static int ieee80211_drop_unencrypted(struct ieee80211_rx_data *rx, __le16 fc) { - struct ieee80211_hdr *hdr = (void *)rx->skb->data; struct sk_buff *skb = rx->skb; struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb);
@@ -2419,31 +2418,6 @@ static int ieee80211_drop_unencrypted(struct ieee80211_rx_data *rx, __le16 fc) if (status->flag & RX_FLAG_DECRYPTED) return 0;
- /* check mesh EAPOL frames first */ - if (unlikely(rx->sta && ieee80211_vif_is_mesh(&rx->sdata->vif) && - ieee80211_is_data(fc))) { - struct ieee80211s_hdr *mesh_hdr; - u16 hdr_len = ieee80211_hdrlen(fc); - u16 ethertype_offset; - __be16 ethertype; - - if (!ether_addr_equal(hdr->addr1, rx->sdata->vif.addr)) - goto drop_check; - - /* make sure fixed part of mesh header is there, also checks skb len */ - if (!pskb_may_pull(rx->skb, hdr_len + 6)) - goto drop_check; - - mesh_hdr = (struct ieee80211s_hdr *)(skb->data + hdr_len); - ethertype_offset = hdr_len + ieee80211_get_mesh_hdrlen(mesh_hdr) + - sizeof(rfc1042_header); - - if (skb_copy_bits(rx->skb, ethertype_offset, ðertype, 2) == 0 && - ethertype == rx->sdata->control_port_protocol) - return 0; - } - -drop_check: /* Drop unencrypted frames if key is set. */ if (unlikely(!ieee80211_has_protected(fc) && !ieee80211_is_any_nullfunc(fc) && @@ -2897,8 +2871,16 @@ ieee80211_rx_h_mesh_fwding(struct ieee80211_rx_data *rx) hdr = (struct ieee80211_hdr *) skb->data; mesh_hdr = (struct ieee80211s_hdr *) (skb->data + hdrlen);
- if (ieee80211_drop_unencrypted(rx, hdr->frame_control)) - return RX_DROP_MONITOR; + if (ieee80211_drop_unencrypted(rx, hdr->frame_control)) { + int offset = hdrlen + ieee80211_get_mesh_hdrlen(mesh_hdr) + + sizeof(rfc1042_header); + __be16 ethertype; + + if (!ether_addr_equal(hdr->addr1, rx->sdata->vif.addr) || + skb_copy_bits(rx->skb, offset, ðertype, 2) != 0 || + ethertype != rx->sdata->control_port_protocol) + return RX_DROP_MONITOR; + }
/* frame is in RMC, don't forward */ if (ieee80211_is_data(hdr->frame_control) &&
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
[ Upstream commit 0f690e6b4dcd7243e2805a76981b252c2d4bdce6 ]
When parsing the outer A-MSDU header, don't check for inner bridge tunnel or RFC1042 headers. This is handled by ieee80211_amsdu_to_8023s already.
Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://lore.kernel.org/r/20230213100855.34315-1-nbd@nbd.name Signed-off-by: Johannes Berg johannes.berg@intel.com Stable-dep-of: 9ad797485692 ("wifi: cfg80211: check A-MSDU format more carefully") Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/util.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/wireless/util.c b/net/wireless/util.c index 1665320d22146..4680e65460c85 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -631,8 +631,9 @@ int ieee80211_data_to_8023_exthdr(struct sk_buff *skb, struct ethhdr *ehdr, break; }
- if (likely(skb_copy_bits(skb, hdrlen, &payload, sizeof(payload)) == 0 && - ((!is_amsdu && ether_addr_equal(payload.hdr, rfc1042_header) && + if (likely(!is_amsdu && + skb_copy_bits(skb, hdrlen, &payload, sizeof(payload)) == 0 && + ((ether_addr_equal(payload.hdr, rfc1042_header) && payload.proto != htons(ETH_P_AARP) && payload.proto != htons(ETH_P_IPX)) || ether_addr_equal(payload.hdr, bridge_tunnel_header)))) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
[ Upstream commit 9f718554e7eacea62d3f972cae24d969755bf3b6 ]
The same check is done in multiple places, unify it.
Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://lore.kernel.org/r/20230213100855.34315-2-nbd@nbd.name Signed-off-by: Johannes Berg johannes.berg@intel.com Stable-dep-of: 9ad797485692 ("wifi: cfg80211: check A-MSDU format more carefully") Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/util.c | 34 ++++++++++++++++++---------------- 1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/net/wireless/util.c b/net/wireless/util.c index 4680e65460c85..8597694a0cfdb 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -542,6 +542,21 @@ unsigned int ieee80211_get_mesh_hdrlen(struct ieee80211s_hdr *meshhdr) } EXPORT_SYMBOL(ieee80211_get_mesh_hdrlen);
+static bool ieee80211_get_8023_tunnel_proto(const void *hdr, __be16 *proto) +{ + const __be16 *hdr_proto = hdr + ETH_ALEN; + + if (!(ether_addr_equal(hdr, rfc1042_header) && + *hdr_proto != htons(ETH_P_AARP) && + *hdr_proto != htons(ETH_P_IPX)) && + !ether_addr_equal(hdr, bridge_tunnel_header)) + return false; + + *proto = *hdr_proto; + + return true; +} + int ieee80211_data_to_8023_exthdr(struct sk_buff *skb, struct ethhdr *ehdr, const u8 *addr, enum nl80211_iftype iftype, u8 data_offset, bool is_amsdu) @@ -633,14 +648,9 @@ int ieee80211_data_to_8023_exthdr(struct sk_buff *skb, struct ethhdr *ehdr,
if (likely(!is_amsdu && skb_copy_bits(skb, hdrlen, &payload, sizeof(payload)) == 0 && - ((ether_addr_equal(payload.hdr, rfc1042_header) && - payload.proto != htons(ETH_P_AARP) && - payload.proto != htons(ETH_P_IPX)) || - ether_addr_equal(payload.hdr, bridge_tunnel_header)))) { - /* remove RFC1042 or Bridge-Tunnel encapsulation and - * replace EtherType */ + ieee80211_get_8023_tunnel_proto(&payload, &tmp.h_proto))) { + /* remove RFC1042 or Bridge-Tunnel encapsulation */ hdrlen += ETH_ALEN + 2; - tmp.h_proto = payload.proto; skb_postpull_rcsum(skb, &payload, ETH_ALEN + 2); } else { tmp.h_proto = htons(skb->len - hdrlen); @@ -756,8 +766,6 @@ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, { unsigned int hlen = ALIGN(extra_headroom, 4); struct sk_buff *frame = NULL; - u16 ethertype; - u8 *payload; int offset = 0, remaining; struct ethhdr eth; bool reuse_frag = skb->head_frag && !skb_has_frag_list(skb); @@ -811,14 +819,8 @@ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, frame->dev = skb->dev; frame->priority = skb->priority;
- payload = frame->data; - ethertype = (payload[6] << 8) | payload[7]; - if (likely((ether_addr_equal(payload, rfc1042_header) && - ethertype != ETH_P_AARP && ethertype != ETH_P_IPX) || - ether_addr_equal(payload, bridge_tunnel_header))) { - eth.h_proto = htons(ethertype); + if (likely(ieee80211_get_8023_tunnel_proto(frame->data, ð.h_proto))) skb_pull(frame, ETH_ALEN + 2); - }
memcpy(skb_push(frame, sizeof(eth)), ð, sizeof(eth)); __skb_queue_tail(list, frame);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
[ Upstream commit 5c1e269aa5ebafeec69b68ff560522faa5bcb6c1 ]
Now that all drivers use iTXQ, it does not make sense to check to drop tx forwarding packets when the driver has stopped the queues. fq_codel will take care of dropping packets when the queues fill up
Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://lore.kernel.org/r/20230213100855.34315-3-nbd@nbd.name Signed-off-by: Johannes Berg johannes.berg@intel.com Stable-dep-of: 9ad797485692 ("wifi: cfg80211: check A-MSDU format more carefully") Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/debugfs_netdev.c | 3 --- net/mac80211/ieee80211_i.h | 1 - net/mac80211/rx.c | 5 ----- 3 files changed, 9 deletions(-)
diff --git a/net/mac80211/debugfs_netdev.c b/net/mac80211/debugfs_netdev.c index 08a1d7564b7f2..8ced615add712 100644 --- a/net/mac80211/debugfs_netdev.c +++ b/net/mac80211/debugfs_netdev.c @@ -603,8 +603,6 @@ IEEE80211_IF_FILE(fwded_mcast, u.mesh.mshstats.fwded_mcast, DEC); IEEE80211_IF_FILE(fwded_unicast, u.mesh.mshstats.fwded_unicast, DEC); IEEE80211_IF_FILE(fwded_frames, u.mesh.mshstats.fwded_frames, DEC); IEEE80211_IF_FILE(dropped_frames_ttl, u.mesh.mshstats.dropped_frames_ttl, DEC); -IEEE80211_IF_FILE(dropped_frames_congestion, - u.mesh.mshstats.dropped_frames_congestion, DEC); IEEE80211_IF_FILE(dropped_frames_no_route, u.mesh.mshstats.dropped_frames_no_route, DEC);
@@ -741,7 +739,6 @@ static void add_mesh_stats(struct ieee80211_sub_if_data *sdata) MESHSTATS_ADD(fwded_frames); MESHSTATS_ADD(dropped_frames_ttl); MESHSTATS_ADD(dropped_frames_no_route); - MESHSTATS_ADD(dropped_frames_congestion); #undef MESHSTATS_ADD }
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h index 709eb7bfcf194..8a3af4144d3f0 100644 --- a/net/mac80211/ieee80211_i.h +++ b/net/mac80211/ieee80211_i.h @@ -327,7 +327,6 @@ struct mesh_stats { __u32 fwded_frames; /* Mesh total forwarded frames */ __u32 dropped_frames_ttl; /* Not transmitted since mesh_ttl == 0*/ __u32 dropped_frames_no_route; /* Not transmitted, no route found */ - __u32 dropped_frames_congestion;/* Not forwarded due to congestion */ };
#define PREQ_Q_F_START 0x1 diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index b68a9200403e7..1d50126aebbc8 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -2931,11 +2931,6 @@ ieee80211_rx_h_mesh_fwding(struct ieee80211_rx_data *rx) return RX_CONTINUE;
ac = ieee802_1d_to_ac[skb->priority]; - q = sdata->vif.hw_queue[ac]; - if (ieee80211_queue_stopped(&local->hw, q)) { - IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, dropped_frames_congestion); - return RX_DROP_MONITOR; - } skb_set_queue_mapping(skb, ac);
if (!--mesh_hdr->ttl) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
[ Upstream commit 986e43b19ae9176093da35e0a844e65c8bf9ede7 ]
The current mac80211 mesh A-MSDU receive path fails to parse A-MSDU packets on mesh interfaces, because it assumes that the Mesh Control field is always directly after the 802.11 header. 802.11-2020 9.3.2.2.2 Figure 9-70 shows that the Mesh Control field is actually part of the A-MSDU subframe header. This makes more sense, since it allows packets for multiple different destinations to be included in the same A-MSDU, as long as RA and TID are still the same. Another issue is the fact that the A-MSDU subframe length field was apparently accidentally defined as little-endian in the standard.
In order to fix this, the mesh forwarding path needs happen at a different point in the receive path.
ieee80211_data_to_8023_exthdr is changed to ignore the mesh control field and leave it in after the ethernet header. This also affects the source/dest MAC address fields, which now in the case of mesh point to the mesh SA/DA.
ieee80211_amsdu_to_8023s is changed to deal with the endian difference and to add the Mesh Control length to the subframe length, since it's not covered by the MSDU length field.
With these changes, the mac80211 will get the same packet structure for converted regular data packets and unpacked A-MSDU subframes.
The mesh forwarding checks are now only performed after the A-MSDU decap. For locally received packets, the Mesh Control header is stripped away. For forwarded packets, a new 802.11 header gets added.
Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://lore.kernel.org/r/20230213100855.34315-4-nbd@nbd.name [fix fortify build error] Signed-off-by: Johannes Berg johannes.berg@intel.com Stable-dep-of: 9ad797485692 ("wifi: cfg80211: check A-MSDU format more carefully") Signed-off-by: Sasha Levin sashal@kernel.org --- .../wireless/marvell/mwifiex/11n_rxreorder.c | 2 +- include/net/cfg80211.h | 27 +- net/mac80211/rx.c | 350 ++++++++++-------- net/wireless/util.c | 120 +++--- 4 files changed, 297 insertions(+), 202 deletions(-)
diff --git a/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c b/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c index 54ab8b54369ba..4ab3a14567b65 100644 --- a/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c +++ b/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c @@ -33,7 +33,7 @@ static int mwifiex_11n_dispatch_amsdu_pkt(struct mwifiex_private *priv, skb_trim(skb, le16_to_cpu(local_rx_pd->rx_pkt_length));
ieee80211_amsdu_to_8023s(skb, &list, priv->curr_addr, - priv->wdev.iftype, 0, NULL, NULL); + priv->wdev.iftype, 0, NULL, NULL, false);
while (!skb_queue_empty(&list)) { struct rx_packet_hdr *rx_hdr; diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index 5bf5c1ab542ce..c2f7d01b3a16e 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -6316,11 +6316,36 @@ static inline int ieee80211_data_to_8023(struct sk_buff *skb, const u8 *addr, * @extra_headroom: The hardware extra headroom for SKBs in the @list. * @check_da: DA to check in the inner ethernet header, or NULL * @check_sa: SA to check in the inner ethernet header, or NULL + * @mesh_control: A-MSDU subframe header includes the mesh control field */ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, const u8 *addr, enum nl80211_iftype iftype, const unsigned int extra_headroom, - const u8 *check_da, const u8 *check_sa); + const u8 *check_da, const u8 *check_sa, + bool mesh_control); + +/** + * ieee80211_get_8023_tunnel_proto - get RFC1042 or bridge tunnel encap protocol + * + * Check for RFC1042 or bridge tunnel header and fetch the encapsulated + * protocol. + * + * @hdr: pointer to the MSDU payload + * @proto: destination pointer to store the protocol + * Return: true if encapsulation was found + */ +bool ieee80211_get_8023_tunnel_proto(const void *hdr, __be16 *proto); + +/** + * ieee80211_strip_8023_mesh_hdr - strip mesh header from converted 802.3 frames + * + * Strip the mesh header, which was left in by ieee80211_data_to_8023 as part + * of the MSDU data. Also move any source/destination addresses from the mesh + * header to the ethernet header (if present). + * + * @skb: The 802.3 frame with embedded mesh header + */ +int ieee80211_strip_8023_mesh_hdr(struct sk_buff *skb);
/** * cfg80211_classify8021d - determine the 802.1p/1d tag for a data frame diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index 1d50126aebbc8..8d2379944f3de 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -2725,6 +2725,174 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx) } }
+static ieee80211_rx_result +ieee80211_rx_mesh_data(struct ieee80211_sub_if_data *sdata, struct sta_info *sta, + struct sk_buff *skb) +{ +#ifdef CONFIG_MAC80211_MESH + struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh; + struct ieee80211_local *local = sdata->local; + uint16_t fc = IEEE80211_FTYPE_DATA | IEEE80211_STYPE_QOS_DATA; + struct ieee80211_hdr hdr = { + .frame_control = cpu_to_le16(fc) + }; + struct ieee80211_hdr *fwd_hdr; + struct ieee80211s_hdr *mesh_hdr; + struct ieee80211_tx_info *info; + struct sk_buff *fwd_skb; + struct ethhdr *eth; + bool multicast; + int tailroom = 0; + int hdrlen, mesh_hdrlen; + u8 *qos; + + if (!ieee80211_vif_is_mesh(&sdata->vif)) + return RX_CONTINUE; + + if (!pskb_may_pull(skb, sizeof(*eth) + 6)) + return RX_DROP_MONITOR; + + mesh_hdr = (struct ieee80211s_hdr *)(skb->data + sizeof(*eth)); + mesh_hdrlen = ieee80211_get_mesh_hdrlen(mesh_hdr); + + if (!pskb_may_pull(skb, sizeof(*eth) + mesh_hdrlen)) + return RX_DROP_MONITOR; + + eth = (struct ethhdr *)skb->data; + multicast = is_multicast_ether_addr(eth->h_dest); + + mesh_hdr = (struct ieee80211s_hdr *)(eth + 1); + if (!mesh_hdr->ttl) + return RX_DROP_MONITOR; + + /* frame is in RMC, don't forward */ + if (is_multicast_ether_addr(eth->h_dest) && + mesh_rmc_check(sdata, eth->h_source, mesh_hdr)) + return RX_DROP_MONITOR; + + /* Frame has reached destination. Don't forward */ + if (ether_addr_equal(sdata->vif.addr, eth->h_dest)) + goto rx_accept; + + if (!ifmsh->mshcfg.dot11MeshForwarding) { + if (is_multicast_ether_addr(eth->h_dest)) + goto rx_accept; + + return RX_DROP_MONITOR; + } + + /* forward packet */ + if (sdata->crypto_tx_tailroom_needed_cnt) + tailroom = IEEE80211_ENCRYPT_TAILROOM; + + if (!--mesh_hdr->ttl) { + if (multicast) + goto rx_accept; + + IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, dropped_frames_ttl); + return RX_DROP_MONITOR; + } + + if (mesh_hdr->flags & MESH_FLAGS_AE) { + struct mesh_path *mppath; + char *proxied_addr; + + if (multicast) + proxied_addr = mesh_hdr->eaddr1; + else if ((mesh_hdr->flags & MESH_FLAGS_AE) == MESH_FLAGS_AE_A5_A6) + /* has_a4 already checked in ieee80211_rx_mesh_check */ + proxied_addr = mesh_hdr->eaddr2; + else + return RX_DROP_MONITOR; + + rcu_read_lock(); + mppath = mpp_path_lookup(sdata, proxied_addr); + if (!mppath) { + mpp_path_add(sdata, proxied_addr, eth->h_source); + } else { + spin_lock_bh(&mppath->state_lock); + if (!ether_addr_equal(mppath->mpp, eth->h_source)) + memcpy(mppath->mpp, eth->h_source, ETH_ALEN); + mppath->exp_time = jiffies; + spin_unlock_bh(&mppath->state_lock); + } + rcu_read_unlock(); + } + + skb_set_queue_mapping(skb, ieee802_1d_to_ac[skb->priority]); + + ieee80211_fill_mesh_addresses(&hdr, &hdr.frame_control, + eth->h_dest, eth->h_source); + hdrlen = ieee80211_hdrlen(hdr.frame_control); + if (multicast) { + int extra_head = sizeof(struct ieee80211_hdr) - sizeof(*eth); + + fwd_skb = skb_copy_expand(skb, local->tx_headroom + extra_head + + IEEE80211_ENCRYPT_HEADROOM, + tailroom, GFP_ATOMIC); + if (!fwd_skb) + goto rx_accept; + } else { + fwd_skb = skb; + skb = NULL; + + if (skb_cow_head(fwd_skb, hdrlen - sizeof(struct ethhdr))) + return RX_DROP_UNUSABLE; + } + + fwd_hdr = skb_push(fwd_skb, hdrlen - sizeof(struct ethhdr)); + memcpy(fwd_hdr, &hdr, hdrlen - 2); + qos = ieee80211_get_qos_ctl(fwd_hdr); + qos[0] = qos[1] = 0; + + skb_reset_mac_header(fwd_skb); + hdrlen += mesh_hdrlen; + if (ieee80211_get_8023_tunnel_proto(fwd_skb->data + hdrlen, + &fwd_skb->protocol)) + hdrlen += ETH_ALEN; + else + fwd_skb->protocol = htons(fwd_skb->len - hdrlen); + skb_set_network_header(fwd_skb, hdrlen); + + info = IEEE80211_SKB_CB(fwd_skb); + memset(info, 0, sizeof(*info)); + info->control.flags |= IEEE80211_TX_INTCFL_NEED_TXPROCESSING; + info->control.vif = &sdata->vif; + info->control.jiffies = jiffies; + if (multicast) { + IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, fwded_mcast); + memcpy(fwd_hdr->addr2, sdata->vif.addr, ETH_ALEN); + /* update power mode indication when forwarding */ + ieee80211_mps_set_frame_flags(sdata, NULL, fwd_hdr); + } else if (!mesh_nexthop_lookup(sdata, fwd_skb)) { + /* mesh power mode flags updated in mesh_nexthop_lookup */ + IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, fwded_unicast); + } else { + /* unable to resolve next hop */ + if (sta) + mesh_path_error_tx(sdata, ifmsh->mshcfg.element_ttl, + hdr.addr3, 0, + WLAN_REASON_MESH_PATH_NOFORWARD, + sta->sta.addr); + IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, dropped_frames_no_route); + kfree_skb(fwd_skb); + goto rx_accept; + } + + IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, fwded_frames); + fwd_skb->dev = sdata->dev; + ieee80211_add_pending_skb(local, fwd_skb); + +rx_accept: + if (!skb) + return RX_QUEUED; + + ieee80211_strip_8023_mesh_hdr(skb); +#endif + + return RX_CONTINUE; +} + static ieee80211_rx_result debug_noinline __ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx, u8 data_offset) { @@ -2733,8 +2901,10 @@ __ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx, u8 data_offset) struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data; __le16 fc = hdr->frame_control; struct sk_buff_head frame_list; + static ieee80211_rx_result res; struct ethhdr ethhdr; const u8 *check_da = ethhdr.h_dest, *check_sa = ethhdr.h_source; + bool mesh = false;
if (unlikely(ieee80211_has_a4(hdr->frame_control))) { check_da = NULL; @@ -2751,6 +2921,8 @@ __ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx, u8 data_offset) break; case NL80211_IFTYPE_MESH_POINT: check_sa = NULL; + check_da = NULL; + mesh = true; break; default: break; @@ -2768,17 +2940,29 @@ __ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx, u8 data_offset) ieee80211_amsdu_to_8023s(skb, &frame_list, dev->dev_addr, rx->sdata->vif.type, rx->local->hw.extra_tx_headroom, - check_da, check_sa); + check_da, check_sa, mesh);
while (!skb_queue_empty(&frame_list)) { rx->skb = __skb_dequeue(&frame_list);
- if (!ieee80211_frame_allowed(rx, fc)) { - dev_kfree_skb(rx->skb); + res = ieee80211_rx_mesh_data(rx->sdata, rx->sta, rx->skb); + switch (res) { + case RX_QUEUED: continue; + case RX_CONTINUE: + break; + default: + goto free; }
+ if (!ieee80211_frame_allowed(rx, fc)) + goto free; + ieee80211_deliver_skb(rx); + continue; + +free: + dev_kfree_skb(rx->skb); }
return RX_QUEUED; @@ -2811,6 +2995,8 @@ ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx) if (!rx->sdata->u.mgd.use_4addr) return RX_DROP_UNUSABLE; break; + case NL80211_IFTYPE_MESH_POINT: + break; default: return RX_DROP_UNUSABLE; } @@ -2839,155 +3025,6 @@ ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx) return __ieee80211_rx_h_amsdu(rx, 0); }
-#ifdef CONFIG_MAC80211_MESH -static ieee80211_rx_result -ieee80211_rx_h_mesh_fwding(struct ieee80211_rx_data *rx) -{ - struct ieee80211_hdr *fwd_hdr, *hdr; - struct ieee80211_tx_info *info; - struct ieee80211s_hdr *mesh_hdr; - struct sk_buff *skb = rx->skb, *fwd_skb; - struct ieee80211_local *local = rx->local; - struct ieee80211_sub_if_data *sdata = rx->sdata; - struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh; - u16 ac, q, hdrlen; - int tailroom = 0; - - hdr = (struct ieee80211_hdr *) skb->data; - hdrlen = ieee80211_hdrlen(hdr->frame_control); - - /* make sure fixed part of mesh header is there, also checks skb len */ - if (!pskb_may_pull(rx->skb, hdrlen + 6)) - return RX_DROP_MONITOR; - - mesh_hdr = (struct ieee80211s_hdr *) (skb->data + hdrlen); - - /* make sure full mesh header is there, also checks skb len */ - if (!pskb_may_pull(rx->skb, - hdrlen + ieee80211_get_mesh_hdrlen(mesh_hdr))) - return RX_DROP_MONITOR; - - /* reload pointers */ - hdr = (struct ieee80211_hdr *) skb->data; - mesh_hdr = (struct ieee80211s_hdr *) (skb->data + hdrlen); - - if (ieee80211_drop_unencrypted(rx, hdr->frame_control)) { - int offset = hdrlen + ieee80211_get_mesh_hdrlen(mesh_hdr) + - sizeof(rfc1042_header); - __be16 ethertype; - - if (!ether_addr_equal(hdr->addr1, rx->sdata->vif.addr) || - skb_copy_bits(rx->skb, offset, ðertype, 2) != 0 || - ethertype != rx->sdata->control_port_protocol) - return RX_DROP_MONITOR; - } - - /* frame is in RMC, don't forward */ - if (ieee80211_is_data(hdr->frame_control) && - is_multicast_ether_addr(hdr->addr1) && - mesh_rmc_check(rx->sdata, hdr->addr3, mesh_hdr)) - return RX_DROP_MONITOR; - - if (!ieee80211_is_data(hdr->frame_control)) - return RX_CONTINUE; - - if (!mesh_hdr->ttl) - return RX_DROP_MONITOR; - - if (mesh_hdr->flags & MESH_FLAGS_AE) { - struct mesh_path *mppath; - char *proxied_addr; - char *mpp_addr; - - if (is_multicast_ether_addr(hdr->addr1)) { - mpp_addr = hdr->addr3; - proxied_addr = mesh_hdr->eaddr1; - } else if ((mesh_hdr->flags & MESH_FLAGS_AE) == - MESH_FLAGS_AE_A5_A6) { - /* has_a4 already checked in ieee80211_rx_mesh_check */ - mpp_addr = hdr->addr4; - proxied_addr = mesh_hdr->eaddr2; - } else { - return RX_DROP_MONITOR; - } - - rcu_read_lock(); - mppath = mpp_path_lookup(sdata, proxied_addr); - if (!mppath) { - mpp_path_add(sdata, proxied_addr, mpp_addr); - } else { - spin_lock_bh(&mppath->state_lock); - if (!ether_addr_equal(mppath->mpp, mpp_addr)) - memcpy(mppath->mpp, mpp_addr, ETH_ALEN); - mppath->exp_time = jiffies; - spin_unlock_bh(&mppath->state_lock); - } - rcu_read_unlock(); - } - - /* Frame has reached destination. Don't forward */ - if (!is_multicast_ether_addr(hdr->addr1) && - ether_addr_equal(sdata->vif.addr, hdr->addr3)) - return RX_CONTINUE; - - ac = ieee802_1d_to_ac[skb->priority]; - skb_set_queue_mapping(skb, ac); - - if (!--mesh_hdr->ttl) { - if (!is_multicast_ether_addr(hdr->addr1)) - IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, - dropped_frames_ttl); - goto out; - } - - if (!ifmsh->mshcfg.dot11MeshForwarding) - goto out; - - if (sdata->crypto_tx_tailroom_needed_cnt) - tailroom = IEEE80211_ENCRYPT_TAILROOM; - - fwd_skb = skb_copy_expand(skb, local->tx_headroom + - IEEE80211_ENCRYPT_HEADROOM, - tailroom, GFP_ATOMIC); - if (!fwd_skb) - goto out; - - fwd_skb->dev = sdata->dev; - fwd_hdr = (struct ieee80211_hdr *) fwd_skb->data; - fwd_hdr->frame_control &= ~cpu_to_le16(IEEE80211_FCTL_RETRY); - info = IEEE80211_SKB_CB(fwd_skb); - memset(info, 0, sizeof(*info)); - info->control.flags |= IEEE80211_TX_INTCFL_NEED_TXPROCESSING; - info->control.vif = &rx->sdata->vif; - info->control.jiffies = jiffies; - if (is_multicast_ether_addr(fwd_hdr->addr1)) { - IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, fwded_mcast); - memcpy(fwd_hdr->addr2, sdata->vif.addr, ETH_ALEN); - /* update power mode indication when forwarding */ - ieee80211_mps_set_frame_flags(sdata, NULL, fwd_hdr); - } else if (!mesh_nexthop_lookup(sdata, fwd_skb)) { - /* mesh power mode flags updated in mesh_nexthop_lookup */ - IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, fwded_unicast); - } else { - /* unable to resolve next hop */ - mesh_path_error_tx(sdata, ifmsh->mshcfg.element_ttl, - fwd_hdr->addr3, 0, - WLAN_REASON_MESH_PATH_NOFORWARD, - fwd_hdr->addr2); - IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, dropped_frames_no_route); - kfree_skb(fwd_skb); - return RX_DROP_MONITOR; - } - - IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, fwded_frames); - ieee80211_add_pending_skb(local, fwd_skb); - out: - if (is_multicast_ether_addr(hdr->addr1)) - return RX_CONTINUE; - return RX_DROP_MONITOR; -} -#endif - static ieee80211_rx_result debug_noinline ieee80211_rx_h_data(struct ieee80211_rx_data *rx) { @@ -2996,6 +3033,7 @@ ieee80211_rx_h_data(struct ieee80211_rx_data *rx) struct net_device *dev = sdata->dev; struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)rx->skb->data; __le16 fc = hdr->frame_control; + static ieee80211_rx_result res; bool port_control; int err;
@@ -3022,6 +3060,10 @@ ieee80211_rx_h_data(struct ieee80211_rx_data *rx) if (unlikely(err)) return RX_DROP_UNUSABLE;
+ res = ieee80211_rx_mesh_data(rx->sdata, rx->sta, rx->skb); + if (res != RX_CONTINUE) + return res; + if (!ieee80211_frame_allowed(rx, fc)) return RX_DROP_MONITOR;
@@ -3996,10 +4038,6 @@ static void ieee80211_rx_handlers(struct ieee80211_rx_data *rx, CALL_RXH(ieee80211_rx_h_defragment); CALL_RXH(ieee80211_rx_h_michael_mic_verify); /* must be after MMIC verify so header is counted in MPDU mic */ -#ifdef CONFIG_MAC80211_MESH - if (ieee80211_vif_is_mesh(&rx->sdata->vif)) - CALL_RXH(ieee80211_rx_h_mesh_fwding); -#endif CALL_RXH(ieee80211_rx_h_amsdu); CALL_RXH(ieee80211_rx_h_data);
diff --git a/net/wireless/util.c b/net/wireless/util.c index 8597694a0cfdb..61a76f31fac89 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -542,7 +542,7 @@ unsigned int ieee80211_get_mesh_hdrlen(struct ieee80211s_hdr *meshhdr) } EXPORT_SYMBOL(ieee80211_get_mesh_hdrlen);
-static bool ieee80211_get_8023_tunnel_proto(const void *hdr, __be16 *proto) +bool ieee80211_get_8023_tunnel_proto(const void *hdr, __be16 *proto) { const __be16 *hdr_proto = hdr + ETH_ALEN;
@@ -556,6 +556,49 @@ static bool ieee80211_get_8023_tunnel_proto(const void *hdr, __be16 *proto)
return true; } +EXPORT_SYMBOL(ieee80211_get_8023_tunnel_proto); + +int ieee80211_strip_8023_mesh_hdr(struct sk_buff *skb) +{ + const void *mesh_addr; + struct { + struct ethhdr eth; + u8 flags; + } payload; + int hdrlen; + int ret; + + ret = skb_copy_bits(skb, 0, &payload, sizeof(payload)); + if (ret) + return ret; + + hdrlen = sizeof(payload.eth) + __ieee80211_get_mesh_hdrlen(payload.flags); + + if (likely(pskb_may_pull(skb, hdrlen + 8) && + ieee80211_get_8023_tunnel_proto(skb->data + hdrlen, + &payload.eth.h_proto))) + hdrlen += ETH_ALEN + 2; + else if (!pskb_may_pull(skb, hdrlen)) + return -EINVAL; + + mesh_addr = skb->data + sizeof(payload.eth) + ETH_ALEN; + switch (payload.flags & MESH_FLAGS_AE) { + case MESH_FLAGS_AE_A4: + memcpy(&payload.eth.h_source, mesh_addr, ETH_ALEN); + break; + case MESH_FLAGS_AE_A5_A6: + memcpy(&payload.eth, mesh_addr, 2 * ETH_ALEN); + break; + default: + break; + } + + pskb_pull(skb, hdrlen - sizeof(payload.eth)); + memcpy(skb->data, &payload.eth, sizeof(payload.eth)); + + return 0; +} +EXPORT_SYMBOL(ieee80211_strip_8023_mesh_hdr);
int ieee80211_data_to_8023_exthdr(struct sk_buff *skb, struct ethhdr *ehdr, const u8 *addr, enum nl80211_iftype iftype, @@ -568,7 +611,6 @@ int ieee80211_data_to_8023_exthdr(struct sk_buff *skb, struct ethhdr *ehdr, } payload; struct ethhdr tmp; u16 hdrlen; - u8 mesh_flags = 0;
if (unlikely(!ieee80211_is_data_present(hdr->frame_control))) return -1; @@ -589,12 +631,6 @@ int ieee80211_data_to_8023_exthdr(struct sk_buff *skb, struct ethhdr *ehdr, memcpy(tmp.h_dest, ieee80211_get_DA(hdr), ETH_ALEN); memcpy(tmp.h_source, ieee80211_get_SA(hdr), ETH_ALEN);
- if (iftype == NL80211_IFTYPE_MESH_POINT && - skb_copy_bits(skb, hdrlen, &mesh_flags, 1) < 0) - return -1; - - mesh_flags &= MESH_FLAGS_AE; - switch (hdr->frame_control & cpu_to_le16(IEEE80211_FCTL_TODS | IEEE80211_FCTL_FROMDS)) { case cpu_to_le16(IEEE80211_FCTL_TODS): @@ -608,17 +644,6 @@ int ieee80211_data_to_8023_exthdr(struct sk_buff *skb, struct ethhdr *ehdr, iftype != NL80211_IFTYPE_AP_VLAN && iftype != NL80211_IFTYPE_STATION)) return -1; - if (iftype == NL80211_IFTYPE_MESH_POINT) { - if (mesh_flags == MESH_FLAGS_AE_A4) - return -1; - if (mesh_flags == MESH_FLAGS_AE_A5_A6 && - skb_copy_bits(skb, hdrlen + - offsetof(struct ieee80211s_hdr, eaddr1), - tmp.h_dest, 2 * ETH_ALEN) < 0) - return -1; - - hdrlen += __ieee80211_get_mesh_hdrlen(mesh_flags); - } break; case cpu_to_le16(IEEE80211_FCTL_FROMDS): if ((iftype != NL80211_IFTYPE_STATION && @@ -627,16 +652,6 @@ int ieee80211_data_to_8023_exthdr(struct sk_buff *skb, struct ethhdr *ehdr, (is_multicast_ether_addr(tmp.h_dest) && ether_addr_equal(tmp.h_source, addr))) return -1; - if (iftype == NL80211_IFTYPE_MESH_POINT) { - if (mesh_flags == MESH_FLAGS_AE_A5_A6) - return -1; - if (mesh_flags == MESH_FLAGS_AE_A4 && - skb_copy_bits(skb, hdrlen + - offsetof(struct ieee80211s_hdr, eaddr1), - tmp.h_source, ETH_ALEN) < 0) - return -1; - hdrlen += __ieee80211_get_mesh_hdrlen(mesh_flags); - } break; case cpu_to_le16(0): if (iftype != NL80211_IFTYPE_ADHOC && @@ -646,7 +661,7 @@ int ieee80211_data_to_8023_exthdr(struct sk_buff *skb, struct ethhdr *ehdr, break; }
- if (likely(!is_amsdu && + if (likely(!is_amsdu && iftype != NL80211_IFTYPE_MESH_POINT && skb_copy_bits(skb, hdrlen, &payload, sizeof(payload)) == 0 && ieee80211_get_8023_tunnel_proto(&payload, &tmp.h_proto))) { /* remove RFC1042 or Bridge-Tunnel encapsulation */ @@ -722,7 +737,8 @@ __ieee80211_amsdu_copy_frag(struct sk_buff *skb, struct sk_buff *frame,
static struct sk_buff * __ieee80211_amsdu_copy(struct sk_buff *skb, unsigned int hlen, - int offset, int len, bool reuse_frag) + int offset, int len, bool reuse_frag, + int min_len) { struct sk_buff *frame; int cur_len = len; @@ -736,7 +752,7 @@ __ieee80211_amsdu_copy(struct sk_buff *skb, unsigned int hlen, * in the stack later. */ if (reuse_frag) - cur_len = min_t(int, len, 32); + cur_len = min_t(int, len, min_len);
/* * Allocate and reserve two bytes more for payload @@ -746,6 +762,7 @@ __ieee80211_amsdu_copy(struct sk_buff *skb, unsigned int hlen, if (!frame) return NULL;
+ frame->priority = skb->priority; skb_reserve(frame, hlen + sizeof(struct ethhdr) + 2); skb_copy_bits(skb, offset, skb_put(frame, cur_len), cur_len);
@@ -762,23 +779,37 @@ __ieee80211_amsdu_copy(struct sk_buff *skb, unsigned int hlen, void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, const u8 *addr, enum nl80211_iftype iftype, const unsigned int extra_headroom, - const u8 *check_da, const u8 *check_sa) + const u8 *check_da, const u8 *check_sa, + bool mesh_control) { unsigned int hlen = ALIGN(extra_headroom, 4); struct sk_buff *frame = NULL; int offset = 0, remaining; - struct ethhdr eth; + struct { + struct ethhdr eth; + uint8_t flags; + } hdr; bool reuse_frag = skb->head_frag && !skb_has_frag_list(skb); bool reuse_skb = false; bool last = false; + int copy_len = sizeof(hdr.eth); + + if (iftype == NL80211_IFTYPE_MESH_POINT) + copy_len = sizeof(hdr);
while (!last) { unsigned int subframe_len; - int len; + int len, mesh_len = 0; u8 padding;
- skb_copy_bits(skb, offset, ð, sizeof(eth)); - len = ntohs(eth.h_proto); + skb_copy_bits(skb, offset, &hdr, copy_len); + if (iftype == NL80211_IFTYPE_MESH_POINT) + mesh_len = __ieee80211_get_mesh_hdrlen(hdr.flags); + if (mesh_control) + len = le16_to_cpu(*(__le16 *)&hdr.eth.h_proto) + mesh_len; + else + len = ntohs(hdr.eth.h_proto); + subframe_len = sizeof(struct ethhdr) + len; padding = (4 - subframe_len) & 0x3;
@@ -787,16 +818,16 @@ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, if (subframe_len > remaining) goto purge; /* mitigate A-MSDU aggregation injection attacks */ - if (ether_addr_equal(eth.h_dest, rfc1042_header)) + if (ether_addr_equal(hdr.eth.h_dest, rfc1042_header)) goto purge;
offset += sizeof(struct ethhdr); last = remaining <= subframe_len + padding;
/* FIXME: should we really accept multicast DA? */ - if ((check_da && !is_multicast_ether_addr(eth.h_dest) && - !ether_addr_equal(check_da, eth.h_dest)) || - (check_sa && !ether_addr_equal(check_sa, eth.h_source))) { + if ((check_da && !is_multicast_ether_addr(hdr.eth.h_dest) && + !ether_addr_equal(check_da, hdr.eth.h_dest)) || + (check_sa && !ether_addr_equal(check_sa, hdr.eth.h_source))) { offset += len + padding; continue; } @@ -808,7 +839,7 @@ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, reuse_skb = true; } else { frame = __ieee80211_amsdu_copy(skb, hlen, offset, len, - reuse_frag); + reuse_frag, 32 + mesh_len); if (!frame) goto purge;
@@ -819,10 +850,11 @@ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, frame->dev = skb->dev; frame->priority = skb->priority;
- if (likely(ieee80211_get_8023_tunnel_proto(frame->data, ð.h_proto))) + if (likely(iftype != NL80211_IFTYPE_MESH_POINT && + ieee80211_get_8023_tunnel_proto(frame->data, &hdr.eth.h_proto))) skb_pull(frame, ETH_ALEN + 2);
- memcpy(skb_push(frame, sizeof(eth)), ð, sizeof(eth)); + memcpy(skb_push(frame, sizeof(hdr.eth)), &hdr.eth, sizeof(hdr.eth)); __skb_queue_tail(list, frame); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
[ Upstream commit 6e4c0d0460bd32ca9244dff3ba2d2da27235de11 ]
At least ath10k and ath11k supported hardware (maybe more) does not implement mesh A-MSDU aggregation in a standard compliant way. 802.11-2020 9.3.2.2.2 declares that the Mesh Control field is part of the A-MSDU header (and little-endian). As such, its length must not be included in the subframe length field. Hardware affected by this bug treats the mesh control field as part of the MSDU data and sets the length accordingly. In order to avoid packet loss, keep track of which stations are affected by this and take it into account when converting A-MSDU to 802.3 + mesh control packets.
Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://lore.kernel.org/r/20230213100855.34315-5-nbd@nbd.name Signed-off-by: Johannes Berg johannes.berg@intel.com Stable-dep-of: 9ad797485692 ("wifi: cfg80211: check A-MSDU format more carefully") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/cfg80211.h | 13 +++++++++++++ net/mac80211/rx.c | 15 ++++++++++++--- net/mac80211/sta_info.c | 3 +++ net/mac80211/sta_info.h | 1 + net/wireless/util.c | 32 ++++++++++++++++++++++++++++++++ 5 files changed, 61 insertions(+), 3 deletions(-)
diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index c2f7d01b3a16e..2a0fc4a64af1e 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -6301,6 +6301,19 @@ static inline int ieee80211_data_to_8023(struct sk_buff *skb, const u8 *addr, return ieee80211_data_to_8023_exthdr(skb, NULL, addr, iftype, 0, false); }
+/** + * ieee80211_is_valid_amsdu - check if subframe lengths of an A-MSDU are valid + * + * This is used to detect non-standard A-MSDU frames, e.g. the ones generated + * by ath10k and ath11k, where the subframe length includes the length of the + * mesh control field. + * + * @skb: The input A-MSDU frame without any headers. + * @mesh_hdr: use standard compliant mesh A-MSDU subframe header + * Returns: true if subframe header lengths are valid for the @mesh_hdr mode + */ +bool ieee80211_is_valid_amsdu(struct sk_buff *skb, bool mesh_hdr); + /** * ieee80211_amsdu_to_8023s - decode an IEEE 802.11n A-MSDU frame * diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index 8d2379944f3de..7cf1444c242d0 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -2904,7 +2904,6 @@ __ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx, u8 data_offset) static ieee80211_rx_result res; struct ethhdr ethhdr; const u8 *check_da = ethhdr.h_dest, *check_sa = ethhdr.h_source; - bool mesh = false;
if (unlikely(ieee80211_has_a4(hdr->frame_control))) { check_da = NULL; @@ -2922,7 +2921,6 @@ __ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx, u8 data_offset) case NL80211_IFTYPE_MESH_POINT: check_sa = NULL; check_da = NULL; - mesh = true; break; default: break; @@ -2937,10 +2935,21 @@ __ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx, u8 data_offset) data_offset, true)) return RX_DROP_UNUSABLE;
+ if (rx->sta && rx->sta->amsdu_mesh_control < 0) { + bool valid_std = ieee80211_is_valid_amsdu(skb, true); + bool valid_nonstd = ieee80211_is_valid_amsdu(skb, false); + + if (valid_std && !valid_nonstd) + rx->sta->amsdu_mesh_control = 1; + else if (valid_nonstd && !valid_std) + rx->sta->amsdu_mesh_control = 0; + } + ieee80211_amsdu_to_8023s(skb, &frame_list, dev->dev_addr, rx->sdata->vif.type, rx->local->hw.extra_tx_headroom, - check_da, check_sa, mesh); + check_da, check_sa, + rx->sta->amsdu_mesh_control);
while (!skb_queue_empty(&frame_list)) { rx->skb = __skb_dequeue(&frame_list); diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c index f388b39531748..91768abf2d75b 100644 --- a/net/mac80211/sta_info.c +++ b/net/mac80211/sta_info.c @@ -594,6 +594,9 @@ __sta_info_alloc(struct ieee80211_sub_if_data *sdata,
sta->sta_state = IEEE80211_STA_NONE;
+ if (sdata->vif.type == NL80211_IFTYPE_MESH_POINT) + sta->amsdu_mesh_control = -1; + /* Mark TID as unreserved */ sta->reserved_tid = IEEE80211_TID_UNRESERVED;
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h index 4809756a43dd1..dbf441a0ac6b6 100644 --- a/net/mac80211/sta_info.h +++ b/net/mac80211/sta_info.h @@ -706,6 +706,7 @@ struct sta_info { struct codel_params cparams;
u8 reserved_tid; + s8 amsdu_mesh_control;
struct cfg80211_chan_def tdls_chandef;
diff --git a/net/wireless/util.c b/net/wireless/util.c index 61a76f31fac89..4cf17c3c18392 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -776,6 +776,38 @@ __ieee80211_amsdu_copy(struct sk_buff *skb, unsigned int hlen, return frame; }
+bool ieee80211_is_valid_amsdu(struct sk_buff *skb, bool mesh_hdr) +{ + int offset = 0, remaining, subframe_len, padding; + + for (offset = 0; offset < skb->len; offset += subframe_len + padding) { + struct { + __be16 len; + u8 mesh_flags; + } hdr; + u16 len; + + if (skb_copy_bits(skb, offset + 2 * ETH_ALEN, &hdr, sizeof(hdr)) < 0) + return false; + + if (mesh_hdr) + len = le16_to_cpu(*(__le16 *)&hdr.len) + + __ieee80211_get_mesh_hdrlen(hdr.mesh_flags); + else + len = ntohs(hdr.len); + + subframe_len = sizeof(struct ethhdr) + len; + padding = (4 - subframe_len) & 0x3; + remaining = skb->len - offset; + + if (subframe_len > remaining) + return false; + } + + return true; +} +EXPORT_SYMBOL(ieee80211_is_valid_amsdu); + void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, const u8 *addr, enum nl80211_iftype iftype, const unsigned int extra_headroom,
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit 9ad7974856926129f190ffbe3beea78460b3b7cc ]
If it looks like there's another subframe in the A-MSDU but the header isn't fully there, we can end up reading data out of bounds, only to discard later. Make this a bit more careful and check if the subframe header can even be present.
Reported-by: syzbot+d050d437fe47d479d210@syzkaller.appspotmail.com Link: https://msgid.link/20240226203405.a731e2c95e38.I82ce7d8c0cc8970ce29d0a39fdc0... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/util.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/net/wireless/util.c b/net/wireless/util.c index 4cf17c3c18392..d1a4a9fd2bcba 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -778,15 +778,19 @@ __ieee80211_amsdu_copy(struct sk_buff *skb, unsigned int hlen,
bool ieee80211_is_valid_amsdu(struct sk_buff *skb, bool mesh_hdr) { - int offset = 0, remaining, subframe_len, padding; + int offset = 0, subframe_len, padding;
for (offset = 0; offset < skb->len; offset += subframe_len + padding) { + int remaining = skb->len - offset; struct { __be16 len; u8 mesh_flags; } hdr; u16 len;
+ if (sizeof(hdr) > remaining) + return false; + if (skb_copy_bits(skb, offset + 2 * ETH_ALEN, &hdr, sizeof(hdr)) < 0) return false;
@@ -798,7 +802,6 @@ bool ieee80211_is_valid_amsdu(struct sk_buff *skb, bool mesh_hdr)
subframe_len = sizeof(struct ethhdr) + len; padding = (4 - subframe_len) & 0x3; - remaining = skb->len - offset;
if (subframe_len > remaining) return false; @@ -816,7 +819,7 @@ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, { unsigned int hlen = ALIGN(extra_headroom, 4); struct sk_buff *frame = NULL; - int offset = 0, remaining; + int offset = 0; struct { struct ethhdr eth; uint8_t flags; @@ -830,10 +833,14 @@ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, copy_len = sizeof(hdr);
while (!last) { + int remaining = skb->len - offset; unsigned int subframe_len; int len, mesh_len = 0; u8 padding;
+ if (copy_len > remaining) + goto purge; + skb_copy_bits(skb, offset, &hdr, copy_len); if (iftype == NL80211_IFTYPE_MESH_POINT) mesh_len = __ieee80211_get_mesh_hdrlen(hdr.flags); @@ -846,7 +853,6 @@ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, padding = (4 - subframe_len) & 0x3;
/* the last MSDU has no padding */ - remaining = skb->len - offset; if (subframe_len > remaining) goto purge; /* mitigate A-MSDU aggregation injection attacks */
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Donald Hunter donald.hunter@gmail.com
[ Upstream commit 83177c0dca3811faa051124731a692609caee7c7 ]
Add documentation for BPF_MAP_TYPE_LPM_TRIE including kernel BPF helper usage, userspace usage and examples.
Signed-off-by: Donald Hunter donald.hunter@gmail.com Signed-off-by: Andrii Nakryiko andrii@kernel.org Link: https://lore.kernel.org/bpf/20221101114542.24481-2-donald.hunter@gmail.com Stable-dep-of: 59f2f841179a ("bpf: Avoid kfree_rcu() under lock in bpf_lpm_trie.") Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/bpf/map_lpm_trie.rst | 181 +++++++++++++++++++++++++++++ 1 file changed, 181 insertions(+) create mode 100644 Documentation/bpf/map_lpm_trie.rst
diff --git a/Documentation/bpf/map_lpm_trie.rst b/Documentation/bpf/map_lpm_trie.rst new file mode 100644 index 0000000000000..31be1aa7ba2cb --- /dev/null +++ b/Documentation/bpf/map_lpm_trie.rst @@ -0,0 +1,181 @@ +.. SPDX-License-Identifier: GPL-2.0-only +.. Copyright (C) 2022 Red Hat, Inc. + +===================== +BPF_MAP_TYPE_LPM_TRIE +===================== + +.. note:: + - ``BPF_MAP_TYPE_LPM_TRIE`` was introduced in kernel version 4.11 + +``BPF_MAP_TYPE_LPM_TRIE`` provides a longest prefix match algorithm that +can be used to match IP addresses to a stored set of prefixes. +Internally, data is stored in an unbalanced trie of nodes that uses +``prefixlen,data`` pairs as its keys. The ``data`` is interpreted in +network byte order, i.e. big endian, so ``data[0]`` stores the most +significant byte. + +LPM tries may be created with a maximum prefix length that is a multiple +of 8, in the range from 8 to 2048. The key used for lookup and update +operations is a ``struct bpf_lpm_trie_key``, extended by +``max_prefixlen/8`` bytes. + +- For IPv4 addresses the data length is 4 bytes +- For IPv6 addresses the data length is 16 bytes + +The value type stored in the LPM trie can be any user defined type. + +.. note:: + When creating a map of type ``BPF_MAP_TYPE_LPM_TRIE`` you must set the + ``BPF_F_NO_PREALLOC`` flag. + +Usage +===== + +Kernel BPF +---------- + +.. c:function:: + void *bpf_map_lookup_elem(struct bpf_map *map, const void *key) + +The longest prefix entry for a given data value can be found using the +``bpf_map_lookup_elem()`` helper. This helper returns a pointer to the +value associated with the longest matching ``key``, or ``NULL`` if no +entry was found. + +The ``key`` should have ``prefixlen`` set to ``max_prefixlen`` when +performing longest prefix lookups. For example, when searching for the +longest prefix match for an IPv4 address, ``prefixlen`` should be set to +``32``. + +.. c:function:: + long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags) + +Prefix entries can be added or updated using the ``bpf_map_update_elem()`` +helper. This helper replaces existing elements atomically. + +``bpf_map_update_elem()`` returns ``0`` on success, or negative error in +case of failure. + + .. note:: + The flags parameter must be one of BPF_ANY, BPF_NOEXIST or BPF_EXIST, + but the value is ignored, giving BPF_ANY semantics. + +.. c:function:: + long bpf_map_delete_elem(struct bpf_map *map, const void *key) + +Prefix entries can be deleted using the ``bpf_map_delete_elem()`` +helper. This helper will return 0 on success, or negative error in case +of failure. + +Userspace +--------- + +Access from userspace uses libbpf APIs with the same names as above, with +the map identified by ``fd``. + +.. c:function:: + int bpf_map_get_next_key (int fd, const void *cur_key, void *next_key) + +A userspace program can iterate through the entries in an LPM trie using +libbpf's ``bpf_map_get_next_key()`` function. The first key can be +fetched by calling ``bpf_map_get_next_key()`` with ``cur_key`` set to +``NULL``. Subsequent calls will fetch the next key that follows the +current key. ``bpf_map_get_next_key()`` returns ``0`` on success, +``-ENOENT`` if ``cur_key`` is the last key in the trie, or negative +error in case of failure. + +``bpf_map_get_next_key()`` will iterate through the LPM trie elements +from leftmost leaf first. This means that iteration will return more +specific keys before less specific ones. + +Examples +======== + +Please see ``tools/testing/selftests/bpf/test_lpm_map.c`` for examples +of LPM trie usage from userspace. The code snippets below demonstrate +API usage. + +Kernel BPF +---------- + +The following BPF code snippet shows how to declare a new LPM trie for IPv4 +address prefixes: + +.. code-block:: c + + #include <linux/bpf.h> + #include <bpf/bpf_helpers.h> + + struct ipv4_lpm_key { + __u32 prefixlen; + __u32 data; + }; + + struct { + __uint(type, BPF_MAP_TYPE_LPM_TRIE); + __type(key, struct ipv4_lpm_key); + __type(value, __u32); + __uint(map_flags, BPF_F_NO_PREALLOC); + __uint(max_entries, 255); + } ipv4_lpm_map SEC(".maps"); + +The following BPF code snippet shows how to lookup by IPv4 address: + +.. code-block:: c + + void *lookup(__u32 ipaddr) + { + struct ipv4_lpm_key key = { + .prefixlen = 32, + .data = ipaddr + }; + + return bpf_map_lookup_elem(&ipv4_lpm_map, &key); + } + +Userspace +--------- + +The following snippet shows how to insert an IPv4 prefix entry into an +LPM trie: + +.. code-block:: c + + int add_prefix_entry(int lpm_fd, __u32 addr, __u32 prefixlen, struct value *value) + { + struct ipv4_lpm_key ipv4_key = { + .prefixlen = prefixlen, + .data = addr + }; + return bpf_map_update_elem(lpm_fd, &ipv4_key, value, BPF_ANY); + } + +The following snippet shows a userspace program walking through the entries +of an LPM trie: + + +.. code-block:: c + + #include <bpf/libbpf.h> + #include <bpf/bpf.h> + + void iterate_lpm_trie(int map_fd) + { + struct ipv4_lpm_key *cur_key = NULL; + struct ipv4_lpm_key next_key; + struct value value; + int err; + + for (;;) { + err = bpf_map_get_next_key(map_fd, cur_key, &next_key); + if (err) + break; + + bpf_map_lookup_elem(map_fd, &next_key, &value); + + /* Use key and value here */ + + cur_key = &next_key; + } + }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
[ Upstream commit 896880ff30866f386ebed14ab81ce1ad3710cfc4 ]
Replace deprecated 0-length array in struct bpf_lpm_trie_key with flexible array. Found with GCC 13:
../kernel/bpf/lpm_trie.c:207:51: warning: array subscript i is outside array bounds of 'const __u8[0]' {aka 'const unsigned char[]'} [-Warray-bounds=] 207 | *(__be16 *)&key->data[i]); | ^~~~~~~~~~~~~ ../include/uapi/linux/swab.h:102:54: note: in definition of macro '__swab16' 102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x)) | ^ ../include/linux/byteorder/generic.h:97:21: note: in expansion of macro '__be16_to_cpu' 97 | #define be16_to_cpu __be16_to_cpu | ^~~~~~~~~~~~~ ../kernel/bpf/lpm_trie.c:206:28: note: in expansion of macro 'be16_to_cpu' 206 | u16 diff = be16_to_cpu(*(__be16 *)&node->data[i] ^ | ^~~~~~~~~~~ In file included from ../include/linux/bpf.h:7: ../include/uapi/linux/bpf.h:82:17: note: while referencing 'data' 82 | __u8 data[0]; /* Arbitrary size */ | ^~~~
And found at run-time under CONFIG_FORTIFY_SOURCE:
UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:218:49 index 0 is out of range for type '__u8 [*]'
Changing struct bpf_lpm_trie_key is difficult since has been used by userspace. For example, in Cilium:
struct egress_gw_policy_key { struct bpf_lpm_trie_key lpm_key; __u32 saddr; __u32 daddr; };
While direct references to the "data" member haven't been found, there are static initializers what include the final member. For example, the "{}" here:
struct egress_gw_policy_key in_key = { .lpm_key = { 32 + 24, {} }, .saddr = CLIENT_IP, .daddr = EXTERNAL_SVC_IP & 0Xffffff, };
To avoid the build time and run time warnings seen with a 0-sized trailing array for struct bpf_lpm_trie_key, introduce a new struct that correctly uses a flexible array for the trailing bytes, struct bpf_lpm_trie_key_u8. As part of this, include the "header" portion (which is just the "prefixlen" member), so it can be used by anything building a bpf_lpr_trie_key that has trailing members that aren't a u8 flexible array (like the self-test[1]), which is named struct bpf_lpm_trie_key_hdr.
Unfortunately, C++ refuses to parse the __struct_group() helper, so it is not possible to define struct bpf_lpm_trie_key_hdr directly in struct bpf_lpm_trie_key_u8, so we must open-code the union directly.
Adjust the kernel code to use struct bpf_lpm_trie_key_u8 through-out, and for the selftest to use struct bpf_lpm_trie_key_hdr. Add a comment to the UAPI header directing folks to the two new options.
Reported-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Gustavo A. R. Silva gustavoars@kernel.org Closes: https://paste.debian.net/hidden/ca500597/ Link: https://lore.kernel.org/all/202206281009.4332AA33@keescook/ [1] Link: https://lore.kernel.org/bpf/20240222155612.it.533-kees@kernel.org Stable-dep-of: 59f2f841179a ("bpf: Avoid kfree_rcu() under lock in bpf_lpm_trie.") Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/bpf/map_lpm_trie.rst | 2 +- include/uapi/linux/bpf.h | 19 +++++++++++++++++- kernel/bpf/lpm_trie.c | 20 +++++++++---------- samples/bpf/map_perf_test_user.c | 2 +- samples/bpf/xdp_router_ipv4_user.c | 2 +- tools/include/uapi/linux/bpf.h | 19 +++++++++++++++++- .../selftests/bpf/progs/map_ptr_kern.c | 2 +- tools/testing/selftests/bpf/test_lpm_map.c | 18 ++++++++--------- 8 files changed, 59 insertions(+), 25 deletions(-)
diff --git a/Documentation/bpf/map_lpm_trie.rst b/Documentation/bpf/map_lpm_trie.rst index 31be1aa7ba2cb..b4fce3f7c98ff 100644 --- a/Documentation/bpf/map_lpm_trie.rst +++ b/Documentation/bpf/map_lpm_trie.rst @@ -17,7 +17,7 @@ significant byte.
LPM tries may be created with a maximum prefix length that is a multiple of 8, in the range from 8 to 2048. The key used for lookup and update -operations is a ``struct bpf_lpm_trie_key``, extended by +operations is a ``struct bpf_lpm_trie_key_u8``, extended by ``max_prefixlen/8`` bytes.
- For IPv4 addresses the data length is 4 bytes diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index a17688011440e..58c7fc75da752 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -76,12 +76,29 @@ struct bpf_insn { __s32 imm; /* signed immediate constant */ };
-/* Key of an a BPF_MAP_TYPE_LPM_TRIE entry */ +/* Deprecated: use struct bpf_lpm_trie_key_u8 (when the "data" member is needed for + * byte access) or struct bpf_lpm_trie_key_hdr (when using an alternative type for + * the trailing flexible array member) instead. + */ struct bpf_lpm_trie_key { __u32 prefixlen; /* up to 32 for AF_INET, 128 for AF_INET6 */ __u8 data[0]; /* Arbitrary size */ };
+/* Header for bpf_lpm_trie_key structs */ +struct bpf_lpm_trie_key_hdr { + __u32 prefixlen; +}; + +/* Key of an a BPF_MAP_TYPE_LPM_TRIE entry, with trailing byte array. */ +struct bpf_lpm_trie_key_u8 { + union { + struct bpf_lpm_trie_key_hdr hdr; + __u32 prefixlen; + }; + __u8 data[]; /* Arbitrary size */ +}; + struct bpf_cgroup_storage_key { __u64 cgroup_inode_id; /* cgroup inode id */ __u32 attach_type; /* program attach type (enum bpf_attach_type) */ diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c index ce3a091d52e89..b80bffc59e5fb 100644 --- a/kernel/bpf/lpm_trie.c +++ b/kernel/bpf/lpm_trie.c @@ -164,13 +164,13 @@ static inline int extract_bit(const u8 *data, size_t index) */ static size_t longest_prefix_match(const struct lpm_trie *trie, const struct lpm_trie_node *node, - const struct bpf_lpm_trie_key *key) + const struct bpf_lpm_trie_key_u8 *key) { u32 limit = min(node->prefixlen, key->prefixlen); u32 prefixlen = 0, i = 0;
BUILD_BUG_ON(offsetof(struct lpm_trie_node, data) % sizeof(u32)); - BUILD_BUG_ON(offsetof(struct bpf_lpm_trie_key, data) % sizeof(u32)); + BUILD_BUG_ON(offsetof(struct bpf_lpm_trie_key_u8, data) % sizeof(u32));
#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && defined(CONFIG_64BIT)
@@ -229,7 +229,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key) { struct lpm_trie *trie = container_of(map, struct lpm_trie, map); struct lpm_trie_node *node, *found = NULL; - struct bpf_lpm_trie_key *key = _key; + struct bpf_lpm_trie_key_u8 *key = _key;
if (key->prefixlen > trie->max_prefixlen) return NULL; @@ -309,7 +309,7 @@ static int trie_update_elem(struct bpf_map *map, struct lpm_trie *trie = container_of(map, struct lpm_trie, map); struct lpm_trie_node *node, *im_node = NULL, *new_node = NULL; struct lpm_trie_node __rcu **slot; - struct bpf_lpm_trie_key *key = _key; + struct bpf_lpm_trie_key_u8 *key = _key; unsigned long irq_flags; unsigned int next_bit; size_t matchlen = 0; @@ -437,7 +437,7 @@ static int trie_update_elem(struct bpf_map *map, static int trie_delete_elem(struct bpf_map *map, void *_key) { struct lpm_trie *trie = container_of(map, struct lpm_trie, map); - struct bpf_lpm_trie_key *key = _key; + struct bpf_lpm_trie_key_u8 *key = _key; struct lpm_trie_node __rcu **trim, **trim2; struct lpm_trie_node *node, *parent; unsigned long irq_flags; @@ -536,7 +536,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key) sizeof(struct lpm_trie_node)) #define LPM_VAL_SIZE_MIN 1
-#define LPM_KEY_SIZE(X) (sizeof(struct bpf_lpm_trie_key) + (X)) +#define LPM_KEY_SIZE(X) (sizeof(struct bpf_lpm_trie_key_u8) + (X)) #define LPM_KEY_SIZE_MAX LPM_KEY_SIZE(LPM_DATA_SIZE_MAX) #define LPM_KEY_SIZE_MIN LPM_KEY_SIZE(LPM_DATA_SIZE_MIN)
@@ -568,7 +568,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr) /* copy mandatory map attributes */ bpf_map_init_from_attr(&trie->map, attr); trie->data_size = attr->key_size - - offsetof(struct bpf_lpm_trie_key, data); + offsetof(struct bpf_lpm_trie_key_u8, data); trie->max_prefixlen = trie->data_size * 8;
spin_lock_init(&trie->lock); @@ -619,7 +619,7 @@ static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key) { struct lpm_trie_node *node, *next_node = NULL, *parent, *search_root; struct lpm_trie *trie = container_of(map, struct lpm_trie, map); - struct bpf_lpm_trie_key *key = _key, *next_key = _next_key; + struct bpf_lpm_trie_key_u8 *key = _key, *next_key = _next_key; struct lpm_trie_node **node_stack = NULL; int err = 0, stack_ptr = -1; unsigned int next_bit; @@ -706,7 +706,7 @@ static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key) } do_copy: next_key->prefixlen = next_node->prefixlen; - memcpy((void *)next_key + offsetof(struct bpf_lpm_trie_key, data), + memcpy((void *)next_key + offsetof(struct bpf_lpm_trie_key_u8, data), next_node->data, trie->data_size); free_stack: kfree(node_stack); @@ -718,7 +718,7 @@ static int trie_check_btf(const struct bpf_map *map, const struct btf_type *key_type, const struct btf_type *value_type) { - /* Keys must have struct bpf_lpm_trie_key embedded. */ + /* Keys must have struct bpf_lpm_trie_key_u8 embedded. */ return BTF_INFO_KIND(key_type->info) != BTF_KIND_STRUCT ? -EINVAL : 0; } diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c index 1bb53f4b29e11..cb5c776103b99 100644 --- a/samples/bpf/map_perf_test_user.c +++ b/samples/bpf/map_perf_test_user.c @@ -370,7 +370,7 @@ static void run_perf_test(int tasks)
static void fill_lpm_trie(void) { - struct bpf_lpm_trie_key *key; + struct bpf_lpm_trie_key_u8 *key; unsigned long value = 0; unsigned int i; int r; diff --git a/samples/bpf/xdp_router_ipv4_user.c b/samples/bpf/xdp_router_ipv4_user.c index 683913bbf2797..28bae295d0ed1 100644 --- a/samples/bpf/xdp_router_ipv4_user.c +++ b/samples/bpf/xdp_router_ipv4_user.c @@ -91,7 +91,7 @@ static int recv_msg(struct sockaddr_nl sock_addr, int sock) static void read_route(struct nlmsghdr *nh, int nll) { char dsts[24], gws[24], ifs[16], dsts_len[24], metrics[24]; - struct bpf_lpm_trie_key *prefix_key; + struct bpf_lpm_trie_key_u8 *prefix_key; struct rtattr *rt_attr; struct rtmsg *rt_msg; int rtm_family; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index a17688011440e..58c7fc75da752 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -76,12 +76,29 @@ struct bpf_insn { __s32 imm; /* signed immediate constant */ };
-/* Key of an a BPF_MAP_TYPE_LPM_TRIE entry */ +/* Deprecated: use struct bpf_lpm_trie_key_u8 (when the "data" member is needed for + * byte access) or struct bpf_lpm_trie_key_hdr (when using an alternative type for + * the trailing flexible array member) instead. + */ struct bpf_lpm_trie_key { __u32 prefixlen; /* up to 32 for AF_INET, 128 for AF_INET6 */ __u8 data[0]; /* Arbitrary size */ };
+/* Header for bpf_lpm_trie_key structs */ +struct bpf_lpm_trie_key_hdr { + __u32 prefixlen; +}; + +/* Key of an a BPF_MAP_TYPE_LPM_TRIE entry, with trailing byte array. */ +struct bpf_lpm_trie_key_u8 { + union { + struct bpf_lpm_trie_key_hdr hdr; + __u32 prefixlen; + }; + __u8 data[]; /* Arbitrary size */ +}; + struct bpf_cgroup_storage_key { __u64 cgroup_inode_id; /* cgroup inode id */ __u32 attach_type; /* program attach type (enum bpf_attach_type) */ diff --git a/tools/testing/selftests/bpf/progs/map_ptr_kern.c b/tools/testing/selftests/bpf/progs/map_ptr_kern.c index db388f593d0a2..96eed198af361 100644 --- a/tools/testing/selftests/bpf/progs/map_ptr_kern.c +++ b/tools/testing/selftests/bpf/progs/map_ptr_kern.c @@ -311,7 +311,7 @@ struct lpm_trie { } __attribute__((preserve_access_index));
struct lpm_key { - struct bpf_lpm_trie_key trie_key; + struct bpf_lpm_trie_key_hdr trie_key; __u32 data; };
diff --git a/tools/testing/selftests/bpf/test_lpm_map.c b/tools/testing/selftests/bpf/test_lpm_map.c index c028d621c744d..d98c72dc563ea 100644 --- a/tools/testing/selftests/bpf/test_lpm_map.c +++ b/tools/testing/selftests/bpf/test_lpm_map.c @@ -211,7 +211,7 @@ static void test_lpm_map(int keysize) volatile size_t n_matches, n_matches_after_delete; size_t i, j, n_nodes, n_lookups; struct tlpm_node *t, *list = NULL; - struct bpf_lpm_trie_key *key; + struct bpf_lpm_trie_key_u8 *key; uint8_t *data, *value; int r, map;
@@ -331,8 +331,8 @@ static void test_lpm_map(int keysize) static void test_lpm_ipaddr(void) { LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_NO_PREALLOC); - struct bpf_lpm_trie_key *key_ipv4; - struct bpf_lpm_trie_key *key_ipv6; + struct bpf_lpm_trie_key_u8 *key_ipv4; + struct bpf_lpm_trie_key_u8 *key_ipv6; size_t key_size_ipv4; size_t key_size_ipv6; int map_fd_ipv4; @@ -423,7 +423,7 @@ static void test_lpm_ipaddr(void) static void test_lpm_delete(void) { LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_NO_PREALLOC); - struct bpf_lpm_trie_key *key; + struct bpf_lpm_trie_key_u8 *key; size_t key_size; int map_fd; __u64 value; @@ -532,7 +532,7 @@ static void test_lpm_delete(void) static void test_lpm_get_next_key(void) { LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_NO_PREALLOC); - struct bpf_lpm_trie_key *key_p, *next_key_p; + struct bpf_lpm_trie_key_u8 *key_p, *next_key_p; size_t key_size; __u32 value = 0; int map_fd; @@ -693,9 +693,9 @@ static void *lpm_test_command(void *arg) { int i, j, ret, iter, key_size; struct lpm_mt_test_info *info = arg; - struct bpf_lpm_trie_key *key_p; + struct bpf_lpm_trie_key_u8 *key_p;
- key_size = sizeof(struct bpf_lpm_trie_key) + sizeof(__u32); + key_size = sizeof(*key_p) + sizeof(__u32); key_p = alloca(key_size); for (iter = 0; iter < info->iter; iter++) for (i = 0; i < MAX_TEST_KEYS; i++) { @@ -717,7 +717,7 @@ static void *lpm_test_command(void *arg) ret = bpf_map_lookup_elem(info->map_fd, key_p, &value); assert(ret == 0 || errno == ENOENT); } else { - struct bpf_lpm_trie_key *next_key_p = alloca(key_size); + struct bpf_lpm_trie_key_u8 *next_key_p = alloca(key_size); ret = bpf_map_get_next_key(info->map_fd, key_p, next_key_p); assert(ret == 0 || errno == ENOENT || errno == ENOMEM); } @@ -752,7 +752,7 @@ static void test_lpm_multi_thread(void)
/* create a trie */ value_size = sizeof(__u32); - key_size = sizeof(struct bpf_lpm_trie_key) + value_size; + key_size = sizeof(struct bpf_lpm_trie_key_hdr) + value_size; map_fd = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE, NULL, key_size, value_size, 100, &opts);
/* create 4 threads to test update, delete, lookup and get_next_key */
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexei Starovoitov ast@kernel.org
[ Upstream commit 59f2f841179aa6a0899cb9cf53659149a35749b7 ]
syzbot reported the following lock sequence: cpu 2: grabs timer_base lock spins on bpf_lpm lock
cpu 1: grab rcu krcp lock spins on timer_base lock
cpu 0: grab bpf_lpm lock spins on rcu krcp lock
bpf_lpm lock can be the same. timer_base lock can also be the same due to timer migration. but rcu krcp lock is always per-cpu, so it cannot be the same lock. Hence it's a false positive. To avoid lockdep complaining move kfree_rcu() after spin_unlock.
Reported-by: syzbot+1fa663a2100308ab6eab@syzkaller.appspotmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Andrii Nakryiko andrii@kernel.org Link: https://lore.kernel.org/bpf/20240329171439.37813-1-alexei.starovoitov@gmail.... Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/lpm_trie.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c index b80bffc59e5fb..37b510d91b810 100644 --- a/kernel/bpf/lpm_trie.c +++ b/kernel/bpf/lpm_trie.c @@ -308,6 +308,7 @@ static int trie_update_elem(struct bpf_map *map, { struct lpm_trie *trie = container_of(map, struct lpm_trie, map); struct lpm_trie_node *node, *im_node = NULL, *new_node = NULL; + struct lpm_trie_node *free_node = NULL; struct lpm_trie_node __rcu **slot; struct bpf_lpm_trie_key_u8 *key = _key; unsigned long irq_flags; @@ -382,7 +383,7 @@ static int trie_update_elem(struct bpf_map *map, trie->n_entries--;
rcu_assign_pointer(*slot, new_node); - kfree_rcu(node, rcu); + free_node = node;
goto out; } @@ -429,6 +430,7 @@ static int trie_update_elem(struct bpf_map *map, }
spin_unlock_irqrestore(&trie->lock, irq_flags); + kfree_rcu(free_node, rcu);
return ret; } @@ -437,6 +439,7 @@ static int trie_update_elem(struct bpf_map *map, static int trie_delete_elem(struct bpf_map *map, void *_key) { struct lpm_trie *trie = container_of(map, struct lpm_trie, map); + struct lpm_trie_node *free_node = NULL, *free_parent = NULL; struct bpf_lpm_trie_key_u8 *key = _key; struct lpm_trie_node __rcu **trim, **trim2; struct lpm_trie_node *node, *parent; @@ -506,8 +509,8 @@ static int trie_delete_elem(struct bpf_map *map, void *_key) else rcu_assign_pointer( *trim2, rcu_access_pointer(parent->child[0])); - kfree_rcu(parent, rcu); - kfree_rcu(node, rcu); + free_parent = parent; + free_node = node; goto out; }
@@ -521,10 +524,12 @@ static int trie_delete_elem(struct bpf_map *map, void *_key) rcu_assign_pointer(*trim, rcu_access_pointer(node->child[1])); else RCU_INIT_POINTER(*trim, NULL); - kfree_rcu(node, rcu); + free_node = node;
out: spin_unlock_irqrestore(&trie->lock, irq_flags); + kfree_rcu(free_parent, rcu); + kfree_rcu(free_node, rcu);
return ret; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit a97de7bff13b1cc825c1b1344eaed8d6c2d3e695 ]
syzbot reported rfcomm_sock_setsockopt_old() is copying data without checking user input length.
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline] BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline] BUG: KASAN: slab-out-of-bounds in rfcomm_sock_setsockopt_old net/bluetooth/rfcomm/sock.c:632 [inline] BUG: KASAN: slab-out-of-bounds in rfcomm_sock_setsockopt+0x893/0xa70 net/bluetooth/rfcomm/sock.c:673 Read of size 4 at addr ffff8880209a8bc3 by task syz-executor632/5064
Fixes: 9f2c8a03fbb3 ("Bluetooth: Replace RFCOMM link mode with security level") Fixes: bb23c0ab8246 ("Bluetooth: Add support for deferring RFCOMM connection setup") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/rfcomm/sock.c | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c index b54e8a530f55a..29aa07e9db9d7 100644 --- a/net/bluetooth/rfcomm/sock.c +++ b/net/bluetooth/rfcomm/sock.c @@ -629,7 +629,7 @@ static int rfcomm_sock_setsockopt_old(struct socket *sock, int optname,
switch (optname) { case RFCOMM_LM: - if (copy_from_sockptr(&opt, optval, sizeof(u32))) { + if (bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen)) { err = -EFAULT; break; } @@ -664,7 +664,6 @@ static int rfcomm_sock_setsockopt(struct socket *sock, int level, int optname, struct sock *sk = sock->sk; struct bt_security sec; int err = 0; - size_t len; u32 opt;
BT_DBG("sk %p", sk); @@ -686,11 +685,9 @@ static int rfcomm_sock_setsockopt(struct socket *sock, int level, int optname,
sec.level = BT_SECURITY_LOW;
- len = min_t(unsigned int, sizeof(sec), optlen); - if (copy_from_sockptr(&sec, optval, len)) { - err = -EFAULT; + err = bt_copy_from_sockptr(&sec, sizeof(sec), optval, optlen); + if (err) break; - }
if (sec.level > BT_SECURITY_HIGH) { err = -EINVAL; @@ -706,10 +703,9 @@ static int rfcomm_sock_setsockopt(struct socket *sock, int level, int optname, break; }
- if (copy_from_sockptr(&opt, optval, sizeof(u32))) { - err = -EFAULT; + err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + if (err) break; - }
if (opt) set_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Li Zhong floridsleeves@gmail.com
[ Upstream commit 56d0d0b9289dae041becc7ee6bd966a00dd610e0 ]
Check the return value of ext4_xattr_inode_dec_ref(), which could return error code and need to be warned.
Signed-off-by: Li Zhong floridsleeves@gmail.com Link: https://lore.kernel.org/r/20220917002816.3804400-1-floridsleeves@gmail.com Signed-off-by: Theodore Ts'o tytso@mit.edu Stable-dep-of: 0a46ef234756 ("ext4: do not create EA inode under buffer lock") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/xattr.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index f0a45d3ec4ebb..0df0a3ecba37a 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -1550,7 +1550,8 @@ static int ext4_xattr_inode_lookup_create(handle_t *handle, struct inode *inode,
err = ext4_xattr_inode_write(handle, ea_inode, value, value_len); if (err) { - ext4_xattr_inode_dec_ref(handle, ea_inode); + if (ext4_xattr_inode_dec_ref(handle, ea_inode)) + ext4_warning_inode(ea_inode, "cleanup dec ref error %d", err); iput(ea_inode); return err; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Kara jack@suse.cz
[ Upstream commit 8208c41c43ad5e9b63dce6c45a73e326109ca658 ]
When allocating EA inode, quota accounting is done just before ext4_xattr_inode_lookup_create(). Logically these two operations belong together so just fold quota accounting into ext4_xattr_inode_lookup_create(). We also make ext4_xattr_inode_lookup_create() return the looked up / created inode to convert the function to a more standard calling convention.
Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20240209112107.10585-1-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Stable-dep-of: 0a46ef234756 ("ext4: do not create EA inode under buffer lock") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/xattr.c | 50 ++++++++++++++++++++++++------------------------- 1 file changed, 24 insertions(+), 26 deletions(-)
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 0df0a3ecba37a..b18035b8887be 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -1522,46 +1522,49 @@ ext4_xattr_inode_cache_find(struct inode *inode, const void *value, /* * Add value of the EA in an inode. */ -static int ext4_xattr_inode_lookup_create(handle_t *handle, struct inode *inode, - const void *value, size_t value_len, - struct inode **ret_inode) +static struct inode *ext4_xattr_inode_lookup_create(handle_t *handle, + struct inode *inode, const void *value, size_t value_len) { struct inode *ea_inode; u32 hash; int err;
+ /* Account inode & space to quota even if sharing... */ + err = ext4_xattr_inode_alloc_quota(inode, value_len); + if (err) + return ERR_PTR(err); + hash = ext4_xattr_inode_hash(EXT4_SB(inode->i_sb), value, value_len); ea_inode = ext4_xattr_inode_cache_find(inode, value, value_len, hash); if (ea_inode) { err = ext4_xattr_inode_inc_ref(handle, ea_inode); - if (err) { - iput(ea_inode); - return err; - } - - *ret_inode = ea_inode; - return 0; + if (err) + goto out_err; + return ea_inode; }
/* Create an inode for the EA value */ ea_inode = ext4_xattr_inode_create(handle, inode, hash); - if (IS_ERR(ea_inode)) - return PTR_ERR(ea_inode); + if (IS_ERR(ea_inode)) { + ext4_xattr_inode_free_quota(inode, NULL, value_len); + return ea_inode; + }
err = ext4_xattr_inode_write(handle, ea_inode, value, value_len); if (err) { if (ext4_xattr_inode_dec_ref(handle, ea_inode)) ext4_warning_inode(ea_inode, "cleanup dec ref error %d", err); - iput(ea_inode); - return err; + goto out_err; }
if (EA_INODE_CACHE(inode)) mb_cache_entry_create(EA_INODE_CACHE(inode), GFP_NOFS, hash, ea_inode->i_ino, true /* reusable */); - - *ret_inode = ea_inode; - return 0; + return ea_inode; +out_err: + iput(ea_inode); + ext4_xattr_inode_free_quota(inode, NULL, value_len); + return ERR_PTR(err); }
/* @@ -1669,16 +1672,11 @@ static int ext4_xattr_set_entry(struct ext4_xattr_info *i, if (i->value && in_inode) { WARN_ON_ONCE(!i->value_len);
- ret = ext4_xattr_inode_alloc_quota(inode, i->value_len); - if (ret) - goto out; - - ret = ext4_xattr_inode_lookup_create(handle, inode, i->value, - i->value_len, - &new_ea_inode); - if (ret) { + new_ea_inode = ext4_xattr_inode_lookup_create(handle, inode, + i->value, i->value_len); + if (IS_ERR(new_ea_inode)) { + ret = PTR_ERR(new_ea_inode); new_ea_inode = NULL; - ext4_xattr_inode_free_quota(inode, NULL, i->value_len); goto out; } }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Kara jack@suse.cz
[ Upstream commit 0a46ef234756dca04623b7591e8ebb3440622f0b ]
ext4_xattr_set_entry() creates new EA inodes while holding buffer lock on the external xattr block. This is problematic as it nests all the allocation locking (which acquires locks on other buffers) under the buffer lock. This can even deadlock when the filesystem is corrupted and e.g. quota file is setup to contain xattr block as data block. Move the allocation of EA inode out of ext4_xattr_set_entry() into the callers.
Reported-by: syzbot+a43d4f48b8397d0e41a9@syzkaller.appspotmail.com Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20240321162657.27420-2-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/xattr.c | 113 +++++++++++++++++++++++------------------------- 1 file changed, 53 insertions(+), 60 deletions(-)
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index b18035b8887be..d94b1a6c60e27 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -1576,6 +1576,7 @@ static struct inode *ext4_xattr_inode_lookup_create(handle_t *handle, static int ext4_xattr_set_entry(struct ext4_xattr_info *i, struct ext4_xattr_search *s, handle_t *handle, struct inode *inode, + struct inode *new_ea_inode, bool is_block) { struct ext4_xattr_entry *last, *next; @@ -1583,7 +1584,6 @@ static int ext4_xattr_set_entry(struct ext4_xattr_info *i, size_t min_offs = s->end - s->base, name_len = strlen(i->name); int in_inode = i->in_inode; struct inode *old_ea_inode = NULL; - struct inode *new_ea_inode = NULL; size_t old_size, new_size; int ret;
@@ -1668,38 +1668,11 @@ static int ext4_xattr_set_entry(struct ext4_xattr_info *i, old_ea_inode = NULL; goto out; } - } - if (i->value && in_inode) { - WARN_ON_ONCE(!i->value_len); - - new_ea_inode = ext4_xattr_inode_lookup_create(handle, inode, - i->value, i->value_len); - if (IS_ERR(new_ea_inode)) { - ret = PTR_ERR(new_ea_inode); - new_ea_inode = NULL; - goto out; - } - }
- if (old_ea_inode) { /* We are ready to release ref count on the old_ea_inode. */ ret = ext4_xattr_inode_dec_ref(handle, old_ea_inode); - if (ret) { - /* Release newly required ref count on new_ea_inode. */ - if (new_ea_inode) { - int err; - - err = ext4_xattr_inode_dec_ref(handle, - new_ea_inode); - if (err) - ext4_warning_inode(new_ea_inode, - "dec ref new_ea_inode err=%d", - err); - ext4_xattr_inode_free_quota(inode, new_ea_inode, - i->value_len); - } + if (ret) goto out; - }
ext4_xattr_inode_free_quota(inode, old_ea_inode, le32_to_cpu(here->e_value_size)); @@ -1823,7 +1796,6 @@ static int ext4_xattr_set_entry(struct ext4_xattr_info *i, ret = 0; out: iput(old_ea_inode); - iput(new_ea_inode); return ret; }
@@ -1886,9 +1858,21 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode, size_t old_ea_inode_quota = 0; unsigned int ea_ino;
- #define header(x) ((struct ext4_xattr_header *)(x))
+ /* If we need EA inode, prepare it before locking the buffer */ + if (i->value && i->in_inode) { + WARN_ON_ONCE(!i->value_len); + + ea_inode = ext4_xattr_inode_lookup_create(handle, inode, + i->value, i->value_len); + if (IS_ERR(ea_inode)) { + error = PTR_ERR(ea_inode); + ea_inode = NULL; + goto cleanup; + } + } + if (s->base) { int offset = (char *)s->here - bs->bh->b_data;
@@ -1897,6 +1881,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode, EXT4_JTR_NONE); if (error) goto cleanup; + lock_buffer(bs->bh);
if (header(s->base)->h_refcount == cpu_to_le32(1)) { @@ -1923,7 +1908,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode, } ea_bdebug(bs->bh, "modifying in-place"); error = ext4_xattr_set_entry(i, s, handle, inode, - true /* is_block */); + ea_inode, true /* is_block */); ext4_xattr_block_csum_set(inode, bs->bh); unlock_buffer(bs->bh); if (error == -EFSCORRUPTED) @@ -1991,29 +1976,13 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode, s->end = s->base + sb->s_blocksize; }
- error = ext4_xattr_set_entry(i, s, handle, inode, true /* is_block */); + error = ext4_xattr_set_entry(i, s, handle, inode, ea_inode, + true /* is_block */); if (error == -EFSCORRUPTED) goto bad_block; if (error) goto cleanup;
- if (i->value && s->here->e_value_inum) { - /* - * A ref count on ea_inode has been taken as part of the call to - * ext4_xattr_set_entry() above. We would like to drop this - * extra ref but we have to wait until the xattr block is - * initialized and has its own ref count on the ea_inode. - */ - ea_ino = le32_to_cpu(s->here->e_value_inum); - error = ext4_xattr_inode_iget(inode, ea_ino, - le32_to_cpu(s->here->e_hash), - &ea_inode); - if (error) { - ea_inode = NULL; - goto cleanup; - } - } - inserted: if (!IS_LAST_ENTRY(s->first)) { new_bh = ext4_xattr_block_cache_find(inode, header(s->base), @@ -2166,17 +2135,16 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
cleanup: if (ea_inode) { - int error2; - - error2 = ext4_xattr_inode_dec_ref(handle, ea_inode); - if (error2) - ext4_warning_inode(ea_inode, "dec ref error=%d", - error2); + if (error) { + int error2;
- /* If there was an error, revert the quota charge. */ - if (error) + error2 = ext4_xattr_inode_dec_ref(handle, ea_inode); + if (error2) + ext4_warning_inode(ea_inode, "dec ref error=%d", + error2); ext4_xattr_inode_free_quota(inode, ea_inode, i_size_read(ea_inode)); + } iput(ea_inode); } if (ce) @@ -2234,14 +2202,38 @@ int ext4_xattr_ibody_set(handle_t *handle, struct inode *inode, { struct ext4_xattr_ibody_header *header; struct ext4_xattr_search *s = &is->s; + struct inode *ea_inode = NULL; int error;
if (!EXT4_INODE_HAS_XATTR_SPACE(inode)) return -ENOSPC;
- error = ext4_xattr_set_entry(i, s, handle, inode, false /* is_block */); - if (error) + /* If we need EA inode, prepare it before locking the buffer */ + if (i->value && i->in_inode) { + WARN_ON_ONCE(!i->value_len); + + ea_inode = ext4_xattr_inode_lookup_create(handle, inode, + i->value, i->value_len); + if (IS_ERR(ea_inode)) + return PTR_ERR(ea_inode); + } + error = ext4_xattr_set_entry(i, s, handle, inode, ea_inode, + false /* is_block */); + if (error) { + if (ea_inode) { + int error2; + + error2 = ext4_xattr_inode_dec_ref(handle, ea_inode); + if (error2) + ext4_warning_inode(ea_inode, "dec ref error=%d", + error2); + + ext4_xattr_inode_free_quota(inode, ea_inode, + i_size_read(ea_inode)); + iput(ea_inode); + } return error; + } header = IHDR(inode, ext4_raw_inode(&is->iloc)); if (!IS_LAST_ENTRY(s->first)) { header->h_magic = cpu_to_le32(EXT4_XATTR_MAGIC); @@ -2250,6 +2242,7 @@ int ext4_xattr_ibody_set(handle_t *handle, struct inode *inode, header->h_magic = cpu_to_le32(0); ext4_clear_inode_state(inode, EXT4_STATE_XATTR); } + iput(ea_inode); return 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Kara jack@suse.cz
[ Upstream commit 27ab33854873e6fb958cb074681a0107cc2ecc4c ]
Syzbot reports uninitialized memory access in udf_rename() when updating checksum of '..' directory entry of a moved directory. This is indeed true as we pass on-stack diriter.fi to the udf_update_tag() and because that has only struct fileIdentDesc included in it and not the impUse or name fields, the checksumming function is going to checksum random stack contents beyond the end of the structure. This is actually harmless because the following udf_fiiter_write_fi() will recompute the checksum from on-disk buffers where everything is properly included. So all that is needed is just removing the bogus calculation.
Fixes: e9109a92d2a9 ("udf: Convert udf_rename() to new directory iteration code") Link: https://lore.kernel.org/all/000000000000cf405f060d8f75a9@google.com/T/ Link: https://patch.msgid.link/20240617154201.29512-1-jack@suse.cz Reported-by: syzbot+d31185aa54170f7fc1f5@syzkaller.appspotmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- fs/udf/namei.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/udf/namei.c b/fs/udf/namei.c index 7c95c549dd64e..ded71044988ab 100644 --- a/fs/udf/namei.c +++ b/fs/udf/namei.c @@ -1183,7 +1183,6 @@ static int udf_rename(struct user_namespace *mnt_userns, struct inode *old_dir,
if (dir_fi) { dir_fi->icb.extLocation = cpu_to_lelb(UDF_I(new_dir)->i_location); - udf_update_tag((char *)dir_fi, udf_dir_entry_len(dir_fi)); if (old_iinfo->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB) mark_inode_dirty(old_inode); else
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: yunshui jiangyunshui@kylinos.cn
[ Upstream commit d9cbd8343b010016fcaabc361c37720dcafddcbe ]
syzbot/KCSAN reported that races happen when multiple CPUs updating dev->stats.tx_error concurrently. Adopt SMP safe DEV_STATS_INC() to update the dev->stats fields.
Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: yunshui jiangyunshui@kylinos.cn Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20240523033520.4029314-1-jiangyunshui@kylinos.cn Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/filter.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c index 210b881cb50b8..1cd5f146cafe4 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2264,12 +2264,12 @@ static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev,
err = bpf_out_neigh_v6(net, skb, dev, nh); if (unlikely(net_xmit_eval(err))) - dev->stats.tx_errors++; + DEV_STATS_INC(dev, tx_errors); else ret = NET_XMIT_SUCCESS; goto out_xmit; out_drop: - dev->stats.tx_errors++; + DEV_STATS_INC(dev, tx_errors); kfree_skb(skb); out_xmit: return ret; @@ -2371,12 +2371,12 @@ static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev,
err = bpf_out_neigh_v4(net, skb, dev, nh); if (unlikely(net_xmit_eval(err))) - dev->stats.tx_errors++; + DEV_STATS_INC(dev, tx_errors); else ret = NET_XMIT_SUCCESS; goto out_xmit; out_drop: - dev->stats.tx_errors++; + DEV_STATS_INC(dev, tx_errors); kfree_skb(skb); out_xmit: return ret;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Willem de Bruijn willemb@google.com
[ Upstream commit dd89a81d850fa9a65f67b4527c0e420d15bf836c ]
Drop the WARN_ON_ONCE inn gue_gro_receive if the encapsulated type is not known or does not have a GRO handler.
Such a packet is easily constructed. Syzbot generates them and sets off this warning.
Remove the warning as it is expected and not actionable.
The warning was previously reduced from WARN_ON to WARN_ON_ONCE in commit 270136613bf7 ("fou: Do WARN_ON_ONCE in gue_gro_receive for bad proto callbacks").
Signed-off-by: Willem de Bruijn willemb@google.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20240614122552.1649044-1-willemdebruijn.kernel@gma... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/fou.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c index 0c3c6d0cee290..358bff068eef8 100644 --- a/net/ipv4/fou.c +++ b/net/ipv4/fou.c @@ -431,7 +431,7 @@ static struct sk_buff *gue_gro_receive(struct sock *sk,
offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads; ops = rcu_dereference(offloads[proto]); - if (WARN_ON_ONCE(!ops || !ops->callbacks.gro_receive)) + if (!ops || !ops->callbacks.gro_receive) goto out;
pp = call_gro_receive(ops->callbacks.gro_receive, head, skb);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Edward Adam Davis eadavis@qq.com
[ Upstream commit ce6dede912f064a855acf6f04a04cbb2c25b8c8c ]
[syzbot reported] general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 PID: 5061 Comm: syz-executor404 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 RIP: 0010:dtInsertEntry+0xd0c/0x1780 fs/jfs/jfs_dtree.c:3713 ... [Analyze] In dtInsertEntry(), when the pointer h has the same value as p, after writing name in UniStrncpy_to_le(), p->header.flag will be cleared. This will cause the previously true judgment "p->header.flag & BT-LEAF" to change to no after writing the name operation, this leads to entering an incorrect branch and accessing the uninitialized object ih when judging this condition for the second time.
[Fix] After got the page, check freelist first, if freelist == 0 then exit dtInsert() and return -EINVAL.
Reported-by: syzbot+bba84aef3a26fb93deb9@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis eadavis@qq.com Signed-off-by: Dave Kleikamp dave.kleikamp@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/jfs/jfs_dtree.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/jfs/jfs_dtree.c b/fs/jfs/jfs_dtree.c index 031d8f570f581..5d3127ca68a42 100644 --- a/fs/jfs/jfs_dtree.c +++ b/fs/jfs/jfs_dtree.c @@ -834,6 +834,8 @@ int dtInsert(tid_t tid, struct inode *ip, * the full page. */ DT_GETSEARCH(ip, btstack->top, bn, mp, p, index); + if (p->header.freelist == 0) + return -EINVAL;
/* * insert entry for new key
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pei Li peili.dev@gmail.com
[ Upstream commit 7063b80268e2593e58bee8a8d709c2f3ff93e2f2 ]
When searching for the next smaller log2 block, BLKSTOL2() returned 0, causing shift exponent -1 to be negative.
This patch fixes the issue by exiting the loop directly when negative shift is found.
Reported-by: syzbot+61be3359d2ee3467e7e4@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=61be3359d2ee3467e7e4 Signed-off-by: Pei Li peili.dev@gmail.com Signed-off-by: Dave Kleikamp dave.kleikamp@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/jfs/jfs_dmap.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/jfs/jfs_dmap.c b/fs/jfs/jfs_dmap.c index 8d064c9e9605d..7a3f4f62c34bc 100644 --- a/fs/jfs/jfs_dmap.c +++ b/fs/jfs/jfs_dmap.c @@ -1626,6 +1626,8 @@ s64 dbDiscardAG(struct inode *ip, int agno, s64 minlen) } else if (rc == -ENOSPC) { /* search for next smaller log2 block */ l2nb = BLKSTOL2(nblocks) - 1; + if (unlikely(l2nb < 0)) + break; nblocks = 1LL << l2nb; } else { /* Trim any already allocated blocks */
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Konstantin Komarov almaz.alexandrovich@paragon-software.com
[ Upstream commit d57431c6f511bf020e474026d9f3123d7bfbea8c ]
In order not to call copy_to_user (from fiemap_fill_next_extent) we allocate memory in the kernel, fill it and copy it to user memory after up_read(run_lock).
Reported-by: syzbot+36bb70085ef6edc2ebb9@syzkaller.appspotmail.com Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/frecord.c | 75 ++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 72 insertions(+), 3 deletions(-)
diff --git a/fs/ntfs3/frecord.c b/fs/ntfs3/frecord.c index 02465ab3f398c..6cce71cc750ea 100644 --- a/fs/ntfs3/frecord.c +++ b/fs/ntfs3/frecord.c @@ -1897,6 +1897,47 @@ enum REPARSE_SIGN ni_parse_reparse(struct ntfs_inode *ni, struct ATTRIB *attr, return REPARSE_LINK; }
+/* + * fiemap_fill_next_extent_k - a copy of fiemap_fill_next_extent + * but it accepts kernel address for fi_extents_start + */ +static int fiemap_fill_next_extent_k(struct fiemap_extent_info *fieinfo, + u64 logical, u64 phys, u64 len, u32 flags) +{ + struct fiemap_extent extent; + struct fiemap_extent __user *dest = fieinfo->fi_extents_start; + + /* only count the extents */ + if (fieinfo->fi_extents_max == 0) { + fieinfo->fi_extents_mapped++; + return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0; + } + + if (fieinfo->fi_extents_mapped >= fieinfo->fi_extents_max) + return 1; + + if (flags & FIEMAP_EXTENT_DELALLOC) + flags |= FIEMAP_EXTENT_UNKNOWN; + if (flags & FIEMAP_EXTENT_DATA_ENCRYPTED) + flags |= FIEMAP_EXTENT_ENCODED; + if (flags & (FIEMAP_EXTENT_DATA_TAIL | FIEMAP_EXTENT_DATA_INLINE)) + flags |= FIEMAP_EXTENT_NOT_ALIGNED; + + memset(&extent, 0, sizeof(extent)); + extent.fe_logical = logical; + extent.fe_physical = phys; + extent.fe_length = len; + extent.fe_flags = flags; + + dest += fieinfo->fi_extents_mapped; + memcpy(dest, &extent, sizeof(extent)); + + fieinfo->fi_extents_mapped++; + if (fieinfo->fi_extents_mapped == fieinfo->fi_extents_max) + return 1; + return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0; +} + /* * ni_fiemap - Helper for file_fiemap(). * @@ -1907,6 +1948,8 @@ int ni_fiemap(struct ntfs_inode *ni, struct fiemap_extent_info *fieinfo, __u64 vbo, __u64 len) { int err = 0; + struct fiemap_extent __user *fe_u = fieinfo->fi_extents_start; + struct fiemap_extent *fe_k = NULL; struct ntfs_sb_info *sbi = ni->mi.sbi; u8 cluster_bits = sbi->cluster_bits; struct runs_tree *run; @@ -1954,6 +1997,18 @@ int ni_fiemap(struct ntfs_inode *ni, struct fiemap_extent_info *fieinfo, goto out; }
+ /* + * To avoid lock problems replace pointer to user memory by pointer to kernel memory. + */ + fe_k = kmalloc_array(fieinfo->fi_extents_max, + sizeof(struct fiemap_extent), + GFP_NOFS | __GFP_ZERO); + if (!fe_k) { + err = -ENOMEM; + goto out; + } + fieinfo->fi_extents_start = fe_k; + end = vbo + len; alloc_size = le64_to_cpu(attr->nres.alloc_size); if (end > alloc_size) @@ -2042,8 +2097,9 @@ int ni_fiemap(struct ntfs_inode *ni, struct fiemap_extent_info *fieinfo, if (vbo + dlen >= end) flags |= FIEMAP_EXTENT_LAST;
- err = fiemap_fill_next_extent(fieinfo, vbo, lbo, dlen, - flags); + err = fiemap_fill_next_extent_k(fieinfo, vbo, lbo, dlen, + flags); + if (err < 0) break; if (err == 1) { @@ -2063,7 +2119,8 @@ int ni_fiemap(struct ntfs_inode *ni, struct fiemap_extent_info *fieinfo, if (vbo + bytes >= end) flags |= FIEMAP_EXTENT_LAST;
- err = fiemap_fill_next_extent(fieinfo, vbo, lbo, bytes, flags); + err = fiemap_fill_next_extent_k(fieinfo, vbo, lbo, bytes, + flags); if (err < 0) break; if (err == 1) { @@ -2076,7 +2133,19 @@ int ni_fiemap(struct ntfs_inode *ni, struct fiemap_extent_info *fieinfo,
up_read(run_lock);
+ /* + * Copy to user memory out of lock + */ + if (copy_to_user(fe_u, fe_k, + fieinfo->fi_extents_max * + sizeof(struct fiemap_extent))) { + err = -EFAULT; + } + out: + /* Restore original pointer. */ + fieinfo->fi_extents_start = fe_u; + kfree(fe_k); return err; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
[ Upstream commit 2f38cf730caedaeacdefb7ff35b0a3c1168117f9 ]
A malformed USB descriptor may pass the lengthy mixer description with a lot of channels, and this may overflow the 32bit integer shift size, as caught by syzbot UBSAN test. Although this won't cause any real trouble, it's better to address.
This patch introduces a sanity check of the number of channels to bail out the parsing when too many channels are found.
Reported-by: syzbot+78d5b129a762182225aa@syzkaller.appspotmail.com Closes: https://lore.kernel.org/0000000000000adac5061d3c7355@google.com Link: https://patch.msgid.link/20240715123619.26612-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/usb/mixer.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c index 5699a62d17679..34ded71cb8077 100644 --- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -2023,6 +2023,13 @@ static int parse_audio_feature_unit(struct mixer_build *state, int unitid, bmaControls = ftr->bmaControls; }
+ if (channels > 32) { + usb_audio_info(state->chip, + "usbmixer: too many channels (%d) in unit %d\n", + channels, unitid); + return -EINVAL; + } + /* parse the source unit */ err = parse_audio_unit(state, hdr->bSourceID); if (err < 0)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Muhammad Husaini Zulkifli muhammad.husaini.zulkifli@intel.com
[ Upstream commit 790835fcc0cb9992349ae3c9010dbc7321aaa24d ]
The launchtime offset should be corrected according to sections 7.5.2.6 Transmit Scheduling Latency of the Intel Ethernet I225/I226 Software User Manual.
Software can compensate the latency between the transmission scheduling and the time that packet is transmitted to the network by setting this GTxOffset register. Without setting this register, there may be a significant delay between the packet scheduling and the network point.
This patch helps to reduce the latency for each of the link speed.
Before:
10Mbps : 11000 - 13800 nanosecond 100Mbps : 1300 - 1700 nanosecond 1000Mbps : 190 - 600 nanosecond 2500Mbps : 1400 - 1700 nanosecond
After:
10Mbps : less than 750 nanosecond 100Mbps : less than 192 nanosecond 1000Mbps : less than 128 nanosecond 2500Mbps : less than 128 nanosecond
Test Setup:
Talker : Use l2_tai.c to generate the launchtime into packet payload. Listener: Use timedump.c to compute the delta between packet arrival and LaunchTime packet payload.
Signed-off-by: Vinicius Costa Gomes vinicius.gomes@intel.com Signed-off-by: Muhammad Husaini Zulkifli muhammad.husaini.zulkifli@intel.com Acked-by: Sasha Neftin sasha.neftin@intel.com Acked-by: Paul Menzel pmenzel@molgen.mpg.de Tested-by: Naama Meir naamax.meir@linux.intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Stable-dep-of: e037a26ead18 ("igc: Fix packet still tx after gate close by reducing i226 MAC retry buffer") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igc/igc_defines.h | 9 ++++++ drivers/net/ethernet/intel/igc/igc_main.c | 7 +++++ drivers/net/ethernet/intel/igc/igc_regs.h | 1 + drivers/net/ethernet/intel/igc/igc_tsn.c | 30 ++++++++++++++++++++ drivers/net/ethernet/intel/igc/igc_tsn.h | 1 + 5 files changed, 48 insertions(+)
diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h index efdabcbd66ddd..63fa7608861b2 100644 --- a/drivers/net/ethernet/intel/igc/igc_defines.h +++ b/drivers/net/ethernet/intel/igc/igc_defines.h @@ -402,6 +402,15 @@ #define IGC_DTXMXPKTSZ_TSN 0x19 /* 1600 bytes of max TX DMA packet size */ #define IGC_DTXMXPKTSZ_DEFAULT 0x98 /* 9728-byte Jumbo frames */
+/* Transmit Scheduling Latency */ +/* Latency between transmission scheduling (LaunchTime) and the time + * the packet is transmitted to the network in nanosecond. + */ +#define IGC_TXOFFSET_SPEED_10 0x000034BC +#define IGC_TXOFFSET_SPEED_100 0x00000578 +#define IGC_TXOFFSET_SPEED_1000 0x0000012C +#define IGC_TXOFFSET_SPEED_2500 0x00000578 + /* Time Sync Interrupt Causes */ #define IGC_TSICR_SYS_WRAP BIT(0) /* SYSTIM Wrap around. */ #define IGC_TSICR_TXTS BIT(1) /* Transmit Timestamp. */ diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index e052f49cc08d7..39f8f28288aaa 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -5586,6 +5586,13 @@ static void igc_watchdog_task(struct work_struct *work) break; }
+ /* Once the launch time has been set on the wire, there + * is a delay before the link speed can be determined + * based on link-up activity. Write into the register + * as soon as we know the correct link speed. + */ + igc_tsn_adjust_txtime_offset(adapter); + if (adapter->link_speed != SPEED_1000) goto no_wait;
diff --git a/drivers/net/ethernet/intel/igc/igc_regs.h b/drivers/net/ethernet/intel/igc/igc_regs.h index c0d8214148d1d..01c86d36856d2 100644 --- a/drivers/net/ethernet/intel/igc/igc_regs.h +++ b/drivers/net/ethernet/intel/igc/igc_regs.h @@ -224,6 +224,7 @@ /* Transmit Scheduling Registers */ #define IGC_TQAVCTRL 0x3570 #define IGC_TXQCTL(_n) (0x3344 + 0x4 * (_n)) +#define IGC_GTXOFFSET 0x3310 #define IGC_BASET_L 0x3314 #define IGC_BASET_H 0x3318 #define IGC_QBVCYCLET 0x331C diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c index 31ea0781b65ec..83f02b00735d3 100644 --- a/drivers/net/ethernet/intel/igc/igc_tsn.c +++ b/drivers/net/ethernet/intel/igc/igc_tsn.c @@ -49,6 +49,35 @@ static unsigned int igc_tsn_new_flags(struct igc_adapter *adapter) return new_flags; }
+void igc_tsn_adjust_txtime_offset(struct igc_adapter *adapter) +{ + struct igc_hw *hw = &adapter->hw; + u16 txoffset; + + if (!is_any_launchtime(adapter)) + return; + + switch (adapter->link_speed) { + case SPEED_10: + txoffset = IGC_TXOFFSET_SPEED_10; + break; + case SPEED_100: + txoffset = IGC_TXOFFSET_SPEED_100; + break; + case SPEED_1000: + txoffset = IGC_TXOFFSET_SPEED_1000; + break; + case SPEED_2500: + txoffset = IGC_TXOFFSET_SPEED_2500; + break; + default: + txoffset = 0; + break; + } + + wr32(IGC_GTXOFFSET, txoffset); +} + /* Returns the TSN specific registers to their default values after * the adapter is reset. */ @@ -58,6 +87,7 @@ static int igc_tsn_disable_offload(struct igc_adapter *adapter) u32 tqavctrl; int i;
+ wr32(IGC_GTXOFFSET, 0); wr32(IGC_TXPBS, I225_TXPBSIZE_DEFAULT); wr32(IGC_DTXMXPKTSZ, IGC_DTXMXPKTSZ_DEFAULT);
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.h b/drivers/net/ethernet/intel/igc/igc_tsn.h index 1512307f5a528..b53e6af560b73 100644 --- a/drivers/net/ethernet/intel/igc/igc_tsn.h +++ b/drivers/net/ethernet/intel/igc/igc_tsn.h @@ -6,5 +6,6 @@
int igc_tsn_offload_apply(struct igc_adapter *adapter); int igc_tsn_reset(struct igc_adapter *adapter); +void igc_tsn_adjust_txtime_offset(struct igc_adapter *adapter);
#endif /* _IGC_BASE_H */
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Faizal Rahim faizal.abdul.rahim@linux.intel.com
[ Upstream commit e037a26ead187901f83cad9c503ccece5ff6817a ]
Testing uncovered that even when the taprio gate is closed, some packets still transmit.
According to i225/6 hardware errata [1], traffic might overflow the planned QBV window. This happens because MAC maintains an internal buffer, primarily for supporting half duplex retries. Therefore, even when the gate closes, residual MAC data in the buffer may still transmit.
To mitigate this for i226, reduce the MAC's internal buffer from 192 bytes to the recommended 88 bytes by modifying the RETX_CTL register value.
This follows guidelines from: [1] Ethernet Controller I225/I22 Spec Update Rev 2.1 Errata Item 9: TSN: Packet Transmission Might Cross Qbv Window [2] I225/6 SW User Manual Rev 1.2.4: Section 8.11.5 Retry Buffer Control
Note that the RETX_CTL register can't be used in TSN mode because half duplex feature cannot coexist with TSN.
Test Steps: 1. Send taprio cmd to board A: tc qdisc replace dev enp1s0 parent root handle 100 taprio \ num_tc 4 \ map 3 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \ queues 1@0 1@1 1@2 1@3 \ base-time 0 \ sched-entry S 0x07 500000 \ sched-entry S 0x0f 500000 \ flags 0x2 \ txtime-delay 0
Note that for TC3, gate should open for 500us and close for another 500us.
3. Take tcpdump log on Board B.
4. Send udp packets via UDP tai app from Board A to Board B.
5. Analyze tcpdump log via wireshark log on Board B. Ensure that the total time from the first to the last packet received during one cycle for TC3 does not exceed 500us.
Fixes: 43546211738e ("igc: Add new device ID's") Signed-off-by: Faizal Rahim faizal.abdul.rahim@linux.intel.com Acked-by: Vinicius Costa Gomes vinicius.gomes@intel.com Tested-by: Mor Bar-Gabay morx.bar.gabay@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igc/igc_defines.h | 6 ++++ drivers/net/ethernet/intel/igc/igc_tsn.c | 34 ++++++++++++++++++++ 2 files changed, 40 insertions(+)
diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h index 63fa7608861b2..8187a658dcbd5 100644 --- a/drivers/net/ethernet/intel/igc/igc_defines.h +++ b/drivers/net/ethernet/intel/igc/igc_defines.h @@ -402,6 +402,12 @@ #define IGC_DTXMXPKTSZ_TSN 0x19 /* 1600 bytes of max TX DMA packet size */ #define IGC_DTXMXPKTSZ_DEFAULT 0x98 /* 9728-byte Jumbo frames */
+/* Retry Buffer Control */ +#define IGC_RETX_CTL 0x041C +#define IGC_RETX_CTL_WATERMARK_MASK 0xF +#define IGC_RETX_CTL_QBVFULLTH_SHIFT 8 /* QBV Retry Buffer Full Threshold */ +#define IGC_RETX_CTL_QBVFULLEN 0x1000 /* Enable QBV Retry Buffer Full Threshold */ + /* Transmit Scheduling Latency */ /* Latency between transmission scheduling (LaunchTime) and the time * the packet is transmitted to the network in nanosecond. diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c index 83f02b00735d3..abdaaf7db4125 100644 --- a/drivers/net/ethernet/intel/igc/igc_tsn.c +++ b/drivers/net/ethernet/intel/igc/igc_tsn.c @@ -78,6 +78,15 @@ void igc_tsn_adjust_txtime_offset(struct igc_adapter *adapter) wr32(IGC_GTXOFFSET, txoffset); }
+static void igc_tsn_restore_retx_default(struct igc_adapter *adapter) +{ + struct igc_hw *hw = &adapter->hw; + u32 retxctl; + + retxctl = rd32(IGC_RETX_CTL) & IGC_RETX_CTL_WATERMARK_MASK; + wr32(IGC_RETX_CTL, retxctl); +} + /* Returns the TSN specific registers to their default values after * the adapter is reset. */ @@ -91,6 +100,9 @@ static int igc_tsn_disable_offload(struct igc_adapter *adapter) wr32(IGC_TXPBS, I225_TXPBSIZE_DEFAULT); wr32(IGC_DTXMXPKTSZ, IGC_DTXMXPKTSZ_DEFAULT);
+ if (igc_is_device_id_i226(hw)) + igc_tsn_restore_retx_default(adapter); + tqavctrl = rd32(IGC_TQAVCTRL); tqavctrl &= ~(IGC_TQAVCTRL_TRANSMIT_MODE_TSN | IGC_TQAVCTRL_ENHANCED_QAV | IGC_TQAVCTRL_FUTSCDDIS); @@ -111,6 +123,25 @@ static int igc_tsn_disable_offload(struct igc_adapter *adapter) return 0; }
+/* To partially fix i226 HW errata, reduce MAC internal buffering from 192 Bytes + * to 88 Bytes by setting RETX_CTL register using the recommendation from: + * a) Ethernet Controller I225/I226 Specification Update Rev 2.1 + * Item 9: TSN: Packet Transmission Might Cross the Qbv Window + * b) I225/6 SW User Manual Rev 1.2.4: Section 8.11.5 Retry Buffer Control + */ +static void igc_tsn_set_retx_qbvfullthreshold(struct igc_adapter *adapter) +{ + struct igc_hw *hw = &adapter->hw; + u32 retxctl, watermark; + + retxctl = rd32(IGC_RETX_CTL); + watermark = retxctl & IGC_RETX_CTL_WATERMARK_MASK; + /* Set QBVFULLTH value using watermark and set QBVFULLEN */ + retxctl |= (watermark << IGC_RETX_CTL_QBVFULLTH_SHIFT) | + IGC_RETX_CTL_QBVFULLEN; + wr32(IGC_RETX_CTL, retxctl); +} + static int igc_tsn_enable_offload(struct igc_adapter *adapter) { struct igc_hw *hw = &adapter->hw; @@ -124,6 +155,9 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter) wr32(IGC_DTXMXPKTSZ, IGC_DTXMXPKTSZ_TSN); wr32(IGC_TXPBS, IGC_TXPBSIZE_TSN);
+ if (igc_is_device_id_i226(hw)) + igc_tsn_set_retx_qbvfullthreshold(adapter); + for (i = 0; i < adapter->num_tx_queues; i++) { struct igc_ring *ring = adapter->tx_ring[i]; u32 txqctl = 0;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dragos Tatulea dtatulea@nvidia.com
[ Upstream commit e6b5afd30b99b43682a7764e1a74a42fe4d5f4b3 ]
mlx5e_safe_reopen_channels() requires the state lock taken. The referenced changed in the Fixes tag removed the lock to fix another issue. This patch adds it back but at a later point (when calling mlx5e_safe_reopen_channels()) to avoid the deadlock referenced in the Fixes tag.
Fixes: eab0da38912e ("net/mlx5e: Fix possible deadlock on mlx5e_tx_timeout_work") Signed-off-by: Dragos Tatulea dtatulea@nvidia.com Link: https://lore.kernel.org/all/ZplpKq8FKi3vwfxv@gmail.com/T/ Reviewed-by: Breno Leitao leitao@debian.org Reviewed-by: Moshe Shemesh moshe@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://patch.msgid.link/20240808144107.2095424-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c index 60bc5b577ab99..02d9fb0c5ec24 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c @@ -111,7 +111,9 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx) return err; }
+ mutex_lock(&priv->state_lock); err = mlx5e_safe_reopen_channels(priv); + mutex_unlock(&priv->state_lock); if (!err) { to_ctx->status = 1; /* all channels recovered */ return err;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cosmin Ratiu cratiu@nvidia.com
[ Upstream commit cbc796be1779c4dbc9a482c7233995e2a8b6bfb3 ]
Previously, an ethtool rx flow with no attrs would not be added to the NIC as it has no rules to configure the hw with, but it would be reported as successful to the caller (return code 0). This is confusing for the user as ethtool then reports "Added rule $num", but no rule was actually added.
This change corrects that by instead reporting these wrong rules as -EINVAL.
Fixes: b29c61dac3a2 ("net/mlx5e: Ethtool steering flow validation refactoring") Signed-off-by: Cosmin Ratiu cratiu@nvidia.com Reviewed-by: Saeed Mahameed saeedm@nvidia.com Reviewed-by: Dragos Tatulea dtatulea@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://patch.msgid.link/20240808144107.2095424-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c index aac32e505c14f..a8870c6daec6c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c @@ -738,7 +738,7 @@ mlx5e_ethtool_flow_replace(struct mlx5e_priv *priv, if (num_tuples <= 0) { netdev_warn(priv->netdev, "%s: flow is not valid %d\n", __func__, num_tuples); - return num_tuples; + return num_tuples < 0 ? num_tuples : -EINVAL; }
eth_ft = get_flow_table(priv, fs, num_tuples);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit a9a18e8f770c9b0703dab93580d0b02e199a4c79 ]
We can't dereference "skb" after calling vcc->push() because the skb is released.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/atm/idt77252.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/atm/idt77252.c b/drivers/atm/idt77252.c index 2daf50d4cd47a..7810f974b2ca9 100644 --- a/drivers/atm/idt77252.c +++ b/drivers/atm/idt77252.c @@ -1118,8 +1118,8 @@ dequeue_rx(struct idt77252_dev *card, struct rsq_entry *rsqe) rpp->len += skb->len;
if (stat & SAR_RSQE_EPDU) { + unsigned int len, truesize; unsigned char *l1l2; - unsigned int len;
l1l2 = (unsigned char *) ((unsigned long) skb->data + skb->len - 6);
@@ -1189,14 +1189,15 @@ dequeue_rx(struct idt77252_dev *card, struct rsq_entry *rsqe) ATM_SKB(skb)->vcc = vcc; __net_timestamp(skb);
+ truesize = skb->truesize; vcc->push(vcc, skb); atomic_inc(&vcc->stats->rx);
- if (skb->truesize > SAR_FB_SIZE_3) + if (truesize > SAR_FB_SIZE_3) add_rx_skb(card, 3, SAR_FB_SIZE_3, 1); - else if (skb->truesize > SAR_FB_SIZE_2) + else if (truesize > SAR_FB_SIZE_2) add_rx_skb(card, 2, SAR_FB_SIZE_2, 1); - else if (skb->truesize > SAR_FB_SIZE_1) + else if (truesize > SAR_FB_SIZE_1) add_rx_skb(card, 1, SAR_FB_SIZE_1, 1); else add_rx_skb(card, 0, SAR_FB_SIZE_0, 1);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Radhey Shyam Pandey radhey.shyam.pandey@amd.com
[ Upstream commit 9ff2f816e2aa65ca9a1cdf0954842f8173c0f48d ]
In axiethernet header fix register defines comment description to be inline with IP documentation. It updates MAC configuration register, MDIO configuration register and frame filter control description.
Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Radhey Shyam Pandey radhey.shyam.pandey@amd.com Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/xilinx/xilinx_axienet.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h b/drivers/net/ethernet/xilinx/xilinx_axienet.h index 6370c447ac5ca..969bea5541976 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet.h +++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h @@ -159,16 +159,16 @@ #define XAE_RCW1_OFFSET 0x00000404 /* Rx Configuration Word 1 */ #define XAE_TC_OFFSET 0x00000408 /* Tx Configuration */ #define XAE_FCC_OFFSET 0x0000040C /* Flow Control Configuration */ -#define XAE_EMMC_OFFSET 0x00000410 /* EMAC mode configuration */ -#define XAE_PHYC_OFFSET 0x00000414 /* RGMII/SGMII configuration */ +#define XAE_EMMC_OFFSET 0x00000410 /* MAC speed configuration */ +#define XAE_PHYC_OFFSET 0x00000414 /* RX Max Frame Configuration */ #define XAE_ID_OFFSET 0x000004F8 /* Identification register */ -#define XAE_MDIO_MC_OFFSET 0x00000500 /* MII Management Config */ -#define XAE_MDIO_MCR_OFFSET 0x00000504 /* MII Management Control */ -#define XAE_MDIO_MWD_OFFSET 0x00000508 /* MII Management Write Data */ -#define XAE_MDIO_MRD_OFFSET 0x0000050C /* MII Management Read Data */ +#define XAE_MDIO_MC_OFFSET 0x00000500 /* MDIO Setup */ +#define XAE_MDIO_MCR_OFFSET 0x00000504 /* MDIO Control */ +#define XAE_MDIO_MWD_OFFSET 0x00000508 /* MDIO Write Data */ +#define XAE_MDIO_MRD_OFFSET 0x0000050C /* MDIO Read Data */ #define XAE_UAW0_OFFSET 0x00000700 /* Unicast address word 0 */ #define XAE_UAW1_OFFSET 0x00000704 /* Unicast address word 1 */ -#define XAE_FMI_OFFSET 0x00000708 /* Filter Mask Index */ +#define XAE_FMI_OFFSET 0x00000708 /* Frame Filter Control */ #define XAE_AF0_OFFSET 0x00000710 /* Address Filter 0 */ #define XAE_AF1_OFFSET 0x00000714 /* Address Filter 1 */
@@ -307,7 +307,7 @@ */ #define XAE_UAW1_UNICASTADDR_MASK 0x0000FFFF
-/* Bit masks for Axi Ethernet FMI register */ +/* Bit masks for Axi Ethernet FMC register */ #define XAE_FMI_PM_MASK 0x80000000 /* Promis. mode enable */ #define XAE_FMI_IND_MASK 0x00000003 /* Index Mask */
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pawel Dembicki paweldembicki@gmail.com
[ Upstream commit 5b9eebc2c7a5f0cc7950d918c1e8a4ad4bed5010 ]
In the 'vsc73xx_phy_write' function, the register value is missing, and the phy write operation always sends zeros.
This commit passes the value variable into the proper register.
Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver") Reviewed-by: Linus Walleij linus.walleij@linaro.org Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Signed-off-by: Pawel Dembicki paweldembicki@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/vitesse-vsc73xx-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/dsa/vitesse-vsc73xx-core.c b/drivers/net/dsa/vitesse-vsc73xx-core.c index 3efd556690563..81d39dfe21f45 100644 --- a/drivers/net/dsa/vitesse-vsc73xx-core.c +++ b/drivers/net/dsa/vitesse-vsc73xx-core.c @@ -531,7 +531,7 @@ static int vsc73xx_phy_write(struct dsa_switch *ds, int phy, int regnum, return 0; }
- cmd = (phy << 21) | (regnum << 16); + cmd = (phy << 21) | (regnum << 16) | val; ret = vsc73xx_write(vsc, VSC73XX_BLOCK_MII, 0, 1, cmd); if (ret) return ret;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pawel Dembicki paweldembicki@gmail.com
[ Upstream commit eb7e33d01db3aec128590391b2397384bab406b6 ]
Switch the delay loop during the Arbiter empty check from vsc73xx_adjust_link() to use read_poll_timeout(). Functionally, one msleep() call is eliminated at the end of the loop in the timeout case.
As Russell King suggested:
"This [change] avoids the issue that on the last iteration, the code reads the register, tests it, finds the condition that's being waiting for is false, _then_ waits and end up printing the error message - that last wait is rather useless, and as the arbiter state isn't checked after waiting, it could be that we had success during the last wait."
Suggested-by: Russell King linux@armlinux.org.uk Reviewed-by: Andrew Lunn andrew@lunn.ch Reviewed-by: Linus Walleij linus.walleij@linaro.org Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Signed-off-by: Pawel Dembicki paweldembicki@gmail.com Link: https://lore.kernel.org/r/20240417205048.3542839-2-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: fa63c6434b6f ("net: dsa: vsc73xx: check busy flag in MDIO operations") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/vitesse-vsc73xx-core.c | 30 ++++++++++++++------------ 1 file changed, 16 insertions(+), 14 deletions(-)
diff --git a/drivers/net/dsa/vitesse-vsc73xx-core.c b/drivers/net/dsa/vitesse-vsc73xx-core.c index 81d39dfe21f45..92087f9d73550 100644 --- a/drivers/net/dsa/vitesse-vsc73xx-core.c +++ b/drivers/net/dsa/vitesse-vsc73xx-core.c @@ -17,6 +17,7 @@ #include <linux/kernel.h> #include <linux/module.h> #include <linux/device.h> +#include <linux/iopoll.h> #include <linux/of.h> #include <linux/of_device.h> #include <linux/of_mdio.h> @@ -269,6 +270,9 @@ #define IS_7398(a) ((a)->chipid == VSC73XX_CHIPID_ID_7398) #define IS_739X(a) (IS_7395(a) || IS_7398(a))
+#define VSC73XX_POLL_SLEEP_US 1000 +#define VSC73XX_POLL_TIMEOUT_US 10000 + struct vsc73xx_counter { u8 counter; const char *name; @@ -780,7 +784,7 @@ static void vsc73xx_adjust_link(struct dsa_switch *ds, int port, * after a PHY or the CPU port comes up or down. */ if (!phydev->link) { - int maxloop = 10; + int ret, err;
dev_dbg(vsc->dev, "port %d: went down\n", port); @@ -795,19 +799,17 @@ static void vsc73xx_adjust_link(struct dsa_switch *ds, int port, VSC73XX_ARBDISC, BIT(port), BIT(port));
/* Wait until queue is empty */ - vsc73xx_read(vsc, VSC73XX_BLOCK_ARBITER, 0, - VSC73XX_ARBEMPTY, &val); - while (!(val & BIT(port))) { - msleep(1); - vsc73xx_read(vsc, VSC73XX_BLOCK_ARBITER, 0, - VSC73XX_ARBEMPTY, &val); - if (--maxloop == 0) { - dev_err(vsc->dev, - "timeout waiting for block arbiter\n"); - /* Continue anyway */ - break; - } - } + ret = read_poll_timeout(vsc73xx_read, err, + err < 0 || (val & BIT(port)), + VSC73XX_POLL_SLEEP_US, + VSC73XX_POLL_TIMEOUT_US, false, + vsc, VSC73XX_BLOCK_ARBITER, 0, + VSC73XX_ARBEMPTY, &val); + if (ret) + dev_err(vsc->dev, + "timeout waiting for block arbiter\n"); + else if (err < 0) + dev_err(vsc->dev, "error reading arbiter\n");
/* Put this port into reset */ vsc73xx_write(vsc, VSC73XX_BLOCK_MAC, port, VSC73XX_MAC_CFG,
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pawel Dembicki paweldembicki@gmail.com
[ Upstream commit fa63c6434b6f6aaf9d8d599dc899bc0a074cc0ad ]
The VSC73xx has a busy flag used during MDIO operations. It is raised when MDIO read/write operations are in progress. Without it, PHYs are misconfigured and bus operations do not work as expected.
Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver") Reviewed-by: Linus Walleij linus.walleij@linaro.org Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Signed-off-by: Pawel Dembicki paweldembicki@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/vitesse-vsc73xx-core.c | 37 +++++++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/vitesse-vsc73xx-core.c b/drivers/net/dsa/vitesse-vsc73xx-core.c index 92087f9d73550..c8e9ca5d5c284 100644 --- a/drivers/net/dsa/vitesse-vsc73xx-core.c +++ b/drivers/net/dsa/vitesse-vsc73xx-core.c @@ -39,6 +39,10 @@ #define VSC73XX_BLOCK_ARBITER 0x5 /* Only subblock 0 */ #define VSC73XX_BLOCK_SYSTEM 0x7 /* Only subblock 0 */
+/* MII Block subblock */ +#define VSC73XX_BLOCK_MII_INTERNAL 0x0 /* Internal MDIO subblock */ +#define VSC73XX_BLOCK_MII_EXTERNAL 0x1 /* External MDIO subblock */ + #define CPU_PORT 6 /* CPU port */
/* MAC Block registers */ @@ -197,6 +201,8 @@ #define VSC73XX_MII_CMD 0x1 #define VSC73XX_MII_DATA 0x2
+#define VSC73XX_MII_STAT_BUSY BIT(3) + /* Arbiter block 5 registers */ #define VSC73XX_ARBEMPTY 0x0c #define VSC73XX_ARBDISC 0x0e @@ -271,6 +277,7 @@ #define IS_739X(a) (IS_7395(a) || IS_7398(a))
#define VSC73XX_POLL_SLEEP_US 1000 +#define VSC73XX_MDIO_POLL_SLEEP_US 5 #define VSC73XX_POLL_TIMEOUT_US 10000
struct vsc73xx_counter { @@ -488,6 +495,22 @@ static int vsc73xx_detect(struct vsc73xx *vsc) return 0; }
+static int vsc73xx_mdio_busy_check(struct vsc73xx *vsc) +{ + int ret, err; + u32 val; + + ret = read_poll_timeout(vsc73xx_read, err, + err < 0 || !(val & VSC73XX_MII_STAT_BUSY), + VSC73XX_MDIO_POLL_SLEEP_US, + VSC73XX_POLL_TIMEOUT_US, false, vsc, + VSC73XX_BLOCK_MII, VSC73XX_BLOCK_MII_INTERNAL, + VSC73XX_MII_STAT, &val); + if (ret) + return ret; + return err; +} + static int vsc73xx_phy_read(struct dsa_switch *ds, int phy, int regnum) { struct vsc73xx *vsc = ds->priv; @@ -495,12 +518,20 @@ static int vsc73xx_phy_read(struct dsa_switch *ds, int phy, int regnum) u32 val; int ret;
+ ret = vsc73xx_mdio_busy_check(vsc); + if (ret) + return ret; + /* Setting bit 26 means "read" */ cmd = BIT(26) | (phy << 21) | (regnum << 16); ret = vsc73xx_write(vsc, VSC73XX_BLOCK_MII, 0, 1, cmd); if (ret) return ret; - msleep(2); + + ret = vsc73xx_mdio_busy_check(vsc); + if (ret) + return ret; + ret = vsc73xx_read(vsc, VSC73XX_BLOCK_MII, 0, 2, &val); if (ret) return ret; @@ -524,6 +555,10 @@ static int vsc73xx_phy_write(struct dsa_switch *ds, int phy, int regnum, u32 cmd; int ret;
+ ret = vsc73xx_mdio_busy_check(vsc); + if (ret) + return ret; + /* It was found through tedious experiments that this router * chip really hates to have it's PHYs reset. They * never recover if that happens: autonegotiation stops
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yue Haibing yuehaibing@huawei.com
[ Upstream commit 98261be155f8de38f11b6542d4a8935e5532687b ]
Commit f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") declared but never implemented these.
Signed-off-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Asmaa Mnebhi asmaa@nvidia.com Link: https://lore.kernel.org/r/20230808145249.41596-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: df934abb185c ("mlxbf_gige: disable RX filters until RX path initialized") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h index 5a1027b072155..483fca0cc5a0c 100644 --- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h @@ -148,9 +148,6 @@ enum mlxbf_gige_res { int mlxbf_gige_mdio_probe(struct platform_device *pdev, struct mlxbf_gige *priv); void mlxbf_gige_mdio_remove(struct mlxbf_gige *priv); -irqreturn_t mlxbf_gige_mdio_handle_phy_interrupt(int irq, void *dev_id); -void mlxbf_gige_mdio_enable_phy_int(struct mlxbf_gige *priv); - void mlxbf_gige_set_mac_rx_filter(struct mlxbf_gige *priv, unsigned int index, u64 dmac); void mlxbf_gige_get_mac_rx_filter(struct mlxbf_gige *priv,
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Thompson davthompson@nvidia.com
[ Upstream commit df934abb185c71c9f2fa07a5013672d0cbd36560 ]
A recent change to the driver exposed a bug where the MAC RX filters (unicast MAC, broadcast MAC, and multicast MAC) are configured and enabled before the RX path is fully initialized. The result of this bug is that after the PHY is started packets that match these MAC RX filters start to flow into the RX FIFO. And then, after rx_init() is completed, these packets will go into the driver RX ring as well. If enough packets are received to fill the RX ring (default size is 128 packets) before the call to request_irq() completes, the driver RX function becomes stuck.
This bug is intermittent but is most likely to be seen where the oob_net0 interface is connected to a busy network with lots of broadcast and multicast traffic.
All the MAC RX filters must be disabled until the RX path is ready, i.e. all initialization is done and all the IRQs are installed.
Fixes: f7442a634ac0 ("mlxbf_gige: call request_irq() after NAPI initialized") Reviewed-by: Asmaa Mnebhi asmaa@nvidia.com Signed-off-by: David Thompson davthompson@nvidia.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20240809163612.12852-1-davthompson@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/mellanox/mlxbf_gige/mlxbf_gige.h | 8 +++ .../mellanox/mlxbf_gige/mlxbf_gige_main.c | 10 ++++ .../mellanox/mlxbf_gige/mlxbf_gige_regs.h | 2 + .../mellanox/mlxbf_gige/mlxbf_gige_rx.c | 50 ++++++++++++++++--- 4 files changed, 64 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h index 483fca0cc5a0c..bf1a2883f0820 100644 --- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h @@ -39,6 +39,7 @@ */ #define MLXBF_GIGE_BCAST_MAC_FILTER_IDX 0 #define MLXBF_GIGE_LOCAL_MAC_FILTER_IDX 1 +#define MLXBF_GIGE_MAX_FILTER_IDX 3
/* Define for broadcast MAC literal */ #define BCAST_MAC_ADDR 0xFFFFFFFFFFFF @@ -148,6 +149,13 @@ enum mlxbf_gige_res { int mlxbf_gige_mdio_probe(struct platform_device *pdev, struct mlxbf_gige *priv); void mlxbf_gige_mdio_remove(struct mlxbf_gige *priv); + +void mlxbf_gige_enable_multicast_rx(struct mlxbf_gige *priv); +void mlxbf_gige_disable_multicast_rx(struct mlxbf_gige *priv); +void mlxbf_gige_enable_mac_rx_filter(struct mlxbf_gige *priv, + unsigned int index); +void mlxbf_gige_disable_mac_rx_filter(struct mlxbf_gige *priv, + unsigned int index); void mlxbf_gige_set_mac_rx_filter(struct mlxbf_gige *priv, unsigned int index, u64 dmac); void mlxbf_gige_get_mac_rx_filter(struct mlxbf_gige *priv, diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c index d6b4d163bbbfd..6d90576fda597 100644 --- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c @@ -168,6 +168,10 @@ static int mlxbf_gige_open(struct net_device *netdev) if (err) goto napi_deinit;
+ mlxbf_gige_enable_mac_rx_filter(priv, MLXBF_GIGE_BCAST_MAC_FILTER_IDX); + mlxbf_gige_enable_mac_rx_filter(priv, MLXBF_GIGE_LOCAL_MAC_FILTER_IDX); + mlxbf_gige_enable_multicast_rx(priv); + /* Set bits in INT_EN that we care about */ int_en = MLXBF_GIGE_INT_EN_HW_ACCESS_ERROR | MLXBF_GIGE_INT_EN_TX_CHECKSUM_INPUTS | @@ -293,6 +297,7 @@ static int mlxbf_gige_probe(struct platform_device *pdev) void __iomem *plu_base; void __iomem *base; int addr, phy_irq; + unsigned int i; int err;
base = devm_platform_ioremap_resource(pdev, MLXBF_GIGE_RES_MAC); @@ -335,6 +340,11 @@ static int mlxbf_gige_probe(struct platform_device *pdev) priv->rx_q_entries = MLXBF_GIGE_DEFAULT_RXQ_SZ; priv->tx_q_entries = MLXBF_GIGE_DEFAULT_TXQ_SZ;
+ for (i = 0; i <= MLXBF_GIGE_MAX_FILTER_IDX; i++) + mlxbf_gige_disable_mac_rx_filter(priv, i); + mlxbf_gige_disable_multicast_rx(priv); + mlxbf_gige_disable_promisc(priv); + /* Write initial MAC address to hardware */ mlxbf_gige_initial_mac(priv);
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h index 7be3a793984d5..d27535a1fb86f 100644 --- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h @@ -59,6 +59,8 @@ #define MLXBF_GIGE_TX_STATUS_DATA_FIFO_FULL BIT(1) #define MLXBF_GIGE_RX_MAC_FILTER_DMAC_RANGE_START 0x0520 #define MLXBF_GIGE_RX_MAC_FILTER_DMAC_RANGE_END 0x0528 +#define MLXBF_GIGE_RX_MAC_FILTER_GENERAL 0x0530 +#define MLXBF_GIGE_RX_MAC_FILTER_EN_MULTICAST BIT(1) #define MLXBF_GIGE_RX_MAC_FILTER_COUNT_DISC 0x0540 #define MLXBF_GIGE_RX_MAC_FILTER_COUNT_DISC_EN BIT(0) #define MLXBF_GIGE_RX_MAC_FILTER_COUNT_PASS 0x0548 diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c index 6999843584934..eb62620b63c7f 100644 --- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c @@ -11,15 +11,31 @@ #include "mlxbf_gige.h" #include "mlxbf_gige_regs.h"
-void mlxbf_gige_set_mac_rx_filter(struct mlxbf_gige *priv, - unsigned int index, u64 dmac) +void mlxbf_gige_enable_multicast_rx(struct mlxbf_gige *priv) { void __iomem *base = priv->base; - u64 control; + u64 data;
- /* Write destination MAC to specified MAC RX filter */ - writeq(dmac, base + MLXBF_GIGE_RX_MAC_FILTER + - (index * MLXBF_GIGE_RX_MAC_FILTER_STRIDE)); + data = readq(base + MLXBF_GIGE_RX_MAC_FILTER_GENERAL); + data |= MLXBF_GIGE_RX_MAC_FILTER_EN_MULTICAST; + writeq(data, base + MLXBF_GIGE_RX_MAC_FILTER_GENERAL); +} + +void mlxbf_gige_disable_multicast_rx(struct mlxbf_gige *priv) +{ + void __iomem *base = priv->base; + u64 data; + + data = readq(base + MLXBF_GIGE_RX_MAC_FILTER_GENERAL); + data &= ~MLXBF_GIGE_RX_MAC_FILTER_EN_MULTICAST; + writeq(data, base + MLXBF_GIGE_RX_MAC_FILTER_GENERAL); +} + +void mlxbf_gige_enable_mac_rx_filter(struct mlxbf_gige *priv, + unsigned int index) +{ + void __iomem *base = priv->base; + u64 control;
/* Enable MAC receive filter mask for specified index */ control = readq(base + MLXBF_GIGE_CONTROL); @@ -27,6 +43,28 @@ void mlxbf_gige_set_mac_rx_filter(struct mlxbf_gige *priv, writeq(control, base + MLXBF_GIGE_CONTROL); }
+void mlxbf_gige_disable_mac_rx_filter(struct mlxbf_gige *priv, + unsigned int index) +{ + void __iomem *base = priv->base; + u64 control; + + /* Disable MAC receive filter mask for specified index */ + control = readq(base + MLXBF_GIGE_CONTROL); + control &= ~(MLXBF_GIGE_CONTROL_EN_SPECIFIC_MAC << index); + writeq(control, base + MLXBF_GIGE_CONTROL); +} + +void mlxbf_gige_set_mac_rx_filter(struct mlxbf_gige *priv, + unsigned int index, u64 dmac) +{ + void __iomem *base = priv->base; + + /* Write destination MAC to specified MAC RX filter */ + writeq(dmac, base + MLXBF_GIGE_RX_MAC_FILTER + + (index * MLXBF_GIGE_RX_MAC_FILTER_STRIDE)); +} + void mlxbf_gige_get_mac_rx_filter(struct mlxbf_gige *priv, unsigned int index, u64 *dmac) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eugene Syromiatnikov esyr@redhat.com
[ Upstream commit 655111b838cdabdb604f3625a9ff08c5eedb11da ]
ssn_offset field is u32 and is placed into the netlink response with nla_put_u32(), but only 2 bytes are reserved for the attribute payload in subflow_get_info_size() (even though it makes no difference in the end, as it is aligned up to 4 bytes). Supply the correct argument to the relevant nla_total_size() call to make it less confusing.
Fixes: 5147dfb50832 ("mptcp: allow dumping subflow context to userspace") Signed-off-by: Eugene Syromiatnikov esyr@redhat.com Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240812065024.GA19719@asgard.redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/diag.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mptcp/diag.c b/net/mptcp/diag.c index 7017dd60659dc..b2199cc282384 100644 --- a/net/mptcp/diag.c +++ b/net/mptcp/diag.c @@ -95,7 +95,7 @@ static size_t subflow_get_info_size(const struct sock *sk) nla_total_size(4) + /* MPTCP_SUBFLOW_ATTR_RELWRITE_SEQ */ nla_total_size_64bit(8) + /* MPTCP_SUBFLOW_ATTR_MAP_SEQ */ nla_total_size(4) + /* MPTCP_SUBFLOW_ATTR_MAP_SFSEQ */ - nla_total_size(2) + /* MPTCP_SUBFLOW_ATTR_SSN_OFFSET */ + nla_total_size(4) + /* MPTCP_SUBFLOW_ATTR_SSN_OFFSET */ nla_total_size(2) + /* MPTCP_SUBFLOW_ATTR_MAP_DATALEN */ nla_total_size(4) + /* MPTCP_SUBFLOW_ATTR_FLAGS */ nla_total_size(1) + /* MPTCP_SUBFLOW_ATTR_ID_REM */
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Hughes tom@compton.nu
[ Upstream commit 3cd740b985963f874a1a094f1969e998b9d05554 ]
Commit 264640fc2c5f4 ("ipv6: distinguish frag queues by device for multicast and link-local packets") modified the ipv6 fragment reassembly logic to distinguish frag queues by device for multicast and link-local packets but in fact only the main reassembly code limits the use of the device to those address types and the netfilter reassembly code uses the device for all packets.
This means that if fragments of a packet arrive on different interfaces then netfilter will fail to reassemble them and the fragments will be expired without going any further through the filters.
Fixes: 648700f76b03 ("inet: frags: use rhashtables for reassembly units") Signed-off-by: Tom Hughes tom@compton.nu Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/netfilter/nf_conntrack_reasm.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c index 87a394179092c..e4b45db8a3992 100644 --- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -154,6 +154,10 @@ static struct frag_queue *fq_find(struct net *net, __be32 id, u32 user, }; struct inet_frag_queue *q;
+ if (!(ipv6_addr_type(&hdr->daddr) & (IPV6_ADDR_MULTICAST | + IPV6_ADDR_LINKLOCAL))) + key.iif = 0; + q = inet_frag_find(nf_frag->fqdir, &key); if (!q) return NULL;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Donald Hunter donald.hunter@gmail.com
[ Upstream commit e9767137308daf906496613fd879808a07f006a2 ]
Fix missing initialisation of extack in flow offload.
Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Donald Hunter donald.hunter@gmail.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_flow_table_offload.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c index 1c26f03fc6617..1904a4f295d4a 100644 --- a/net/netfilter/nf_flow_table_offload.c +++ b/net/netfilter/nf_flow_table_offload.c @@ -841,8 +841,8 @@ static int nf_flow_offload_tuple(struct nf_flowtable *flowtable, struct list_head *block_cb_list) { struct flow_cls_offload cls_flow = {}; + struct netlink_ext_ack extack = {}; struct flow_block_cb *block_cb; - struct netlink_ext_ack extack; __be16 proto = ETH_P_ALL; int err, i = 0;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 7d8dc1c7be8d3509e8f5164dd5df64c8e34d7eeb ]
Conntrack assumes an unconfirmed entry (not yet committed to global hash table) has a refcount of 1 and is not visible to other cores.
With multicast forwarding this assumption breaks down because such skbs get cloned after being picked up, i.e. ct->use refcount is > 1.
Likewise, bridge netfilter will clone broad/mutlicast frames and all frames in case they need to be flood-forwarded during learning phase.
For ip multicast forwarding or plain bridge flood-forward this will "work" because packets don't leave softirq and are implicitly serialized.
With nfqueue this no longer holds true, the packets get queued and can be reinjected in arbitrary ways.
Disable this feature, I see no other solution.
After this patch, nfqueue cannot queue packets except the last multicast/broadcast packet.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bridge/br_netfilter_hooks.c | 6 +++++- net/netfilter/nfnetlink_queue.c | 35 +++++++++++++++++++++++++++++++-- 2 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c index 9ac70c27da835..9229300881b5f 100644 --- a/net/bridge/br_netfilter_hooks.c +++ b/net/bridge/br_netfilter_hooks.c @@ -618,8 +618,12 @@ static unsigned int br_nf_local_in(void *priv, if (likely(nf_ct_is_confirmed(ct))) return NF_ACCEPT;
+ if (WARN_ON_ONCE(refcount_read(&nfct->use) != 1)) { + nf_reset_ct(skb); + return NF_ACCEPT; + } + WARN_ON_ONCE(skb_shared(skb)); - WARN_ON_ONCE(refcount_read(&nfct->use) != 1);
/* We can't call nf_confirm here, it would create a dependency * on nf_conntrack module. diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c index 5bc342cb13767..f13eed826cbb8 100644 --- a/net/netfilter/nfnetlink_queue.c +++ b/net/netfilter/nfnetlink_queue.c @@ -647,10 +647,41 @@ static bool nf_ct_drop_unconfirmed(const struct nf_queue_entry *entry) { #if IS_ENABLED(CONFIG_NF_CONNTRACK) static const unsigned long flags = IPS_CONFIRMED | IPS_DYING; - const struct nf_conn *ct = (void *)skb_nfct(entry->skb); + struct nf_conn *ct = (void *)skb_nfct(entry->skb); + unsigned long status; + unsigned int use;
- if (ct && ((ct->status & flags) == IPS_DYING)) + if (!ct) + return false; + + status = READ_ONCE(ct->status); + if ((status & flags) == IPS_DYING) return true; + + if (status & IPS_CONFIRMED) + return false; + + /* in some cases skb_clone() can occur after initial conntrack + * pickup, but conntrack assumes exclusive skb->_nfct ownership for + * unconfirmed entries. + * + * This happens for br_netfilter and with ip multicast routing. + * We can't be solved with serialization here because one clone could + * have been queued for local delivery. + */ + use = refcount_read(&ct->ct_general.use); + if (likely(use == 1)) + return false; + + /* Can't decrement further? Exclusive ownership. */ + if (!refcount_dec_not_one(&ct->ct_general.use)) + return false; + + skb_set_nfct(entry->skb, 0); + /* No nf_ct_put(): we already decremented .use and it cannot + * drop down to 0. + */ + return true; #endif return false; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Sutter phil@nwl.cc
[ Upstream commit e0b6648b0446e59522819c75ba1dcb09e68d3e94 ]
In theory, dumpreset may fail and invalidate the preceeding log message. Fix this and use the occasion to prepare for object reset locking, which benefits from a few unrelated changes:
* Add an early call to nfnetlink_unicast if not resetting which effectively skips the audit logging but also unindents it. * Extract the table's name from the netlink attribute (which is verified via earlier table lookup) to not rely upon validity of the looked up table pointer. * Do not use local variable family, it will vanish.
Fixes: 8e6cf365e1d5 ("audit: log nftables configuration change events") Signed-off-by: Phil Sutter phil@nwl.cc Reviewed-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 28 +++++++++++++--------------- 1 file changed, 13 insertions(+), 15 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 10180d280e792..747033129c0fe 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7531,6 +7531,7 @@ static int nf_tables_dump_obj_done(struct netlink_callback *cb) static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, const struct nlattr * const nla[]) { + const struct nftables_pernet *nft_net = nft_pernet(info->net); struct netlink_ext_ack *extack = info->extack; u8 genmask = nft_genmask_cur(info->net); u8 family = info->nfmsg->nfgen_family; @@ -7540,6 +7541,7 @@ static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, struct sk_buff *skb2; bool reset = false; u32 objtype; + char *buf; int err;
if (info->nlh->nlmsg_flags & NLM_F_DUMP) { @@ -7578,27 +7580,23 @@ static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, if (NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_GETOBJ_RESET) reset = true;
- if (reset) { - const struct nftables_pernet *nft_net; - char *buf; - - nft_net = nft_pernet(net); - buf = kasprintf(GFP_ATOMIC, "%s:%u", table->name, nft_net->base_seq); - - audit_log_nfcfg(buf, - family, - 1, - AUDIT_NFT_OP_OBJ_RESET, - GFP_ATOMIC); - kfree(buf); - } - err = nf_tables_fill_obj_info(skb2, net, NETLINK_CB(skb).portid, info->nlh->nlmsg_seq, NFT_MSG_NEWOBJ, 0, family, table, obj, reset); if (err < 0) goto err_fill_obj_info;
+ if (!reset) + return nfnetlink_unicast(skb2, net, NETLINK_CB(skb).portid); + + buf = kasprintf(GFP_ATOMIC, "%.*s:%u", + nla_len(nla[NFTA_OBJ_TABLE]), + (char *)nla_data(nla[NFTA_OBJ_TABLE]), + nft_net->base_seq); + audit_log_nfcfg(buf, info->nfmsg->nfgen_family, 1, + AUDIT_NFT_OP_OBJ_RESET, GFP_ATOMIC); + kfree(buf); + return nfnetlink_unicast(skb2, net, NETLINK_CB(skb).portid);
err_fill_obj_info:
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Sutter phil@nwl.cc
[ Upstream commit ff16111cc10c82ee065ffbd9fa8d6210394ff8c6 ]
The code does not make use of cb->args fields past the first one, no need to zero them.
Signed-off-by: Phil Sutter phil@nwl.cc Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: bd662c4218f9 ("netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests") Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 747033129c0fe..ddf84f226822b 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7452,9 +7452,6 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) goto cont; if (idx < s_idx) goto cont; - if (idx > s_idx) - memset(&cb->args[1], 0, - sizeof(cb->args) - sizeof(cb->args[0])); if (filter && filter->table && strcmp(filter->table, table->name)) goto cont;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Sutter phil@nwl.cc
[ Upstream commit 4279cc60b354d2d2b970655a70a151cbfa1d958b ]
Prep work for moving the filter into struct netlink_callback's scratch area.
Signed-off-by: Phil Sutter phil@nwl.cc Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: bd662c4218f9 ("netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests") Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 36 +++++++++++++++-------------------- 1 file changed, 15 insertions(+), 21 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index ddf84f226822b..07140899a8d1d 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7452,11 +7452,9 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) goto cont; if (idx < s_idx) goto cont; - if (filter && filter->table && - strcmp(filter->table, table->name)) + if (filter->table && strcmp(filter->table, table->name)) goto cont; - if (filter && - filter->type != NFT_OBJECT_UNSPEC && + if (filter->type != NFT_OBJECT_UNSPEC && obj->ops->type->type != filter->type) goto cont;
@@ -7491,23 +7489,21 @@ static int nf_tables_dump_obj_start(struct netlink_callback *cb) const struct nlattr * const *nla = cb->data; struct nft_obj_filter *filter = NULL;
- if (nla[NFTA_OBJ_TABLE] || nla[NFTA_OBJ_TYPE]) { - filter = kzalloc(sizeof(*filter), GFP_ATOMIC); - if (!filter) - return -ENOMEM; + filter = kzalloc(sizeof(*filter), GFP_ATOMIC); + if (!filter) + return -ENOMEM;
- if (nla[NFTA_OBJ_TABLE]) { - filter->table = nla_strdup(nla[NFTA_OBJ_TABLE], GFP_ATOMIC); - if (!filter->table) { - kfree(filter); - return -ENOMEM; - } + if (nla[NFTA_OBJ_TABLE]) { + filter->table = nla_strdup(nla[NFTA_OBJ_TABLE], GFP_ATOMIC); + if (!filter->table) { + kfree(filter); + return -ENOMEM; } - - if (nla[NFTA_OBJ_TYPE]) - filter->type = ntohl(nla_get_be32(nla[NFTA_OBJ_TYPE])); }
+ if (nla[NFTA_OBJ_TYPE]) + filter->type = ntohl(nla_get_be32(nla[NFTA_OBJ_TYPE])); + cb->data = filter; return 0; } @@ -7516,10 +7512,8 @@ static int nf_tables_dump_obj_done(struct netlink_callback *cb) { struct nft_obj_filter *filter = cb->data;
- if (filter) { - kfree(filter->table); - kfree(filter); - } + kfree(filter->table); + kfree(filter);
return 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Sutter phil@nwl.cc
[ Upstream commit ecf49cad807061d880bea27a5da8e0114ddc7690 ]
Name it for what it is supposed to become, a real nft_obj_dump_ctx. No functional change intended.
Signed-off-by: Phil Sutter phil@nwl.cc Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: bd662c4218f9 ("netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests") Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 07140899a8d1d..f4bdfd5dd319a 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7416,7 +7416,7 @@ static void audit_log_obj_reset(const struct nft_table *table, kfree(buf); }
-struct nft_obj_filter { +struct nft_obj_dump_ctx { char *table; u32 type; }; @@ -7426,7 +7426,7 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) const struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh); const struct nft_table *table; unsigned int idx = 0, s_idx = cb->args[0]; - struct nft_obj_filter *filter = cb->data; + struct nft_obj_dump_ctx *ctx = cb->data; struct net *net = sock_net(skb->sk); int family = nfmsg->nfgen_family; struct nftables_pernet *nft_net; @@ -7452,10 +7452,10 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) goto cont; if (idx < s_idx) goto cont; - if (filter->table && strcmp(filter->table, table->name)) + if (ctx->table && strcmp(ctx->table, table->name)) goto cont; - if (filter->type != NFT_OBJECT_UNSPEC && - obj->ops->type->type != filter->type) + if (ctx->type != NFT_OBJECT_UNSPEC && + obj->ops->type->type != ctx->type) goto cont;
rc = nf_tables_fill_obj_info(skb, net, @@ -7487,33 +7487,33 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) static int nf_tables_dump_obj_start(struct netlink_callback *cb) { const struct nlattr * const *nla = cb->data; - struct nft_obj_filter *filter = NULL; + struct nft_obj_dump_ctx *ctx = NULL;
- filter = kzalloc(sizeof(*filter), GFP_ATOMIC); - if (!filter) + ctx = kzalloc(sizeof(*ctx), GFP_ATOMIC); + if (!ctx) return -ENOMEM;
if (nla[NFTA_OBJ_TABLE]) { - filter->table = nla_strdup(nla[NFTA_OBJ_TABLE], GFP_ATOMIC); - if (!filter->table) { - kfree(filter); + ctx->table = nla_strdup(nla[NFTA_OBJ_TABLE], GFP_ATOMIC); + if (!ctx->table) { + kfree(ctx); return -ENOMEM; } }
if (nla[NFTA_OBJ_TYPE]) - filter->type = ntohl(nla_get_be32(nla[NFTA_OBJ_TYPE])); + ctx->type = ntohl(nla_get_be32(nla[NFTA_OBJ_TYPE]));
- cb->data = filter; + cb->data = ctx; return 0; }
static int nf_tables_dump_obj_done(struct netlink_callback *cb) { - struct nft_obj_filter *filter = cb->data; + struct nft_obj_dump_ctx *ctx = cb->data;
- kfree(filter->table); - kfree(filter); + kfree(ctx->table); + kfree(ctx);
return 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Sutter phil@nwl.cc
[ Upstream commit 2eda95cfa2fc43bcb21a801dc1d16a0b7cc73860 ]
Prep work for moving the context into struct netlink_callback scratch area.
Signed-off-by: Phil Sutter phil@nwl.cc Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: bd662c4218f9 ("netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests") Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index f4bdfd5dd319a..48cd3e2dde69c 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7417,6 +7417,7 @@ static void audit_log_obj_reset(const struct nft_table *table, }
struct nft_obj_dump_ctx { + unsigned int s_idx; char *table; u32 type; }; @@ -7424,14 +7425,14 @@ struct nft_obj_dump_ctx { static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) { const struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh); - const struct nft_table *table; - unsigned int idx = 0, s_idx = cb->args[0]; struct nft_obj_dump_ctx *ctx = cb->data; struct net *net = sock_net(skb->sk); int family = nfmsg->nfgen_family; struct nftables_pernet *nft_net; + const struct nft_table *table; unsigned int entries = 0; struct nft_object *obj; + unsigned int idx = 0; bool reset = false; int rc = 0;
@@ -7450,7 +7451,7 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) list_for_each_entry_rcu(obj, &table->objects, list) { if (!nft_is_active(net, obj)) goto cont; - if (idx < s_idx) + if (idx < ctx->s_idx) goto cont; if (ctx->table && strcmp(ctx->table, table->name)) goto cont; @@ -7480,7 +7481,7 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) } rcu_read_unlock();
- cb->args[0] = idx; + ctx->s_idx = idx; return skb->len; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Sutter phil@nwl.cc
[ Upstream commit 5a893b9cdf6fa5758f43d323a1d7fa6d1bf489ff ]
No need to allocate it if one may just use struct netlink_callback's scratch area for it.
Signed-off-by: Phil Sutter phil@nwl.cc Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: bd662c4218f9 ("netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests") Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 16 +++++----------- 1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 48cd3e2dde69c..05c93af417120 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7425,7 +7425,7 @@ struct nft_obj_dump_ctx { static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) { const struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh); - struct nft_obj_dump_ctx *ctx = cb->data; + struct nft_obj_dump_ctx *ctx = (void *)cb->ctx; struct net *net = sock_net(skb->sk); int family = nfmsg->nfgen_family; struct nftables_pernet *nft_net; @@ -7487,34 +7487,28 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb)
static int nf_tables_dump_obj_start(struct netlink_callback *cb) { + struct nft_obj_dump_ctx *ctx = (void *)cb->ctx; const struct nlattr * const *nla = cb->data; - struct nft_obj_dump_ctx *ctx = NULL;
- ctx = kzalloc(sizeof(*ctx), GFP_ATOMIC); - if (!ctx) - return -ENOMEM; + BUILD_BUG_ON(sizeof(*ctx) > sizeof(cb->ctx));
if (nla[NFTA_OBJ_TABLE]) { ctx->table = nla_strdup(nla[NFTA_OBJ_TABLE], GFP_ATOMIC); - if (!ctx->table) { - kfree(ctx); + if (!ctx->table) return -ENOMEM; - } }
if (nla[NFTA_OBJ_TYPE]) ctx->type = ntohl(nla_get_be32(nla[NFTA_OBJ_TYPE]));
- cb->data = ctx; return 0; }
static int nf_tables_dump_obj_done(struct netlink_callback *cb) { - struct nft_obj_dump_ctx *ctx = cb->data; + struct nft_obj_dump_ctx *ctx = (void *)cb->ctx;
kfree(ctx->table); - kfree(ctx);
return 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Sutter phil@nwl.cc
[ Upstream commit a552339063d37b3b1133d9dfc31f851edafb27bb ]
Relieve the dump callback from having to inspect nlmsg_type upon each call, just do it once at start of the dump.
Signed-off-by: Phil Sutter phil@nwl.cc Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: bd662c4218f9 ("netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests") Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 05c93af417120..38a5e5c5530c7 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7420,6 +7420,7 @@ struct nft_obj_dump_ctx { unsigned int s_idx; char *table; u32 type; + bool reset; };
static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) @@ -7433,12 +7434,8 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) unsigned int entries = 0; struct nft_object *obj; unsigned int idx = 0; - bool reset = false; int rc = 0;
- if (NFNL_MSG_TYPE(cb->nlh->nlmsg_type) == NFT_MSG_GETOBJ_RESET) - reset = true; - rcu_read_lock(); nft_net = nft_pernet(net); cb->seq = READ_ONCE(nft_net->base_seq); @@ -7465,7 +7462,7 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) NFT_MSG_NEWOBJ, NLM_F_MULTI | NLM_F_APPEND, table->family, table, - obj, reset); + obj, ctx->reset); if (rc < 0) break;
@@ -7474,7 +7471,7 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) cont: idx++; } - if (reset && entries) + if (ctx->reset && entries) audit_log_obj_reset(table, nft_net->base_seq, entries); if (rc < 0) break; @@ -7501,6 +7498,9 @@ static int nf_tables_dump_obj_start(struct netlink_callback *cb) if (nla[NFTA_OBJ_TYPE]) ctx->type = ntohl(nla_get_be32(nla[NFTA_OBJ_TYPE]));
+ if (NFNL_MSG_TYPE(cb->nlh->nlmsg_type) == NFT_MSG_GETOBJ_RESET) + ctx->reset = true; + return 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Sutter phil@nwl.cc
[ Upstream commit 69fc3e9e90f1afc11f4015e6b75d18ab9acee348 ]
Outsource the reply skb preparation for non-dump getrule requests into a distinct function. Prep work for object reset locking.
Signed-off-by: Phil Sutter phil@nwl.cc Reviewed-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: bd662c4218f9 ("netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests") Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 75 ++++++++++++++++++++--------------- 1 file changed, 44 insertions(+), 31 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 38a5e5c5530c7..88eacfe746810 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7514,10 +7514,10 @@ static int nf_tables_dump_obj_done(struct netlink_callback *cb) }
/* called with rcu_read_lock held */ -static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, - const struct nlattr * const nla[]) +static struct sk_buff * +nf_tables_getobj_single(u32 portid, const struct nfnl_info *info, + const struct nlattr * const nla[], bool reset) { - const struct nftables_pernet *nft_net = nft_pernet(info->net); struct netlink_ext_ack *extack = info->extack; u8 genmask = nft_genmask_cur(info->net); u8 family = info->nfmsg->nfgen_family; @@ -7525,52 +7525,69 @@ static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, struct net *net = info->net; struct nft_object *obj; struct sk_buff *skb2; - bool reset = false; u32 objtype; - char *buf; int err;
- if (info->nlh->nlmsg_flags & NLM_F_DUMP) { - struct netlink_dump_control c = { - .start = nf_tables_dump_obj_start, - .dump = nf_tables_dump_obj, - .done = nf_tables_dump_obj_done, - .module = THIS_MODULE, - .data = (void *)nla, - }; - - return nft_netlink_dump_start_rcu(info->sk, skb, info->nlh, &c); - } - if (!nla[NFTA_OBJ_NAME] || !nla[NFTA_OBJ_TYPE]) - return -EINVAL; + return ERR_PTR(-EINVAL);
table = nft_table_lookup(net, nla[NFTA_OBJ_TABLE], family, genmask, 0); if (IS_ERR(table)) { NL_SET_BAD_ATTR(extack, nla[NFTA_OBJ_TABLE]); - return PTR_ERR(table); + return ERR_CAST(table); }
objtype = ntohl(nla_get_be32(nla[NFTA_OBJ_TYPE])); obj = nft_obj_lookup(net, table, nla[NFTA_OBJ_NAME], objtype, genmask); if (IS_ERR(obj)) { NL_SET_BAD_ATTR(extack, nla[NFTA_OBJ_NAME]); - return PTR_ERR(obj); + return ERR_CAST(obj); }
skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC); if (!skb2) - return -ENOMEM; + return ERR_PTR(-ENOMEM); + + err = nf_tables_fill_obj_info(skb2, net, portid, + info->nlh->nlmsg_seq, NFT_MSG_NEWOBJ, 0, + family, table, obj, reset); + if (err < 0) { + kfree_skb(skb2); + return ERR_PTR(err); + } + + return skb2; +} + +static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, + const struct nlattr * const nla[]) +{ + struct nftables_pernet *nft_net = nft_pernet(info->net); + u32 portid = NETLINK_CB(skb).portid; + struct net *net = info->net; + struct sk_buff *skb2; + bool reset = false; + char *buf; + + if (info->nlh->nlmsg_flags & NLM_F_DUMP) { + struct netlink_dump_control c = { + .start = nf_tables_dump_obj_start, + .dump = nf_tables_dump_obj, + .done = nf_tables_dump_obj_done, + .module = THIS_MODULE, + .data = (void *)nla, + }; + + return nft_netlink_dump_start_rcu(info->sk, skb, info->nlh, &c); + }
if (NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_GETOBJ_RESET) reset = true;
- err = nf_tables_fill_obj_info(skb2, net, NETLINK_CB(skb).portid, - info->nlh->nlmsg_seq, NFT_MSG_NEWOBJ, 0, - family, table, obj, reset); - if (err < 0) - goto err_fill_obj_info; + skb2 = nf_tables_getobj_single(portid, info, nla, reset); + if (IS_ERR(skb2)) + return PTR_ERR(skb2);
if (!reset) return nfnetlink_unicast(skb2, net, NETLINK_CB(skb).portid); @@ -7583,11 +7600,7 @@ static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, AUDIT_NFT_OP_OBJ_RESET, GFP_ATOMIC); kfree(buf);
- return nfnetlink_unicast(skb2, net, NETLINK_CB(skb).portid); - -err_fill_obj_info: - kfree_skb(skb2); - return err; + return nfnetlink_unicast(skb2, net, portid); }
static void nft_obj_destroy(const struct nft_ctx *ctx, struct nft_object *obj)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Sutter phil@nwl.cc
[ Upstream commit bd662c4218f9648e888bebde9468146965f3f8a0 ]
Objects' dump callbacks are not concurrency-safe per-se with reset bit set. If two CPUs perform a reset at the same time, at least counter and quota objects suffer from value underrun.
Prevent this by introducing dedicated locking callbacks for nfnetlink and the asynchronous dump handling to serialize access.
Fixes: 43da04a593d8 ("netfilter: nf_tables: atomic dump and reset for stateful objects") Signed-off-by: Phil Sutter phil@nwl.cc Reviewed-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 72 ++++++++++++++++++++++++++++------- 1 file changed, 59 insertions(+), 13 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 88eacfe746810..63b7be0a95d04 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7482,6 +7482,19 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) return skb->len; }
+static int nf_tables_dumpreset_obj(struct sk_buff *skb, + struct netlink_callback *cb) +{ + struct nftables_pernet *nft_net = nft_pernet(sock_net(skb->sk)); + int ret; + + mutex_lock(&nft_net->commit_mutex); + ret = nf_tables_dump_obj(skb, cb); + mutex_unlock(&nft_net->commit_mutex); + + return ret; +} + static int nf_tables_dump_obj_start(struct netlink_callback *cb) { struct nft_obj_dump_ctx *ctx = (void *)cb->ctx; @@ -7498,12 +7511,18 @@ static int nf_tables_dump_obj_start(struct netlink_callback *cb) if (nla[NFTA_OBJ_TYPE]) ctx->type = ntohl(nla_get_be32(nla[NFTA_OBJ_TYPE]));
- if (NFNL_MSG_TYPE(cb->nlh->nlmsg_type) == NFT_MSG_GETOBJ_RESET) - ctx->reset = true; - return 0; }
+static int nf_tables_dumpreset_obj_start(struct netlink_callback *cb) +{ + struct nft_obj_dump_ctx *ctx = (void *)cb->ctx; + + ctx->reset = true; + + return nf_tables_dump_obj_start(cb); +} + static int nf_tables_dump_obj_done(struct netlink_callback *cb) { struct nft_obj_dump_ctx *ctx = (void *)cb->ctx; @@ -7562,18 +7581,43 @@ nf_tables_getobj_single(u32 portid, const struct nfnl_info *info,
static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, const struct nlattr * const nla[]) +{ + u32 portid = NETLINK_CB(skb).portid; + struct sk_buff *skb2; + + if (info->nlh->nlmsg_flags & NLM_F_DUMP) { + struct netlink_dump_control c = { + .start = nf_tables_dump_obj_start, + .dump = nf_tables_dump_obj, + .done = nf_tables_dump_obj_done, + .module = THIS_MODULE, + .data = (void *)nla, + }; + + return nft_netlink_dump_start_rcu(info->sk, skb, info->nlh, &c); + } + + skb2 = nf_tables_getobj_single(portid, info, nla, false); + if (IS_ERR(skb2)) + return PTR_ERR(skb2); + + return nfnetlink_unicast(skb2, info->net, portid); +} + +static int nf_tables_getobj_reset(struct sk_buff *skb, + const struct nfnl_info *info, + const struct nlattr * const nla[]) { struct nftables_pernet *nft_net = nft_pernet(info->net); u32 portid = NETLINK_CB(skb).portid; struct net *net = info->net; struct sk_buff *skb2; - bool reset = false; char *buf;
if (info->nlh->nlmsg_flags & NLM_F_DUMP) { struct netlink_dump_control c = { - .start = nf_tables_dump_obj_start, - .dump = nf_tables_dump_obj, + .start = nf_tables_dumpreset_obj_start, + .dump = nf_tables_dumpreset_obj, .done = nf_tables_dump_obj_done, .module = THIS_MODULE, .data = (void *)nla, @@ -7582,16 +7626,18 @@ static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, return nft_netlink_dump_start_rcu(info->sk, skb, info->nlh, &c); }
- if (NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_GETOBJ_RESET) - reset = true; + if (!try_module_get(THIS_MODULE)) + return -EINVAL; + rcu_read_unlock(); + mutex_lock(&nft_net->commit_mutex); + skb2 = nf_tables_getobj_single(portid, info, nla, true); + mutex_unlock(&nft_net->commit_mutex); + rcu_read_lock(); + module_put(THIS_MODULE);
- skb2 = nf_tables_getobj_single(portid, info, nla, reset); if (IS_ERR(skb2)) return PTR_ERR(skb2);
- if (!reset) - return nfnetlink_unicast(skb2, net, NETLINK_CB(skb).portid); - buf = kasprintf(GFP_ATOMIC, "%.*s:%u", nla_len(nla[NFTA_OBJ_TABLE]), (char *)nla_data(nla[NFTA_OBJ_TABLE]), @@ -8807,7 +8853,7 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = { .policy = nft_obj_policy, }, [NFT_MSG_GETOBJ_RESET] = { - .call = nf_tables_getobj, + .call = nf_tables_getobj_reset, .type = NFNL_CB_RCU, .attr_count = NFTA_OBJ_MAX, .policy = nft_obj_policy,
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jie Wang wangjie125@huawei.com
[ Upstream commit 8445d9d3c03101859663d34fda747f6a50947556 ]
Currently, if hns3 PF or VF FLR reset failed after five times retry, the reset done process will directly release the semaphore which has already released in hclge_reset_prepare_general. This will cause down operation fail.
So this patch fixes it by adding reset state judgement. The up operation is only called after successful PF FLR reset.
Fixes: 8627bdedc435 ("net: hns3: refactor the precedure of PF FLR") Fixes: f28368bb4542 ("net: hns3: refactor the procedure of VF FLR") Signed-off-by: Jie Wang wangjie125@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 4 ++-- drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 01e24b69e9203..dfb428550ac03 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -11538,8 +11538,8 @@ static void hclge_reset_done(struct hnae3_ae_dev *ae_dev) dev_err(&hdev->pdev->dev, "fail to rebuild, ret=%d\n", ret);
hdev->reset_type = HNAE3_NONE_RESET; - clear_bit(HCLGE_STATE_RST_HANDLING, &hdev->state); - up(&hdev->reset_sem); + if (test_and_clear_bit(HCLGE_STATE_RST_HANDLING, &hdev->state)) + up(&hdev->reset_sem); }
static void hclge_clear_resetting_state(struct hclge_dev *hdev) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c index 1f5a27fb309aa..aebb104f4c290 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c @@ -1764,8 +1764,8 @@ static void hclgevf_reset_done(struct hnae3_ae_dev *ae_dev) ret);
hdev->reset_type = HNAE3_NONE_RESET; - clear_bit(HCLGEVF_STATE_RST_HANDLING, &hdev->state); - up(&hdev->reset_sem); + if (test_and_clear_bit(HCLGEVF_STATE_RST_HANDLING, &hdev->state)) + up(&hdev->reset_sem); }
static u32 hclgevf_get_fw_version(struct hnae3_handle *handle)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peiyang Wang wangpeiyang1@huawei.com
[ Upstream commit 30545e17eac1f50c5ef49644daf6af205100a965 ]
Consider the followed case that the user change speed and reset the net interface. Before the hw change speed successfully, the driver get old old speed from hw by timer task. After reset, the previous speed is config to hw. As a result, the new speed is configed successfully but lost after PF reset. The followed pictured shows more dirrectly.
+------+ +----+ +----+ | USER | | PF | | HW | +---+--+ +-+--+ +-+--+ | ethtool -s 100G | | +------------------>| set speed 100G | | +--------------------->| | | set successfully | | |<---------------------+---+ | |query cfg (timer task)| | | +--------------------->| | handle speed | | return 200G | | changing event | ethtool --reset |<---------------------+ | (100G) +------------------>| cfg previous speed |<--+ | | after reset (200G) | | +--------------------->| | | +---+ | |query cfg (timer task)| | | +--------------------->| | handle speed | | return 100G | | changing event | |<---------------------+ | (200G) | | |<--+ | |query cfg (timer task)| | +--------------------->| | | return 200G | | |<---------------------+ | | | v v v
This patch save new speed if hw change speed successfully, which will be used after reset successfully.
Fixes: 2d03eacc0b7e ("net: hns3: Only update mac configuation when necessary") Signed-off-by: Peiyang Wang wangpeiyang1@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../hisilicon/hns3/hns3pf/hclge_main.c | 24 ++++++++++++++----- .../hisilicon/hns3/hns3pf/hclge_mdio.c | 3 +++ 2 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index dfb428550ac03..45bd5c79e4da8 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -2696,8 +2696,17 @@ static int hclge_cfg_mac_speed_dup_h(struct hnae3_handle *handle, int speed, { struct hclge_vport *vport = hclge_get_vport(handle); struct hclge_dev *hdev = vport->back; + int ret; + + ret = hclge_cfg_mac_speed_dup(hdev, speed, duplex, lane_num);
- return hclge_cfg_mac_speed_dup(hdev, speed, duplex, lane_num); + if (ret) + return ret; + + hdev->hw.mac.req_speed = speed; + hdev->hw.mac.req_duplex = duplex; + + return 0; }
static int hclge_set_autoneg_en(struct hclge_dev *hdev, bool enable) @@ -2999,17 +3008,20 @@ static int hclge_mac_init(struct hclge_dev *hdev) if (!test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state)) hdev->hw.mac.duplex = HCLGE_MAC_FULL;
- ret = hclge_cfg_mac_speed_dup_hw(hdev, hdev->hw.mac.speed, - hdev->hw.mac.duplex, hdev->hw.mac.lane_num); - if (ret) - return ret; - if (hdev->hw.mac.support_autoneg) { ret = hclge_set_autoneg_en(hdev, hdev->hw.mac.autoneg); if (ret) return ret; }
+ if (!hdev->hw.mac.autoneg) { + ret = hclge_cfg_mac_speed_dup_hw(hdev, hdev->hw.mac.req_speed, + hdev->hw.mac.req_duplex, + hdev->hw.mac.lane_num); + if (ret) + return ret; + } + mac->link = 0;
if (mac->user_fec_mode & BIT(HNAE3_FEC_USER_DEF)) { diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c index 85fb11de43a12..80079657afebe 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c @@ -191,6 +191,9 @@ static void hclge_mac_adjust_link(struct net_device *netdev) if (ret) netdev_err(netdev, "failed to adjust link.\n");
+ hdev->hw.mac.req_speed = (u32)speed; + hdev->hw.mac.req_duplex = (u8)duplex; + ret = hclge_cfg_flowctrl(hdev); if (ret) netdev_err(netdev, "failed to configure flow control.\n");
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jie Wang wangjie125@huawei.com
[ Upstream commit be5e816d00a506719e9dbb1a9c861c5ced30a109 ]
When config TC during the reset process, may cause a deadlock, the flow is as below: pf reset start │ ▼ ...... setup tc │ │ ▼ ▼ DOWN: napi_disable() napi_disable()(skip) │ │ │ ▼ ▼ ...... ...... │ │ ▼ │ napi_enable() │ ▼ UINIT: netif_napi_del() │ ▼ ...... │ ▼ INIT: netif_napi_add() │ ▼ ...... global reset start │ │ ▼ ▼ UP: napi_enable()(skip) ...... │ │ ▼ ▼ ...... napi_disable()
In reset process, the driver will DOWN the port and then UINIT, in this case, the setup tc process will UP the port before UINIT, so cause the problem. Adds a DOWN process in UINIT to fix it.
Fixes: bb6b94a896d4 ("net: hns3: Add reset interface implementation in client") Signed-off-by: Jie Wang wangjie125@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index 4ce43c3a00a37..0377a056aaecc 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -5728,6 +5728,9 @@ static int hns3_reset_notify_uninit_enet(struct hnae3_handle *handle) struct net_device *netdev = handle->kinfo.netdev; struct hns3_nic_priv *priv = netdev_priv(netdev);
+ if (!test_bit(HNS3_NIC_STATE_DOWN, &priv->state)) + hns3_nic_net_stop(netdev); + if (!test_and_clear_bit(HNS3_NIC_STATE_INITED, &priv->state)) { netdev_warn(netdev, "already uninitialized\n"); return 0;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Parsa Poorshikhian parsa.poorsh@gmail.com
[ Upstream commit ef9718b3d54e822de294351251f3a574f8a082ce ]
Fix noise from speakers connected to AUX port when no sound is playing. The problem occurs because the `alc_shutup_pins` function includes a 0x10ec0257 vendor ID, which causes noise on Lenovo IdeaPad 3 15IAU7 with Realtek ALC257 codec when no sound is playing. Removing this vendor ID from the function fixes the bug.
Fixes: 70794b9563fe ("ALSA: hda/realtek: Add more codec ID to no shutup pins list") Signed-off-by: Parsa Poorshikhian parsa.poorsh@gmail.com Link: https://patch.msgid.link/20240810150939.330693-1-parsa.poorsh@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/hda/patch_realtek.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 93d65a1acc475..b942ed868070d 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -583,7 +583,6 @@ static void alc_shutup_pins(struct hda_codec *codec) switch (codec->core.vendor_id) { case 0x10ec0236: case 0x10ec0256: - case 0x10ec0257: case 0x19e58326: case 0x10ec0283: case 0x10ec0285:
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lee Jones lee@kernel.org
[ Upstream commit a728342ae4ec2a7fdab0038b11427579424f133e ]
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/amd/amdgpu/imu_v11_0.c: In function ‘imu_v11_0_init_microcode’: drivers/gpu/drm/amd/amdgpu/imu_v11_0.c:52:54: warning: ‘_imu.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] drivers/gpu/drm/amd/amdgpu/imu_v11_0.c:52:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40
Signed-off-by: Lee Jones lee@kernel.org Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/imu_v11_0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/imu_v11_0.c b/drivers/gpu/drm/amd/amdgpu/imu_v11_0.c index 95548c512f4fb..3c21128fa1d82 100644 --- a/drivers/gpu/drm/amd/amdgpu/imu_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/imu_v11_0.c @@ -38,7 +38,7 @@ MODULE_FIRMWARE("amdgpu/gc_11_0_3_imu.bin");
static int imu_v11_0_init_microcode(struct amdgpu_device *adev) { - char fw_name[40]; + char fw_name[45]; char ucode_prefix[30]; int err; const struct imu_firmware_header_v1_0 *imu_hdr;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rand Deeb rand.sec96@gmail.com
[ Upstream commit e0b5127fa134fe0284d58877b6b3133939c8b3ce ]
In ssb_calc_clock_rate(), there is a potential issue where the value of m1 could be zero due to initialization using clkfactor_f6_resolv(). This situation raised concerns about the possibility of a division by zero error.
We fixed it by following the suggestions provided by Larry Finger Larry.Finger@lwfinger.net and Michael Büsch m@bues.ch. The fix involves returning a value of 1 instead of 0 in clkfactor_f6_resolv(). This modification ensures the proper functioning of the code and eliminates the risk of division by zero errors.
Signed-off-by: Rand Deeb rand.sec96@gmail.com Acked-by: Larry Finger Larry.Finger@lwfinger.net Acked-by: Michael Büsch m@bues.ch Signed-off-by: Kalle Valo kvalo@kernel.org Link: https://lore.kernel.org/r/20230904232346.34991-1-rand.sec96@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ssb/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/ssb/main.c b/drivers/ssb/main.c index 8a93c83cb6f80..d52e91258e989 100644 --- a/drivers/ssb/main.c +++ b/drivers/ssb/main.c @@ -837,7 +837,7 @@ static u32 clkfactor_f6_resolve(u32 v) case SSB_CHIPCO_CLK_F6_7: return 7; } - return 0; + return 1; }
/* Calculate the speed the backplane would run at a given set of clockcontrol values */
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit 1474bc87fe57deac726cc10203f73daa6c3212f7 ]
This might seem pretty pointless rather than changing the locking immediately, but it seems safer to run for a while with checks and the old locking scheme, and then remove the wdev lock later.
Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/core.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/wireless/core.h b/net/wireless/core.h index ee980965a7cfb..8118b8614ac68 100644 --- a/net/wireless/core.h +++ b/net/wireless/core.h @@ -228,6 +228,7 @@ void cfg80211_register_wdev(struct cfg80211_registered_device *rdev, static inline void wdev_lock(struct wireless_dev *wdev) __acquires(wdev) { + lockdep_assert_held(&wdev->wiphy->mtx); mutex_lock(&wdev->mtx); __acquire(wdev->mtx); } @@ -235,11 +236,16 @@ static inline void wdev_lock(struct wireless_dev *wdev) static inline void wdev_unlock(struct wireless_dev *wdev) __releases(wdev) { + lockdep_assert_held(&wdev->wiphy->mtx); __release(wdev->mtx); mutex_unlock(&wdev->mtx); }
-#define ASSERT_WDEV_LOCK(wdev) lockdep_assert_held(&(wdev)->mtx) +static inline void ASSERT_WDEV_LOCK(struct wireless_dev *wdev) +{ + lockdep_assert_held(&wdev->wiphy->mtx); + lockdep_assert_held(&wdev->mtx); +}
static inline bool cfg80211_has_monitors_only(struct cfg80211_registered_device *rdev) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit 05f136220d17839eb7c155f015ace9152f603225 ]
As previously reported by Alexander, whose commit 69403bad97aa ("wifi: mac80211: sdata can be NULL during AMPDU start") I'm reverting as part of this commit, there's a race between station destruction and aggregation setup, where the aggregation setup can happen while the station is being removed and queue the work after ieee80211_sta_tear_down_BA_sessions() has already run in __sta_info_destroy_part1(), and thus the worker will run with a now freed station. In his case, this manifested in a NULL sdata pointer, but really there's no guarantee whatsoever.
The real issue seems to be that it's possible at all to have a situation where this occurs - we want to stop the BA sessions when doing _part1, but we cannot be sure, and WLAN_STA_BLOCK_BA isn't necessarily effective since we don't know that the setup isn't concurrently running and already got past the check.
Simply call ieee80211_sta_tear_down_BA_sessions() again in the second part of station destruction, since at that point really nothing else can hold a reference to the station any more.
Also revert the sdata checks since those are just misleading at this point.
Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/agg-tx.c | 6 +----- net/mac80211/driver-ops.c | 3 --- net/mac80211/sta_info.c | 14 ++++++++++++++ 3 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c index 85d2b9e4b51ce..e26a72f3a1042 100644 --- a/net/mac80211/agg-tx.c +++ b/net/mac80211/agg-tx.c @@ -491,7 +491,7 @@ void ieee80211_tx_ba_session_handle_start(struct sta_info *sta, int tid) { struct tid_ampdu_tx *tid_tx; struct ieee80211_local *local = sta->local; - struct ieee80211_sub_if_data *sdata; + struct ieee80211_sub_if_data *sdata = sta->sdata; struct ieee80211_ampdu_params params = { .sta = &sta->sta, .action = IEEE80211_AMPDU_TX_START, @@ -519,7 +519,6 @@ void ieee80211_tx_ba_session_handle_start(struct sta_info *sta, int tid) */ synchronize_net();
- sdata = sta->sdata; params.ssn = sta->tid_seq[tid] >> 4; ret = drv_ampdu_action(local, sdata, ¶ms); tid_tx->ssn = params.ssn; @@ -533,9 +532,6 @@ void ieee80211_tx_ba_session_handle_start(struct sta_info *sta, int tid) */ set_bit(HT_AGG_STATE_DRV_READY, &tid_tx->state); } else if (ret) { - if (!sdata) - return; - ht_dbg(sdata, "BA request denied - HW unavailable for %pM tid %d\n", sta->sta.addr, tid); diff --git a/net/mac80211/driver-ops.c b/net/mac80211/driver-ops.c index c08d3c9a4a177..5392ffa182704 100644 --- a/net/mac80211/driver-ops.c +++ b/net/mac80211/driver-ops.c @@ -391,9 +391,6 @@ int drv_ampdu_action(struct ieee80211_local *local,
might_sleep();
- if (!sdata) - return -EIO; - sdata = get_bss_sdata(sdata); if (!check_sdata_in_driver(sdata)) return -EIO; diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c index 91768abf2d75b..dd1864f6549f2 100644 --- a/net/mac80211/sta_info.c +++ b/net/mac80211/sta_info.c @@ -1272,6 +1272,20 @@ static void __sta_info_destroy_part2(struct sta_info *sta) * after _part1 and before _part2! */
+ /* + * There's a potential race in _part1 where we set WLAN_STA_BLOCK_BA + * but someone might have just gotten past a check, and not yet into + * queuing the work/creating the data/etc. + * + * Do another round of destruction so that the worker is certainly + * canceled before we later free the station. + * + * Since this is after synchronize_rcu()/synchronize_net() we're now + * certain that nobody can actually hold a reference to the STA and + * be calling e.g. ieee80211_start_tx_ba_session(). + */ + ieee80211_sta_tear_down_BA_sessions(sta, AGG_STOP_DESTROY_STA); + might_sleep(); lockdep_assert_held(&local->sta_mtx);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhen Lei thunder.leizhen@huawei.com
[ Upstream commit 2cbc482d325ee58001472c4359b311958c4efdd1 ]
When a structure containing an RCU callback rhp is (incorrectly) freed and reallocated after rhp is passed to call_rcu(), it is not unusual for rhp->func to be set to NULL. This defeats the debugging prints used by __call_rcu_common() in kernels built with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, which expect to identify the offending code using the identity of this function.
And in kernels build without CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, things are even worse, as can be seen from this splat:
Unable to handle kernel NULL pointer dereference at virtual address 0 ... ... PC is at 0x0 LR is at rcu_do_batch+0x1c0/0x3b8 ... ... (rcu_do_batch) from (rcu_core+0x1d4/0x284) (rcu_core) from (__do_softirq+0x24c/0x344) (__do_softirq) from (__irq_exit_rcu+0x64/0x108) (__irq_exit_rcu) from (irq_exit+0x8/0x10) (irq_exit) from (__handle_domain_irq+0x74/0x9c) (__handle_domain_irq) from (gic_handle_irq+0x8c/0x98) (gic_handle_irq) from (__irq_svc+0x5c/0x94) (__irq_svc) from (arch_cpu_idle+0x20/0x3c) (arch_cpu_idle) from (default_idle_call+0x4c/0x78) (default_idle_call) from (do_idle+0xf8/0x150) (do_idle) from (cpu_startup_entry+0x18/0x20) (cpu_startup_entry) from (0xc01530)
This commit therefore adds calls to mem_dump_obj(rhp) to output some information, for example:
slab kmalloc-256 start ffff410c45019900 pointer offset 0 size 256
This provides the rough size of the memory block and the offset of the rcu_head structure, which as least provides at least a few clues to help locate the problem. If the problem is reproducible, additional slab debugging can be enabled, for example, CONFIG_DEBUG_SLAB=y, which can provide significantly more information.
Signed-off-by: Zhen Lei thunder.leizhen@huawei.com Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Frederic Weisbecker frederic@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/rcu/rcu.h | 7 +++++++ kernel/rcu/srcutiny.c | 1 + kernel/rcu/srcutree.c | 1 + kernel/rcu/tasks.h | 1 + kernel/rcu/tiny.c | 1 + kernel/rcu/tree.c | 1 + 6 files changed, 12 insertions(+)
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h index 48d8f754b730e..49ff955ed2034 100644 --- a/kernel/rcu/rcu.h +++ b/kernel/rcu/rcu.h @@ -10,6 +10,7 @@ #ifndef __LINUX_RCU_H #define __LINUX_RCU_H
+#include <linux/slab.h> #include <trace/events/rcu.h>
/* @@ -211,6 +212,12 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head) } #endif /* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
+static inline void debug_rcu_head_callback(struct rcu_head *rhp) +{ + if (unlikely(!rhp->func)) + kmem_dump_obj(rhp); +} + extern int rcu_cpu_stall_suppress_at_boot;
static inline bool rcu_stall_is_suppressed_at_boot(void) diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c index 33adafdad2613..5e7f336baa06a 100644 --- a/kernel/rcu/srcutiny.c +++ b/kernel/rcu/srcutiny.c @@ -138,6 +138,7 @@ void srcu_drive_gp(struct work_struct *wp) while (lh) { rhp = lh; lh = lh->next; + debug_rcu_head_callback(rhp); local_bh_disable(); rhp->func(rhp); local_bh_enable(); diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c index 929dcbc04d29c..f7825900bdfd7 100644 --- a/kernel/rcu/srcutree.c +++ b/kernel/rcu/srcutree.c @@ -1591,6 +1591,7 @@ static void srcu_invoke_callbacks(struct work_struct *work) rhp = rcu_cblist_dequeue(&ready_cbs); for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) { debug_rcu_head_unqueue(rhp); + debug_rcu_head_callback(rhp); local_bh_disable(); rhp->func(rhp); local_bh_enable(); diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h index 456c956f481ef..bb6b037ef30fa 100644 --- a/kernel/rcu/tasks.h +++ b/kernel/rcu/tasks.h @@ -487,6 +487,7 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags); len = rcl.len; for (rhp = rcu_cblist_dequeue(&rcl); rhp; rhp = rcu_cblist_dequeue(&rcl)) { + debug_rcu_head_callback(rhp); local_bh_disable(); rhp->func(rhp); local_bh_enable(); diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c index a33a8d4942c37..21c040cba4bd0 100644 --- a/kernel/rcu/tiny.c +++ b/kernel/rcu/tiny.c @@ -97,6 +97,7 @@ static inline bool rcu_reclaim_tiny(struct rcu_head *head)
trace_rcu_invoke_callback("", head); f = head->func; + debug_rcu_head_callback(head); WRITE_ONCE(head->func, (rcu_callback_t)0L); f(head); rcu_lock_release(&rcu_callback_map); diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index cd6144cea5a1a..86923a8914007 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2292,6 +2292,7 @@ static void rcu_do_batch(struct rcu_data *rdp) trace_rcu_invoke_callback(rcu_state.name, rhp);
f = rhp->func; + debug_rcu_head_callback(rhp); WRITE_ONCE(rhp->func, (rcu_callback_t)0L); f(rhp);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul E. McKenney paulmck@kernel.org
[ Upstream commit 0ae9942f03d0d034fdb0a4f44fc99f62a3107987 ]
When using rcutorture as a module, there are a number of conditions that can abort the modprobe operation, for example, when attempting to run both RCU CPU stall warning tests and forward-progress tests. This can cause rcu_torture_cleanup() to be invoked on the unwind path out of rcu_rcu_torture_init(), which will mean that rcu_gp_slow_unregister() is invoked without a matching rcu_gp_slow_register(). This will cause a splat because rcu_gp_slow_unregister() is passed rcu_fwd_cb_nodelay, which does not match a NULL pointer.
This commit therefore forgives a mismatch involving a NULL pointer, thus avoiding this false-positive splat.
Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Frederic Weisbecker frederic@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/rcu/tree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 86923a8914007..dd6e15ca63b0c 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1336,7 +1336,7 @@ EXPORT_SYMBOL_GPL(rcu_gp_slow_register); /* Unregister a counter, with NULL for not caring which. */ void rcu_gp_slow_unregister(atomic_t *rgssp) { - WARN_ON_ONCE(rgssp && rgssp != rcu_gp_slow_suppress); + WARN_ON_ONCE(rgssp && rgssp != rcu_gp_slow_suppress && rcu_gp_slow_suppress != NULL);
WRITE_ONCE(rcu_gp_slow_suppress, NULL); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Johnson quic_jjohnson@quicinc.com
[ Upstream commit b7bcea9c27b3d87b54075735c870500123582145 ]
While converting struct ieee80211_tim_ie::virtual_map to be a flexible array it was observed that the TIM IE processing in cw1200_rx_cb() could potentially process a malformed IE in a manner that could result in a buffer over-read. Add logic to verify that the TIM IE length is large enough to hold a valid TIM payload before processing it.
Signed-off-by: Jeff Johnson quic_jjohnson@quicinc.com Signed-off-by: Kalle Valo kvalo@kernel.org Link: https://lore.kernel.org/r/20230831-ieee80211_tim_ie-v3-1-e10ff584ab5d@quicin... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/st/cw1200/txrx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/st/cw1200/txrx.c b/drivers/net/wireless/st/cw1200/txrx.c index 6894b919ff94b..e16e9ae90d204 100644 --- a/drivers/net/wireless/st/cw1200/txrx.c +++ b/drivers/net/wireless/st/cw1200/txrx.c @@ -1166,7 +1166,7 @@ void cw1200_rx_cb(struct cw1200_common *priv, size_t ies_len = skb->len - (ies - (u8 *)(skb->data));
tim_ie = cfg80211_find_ie(WLAN_EID_TIM, ies, ies_len); - if (tim_ie) { + if (tim_ie && tim_ie[1] >= sizeof(struct ieee80211_tim_ie)) { struct ieee80211_tim_ie *tim = (struct ieee80211_tim_ie *)&tim_ie[2];
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kamalesh Babulal kamalesh.babulal@oracle.com
[ Upstream commit d24f05987ce8bf61e62d86fedbe47523dc5c3393 ]
Use css directly instead of dereferencing it from &cgroup->self, while adding the cgroup v2 cft base and psi files in css_populate_dir(). Both points to the same css, when css->ss is NULL, this avoids extra deferences and makes code consistent in usage across the function.
Signed-off-by: Kamalesh Babulal kamalesh.babulal@oracle.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/cgroup/cgroup.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 489c25713edcb..455f67ff31b57 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -1751,13 +1751,13 @@ static int css_populate_dir(struct cgroup_subsys_state *css)
if (!css->ss) { if (cgroup_on_dfl(cgrp)) { - ret = cgroup_addrm_files(&cgrp->self, cgrp, + ret = cgroup_addrm_files(css, cgrp, cgroup_base_files, true); if (ret < 0) return ret;
if (cgroup_psi_enabled()) { - ret = cgroup_addrm_files(&cgrp->self, cgrp, + ret = cgroup_addrm_files(css, cgrp, cgroup_psi_files, true); if (ret < 0) return ret;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wolfram Sang wsa+renesas@sang-engineering.com
[ Upstream commit 7890fce6201aed46d3576e3d641f9ee5c1f0e16f ]
Value comes from DT, so it could be 0. Unlikely, but could be.
Signed-off-by: Wolfram Sang wsa+renesas@sang-engineering.com Reviewed-by: Geert Uytterhoeven geert+renesas@glider.be Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-riic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/i2c/busses/i2c-riic.c b/drivers/i2c/busses/i2c-riic.c index 849848ccb0802..b9959621cc5d7 100644 --- a/drivers/i2c/busses/i2c-riic.c +++ b/drivers/i2c/busses/i2c-riic.c @@ -314,7 +314,7 @@ static int riic_init_hw(struct riic_dev *riic, struct i2c_timings *t) * frequency with only 62 clock ticks max (31 high, 31 low). * Aim for a duty of 60% LOW, 40% HIGH. */ - total_ticks = DIV_ROUND_UP(rate, t->bus_freq_hz); + total_ticks = DIV_ROUND_UP(rate, t->bus_freq_hz ?: 1);
for (cks = 0; cks < 7; cks++) { /*
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhu Yanjun yanjun.zhu@linux.dev
[ Upstream commit c5930a1aa08aafe6ffe15b5d28fe875f88f6ac86 ]
No functionality change. The variable which is not initialized fully will introduce potential risks.
Signed-off-by: Zhu Yanjun yanjun.zhu@linux.dev Link: https://lore.kernel.org/r/20230919020806.534183-1-yanjun.zhu@intel.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/ulp/rtrs/rtrs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/ulp/rtrs/rtrs.c b/drivers/infiniband/ulp/rtrs/rtrs.c index 716ec7baddefd..d71b1d83e9ffb 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs.c +++ b/drivers/infiniband/ulp/rtrs/rtrs.c @@ -255,7 +255,7 @@ static int create_cq(struct rtrs_con *con, int cq_vector, int nr_cqe, static int create_qp(struct rtrs_con *con, struct ib_pd *pd, u32 max_send_wr, u32 max_recv_wr, u32 max_sge) { - struct ib_qp_init_attr init_attr = {NULL}; + struct ib_qp_init_attr init_attr = {}; struct rdma_cm_id *cm_id = con->cm_id; int ret;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heiko Carstens hca@linux.ibm.com
[ Upstream commit 4a1725281fc5b0009944b1c0e1d2c1dc311a09ec ]
Both the external call as well as the emergency signal submask bits in control register 0 are set before any interrupt handler is registered.
Change the order and first register the interrupt handler and only then enable the interrupts by setting the corresponding bits in control register 0.
This prevents that the second part of the machine check handler for early machine check handling is not executed: the machine check handler sends an IPI to the CPU it runs on. If the corresponding interrupts are enabled, but no interrupt handler is present, the interrupt is ignored.
Reviewed-by: Sven Schnelle svens@linux.ibm.com Acked-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/kernel/early.c | 12 +++--------- arch/s390/kernel/smp.c | 4 ++-- 2 files changed, 5 insertions(+), 11 deletions(-)
diff --git a/arch/s390/kernel/early.c b/arch/s390/kernel/early.c index 9693c8630e73f..b3cb256ec6692 100644 --- a/arch/s390/kernel/early.c +++ b/arch/s390/kernel/early.c @@ -237,15 +237,9 @@ static inline void save_vector_registers(void) #endif }
-static inline void setup_control_registers(void) +static inline void setup_low_address_protection(void) { - unsigned long reg; - - __ctl_store(reg, 0, 0); - reg |= CR0_LOW_ADDRESS_PROTECTION; - reg |= CR0_EMERGENCY_SIGNAL_SUBMASK; - reg |= CR0_EXTERNAL_CALL_SUBMASK; - __ctl_load(reg, 0, 0); + __ctl_set_bit(0, 28); }
static inline void setup_access_registers(void) @@ -304,7 +298,7 @@ void __init startup_init(void) save_vector_registers(); setup_topology(); sclp_early_detect(); - setup_control_registers(); + setup_low_address_protection(); setup_access_registers(); lockdep_on(); } diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c index 0031325ce4bc9..436dbf4d743d8 100644 --- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -1007,12 +1007,12 @@ void __init smp_fill_possible_mask(void)
void __init smp_prepare_cpus(unsigned int max_cpus) { - /* request the 0x1201 emergency signal external interrupt */ if (register_external_irq(EXT_IRQ_EMERGENCY_SIG, do_ext_call_interrupt)) panic("Couldn't request external interrupt 0x1201"); - /* request the 0x1202 external call external interrupt */ + ctl_set_bit(0, 14); if (register_external_irq(EXT_IRQ_EXTERNAL_CALL, do_ext_call_interrupt)) panic("Couldn't request external interrupt 0x1202"); + ctl_set_bit(0, 13); }
void __init smp_prepare_boot_cpu(void)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tomi Valkeinen tomi.valkeinen@ideasonboard.com
[ Upstream commit 9fc75c40faa29df14ba16066be6bdfaea9f39ce4 ]
The DSI horizontal timing calculations done by the driver seem to often lead to underflows or overflows, depending on the videomode.
There are two main things the current driver doesn't seem to get right: DSI HSW and HFP, and VSDly. However, even following Toshiba's documentation it seems we don't always get a working display.
This patch attempts to fix the horizontal timings for DSI event mode, and on a system with a DSI->HDMI encoder, a lot of standard HDMI modes now seem to work. The work relies on Toshiba's documentation, but also quite a bit on empirical testing.
This also adds timing related debug prints to make it easier to improve on this later.
The DSI pulse mode has only been tested with a fixed-resolution panel, which limits the testing of different modes on DSI pulse mode. However, as the VSDly calculation also affects pulse mode, so this might cause a regression.
Reviewed-by: Peter Ujfalusi peter.ujfalusi@gmail.com Tested-by: Marcel Ziswiler marcel.ziswiler@toradex.com Tested-by: Maxim Schwalm maxim.schwalm@gmail.com # Asus TF700T Signed-off-by: Tomi Valkeinen tomi.valkeinen@ideasonboard.com Signed-off-by: Robert Foss rfoss@kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20230906-tc358768-v4-12-31725f... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/bridge/tc358768.c | 213 ++++++++++++++++++++++++++---- 1 file changed, 185 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/bridge/tc358768.c b/drivers/gpu/drm/bridge/tc358768.c index 8429b6518b502..aabdb5c74d936 100644 --- a/drivers/gpu/drm/bridge/tc358768.c +++ b/drivers/gpu/drm/bridge/tc358768.c @@ -9,6 +9,7 @@ #include <linux/gpio/consumer.h> #include <linux/i2c.h> #include <linux/kernel.h> +#include <linux/math64.h> #include <linux/media-bus-format.h> #include <linux/minmax.h> #include <linux/module.h> @@ -158,6 +159,7 @@ struct tc358768_priv { u32 frs; /* PLL Freqency range for HSCK (post divider) */
u32 dsiclk; /* pll_clk / 2 */ + u32 pclk; /* incoming pclk rate */ };
static inline struct tc358768_priv *dsi_host_to_tc358768(struct mipi_dsi_host @@ -381,6 +383,7 @@ static int tc358768_calc_pll(struct tc358768_priv *priv, priv->prd = best_prd; priv->frs = frs; priv->dsiclk = best_pll / 2; + priv->pclk = mode->clock * 1000;
return 0; } @@ -639,6 +642,28 @@ static u32 tc358768_ps_to_ns(u32 ps) return ps / 1000; }
+static u32 tc358768_dpi_to_ns(u32 val, u32 pclk) +{ + return (u32)div_u64((u64)val * NANO, pclk); +} + +/* Convert value in DPI pixel clock units to DSI byte count */ +static u32 tc358768_dpi_to_dsi_bytes(struct tc358768_priv *priv, u32 val) +{ + u64 m = (u64)val * priv->dsiclk / 4 * priv->dsi_lanes; + u64 n = priv->pclk; + + return (u32)div_u64(m + n - 1, n); +} + +static u32 tc358768_dsi_bytes_to_ns(struct tc358768_priv *priv, u32 val) +{ + u64 m = (u64)val * NANO; + u64 n = priv->dsiclk / 4 * priv->dsi_lanes; + + return (u32)div_u64(m, n); +} + static void tc358768_bridge_pre_enable(struct drm_bridge *bridge) { struct tc358768_priv *priv = bridge_to_tc358768(bridge); @@ -648,11 +673,19 @@ static void tc358768_bridge_pre_enable(struct drm_bridge *bridge) s32 raw_val; const struct drm_display_mode *mode; u32 hsbyteclk_ps, dsiclk_ps, ui_ps; - u32 dsiclk, hsbyteclk, video_start; - const u32 internal_delay = 40; + u32 dsiclk, hsbyteclk; int ret, i; struct videomode vm; struct device *dev = priv->dev; + /* In pixelclock units */ + u32 dpi_htot, dpi_data_start; + /* In byte units */ + u32 dsi_dpi_htot, dsi_dpi_data_start; + u32 dsi_hsw, dsi_hbp, dsi_hact, dsi_hfp; + const u32 dsi_hss = 4; /* HSS is a short packet (4 bytes) */ + /* In hsbyteclk units */ + u32 dsi_vsdly; + const u32 internal_dly = 40;
if (mode_flags & MIPI_DSI_CLOCK_NON_CONTINUOUS) { dev_warn_once(dev, "Non-continuous mode unimplemented, falling back to continuous\n"); @@ -687,27 +720,23 @@ static void tc358768_bridge_pre_enable(struct drm_bridge *bridge) case MIPI_DSI_FMT_RGB888: val |= (0x3 << 4); hact = vm.hactive * 3; - video_start = (vm.hsync_len + vm.hback_porch) * 3; data_type = MIPI_DSI_PACKED_PIXEL_STREAM_24; break; case MIPI_DSI_FMT_RGB666: val |= (0x4 << 4); hact = vm.hactive * 3; - video_start = (vm.hsync_len + vm.hback_porch) * 3; data_type = MIPI_DSI_PACKED_PIXEL_STREAM_18; break;
case MIPI_DSI_FMT_RGB666_PACKED: val |= (0x4 << 4) | BIT(3); hact = vm.hactive * 18 / 8; - video_start = (vm.hsync_len + vm.hback_porch) * 18 / 8; data_type = MIPI_DSI_PIXEL_STREAM_3BYTE_18; break;
case MIPI_DSI_FMT_RGB565: val |= (0x5 << 4); hact = vm.hactive * 2; - video_start = (vm.hsync_len + vm.hback_porch) * 2; data_type = MIPI_DSI_PACKED_PIXEL_STREAM_16; break; default: @@ -717,9 +746,152 @@ static void tc358768_bridge_pre_enable(struct drm_bridge *bridge) return; }
+ /* + * There are three important things to make TC358768 work correctly, + * which are not trivial to manage: + * + * 1. Keep the DPI line-time and the DSI line-time as close to each + * other as possible. + * 2. TC358768 goes to LP mode after each line's active area. The DSI + * HFP period has to be long enough for entering and exiting LP mode. + * But it is not clear how to calculate this. + * 3. VSDly (video start delay) has to be long enough to ensure that the + * DSI TX does not start transmitting until we have started receiving + * pixel data from the DPI input. It is not clear how to calculate + * this either. + */ + + dpi_htot = vm.hactive + vm.hfront_porch + vm.hsync_len + vm.hback_porch; + dpi_data_start = vm.hsync_len + vm.hback_porch; + + dev_dbg(dev, "dpi horiz timing (pclk): %u + %u + %u + %u = %u\n", + vm.hsync_len, vm.hback_porch, vm.hactive, vm.hfront_porch, + dpi_htot); + + dev_dbg(dev, "dpi horiz timing (ns): %u + %u + %u + %u = %u\n", + tc358768_dpi_to_ns(vm.hsync_len, vm.pixelclock), + tc358768_dpi_to_ns(vm.hback_porch, vm.pixelclock), + tc358768_dpi_to_ns(vm.hactive, vm.pixelclock), + tc358768_dpi_to_ns(vm.hfront_porch, vm.pixelclock), + tc358768_dpi_to_ns(dpi_htot, vm.pixelclock)); + + dev_dbg(dev, "dpi data start (ns): %u + %u = %u\n", + tc358768_dpi_to_ns(vm.hsync_len, vm.pixelclock), + tc358768_dpi_to_ns(vm.hback_porch, vm.pixelclock), + tc358768_dpi_to_ns(dpi_data_start, vm.pixelclock)); + + dsi_dpi_htot = tc358768_dpi_to_dsi_bytes(priv, dpi_htot); + dsi_dpi_data_start = tc358768_dpi_to_dsi_bytes(priv, dpi_data_start); + + if (dsi_dev->mode_flags & MIPI_DSI_MODE_VIDEO_SYNC_PULSE) { + dsi_hsw = tc358768_dpi_to_dsi_bytes(priv, vm.hsync_len); + dsi_hbp = tc358768_dpi_to_dsi_bytes(priv, vm.hback_porch); + } else { + /* HBP is included in HSW in event mode */ + dsi_hbp = 0; + dsi_hsw = tc358768_dpi_to_dsi_bytes(priv, + vm.hsync_len + + vm.hback_porch); + + /* + * The pixel packet includes the actual pixel data, and: + * DSI packet header = 4 bytes + * DCS code = 1 byte + * DSI packet footer = 2 bytes + */ + dsi_hact = hact + 4 + 1 + 2; + + dsi_hfp = dsi_dpi_htot - dsi_hact - dsi_hsw - dsi_hss; + + /* + * Here we should check if HFP is long enough for entering LP + * and exiting LP, but it's not clear how to calculate that. + * Instead, this is a naive algorithm that just adjusts the HFP + * and HSW so that HFP is (at least) roughly 2/3 of the total + * blanking time. + */ + if (dsi_hfp < (dsi_hfp + dsi_hsw + dsi_hss) * 2 / 3) { + u32 old_hfp = dsi_hfp; + u32 old_hsw = dsi_hsw; + u32 tot = dsi_hfp + dsi_hsw + dsi_hss; + + dsi_hsw = tot / 3; + + /* + * Seems like sometimes HSW has to be divisible by num-lanes, but + * not always... + */ + dsi_hsw = roundup(dsi_hsw, priv->dsi_lanes); + + dsi_hfp = dsi_dpi_htot - dsi_hact - dsi_hsw - dsi_hss; + + dev_dbg(dev, + "hfp too short, adjusting dsi hfp and dsi hsw from %u, %u to %u, %u\n", + old_hfp, old_hsw, dsi_hfp, dsi_hsw); + } + + dev_dbg(dev, + "dsi horiz timing (bytes): %u, %u + %u + %u + %u = %u\n", + dsi_hss, dsi_hsw, dsi_hbp, dsi_hact, dsi_hfp, + dsi_hss + dsi_hsw + dsi_hbp + dsi_hact + dsi_hfp); + + dev_dbg(dev, "dsi horiz timing (ns): %u + %u + %u + %u + %u = %u\n", + tc358768_dsi_bytes_to_ns(priv, dsi_hss), + tc358768_dsi_bytes_to_ns(priv, dsi_hsw), + tc358768_dsi_bytes_to_ns(priv, dsi_hbp), + tc358768_dsi_bytes_to_ns(priv, dsi_hact), + tc358768_dsi_bytes_to_ns(priv, dsi_hfp), + tc358768_dsi_bytes_to_ns(priv, dsi_hss + dsi_hsw + + dsi_hbp + dsi_hact + dsi_hfp)); + } + + /* VSDly calculation */ + + /* Start with the HW internal delay */ + dsi_vsdly = internal_dly; + + /* Convert to byte units as the other variables are in byte units */ + dsi_vsdly *= priv->dsi_lanes; + + /* Do we need more delay, in addition to the internal? */ + if (dsi_dpi_data_start > dsi_vsdly + dsi_hss + dsi_hsw + dsi_hbp) { + dsi_vsdly = dsi_dpi_data_start - dsi_hss - dsi_hsw - dsi_hbp; + dsi_vsdly = roundup(dsi_vsdly, priv->dsi_lanes); + } + + dev_dbg(dev, "dsi data start (bytes) %u + %u + %u + %u = %u\n", + dsi_vsdly, dsi_hss, dsi_hsw, dsi_hbp, + dsi_vsdly + dsi_hss + dsi_hsw + dsi_hbp); + + dev_dbg(dev, "dsi data start (ns) %u + %u + %u + %u = %u\n", + tc358768_dsi_bytes_to_ns(priv, dsi_vsdly), + tc358768_dsi_bytes_to_ns(priv, dsi_hss), + tc358768_dsi_bytes_to_ns(priv, dsi_hsw), + tc358768_dsi_bytes_to_ns(priv, dsi_hbp), + tc358768_dsi_bytes_to_ns(priv, dsi_vsdly + dsi_hss + dsi_hsw + dsi_hbp)); + + /* Convert back to hsbyteclk */ + dsi_vsdly /= priv->dsi_lanes; + + /* + * The docs say that there is an internal delay of 40 cycles. + * However, we get underflows if we follow that rule. If we + * instead ignore the internal delay, things work. So either + * the docs are wrong or the calculations are wrong. + * + * As a temporary fix, add the internal delay here, to counter + * the subtraction when writing the register. + */ + dsi_vsdly += internal_dly; + + /* Clamp to the register max */ + if (dsi_vsdly - internal_dly > 0x3ff) { + dev_warn(dev, "VSDly too high, underflows likely\n"); + dsi_vsdly = 0x3ff + internal_dly; + } + /* VSDly[9:0] */ - video_start = max(video_start, internal_delay + 1) - internal_delay; - tc358768_write(priv, TC358768_VSDLY, video_start); + tc358768_write(priv, TC358768_VSDLY, dsi_vsdly - internal_dly);
tc358768_write(priv, TC358768_DATAFMT, val); tc358768_write(priv, TC358768_DSITX_DT, data_type); @@ -827,18 +999,6 @@ static void tc358768_bridge_pre_enable(struct drm_bridge *bridge)
/* vbp */ tc358768_write(priv, TC358768_DSI_VBPR, vm.vback_porch); - - /* hsw * byteclk * ndl / pclk */ - val = (u32)div_u64(vm.hsync_len * - (u64)hsbyteclk * priv->dsi_lanes, - vm.pixelclock); - tc358768_write(priv, TC358768_DSI_HSW, val); - - /* hbp * byteclk * ndl / pclk */ - val = (u32)div_u64(vm.hback_porch * - (u64)hsbyteclk * priv->dsi_lanes, - vm.pixelclock); - tc358768_write(priv, TC358768_DSI_HBPR, val); } else { /* Set event mode */ tc358768_write(priv, TC358768_DSI_EVENT, 1); @@ -852,16 +1012,13 @@ static void tc358768_bridge_pre_enable(struct drm_bridge *bridge)
/* vbp (not used in event mode) */ tc358768_write(priv, TC358768_DSI_VBPR, 0); + }
- /* (hsw + hbp) * byteclk * ndl / pclk */ - val = (u32)div_u64((vm.hsync_len + vm.hback_porch) * - (u64)hsbyteclk * priv->dsi_lanes, - vm.pixelclock); - tc358768_write(priv, TC358768_DSI_HSW, val); + /* hsw (bytes) */ + tc358768_write(priv, TC358768_DSI_HSW, dsi_hsw);
- /* hbp (not used in event mode) */ - tc358768_write(priv, TC358768_DSI_HBPR, 0); - } + /* hbp (bytes) */ + tc358768_write(priv, TC358768_DSI_HBPR, dsi_hbp);
/* hact (bytes) */ tc358768_write(priv, TC358768_DSI_HACT, hact);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jarkko Nikula jarkko.nikula@linux.intel.com
[ Upstream commit 361acacaf7c706223968c8186f0d3b6e214e7403 ]
Ring Abort request will timeout in case there is an error in the Host Controller interrupt delivery or Ring Header configuration. Using BUG() makes hard to debug those cases.
Make it less severe and turn BUG() to WARN_ON().
Signed-off-by: Jarkko Nikula jarkko.nikula@linux.intel.com Link: https://lore.kernel.org/r/20230921055704.1087277-6-jarkko.nikula@linux.intel... Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i3c/master/mipi-i3c-hci/dma.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/i3c/master/mipi-i3c-hci/dma.c b/drivers/i3c/master/mipi-i3c-hci/dma.c index 71b5dbe45c45c..a28ff177022ce 100644 --- a/drivers/i3c/master/mipi-i3c-hci/dma.c +++ b/drivers/i3c/master/mipi-i3c-hci/dma.c @@ -450,10 +450,9 @@ static bool hci_dma_dequeue_xfer(struct i3c_hci *hci, /* * We're deep in it if ever this condition is ever met. * Hardware might still be writing to memory, etc. - * Better suspend the world than risking silent corruption. */ dev_crit(&hci->master.dev, "unable to abort the ring\n"); - BUG(); + WARN_ON(1); }
for (i = 0; i < n; i++) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jarkko Nikula jarkko.nikula@linux.intel.com
[ Upstream commit b8806e0c939f168237593af0056c309bf31022b0 ]
Fix following warning (with CONFIG_DMA_API_DEBUG) which happens with a transfer without a data buffer.
DMA-API: i3c mipi-i3c-hci.0: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000000] [size=0 bytes]
For those transfers the hci_dma_queue_xfer() doesn't create a mapping and the DMA address pointer xfer->data_dma is not set.
Signed-off-by: Jarkko Nikula jarkko.nikula@linux.intel.com Link: https://lore.kernel.org/r/20230921055704.1087277-10-jarkko.nikula@linux.inte... Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i3c/master/mipi-i3c-hci/dma.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/i3c/master/mipi-i3c-hci/dma.c b/drivers/i3c/master/mipi-i3c-hci/dma.c index a28ff177022ce..337c95d43f3f6 100644 --- a/drivers/i3c/master/mipi-i3c-hci/dma.c +++ b/drivers/i3c/master/mipi-i3c-hci/dma.c @@ -345,6 +345,8 @@ static void hci_dma_unmap_xfer(struct i3c_hci *hci,
for (i = 0; i < n; i++) { xfer = xfer_list + i; + if (!xfer->data) + continue; dma_unmap_single(&hci->master.dev, xfer->data_dma, xfer->data_len, xfer->rnw ? DMA_FROM_DEVICE : DMA_TO_DEVICE);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Philip Yang Philip.Yang@amd.com
[ Upstream commit 101b8104307eac734f2dfa4d3511430b0b631c73 ]
Otherwise GPU may access the stale mapping and generate IOMMU IO_PAGE_FAULT.
Move this to inside p->mutex to prevent multiple threads mapping and unmapping concurrently race condition.
After kfd_mem_dmaunmap_attachment is removed from unmap_bo_from_gpuvm, kfd_mem_dmaunmap_attachment is called if failed to map to GPUs, and before free the mem attachment in case failed to unmap from GPUs.
Signed-off-by: Philip Yang Philip.Yang@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 26 ++++++++++++++++--- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 20 ++++++++------ 3 files changed, 35 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index dbc842590b253..585d608c10e8e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -286,6 +286,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(struct amdgpu_device *adev, struct kgd_mem *mem, void *drm_priv); int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu( struct amdgpu_device *adev, struct kgd_mem *mem, void *drm_priv); +void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv); int amdgpu_amdkfd_gpuvm_sync_memory( struct amdgpu_device *adev, struct kgd_mem *mem, bool intr); int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_mem *mem, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 7d5fbaaba72f7..3e7f4d8dc9d13 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -719,7 +719,7 @@ kfd_mem_dmaunmap_sg_bo(struct kgd_mem *mem, enum dma_data_direction dir;
if (unlikely(!ttm->sg)) { - pr_err("SG Table of BO is UNEXPECTEDLY NULL"); + pr_debug("SG Table of BO is NULL"); return; }
@@ -1226,8 +1226,6 @@ static void unmap_bo_from_gpuvm(struct kgd_mem *mem, amdgpu_vm_clear_freed(adev, vm, &bo_va->last_pt_update);
amdgpu_sync_fence(sync, bo_va->last_pt_update); - - kfd_mem_dmaunmap_attachment(mem, entry); }
static int update_gpuvm_pte(struct kgd_mem *mem, @@ -1282,6 +1280,7 @@ static int map_bo_to_gpuvm(struct kgd_mem *mem,
update_gpuvm_pte_failed: unmap_bo_from_gpuvm(mem, entry, sync); + kfd_mem_dmaunmap_attachment(mem, entry); return ret; }
@@ -1852,8 +1851,10 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu( mem->va + bo_size * (1 + mem->aql_queue));
/* Remove from VM internal data structures */ - list_for_each_entry_safe(entry, tmp, &mem->attachments, list) + list_for_each_entry_safe(entry, tmp, &mem->attachments, list) { + kfd_mem_dmaunmap_attachment(mem, entry); kfd_mem_detach(entry); + }
ret = unreserve_bo_and_vms(&ctx, false, false);
@@ -2024,6 +2025,23 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu( return ret; }
+void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv) +{ + struct kfd_mem_attachment *entry; + struct amdgpu_vm *vm; + + vm = drm_priv_to_vm(drm_priv); + + mutex_lock(&mem->lock); + + list_for_each_entry(entry, &mem->attachments, list) { + if (entry->bo_va->base.vm == vm) + kfd_mem_dmaunmap_attachment(mem, entry); + } + + mutex_unlock(&mem->lock); +} + int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu( struct amdgpu_device *adev, struct kgd_mem *mem, void *drm_priv) { diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index b0f475d51ae7e..2b21ce967e766 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -1400,17 +1400,21 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep, goto sync_memory_failed; } } - mutex_unlock(&p->mutex);
- if (flush_tlb) { - /* Flush TLBs after waiting for the page table updates to complete */ - for (i = 0; i < args->n_devices; i++) { - peer_pdd = kfd_process_device_data_by_id(p, devices_arr[i]); - if (WARN_ON_ONCE(!peer_pdd)) - continue; + /* Flush TLBs after waiting for the page table updates to complete */ + for (i = 0; i < args->n_devices; i++) { + peer_pdd = kfd_process_device_data_by_id(p, devices_arr[i]); + if (WARN_ON_ONCE(!peer_pdd)) + continue; + if (flush_tlb) kfd_flush_tlb(peer_pdd, TLB_FLUSH_HEAVYWEIGHT); - } + + /* Remove dma mapping after tlb flush to avoid IO_PAGE_FAULT */ + amdgpu_amdkfd_gpuvm_dmaunmap_mem(mem, peer_pdd->drm_priv); } + + mutex_unlock(&p->mutex); + kfree(devices_arr);
return 0;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans Verkuil hverkuil-cisco@xs4all.nl
[ Upstream commit 8b7f3cf4eb9a95940eaabad3226caeaa0d9aa59d ]
This fixes this warning:
drivers/media/radio/radio-isa.c: In function 'radio_isa_querycap': drivers/media/radio/radio-isa.c:39:57: warning: '%s' directive output may be truncated writing up to 35 bytes into a region of size 28 [-Wformat-truncation=] 39 | snprintf(v->bus_info, sizeof(v->bus_info), "ISA:%s", isa->v4l2_dev.name); | ^~ drivers/media/radio/radio-isa.c:39:9: note: 'snprintf' output between 5 and 40 bytes into a destination of size 32 39 | snprintf(v->bus_info, sizeof(v->bus_info), "ISA:%s", isa->v4l2_dev.name); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/radio/radio-isa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/media/radio/radio-isa.c b/drivers/media/radio/radio-isa.c index c591c0851fa28..ad49151f5ff09 100644 --- a/drivers/media/radio/radio-isa.c +++ b/drivers/media/radio/radio-isa.c @@ -36,7 +36,7 @@ static int radio_isa_querycap(struct file *file, void *priv,
strscpy(v->driver, isa->drv->driver.driver.name, sizeof(v->driver)); strscpy(v->card, isa->drv->card, sizeof(v->card)); - snprintf(v->bus_info, sizeof(v->bus_info), "ISA:%s", isa->v4l2_dev.name); + snprintf(v->bus_info, sizeof(v->bus_info), "ISA:%s", dev_name(isa->v4l2_dev.dev)); return 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Lechner dlechner@baylibre.com
[ Upstream commit 7fe2d05cee46b1c4d9f1efaeab08cc31a0dfff60 ]
This fixes a use before initialization in ad2s1210_probe(). The ad2s1210_setup_gpios() function uses st->sdev but it was being called before this field was initialized.
Signed-off-by: David Lechner dlechner@baylibre.com Link: https://lore.kernel.org/r/20230929-ad2s1210-mainline-v3-2-fa4364281745@bayli... Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/staging/iio/resolver/ad2s1210.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/staging/iio/resolver/ad2s1210.c b/drivers/staging/iio/resolver/ad2s1210.c index 636c45b128438..afe89c91c89ea 100644 --- a/drivers/staging/iio/resolver/ad2s1210.c +++ b/drivers/staging/iio/resolver/ad2s1210.c @@ -657,9 +657,6 @@ static int ad2s1210_probe(struct spi_device *spi) if (!indio_dev) return -ENOMEM; st = iio_priv(indio_dev); - ret = ad2s1210_setup_gpios(st); - if (ret < 0) - return ret;
spi_set_drvdata(spi, indio_dev);
@@ -670,6 +667,10 @@ static int ad2s1210_probe(struct spi_device *spi) st->resolution = 12; st->fexcit = AD2S1210_DEF_EXCIT;
+ ret = ad2s1210_setup_gpios(st); + if (ret < 0) + return ret; + indio_dev->info = &ad2s1210_info; indio_dev->modes = INDIO_DIRECT_MODE; indio_dev->channels = ad2s1210_channels;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Grzeschik m.grzeschik@pengutronix.de
[ Upstream commit 52a39f2cf62bb5430ad1f54cd522dbfdab1d71ba ]
The uvc_video_enable function of the uvc-gadget driver is dequeing and immediately deallocs all requests on its disable codepath. This is not save since the dequeue function is async and does not ensure that the requests are left unlinked in the controller driver.
By adding the ep_free_request into the completion path of the requests we ensure that the request will be properly deallocated.
Signed-off-by: Michael Grzeschik m.grzeschik@pengutronix.de Link: https://lore.kernel.org/r/20230911140530.2995138-3-m.grzeschik@pengutronix.d... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/gadget/function/uvc_video.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c index be48d5ab17c7b..95e3611811cd6 100644 --- a/drivers/usb/gadget/function/uvc_video.c +++ b/drivers/usb/gadget/function/uvc_video.c @@ -259,6 +259,12 @@ uvc_video_complete(struct usb_ep *ep, struct usb_request *req) struct uvc_device *uvc = video->uvc; unsigned long flags;
+ if (uvc->state == UVC_STATE_CONNECTED) { + usb_ep_free_request(video->ep, ureq->req); + ureq->req = NULL; + return; + } + switch (req->status) { case 0: break;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Hung alex.hung@amd.com
[ Upstream commit 58c3b3341cea4f75dc8c003b89f8a6dd8ec55e50 ]
[WHAT] hw_points_num is 0 before ogam LUT is programmed; however, function "dwb3_program_ogam_pwl" assumes hw_points_num is always greater than 0, i.e. substracting it by 1 as an array index.
[HOW] Check hw_points_num is not equal to 0 before using it.
Reviewed-by: Harry Wentland harry.wentland@amd.com Signed-off-by: Alex Hung alex.hung@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dwb_cm.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dwb_cm.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dwb_cm.c index 701c7d8bc038a..03a50c32fcfe1 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dwb_cm.c +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dwb_cm.c @@ -243,6 +243,9 @@ static bool dwb3_program_ogam_lut( return false; }
+ if (params->hw_points_num == 0) + return false; + REG_SET(DWB_OGAM_CONTROL, 0, DWB_OGAM_MODE, 2);
current_mode = dwb3_get_ogam_current(dwbc30);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chengfeng Ye dg573847474@gmail.com
[ Upstream commit 058cbee52ccd7be77e373d31a4f14670cfd32018 ]
As &priv->tx_dev.tx_dev_lock is also acquired by xmit callback which could be call from timer under softirq context, use spin_lock_bh() on it to prevent potential deadlock.
hostif_sme_work() --> hostif_sme_set_pmksa() --> hostif_mib_set_request() --> ks_wlan_hw_tx() --> spin_lock(&priv->tx_dev.tx_dev_lock)
ks_wlan_start_xmit() --> hostif_data_request() --> ks_wlan_hw_tx() --> spin_lock(&priv->tx_dev.tx_dev_lock)
Signed-off-by: Chengfeng Ye dg573847474@gmail.com Link: https://lore.kernel.org/r/20230926161323.41928-1-dg573847474@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/staging/ks7010/ks7010_sdio.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/ks7010/ks7010_sdio.c b/drivers/staging/ks7010/ks7010_sdio.c index 9fb118e77a1f0..f1d44e4955fc6 100644 --- a/drivers/staging/ks7010/ks7010_sdio.c +++ b/drivers/staging/ks7010/ks7010_sdio.c @@ -395,9 +395,9 @@ int ks_wlan_hw_tx(struct ks_wlan_private *priv, void *p, unsigned long size, priv->hostt.buff[priv->hostt.qtail] = le16_to_cpu(hdr->event); priv->hostt.qtail = (priv->hostt.qtail + 1) % SME_EVENT_BUFF_SIZE;
- spin_lock(&priv->tx_dev.tx_dev_lock); + spin_lock_bh(&priv->tx_dev.tx_dev_lock); result = enqueue_txdev(priv, p, size, complete_handler, skb); - spin_unlock(&priv->tx_dev.tx_dev_lock); + spin_unlock_bh(&priv->tx_dev.tx_dev_lock);
if (txq_has_space(priv)) queue_delayed_work(priv->wq, &priv->rw_dwork, 0);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chengfeng Ye dg573847474@gmail.com
[ Upstream commit 04d19e65137e3cd4a5004e624c85c762933d115c ]
As &dev->condlock is acquired under irq context along the following call chain from s5p_mfc_irq(), other acquisition of the same lock inside process context or softirq context should disable irq avoid double lock. enc_post_frame_start() seems to be one such function that execute under process context or softirq context.
<deadlock #1>
enc_post_frame_start() --> clear_work_bit() --> spin_loc(&dev->condlock) <interrupt> --> s5p_mfc_irq() --> s5p_mfc_handle_frame() --> clear_work_bit() --> spin_lock(&dev->condlock)
This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock.
To prevent the potential deadlock, the patch change clear_work_bit() inside enc_post_frame_start() to clear_work_bit_irqsave().
Signed-off-by: Chengfeng Ye dg573847474@gmail.com Acked-by: Marek Szyprowski m.szyprowski@samsung.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/platform/samsung/s5p-mfc/s5p_mfc_enc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/media/platform/samsung/s5p-mfc/s5p_mfc_enc.c b/drivers/media/platform/samsung/s5p-mfc/s5p_mfc_enc.c index f62703cebb77c..4b4c129c09e70 100644 --- a/drivers/media/platform/samsung/s5p-mfc/s5p_mfc_enc.c +++ b/drivers/media/platform/samsung/s5p-mfc/s5p_mfc_enc.c @@ -1297,7 +1297,7 @@ static int enc_post_frame_start(struct s5p_mfc_ctx *ctx) if (ctx->state == MFCINST_FINISHING && ctx->ref_queue_cnt == 0) src_ready = false; if (!src_ready || ctx->dst_queue_cnt == 0) - clear_work_bit(ctx); + clear_work_bit_irqsave(ctx);
return 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yu Kuai yukuai3@huawei.com
[ Upstream commit 06a4d0d8c642b5ea654e832b74dca12965356da0 ]
'conf->log' is set with 'reconfig_mutex' grabbed, however, readers are not procted, hence protect it with READ_ONCE/WRITE_ONCE to prevent reading abnormal values.
Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Song Liu song@kernel.org Link: https://lore.kernel.org/r/20231010151958.145896-3-yukuai1@huaweicloud.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/raid5-cache.c | 47 +++++++++++++++++++++------------------- 1 file changed, 25 insertions(+), 22 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c index eb66d0bfe39d2..4b9585875a669 100644 --- a/drivers/md/raid5-cache.c +++ b/drivers/md/raid5-cache.c @@ -327,8 +327,9 @@ void r5l_wake_reclaim(struct r5l_log *log, sector_t space); void r5c_check_stripe_cache_usage(struct r5conf *conf) { int total_cached; + struct r5l_log *log = READ_ONCE(conf->log);
- if (!r5c_is_writeback(conf->log)) + if (!r5c_is_writeback(log)) return;
total_cached = atomic_read(&conf->r5c_cached_partial_stripes) + @@ -344,7 +345,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf) */ if (total_cached > conf->min_nr_stripes * 1 / 2 || atomic_read(&conf->empty_inactive_list_nr) > 0) - r5l_wake_reclaim(conf->log, 0); + r5l_wake_reclaim(log, 0); }
/* @@ -353,7 +354,9 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf) */ void r5c_check_cached_full_stripe(struct r5conf *conf) { - if (!r5c_is_writeback(conf->log)) + struct r5l_log *log = READ_ONCE(conf->log); + + if (!r5c_is_writeback(log)) return;
/* @@ -363,7 +366,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf) if (atomic_read(&conf->r5c_cached_full_stripes) >= min(R5C_FULL_STRIPE_FLUSH_BATCH(conf), conf->chunk_sectors >> RAID5_STRIPE_SHIFT(conf))) - r5l_wake_reclaim(conf->log, 0); + r5l_wake_reclaim(log, 0); }
/* @@ -396,7 +399,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf) */ static sector_t r5c_log_required_to_flush_cache(struct r5conf *conf) { - struct r5l_log *log = conf->log; + struct r5l_log *log = READ_ONCE(conf->log);
if (!r5c_is_writeback(log)) return 0; @@ -449,7 +452,7 @@ static inline void r5c_update_log_state(struct r5l_log *log) void r5c_make_stripe_write_out(struct stripe_head *sh) { struct r5conf *conf = sh->raid_conf; - struct r5l_log *log = conf->log; + struct r5l_log *log = READ_ONCE(conf->log);
BUG_ON(!r5c_is_writeback(log));
@@ -491,7 +494,7 @@ static void r5c_handle_parity_cached(struct stripe_head *sh) */ static void r5c_finish_cache_stripe(struct stripe_head *sh) { - struct r5l_log *log = sh->raid_conf->log; + struct r5l_log *log = READ_ONCE(sh->raid_conf->log);
if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH) { BUG_ON(test_bit(STRIPE_R5C_CACHING, &sh->state)); @@ -692,7 +695,7 @@ static void r5c_disable_writeback_async(struct work_struct *work)
/* wait superblock change before suspend */ wait_event(mddev->sb_wait, - conf->log == NULL || + !READ_ONCE(conf->log) || (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) && (locked = mddev_trylock(mddev)))); if (locked) { @@ -1151,7 +1154,7 @@ static void r5l_run_no_space_stripes(struct r5l_log *log) static sector_t r5c_calculate_new_cp(struct r5conf *conf) { struct stripe_head *sh; - struct r5l_log *log = conf->log; + struct r5l_log *log = READ_ONCE(conf->log); sector_t new_cp; unsigned long flags;
@@ -1159,12 +1162,12 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf) return log->next_checkpoint;
spin_lock_irqsave(&log->stripe_in_journal_lock, flags); - if (list_empty(&conf->log->stripe_in_journal_list)) { + if (list_empty(&log->stripe_in_journal_list)) { /* all stripes flushed */ spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags); return log->next_checkpoint; } - sh = list_first_entry(&conf->log->stripe_in_journal_list, + sh = list_first_entry(&log->stripe_in_journal_list, struct stripe_head, r5c); new_cp = sh->log_start; spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags); @@ -1399,7 +1402,7 @@ void r5c_flush_cache(struct r5conf *conf, int num) struct stripe_head *sh, *next;
lockdep_assert_held(&conf->device_lock); - if (!conf->log) + if (!READ_ONCE(conf->log)) return;
count = 0; @@ -1420,7 +1423,7 @@ void r5c_flush_cache(struct r5conf *conf, int num)
static void r5c_do_reclaim(struct r5conf *conf) { - struct r5l_log *log = conf->log; + struct r5l_log *log = READ_ONCE(conf->log); struct stripe_head *sh; int count = 0; unsigned long flags; @@ -1549,7 +1552,7 @@ static void r5l_reclaim_thread(struct md_thread *thread) { struct mddev *mddev = thread->mddev; struct r5conf *conf = mddev->private; - struct r5l_log *log = conf->log; + struct r5l_log *log = READ_ONCE(conf->log);
if (!log) return; @@ -1589,7 +1592,7 @@ void r5l_quiesce(struct r5l_log *log, int quiesce)
bool r5l_log_disk_error(struct r5conf *conf) { - struct r5l_log *log = conf->log; + struct r5l_log *log = READ_ONCE(conf->log);
/* don't allow write if journal disk is missing */ if (!log) @@ -2633,7 +2636,7 @@ int r5c_try_caching_write(struct r5conf *conf, struct stripe_head_state *s, int disks) { - struct r5l_log *log = conf->log; + struct r5l_log *log = READ_ONCE(conf->log); int i; struct r5dev *dev; int to_cache = 0; @@ -2800,7 +2803,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf, struct stripe_head *sh, struct stripe_head_state *s) { - struct r5l_log *log = conf->log; + struct r5l_log *log = READ_ONCE(conf->log); int i; int do_wakeup = 0; sector_t tree_index; @@ -2939,7 +2942,7 @@ int r5c_cache_data(struct r5l_log *log, struct stripe_head *sh) /* check whether this big stripe is in write back cache. */ bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect) { - struct r5l_log *log = conf->log; + struct r5l_log *log = READ_ONCE(conf->log); sector_t tree_index; void *slot;
@@ -3047,14 +3050,14 @@ int r5l_start(struct r5l_log *log) void r5c_update_on_rdev_error(struct mddev *mddev, struct md_rdev *rdev) { struct r5conf *conf = mddev->private; - struct r5l_log *log = conf->log; + struct r5l_log *log = READ_ONCE(conf->log);
if (!log) return;
if ((raid5_calc_degraded(conf) > 0 || test_bit(Journal, &rdev->flags)) && - conf->log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK) + log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK) schedule_work(&log->disable_writeback_work); }
@@ -3143,7 +3146,7 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev) spin_lock_init(&log->stripe_in_journal_lock); atomic_set(&log->stripe_in_journal_count, 0);
- conf->log = log; + WRITE_ONCE(conf->log, log);
set_bit(MD_HAS_JOURNAL, &conf->mddev->flags); return 0; @@ -3171,7 +3174,7 @@ void r5l_exit_log(struct r5conf *conf) * 'reconfig_mutex' is held by caller, set 'confg->log' to NULL to * ensure disable_writeback_work wakes up and exits. */ - conf->log = NULL; + WRITE_ONCE(conf->log, NULL); wake_up(&conf->mddev->sb_wait); flush_work(&log->disable_writeback_work);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Brauner christian.brauner@ubuntu.com
[ Upstream commit 1c5976ef0f7ad76319df748ccb99a4c7ba2ba464 ]
Currently, registering a new binary type pins the binfmt_misc filesystem. Specifically, this means that as long as there is at least one binary type registered the binfmt_misc filesystem survives all umounts, i.e. the superblock is not destroyed. Meaning that a umount followed by another mount will end up with the same superblock and the same binary type handlers. This is a behavior we tend to discourage for any new filesystems (apart from a few special filesystems such as e.g. configfs or debugfs). A umount operation without the filesystem being pinned - by e.g. someone holding a file descriptor to an open file - should usually result in the destruction of the superblock and all associated resources. This makes introspection easier and leads to clearly defined, simple and clean semantics. An administrator can rely on the fact that a umount will guarantee a clean slate making it possible to reinitialize a filesystem. Right now all binary types would need to be explicitly deleted before that can happen.
This allows us to remove the heavy-handed calls to simple_pin_fs() and simple_release_fs() when creating and deleting binary types. This in turn allows us to replace the current brittle pinning mechanism abusing dget() which has caused a range of bugs judging from prior fixes in [2] and [3]. The additional dget() in load_misc_binary() pins the dentry but only does so for the sake to prevent ->evict_inode() from freeing the node when a user removes the binary type and kill_node() is run. Which would mean ->interpreter and ->interp_file would be freed causing a UAF.
This isn't really nicely documented nor is it very clean because it relies on simple_pin_fs() pinning the filesystem as long as at least one binary type exists. Otherwise it would cause load_misc_binary() to hold on to a dentry belonging to a superblock that has been shutdown. Replace that implicit pinning with a clean and simple per-node refcount and get rid of the ugly dget() pinning. A similar mechanism exists for e.g. binderfs (cf. [4]). All the cleanup work can now be done in ->evict_inode().
In a follow-up patch we will make it possible to use binfmt_misc in sandboxes. We will use the cleaner semantics where a umount for the filesystem will cause the superblock and all resources to be deallocated. In preparation for this apply the same semantics to the initial binfmt_misc mount. Note, that this is a user-visible change and as such a uapi change but one that we can reasonably risk. We've discussed this in earlier versions of this patchset (cf. [1]).
The main user and provider of binfmt_misc is systemd. Systemd provides binfmt_misc via autofs since it is configurable as a kernel module and is used by a few exotic packages and users. As such a binfmt_misc mount is triggered when /proc/sys/fs/binfmt_misc is accessed and is only provided on demand. Other autofs on demand filesystems include EFI ESP which systemd umounts if the mountpoint stays idle for a certain amount of time. This doesn't apply to the binfmt_misc autofs mount which isn't touched once it is mounted meaning this change can't accidently wipe binary type handlers without someone having explicitly unmounted binfmt_misc. After speaking to systemd folks they don't expect this change to affect them.
In line with our general policy, if we see a regression for systemd or other users with this change we will switch back to the old behavior for the initial binfmt_misc mount and have binary types pin the filesystem again. But while we touch this code let's take the chance and let's improve on the status quo.
[1]: https://lore.kernel.org/r/20191216091220.465626-2-laurent@vivier.eu [2]: commit 43a4f2619038 ("exec: binfmt_misc: fix race between load_misc_binary() and kill_node()" [3]: commit 83f918274e4b ("exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode()") [4]: commit f0fe2c0f050d ("binder: prevent UAF for binderfs devices II")
Link: https://lore.kernel.org/r/20211028103114.2849140-1-brauner@kernel.org (v1) Cc: Sargun Dhillon sargun@sargun.me Cc: Serge Hallyn serge@hallyn.com Cc: Jann Horn jannh@google.com Cc: Henning Schild henning.schild@siemens.com Cc: Andrei Vagin avagin@gmail.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: Laurent Vivier laurent@vivier.eu Cc: linux-fsdevel@vger.kernel.org Acked-by: Serge Hallyn serge@hallyn.com Signed-off-by: Christian Brauner christian.brauner@ubuntu.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Kees Cook keescook@chromium.org --- /* v2 */ - Christian Brauner christian.brauner@ubuntu.com: - Add more comments that explain what's going on. - Rename functions while changing them to better reflect what they are doing to make the code easier to understand. - In the first version when a specific binary type handler was removed either through a write to the entry's file or all binary type handlers were removed by a write to the binfmt_misc mount's status file all cleanup work happened during inode eviction. That includes removal of the relevant entries from entry list. While that works fine I disliked that model after thinking about it for a bit. Because it means that there was a window were someone has already removed a or all binary handlers but they could still be safely reached from load_misc_binary() when it has managed to take the read_lock() on the entries list while inode eviction was already happening. Again, that perfectly benign but it's cleaner to remove the binary handler from the list immediately meaning that ones the write to then entry's file or the binfmt_misc status file returns the binary type cannot be executed anymore. That gives stronger guarantees to the user. Signed-off-by: Sasha Levin sashal@kernel.org --- fs/binfmt_misc.c | 216 ++++++++++++++++++++++++++++++++++++----------- 1 file changed, 168 insertions(+), 48 deletions(-)
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index bb202ad369d53..740dac1012ae8 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -60,12 +60,11 @@ typedef struct { char *name; struct dentry *dentry; struct file *interp_file; + refcount_t users; /* sync removal with load_misc_binary() */ } Node;
static DEFINE_RWLOCK(entries_lock); static struct file_system_type bm_fs_type; -static struct vfsmount *bm_mnt; -static int entry_count;
/* * Max length of the register string. Determined by: @@ -82,19 +81,23 @@ static int entry_count; */ #define MAX_REGISTER_LENGTH 1920
-/* - * Check if we support the binfmt - * if we do, return the node, else NULL - * locking is done in load_misc_binary +/** + * search_binfmt_handler - search for a binary handler for @bprm + * @misc: handle to binfmt_misc instance + * @bprm: binary for which we are looking for a handler + * + * Search for a binary type handler for @bprm in the list of registered binary + * type handlers. + * + * Return: binary type list entry on success, NULL on failure */ -static Node *check_file(struct linux_binprm *bprm) +static Node *search_binfmt_handler(struct linux_binprm *bprm) { char *p = strrchr(bprm->interp, '.'); - struct list_head *l; + Node *e;
/* Walk all the registered handlers. */ - list_for_each(l, &entries) { - Node *e = list_entry(l, Node, list); + list_for_each_entry(e, &entries, list) { char *s; int j;
@@ -123,9 +126,49 @@ static Node *check_file(struct linux_binprm *bprm) if (j == e->size) return e; } + return NULL; }
+/** + * get_binfmt_handler - try to find a binary type handler + * @misc: handle to binfmt_misc instance + * @bprm: binary for which we are looking for a handler + * + * Try to find a binfmt handler for the binary type. If one is found take a + * reference to protect against removal via bm_{entry,status}_write(). + * + * Return: binary type list entry on success, NULL on failure + */ +static Node *get_binfmt_handler(struct linux_binprm *bprm) +{ + Node *e; + + read_lock(&entries_lock); + e = search_binfmt_handler(bprm); + if (e) + refcount_inc(&e->users); + read_unlock(&entries_lock); + return e; +} + +/** + * put_binfmt_handler - put binary handler node + * @e: node to put + * + * Free node syncing with load_misc_binary() and defer final free to + * load_misc_binary() in case it is using the binary type handler we were + * requested to remove. + */ +static void put_binfmt_handler(Node *e) +{ + if (refcount_dec_and_test(&e->users)) { + if (e->flags & MISC_FMT_OPEN_FILE) + filp_close(e->interp_file, NULL); + kfree(e); + } +} + /* * the loader itself */ @@ -139,12 +182,7 @@ static int load_misc_binary(struct linux_binprm *bprm) if (!enabled) return retval;
- /* to keep locking time low, we copy the interpreter string */ - read_lock(&entries_lock); - fmt = check_file(bprm); - if (fmt) - dget(fmt->dentry); - read_unlock(&entries_lock); + fmt = get_binfmt_handler(bprm); if (!fmt) return retval;
@@ -198,7 +236,16 @@ static int load_misc_binary(struct linux_binprm *bprm)
retval = 0; ret: - dput(fmt->dentry); + + /* + * If we actually put the node here all concurrent calls to + * load_misc_binary() will have finished. We also know + * that for the refcount to be zero ->evict_inode() must have removed + * the node to be deleted from the list. All that is left for us is to + * close and free. + */ + put_binfmt_handler(fmt); + return retval; }
@@ -553,30 +600,90 @@ static struct inode *bm_get_inode(struct super_block *sb, int mode) return inode; }
+/** + * bm_evict_inode - cleanup data associated with @inode + * @inode: inode to which the data is attached + * + * Cleanup the binary type handler data associated with @inode if a binary type + * entry is removed or the filesystem is unmounted and the super block is + * shutdown. + * + * If the ->evict call was not caused by a super block shutdown but by a write + * to remove the entry or all entries via bm_{entry,status}_write() the entry + * will have already been removed from the list. We keep the list_empty() check + * to make that explicit. +*/ static void bm_evict_inode(struct inode *inode) { Node *e = inode->i_private;
- if (e && e->flags & MISC_FMT_OPEN_FILE) - filp_close(e->interp_file, NULL); - clear_inode(inode); - kfree(e); + + if (e) { + write_lock(&entries_lock); + if (!list_empty(&e->list)) + list_del_init(&e->list); + write_unlock(&entries_lock); + put_binfmt_handler(e); + } }
-static void kill_node(Node *e) +/** + * unlink_binfmt_dentry - remove the dentry for the binary type handler + * @dentry: dentry associated with the binary type handler + * + * Do the actual filesystem work to remove a dentry for a registered binary + * type handler. Since binfmt_misc only allows simple files to be created + * directly under the root dentry of the filesystem we ensure that we are + * indeed passed a dentry directly beneath the root dentry, that the inode + * associated with the root dentry is locked, and that it is a regular file we + * are asked to remove. + */ +static void unlink_binfmt_dentry(struct dentry *dentry) { - struct dentry *dentry; + struct dentry *parent = dentry->d_parent; + struct inode *inode, *parent_inode; + + /* All entries are immediate descendants of the root dentry. */ + if (WARN_ON_ONCE(dentry->d_sb->s_root != parent)) + return;
+ /* We only expect to be called on regular files. */ + inode = d_inode(dentry); + if (WARN_ON_ONCE(!S_ISREG(inode->i_mode))) + return; + + /* The parent inode must be locked. */ + parent_inode = d_inode(parent); + if (WARN_ON_ONCE(!inode_is_locked(parent_inode))) + return; + + if (simple_positive(dentry)) { + dget(dentry); + simple_unlink(parent_inode, dentry); + d_delete(dentry); + dput(dentry); + } +} + +/** + * remove_binfmt_handler - remove a binary type handler + * @misc: handle to binfmt_misc instance + * @e: binary type handler to remove + * + * Remove a binary type handler from the list of binary type handlers and + * remove its associated dentry. This is called from + * binfmt_{entry,status}_write(). In the future, we might want to think about + * adding a proper ->unlink() method to binfmt_misc instead of forcing caller's + * to use writes to files in order to delete binary type handlers. But it has + * worked for so long that it's not a pressing issue. + */ +static void remove_binfmt_handler(Node *e) +{ write_lock(&entries_lock); list_del_init(&e->list); write_unlock(&entries_lock); - - dentry = e->dentry; - drop_nlink(d_inode(dentry)); - d_drop(dentry); - dput(dentry); - simple_release_fs(&bm_mnt, &entry_count); + unlink_binfmt_dentry(e->dentry); }
/* /<entry> */ @@ -603,8 +710,8 @@ bm_entry_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos) static ssize_t bm_entry_write(struct file *file, const char __user *buffer, size_t count, loff_t *ppos) { - struct dentry *root; - Node *e = file_inode(file)->i_private; + struct inode *inode = file_inode(file); + Node *e = inode->i_private; int res = parse_command(buffer, count);
switch (res) { @@ -618,13 +725,22 @@ static ssize_t bm_entry_write(struct file *file, const char __user *buffer, break; case 3: /* Delete this handler. */ - root = file_inode(file)->i_sb->s_root; - inode_lock(d_inode(root)); + inode = d_inode(inode->i_sb->s_root); + inode_lock(inode);
+ /* + * In order to add new element or remove elements from the list + * via bm_{entry,register,status}_write() inode_lock() on the + * root inode must be held. + * The lock is exclusive ensuring that the list can't be + * modified. Only load_misc_binary() can access but does so + * read-only. So we only need to take the write lock when we + * actually remove the entry from the list. + */ if (!list_empty(&e->list)) - kill_node(e); + remove_binfmt_handler(e);
- inode_unlock(d_inode(root)); + inode_unlock(inode); break; default: return res; @@ -683,13 +799,7 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer, if (!inode) goto out2;
- err = simple_pin_fs(&bm_fs_type, &bm_mnt, &entry_count); - if (err) { - iput(inode); - inode = NULL; - goto out2; - } - + refcount_set(&e->users, 1); e->dentry = dget(dentry); inode->i_private = e; inode->i_fop = &bm_entry_operations; @@ -733,7 +843,8 @@ static ssize_t bm_status_write(struct file *file, const char __user *buffer, size_t count, loff_t *ppos) { int res = parse_command(buffer, count); - struct dentry *root; + Node *e, *next; + struct inode *inode;
switch (res) { case 1: @@ -746,13 +857,22 @@ static ssize_t bm_status_write(struct file *file, const char __user *buffer, break; case 3: /* Delete all handlers. */ - root = file_inode(file)->i_sb->s_root; - inode_lock(d_inode(root)); + inode = d_inode(file_inode(file)->i_sb->s_root); + inode_lock(inode);
- while (!list_empty(&entries)) - kill_node(list_first_entry(&entries, Node, list)); + /* + * In order to add new element or remove elements from the list + * via bm_{entry,register,status}_write() inode_lock() on the + * root inode must be held. + * The lock is exclusive ensuring that the list can't be + * modified. Only load_misc_binary() can access but does so + * read-only. So we only need to take the write lock when we + * actually remove the entry from the list. + */ + list_for_each_entry_safe(e, next, &entries, list) + remove_binfmt_handler(e);
- inode_unlock(d_inode(root)); + inode_unlock(inode); break; default: return res;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mikko Perttunen mperttunen@nvidia.com
[ Upstream commit 3868ff006b572cf501a3327832d36c64a9eca86a ]
UBSAN reports an invalid load for bool, as the iosys_map is read later without being initialized. Zero-initialize it to avoid this.
Reported-by: Ashish Mhetre amhetre@nvidia.com Signed-off-by: Mikko Perttunen mperttunen@nvidia.com Signed-off-by: Thierry Reding treding@nvidia.com Link: https://patchwork.freedesktop.org/patch/msgid/20230901115910.701518-2-cyndis... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/tegra/gem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c index 81991090adcc9..cd06f25499549 100644 --- a/drivers/gpu/drm/tegra/gem.c +++ b/drivers/gpu/drm/tegra/gem.c @@ -175,7 +175,7 @@ static void tegra_bo_unpin(struct host1x_bo_mapping *map) static void *tegra_bo_mmap(struct host1x_bo *bo) { struct tegra_bo *obj = host1x_to_tegra_bo(bo); - struct iosys_map map; + struct iosys_map map = { 0 }; int ret;
if (obj->vaddr) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans Verkuil hverkuil-cisco@xs4all.nl
[ Upstream commit 51b74c09ac8c5862007fc2bf0d465529d06dd446 ]
'pd' can be NULL, and in that case it shouldn't be passed to PTR_ERR. Fixes a smatch warning:
drivers/media/platform/qcom/venus/pm_helpers.c:873 vcodec_domains_get() warn: passing zero to 'PTR_ERR'
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Reviewed-by: Bryan O'Donoghue bryan.odonoghue@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/platform/qcom/venus/pm_helpers.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/media/platform/qcom/venus/pm_helpers.c b/drivers/media/platform/qcom/venus/pm_helpers.c index 48c9084bb4dba..a1b127caa90a7 100644 --- a/drivers/media/platform/qcom/venus/pm_helpers.c +++ b/drivers/media/platform/qcom/venus/pm_helpers.c @@ -870,7 +870,7 @@ static int vcodec_domains_get(struct venus_core *core) pd = dev_pm_domain_attach_by_name(dev, res->vcodec_pmdomains[i]); if (IS_ERR_OR_NULL(pd)) - return PTR_ERR(pd) ? : -ENODATA; + return pd ? PTR_ERR(pd) : -ENODATA; core->pmdomains[i] = pd; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mike Christie michael.christie@oracle.com
[ Upstream commit 0b149cee836aa53989ea089af1cb9d90d7c6ac9e ]
If scsi_execute_cmd returns < 0, it doesn't initialize the sshdr, so we shouldn't access the sshdr. If it returns 0, then the cmd executed successfully, so there is no need to check the sshdr. This has us access the sshdr when we get a return value > 0.
Signed-off-by: Mike Christie michael.christie@oracle.com Link: https://lore.kernel.org/r/20231004210013.5601-7-michael.christie@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: John Garry john.g.garry@oracle.com Reviewed-by: Martin Wilck mwilck@suse.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_transport_spi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/scsi_transport_spi.c b/drivers/scsi/scsi_transport_spi.c index f569cf0095c28..a95a35635c333 100644 --- a/drivers/scsi/scsi_transport_spi.c +++ b/drivers/scsi/scsi_transport_spi.c @@ -677,10 +677,10 @@ spi_dv_device_echo_buffer(struct scsi_device *sdev, u8 *buffer, for (r = 0; r < retries; r++) { result = spi_execute(sdev, spi_write_buffer, DMA_TO_DEVICE, buffer, len, &sshdr); - if(result || !scsi_device_online(sdev)) { + if (result || !scsi_device_online(sdev)) {
scsi_device_set_state(sdev, SDEV_QUIESCE); - if (scsi_sense_valid(&sshdr) + if (result > 0 && scsi_sense_valid(&sshdr) && sshdr.sense_key == ILLEGAL_REQUEST /* INVALID FIELD IN CDB */ && sshdr.asc == 0x24 && sshdr.ascq == 0x00)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Gruenbacher agruenba@redhat.com
[ Upstream commit 2d8d7990619878a848b1d916c2f936d3012ee17d ]
Add a missing initialization of variable ap in setattr_chown(). Without, chown() may be able to bypass quotas.
Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/gfs2/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index 23e6962cdd6e3..04fc3e72a96e4 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -1907,7 +1907,7 @@ static int setattr_chown(struct inode *inode, struct iattr *attr) kuid_t ouid, nuid; kgid_t ogid, ngid; int error; - struct gfs2_alloc_parms ap; + struct gfs2_alloc_parms ap = {};
ouid = inode->i_uid; ogid = inode->i_gid;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miri Korenblit miriam.rachel.korenblit@intel.com
[ Upstream commit 3c6a0b1f0add72e7f522bc9145222b86d0a7712a ]
In RFKILL we first set the RFKILL bit, then we abort scan (if one exists) by waiting for the notification from FW and notifying mac80211. And then we stop the device. But in case we have a scan ongoing in the period of time between rfkill on and before the device is stopped - we will not wait for the FW notification because of the iwl_mvm_is_radio_killed() condition, and then the scan_status and uid_status are misconfigured, (scan_status is cleared but uid_status not) and when the notification suddenly arrives (before stopping the device) we will get into the assert about scan_status and uid_status mismatch. Fix this by waiting for FW notif when rfkill is on but the device isn't disabled yet.
Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Signed-off-by: Gregory Greenman gregory.greenman@intel.com Link: https://lore.kernel.org/r/20231004123422.c43b69aa2c77.Icc7b5efb47974d6f49915... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c index 069bac72117fe..b58441c2af730 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c @@ -3226,7 +3226,7 @@ int iwl_mvm_scan_stop(struct iwl_mvm *mvm, int type, bool notify) if (!(mvm->scan_status & type)) return 0;
- if (iwl_mvm_is_radio_killed(mvm)) { + if (!test_bit(STATUS_DEVICE_ENABLED, &mvm->trans->status)) { ret = 0; goto out; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mukesh Sisodiya mukesh.sisodiya@intel.com
[ Upstream commit 048449fc666d736a1a17d950fde0b5c5c8fd10cc ]
During debugfs command handling transport function is used directly, this bypasses the locking used by runtime operation function and leads to a kernel warning when two commands are sent in parallel.
Fix it by using runtime operations function when sending debugfs command.
Signed-off-by: Mukesh Sisodiya mukesh.sisodiya@intel.com Signed-off-by: Gregory Greenman gregory.greenman@intel.com Link: https://lore.kernel.org/r/20231004123422.4f80ac90658a.Ia1dfa1195c919f3002fe0... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/fw/debugfs.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/debugfs.c b/drivers/net/wireless/intel/iwlwifi/fw/debugfs.c index 607e07ed2477c..7d4340c56628a 100644 --- a/drivers/net/wireless/intel/iwlwifi/fw/debugfs.c +++ b/drivers/net/wireless/intel/iwlwifi/fw/debugfs.c @@ -163,7 +163,11 @@ static int iwl_dbgfs_enabled_severities_write(struct iwl_fw_runtime *fwrt,
event_cfg.enabled_severities = cpu_to_le32(enabled_severities);
- ret = iwl_trans_send_cmd(fwrt->trans, &hcmd); + if (fwrt->ops && fwrt->ops->send_hcmd) + ret = fwrt->ops->send_hcmd(fwrt->ops_ctx, &hcmd); + else + ret = -EPERM; + IWL_INFO(fwrt, "sent host event cfg with enabled_severities: %u, ret: %d\n", enabled_severities, ret);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gustavo A. R. Silva gustavoars@kernel.org
[ Upstream commit 397d887c1601a71e8a8abdb6beea67d58f0472d3 ]
In order to gain the bounds-checking coverage that __counted_by provides to flexible-array members at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions), we must make sure that the counter member, in this particular case `num`, is updated before the first access to the flex-array member, in this particular case array `hws`. See below:
commit f316cdff8d67 ("clk: Annotate struct clk_hw_onecell_data with __counted_by") introduced `__counted_by` for `struct clk_hw_onecell_data` together with changes to relocate some of assignments of counter `num` before `hws` is accessed:
include/linux/clk-provider.h: 1380 struct clk_hw_onecell_data { 1381 unsigned int num; 1382 struct clk_hw *hws[] __counted_by(num); 1383 };
However, this structure is used as a member in other structs, in this case in `struct visconti_pll_provider`:
drivers/clk/visconti/pll.h: 16 struct visconti_pll_provider { 17 void __iomem *reg_base; 18 struct device_node *node; 19 20 /* Must be last */ 21 struct clk_hw_onecell_data clk_data; 22 };
Hence, we need to move the assignments to `ctx->clk_data.num` after allocation for `struct visconti_pll_provider` and before accessing the flexible array `ctx->clk_data.hws`. And, as assignments for all members in `struct visconti_pll_provider` are originally adjacent to each other, relocate all assignments together, so we don't split up `ctx->clk_data.hws = nr_plls` from the rest. :)
Reviewed-by: Kees Cook keescook@chromium.org Acked-by: Nobuhiro Iwamatsu nobuhiro1.iwamatsu@toshiba.co.jp Signed-off-by: Gustavo A. R. Silva gustavoars@kernel.org Link: https://lore.kernel.org/r/e3189f3e40e8723b6d794fb2260e2e9ab6b960bd.169749289... Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/visconti/pll.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/clk/visconti/pll.c b/drivers/clk/visconti/pll.c index 1f3234f226674..e9cd80e085dc3 100644 --- a/drivers/clk/visconti/pll.c +++ b/drivers/clk/visconti/pll.c @@ -329,12 +329,12 @@ struct visconti_pll_provider * __init visconti_init_pll(struct device_node *np, if (!ctx) return ERR_PTR(-ENOMEM);
- for (i = 0; i < nr_plls; ++i) - ctx->clk_data.hws[i] = ERR_PTR(-ENOENT); - ctx->node = np; ctx->reg_base = base; ctx->clk_data.num = nr_plls;
+ for (i = 0; i < nr_plls; ++i) + ctx->clk_data.hws[i] = ERR_PTR(-ENOENT); + return ctx; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chengfeng Ye dg573847474@gmail.com
[ Upstream commit 2f19c4b8395ccb6eb25ccafee883c8cfbe3fc193 ]
handle_receive_interrupt_napi_sp() running inside interrupt handler could introduce inverse lock ordering between &dd->irq_src_lock and &dd->uctxt_lock, if read_mod_write() is preempted by the isr.
[CPU0] | [CPU1] hfi1_ipoib_dev_open() | --> hfi1_netdev_enable_queues() | --> enable_queues(rx) | --> hfi1_rcvctrl() | --> set_intr_bits() | --> read_mod_write() | --> spin_lock(&dd->irq_src_lock) | | hfi1_poll() | --> poll_next() | --> spin_lock_irq(&dd->uctxt_lock) | | --> hfi1_rcvctrl() | --> set_intr_bits() | --> read_mod_write() | --> spin_lock(&dd->irq_src_lock) <interrupt> | --> handle_receive_interrupt_napi_sp() | --> set_all_fastpath() | --> hfi1_rcd_get_by_index() | --> spin_lock_irqsave(&dd->uctxt_lock) |
This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock.
To prevent the potential deadlock, the patch use spin_lock_irqsave() on &dd->irq_src_lock inside read_mod_write() to prevent the possible deadlock scenario.
Signed-off-by: Chengfeng Ye dg573847474@gmail.com Link: https://lore.kernel.org/r/20230926101116.2797-1-dg573847474@gmail.com Acked-by: Dennis Dalessandro dennis.dalessandro@cornelisnetworks.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/hfi1/chip.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c index 194cac40da653..c560552244ae8 100644 --- a/drivers/infiniband/hw/hfi1/chip.c +++ b/drivers/infiniband/hw/hfi1/chip.c @@ -13183,15 +13183,16 @@ static void read_mod_write(struct hfi1_devdata *dd, u16 src, u64 bits, { u64 reg; u16 idx = src / BITS_PER_REGISTER; + unsigned long flags;
- spin_lock(&dd->irq_src_lock); + spin_lock_irqsave(&dd->irq_src_lock, flags); reg = read_csr(dd, CCE_INT_MASK + (8 * idx)); if (set) reg |= bits; else reg &= ~bits; write_csr(dd, CCE_INT_MASK + (8 * idx), reg); - spin_unlock(&dd->irq_src_lock); + spin_unlock_irqrestore(&dd->irq_src_lock, flags); }
/**
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Antoniu Miclaus antoniu.miclaus@analog.com
[ Upstream commit 10b02902048737f376104bc69e5212466e65a542 ]
Do not allow setting shunt resistor to 0. This results in a division by zero when performing current value computations based on input voltages and connected resistor values.
Signed-off-by: Antoniu Miclaus antoniu.miclaus@analog.com Link: https://lore.kernel.org/r/20231011135754.13508-1-antoniu.miclaus@analog.com Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/ltc2992.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/hwmon/ltc2992.c b/drivers/hwmon/ltc2992.c index d88e883c7492c..984748f36594d 100644 --- a/drivers/hwmon/ltc2992.c +++ b/drivers/hwmon/ltc2992.c @@ -875,8 +875,12 @@ static int ltc2992_parse_dt(struct ltc2992_state *st) }
ret = fwnode_property_read_u32(child, "shunt-resistor-micro-ohms", &val); - if (!ret) + if (!ret) { + if (!val) + return dev_err_probe(&st->client->dev, -EINVAL, + "shunt resistor value cannot be zero\n"); st->r_sense_uohm[addr] = val; + } }
return 0;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miguel Ojeda ojeda@kernel.org
[ Upstream commit 7cd6a3e1f94bab4f2a3425e06f70ab13eb8190d4 ]
In order to match the version string, `sed` is used in a couple cases, and `grep` and `head` in a couple others.
Make the script more consistent and easier to understand by using the same method, `sed`, for all of them.
This makes the version matching also a bit more strict for the changed cases, since the strings `rustc ` and `bindgen ` will now be required, which should be fine since `rustc` complains if one attempts to call it with another program name, and `bindgen` uses a hardcoded string.
In addition, clarify why one of the existing `sed` commands does not provide an address like the others.
Reviewed-by: Nathan Chancellor nathan@kernel.org Reviewed-by: Masahiro Yamada masahiroy@kernel.org Reviewed-by: Martin Rodriguez Reboredo yakoyoku@gmail.com Link: https://lore.kernel.org/r/20230616001631.463536-9-ojeda@kernel.org Signed-off-by: Miguel Ojeda ojeda@kernel.org Stable-dep-of: 5ce86c6c8613 ("rust: suppress error messages from CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT") Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/rust_is_available.sh | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/scripts/rust_is_available.sh b/scripts/rust_is_available.sh index 7a925d2b20fc7..db4519945f534 100755 --- a/scripts/rust_is_available.sh +++ b/scripts/rust_is_available.sh @@ -40,8 +40,7 @@ fi # Non-stable and distributions' versions may have a version suffix, e.g. `-dev`. rust_compiler_version=$( \ LC_ALL=C "$RUSTC" --version 2>/dev/null \ - | head -n 1 \ - | grep -oE '[0-9]+.[0-9]+.[0-9]+' \ + | sed -nE '1s:.*rustc ([0-9]+.[0-9]+.[0-9]+).*:\1:p' ) rust_compiler_min_version=$($min_tool_version rustc) rust_compiler_cversion=$(get_canonical_version $rust_compiler_version) @@ -67,8 +66,7 @@ fi # Non-stable and distributions' versions may have a version suffix, e.g. `-dev`. rust_bindings_generator_version=$( \ LC_ALL=C "$BINDGEN" --version 2>/dev/null \ - | head -n 1 \ - | grep -oE '[0-9]+.[0-9]+.[0-9]+' \ + | sed -nE '1s:.*bindgen ([0-9]+.[0-9]+.[0-9]+).*:\1:p' ) rust_bindings_generator_min_version=$($min_tool_version bindgen) rust_bindings_generator_cversion=$(get_canonical_version $rust_bindings_generator_version) @@ -110,6 +108,9 @@ fi
# `bindgen` returned successfully, thus use the output to check that the version # of the `libclang` found by the Rust bindings generator is suitable. +# +# Unlike other version checks, note that this one does not necessarily appear +# in the first line of the output, thus no `sed` address is provided. bindgen_libclang_version=$( \ echo "$bindgen_libclang_output" \ | sed -nE 's:.*clang version ([0-9]+.[0-9]+.[0-9]+).*:\1:p'
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miguel Ojeda ojeda@kernel.org
[ Upstream commit f295522886a4ebb628cadb2cd74d0661d6292978 ]
The script already checks if `$RUSTC` and `$BINDGEN` exists via `command`, but the environment variables may point to a non-executable file, or the programs may fail for some other reason. While the script successfully exits with a failure as it should, the error given can be quite confusing depending on the shell and the behavior of its `command`. For instance, with `dash`:
$ RUSTC=./mm BINDGEN=bindgen CC=clang scripts/rust_is_available.sh scripts/rust_is_available.sh: 19: arithmetic expression: expecting primary: "100000 * + 100 * + "
Thus detect failure exit codes when calling `$RUSTC` and `$BINDGEN` and print a better message, in a similar way to what we do when extracting the `libclang` version found by `bindgen`.
Link: https://lore.kernel.org/rust-for-linux/CAK7LNAQYk6s11MASRHW6oxtkqF00EJVqhHOP... Reviewed-by: Nathan Chancellor nathan@kernel.org Reviewed-by: Martin Rodriguez Reboredo yakoyoku@gmail.com Link: https://lore.kernel.org/r/20230616001631.463536-10-ojeda@kernel.org Signed-off-by: Miguel Ojeda ojeda@kernel.org Stable-dep-of: 5ce86c6c8613 ("rust: suppress error messages from CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT") Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/rust_is_available.sh | 28 ++++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-)
diff --git a/scripts/rust_is_available.sh b/scripts/rust_is_available.sh index db4519945f534..1c9081d9dbea7 100755 --- a/scripts/rust_is_available.sh +++ b/scripts/rust_is_available.sh @@ -38,8 +38,20 @@ fi # Check that the Rust compiler version is suitable. # # Non-stable and distributions' versions may have a version suffix, e.g. `-dev`. +rust_compiler_output=$( \ + LC_ALL=C "$RUSTC" --version 2>/dev/null +) || rust_compiler_code=$? +if [ -n "$rust_compiler_code" ]; then + echo >&2 "***" + echo >&2 "*** Running '$RUSTC' to check the Rust compiler version failed with" + echo >&2 "*** code $rust_compiler_code. See output and docs below for details:" + echo >&2 "***" + echo >&2 "$rust_compiler_output" + echo >&2 "***" + exit 1 +fi rust_compiler_version=$( \ - LC_ALL=C "$RUSTC" --version 2>/dev/null \ + echo "$rust_compiler_output" \ | sed -nE '1s:.*rustc ([0-9]+.[0-9]+.[0-9]+).*:\1:p' ) rust_compiler_min_version=$($min_tool_version rustc) @@ -64,8 +76,20 @@ fi # Check that the Rust bindings generator is suitable. # # Non-stable and distributions' versions may have a version suffix, e.g. `-dev`. +rust_bindings_generator_output=$( \ + LC_ALL=C "$BINDGEN" --version 2>/dev/null +) || rust_bindings_generator_code=$? +if [ -n "$rust_bindings_generator_code" ]; then + echo >&2 "***" + echo >&2 "*** Running '$BINDGEN' to check the Rust bindings generator version failed with" + echo >&2 "*** code $rust_bindings_generator_code. See output and docs below for details:" + echo >&2 "***" + echo >&2 "$rust_bindings_generator_output" + echo >&2 "***" + exit 1 +fi rust_bindings_generator_version=$( \ - LC_ALL=C "$BINDGEN" --version 2>/dev/null \ + echo "$rust_bindings_generator_output" \ | sed -nE '1s:.*bindgen ([0-9]+.[0-9]+.[0-9]+).*:\1:p' ) rust_bindings_generator_min_version=$($min_tool_version bindgen)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miguel Ojeda ojeda@kernel.org
[ Upstream commit 9e98db17837093cb0f4dcfcc3524739d93249c45 ]
`bindgen` 0.69.0 contains a bug: `--version` does not work without providing a header [1]:
error: the following required arguments were not provided: <HEADER>
Usage: bindgen <FLAGS> <OPTIONS> <HEADER> -- <CLANG_ARGS>...
Thus, in preparation for supporting several `bindgen` versions, work around the issue by passing a dummy argument.
Include a comment so that we can remove the workaround in the future.
Link: https://github.com/rust-lang/rust-bindgen/pull/2678 [1] Reviewed-by: Finn Behrens me@kloenk.dev Tested-by: Benno Lossin benno.lossin@proton.me Tested-by: Andreas Hindborg a.hindborg@samsung.com Link: https://lore.kernel.org/r/20240709160615.998336-9-ojeda@kernel.org Signed-off-by: Miguel Ojeda ojeda@kernel.org Stable-dep-of: 5ce86c6c8613 ("rust: suppress error messages from CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT") Signed-off-by: Sasha Levin sashal@kernel.org --- init/Kconfig | 5 ++++- scripts/rust_is_available.sh | 6 +++++- 2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig index 4cd3fc82b09e5..19de862768239 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1947,7 +1947,10 @@ config RUSTC_VERSION_TEXT config BINDGEN_VERSION_TEXT string depends on RUST - default $(shell,command -v $(BINDGEN) >/dev/null 2>&1 && $(BINDGEN) --version || echo n) + # The dummy parameter `workaround-for-0.69.0` is required to support 0.69.0 + # (https://github.com/rust-lang/rust-bindgen/pull/2678). It can be removed when + # the minimum version is upgraded past that (0.69.1 already fixed the issue). + default $(shell,command -v $(BINDGEN) >/dev/null 2>&1 && $(BINDGEN) --version workaround-for-0.69.0 || echo n)
# # Place an empty function call at each tracepoint site. Can be diff --git a/scripts/rust_is_available.sh b/scripts/rust_is_available.sh index 1c9081d9dbea7..141644c164636 100755 --- a/scripts/rust_is_available.sh +++ b/scripts/rust_is_available.sh @@ -76,8 +76,12 @@ fi # Check that the Rust bindings generator is suitable. # # Non-stable and distributions' versions may have a version suffix, e.g. `-dev`. +# +# The dummy parameter `workaround-for-0.69.0` is required to support 0.69.0 +# (https://github.com/rust-lang/rust-bindgen/pull/2678). It can be removed when +# the minimum version is upgraded past that (0.69.1 already fixed the issue). rust_bindings_generator_output=$( \ - LC_ALL=C "$BINDGEN" --version 2>/dev/null + LC_ALL=C "$BINDGEN" --version workaround-for-0.69.0 2>/dev/null ) || rust_bindings_generator_code=$? if [ -n "$rust_bindings_generator_code" ]; then echo >&2 "***"
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masahiro Yamada masahiroy@kernel.org
[ Upstream commit 5ce86c6c861352c9346ebb5c96ed70cb67414aa3 ]
While this is a somewhat unusual case, I encountered odd error messages when I ran Kconfig in a foreign architecture chroot.
$ make allmodconfig sh: 1: rustc: not found sh: 1: bindgen: not found # # configuration written to .config #
The successful execution of 'command -v rustc' does not necessarily mean that 'rustc --version' will succeed.
$ sh -c 'command -v rustc' /home/masahiro/.cargo/bin/rustc $ sh -c 'rustc --version' sh: 1: rustc: not found
Here, 'rustc' is built for x86, and I ran it in an arm64 system.
The current code:
command -v $(RUSTC) >/dev/null 2>&1 && $(RUSTC) --version || echo n
can be turned into:
command -v $(RUSTC) >/dev/null 2>&1 && $(RUSTC) --version 2>/dev/null || echo n
However, I did not understand the necessity of 'command -v $(RUSTC)'.
I simplified it to:
$(RUSTC) --version 2>/dev/null || echo n
Fixes: 2f7ab1267dc9 ("Kbuild: add Rust support") Signed-off-by: Masahiro Yamada masahiroy@kernel.org Link: https://lore.kernel.org/r/20240727140302.1806011-1-masahiroy@kernel.org [ Rebased on top of v6.11-rc1. - Miguel ] Signed-off-by: Miguel Ojeda ojeda@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- init/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig index 19de862768239..85e8bf76aeccb 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1942,7 +1942,7 @@ config RUST config RUSTC_VERSION_TEXT string depends on RUST - default $(shell,command -v $(RUSTC) >/dev/null 2>&1 && $(RUSTC) --version || echo n) + default $(shell,$(RUSTC) --version 2>/dev/null || echo n)
config BINDGEN_VERSION_TEXT string @@ -1950,7 +1950,7 @@ config BINDGEN_VERSION_TEXT # The dummy parameter `workaround-for-0.69.0` is required to support 0.69.0 # (https://github.com/rust-lang/rust-bindgen/pull/2678). It can be removed when # the minimum version is upgraded past that (0.69.1 already fixed the issue). - default $(shell,command -v $(BINDGEN) >/dev/null 2>&1 && $(BINDGEN) --version workaround-for-0.69.0 || echo n) + default $(shell,$(BINDGEN) --version workaround-for-0.69.0 2>/dev/null || echo n)
# # Place an empty function call at each tracepoint site. Can be
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masahiro Yamada masahiroy@kernel.org
[ Upstream commit aacf93e87f0d808ef46e621aa56caea336b4433c ]
Another oddity in these config entries is their default value can fall back to 'n', which is a value for bool or tristate symbols.
The '|| echo n' is an incorrect workaround to avoid the syntax error. This is not a big deal, as the entry is hidden by 'depends on RUST' in situations where '$(RUSTC) --version' or '$(BINDGEN) --version' fails. Anyway, it looks odd.
The default of a string type symbol should be a double-quoted string literal. Turn it into an empty string when the version command fails.
Fixes: 2f7ab1267dc9 ("Kbuild: add Rust support") Signed-off-by: Masahiro Yamada masahiroy@kernel.org Link: https://lore.kernel.org/r/20240727140302.1806011-2-masahiroy@kernel.org [ Rebased on top of v6.11-rc1. - Miguel ] Signed-off-by: Miguel Ojeda ojeda@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- init/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig index 85e8bf76aeccb..2825c8cfde3b5 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1942,7 +1942,7 @@ config RUST config RUSTC_VERSION_TEXT string depends on RUST - default $(shell,$(RUSTC) --version 2>/dev/null || echo n) + default "$(shell,$(RUSTC) --version 2>/dev/null)"
config BINDGEN_VERSION_TEXT string @@ -1950,7 +1950,7 @@ config BINDGEN_VERSION_TEXT # The dummy parameter `workaround-for-0.69.0` is required to support 0.69.0 # (https://github.com/rust-lang/rust-bindgen/pull/2678). It can be removed when # the minimum version is upgraded past that (0.69.1 already fixed the issue). - default $(shell,$(BINDGEN) --version workaround-for-0.69.0 2>/dev/null || echo n) + default "$(shell,$(BINDGEN) --version workaround-for-0.69.0 2>/dev/null)"
# # Place an empty function call at each tracepoint site. Can be
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Samuel Holland samuel.holland@sifive.com
[ Upstream commit f75c235565f90c4a17b125e47f1c68ef6b8c2bce ]
Currently, kasan_init_sw_tags() is called before setup_per_cpu_areas(), so per_cpu(prng_state, cpu) accesses the same address regardless of the value of "cpu", and the same seed value gets copied to the percpu area for every CPU. Fix this by moving the call to smp_prepare_boot_cpu(), which is the first architecture hook after setup_per_cpu_areas().
Fixes: 3c9e3aa11094 ("kasan: add tag related helper functions") Fixes: 3f41b6093823 ("kasan: fix random seed generation for tag-based mode") Signed-off-by: Samuel Holland samuel.holland@sifive.com Reviewed-by: Andrey Konovalov andreyknvl@gmail.com Link: https://lore.kernel.org/r/20240814091005.969756-1-samuel.holland@sifive.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/kernel/setup.c | 3 --- arch/arm64/kernel/smp.c | 2 ++ 2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index fea3223704b63..44c4d79bd914c 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -360,9 +360,6 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p) smp_init_cpus(); smp_build_mpidr_hash();
- /* Init percpu seeds for random tags after cpus are set up. */ - kasan_init_sw_tags(); - #ifdef CONFIG_ARM64_SW_TTBR0_PAN /* * Make sure init_thread_info.ttbr0 always generates translation diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index d323621d14a59..b606093a5596c 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -464,6 +464,8 @@ void __init smp_prepare_boot_cpu(void) init_gic_priority_masking();
kasan_init_hw_tags(); + /* Init percpu seeds for random tags after cpus are set up. */ + kasan_init_sw_tags(); }
/*
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Li Lingfeng lilingfeng3@huawei.com
[ Upstream commit b313a8c835516bdda85025500be866ac8a74e022 ]
Lockdep reported a warning in Linux version 6.6:
[ 414.344659] ================================ [ 414.345155] WARNING: inconsistent lock state [ 414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted [ 414.346221] -------------------------------- [ 414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes: [ 414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0 [ 414.351204] {IN-SOFTIRQ-W} state was registered at: [ 414.351751] lock_acquire+0x18d/0x460 [ 414.352218] _raw_spin_lock_irqsave+0x39/0x60 [ 414.352769] __wake_up_common_lock+0x22/0x60 [ 414.353289] sbitmap_queue_wake_up+0x375/0x4f0 [ 414.353829] sbitmap_queue_clear+0xdd/0x270 [ 414.354338] blk_mq_put_tag+0xdf/0x170 [ 414.354807] __blk_mq_free_request+0x381/0x4d0 [ 414.355335] blk_mq_free_request+0x28b/0x3e0 [ 414.355847] __blk_mq_end_request+0x242/0xc30 [ 414.356367] scsi_end_request+0x2c1/0x830 [ 414.345155] WARNING: inconsistent lock state [ 414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted [ 414.346221] -------------------------------- [ 414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes: [ 414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0 [ 414.351204] {IN-SOFTIRQ-W} state was registered at: [ 414.351751] lock_acquire+0x18d/0x460 [ 414.352218] _raw_spin_lock_irqsave+0x39/0x60 [ 414.352769] __wake_up_common_lock+0x22/0x60 [ 414.353289] sbitmap_queue_wake_up+0x375/0x4f0 [ 414.353829] sbitmap_queue_clear+0xdd/0x270 [ 414.354338] blk_mq_put_tag+0xdf/0x170 [ 414.354807] __blk_mq_free_request+0x381/0x4d0 [ 414.355335] blk_mq_free_request+0x28b/0x3e0 [ 414.355847] __blk_mq_end_request+0x242/0xc30 [ 414.356367] scsi_end_request+0x2c1/0x830 [ 414.356863] scsi_io_completion+0x177/0x1610 [ 414.357379] scsi_complete+0x12f/0x260 [ 414.357856] blk_complete_reqs+0xba/0xf0 [ 414.358338] __do_softirq+0x1b0/0x7a2 [ 414.358796] irq_exit_rcu+0x14b/0x1a0 [ 414.359262] sysvec_call_function_single+0xaf/0xc0 [ 414.359828] asm_sysvec_call_function_single+0x1a/0x20 [ 414.360426] default_idle+0x1e/0x30 [ 414.360873] default_idle_call+0x9b/0x1f0 [ 414.361390] do_idle+0x2d2/0x3e0 [ 414.361819] cpu_startup_entry+0x55/0x60 [ 414.362314] start_secondary+0x235/0x2b0 [ 414.362809] secondary_startup_64_no_verify+0x18f/0x19b [ 414.363413] irq event stamp: 428794 [ 414.363825] hardirqs last enabled at (428793): [<ffffffff816bfd1c>] ktime_get+0x1dc/0x200 [ 414.364694] hardirqs last disabled at (428794): [<ffffffff85470177>] _raw_spin_lock_irq+0x47/0x50 [ 414.365629] softirqs last enabled at (428444): [<ffffffff85474780>] __do_softirq+0x540/0x7a2 [ 414.366522] softirqs last disabled at (428419): [<ffffffff813f65ab>] irq_exit_rcu+0x14b/0x1a0 [ 414.367425] other info that might help us debug this: [ 414.368194] Possible unsafe locking scenario: [ 414.368900] CPU0 [ 414.369225] ---- [ 414.369548] lock(&sbq->ws[i].wait); [ 414.370000] <Interrupt> [ 414.370342] lock(&sbq->ws[i].wait); [ 414.370802] *** DEADLOCK *** [ 414.371569] 5 locks held by kworker/u10:3/1152: [ 414.372088] #0: ffff88810130e938 ((wq_completion)writeback){+.+.}-{0:0}, at: process_scheduled_works+0x357/0x13f0 [ 414.373180] #1: ffff88810201fdb8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x3a3/0x13f0 [ 414.374384] #2: ffffffff86ffbdc0 (rcu_read_lock){....}-{1:2}, at: blk_mq_run_hw_queue+0x637/0xa00 [ 414.375342] #3: ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0 [ 414.376377] #4: ffff888106205a08 (&hctx->dispatch_wait_lock){+.-.}-{2:2}, at: blk_mq_dispatch_rq_list+0x1337/0x1ee0 [ 414.378607] stack backtrace: [ 414.379177] CPU: 0 PID: 1152 Comm: kworker/u10:3 Not tainted 6.6.0-07439-gba2303cacfda #6 [ 414.380032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 414.381177] Workqueue: writeback wb_workfn (flush-253:0) [ 414.381805] Call Trace: [ 414.382136] <TASK> [ 414.382429] dump_stack_lvl+0x91/0xf0 [ 414.382884] mark_lock_irq+0xb3b/0x1260 [ 414.383367] ? __pfx_mark_lock_irq+0x10/0x10 [ 414.383889] ? stack_trace_save+0x8e/0xc0 [ 414.384373] ? __pfx_stack_trace_save+0x10/0x10 [ 414.384903] ? graph_lock+0xcf/0x410 [ 414.385350] ? save_trace+0x3d/0xc70 [ 414.385808] mark_lock.part.20+0x56d/0xa90 [ 414.386317] mark_held_locks+0xb0/0x110 [ 414.386791] ? __pfx_do_raw_spin_lock+0x10/0x10 [ 414.387320] lockdep_hardirqs_on_prepare+0x297/0x3f0 [ 414.387901] ? _raw_spin_unlock_irq+0x28/0x50 [ 414.388422] trace_hardirqs_on+0x58/0x100 [ 414.388917] _raw_spin_unlock_irq+0x28/0x50 [ 414.389422] __blk_mq_tag_busy+0x1d6/0x2a0 [ 414.389920] __blk_mq_get_driver_tag+0x761/0x9f0 [ 414.390899] blk_mq_dispatch_rq_list+0x1780/0x1ee0 [ 414.391473] ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10 [ 414.392070] ? sbitmap_get+0x2b8/0x450 [ 414.392533] ? __blk_mq_get_driver_tag+0x210/0x9f0 [ 414.393095] __blk_mq_sched_dispatch_requests+0xd99/0x1690 [ 414.393730] ? elv_attempt_insert_merge+0x1b1/0x420 [ 414.394302] ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10 [ 414.394970] ? lock_acquire+0x18d/0x460 [ 414.395456] ? blk_mq_run_hw_queue+0x637/0xa00 [ 414.395986] ? __pfx_lock_acquire+0x10/0x10 [ 414.396499] blk_mq_sched_dispatch_requests+0x109/0x190 [ 414.397100] blk_mq_run_hw_queue+0x66e/0xa00 [ 414.397616] blk_mq_flush_plug_list.part.17+0x614/0x2030 [ 414.398244] ? __pfx_blk_mq_flush_plug_list.part.17+0x10/0x10 [ 414.398897] ? writeback_sb_inodes+0x241/0xcc0 [ 414.399429] blk_mq_flush_plug_list+0x65/0x80 [ 414.399957] __blk_flush_plug+0x2f1/0x530 [ 414.400458] ? __pfx___blk_flush_plug+0x10/0x10 [ 414.400999] blk_finish_plug+0x59/0xa0 [ 414.401467] wb_writeback+0x7cc/0x920 [ 414.401935] ? __pfx_wb_writeback+0x10/0x10 [ 414.402442] ? mark_held_locks+0xb0/0x110 [ 414.402931] ? __pfx_do_raw_spin_lock+0x10/0x10 [ 414.403462] ? lockdep_hardirqs_on_prepare+0x297/0x3f0 [ 414.404062] wb_workfn+0x2b3/0xcf0 [ 414.404500] ? __pfx_wb_workfn+0x10/0x10 [ 414.404989] process_scheduled_works+0x432/0x13f0 [ 414.405546] ? __pfx_process_scheduled_works+0x10/0x10 [ 414.406139] ? do_raw_spin_lock+0x101/0x2a0 [ 414.406641] ? assign_work+0x19b/0x240 [ 414.407106] ? lock_is_held_type+0x9d/0x110 [ 414.407604] worker_thread+0x6f2/0x1160 [ 414.408075] ? __kthread_parkme+0x62/0x210 [ 414.408572] ? lockdep_hardirqs_on_prepare+0x297/0x3f0 [ 414.409168] ? __kthread_parkme+0x13c/0x210 [ 414.409678] ? __pfx_worker_thread+0x10/0x10 [ 414.410191] kthread+0x33c/0x440 [ 414.410602] ? __pfx_kthread+0x10/0x10 [ 414.411068] ret_from_fork+0x4d/0x80 [ 414.411526] ? __pfx_kthread+0x10/0x10 [ 414.411993] ret_from_fork_asm+0x1b/0x30 [ 414.412489] </TASK>
When interrupt is turned on while a lock holding by spin_lock_irq it throws a warning because of potential deadlock.
blk_mq_prep_dispatch_rq blk_mq_get_driver_tag __blk_mq_get_driver_tag __blk_mq_alloc_driver_tag blk_mq_tag_busy -> tag is already busy // failed to get driver tag blk_mq_mark_tag_wait spin_lock_irq(&wq->lock) -> lock A (&sbq->ws[i].wait) __add_wait_queue(wq, wait) -> wait queue active blk_mq_get_driver_tag __blk_mq_tag_busy -> 1) tag must be idle, which means there can't be inflight IO spin_lock_irq(&tags->lock) -> lock B (hctx->tags) spin_unlock_irq(&tags->lock) -> unlock B, turn on interrupt accidentally -> 2) context must be preempt by IO interrupt to trigger deadlock.
As shown above, the deadlock is not possible in theory, but the warning still need to be fixed.
Fix it by using spin_lock_irqsave to get lockB instead of spin_lock_irq.
Fixes: 4f1731df60f9 ("blk-mq: fix potential io hang by wrong 'wake_batch'") Signed-off-by: Li Lingfeng lilingfeng3@huawei.com Reviewed-by: Ming Lei ming.lei@redhat.com Reviewed-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Bart Van Assche bvanassche@acm.org Link: https://lore.kernel.org/r/20240815024736.2040971-1-lilingfeng@huaweicloud.co... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-mq-tag.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index 100889c276c3f..dcd620422d01d 100644 --- a/block/blk-mq-tag.c +++ b/block/blk-mq-tag.c @@ -40,6 +40,7 @@ static void blk_mq_update_wake_batch(struct blk_mq_tags *tags, void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx) { unsigned int users; + unsigned long flags; struct blk_mq_tags *tags = hctx->tags;
/* @@ -58,11 +59,11 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx) return; }
- spin_lock_irq(&tags->lock); + spin_lock_irqsave(&tags->lock, flags); users = tags->active_queues + 1; WRITE_ONCE(tags->active_queues, users); blk_mq_update_wake_batch(tags, users); - spin_unlock_irq(&tags->lock); + spin_unlock_irqrestore(&tags->lock, flags); }
/*
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rob Clark robdclark@chromium.org
[ Upstream commit 4bea53b9c7c72fd12a0ceebe88a71723c0a514b8 ]
Until various PM devfreq/QoS and interconnect patches land, we could potentially trigger reclaim from gpu scheduler thread, and under enough memory pressure that could trigger a sort of deadlock. Eventually the wait will timeout and we'll move on to consider other GEM objects. But given that there is still a potential for deadlock/stalling, we should reduce the timeout to contain the damage.
Signed-off-by: Rob Clark robdclark@chromium.org Patchwork: https://patchwork.freedesktop.org/patch/568031/ Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/msm_gem_shrinker.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c b/drivers/gpu/drm/msm/msm_gem_shrinker.c index 31f054c903a43..a35c98306f1e5 100644 --- a/drivers/gpu/drm/msm/msm_gem_shrinker.c +++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c @@ -76,7 +76,7 @@ static bool wait_for_idle(struct drm_gem_object *obj) { enum dma_resv_usage usage = dma_resv_usage_rw(true); - return dma_resv_wait_timeout(obj->resv, usage, false, 1000) > 0; + return dma_resv_wait_timeout(obj->resv, usage, false, 10) > 0; }
static bool
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ashish Mhetre amhetre@nvidia.com
[ Upstream commit 0d6c918011ce4764ed277de4726a468b7ffe5fed ]
There are few MC clients where SID security and override register offsets are not specified like "sw_cluster0" in tegra234. Don't program SID override for such clients because it leads to access to invalid addresses.
Signed-off-by: Ashish Mhetre amhetre@nvidia.com Link: https://lore.kernel.org/r/20231107112713.21399-2-amhetre@nvidia.com Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/memory/tegra/tegra186.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/memory/tegra/tegra186.c b/drivers/memory/tegra/tegra186.c index 7bb73f06fad3e..fd6f5e2e01a28 100644 --- a/drivers/memory/tegra/tegra186.c +++ b/drivers/memory/tegra/tegra186.c @@ -74,6 +74,9 @@ static void tegra186_mc_client_sid_override(struct tegra_mc *mc, { u32 value, old;
+ if (client->regs.sid.security == 0 && client->regs.sid.override == 0) + return; + value = readl(mc->regs + client->regs.sid.security); if ((value & MC_SID_STREAMID_SECURITY_OVERRIDE) == 0) { /*
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kunwu Chan chentao@kylinos.cn
[ Upstream commit 45b1ba7e5d1f6881050d558baf9bc74a2ae13930 ]
kasprintf() returns a pointer to dynamically allocated memory which can be NULL upon failure. Ensure the allocation was successful by checking the pointer validity.
Signed-off-by: Kunwu Chan chentao@kylinos.cn Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20231122030651.3818-1-chentao@kylinos.cn Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/sysdev/xics/icp-native.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/sysdev/xics/icp-native.c b/arch/powerpc/sysdev/xics/icp-native.c index edc17b6b1cc2f..9b2238d73003b 100644 --- a/arch/powerpc/sysdev/xics/icp-native.c +++ b/arch/powerpc/sysdev/xics/icp-native.c @@ -236,6 +236,8 @@ static int __init icp_native_map_one_cpu(int hw_id, unsigned long addr, rname = kasprintf(GFP_KERNEL, "CPU %d [0x%x] Interrupt Presentation", cpu, hw_id);
+ if (!rname) + return -ENOMEM; if (!request_mem_region(addr, size, rname)) { pr_warn("icp_native: Could not reserve ICP MMIO for CPU %d, interrupt server #0x%x\n", cpu, hw_id);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bard Liao yung-chuan.liao@linux.intel.com
[ Upstream commit 2bd512626f8ea3957c981cadd2ebf75feff737dd ]
snd_sof_ipc_msg_data could return error.
Signed-off-by: Bard Liao yung-chuan.liao@linux.intel.com Reviewed-by: Péter Ujfalusi peter.ujfalusi@linux.intel.com Reviewed-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Signed-off-by: Peter Ujfalusi peter.ujfalusi@linux.intel.com Link: https://lore.kernel.org/r/20231129122021.679-1-peter.ujfalusi@linux.intel.co... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/sof/ipc4.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/sound/soc/sof/ipc4.c b/sound/soc/sof/ipc4.c index 06e1872abfee7..1449837b0fb2c 100644 --- a/sound/soc/sof/ipc4.c +++ b/sound/soc/sof/ipc4.c @@ -616,7 +616,14 @@ static void sof_ipc4_rx_msg(struct snd_sof_dev *sdev) return;
ipc4_msg->data_size = data_size; - snd_sof_ipc_msg_data(sdev, NULL, ipc4_msg->data_ptr, ipc4_msg->data_size); + err = snd_sof_ipc_msg_data(sdev, NULL, ipc4_msg->data_ptr, ipc4_msg->data_size); + if (err < 0) { + dev_err(sdev->dev, "failed to read IPC notification data: %d\n", err); + kfree(ipc4_msg->data_ptr); + ipc4_msg->data_ptr = NULL; + ipc4_msg->data_size = 0; + return; + } }
sof_ipc4_log_header(sdev->dev, "ipc rx done ", ipc4_msg, true);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
[ Upstream commit 4265eb062a7303e537ab3792ade31f424c3c5189 ]
Without visibility into the initializers for data->innr, GCC suspects using it as an index could walk off the end of the various 14-element arrays in data. Perform an explicit clamp to the array size. Silences the following warning with GCC 12+:
../drivers/hwmon/pc87360.c: In function 'pc87360_update_device': ../drivers/hwmon/pc87360.c:341:49: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=] 341 | data->in_max[i] = pc87360_read_value(data, | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~ 342 | LD_IN, i, | ~~~~~~~~~ 343 | PC87365_REG_IN_MAX); | ~~~~~~~~~~~~~~~~~~~ ../drivers/hwmon/pc87360.c:209:12: note: at offset 255 into destination object 'in_max' of size 14 209 | u8 in_max[14]; /* Register value */ | ^~~~~~
Cc: Jim Cromie jim.cromie@gmail.com Cc: Jean Delvare jdelvare@suse.com Cc: Guenter Roeck linux@roeck-us.net Cc: linux-hwmon@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org Reviewed-by: Gustavo A. R. Silva gustavoars@kernel.org Link: https://lore.kernel.org/r/20231130200207.work.679-kees@kernel.org [groeck: Added comment into code clarifying context] Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/pc87360.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/hwmon/pc87360.c b/drivers/hwmon/pc87360.c index a4adc8bd531ff..534a6072036c9 100644 --- a/drivers/hwmon/pc87360.c +++ b/drivers/hwmon/pc87360.c @@ -323,7 +323,11 @@ static struct pc87360_data *pc87360_update_device(struct device *dev) }
/* Voltages */ - for (i = 0; i < data->innr; i++) { + /* + * The min() below does not have any practical meaning and is + * only needed to silence a warning observed with gcc 12+. + */ + for (i = 0; i < min(data->innr, ARRAY_SIZE(data->in)); i++) { data->in_status[i] = pc87360_read_value(data, LD_IN, i, PC87365_REG_IN_STATUS); /* Clear bits */
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Yan andy.yan@rock-chips.com
[ Upstream commit 20529a68307feed00dd3d431d3fff0572616b0f2 ]
The enable bit and transform offset of cluster windows should be cleared when it work at linear mode, or we may have a iommu fault issue on rk3588 which cluster windows switch between afbc and linear mode.
As the cluster windows of rk3568 only supports afbc format so is therefore not affected.
Signed-off-by: Andy Yan andy.yan@rock-chips.com Reviewed-by: Sascha Hauer s.hauer@pengutronix.de Signed-off-by: Heiko Stuebner heiko@sntech.de Link: https://patchwork.freedesktop.org/patch/msgid/20231211115741.1784954-1-andys... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/rockchip/rockchip_drm_vop2.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c index 80b8c83342840..a6071464a543f 100644 --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c @@ -1258,6 +1258,11 @@ static void vop2_plane_atomic_update(struct drm_plane *plane, vop2_win_write(win, VOP2_WIN_AFBC_ROTATE_270, rotate_270); vop2_win_write(win, VOP2_WIN_AFBC_ROTATE_90, rotate_90); } else { + if (vop2_cluster_window(win)) { + vop2_win_write(win, VOP2_WIN_AFBC_ENABLE, 0); + vop2_win_write(win, VOP2_WIN_AFBC_TRANSFORM_OFFSET, 0); + } + vop2_win_write(win, VOP2_WIN_YRGB_VIR, DIV_ROUND_UP(fb->pitches[0], 4)); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zijun Hu quic_zijuhu@quicinc.com
[ Upstream commit 132d0fd0b8418094c9e269e5bc33bf5b864f4a65 ]
For some controllers such as QCA2066, it does not need to send HCI_Configure_Data_Path to configure non-HCI data transport path to support HFP offload, their device drivers may set hdev->get_codec_config_data as NULL, so Explicitly add this non NULL checking before calling the function.
Signed-off-by: Zijun Hu quic_zijuhu@quicinc.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_conn.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c index bac5a369d2bef..858c454e35e67 100644 --- a/net/bluetooth/hci_conn.c +++ b/net/bluetooth/hci_conn.c @@ -293,6 +293,13 @@ static int configure_datapath_sync(struct hci_dev *hdev, struct bt_codec *codec) __u8 vnd_len, *vnd_data = NULL; struct hci_op_configure_data_path *cmd = NULL;
+ if (!codec->data_path || !hdev->get_codec_config_data) + return 0; + + /* Do not take me as error */ + if (!hdev->get_codec_config_data) + return 0; + err = hdev->get_codec_config_data(hdev, ESCO_LINK, codec, &vnd_len, &vnd_data); if (err < 0) @@ -338,9 +345,7 @@ static int hci_enhanced_setup_sync(struct hci_dev *hdev, void *data)
bt_dev_dbg(hdev, "hcon %p", conn);
- /* for offload use case, codec needs to configured before opening SCO */ - if (conn->codec.data_path) - configure_datapath_sync(hdev, &conn->codec); + configure_datapath_sync(hdev, &conn->codec);
conn->state = BT_CONNECT; conn->out = true;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Gruenbacher agruenba@redhat.com
[ Upstream commit 4e58543e7da4859c4ba61d15493e3522b6ad71fd ]
It turns out that the .freeze_super and .thaw_super operations require the filesystem to manage the superblock refcount itself. We are using the freeze_super() and thaw_super() helpers to mostly take care of that for us, but this means that the superblock may no longer be around by when thaw_super() returns, and gfs2_thaw_super() will then access freed memory. Take an extra superblock reference in gfs2_thaw_super() to fix that.
Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/gfs2/super.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index f9b47df485d17..aff8cdc61eff7 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -814,6 +814,7 @@ static int gfs2_thaw_super(struct super_block *sb) if (!test_bit(SDF_FREEZE_INITIATOR, &sdp->sd_flags)) goto out;
+ atomic_inc(&sb->s_active); gfs2_freeze_unlock(&sdp->sd_freeze_gh);
error = gfs2_do_thaw(sdp); @@ -824,6 +825,7 @@ static int gfs2_thaw_super(struct super_block *sb) } out: mutex_unlock(&sdp->sd_freeze_mutex); + deactivate_super(sb); return error; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Wagner dwagner@suse.de
[ Upstream commit 0e716cec6fb11a14c220ee17c404b67962e902f7 ]
The first command issued from the host to the target is the fabrics connect command. At this point, neither the target queue nor the controller have been allocated. But we already try to trace this command in nvmet_req_init.
Reported by KASAN.
Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Daniel Wagner dwagner@suse.de Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/target/trace.c | 6 +++--- drivers/nvme/target/trace.h | 28 +++++++++++++++++----------- 2 files changed, 20 insertions(+), 14 deletions(-)
diff --git a/drivers/nvme/target/trace.c b/drivers/nvme/target/trace.c index bff454d46255b..6ee1f3db81d04 100644 --- a/drivers/nvme/target/trace.c +++ b/drivers/nvme/target/trace.c @@ -211,7 +211,7 @@ const char *nvmet_trace_disk_name(struct trace_seq *p, char *name) return ret; }
-const char *nvmet_trace_ctrl_name(struct trace_seq *p, struct nvmet_ctrl *ctrl) +const char *nvmet_trace_ctrl_id(struct trace_seq *p, u16 ctrl_id) { const char *ret = trace_seq_buffer_ptr(p);
@@ -224,8 +224,8 @@ const char *nvmet_trace_ctrl_name(struct trace_seq *p, struct nvmet_ctrl *ctrl) * If we can know the extra data of the connect command in this stage, * we can update this print statement later. */ - if (ctrl) - trace_seq_printf(p, "%d", ctrl->cntlid); + if (ctrl_id) + trace_seq_printf(p, "%d", ctrl_id); else trace_seq_printf(p, "_"); trace_seq_putc(p, 0); diff --git a/drivers/nvme/target/trace.h b/drivers/nvme/target/trace.h index 974d99d47f514..7f7ebf9558e50 100644 --- a/drivers/nvme/target/trace.h +++ b/drivers/nvme/target/trace.h @@ -32,18 +32,24 @@ const char *nvmet_trace_parse_fabrics_cmd(struct trace_seq *p, u8 fctype, nvmet_trace_parse_nvm_cmd(p, opcode, cdw10) : \ nvmet_trace_parse_admin_cmd(p, opcode, cdw10)))
-const char *nvmet_trace_ctrl_name(struct trace_seq *p, struct nvmet_ctrl *ctrl); -#define __print_ctrl_name(ctrl) \ - nvmet_trace_ctrl_name(p, ctrl) +const char *nvmet_trace_ctrl_id(struct trace_seq *p, u16 ctrl_id); +#define __print_ctrl_id(ctrl_id) \ + nvmet_trace_ctrl_id(p, ctrl_id)
const char *nvmet_trace_disk_name(struct trace_seq *p, char *name); #define __print_disk_name(name) \ nvmet_trace_disk_name(p, name)
#ifndef TRACE_HEADER_MULTI_READ -static inline struct nvmet_ctrl *nvmet_req_to_ctrl(struct nvmet_req *req) +static inline u16 nvmet_req_to_ctrl_id(struct nvmet_req *req) { - return req->sq->ctrl; + /* + * The queue and controller pointers are not valid until an association + * has been established. + */ + if (!req->sq || !req->sq->ctrl) + return 0; + return req->sq->ctrl->cntlid; }
static inline void __assign_req_name(char *name, struct nvmet_req *req) @@ -62,7 +68,7 @@ TRACE_EVENT(nvmet_req_init, TP_ARGS(req, cmd), TP_STRUCT__entry( __field(struct nvme_command *, cmd) - __field(struct nvmet_ctrl *, ctrl) + __field(u16, ctrl_id) __array(char, disk, DISK_NAME_LEN) __field(int, qid) __field(u16, cid) @@ -75,7 +81,7 @@ TRACE_EVENT(nvmet_req_init, ), TP_fast_assign( __entry->cmd = cmd; - __entry->ctrl = nvmet_req_to_ctrl(req); + __entry->ctrl_id = nvmet_req_to_ctrl_id(req); __assign_req_name(__entry->disk, req); __entry->qid = req->sq->qid; __entry->cid = cmd->common.command_id; @@ -89,7 +95,7 @@ TRACE_EVENT(nvmet_req_init, ), TP_printk("nvmet%s: %sqid=%d, cmdid=%u, nsid=%u, flags=%#x, " "meta=%#llx, cmd=(%s, %s)", - __print_ctrl_name(__entry->ctrl), + __print_ctrl_id(__entry->ctrl_id), __print_disk_name(__entry->disk), __entry->qid, __entry->cid, __entry->nsid, __entry->flags, __entry->metadata, @@ -103,7 +109,7 @@ TRACE_EVENT(nvmet_req_complete, TP_PROTO(struct nvmet_req *req), TP_ARGS(req), TP_STRUCT__entry( - __field(struct nvmet_ctrl *, ctrl) + __field(u16, ctrl_id) __array(char, disk, DISK_NAME_LEN) __field(int, qid) __field(int, cid) @@ -111,7 +117,7 @@ TRACE_EVENT(nvmet_req_complete, __field(u16, status) ), TP_fast_assign( - __entry->ctrl = nvmet_req_to_ctrl(req); + __entry->ctrl_id = nvmet_req_to_ctrl_id(req); __entry->qid = req->cq->qid; __entry->cid = req->cqe->command_id; __entry->result = le64_to_cpu(req->cqe->result.u64); @@ -119,7 +125,7 @@ TRACE_EVENT(nvmet_req_complete, __assign_req_name(__entry->disk, req); ), TP_printk("nvmet%s: %sqid=%d, cmdid=%u, res=%#llx, status=%#x", - __print_ctrl_name(__entry->ctrl), + __print_ctrl_id(__entry->ctrl_id), __print_disk_name(__entry->disk), __entry->qid, __entry->cid, __entry->result, __entry->status)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
[ Upstream commit 172202152a125955367393956acf5f4ffd092e0d ]
Otherwise operating on an incorrupted block bitmap can lead to all sorts of unknown problems.
Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20240104142040.2835097-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/mballoc.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 004ad321a45d6..c723ee3e49959 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -6483,6 +6483,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group)) bool set_trimmed = false; void *bitmap;
+ if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) + return 0; + last = ext4_last_grp_cluster(sb, e4b->bd_group); bitmap = e4b->bd_bitmap; if (start == 0 && max >= last)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Al Viro viro@zeniv.linux.org.uk
[ Upstream commit 275655d3207b9e65d1561bf21c06a622d9ec1d43 ]
In __afs_break_callback() we might check ->cb_nr_mmap and if it's non-zero do queue_work(&vnode->cb_work). In afs_drop_open_mmap() we decrement ->cb_nr_mmap and do flush_work(&vnode->cb_work) if it reaches zero.
The trouble is, there's nothing to prevent __afs_break_callback() from seeing ->cb_nr_mmap before the decrement and do queue_work() after both the decrement and flush_work(). If that happens, we might be in trouble - vnode might get freed before the queued work runs.
__afs_break_callback() is always done under ->cb_lock, so let's make sure that ->cb_nr_mmap can change from non-zero to zero while holding ->cb_lock (the spinlock component of it - it's a seqlock and we don't need to mess with the counter).
Acked-by: Christian Brauner brauner@kernel.org Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/afs/file.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/afs/file.c b/fs/afs/file.c index 2eeab57df133a..9051ed0085544 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -525,13 +525,17 @@ static void afs_add_open_mmap(struct afs_vnode *vnode)
static void afs_drop_open_mmap(struct afs_vnode *vnode) { - if (!atomic_dec_and_test(&vnode->cb_nr_mmap)) + if (atomic_add_unless(&vnode->cb_nr_mmap, -1, 1)) return;
down_write(&vnode->volume->cell->fs_open_mmaps_lock);
- if (atomic_read(&vnode->cb_nr_mmap) == 0) + read_seqlock_excl(&vnode->cb_lock); + // the only place where ->cb_nr_mmap may hit 0 + // see __afs_break_callback() for the other side... + if (atomic_dec_and_test(&vnode->cb_nr_mmap)) list_del_init(&vnode->cb_mmap_link); + read_sequnlock_excl(&vnode->cb_lock);
up_write(&vnode->volume->cell->fs_open_mmaps_lock); flush_work(&vnode->cb_work);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Al Viro viro@zeniv.linux.org.uk
[ Upstream commit 053fc4f755ad43cf35210677bcba798ccdc48d0c ]
->permission(), ->get_link() and ->inode_get_acl() might dereference ->s_fs_info (and, in case of ->permission(), ->s_fs_info->fc->user_ns as well) when called from rcu pathwalk.
Freeing ->s_fs_info->fc is rcu-delayed; we need to make freeing ->s_fs_info and dropping ->user_ns rcu-delayed too.
Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/fuse/cuse.c | 3 +-- fs/fuse/fuse_i.h | 1 + fs/fuse/inode.c | 15 +++++++++++---- 3 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c index c7d882a9fe339..295344a462e1d 100644 --- a/fs/fuse/cuse.c +++ b/fs/fuse/cuse.c @@ -474,8 +474,7 @@ static int cuse_send_init(struct cuse_conn *cc)
static void cuse_fc_release(struct fuse_conn *fc) { - struct cuse_conn *cc = fc_to_cc(fc); - kfree_rcu(cc, fc.rcu); + kfree(fc_to_cc(fc)); }
/** diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 253b9b78d6f13..66c2a99994683 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -872,6 +872,7 @@ struct fuse_mount {
/* Entry on fc->mounts */ struct list_head fc_entry; + struct rcu_head rcu; };
static inline struct fuse_mount *get_fuse_mount_super(struct super_block *sb) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index f19bdd7cbd779..64618548835b4 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -925,6 +925,14 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm, } EXPORT_SYMBOL_GPL(fuse_conn_init);
+static void delayed_release(struct rcu_head *p) +{ + struct fuse_conn *fc = container_of(p, struct fuse_conn, rcu); + + put_user_ns(fc->user_ns); + fc->release(fc); +} + void fuse_conn_put(struct fuse_conn *fc) { if (refcount_dec_and_test(&fc->count)) { @@ -936,13 +944,12 @@ void fuse_conn_put(struct fuse_conn *fc) if (fiq->ops->release) fiq->ops->release(fiq); put_pid_ns(fc->pid_ns); - put_user_ns(fc->user_ns); bucket = rcu_dereference_protected(fc->curr_bucket, 1); if (bucket) { WARN_ON(atomic_read(&bucket->count) != 1); kfree(bucket); } - fc->release(fc); + call_rcu(&fc->rcu, delayed_release); } } EXPORT_SYMBOL_GPL(fuse_conn_put); @@ -1356,7 +1363,7 @@ EXPORT_SYMBOL_GPL(fuse_send_init); void fuse_free_conn(struct fuse_conn *fc) { WARN_ON(!list_empty(&fc->devices)); - kfree_rcu(fc, rcu); + kfree(fc); } EXPORT_SYMBOL_GPL(fuse_free_conn);
@@ -1895,7 +1902,7 @@ static void fuse_sb_destroy(struct super_block *sb) void fuse_mount_destroy(struct fuse_mount *fm) { fuse_conn_put(fm->fc); - kfree(fm); + kfree_rcu(fm, rcu); } EXPORT_SYMBOL(fuse_mount_destroy);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Kara jack@suse.cz
[ Upstream commit 249f374eb9b6b969c64212dd860cc1439674c4a8 ]
dqget() checks whether dquot->dq_sb is set when returning it using BUG_ON. Firstly this doesn't work as an invalidation check for quite some time (we release dquot with dq_sb set these days), secondly using BUG_ON is quite harsh. Use WARN_ON_ONCE and check whether dquot is still hashed instead.
Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- fs/quota/dquot.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index b67557647d61f..f7ab6b44011b5 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -995,9 +995,8 @@ struct dquot *dqget(struct super_block *sb, struct kqid qid) * smp_mb__before_atomic() in dquot_acquire(). */ smp_rmb(); -#ifdef CONFIG_QUOTA_DEBUG - BUG_ON(!dquot->dq_sb); /* Has somebody invalidated entry under us? */ -#endif + /* Has somebody invalidated entry under us? */ + WARN_ON_ONCE(hlist_unhashed(&dquot->dq_hash)); out: if (empty) do_destroy_dquot(empty);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Neel Natu neelnatu@google.com
[ Upstream commit 05d8f255867e3196565bb31a911a437697fab094 ]
Prior to this change 'on->nr_mmapped' tracked the total number of mmaps across all of its associated open files via kernfs_fop_mmap(). Thus if the file descriptor associated with a kernfs_open_file was mmapped 10 times then we would have: 'of->mmapped = true' and 'of_on(of)->nr_mmapped = 10'.
The problem is that closing or draining a 'of->mmapped' file would only decrement one from the 'of_on(of)->nr_mmapped' counter.
For e.g. we have this from kernfs_unlink_open_file(): if (of->mmapped) on->nr_mmapped--;
The WARN_ON_ONCE(on->nr_mmapped) in kernfs_drain_open_files() is easy to reproduce by: 1. opening a (mmap-able) kernfs file. 2. mmap-ing that file more than once (mapping just once masks the issue). 3. trigger a drain of that kernfs file.
Modulo out-of-tree patches I was able to trigger this reliably by identifying pci device nodes in sysfs that have resource regions that are mmap-able and that don't have any driver attached to them (steps 1 and 2). For step 3 we can "echo 1 > remove" to trigger a kernfs_drain.
Signed-off-by: Neel Natu neelnatu@google.com Link: https://lore.kernel.org/r/20240127234636.609265-1-neelnatu@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/kernfs/file.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index e4a50e4ff0d23..adf3536cfec81 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -532,9 +532,11 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma) goto out_put;
rc = 0; - of->mmapped = true; - of_on(of)->nr_mmapped++; - of->vm_ops = vma->vm_ops; + if (!of->mmapped) { + of->mmapped = true; + of_on(of)->nr_mmapped++; + of->vm_ops = vma->vm_ops; + } vma->vm_ops = &kernfs_vm_ops; out_put: kernfs_put_active(of->kn);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans Verkuil hverkuil-cisco@xs4all.nl
[ Upstream commit 15126b916e39b0cb67026b0af3c014bfeb1f76b3 ]
cx23885_vdev_init() can return a NULL pointer, but that pointer is used in the next line without a check.
Add a NULL pointer check and go to the error unwind if it is NULL.
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Reported-by: Sicong Huang huangsicong@iie.ac.cn Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/pci/cx23885/cx23885-video.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/media/pci/cx23885/cx23885-video.c b/drivers/media/pci/cx23885/cx23885-video.c index 9af2c5596121c..51d7d720ec48b 100644 --- a/drivers/media/pci/cx23885/cx23885-video.c +++ b/drivers/media/pci/cx23885/cx23885-video.c @@ -1354,6 +1354,10 @@ int cx23885_video_register(struct cx23885_dev *dev) /* register Video device */ dev->video_dev = cx23885_vdev_init(dev, dev->pci, &cx23885_video_template, "video"); + if (!dev->video_dev) { + err = -ENOMEM; + goto fail_unreg; + } dev->video_dev->queue = &dev->vb2_vidq; dev->video_dev->device_caps = V4L2_CAP_READWRITE | V4L2_CAP_STREAMING | V4L2_CAP_AUDIO | V4L2_CAP_VIDEO_CAPTURE; @@ -1382,6 +1386,10 @@ int cx23885_video_register(struct cx23885_dev *dev) /* register VBI device */ dev->vbi_dev = cx23885_vdev_init(dev, dev->pci, &cx23885_vbi_template, "vbi"); + if (!dev->vbi_dev) { + err = -ENOMEM; + goto fail_unreg; + } dev->vbi_dev->queue = &dev->vb2_vbiq; dev->vbi_dev->device_caps = V4L2_CAP_READWRITE | V4L2_CAP_STREAMING | V4L2_CAP_AUDIO | V4L2_CAP_VBI_CAPTURE;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Max Filippov jcmvbkbc@gmail.com
[ Upstream commit 15fd1dc3dadb4268207fa6797e753541aca09a2a ]
Static FDPIC executable may get an executable stack even when it has non-executable GNU_STACK segment. This happens when STACK segment has rw permissions, but does not specify stack size. In that case FDPIC loader uses permissions of the interpreter's stack, and for static executables with no interpreter it results in choosing the arch-default permissions for the stack.
Fix that by using the interpreter's properties only when the interpreter is actually used.
Signed-off-by: Max Filippov jcmvbkbc@gmail.com Link: https://lore.kernel.org/r/20240118150637.660461-1-jcmvbkbc@gmail.com Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/binfmt_elf_fdpic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c index 2aecd4ffb13b3..c71a409273150 100644 --- a/fs/binfmt_elf_fdpic.c +++ b/fs/binfmt_elf_fdpic.c @@ -320,7 +320,7 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm) else executable_stack = EXSTACK_DEFAULT;
- if (stack_size == 0) { + if (stack_size == 0 && interp_params.flags & ELF_FDPIC_FLAG_PRESENT) { stack_size = interp_params.stack_size; if (interp_params.flags & ELF_FDPIC_FLAG_EXEC_STACK) executable_stack = EXSTACK_ENABLE_X;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Justin Tee justin.tee@broadcom.com
[ Upstream commit 3d0f9342ae200aa1ddc4d6e7a573c6f8f068d994 ]
A static code analyzer tool indicates that the local variable called status in the lpfc_sli4_repost_sgl_list() routine could be used to print garbage uninitialized values in the routine's log message.
Fix by initializing to zero.
Signed-off-by: Justin Tee justin.tee@broadcom.com Link: https://lore.kernel.org/r/20240131185112.149731-2-justintee8345@gmail.com Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/lpfc/lpfc_sli.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c index 47b8102a7063a..587e3c2f7c48c 100644 --- a/drivers/scsi/lpfc/lpfc_sli.c +++ b/drivers/scsi/lpfc/lpfc_sli.c @@ -7596,7 +7596,7 @@ lpfc_sli4_repost_sgl_list(struct lpfc_hba *phba, struct lpfc_sglq *sglq_entry = NULL; struct lpfc_sglq *sglq_entry_next = NULL; struct lpfc_sglq *sglq_entry_first = NULL; - int status, total_cnt; + int status = 0, total_cnt; int post_cnt = 0, num_posted = 0, block_cnt = 0; int last_xritag = NO_XRI; LIST_HEAD(prep_sgl_list);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Philipp Stanner pstanner@redhat.com
[ Upstream commit 102fb77c2deb0df3683ef8ff7a6f4cf91dc456e2 ]
At several positions in dvb_frontend.c, memdup_user() is utilized to copy userspace arrays. This is done without overflow checks.
Use the new wrapper memdup_array_user() to copy the arrays more safely.
Link: https://lore.kernel.org/linux-media/20231102191633.52592-2-pstanner@redhat.c... Suggested-by: Dave Airlie airlied@redhat.com Signed-off-by: Philipp Stanner pstanner@redhat.com Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/dvb-core/dvb_frontend.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/media/dvb-core/dvb_frontend.c b/drivers/media/dvb-core/dvb_frontend.c index fce0e20940780..a1a3dbb0e7388 100644 --- a/drivers/media/dvb-core/dvb_frontend.c +++ b/drivers/media/dvb-core/dvb_frontend.c @@ -2160,7 +2160,8 @@ static int dvb_frontend_handle_compat_ioctl(struct file *file, unsigned int cmd, if (!tvps->num || (tvps->num > DTV_IOCTL_MAX_MSGS)) return -EINVAL;
- tvp = memdup_user(compat_ptr(tvps->props), tvps->num * sizeof(*tvp)); + tvp = memdup_array_user(compat_ptr(tvps->props), + tvps->num, sizeof(*tvp)); if (IS_ERR(tvp)) return PTR_ERR(tvp);
@@ -2191,7 +2192,8 @@ static int dvb_frontend_handle_compat_ioctl(struct file *file, unsigned int cmd, if (!tvps->num || (tvps->num > DTV_IOCTL_MAX_MSGS)) return -EINVAL;
- tvp = memdup_user(compat_ptr(tvps->props), tvps->num * sizeof(*tvp)); + tvp = memdup_array_user(compat_ptr(tvps->props), + tvps->num, sizeof(*tvp)); if (IS_ERR(tvp)) return PTR_ERR(tvp);
@@ -2368,7 +2370,8 @@ static int dvb_get_property(struct dvb_frontend *fe, struct file *file, if (!tvps->num || tvps->num > DTV_IOCTL_MAX_MSGS) return -EINVAL;
- tvp = memdup_user((void __user *)tvps->props, tvps->num * sizeof(*tvp)); + tvp = memdup_array_user((void __user *)tvps->props, + tvps->num, sizeof(*tvp)); if (IS_ERR(tvp)) return PTR_ERR(tvp);
@@ -2446,7 +2449,8 @@ static int dvb_frontend_handle_ioctl(struct file *file, if (!tvps->num || (tvps->num > DTV_IOCTL_MAX_MSGS)) return -EINVAL;
- tvp = memdup_user((void __user *)tvps->props, tvps->num * sizeof(*tvp)); + tvp = memdup_array_user((void __user *)tvps->props, + tvps->num, sizeof(*tvp)); if (IS_ERR(tvp)) return PTR_ERR(tvp);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
[ Upstream commit 4bea747f3fbec33c16d369b2f51e55981d7c78d0 ]
Since NUM_XMIT_BUFFS is always 1, building m68k with sun3_defconfig and -Warraybounds, this build warning is visible[1]:
drivers/net/ethernet/i825xx/sun3_82586.c: In function 'sun3_82586_timeout': drivers/net/ethernet/i825xx/sun3_82586.c:990:122: warning: array subscript 1 is above array bounds of 'volatile struct transmit_cmd_struct *[1]' [-Warray-bounds=] 990 | printk("%s: command-stats: %04x %04x\n",dev->name,swab16(p->xmit_cmds[0]->cmd_status),swab16(p->xmit_cmds[1]->cmd_status)); | ~~~~~~~~~~~~^~~ ... drivers/net/ethernet/i825xx/sun3_82586.c:156:46: note: while referencing 'xmit_cmds' 156 | volatile struct transmit_cmd_struct *xmit_cmds[NUM_XMIT_BUFFS];
Avoid accessing index 1 since it doesn't exist.
Link: https://github.com/KSPP/linux/issues/325 [1] Cc: Sam Creasey sammy@sammy.net Signed-off-by: Kees Cook keescook@chromium.org Reviewed-by: Simon Horman horms@kernel.org Tested-by: Simon Horman horms@kernel.org # build-tested Reviewed-by: Gustavo A. R. Silva gustavoars@kernel.org Link: https://lore.kernel.org/r/20240206161651.work.876-kees@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/i825xx/sun3_82586.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/i825xx/sun3_82586.c b/drivers/net/ethernet/i825xx/sun3_82586.c index 3909c6a0af89f..72d3b5328ebb4 100644 --- a/drivers/net/ethernet/i825xx/sun3_82586.c +++ b/drivers/net/ethernet/i825xx/sun3_82586.c @@ -986,7 +986,7 @@ static void sun3_82586_timeout(struct net_device *dev, unsigned int txqueue) { #ifdef DEBUG printk("%s: xmitter timed out, try to restart! stat: %02x\n",dev->name,p->scb->cus); - printk("%s: command-stats: %04x %04x\n",dev->name,swab16(p->xmit_cmds[0]->cmd_status),swab16(p->xmit_cmds[1]->cmd_status)); + printk("%s: command-stats: %04x\n", dev->name, swab16(p->xmit_cmds[0]->cmd_status)); printk("%s: check, whether you set the right interrupt number!\n",dev->name); #endif sun3_82586_close(dev);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Erico Nunes nunes.erico@gmail.com
[ Upstream commit 27aa58ec85f973d98d336df7b7941149308db80f ]
This is required for reliable hard resets. Otherwise, doing a hard reset while a task is still running (such as a task which is being stopped by the drm_sched timeout handler) may result in random mmu write timeouts or lockups which cause the entire gpu to hang.
Signed-off-by: Erico Nunes nunes.erico@gmail.com Signed-off-by: Qiang Yu yuq825@gmail.com Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-5-nunes... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/lima/lima_gp.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/lima/lima_gp.c b/drivers/gpu/drm/lima/lima_gp.c index ca3842f719842..82071835ec9ed 100644 --- a/drivers/gpu/drm/lima/lima_gp.c +++ b/drivers/gpu/drm/lima/lima_gp.c @@ -166,6 +166,11 @@ static void lima_gp_task_run(struct lima_sched_pipe *pipe, gp_write(LIMA_GP_CMD, cmd); }
+static int lima_gp_bus_stop_poll(struct lima_ip *ip) +{ + return !!(gp_read(LIMA_GP_STATUS) & LIMA_GP_STATUS_BUS_STOPPED); +} + static int lima_gp_hard_reset_poll(struct lima_ip *ip) { gp_write(LIMA_GP_PERF_CNT_0_LIMIT, 0xC01A0000); @@ -179,6 +184,13 @@ static int lima_gp_hard_reset(struct lima_ip *ip)
gp_write(LIMA_GP_PERF_CNT_0_LIMIT, 0xC0FFE000); gp_write(LIMA_GP_INT_MASK, 0); + + gp_write(LIMA_GP_CMD, LIMA_GP_CMD_STOP_BUS); + ret = lima_poll_timeout(ip, lima_gp_bus_stop_poll, 10, 100); + if (ret) { + dev_err(dev->dev, "%s bus stop timeout\n", lima_ip_name(ip)); + return ret; + } gp_write(LIMA_GP_CMD, LIMA_GP_CMD_RESET); ret = lima_poll_timeout(ip, lima_gp_hard_reset_poll, 10, 100); if (ret) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Costa Shulyupin costa.shul@redhat.com
[ Upstream commit 56c2cb10120894be40c40a9bf0ce798da14c50f6 ]
During CPU-down hotplug, hrtimers may migrate to isolated CPUs, compromising CPU isolation.
Address this issue by masking valid CPUs for hrtimers using housekeeping_cpumask(HK_TYPE_TIMER).
Suggested-by: Waiman Long longman@redhat.com Signed-off-by: Costa Shulyupin costa.shul@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Waiman Long longman@redhat.com Link: https://lore.kernel.org/r/20240222200856.569036-1-costa.shul@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/time/hrtimer.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 9bb88836c42e6..314fb7598a879 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -38,6 +38,7 @@ #include <linux/sched/deadline.h> #include <linux/sched/nohz.h> #include <linux/sched/debug.h> +#include <linux/sched/isolation.h> #include <linux/timer.h> #include <linux/freezer.h> #include <linux/compat.h> @@ -2220,8 +2221,8 @@ static void migrate_hrtimer_list(struct hrtimer_clock_base *old_base,
int hrtimers_cpu_dying(unsigned int dying_cpu) { + int i, ncpu = cpumask_any_and(cpu_active_mask, housekeeping_cpumask(HK_TYPE_TIMER)); struct hrtimer_cpu_base *old_base, *new_base; - int i, ncpu = cpumask_first(cpu_active_mask);
tick_cancel_sched_timer(dying_cpu);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Hajnoczi stefanha@redhat.com
[ Upstream commit 40488cc16f7ea0d193a4e248f0d809c25cc377db ]
Newlines in virtiofs tags are awkward for users and potential vectors for string injection attacks.
Signed-off-by: Stefan Hajnoczi stefanha@redhat.com Reviewed-by: Vivek Goyal vgoyal@redhat.com Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/fuse/virtio_fs.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 4d8d4f16c727b..92d41269f1d35 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -323,6 +323,16 @@ static int virtio_fs_read_tag(struct virtio_device *vdev, struct virtio_fs *fs) return -ENOMEM; memcpy(fs->tag, tag_buf, len); fs->tag[len] = '\0'; + + /* While the VIRTIO specification allows any character, newlines are + * awkward on mount(8) command-lines and cause problems in the sysfs + * "tag" attr and uevent TAG= properties. Forbid them. + */ + if (strchr(fs->tag, '\n')) { + dev_dbg(&vdev->dev, "refusing virtiofs tag with newline character\n"); + return -EINVAL; + } + return 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Blumenstingl martin.blumenstingl@googlemail.com
[ Upstream commit e651f2fae33634175fae956d896277cf916f5d09 ]
The result of the division of new_rate by gt_target_rate can be zero (if new_rate is smaller than gt_target_rate). Using that result as divisor without checking can result in a division by zero error. Guard against this by checking for a zero value earlier. While here, also change the psv variable to an unsigned long to make sure we don't overflow the datatype as all other types involved are also unsiged long.
Signed-off-by: Martin Blumenstingl martin.blumenstingl@googlemail.com Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org Link: https://lore.kernel.org/r/20240225151336.2728533-3-martin.blumenstingl@googl... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clocksource/arm_global_timer.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/clocksource/arm_global_timer.c b/drivers/clocksource/arm_global_timer.c index e1c773bb55359..22a58d35a41fa 100644 --- a/drivers/clocksource/arm_global_timer.c +++ b/drivers/clocksource/arm_global_timer.c @@ -290,18 +290,17 @@ static int gt_clk_rate_change_cb(struct notifier_block *nb, switch (event) { case PRE_RATE_CHANGE: { - int psv; + unsigned long psv;
- psv = DIV_ROUND_CLOSEST(ndata->new_rate, - gt_target_rate); - - if (abs(gt_target_rate - (ndata->new_rate / psv)) > MAX_F_ERR) + psv = DIV_ROUND_CLOSEST(ndata->new_rate, gt_target_rate); + if (!psv || + abs(gt_target_rate - (ndata->new_rate / psv)) > MAX_F_ERR) return NOTIFY_BAD;
psv--;
/* prescaler within legal range? */ - if (psv < 0 || psv > GT_CONTROL_PRESCALER_MAX) + if (psv > GT_CONTROL_PRESCALER_MAX) return NOTIFY_BAD;
/*
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit b5590270068c4324dac4a2b5a4a156e02e21339f ]
__netlink_dump_start() releases nlk->cb_mutex right before calling netlink_dump() which grabs it again.
This seems dangerous, even if KASAN did not bother yet.
Add a @lock_taken parameter to netlink_dump() to let it grab the mutex if called from netlink_recvmsg() only.
Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Jiri Pirko jiri@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/netlink/af_netlink.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index e9b81cba1e2b4..8d26bd2ae3d55 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -130,7 +130,7 @@ static const char *const nlk_cb_mutex_key_strings[MAX_LINKS + 1] = { "nlk_cb_mutex-MAX_LINKS" };
-static int netlink_dump(struct sock *sk); +static int netlink_dump(struct sock *sk, bool lock_taken);
/* nl_table locking explained: * Lookup and traversal are protected with an RCU read-side lock. Insertion @@ -1953,7 +1953,7 @@ static int netlink_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
if (READ_ONCE(nlk->cb_running) && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) { - ret = netlink_dump(sk); + ret = netlink_dump(sk, false); if (ret) { WRITE_ONCE(sk->sk_err, -ret); sk_error_report(sk); @@ -2163,7 +2163,7 @@ static int netlink_dump_done(struct netlink_sock *nlk, struct sk_buff *skb, return 0; }
-static int netlink_dump(struct sock *sk) +static int netlink_dump(struct sock *sk, bool lock_taken) { struct netlink_sock *nlk = nlk_sk(sk); struct netlink_ext_ack extack = {}; @@ -2175,7 +2175,8 @@ static int netlink_dump(struct sock *sk) int alloc_min_size; int alloc_size;
- mutex_lock(nlk->cb_mutex); + if (!lock_taken) + mutex_lock(nlk->cb_mutex); if (!nlk->cb_running) { err = -EINVAL; goto errout_skb; @@ -2330,9 +2331,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb, WRITE_ONCE(nlk->cb_running, true); nlk->dump_done_errno = INT_MAX;
- mutex_unlock(nlk->cb_mutex); - - ret = netlink_dump(sk); + ret = netlink_dump(sk, true);
sock_put(sk);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Li Nan linan122@huawei.com
[ Upstream commit 9dd8702e7cd28ebf076ff838933f29cf671165ec ]
'disk->private_data' is set to mddev in md_alloc() and never set to NULL, and users need to open mddev before submitting ioctl. So mddev must not have been freed during ioctl, and there is no need to check mddev here. Clean up it.
Signed-off-by: Li Nan linan122@huawei.com Reviewed-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Song Liu song@kernel.org Link: https://lore.kernel.org/r/20240226031444.3606764-4-linan666@huaweicloud.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/md.c | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c index b87c6ef0da8ab..297c86f5c70b5 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -7614,11 +7614,6 @@ static int md_ioctl(struct block_device *bdev, fmode_t mode,
mddev = bdev->bd_disk->private_data;
- if (!mddev) { - BUG(); - goto out; - } - /* Some actions do not requires the mutex */ switch (cmd) { case GET_ARRAY_INFO:
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
[ Upstream commit 44c76825d6eefee9eb7ce06c38e1a6632ac7eb7d ]
In commit c1d171a00294 ("x86: randomize brk"), arch_randomize_brk() was defined to use a 32MB range (13 bits of entropy), but was never increased when moving to 64-bit. The default arch_randomize_brk() uses 32MB for 32-bit tasks, and 1GB (18 bits of entropy) for 64-bit tasks.
Update x86_64 to match the entropy used by arm64 and other 64-bit architectures.
Reported-by: y0un9n132@gmail.com Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Jiri Kosina jkosina@suse.com Closes: https://lore.kernel.org/linux-hardening/CA+2EKTVLvc8hDZc+2Yhwmus=dzOUG5E4gV7... Link: https://lore.kernel.org/r/20240217062545.1631668-1-keescook@chromium.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/process.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 279b5e9be80fc..acc83738bf5b4 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -991,7 +991,10 @@ unsigned long arch_align_stack(unsigned long sp)
unsigned long arch_randomize_brk(struct mm_struct *mm) { - return randomize_page(mm->brk, 0x02000000); + if (mmap_is_ia32()) + return randomize_page(mm->brk, SZ_32M); + + return randomize_page(mm->brk, SZ_1G); }
/*
Hi all,
this patch introduces a regression in some versions of qemu-aarch64 (at least as built by debian): https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1087822
It doesn't look like it still is a problem with newer versions of qemu so I'm not sure if this should be reverted on master, but it took me a bit of time to track this down to this commit as my reproducer isn't great, so it might make sense to revert this commit on stable branches?
(I don't remember the policies on "don't break userspace", but qemu-user is a bit of a special case here so I'll leave that up to Greg)
I've confirmed that this bug occurs on top of the latest v6.1.118 and goes away reverting this. (I've also checked the problem also occurs on master and reverting the patch also works around the issue there at this point)
Thank you, Dominique.
[leaving subject below for context, no more text after this] Greg Kroah-Hartman wrote on Tue, Aug 27, 2024 at 04:38:03PM +0200:
From: Kees Cook keescook@chromium.org
[ Upstream commit 44c76825d6eefee9eb7ce06c38e1a6632ac7eb7d ]
In commit c1d171a00294 ("x86: randomize brk"), arch_randomize_brk() was defined to use a 32MB range (13 bits of entropy), but was never increased when moving to 64-bit. The default arch_randomize_brk() uses 32MB for 32-bit tasks, and 1GB (18 bits of entropy) for 64-bit tasks.
Update x86_64 to match the entropy used by arm64 and other 64-bit architectures.
Reported-by: y0un9n132@gmail.com Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Jiri Kosina jkosina@suse.com Closes: https://lore.kernel.org/linux-hardening/CA+2EKTVLvc8hDZc+2Yhwmus=dzOUG5E4gV7... Link: https://lore.kernel.org/r/20240217062545.1631668-1-keescook@chromium.org Signed-off-by: Sasha Levin sashal@kernel.org
Hi,
On Wed, Nov 20, 2024 at 10:48:42AM +0900, Dominique Martinet wrote:
Hi all,
this patch introduces a regression in some versions of qemu-aarch64 (at least as built by debian): https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1087822
It doesn't look like it still is a problem with newer versions of qemu so I'm not sure if this should be reverted on master, but it took me a bit of time to track this down to this commit as my reproducer isn't great, so it might make sense to revert this commit on stable branches?
(I don't remember the policies on "don't break userspace", but qemu-user is a bit of a special case here so I'll leave that up to Greg)
I've confirmed that this bug occurs on top of the latest v6.1.118 and goes away reverting this. (I've also checked the problem also occurs on master and reverting the patch also works around the issue there at this point)
Interestigly there is another report in Debian which identifies the backport of upstream commit 44c76825d6eefee9eb7ce06c38e1a6632ac7eb7d to cause issues in the 6.1.y series:
https://bugs.debian.org/1085762 https://lore.kernel.org/regressions/18f34d636390454180240e6a61af9217@kumkeo....
Regards, Salvatore
Salvatore Bonaccorso wrote on Wed, Nov 20, 2024 at 08:01:23AM +0100:
Interestigly there is another report in Debian which identifies the backport of upstream commit 44c76825d6eefee9eb7ce06c38e1a6632ac7eb7d to cause issues in the 6.1.y series:
https://bugs.debian.org/1085762 https://lore.kernel.org/regressions/18f34d636390454180240e6a61af9217@kumkeo....
Thanks for the heads up!
This would appear to be the same bug (running qemu-aarch64 in user mode on an x86 machine) ? I've added Ulrich in recipients to confirm he's using qemu-user-static like I was.
Shame I didn't notice all his work, that would have saved me some time :)
Thanks,
Hi Dominique & all,
Salvatore Bonaccorso wrote on Wed, Nov 20, 2024 at 08:01:23AM +0100:
Interestigly there is another report in Debian which identifies the backport of upstream commit 44c76825d6eefee9eb7ce06c38e1a6632ac7eb7d to cause issues in the 6.1.y series:
https://bugs.debian.org/1085762 https://lore.kernel.org/regressions/18f34d636390454180240e6a61af9217@kumkeo....
Thanks for the heads up!
This would appear to be the same bug (running qemu-aarch64 in user mode on an x86 machine) ? I've added Ulrich in recipients to confirm he's using qemu-user-static like I was.
pbuilder uses qemu inside the ARM64 chroot, yes. I don't know which variety of qemu, though.
Shame I didn't notice all his work, that would have saved me some time :)
Well, at least we now have confirmation from two different git bisects done by two different developers, pointing to the same commit ;-)
HTH, Uli
Mit freundlichen Grüßen / Kind regards
Dipl.-Inform. Ulrich Teichert Senior Software Developer
e.bs kumkeo GmbH Heidenkampsweg 82a 20097 Hamburg Germany
T: +49 40 2846761-0 F: +49 40 2846761-99
ulrich.teichert@kumkeo.de www.kumkeo.de
Amtsgericht Hamburg / Hamburg District Court, HRB 187712 Geschäftsführer / Managing Director: Michael Leitner; Günter Hagspiel
On Wed, Nov 20, 2024 at 08:25:57AM +0000, Ulrich Teichert wrote:
Hi Dominique & all,
Salvatore Bonaccorso wrote on Wed, Nov 20, 2024 at 08:01:23AM +0100:
Interestigly there is another report in Debian which identifies the backport of upstream commit 44c76825d6eefee9eb7ce06c38e1a6632ac7eb7d to cause issues in the 6.1.y series:
https://bugs.debian.org/1085762 https://lore.kernel.org/regressions/18f34d636390454180240e6a61af9217@kumkeo....
Thanks for the heads up!
This would appear to be the same bug (running qemu-aarch64 in user mode on an x86 machine) ? I've added Ulrich in recipients to confirm he's using qemu-user-static like I was.
pbuilder uses qemu inside the ARM64 chroot, yes. I don't know which variety of qemu, though.
Shame I didn't notice all his work, that would have saved me some time :)
Well, at least we now have confirmation from two different git bisects done by two different developers, pointing to the same commit ;-)
It seems like the correct first step is to revert the brk change. It's still not clear to me why the change is causing a problem -- I assume it is colliding with some other program area.
Is the problem strictly with qemu-user-static? (i.e. it was GCC running in qemu-user-static so the crash is qemu, not GCC) That should help me narrow down the issue. There must be some built-in assumption. And it's aarch64 within x86_64 where it happens (is the program within qemu also aarch64 or is it arm32)?
Is there an simple reproducer?
-Kees
On Wed, 20 Nov 2024, Kees Cook wrote:
Interestigly there is another report in Debian which identifies the backport of upstream commit 44c76825d6eefee9eb7ce06c38e1a6632ac7eb7d to cause issues in the 6.1.y series:
https://bugs.debian.org/1085762 https://lore.kernel.org/regressions/18f34d636390454180240e6a61af9217@kumkeo....
Thanks for the heads up!
This would appear to be the same bug (running qemu-aarch64 in user mode on an x86 machine) ? I've added Ulrich in recipients to confirm he's using qemu-user-static like I was.
pbuilder uses qemu inside the ARM64 chroot, yes. I don't know which variety of qemu, though.
Shame I didn't notice all his work, that would have saved me some time :)
Well, at least we now have confirmation from two different git bisects done by two different developers, pointing to the same commit ;-)
It seems like the correct first step is to revert the brk change. It's still not clear to me why the change is causing a problem -- I assume it is colliding with some other program area.
Is the problem strictly with qemu-user-static? (i.e. it was GCC running in qemu-user-static so the crash is qemu, not GCC) That should help me narrow down the issue. There must be some built-in assumption. And it's aarch64 within x86_64 where it happens (is the program within qemu also aarch64 or is it arm32)?
Is there an simple reproducer?
Also seeing /proc/<pid>/maps from a working case (without the patch) might give some clue.
Thanks,
Thank you for the replies,
Kees Cook wrote on Wed, Nov 20, 2024 at 10:08:04AM -0800:
It seems like the correct first step is to revert the brk change. It's still not clear to me why the change is causing a problem -- I assume it is colliding with some other program area.
Is the problem strictly with qemu-user-static? (i.e. it was GCC running in qemu-user-static so the crash is qemu, not GCC) That should help me narrow down the issue. There must be some built-in assumption. And it's aarch64 within x86_64 where it happens (is the program within qemu also aarch64 or is it arm32)?
As far as I'm aware I've only seen qemu-user-static fail for aarch64 (e.g. the arm32 variant works fine, didn't try other arches) in this particular configuration (the debian bookworm package has been built with --static-pie which is known to cause problems)
It's also fixed in newer versions of qemu, even with --static-pie, because they reworked the way they detect program mapppins in qemu commit dd55885516 ("linux-user: Rewrite non-fixed probe_guest_base") and that also fixed the issue (in qemu 8.1.0)
So, in short there are many fixes available; it's a qemu bug that assumed something about the memory layout and broke with this kaslr patch (and for some reason only happened on non-pie static build)
mjt will at the very least rebuild the package with pie enabled, because it's known to cause other issues with aarch64 and that was an oversight in the first place, so this issue will go away for debian without any further work.
This is the background behind me saying that this probably should be reverted in stable branches (to avoid other surprises with old userspace), but master can probably keep this commit if it brings tangible security benefits (and I think it does)
See the debian qemu-side of the bug for details: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1087822
Is there an simple reproducer?
Unfortunately I couldn't get it to reproduce easily, but I think it's just a matter of finding the right binary that does problematic mappings (e.g. running true in a loop didn't work but running gcc in a loop did)
The most reliable reproducer I've been using is building a reasonably large program, we were working on modemmanager at the time so that's what I used; if usually fails between 100-300/500 of the build: ---- $ docker run -ti --rm --platform linux/arm64/v8 docker.io/arm64v8/alpine:3.20 sh / # apk add bash-completion-dev dbus-dev elogind-dev gobject-introspection-dev gtk-doc libgudev-dev libmbim-dev libqmi-dev linux-headers meson vala clang abuild alpine-sdk curl / # curl -O https://gitlab.freedesktop.org/mobile-broadband/ModemManager/-/archive/1.22.... / # tar xf ModemManager-1.22.0.tar.gz / # cd ModemManager-1.22.0/ / # abuild-meson \ -Db_lto=true \ -Dsystemdsystemunitdir=no \ -Ddbus_policy_dir=/usr/share/dbus-1/system.d \ -Dgtk_doc=true \ -Dsystemd_journal=false \ -Dsystemd_suspend_resume=true \ -Dvapi=true \ -Dpolkit=no \ . output / # meson compile -C output ----
If you take any of the command that failed from this build and run it in a loop, it'll also eventually fail after a couple hundred of invocations, but if your loop doesn't involve any parallelism that'll be slower to reproduce.
Note it requires qemu to be broken as well, so you'll have best chances with a debian bookworm (VM is fine; a chroot or podman instead of docker is also fine; in the chroot case ninja requires at least /dev (and possibly /proc) mounted)
Thanks,
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe Kerello christophe.kerello@foss.st.com
[ Upstream commit 722463f73bcf65a8c818752a38c14ee672c77da1 ]
Check regmap_read return value to avoid to use uninitialized local variables.
Signed-off-by: Christophe Kerello christophe.kerello@foss.st.com Link: https://lore.kernel.org/r/20240226101428.37791-3-christophe.kerello@foss.st.... Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/memory/stm32-fmc2-ebi.c | 122 +++++++++++++++++++++++--------- 1 file changed, 88 insertions(+), 34 deletions(-)
diff --git a/drivers/memory/stm32-fmc2-ebi.c b/drivers/memory/stm32-fmc2-ebi.c index ffec26a99313b..5c387d32c078f 100644 --- a/drivers/memory/stm32-fmc2-ebi.c +++ b/drivers/memory/stm32-fmc2-ebi.c @@ -179,8 +179,11 @@ static int stm32_fmc2_ebi_check_mux(struct stm32_fmc2_ebi *ebi, int cs) { u32 bcr; + int ret;
- regmap_read(ebi->regmap, FMC2_BCR(cs), &bcr); + ret = regmap_read(ebi->regmap, FMC2_BCR(cs), &bcr); + if (ret) + return ret;
if (bcr & FMC2_BCR_MTYP) return 0; @@ -193,8 +196,11 @@ static int stm32_fmc2_ebi_check_waitcfg(struct stm32_fmc2_ebi *ebi, int cs) { u32 bcr, val = FIELD_PREP(FMC2_BCR_MTYP, FMC2_BCR_MTYP_NOR); + int ret;
- regmap_read(ebi->regmap, FMC2_BCR(cs), &bcr); + ret = regmap_read(ebi->regmap, FMC2_BCR(cs), &bcr); + if (ret) + return ret;
if ((bcr & FMC2_BCR_MTYP) == val && bcr & FMC2_BCR_BURSTEN) return 0; @@ -207,8 +213,11 @@ static int stm32_fmc2_ebi_check_sync_trans(struct stm32_fmc2_ebi *ebi, int cs) { u32 bcr; + int ret;
- regmap_read(ebi->regmap, FMC2_BCR(cs), &bcr); + ret = regmap_read(ebi->regmap, FMC2_BCR(cs), &bcr); + if (ret) + return ret;
if (bcr & FMC2_BCR_BURSTEN) return 0; @@ -221,8 +230,11 @@ static int stm32_fmc2_ebi_check_async_trans(struct stm32_fmc2_ebi *ebi, int cs) { u32 bcr; + int ret;
- regmap_read(ebi->regmap, FMC2_BCR(cs), &bcr); + ret = regmap_read(ebi->regmap, FMC2_BCR(cs), &bcr); + if (ret) + return ret;
if (!(bcr & FMC2_BCR_BURSTEN) || !(bcr & FMC2_BCR_CBURSTRW)) return 0; @@ -235,8 +247,11 @@ static int stm32_fmc2_ebi_check_cpsize(struct stm32_fmc2_ebi *ebi, int cs) { u32 bcr, val = FIELD_PREP(FMC2_BCR_MTYP, FMC2_BCR_MTYP_PSRAM); + int ret;
- regmap_read(ebi->regmap, FMC2_BCR(cs), &bcr); + ret = regmap_read(ebi->regmap, FMC2_BCR(cs), &bcr); + if (ret) + return ret;
if ((bcr & FMC2_BCR_MTYP) == val && bcr & FMC2_BCR_BURSTEN) return 0; @@ -249,12 +264,18 @@ static int stm32_fmc2_ebi_check_address_hold(struct stm32_fmc2_ebi *ebi, int cs) { u32 bcr, bxtr, val = FIELD_PREP(FMC2_BXTR_ACCMOD, FMC2_BXTR_EXTMOD_D); + int ret; + + ret = regmap_read(ebi->regmap, FMC2_BCR(cs), &bcr); + if (ret) + return ret;
- regmap_read(ebi->regmap, FMC2_BCR(cs), &bcr); if (prop->reg_type == FMC2_REG_BWTR) - regmap_read(ebi->regmap, FMC2_BWTR(cs), &bxtr); + ret = regmap_read(ebi->regmap, FMC2_BWTR(cs), &bxtr); else - regmap_read(ebi->regmap, FMC2_BTR(cs), &bxtr); + ret = regmap_read(ebi->regmap, FMC2_BTR(cs), &bxtr); + if (ret) + return ret;
if ((!(bcr & FMC2_BCR_BURSTEN) || !(bcr & FMC2_BCR_CBURSTRW)) && ((bxtr & FMC2_BXTR_ACCMOD) == val || bcr & FMC2_BCR_MUXEN)) @@ -268,12 +289,19 @@ static int stm32_fmc2_ebi_check_clk_period(struct stm32_fmc2_ebi *ebi, int cs) { u32 bcr, bcr1; + int ret;
- regmap_read(ebi->regmap, FMC2_BCR(cs), &bcr); - if (cs) - regmap_read(ebi->regmap, FMC2_BCR1, &bcr1); - else + ret = regmap_read(ebi->regmap, FMC2_BCR(cs), &bcr); + if (ret) + return ret; + + if (cs) { + ret = regmap_read(ebi->regmap, FMC2_BCR1, &bcr1); + if (ret) + return ret; + } else { bcr1 = bcr; + }
if (bcr & FMC2_BCR_BURSTEN && (!cs || !(bcr1 & FMC2_BCR1_CCLKEN))) return 0; @@ -305,12 +333,18 @@ static u32 stm32_fmc2_ebi_ns_to_clk_period(struct stm32_fmc2_ebi *ebi, { u32 nb_clk_cycles = stm32_fmc2_ebi_ns_to_clock_cycles(ebi, cs, setup); u32 bcr, btr, clk_period; + int ret; + + ret = regmap_read(ebi->regmap, FMC2_BCR1, &bcr); + if (ret) + return ret;
- regmap_read(ebi->regmap, FMC2_BCR1, &bcr); if (bcr & FMC2_BCR1_CCLKEN || !cs) - regmap_read(ebi->regmap, FMC2_BTR1, &btr); + ret = regmap_read(ebi->regmap, FMC2_BTR1, &btr); else - regmap_read(ebi->regmap, FMC2_BTR(cs), &btr); + ret = regmap_read(ebi->regmap, FMC2_BTR(cs), &btr); + if (ret) + return ret;
clk_period = FIELD_GET(FMC2_BTR_CLKDIV, btr) + 1;
@@ -569,11 +603,16 @@ static int stm32_fmc2_ebi_set_address_setup(struct stm32_fmc2_ebi *ebi, if (ret) return ret;
- regmap_read(ebi->regmap, FMC2_BCR(cs), &bcr); + ret = regmap_read(ebi->regmap, FMC2_BCR(cs), &bcr); + if (ret) + return ret; + if (prop->reg_type == FMC2_REG_BWTR) - regmap_read(ebi->regmap, FMC2_BWTR(cs), &bxtr); + ret = regmap_read(ebi->regmap, FMC2_BWTR(cs), &bxtr); else - regmap_read(ebi->regmap, FMC2_BTR(cs), &bxtr); + ret = regmap_read(ebi->regmap, FMC2_BTR(cs), &bxtr); + if (ret) + return ret;
if ((bxtr & FMC2_BXTR_ACCMOD) == val || bcr & FMC2_BCR_MUXEN) val = clamp_val(setup, 1, FMC2_BXTR_ADDSET_MAX); @@ -691,11 +730,14 @@ static int stm32_fmc2_ebi_set_max_low_pulse(struct stm32_fmc2_ebi *ebi, int cs, u32 setup) { u32 old_val, new_val, pcscntr; + int ret;
if (setup < 1) return 0;
- regmap_read(ebi->regmap, FMC2_PCSCNTR, &pcscntr); + ret = regmap_read(ebi->regmap, FMC2_PCSCNTR, &pcscntr); + if (ret) + return ret;
/* Enable counter for the bank */ regmap_update_bits(ebi->regmap, FMC2_PCSCNTR, @@ -942,17 +984,20 @@ static void stm32_fmc2_ebi_disable_bank(struct stm32_fmc2_ebi *ebi, int cs) regmap_update_bits(ebi->regmap, FMC2_BCR(cs), FMC2_BCR_MBKEN, 0); }
-static void stm32_fmc2_ebi_save_setup(struct stm32_fmc2_ebi *ebi) +static int stm32_fmc2_ebi_save_setup(struct stm32_fmc2_ebi *ebi) { unsigned int cs; + int ret;
for (cs = 0; cs < FMC2_MAX_EBI_CE; cs++) { - regmap_read(ebi->regmap, FMC2_BCR(cs), &ebi->bcr[cs]); - regmap_read(ebi->regmap, FMC2_BTR(cs), &ebi->btr[cs]); - regmap_read(ebi->regmap, FMC2_BWTR(cs), &ebi->bwtr[cs]); + ret = regmap_read(ebi->regmap, FMC2_BCR(cs), &ebi->bcr[cs]); + ret |= regmap_read(ebi->regmap, FMC2_BTR(cs), &ebi->btr[cs]); + ret |= regmap_read(ebi->regmap, FMC2_BWTR(cs), &ebi->bwtr[cs]); + if (ret) + return ret; }
- regmap_read(ebi->regmap, FMC2_PCSCNTR, &ebi->pcscntr); + return regmap_read(ebi->regmap, FMC2_PCSCNTR, &ebi->pcscntr); }
static void stm32_fmc2_ebi_set_setup(struct stm32_fmc2_ebi *ebi) @@ -981,22 +1026,29 @@ static void stm32_fmc2_ebi_disable_banks(struct stm32_fmc2_ebi *ebi) }
/* NWAIT signal can not be connected to EBI controller and NAND controller */ -static bool stm32_fmc2_ebi_nwait_used_by_ctrls(struct stm32_fmc2_ebi *ebi) +static int stm32_fmc2_ebi_nwait_used_by_ctrls(struct stm32_fmc2_ebi *ebi) { + struct device *dev = ebi->dev; unsigned int cs; u32 bcr; + int ret;
for (cs = 0; cs < FMC2_MAX_EBI_CE; cs++) { if (!(ebi->bank_assigned & BIT(cs))) continue;
- regmap_read(ebi->regmap, FMC2_BCR(cs), &bcr); + ret = regmap_read(ebi->regmap, FMC2_BCR(cs), &bcr); + if (ret) + return ret; + if ((bcr & FMC2_BCR_WAITEN || bcr & FMC2_BCR_ASYNCWAIT) && - ebi->bank_assigned & BIT(FMC2_NAND)) - return true; + ebi->bank_assigned & BIT(FMC2_NAND)) { + dev_err(dev, "NWAIT signal connected to EBI and NAND controllers\n"); + return -EINVAL; + } }
- return false; + return 0; }
static void stm32_fmc2_ebi_enable(struct stm32_fmc2_ebi *ebi) @@ -1083,10 +1135,9 @@ static int stm32_fmc2_ebi_parse_dt(struct stm32_fmc2_ebi *ebi) return -ENODEV; }
- if (stm32_fmc2_ebi_nwait_used_by_ctrls(ebi)) { - dev_err(dev, "NWAIT signal connected to EBI and NAND controllers\n"); - return -EINVAL; - } + ret = stm32_fmc2_ebi_nwait_used_by_ctrls(ebi); + if (ret) + return ret;
stm32_fmc2_ebi_enable(ebi);
@@ -1131,7 +1182,10 @@ static int stm32_fmc2_ebi_probe(struct platform_device *pdev) if (ret) goto err_release;
- stm32_fmc2_ebi_save_setup(ebi); + ret = stm32_fmc2_ebi_save_setup(ebi); + if (ret) + goto err_release; + platform_set_drvdata(pdev, ebi);
return 0;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Helge Deller deller@gmx.de
[ Upstream commit 73cb4a2d8d7e0259f94046116727084f21e4599f ]
Use irq*_rcu() functions to fix this kernel warning:
WARNING: CPU: 0 PID: 0 at kernel/context_tracking.c:367 ct_irq_enter+0xa0/0xd0 Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc3-64bit+ #1037 Hardware name: 9000/785/C3700
IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000412cd758 00000000412cd75c IIR: 03ffe01f ISR: 0000000000000000 IOR: 0000000043c20c20 CPU: 0 CR30: 0000000041caa000 CR31: 0000000000000000 ORIG_R28: 0000000000000005 IAOQ[0]: ct_irq_enter+0xa0/0xd0 IAOQ[1]: ct_irq_enter+0xa4/0xd0 RP(r2): irq_enter+0x34/0x68 Backtrace: [<000000004034a3ec>] irq_enter+0x34/0x68 [<000000004030dc48>] do_cpu_irq_mask+0xc0/0x450 [<0000000040303070>] intr_return+0x0/0xc
Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/parisc/kernel/irq.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/parisc/kernel/irq.c b/arch/parisc/kernel/irq.c index 9ddb2e3970589..b481cde6bfb62 100644 --- a/arch/parisc/kernel/irq.c +++ b/arch/parisc/kernel/irq.c @@ -501,7 +501,7 @@ void do_cpu_irq_mask(struct pt_regs *regs)
old_regs = set_irq_regs(regs); local_irq_disable(); - irq_enter(); + irq_enter_rcu();
eirr_val = mfctl(23) & cpu_eiem & per_cpu(local_ack_eiem, cpu); if (!eirr_val) @@ -536,7 +536,7 @@ void do_cpu_irq_mask(struct pt_regs *regs) #endif /* CONFIG_IRQSTACKS */
out: - irq_exit(); + irq_exit_rcu(); set_irq_regs(old_regs); return;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Li zeming zeming@nfschina.com
[ Upstream commit 69b0194ccec033c208b071e019032c1919c2822d ]
simple_malloc() will return NULL when there is not enough memory left. Check pointer 'new' before using it to copy the old data.
Signed-off-by: Li zeming zeming@nfschina.com [mpe: Reword subject, use change log from Christophe] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20221219021816.3012-1-zeming@nfschina.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/boot/simple_alloc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/boot/simple_alloc.c b/arch/powerpc/boot/simple_alloc.c index 267d6524caac4..db9aaa5face3f 100644 --- a/arch/powerpc/boot/simple_alloc.c +++ b/arch/powerpc/boot/simple_alloc.c @@ -112,7 +112,9 @@ static void *simple_realloc(void *ptr, unsigned long size) return ptr;
new = simple_malloc(size); - memcpy(new, ptr, p->size); + if (new) + memcpy(new, ptr, p->size); + simple_free(ptr); return new; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Ellerman mpe@ellerman.id.au
[ Upstream commit f2d5bccaca3e8c09c9b9c8485375f7bdbb2631d2 ]
simple_realloc() frees the original buffer (ptr) even if the reallocation failed.
Fix it to behave like standard realloc() and only free the original buffer if the reallocation succeeded.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20240229115149.749264-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/boot/simple_alloc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/boot/simple_alloc.c b/arch/powerpc/boot/simple_alloc.c index db9aaa5face3f..d07796fdf91aa 100644 --- a/arch/powerpc/boot/simple_alloc.c +++ b/arch/powerpc/boot/simple_alloc.c @@ -112,10 +112,11 @@ static void *simple_realloc(void *ptr, unsigned long size) return ptr;
new = simple_malloc(size); - if (new) + if (new) { memcpy(new, ptr, p->size); + simple_free(ptr); + }
- simple_free(ptr); return new; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Sterba dsterba@suse.com
[ Upstream commit 778e618b8bfedcc39354373c1b072c5fe044fa7b ]
There's a BUG_ON checking for a valid pointer of fs_info::delayed_root but it is valid since init_mount_fs_info() and has the same lifetime as fs_info.
Reviewed-by: Josef Bacik josef@toxicpanda.com Reviewed-by: Anand Jain anand.jain@oracle.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/delayed-inode.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index 1494ce990d298..948104332b4da 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -420,8 +420,6 @@ static void __btrfs_remove_delayed_item(struct btrfs_delayed_item *delayed_item)
delayed_root = delayed_node->root->fs_info->delayed_root;
- BUG_ON(!delayed_root); - if (delayed_item->type == BTRFS_DELAYED_INSERTION_ITEM) root = &delayed_node->ins_root; else
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Sterba dsterba@suse.com
[ Upstream commit be73f4448b607e6b7ce41cd8ef2214fdf6e7986f ]
The pointer to root is initialized in btrfs_init_delayed_node(), no need to check for it again. Change the BUG_ON to assertion.
Reviewed-by: Josef Bacik josef@toxicpanda.com Reviewed-by: Anand Jain anand.jain@oracle.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/delayed-inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index 948104332b4da..052112d0daa74 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -968,7 +968,7 @@ static void btrfs_release_delayed_inode(struct btrfs_delayed_node *delayed_node)
if (delayed_node && test_bit(BTRFS_DELAYED_NODE_INODE_DIRTY, &delayed_node->flags)) { - BUG_ON(!delayed_node->root); + ASSERT(delayed_node->root); clear_bit(BTRFS_DELAYED_NODE_INODE_DIRTY, &delayed_node->flags); delayed_node->count--;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Sterba dsterba@suse.com
[ Upstream commit b2136cc288fce2f24a92f3d656531b2d50ebec5a ]
Allocate fs_info and root to have a valid fs_info pointer in case it's dereferenced by a helper outside of tests, like find_lock_delalloc_range().
Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/tests/extent-io-tests.c | 28 ++++++++++++++++++++++++---- 1 file changed, 24 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c index 350da449db084..d6a5e6afd5dc0 100644 --- a/fs/btrfs/tests/extent-io-tests.c +++ b/fs/btrfs/tests/extent-io-tests.c @@ -11,6 +11,7 @@ #include "btrfs-tests.h" #include "../ctree.h" #include "../extent_io.h" +#include "../disk-io.h" #include "../btrfs_inode.h"
#define PROCESS_UNLOCK (1 << 0) @@ -105,9 +106,11 @@ static void dump_extent_io_tree(const struct extent_io_tree *tree) } }
-static int test_find_delalloc(u32 sectorsize) +static int test_find_delalloc(u32 sectorsize, u32 nodesize) { - struct inode *inode; + struct btrfs_fs_info *fs_info; + struct btrfs_root *root = NULL; + struct inode *inode = NULL; struct extent_io_tree *tmp; struct page *page; struct page *locked_page = NULL; @@ -121,12 +124,27 @@ static int test_find_delalloc(u32 sectorsize)
test_msg("running find delalloc tests");
+ fs_info = btrfs_alloc_dummy_fs_info(nodesize, sectorsize); + if (!fs_info) { + test_std_err(TEST_ALLOC_FS_INFO); + return -ENOMEM; + } + + root = btrfs_alloc_dummy_root(fs_info); + if (IS_ERR(root)) { + test_std_err(TEST_ALLOC_ROOT); + ret = PTR_ERR(root); + goto out; + } + inode = btrfs_new_test_inode(); if (!inode) { test_std_err(TEST_ALLOC_INODE); - return -ENOMEM; + ret = -ENOMEM; + goto out; } tmp = &BTRFS_I(inode)->io_tree; + BTRFS_I(inode)->root = root;
/* * Passing NULL as we don't have fs_info but tracepoints are not used @@ -316,6 +334,8 @@ static int test_find_delalloc(u32 sectorsize) process_page_range(inode, 0, total_dirty - 1, PROCESS_UNLOCK | PROCESS_RELEASE); iput(inode); + btrfs_free_dummy_root(root); + btrfs_free_dummy_fs_info(fs_info); return ret; }
@@ -598,7 +618,7 @@ int btrfs_test_extent_io(u32 sectorsize, u32 nodesize)
test_msg("running extent I/O tests");
- ret = test_find_delalloc(sectorsize); + ret = test_find_delalloc(sectorsize, nodesize); if (ret) goto out;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Sterba dsterba@suse.com
[ Upstream commit 6fbc6f4ac1f4907da4fc674251527e7dc79ffbf6 ]
The may_destroy_subvol() looks up a root by a key, allowing to do an inexact search when key->offset is -1. It's never expected to find such item, as it would break the allowed range of a root id.
Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/inode.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 10ded9c2be03b..bd3388e1b532e 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4614,7 +4614,14 @@ static noinline int may_destroy_subvol(struct btrfs_root *root) ret = btrfs_search_slot(NULL, fs_info->tree_root, &key, path, 0, 0); if (ret < 0) goto out; - BUG_ON(ret == 0); + if (ret == 0) { + /* + * Key with offset -1 found, there would have to exist a root + * with such id, but this is out of valid range. + */ + ret = -EUCLEAN; + goto out; + }
ret = 0; if (path->slots[0] > 0) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Sterba dsterba@suse.com
[ Upstream commit e80e3f732cf53c64b0d811e1581470d67f6c3228 ]
Change BUG_ON to a proper error handling in the unlikely case of seeing data when the command is started. This is supposed to be reset when the command is finished (send_cmd, send_encoded_extent).
Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/send.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index ec3db315f5618..cfbd3ab679117 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -720,7 +720,12 @@ static int begin_cmd(struct send_ctx *sctx, int cmd) if (WARN_ON(!sctx->send_buf)) return -EINVAL;
- BUG_ON(sctx->send_size); + if (unlikely(sctx->send_size != 0)) { + btrfs_err(sctx->send_root->fs_info, + "send: command header buffer not empty cmd %d offset %llu", + cmd, sctx->send_off); + return -EINVAL; + }
sctx->send_size += sizeof(*hdr); hdr = (struct btrfs_cmd_header *)sctx->send_buf;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Sterba dsterba@suse.com
[ Upstream commit 56f335e043ae73c32dbb70ba95488845dc0f1e6e ]
There's only one caller of tree_move_down() that does not pass level 0 so the assertion is better suited here.
Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/send.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index cfbd3ab679117..cc57a97860d8a 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -7185,8 +7185,8 @@ static int tree_move_down(struct btrfs_path *path, int *level, u64 reada_min_gen u64 reada_done = 0;
lockdep_assert_held_read(&parent->fs_info->commit_root_sem); + ASSERT(*level != 0);
- BUG_ON(*level == 0); eb = btrfs_read_node_slot(parent, slot); if (IS_ERR(eb)) return PTR_ERR(eb);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Sterba dsterba@suse.com
[ Upstream commit f40a3ea94881f668084f68f6b9931486b1606db0 ]
The BUG_ON is deep in the qgroup code where we can expect that it exists. A NULL pointer would cause a crash.
It was added long ago in 550d7a2ed5db35 ("btrfs: qgroup: Add new qgroup calculation function btrfs_qgroup_account_extents()."). It maybe made sense back then as the quota enable/disable state machine was not that robust as it is nowadays, so we can just delete it.
Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/qgroup.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index e482889667ec9..f3b066b442807 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -2697,8 +2697,6 @@ int btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans, u64 bytenr, if (nr_old_roots == 0 && nr_new_roots == 0) goto out_free;
- BUG_ON(!fs_info->quota_root); - trace_btrfs_qgroup_account_extent(fs_info, trans->transid, bytenr, num_bytes, nr_old_roots, nr_new_roots);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhiguo Niu zhiguo.niu@unisoc.com
[ Upstream commit 36959d18c3cf09b3c12157c6950e18652067de77 ]
If GET_SEGNO return NULL_SEGNO for some unecpected case, update_sit_entry will access invalid memory address, cause system crash. It is better to do sanity check about GET_SEGNO just like update_segment_mtime & locate_dirty_segment.
Also remove some redundant judgment code.
Signed-off-by: Zhiguo Niu zhiguo.niu@unisoc.com Reviewed-by: Chao Yu chao@kernel.org Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/f2fs/segment.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 1264a350d4d75..947849e66b0a7 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2191,6 +2191,8 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del) #endif
segno = GET_SEGNO(sbi, blkaddr); + if (segno == NULL_SEGNO) + return;
se = get_seg_entry(sbi, segno); new_vblocks = se->valid_blocks + del; @@ -3286,8 +3288,7 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page, * since SSR needs latest valid block information. */ update_sit_entry(sbi, *new_blkaddr, 1); - if (GET_SEGNO(sbi, old_blkaddr) != NULL_SEGNO) - update_sit_entry(sbi, old_blkaddr, -1); + update_sit_entry(sbi, old_blkaddr, -1);
if (!__has_curseg_space(sbi, curseg)) { if (from_gc)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Uwe Kleine-König u.kleine-koenig@pengutronix.de
[ Upstream commit 87850f6cc20911e35eafcbc1d56b0d649ae9162d ]
This fixes a W=1 warning about sprintf writing up to 16 bytes into a buffer of size 14. There is no practical relevance because there are not more than 32 endpoints.
Signed-off-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Link: https://lore.kernel.org/r/6754df25c56aae04f8110594fad2cd2452b1862a.170870912... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/gadget/udc/fsl_udc_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/udc/fsl_udc_core.c b/drivers/usb/gadget/udc/fsl_udc_core.c index a67873a074b7b..c1a62ebd78d66 100644 --- a/drivers/usb/gadget/udc/fsl_udc_core.c +++ b/drivers/usb/gadget/udc/fsl_udc_core.c @@ -2487,7 +2487,7 @@ static int fsl_udc_probe(struct platform_device *pdev) /* setup the udc->eps[] for non-control endpoints and link * to gadget.ep_list */ for (i = 1; i < (int)(udc_controller->max_ep / 2); i++) { - char name[14]; + char name[16];
sprintf(name, "ep%dout", i); struct_ep_setup(udc_controller, i * 2, name, 1);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Keith Busch kbusch@kernel.org
[ Upstream commit 7e80eb792bd7377a20f204943ac31c77d859be89 ]
The memory allocated for the identification is freed on failure. Set it to NULL so the caller doesn't have a pointer to that freed address.
Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/core.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 1aff793a1d77e..0729ab5430725 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1366,8 +1366,10 @@ static int nvme_identify_ctrl(struct nvme_ctrl *dev, struct nvme_id_ctrl **id)
error = nvme_submit_sync_cmd(dev->admin_q, &c, *id, sizeof(struct nvme_id_ctrl)); - if (error) + if (error) { kfree(*id); + *id = NULL; + } return error; }
@@ -1496,6 +1498,7 @@ static int nvme_identify_ns(struct nvme_ctrl *ctrl, unsigned nsid, if (error) { dev_warn(ctrl->device, "Identify namespace failed (%d)\n", error); kfree(*id); + *id = NULL; } return error; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 0f0639b4d6f649338ce29c62da3ec0787fa08cd1 ]
This fixes attempting to access past ethhdr.h_source, although it seems intentional to copy also the contents of h_proto this triggers out-of-bound access problems with the likes of static analyzer, so this instead just copy ETH_ALEN and then proceed to use put_unaligned to copy h_proto separetely.
Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/bnep/core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/bluetooth/bnep/core.c b/net/bluetooth/bnep/core.c index 5a6a49885ab66..a660c428e2207 100644 --- a/net/bluetooth/bnep/core.c +++ b/net/bluetooth/bnep/core.c @@ -385,7 +385,8 @@ static int bnep_rx_frame(struct bnep_session *s, struct sk_buff *skb)
case BNEP_COMPRESSED_DST_ONLY: __skb_put_data(nskb, skb_mac_header(skb), ETH_ALEN); - __skb_put_data(nskb, s->eh.h_source, ETH_ALEN + 2); + __skb_put_data(nskb, s->eh.h_source, ETH_ALEN); + put_unaligned(s->eh.h_proto, (__be16 *)__skb_put(nskb, 2)); break;
case BNEP_GENERAL:
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit 66626b15636b5f5cf3d7f6104799f77462748974 ]
Initialize debugfs_root to -ENODEV so that if the client never sets a valid debugfs root the debugfs files will not be created.
A NULL pointer passed to any of the debugfs_create_*() functions means "create in the root of debugfs". It doesn't mean "ignore".
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Link: https://msgid.link/r/20240307105353.40067-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/cirrus/cs_dsp.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/cirrus/cs_dsp.c b/drivers/firmware/cirrus/cs_dsp.c index ee4c32669607f..68005cce01360 100644 --- a/drivers/firmware/cirrus/cs_dsp.c +++ b/drivers/firmware/cirrus/cs_dsp.c @@ -490,7 +490,7 @@ void cs_dsp_cleanup_debugfs(struct cs_dsp *dsp) { cs_dsp_debugfs_clear(dsp); debugfs_remove_recursive(dsp->debugfs_root); - dsp->debugfs_root = NULL; + dsp->debugfs_root = ERR_PTR(-ENODEV); } EXPORT_SYMBOL_GPL(cs_dsp_cleanup_debugfs); #else @@ -2300,6 +2300,11 @@ static int cs_dsp_common_init(struct cs_dsp *dsp)
mutex_init(&dsp->pwr_lock);
+#ifdef CONFIG_DEBUG_FS + /* Ensure this is invalid if client never provides a debugfs root */ + dsp->debugfs_root = ERR_PTR(-ENODEV); +#endif + return 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Belloni alexandre.belloni@bootlin.com
[ Upstream commit babfeb9cbe7ebc657bd5b3e4f9fde79f560b6acc ]
alarm_enable and alarm_flag are allowed to be NULL but will be dereferenced later by the dev_dbg call.
Reported-by: kernel test robot lkp@intel.com Reported-by: Dan Carpenter error27@gmail.com Closes: https://lore.kernel.org/r/202305180042.DEzW1pSd-lkp@intel.com/ Link: https://lore.kernel.org/r/20240229222127.1878176-1-alexandre.belloni@bootlin... Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-nct3018y.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/rtc/rtc-nct3018y.c b/drivers/rtc/rtc-nct3018y.c index d43acd3920ed3..108eced8f0030 100644 --- a/drivers/rtc/rtc-nct3018y.c +++ b/drivers/rtc/rtc-nct3018y.c @@ -99,6 +99,8 @@ static int nct3018y_get_alarm_mode(struct i2c_client *client, unsigned char *ala if (flags < 0) return flags; *alarm_enable = flags & NCT3018Y_BIT_AIE; + dev_dbg(&client->dev, "%s:alarm_enable:%x\n", __func__, *alarm_enable); + }
if (alarm_flag) { @@ -107,11 +109,9 @@ static int nct3018y_get_alarm_mode(struct i2c_client *client, unsigned char *ala if (flags < 0) return flags; *alarm_flag = flags & NCT3018Y_BIT_AF; + dev_dbg(&client->dev, "%s:alarm_flag:%x\n", __func__, *alarm_flag); }
- dev_dbg(&client->dev, "%s:alarm_enable:%x alarm_flag:%x\n", - __func__, *alarm_enable, *alarm_flag); - return 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jian Shen shenjian15@huawei.com
[ Upstream commit 4e2969a0d6a7549bc0bc1ebc990588b622c4443d ]
Add checking for vf id of mailbox, in order to avoid array out-of-bounds risk.
Signed-off-by: Jian Shen shenjian15@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Reviewed-by: Sunil Goutham sgoutham@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c index 877feee53804f..61e155c4d441e 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c @@ -1124,10 +1124,11 @@ void hclge_mbx_handler(struct hclge_dev *hdev) req = (struct hclge_mbx_vf_to_pf_cmd *)desc->data;
flag = le16_to_cpu(crq->desc[crq->next_to_use].flag); - if (unlikely(!hnae3_get_bit(flag, HCLGE_CMDQ_RX_OUTVLD_B))) { + if (unlikely(!hnae3_get_bit(flag, HCLGE_CMDQ_RX_OUTVLD_B) || + req->mbx_src_vfid > hdev->num_req_vfs)) { dev_warn(&hdev->pdev->dev, - "dropped invalid mailbox message, code = %u\n", - req->msg.code); + "dropped invalid mailbox message, code = %u, vfid = %u\n", + req->msg.code, req->mbx_src_vfid);
/* dropping/not processing this invalid message */ crq->desc[crq->next_to_use].flag = 0;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hannes Reinecke hare@suse.de
[ Upstream commit 0889d13b9e1cbef49e802ae09f3b516911ad82a1 ]
When the length check for an icreq sqe fails we should not continue processing but rather return immediately as all other contents of that sqe cannot be relied on.
Signed-off-by: Hannes Reinecke hare@suse.de Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/target/tcp.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c index 5556f55880411..76b9eb438268f 100644 --- a/drivers/nvme/target/tcp.c +++ b/drivers/nvme/target/tcp.c @@ -836,6 +836,7 @@ static int nvmet_tcp_handle_icreq(struct nvmet_tcp_queue *queue) pr_err("bad nvme-tcp pdu length (%d)\n", le32_to_cpu(icreq->hdr.plen)); nvmet_tcp_fatal_error(queue); + return -EPROTO; }
if (icreq->pfv != NVME_TCP_PFV_1_0) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 2fdbc20036acda9e5694db74a032d3c605323005 ]
If pnfsd_update_layout() is called on a file for which recovery has failed it will enter a tight infinite loop.
NFS_LAYOUT_INVALID_STID will be set, nfs4_select_rw_stateid() will return -EIO, and nfs4_schedule_stateid_recovery() will do nothing, so nfs4_client_recover_expired_lease() will not wait. So the code will loop indefinitely.
Break the loop by testing the validity of the open stateid at the top of the loop.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/pnfs.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c index 4448ff829cbb9..8c1f47ca5dc53 100644 --- a/fs/nfs/pnfs.c +++ b/fs/nfs/pnfs.c @@ -1997,6 +1997,14 @@ pnfs_update_layout(struct inode *ino, }
lookup_again: + if (!nfs4_valid_open_stateid(ctx->state)) { + trace_pnfs_update_layout(ino, pos, count, + iomode, lo, lseg, + PNFS_UPDATE_LAYOUT_INVALID_OPEN); + lseg = ERR_PTR(-EIO); + goto out; + } + lseg = ERR_PTR(nfs4_client_recover_expired_lease(clp)); if (IS_ERR(lseg)) goto out;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oreoluwa Babatunde quic_obabatun@quicinc.com
[ Upstream commit 7b432bf376c9c198a7ff48f1ed14a14c0ffbe1fe ]
The unflatten_and_copy_device_tree() function contains a call to memblock_alloc(). This means that memblock is allocating memory before any of the reserved memory regions are set aside in the setup_memory() function which calls early_init_fdt_scan_reserved_mem(). Therefore, there is a possibility for memblock to allocate from any of the reserved memory regions.
Hence, move the call to setup_memory() to be earlier in the init sequence so that the reserved memory regions are set aside before any allocations are done using memblock.
Signed-off-by: Oreoluwa Babatunde quic_obabatun@quicinc.com Signed-off-by: Stafford Horne shorne@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/openrisc/kernel/setup.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/openrisc/kernel/setup.c b/arch/openrisc/kernel/setup.c index 0cd04d936a7a1..f2fe45d3094df 100644 --- a/arch/openrisc/kernel/setup.c +++ b/arch/openrisc/kernel/setup.c @@ -270,6 +270,9 @@ void calibrate_delay(void)
void __init setup_arch(char **cmdline_p) { + /* setup memblock allocator */ + setup_memory(); + unflatten_and_copy_device_tree();
setup_cpuinfo(); @@ -293,9 +296,6 @@ void __init setup_arch(char **cmdline_p) } #endif
- /* setup memblock allocator */ - setup_memory(); - /* paging_init() sets up the MMU and marks all pages as reserved */ paging_init();
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Gordeev agordeev@linux.ibm.com
[ Upstream commit 4e8477aeb46dfe74e829c06ea588dd00ba20c8cc ]
Fix IUCV_IPBUFLST-type buffers virtual vs physical address confusion. This does not fix a bug since virtual and physical address spaces are currently the same.
Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Reviewed-by: Alexandra Winter wintera@linux.ibm.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/iucv/iucv.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/iucv/iucv.c b/net/iucv/iucv.c index db41eb2d977f2..038e1ba9aec27 100644 --- a/net/iucv/iucv.c +++ b/net/iucv/iucv.c @@ -1090,8 +1090,7 @@ static int iucv_message_receive_iprmdata(struct iucv_path *path, size = (size < 8) ? size : 8; for (array = buffer; size > 0; array++) { copy = min_t(size_t, size, array->length); - memcpy((u8 *)(addr_t) array->address, - rmmsg, copy); + memcpy(phys_to_virt(array->address), rmmsg, copy); rmmsg += copy; size -= copy; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Biju Das biju.das.jz@bp.renesas.com
[ Upstream commit dce0919c83c325ac9dec5bc8838d5de6d32c01b1 ]
As per the hardware team, TIEN and TINT source should not set at the same time due to a possible hardware race leading to spurious IRQ.
Currently on some scenarios hardware settings for TINT detection is not in sync with TINT source as the enable/disable overrides source setting value leading to hardware inconsistent state. For eg: consider the case GPIOINT0 is used as TINT interrupt and configuring GPIOINT5 as edge type. During rzg2l_irq_set_type(), TINT source for GPIOINT5 is set. On disable(), clearing of the entire bytes of TINT source selection for GPIOINT5 is same as GPIOINT0 with TIEN disabled. Apart from this during enable(), the setting of GPIOINT5 with TIEN results in spurious IRQ as due to a HW race, it is possible that IP can use the TIEN with previous source value (GPIOINT0).
So, just update TIEN during enable/disable as TINT source is already set during rzg2l_irq_set_type(). This will make the consistent hardware settings for detection method tied with TINT source and allows to simplify the code.
Signed-off-by: Biju Das biju.das.jz@bp.renesas.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/irqchip/irq-renesas-rzg2l.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/irqchip/irq-renesas-rzg2l.c b/drivers/irqchip/irq-renesas-rzg2l.c index be71459c7465a..70279ca7e6278 100644 --- a/drivers/irqchip/irq-renesas-rzg2l.c +++ b/drivers/irqchip/irq-renesas-rzg2l.c @@ -132,7 +132,7 @@ static void rzg2l_irqc_irq_disable(struct irq_data *d)
raw_spin_lock(&priv->lock); reg = readl_relaxed(priv->base + TSSR(tssr_index)); - reg &= ~(TSSEL_MASK << TSSEL_SHIFT(tssr_offset)); + reg &= ~(TIEN << TSSEL_SHIFT(tssr_offset)); writel_relaxed(reg, priv->base + TSSR(tssr_index)); raw_spin_unlock(&priv->lock); } @@ -145,7 +145,6 @@ static void rzg2l_irqc_irq_enable(struct irq_data *d)
if (hw_irq >= IRQC_TINT_START && hw_irq < IRQC_NUM_IRQ) { struct rzg2l_irqc_priv *priv = irq_data_to_priv(d); - unsigned long tint = (uintptr_t)d->chip_data; u32 offset = hw_irq - IRQC_TINT_START; u32 tssr_offset = TSSR_OFFSET(offset); u8 tssr_index = TSSR_INDEX(offset); @@ -153,7 +152,7 @@ static void rzg2l_irqc_irq_enable(struct irq_data *d)
raw_spin_lock(&priv->lock); reg = readl_relaxed(priv->base + TSSR(tssr_index)); - reg |= (TIEN | tint) << TSSEL_SHIFT(tssr_offset); + reg |= TIEN << TSSEL_SHIFT(tssr_offset); writel_relaxed(reg, priv->base + TSSR(tssr_index)); raw_spin_unlock(&priv->lock); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Adrian Hunter adrian.hunter@intel.com
[ Upstream commit d0304569fb019d1bcfbbbce1ce6df6b96f04079b ]
Kernel timekeeping is designed to keep the change in cycles (since the last timer interrupt) below max_cycles, which prevents multiplication overflow when converting cycles to nanoseconds. However, if timer interrupts stop, the clocksource_cyc2ns() calculation will eventually overflow.
Add protection against that. Simplify by folding together clocksource_delta() and clocksource_cyc2ns() into cycles_to_nsec_safe(). Check against max_cycles, falling back to a slower higher precision calculation.
Suggested-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Adrian Hunter adrian.hunter@intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://lore.kernel.org/r/20240325064023.2997-20-adrian.hunter@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/time/clocksource.c | 42 +++++++++++++++++++-------------------- 1 file changed, 20 insertions(+), 22 deletions(-)
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index cd9a59011dee9..a3650699463bb 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -20,6 +20,16 @@ #include "tick-internal.h" #include "timekeeping_internal.h"
+static noinline u64 cycles_to_nsec_safe(struct clocksource *cs, u64 start, u64 end) +{ + u64 delta = clocksource_delta(end, start, cs->mask); + + if (likely(delta < cs->max_cycles)) + return clocksource_cyc2ns(delta, cs->mult, cs->shift); + + return mul_u64_u32_shr(delta, cs->mult, cs->shift); +} + /** * clocks_calc_mult_shift - calculate mult/shift factors for scaled math of clocks * @mult: pointer to mult variable @@ -219,8 +229,8 @@ enum wd_read_status { static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow) { unsigned int nretries, max_retries; - u64 wd_end, wd_end2, wd_delta; int64_t wd_delay, wd_seq_delay; + u64 wd_end, wd_end2;
max_retries = clocksource_get_max_watchdog_retry(); for (nretries = 0; nretries <= max_retries; nretries++) { @@ -231,9 +241,7 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow, wd_end2 = watchdog->read(watchdog); local_irq_enable();
- wd_delta = clocksource_delta(wd_end, *wdnow, watchdog->mask); - wd_delay = clocksource_cyc2ns(wd_delta, watchdog->mult, - watchdog->shift); + wd_delay = cycles_to_nsec_safe(watchdog, *wdnow, wd_end); if (wd_delay <= WATCHDOG_MAX_SKEW) { if (nretries > 1 && nretries >= max_retries) { pr_warn("timekeeping watchdog on CPU%d: %s retried %d times before success\n", @@ -251,8 +259,7 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow, * report system busy, reinit the watchdog and skip the current * watchdog test. */ - wd_delta = clocksource_delta(wd_end2, wd_end, watchdog->mask); - wd_seq_delay = clocksource_cyc2ns(wd_delta, watchdog->mult, watchdog->shift); + wd_seq_delay = cycles_to_nsec_safe(watchdog, wd_end, wd_end2); if (wd_seq_delay > WATCHDOG_MAX_SKEW/2) goto skip_test; } @@ -363,8 +370,7 @@ void clocksource_verify_percpu(struct clocksource *cs) delta = (csnow_end - csnow_mid) & cs->mask; if (delta < 0) cpumask_set_cpu(cpu, &cpus_ahead); - delta = clocksource_delta(csnow_end, csnow_begin, cs->mask); - cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift); + cs_nsec = cycles_to_nsec_safe(cs, csnow_begin, csnow_end); if (cs_nsec > cs_nsec_max) cs_nsec_max = cs_nsec; if (cs_nsec < cs_nsec_min) @@ -395,8 +401,8 @@ static inline void clocksource_reset_watchdog(void)
static void clocksource_watchdog(struct timer_list *unused) { - u64 csnow, wdnow, cslast, wdlast, delta; int64_t wd_nsec, cs_nsec, interval; + u64 csnow, wdnow, cslast, wdlast; int next_cpu, reset_pending; struct clocksource *cs; enum wd_read_status read_ret; @@ -453,12 +459,8 @@ static void clocksource_watchdog(struct timer_list *unused) continue; }
- delta = clocksource_delta(wdnow, cs->wd_last, watchdog->mask); - wd_nsec = clocksource_cyc2ns(delta, watchdog->mult, - watchdog->shift); - - delta = clocksource_delta(csnow, cs->cs_last, cs->mask); - cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift); + wd_nsec = cycles_to_nsec_safe(watchdog, cs->wd_last, wdnow); + cs_nsec = cycles_to_nsec_safe(cs, cs->cs_last, csnow); wdlast = cs->wd_last; /* save these in case we print them */ cslast = cs->cs_last; cs->cs_last = csnow; @@ -821,7 +823,7 @@ void clocksource_start_suspend_timing(struct clocksource *cs, u64 start_cycles) */ u64 clocksource_stop_suspend_timing(struct clocksource *cs, u64 cycle_now) { - u64 now, delta, nsec = 0; + u64 now, nsec = 0;
if (!suspend_clocksource) return 0; @@ -836,12 +838,8 @@ u64 clocksource_stop_suspend_timing(struct clocksource *cs, u64 cycle_now) else now = suspend_clocksource->read(suspend_clocksource);
- if (now > suspend_start) { - delta = clocksource_delta(now, suspend_start, - suspend_clocksource->mask); - nsec = mul_u64_u32_shr(delta, suspend_clocksource->mult, - suspend_clocksource->shift); - } + if (now > suspend_start) + nsec = cycles_to_nsec_safe(suspend_clocksource, suspend_start, now);
/* * Disable the suspend timer to save power if current clocksource is
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gergo Koteles soyer@irl.hu
[ Upstream commit e71c8481692582c70cdfd0996c20cdcc71e425d3 ]
W=1 warns about null argument to kprintf: warning: ‘%s’ directive argument is null [-Wformat-overflow=] pr_info("product: %s year: %d\n", product, year);
Use "unknown" instead of NULL.
Signed-off-by: Gergo Koteles soyer@irl.hu Reviewed-by: Kuppuswamy Sathyanarayanan sathyanarayanan.kuppuswamy@linux.intel.com Link: https://lore.kernel.org/r/33d40e976f08f82b9227d0ecae38c787fcc0c0b2.171215468... Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/lg-laptop.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/platform/x86/lg-laptop.c b/drivers/platform/x86/lg-laptop.c index 2e1dc91bfc764..5704981d18487 100644 --- a/drivers/platform/x86/lg-laptop.c +++ b/drivers/platform/x86/lg-laptop.c @@ -715,7 +715,7 @@ static int acpi_add(struct acpi_device *device) default: year = 2019; } - pr_info("product: %s year: %d\n", product, year); + pr_info("product: %s year: %d\n", product ?: "unknown", year);
if (year >= 2019) battery_limit_use_wmbb = 1;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krishna Kurapati quic_kriskura@quicinc.com
[ Upstream commit 89d7f962994604a3e3d480832788d06179abefc5 ]
On some SoC's like SA8295P where the tertiary controller is host-only capable, GEVTADDRHI/LO, GEVTSIZ, GEVTCOUNT registers are not accessible. Trying to access them leads to a crash.
For DRD/Peripheral supported controllers, event buffer setup is done again in gadget_pullup. Skip setup or cleanup of event buffers if controller is host-only capable.
Suggested-by: Johan Hovold johan@kernel.org Signed-off-by: Krishna Kurapati quic_kriskura@quicinc.com Acked-by: Thinh Nguyen Thinh.Nguyen@synopsys.com Reviewed-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Bjorn Andersson andersson@kernel.org Tested-by: Johan Hovold johan+linaro@kernel.org Link: https://lore.kernel.org/r/20240420044901.884098-4-quic_kriskura@quicinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/dwc3/core.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c index 94bc7786a3c4e..4964fa7419efa 100644 --- a/drivers/usb/dwc3/core.c +++ b/drivers/usb/dwc3/core.c @@ -506,6 +506,13 @@ static void dwc3_free_event_buffers(struct dwc3 *dwc) static int dwc3_alloc_event_buffers(struct dwc3 *dwc, unsigned int length) { struct dwc3_event_buffer *evt; + unsigned int hw_mode; + + hw_mode = DWC3_GHWPARAMS0_MODE(dwc->hwparams.hwparams0); + if (hw_mode == DWC3_GHWPARAMS0_MODE_HOST) { + dwc->ev_buf = NULL; + return 0; + }
evt = dwc3_alloc_one_event_buffer(dwc, length); if (IS_ERR(evt)) { @@ -527,6 +534,9 @@ int dwc3_event_buffers_setup(struct dwc3 *dwc) { struct dwc3_event_buffer *evt;
+ if (!dwc->ev_buf) + return 0; + evt = dwc->ev_buf; evt->lpos = 0; dwc3_writel(dwc->regs, DWC3_GEVNTADRLO(0), @@ -544,6 +554,9 @@ void dwc3_event_buffers_cleanup(struct dwc3 *dwc) { struct dwc3_event_buffer *evt;
+ if (!dwc->ev_buf) + return; + evt = dwc->ev_buf;
evt->lpos = 0;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Abdulrasaq Lawani abdulrasaqolawani@gmail.com
[ Upstream commit ce4a7ae84a58b9f33aae8d6c769b3c94f3d5ce76 ]
Replaced instance of of_node_put with __free(device_node) to simplify code and protect against any memory leaks due to future changes in the control flow.
Suggested-by: Julia Lawall julia.lawall@inria.fr Signed-off-by: Abdulrasaq Lawani abdulrasaqolawani@gmail.com Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/video/fbdev/offb.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/video/fbdev/offb.c b/drivers/video/fbdev/offb.c index 91001990e351c..6f0a9851b0924 100644 --- a/drivers/video/fbdev/offb.c +++ b/drivers/video/fbdev/offb.c @@ -355,7 +355,7 @@ static void offb_init_palette_hacks(struct fb_info *info, struct device_node *dp par->cmap_type = cmap_gxt2000; } else if (of_node_name_prefix(dp, "vga,Display-")) { /* Look for AVIVO initialized by SLOF */ - struct device_node *pciparent = of_get_parent(dp); + struct device_node *pciparent __free(device_node) = of_get_parent(dp); const u32 *vid, *did; vid = of_get_property(pciparent, "vendor-id", NULL); did = of_get_property(pciparent, "device-id", NULL); @@ -367,7 +367,6 @@ static void offb_init_palette_hacks(struct fb_info *info, struct device_node *dp if (par->cmap_adr) par->cmap_type = cmap_avivo; } - of_node_put(pciparent); } else if (dp && of_device_is_compatible(dp, "qemu,std-vga")) { #ifdef __BIG_ENDIAN const __be32 io_of_addr[3] = { 0x01000000, 0x0, 0x0 };
Greg, Sasha,
On Tue, Aug 27, 2024 at 04:38:30PM GMT, Greg Kroah-Hartman wrote:
6.1-stable review patch. If anyone has any objections, please let me know.
We got v6.1.107 build error on ppc64le due to this commit, with:
CC drivers/video/fbdev/offb.o drivers/video/fbdev/offb.c: In function 'offb_init_palette_hacks': drivers/video/fbdev/offb.c:358:24: error: cleanup argument not a function 358 | struct device_node *pciparent __free(device_node) = of_get_parent(dp); | ^~~~~~~~~~~ make[4]: *** [scripts/Makefile.build:250: drivers/video/fbdev/offb.o] Error 1
Where CONFIG_FB_OF is enabled. Perhaps, due to this commit not being picked:
9448e55d032d ("of: Add cleanup.h based auto release via __free(device_node) markings")
Thanks,
From: Abdulrasaq Lawani abdulrasaqolawani@gmail.com
[ Upstream commit ce4a7ae84a58b9f33aae8d6c769b3c94f3d5ce76 ]
Replaced instance of of_node_put with __free(device_node) to simplify code and protect against any memory leaks due to future changes in the control flow.
Suggested-by: Julia Lawall julia.lawall@inria.fr Signed-off-by: Abdulrasaq Lawani abdulrasaqolawani@gmail.com Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Sasha Levin sashal@kernel.org
drivers/video/fbdev/offb.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/video/fbdev/offb.c b/drivers/video/fbdev/offb.c index 91001990e351c..6f0a9851b0924 100644 --- a/drivers/video/fbdev/offb.c +++ b/drivers/video/fbdev/offb.c @@ -355,7 +355,7 @@ static void offb_init_palette_hacks(struct fb_info *info, struct device_node *dp par->cmap_type = cmap_gxt2000; } else if (of_node_name_prefix(dp, "vga,Display-")) { /* Look for AVIVO initialized by SLOF */
struct device_node *pciparent = of_get_parent(dp);
const u32 *vid, *did; vid = of_get_property(pciparent, "vendor-id", NULL); did = of_get_property(pciparent, "device-id", NULL);struct device_node *pciparent __free(device_node) = of_get_parent(dp);
@@ -367,7 +367,6 @@ static void offb_init_palette_hacks(struct fb_info *info, struct device_node *dp if (par->cmap_adr) par->cmap_type = cmap_avivo; }
} else if (dp && of_device_is_compatible(dp, "qemu,std-vga")) {of_node_put(pciparent);
#ifdef __BIG_ENDIAN const __be32 io_of_addr[3] = { 0x01000000, 0x0, 0x0 }; -- 2.43.0
Greg,
On Fri, Aug 30, 2024 at 12:11:53AM GMT, Vitaly Chikunov wrote:
Greg, Sasha,
On Tue, Aug 27, 2024 at 04:38:30PM GMT, Greg Kroah-Hartman wrote:
6.1-stable review patch. If anyone has any objections, please let me know.
We got v6.1.107 build error on ppc64le due to this commit, with:
CC drivers/video/fbdev/offb.o
drivers/video/fbdev/offb.c: In function 'offb_init_palette_hacks': drivers/video/fbdev/offb.c:358:24: error: cleanup argument not a function 358 | struct device_node *pciparent __free(device_node) = of_get_parent(dp); | ^~~~~~~~~~~ make[4]: *** [scripts/Makefile.build:250: drivers/video/fbdev/offb.o] Error 1
Where CONFIG_FB_OF is enabled. Perhaps, due to this commit not being picked:
9448e55d032d ("of: Add cleanup.h based auto release via __free(device_node) markings")
I just tested, and cherry-picking this commit over 6.1.107 fixes the build on ppc64le with CONFIG_FB_OF=y.
Thanks,
Thanks,
From: Abdulrasaq Lawani abdulrasaqolawani@gmail.com
[ Upstream commit ce4a7ae84a58b9f33aae8d6c769b3c94f3d5ce76 ]
Replaced instance of of_node_put with __free(device_node) to simplify code and protect against any memory leaks due to future changes in the control flow.
Suggested-by: Julia Lawall julia.lawall@inria.fr Signed-off-by: Abdulrasaq Lawani abdulrasaqolawani@gmail.com Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Sasha Levin sashal@kernel.org
drivers/video/fbdev/offb.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/video/fbdev/offb.c b/drivers/video/fbdev/offb.c index 91001990e351c..6f0a9851b0924 100644 --- a/drivers/video/fbdev/offb.c +++ b/drivers/video/fbdev/offb.c @@ -355,7 +355,7 @@ static void offb_init_palette_hacks(struct fb_info *info, struct device_node *dp par->cmap_type = cmap_gxt2000; } else if (of_node_name_prefix(dp, "vga,Display-")) { /* Look for AVIVO initialized by SLOF */
struct device_node *pciparent = of_get_parent(dp);
const u32 *vid, *did; vid = of_get_property(pciparent, "vendor-id", NULL); did = of_get_property(pciparent, "device-id", NULL);struct device_node *pciparent __free(device_node) = of_get_parent(dp);
@@ -367,7 +367,6 @@ static void offb_init_palette_hacks(struct fb_info *info, struct device_node *dp if (par->cmap_adr) par->cmap_type = cmap_avivo; }
} else if (dp && of_device_is_compatible(dp, "qemu,std-vga")) {of_node_put(pciparent);
#ifdef __BIG_ENDIAN const __be32 io_of_addr[3] = { 0x01000000, 0x0, 0x0 }; -- 2.43.0
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guanrui Huang guanrui.huang@linux.alibaba.com
[ Upstream commit 382d2ffe86efb1e2fa803d2cf17e5bfc34e574f3 ]
This BUG_ON() is useless, because the same effect will be obtained by letting the code run its course and vm being dereferenced, triggering an exception.
So just remove this check.
Signed-off-by: Guanrui Huang guanrui.huang@linux.alibaba.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Zenghui Yu yuzenghui@huawei.com Acked-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20240418061053.96803-3-guanrui.huang@linux.alibaba... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/irqchip/irq-gic-v3-its.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 3620bdb5200f2..a7a952bbfdc28 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -4476,8 +4476,6 @@ static int its_vpe_irq_domain_alloc(struct irq_domain *domain, unsigned int virq struct page *vprop_page; int base, nr_ids, i, err = 0;
- BUG_ON(!vm); - bitmap = its_lpi_alloc(roundup_pow_of_two(nr_irqs), &base, &nr_ids); if (!bitmap) return -ENOMEM;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
[ Upstream commit 261341a932d9244cbcd372a3659428c8723e5a49 ]
The max_zeroout is of type int and the s_extent_max_zeroout_kb is of type uint, and the s_extent_max_zeroout_kb can be freely modified via the sysfs interface. When the block size is 1024, max_zeroout may overflow, so declare it as unsigned int to avoid overflow.
Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20240319113325.3110393-9-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/extents.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 5cbe5ae5ad4a2..92b540754799c 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3404,9 +3404,10 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, struct ext4_extent *ex, *abut_ex; ext4_lblk_t ee_block, eof_block; unsigned int ee_len, depth, map_len = map->m_len; - int allocated = 0, max_zeroout = 0; int err = 0; int split_flag = EXT4_EXT_DATA_VALID2; + int allocated = 0; + unsigned int max_zeroout = 0;
ext_debug(inode, "logical block %llu, max_blocks %u\n", (unsigned long long)map->m_lblk, map_len);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sagi Grimberg sagi@grimberg.me
[ Upstream commit 73964c1d07c054376f1b32a62548571795159148 ]
It is possible that the host connected and saw a cm established event and started sending nvme capsules on the qp, however the ctrl did not yet see an established event. This is why the rsp_wait_list exists (for async handling of these cmds, we move them to a pending list).
Furthermore, it is possible that the ctrl cm times out, resulting in a connect-error cm event. in this case we hit a bad deref [1] because in nvmet_rdma_free_rsps we assume that all the responses are in the free list.
We are freeing the cmds array anyways, so don't even bother to remove the rsp from the free_list. It is also guaranteed that we are not racing anything when we are releasing the queue so no other context accessing this array should be running.
[1]: -- Workqueue: nvmet-free-wq nvmet_rdma_free_queue_work [nvmet_rdma] [...] pc : nvmet_rdma_free_rsps+0x78/0xb8 [nvmet_rdma] lr : nvmet_rdma_free_queue_work+0x88/0x120 [nvmet_rdma] Call trace: nvmet_rdma_free_rsps+0x78/0xb8 [nvmet_rdma] nvmet_rdma_free_queue_work+0x88/0x120 [nvmet_rdma] process_one_work+0x1ec/0x4a0 worker_thread+0x48/0x490 kthread+0x158/0x160 ret_from_fork+0x10/0x18 --
Signed-off-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/target/rdma.c | 16 ++++------------ 1 file changed, 4 insertions(+), 12 deletions(-)
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c index 4597bca43a6d8..a6d55ebb82382 100644 --- a/drivers/nvme/target/rdma.c +++ b/drivers/nvme/target/rdma.c @@ -473,12 +473,8 @@ nvmet_rdma_alloc_rsps(struct nvmet_rdma_queue *queue) return 0;
out_free: - while (--i >= 0) { - struct nvmet_rdma_rsp *rsp = &queue->rsps[i]; - - list_del(&rsp->free_list); - nvmet_rdma_free_rsp(ndev, rsp); - } + while (--i >= 0) + nvmet_rdma_free_rsp(ndev, &queue->rsps[i]); kfree(queue->rsps); out: return ret; @@ -489,12 +485,8 @@ static void nvmet_rdma_free_rsps(struct nvmet_rdma_queue *queue) struct nvmet_rdma_device *ndev = queue->dev; int i, nr_rsps = queue->recv_queue_size * 2;
- for (i = 0; i < nr_rsps; i++) { - struct nvmet_rdma_rsp *rsp = &queue->rsps[i]; - - list_del(&rsp->free_list); - nvmet_rdma_free_rsp(ndev, rsp); - } + for (i = 0; i < nr_rsps; i++) + nvmet_rdma_free_rsp(ndev, &queue->rsps[i]); kfree(queue->rsps); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jesse Zhang jesse.zhang@amd.com
[ Upstream commit 511a623fb46a6cf578c61d4f2755783c48807c77 ]
The pointer parent may be NULLed by the function amdgpu_vm_pt_parent. To make the code more robust, check the pointer parent.
Signed-off-by: Jesse Zhang Jesse.Zhang@amd.com Suggested-by: Christian König christian.koenig@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c index 69b3829bbe53f..370d02bdde862 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c @@ -754,11 +754,15 @@ int amdgpu_vm_pde_update(struct amdgpu_vm_update_params *params, struct amdgpu_vm_bo_base *entry) { struct amdgpu_vm_bo_base *parent = amdgpu_vm_pt_parent(entry); - struct amdgpu_bo *bo = parent->bo, *pbo; + struct amdgpu_bo *bo, *pbo; struct amdgpu_vm *vm = params->vm; uint64_t pde, pt, flags; unsigned int level;
+ if (WARN_ON(!parent)) + return -EINVAL; + + bo = parent->bo; for (level = 0, pbo = bo->parent; pbo; ++level) pbo = pbo->parent;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Chang phil.chang@mediatek.com
[ Upstream commit 5a830bbce3af16833fe0092dec47b6dd30279825 ]
The hrtimer function callback must not be NULL. It has to be specified by the call side but it is not validated by the hrtimer code. When a hrtimer is queued without a function callback, the kernel crashes with a null pointer dereference when trying to execute the callback in __run_hrtimer().
Introduce a validation before queuing the hrtimer in hrtimer_start_range_ns().
[anna-maria: Rephrase commit message]
Signed-off-by: Phil Chang phil.chang@mediatek.com Signed-off-by: Anna-Maria Behnsen anna-maria@linutronix.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Anna-Maria Behnsen anna-maria@linutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/time/hrtimer.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 314fb7598a879..f62cc13b5f143 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -1285,6 +1285,8 @@ void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim, struct hrtimer_clock_base *base; unsigned long flags;
+ if (WARN_ON_ONCE(!timer->function)) + return; /* * Check whether the HRTIMER_MODE_SOFT bit and hrtimer.is_soft * match on CONFIG_PREEMPT_RT = n. With PREEMPT_RT check the hard
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit 3a3be7ff9224f424e485287b54be00d2c6bd9c40 upstream.
syzbot/KMSAN reported use of uninit-value in get_dev_xmit() [1]
We must make sure the IPv4 or Ipv6 header is pulled in skb->head before accessing fields in them.
Use pskb_inet_may_pull() to fix this issue.
[1] BUG: KMSAN: uninit-value in ipv6_pdp_find drivers/net/gtp.c:220 [inline] BUG: KMSAN: uninit-value in gtp_build_skb_ip6 drivers/net/gtp.c:1229 [inline] BUG: KMSAN: uninit-value in gtp_dev_xmit+0x1424/0x2540 drivers/net/gtp.c:1281 ipv6_pdp_find drivers/net/gtp.c:220 [inline] gtp_build_skb_ip6 drivers/net/gtp.c:1229 [inline] gtp_dev_xmit+0x1424/0x2540 drivers/net/gtp.c:1281 __netdev_start_xmit include/linux/netdevice.h:4913 [inline] netdev_start_xmit include/linux/netdevice.h:4922 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3596 __dev_queue_xmit+0x358c/0x5610 net/core/dev.c:4423 dev_queue_xmit include/linux/netdevice.h:3105 [inline] packet_xmit+0x9c/0x6c0 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3145 [inline] packet_sendmsg+0x90e3/0xa3a0 net/packet/af_packet.c:3177 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 __sys_sendto+0x685/0x830 net/socket.c:2204 __do_sys_sendto net/socket.c:2216 [inline] __se_sys_sendto net/socket.c:2212 [inline] __x64_sys_sendto+0x125/0x1d0 net/socket.c:2212 x64_sys_call+0x3799/0x3c10 arch/x86/include/generated/asm/syscalls_64.h:45 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was created at: slab_post_alloc_hook mm/slub.c:3994 [inline] slab_alloc_node mm/slub.c:4037 [inline] kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4080 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:583 __alloc_skb+0x363/0x7b0 net/core/skbuff.c:674 alloc_skb include/linux/skbuff.h:1320 [inline] alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6526 sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2815 packet_alloc_skb net/packet/af_packet.c:2994 [inline] packet_snd net/packet/af_packet.c:3088 [inline] packet_sendmsg+0x749c/0xa3a0 net/packet/af_packet.c:3177 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 __sys_sendto+0x685/0x830 net/socket.c:2204 __do_sys_sendto net/socket.c:2216 [inline] __se_sys_sendto net/socket.c:2212 [inline] __x64_sys_sendto+0x125/0x1d0 net/socket.c:2212 x64_sys_call+0x3799/0x3c10 arch/x86/include/generated/asm/syscalls_64.h:45 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f
CPU: 0 UID: 0 PID: 7115 Comm: syz.1.515 Not tainted 6.11.0-rc1-syzkaller-00043-g94ede2a3e913 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024
Fixes: 999cb275c807 ("gtp: add IPv6 support") Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Harald Welte laforge@gnumonks.org Reviewed-by: Pablo Neira Ayuso pablo@netfilter.org Link: https://patch.msgid.link/20240808132455.3413916-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/gtp.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -900,6 +900,9 @@ static netdev_tx_t gtp_dev_xmit(struct s if (skb_cow_head(skb, dev->needed_headroom)) goto tx_err;
+ if (!pskb_inet_may_pull(skb)) + goto tx_err; + skb_reset_inner_headers(skb);
/* PDP context lookups in gtp_build_skb_*() need rcu read-side lock. */
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aurelien Jarno aurelien@aurel32.net
commit 31e97d7c9ae3de072d7b424b2cf706a03ec10720 upstream.
This patch replaces max(a, min(b, c)) by clamp(b, a, c) in the solo6x10 driver. This improves the readability and more importantly, for the solo6x10-p2m.c file, this reduces on my system (x86-64, gcc 13):
- the preprocessed size from 121 MiB to 4.5 MiB;
- the build CPU time from 46.8 s to 1.6 s;
- the build memory from 2786 MiB to 98MiB.
In fine, this allows this relatively simple C file to be built on a 32-bit system.
Reported-by: Jiri Slaby jirislaby@gmail.com Closes: https://lore.kernel.org/lkml/18c6df0d-45ed-450c-9eda-95160a2bbb8e@gmail.com/ Cc: stable@vger.kernel.org # v6.7+ Suggested-by: David Laight David.Laight@ACULAB.COM Signed-off-by: Aurelien Jarno aurelien@aurel32.net Reviewed-by: David Laight David.Laight@ACULAB.COM Reviewed-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Cc: Salvatore Bonaccorso carnil@debian.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/pci/solo6x10/solo6x10-offsets.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/drivers/media/pci/solo6x10/solo6x10-offsets.h +++ b/drivers/media/pci/solo6x10/solo6x10-offsets.h @@ -57,16 +57,16 @@ #define SOLO_MP4E_EXT_ADDR(__solo) \ (SOLO_EREF_EXT_ADDR(__solo) + SOLO_EREF_EXT_AREA(__solo)) #define SOLO_MP4E_EXT_SIZE(__solo) \ - max((__solo->nr_chans * 0x00080000), \ - min(((__solo->sdram_size - SOLO_MP4E_EXT_ADDR(__solo)) - \ - __SOLO_JPEG_MIN_SIZE(__solo)), 0x00ff0000)) + clamp(__solo->sdram_size - SOLO_MP4E_EXT_ADDR(__solo) - \ + __SOLO_JPEG_MIN_SIZE(__solo), \ + __solo->nr_chans * 0x00080000, 0x00ff0000)
#define __SOLO_JPEG_MIN_SIZE(__solo) (__solo->nr_chans * 0x00080000) #define SOLO_JPEG_EXT_ADDR(__solo) \ (SOLO_MP4E_EXT_ADDR(__solo) + SOLO_MP4E_EXT_SIZE(__solo)) #define SOLO_JPEG_EXT_SIZE(__solo) \ - max(__SOLO_JPEG_MIN_SIZE(__solo), \ - min((__solo->sdram_size - SOLO_JPEG_EXT_ADDR(__solo)), 0x00ff0000)) + clamp(__solo->sdram_size - SOLO_JPEG_EXT_ADDR(__solo), \ + __SOLO_JPEG_MIN_SIZE(__solo), 0x00ff0000)
#define SOLO_SDRAM_END(__solo) \ (SOLO_JPEG_EXT_ADDR(__solo) + SOLO_JPEG_EXT_SIZE(__solo))
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michał Mirosław mirq-linux@rere.qmqm.pl
[ Upstream commit a55efa7edf37dc428da7058b25c58a54dc9db4e4 ]
Save a bit of code for newer Tegra platforms by compiling out DVC's I2C mode support that's used only for Tegra2.
$ size i2c-tegra.o text data bss dec hex filename - 11381 292 8 11681 2da1 i2c-tegra.o + 10193 292 8 10493 28fd i2c-tegra.o
Signed-off-by: Michał Mirosław mirq-linux@rere.qmqm.pl Reviewed-by: Dmitry Osipenko digetx@gmail.com Signed-off-by: Wolfram Sang wsa@kernel.org Stable-dep-of: 14d069d92951 ("i2c: tegra: Do not mark ACPI devices as irq safe") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-tegra.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c index aa469b33ee2ee..cc9dfd4f6c362 100644 --- a/drivers/i2c/busses/i2c-tegra.c +++ b/drivers/i2c/busses/i2c-tegra.c @@ -298,6 +298,8 @@ struct tegra_i2c_dev { bool is_vi; };
+#define IS_DVC(dev) (IS_ENABLED(CONFIG_ARCH_TEGRA_2x_SOC) && (dev)->is_dvc) + static void dvc_writel(struct tegra_i2c_dev *i2c_dev, u32 val, unsigned int reg) { @@ -315,7 +317,7 @@ static u32 dvc_readl(struct tegra_i2c_dev *i2c_dev, unsigned int reg) */ static u32 tegra_i2c_reg_addr(struct tegra_i2c_dev *i2c_dev, unsigned int reg) { - if (i2c_dev->is_dvc) + if (IS_DVC(i2c_dev)) reg += (reg >= I2C_TX_FIFO) ? 0x10 : 0x40; else if (i2c_dev->is_vi) reg = 0xc00 + (reg << 2); @@ -639,7 +641,7 @@ static int tegra_i2c_init(struct tegra_i2c_dev *i2c_dev)
WARN_ON_ONCE(err);
- if (i2c_dev->is_dvc) + if (IS_DVC(i2c_dev)) tegra_dvc_init(i2c_dev);
val = I2C_CNFG_NEW_MASTER_FSM | I2C_CNFG_PACKET_MODE_EN | @@ -703,7 +705,7 @@ static int tegra_i2c_init(struct tegra_i2c_dev *i2c_dev) return err; }
- if (!i2c_dev->is_dvc && !i2c_dev->is_vi) { + if (!IS_DVC(i2c_dev) && !i2c_dev->is_vi) { u32 sl_cfg = i2c_readl(i2c_dev, I2C_SL_CNFG);
sl_cfg |= I2C_SL_CNFG_NACK | I2C_SL_CNFG_NEWSL; @@ -933,7 +935,7 @@ static irqreturn_t tegra_i2c_isr(int irq, void *dev_id) }
i2c_writel(i2c_dev, status, I2C_INT_STATUS); - if (i2c_dev->is_dvc) + if (IS_DVC(i2c_dev)) dvc_writel(i2c_dev, DVC_STATUS_I2C_DONE_INTR, DVC_STATUS);
/* @@ -972,7 +974,7 @@ static irqreturn_t tegra_i2c_isr(int irq, void *dev_id)
i2c_writel(i2c_dev, status, I2C_INT_STATUS);
- if (i2c_dev->is_dvc) + if (IS_DVC(i2c_dev)) dvc_writel(i2c_dev, DVC_STATUS_I2C_DONE_INTR, DVC_STATUS);
if (i2c_dev->dma_mode) { @@ -1660,7 +1662,9 @@ static const struct of_device_id tegra_i2c_of_match[] = { { .compatible = "nvidia,tegra114-i2c", .data = &tegra114_i2c_hw, }, { .compatible = "nvidia,tegra30-i2c", .data = &tegra30_i2c_hw, }, { .compatible = "nvidia,tegra20-i2c", .data = &tegra20_i2c_hw, }, +#if IS_ENABLED(CONFIG_ARCH_TEGRA_2x_SOC) { .compatible = "nvidia,tegra20-i2c-dvc", .data = &tegra20_i2c_hw, }, +#endif {}, }; MODULE_DEVICE_TABLE(of, tegra_i2c_of_match); @@ -1675,7 +1679,8 @@ static void tegra_i2c_parse_dt(struct tegra_i2c_dev *i2c_dev) multi_mode = device_property_read_bool(i2c_dev->dev, "multi-master"); i2c_dev->multimaster_mode = multi_mode;
- if (of_device_is_compatible(np, "nvidia,tegra20-i2c-dvc")) + if (IS_ENABLED(CONFIG_ARCH_TEGRA_2x_SOC) && + of_device_is_compatible(np, "nvidia,tegra20-i2c-dvc")) i2c_dev->is_dvc = true;
if (of_device_is_compatible(np, "nvidia,tegra210-i2c-vi"))
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michał Mirosław mirq-linux@rere.qmqm.pl
[ Upstream commit 4f5d68c8591498c3955dc0228ed6606c1b138278 ]
Save a bit of code for older Tegra platforms by compiling out VI's I2C mode support that's used only for Tegra210.
$ size i2c-tegra.o text data bss dec hex filename 11381 292 8 11681 2da1 i2c-tegra.o (full) 10193 292 8 10493 28fd i2c-tegra.o (no-dvc) 9145 292 8 9445 24e5 i2c-tegra.o (no-vi,no-dvc)
Signed-off-by: Michał Mirosław mirq-linux@rere.qmqm.pl Reviewed-by: Dmitry Osipenko digetx@gmail.com Signed-off-by: Wolfram Sang wsa@kernel.org Stable-dep-of: 14d069d92951 ("i2c: tegra: Do not mark ACPI devices as irq safe") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-tegra.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c index cc9dfd4f6c362..3792531b7ab7f 100644 --- a/drivers/i2c/busses/i2c-tegra.c +++ b/drivers/i2c/busses/i2c-tegra.c @@ -299,6 +299,7 @@ struct tegra_i2c_dev { };
#define IS_DVC(dev) (IS_ENABLED(CONFIG_ARCH_TEGRA_2x_SOC) && (dev)->is_dvc) +#define IS_VI(dev) (IS_ENABLED(CONFIG_ARCH_TEGRA_210_SOC) && (dev)->is_vi)
static void dvc_writel(struct tegra_i2c_dev *i2c_dev, u32 val, unsigned int reg) @@ -319,7 +320,7 @@ static u32 tegra_i2c_reg_addr(struct tegra_i2c_dev *i2c_dev, unsigned int reg) { if (IS_DVC(i2c_dev)) reg += (reg >= I2C_TX_FIFO) ? 0x10 : 0x40; - else if (i2c_dev->is_vi) + else if (IS_VI(i2c_dev)) reg = 0xc00 + (reg << 2);
return reg; @@ -332,7 +333,7 @@ static void i2c_writel(struct tegra_i2c_dev *i2c_dev, u32 val, unsigned int reg) /* read back register to make sure that register writes completed */ if (reg != I2C_TX_FIFO) readl_relaxed(i2c_dev->base + tegra_i2c_reg_addr(i2c_dev, reg)); - else if (i2c_dev->is_vi) + else if (IS_VI(i2c_dev)) readl_relaxed(i2c_dev->base + tegra_i2c_reg_addr(i2c_dev, I2C_INT_STATUS)); }
@@ -448,7 +449,7 @@ static int tegra_i2c_init_dma(struct tegra_i2c_dev *i2c_dev) u32 *dma_buf; int err;
- if (i2c_dev->is_vi) + if (IS_VI(i2c_dev)) return 0;
if (i2c_dev->hw->has_apb_dma) { @@ -653,7 +654,7 @@ static int tegra_i2c_init(struct tegra_i2c_dev *i2c_dev) i2c_writel(i2c_dev, val, I2C_CNFG); i2c_writel(i2c_dev, 0, I2C_INT_MASK);
- if (i2c_dev->is_vi) + if (IS_VI(i2c_dev)) tegra_i2c_vi_init(i2c_dev);
switch (t->bus_freq_hz) { @@ -705,7 +706,7 @@ static int tegra_i2c_init(struct tegra_i2c_dev *i2c_dev) return err; }
- if (!IS_DVC(i2c_dev) && !i2c_dev->is_vi) { + if (!IS_DVC(i2c_dev) && !IS_VI(i2c_dev)) { u32 sl_cfg = i2c_readl(i2c_dev, I2C_SL_CNFG);
sl_cfg |= I2C_SL_CNFG_NACK | I2C_SL_CNFG_NEWSL; @@ -848,7 +849,7 @@ static int tegra_i2c_fill_tx_fifo(struct tegra_i2c_dev *i2c_dev) i2c_dev->msg_buf_remaining = buf_remaining; i2c_dev->msg_buf = buf + words_to_transfer * BYTES_PER_FIFO_WORD;
- if (i2c_dev->is_vi) + if (IS_VI(i2c_dev)) i2c_writesl_vi(i2c_dev, buf, I2C_TX_FIFO, words_to_transfer); else i2c_writesl(i2c_dev, buf, I2C_TX_FIFO, words_to_transfer); @@ -1656,7 +1657,9 @@ static const struct tegra_i2c_hw_feature tegra194_i2c_hw = { static const struct of_device_id tegra_i2c_of_match[] = { { .compatible = "nvidia,tegra194-i2c", .data = &tegra194_i2c_hw, }, { .compatible = "nvidia,tegra186-i2c", .data = &tegra186_i2c_hw, }, +#if IS_ENABLED(CONFIG_ARCH_TEGRA_210_SOC) { .compatible = "nvidia,tegra210-i2c-vi", .data = &tegra210_i2c_hw, }, +#endif { .compatible = "nvidia,tegra210-i2c", .data = &tegra210_i2c_hw, }, { .compatible = "nvidia,tegra124-i2c", .data = &tegra124_i2c_hw, }, { .compatible = "nvidia,tegra114-i2c", .data = &tegra114_i2c_hw, }, @@ -1683,7 +1686,8 @@ static void tegra_i2c_parse_dt(struct tegra_i2c_dev *i2c_dev) of_device_is_compatible(np, "nvidia,tegra20-i2c-dvc")) i2c_dev->is_dvc = true;
- if (of_device_is_compatible(np, "nvidia,tegra210-i2c-vi")) + if (IS_ENABLED(CONFIG_ARCH_TEGRA_210_SOC) && + of_device_is_compatible(np, "nvidia,tegra210-i2c-vi")) i2c_dev->is_vi = true; }
@@ -1712,7 +1716,7 @@ static int tegra_i2c_init_clocks(struct tegra_i2c_dev *i2c_dev) if (i2c_dev->hw == &tegra20_i2c_hw || i2c_dev->hw == &tegra30_i2c_hw) i2c_dev->clocks[i2c_dev->nclocks++].id = "fast-clk";
- if (i2c_dev->is_vi) + if (IS_VI(i2c_dev)) i2c_dev->clocks[i2c_dev->nclocks++].id = "slow";
err = devm_clk_bulk_get(i2c_dev->dev, i2c_dev->nclocks, @@ -1830,7 +1834,7 @@ static int tegra_i2c_probe(struct platform_device *pdev) * VI I2C device shouldn't be marked as IRQ-safe because VI I2C won't * be used for atomic transfers. */ - if (!i2c_dev->is_vi) + if (!IS_VI(i2c_dev)) pm_runtime_irq_safe(i2c_dev->dev);
pm_runtime_enable(i2c_dev->dev); @@ -1903,7 +1907,7 @@ static int __maybe_unused tegra_i2c_runtime_resume(struct device *dev) * power ON/OFF during runtime PM resume/suspend, meaning that * controller needs to be re-initialized after power ON. */ - if (i2c_dev->is_vi) { + if (IS_VI(i2c_dev)) { err = tegra_i2c_init(i2c_dev); if (err) goto disable_clocks;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Breno Leitao leitao@debian.org
[ Upstream commit 14d069d92951a3e150c0a81f2ca3b93e54da913b ]
On ACPI machines, the tegra i2c module encounters an issue due to a mutex being called inside a spinlock. This leads to the following bug:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585 ...
Call trace: __might_sleep __mutex_lock_common mutex_lock_nested acpi_subsys_runtime_resume rpm_resume tegra_i2c_xfer
The problem arises because during __pm_runtime_resume(), the spinlock &dev->power.lock is acquired before rpm_resume() is called. Later, rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on mutexes, triggering the error.
To address this issue, devices on ACPI are now marked as not IRQ-safe, considering the dependency of acpi_subsys_runtime_resume() on mutexes.
Fixes: bd2fdedbf2ba ("i2c: tegra: Add the ACPI support") Cc: stable@vger.kernel.org # v5.17+ Co-developed-by: Michael van der Westhuizen rmikey@meta.com Signed-off-by: Michael van der Westhuizen rmikey@meta.com Signed-off-by: Breno Leitao leitao@debian.org Reviewed-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Andy Shevchenko andy@kernel.org Signed-off-by: Andi Shyti andi.shyti@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-tegra.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c index 3792531b7ab7f..f7b4977d66496 100644 --- a/drivers/i2c/busses/i2c-tegra.c +++ b/drivers/i2c/busses/i2c-tegra.c @@ -1832,9 +1832,9 @@ static int tegra_i2c_probe(struct platform_device *pdev) * domain. * * VI I2C device shouldn't be marked as IRQ-safe because VI I2C won't - * be used for atomic transfers. + * be used for atomic transfers. ACPI device is not IRQ safe also. */ - if (!IS_VI(i2c_dev)) + if (!IS_VI(i2c_dev) && !has_acpi_companion(i2c_dev->dev)) pm_runtime_irq_safe(i2c_dev->dev);
pm_runtime_enable(i2c_dev->dev);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mikulas Patocka mpatocka@redhat.com
[ Upstream commit 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 ]
This commit changes device mapper, so that it returns -ERESTARTSYS instead of -EINTR when it is interrupted by a signal (so that the ioctl can be restarted).
The manpage signal(7) says that the ioctl function should be restarted if the signal was handled with SA_RESTART.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/dm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 29270f6f272f6..ddd44a7f79dbf 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -2511,7 +2511,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta break;
if (signal_pending_state(task_state, current)) { - r = -EINTR; + r = -ERESTARTSYS; break; }
@@ -2536,7 +2536,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st break;
if (signal_pending_state(task_state, current)) { - r = -EINTR; + r = -ERESTARTSYS; break; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Long Li longli@microsoft.com
[ Upstream commit 58a63729c957621f1990c3494c702711188ca347 ]
After napi_complete_done() is called when NAPI is polling in the current process context, another NAPI may be scheduled and start running in softirq on another CPU and may ring the doorbell before the current CPU does. When combined with unnecessary rings when there is no need to arm the CQ, it triggers error paths in the hardware.
This patch fixes this by calling napi_complete_done() after doorbell rings. It limits the number of unnecessary rings when there is no need to arm. MANA hardware specifies that there must be one doorbell ring every 8 CQ wraparounds. This driver guarantees one doorbell ring as soon as the number of consumed CQEs exceeds 4 CQ wraparounds. In practical workloads, the 4 CQ wraparounds proves to be big enough that it rarely exceeds this limit before all the napi weight is consumed.
To implement this, add a per-CQ counter cq->work_done_since_doorbell, and make sure the CQ is armed as soon as passing 4 wraparounds of the CQ.
Cc: stable@vger.kernel.org Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ") Reviewed-by: Haiyang Zhang haiyangz@microsoft.com Signed-off-by: Long Li longli@microsoft.com Link: https://patch.msgid.link/1723219138-29887-1-git-send-email-longli@linuxonhyp... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/microsoft/mana/mana.h | 1 + drivers/net/ethernet/microsoft/mana/mana_en.c | 24 ++++++++++++------- 2 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana.h b/drivers/net/ethernet/microsoft/mana/mana.h index d58be64374c84..41c99eabf40a0 100644 --- a/drivers/net/ethernet/microsoft/mana/mana.h +++ b/drivers/net/ethernet/microsoft/mana/mana.h @@ -262,6 +262,7 @@ struct mana_cq { /* NAPI data */ struct napi_struct napi; int work_done; + int work_done_since_doorbell; int budget; };
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c index b751b03eddfb1..e7d1ce68f05e3 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -1311,7 +1311,6 @@ static void mana_poll_rx_cq(struct mana_cq *cq) static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue) { struct mana_cq *cq = context; - u8 arm_bit; int w;
WARN_ON_ONCE(cq->gdma_cq != gdma_queue); @@ -1322,16 +1321,23 @@ static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue) mana_poll_tx_cq(cq);
w = cq->work_done; - - if (w < cq->budget && - napi_complete_done(&cq->napi, w)) { - arm_bit = SET_ARM_BIT; - } else { - arm_bit = 0; + cq->work_done_since_doorbell += w; + + if (w < cq->budget) { + mana_gd_ring_cq(gdma_queue, SET_ARM_BIT); + cq->work_done_since_doorbell = 0; + napi_complete_done(&cq->napi, w); + } else if (cq->work_done_since_doorbell > + cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) { + /* MANA hardware requires at least one doorbell ring every 8 + * wraparounds of CQ even if there is no need to arm the CQ. + * This driver rings the doorbell as soon as we have exceeded + * 4 wraparounds. + */ + mana_gd_ring_cq(gdma_queue, 0); + cq->work_done_since_doorbell = 0; }
- mana_gd_ring_cq(gdma_queue, arm_bit); - return w; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Sterba dsterba@suse.com
[ Upstream commit 4e00422ee62663e31e611d7de4d2c4aa3f8555f2 ]
The block size stored in the super block is used by subsystems outside of btrfs and it's a copy of fs_info::sectorsize. Unify that to always use our sectorsize, with the exception of mount where we first need to use fixed values (4K) until we read the super block and can set the sectorsize.
Replace all uses, in most cases it's fewer pointer indirections.
Reviewed-by: Josef Bacik josef@toxicpanda.com Reviewed-by: Anand Jain anand.jain@oracle.com Signed-off-by: David Sterba dsterba@suse.com Stable-dep-of: 46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/disk-io.c | 2 ++ fs/btrfs/extent_io.c | 4 ++-- fs/btrfs/inode.c | 2 +- fs/btrfs/ioctl.c | 2 +- fs/btrfs/reflink.c | 6 +++--- fs/btrfs/send.c | 2 +- fs/btrfs/super.c | 2 +- 7 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c17232659942d..9ebb7bb37a22e 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3130,6 +3130,7 @@ static int init_mount_fs_info(struct btrfs_fs_info *fs_info, struct super_block int ret;
fs_info->sb = sb; + /* Temporary fixed values for block size until we read the superblock. */ sb->s_blocksize = BTRFS_BDEV_BLOCKSIZE; sb->s_blocksize_bits = blksize_bits(BTRFS_BDEV_BLOCKSIZE);
@@ -3628,6 +3629,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super); sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
+ /* Update the values for the current filesystem. */ sb->s_blocksize = sectorsize; sb->s_blocksize_bits = blksize_bits(sectorsize); memcpy(&sb->s_uuid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 5f923c9b773e0..72227c0b4b5a1 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1748,7 +1748,7 @@ static int btrfs_do_readpage(struct page *page, struct extent_map **em_cached, int ret = 0; size_t pg_offset = 0; size_t iosize; - size_t blocksize = inode->i_sb->s_blocksize; + size_t blocksize = fs_info->sectorsize; struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
ret = set_page_extent_mapped(page); @@ -3348,7 +3348,7 @@ int extent_invalidate_folio(struct extent_io_tree *tree, struct extent_state *cached_state = NULL; u64 start = folio_pos(folio); u64 end = start + folio_size(folio) - 1; - size_t blocksize = folio->mapping->host->i_sb->s_blocksize; + size_t blocksize = btrfs_sb(folio->mapping->host->i_sb)->sectorsize;
/* This function is only called for the btree inode */ ASSERT(tree->owner == IO_TREE_BTREE_INODE_IO); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index bd3388e1b532e..934e360d1aefa 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -9111,7 +9111,7 @@ static int btrfs_getattr(struct user_namespace *mnt_userns, u64 delalloc_bytes; u64 inode_bytes; struct inode *inode = d_inode(path->dentry); - u32 blocksize = inode->i_sb->s_blocksize; + u32 blocksize = btrfs_sb(inode->i_sb)->sectorsize; u32 bi_flags = BTRFS_I(inode)->flags; u32 bi_ro_flags = BTRFS_I(inode)->ro_flags;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 64b37afb7c87f..31f7fe31b607a 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -517,7 +517,7 @@ static noinline int btrfs_ioctl_fitrim(struct btrfs_fs_info *fs_info, * block group is in the logical address space, which can be any * sectorsize aligned bytenr in the range [0, U64_MAX]. */ - if (range.len < fs_info->sb->s_blocksize) + if (range.len < fs_info->sectorsize) return -EINVAL;
range.minlen = max(range.minlen, minlen); diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c index f50586ff85c84..fc6e428525781 100644 --- a/fs/btrfs/reflink.c +++ b/fs/btrfs/reflink.c @@ -659,7 +659,7 @@ static int btrfs_extent_same_range(struct inode *src, u64 loff, u64 len, struct inode *dst, u64 dst_loff) { struct btrfs_fs_info *fs_info = BTRFS_I(src)->root->fs_info; - const u64 bs = fs_info->sb->s_blocksize; + const u64 bs = fs_info->sectorsize; int ret;
/* @@ -726,7 +726,7 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src, int ret; int wb_ret; u64 len = olen; - u64 bs = fs_info->sb->s_blocksize; + u64 bs = fs_info->sectorsize;
/* * VFS's generic_remap_file_range_prep() protects us from cloning the @@ -792,7 +792,7 @@ static int btrfs_remap_file_range_prep(struct file *file_in, loff_t pos_in, { struct inode *inode_in = file_inode(file_in); struct inode *inode_out = file_inode(file_out); - u64 bs = BTRFS_I(inode_out)->root->fs_info->sb->s_blocksize; + u64 bs = BTRFS_I(inode_out)->root->fs_info->sectorsize; u64 wb_len; int ret;
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index cc57a97860d8a..e311cc291fe45 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -5910,7 +5910,7 @@ static int send_write_or_clone(struct send_ctx *sctx, int ret = 0; u64 offset = key->offset; u64 end; - u64 bs = sctx->send_root->fs_info->sb->s_blocksize; + u64 bs = sctx->send_root->fs_info->sectorsize;
end = min_t(u64, btrfs_file_extent_end(path), sctx->cur_inode_size); if (offset >= end) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 6fc5fa18d1ee6..d063379a031dc 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -2426,7 +2426,7 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf) buf->f_bavail = 0;
buf->f_type = BTRFS_SUPER_MAGIC; - buf->f_bsize = dentry->d_sb->s_blocksize; + buf->f_bsize = fs_info->sectorsize; buf->f_namelen = BTRFS_NAME_LEN;
/* We treat it as constant endianness (it doesn't matter _which_)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea ]
If we a find that an extent is shared but its end offset is not sector size aligned, then we don't clone it and issue write operations instead. This is because the reflink (remap_file_range) operation does not allow to clone unaligned ranges, except if the end offset of the range matches the i_size of the source and destination files (and the start offset is sector size aligned).
While this is not incorrect because send can only guarantee that a file has the same data in the source and destination snapshots, it's not optimal and generates confusion and surprising behaviour for users.
For example, running this test:
$ cat test.sh #!/bin/bash
DEV=/dev/sdi MNT=/mnt/sdi
mkfs.btrfs -f $DEV mount $DEV $MNT
# Use a file size not aligned to any possible sector size. file_size=$((1 * 1024 * 1024 + 5)) # 1MB + 5 bytes dd if=/dev/random of=$MNT/foo bs=$file_size count=1 cp --reflink=always $MNT/foo $MNT/bar
btrfs subvolume snapshot -r $MNT/ $MNT/snap rm -f /tmp/send-test btrfs send -f /tmp/send-test $MNT/snap
umount $MNT mkfs.btrfs -f $DEV mount $DEV $MNT
btrfs receive -vv -f /tmp/send-test $MNT
xfs_io -r -c "fiemap -v" $MNT/snap/bar
umount $MNT
Gives the following result:
(...) mkfile o258-7-0 rename o258-7-0 -> bar write bar - offset=0 length=49152 write bar - offset=49152 length=49152 write bar - offset=98304 length=49152 write bar - offset=147456 length=49152 write bar - offset=196608 length=49152 write bar - offset=245760 length=49152 write bar - offset=294912 length=49152 write bar - offset=344064 length=49152 write bar - offset=393216 length=49152 write bar - offset=442368 length=49152 write bar - offset=491520 length=49152 write bar - offset=540672 length=49152 write bar - offset=589824 length=49152 write bar - offset=638976 length=49152 write bar - offset=688128 length=49152 write bar - offset=737280 length=49152 write bar - offset=786432 length=49152 write bar - offset=835584 length=49152 write bar - offset=884736 length=49152 write bar - offset=933888 length=49152 write bar - offset=983040 length=49152 write bar - offset=1032192 length=16389 chown bar - uid=0, gid=0 chmod bar - mode=0644 utimes bar utimes BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=06d640da-9ca1-604c-b87c-3375175a8eb3, stransid=7 /mnt/sdi/snap/bar: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2055]: 26624..28679 2056 0x1
There's no clone operation to clone extents from the file foo into file bar and fiemap confirms there's no shared flag (0x2000).
So update send_write_or_clone() so that it proceeds with cloning if the source and destination ranges end at the i_size of the respective files.
After this changes the result of the test is:
(...) mkfile o258-7-0 rename o258-7-0 -> bar clone bar - source=foo source offset=0 offset=0 length=1048581 chown bar - uid=0, gid=0 chmod bar - mode=0644 utimes bar utimes BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=582420f3-ea7d-564e-bbe5-ce440d622190, stransid=7 /mnt/sdi/snap/bar: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2055]: 26624..28679 2056 0x2001
A test case for fstests will also follow up soon.
Link: https://github.com/kdave/btrfs-progs/issues/572#issuecomment-2282841416 CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/send.c | 52 ++++++++++++++++++++++++++++++++++++------------- 1 file changed, 39 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index e311cc291fe45..030edc1a9591b 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -5911,25 +5911,51 @@ static int send_write_or_clone(struct send_ctx *sctx, u64 offset = key->offset; u64 end; u64 bs = sctx->send_root->fs_info->sectorsize; + struct btrfs_file_extent_item *ei; + u64 disk_byte; + u64 data_offset; + u64 num_bytes; + struct btrfs_inode_info info = { 0 };
end = min_t(u64, btrfs_file_extent_end(path), sctx->cur_inode_size); if (offset >= end) return 0;
- if (clone_root && IS_ALIGNED(end, bs)) { - struct btrfs_file_extent_item *ei; - u64 disk_byte; - u64 data_offset; + num_bytes = end - offset;
- ei = btrfs_item_ptr(path->nodes[0], path->slots[0], - struct btrfs_file_extent_item); - disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei); - data_offset = btrfs_file_extent_offset(path->nodes[0], ei); - ret = clone_range(sctx, path, clone_root, disk_byte, - data_offset, offset, end - offset); - } else { - ret = send_extent_data(sctx, path, offset, end - offset); - } + if (!clone_root) + goto write_data; + + if (IS_ALIGNED(end, bs)) + goto clone_data; + + /* + * If the extent end is not aligned, we can clone if the extent ends at + * the i_size of the inode and the clone range ends at the i_size of the + * source inode, otherwise the clone operation fails with -EINVAL. + */ + if (end != sctx->cur_inode_size) + goto write_data; + + ret = get_inode_info(clone_root->root, clone_root->ino, &info); + if (ret < 0) + return ret; + + if (clone_root->offset + num_bytes == info.size) + goto clone_data; + +write_data: + ret = send_extent_data(sctx, path, offset, num_bytes); + sctx->cur_inode_next_write_offset = end; + return ret; + +clone_data: + ei = btrfs_item_ptr(path->nodes[0], path->slots[0], + struct btrfs_file_extent_item); + disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei); + data_offset = btrfs_file_extent_offset(path->nodes[0], ei); + ret = clone_range(sctx, path, clone_root, disk_byte, data_offset, offset, + num_bytes); sctx->cur_inode_next_write_offset = end; return ret; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rodrigo Siqueira Rodrigo.Siqueira@amd.com
[ Upstream commit 56fb276d0244d430496f249335a44ae114dd5f54 ]
[why & how] When the commit 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror") was introduced, it used the wrong calculation for the position copy for X. This commit uses the correct calculation for that based on the original patch.
Fixes: 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror") Cc: Mario Limonciello mario.limonciello@amd.com Cc: Alex Deucher alexander.deucher@amd.com Acked-by: Wayne Lin wayne.lin@amd.com Signed-off-by: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Signed-off-by: Tom Chung chiahsuan.chung@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 8f9b23abbae5ffcd64856facd26a86b67195bc2f) Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c index d6c5d48c878ec..416168c7dcc52 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c @@ -3633,7 +3633,7 @@ void dcn10_set_cursor_position(struct pipe_ctx *pipe_ctx) (int)hubp->curs_attr.width || pos_cpy.x <= (int)hubp->curs_attr.width + pipe_ctx->plane_state->src_rect.x) { - pos_cpy.x = 2 * viewport_width - temp_x; + pos_cpy.x = temp_x + viewport_width; } } } else {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maximilian Luz luzmaximilian@gmail.com
[ Upstream commit bc923d594db21bee0ead128eb4bb78f7e77467a4 ]
There is a small window in ssam_serial_hub_probe() where the controller is initialized but has not been started yet. Specifically, between ssam_controller_init() and ssam_controller_start(). Any failure in this window, for example caused by a failure of serdev_device_open(), currently results in an incorrect warning being emitted.
In particular, any failure in this window results in the controller being destroyed via ssam_controller_destroy(). This function checks the state of the controller and, in an attempt to validate that the controller has been cleanly shut down before we try and deallocate any resources, emits a warning if that state is not SSAM_CONTROLLER_STOPPED.
However, since we have only just initialized the controller and have not yet started it, its state is SSAM_CONTROLLER_INITIALIZED. Note that this is the only point at which the controller has this state, as it will change after we start the controller with ssam_controller_start() and never revert back. Further, at this point no communication has taken place and the sender and receiver threads have not been started yet (and we may not even have an open serdev device either).
Therefore, it is perfectly safe to call ssam_controller_destroy() with a state of SSAM_CONTROLLER_INITIALIZED. This, however, means that the warning currently being emitted is incorrect. Fix it by extending the check.
Fixes: c167b9c7e3d6 ("platform/surface: Add Surface Aggregator subsystem") Signed-off-by: Maximilian Luz luzmaximilian@gmail.com Link: https://lore.kernel.org/r/20240811124645.246016-1-luzmaximilian@gmail.com Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/surface/aggregator/controller.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/platform/surface/aggregator/controller.c b/drivers/platform/surface/aggregator/controller.c index 30cea324ff95f..a6a8f4b1836a6 100644 --- a/drivers/platform/surface/aggregator/controller.c +++ b/drivers/platform/surface/aggregator/controller.c @@ -1354,7 +1354,8 @@ void ssam_controller_destroy(struct ssam_controller *ctrl) if (ctrl->state == SSAM_CONTROLLER_UNINITIALIZED) return;
- WARN_ON(ctrl->state != SSAM_CONTROLLER_STOPPED); + WARN_ON(ctrl->state != SSAM_CONTROLLER_STOPPED && + ctrl->state != SSAM_CONTROLLER_INITIALIZED);
/* * Note: New events could still have been received after the previous
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lang Yu Lang.Yu@amd.com
[ Upstream commit 0c93bd49576677ae1a18817d5ec000ef031d5187 ]
Fix a warning.
v2: Avoid unmapping attachment repeatedly when ERESTARTSYS.
v3: Lock the BO before accessing ttm->sg to avoid race conditions.(Felix)
[ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.708989] Call Trace: [ 41.708992] <TASK> [ 41.708996] ? show_regs+0x6c/0x80 [ 41.709000] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709008] ? __warn+0x93/0x190 [ 41.709014] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709024] ? report_bug+0x1f9/0x210 [ 41.709035] ? handle_bug+0x46/0x80 [ 41.709041] ? exc_invalid_op+0x1d/0x80 [ 41.709048] ? asm_exc_invalid_op+0x1f/0x30 [ 41.709057] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709185] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709197] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709337] ? srso_alias_return_thunk+0x5/0x7f [ 41.709346] kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu] [ 41.709467] amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu] [ 41.709586] kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu] [ 41.709710] kfd_ioctl+0x1ec/0x650 [amdgpu] [ 41.709822] ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu] [ 41.709945] ? srso_alias_return_thunk+0x5/0x7f [ 41.709949] ? tomoyo_file_ioctl+0x20/0x30 [ 41.709959] __x64_sys_ioctl+0x9c/0xd0 [ 41.709967] do_syscall_64+0x3f/0x90 [ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Fixes: 101b8104307e ("drm/amdkfd: Move dma unmapping after TLB flush") Signed-off-by: Lang Yu Lang.Yu@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 20 ++++++++++++++++--- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 +++- 3 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 585d608c10e8e..4b694886715cf 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -286,7 +286,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(struct amdgpu_device *adev, struct kgd_mem *mem, void *drm_priv); int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu( struct amdgpu_device *adev, struct kgd_mem *mem, void *drm_priv); -void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv); +int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv); int amdgpu_amdkfd_gpuvm_sync_memory( struct amdgpu_device *adev, struct kgd_mem *mem, bool intr); int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_mem *mem, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 3e7f4d8dc9d13..d486f5dc052e4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -2025,21 +2025,35 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu( return ret; }
-void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv) +int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv) { struct kfd_mem_attachment *entry; struct amdgpu_vm *vm; + int ret;
vm = drm_priv_to_vm(drm_priv);
mutex_lock(&mem->lock);
+ ret = amdgpu_bo_reserve(mem->bo, true); + if (ret) + goto out; + list_for_each_entry(entry, &mem->attachments, list) { - if (entry->bo_va->base.vm == vm) - kfd_mem_dmaunmap_attachment(mem, entry); + if (entry->bo_va->base.vm != vm) + continue; + if (entry->bo_va->base.bo->tbo.ttm && + !entry->bo_va->base.bo->tbo.ttm->sg) + continue; + + kfd_mem_dmaunmap_attachment(mem, entry); }
+ amdgpu_bo_unreserve(mem->bo); +out: mutex_unlock(&mem->lock); + + return ret; }
int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu( diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 2b21ce967e766..e3cd66c4d95d8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -1410,7 +1410,9 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep, kfd_flush_tlb(peer_pdd, TLB_FLUSH_HEAVYWEIGHT);
/* Remove dma mapping after tlb flush to avoid IO_PAGE_FAULT */ - amdgpu_amdkfd_gpuvm_dmaunmap_mem(mem, peer_pdd->drm_priv); + err = amdgpu_amdkfd_gpuvm_dmaunmap_mem(mem, peer_pdd->drm_priv); + if (err) + goto sync_memory_failed; }
mutex_unlock(&p->mutex);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 932021a11805b9da4bd6abf66fe233cccd59fe0e ]
Function hci_sched_le needs to update the respective counter variable inplace other the likes of hci_quote_sent would attempt to use the possible outdated value of conn->{le_cnt,acl_cnt}.
Link: https://github.com/bluez/bluez/issues/915 Fixes: 73d80deb7bdf ("Bluetooth: prioritizing data over HCI") Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_core.c | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-)
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index cf164ec9899c3..210e03a3609d4 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -3744,19 +3744,19 @@ static void hci_sched_le(struct hci_dev *hdev) { struct hci_chan *chan; struct sk_buff *skb; - int quote, cnt, tmp; + int quote, *cnt, tmp;
BT_DBG("%s", hdev->name);
if (!hci_conn_num(hdev, LE_LINK)) return;
- cnt = hdev->le_pkts ? hdev->le_cnt : hdev->acl_cnt; + cnt = hdev->le_pkts ? &hdev->le_cnt : &hdev->acl_cnt;
- __check_timeout(hdev, cnt, LE_LINK); + __check_timeout(hdev, *cnt, LE_LINK);
- tmp = cnt; - while (cnt && (chan = hci_chan_sent(hdev, LE_LINK, "e))) { + tmp = *cnt; + while (*cnt && (chan = hci_chan_sent(hdev, LE_LINK, "e))) { u32 priority = (skb_peek(&chan->data_q))->priority; while (quote-- && (skb = skb_peek(&chan->data_q))) { BT_DBG("chan %p skb %p len %d priority %u", chan, skb, @@ -3771,7 +3771,7 @@ static void hci_sched_le(struct hci_dev *hdev) hci_send_frame(hdev, skb); hdev->le_last_tx = jiffies;
- cnt--; + (*cnt)--; chan->sent++; chan->conn->sent++;
@@ -3781,12 +3781,7 @@ static void hci_sched_le(struct hci_dev *hdev) } }
- if (hdev->le_pkts) - hdev->le_cnt = cnt; - else - hdev->acl_cnt = cnt; - - if (cnt != tmp) + if (*cnt != tmp) hci_prio_recalculate(hdev, LE_LINK); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 28cd47f75185c4818b0fb1b46f2f02faaba96376 ]
SMP initiator role shall be considered the one that initiates the pairing procedure with SMP_CMD_PAIRING_REQ:
BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part H page 1557:
Figure 2.1: LE pairing phases
Note that by sending SMP_CMD_SECURITY_REQ it doesn't change the role to be Initiator.
Link: https://github.com/bluez/bluez/issues/567 Fixes: b28b4943660f ("Bluetooth: Add strict checks for allowed SMP PDUs") Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/smp.c | 144 ++++++++++++++++++++++---------------------- 1 file changed, 72 insertions(+), 72 deletions(-)
diff --git a/net/bluetooth/smp.c b/net/bluetooth/smp.c index ecb005bce65ac..d444fa1bd9f97 100644 --- a/net/bluetooth/smp.c +++ b/net/bluetooth/smp.c @@ -913,7 +913,7 @@ static int tk_request(struct l2cap_conn *conn, u8 remote_oob, u8 auth, * Confirms and the responder Enters the passkey. */ if (smp->method == OVERLAP) { - if (hcon->role == HCI_ROLE_MASTER) + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) smp->method = CFM_PASSKEY; else smp->method = REQ_PASSKEY; @@ -963,7 +963,7 @@ static u8 smp_confirm(struct smp_chan *smp)
smp_send_cmd(smp->conn, SMP_CMD_PAIRING_CONFIRM, sizeof(cp), &cp);
- if (conn->hcon->out) + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_CONFIRM); else SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RANDOM); @@ -979,7 +979,8 @@ static u8 smp_random(struct smp_chan *smp) int ret;
bt_dev_dbg(conn->hcon->hdev, "conn %p %s", conn, - conn->hcon->out ? "initiator" : "responder"); + test_bit(SMP_FLAG_INITIATOR, &smp->flags) ? "initiator" : + "responder");
ret = smp_c1(smp->tk, smp->rrnd, smp->preq, smp->prsp, hcon->init_addr_type, &hcon->init_addr, @@ -993,7 +994,7 @@ static u8 smp_random(struct smp_chan *smp) return SMP_CONFIRM_FAILED; }
- if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { u8 stk[16]; __le64 rand = 0; __le16 ediv = 0; @@ -1250,14 +1251,15 @@ static void smp_distribute_keys(struct smp_chan *smp) rsp = (void *) &smp->prsp[1];
/* The responder sends its keys first */ - if (hcon->out && (smp->remote_key_dist & KEY_DIST_MASK)) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags) && + (smp->remote_key_dist & KEY_DIST_MASK)) { smp_allow_key_dist(smp); return; }
req = (void *) &smp->preq[1];
- if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { keydist = &rsp->init_key_dist; *keydist &= req->init_key_dist; } else { @@ -1426,7 +1428,7 @@ static int sc_mackey_and_ltk(struct smp_chan *smp, u8 mackey[16], u8 ltk[16]) struct hci_conn *hcon = smp->conn->hcon; u8 *na, *nb, a[7], b[7];
- if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { na = smp->prnd; nb = smp->rrnd; } else { @@ -1454,7 +1456,7 @@ static void sc_dhkey_check(struct smp_chan *smp) a[6] = hcon->init_addr_type; b[6] = hcon->resp_addr_type;
- if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { local_addr = a; remote_addr = b; memcpy(io_cap, &smp->preq[1], 3); @@ -1533,7 +1535,7 @@ static u8 sc_passkey_round(struct smp_chan *smp, u8 smp_op) /* The round is only complete when the initiator * receives pairing random. */ - if (!hcon->out) { + if (!test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { smp_send_cmd(conn, SMP_CMD_PAIRING_RANDOM, sizeof(smp->prnd), smp->prnd); if (smp->passkey_round == 20) @@ -1561,7 +1563,7 @@ static u8 sc_passkey_round(struct smp_chan *smp, u8 smp_op)
SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RANDOM);
- if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { smp_send_cmd(conn, SMP_CMD_PAIRING_RANDOM, sizeof(smp->prnd), smp->prnd); return 0; @@ -1572,7 +1574,7 @@ static u8 sc_passkey_round(struct smp_chan *smp, u8 smp_op) case SMP_CMD_PUBLIC_KEY: default: /* Initiating device starts the round */ - if (!hcon->out) + if (!test_bit(SMP_FLAG_INITIATOR, &smp->flags)) return 0;
bt_dev_dbg(hdev, "Starting passkey round %u", @@ -1617,7 +1619,7 @@ static int sc_user_reply(struct smp_chan *smp, u16 mgmt_op, __le32 passkey) }
/* Initiator sends DHKey check first */ - if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { sc_dhkey_check(smp); SMP_ALLOW_CMD(smp, SMP_CMD_DHKEY_CHECK); } else if (test_and_clear_bit(SMP_FLAG_DHKEY_PENDING, &smp->flags)) { @@ -1740,7 +1742,7 @@ static u8 smp_cmd_pairing_req(struct l2cap_conn *conn, struct sk_buff *skb) struct smp_cmd_pairing rsp, *req = (void *) skb->data; struct l2cap_chan *chan = conn->smp; struct hci_dev *hdev = conn->hcon->hdev; - struct smp_chan *smp; + struct smp_chan *smp = chan->data; u8 key_size, auth, sec_level; int ret;
@@ -1749,16 +1751,14 @@ static u8 smp_cmd_pairing_req(struct l2cap_conn *conn, struct sk_buff *skb) if (skb->len < sizeof(*req)) return SMP_INVALID_PARAMS;
- if (conn->hcon->role != HCI_ROLE_SLAVE) + if (smp && test_bit(SMP_FLAG_INITIATOR, &smp->flags)) return SMP_CMD_NOTSUPP;
- if (!chan->data) + if (!smp) { smp = smp_chan_create(conn); - else - smp = chan->data; - - if (!smp) - return SMP_UNSPECIFIED; + if (!smp) + return SMP_UNSPECIFIED; + }
/* We didn't start the pairing, so match remote */ auth = req->auth_req & AUTH_REQ_MASK(hdev); @@ -1940,7 +1940,7 @@ static u8 smp_cmd_pairing_rsp(struct l2cap_conn *conn, struct sk_buff *skb) if (skb->len < sizeof(*rsp)) return SMP_INVALID_PARAMS;
- if (conn->hcon->role != HCI_ROLE_MASTER) + if (!test_bit(SMP_FLAG_INITIATOR, &smp->flags)) return SMP_CMD_NOTSUPP;
skb_pull(skb, sizeof(*rsp)); @@ -2035,7 +2035,7 @@ static u8 sc_check_confirm(struct smp_chan *smp) if (smp->method == REQ_PASSKEY || smp->method == DSP_PASSKEY) return sc_passkey_round(smp, SMP_CMD_PAIRING_CONFIRM);
- if (conn->hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { smp_send_cmd(conn, SMP_CMD_PAIRING_RANDOM, sizeof(smp->prnd), smp->prnd); SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RANDOM); @@ -2057,7 +2057,7 @@ static int fixup_sc_false_positive(struct smp_chan *smp) u8 auth;
/* The issue is only observed when we're in responder role */ - if (hcon->out) + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) return SMP_UNSPECIFIED;
if (hci_dev_test_flag(hdev, HCI_SC_ONLY)) { @@ -2093,7 +2093,8 @@ static u8 smp_cmd_pairing_confirm(struct l2cap_conn *conn, struct sk_buff *skb) struct hci_dev *hdev = hcon->hdev;
bt_dev_dbg(hdev, "conn %p %s", conn, - hcon->out ? "initiator" : "responder"); + test_bit(SMP_FLAG_INITIATOR, &smp->flags) ? "initiator" : + "responder");
if (skb->len < sizeof(smp->pcnf)) return SMP_INVALID_PARAMS; @@ -2115,7 +2116,7 @@ static u8 smp_cmd_pairing_confirm(struct l2cap_conn *conn, struct sk_buff *skb) return ret; }
- if (conn->hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { smp_send_cmd(conn, SMP_CMD_PAIRING_RANDOM, sizeof(smp->prnd), smp->prnd); SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RANDOM); @@ -2150,7 +2151,7 @@ static u8 smp_cmd_pairing_random(struct l2cap_conn *conn, struct sk_buff *skb) if (!test_bit(SMP_FLAG_SC, &smp->flags)) return smp_random(smp);
- if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { pkax = smp->local_pk; pkbx = smp->remote_pk; na = smp->prnd; @@ -2163,7 +2164,7 @@ static u8 smp_cmd_pairing_random(struct l2cap_conn *conn, struct sk_buff *skb) }
if (smp->method == REQ_OOB) { - if (!hcon->out) + if (!test_bit(SMP_FLAG_INITIATOR, &smp->flags)) smp_send_cmd(conn, SMP_CMD_PAIRING_RANDOM, sizeof(smp->prnd), smp->prnd); SMP_ALLOW_CMD(smp, SMP_CMD_DHKEY_CHECK); @@ -2174,7 +2175,7 @@ static u8 smp_cmd_pairing_random(struct l2cap_conn *conn, struct sk_buff *skb) if (smp->method == REQ_PASSKEY || smp->method == DSP_PASSKEY) return sc_passkey_round(smp, SMP_CMD_PAIRING_RANDOM);
- if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { u8 cfm[16];
err = smp_f4(smp->tfm_cmac, smp->remote_pk, smp->local_pk, @@ -2215,7 +2216,7 @@ static u8 smp_cmd_pairing_random(struct l2cap_conn *conn, struct sk_buff *skb) return SMP_UNSPECIFIED;
if (smp->method == REQ_OOB) { - if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { sc_dhkey_check(smp); SMP_ALLOW_CMD(smp, SMP_CMD_DHKEY_CHECK); } @@ -2289,10 +2290,27 @@ bool smp_sufficient_security(struct hci_conn *hcon, u8 sec_level, return false; }
+static void smp_send_pairing_req(struct smp_chan *smp, __u8 auth) +{ + struct smp_cmd_pairing cp; + + if (smp->conn->hcon->type == ACL_LINK) + build_bredr_pairing_cmd(smp, &cp, NULL); + else + build_pairing_cmd(smp->conn, &cp, NULL, auth); + + smp->preq[0] = SMP_CMD_PAIRING_REQ; + memcpy(&smp->preq[1], &cp, sizeof(cp)); + + smp_send_cmd(smp->conn, SMP_CMD_PAIRING_REQ, sizeof(cp), &cp); + SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RSP); + + set_bit(SMP_FLAG_INITIATOR, &smp->flags); +} + static u8 smp_cmd_security_req(struct l2cap_conn *conn, struct sk_buff *skb) { struct smp_cmd_security_req *rp = (void *) skb->data; - struct smp_cmd_pairing cp; struct hci_conn *hcon = conn->hcon; struct hci_dev *hdev = hcon->hdev; struct smp_chan *smp; @@ -2341,16 +2359,20 @@ static u8 smp_cmd_security_req(struct l2cap_conn *conn, struct sk_buff *skb)
skb_pull(skb, sizeof(*rp));
- memset(&cp, 0, sizeof(cp)); - build_pairing_cmd(conn, &cp, NULL, auth); + smp_send_pairing_req(smp, auth);
- smp->preq[0] = SMP_CMD_PAIRING_REQ; - memcpy(&smp->preq[1], &cp, sizeof(cp)); + return 0; +}
- smp_send_cmd(conn, SMP_CMD_PAIRING_REQ, sizeof(cp), &cp); - SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RSP); +static void smp_send_security_req(struct smp_chan *smp, __u8 auth) +{ + struct smp_cmd_security_req cp;
- return 0; + cp.auth_req = auth; + smp_send_cmd(smp->conn, SMP_CMD_SECURITY_REQ, sizeof(cp), &cp); + SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_REQ); + + clear_bit(SMP_FLAG_INITIATOR, &smp->flags); }
int smp_conn_security(struct hci_conn *hcon, __u8 sec_level) @@ -2421,23 +2443,11 @@ int smp_conn_security(struct hci_conn *hcon, __u8 sec_level) authreq |= SMP_AUTH_MITM; }
- if (hcon->role == HCI_ROLE_MASTER) { - struct smp_cmd_pairing cp; - - build_pairing_cmd(conn, &cp, NULL, authreq); - smp->preq[0] = SMP_CMD_PAIRING_REQ; - memcpy(&smp->preq[1], &cp, sizeof(cp)); - - smp_send_cmd(conn, SMP_CMD_PAIRING_REQ, sizeof(cp), &cp); - SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RSP); - } else { - struct smp_cmd_security_req cp; - cp.auth_req = authreq; - smp_send_cmd(conn, SMP_CMD_SECURITY_REQ, sizeof(cp), &cp); - SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_REQ); - } + if (hcon->role == HCI_ROLE_MASTER) + smp_send_pairing_req(smp, authreq); + else + smp_send_security_req(smp, authreq);
- set_bit(SMP_FLAG_INITIATOR, &smp->flags); ret = 0;
unlock: @@ -2688,8 +2698,6 @@ static int smp_cmd_sign_info(struct l2cap_conn *conn, struct sk_buff *skb)
static u8 sc_select_method(struct smp_chan *smp) { - struct l2cap_conn *conn = smp->conn; - struct hci_conn *hcon = conn->hcon; struct smp_cmd_pairing *local, *remote; u8 local_mitm, remote_mitm, local_io, remote_io, method;
@@ -2702,7 +2710,7 @@ static u8 sc_select_method(struct smp_chan *smp) * the "struct smp_cmd_pairing" from them we need to skip the * first byte which contains the opcode. */ - if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { local = (void *) &smp->preq[1]; remote = (void *) &smp->prsp[1]; } else { @@ -2771,7 +2779,7 @@ static int smp_cmd_public_key(struct l2cap_conn *conn, struct sk_buff *skb) /* Non-initiating device sends its public key after receiving * the key from the initiating device. */ - if (!hcon->out) { + if (!test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { err = sc_send_public_key(smp); if (err) return err; @@ -2833,7 +2841,7 @@ static int smp_cmd_public_key(struct l2cap_conn *conn, struct sk_buff *skb) }
if (smp->method == REQ_OOB) { - if (hcon->out) + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) smp_send_cmd(conn, SMP_CMD_PAIRING_RANDOM, sizeof(smp->prnd), smp->prnd);
@@ -2842,7 +2850,7 @@ static int smp_cmd_public_key(struct l2cap_conn *conn, struct sk_buff *skb) return 0; }
- if (hcon->out) + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_CONFIRM);
if (smp->method == REQ_PASSKEY) { @@ -2857,7 +2865,7 @@ static int smp_cmd_public_key(struct l2cap_conn *conn, struct sk_buff *skb) /* The Initiating device waits for the non-initiating device to * send the confirm value. */ - if (conn->hcon->out) + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) return 0;
err = smp_f4(smp->tfm_cmac, smp->local_pk, smp->remote_pk, smp->prnd, @@ -2891,7 +2899,7 @@ static int smp_cmd_dhkey_check(struct l2cap_conn *conn, struct sk_buff *skb) a[6] = hcon->init_addr_type; b[6] = hcon->resp_addr_type;
- if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { local_addr = a; remote_addr = b; memcpy(io_cap, &smp->prsp[1], 3); @@ -2916,7 +2924,7 @@ static int smp_cmd_dhkey_check(struct l2cap_conn *conn, struct sk_buff *skb) if (crypto_memneq(check->e, e, 16)) return SMP_DHKEY_CHECK_FAILED;
- if (!hcon->out) { + if (!test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { if (test_bit(SMP_FLAG_WAIT_USER, &smp->flags)) { set_bit(SMP_FLAG_DHKEY_PENDING, &smp->flags); return 0; @@ -2928,7 +2936,7 @@ static int smp_cmd_dhkey_check(struct l2cap_conn *conn, struct sk_buff *skb)
sc_add_ltk(smp);
- if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { hci_le_start_enc(hcon, 0, 0, smp->tk, smp->enc_key_size); hcon->enc_key_size = smp->enc_key_size; } @@ -3077,7 +3085,6 @@ static void bredr_pairing(struct l2cap_chan *chan) struct l2cap_conn *conn = chan->conn; struct hci_conn *hcon = conn->hcon; struct hci_dev *hdev = hcon->hdev; - struct smp_cmd_pairing req; struct smp_chan *smp;
bt_dev_dbg(hdev, "chan %p", chan); @@ -3129,14 +3136,7 @@ static void bredr_pairing(struct l2cap_chan *chan)
bt_dev_dbg(hdev, "starting SMP over BR/EDR");
- /* Prepare and send the BR/EDR SMP Pairing Request */ - build_bredr_pairing_cmd(smp, &req, NULL); - - smp->preq[0] = SMP_CMD_PAIRING_REQ; - memcpy(&smp->preq[1], &req, sizeof(req)); - - smp_send_cmd(conn, SMP_CMD_PAIRING_REQ, sizeof(req), &req); - SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RSP); + smp_send_pairing_req(smp, 0x00); }
static void smp_resume_cb(struct l2cap_chan *chan)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit eabb1494c9f20362ae53a9991481a1523be4f4b7 ]
skb_mac_header() will no longer be available in the TX path when reverting commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()"). As preparation for that, let's use skb_vlan_eth_hdr() to get to the VLAN header instead, which assumes it's located at skb->data (assumption which holds true here).
Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: Simon Horman simon.horman@corigine.com Reviewed-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: 67c3ca2c5cfe ("net: mscc: ocelot: use ocelot_xmit_get_vlan_info() also for FDMA and register injection") Signed-off-by: Sasha Levin sashal@kernel.org --- net/dsa/tag_ocelot.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c index 0d81f172b7a6e..afca3cdf190a0 100644 --- a/net/dsa/tag_ocelot.c +++ b/net/dsa/tag_ocelot.c @@ -22,7 +22,7 @@ static void ocelot_xmit_get_vlan_info(struct sk_buff *skb, struct dsa_port *dp, return; }
- hdr = (struct vlan_ethhdr *)skb_mac_header(skb); + hdr = skb_vlan_eth_hdr(skb); br_vlan_get_proto(br, &proto);
if (ntohs(hdr->h_vlan_proto) == proto) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 0bcf2e4aca6c29a07555b713f2fb461dc38d5977 ]
ocelot_xmit_get_vlan_info() calls __skb_vlan_pop() as the most appropriate helper I could find which strips away a VLAN header. That's all I need it to do, but __skb_vlan_pop() has more logic, which will become incompatible with the future revert of commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()").
Namely, it performs a sanity check on skb_mac_header(), which will stop being set after the above revert, so it will return an error instead of removing the VLAN tag.
ocelot_xmit_get_vlan_info() gets called in 2 circumstances:
(1) the port is under a VLAN-aware bridge and the bridge sends VLAN-tagged packets
(2) the port is under a VLAN-aware bridge and somebody else (an 8021q upper) sends VLAN-tagged packets (using a VID that isn't in the bridge vlan tables)
In case (1), there is actually no bug to defend against, because br_dev_xmit() calls skb_reset_mac_header() and things continue to work.
However, in case (2), illustrated using the commands below, it can be seen that our intervention is needed, since __skb_vlan_pop() complains:
$ ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up $ ip link set $eth master br0 && ip link set $eth up $ ip link add link $eth name $eth.100 type vlan id 100 && ip link set $eth.100 up $ ip addr add 192.168.100.1/24 dev $eth.100
I could fend off the checks in __skb_vlan_pop() with some skb_mac_header_was_set() calls, but seeing how few callers of __skb_vlan_pop() there are from TX paths, that seems rather unproductive.
As an alternative solution, extract the bare minimum logic to strip a VLAN header, and move it to a new helper named vlan_remove_tag(), close to the definition of vlan_insert_tag(). Document it appropriately and make ocelot_xmit_get_vlan_info() call this smaller helper instead.
Seeing that it doesn't appear illegal to test skb->protocol in the TX path, I guess it would be a good for vlan_remove_tag() to also absorb the vlan_set_encap_proto() function call.
Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Reviewed-by: Simon Horman simon.horman@corigine.com Reviewed-by: Florian Fainelli f.fainelli@gmail.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: 67c3ca2c5cfe ("net: mscc: ocelot: use ocelot_xmit_get_vlan_info() also for FDMA and register injection") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/if_vlan.h | 21 +++++++++++++++++++++ net/core/skbuff.c | 8 +------- net/dsa/tag_ocelot.c | 2 +- 3 files changed, 23 insertions(+), 8 deletions(-)
diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h index e0d0a645be7cf..83266201746c1 100644 --- a/include/linux/if_vlan.h +++ b/include/linux/if_vlan.h @@ -704,6 +704,27 @@ static inline void vlan_set_encap_proto(struct sk_buff *skb, skb->protocol = htons(ETH_P_802_2); }
+/** + * vlan_remove_tag - remove outer VLAN tag from payload + * @skb: skbuff to remove tag from + * @vlan_tci: buffer to store value + * + * Expects the skb to contain a VLAN tag in the payload, and to have skb->data + * pointing at the MAC header. + * + * Returns a new pointer to skb->data, or NULL on failure to pull. + */ +static inline void *vlan_remove_tag(struct sk_buff *skb, u16 *vlan_tci) +{ + struct vlan_hdr *vhdr = (struct vlan_hdr *)(skb->data + ETH_HLEN); + + *vlan_tci = ntohs(vhdr->h_vlan_TCI); + + memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN); + vlan_set_encap_proto(skb, vhdr); + return __skb_pull(skb, VLAN_HLEN); +} + /** * skb_vlan_tagged - check if skb is vlan tagged. * @skb: skbuff to query diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 4d46788cd493a..768b8d65a5baa 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -5791,7 +5791,6 @@ EXPORT_SYMBOL(skb_ensure_writable); */ int __skb_vlan_pop(struct sk_buff *skb, u16 *vlan_tci) { - struct vlan_hdr *vhdr; int offset = skb->data - skb_mac_header(skb); int err;
@@ -5807,13 +5806,8 @@ int __skb_vlan_pop(struct sk_buff *skb, u16 *vlan_tci)
skb_postpull_rcsum(skb, skb->data + (2 * ETH_ALEN), VLAN_HLEN);
- vhdr = (struct vlan_hdr *)(skb->data + ETH_HLEN); - *vlan_tci = ntohs(vhdr->h_vlan_TCI); - - memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN); - __skb_pull(skb, VLAN_HLEN); + vlan_remove_tag(skb, vlan_tci);
- vlan_set_encap_proto(skb, vhdr); skb->mac_header += VLAN_HLEN;
if (skb_network_offset(skb) < ETH_HLEN) diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c index afca3cdf190a0..18dda9423fae5 100644 --- a/net/dsa/tag_ocelot.c +++ b/net/dsa/tag_ocelot.c @@ -26,7 +26,7 @@ static void ocelot_xmit_get_vlan_info(struct sk_buff *skb, struct dsa_port *dp, br_vlan_get_proto(br, &proto);
if (ntohs(hdr->h_vlan_proto) == proto) { - __skb_vlan_pop(skb, &tci); + vlan_remove_tag(skb, &tci); *vlan_tci = tci; } else { rcu_read_lock();
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 67c3ca2c5cfe6a50772514e3349b5e7b3b0fac03 ]
Problem description -------------------
On an NXP LS1028A (felix DSA driver) with the following configuration:
- ocelot-8021q tagging protocol - VLAN-aware bridge (with STP) spanning at least swp0 and swp1 - 8021q VLAN upper interfaces on swp0 and swp1: swp0.700, swp1.700 - ptp4l on swp0.700 and swp1.700
we see that the ptp4l instances do not see each other's traffic, and they all go to the grand master state due to the ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES condition.
Jumping to the conclusion for the impatient -------------------------------------------
There is a zero-day bug in the ocelot switchdev driver in the way it handles VLAN-tagged packet injection. The correct logic already exists in the source code, in function ocelot_xmit_get_vlan_info() added by commit 5ca721c54d86 ("net: dsa: tag_ocelot: set the classified VLAN during xmit"). But it is used only for normal NPI-based injection with the DSA "ocelot" tagging protocol. The other injection code paths (register-based and FDMA-based) roll their own wrong logic. This affects and was noticed on the DSA "ocelot-8021q" protocol because it uses register-based injection.
By moving ocelot_xmit_get_vlan_info() to a place that's common for both the DSA tagger and the ocelot switch library, it can also be called from ocelot_port_inject_frame() in ocelot.c.
We need to touch the lines with ocelot_ifh_port_set()'s prototype anyway, so let's rename it to something clearer regarding what it does, and add a kernel-doc. ocelot_ifh_set_basic() should do.
Investigation notes -------------------
Debugging reveals that PTP event (aka those carrying timestamps, like Sync) frames injected into swp0.700 (but also swp1.700) hit the wire with two VLAN tags:
00000000: 01 1b 19 00 00 00 00 01 02 03 04 05 81 00 02 bc ~~~~~~~~~~~ 00000010: 81 00 02 bc 88 f7 00 12 00 2c 00 00 02 00 00 00 ~~~~~~~~~~~ 00000020: 00 00 00 00 00 00 00 00 00 00 00 01 02 ff fe 03 00000030: 04 05 00 01 00 04 00 00 00 00 00 00 00 00 00 00 00000040: 00 00
The second (unexpected) VLAN tag makes felix_check_xtr_pkt() -> ptp_classify_raw() fail to see these as PTP packets at the link partner's receiving end, and return PTP_CLASS_NONE (because the BPF classifier is not written to expect 2 VLAN tags).
The reason why packets have 2 VLAN tags is because the transmission code treats VLAN incorrectly.
Neither ocelot switchdev, nor felix DSA, declare the NETIF_F_HW_VLAN_CTAG_TX feature. Therefore, at xmit time, all VLANs should be in the skb head, and none should be in the hwaccel area. This is done by:
static struct sk_buff *validate_xmit_vlan(struct sk_buff *skb, netdev_features_t features) { if (skb_vlan_tag_present(skb) && !vlan_hw_offload_capable(features, skb->vlan_proto)) skb = __vlan_hwaccel_push_inside(skb); return skb; }
But ocelot_port_inject_frame() handles things incorrectly:
ocelot_ifh_port_set(ifh, port, rew_op, skb_vlan_tag_get(skb));
void ocelot_ifh_port_set(struct sk_buff *skb, void *ifh, int port, u32 rew_op) { (...) if (vlan_tag) ocelot_ifh_set_vlan_tci(ifh, vlan_tag); (...) }
The way __vlan_hwaccel_push_inside() pushes the tag inside the skb head is by calling:
static inline void __vlan_hwaccel_clear_tag(struct sk_buff *skb) { skb->vlan_present = 0; }
which does _not_ zero out skb->vlan_tci as seen by skb_vlan_tag_get(). This means that ocelot, when it calls skb_vlan_tag_get(), sees (and uses) a residual skb->vlan_tci, while the same VLAN tag is _already_ in the skb head.
The trivial fix for double VLAN headers is to replace the content of ocelot_ifh_port_set() with:
if (skb_vlan_tag_present(skb)) ocelot_ifh_set_vlan_tci(ifh, skb_vlan_tag_get(skb));
but this would not be correct either, because, as mentioned, vlan_hw_offload_capable() is false for us, so we'd be inserting dead code and we'd always transmit packets with VID=0 in the injection frame header.
I can't actually test the ocelot switchdev driver and rely exclusively on code inspection, but I don't think traffic from 8021q uppers has ever been injected properly, and not double-tagged. Thus I'm blaming the introduction of VLAN fields in the injection header - early driver code.
As hinted at in the early conclusion, what we _want_ to happen for VLAN transmission was already described once in commit 5ca721c54d86 ("net: dsa: tag_ocelot: set the classified VLAN during xmit").
ocelot_xmit_get_vlan_info() intends to ensure that if the port through which we're transmitting is under a VLAN-aware bridge, the outer VLAN tag from the skb head is stripped from there and inserted into the injection frame header (so that the packet is processed in hardware through that actual VLAN). And in all other cases, the packet is sent with VID=0 in the injection frame header, since the port is VLAN-unaware and has logic to strip this VID on egress (making it invisible to the wire).
Fixes: 08d02364b12f ("net: mscc: fix the injection header") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mscc/ocelot.c | 29 +++++++++++---- drivers/net/ethernet/mscc/ocelot_fdma.c | 2 +- include/linux/dsa/ocelot.h | 47 +++++++++++++++++++++++++ include/soc/mscc/ocelot.h | 3 +- net/dsa/tag_ocelot.c | 37 ++----------------- 5 files changed, 75 insertions(+), 43 deletions(-)
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index 01b6e13f4692f..b594f3054afb6 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -1088,17 +1088,34 @@ bool ocelot_can_inject(struct ocelot *ocelot, int grp) } EXPORT_SYMBOL(ocelot_can_inject);
-void ocelot_ifh_port_set(void *ifh, int port, u32 rew_op, u32 vlan_tag) +/** + * ocelot_ifh_set_basic - Set basic information in Injection Frame Header + * @ifh: Pointer to Injection Frame Header memory + * @ocelot: Switch private data structure + * @port: Egress port number + * @rew_op: Egress rewriter operation for PTP + * @skb: Pointer to socket buffer (packet) + * + * Populate the Injection Frame Header with basic information for this skb: the + * analyzer bypass bit, destination port, VLAN info, egress rewriter info. + */ +void ocelot_ifh_set_basic(void *ifh, struct ocelot *ocelot, int port, + u32 rew_op, struct sk_buff *skb) { + struct ocelot_port *ocelot_port = ocelot->ports[port]; + u64 vlan_tci, tag_type; + + ocelot_xmit_get_vlan_info(skb, ocelot_port->bridge, &vlan_tci, + &tag_type); + ocelot_ifh_set_bypass(ifh, 1); ocelot_ifh_set_dest(ifh, BIT_ULL(port)); - ocelot_ifh_set_tag_type(ifh, IFH_TAG_TYPE_C); - if (vlan_tag) - ocelot_ifh_set_vlan_tci(ifh, vlan_tag); + ocelot_ifh_set_tag_type(ifh, tag_type); + ocelot_ifh_set_vlan_tci(ifh, vlan_tci); if (rew_op) ocelot_ifh_set_rew_op(ifh, rew_op); } -EXPORT_SYMBOL(ocelot_ifh_port_set); +EXPORT_SYMBOL(ocelot_ifh_set_basic);
void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp, u32 rew_op, struct sk_buff *skb) @@ -1109,7 +1126,7 @@ void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp, ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) | QS_INJ_CTRL_SOF, QS_INJ_CTRL, grp);
- ocelot_ifh_port_set(ifh, port, rew_op, skb_vlan_tag_get(skb)); + ocelot_ifh_set_basic(ifh, ocelot, port, rew_op, skb);
for (i = 0; i < OCELOT_TAG_LEN / 4; i++) ocelot_write_rix(ocelot, ifh[i], QS_INJ_WR, grp); diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.c b/drivers/net/ethernet/mscc/ocelot_fdma.c index 8e3894cf5f7cd..e9d2e96adb229 100644 --- a/drivers/net/ethernet/mscc/ocelot_fdma.c +++ b/drivers/net/ethernet/mscc/ocelot_fdma.c @@ -666,7 +666,7 @@ static int ocelot_fdma_prepare_skb(struct ocelot *ocelot, int port, u32 rew_op, ifh = skb_push(skb, OCELOT_TAG_LEN); skb_put(skb, ETH_FCS_LEN); memset(ifh, 0, OCELOT_TAG_LEN); - ocelot_ifh_port_set(ifh, port, rew_op, skb_vlan_tag_get(skb)); + ocelot_ifh_set_basic(ifh, ocelot, port, rew_op, skb);
return 0; } diff --git a/include/linux/dsa/ocelot.h b/include/linux/dsa/ocelot.h index dca2969015d80..6fbfbde68a37c 100644 --- a/include/linux/dsa/ocelot.h +++ b/include/linux/dsa/ocelot.h @@ -5,6 +5,8 @@ #ifndef _NET_DSA_TAG_OCELOT_H #define _NET_DSA_TAG_OCELOT_H
+#include <linux/if_bridge.h> +#include <linux/if_vlan.h> #include <linux/kthread.h> #include <linux/packing.h> #include <linux/skbuff.h> @@ -273,4 +275,49 @@ static inline u32 ocelot_ptp_rew_op(struct sk_buff *skb) return rew_op; }
+/** + * ocelot_xmit_get_vlan_info: Determine VLAN_TCI and TAG_TYPE for injected frame + * @skb: Pointer to socket buffer + * @br: Pointer to bridge device that the port is under, if any + * @vlan_tci: + * @tag_type: + * + * If the port is under a VLAN-aware bridge, remove the VLAN header from the + * payload and move it into the DSA tag, which will make the switch classify + * the packet to the bridge VLAN. Otherwise, leave the classified VLAN at zero, + * which is the pvid of standalone ports (OCELOT_STANDALONE_PVID), although not + * of VLAN-unaware bridge ports (that would be ocelot_vlan_unaware_pvid()). + * Anyway, VID 0 is fine because it is stripped on egress for these port modes, + * and source address learning is not performed for packets injected from the + * CPU anyway, so it doesn't matter that the VID is "wrong". + */ +static inline void ocelot_xmit_get_vlan_info(struct sk_buff *skb, + struct net_device *br, + u64 *vlan_tci, u64 *tag_type) +{ + struct vlan_ethhdr *hdr; + u16 proto, tci; + + if (!br || !br_vlan_enabled(br)) { + *vlan_tci = 0; + *tag_type = IFH_TAG_TYPE_C; + return; + } + + hdr = (struct vlan_ethhdr *)skb_mac_header(skb); + br_vlan_get_proto(br, &proto); + + if (ntohs(hdr->h_vlan_proto) == proto) { + vlan_remove_tag(skb, &tci); + *vlan_tci = tci; + } else { + rcu_read_lock(); + br_vlan_get_pvid_rcu(br, &tci); + rcu_read_unlock(); + *vlan_tci = tci; + } + + *tag_type = (proto != ETH_P_8021Q) ? IFH_TAG_TYPE_S : IFH_TAG_TYPE_C; +} + #endif diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h index 195ca8f0b6f9d..9b904ea2f0db9 100644 --- a/include/soc/mscc/ocelot.h +++ b/include/soc/mscc/ocelot.h @@ -1128,7 +1128,8 @@ void __ocelot_target_write_ix(struct ocelot *ocelot, enum ocelot_target target, bool ocelot_can_inject(struct ocelot *ocelot, int grp); void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp, u32 rew_op, struct sk_buff *skb); -void ocelot_ifh_port_set(void *ifh, int port, u32 rew_op, u32 vlan_tag); +void ocelot_ifh_set_basic(void *ifh, struct ocelot *ocelot, int port, + u32 rew_op, struct sk_buff *skb); int ocelot_xtr_poll_frame(struct ocelot *ocelot, int grp, struct sk_buff **skb); void ocelot_drain_cpu_queue(struct ocelot *ocelot, int grp); void ocelot_ptp_rx_timestamp(struct ocelot *ocelot, struct sk_buff *skb, diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c index 18dda9423fae5..ce9d2b20d67a9 100644 --- a/net/dsa/tag_ocelot.c +++ b/net/dsa/tag_ocelot.c @@ -4,40 +4,6 @@ #include <linux/dsa/ocelot.h> #include "dsa_priv.h"
-/* If the port is under a VLAN-aware bridge, remove the VLAN header from the - * payload and move it into the DSA tag, which will make the switch classify - * the packet to the bridge VLAN. Otherwise, leave the classified VLAN at zero, - * which is the pvid of standalone and VLAN-unaware bridge ports. - */ -static void ocelot_xmit_get_vlan_info(struct sk_buff *skb, struct dsa_port *dp, - u64 *vlan_tci, u64 *tag_type) -{ - struct net_device *br = dsa_port_bridge_dev_get(dp); - struct vlan_ethhdr *hdr; - u16 proto, tci; - - if (!br || !br_vlan_enabled(br)) { - *vlan_tci = 0; - *tag_type = IFH_TAG_TYPE_C; - return; - } - - hdr = skb_vlan_eth_hdr(skb); - br_vlan_get_proto(br, &proto); - - if (ntohs(hdr->h_vlan_proto) == proto) { - vlan_remove_tag(skb, &tci); - *vlan_tci = tci; - } else { - rcu_read_lock(); - br_vlan_get_pvid_rcu(br, &tci); - rcu_read_unlock(); - *vlan_tci = tci; - } - - *tag_type = (proto != ETH_P_8021Q) ? IFH_TAG_TYPE_S : IFH_TAG_TYPE_C; -} - static void ocelot_xmit_common(struct sk_buff *skb, struct net_device *netdev, __be32 ifh_prefix, void **ifh) { @@ -49,7 +15,8 @@ static void ocelot_xmit_common(struct sk_buff *skb, struct net_device *netdev, u32 rew_op = 0; u64 qos_class;
- ocelot_xmit_get_vlan_info(skb, dp, &vlan_tci, &tag_type); + ocelot_xmit_get_vlan_info(skb, dsa_port_bridge_dev_get(dp), &vlan_tci, + &tag_type);
qos_class = netdev_get_num_tc(netdev) ? netdev_get_prio_tc_map(netdev, skb->priority) : skb->priority;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit e1b9e80236c540fa85d76e2d510d1b38e1968c5d ]
There are 2 distinct code paths (listed below) in the source code which set up an injection header for Ocelot(-like) switches. Code path (2) lacks the QoS class and source port being set correctly. Especially the improper QoS classification is a problem for the "ocelot-8021q" alternative DSA tagging protocol, because we support tc-taprio and each packet needs to be scheduled precisely through its time slot. This includes PTP, which is normally assigned to a traffic class other than 0, but would be sent through TC 0 nonetheless.
The code paths are:
(1) ocelot_xmit_common() from net/dsa/tag_ocelot.c - called only by the standard "ocelot" DSA tagging protocol which uses NPI-based injection - sets up bit fields in the tag manually to account for a small difference (destination port offset) between Ocelot and Seville. Namely, ocelot_ifh_set_dest() is omitted out of ocelot_xmit_common(), because there's also seville_ifh_set_dest().
(2) ocelot_ifh_set_basic(), called by: - ocelot_fdma_prepare_skb() for FDMA transmission of the ocelot switchdev driver - ocelot_port_xmit() -> ocelot_port_inject_frame() for register-based transmission of the ocelot switchdev driver - felix_port_deferred_xmit() -> ocelot_port_inject_frame() for the DSA tagger ocelot-8021q when it must transmit PTP frames (also through register-based injection). sets the bit fields according to its own logic.
The problem is that (2) doesn't call ocelot_ifh_set_qos_class(). Copying that logic from ocelot_xmit_common() fixes that.
Unfortunately, although desirable, it is not easily possible to de-duplicate code paths (1) and (2), and make net/dsa/tag_ocelot.c directly call ocelot_ifh_set_basic()), because of the ocelot/seville difference. This is the "minimal" fix with some logic duplicated (but at least more consolidated).
Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mscc/ocelot.c | 10 +++++++++- drivers/net/ethernet/mscc/ocelot_fdma.c | 1 - 2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index b594f3054afb6..ad179a9dca61a 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -1103,13 +1103,21 @@ void ocelot_ifh_set_basic(void *ifh, struct ocelot *ocelot, int port, u32 rew_op, struct sk_buff *skb) { struct ocelot_port *ocelot_port = ocelot->ports[port]; + struct net_device *dev = skb->dev; u64 vlan_tci, tag_type; + int qos_class;
ocelot_xmit_get_vlan_info(skb, ocelot_port->bridge, &vlan_tci, &tag_type);
+ qos_class = netdev_get_num_tc(dev) ? + netdev_get_prio_tc_map(dev, skb->priority) : skb->priority; + + memset(ifh, 0, OCELOT_TAG_LEN); ocelot_ifh_set_bypass(ifh, 1); + ocelot_ifh_set_src(ifh, BIT_ULL(ocelot->num_phys_ports)); ocelot_ifh_set_dest(ifh, BIT_ULL(port)); + ocelot_ifh_set_qos_class(ifh, qos_class); ocelot_ifh_set_tag_type(ifh, tag_type); ocelot_ifh_set_vlan_tci(ifh, vlan_tci); if (rew_op) @@ -1120,7 +1128,7 @@ EXPORT_SYMBOL(ocelot_ifh_set_basic); void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp, u32 rew_op, struct sk_buff *skb) { - u32 ifh[OCELOT_TAG_LEN / 4] = {0}; + u32 ifh[OCELOT_TAG_LEN / 4]; unsigned int i, count, last;
ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) | diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.c b/drivers/net/ethernet/mscc/ocelot_fdma.c index e9d2e96adb229..cc9bce5a4dcdf 100644 --- a/drivers/net/ethernet/mscc/ocelot_fdma.c +++ b/drivers/net/ethernet/mscc/ocelot_fdma.c @@ -665,7 +665,6 @@ static int ocelot_fdma_prepare_skb(struct ocelot *ocelot, int port, u32 rew_op,
ifh = skb_push(skb, OCELOT_TAG_LEN); skb_put(skb, ETH_FCS_LEN); - memset(ifh, 0, OCELOT_TAG_LEN); ocelot_ifh_set_basic(ifh, ocelot, port, rew_op, skb);
return 0;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit c5e12ac3beb0dd3a718296b2d8af5528e9ab728e ]
As explained by Horatiu Vultur in commit 603ead96582d ("net: sparx5: Add spinlock for frame transmission from CPU") which is for a similar hardware design, multiple CPUs can simultaneously perform injection or extraction. There are only 2 register groups for injection and 2 for extraction, and the driver only uses one of each. So we'd better serialize access using spin locks, otherwise frame corruption is possible.
Note that unlike in sparx5, FDMA in ocelot does not have this issue because struct ocelot_fdma_tx_ring already contains an xmit_lock.
I guess this is mostly a problem for NXP LS1028A, as that is dual core. I don't think VSC7514 is. So I'm blaming the commit where LS1028A (aka the felix DSA driver) started using register-based packet injection and extraction.
Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/ocelot/felix.c | 11 +++++ drivers/net/ethernet/mscc/ocelot.c | 52 ++++++++++++++++++++++ drivers/net/ethernet/mscc/ocelot_vsc7514.c | 4 ++ include/soc/mscc/ocelot.h | 9 ++++ 4 files changed, 76 insertions(+)
diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c index 2d2c6f941272c..73da407bb0685 100644 --- a/drivers/net/dsa/ocelot/felix.c +++ b/drivers/net/dsa/ocelot/felix.c @@ -528,7 +528,9 @@ static int felix_tag_8021q_setup(struct dsa_switch *ds) * so we need to be careful that there are no extra frames to be * dequeued over MMIO, since we would never know to discard them. */ + ocelot_lock_xtr_grp_bh(ocelot, 0); ocelot_drain_cpu_queue(ocelot, 0); + ocelot_unlock_xtr_grp_bh(ocelot, 0);
return 0; } @@ -1493,6 +1495,8 @@ static void felix_port_deferred_xmit(struct kthread_work *work) int port = xmit_work->dp->index; int retries = 10;
+ ocelot_lock_inj_grp(ocelot, 0); + do { if (ocelot_can_inject(ocelot, 0)) break; @@ -1501,6 +1505,7 @@ static void felix_port_deferred_xmit(struct kthread_work *work) } while (--retries);
if (!retries) { + ocelot_unlock_inj_grp(ocelot, 0); dev_err(ocelot->dev, "port %d failed to inject skb\n", port); ocelot_port_purge_txtstamp_skb(ocelot, port, skb); @@ -1510,6 +1515,8 @@ static void felix_port_deferred_xmit(struct kthread_work *work)
ocelot_port_inject_frame(ocelot, port, 0, rew_op, skb);
+ ocelot_unlock_inj_grp(ocelot, 0); + consume_skb(skb); kfree(xmit_work); } @@ -1658,6 +1665,8 @@ static bool felix_check_xtr_pkt(struct ocelot *ocelot) if (!felix->info->quirk_no_xtr_irq) return false;
+ ocelot_lock_xtr_grp(ocelot, grp); + while (ocelot_read(ocelot, QS_XTR_DATA_PRESENT) & BIT(grp)) { struct sk_buff *skb; unsigned int type; @@ -1694,6 +1703,8 @@ static bool felix_check_xtr_pkt(struct ocelot *ocelot) ocelot_drain_cpu_queue(ocelot, 0); }
+ ocelot_unlock_xtr_grp(ocelot, grp); + return true; }
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index ad179a9dca61a..310a36356f568 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -994,6 +994,48 @@ void ocelot_ptp_rx_timestamp(struct ocelot *ocelot, struct sk_buff *skb, } EXPORT_SYMBOL(ocelot_ptp_rx_timestamp);
+void ocelot_lock_inj_grp(struct ocelot *ocelot, int grp) + __acquires(&ocelot->inj_lock) +{ + spin_lock(&ocelot->inj_lock); +} +EXPORT_SYMBOL_GPL(ocelot_lock_inj_grp); + +void ocelot_unlock_inj_grp(struct ocelot *ocelot, int grp) + __releases(&ocelot->inj_lock) +{ + spin_unlock(&ocelot->inj_lock); +} +EXPORT_SYMBOL_GPL(ocelot_unlock_inj_grp); + +void ocelot_lock_xtr_grp(struct ocelot *ocelot, int grp) + __acquires(&ocelot->inj_lock) +{ + spin_lock(&ocelot->inj_lock); +} +EXPORT_SYMBOL_GPL(ocelot_lock_xtr_grp); + +void ocelot_unlock_xtr_grp(struct ocelot *ocelot, int grp) + __releases(&ocelot->inj_lock) +{ + spin_unlock(&ocelot->inj_lock); +} +EXPORT_SYMBOL_GPL(ocelot_unlock_xtr_grp); + +void ocelot_lock_xtr_grp_bh(struct ocelot *ocelot, int grp) + __acquires(&ocelot->xtr_lock) +{ + spin_lock_bh(&ocelot->xtr_lock); +} +EXPORT_SYMBOL_GPL(ocelot_lock_xtr_grp_bh); + +void ocelot_unlock_xtr_grp_bh(struct ocelot *ocelot, int grp) + __releases(&ocelot->xtr_lock) +{ + spin_unlock_bh(&ocelot->xtr_lock); +} +EXPORT_SYMBOL_GPL(ocelot_unlock_xtr_grp_bh); + int ocelot_xtr_poll_frame(struct ocelot *ocelot, int grp, struct sk_buff **nskb) { u64 timestamp, src_port, len; @@ -1004,6 +1046,8 @@ int ocelot_xtr_poll_frame(struct ocelot *ocelot, int grp, struct sk_buff **nskb) u32 val, *buf; int err;
+ lockdep_assert_held(&ocelot->xtr_lock); + err = ocelot_xtr_poll_xfh(ocelot, grp, xfh); if (err) return err; @@ -1079,6 +1123,8 @@ bool ocelot_can_inject(struct ocelot *ocelot, int grp) { u32 val = ocelot_read(ocelot, QS_INJ_STATUS);
+ lockdep_assert_held(&ocelot->inj_lock); + if (!(val & QS_INJ_STATUS_FIFO_RDY(BIT(grp)))) return false; if (val & QS_INJ_STATUS_WMARK_REACHED(BIT(grp))) @@ -1131,6 +1177,8 @@ void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp, u32 ifh[OCELOT_TAG_LEN / 4]; unsigned int i, count, last;
+ lockdep_assert_held(&ocelot->inj_lock); + ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) | QS_INJ_CTRL_SOF, QS_INJ_CTRL, grp);
@@ -1167,6 +1215,8 @@ EXPORT_SYMBOL(ocelot_port_inject_frame);
void ocelot_drain_cpu_queue(struct ocelot *ocelot, int grp) { + lockdep_assert_held(&ocelot->xtr_lock); + while (ocelot_read(ocelot, QS_XTR_DATA_PRESENT) & BIT(grp)) ocelot_read_rix(ocelot, QS_XTR_RD, grp); } @@ -2758,6 +2808,8 @@ int ocelot_init(struct ocelot *ocelot) mutex_init(&ocelot->tas_lock); spin_lock_init(&ocelot->ptp_clock_lock); spin_lock_init(&ocelot->ts_id_lock); + spin_lock_init(&ocelot->inj_lock); + spin_lock_init(&ocelot->xtr_lock);
ocelot->owq = alloc_ordered_workqueue("ocelot-owq", 0); if (!ocelot->owq) diff --git a/drivers/net/ethernet/mscc/ocelot_vsc7514.c b/drivers/net/ethernet/mscc/ocelot_vsc7514.c index 6f22aea08a644..bf39a053dc82f 100644 --- a/drivers/net/ethernet/mscc/ocelot_vsc7514.c +++ b/drivers/net/ethernet/mscc/ocelot_vsc7514.c @@ -159,6 +159,8 @@ static irqreturn_t ocelot_xtr_irq_handler(int irq, void *arg) struct ocelot *ocelot = arg; int grp = 0, err;
+ ocelot_lock_xtr_grp(ocelot, grp); + while (ocelot_read(ocelot, QS_XTR_DATA_PRESENT) & BIT(grp)) { struct sk_buff *skb;
@@ -177,6 +179,8 @@ static irqreturn_t ocelot_xtr_irq_handler(int irq, void *arg) if (err < 0) ocelot_drain_cpu_queue(ocelot, 0);
+ ocelot_unlock_xtr_grp(ocelot, grp); + return IRQ_HANDLED; }
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h index 9b904ea2f0db9..9b5562f545486 100644 --- a/include/soc/mscc/ocelot.h +++ b/include/soc/mscc/ocelot.h @@ -977,6 +977,9 @@ struct ocelot { const struct ocelot_stat_layout *stats_layout; struct list_head stats_regions;
+ spinlock_t inj_lock; + spinlock_t xtr_lock; + u32 pool_size[OCELOT_SB_NUM][OCELOT_SB_POOL_NUM]; int packet_buffer_size; int num_frame_refs; @@ -1125,6 +1128,12 @@ void __ocelot_target_write_ix(struct ocelot *ocelot, enum ocelot_target target, u32 val, u32 reg, u32 offset);
/* Packet I/O */ +void ocelot_lock_inj_grp(struct ocelot *ocelot, int grp); +void ocelot_unlock_inj_grp(struct ocelot *ocelot, int grp); +void ocelot_lock_xtr_grp(struct ocelot *ocelot, int grp); +void ocelot_unlock_xtr_grp(struct ocelot *ocelot, int grp); +void ocelot_lock_xtr_grp_bh(struct ocelot *ocelot, int grp); +void ocelot_unlock_xtr_grp_bh(struct ocelot *ocelot, int grp); bool ocelot_can_inject(struct ocelot *ocelot, int grp); void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp, u32 rew_op, struct sk_buff *skb);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Simon Horman horms@kernel.org
[ Upstream commit a0c9fe5eecc97680323ee83780ea3eaf440ba1b7 ]
Since commit 255c1c7279ab ("tc-testing: Allow test cases to be skipped") the variable test_ordinal doesn't exist in call_pre_case(). So it should not be accessed when an exception occurs.
This resolves the following splat:
... During handling of the above exception, another exception occurred:
Traceback (most recent call last): File ".../tdc.py", line 1028, in <module> main() File ".../tdc.py", line 1022, in main set_operation_mode(pm, parser, args, remaining) File ".../tdc.py", line 966, in set_operation_mode catresults = test_runner_serial(pm, args, alltests) File ".../tdc.py", line 642, in test_runner_serial (index, tsr) = test_runner(pm, args, alltests) File ".../tdc.py", line 536, in test_runner res = run_one_test(pm, args, index, tidx) File ".../tdc.py", line 419, in run_one_test pm.call_pre_case(tidx) File ".../tdc.py", line 146, in call_pre_case print('test_ordinal is {}'.format(test_ordinal)) NameError: name 'test_ordinal' is not defined
Fixes: 255c1c7279ab ("tc-testing: Allow test cases to be skipped") Signed-off-by: Simon Horman horms@kernel.org Acked-by: Jamal Hadi Salim jhs@mojatatu.com Link: https://patch.msgid.link/20240815-tdc-test-ordinal-v1-1-0255c122a427@kernel.... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/tc-testing/tdc.py | 1 - 1 file changed, 1 deletion(-)
diff --git a/tools/testing/selftests/tc-testing/tdc.py b/tools/testing/selftests/tc-testing/tdc.py index ee22e3447ec7e..4702c99c99d3f 100755 --- a/tools/testing/selftests/tc-testing/tdc.py +++ b/tools/testing/selftests/tc-testing/tdc.py @@ -129,7 +129,6 @@ class PluginMgr: except Exception as ee: print('exception {} in call to pre_case for {} plugin'. format(ee, pgn_inst.__class__)) - print('test_ordinal is {}'.format(test_ordinal)) print('testid is {}'.format(caseinfo['id'])) raise
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lucas Karpinski lkarpins@redhat.com
[ Upstream commit 3bdd9fd29cb0f136b307559a19c107210ad5c314 ]
The sockets used by udpgso_bench_tx aren't always ready when udpgso_bench_tx transmits packets. This issue is more prevalent in -rt kernels, but can occur in both. Replace the hacky sleep calls with a function that checks whether the ports in the namespace are ready for use.
Suggested-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Lucas Karpinski lkarpins@redhat.com Reviewed-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: 7167395a4be7 ("selftests: udpgro: report error when receive failed") Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/net_helper.sh | 22 +++++++++++++++++++ tools/testing/selftests/net/udpgro.sh | 13 +++++------ tools/testing/selftests/net/udpgro_bench.sh | 5 +++-- tools/testing/selftests/net/udpgro_frglist.sh | 5 +++-- 4 files changed, 34 insertions(+), 11 deletions(-) create mode 100755 tools/testing/selftests/net/net_helper.sh
diff --git a/tools/testing/selftests/net/net_helper.sh b/tools/testing/selftests/net/net_helper.sh new file mode 100755 index 0000000000000..4fe0befa13fbc --- /dev/null +++ b/tools/testing/selftests/net/net_helper.sh @@ -0,0 +1,22 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Helper functions + +wait_local_port_listen() +{ + local listener_ns="${1}" + local port="${2}" + local protocol="${3}" + local port_hex + local i + + port_hex="$(printf "%04X" "${port}")" + for i in $(seq 10); do + if ip netns exec "${listener_ns}" cat /proc/net/"${protocol}"* | \ + grep -q "${port_hex}"; then + break + fi + sleep 0.1 + done +} diff --git a/tools/testing/selftests/net/udpgro.sh b/tools/testing/selftests/net/udpgro.sh index 0c743752669af..af5dc57c8ce93 100755 --- a/tools/testing/selftests/net/udpgro.sh +++ b/tools/testing/selftests/net/udpgro.sh @@ -3,6 +3,8 @@ # # Run a series of udpgro functional tests.
+source net_helper.sh + readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)"
BPF_FILE="../bpf/xdp_dummy.bpf.o" @@ -51,8 +53,7 @@ run_one() { echo "ok" || \ echo "failed" &
- # Hack: let bg programs complete the startup - sleep 0.2 + wait_local_port_listen ${PEER_NS} 8000 udp ./udpgso_bench_tx ${tx_args} ret=$? wait $(jobs -p) @@ -97,7 +98,7 @@ run_one_nat() { echo "ok" || \ echo "failed"&
- sleep 0.1 + wait_local_port_listen "${PEER_NS}" 8000 udp ./udpgso_bench_tx ${tx_args} ret=$? kill -INT $pid @@ -118,11 +119,9 @@ run_one_2sock() { echo "ok" || \ echo "failed" &
- # Hack: let bg programs complete the startup - sleep 0.2 + wait_local_port_listen "${PEER_NS}" 12345 udp ./udpgso_bench_tx ${tx_args} -p 12345 - sleep 0.1 - # first UDP GSO socket should be closed at this point + wait_local_port_listen "${PEER_NS}" 8000 udp ./udpgso_bench_tx ${tx_args} ret=$? wait $(jobs -p) diff --git a/tools/testing/selftests/net/udpgro_bench.sh b/tools/testing/selftests/net/udpgro_bench.sh index 894972877e8b0..cb664679b4342 100755 --- a/tools/testing/selftests/net/udpgro_bench.sh +++ b/tools/testing/selftests/net/udpgro_bench.sh @@ -3,6 +3,8 @@ # # Run a series of udpgro benchmarks
+source net_helper.sh + readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)"
BPF_FILE="../bpf/xdp_dummy.bpf.o" @@ -40,8 +42,7 @@ run_one() { ip netns exec "${PEER_NS}" ./udpgso_bench_rx ${rx_args} -r & ip netns exec "${PEER_NS}" ./udpgso_bench_rx -t ${rx_args} -r &
- # Hack: let bg programs complete the startup - sleep 0.2 + wait_local_port_listen "${PEER_NS}" 8000 udp ./udpgso_bench_tx ${tx_args} }
diff --git a/tools/testing/selftests/net/udpgro_frglist.sh b/tools/testing/selftests/net/udpgro_frglist.sh index 0a6359bed0b92..dd47fa96f6b3e 100755 --- a/tools/testing/selftests/net/udpgro_frglist.sh +++ b/tools/testing/selftests/net/udpgro_frglist.sh @@ -3,6 +3,8 @@ # # Run a series of udpgro benchmarks
+source net_helper.sh + readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)"
BPF_FILE="../bpf/xdp_dummy.bpf.o" @@ -45,8 +47,7 @@ run_one() { echo ${rx_args} ip netns exec "${PEER_NS}" ./udpgso_bench_rx ${rx_args} -r &
- # Hack: let bg programs complete the startup - sleep 0.2 + wait_local_port_listen "${PEER_NS}" 8000 udp ./udpgso_bench_tx ${tx_args} }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 7167395a4be7930ecac6a33b4e54d7e3dd9ee209 ]
Currently, we only check the latest senders's exit code. If the receiver report failed, it is not recoreded. Fix it by checking the exit code of all the involved processes.
Before: bad GRO lookup ok multiple GRO socks ./udpgso_bench_rx: recv: bad packet len, got 1452, expected 14520
./udpgso_bench_rx: recv: bad packet len, got 1452, expected 14520
failed $ echo $? 0
After: bad GRO lookup ok multiple GRO socks ./udpgso_bench_rx: recv: bad packet len, got 1452, expected 14520
./udpgso_bench_rx: recv: bad packet len, got 1452, expected 14520
failed $ echo $? 1
Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO") Suggested-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/udpgro.sh | 44 ++++++++++++++++----------- 1 file changed, 27 insertions(+), 17 deletions(-)
diff --git a/tools/testing/selftests/net/udpgro.sh b/tools/testing/selftests/net/udpgro.sh index af5dc57c8ce93..241c6c37994d8 100755 --- a/tools/testing/selftests/net/udpgro.sh +++ b/tools/testing/selftests/net/udpgro.sh @@ -46,17 +46,19 @@ run_one() { local -r all="$@" local -r tx_args=${all%rx*} local -r rx_args=${all#*rx} + local ret=0
cfg_veth
- ip netns exec "${PEER_NS}" ./udpgso_bench_rx -C 1000 -R 10 ${rx_args} && \ - echo "ok" || \ - echo "failed" & + ip netns exec "${PEER_NS}" ./udpgso_bench_rx -C 1000 -R 10 ${rx_args} & + local PID1=$!
wait_local_port_listen ${PEER_NS} 8000 udp ./udpgso_bench_tx ${tx_args} - ret=$? - wait $(jobs -p) + check_err $? + wait ${PID1} + check_err $? + [ "$ret" -eq 0 ] && echo "ok" || echo "failed" return $ret }
@@ -73,6 +75,7 @@ run_one_nat() { local -r all="$@" local -r tx_args=${all%rx*} local -r rx_args=${all#*rx} + local ret=0
if [[ ${tx_args} = *-4* ]]; then ipt_cmd=iptables @@ -93,16 +96,17 @@ run_one_nat() { # ... so that GRO will match the UDP_GRO enabled socket, but packets # will land on the 'plain' one ip netns exec "${PEER_NS}" ./udpgso_bench_rx -G ${family} -b ${addr1} -n 0 & - pid=$! - ip netns exec "${PEER_NS}" ./udpgso_bench_rx -C 1000 -R 10 ${family} -b ${addr2%/*} ${rx_args} && \ - echo "ok" || \ - echo "failed"& + local PID1=$! + ip netns exec "${PEER_NS}" ./udpgso_bench_rx -C 1000 -R 10 ${family} -b ${addr2%/*} ${rx_args} & + local PID2=$!
wait_local_port_listen "${PEER_NS}" 8000 udp ./udpgso_bench_tx ${tx_args} - ret=$? - kill -INT $pid - wait $(jobs -p) + check_err $? + kill -INT ${PID1} + wait ${PID2} + check_err $? + [ "$ret" -eq 0 ] && echo "ok" || echo "failed" return $ret }
@@ -111,20 +115,26 @@ run_one_2sock() { local -r all="$@" local -r tx_args=${all%rx*} local -r rx_args=${all#*rx} + local ret=0
cfg_veth
ip netns exec "${PEER_NS}" ./udpgso_bench_rx -C 1000 -R 10 ${rx_args} -p 12345 & - ip netns exec "${PEER_NS}" ./udpgso_bench_rx -C 2000 -R 10 ${rx_args} && \ - echo "ok" || \ - echo "failed" & + local PID1=$! + ip netns exec "${PEER_NS}" ./udpgso_bench_rx -C 2000 -R 10 ${rx_args} & + local PID2=$!
wait_local_port_listen "${PEER_NS}" 12345 udp ./udpgso_bench_tx ${tx_args} -p 12345 + check_err $? wait_local_port_listen "${PEER_NS}" 8000 udp ./udpgso_bench_tx ${tx_args} - ret=$? - wait $(jobs -p) + check_err $? + wait ${PID1} + check_err $? + wait ${PID2} + check_err $? + [ "$ret" -eq 0 ] && echo "ok" || echo "failed" return $ret }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 50e2907ef8bb52cf80ecde9eec5c4dac07177146 ]
TCP ehash table is often sparsely populated.
inet_twsk_purge() spends too much time calling cond_resched().
This patch can reduce time spent in inet_twsk_purge() by 20x.
Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20240327191206.508114-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 565d121b6998 ("tcp: prevent concurrent execution of tcp_sk_exit_batch") Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/inet_timewait_sock.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c index 340a8f0c29800..15d6ce41e5de7 100644 --- a/net/ipv4/inet_timewait_sock.c +++ b/net/ipv4/inet_timewait_sock.c @@ -284,12 +284,17 @@ EXPORT_SYMBOL_GPL(__inet_twsk_schedule); /* Remove all non full sockets (TIME_WAIT and NEW_SYN_RECV) for dead netns */ void inet_twsk_purge(struct inet_hashinfo *hashinfo, int family) { + struct inet_ehash_bucket *head = &hashinfo->ehash[0]; + unsigned int ehash_mask = hashinfo->ehash_mask; struct hlist_nulls_node *node; unsigned int slot; struct sock *sk;
- for (slot = 0; slot <= hashinfo->ehash_mask; slot++) { - struct inet_ehash_bucket *head = &hashinfo->ehash[slot]; + for (slot = 0; slot <= ehash_mask; slot++, head++) { + + if (hlist_nulls_empty(&head->chain)) + continue; + restart_rcu: cond_resched(); rcu_read_lock();
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 1eeb5043573981f3a1278876515851b7f6b1df1b ]
We lost ability to unload ipv6 module a long time ago.
Instead of calling expensive inet_twsk_purge() twice, we can handle all families in one round.
Also remove an extra line added in my prior patch, per Kuniyuki Iwashima feedback.
Signed-off-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/netdev/20240327192934.6843-1-kuniyu@amazon.com/ Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20240329153203.345203-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 565d121b6998 ("tcp: prevent concurrent execution of tcp_sk_exit_batch") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/inet_timewait_sock.h | 2 +- include/net/tcp.h | 2 +- net/dccp/ipv4.c | 2 +- net/dccp/ipv6.c | 6 ------ net/ipv4/inet_timewait_sock.c | 9 +++------ net/ipv4/tcp_ipv4.c | 2 +- net/ipv4/tcp_minisocks.c | 6 +++--- net/ipv6/tcp_ipv6.c | 6 ------ 8 files changed, 10 insertions(+), 25 deletions(-)
diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h index 4a8e578405cb3..9365e5af8d6da 100644 --- a/include/net/inet_timewait_sock.h +++ b/include/net/inet_timewait_sock.h @@ -114,7 +114,7 @@ static inline void inet_twsk_reschedule(struct inet_timewait_sock *tw, int timeo
void inet_twsk_deschedule_put(struct inet_timewait_sock *tw);
-void inet_twsk_purge(struct inet_hashinfo *hashinfo, int family); +void inet_twsk_purge(struct inet_hashinfo *hashinfo);
static inline struct net *twsk_net(const struct inet_timewait_sock *twsk) diff --git a/include/net/tcp.h b/include/net/tcp.h index cc314c383c532..c7501ca66dd34 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -352,7 +352,7 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb); void tcp_rcv_space_adjust(struct sock *sk); int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp); void tcp_twsk_destructor(struct sock *sk); -void tcp_twsk_purge(struct list_head *net_exit_list, int family); +void tcp_twsk_purge(struct list_head *net_exit_list); ssize_t tcp_splice_read(struct socket *sk, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags); diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index f4a2dce3e1048..db8d54fb88060 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -1042,7 +1042,7 @@ static void __net_exit dccp_v4_exit_net(struct net *net)
static void __net_exit dccp_v4_exit_batch(struct list_head *net_exit_list) { - inet_twsk_purge(&dccp_hashinfo, AF_INET); + inet_twsk_purge(&dccp_hashinfo); }
static struct pernet_operations dccp_v4_ops = { diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index 016af0301366d..d90bb941f2ada 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -1121,15 +1121,9 @@ static void __net_exit dccp_v6_exit_net(struct net *net) inet_ctl_sock_destroy(pn->v6_ctl_sk); }
-static void __net_exit dccp_v6_exit_batch(struct list_head *net_exit_list) -{ - inet_twsk_purge(&dccp_hashinfo, AF_INET6); -} - static struct pernet_operations dccp_v6_ops = { .init = dccp_v6_init_net, .exit = dccp_v6_exit_net, - .exit_batch = dccp_v6_exit_batch, .id = &dccp_v6_pernet_id, .size = sizeof(struct dccp_v6_pernet), }; diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c index 15d6ce41e5de7..6356a8a47b345 100644 --- a/net/ipv4/inet_timewait_sock.c +++ b/net/ipv4/inet_timewait_sock.c @@ -282,7 +282,7 @@ void __inet_twsk_schedule(struct inet_timewait_sock *tw, int timeo, bool rearm) EXPORT_SYMBOL_GPL(__inet_twsk_schedule);
/* Remove all non full sockets (TIME_WAIT and NEW_SYN_RECV) for dead netns */ -void inet_twsk_purge(struct inet_hashinfo *hashinfo, int family) +void inet_twsk_purge(struct inet_hashinfo *hashinfo) { struct inet_ehash_bucket *head = &hashinfo->ehash[0]; unsigned int ehash_mask = hashinfo->ehash_mask; @@ -291,7 +291,6 @@ void inet_twsk_purge(struct inet_hashinfo *hashinfo, int family) struct sock *sk;
for (slot = 0; slot <= ehash_mask; slot++, head++) { - if (hlist_nulls_empty(&head->chain)) continue;
@@ -306,15 +305,13 @@ void inet_twsk_purge(struct inet_hashinfo *hashinfo, int family) TCPF_NEW_SYN_RECV)) continue;
- if (sk->sk_family != family || - refcount_read(&sock_net(sk)->ns.count)) + if (refcount_read(&sock_net(sk)->ns.count)) continue;
if (unlikely(!refcount_inc_not_zero(&sk->sk_refcnt))) continue;
- if (unlikely(sk->sk_family != family || - refcount_read(&sock_net(sk)->ns.count))) { + if (refcount_read(&sock_net(sk)->ns.count)) { sock_gen_put(sk); goto restart; } diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index c64ba4f8ddaa9..167de693981a8 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -3242,7 +3242,7 @@ static void __net_exit tcp_sk_exit_batch(struct list_head *net_exit_list) { struct net *net;
- tcp_twsk_purge(net_exit_list, AF_INET); + tcp_twsk_purge(net_exit_list);
list_for_each_entry(net, net_exit_list, exit_list) { inet_pernet_hashinfo_free(net->ipv4.tcp_death_row.hashinfo); diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index b3bfa1a09df68..000dce7d0e2d0 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -347,7 +347,7 @@ void tcp_twsk_destructor(struct sock *sk) } EXPORT_SYMBOL_GPL(tcp_twsk_destructor);
-void tcp_twsk_purge(struct list_head *net_exit_list, int family) +void tcp_twsk_purge(struct list_head *net_exit_list) { bool purged_once = false; struct net *net; @@ -355,9 +355,9 @@ void tcp_twsk_purge(struct list_head *net_exit_list, int family) list_for_each_entry(net, net_exit_list, exit_list) { if (net->ipv4.tcp_death_row.hashinfo->pernet) { /* Even if tw_refcount == 1, we must clean up kernel reqsk */ - inet_twsk_purge(net->ipv4.tcp_death_row.hashinfo, family); + inet_twsk_purge(net->ipv4.tcp_death_row.hashinfo); } else if (!purged_once) { - inet_twsk_purge(&tcp_hashinfo, family); + inet_twsk_purge(&tcp_hashinfo); purged_once = true; } } diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index eb6fc0e2a4533..06b4acbfd314b 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -2217,15 +2217,9 @@ static void __net_exit tcpv6_net_exit(struct net *net) inet_ctl_sock_destroy(net->ipv6.tcp_sk); }
-static void __net_exit tcpv6_net_exit_batch(struct list_head *net_exit_list) -{ - tcp_twsk_purge(net_exit_list, AF_INET6); -} - static struct pernet_operations tcpv6_net_ops = { .init = tcpv6_net_init, .exit = tcpv6_net_exit, - .exit_batch = tcpv6_net_exit_batch, };
int __init tcpv6_init(void)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 565d121b69980637f040eb4d84289869cdaabedf ]
Its possible that two threads call tcp_sk_exit_batch() concurrently, once from the cleanup_net workqueue, once from a task that failed to clone a new netns. In the latter case, error unwinding calls the exit handlers in reverse order for the 'failed' netns.
tcp_sk_exit_batch() calls tcp_twsk_purge(). Problem is that since commit b099ce2602d8 ("net: Batch inet_twsk_purge"), this function picks up twsk in any dying netns, not just the one passed in via exit_batch list.
This means that the error unwind of setup_net() can "steal" and destroy timewait sockets belonging to the exiting netns.
This allows the netns exit worker to proceed to call
WARN_ON_ONCE(!refcount_dec_and_test(&net->ipv4.tcp_death_row.tw_refcount));
without the expected 1 -> 0 transition, which then splats.
At same time, error unwind path that is also running inet_twsk_purge() will splat as well:
WARNING: .. at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210 ... refcount_dec include/linux/refcount.h:351 [inline] inet_twsk_kill+0x758/0x9c0 net/ipv4/inet_timewait_sock.c:70 inet_twsk_deschedule_put net/ipv4/inet_timewait_sock.c:221 inet_twsk_purge+0x725/0x890 net/ipv4/inet_timewait_sock.c:304 tcp_sk_exit_batch+0x1c/0x170 net/ipv4/tcp_ipv4.c:3522 ops_exit_list+0x128/0x180 net/core/net_namespace.c:178 setup_net+0x714/0xb40 net/core/net_namespace.c:375 copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508 create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110
... because refcount_dec() of tw_refcount unexpectedly dropped to 0.
This doesn't seem like an actual bug (no tw sockets got lost and I don't see a use-after-free) but as erroneous trigger of debug check.
Add a mutex to force strict ordering: the task that calls tcp_twsk_purge() blocks other task from doing final _dec_and_test before mutex-owner has removed all tw sockets of dying netns.
Fixes: e9bd0cca09d1 ("tcp: Don't allocate tcp_death_row outside of struct netns_ipv4.") Reported-by: syzbot+8ea26396ff85d23a8929@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/0000000000003a5292061f5e4e19@google.com/ Link: https://lore.kernel.org/netdev/20240812140104.GA21559@breakpoint.cc/ Signed-off-by: Florian Westphal fw@strlen.de Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Jason Xing kerneljasonxing@gmail.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://patch.msgid.link/20240812222857.29837-1-fw@strlen.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_ipv4.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 167de693981a8..1327447a3aade 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -93,6 +93,8 @@ EXPORT_SYMBOL(tcp_hashinfo);
static DEFINE_PER_CPU(struct sock *, ipv4_tcp_sk);
+static DEFINE_MUTEX(tcp_exit_batch_mutex); + static u32 tcp_v4_init_seq(const struct sk_buff *skb) { return secure_tcp_seq(ip_hdr(skb)->daddr, @@ -3242,6 +3244,16 @@ static void __net_exit tcp_sk_exit_batch(struct list_head *net_exit_list) { struct net *net;
+ /* make sure concurrent calls to tcp_sk_exit_batch from net_cleanup_work + * and failed setup_net error unwinding path are serialized. + * + * tcp_twsk_purge() handles twsk in any dead netns, not just those in + * net_exit_list, the thread that dismantles a particular twsk must + * do so without other thread progressing to refcount_dec_and_test() of + * tcp_death_row.tw_refcount. + */ + mutex_lock(&tcp_exit_batch_mutex); + tcp_twsk_purge(net_exit_list);
list_for_each_entry(net, net_exit_list, exit_list) { @@ -3249,6 +3261,8 @@ static void __net_exit tcp_sk_exit_batch(struct list_head *net_exit_list) WARN_ON_ONCE(!refcount_dec_and_test(&net->ipv4.tcp_death_row.tw_refcount)); tcp_fastopen_ctx_destroy(net); } + + mutex_unlock(&tcp_exit_batch_mutex); }
static struct pernet_operations __net_initdata tcp_sk_ops = {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeremy Kerr jk@codeconstruct.com.au
[ Upstream commit ce335db0621648472f9bb4b7191eb2e13a5793cf ]
In the MCTP route input test, we're routing one skb, then (when delivery is expected) checking the resulting routed skb.
However, we're currently checking the original skb length, rather than the routed skb. Check the routed skb instead; the original will have been freed at this point.
Fixes: 8892c0490779 ("mctp: Add route input to socket tests") Reported-by: Dan Carpenter dan.carpenter@linaro.org Closes: https://lore.kernel.org/kernel-janitors/4ad204f0-94cf-46c5-bdab-49592addf315... Signed-off-by: Jeremy Kerr jk@codeconstruct.com.au Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20240816-mctp-kunit-skb-fix-v1-1-3c367ac89c27@codec... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/mctp/test/route-test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mctp/test/route-test.c b/net/mctp/test/route-test.c index 92ea4158f7fc4..a944490a724d3 100644 --- a/net/mctp/test/route-test.c +++ b/net/mctp/test/route-test.c @@ -354,7 +354,7 @@ static void mctp_test_route_input_sk(struct kunit *test)
skb2 = skb_recv_datagram(sock->sk, MSG_DONTWAIT, &rc); KUNIT_EXPECT_NOT_ERR_OR_NULL(test, skb2); - KUNIT_EXPECT_EQ(test, skb->len, 1); + KUNIT_EXPECT_EQ(test, skb2->len, 1);
skb_free_datagram(sock->sk, skb2);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 807067bf014d4a3ae2cc55bd3de16f22a01eb580 ]
syzkaller reported UAF in kcm_release(). [0]
The scenario is
1. Thread A builds a skb with MSG_MORE and sets kcm->seq_skb.
2. Thread A resumes building skb from kcm->seq_skb but is blocked by sk_stream_wait_memory()
3. Thread B calls sendmsg() concurrently, finishes building kcm->seq_skb and puts the skb to the write queue
4. Thread A faces an error and finally frees skb that is already in the write queue
5. kcm_release() does double-free the skb in the write queue
When a thread is building a MSG_MORE skb, another thread must not touch it.
Let's add a per-sk mutex and serialise kcm_sendmsg().
[0]: BUG: KASAN: slab-use-after-free in __skb_unlink include/linux/skbuff.h:2366 [inline] BUG: KASAN: slab-use-after-free in __skb_dequeue include/linux/skbuff.h:2385 [inline] BUG: KASAN: slab-use-after-free in __skb_queue_purge_reason include/linux/skbuff.h:3175 [inline] BUG: KASAN: slab-use-after-free in __skb_queue_purge include/linux/skbuff.h:3181 [inline] BUG: KASAN: slab-use-after-free in kcm_release+0x170/0x4c8 net/kcm/kcmsock.c:1691 Read of size 8 at addr ffff0000ced0fc80 by task syz-executor329/6167
CPU: 1 PID: 6167 Comm: syz-executor329 Tainted: G B 6.8.0-rc5-syzkaller-g9abbc24128bc #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 Call trace: dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:291 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:298 __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:377 [inline] print_report+0x178/0x518 mm/kasan/report.c:488 kasan_report+0xd8/0x138 mm/kasan/report.c:601 __asan_report_load8_noabort+0x20/0x2c mm/kasan/report_generic.c:381 __skb_unlink include/linux/skbuff.h:2366 [inline] __skb_dequeue include/linux/skbuff.h:2385 [inline] __skb_queue_purge_reason include/linux/skbuff.h:3175 [inline] __skb_queue_purge include/linux/skbuff.h:3181 [inline] kcm_release+0x170/0x4c8 net/kcm/kcmsock.c:1691 __sock_release net/socket.c:659 [inline] sock_close+0xa4/0x1e8 net/socket.c:1421 __fput+0x30c/0x738 fs/file_table.c:376 ____fput+0x20/0x30 fs/file_table.c:404 task_work_run+0x230/0x2e0 kernel/task_work.c:180 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0x618/0x1f64 kernel/exit.c:871 do_group_exit+0x194/0x22c kernel/exit.c:1020 get_signal+0x1500/0x15ec kernel/signal.c:2893 do_signal+0x23c/0x3b44 arch/arm64/kernel/signal.c:1249 do_notify_resume+0x74/0x1f4 arch/arm64/kernel/entry-common.c:148 exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline] exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline] el0_svc+0xac/0x168 arch/arm64/kernel/entry-common.c:713 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Allocated by task 6166: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x40/0x78 mm/kasan/common.c:68 kasan_save_alloc_info+0x70/0x84 mm/kasan/generic.c:626 unpoison_slab_object mm/kasan/common.c:314 [inline] __kasan_slab_alloc+0x74/0x8c mm/kasan/common.c:340 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slub.c:3813 [inline] slab_alloc_node mm/slub.c:3860 [inline] kmem_cache_alloc_node+0x204/0x4c0 mm/slub.c:3903 __alloc_skb+0x19c/0x3d8 net/core/skbuff.c:641 alloc_skb include/linux/skbuff.h:1296 [inline] kcm_sendmsg+0x1d3c/0x2124 net/kcm/kcmsock.c:783 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] sock_sendmsg+0x220/0x2c0 net/socket.c:768 splice_to_socket+0x7cc/0xd58 fs/splice.c:889 do_splice_from fs/splice.c:941 [inline] direct_splice_actor+0xec/0x1d8 fs/splice.c:1164 splice_direct_to_actor+0x438/0xa0c fs/splice.c:1108 do_splice_direct_actor fs/splice.c:1207 [inline] do_splice_direct+0x1e4/0x304 fs/splice.c:1233 do_sendfile+0x460/0xb3c fs/read_write.c:1295 __do_sys_sendfile64 fs/read_write.c:1362 [inline] __se_sys_sendfile64 fs/read_write.c:1348 [inline] __arm64_sys_sendfile64+0x160/0x3b4 fs/read_write.c:1348 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155 el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Freed by task 6167: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x40/0x78 mm/kasan/common.c:68 kasan_save_free_info+0x5c/0x74 mm/kasan/generic.c:640 poison_slab_object+0x124/0x18c mm/kasan/common.c:241 __kasan_slab_free+0x3c/0x78 mm/kasan/common.c:257 kasan_slab_free include/linux/kasan.h:184 [inline] slab_free_hook mm/slub.c:2121 [inline] slab_free mm/slub.c:4299 [inline] kmem_cache_free+0x15c/0x3d4 mm/slub.c:4363 kfree_skbmem+0x10c/0x19c __kfree_skb net/core/skbuff.c:1109 [inline] kfree_skb_reason+0x240/0x6f4 net/core/skbuff.c:1144 kfree_skb include/linux/skbuff.h:1244 [inline] kcm_release+0x104/0x4c8 net/kcm/kcmsock.c:1685 __sock_release net/socket.c:659 [inline] sock_close+0xa4/0x1e8 net/socket.c:1421 __fput+0x30c/0x738 fs/file_table.c:376 ____fput+0x20/0x30 fs/file_table.c:404 task_work_run+0x230/0x2e0 kernel/task_work.c:180 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0x618/0x1f64 kernel/exit.c:871 do_group_exit+0x194/0x22c kernel/exit.c:1020 get_signal+0x1500/0x15ec kernel/signal.c:2893 do_signal+0x23c/0x3b44 arch/arm64/kernel/signal.c:1249 do_notify_resume+0x74/0x1f4 arch/arm64/kernel/entry-common.c:148 exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline] exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline] el0_svc+0xac/0x168 arch/arm64/kernel/entry-common.c:713 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
The buggy address belongs to the object at ffff0000ced0fc80 which belongs to the cache skbuff_head_cache of size 240 The buggy address is located 0 bytes inside of freed 240-byte region [ffff0000ced0fc80, ffff0000ced0fd70)
The buggy address belongs to the physical page: page:00000000d35f4ae4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10ed0f flags: 0x5ffc00000000800(slab|node=0|zone=2|lastcpupid=0x7ff) page_type: 0xffffffff() raw: 05ffc00000000800 ffff0000c1cbf640 fffffdffc3423100 dead000000000004 raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff0000ced0fb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff0000ced0fc00: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
ffff0000ced0fc80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff0000ced0fd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc ffff0000ced0fd80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reported-by: syzbot+b72d86aa5df17ce74c60@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b72d86aa5df17ce74c60 Tested-by: syzbot+b72d86aa5df17ce74c60@syzkaller.appspotmail.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://patch.msgid.link/20240815220437.69511-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/kcm.h | 1 + net/kcm/kcmsock.c | 4 ++++ 2 files changed, 5 insertions(+)
diff --git a/include/net/kcm.h b/include/net/kcm.h index 2d704f8f49059..8e8252e08a9ce 100644 --- a/include/net/kcm.h +++ b/include/net/kcm.h @@ -70,6 +70,7 @@ struct kcm_sock { struct work_struct tx_work; struct list_head wait_psock_list; struct sk_buff *seq_skb; + struct mutex tx_mutex; u32 tx_stopped : 1;
/* Don't use bit fields here, these are set under different locks */ diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c index 7d37bf4334d26..462bdb6bfa4d8 100644 --- a/net/kcm/kcmsock.c +++ b/net/kcm/kcmsock.c @@ -912,6 +912,7 @@ static int kcm_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) !(msg->msg_flags & MSG_MORE) : !!(msg->msg_flags & MSG_EOR); int err = -EPIPE;
+ mutex_lock(&kcm->tx_mutex); lock_sock(sk);
/* Per tcp_sendmsg this should be in poll */ @@ -1060,6 +1061,7 @@ static int kcm_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) KCM_STATS_ADD(kcm->stats.tx_bytes, copied);
release_sock(sk); + mutex_unlock(&kcm->tx_mutex); return copied;
out_error: @@ -1085,6 +1087,7 @@ static int kcm_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) sk->sk_write_space(sk);
release_sock(sk); + mutex_unlock(&kcm->tx_mutex); return err; }
@@ -1325,6 +1328,7 @@ static void init_kcm_sock(struct kcm_sock *kcm, struct kcm_mux *mux) spin_unlock_bh(&mux->lock);
INIT_WORK(&kcm->tx_work, kcm_tx_work); + mutex_init(&kcm->tx_mutex);
spin_lock_bh(&mux->rx_lock); kcm_rcv_ready(kcm);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
[ Upstream commit 1eacdd71b3436b54d5fc8218c4bb0187d92a6892 ]
The sequence counter nft_counter_seq is a per-CPU counter. There is no lock associated with it. nft_counter_do_eval() is using the same counter and disables BH which suggest that it can be invoked from a softirq. This in turn means that nft_counter_offload_stats(), which disables only preemption, can be interrupted by nft_counter_do_eval() leading to two writer for one seqcount_t. This can lead to loosing stats or reading statistics while they are updated.
Disable BH during stats update in nft_counter_offload_stats() to ensure one writer at a time.
Fixes: b72920f6e4a9d ("netfilter: nftables: counter hardware offload support") Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Reviewed-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_counter.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nft_counter.c b/net/netfilter/nft_counter.c index b5fe7fe4b60db..73e4d278d6c13 100644 --- a/net/netfilter/nft_counter.c +++ b/net/netfilter/nft_counter.c @@ -264,7 +264,7 @@ static void nft_counter_offload_stats(struct nft_expr *expr, struct nft_counter *this_cpu; seqcount_t *myseq;
- preempt_disable(); + local_bh_disable(); this_cpu = this_cpu_ptr(priv->counter); myseq = this_cpu_ptr(&nft_counter_seq);
@@ -272,7 +272,7 @@ static void nft_counter_offload_stats(struct nft_expr *expr, this_cpu->packets += stats->pkts; this_cpu->bytes += stats->bytes; write_seqcount_end(myseq); - preempt_enable(); + local_bh_enable(); }
void nft_counter_init_seqcount(void)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
[ Upstream commit a0b39e2dc7017ac667b70bdeee5293e410fab2fb ]
nft_counter_reset() resets the counter by subtracting the previously retrieved value from the counter. This is a write operation on the counter and as such it requires to be performed with a write sequence of nft_counter_seq to serialize against its possible reader.
Update the packets/ bytes within write-sequence of nft_counter_seq.
Fixes: d84701ecbcd6a ("netfilter: nft_counter: rework atomic dump and reset") Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Reviewed-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_counter.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/net/netfilter/nft_counter.c b/net/netfilter/nft_counter.c index 73e4d278d6c13..781d3a26f5df7 100644 --- a/net/netfilter/nft_counter.c +++ b/net/netfilter/nft_counter.c @@ -107,11 +107,16 @@ static void nft_counter_reset(struct nft_counter_percpu_priv *priv, struct nft_counter *total) { struct nft_counter *this_cpu; + seqcount_t *myseq;
local_bh_disable(); this_cpu = this_cpu_ptr(priv->counter); + myseq = this_cpu_ptr(&nft_counter_seq); + + write_seqcount_begin(myseq); this_cpu->packets -= total->packets; this_cpu->bytes -= total->bytes; + write_seqcount_end(myseq); local_bh_enable(); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Bogendoerfer tbogendoerfer@suse.de
[ Upstream commit 4b3e33fcc38f7750604b065c55a43e94c5bc3145 ]
GRO code checks for matching layer 2 headers to see, if packet belongs to the same flow and because ip6 tunnel set dev->hard_header_len this check fails in cases, where it shouldn't. To fix this don't set hard_header_len, but use needed_headroom like ipv4/ip_tunnel.c does.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Thomas Bogendoerfer tbogendoerfer@suse.de Link: https://patch.msgid.link/20240815151419.109864-1-tbogendoerfer@suse.de Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_tunnel.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 2699915bb85be..f3324f2a40466 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -1510,7 +1510,8 @@ static void ip6_tnl_link_config(struct ip6_tnl *t) tdev = __dev_get_by_index(t->net, p->link);
if (tdev) { - dev->hard_header_len = tdev->hard_header_len + t_hlen; + dev->needed_headroom = tdev->hard_header_len + + tdev->needed_headroom + t_hlen; mtu = min_t(unsigned int, tdev->mtu, IP6_MAX_MTU);
mtu = mtu - t_hlen; @@ -1734,7 +1735,9 @@ ip6_tnl_siocdevprivate(struct net_device *dev, struct ifreq *ifr, int ip6_tnl_change_mtu(struct net_device *dev, int new_mtu) { struct ip6_tnl *tnl = netdev_priv(dev); + int t_hlen;
+ t_hlen = tnl->hlen + sizeof(struct ipv6hdr); if (tnl->parms.proto == IPPROTO_IPV6) { if (new_mtu < IPV6_MIN_MTU) return -EINVAL; @@ -1743,10 +1746,10 @@ int ip6_tnl_change_mtu(struct net_device *dev, int new_mtu) return -EINVAL; } if (tnl->parms.proto == IPPROTO_IPV6 || tnl->parms.proto == 0) { - if (new_mtu > IP6_MAX_MTU - dev->hard_header_len) + if (new_mtu > IP6_MAX_MTU - dev->hard_header_len - t_hlen) return -EINVAL; } else { - if (new_mtu > IP_MAX_MTU - dev->hard_header_len) + if (new_mtu > IP_MAX_MTU - dev->hard_header_len - t_hlen) return -EINVAL; } dev->mtu = new_mtu; @@ -1892,12 +1895,11 @@ ip6_tnl_dev_init_gen(struct net_device *dev) t_hlen = t->hlen + sizeof(struct ipv6hdr);
dev->type = ARPHRD_TUNNEL6; - dev->hard_header_len = LL_MAX_HEADER + t_hlen; dev->mtu = ETH_DATA_LEN - t_hlen; if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) dev->mtu -= 8; dev->min_mtu = ETH_MIN_MTU; - dev->max_mtu = IP6_MAX_MTU - dev->hard_header_len; + dev->max_mtu = IP6_MAX_MTU - dev->hard_header_len - t_hlen;
netdev_hold(dev, &t->dev_tracker, GFP_KERNEL); return 0;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikolay Aleksandrov razor@blackwall.org
[ Upstream commit fc59b9a5f7201b9f7272944596113a82cc7773d5 ]
Fix the return type which should be bool.
Fixes: 955b785ec6b3 ("bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()") Signed-off-by: Nikolay Aleksandrov razor@blackwall.org Reviewed-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index be5348d0b22e5..5c7b249a1bcfa 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -595,34 +595,28 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs) struct net_device *real_dev; struct slave *curr_active; struct bonding *bond; - int err; + bool ok = false;
bond = netdev_priv(bond_dev); rcu_read_lock(); curr_active = rcu_dereference(bond->curr_active_slave); real_dev = curr_active->dev;
- if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) { - err = false; + if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) goto out; - }
- if (!xs->xso.real_dev) { - err = false; + if (!xs->xso.real_dev) goto out; - }
if (!real_dev->xfrmdev_ops || !real_dev->xfrmdev_ops->xdo_dev_offload_ok || - netif_is_bond_master(real_dev)) { - err = false; + netif_is_bond_master(real_dev)) goto out; - }
- err = real_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs); + ok = real_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs); out: rcu_read_unlock(); - return err; + return ok; }
static const struct xfrmdev_ops bond_xfrmdev_ops = {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikolay Aleksandrov razor@blackwall.org
[ Upstream commit 95c90e4ad89d493a7a14fa200082e466e2548f9d ]
We must check if there is an active slave before dereferencing the pointer.
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Nikolay Aleksandrov razor@blackwall.org Reviewed-by: Hangbin Liu liuhangbin@gmail.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 5c7b249a1bcfa..2414861e04eb0 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -600,6 +600,8 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs) bond = netdev_priv(bond_dev); rcu_read_lock(); curr_active = rcu_dereference(bond->curr_active_slave); + if (!curr_active) + goto out; real_dev = curr_active->dev;
if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikolay Aleksandrov razor@blackwall.org
[ Upstream commit f8cde9805981c50d0c029063dc7d82821806fc44 ]
We shouldn't set real_dev to NULL because packets can be in transit and xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume real_dev is set.
Example trace: kernel: BUG: unable to handle page fault for address: 0000000000001030 kernel: bond0: (slave eni0np1): making interface the new active one kernel: #PF: supervisor write access in kernel mode kernel: #PF: error_code(0x0002) - not-present page kernel: PGD 0 P4D 0 kernel: Oops: 0002 [#1] PREEMPT SMP kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12 kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim] kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: Code: e0 0f 0b 48 83 7f 38 00 74 de 0f 0b 48 8b 47 08 48 8b 37 48 8b 78 40 e9 b2 e5 9a d7 66 90 0f 1f 44 00 00 48 8b 86 80 02 00 00 <83> 80 30 10 00 00 01 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f kernel: bond0: (slave eni0np1): making interface the new active one kernel: RSP: 0018:ffffabde81553b98 EFLAGS: 00010246 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: kernel: RAX: 0000000000000000 RBX: ffff9eb404e74900 RCX: ffff9eb403d97c60 kernel: RDX: ffffffffc090de10 RSI: ffff9eb404e74900 RDI: ffff9eb3c5de9e00 kernel: RBP: ffff9eb3c0a42000 R08: 0000000000000010 R09: 0000000000000014 kernel: R10: 7974203030303030 R11: 3030303030303030 R12: 0000000000000000 kernel: R13: ffff9eb3c5de9e00 R14: ffffabde81553cc8 R15: ffff9eb404c53000 kernel: FS: 00007f2a77a3ad00(0000) GS:ffff9eb43bd00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000001030 CR3: 00000001122ab000 CR4: 0000000000350ef0 kernel: bond0: (slave eni0np1): making interface the new active one kernel: Call Trace: kernel: <TASK> kernel: ? __die+0x1f/0x60 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: ? page_fault_oops+0x142/0x4c0 kernel: ? do_user_addr_fault+0x65/0x670 kernel: ? kvm_read_and_reset_apf_flags+0x3b/0x50 kernel: bond0: (slave eni0np1): making interface the new active one kernel: ? exc_page_fault+0x7b/0x180 kernel: ? asm_exc_page_fault+0x22/0x30 kernel: ? nsim_bpf_uninit+0x50/0x50 [netdevsim] kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: ? nsim_ipsec_offload_ok+0xc/0x20 [netdevsim] kernel: bond0: (slave eni0np1): making interface the new active one kernel: bond_ipsec_offload_ok+0x7b/0x90 [bonding] kernel: xfrm_output+0x61/0x3b0 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: ip_push_pending_frames+0x56/0x80
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Nikolay Aleksandrov razor@blackwall.org Reviewed-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 2414861e04eb0..c218352814430 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -578,7 +578,6 @@ static void bond_ipsec_del_sa_all(struct bonding *bond) } else { slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs); } - ipsec->xs->xso.real_dev = NULL; } spin_unlock_bh(&bond->ipsec_lock); rcu_read_unlock();
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikolay Aleksandrov razor@blackwall.org
[ Upstream commit c4c5c5d2ef40a9f67a9241dc5422eac9ffe19547 ]
If the active slave is cleared manually the xfrm state is not flushed. This leads to xfrm add/del imbalance and adding the same state multiple times. For example when the device cannot handle anymore states we get: [ 1169.884811] bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA because it's filled with the same state after multiple active slave clearings. This change also has a few nice side effects: user-space gets a notification for the change, the old device gets its mac address and promisc/mcast adjusted properly.
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Nikolay Aleksandrov razor@blackwall.org Reviewed-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_options.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c index 685fb4703ee1f..06c4cd0f00024 100644 --- a/drivers/net/bonding/bond_options.c +++ b/drivers/net/bonding/bond_options.c @@ -932,7 +932,7 @@ static int bond_option_active_slave_set(struct bonding *bond, /* check to see if we are clearing active */ if (!slave_dev) { netdev_dbg(bond->dev, "Clearing current active slave\n"); - RCU_INIT_POINTER(bond->curr_active_slave, NULL); + bond_change_active_slave(bond, NULL); bond_select_active_slave(bond); } else { struct slave *old_active = rtnl_dereference(bond->curr_active_slave);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit c61bcebde72de7f5dc194d28f29894f0f7661ff7 ]
Rx path is going to be modified in a way that fragmented frame will be gathered within xdp_buff in the first place. This approach implies that underlying buffer has to provide tailroom for skb_shared_info. This is currently the case when ring uses build_skb but not when legacy-rx knob is turned on. This case configures 2k Rx buffers and has no way to provide either headroom or tailroom - FWIW it currently has XDP_PACKET_HEADROOM which is broken and in here it is removed. 2k Rx buffers were used so driver in this setting was able to support 9k MTU as it can chain up to 5 Rx buffers. With offset configuring HW writing 2k of a data was passing the half of the page which broke the assumption of our internal page recycling tricks.
Now if above got fixed and legacy-rx path would be left as is, when referring to skb_shared_info via xdp_get_shared_info_from_buff(), packet's content would be corrupted again. Hence size of Rx buffer needs to be lowered and therefore supported MTU. This operation will allow us to keep the unified data path and with 8k MTU users (if any of legacy-rx) would still be good to go. However, tendency is to drop the support for this code path at some point.
Add ICE_RXBUF_1664 as vsi::rx_buf_len and ICE_MAX_FRAME_LEGACY_RX (8320) as vsi::max_frame for legacy-rx. For bigger page sizes configure 3k Rx buffers, not 2k.
Since headroom support is removed, disable data_meta support on legacy-rx. When preparing XDP buff, rely on ice_rx_ring::rx_offset setting when deciding whether to support data_meta or not.
Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: Alexander Lobakin alexandr.lobakin@intel.com Link: https://lore.kernel.org/bpf/20230131204506.219292-2-maciej.fijalkowski@intel... Stable-dep-of: 50b2143356e8 ("ice: fix page reuse when PAGE_SIZE is over 8k") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_base.c | 3 --- drivers/net/ethernet/intel/ice/ice_lib.c | 8 ++------ drivers/net/ethernet/intel/ice/ice_main.c | 10 ++++++++-- drivers/net/ethernet/intel/ice/ice_txrx.c | 17 +++++------------ drivers/net/ethernet/intel/ice/ice_txrx.h | 2 ++ 5 files changed, 17 insertions(+), 23 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c index 818eca6aa4a41..c7c6f01538e0d 100644 --- a/drivers/net/ethernet/intel/ice/ice_base.c +++ b/drivers/net/ethernet/intel/ice/ice_base.c @@ -355,9 +355,6 @@ static unsigned int ice_rx_offset(struct ice_rx_ring *rx_ring) { if (ice_ring_uses_build_skb(rx_ring)) return ICE_SKB_PAD; - else if (ice_is_xdp_ena_vsi(rx_ring->vsi)) - return XDP_PACKET_HEADROOM; - return 0; }
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index 7661e735d0992..347c6c23bfc1c 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -1818,8 +1818,8 @@ void ice_update_eth_stats(struct ice_vsi *vsi) void ice_vsi_cfg_frame_size(struct ice_vsi *vsi) { if (!vsi->netdev || test_bit(ICE_FLAG_LEGACY_RX, vsi->back->flags)) { - vsi->max_frame = ICE_AQ_SET_MAC_FRAME_SIZE_MAX; - vsi->rx_buf_len = ICE_RXBUF_2048; + vsi->max_frame = ICE_MAX_FRAME_LEGACY_RX; + vsi->rx_buf_len = ICE_RXBUF_1664; #if (PAGE_SIZE < 8192) } else if (!ICE_2K_TOO_SMALL_WITH_PADDING && (vsi->netdev->mtu <= ETH_DATA_LEN)) { @@ -1828,11 +1828,7 @@ void ice_vsi_cfg_frame_size(struct ice_vsi *vsi) #endif } else { vsi->max_frame = ICE_AQ_SET_MAC_FRAME_SIZE_MAX; -#if (PAGE_SIZE < 8192) vsi->rx_buf_len = ICE_RXBUF_3072; -#else - vsi->rx_buf_len = ICE_RXBUF_2048; -#endif } }
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 6e55861dd86fe..9dbfbc90485e4 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -7328,8 +7328,8 @@ static void ice_rebuild(struct ice_pf *pf, enum ice_reset_req reset_type) */ static int ice_max_xdp_frame_size(struct ice_vsi *vsi) { - if (PAGE_SIZE >= 8192 || test_bit(ICE_FLAG_LEGACY_RX, vsi->back->flags)) - return ICE_RXBUF_2048 - XDP_PACKET_HEADROOM; + if (test_bit(ICE_FLAG_LEGACY_RX, vsi->back->flags)) + return ICE_RXBUF_1664; else return ICE_RXBUF_3072; } @@ -7362,6 +7362,12 @@ static int ice_change_mtu(struct net_device *netdev, int new_mtu) frame_size - ICE_ETH_PKT_HDR_PAD); return -EINVAL; } + } else if (test_bit(ICE_FLAG_LEGACY_RX, pf->flags)) { + if (new_mtu + ICE_ETH_PKT_HDR_PAD > ICE_MAX_FRAME_LEGACY_RX) { + netdev_err(netdev, "Too big MTU for legacy-rx; Max is %d\n", + ICE_MAX_FRAME_LEGACY_RX - ICE_ETH_PKT_HDR_PAD); + return -EINVAL; + } }
/* if a reset is in progress, wait for some time for it to complete */ diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index bd62781191b3d..2db20263420d8 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -984,17 +984,15 @@ static struct sk_buff * ice_construct_skb(struct ice_rx_ring *rx_ring, struct ice_rx_buf *rx_buf, struct xdp_buff *xdp) { - unsigned int metasize = xdp->data - xdp->data_meta; unsigned int size = xdp->data_end - xdp->data; unsigned int headlen; struct sk_buff *skb;
/* prefetch first cache line of first page */ - net_prefetch(xdp->data_meta); + net_prefetch(xdp->data);
/* allocate a skb to store the frags */ - skb = __napi_alloc_skb(&rx_ring->q_vector->napi, - ICE_RX_HDR_SIZE + metasize, + skb = __napi_alloc_skb(&rx_ring->q_vector->napi, ICE_RX_HDR_SIZE, GFP_ATOMIC | __GFP_NOWARN); if (unlikely(!skb)) return NULL; @@ -1006,13 +1004,8 @@ ice_construct_skb(struct ice_rx_ring *rx_ring, struct ice_rx_buf *rx_buf, headlen = eth_get_headlen(skb->dev, xdp->data, ICE_RX_HDR_SIZE);
/* align pull length to size of long to optimize memcpy performance */ - memcpy(__skb_put(skb, headlen + metasize), xdp->data_meta, - ALIGN(headlen + metasize, sizeof(long))); - - if (metasize) { - skb_metadata_set(skb, metasize); - __skb_pull(skb, metasize); - } + memcpy(__skb_put(skb, headlen), xdp->data, ALIGN(headlen, + sizeof(long)));
/* if we exhaust the linear part then add what is left as a frag */ size -= headlen; @@ -1187,7 +1180,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
hard_start = page_address(rx_buf->page) + rx_buf->page_offset - offset; - xdp_prepare_buff(&xdp, hard_start, offset, size, true); + xdp_prepare_buff(&xdp, hard_start, offset, size, !!offset); #if (PAGE_SIZE > 4096) /* At larger PAGE_SIZE, frame_sz depend on len size */ xdp.frame_sz = ice_rx_frame_truesize(rx_ring, size); diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h index 932b5661ec4d6..bfbe4b16df96a 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h @@ -9,10 +9,12 @@ #define ICE_DFLT_IRQ_WORK 256 #define ICE_RXBUF_3072 3072 #define ICE_RXBUF_2048 2048 +#define ICE_RXBUF_1664 1664 #define ICE_RXBUF_1536 1536 #define ICE_MAX_CHAINED_RX_BUFS 5 #define ICE_MAX_BUF_TXD 8 #define ICE_MIN_TX_LEN 17 +#define ICE_MAX_FRAME_LEGACY_RX 8320
/* The size limit for a transmit buffer in a descriptor is (16K - 1). * In order to align with the read requests we will align the value to
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit cb0473e0e9dccaa0ddafb252f2c3ef943b86bb56 ]
In preparation for XDP multi-buffer support, let's store xdp_buff on Rx ring struct. This will allow us to combine fragmented frames across separate NAPI cycles in the same way as currently skb fragments are handled. This means that skb pointer on Rx ring will become redundant and will be removed. For now it is kept and layout of Rx ring struct was not inspected, some member movement will be needed later on so that will be the time to take care of it.
Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: Alexander Lobakin alexandr.lobakin@intel.com Link: https://lore.kernel.org/bpf/20230131204506.219292-3-maciej.fijalkowski@intel... Stable-dep-of: 50b2143356e8 ("ice: fix page reuse when PAGE_SIZE is over 8k") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_base.c | 1 + drivers/net/ethernet/intel/ice/ice_txrx.c | 39 +++++++++++++---------- drivers/net/ethernet/intel/ice/ice_txrx.h | 1 + 3 files changed, 25 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c index c7c6f01538e0d..4db4ec4b8857a 100644 --- a/drivers/net/ethernet/intel/ice/ice_base.c +++ b/drivers/net/ethernet/intel/ice/ice_base.c @@ -534,6 +534,7 @@ int ice_vsi_cfg_rxq(struct ice_rx_ring *ring) } }
+ xdp_init_buff(&ring->xdp, ice_rx_pg_size(ring) / 2, &ring->xdp_rxq); err = ice_setup_rx_ctx(ring); if (err) { dev_err(dev, "ice_setup_rx_ctx failed for RxQ %d, err %d\n", diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 2db20263420d8..d3411170f3eaf 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -523,8 +523,16 @@ int ice_setup_rx_ring(struct ice_rx_ring *rx_ring) return -ENOMEM; }
+/** + * ice_rx_frame_truesize + * @rx_ring: ptr to Rx ring + * @size: size + * + * calculate the truesize with taking into the account PAGE_SIZE of + * underlying arch + */ static unsigned int -ice_rx_frame_truesize(struct ice_rx_ring *rx_ring, unsigned int __maybe_unused size) +ice_rx_frame_truesize(struct ice_rx_ring *rx_ring, const unsigned int size) { unsigned int truesize;
@@ -1103,21 +1111,20 @@ ice_is_non_eop(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc) */ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) { - unsigned int total_rx_bytes = 0, total_rx_pkts = 0, frame_sz = 0; + unsigned int total_rx_bytes = 0, total_rx_pkts = 0; u16 cleaned_count = ICE_DESC_UNUSED(rx_ring); unsigned int offset = rx_ring->rx_offset; + struct xdp_buff *xdp = &rx_ring->xdp; struct ice_tx_ring *xdp_ring = NULL; unsigned int xdp_res, xdp_xmit = 0; struct sk_buff *skb = rx_ring->skb; struct bpf_prog *xdp_prog = NULL; - struct xdp_buff xdp; bool failure;
/* Frame size depend on rx_ring setup when PAGE_SIZE=4K */ #if (PAGE_SIZE < 8192) - frame_sz = ice_rx_frame_truesize(rx_ring, 0); + xdp->frame_sz = ice_rx_frame_truesize(rx_ring, 0); #endif - xdp_init_buff(&xdp, frame_sz, &rx_ring->xdp_rxq);
xdp_prog = READ_ONCE(rx_ring->xdp_prog); if (xdp_prog) @@ -1171,30 +1178,30 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) rx_buf = ice_get_rx_buf(rx_ring, size, &rx_buf_pgcnt);
if (!size) { - xdp.data = NULL; - xdp.data_end = NULL; - xdp.data_hard_start = NULL; - xdp.data_meta = NULL; + xdp->data = NULL; + xdp->data_end = NULL; + xdp->data_hard_start = NULL; + xdp->data_meta = NULL; goto construct_skb; }
hard_start = page_address(rx_buf->page) + rx_buf->page_offset - offset; - xdp_prepare_buff(&xdp, hard_start, offset, size, !!offset); + xdp_prepare_buff(xdp, hard_start, offset, size, !!offset); #if (PAGE_SIZE > 4096) /* At larger PAGE_SIZE, frame_sz depend on len size */ - xdp.frame_sz = ice_rx_frame_truesize(rx_ring, size); + xdp->frame_sz = ice_rx_frame_truesize(rx_ring, size); #endif
if (!xdp_prog) goto construct_skb;
- xdp_res = ice_run_xdp(rx_ring, &xdp, xdp_prog, xdp_ring); + xdp_res = ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring); if (!xdp_res) goto construct_skb; if (xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR)) { xdp_xmit |= xdp_res; - ice_rx_buf_adjust_pg_offset(rx_buf, xdp.frame_sz); + ice_rx_buf_adjust_pg_offset(rx_buf, xdp->frame_sz); } else { rx_buf->pagecnt_bias++; } @@ -1207,11 +1214,11 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) construct_skb: if (skb) { ice_add_rx_frag(rx_ring, rx_buf, skb, size); - } else if (likely(xdp.data)) { + } else if (likely(xdp->data)) { if (ice_ring_uses_build_skb(rx_ring)) - skb = ice_build_skb(rx_ring, rx_buf, &xdp); + skb = ice_build_skb(rx_ring, rx_buf, xdp); else - skb = ice_construct_skb(rx_ring, rx_buf, &xdp); + skb = ice_construct_skb(rx_ring, rx_buf, xdp); } /* exit if we failed to retrieve a buffer */ if (!skb) { diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h index bfbe4b16df96a..ef8245f795c5b 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h @@ -295,6 +295,7 @@ struct ice_rx_ring { struct bpf_prog *xdp_prog; struct ice_tx_ring *xdp_ring; struct xsk_buff_pool *xsk_pool; + struct xdp_buff xdp; struct sk_buff *skb; dma_addr_t dma; /* physical address of ring */ u64 cached_phctime;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit ac0753391195011ded23696d7882428e5c419a98 ]
This will allow us to avoid carrying additional auxiliary array of page counts when dealing with XDP multi buffer support. Previously combining fragmented frame to skb was not affected in the same way as XDP would be as whole frame is needed to be in place before executing XDP prog. Therefore, when going through HW Rx descriptors one-by-one, calls to ice_put_rx_buf() need to be taken *after* running XDP prog on a potentially multi buffered frame, so some additional storage of page count is needed.
By adding page count to rx buf, it will make it easier to walk through processed entries at the end of rx cleaning routine and decide whether or not buffers should be recycled.
While at it, bump ice_rx_buf::pagecnt_bias from u16 up to u32. It was proven many times that calculations on variables smaller than standard register size are harmful. This was also the case during experiments with embedding page count to ice_rx_buf - when this was added as u16 it had a performance impact.
Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: Alexander Lobakin alexandr.lobakin@intel.com Link: https://lore.kernel.org/bpf/20230131204506.219292-4-maciej.fijalkowski@intel... Stable-dep-of: 50b2143356e8 ("ice: fix page reuse when PAGE_SIZE is over 8k") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_txrx.c | 26 +++++++++-------------- drivers/net/ethernet/intel/ice/ice_txrx.h | 3 ++- 2 files changed, 12 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index d3411170f3eaf..977b268802dfa 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -791,7 +791,6 @@ ice_rx_buf_adjust_pg_offset(struct ice_rx_buf *rx_buf, unsigned int size) /** * ice_can_reuse_rx_page - Determine if page can be reused for another Rx * @rx_buf: buffer containing the page - * @rx_buf_pgcnt: rx_buf page refcount pre xdp_do_redirect() call * * If page is reusable, we have a green light for calling ice_reuse_rx_page, * which will assign the current buffer to the buffer that next_to_alloc is @@ -799,7 +798,7 @@ ice_rx_buf_adjust_pg_offset(struct ice_rx_buf *rx_buf, unsigned int size) * page freed */ static bool -ice_can_reuse_rx_page(struct ice_rx_buf *rx_buf, int rx_buf_pgcnt) +ice_can_reuse_rx_page(struct ice_rx_buf *rx_buf) { unsigned int pagecnt_bias = rx_buf->pagecnt_bias; struct page *page = rx_buf->page; @@ -810,7 +809,7 @@ ice_can_reuse_rx_page(struct ice_rx_buf *rx_buf, int rx_buf_pgcnt)
#if (PAGE_SIZE < 8192) /* if we are only owner of page we can reuse it */ - if (unlikely((rx_buf_pgcnt - pagecnt_bias) > 1)) + if (unlikely(rx_buf->pgcnt - pagecnt_bias > 1)) return false; #else #define ICE_LAST_OFFSET \ @@ -894,19 +893,17 @@ ice_reuse_rx_page(struct ice_rx_ring *rx_ring, struct ice_rx_buf *old_buf) * ice_get_rx_buf - Fetch Rx buffer and synchronize data for use * @rx_ring: Rx descriptor ring to transact packets on * @size: size of buffer to add to skb - * @rx_buf_pgcnt: rx_buf page refcount * * This function will pull an Rx buffer from the ring and synchronize it * for use by the CPU. */ static struct ice_rx_buf * -ice_get_rx_buf(struct ice_rx_ring *rx_ring, const unsigned int size, - int *rx_buf_pgcnt) +ice_get_rx_buf(struct ice_rx_ring *rx_ring, const unsigned int size) { struct ice_rx_buf *rx_buf;
rx_buf = &rx_ring->rx_buf[rx_ring->next_to_clean]; - *rx_buf_pgcnt = + rx_buf->pgcnt = #if (PAGE_SIZE < 8192) page_count(rx_buf->page); #else @@ -1042,15 +1039,13 @@ ice_construct_skb(struct ice_rx_ring *rx_ring, struct ice_rx_buf *rx_buf, * ice_put_rx_buf - Clean up used buffer and either recycle or free * @rx_ring: Rx descriptor ring to transact packets on * @rx_buf: Rx buffer to pull data from - * @rx_buf_pgcnt: Rx buffer page count pre xdp_do_redirect() * * This function will update next_to_clean and then clean up the contents * of the rx_buf. It will either recycle the buffer or unmap it and free * the associated resources. */ static void -ice_put_rx_buf(struct ice_rx_ring *rx_ring, struct ice_rx_buf *rx_buf, - int rx_buf_pgcnt) +ice_put_rx_buf(struct ice_rx_ring *rx_ring, struct ice_rx_buf *rx_buf) { u16 ntc = rx_ring->next_to_clean + 1;
@@ -1061,7 +1056,7 @@ ice_put_rx_buf(struct ice_rx_ring *rx_ring, struct ice_rx_buf *rx_buf, if (!rx_buf) return;
- if (ice_can_reuse_rx_page(rx_buf, rx_buf_pgcnt)) { + if (ice_can_reuse_rx_page(rx_buf)) { /* hand second half of page back to the ring */ ice_reuse_rx_page(rx_ring, rx_buf); } else { @@ -1137,7 +1132,6 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) unsigned char *hard_start; unsigned int size; u16 stat_err_bits; - int rx_buf_pgcnt; u16 vlan_tag = 0; u16 rx_ptype;
@@ -1166,7 +1160,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) if (rx_desc->wb.rxdid == FDIR_DESC_RXDID && ctrl_vsi->vf) ice_vc_fdir_irq_handler(ctrl_vsi, rx_desc); - ice_put_rx_buf(rx_ring, NULL, 0); + ice_put_rx_buf(rx_ring, NULL); cleaned_count++; continue; } @@ -1175,7 +1169,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) ICE_RX_FLX_DESC_PKT_LEN_M;
/* retrieve a buffer from the ring */ - rx_buf = ice_get_rx_buf(rx_ring, size, &rx_buf_pgcnt); + rx_buf = ice_get_rx_buf(rx_ring, size);
if (!size) { xdp->data = NULL; @@ -1209,7 +1203,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) total_rx_pkts++;
cleaned_count++; - ice_put_rx_buf(rx_ring, rx_buf, rx_buf_pgcnt); + ice_put_rx_buf(rx_ring, rx_buf); continue; construct_skb: if (skb) { @@ -1228,7 +1222,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) break; }
- ice_put_rx_buf(rx_ring, rx_buf, rx_buf_pgcnt); + ice_put_rx_buf(rx_ring, rx_buf); cleaned_count++;
/* skip if it is NOP desc */ diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h index ef8245f795c5b..c1d9b3cebb059 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h @@ -172,7 +172,8 @@ struct ice_rx_buf { dma_addr_t dma; struct page *page; unsigned int page_offset; - u16 pagecnt_bias; + unsigned int pgcnt; + unsigned int pagecnt_bias; };
struct ice_q_stats {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit d7956d81f1502d3818500cff4847f3e9ae0c6aa4 ]
Plan is to move ice_put_rx_buf() to the end of ice_clean_rx_irq() so in order to keep the ability of walking through HW Rx descriptors, pull out next_to_clean handling out of ice_put_rx_buf().
Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: Alexander Lobakin alexandr.lobakin@intel.com Link: https://lore.kernel.org/bpf/20230131204506.219292-5-maciej.fijalkowski@intel... Stable-dep-of: 50b2143356e8 ("ice: fix page reuse when PAGE_SIZE is over 8k") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_txrx.c | 29 +++++++++++++---------- 1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 977b268802dfa..6f930d99b496b 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -898,11 +898,12 @@ ice_reuse_rx_page(struct ice_rx_ring *rx_ring, struct ice_rx_buf *old_buf) * for use by the CPU. */ static struct ice_rx_buf * -ice_get_rx_buf(struct ice_rx_ring *rx_ring, const unsigned int size) +ice_get_rx_buf(struct ice_rx_ring *rx_ring, const unsigned int size, + const unsigned int ntc) { struct ice_rx_buf *rx_buf;
- rx_buf = &rx_ring->rx_buf[rx_ring->next_to_clean]; + rx_buf = &rx_ring->rx_buf[ntc]; rx_buf->pgcnt = #if (PAGE_SIZE < 8192) page_count(rx_buf->page); @@ -1040,19 +1041,12 @@ ice_construct_skb(struct ice_rx_ring *rx_ring, struct ice_rx_buf *rx_buf, * @rx_ring: Rx descriptor ring to transact packets on * @rx_buf: Rx buffer to pull data from * - * This function will update next_to_clean and then clean up the contents - * of the rx_buf. It will either recycle the buffer or unmap it and free - * the associated resources. + * This function will clean up the contents of the rx_buf. It will either + * recycle the buffer or unmap it and free the associated resources. */ static void ice_put_rx_buf(struct ice_rx_ring *rx_ring, struct ice_rx_buf *rx_buf) { - u16 ntc = rx_ring->next_to_clean + 1; - - /* fetch, update, and store next to clean */ - ntc = (ntc < rx_ring->count) ? ntc : 0; - rx_ring->next_to_clean = ntc; - if (!rx_buf) return;
@@ -1114,6 +1108,8 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) unsigned int xdp_res, xdp_xmit = 0; struct sk_buff *skb = rx_ring->skb; struct bpf_prog *xdp_prog = NULL; + u32 ntc = rx_ring->next_to_clean; + u32 cnt = rx_ring->count; bool failure;
/* Frame size depend on rx_ring setup when PAGE_SIZE=4K */ @@ -1136,7 +1132,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) u16 rx_ptype;
/* get the Rx desc from Rx ring based on 'next_to_clean' */ - rx_desc = ICE_RX_DESC(rx_ring, rx_ring->next_to_clean); + rx_desc = ICE_RX_DESC(rx_ring, ntc);
/* status_error_len will always be zero for unused descriptors * because it's cleared in cleanup, and overlaps with hdr_addr @@ -1160,6 +1156,8 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) if (rx_desc->wb.rxdid == FDIR_DESC_RXDID && ctrl_vsi->vf) ice_vc_fdir_irq_handler(ctrl_vsi, rx_desc); + if (++ntc == cnt) + ntc = 0; ice_put_rx_buf(rx_ring, NULL); cleaned_count++; continue; @@ -1169,7 +1167,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) ICE_RX_FLX_DESC_PKT_LEN_M;
/* retrieve a buffer from the ring */ - rx_buf = ice_get_rx_buf(rx_ring, size); + rx_buf = ice_get_rx_buf(rx_ring, size, ntc);
if (!size) { xdp->data = NULL; @@ -1203,6 +1201,8 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) total_rx_pkts++;
cleaned_count++; + if (++ntc == cnt) + ntc = 0; ice_put_rx_buf(rx_ring, rx_buf); continue; construct_skb: @@ -1222,6 +1222,8 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) break; }
+ if (++ntc == cnt) + ntc = 0; ice_put_rx_buf(rx_ring, rx_buf); cleaned_count++;
@@ -1262,6 +1264,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) total_rx_pkts++; }
+ rx_ring->next_to_clean = ntc; /* return up to cleaned_count buffers to hardware */ failure = ice_alloc_rx_bufs(rx_ring, cleaned_count);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit 50b2143356e888777fc5bca023c39f34f404613a ]
Architectures that have PAGE_SIZE >= 8192 such as arm64 should act the same as x86 currently, meaning reuse of a page should only take place when no one else is busy with it.
Do two things independently of underlying PAGE_SIZE: - store the page count under ice_rx_buf::pgcnt - then act upon its value vs ice_rx_buf::pagecnt_bias when making the decision regarding page reuse
Fixes: 2b245cb29421 ("ice: Implement transmit and NAPI support") Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Tested-by: Chandan Kumar Rout chandanx.rout@intel.com (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_txrx.c | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 6f930d99b496b..6ad4bdb0124e7 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -807,16 +807,15 @@ ice_can_reuse_rx_page(struct ice_rx_buf *rx_buf) if (!dev_page_is_reusable(page)) return false;
-#if (PAGE_SIZE < 8192) /* if we are only owner of page we can reuse it */ if (unlikely(rx_buf->pgcnt - pagecnt_bias > 1)) return false; -#else +#if (PAGE_SIZE >= 8192) #define ICE_LAST_OFFSET \ (SKB_WITH_OVERHEAD(PAGE_SIZE) - ICE_RXBUF_2048) if (rx_buf->page_offset > ICE_LAST_OFFSET) return false; -#endif /* PAGE_SIZE < 8192) */ +#endif /* PAGE_SIZE >= 8192) */
/* If we have drained the page fragment pool we need to update * the pagecnt_bias and page count so that we fully restock the @@ -904,12 +903,7 @@ ice_get_rx_buf(struct ice_rx_ring *rx_ring, const unsigned int size, struct ice_rx_buf *rx_buf;
rx_buf = &rx_ring->rx_buf[ntc]; - rx_buf->pgcnt = -#if (PAGE_SIZE < 8192) - page_count(rx_buf->page); -#else - 0; -#endif + rx_buf->pgcnt = page_count(rx_buf->page); prefetchw(rx_buf->page);
if (!size)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit b966ad832942b5a11e002f9b5ef102b08425b84a ]
For bigger PAGE_SIZE archs, ice driver works on 3k Rx buffers. Therefore, ICE_LAST_OFFSET should take into account ICE_RXBUF_3072, not ICE_RXBUF_2048.
Fixes: 7237f5b0dba4 ("ice: introduce legacy Rx flag") Suggested-by: Luiz Capitulino luizcap@redhat.com Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Tested-by: Chandan Kumar Rout chandanx.rout@intel.com (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_txrx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 6ad4bdb0124e7..8577bf0ed5402 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -812,7 +812,7 @@ ice_can_reuse_rx_page(struct ice_rx_buf *rx_buf) return false; #if (PAGE_SIZE >= 8192) #define ICE_LAST_OFFSET \ - (SKB_WITH_OVERHEAD(PAGE_SIZE) - ICE_RXBUF_2048) + (SKB_WITH_OVERHEAD(PAGE_SIZE) - ICE_RXBUF_3072) if (rx_buf->page_offset > ICE_LAST_OFFSET) return false; #endif /* PAGE_SIZE >= 8192) */
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit c50e7475961c36ec4d21d60af055b32f9436b431 ]
The dpaa2_switch_add_bufs() function returns the number of bufs that it was able to add. It returns BUFS_PER_CMD (7) for complete success or a smaller number if there are not enough pages available. However, the error checking is looking at the total number of bufs instead of the number which were added on this iteration. Thus the error checking only works correctly for the first iteration through the loop and subsequent iterations are always counted as a success.
Fix this by checking only the bufs added in the current iteration.
Fixes: 0b1b71370458 ("staging: dpaa2-switch: handle Rx path on control interface") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Ioana Ciornei ioana.ciornei@nxp.com Tested-by: Ioana Ciornei ioana.ciornei@nxp.com Link: https://patch.msgid.link/eec27f30-b43f-42b6-b8ee-04a6f83423b6@stanley.mounta... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c index b98ef4ba172f6..d6c871f227947 100644 --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c @@ -2583,13 +2583,14 @@ static int dpaa2_switch_refill_bp(struct ethsw_core *ethsw)
static int dpaa2_switch_seed_bp(struct ethsw_core *ethsw) { - int *count, i; + int *count, ret, i;
for (i = 0; i < DPAA2_ETHSW_NUM_BUFS; i += BUFS_PER_CMD) { + ret = dpaa2_switch_add_bufs(ethsw, ethsw->bpid); count = ðsw->buf_count; - *count += dpaa2_switch_add_bufs(ethsw, ethsw->bpid); + *count += ret;
- if (unlikely(*count < BUFS_PER_CMD)) + if (unlikely(ret < BUFS_PER_CMD)) return -ENOMEM; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Joseph Huang Joseph.Huang@garmin.com
[ Upstream commit 528876d867a23b5198022baf2e388052ca67c952 ]
If an ATU violation was caused by a CPU Load operation, the SPID could be larger than DSA_MAX_PORTS (the size of mv88e6xxx_chip.ports[] array).
Fixes: 75c05a74e745 ("net: dsa: mv88e6xxx: Fix counting of ATU violations") Signed-off-by: Joseph Huang Joseph.Huang@garmin.com Reviewed-by: Andrew Lunn andrew@lunn.ch Link: https://patch.msgid.link/20240819235251.1331763-1-Joseph.Huang@garmin.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/mv88e6xxx/global1_atu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/mv88e6xxx/global1_atu.c b/drivers/net/dsa/mv88e6xxx/global1_atu.c index 7c513a03789cf..17fd62616ce6d 100644 --- a/drivers/net/dsa/mv88e6xxx/global1_atu.c +++ b/drivers/net/dsa/mv88e6xxx/global1_atu.c @@ -453,7 +453,8 @@ static irqreturn_t mv88e6xxx_g1_atu_prob_irq_thread_fn(int irq, void *dev_id) trace_mv88e6xxx_atu_full_violation(chip->dev, spid, entry.portvec, entry.mac, fid); - chip->ports[spid].atu_full_violation++; + if (spid < ARRAY_SIZE(chip->ports)) + chip->ports[spid].atu_full_violation++; } mv88e6xxx_reg_unlock(chip);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stephen Hemminger stephen@networkplumber.org
[ Upstream commit c07ff8592d57ed258afee5a5e04991a48dbaf382 ]
There is a bug in netem_enqueue() introduced by commit 5845f706388a ("net: netem: fix skb length BUG_ON in __skb_to_sgvec") that can lead to a use-after-free.
This commit made netem_enqueue() always return NET_XMIT_SUCCESS when a packet is duplicated, which can cause the parent qdisc's q.qlen to be mistakenly incremented. When this happens qlen_notify() may be skipped on the parent during destruction, leaving a dangling pointer for some classful qdiscs like DRR.
There are two ways for the bug happen:
- If the duplicated packet is dropped by rootq->enqueue() and then the original packet is also dropped. - If rootq->enqueue() sends the duplicated packet to a different qdisc and the original packet is dropped.
In both cases NET_XMIT_SUCCESS is returned even though no packets are enqueued at the netem qdisc.
The fix is to defer the enqueue of the duplicate packet until after the original packet has been guaranteed to return NET_XMIT_SUCCESS.
Fixes: 5845f706388a ("net: netem: fix skb length BUG_ON in __skb_to_sgvec") Reported-by: Budimir Markovic markovicbudimir@gmail.com Signed-off-by: Stephen Hemminger stephen@networkplumber.org Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20240819175753.5151-1-stephen@networkplumber.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_netem.c | 47 ++++++++++++++++++++++++++----------------- 1 file changed, 29 insertions(+), 18 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c index d0e045116d4e9..a18b24c125f4e 100644 --- a/net/sched/sch_netem.c +++ b/net/sched/sch_netem.c @@ -437,12 +437,10 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct netem_sched_data *q = qdisc_priv(sch); /* We don't fill cb now as skb_unshare() may invalidate it */ struct netem_skb_cb *cb; - struct sk_buff *skb2; + struct sk_buff *skb2 = NULL; struct sk_buff *segs = NULL; unsigned int prev_len = qdisc_pkt_len(skb); int count = 1; - int rc = NET_XMIT_SUCCESS; - int rc_drop = NET_XMIT_DROP;
/* Do not fool qdisc_drop_all() */ skb->prev = NULL; @@ -471,19 +469,11 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch, skb_orphan_partial(skb);
/* - * If we need to duplicate packet, then re-insert at top of the - * qdisc tree, since parent queuer expects that only one - * skb will be queued. + * If we need to duplicate packet, then clone it before + * original is modified. */ - if (count > 1 && (skb2 = skb_clone(skb, GFP_ATOMIC)) != NULL) { - struct Qdisc *rootq = qdisc_root_bh(sch); - u32 dupsave = q->duplicate; /* prevent duplicating a dup... */ - - q->duplicate = 0; - rootq->enqueue(skb2, rootq, to_free); - q->duplicate = dupsave; - rc_drop = NET_XMIT_SUCCESS; - } + if (count > 1) + skb2 = skb_clone(skb, GFP_ATOMIC);
/* * Randomized packet corruption. @@ -495,7 +485,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch, if (skb_is_gso(skb)) { skb = netem_segment(skb, sch, to_free); if (!skb) - return rc_drop; + goto finish_segs; + segs = skb->next; skb_mark_not_on_list(skb); qdisc_skb_cb(skb)->pkt_len = skb->len; @@ -521,7 +512,24 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch, /* re-link segs, so that qdisc_drop_all() frees them all */ skb->next = segs; qdisc_drop_all(skb, sch, to_free); - return rc_drop; + if (skb2) + __qdisc_drop(skb2, to_free); + return NET_XMIT_DROP; + } + + /* + * If doing duplication then re-insert at top of the + * qdisc tree, since parent queuer expects that only one + * skb will be queued. + */ + if (skb2) { + struct Qdisc *rootq = qdisc_root_bh(sch); + u32 dupsave = q->duplicate; /* prevent duplicating a dup... */ + + q->duplicate = 0; + rootq->enqueue(skb2, rootq, to_free); + q->duplicate = dupsave; + skb2 = NULL; }
qdisc_qstats_backlog_inc(sch, skb); @@ -592,9 +600,12 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch, }
finish_segs: + if (skb2) + __qdisc_drop(skb2, to_free); + if (segs) { unsigned int len, last_len; - int nb; + int rc, nb;
len = skb ? skb->len : 0; nb = skb ? 1 : 0;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit faa389b2fbaaec7fd27a390b4896139f9da662e3 ]
syzbot reported an UAF in ip6_send_skb() [1]
After ip6_local_out() has returned, we no longer can safely dereference rt, unless we hold rcu_read_lock().
A similar issue has been fixed in commit a688caa34beb ("ipv6: take rcu lock in rawv6_send_hdrinc()")
Another potential issue in ip6_finish_output2() is handled in a separate patch.
[1] BUG: KASAN: slab-use-after-free in ip6_send_skb+0x18d/0x230 net/ipv6/ip6_output.c:1964 Read of size 8 at addr ffff88806dde4858 by task syz.1.380/6530
CPU: 1 UID: 0 PID: 6530 Comm: syz.1.380 Not tainted 6.11.0-rc3-syzkaller-00306-gdf6cbc62cc9b #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 ip6_send_skb+0x18d/0x230 net/ipv6/ip6_output.c:1964 rawv6_push_pending_frames+0x75c/0x9e0 net/ipv6/raw.c:588 rawv6_sendmsg+0x19c7/0x23c0 net/ipv6/raw.c:926 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x1a6/0x270 net/socket.c:745 sock_write_iter+0x2dd/0x400 net/socket.c:1160 do_iter_readv_writev+0x60a/0x890 vfs_writev+0x37c/0xbb0 fs/read_write.c:971 do_writev+0x1b1/0x350 fs/read_write.c:1018 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f936bf79e79 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f936cd7f038 EFLAGS: 00000246 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 00007f936c115f80 RCX: 00007f936bf79e79 RDX: 0000000000000001 RSI: 0000000020000040 RDI: 0000000000000004 RBP: 00007f936bfe7916 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007f936c115f80 R15: 00007fff2860a7a8 </TASK>
Allocated by task 6530: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 unpoison_slab_object mm/kasan/common.c:312 [inline] __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slub.c:3988 [inline] slab_alloc_node mm/slub.c:4037 [inline] kmem_cache_alloc_noprof+0x135/0x2a0 mm/slub.c:4044 dst_alloc+0x12b/0x190 net/core/dst.c:89 ip6_blackhole_route+0x59/0x340 net/ipv6/route.c:2670 make_blackhole net/xfrm/xfrm_policy.c:3120 [inline] xfrm_lookup_route+0xd1/0x1c0 net/xfrm/xfrm_policy.c:3313 ip6_dst_lookup_flow+0x13e/0x180 net/ipv6/ip6_output.c:1257 rawv6_sendmsg+0x1283/0x23c0 net/ipv6/raw.c:898 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x1a6/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597 ___sys_sendmsg net/socket.c:2651 [inline] __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2680 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 45: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579 poison_slab_object+0xe0/0x150 mm/kasan/common.c:240 __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256 kasan_slab_free include/linux/kasan.h:184 [inline] slab_free_hook mm/slub.c:2252 [inline] slab_free mm/slub.c:4473 [inline] kmem_cache_free+0x145/0x350 mm/slub.c:4548 dst_destroy+0x2ac/0x460 net/core/dst.c:124 rcu_do_batch kernel/rcu/tree.c:2569 [inline] rcu_core+0xafd/0x1830 kernel/rcu/tree.c:2843 handle_softirqs+0x2c4/0x970 kernel/softirq.c:554 __do_softirq kernel/softirq.c:588 [inline] invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637 irq_exit_rcu+0x9/0x30 kernel/softirq.c:649 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
Last potentially related work creation: kasan_save_stack+0x3f/0x60 mm/kasan/common.c:47 __kasan_record_aux_stack+0xac/0xc0 mm/kasan/generic.c:541 __call_rcu_common kernel/rcu/tree.c:3106 [inline] call_rcu+0x167/0xa70 kernel/rcu/tree.c:3210 refdst_drop include/net/dst.h:263 [inline] skb_dst_drop include/net/dst.h:275 [inline] nf_ct_frag6_queue net/ipv6/netfilter/nf_conntrack_reasm.c:306 [inline] nf_ct_frag6_gather+0xb9a/0x2080 net/ipv6/netfilter/nf_conntrack_reasm.c:485 ipv6_defrag+0x2c8/0x3c0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:67 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline] nf_hook_slow+0xc3/0x220 net/netfilter/core.c:626 nf_hook include/linux/netfilter.h:269 [inline] __ip6_local_out+0x6fa/0x800 net/ipv6/output_core.c:143 ip6_local_out+0x26/0x70 net/ipv6/output_core.c:153 ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1959 rawv6_push_pending_frames+0x75c/0x9e0 net/ipv6/raw.c:588 rawv6_sendmsg+0x19c7/0x23c0 net/ipv6/raw.c:926 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x1a6/0x270 net/socket.c:745 sock_write_iter+0x2dd/0x400 net/socket.c:1160 do_iter_readv_writev+0x60a/0x890
Fixes: 0625491493d9 ("ipv6: ip6_push_pending_frames() should increment IPSTATS_MIB_OUTDISCARDS") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Reviewed-by: David Ahern dsahern@kernel.org Link: https://patch.msgid.link/20240820160859.3786976-2-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_output.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index df79044fbf3c4..796cf0a0a4225 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1993,6 +1993,7 @@ int ip6_send_skb(struct sk_buff *skb) struct rt6_info *rt = (struct rt6_info *)skb_dst(skb); int err;
+ rcu_read_lock(); err = ip6_local_out(net, skb->sk, skb); if (err) { if (err > 0) @@ -2002,6 +2003,7 @@ int ip6_send_skb(struct sk_buff *skb) IPSTATS_MIB_OUTDISCARDS); }
+ rcu_read_unlock(); return err; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit da273b377ae0d9bd255281ed3c2adb228321687b ]
If skb_expand_head() returns NULL, skb has been freed and associated dst/idev could also have been freed.
We need to hold rcu_read_lock() to make sure the dst and associated idev are alive.
Fixes: 5796015fa968 ("ipv6: allocate enough headroom in ip6_finish_output2()") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Vasily Averin vasily.averin@linux.dev Reviewed-by: David Ahern dsahern@kernel.org Link: https://patch.msgid.link/20240820160859.3786976-3-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_output.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 796cf0a0a4225..cfca9627398ee 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -69,11 +69,15 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff *
/* Be paranoid, rather than too clever. */ if (unlikely(hh_len > skb_headroom(skb)) && dev->header_ops) { + /* Make sure idev stays alive */ + rcu_read_lock(); skb = skb_expand_head(skb, hh_len); if (!skb) { IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS); + rcu_read_unlock(); return -ENOMEM; } + rcu_read_unlock(); }
hdr = ipv6_hdr(skb);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 2d5ff7e339d04622d8282661df36151906d0e1c7 ]
If skb_expand_head() returns NULL, skb has been freed and the associated dst/idev could also have been freed.
We must use rcu_read_lock() to prevent a possible UAF.
Fixes: 0c9f227bee11 ("ipv6: use skb_expand_head in ip6_xmit") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Vasily Averin vasily.averin@linux.dev Reviewed-by: David Ahern dsahern@kernel.org Link: https://patch.msgid.link/20240820160859.3786976-4-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_output.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index cfca9627398ee..f2227e662d1cf 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -278,11 +278,15 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6, head_room += opt->opt_nflen + opt->opt_flen;
if (unlikely(head_room > skb_headroom(skb))) { + /* Make sure idev stays alive */ + rcu_read_lock(); skb = skb_expand_head(skb, head_room); if (!skb) { IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS); + rcu_read_unlock(); return -ENOBUFS; } + rcu_read_unlock(); }
if (opt) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 6ea14ccb60c8ab829349979b22b58a941ec4a3ee ]
Ensure there is sufficient room to access the protocol field of the VLAN header, validate it once before the flowtable lookup.
===================================================== BUG: KMSAN: uninit-value in nf_flow_offload_inet_hook+0x45a/0x5f0 net/netfilter/nf_flow_table_inet.c:32 nf_flow_offload_inet_hook+0x45a/0x5f0 net/netfilter/nf_flow_table_inet.c:32 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline] nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626 nf_hook_ingress include/linux/netfilter_netdev.h:34 [inline] nf_ingress net/core/dev.c:5440 [inline]
Fixes: 4cd91f7c290f ("netfilter: flowtable: add vlan support") Reported-by: syzbot+8407d9bb88cd4c6bf61a@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_flow_table_inet.c | 3 +++ net/netfilter/nf_flow_table_ip.c | 3 +++ 2 files changed, 6 insertions(+)
diff --git a/net/netfilter/nf_flow_table_inet.c b/net/netfilter/nf_flow_table_inet.c index 6eef15648b7b0..b0f1991719324 100644 --- a/net/netfilter/nf_flow_table_inet.c +++ b/net/netfilter/nf_flow_table_inet.c @@ -17,6 +17,9 @@ nf_flow_offload_inet_hook(void *priv, struct sk_buff *skb,
switch (skb->protocol) { case htons(ETH_P_8021Q): + if (!pskb_may_pull(skb, skb_mac_offset(skb) + sizeof(*veth))) + return NF_ACCEPT; + veth = (struct vlan_ethhdr *)skb_mac_header(skb); proto = veth->h_vlan_encapsulated_proto; break; diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c index 22bc0e3d8a0b5..34be2c9bc39d8 100644 --- a/net/netfilter/nf_flow_table_ip.c +++ b/net/netfilter/nf_flow_table_ip.c @@ -275,6 +275,9 @@ static bool nf_flow_skb_encap_protocol(struct sk_buff *skb, __be16 proto,
switch (skb->protocol) { case htons(ETH_P_8021Q): + if (!pskb_may_pull(skb, skb_mac_offset(skb) + sizeof(*veth))) + return false; + veth = (struct vlan_ethhdr *)skb_mac_header(skb); if (veth->h_vlan_encapsulated_proto == proto) { *offset += VLAN_HLEN;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bharat Bhushan bbhushan2@marvell.com
[ Upstream commit af688a99eb1fc7ef69774665d61e6be51cea627a ]
Some CPT AF registers are per LF and others are global. Translation of PF/VF local LF slot number to actual LF slot number is required only for accessing perf LF registers. CPT AF global registers access do not require any LF slot number. Also, there is no reason CPT PF/VF to know actual lf's register offset.
Without this fix microcode loading will fail, VFs cannot be created and hardware is not usable.
Fixes: bc35e28af789 ("octeontx2-af: replace cpt slot with lf id on reg write") Signed-off-by: Bharat Bhushan bbhushan2@marvell.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20240821070558.1020101-1-bbhushan2@marvell.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/marvell/octeontx2/af/rvu_cpt.c | 23 +++++++++---------- 1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cpt.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cpt.c index b226a4d376aab..160e044c25c24 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cpt.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cpt.c @@ -632,7 +632,9 @@ int rvu_mbox_handler_cpt_inline_ipsec_cfg(struct rvu *rvu, return ret; }
-static bool is_valid_offset(struct rvu *rvu, struct cpt_rd_wr_reg_msg *req) +static bool validate_and_update_reg_offset(struct rvu *rvu, + struct cpt_rd_wr_reg_msg *req, + u64 *reg_offset) { u64 offset = req->reg_offset; int blkaddr, num_lfs, lf; @@ -663,6 +665,11 @@ static bool is_valid_offset(struct rvu *rvu, struct cpt_rd_wr_reg_msg *req) if (lf < 0) return false;
+ /* Translate local LF's offset to global CPT LF's offset to + * access LFX register. + */ + *reg_offset = (req->reg_offset & 0xFF000) + (lf << 3); + return true; } else if (!(req->hdr.pcifunc & RVU_PFVF_FUNC_MASK)) { /* Registers that can be accessed from PF */ @@ -697,7 +704,7 @@ int rvu_mbox_handler_cpt_rd_wr_register(struct rvu *rvu, struct cpt_rd_wr_reg_msg *rsp) { u64 offset = req->reg_offset; - int blkaddr, lf; + int blkaddr;
blkaddr = validate_and_get_cpt_blkaddr(req->blkaddr); if (blkaddr < 0) @@ -708,18 +715,10 @@ int rvu_mbox_handler_cpt_rd_wr_register(struct rvu *rvu, !is_cpt_vf(rvu, req->hdr.pcifunc)) return CPT_AF_ERR_ACCESS_DENIED;
- if (!is_valid_offset(rvu, req)) + if (!validate_and_update_reg_offset(rvu, req, &offset)) return CPT_AF_ERR_ACCESS_DENIED;
- /* Translate local LF used by VFs to global CPT LF */ - lf = rvu_get_lf(rvu, &rvu->hw->block[blkaddr], req->hdr.pcifunc, - (offset & 0xFFF) >> 3); - - /* Translate local LF's offset to global CPT LF's offset */ - offset &= 0xFF000; - offset += lf << 3; - - rsp->reg_offset = offset; + rsp->reg_offset = req->reg_offset; rsp->ret_val = req->ret_val; rsp->is_write = req->is_write;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Anderson sean.anderson@linux.dev
[ Upstream commit 4ae738dfef2c0323752ab81786e2d298c9939321 ]
If promiscuous mode is disabled when there are fewer than four multicast addresses, then it will not be reflected in the hardware. Fix this by always clearing the promiscuous mode flag even when we program multicast addresses.
Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Sean Anderson sean.anderson@linux.dev Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20240822154059.1066595-2-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c index ff777735be66b..ff4b31e93d75f 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c @@ -429,6 +429,10 @@ static void axienet_set_multicast_list(struct net_device *ndev) } else if (!netdev_mc_empty(ndev)) { struct netdev_hw_addr *ha;
+ reg = axienet_ior(lp, XAE_FMI_OFFSET); + reg &= ~XAE_FMI_PM_MASK; + axienet_iow(lp, XAE_FMI_OFFSET, reg); + i = 0; netdev_for_each_mc_addr(ha, ndev) { if (i >= XAE_MULTICAST_CAM_TABLE_NUM)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Anderson sean.anderson@linux.dev
[ Upstream commit 797a68c9de0f5a5447baf4bd3bb9c10a3993435b ]
If a multicast address is removed but there are still some multicast addresses, that address would remain programmed into the frame filter. Fix this by explicitly setting the enable bit for each filter.
Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Sean Anderson sean.anderson@linux.dev Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20240822154059.1066595-3-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/xilinx/xilinx_axienet.h | 1 + .../net/ethernet/xilinx/xilinx_axienet_main.c | 21 ++++++++----------- 2 files changed, 10 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h b/drivers/net/ethernet/xilinx/xilinx_axienet.h index 969bea5541976..503c32413474a 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet.h +++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h @@ -169,6 +169,7 @@ #define XAE_UAW0_OFFSET 0x00000700 /* Unicast address word 0 */ #define XAE_UAW1_OFFSET 0x00000704 /* Unicast address word 1 */ #define XAE_FMI_OFFSET 0x00000708 /* Frame Filter Control */ +#define XAE_FFE_OFFSET 0x0000070C /* Frame Filter Enable */ #define XAE_AF0_OFFSET 0x00000710 /* Address Filter 0 */ #define XAE_AF1_OFFSET 0x00000714 /* Address Filter 1 */
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c index ff4b31e93d75f..59d1cfbf7d6b7 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c @@ -411,7 +411,7 @@ static int netdev_set_mac_address(struct net_device *ndev, void *p) */ static void axienet_set_multicast_list(struct net_device *ndev) { - int i; + int i = 0; u32 reg, af0reg, af1reg; struct axienet_local *lp = netdev_priv(ndev);
@@ -433,7 +433,6 @@ static void axienet_set_multicast_list(struct net_device *ndev) reg &= ~XAE_FMI_PM_MASK; axienet_iow(lp, XAE_FMI_OFFSET, reg);
- i = 0; netdev_for_each_mc_addr(ha, ndev) { if (i >= XAE_MULTICAST_CAM_TABLE_NUM) break; @@ -452,6 +451,7 @@ static void axienet_set_multicast_list(struct net_device *ndev) axienet_iow(lp, XAE_FMI_OFFSET, reg); axienet_iow(lp, XAE_AF0_OFFSET, af0reg); axienet_iow(lp, XAE_AF1_OFFSET, af1reg); + axienet_iow(lp, XAE_FFE_OFFSET, 1); i++; } } else { @@ -459,18 +459,15 @@ static void axienet_set_multicast_list(struct net_device *ndev) reg &= ~XAE_FMI_PM_MASK;
axienet_iow(lp, XAE_FMI_OFFSET, reg); - - for (i = 0; i < XAE_MULTICAST_CAM_TABLE_NUM; i++) { - reg = axienet_ior(lp, XAE_FMI_OFFSET) & 0xFFFFFF00; - reg |= i; - - axienet_iow(lp, XAE_FMI_OFFSET, reg); - axienet_iow(lp, XAE_AF0_OFFSET, 0); - axienet_iow(lp, XAE_AF1_OFFSET, 0); - } - dev_info(&ndev->dev, "Promiscuous mode disabled.\n"); } + + for (; i < XAE_MULTICAST_CAM_TABLE_NUM; i++) { + reg = axienet_ior(lp, XAE_FMI_OFFSET) & 0xFFFFFF00; + reg |= i; + axienet_iow(lp, XAE_FMI_OFFSET, reg); + axienet_iow(lp, XAE_FFE_OFFSET, 0); + } }
/**
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Baryshkov dmitry.baryshkov@linaro.org
[ Upstream commit df24373435f5899a2a98b7d377479c8d4376613b ]
DPU debugging macros need to be converted to a proper drm_debug_* macros, however this is a going an intrusive patch, not suitable for a fix. Wire DPU_DEBUG and DPU_DEBUG_DRIVER to always use DRM_DEBUG_DRIVER to make sure that DPU debugging messages always end up in the drm debug messages and are controlled via the usual drm.debug mask.
I don't think that it is a good idea for a generic DPU_DEBUG macro to be tied to DRM_UT_KMS. It is used to report a debug message from driver, so by default it should go to the DRM_UT_DRIVER channel. While refactoring debug macros later on we might end up with particular messages going to ATOMIC or KMS, but DRIVER should be the default.
Fixes: 25fdd5933e4c ("drm/msm: Add SDM845 DPU support") Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Patchwork: https://patchwork.freedesktop.org/patch/606932/ Link: https://lore.kernel.org/r/20240802-dpu-fix-wb-v2-2-7eac9eb8e895@linaro.org Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h index bb35aa5f5709f..41e44a77c2beb 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h @@ -31,24 +31,14 @@ * @fmt: Pointer to format string */ #define DPU_DEBUG(fmt, ...) \ - do { \ - if (drm_debug_enabled(DRM_UT_KMS)) \ - DRM_DEBUG(fmt, ##__VA_ARGS__); \ - else \ - pr_debug(fmt, ##__VA_ARGS__); \ - } while (0) + DRM_DEBUG_DRIVER(fmt, ##__VA_ARGS__)
/** * DPU_DEBUG_DRIVER - macro for hardware driver logging * @fmt: Pointer to format string */ #define DPU_DEBUG_DRIVER(fmt, ...) \ - do { \ - if (drm_debug_enabled(DRM_UT_DRIVER)) \ - DRM_ERROR(fmt, ##__VA_ARGS__); \ - else \ - pr_debug(fmt, ##__VA_ARGS__); \ - } while (0) + DRM_DEBUG_DRIVER(fmt, ##__VA_ARGS__)
#define DPU_ERROR(fmt, ...) pr_err("[dpu error]" fmt, ##__VA_ARGS__) #define DPU_ERROR_RATELIMITED(fmt, ...) pr_err_ratelimited("[dpu error]" fmt, ##__VA_ARGS__)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Abhinav Kumar quic_abhinavk@quicinc.com
[ Upstream commit d19d5b8d8f6dab942ce5ddbcf34bf7275e778250 ]
Fix the dp_panel_get_supported_bpp() API to return the minimum supported bpp correctly for relevant cases and use this API to correct the behavior of DP driver which hard-codes the max supported bpp to 30.
This is incorrect because the number of lanes and max data rate supported by the lanes need to be taken into account.
Replace the hardcoded limit with the appropriate math which accounts for the accurate number of lanes and max data rate.
changes in v2: - Fix the dp_panel_get_supported_bpp() and use it - Drop the max_t usage as dp_panel_get_supported_bpp() already returns the min_bpp correctly now
changes in v3: - replace min_t with just min as all params are u32
Fixes: c943b4948b58 ("drm/msm/dp: add displayPort driver support") Reported-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/43 Tested-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org # SM8350-HDK Reviewed-by: Stephen Boyd swboyd@chromium.org Patchwork: https://patchwork.freedesktop.org/patch/607073/ Link: https://lore.kernel.org/r/20240805202009.1120981-1-quic_abhinavk@quicinc.com Signed-off-by: Stephen Boyd swboyd@chromium.org Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/dp/dp_panel.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/msm/dp/dp_panel.c b/drivers/gpu/drm/msm/dp/dp_panel.c index d38086650fcf7..f2cc0cc0b66b7 100644 --- a/drivers/gpu/drm/msm/dp/dp_panel.c +++ b/drivers/gpu/drm/msm/dp/dp_panel.c @@ -113,22 +113,22 @@ static int dp_panel_read_dpcd(struct dp_panel *dp_panel) static u32 dp_panel_get_supported_bpp(struct dp_panel *dp_panel, u32 mode_edid_bpp, u32 mode_pclk_khz) { - struct dp_link_info *link_info; + const struct dp_link_info *link_info; const u32 max_supported_bpp = 30, min_supported_bpp = 18; - u32 bpp = 0, data_rate_khz = 0; + u32 bpp, data_rate_khz;
- bpp = min_t(u32, mode_edid_bpp, max_supported_bpp); + bpp = min(mode_edid_bpp, max_supported_bpp);
link_info = &dp_panel->link_info; data_rate_khz = link_info->num_lanes * link_info->rate * 8;
- while (bpp > min_supported_bpp) { + do { if (mode_pclk_khz * bpp <= data_rate_khz) - break; + return bpp; bpp -= 6; - } + } while (bpp > min_supported_bpp);
- return bpp; + return min_supported_bpp; }
static int dp_panel_update_modes(struct drm_connector *connector, @@ -421,8 +421,9 @@ int dp_panel_init_panel_info(struct dp_panel *dp_panel) drm_mode->clock); drm_dbg_dp(panel->drm_dev, "bpp = %d\n", dp_panel->dp_mode.bpp);
- dp_panel->dp_mode.bpp = max_t(u32, 18, - min_t(u32, dp_panel->dp_mode.bpp, 30)); + dp_panel->dp_mode.bpp = dp_panel_get_mode_bpp(dp_panel, dp_panel->dp_mode.bpp, + dp_panel->dp_mode.drm_mode.clock); + drm_dbg_dp(panel->drm_dev, "updated bpp = %d\n", dp_panel->dp_mode.bpp);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Abhinav Kumar quic_abhinavk@quicinc.com
[ Upstream commit 319aca883bfa1b85ee08411541b51b9a934ac858 ]
Before re-starting link training reset the link phy params namely the pre-emphasis and voltage swing levels otherwise the next link training begins at the previously cached levels which can result in link training failures.
Fixes: 8ede2ecc3e5e ("drm/msm/dp: Add DP compliance tests on Snapdragon Chipsets") Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Tested-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org # SM8350-HDK Reviewed-by: Stephen Boyd swboyd@chromium.org Patchwork: https://patchwork.freedesktop.org/patch/605946/ Link: https://lore.kernel.org/r/20240725220450.131245-1-quic_abhinavk@quicinc.com Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/dp/dp_ctrl.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c index bd1343602f553..3c001b792423b 100644 --- a/drivers/gpu/drm/msm/dp/dp_ctrl.c +++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c @@ -1248,6 +1248,8 @@ static int dp_ctrl_link_train(struct dp_ctrl_private *ctrl, link_info.rate = ctrl->link->link_params.rate; link_info.capabilities = DP_LINK_CAP_ENHANCED_FRAMING;
+ dp_link_reset_phy_params_vx_px(ctrl->link); + dp_aux_link_configure(ctrl->aux, &link_info);
if (drm_dp_max_downspread(dpcd))
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Baryshkov dmitry.baryshkov@linaro.org
[ Upstream commit bfa1a6283be390947d3649c482e5167186a37016 ]
If the dpu_format_populate_layout() fails, then FB is prepared, but not cleaned up. This ends up leaking the pin_count on the GEM object and causes a splat during DRM file closure:
msm_obj->pin_count WARNING: CPU: 2 PID: 569 at drivers/gpu/drm/msm/msm_gem.c:121 update_lru_locked+0xc4/0xcc [...] Call trace: update_lru_locked+0xc4/0xcc put_pages+0xac/0x100 msm_gem_free_object+0x138/0x180 drm_gem_object_free+0x1c/0x30 drm_gem_object_handle_put_unlocked+0x108/0x10c drm_gem_object_release_handle+0x58/0x70 idr_for_each+0x68/0xec drm_gem_release+0x28/0x40 drm_file_free+0x174/0x234 drm_release+0xb0/0x160 __fput+0xc0/0x2c8 __fput_sync+0x50/0x5c __arm64_sys_close+0x38/0x7c invoke_syscall+0x48/0x118 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x4c/0x120 el0t_64_sync_handler+0x100/0x12c el0t_64_sync+0x190/0x194 irq event stamp: 129818 hardirqs last enabled at (129817): [<ffffa5f6d953fcc0>] console_unlock+0x118/0x124 hardirqs last disabled at (129818): [<ffffa5f6da7dcf04>] el1_dbg+0x24/0x8c softirqs last enabled at (129808): [<ffffa5f6d94afc18>] handle_softirqs+0x4c8/0x4e8 softirqs last disabled at (129785): [<ffffa5f6d94105e4>] __do_softirq+0x14/0x20
Fixes: 25fdd5933e4c ("drm/msm: Add SDM845 DPU support") Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Patchwork: https://patchwork.freedesktop.org/patch/600714/ Link: https://lore.kernel.org/r/20240625-dpu-mode-config-width-v5-1-501d984d634f@l... Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c index 62d48c0f905e4..61c456c5015a5 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c @@ -889,6 +889,9 @@ static int dpu_plane_prepare_fb(struct drm_plane *plane, new_state->fb, &layout); if (ret) { DPU_ERROR_PLANE(pdpu, "failed to get format layout, %d\n", ret); + if (pstate->aspace) + msm_framebuffer_cleanup(new_state->fb, pstate->aspace, + pstate->needs_dirtyfb); return ret; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit a1e627af32ed60713941cbfc8075d44cad07f6dd ]
If the "test->highmem = alloc_pages()" allocation fails then calling __free_pages(test->highmem) will result in a NULL dereference. Also change the error code to -ENOMEM instead of returning success.
Fixes: 2661081f5ab9 ("mmc_test: highmem tests") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Link: https://lore.kernel.org/r/8c90be28-67b4-4b0d-a105-034dc72a0b31@stanley.mount... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/core/mmc_test.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/mmc/core/mmc_test.c b/drivers/mmc/core/mmc_test.c index 155ce2bdfe622..dbbcbdeab6c51 100644 --- a/drivers/mmc/core/mmc_test.c +++ b/drivers/mmc/core/mmc_test.c @@ -3109,13 +3109,13 @@ static ssize_t mtf_test_write(struct file *file, const char __user *buf, test->buffer = kzalloc(BUFFER_SIZE, GFP_KERNEL); #ifdef CONFIG_HIGHMEM test->highmem = alloc_pages(GFP_KERNEL | __GFP_HIGHMEM, BUFFER_ORDER); + if (!test->highmem) { + count = -ENOMEM; + goto free_test_buffer; + } #endif
-#ifdef CONFIG_HIGHMEM - if (test->buffer && test->highmem) { -#else if (test->buffer) { -#endif mutex_lock(&mmc_test_lock); mmc_test_run(test, testcase); mutex_unlock(&mmc_test_lock); @@ -3123,6 +3123,7 @@ static ssize_t mtf_test_write(struct file *file, const char __user *buf,
#ifdef CONFIG_HIGHMEM __free_pages(test->highmem, BUFFER_ORDER); +free_test_buffer: #endif kfree(test->buffer); kfree(test);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Griffin Kroah-Hartman griffin@kroah.com
commit 538fd3921afac97158d4177139a0ad39f056dbb2 upstream.
hci_conn_params_add() never checks for a NULL value and could lead to a NULL pointer dereference causing a crash.
Fixed by adding error handling in the function.
Cc: Stable stable@kernel.org Fixes: 5157b8a503fa ("Bluetooth: Fix initializing conn_params in scan phase") Signed-off-by: Griffin Kroah-Hartman griffin@kroah.com Reported-by: Yiwei Zhang zhan4630@purdue.edu Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/mgmt.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/net/bluetooth/mgmt.c +++ b/net/bluetooth/mgmt.c @@ -3524,6 +3524,10 @@ static int pair_device(struct sock *sk, * will be kept and this function does nothing. */ p = hci_conn_params_add(hdev, &cp->addr.bdaddr, addr_type); + if (!p) { + err = -EIO; + goto unlock; + }
if (p->auto_connect == HCI_AUTO_CONN_EXPLICIT) p->auto_connect = HCI_AUTO_CONN_DISABLED;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chaotian Jing chaotian.jing@mediatek.com
commit f03e94f23b04c2b71c0044c1534921b3975ef10c upstream.
scsi_logical_block_count() should return the block count of a given SCSI command. The original implementation ended up shifting twice, leading to an incorrect count being returned. Fix the conversion between bytes and logical blocks.
Cc: stable@vger.kernel.org Fixes: 6a20e21ae1e2 ("scsi: core: Add helper to return number of logical blocks in a request") Signed-off-by: Chaotian Jing chaotian.jing@mediatek.com Link: https://lore.kernel.org/r/20240813053534.7720-1-chaotian.jing@mediatek.com Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/scsi/scsi_cmnd.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/scsi/scsi_cmnd.h +++ b/include/scsi/scsi_cmnd.h @@ -228,7 +228,7 @@ static inline sector_t scsi_get_lba(stru
static inline unsigned int scsi_logical_block_count(struct scsi_cmnd *scmd) { - unsigned int shift = ilog2(scmd->device->sector_size) - SECTOR_SHIFT; + unsigned int shift = ilog2(scmd->device->sector_size);
return blk_rq_bytes(scsi_cmd_to_rq(scmd)) >> shift; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namjae Jeon linkinjeon@kernel.org
commit ce61b605a00502c59311d0a4b1f58d62b48272d0 upstream.
When STATUS_NO_MORE_FILES status is set to smb2 query dir response, ->StructureSize is set to 9, which mean buffer has 1 byte. This issue occurs because ->Buffer[1] in smb2_query_directory_rsp to flex-array.
Fixes: eb3e28c1e89b ("smb3: Replace smb2pdu 1-element arrays with flex-arrays") Cc: stable@vger.kernel.org # v6.1+ Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/smb2pdu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/smb/server/smb2pdu.c +++ b/fs/smb/server/smb2pdu.c @@ -4169,7 +4169,8 @@ int smb2_query_dir(struct ksmbd_work *wo rsp->OutputBufferLength = cpu_to_le32(0); rsp->Buffer[0] = 0; rc = ksmbd_iov_pin_rsp(work, (void *)rsp, - sizeof(struct smb2_query_directory_rsp)); + offsetof(struct smb2_query_directory_rsp, Buffer) + + 1); if (rc) goto err_out; } else {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Candice Li candice.li@amd.com
commit c99769bceab4ecb6a067b9af11f9db281eea3e2a upstream.
Add TA binary size validation to avoid OOB write.
Signed-off-by: Candice Li candice.li@amd.com Reviewed-by: Hawking Zhang Hawking.Zhang@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit c0a04e3570d72aaf090962156ad085e37c62e442) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c @@ -148,6 +148,9 @@ static ssize_t ta_if_load_debugfs_write( if (ret) return -EINVAL;
+ if (ta_bin_len > PSP_1_MEG) + return -EINVAL; + copy_pos += sizeof(uint32_t);
ta_bin = kzalloc(ta_bin_len, GFP_KERNEL);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiaxun Yang jiaxun.yang@flygoat.com
commit 1cb6ab446424649f03c82334634360c2e3043684 upstream.
Loongson64 C and G processors have EXTIMER feature which is conflicting with CP0 counter.
Although the processor resets in EXTIMER disabled & INTIMER enabled mode, which is compatible with MIPS CP0 compare, firmware may attempt to enable EXTIMER and interfere CP0 compare.
Set timer mode back to MIPS compatible mode to fix booting on systems with such firmware before we have an actual driver for EXTIMER.
Cc: stable@vger.kernel.org Signed-off-by: Jiaxun Yang jiaxun.yang@flygoat.com Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/mips/kernel/cpu-probe.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/arch/mips/kernel/cpu-probe.c +++ b/arch/mips/kernel/cpu-probe.c @@ -1723,12 +1723,16 @@ static inline void cpu_probe_loongson(st c->ases |= (MIPS_ASE_LOONGSON_MMI | MIPS_ASE_LOONGSON_CAM | MIPS_ASE_LOONGSON_EXT | MIPS_ASE_LOONGSON_EXT2); c->ases &= ~MIPS_ASE_VZ; /* VZ of Loongson-3A2000/3000 is incomplete */ + change_c0_config6(LOONGSON_CONF6_EXTIMER | LOONGSON_CONF6_INTIMER, + LOONGSON_CONF6_INTIMER); break; case PRID_IMP_LOONGSON_64G: __cpu_name[cpu] = "ICT Loongson-3"; set_elf_platform(cpu, "loongson3a"); set_isa(c, MIPS_CPU_ISA_M64R2); decode_cpucfg(c); + change_c0_config6(LOONGSON_CONF6_EXTIMER | LOONGSON_CONF6_INTIMER, + LOONGSON_CONF6_INTIMER); break; default: panic("Unknown Loongson Processor ID!");
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Gerecke jason.gerecke@wacom.com
commit 1b8f9c1fb464968a5b18d3acc1da8c00bad24fad upstream.
The Wacom driver maps the HID_DG_TWIST usage to ABS_Z (rather than ABS_RZ) for historic reasons. When the code to support twist was introduced in commit 50066a042da5 ("HID: wacom: generic: Add support for height, tilt, and twist usages"), we were careful to write it in such a way that it had HID calculate the resolution of the twist axis assuming ABS_RZ instead (so that we would get correct angular behavior). This was broken with the introduction of commit 08a46b4190d3 ("HID: wacom: Set a default resolution for older tablets"), which moved the resolution calculation to occur *before* the adjustment from ABS_Z to ABS_RZ occurred.
This commit moves the calculation of resolution after the point that we are finished setting things up for its proper use.
Signed-off-by: Jason Gerecke jason.gerecke@wacom.com Fixes: 08a46b4190d3 ("HID: wacom: Set a default resolution for older tablets") Cc: stable@vger.kernel.org Signed-off-by: Jiri Kosina jkosina@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/wacom_wac.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/hid/wacom_wac.c +++ b/drivers/hid/wacom_wac.c @@ -1921,12 +1921,14 @@ static void wacom_map_usage(struct input int fmax = field->logical_maximum; unsigned int equivalent_usage = wacom_equivalent_usage(usage->hid); int resolution_code = code; - int resolution = hidinput_calc_abs_res(field, resolution_code); + int resolution;
if (equivalent_usage == HID_DG_TWIST) { resolution_code = ABS_RZ; }
+ resolution = hidinput_calc_abs_res(field, resolution_code); + if (equivalent_usage == HID_GD_X) { fmin += features->offset_left; fmax -= features->offset_right;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Siarhei Vishniakou svv@google.com
commit f5554725f30475b05b5178b998366f11ecb50c7f upstream.
Currently, rumble is only supported via bluetooth on a single xbox controller, called 'model 1708'. On the back of the device, it's named 'wireless controller for xbox one'. However, in 2021, Microsoft released a firmware update for this controller. As part of this update, the HID descriptor of the device changed. The product ID was also changed from 0x02fd to 0x0b20. On this controller, rumble was supported via hid-microsoft, which matched against the old product id (0x02fd). As a result, the firmware update broke rumble support on this controller.
See: https://news.xbox.com/en-us/2021/09/08/xbox-controller-firmware-update-rolli...
The hid-microsoft driver actually supports rumble on the new firmware, as well. So simply adding new product id is sufficient to bring back this support.
After discussing further with the xbox team, it was pointed out that another xbox controller, xbox elite series 2, can be supported in a similar way.
Add rumble support for all of these devices in this patch. Two of the devices have received firmware updates that caused their product id's to change. Both old and new firmware versions of these devices were tested.
The tested controllers are:
1. 'wireless controller for xbox one', model 1708 2. 'xbox wireless controller', model 1914. This is also sometimes referred to as 'xbox series S|X'. 3. 'elite series 2', model 1797.
The tested configurations are: 1. model 1708, pid 0x02fd (old firmware) 2. model 1708, pid 0x0b20 (new firmware) 3. model 1914, pid 0x0b13 4. model 1797, pid 0x0b05 (old firmware) 5. model 1797, pid 0x0b22 (new firmware)
I verified rumble support on both bluetooth and usb.
Reviewed-by: Bastien Nocera hadess@hadess.net Signed-off-by: Siarhei Vishniakou svv@google.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/hid-ids.h | 10 +++++++++- drivers/hid/hid-microsoft.c | 11 ++++++++++- 2 files changed, 19 insertions(+), 2 deletions(-)
--- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -924,7 +924,15 @@ #define USB_DEVICE_ID_MS_TYPE_COVER_2 0x07a9 #define USB_DEVICE_ID_MS_POWER_COVER 0x07da #define USB_DEVICE_ID_MS_SURFACE3_COVER 0x07de -#define USB_DEVICE_ID_MS_XBOX_ONE_S_CONTROLLER 0x02fd +/* + * For a description of the Xbox controller models, refer to: + * https://en.wikipedia.org/wiki/Xbox_Wireless_Controller#Summary + */ +#define USB_DEVICE_ID_MS_XBOX_CONTROLLER_MODEL_1708 0x02fd +#define USB_DEVICE_ID_MS_XBOX_CONTROLLER_MODEL_1708_BLE 0x0b20 +#define USB_DEVICE_ID_MS_XBOX_CONTROLLER_MODEL_1914 0x0b13 +#define USB_DEVICE_ID_MS_XBOX_CONTROLLER_MODEL_1797 0x0b05 +#define USB_DEVICE_ID_MS_XBOX_CONTROLLER_MODEL_1797_BLE 0x0b22 #define USB_DEVICE_ID_MS_PIXART_MOUSE 0x00cb #define USB_DEVICE_ID_8BITDO_SN30_PRO_PLUS 0x02e0 #define USB_DEVICE_ID_MS_MOUSE_0783 0x0783 --- a/drivers/hid/hid-microsoft.c +++ b/drivers/hid/hid-microsoft.c @@ -446,7 +446,16 @@ static const struct hid_device_id ms_dev .driver_data = MS_PRESENTER }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_MICROSOFT, 0x091B), .driver_data = MS_SURFACE_DIAL }, - { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_XBOX_ONE_S_CONTROLLER), + + { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_XBOX_CONTROLLER_MODEL_1708), + .driver_data = MS_QUIRK_FF }, + { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_XBOX_CONTROLLER_MODEL_1708_BLE), + .driver_data = MS_QUIRK_FF }, + { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_XBOX_CONTROLLER_MODEL_1914), + .driver_data = MS_QUIRK_FF }, + { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_XBOX_CONTROLLER_MODEL_1797), + .driver_data = MS_QUIRK_FF }, + { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_XBOX_CONTROLLER_MODEL_1797_BLE), .driver_data = MS_QUIRK_FF }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_8BITDO_SN30_PRO_PLUS), .driver_data = MS_QUIRK_FF },
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Werner Sembach wse@tuxedocomputers.com
commit 3d765ae2daccc570b3f4fbcb57eb321b12cdded2 upstream.
On s3 resume the i8042 driver tries to restore the controller to a known state by reinitializing things, however this can confuse the controller with different effects. Mostly occasionally unresponsive keyboards after resume.
These issues do not rise on s0ix resume as here the controller is assumed to preserved its state from before suspend.
This patch adds a quirk for devices where the reinitialization on s3 resume is not needed and might be harmful as described above. It does this by using the s0ix resume code path at selected locations.
This new quirk goes beyond what the preexisting reset=never quirk does, which only skips some reinitialization steps.
Signed-off-by: Werner Sembach wse@tuxedocomputers.com Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20240104183118.779778-2-wse@tuxedocomputers.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/input/serio/i8042-acpipnpio.h | 10 +++++++--- drivers/input/serio/i8042.c | 10 +++++++--- 2 files changed, 14 insertions(+), 6 deletions(-)
--- a/drivers/input/serio/i8042-acpipnpio.h +++ b/drivers/input/serio/i8042-acpipnpio.h @@ -83,6 +83,7 @@ static inline void i8042_write_command(i #define SERIO_QUIRK_KBDRESET BIT(12) #define SERIO_QUIRK_DRITEK BIT(13) #define SERIO_QUIRK_NOPNP BIT(14) +#define SERIO_QUIRK_FORCENORESTORE BIT(15)
/* Quirk table for different mainboards. Options similar or identical to i8042 * module parameters. @@ -1685,6 +1686,8 @@ static void __init i8042_check_quirks(vo if (quirks & SERIO_QUIRK_NOPNP) i8042_nopnp = true; #endif + if (quirks & SERIO_QUIRK_FORCENORESTORE) + i8042_forcenorestore = true; } #else static inline void i8042_check_quirks(void) {} @@ -1718,7 +1721,7 @@ static int __init i8042_platform_init(vo
i8042_check_quirks();
- pr_debug("Active quirks (empty means none):%s%s%s%s%s%s%s%s%s%s%s%s%s\n", + pr_debug("Active quirks (empty means none):%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n", i8042_nokbd ? " nokbd" : "", i8042_noaux ? " noaux" : "", i8042_nomux ? " nomux" : "", @@ -1738,10 +1741,11 @@ static int __init i8042_platform_init(vo "", #endif #ifdef CONFIG_PNP - i8042_nopnp ? " nopnp" : ""); + i8042_nopnp ? " nopnp" : "", #else - ""); + "", #endif + i8042_forcenorestore ? " forcenorestore" : "");
retval = i8042_pnp_init(); if (retval) --- a/drivers/input/serio/i8042.c +++ b/drivers/input/serio/i8042.c @@ -115,6 +115,10 @@ module_param_named(nopnp, i8042_nopnp, b MODULE_PARM_DESC(nopnp, "Do not use PNP to detect controller settings"); #endif
+static bool i8042_forcenorestore; +module_param_named(forcenorestore, i8042_forcenorestore, bool, 0); +MODULE_PARM_DESC(forcenorestore, "Force no restore on s3 resume, copying s2idle behaviour"); + #define DEBUG #ifdef DEBUG static bool i8042_debug; @@ -1232,7 +1236,7 @@ static int i8042_pm_suspend(struct devic { int i;
- if (pm_suspend_via_firmware()) + if (!i8042_forcenorestore && pm_suspend_via_firmware()) i8042_controller_reset(true);
/* Set up serio interrupts for system wakeup. */ @@ -1248,7 +1252,7 @@ static int i8042_pm_suspend(struct devic
static int i8042_pm_resume_noirq(struct device *dev) { - if (!pm_resume_via_firmware()) + if (i8042_forcenorestore || !pm_resume_via_firmware()) i8042_interrupt(0, NULL);
return 0; @@ -1271,7 +1275,7 @@ static int i8042_pm_resume(struct device * not restore the controller state to whatever it had been at boot * time, so we do not need to do anything. */ - if (!pm_suspend_via_firmware()) + if (i8042_forcenorestore || !pm_suspend_via_firmware()) return 0;
/*
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Werner Sembach wse@tuxedocomputers.com
commit aaa4ca873d3da768896ffc909795359a01e853ef upstream.
The old quirk combination sometimes cause a laggy keyboard after boot. With the new quirk the initial issue of an unresponsive keyboard after s3 resume is also fixed, but it doesn't have the negative side effect of the sometimes laggy keyboard.
Signed-off-by: Werner Sembach wse@tuxedocomputers.com Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20240104183118.779778-3-wse@tuxedocomputers.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/input/serio/i8042-acpipnpio.h | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-)
--- a/drivers/input/serio/i8042-acpipnpio.h +++ b/drivers/input/serio/i8042-acpipnpio.h @@ -1150,18 +1150,10 @@ static const struct dmi_system_id i8042_ SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) }, { - /* - * Setting SERIO_QUIRK_NOMUX or SERIO_QUIRK_RESET_ALWAYS makes - * the keyboard very laggy for ~5 seconds after boot and - * sometimes also after resume. - * However both are required for the keyboard to not fail - * completely sometimes after boot or resume. - */ .matches = { DMI_MATCH(DMI_BOARD_NAME, "N150CU"), }, - .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | - SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) + .driver_data = (void *)(SERIO_QUIRK_FORCENORESTORE) }, { .matches = {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikolay Kuratov kniv@yandex-team.ru
commit 80a1e7b83bb1834b5568a3872e64c05795d88f31 upstream.
It is done everywhere in cxgb4 code, e.g. in is_filter_exact_match() There is no reason it should not be done here
Found by Linux Verification Center (linuxtesting.org) with SVACE
Signed-off-by: Nikolay Kuratov kniv@yandex-team.ru Cc: stable@vger.kernel.org Fixes: 12b276fbf6e0 ("cxgb4: add support to create hash filters") Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Jacob Keller jacob.e.keller@intel.com Link: https://patch.msgid.link/20240819075408.92378-1-kniv@yandex-team.ru Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c @@ -1244,7 +1244,8 @@ static u64 hash_filter_ntuple(struct ch_ * in the Compressed Filter Tuple. */ if (tp->vlan_shift >= 0 && fs->mask.ivlan) - ntuple |= (FT_VLAN_VLD_F | fs->val.ivlan) << tp->vlan_shift; + ntuple |= (u64)(FT_VLAN_VLD_F | + fs->val.ivlan) << tp->vlan_shift;
if (tp->port_shift >= 0 && fs->mask.iport) ntuple |= (u64)fs->val.iport << tp->port_shift;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier maz@kernel.org
commit 3e6245ebe7ef341639e9a7e402b3ade8ad45a19f upstream.
On a system with a GICv3, if a guest hasn't been configured with GICv3 and that the host is not capable of GICv2 emulation, a write to any of the ICC_*SGI*_EL1 registers is trapped to EL2.
We therefore try to emulate the SGI access, only to hit a NULL pointer as no private interrupt is allocated (no GIC, remember?).
The obvious fix is to give the guest what it deserves, in the shape of a UNDEF exception.
Reported-by: Alexander Potapenko glider@google.com Signed-off-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240820100349.3544850-2-maz@kernel.org Signed-off-by: Oliver Upton oliver.upton@linux.dev Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/sys_regs.c | 6 ++++++ arch/arm64/kvm/vgic/vgic.h | 7 +++++++ 2 files changed, 13 insertions(+)
--- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -30,6 +30,7 @@ #include <trace/events/kvm.h>
#include "sys_regs.h" +#include "vgic/vgic.h"
#include "trace.h"
@@ -200,6 +201,11 @@ static bool access_gic_sgi(struct kvm_vc { bool g1;
+ if (!kvm_has_gicv3(vcpu->kvm)) { + kvm_inject_undefined(vcpu); + return false; + } + if (!p->is_write) return read_from_write_only(vcpu, p, r);
--- a/arch/arm64/kvm/vgic/vgic.h +++ b/arch/arm64/kvm/vgic/vgic.h @@ -334,4 +334,11 @@ void vgic_v4_configure_vsgis(struct kvm void vgic_v4_get_vlpi_state(struct vgic_irq *irq, bool *val); int vgic_v4_request_vpe_irq(struct kvm_vcpu *vcpu, int irq);
+static inline bool kvm_has_gicv3(struct kvm *kvm) +{ + return (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif) && + irqchip_in_kernel(kvm) && + kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3); +} + #endif
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ben Whitten ben.whitten@gmail.com
commit 6275c7bc8dd07644ea8142a1773d826800f0f3f7 upstream.
Fix a race condition if the clock provider comes up after mmc is probed, this causes mmc to fail without retrying. When given the DEFER error from the clk source, pass it on up the chain.
Fixes: f90a0612f0e1 ("mmc: dw_mmc: lookup for optional biu and ciu clocks") Signed-off-by: Ben Whitten ben.whitten@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240811212212.123255-1-ben.whitten@gmail.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/dw_mmc.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/mmc/host/dw_mmc.c +++ b/drivers/mmc/host/dw_mmc.c @@ -3295,6 +3295,10 @@ int dw_mci_probe(struct dw_mci *host) host->biu_clk = devm_clk_get(host->dev, "biu"); if (IS_ERR(host->biu_clk)) { dev_dbg(host->dev, "biu clock not available\n"); + ret = PTR_ERR(host->biu_clk); + if (ret == -EPROBE_DEFER) + return ret; + } else { ret = clk_prepare_enable(host->biu_clk); if (ret) { @@ -3306,6 +3310,10 @@ int dw_mci_probe(struct dw_mci *host) host->ciu_clk = devm_clk_get(host->dev, "ciu"); if (IS_ERR(host->ciu_clk)) { dev_dbg(host->dev, "ciu clock not available\n"); + ret = PTR_ERR(host->ciu_clk); + if (ret == -EPROBE_DEFER) + goto err_clk_biu; + host->bus_hz = host->pdata->bus_hz; } else { ret = clk_prepare_enable(host->ciu_clk);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peng Fan peng.fan@nxp.com
commit 52dd070c62e4ae2b5e7411b920e3f7a64235ecfb upstream.
With "quiet" set in bootargs, there is power domain failure: "imx93_power_domain 44462400.power-domain: pd_off timeout: name: 44462400.power-domain, stat: 4"
The current power on opertation takes ISO state as power on finished flag, but it is wrong. Before powering on operation really finishes, powering off comes and powering off will never finish because the last powering on still not finishes, so the following powering off actually not trigger hardware state machine to run. SSAR is the last step when powering on a domain, so need to wait SSAR done when powering on.
Since EdgeLock Enclave(ELE) handshake is involved in the flow, enlarge the waiting time to 10ms for both on and off to avoid timeout.
Cc: stable@vger.kernel.org Fixes: 0a0f7cc25d4a ("soc: imx: add i.MX93 SRC power domain driver") Reviewed-by: Jacky Bai ping.bai@nxp.com Signed-off-by: Peng Fan peng.fan@nxp.com Link: https://lore.kernel.org/r/20240814124740.2778952-1-peng.fan@oss.nxp.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/soc/imx/imx93-pd.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/soc/imx/imx93-pd.c +++ b/drivers/soc/imx/imx93-pd.c @@ -20,6 +20,7 @@ #define FUNC_STAT_PSW_STAT_MASK BIT(0) #define FUNC_STAT_RST_STAT_MASK BIT(2) #define FUNC_STAT_ISO_STAT_MASK BIT(4) +#define FUNC_STAT_SSAR_STAT_MASK BIT(8)
struct imx93_power_domain { struct generic_pm_domain genpd; @@ -50,7 +51,7 @@ static int imx93_pd_on(struct generic_pm writel(val, addr + MIX_SLICE_SW_CTRL_OFF);
ret = readl_poll_timeout(addr + MIX_FUNC_STAT_OFF, val, - !(val & FUNC_STAT_ISO_STAT_MASK), 1, 10000); + !(val & FUNC_STAT_SSAR_STAT_MASK), 1, 10000); if (ret) { dev_err(domain->dev, "pd_on timeout: name: %s, stat: %x\n", genpd->name, val); return ret; @@ -72,7 +73,7 @@ static int imx93_pd_off(struct generic_p writel(val, addr + MIX_SLICE_SW_CTRL_OFF);
ret = readl_poll_timeout(addr + MIX_FUNC_STAT_OFF, val, - val & FUNC_STAT_PSW_STAT_MASK, 1, 1000); + val & FUNC_STAT_PSW_STAT_MASK, 1, 10000); if (ret) { dev_err(domain->dev, "pd_off timeout: name: %s, stat: %x\n", genpd->name, val); return ret;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit e255683c06df572ead96db5efb5d21be30c0efaa upstream.
If no subflow is attached to the 'signal' endpoint that is being removed, the addr ID will not be marked as available again.
Mark the linked ID as available when removing the address entry from the list to cover this case.
Fixes: b6c08380860b ("mptcp: remove addr and subflow in PM netlink") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-1-38035d40de5b@... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -1474,7 +1474,10 @@ static bool mptcp_pm_remove_anno_addr(st ret = remove_anno_list_by_saddr(msk, addr); if (ret || force) { spin_lock_bh(&msk->pm.lock); - msk->pm.add_addr_signaled -= ret; + if (ret) { + __set_bit(addr->id, msk->pm.id_avail_bitmap); + msk->pm.add_addr_signaled--; + } mptcp_pm_remove_addr(msk, &list); spin_unlock_bh(&msk->pm.lock); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit edd8b5d868a4d459f3065493001e293901af758d upstream.
If no subflow is attached to the 'subflow' endpoint that is being removed, the addr ID will not be marked as available again.
Mark the linked ID as available when removing the 'subflow' endpoint if no subflow is attached to it.
While at it, the local_addr_used counter is decremented if the ID was marked as being used to reflect the reality, but also to allow adding new endpoints after that.
Fixes: b6c08380860b ("mptcp: remove addr and subflow in PM netlink") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-3-38035d40de5b@... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -1512,8 +1512,17 @@ static int mptcp_nl_remove_subflow_and_s remove_subflow = lookup_subflow_by_saddr(&msk->conn_list, addr); mptcp_pm_remove_anno_addr(msk, addr, remove_subflow && !(entry->flags & MPTCP_PM_ADDR_FLAG_IMPLICIT)); - if (remove_subflow) + + if (remove_subflow) { mptcp_pm_remove_subflow(msk, &list); + } else if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) { + /* If the subflow has been used, but now closed */ + spin_lock_bh(&msk->pm.lock); + if (!__test_and_set_bit(entry->addr.id, msk->pm.id_avail_bitmap)) + msk->pm.local_addr_used--; + spin_unlock_bh(&msk->pm.lock); + } + release_sock(sk);
next:
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit ef34a6ea0cab1800f4b3c9c3c2cefd5091e03379 upstream.
If no subflows are attached to the 'subflow' endpoints that are being flushed, the corresponding addr IDs will not be marked as available again.
Mark all ID as being available when flushing all the 'subflow' endpoints, and reset local_addr_used counter to cover these cases.
Note that mptcp_pm_remove_addrs_and_subflows() helper is only called for flushing operations, not to remove a specific set of addresses and subflows.
Fixes: 06faa2271034 ("mptcp: remove multi addresses and subflows in PM") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-5-38035d40de5b@... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -1666,8 +1666,15 @@ void mptcp_pm_remove_addrs_and_subflows( mptcp_pm_remove_addr(msk, &alist); spin_unlock_bh(&msk->pm.lock); } + if (slist.nr) mptcp_pm_remove_subflow(msk, &slist); + + /* Reset counters: maybe some subflows have been removed before */ + spin_lock_bh(&msk->pm.lock); + bitmap_fill(msk->pm.id_avail_bitmap, MPTCP_PM_MAX_ADDR_ID + 1); + msk->pm.local_addr_used = 0; + spin_unlock_bh(&msk->pm.lock); }
static void mptcp_nl_remove_addrs_list(struct net *net,
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit 1c1f721375989579e46741f59523e39ec9b2a9bd upstream.
Adding the following warning ...
WARN_ON_ONCE(msk->pm.add_addr_accepted == 0)
... before decrementing the add_addr_accepted counter helped to find a bug when running the "remove single subflow" subtest from the mptcp_join.sh selftest.
Removing a 'subflow' endpoint will first trigger a RM_ADDR, then the subflow closure. Before this patch, and upon the reception of the RM_ADDR, the other peer will then try to decrement this add_addr_accepted. That's not correct because the attached subflows have not been created upon the reception of an ADD_ADDR.
A way to solve that is to decrement the counter only if the attached subflow was an MP_JOIN to a remote id that was not 0, and initiated by the host receiving the RM_ADDR.
Fixes: d0876b2284cf ("mptcp: add the incoming RM_ADDR support") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-9-38035d40de5b@... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -834,7 +834,7 @@ static void mptcp_pm_nl_rm_addr_or_subfl mptcp_close_ssk(sk, ssk, subflow); spin_lock_bh(&msk->pm.lock);
- removed = true; + removed |= subflow->request_join; if (rm_type == MPTCP_MIB_RMSUBFLOW) __MPTCP_INC_STATS(sock_net(sk), rm_type); } @@ -848,7 +848,11 @@ static void mptcp_pm_nl_rm_addr_or_subfl if (!mptcp_pm_is_kernel(msk)) continue;
- if (rm_type == MPTCP_MIB_RMADDR) { + if (rm_type == MPTCP_MIB_RMADDR && rm_id && + !WARN_ON_ONCE(msk->pm.add_addr_accepted == 0)) { + /* Note: if the subflow has been closed before, this + * add_addr_accepted counter will not be decremented. + */ msk->pm.add_addr_accepted--; WRITE_ONCE(msk->pm.accept_addr, true); } else if (rm_type == MPTCP_MIB_RMSUBFLOW) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit dddc00f255415b826190cfbaa5d6dbc87cd9ded1 upstream.
This reverts commit 52a39f2cf62bb5430ad1f54cd522dbfdab1d71ba.
Based on review comments, it was applied too soon and needs more work.
Reported-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Link: https://lore.kernel.org/r/20231005081716.GA13853@pendragon.ideasonboard.com Cc: Michael Grzeschik m.grzeschik@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/gadget/function/uvc_video.c | 6 ------ 1 file changed, 6 deletions(-)
--- a/drivers/usb/gadget/function/uvc_video.c +++ b/drivers/usb/gadget/function/uvc_video.c @@ -259,12 +259,6 @@ uvc_video_complete(struct usb_ep *ep, st struct uvc_device *uvc = video->uvc; unsigned long flags;
- if (uvc->state == UVC_STATE_CONNECTED) { - usb_ep_free_request(video->ep, ureq->req); - ureq->req = NULL; - return; - } - switch (req->status) { case 0: break;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Hung alex.hung@amd.com
commit 8f4bdbc8e99db6ec9cb0520748e49a2f2d7d1727 upstream.
This reverts commit 58c3b3341cea4f75dc8c003b89f8a6dd8ec55e50.
[WHY & HOW] The writeback series cause a regression in thunderbolt display.
Signed-off-by: Alex Hung alex.hung@amd.com Reviewed-by: Harry Wentland harry.wentland@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dwb_cm.c | 3 --- 1 file changed, 3 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dwb_cm.c +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dwb_cm.c @@ -243,9 +243,6 @@ static bool dwb3_program_ogam_lut( return false; }
- if (params->hw_points_num == 0) - return false; - REG_SET(DWB_OGAM_CONTROL, 0, DWB_OGAM_MODE, 2);
current_mode = dwb3_get_ogam_current(dwbc30);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit c51db4ac10d57c366f9a92121e3889bfc6c324cd upstream.
After commit 1eeb50435739 ("tcp/dccp: do not care about families in inet_twsk_purge()") tcp_twsk_purge() is no longer potentially called from a module.
Signed-off-by: Eric Dumazet edumazet@google.com Cc: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_minisocks.c | 1 - 1 file changed, 1 deletion(-)
--- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -362,7 +362,6 @@ void tcp_twsk_purge(struct list_head *ne } } } -EXPORT_SYMBOL_GPL(tcp_twsk_purge);
/* Warning : This function is called without sk_listener being locked. * Be sure to read socket fields once, as their value could change under us.
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Javier Carrasco javier.carrasco.cruz@gmail.com
commit a94ff8e50c20bde6d50864849a98b106e45d30c6 upstream.
A new error path was added to the fwnode_for_each_available_node() loop in ltc2992_parse_dt(), which leads to an early return that requires a call to fwnode_handle_put() to avoid a memory leak in that case.
Add the missing fwnode_handle_put() in the error path from a zero value shunt resistor.
Cc: stable@vger.kernel.org Fixes: 10b029020487 ("hwmon: (ltc2992) Avoid division by zero") Signed-off-by: Javier Carrasco javier.carrasco.cruz@gmail.com Link: https://lore.kernel.org/r/20240523-fwnode_for_each_available_child_node_scop... Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hwmon/ltc2992.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/hwmon/ltc2992.c +++ b/drivers/hwmon/ltc2992.c @@ -876,9 +876,11 @@ static int ltc2992_parse_dt(struct ltc29
ret = fwnode_property_read_u32(child, "shunt-resistor-micro-ohms", &val); if (!ret) { - if (!val) + if (!val) { + fwnode_handle_put(child); return dev_err_probe(&st->client->dev, -EINVAL, "shunt resistor value cannot be zero\n"); + } st->r_sense_uohm[addr] = val; } }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436 upstream.
The recent addition of a sanity check for a too low start tick time seems breaking some applications that uses aloop with a certain slave timer setup. They may have the initial resolution 0, hence it's treated as if it were a too low value.
Relax and skip the check for the slave timer instance for addressing the regression.
Fixes: 4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time") Cc: stable@vger.kernel.org Link: https://github.com/raspberrypi/linux/issues/6294 Link: https://patch.msgid.link/20240810084833.10939-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/core/timer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/core/timer.c +++ b/sound/core/timer.c @@ -556,7 +556,7 @@ static int snd_timer_start1(struct snd_t /* check the actual time for the start tick; * bail out as error if it's way too low (< 100us) */ - if (start) { + if (start && !(timer->hw.flags & SNDRV_TIMER_HW_SLAVE)) { if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000) { result = -EINVAL; goto unlock;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hailong Liu hailong.liu@oppo.com
commit 61ebe5a747da649057c37be1c37eb934b4af79ca upstream.
The __vmap_pages_range_noflush() assumes its argument pages** contains pages with the same page shift. However, since commit e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags includes __GFP_NOFAIL with high order in vm_area_alloc_pages() and page allocation failed for high order, the pages** may contain two different page shifts (high order and order-0). This could lead __vmap_pages_range_noflush() to perform incorrect mappings, potentially resulting in memory corruption.
Users might encounter this as follows (vmap_allow_huge = true, 2M is for PMD_SIZE):
kvmalloc(2M, __GFP_NOFAIL|GFP_X) __vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP) vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0 vmap_pages_range() vmap_pages_range_noflush() __vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens
We can remove the fallback code because if a high-order allocation fails, __vmalloc_node_range_noprof() will retry with order-0. Therefore, it is unnecessary to fallback to order-0 here. Therefore, fix this by removing the fallback code.
Link: https://lkml.kernel.org/r/20240808122019.3361-1-hailong.liu@oppo.com Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations") Signed-off-by: Hailong Liu hailong.liu@oppo.com Reported-by: Tangquan Zheng zhengtangquan@oppo.com Reviewed-by: Baoquan He bhe@redhat.com Reviewed-by: Uladzislau Rezki (Sony) urezki@gmail.com Acked-by: Barry Song baohua@kernel.org Acked-by: Michal Hocko mhocko@suse.com Cc: Matthew Wilcox willy@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/vmalloc.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-)
--- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2992,15 +2992,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid, page = alloc_pages(alloc_gfp, order); else page = alloc_pages_node(nid, alloc_gfp, order); - if (unlikely(!page)) { - if (!nofail) - break; - - /* fall back to the zero order allocations */ - alloc_gfp |= __GFP_NOFAIL; - order = 0; - continue; - } + if (unlikely(!page)) + break;
/* * Higher order allocations must be able to be treated as
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zi Yan ziy@nvidia.com
commit fd8c35a92910f4829b7c99841f39b1b952c259d5 upstream.
When handling a numa page fault, task_numa_fault() should be called by a process that restores the page table of the faulted folio to avoid duplicated stats counting. Commit c5b5a3dd2c1f ("mm: thp: refactor NUMA fault handling") restructured do_huge_pmd_numa_page() and did not avoid task_numa_fault() call in the second page table check after a numa migration failure. Fix it by making all !pmd_same() return immediately.
This issue can cause task_numa_fault() being called more than necessary and lead to unexpected numa balancing results (It is hard to tell whether the issue will cause positive or negative performance impact due to duplicated numa fault counting).
Link: https://lkml.kernel.org/r/20240809145906.1513458-3-ziy@nvidia.com Fixes: c5b5a3dd2c1f ("mm: thp: refactor NUMA fault handling") Reported-by: "Huang, Ying" ying.huang@intel.com Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.intel... Signed-off-by: Zi Yan ziy@nvidia.com Acked-by: David Hildenbrand david@redhat.com Cc: Baolin Wang baolin.wang@linux.alibaba.com Cc: "Huang, Ying" ying.huang@intel.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Mel Gorman mgorman@suse.de Cc: Yang Shi shy828301@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/huge_memory.c | 30 +++++++++++++----------------- 1 file changed, 13 insertions(+), 17 deletions(-)
--- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1492,7 +1492,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd); if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) { spin_unlock(vmf->ptl); - goto out; + return 0; }
pmd = pmd_modify(oldpmd, vma->vm_page_prot); @@ -1525,23 +1525,16 @@ vm_fault_t do_huge_pmd_numa_page(struct if (migrated) { flags |= TNF_MIGRATED; page_nid = target_nid; - } else { - flags |= TNF_MIGRATE_FAIL; - vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd); - if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) { - spin_unlock(vmf->ptl); - goto out; - } - goto out_map; + task_numa_fault(last_cpupid, page_nid, HPAGE_PMD_NR, flags); + return 0; }
-out: - if (page_nid != NUMA_NO_NODE) - task_numa_fault(last_cpupid, page_nid, HPAGE_PMD_NR, - flags); - - return 0; - + flags |= TNF_MIGRATE_FAIL; + vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd); + if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) { + spin_unlock(vmf->ptl); + return 0; + } out_map: /* Restore the PMD */ pmd = pmd_modify(oldpmd, vma->vm_page_prot); @@ -1551,7 +1544,10 @@ out_map: set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd); update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); spin_unlock(vmf->ptl); - goto out; + + if (page_nid != NUMA_NO_NODE) + task_numa_fault(last_cpupid, page_nid, HPAGE_PMD_NR, flags); + return 0; }
/*
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zi Yan ziy@nvidia.com
commit 40b760cfd44566bca791c80e0720d70d75382b84 upstream.
When handling a numa page fault, task_numa_fault() should be called by a process that restores the page table of the faulted folio to avoid duplicated stats counting. Commit b99a342d4f11 ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault") restructured do_numa_page() and did not avoid task_numa_fault() call in the second page table check after a numa migration failure. Fix it by making all !pte_same() return immediately.
This issue can cause task_numa_fault() being called more than necessary and lead to unexpected numa balancing results (It is hard to tell whether the issue will cause positive or negative performance impact due to duplicated numa fault counting).
Link: https://lkml.kernel.org/r/20240809145906.1513458-2-ziy@nvidia.com Fixes: b99a342d4f11 ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault") Signed-off-by: Zi Yan ziy@nvidia.com Reported-by: "Huang, Ying" ying.huang@intel.com Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.intel... Acked-by: David Hildenbrand david@redhat.com Cc: Baolin Wang baolin.wang@linux.alibaba.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Mel Gorman mgorman@suse.de Cc: Yang Shi shy828301@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memory.c | 29 ++++++++++++++--------------- 1 file changed, 14 insertions(+), 15 deletions(-)
--- a/mm/memory.c +++ b/mm/memory.c @@ -4786,7 +4786,7 @@ static vm_fault_t do_numa_page(struct vm spin_lock(vmf->ptl); if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) { pte_unmap_unlock(vmf->pte, vmf->ptl); - goto out; + return 0; }
/* Get the normal PTE */ @@ -4841,21 +4841,17 @@ static vm_fault_t do_numa_page(struct vm if (migrate_misplaced_page(page, vma, target_nid)) { page_nid = target_nid; flags |= TNF_MIGRATED; - } else { - flags |= TNF_MIGRATE_FAIL; - vmf->pte = pte_offset_map(vmf->pmd, vmf->address); - spin_lock(vmf->ptl); - if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) { - pte_unmap_unlock(vmf->pte, vmf->ptl); - goto out; - } - goto out_map; + task_numa_fault(last_cpupid, page_nid, 1, flags); + return 0; }
-out: - if (page_nid != NUMA_NO_NODE) - task_numa_fault(last_cpupid, page_nid, 1, flags); - return 0; + flags |= TNF_MIGRATE_FAIL; + vmf->pte = pte_offset_map(vmf->pmd, vmf->address); + spin_lock(vmf->ptl); + if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) { + pte_unmap_unlock(vmf->pte, vmf->ptl); + return 0; + } out_map: /* * Make it present again, depending on how arch implements @@ -4869,7 +4865,10 @@ out_map: ptep_modify_prot_commit(vma, vmf->address, vmf->pte, old_pte, pte); update_mmu_cache(vma, vmf->address, vmf->pte); pte_unmap_unlock(vmf->pte, vmf->ptl); - goto out; + + if (page_nid != NUMA_NO_NODE) + task_numa_fault(last_cpupid, page_nid, 1, flags); + return 0; }
static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 18e4cf915543257eae2925671934937163f5639b ]
Previously a thread could exit asynchronously (due to a signal) so some care was needed to hold nfsd_mutex over the last svc_put() call. Now a thread can only exit when svc_set_num_threads() is called, and this is always called under nfsd_mutex. So no care is needed.
Not only is the mutex held when a thread exits now, but the svc refcount is elevated, so the svc_put() in svc_exit_thread() will never be a final put, so the mutex isn't even needed at this point in the code.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfssvc.c | 23 ----------------------- include/linux/sunrpc/svc.h | 13 ------------- 2 files changed, 36 deletions(-)
--- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -983,31 +983,8 @@ nfsd(void *vrqstp) atomic_dec(&nfsd_th_cnt);
out: - /* Take an extra ref so that the svc_put in svc_exit_thread() - * doesn't call svc_destroy() - */ - svc_get(nn->nfsd_serv); - /* Release the thread */ svc_exit_thread(rqstp); - - /* We need to drop a ref, but may not drop the last reference - * without holding nfsd_mutex, and we cannot wait for nfsd_mutex as that - * could deadlock with nfsd_shutdown_threads() waiting for us. - * So three options are: - * - drop a non-final reference, - * - get the mutex without waiting - * - sleep briefly andd try the above again - */ - while (!svc_put_not_last(nn->nfsd_serv)) { - if (mutex_trylock(&nfsd_mutex)) { - nfsd_put(net); - mutex_unlock(&nfsd_mutex); - break; - } - msleep(20); - } - return 0; }
--- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -123,19 +123,6 @@ static inline void svc_put(struct svc_se kref_put(&serv->sv_refcnt, svc_destroy); }
-/** - * svc_put_not_last - decrement non-final reference count on SUNRPC serv - * @serv: the svc_serv to have count decremented - * - * Returns: %true is refcount was decremented. - * - * If the refcount is 1, it is not decremented and instead failure is reported. - */ -static inline bool svc_put_not_last(struct svc_serv *serv) -{ - return refcount_dec_not_one(&serv->sv_refcnt.refcount); -} - /* * Maximum payload size supported by a kernel RPC server. * This is use to determine the max number of pages nfsd is
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 9f28a971ee9fdf1bf8ce8c88b103f483be610277 ]
Now that the last nfsd thread is stopped by an explicit act of calling svc_set_num_threads() with a count of zero, we only have a limited number of places that can happen, and don't need to call nfsd_last_thread() in nfsd_put()
So separate that out and call it at the two places where the number of threads is set to zero.
Move the clearing of ->nfsd_serv and the call to svc_xprt_destroy_all() into nfsd_last_thread(), as they are really part of the same action.
nfsd_put() is now a thin wrapper around svc_put(), so make it a static inline.
nfsd_put() cannot be called after nfsd_last_thread(), so in a couple of places we have to use svc_put() instead.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfsd.h | 7 ++++++- fs/nfsd/nfssvc.c | 52 +++++++++++++++++++--------------------------------- 2 files changed, 25 insertions(+), 34 deletions(-)
--- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -97,7 +97,12 @@ int nfsd_pool_stats_open(struct inode * int nfsd_pool_stats_release(struct inode *, struct file *); void nfsd_shutdown_threads(struct net *net);
-void nfsd_put(struct net *net); +static inline void nfsd_put(struct net *net) +{ + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + + svc_put(nn->nfsd_serv); +}
bool i_am_nfsd(void);
--- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -532,9 +532,14 @@ static struct notifier_block nfsd_inet6a /* Only used under nfsd_mutex, so this atomic may be overkill: */ static atomic_t nfsd_notifier_refcount = ATOMIC_INIT(0);
-static void nfsd_last_thread(struct svc_serv *serv, struct net *net) +static void nfsd_last_thread(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct svc_serv *serv = nn->nfsd_serv; + + spin_lock(&nfsd_notifier_lock); + nn->nfsd_serv = NULL; + spin_unlock(&nfsd_notifier_lock);
/* check if the notifier still has clients */ if (atomic_dec_return(&nfsd_notifier_refcount) == 0) { @@ -544,6 +549,8 @@ static void nfsd_last_thread(struct svc_ #endif }
+ svc_xprt_destroy_all(serv, net); + /* * write_ports can create the server without actually starting * any threads--if we get shut down before any threads are @@ -634,7 +641,8 @@ void nfsd_shutdown_threads(struct net *n svc_get(serv); /* Kill outstanding nfsd threads */ svc_set_num_threads(serv, NULL, 0); - nfsd_put(net); + nfsd_last_thread(net); + svc_put(serv); mutex_unlock(&nfsd_mutex); }
@@ -665,9 +673,6 @@ int nfsd_create_serv(struct net *net) serv->sv_maxconn = nn->max_connections; error = svc_bind(serv, net); if (error < 0) { - /* NOT nfsd_put() as notifiers (see below) haven't - * been set up yet. - */ svc_put(serv); return error; } @@ -710,29 +715,6 @@ int nfsd_get_nrthreads(int n, int *nthre return 0; }
-/* This is the callback for kref_put() below. - * There is no code here as the first thing to be done is - * call svc_shutdown_net(), but we cannot get the 'net' from - * the kref. So do all the work when kref_put returns true. - */ -static void nfsd_noop(struct kref *ref) -{ -} - -void nfsd_put(struct net *net) -{ - struct nfsd_net *nn = net_generic(net, nfsd_net_id); - - if (kref_put(&nn->nfsd_serv->sv_refcnt, nfsd_noop)) { - svc_xprt_destroy_all(nn->nfsd_serv, net); - nfsd_last_thread(nn->nfsd_serv, net); - svc_destroy(&nn->nfsd_serv->sv_refcnt); - spin_lock(&nfsd_notifier_lock); - nn->nfsd_serv = NULL; - spin_unlock(&nfsd_notifier_lock); - } -} - int nfsd_set_nrthreads(int n, int *nthreads, struct net *net) { int i = 0; @@ -783,7 +765,7 @@ int nfsd_set_nrthreads(int n, int *nthre if (err) break; } - nfsd_put(net); + svc_put(nn->nfsd_serv); return err; }
@@ -798,6 +780,7 @@ nfsd_svc(int nrservs, struct net *net, c int error; bool nfsd_up_before; struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct svc_serv *serv;
mutex_lock(&nfsd_mutex); dprintk("nfsd: creating service\n"); @@ -817,22 +800,25 @@ nfsd_svc(int nrservs, struct net *net, c goto out;
nfsd_up_before = nn->nfsd_net_up; + serv = nn->nfsd_serv;
error = nfsd_startup_net(net, cred); if (error) goto out_put; - error = svc_set_num_threads(nn->nfsd_serv, NULL, nrservs); + error = svc_set_num_threads(serv, NULL, nrservs); if (error) goto out_shutdown; - error = nn->nfsd_serv->sv_nrthreads; + error = serv->sv_nrthreads; + if (error == 0) + nfsd_last_thread(net); out_shutdown: if (error < 0 && !nfsd_up_before) nfsd_shutdown_net(net); out_put: /* Threads now hold service active */ if (xchg(&nn->keep_active, 0)) - nfsd_put(net); - nfsd_put(net); + svc_put(serv); + svc_put(serv); out: mutex_unlock(&nfsd_mutex); return error;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit bf32075256e9dd9c6b736859e2c5813981339908 ]
The error paths in nfsd_svc() are needlessly complex and can result in a final call to svc_put() without nfsd_last_thread() being called. This results in the listening sockets not being closed properly.
The per-netns setup provided by nfsd_startup_new() and removed by nfsd_shutdown_net() is needed precisely when there are running threads. So we don't need nfsd_up_before. We don't need to know if it *was* up. We only need to know if any threads are left. If none are, then we must call nfsd_shutdown_net(). But we don't need to do that explicitly as nfsd_last_thread() does that for us.
So simply call nfsd_last_thread() before the last svc_put() if there are no running threads. That will always do the right thing.
Also discard: pr_info("nfsd: last server has exited, flushing export cache\n"); It may not be true if an attempt to start the first server failed, and it isn't particularly helpful and it simply reports normal behaviour.
Signed-off-by: NeilBrown neilb@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfssvc.c | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-)
--- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -562,7 +562,6 @@ static void nfsd_last_thread(struct net return;
nfsd_shutdown_net(net); - pr_info("nfsd: last server has exited, flushing export cache\n"); nfsd_export_flush(net); }
@@ -778,7 +777,6 @@ int nfsd_svc(int nrservs, struct net *net, const struct cred *cred) { int error; - bool nfsd_up_before; struct nfsd_net *nn = net_generic(net, nfsd_net_id); struct svc_serv *serv;
@@ -798,8 +796,6 @@ nfsd_svc(int nrservs, struct net *net, c error = nfsd_create_serv(net); if (error) goto out; - - nfsd_up_before = nn->nfsd_net_up; serv = nn->nfsd_serv;
error = nfsd_startup_net(net, cred); @@ -807,17 +803,15 @@ nfsd_svc(int nrservs, struct net *net, c goto out_put; error = svc_set_num_threads(serv, NULL, nrservs); if (error) - goto out_shutdown; + goto out_put; error = serv->sv_nrthreads; - if (error == 0) - nfsd_last_thread(net); -out_shutdown: - if (error < 0 && !nfsd_up_before) - nfsd_shutdown_net(net); out_put: /* Threads now hold service active */ if (xchg(&nn->keep_active, 0)) svc_put(serv); + + if (serv->sv_nrthreads == 0) + nfsd_last_thread(net); svc_put(serv); out: mutex_unlock(&nfsd_mutex);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 2a501f55cd641eb4d3c16a2eab0d678693fac663 ]
If write_ports_addfd or write_ports_addxprt fail, they call nfsd_put() without calling nfsd_last_thread(). This leaves nn->nfsd_serv pointing to a structure that has been freed.
So remove 'static' from nfsd_last_thread() and call it when the nfsd_serv is about to be destroyed.
Fixes: ec52361df99b ("SUNRPC: stop using ->sv_nrthreads as a refcount") Signed-off-by: NeilBrown neilb@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfsctl.c | 9 +++++++-- fs/nfsd/nfsd.h | 1 + fs/nfsd/nfssvc.c | 2 +- 3 files changed, 9 insertions(+), 3 deletions(-)
--- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -720,8 +720,10 @@ static ssize_t __write_ports_addfd(char
err = svc_addsock(nn->nfsd_serv, net, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred);
- if (err >= 0 && - !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) + if (err < 0 && !nn->nfsd_serv->sv_nrthreads && !nn->keep_active) + nfsd_last_thread(net); + else if (err >= 0 && + !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) svc_get(nn->nfsd_serv);
nfsd_put(net); @@ -771,6 +773,9 @@ out_close: svc_xprt_put(xprt); } out_err: + if (!nn->nfsd_serv->sv_nrthreads && !nn->keep_active) + nfsd_last_thread(net); + nfsd_put(net); return err; } --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -139,6 +139,7 @@ int nfsd_vers(struct nfsd_net *nn, int v int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change); void nfsd_reset_versions(struct nfsd_net *nn); int nfsd_create_serv(struct net *net); +void nfsd_last_thread(struct net *net);
extern int nfsd_max_blksize;
--- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -532,7 +532,7 @@ static struct notifier_block nfsd_inet6a /* Only used under nfsd_mutex, so this atomic may be overkill: */ static atomic_t nfsd_notifier_refcount = ATOMIC_INIT(0);
-static void nfsd_last_thread(struct net *net) +void nfsd_last_thread(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); struct svc_serv *serv = nn->nfsd_serv;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 64e6304169f1e1f078e7f0798033f80a7fb0ea46 ]
It's not safe to call nfsd_put once nfsd_last_thread has been called, as that function will zero out the nn->nfsd_serv pointer.
Drop the nfsd_put helper altogether and open-code the svc_put in its callers instead. That allows us to not be reliant on the value of that pointer when handling an error.
Fixes: 2a501f55cd64 ("nfsd: call nfsd_last_thread() before final nfsd_put()") Reported-by: Zhi Li yieli@redhat.com Cc: NeilBrown neilb@suse.de Signed-off-by: Jeffrey Layton jlayton@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfsctl.c | 31 +++++++++++++++++-------------- fs/nfsd/nfsd.h | 7 ------- 2 files changed, 17 insertions(+), 21 deletions(-)
--- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -709,6 +709,7 @@ static ssize_t __write_ports_addfd(char char *mesg = buf; int fd, err; struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct svc_serv *serv;
err = get_int(&mesg, &fd); if (err != 0 || fd < 0) @@ -718,15 +719,15 @@ static ssize_t __write_ports_addfd(char if (err != 0) return err;
- err = svc_addsock(nn->nfsd_serv, net, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred); + serv = nn->nfsd_serv; + err = svc_addsock(serv, net, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred);
- if (err < 0 && !nn->nfsd_serv->sv_nrthreads && !nn->keep_active) + if (err < 0 && !serv->sv_nrthreads && !nn->keep_active) nfsd_last_thread(net); - else if (err >= 0 && - !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) - svc_get(nn->nfsd_serv); + else if (err >= 0 && !serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) + svc_get(serv);
- nfsd_put(net); + svc_put(serv); return err; }
@@ -740,6 +741,7 @@ static ssize_t __write_ports_addxprt(cha struct svc_xprt *xprt; int port, err; struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct svc_serv *serv;
if (sscanf(buf, "%15s %5u", transport, &port) != 2) return -EINVAL; @@ -751,32 +753,33 @@ static ssize_t __write_ports_addxprt(cha if (err != 0) return err;
- err = svc_xprt_create(nn->nfsd_serv, transport, net, + serv = nn->nfsd_serv; + err = svc_xprt_create(serv, transport, net, PF_INET, port, SVC_SOCK_ANONYMOUS, cred); if (err < 0) goto out_err;
- err = svc_xprt_create(nn->nfsd_serv, transport, net, + err = svc_xprt_create(serv, transport, net, PF_INET6, port, SVC_SOCK_ANONYMOUS, cred); if (err < 0 && err != -EAFNOSUPPORT) goto out_close;
- if (!nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) - svc_get(nn->nfsd_serv); + if (!serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) + svc_get(serv);
- nfsd_put(net); + svc_put(serv); return 0; out_close: - xprt = svc_find_xprt(nn->nfsd_serv, transport, net, PF_INET, port); + xprt = svc_find_xprt(serv, transport, net, PF_INET, port); if (xprt != NULL) { svc_xprt_close(xprt); svc_xprt_put(xprt); } out_err: - if (!nn->nfsd_serv->sv_nrthreads && !nn->keep_active) + if (!serv->sv_nrthreads && !nn->keep_active) nfsd_last_thread(net);
- nfsd_put(net); + svc_put(serv); return err; }
--- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -97,13 +97,6 @@ int nfsd_pool_stats_open(struct inode * int nfsd_pool_stats_release(struct inode *, struct file *); void nfsd_shutdown_threads(struct net *net);
-static inline void nfsd_put(struct net *net) -{ - struct nfsd_net *nn = net_generic(net, nfsd_net_id); - - svc_put(nn->nfsd_serv); -} - bool i_am_nfsd(void);
struct nfsdfs_client {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 05eda6e75773592760285e10ac86c56d683be17f ]
It is possible for free_blocked_lock() to be called twice concurrently, once from nfsd4_lock() and once from nfsd4_release_lockowner() calling remove_blocked_locks(). This is why a kref was added.
It is perfectly safe for locks_delete_block() and kref_put() to be called in parallel as they use locking or atomicity respectively as protection. However locks_release_private() has no locking. It is safe for it to be called twice sequentially, but not concurrently.
This patch moves that call from free_blocked_lock() where it could race with itself, to free_nbl() where it cannot. This will slightly delay the freeing of private info or release of the owner - but not by much. It is arguably more natural for this freeing to happen in free_nbl() where the structure itself is freed.
This bug was found by code inspection - it has not been seen in practice.
Fixes: 47446d74f170 ("nfsd4: add refcount for nfsd4_blocked_lock") Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfs4state.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -318,6 +318,7 @@ free_nbl(struct kref *kref) struct nfsd4_blocked_lock *nbl;
nbl = container_of(kref, struct nfsd4_blocked_lock, nbl_kref); + locks_release_private(&nbl->nbl_lock); kfree(nbl); }
@@ -325,7 +326,6 @@ static void free_blocked_lock(struct nfsd4_blocked_lock *nbl) { locks_delete_block(&nbl->nbl_lock); - locks_release_private(&nbl->nbl_lock); kref_put(&nbl->nbl_kref, free_nbl); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 6412e44c40aaf8f1d7320b2099c5bdd6cb9126ac ]
Commit bb4d53d66e4b ("NFSD: use (un)lock_inode instead of fh_(un)lock for file operations") broke the NFSv3 pre/post op attributes behaviour when doing a SETATTR rpc call by stripping out the calls to fh_fill_pre_attrs() and fh_fill_post_attrs().
Fixes: bb4d53d66e4b ("NFSD: use (un)lock_inode instead of fh_(un)lock for file operations") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Reviewed-by: Jeff Layton jlayton@kernel.org Reviewed-by: NeilBrown neilb@suse.de Message-ID: 20240216012451.22725-1-trondmy@kernel.org [ cel: adjusted to apply to v6.1.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfs4proc.c | 4 ++++ fs/nfsd/vfs.c | 6 ++++-- 2 files changed, 8 insertions(+), 2 deletions(-)
--- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1106,6 +1106,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, st }; struct inode *inode; __be32 status = nfs_ok; + bool save_no_wcc; int err;
if (setattr->sa_iattr.ia_valid & ATTR_SIZE) { @@ -1131,8 +1132,11 @@ nfsd4_setattr(struct svc_rqst *rqstp, st
if (status) goto out; + save_no_wcc = cstate->current_fh.fh_no_wcc; + cstate->current_fh.fh_no_wcc = true; status = nfsd_setattr(rqstp, &cstate->current_fh, &attrs, 0, (time64_t)0); + cstate->current_fh.fh_no_wcc = save_no_wcc; if (!status) status = nfserrno(attrs.na_labelerr); if (!status) --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -475,7 +475,7 @@ nfsd_setattr(struct svc_rqst *rqstp, str int accmode = NFSD_MAY_SATTR; umode_t ftype = 0; __be32 err; - int host_err; + int host_err = 0; bool get_write_count; bool size_change = (iap->ia_valid & ATTR_SIZE); int retries; @@ -533,6 +533,7 @@ nfsd_setattr(struct svc_rqst *rqstp, str }
inode_lock(inode); + fh_fill_pre_attrs(fhp); for (retries = 1;;) { struct iattr attrs;
@@ -560,13 +561,14 @@ nfsd_setattr(struct svc_rqst *rqstp, str attr->na_aclerr = set_posix_acl(&init_user_ns, inode, ACL_TYPE_DEFAULT, attr->na_dpacl); + fh_fill_post_attrs(fhp); inode_unlock(inode); if (size_change) put_write_access(inode); out: if (!host_err) host_err = commit_metadata(fhp); - return nfserrno(host_err); + return err != 0 ? err : nfserrno(host_err); }
#if defined(CONFIG_NFSD_V4)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lee, Chun-Yi joeyli.kernel@gmail.com
commit 9c33663af9ad115f90c076a1828129a3fbadea98 upstream.
This patch adds code to check HCI_UART_PROTO_READY flag before accessing hci_uart->proto. It fixes the race condition in hci_uart_tty_ioctl() between HCIUARTSETPROTO and HCIUARTGETPROTO. This issue bug found by Yu Hao and Weiteng Chen:
BUG: general protection fault in hci_uart_tty_ioctl [1]
The information of C reproducer can also reference the link [2]
Reported-by: Yu Hao yhao016@ucr.edu Closes: https://lore.kernel.org/all/CA+UBctC3p49aTgzbVgkSZ2+TQcqq4fPDO7yZitFT5uBPDeC... [1] Reported-by: Weiteng Chen wchen130@ucr.edu Closes: https://lore.kernel.org/lkml/CA+UBctDPEvHdkHMwD340=n02rh+jNRJNNQ5LBZNA+Wm4Ke... [2] Signed-off-by: "Lee, Chun-Yi" jlee@suse.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/bluetooth/hci_ldisc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/bluetooth/hci_ldisc.c +++ b/drivers/bluetooth/hci_ldisc.c @@ -770,7 +770,8 @@ static int hci_uart_tty_ioctl(struct tty break;
case HCIUARTGETPROTO: - if (test_bit(HCI_UART_PROTO_SET, &hu->flags)) + if (test_bit(HCI_UART_PROTO_SET, &hu->flags) && + test_bit(HCI_UART_PROTO_READY, &hu->flags)) err = hu->proto->id; else err = -EUNATCH;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boyuan Zhang boyuan.zhang@amd.com
commit ecfa23c8df7ef3ea2a429dfe039341bf792e95b4 upstream.
Determine whether VCN using unified queue in sw_init, instead of calling functions later on.
v2: fix coding style
Signed-off-by: Boyuan Zhang boyuan.zhang@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Reviewed-by: Ruijing Dong ruijing.dong@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 39 ++++++++++++-------------------- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 1 2 files changed, 16 insertions(+), 24 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c @@ -239,6 +239,10 @@ int amdgpu_vcn_sw_init(struct amdgpu_dev return r; }
+ /* from vcn4 and above, only unified queue is used */ + adev->vcn.using_unified_queue = + adev->ip_versions[UVD_HWIP][0] >= IP_VERSION(4, 0, 0); + hdr = (const struct common_firmware_header *)adev->vcn.fw->data; adev->vcn.fw_version = le32_to_cpu(hdr->ucode_version);
@@ -357,18 +361,6 @@ int amdgpu_vcn_sw_fini(struct amdgpu_dev return 0; }
-/* from vcn4 and above, only unified queue is used */ -static bool amdgpu_vcn_using_unified_queue(struct amdgpu_ring *ring) -{ - struct amdgpu_device *adev = ring->adev; - bool ret = false; - - if (adev->ip_versions[UVD_HWIP][0] >= IP_VERSION(4, 0, 0)) - ret = true; - - return ret; -} - bool amdgpu_vcn_is_disabled_vcn(struct amdgpu_device *adev, enum vcn_ring_type type, uint32_t vcn_instance) { bool ret = false; @@ -806,12 +798,11 @@ static int amdgpu_vcn_dec_sw_send_msg(st struct amdgpu_job *job; struct amdgpu_ib *ib; uint64_t addr = AMDGPU_GPU_PAGE_ALIGN(ib_msg->gpu_addr); - bool sq = amdgpu_vcn_using_unified_queue(ring); uint32_t *ib_checksum; uint32_t ib_pack_in_dw; int i, r;
- if (sq) + if (adev->vcn.using_unified_queue) ib_size_dw += 8;
r = amdgpu_job_alloc_with_ib(adev, ib_size_dw * 4, @@ -823,7 +814,7 @@ static int amdgpu_vcn_dec_sw_send_msg(st ib->length_dw = 0;
/* single queue headers */ - if (sq) { + if (adev->vcn.using_unified_queue) { ib_pack_in_dw = sizeof(struct amdgpu_vcn_decode_buffer) / sizeof(uint32_t) + 4 + 2; /* engine info + decoding ib in dw */ ib_checksum = amdgpu_vcn_unified_ring_ib_header(ib, ib_pack_in_dw, false); @@ -842,7 +833,7 @@ static int amdgpu_vcn_dec_sw_send_msg(st for (i = ib->length_dw; i < ib_size_dw; ++i) ib->ptr[i] = 0x0;
- if (sq) + if (adev->vcn.using_unified_queue) amdgpu_vcn_unified_ring_ib_checksum(&ib_checksum, ib_pack_in_dw);
r = amdgpu_job_submit_direct(job, ring, &f); @@ -932,15 +923,15 @@ static int amdgpu_vcn_enc_get_create_msg struct dma_fence **fence) { unsigned int ib_size_dw = 16; + struct amdgpu_device *adev = ring->adev; struct amdgpu_job *job; struct amdgpu_ib *ib; struct dma_fence *f = NULL; uint32_t *ib_checksum = NULL; uint64_t addr; - bool sq = amdgpu_vcn_using_unified_queue(ring); int i, r;
- if (sq) + if (adev->vcn.using_unified_queue) ib_size_dw += 8;
r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, @@ -953,7 +944,7 @@ static int amdgpu_vcn_enc_get_create_msg
ib->length_dw = 0;
- if (sq) + if (adev->vcn.using_unified_queue) ib_checksum = amdgpu_vcn_unified_ring_ib_header(ib, 0x11, true);
ib->ptr[ib->length_dw++] = 0x00000018; @@ -975,7 +966,7 @@ static int amdgpu_vcn_enc_get_create_msg for (i = ib->length_dw; i < ib_size_dw; ++i) ib->ptr[i] = 0x0;
- if (sq) + if (adev->vcn.using_unified_queue) amdgpu_vcn_unified_ring_ib_checksum(&ib_checksum, 0x11);
r = amdgpu_job_submit_direct(job, ring, &f); @@ -998,15 +989,15 @@ static int amdgpu_vcn_enc_get_destroy_ms struct dma_fence **fence) { unsigned int ib_size_dw = 16; + struct amdgpu_device *adev = ring->adev; struct amdgpu_job *job; struct amdgpu_ib *ib; struct dma_fence *f = NULL; uint32_t *ib_checksum = NULL; uint64_t addr; - bool sq = amdgpu_vcn_using_unified_queue(ring); int i, r;
- if (sq) + if (adev->vcn.using_unified_queue) ib_size_dw += 8;
r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, @@ -1019,7 +1010,7 @@ static int amdgpu_vcn_enc_get_destroy_ms
ib->length_dw = 0;
- if (sq) + if (adev->vcn.using_unified_queue) ib_checksum = amdgpu_vcn_unified_ring_ib_header(ib, 0x11, true);
ib->ptr[ib->length_dw++] = 0x00000018; @@ -1041,7 +1032,7 @@ static int amdgpu_vcn_enc_get_destroy_ms for (i = ib->length_dw; i < ib_size_dw; ++i) ib->ptr[i] = 0x0;
- if (sq) + if (adev->vcn.using_unified_queue) amdgpu_vcn_unified_ring_ib_checksum(&ib_checksum, 0x11);
r = amdgpu_job_submit_direct(job, ring, &f); --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h @@ -271,6 +271,7 @@ struct amdgpu_vcn {
struct ras_common_if *ras_if; struct amdgpu_vcn_ras *ras; + bool using_unified_queue; };
struct amdgpu_fw_shared_rb_ptrs_struct {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boyuan Zhang boyuan.zhang@amd.com
commit 7d75ef3736a025db441be652c8cc8e84044a215f upstream.
For unified queue, DPG pause for encoding is done inside VCN firmware, so there is no need to pause dpg based on ring type in kernel.
For VCN3 and below, pausing DPG for encoding in kernel is still needed.
v2: add more comments v3: update commit message
Signed-off-by: Boyuan Zhang boyuan.zhang@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Reviewed-by: Ruijing Dong ruijing.dong@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c @@ -472,7 +472,9 @@ static void amdgpu_vcn_idle_work_handler fence[j] += amdgpu_fence_count_emitted(&adev->vcn.inst[j].ring_enc[i]); }
- if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) { + /* Only set DPG pause for VCN3 or below, VCN4 and above will be handled by FW */ + if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG && + !adev->vcn.using_unified_queue) { struct dpg_pause_state new_state;
if (fence[j] || @@ -518,7 +520,9 @@ void amdgpu_vcn_ring_begin_use(struct am amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VCN, AMD_PG_STATE_UNGATE);
- if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) { + /* Only set DPG pause for VCN3 or below, VCN4 and above will be handled by FW */ + if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG && + !adev->vcn.using_unified_queue) { struct dpg_pause_state new_state;
if (ring->funcs->type == AMDGPU_RING_TYPE_VCN_ENC) { @@ -544,8 +548,12 @@ void amdgpu_vcn_ring_begin_use(struct am
void amdgpu_vcn_ring_end_use(struct amdgpu_ring *ring) { + struct amdgpu_device *adev = ring->adev; + + /* Only set DPG pause for VCN3 or below, VCN4 and above will be handled by FW */ if (ring->adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG && - ring->funcs->type == AMDGPU_RING_TYPE_VCN_ENC) + ring->funcs->type == AMDGPU_RING_TYPE_VCN_ENC && + !adev->vcn.using_unified_queue) atomic_dec(&ring->adev->vcn.inst[ring->me].dpg_enc_submission_cnt);
atomic_dec(&ring->adev->vcn.total_submission_cnt);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Li RongQing lirongqing@baidu.com
commit 8e6ed96cdd5001c55fccc80a17f651741c1ca7d2 upstream.
when the vCPU was migrated, if its timer is expired, KVM _should_ fire the timer ASAP, zeroing the deadline here will cause the timer to immediately fire on the destination
Cc: Sean Christopherson seanjc@google.com Cc: Peter Shier pshier@google.com Cc: Jim Mattson jmattson@google.com Cc: Wanpeng Li wanpengli@tencent.com Cc: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Li RongQing lirongqing@baidu.com Link: https://lore.kernel.org/r/20230106040625.8404-1-lirongqing@baidu.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: David Hunter david.hunter.linux@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/lapic.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1843,8 +1843,12 @@ static bool set_target_expiration(struct if (unlikely(count_reg != APIC_TMICT)) { deadline = tmict_to_ns(apic, kvm_lapic_get_reg(apic, count_reg)); - if (unlikely(deadline <= 0)) - deadline = apic->lapic_timer.period; + if (unlikely(deadline <= 0)) { + if (apic_lvtt_period(apic)) + deadline = apic->lapic_timer.period; + else + deadline = 0; + } else if (unlikely(deadline > apic->lapic_timer.period)) { pr_info_ratelimited( "kvm: vcpu %i: requested lapic timer restore with "
On Tue, Aug 27, 2024, Greg Kroah-Hartman wrote:
6.1-stable review patch. If anyone has any objections, please let me know.
Purely to try and avoid more confusion,
Acked-by: Sean Christopherson seanjc@google.com
as the fix that needs to be paired with this commit has already landed in 6.1.y as 7545ddda9c98 ("KVM: x86: Fix lapic timer interrupt lost after loading a snapshot.")
From: Li RongQing lirongqing@baidu.com
commit 8e6ed96cdd5001c55fccc80a17f651741c1ca7d2 upstream.
when the vCPU was migrated, if its timer is expired, KVM _should_ fire the timer ASAP, zeroing the deadline here will cause the timer to immediately fire on the destination
Cc: Sean Christopherson seanjc@google.com Cc: Peter Shier pshier@google.com Cc: Jim Mattson jmattson@google.com Cc: Wanpeng Li wanpengli@tencent.com Cc: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Li RongQing lirongqing@baidu.com Link: https://lore.kernel.org/r/20230106040625.8404-1-lirongqing@baidu.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: David Hunter david.hunter.linux@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
arch/x86/kvm/lapic.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1843,8 +1843,12 @@ static bool set_target_expiration(struct if (unlikely(count_reg != APIC_TMICT)) { deadline = tmict_to_ns(apic, kvm_lapic_get_reg(apic, count_reg));
if (unlikely(deadline <= 0))
deadline = apic->lapic_timer.period;
if (unlikely(deadline <= 0)) {
if (apic_lvtt_period(apic))
deadline = apic->lapic_timer.period;
else
deadline = 0;
} else if (unlikely(deadline > apic->lapic_timer.period)) { pr_info_ratelimited( "kvm: vcpu %i: requested lapic timer restore with "
On Tue, Aug 27, 2024 at 11:56:32AM -0700, Sean Christopherson wrote:
On Tue, Aug 27, 2024, Greg Kroah-Hartman wrote:
6.1-stable review patch. If anyone has any objections, please let me know.
Purely to try and avoid more confusion,
Acked-by: Sean Christopherson seanjc@google.com
as the fix that needs to be paired with this commit has already landed in 6.1.y as 7545ddda9c98 ("KVM: x86: Fix lapic timer interrupt lost after loading a snapshot.")
Thanks for the clarification, much appreciated.
greg k-h
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Jan H�ppner" hoeppner@linux.ibm.com
This reverts commit bc792884b76f ("s390/dasd: Establish DMA alignment").
Quoting the original commit: linux-next commit bf8d08532bc1 ("iomap: add support for dma aligned direct-io") changes the alignment requirement to come from the block device rather than the block size, and the default alignment requirement is 512-byte boundaries. Since DASD I/O has page alignments for IDAW/TIDAW requests, let's override this value to restore the expected behavior.
I mentioned TIDAW, but that was wrong. TIDAWs have no distinct alignment requirement (per p. 15-70 of POPS SA22-7832-13):
Unless otherwise specified, TIDAWs may designate a block of main storage on any boundary and length up to 4K bytes, provided the specified block does not cross a 4 K-byte boundary.
IDAWs do, but the original commit neglected that while ECKD DASD are typically formatted in 4096-byte blocks, they don't HAVE to be. Formatting an ECKD volume with smaller blocks is permitted (dasdfmt -b xxx), and the problematic commit enforces alignment properties to such a device that will result in errors, such as:
[test@host ~]# lsdasd -l a367 | grep blksz blksz: 512 [test@host ~]# mkfs.xfs -f /dev/disk/by-path/ccw-0.0.a367-part1 meta-data=/dev/dasdc1 isize=512 agcount=4, agsize=230075 blks = sectsz=512 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=1 = reflink=1 bigtime=1 inobtcount=1 nrext64=1 data = bsize=4096 blocks=920299, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=16384, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 error reading existing superblock: Invalid argument mkfs.xfs: pwrite failed: Invalid argument libxfs_bwrite: write failed on (unknown) bno 0x70565c/0x100, err=22 mkfs.xfs: Releasing dirty buffer to free list! found dirty buffer (bulk) on free list! mkfs.xfs: pwrite failed: Invalid argument ...snipped...
The original commit omitted the FBA discipline for just this reason, but the formatted block size of the other disciplines was overlooked. The solution to all of this is to revert to the original behavior, such that the block size can be respected.
But what of the original problem? That was manifested with a direct-io QEMU guest, where QEMU itself was changed a month or two later with commit 25474d90aa ("block: use the request length for iov alignment") such that the blamed kernel commit is unnecessary.
Note: This is an adapted version of the original upstream commit 2a07bb64d801 ("s390/dasd: Remove DMA alignment").
Cc: stable@vger.kernel.org # 6.0+ Signed-off-by: Jan Höppner hoeppner@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/block/dasd_diag.c | 1 - drivers/s390/block/dasd_eckd.c | 1 - 2 files changed, 2 deletions(-)
--- a/drivers/s390/block/dasd_diag.c +++ b/drivers/s390/block/dasd_diag.c @@ -639,7 +639,6 @@ static void dasd_diag_setup_blk_queue(st /* With page sized segments each segment can be translated into one idaw/tidaw */ blk_queue_max_segment_size(q, PAGE_SIZE); blk_queue_segment_boundary(q, PAGE_SIZE - 1); - blk_queue_dma_alignment(q, PAGE_SIZE - 1); }
static int dasd_diag_pe_handler(struct dasd_device *device, --- a/drivers/s390/block/dasd_eckd.c +++ b/drivers/s390/block/dasd_eckd.c @@ -6889,7 +6889,6 @@ static void dasd_eckd_setup_blk_queue(st /* With page sized segments each segment can be translated into one idaw/tidaw */ blk_queue_max_segment_size(q, PAGE_SIZE); blk_queue_segment_boundary(q, PAGE_SIZE - 1); - blk_queue_dma_alignment(q, PAGE_SIZE - 1); }
static struct ccw_driver dasd_eckd_driver = {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrew Melnychenko andrew@daynix.com
commit 1fd54773c26787b9aea80e2f62c7d0780ea444d0 upstream.
Allow UDP_L4 for robust packets.
Signed-off-by: Jason Wang jasowang@redhat.com Signed-off-by: Andrew Melnychenko andrew@daynix.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/udp_offload.c | 3 ++- net/ipv6/udp_offload.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-)
--- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -387,7 +387,8 @@ static struct sk_buff *udp4_ufo_fragment if (!pskb_may_pull(skb, sizeof(struct udphdr))) goto out;
- if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) + if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 && + !skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) return __udp_gso_segment(skb, features, false);
mss = skb_shinfo(skb)->gso_size; --- a/net/ipv6/udp_offload.c +++ b/net/ipv6/udp_offload.c @@ -42,7 +42,8 @@ static struct sk_buff *udp6_ufo_fragment if (!pskb_may_pull(skb, sizeof(struct udphdr))) goto out;
- if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) + if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 && + !skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) return __udp_gso_segment(skb, features, true);
mss = skb_shinfo(skb)->gso_size;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yan Zhai yan@cloudflare.com
commit 9840036786d90cea11a90d1f30b6dc003b34ee67 upstream.
Commit 1fd54773c267 ("udp: allow header check for dodgy GSO_UDP_L4 packets.") checks DODGY bit for UDP, but for packets that can be fed directly to the device after gso_segs reset, it actually falls through to fragmentation:
https://lore.kernel.org/all/CAJPywTKDdjtwkLVUW6LRA2FU912qcDmQOQGt2WaDo28KzYD...
This change restores the expected behavior of GSO_UDP_L4 packets.
Fixes: 1fd54773c267 ("udp: allow header check for dodgy GSO_UDP_L4 packets.") Suggested-by: Willem de Bruijn willemdebruijn.kernel@gmail.com Signed-off-by: Yan Zhai yan@cloudflare.com Reviewed-by: Willem de Bruijn willemb@google.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/udp_offload.c | 16 +++++++++++----- net/ipv6/udp_offload.c | 3 +-- 2 files changed, 12 insertions(+), 7 deletions(-)
--- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -273,13 +273,20 @@ struct sk_buff *__udp_gso_segment(struct __sum16 check; __be16 newlen;
- if (skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST) - return __udp_gso_segment_list(gso_skb, features, is_ipv6); - mss = skb_shinfo(gso_skb)->gso_size; if (gso_skb->len <= sizeof(*uh) + mss) return ERR_PTR(-EINVAL);
+ if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) { + /* Packet is from an untrusted source, reset gso_segs. */ + skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh), + mss); + return NULL; + } + + if (skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST) + return __udp_gso_segment_list(gso_skb, features, is_ipv6); + skb_pull(gso_skb, sizeof(*uh));
/* clear destructor to avoid skb_segment assigning it to tail */ @@ -387,8 +394,7 @@ static struct sk_buff *udp4_ufo_fragment if (!pskb_may_pull(skb, sizeof(struct udphdr))) goto out;
- if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 && - !skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) + if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) return __udp_gso_segment(skb, features, false);
mss = skb_shinfo(skb)->gso_size; --- a/net/ipv6/udp_offload.c +++ b/net/ipv6/udp_offload.c @@ -42,8 +42,7 @@ static struct sk_buff *udp6_ufo_fragment if (!pskb_may_pull(skb, sizeof(struct udphdr))) goto out;
- if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 && - !skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) + if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) return __udp_gso_segment(skb, features, true);
mss = skb_shinfo(skb)->gso_size;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Willem de Bruijn willemb@google.com
commit fc8b2a619469378717e7270d2a4e1ef93c585f7a upstream.
Syzbot reported two new paths to hit an internal WARNING using the new virtio gso type VIRTIO_NET_HDR_GSO_UDP_L4.
RIP: 0010:skb_checksum_help+0x4a2/0x600 net/core/dev.c:3260 skb len=64521 gso_size=344 and
RIP: 0010:skb_warn_bad_offload+0x118/0x240 net/core/dev.c:3262
Older virtio types have historically had loose restrictions, leading to many entirely impractical fuzzer generated packets causing problems deep in the kernel stack. Ideally, we would have had strict validation for all types from the start.
New virtio types can have tighter validation. Limit UDP GSO packets inserted via virtio to the same limits imposed by the UDP_SEGMENT socket interface:
1. must use checksum offload 2. checksum offload matches UDP header 3. no more segments than UDP_MAX_SEGMENTS 4. UDP GSO does not take modifier flags, notably SKB_GSO_TCP_ECN
Fixes: 860b7f27b8f7 ("linux/virtio_net.h: Support USO offload in vnet header.") Reported-by: syzbot+01cdbc31e9c0ae9b33ac@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/0000000000005039270605eb0b7f@google.com/ Reported-by: syzbot+c99d835ff081ca30f986@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/0000000000005426680605eb0b9f@google.com/ Signed-off-by: Willem de Bruijn willemb@google.com Reviewed-by: Eric Dumazet edumazet@google.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/virtio_net.h | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-)
--- a/include/linux/virtio_net.h +++ b/include/linux/virtio_net.h @@ -3,8 +3,8 @@ #define _LINUX_VIRTIO_NET_H
#include <linux/if_vlan.h> +#include <linux/udp.h> #include <uapi/linux/tcp.h> -#include <uapi/linux/udp.h> #include <uapi/linux/virtio_net.h>
static inline bool virtio_net_hdr_match_proto(__be16 protocol, __u8 gso_type) @@ -155,9 +155,22 @@ retry: unsigned int nh_off = p_off; struct skb_shared_info *shinfo = skb_shinfo(skb);
- /* UFO may not include transport header in gso_size. */ - if (gso_type & SKB_GSO_UDP) + switch (gso_type & ~SKB_GSO_TCP_ECN) { + case SKB_GSO_UDP: + /* UFO may not include transport header in gso_size. */ nh_off -= thlen; + break; + case SKB_GSO_UDP_L4: + if (!(hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM)) + return -EINVAL; + if (skb->csum_offset != offsetof(struct udphdr, check)) + return -EINVAL; + if (skb->len - p_off > gso_size * UDP_MAX_SEGMENTS) + return -EINVAL; + if (gso_type != SKB_GSO_UDP_L4) + return -EINVAL; + break; + }
/* Kernel has a special handling for GSO_BY_FRAGS. */ if (gso_size == GSO_BY_FRAGS)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Willem de Bruijn willemb@google.com
commit 89add40066f9ed9abe5f7f886fe5789ff7e0c50e upstream.
Tighten csum_start and csum_offset checks in virtio_net_hdr_to_skb for GSO packets.
The function already checks that a checksum requested with VIRTIO_NET_HDR_F_NEEDS_CSUM is in skb linear. But for GSO packets this might not hold for segs after segmentation.
Syzkaller demonstrated to reach this warning in skb_checksum_help
offset = skb_checksum_start_offset(skb); ret = -EINVAL; if (WARN_ON_ONCE(offset >= skb_headlen(skb)))
By injecting a TSO packet:
WARNING: CPU: 1 PID: 3539 at net/core/dev.c:3284 skb_checksum_help+0x3d0/0x5b0 ip_do_fragment+0x209/0x1b20 net/ipv4/ip_output.c:774 ip_finish_output_gso net/ipv4/ip_output.c:279 [inline] __ip_finish_output+0x2bd/0x4b0 net/ipv4/ip_output.c:301 iptunnel_xmit+0x50c/0x930 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x2296/0x2c70 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x759/0xa60 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4850 [inline] netdev_start_xmit include/linux/netdevice.h:4864 [inline] xmit_one net/core/dev.c:3595 [inline] dev_hard_start_xmit+0x261/0x8c0 net/core/dev.c:3611 __dev_queue_xmit+0x1b97/0x3c90 net/core/dev.c:4261 packet_snd net/packet/af_packet.c:3073 [inline]
The geometry of the bad input packet at tcp_gso_segment:
[ 52.003050][ T8403] skb len=12202 headroom=244 headlen=12093 tailroom=0 [ 52.003050][ T8403] mac=(168,24) mac_len=24 net=(192,52) trans=244 [ 52.003050][ T8403] shinfo(txflags=0 nr_frags=1 gso(size=1552 type=3 segs=0)) [ 52.003050][ T8403] csum(0x60000c7 start=199 offset=1536 ip_summed=3 complete_sw=0 valid=0 level=0)
Mitigate with stricter input validation.
csum_offset: for GSO packets, deduce the correct value from gso_type. This is already done for USO. Extend it to TSO. Let UFO be: udp[46]_ufo_fragment ignores these fields and always computes the checksum in software.
csum_start: finding the real offset requires parsing to the transport header. Do not add a parser, use existing segmentation parsing. Thanks to SKB_GSO_DODGY, that also catches bad packets that are hw offloaded. Again test both TSO and USO. Do not test UFO for the above reason, and do not test UDP tunnel offload.
GSO packet are almost always CHECKSUM_PARTIAL. USO packets may be CHECKSUM_NONE since commit 10154dbded6d6 ("udp: Allow GSO transmit from devices with no checksum offload"), but then still these fields are initialized correctly in udp4_hwcsum/udp6_hwcsum_outgoing. So no need to test for ip_summed == CHECKSUM_PARTIAL first.
This revises an existing fix mentioned in the Fixes tag, which broke small packets with GSO offload, as detected by kselftests.
Link: https://syzkaller.appspot.com/bug?extid=e1db31216c789f552871 Link: https://lore.kernel.org/netdev/20240723223109.2196886-1-kuba@kernel.org Fixes: e269d79c7d35 ("net: missing check virtio") Cc: stable@vger.kernel.org Signed-off-by: Willem de Bruijn willemb@google.com Link: https://patch.msgid.link/20240729201108.1615114-1-willemdebruijn.kernel@gmai... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/virtio_net.h | 16 +++++----------- net/ipv4/tcp_offload.c | 3 +++ net/ipv4/udp_offload.c | 4 ++++ 3 files changed, 12 insertions(+), 11 deletions(-)
--- a/include/linux/virtio_net.h +++ b/include/linux/virtio_net.h @@ -51,7 +51,6 @@ static inline int virtio_net_hdr_to_skb( unsigned int thlen = 0; unsigned int p_off = 0; unsigned int ip_proto; - u64 ret, remainder, gso_size;
if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) { switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) { @@ -88,16 +87,6 @@ static inline int virtio_net_hdr_to_skb( u32 off = __virtio16_to_cpu(little_endian, hdr->csum_offset); u32 needed = start + max_t(u32, thlen, off + sizeof(__sum16));
- if (hdr->gso_size) { - gso_size = __virtio16_to_cpu(little_endian, hdr->gso_size); - ret = div64_u64_rem(skb->len, gso_size, &remainder); - if (!(ret && (hdr->gso_size > needed) && - ((remainder > needed) || (remainder == 0)))) { - return -EINVAL; - } - skb_shinfo(skb)->tx_flags |= SKBFL_SHARED_FRAG; - } - if (!pskb_may_pull(skb, needed)) return -EINVAL;
@@ -170,6 +159,11 @@ retry: if (gso_type != SKB_GSO_UDP_L4) return -EINVAL; break; + case SKB_GSO_TCPV4: + case SKB_GSO_TCPV6: + if (skb->csum_offset != offsetof(struct tcphdr, check)) + return -EINVAL; + break; }
/* Kernel has a special handling for GSO_BY_FRAGS. */ --- a/net/ipv4/tcp_offload.c +++ b/net/ipv4/tcp_offload.c @@ -72,6 +72,9 @@ struct sk_buff *tcp_gso_segment(struct s if (thlen < sizeof(*th)) goto out;
+ if (unlikely(skb_checksum_start(skb) != skb_transport_header(skb))) + goto out; + if (!pskb_may_pull(skb, thlen)) goto out;
--- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -277,6 +277,10 @@ struct sk_buff *__udp_gso_segment(struct if (gso_skb->len <= sizeof(*uh) + mss) return ERR_PTR(-EINVAL);
+ if (unlikely(skb_checksum_start(gso_skb) != + skb_transport_header(gso_skb))) + return ERR_PTR(-EINVAL); + if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) { /* Packet is from an untrusted source, reset gso_segs. */ skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
commit 3caf31e7b18a90b74a2709d761a0dfa423f2c2e4 upstream.
This documentation wasn't added in the original patch, add it now.
Reported-by: Stephen Rothwell sfr@canb.auug.org.au Fixes: 6e4c0d0460bd ("wifi: mac80211: add a workaround for receiving non-standard mesh A-MSDU") Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mac80211/sta_info.h | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/mac80211/sta_info.h +++ b/net/mac80211/sta_info.h @@ -621,6 +621,8 @@ struct link_sta_info { * taken from HT/VHT capabilities or VHT operating mode notification * @cparams: CoDel parameters for this station. * @reserved_tid: reserved TID (if any, otherwise IEEE80211_TID_UNRESERVED) + * @amsdu_mesh_control: track the mesh A-MSDU format used by the peer + * (-1: not yet known, 0: non-standard [without mesh header], 1: standard) * @fast_tx: TX fastpath information * @fast_rx: RX fastpath information * @tdls_chandef: a TDLS peer can have a wider chandef that is compatible to
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
commit f355f70145744518ca1d9799b42f4a8da9aa0d36 upstream.
If a packet has reached its intended destination, it was bumped to the code that accepts it, without first checking if a mesh_path needs to be created based on the discovered source. Fix this by moving the destination address check further down.
Cc: stable@vger.kernel.org Fixes: 986e43b19ae9 ("wifi: mac80211: fix receiving A-MSDU frames on mesh interfaces") Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://lore.kernel.org/r/20230314095956.62085-3-nbd@nbd.name Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mac80211/rx.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-)
--- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -2770,17 +2770,6 @@ ieee80211_rx_mesh_data(struct ieee80211_ mesh_rmc_check(sdata, eth->h_source, mesh_hdr)) return RX_DROP_MONITOR;
- /* Frame has reached destination. Don't forward */ - if (ether_addr_equal(sdata->vif.addr, eth->h_dest)) - goto rx_accept; - - if (!ifmsh->mshcfg.dot11MeshForwarding) { - if (is_multicast_ether_addr(eth->h_dest)) - goto rx_accept; - - return RX_DROP_MONITOR; - } - /* forward packet */ if (sdata->crypto_tx_tailroom_needed_cnt) tailroom = IEEE80211_ENCRYPT_TAILROOM; @@ -2819,6 +2808,17 @@ ieee80211_rx_mesh_data(struct ieee80211_ rcu_read_unlock(); }
+ /* Frame has reached destination. Don't forward */ + if (ether_addr_equal(sdata->vif.addr, eth->h_dest)) + goto rx_accept; + + if (!ifmsh->mshcfg.dot11MeshForwarding) { + if (is_multicast_ether_addr(eth->h_dest)) + goto rx_accept; + + return RX_DROP_MONITOR; + } + skb_set_queue_mapping(skb, ieee802_1d_to_ac[skb->priority]);
ieee80211_fill_mesh_addresses(&hdr, &hdr.frame_control,
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
commit 8f0149a8ac59c12cd47271ac625c27dac5621d3a upstream.
Linearize packets (needed for forwarding A-MSDU subframes).
Fixes: 986e43b19ae9 ("wifi: mac80211: fix receiving A-MSDU frames on mesh interfaces") Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://lore.kernel.org/r/20230324120924.38412-1-nbd@nbd.name Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mac80211/rx.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -2838,6 +2838,9 @@ ieee80211_rx_mesh_data(struct ieee80211_
if (skb_cow_head(fwd_skb, hdrlen - sizeof(struct ethhdr))) return RX_DROP_UNUSABLE; + + if (skb_linearize(fwd_skb)) + return RX_DROP_UNUSABLE; }
fwd_hdr = skb_push(fwd_skb, hdrlen - sizeof(struct ethhdr));
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
commit 899c2c11810cfe38cb01c847d0df98e181ea5728 upstream.
Adjust the network header to point at the correct payload offset
Fixes: 986e43b19ae9 ("wifi: mac80211: fix receiving A-MSDU frames on mesh interfaces") Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://lore.kernel.org/r/20230324120924.38412-2-nbd@nbd.name Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mac80211/rx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -2855,7 +2855,7 @@ ieee80211_rx_mesh_data(struct ieee80211_ hdrlen += ETH_ALEN; else fwd_skb->protocol = htons(fwd_skb->len - hdrlen); - skb_set_network_header(fwd_skb, hdrlen); + skb_set_network_header(fwd_skb, hdrlen + 2);
info = IEEE80211_SKB_CB(fwd_skb); memset(info, 0, sizeof(*info));
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
commit e26c0946a5c1aa4d27f8dfe78f2a72b4550df91f upstream.
When forwarding is set to 0, frames are typically sent with ttl=1. Move the ttl decrement check below the check for local receive in order to fix packet drops.
Reported-by: Thomas Hühn thomas.huehn@hs-nordhausen.de Reported-by: Nick Hainke vincent@systemli.org Fixes: 986e43b19ae9 ("wifi: mac80211: fix receiving A-MSDU frames on mesh interfaces") Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://lore.kernel.org/r/20230326151709.17743-1-nbd@nbd.name Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mac80211/rx.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -2774,14 +2774,6 @@ ieee80211_rx_mesh_data(struct ieee80211_ if (sdata->crypto_tx_tailroom_needed_cnt) tailroom = IEEE80211_ENCRYPT_TAILROOM;
- if (!--mesh_hdr->ttl) { - if (multicast) - goto rx_accept; - - IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, dropped_frames_ttl); - return RX_DROP_MONITOR; - } - if (mesh_hdr->flags & MESH_FLAGS_AE) { struct mesh_path *mppath; char *proxied_addr; @@ -2812,6 +2804,14 @@ ieee80211_rx_mesh_data(struct ieee80211_ if (ether_addr_equal(sdata->vif.addr, eth->h_dest)) goto rx_accept;
+ if (!--mesh_hdr->ttl) { + if (multicast) + goto rx_accept; + + IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, dropped_frames_ttl); + return RX_DROP_MONITOR; + } + if (!ifmsh->mshcfg.dot11MeshForwarding) { if (is_multicast_ether_addr(eth->h_dest)) goto rx_accept;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
commit 4d78e032fee5d532e189cdb2c3c76112094e9751 upstream.
These were unintentional copy&paste mistakes.
Cc: stable@vger.kernel.org Fixes: 986e43b19ae9 ("wifi: mac80211: fix receiving A-MSDU frames on mesh interfaces") Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://lore.kernel.org/r/20230330090001.60750-1-nbd@nbd.name Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mac80211/rx.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -2904,7 +2904,7 @@ __ieee80211_rx_h_amsdu(struct ieee80211_ struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data; __le16 fc = hdr->frame_control; struct sk_buff_head frame_list; - static ieee80211_rx_result res; + ieee80211_rx_result res; struct ethhdr ethhdr; const u8 *check_da = ethhdr.h_dest, *check_sa = ethhdr.h_source;
@@ -3045,7 +3045,7 @@ ieee80211_rx_h_data(struct ieee80211_rx_ struct net_device *dev = sdata->dev; struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)rx->skb->data; __le16 fc = hdr->frame_control; - static ieee80211_rx_result res; + ieee80211_rx_result res; bool port_control; int err;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
commit a16fc38315f2c69c520ee769976ecb9c706b8560 upstream.
rx->sta->amsdu_mesh_control is being passed to ieee80211_amsdu_to_8023s without checking rx->sta. Since it doesn't make sense to accept A-MSDU packets without a sta, simply add a check earlier.
Fixes: 6e4c0d0460bd ("wifi: mac80211: add a workaround for receiving non-standard mesh A-MSDU") Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://lore.kernel.org/r/20230330090001.60750-2-nbd@nbd.name Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mac80211/rx.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -2938,7 +2938,7 @@ __ieee80211_rx_h_amsdu(struct ieee80211_ data_offset, true)) return RX_DROP_UNUSABLE;
- if (rx->sta && rx->sta->amsdu_mesh_control < 0) { + if (rx->sta->amsdu_mesh_control < 0) { bool valid_std = ieee80211_is_valid_amsdu(skb, true); bool valid_nonstd = ieee80211_is_valid_amsdu(skb, false);
@@ -3014,7 +3014,7 @@ ieee80211_rx_h_amsdu(struct ieee80211_rx } }
- if (is_multicast_ether_addr(hdr->addr1)) + if (is_multicast_ether_addr(hdr->addr1) || !rx->sta) return RX_DROP_UNUSABLE;
if (rx->key) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
commit fec3ebb5ed299ac3a998f011c380f2ded47f4866 upstream.
Fix ethernet header length field after stripping the mesh header
Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/CT5GNZSK28AI.2K6M69OXM9RW5@syracuse/ Fixes: 986e43b19ae9 ("wifi: mac80211: fix receiving A-MSDU frames on mesh interfaces") Reported-and-tested-by: Nicolas Escande nico.escande@gmail.com Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://lore.kernel.org/r/20230711115052.68430-1-nbd@nbd.name Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/wireless/util.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -580,6 +580,8 @@ int ieee80211_strip_8023_mesh_hdr(struct hdrlen += ETH_ALEN + 2; else if (!pskb_may_pull(skb, hdrlen)) return -EINVAL; + else + payload.eth.h_proto = htons(skb->len - hdrlen);
mesh_addr = skb->data + sizeof(payload.eth) + ETH_ALEN; switch (payload.flags & MESH_FLAGS_AE) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Gruenbacher agruenba@redhat.com
commit 52954b750958dcab9e44935f0c32643279091c85 upstream.
On a thawed filesystem, the freeze glock is held in shared mode. In order to initiate a cluster-wide freeze, the node initiating the freeze drops the freeze glock and grabs it in exclusive mode. The other nodes recognize this as contention on the freeze glock; function freeze_go_callback is invoked. This indicates to them that they must freeze the filesystem locally, drop the freeze glock, and then re-acquire it in shared mode before being able to unfreeze the filesystem locally.
While a node is trying to re-acquire the freeze glock in shared mode, additional contention can occur. In that case, the node must behave in the same way as above.
Unfortunately, freeze_go_callback() contains a check that causes it to bail out when the freeze glock isn't held in shared mode. Fix that to allow the glock to be unlocked or held in shared mode.
In addition, update a reference to trylock_super() which has been renamed to super_trylock_shared() in the meantime.
Fixes: b77b4a4815a9 ("gfs2: Rework freeze / thaw logic") Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/gfs2/glops.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/fs/gfs2/glops.c +++ b/fs/gfs2/glops.c @@ -566,15 +566,16 @@ static void freeze_go_callback(struct gf struct super_block *sb = sdp->sd_vfs;
if (!remote || - gl->gl_state != LM_ST_SHARED || + (gl->gl_state != LM_ST_SHARED && + gl->gl_state != LM_ST_UNLOCKED) || gl->gl_demote_state != LM_ST_UNLOCKED) return;
/* * Try to get an active super block reference to prevent racing with - * unmount (see trylock_super()). But note that unmount isn't the only - * place where a write lock on s_umount is taken, and we can fail here - * because of things like remount as well. + * unmount (see super_trylock_shared()). But note that unmount isn't + * the only place where a write lock on s_umount is taken, and we can + * fail here because of things like remount as well. */ if (down_read_trylock(&sb->s_umount)) { atomic_inc(&sb->s_active);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Gruenbacher agruenba@redhat.com
commit 0cdc6f44e9fdc2d20d720145bf99a39f611f6d61 upstream.
In gfs2_fill_super(), when mounting a gfs2 filesystem is interrupted, kthread_create() can return -EINTR. When that happens, we roll back what has already been done and abort the mount.
Since commit 62dd0f98a0e5 ("gfs2: Flag a withdraw if init_threads() fails), we are calling gfs2_withdraw_delayed() in gfs2_fill_super(); first via gfs2_make_fs_rw(), then directly. But gfs2_withdraw_delayed() only marks the filesystem as withdrawing and relies on a caller further up the stack to do the actual withdraw, which doesn't exist in the gfs2_fill_super() case. Because the filesystem is marked as withdrawing / withdrawn, function gfs2_lm_unmount() doesn't release the dlm lockspace, so when we try to mount that filesystem again, we get:
gfs2: fsid=gohan:gohan0: Trying to join cluster "lock_dlm", "gohan:gohan0" gfs2: fsid=gohan:gohan0: dlm_new_lockspace error -17
Since commit b77b4a4815a9 ("gfs2: Rework freeze / thaw logic"), the deadlock this gfs2_withdraw_delayed() call was supposed to work around cannot occur anymore because freeze_go_callback() won't take the sb->s_umount semaphore unconditionally anymore, so we can get rid of the gfs2_withdraw_delayed() in gfs2_fill_super() entirely.
Reported-by: Alexander Aring aahringo@redhat.com Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Cc: stable@vger.kernel.org # v6.5+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/gfs2/ops_fstype.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -1259,10 +1259,8 @@ static int gfs2_fill_super(struct super_
if (!sb_rdonly(sb)) { error = init_threads(sdp); - if (error) { - gfs2_withdraw_delayed(sdp); + if (error) goto fail_per_node; - } }
error = gfs2_freeze_lock_shared(sdp, &sdp->sd_freeze_gh, 0);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Gruenbacher agruenba@redhat.com
commit 0b93bac2271e11beb980fca037a34a9819c7dc37 upstream.
The last user of this flag was removed in commit b77b4a4815a9 ("gfs2: Rework freeze / thaw logic").
Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/filesystems/gfs2-glocks.rst | 3 +-- fs/gfs2/glock.c | 23 ++++++----------------- fs/gfs2/glock.h | 9 --------- fs/gfs2/lock_dlm.c | 5 ----- 4 files changed, 7 insertions(+), 33 deletions(-)
--- a/Documentation/filesystems/gfs2-glocks.rst +++ b/Documentation/filesystems/gfs2-glocks.rst @@ -20,8 +20,7 @@ The gl_holders list contains all the que just the holders) associated with the glock. If there are any held locks, then they will be contiguous entries at the head of the list. Locks are granted in strictly the order that they -are queued, except for those marked LM_FLAG_PRIORITY which are -used only during recovery, and even then only for journal locks. +are queued.
There are three lock states that users of the glock layer can request, namely shared (SH), deferred (DF) and exclusive (EX). Those translate --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -661,8 +661,7 @@ static void finish_xmote(struct gfs2_glo if (gh && !test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags)) { /* move to back of queue and try next entry */ if (ret & LM_OUT_CANCELED) { - if ((gh->gh_flags & LM_FLAG_PRIORITY) == 0) - list_move_tail(&gh->gh_list, &gl->gl_holders); + list_move_tail(&gh->gh_list, &gl->gl_holders); gh = find_first_waiter(gl); gl->gl_target = gh->gh_state; goto retry; @@ -749,8 +748,7 @@ __acquires(&gl->gl_lockref.lock) gh && !(gh->gh_flags & LM_FLAG_NOEXP)) goto skip_inval;
- lck_flags &= (LM_FLAG_TRY | LM_FLAG_TRY_1CB | LM_FLAG_NOEXP | - LM_FLAG_PRIORITY); + lck_flags &= (LM_FLAG_TRY | LM_FLAG_TRY_1CB | LM_FLAG_NOEXP); GLOCK_BUG_ON(gl, gl->gl_state == target); GLOCK_BUG_ON(gl, gl->gl_state == gl->gl_target); if ((target == LM_ST_UNLOCKED || target == LM_ST_DEFERRED) && @@ -1528,27 +1526,20 @@ fail: } if (test_bit(HIF_HOLDER, &gh2->gh_iflags)) continue; - if (unlikely((gh->gh_flags & LM_FLAG_PRIORITY) && !insert_pt)) - insert_pt = &gh2->gh_list; } trace_gfs2_glock_queue(gh, 1); gfs2_glstats_inc(gl, GFS2_LKS_QCOUNT); gfs2_sbstats_inc(gl, GFS2_LKS_QCOUNT); if (likely(insert_pt == NULL)) { list_add_tail(&gh->gh_list, &gl->gl_holders); - if (unlikely(gh->gh_flags & LM_FLAG_PRIORITY)) - goto do_cancel; return; } list_add_tail(&gh->gh_list, insert_pt); -do_cancel: gh = list_first_entry(&gl->gl_holders, struct gfs2_holder, gh_list); - if (!(gh->gh_flags & LM_FLAG_PRIORITY)) { - spin_unlock(&gl->gl_lockref.lock); - if (sdp->sd_lockstruct.ls_ops->lm_cancel) - sdp->sd_lockstruct.ls_ops->lm_cancel(gl); - spin_lock(&gl->gl_lockref.lock); - } + spin_unlock(&gl->gl_lockref.lock); + if (sdp->sd_lockstruct.ls_ops->lm_cancel) + sdp->sd_lockstruct.ls_ops->lm_cancel(gl); + spin_lock(&gl->gl_lockref.lock); return;
trap_recursive: @@ -2296,8 +2287,6 @@ static const char *hflags2str(char *buf, *p++ = 'e'; if (flags & LM_FLAG_ANY) *p++ = 'A'; - if (flags & LM_FLAG_PRIORITY) - *p++ = 'p'; if (flags & LM_FLAG_NODE_SCOPE) *p++ = 'n'; if (flags & GL_ASYNC) --- a/fs/gfs2/glock.h +++ b/fs/gfs2/glock.h @@ -68,14 +68,6 @@ enum { * also be granted in SHARED. The preferred state is whichever is compatible * with other granted locks, or the specified state if no other locks exist. * - * LM_FLAG_PRIORITY - * Override fairness considerations. Suppose a lock is held in a shared state - * and there is a pending request for the deferred state. A shared lock - * request with the priority flag would be allowed to bypass the deferred - * request and directly join the other shared lock. A shared lock request - * without the priority flag might be forced to wait until the deferred - * requested had acquired and released the lock. - * * LM_FLAG_NODE_SCOPE * This holder agrees to share the lock within this node. In other words, * the glock is held in EX mode according to DLM, but local holders on the @@ -86,7 +78,6 @@ enum { #define LM_FLAG_TRY_1CB 0x0002 #define LM_FLAG_NOEXP 0x0004 #define LM_FLAG_ANY 0x0008 -#define LM_FLAG_PRIORITY 0x0010 #define LM_FLAG_NODE_SCOPE 0x0020 #define GL_ASYNC 0x0040 #define GL_EXACT 0x0080 --- a/fs/gfs2/lock_dlm.c +++ b/fs/gfs2/lock_dlm.c @@ -222,11 +222,6 @@ static u32 make_flags(struct gfs2_glock lkf |= DLM_LKF_NOQUEUEBAST; }
- if (gfs_flags & LM_FLAG_PRIORITY) { - lkf |= DLM_LKF_NOORDER; - lkf |= DLM_LKF_HEADQUE; - } - if (gfs_flags & LM_FLAG_ANY) { if (req == DLM_LOCK_PR) lkf |= DLM_LKF_ALTCW;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Gruenbacher agruenba@redhat.com
commit bbacb395ac5c57290cdfd02389788cbce64c237e upstream.
Before commit b77b4a4815a9 ("gfs2: Rework freeze / thaw logic"), the freeze glock was kept around in the glock cache in shared mode without being actively held while a filesystem is in thawed state. In that state, memory pressure could have eventually evicted the freeze glock, and the freeze_go_demote_ok callback was needed to prevent that from happening.
With the freeze / thaw rework, the freeze glock is now always actively held in shared mode while a filesystem is thawed, and the freeze_go_demote_ok hack is no longer needed.
Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/gfs2/glops.c | 13 ------------- 1 file changed, 13 deletions(-)
--- a/fs/gfs2/glops.c +++ b/fs/gfs2/glops.c @@ -613,18 +613,6 @@ static int freeze_go_xmote_bh(struct gfs }
/** - * freeze_go_demote_ok - * @gl: the glock - * - * Always returns 0 - */ - -static int freeze_go_demote_ok(const struct gfs2_glock *gl) -{ - return 0; -} - -/** * iopen_go_callback - schedule the dcache entry for the inode to be deleted * @gl: the glock * @remote: true if this came from a different cluster node @@ -748,7 +736,6 @@ const struct gfs2_glock_operations gfs2_
const struct gfs2_glock_operations gfs2_freeze_glops = { .go_xmote_bh = freeze_go_xmote_bh, - .go_demote_ok = freeze_go_demote_ok, .go_callback = freeze_go_callback, .go_type = LM_TYPE_NONDISK, .go_flags = GLOF_NONDISK,
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
commit b128ed5ab27330deeeaf51ea8bb69f1442a96f7f upstream.
When assembling fraglist GSO packets, udp4_gro_complete does not set skb->csum_start, which makes the extra validation in __udp_gso_segment fail.
Fixes: 89add40066f9 ("net: drop bad gso csum_start and offset in virtio_net_hdr") Signed-off-by: Felix Fietkau nbd@nbd.name Reviewed-by: Willem de Bruijn willemb@google.com Link: https://patch.msgid.link/20240819150621.59833-1-nbd@nbd.name Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/udp_offload.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -278,7 +278,8 @@ struct sk_buff *__udp_gso_segment(struct return ERR_PTR(-EINVAL);
if (unlikely(skb_checksum_start(gso_skb) != - skb_transport_header(gso_skb))) + skb_transport_header(gso_skb) && + !(skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST))) return ERR_PTR(-EINVAL);
if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jesse Brandeburg jesse.brandeburg@intel.com
commit 66ceaa4c4507f2b598d37b528796dd34158d31bf upstream.
make modules W=1 returns: .../ice/ice_txrx_lib.c:448: warning: Function parameter or member 'first_idx' not described in 'ice_finalize_xdp_rx' .../ice/ice_txrx.c:948: warning: Function parameter or member 'ntc' not described in 'ice_get_rx_buf' .../ice/ice_txrx.c:1038: warning: Excess function parameter 'rx_buf' description in 'ice_construct_skb'
Fix these warnings by adding and deleting the deviant arguments.
Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side") Fixes: d7956d81f150 ("ice: Pull out next_to_clean bump out of ice_put_rx_buf()") CC: Maciej Fijalkowski maciej.fijalkowski@intel.com Signed-off-by: Jesse Brandeburg jesse.brandeburg@intel.com Reviewed-by: Piotr Raczynski piotr.raczynski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/intel/ice/ice_txrx.c | 2 +- drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 1 + 2 files changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -892,6 +892,7 @@ ice_reuse_rx_page(struct ice_rx_ring *rx * ice_get_rx_buf - Fetch Rx buffer and synchronize data for use * @rx_ring: Rx descriptor ring to transact packets on * @size: size of buffer to add to skb + * @ntc: index of next to clean element * * This function will pull an Rx buffer from the ring and synchronize it * for use by the CPU. @@ -973,7 +974,6 @@ ice_build_skb(struct ice_rx_ring *rx_rin /** * ice_construct_skb - Allocate skb and populate it * @rx_ring: Rx descriptor ring to transact packets on - * @rx_buf: Rx buffer to pull data from * @xdp: xdp_buff pointing to the data * * This function allocates an skb. It then populates it with the page --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c @@ -349,6 +349,7 @@ int ice_xmit_xdp_buff(struct xdp_buff *x * ice_finalize_xdp_rx - Bump XDP Tx tail and/or flush redirect map * @xdp_ring: XDP ring * @xdp_res: Result of the receive batch + * @first_idx: index to write from caller * * This function bumps XDP Tx tail and/or flush redirect map, and * should be called when a batch of packets has been processed in the
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Kleikamp dave.kleikamp@oracle.com
commit e42e29cc442395d62f1a8963ec2dfb700ba6a5d7 upstream.
This reverts commit cca974daeb6c43ea971f8ceff5a7080d7d49ee30.
The added sanity check is incorrect. BUDMIN is not the wrong value and is too small.
Signed-off-by: Dave Kleikamp dave.kleikamp@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/jfs/jfs_dmap.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)
--- a/fs/jfs/jfs_dmap.c +++ b/fs/jfs/jfs_dmap.c @@ -2765,9 +2765,7 @@ static int dbBackSplit(dmtree_t *tp, int * leafno - the number of the leaf to be updated. * newval - the new value for the leaf. * - * RETURN VALUES: - * 0 - success - * -EIO - i/o error + * RETURN VALUES: none */ static int dbJoin(dmtree_t *tp, int leafno, int newval, bool is_ctl) { @@ -2794,10 +2792,6 @@ static int dbJoin(dmtree_t *tp, int leaf * get the buddy size (number of words covered) of * the new value. */ - - if ((newval - tp->dmt_budmin) > BUDMIN) - return -EIO; - budsz = BUDSIZE(newval, tp->dmt_budmin);
/* try to join.
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuri Benditovich yuri.benditovich@daynix.com
commit 1382e3b6a3500c245e5278c66d210c02926f804f upstream.
The commit fc8b2a619469 ("net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation") adds check of potential number of UDP segments vs UDP_MAX_SEGMENTS in linux/virtio_net.h. After this change certification test of USO guest-to-guest transmit on Windows driver for virtio-net device fails, for example with packet size of ~64K and mss of 536 bytes. In general the USO should not be more restrictive than TSO. Indeed, in case of unreasonably small mss a lot of segments can cause queue overflow and packet loss on the destination. Limit of 128 segments is good for any practical purpose, with minimal meaningful mss of 536 the maximal UDP packet will be divided to ~120 segments. The number of segments for UDP packets is validated vs UDP_MAX_SEGMENTS also in udp.c (v4,v6), this does not affect quest-to-guest path but does affect packets sent to host, for example. It is important to mention that UDP_MAX_SEGMENTS is kernel-only define and not available to user mode socket applications. In order to request MSS smaller than MTU the applications just uses setsockopt with SOL_UDP and UDP_SEGMENT and there is no limitations on socket API level.
Fixes: fc8b2a619469 ("net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation") Signed-off-by: Yuri Benditovich yuri.benditovich@daynix.com Reviewed-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/udp.h | 2 +- tools/testing/selftests/net/udpgso.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/include/linux/udp.h +++ b/include/linux/udp.h @@ -102,7 +102,7 @@ struct udp_sock { #define udp_assign_bit(nr, sk, val) \ assign_bit(UDP_FLAGS_##nr, &udp_sk(sk)->udp_flags, val)
-#define UDP_MAX_SEGMENTS (1 << 6UL) +#define UDP_MAX_SEGMENTS (1 << 7UL)
static inline struct udp_sock *udp_sk(const struct sock *sk) { --- a/tools/testing/selftests/net/udpgso.c +++ b/tools/testing/selftests/net/udpgso.c @@ -34,7 +34,7 @@ #endif
#ifndef UDP_MAX_SEGMENTS -#define UDP_MAX_SEGMENTS (1 << 6UL) +#define UDP_MAX_SEGMENTS (1 << 7UL) #endif
#define CONST_MTU_TEST 1500
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit a71d0908e32f3dd41e355d83eeadd44d94811fd6 upstream.
The helper waiting for a listener port can match any socket whose hexadecimal representation of source or destination addresses matches that of the given port.
Additionally, any socket state is accepted.
All the above can let the helper return successfully before the relevant listener is actually ready, with unexpected results.
So far I could not find any related failure in the netdev CI, but the next patch is going to make the critical event more easily reproducible.
Address the issue matching the port hex only vs the relevant socket field and additionally checking the socket state for TCP sockets.
Fixes: 3bdd9fd29cb0 ("selftests/net: synchronize udpgro tests' tx and rx connection") Signed-off-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/192b3dbc443d953be32991d1b0ca432bd4c65008.170773108... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/net_helper.sh | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/tools/testing/selftests/net/net_helper.sh +++ b/tools/testing/selftests/net/net_helper.sh @@ -8,13 +8,16 @@ wait_local_port_listen() local listener_ns="${1}" local port="${2}" local protocol="${3}" - local port_hex + local pattern local i
- port_hex="$(printf "%04X" "${port}")" + pattern=":$(printf "%04X" "${port}") " + + # for tcp protocol additionally check the socket state + [ ${protocol} = "tcp" ] && pattern="${pattern}0A" for i in $(seq 10); do - if ip netns exec "${listener_ns}" cat /proc/net/"${protocol}"* | \ - grep -q "${port_hex}"; then + if ip netns exec "${listener_ns}" awk '{print $2" "$4}' \ + /proc/net/"${protocol}"* | grep -q "${pattern}"; then break fi sleep 0.1
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit 99d3bf5f7377d42f8be60a6b9cb60fb0be34dceb upstream.
syzbot is reporting too large allocation at input_mt_init_slots(), for num_slots is supplied from userspace using ioctl(UI_DEV_CREATE).
Since nobody knows possible max slots, this patch chose 1024.
Reported-by: syzbot syzbot+0122fa359a69694395d5@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0122fa359a69694395d5 Suggested-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Cc: George Kennedy george.kennedy@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/input/input-mt.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/input/input-mt.c +++ b/drivers/input/input-mt.c @@ -46,6 +46,9 @@ int input_mt_init_slots(struct input_dev return 0; if (mt) return mt->num_slots != num_slots ? -EINVAL : 0; + /* Arbitrary limit for avoiding too large memory allocation. */ + if (num_slots > 1024) + return -EINVAL;
mt = kzalloc(struct_size(mt, slots, num_slots), GFP_KERNEL); if (!mt)
On 8/27/24 07:35, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.107 release. There are 321 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.107-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
Loads and loads of the below warning plus linking failure building perf:
../include/linux/bitmap.h: In function 'bitmap_zero': ../include/linux/bitmap.h:28:34: warning: implicit declaration of function 'ALIGN' [-Wimplicit-function-declaration] 28 | #define bitmap_size(nbits) (ALIGN(nbits, BITS_PER_LONG) / BITS_PER_BYTE) | ^~~~~ ../include/linux/bitmap.h:35:32: note: in expansion of macro 'bitmap_size' 35 | memset(dst, 0, bitmap_size(nbits)); | ^~~~~~~~~~~ CC /local/users/fainelli/buildroot/output/arm/build/li
/local/stbopt_p/toolchains_303/stbgcc-12.3-1.0/bin/../lib/gcc/arm-unknown-linux-gnueabihf/12.3.0/../../../../arm-unknown-linux-gnueabihf/bin/ld: /local/users/fainelli/buildroot/output/arm/build/linux-custom/tools/perf/perf-in.o: in function `bitmap_zero': /local/users/fainelli/buildroot/output/arm/build/linux-custom/tools/perf/../include/linux/bitmap.h:35: undefined reference to `ALIGN' /local/stbopt_p/toolchains_303/stbgcc-12.3-1.0/bin/../lib/gcc/arm-unknown-linux-gnueabihf/12.3.0/../../../../arm-unknown-linux-gnueabihf/bin/ld: /local/users/fainelli/buildroot/output/arm/build/linux-custom/tools/perf/perf-in.o: in function `bitmap_zalloc': /local/users/fainelli/buildroot/output/arm/build/linux-custom/tools/perf/../include/linux/bitmap.h:121: undefined reference to `ALIGN' /local/stbopt_p/toolchains_303/stbgcc-12.3-1.0/bin/../lib/gcc/arm-unknown-linux-gnueabihf/12.3.0/../../../../arm-unknown-linux-gnueabihf/bin/ld: /local/users/fainelli/buildroot/output/arm/build/linux-custom/tools/perf/../include/linux/bitmap.h:121: undefined reference to `ALIGN' /local/stbopt_p/toolchains_303/stbgcc-12.3-1.0/bin/../lib/gcc/arm-unknown-linux-gnueabihf/12.3.0/../../../../arm-unknown-linux-gnueabihf/bin/ld: /local/users/fainelli/buildroot/output/arm/build/linux-custom/tools/perf/../include/linux/bitmap.h:121: undefined reference to `ALIGN' /local/stbopt_p/toolchains_303/stbgcc-12.3-1.0/bin/../lib/gcc/arm-unknown-linux-gnueabihf/12.3.0/../../../../arm-unknown-linux-gnueabihf/bin/ld: /local/users/fainelli/buildroot/output/arm/build/linux-custom/tools/perf/../include/linux/bitmap.h:121: undefined reference to `ALIGN' /local/stbopt_p/toolchains_303/stbgcc-12.3-1.0/bin/../lib/gcc/arm-unknown-linux-gnueabihf/12.3.0/../../../../arm-unknown-linux-gnueabihf/bin/ld: /local/users/fainelli/buildroot/output/arm/build/linux-custom/tools/perf/perf-in.o:/local/users/fainelli/buildroot/output/arm/build/linux-custom/tools/perf/../include/linux/bitmap.h:121: more undefined references to `ALIGN' follow collect2: error: ld returned 1 exit status make[4]: *** [Makefile.perf:675: /local/users/fainelli/buildroot/output/arm/build/linux-custom/tools/perf/perf] Error 1 make[3]: *** [Makefile.perf:240: sub-make] Error 2 make[2]: *** [Makefile:70: all] Error 2 make[1]: *** [package/pkg-generic.mk:294: /local/users/fainelli/buildroot/output/arm/build/linux-tools/.stamp_built] Error 2 make: *** [Makefile:29: _all] Error 2
This was already reported here for 5.10.225-rc1:
https://lore.kernel.org/all/96ad9bc6-1cac-4ebf-8385-661c0d4f1b99@gmail.com/
and it is the same root cause. -- Florian
On Tue, 27 Aug 2024 at 20:47, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.1.107 release. There are 321 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.107-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
The tinyconfig builds failed due to following build warnings / errors on the stable-rc linux-6.1.y and linux-6.6.y
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Build error: ------- In file included from /kernel/rcu/update.c:49: /kernel/rcu/rcu.h: In function 'debug_rcu_head_callback': /kernel/rcu/rcu.h:218:17: error: implicit declaration of function 'kmem_dump_obj'; did you mean 'mem_dump_obj'? [-Werror=implicit-function-declaration] 218 | kmem_dump_obj(rhp); | ^~~~~~~~~~~~~ | mem_dump_obj cc1: some warnings being treated as errors
Build log links, ------ - https://storage.tuxsuite.com/public/linaro/lkft/builds/2lFQ5BrEWOun7OOelPa1R...
metadata: ---- git describe: v6.1.106-322-gb9218327d235 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git sha: b9218327d235d21e2e82c8dc6a8ef4a389c9c6a6 kernel config: https://storage.tuxsuite.com/public/linaro/lkft/builds/2lFQ5BrEWOun7OOelPa1R... build url: https://storage.tuxsuite.com/public/linaro/lkft/builds/2lFQ5BrEWOun7OOelPa1R... toolchain: clang-18 and gcc-13 config: tinyconfig
steps to reproduce: ------ # tuxmake --runtime podman --target-arch arm --toolchain gcc-13 --kconfig tinyconfig
-- Linaro LKFT https://lkft.linaro.org
On Tue, Aug 27, 2024 at 11:37:42PM +0530, Naresh Kamboju wrote:
On Tue, 27 Aug 2024 at 20:47, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.1.107 release. There are 321 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.107-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
The tinyconfig builds failed due to following build warnings / errors on the stable-rc linux-6.1.y and linux-6.6.y
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Build error:
In file included from /kernel/rcu/update.c:49: /kernel/rcu/rcu.h: In function 'debug_rcu_head_callback': /kernel/rcu/rcu.h:218:17: error: implicit declaration of function 'kmem_dump_obj'; did you mean 'mem_dump_obj'? [-Werror=implicit-function-declaration] 218 | kmem_dump_obj(rhp); | ^~~~~~~~~~~~~ | mem_dump_obj cc1: some warnings being treated as errors
Also now fixed here, thanks.
greg k-h
Hello,
On Tue, 27 Aug 2024 16:35:08 +0200 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.1.107 release. There are 321 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.107-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
This rc kernel passes DAMON functionality test[1] on my test machine. Attaching the test results summary below. Please note that I retrieved the kernel from linux-stable-rc tree[2].
Tested-by: SeongJae Park sj@kernel.org
[1] https://github.com/awslabs/damon-tests/tree/next/corr [2] b9218327d235 ("Linux 6.1.107-rc1")
Thanks, SJ
[...]
---
ok 1 selftests: damon: debugfs_attrs.sh ok 2 selftests: damon: debugfs_schemes.sh ok 3 selftests: damon: debugfs_target_ids.sh ok 4 selftests: damon: debugfs_empty_targets.sh ok 5 selftests: damon: debugfs_huge_count_read_write.sh ok 6 selftests: damon: debugfs_duplicate_context_creation.sh ok 7 selftests: damon: sysfs.sh ok 1 selftests: damon-tests: kunit.sh ok 2 selftests: damon-tests: huge_count_read_write.sh ok 3 selftests: damon-tests: buffer_overflow.sh ok 4 selftests: damon-tests: rm_contexts.sh ok 5 selftests: damon-tests: record_null_deref.sh ok 6 selftests: damon-tests: dbgfs_target_ids_read_before_terminate_race.sh ok 7 selftests: damon-tests: dbgfs_target_ids_pid_leak.sh ok 8 selftests: damon-tests: damo_tests.sh ok 9 selftests: damon-tests: masim-record.sh ok 10 selftests: damon-tests: build_i386.sh ok 11 selftests: damon-tests: build_arm64.sh ok 12 selftests: damon-tests: build_m68k.sh ok 13 selftests: damon-tests: build_i386_idle_flag.sh ok 14 selftests: damon-tests: build_i386_highpte.sh ok 15 selftests: damon-tests: build_nomemcg.sh [33m [92mPASS [39m
Am 27.08.2024 um 16:35 schrieb Greg Kroah-Hartman:
This is the start of the stable review cycle for the 6.1.107 release. There are 321 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Builds, boots and works on my 2-socket Ivy Bridge Xeon E5-2697 v2 server. No dmesg oddities or regressions found.
Tested-by: Peter Schneider pschneider1968@googlemail.com
Beste Grüße, Peter Schneider
Hi!
This is the start of the stable review cycle for the 6.1.107 release. There are 321 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
CIP testing did not find any problems here:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-6...
6.6 and 6.10 test ok too, for us.
Tested-by: Pavel Machek (CIP) pavel@denx.de
Best regards, Pavel
On Tue, Aug 27, 2024 at 04:35:08PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.107 release. There are 321 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Tested-by: Mark Brown broonie@kernel.org
On Tue, 27 Aug 2024 at 20:47, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.1.107 release. There are 321 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.107-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
The tinyconfig builds failed for all architectures on 6.1.107-rc1.
Builds: - clang-18-tinyconfig - clang-nightly-tinyconfig - gcc-13-tinyconfig - gcc-8-tinyconfig
lore links: - https://lore.kernel.org/stable/CA+G9fYuS47-zRgv9GY3XO54GN_4EHPFe7jGR50ZoChEY...
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 6.1.107-rc1 * git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git * git commit: b9218327d235d21e2e82c8dc6a8ef4a389c9c6a6 * git describe: v6.1.106-322-gb9218327d235 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.1.y/build/v6.1.10...
## Test Regressions (compared to v6.1.105-39-g09ce23af4dbb) * arm64, build * arm, build * i386, build * x86_64, build - clang-18-tinyconfig - clang-nightly-tinyconfig - gcc-13-tinyconfig - gcc-8-tinyconfig
## Metric Regressions (compared to v6.1.105-39-g09ce23af4dbb)
## Test Fixes (compared to v6.1.105-39-g09ce23af4dbb)
## Metric Fixes (compared to v6.1.105-39-g09ce23af4dbb)
## Test result summary total: 141679, pass: 122248, fail: 2025, skip: 17193, xfail: 213
## Build Summary * arc: 5 total, 4 passed, 1 failed * arm: 135 total, 131 passed, 4 failed * arm64: 41 total, 37 passed, 4 failed * i386: 28 total, 23 passed, 5 failed * mips: 26 total, 21 passed, 5 failed * parisc: 4 total, 3 passed, 1 failed * powerpc: 36 total, 21 passed, 15 failed * riscv: 11 total, 8 passed, 3 failed * s390: 14 total, 10 passed, 4 failed * sh: 10 total, 8 passed, 2 failed * sparc: 7 total, 5 passed, 2 failed * x86_64: 33 total, 29 passed, 4 failed
## Test suites summary * boot * commands * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-efivarfs * kselftest-exec * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-filesystems-epoll * kselftest-firmware * kselftest-fpu * kselftest-ftrace * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-kcmp * kselftest-kvm * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-mincore * kselftest-mqueue * kselftest-net * kselftest-net-mptcp * kselftest-openat2 * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user_events * kselftest-vDSO * kselftest-x86 * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-test * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-hugetlb * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-pty * ltp-sched * ltp-smoke * ltp-syscalls * ltp-tracing * perf * rcutorture
-- Linaro LKFT https://lkft.linaro.org
On Tue, 27 Aug 2024 16:35:08 +0200 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.1.107 release. There are 321 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.107-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
Boot-tested under QEMU for Rust x86_64.
Tested-by: Miguel Ojeda ojeda@kernel.org
Thanks!
Cheers, Miguel
On 8/27/24 7:35 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.107 release. There are 321 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.107-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
Hi!
This is the start of the stable review cycle for the 6.1.107 release. There are 321 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
I believe these have problems:
Jakub Kicinski kuba@kernel.org net: don't dump stack on queue timeout
This does not fix bug and will cause surprises for people parsing logs, as changelog explains
Christian Brauner brauner@kernel.org binfmt_misc: cleanup on filesystem umount
Changelog explains how this may cause problems. It does not fix a bug. It is overly long. It does not have proper signoff by stable team.
Li Nan linan122@huawei.com md: clean up invalid BUG_ON in md_ioctl
Cleanup, not a bugfix.
David Sterba dsterba@suse.com btrfs: change BUG_ON to assertion in tree_move_down()
Cleanup, not a bugfix.
David Sterba dsterba@suse.com btrfs: delete pointless BUG_ON check on quota root in btrfs_qgroup_account_extent()
Cleanup, not a bugfix.
Guanrui Huang guanrui.huang@linux.alibaba.com irqchip/gic-v3-its: Remove BUG_ON in its_vpe_irq_domain_alloc
Cleanup, not a bugfix.
Best regards, Pavel
On Thu, Aug 29, 2024 at 01:37:40PM +0200, Pavel Machek wrote:
Christian Brauner brauner@kernel.org binfmt_misc: cleanup on filesystem umount
Changelog explains how this may cause problems. It does not fix a bug. It is overly long. It does not have proper signoff by stable team.
The sign off is there, it's just further down than you might expect.
On Thu 2024-08-29 13:52:59, Greg Kroah-Hartman wrote:
On Thu, Aug 29, 2024 at 01:37:40PM +0200, Pavel Machek wrote:
Christian Brauner brauner@kernel.org binfmt_misc: cleanup on filesystem umount
Changelog explains how this may cause problems. It does not fix a bug. It is overly long. It does not have proper signoff by stable team.
The sign off is there, it's just further down than you might expect.
Is it? Who signed this off for stable?
cf7602cbd58246d02a8544e4f107658fe846137a
In line with our general policy, if we see a regression for systemd or other users with this change we will switch back to the old behavior for the initial binfmt_misc mount and have binary types pin the filesystem again. But while we touch this code let's take the chance and let's improve on the status quo.
[1]: https://lore.kernel.org/r/20191216091220.465626-2-laurent@vivier.eu [2]: commit 43a4f2619038 ("exec: binfmt_misc: fix race between load_misc_binary() and kill_node()" [3]: commit 83f918274e4b ("exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode()") [4]: commit f0fe2c0f050d ("binder: prevent UAF for binderfs devices II")
Link: https://lore.kernel.org/r/20211028103114.2849140-1-brauner@kernel.org (v1) Cc: Sargun Dhillon sargun@sargun.me Cc: Serge Hallyn serge@hallyn.com Cc: Jann Horn jannh@google.com Cc: Henning Schild henning.schild@siemens.com Cc: Andrei Vagin avagin@gmail.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: Laurent Vivier laurent@vivier.eu Cc: linux-fsdevel@vger.kernel.org Acked-by: Serge Hallyn serge@hallyn.com Signed-off-by: Christian Brauner christian.brauner@ubuntu.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Kees Cook keescook@chromium.org
Regards, Pavel
On Thu, Aug 29, 2024 at 02:11:57PM +0200, Pavel Machek wrote:
On Thu 2024-08-29 13:52:59, Greg Kroah-Hartman wrote:
On Thu, Aug 29, 2024 at 01:37:40PM +0200, Pavel Machek wrote:
Christian Brauner brauner@kernel.org binfmt_misc: cleanup on filesystem umount
Changelog explains how this may cause problems. It does not fix a bug. It is overly long. It does not have proper signoff by stable team.
The sign off is there, it's just further down than you might expect.
Is it? Who signed this off for stable?
cf7602cbd58246d02a8544e4f107658fe846137a
In line with our general policy, if we see a regression for systemd or other users with this change we will switch back to the old behavior for the initial binfmt_misc mount and have binary types pin the filesystem again. But while we touch this code let's take the chance and let's improve on the status quo.
[1]: https://lore.kernel.org/r/20191216091220.465626-2-laurent@vivier.eu [2]: commit 43a4f2619038 ("exec: binfmt_misc: fix race between load_misc_binary() and kill_node()" [3]: commit 83f918274e4b ("exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode()") [4]: commit f0fe2c0f050d ("binder: prevent UAF for binderfs devices II") Link: https://lore.kernel.org/r/20211028103114.2849140-1-brauner@kernel.org (v1) Cc: Sargun Dhillon sargun@sargun.me Cc: Serge Hallyn serge@hallyn.com Cc: Jann Horn jannh@google.com Cc: Henning Schild henning.schild@siemens.com Cc: Andrei Vagin avagin@gmail.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: Laurent Vivier laurent@vivier.eu Cc: linux-fsdevel@vger.kernel.org Acked-by: Serge Hallyn serge@hallyn.com Signed-off-by: Christian Brauner christian.brauner@ubuntu.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Kees Cook keescook@chromium.org
If you look at the actual patch in our tree, it shows this, and was in the original email.
Yes, git stripped it off here, but really, you should be saying "Hey, something looks wrong here, the patch has it but the git commit does not", which would have been a lot more helpful...
Anyway, I'll go fix this up in the quilt tree now, thanks.
greg k-h
On Thu, Aug 29, 2024 at 01:37:40PM +0200, Pavel Machek wrote:
Hi!
This is the start of the stable review cycle for the 6.1.107 release. There are 321 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
I believe these have problems:
Jakub Kicinski kuba@kernel.org net: don't dump stack on queue timeout
This does not fix bug and will cause surprises for people parsing logs, as changelog explains
And as the changelog explains, this fixes one of the most reported syzbot issues around, and as such, qualifies for a stable tree inclusion. Please be more through in your reviews. Same goes for the others "reported" here.
greg k-h
---- On Tue, 27 Aug 2024 20:05:08 +0530 Greg Kroah-Hartman wrote ---
This is the start of the stable review cycle for the 6.1.107 release. There are 321 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.107-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
Hi,
Please find the KernelCI report below :-
OVERVIEW
Builds: 25 passed, 2 failed
Boot tests: 412 passed, 0 failed
CI systems: broonie, maestro
REVISION
Commit name: v6.1.106-322-gb9218327d235 hash: b9218327d235d21e2e82c8dc6a8ef4a389c9c6a6 Checked out from https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
BUILDS
Failures -i386 (tinyconfig) Build detail: https://kcidb.kernelci.org/d/build/build?orgId=1&var-id=maestro:66cdf1ff... Build error: kernel/rcu/rcu.h:218:17: error: implicit declaration of function ‘kmem_dump_obj’; did you mean ‘mem_dump_obj’? [-Werror=implicit-function-declaration] -x86_64 (tinyconfig) Build detail: https://kcidb.kernelci.org/d/build/build?orgId=1&var-id=maestro:66cdf200... Build error: kernel/rcu/rcu.h:218:17: error: implicit declaration of function ‘kmem_dump_obj’; did you mean ‘mem_dump_obj’? [-Werror=implicit-function-declaration] CI system: maestro
BOOT TESTS
No new boot failures found
See complete and up-to-date report at:
https://kcidb.kernelci.org/d/revision/revision?orgId=1&var-git_commit_ha...
Tested-by: kernelci.org bot bot@kernelci.org
Thanks, KernelCI team
On 8/27/24 07:35, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.107 release. There are 321 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
[ ... ]
Abdulrasaq Lawani abdulrasaqolawani@gmail.com fbdev: offb: replace of_node_put with __free(device_node)
This patch triggers:
Building powerpc:defconfig ... failed Building powerpc:ppc64e_defconfig ... failed Building powerpc:ppc6xx_defconfig ... failed -------------- Error log: drivers/video/fbdev/offb.c: In function 'offb_init_palette_hacks': drivers/video/fbdev/offb.c:358:24: error: cleanup argument not a function 358 | struct device_node *pciparent __free(device_node) = of_get_parent(dp); | ^~~~~~~~~~~ make[5]: *** [scripts/Makefile.build:250: drivers/video/fbdev/offb.o] Error 1
Guenter
On Sat, Aug 31, 2024 at 02:19:12PM -0700, Guenter Roeck wrote:
On 8/27/24 07:35, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.107 release. There are 321 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
[ ... ]
Abdulrasaq Lawani abdulrasaqolawani@gmail.com fbdev: offb: replace of_node_put with __free(device_node)
This patch triggers:
Building powerpc:defconfig ... failed Building powerpc:ppc64e_defconfig ... failed Building powerpc:ppc6xx_defconfig ... failed
Error log: drivers/video/fbdev/offb.c: In function 'offb_init_palette_hacks': drivers/video/fbdev/offb.c:358:24: error: cleanup argument not a function 358 | struct device_node *pciparent __free(device_node) = of_get_parent(dp); | ^~~~~~~~~~~ make[5]: *** [scripts/Makefile.build:250: drivers/video/fbdev/offb.o] Error 1
Thanks for the report, should now be fixed up.
greg k-h
linux-stable-mirror@lists.linaro.org