This is the start of the stable review cycle for the 6.4.13 release. There are 129 patches in this series, all of which will be posted as responses to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 30 Aug 2023 10:11:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.13-rc1....
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.4.13-rc1
Florian Westphal fw@strlen.de netfilter: nf_tables: fix kdoc warnings after gc rework
Günther Noack gnoack3000@gmail.com TIOCSTI: Document CAP_SYS_ADMIN behaviour in Kconfig
Arnd Bergmann arnd@arndb.de ASoC: amd: vangogh: select CONFIG_SND_AMD_ACP_CONFIG
Liam R. Howlett Liam.Howlett@oracle.com maple_tree: disable mas_wr_append() when other readers are possible
Mario Limonciello mario.limonciello@amd.com ASoC: amd: yc: Fix a non-functional mic on Lenovo 82SJ
Bartosz Golaszewski bartosz.golaszewski@linaro.org gpio: sim: pass the GPIO device's software node to irq domain
Bartosz Golaszewski bartosz.golaszewski@linaro.org gpio: sim: dispose of irq mappings before destroying the irq_sim domain
Rob Clark robdclark@chromium.org dma-buf/sw_sync: Avoid recursive lock during fence signal
Biju Das biju.das.jz@bp.renesas.com pinctrl: renesas: rza2: Add lock around pinctrl_generic{{add,remove}_group,{add,remove}_function}
Biju Das biju.das.jz@bp.renesas.com pinctrl: renesas: rzv2m: Fix NULL pointer dereference in rzv2m_dt_subnode_to_map()
Biju Das biju.das.jz@bp.renesas.com pinctrl: renesas: rzg2l: Fix NULL pointer dereference in rzg2l_dt_subnode_to_map()
Maciej Strozek mstrozek@opensource.cirrus.com ASoC: cs35l56: Read firmware uuid from a device property instead of _SUB
Chao Song chao.song@linux.intel.com ASoC: SOF: ipc4-pcm: fix possible null pointer deference
Biju Das biju.das.jz@bp.renesas.com clk: Fix undefined reference to `clk_rate_exclusive_{get,put}'
Zhu Wang wangzhu9@huawei.com scsi: core: raid_class: Remove raid_component_add()
Neil Armstrong neil.armstrong@linaro.org scsi: ufs: ufs-qcom: Clear qunipro_g4_sel for HW major version > 5
Zhu Wang wangzhu9@huawei.com scsi: snic: Fix double free in snic_tgt_create()
Yin Fengwei fengwei.yin@intel.com madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check
Yin Fengwei fengwei.yin@intel.com madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check
Matt Roper matthew.d.roper@intel.com drm/i915: Fix error handling if driver creation fails during probe
Oliver Hartkopp socketcan@hartkopp.net can: raw: add missing refcount for memory leak fix
Sanjay R Mehta sanju.mehta@amd.com thunderbolt: Fix Thunderbolt 3 display flickering issue on 2nd hot plug onwards
Igor Mammedov imammedo@redhat.com PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus
Wei Chen harperchen1110@gmail.com media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
Mario Limonciello mario.limonciello@amd.com pinctrl: amd: Mask wake bits on probe again
Rob Herring robh@kernel.org of: dynamic: Refactor action prints to not use "%pOF" inside devtree_lock
Rob Herring robh@kernel.org of: unittest: Fix EXPECT for parse_phandle_with_args_map() test
Arnd Bergmann arnd@arndb.de radix tree: remove unused variable
Mingzheng Xing xingmingzheng@iscas.ac.cn riscv: Fix build errors using binutils2.37 toolchains
Mingzheng Xing xingmingzheng@iscas.ac.cn riscv: Handle zicsr/zifencei issue between gcc and binutils
Helge Deller deller@gmx.de lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels
Hans de Goede hdegoede@redhat.com ACPI: resource: Fix IRQ override quirk for PCSpecialist Elimina Pro 16 M
Sven Eckelmann sven@narfation.org batman-adv: Hold rtnl lock during MTU update via netlink
Remi Pommarel repk@triplefau.lt batman-adv: Fix batadv_v_ogm_aggr_send memory leak
Remi Pommarel repk@triplefau.lt batman-adv: Fix TT global entry leak when client roamed back
Remi Pommarel repk@triplefau.lt batman-adv: Do not get eth header before batadv_check_management_packet
Sven Eckelmann sven@narfation.org batman-adv: Don't increase MTU when set by user
Sven Eckelmann sven@narfation.org batman-adv: Trigger events for auto adjusted MTU
Christian Göttsche cgzones@googlemail.com selinux: set next pointer before attaching to list
Benjamin Coddington bcodding@redhat.com nfsd: Fix race to FREE_STATEID and cl_revoked
Trond Myklebust trond.myklebust@hammerspace.com NFS: Fix a use after free in nfs_direct_join_group()
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers()
T.J. Mercier tjmercier@google.com mm: multi-gen LRU: don't spin during memcg release
Miaohe Lin linmiaohe@huawei.com mm: memory-failure: fix unexpected return value in soft_offline_page()
Alexandre Ghiti alexghiti@rivosinc.com mm: add a call to flush_cache_vmap() in vmap_pfn()
Dietmar Eggemann dietmar.eggemann@arm.com cgroup/cpuset: Free DL BW in case can_attach() fails
Dietmar Eggemann dietmar.eggemann@arm.com sched/deadline: Create DL BW alloc, free & check overflow interface
Juri Lelli juri.lelli@redhat.com cgroup/cpuset: Iterate only if DEADLINE tasks are present
Juri Lelli juri.lelli@redhat.com sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
Juri Lelli juri.lelli@redhat.com sched/cpuset: Bring back cpuset_mutex
Juri Lelli juri.lelli@redhat.com cgroup/cpuset: Rename functions dealing with DEADLINE accounting
Jani Nikula jani.nikula@intel.com drm/i915: fix display probe for IVB Q and IVB D GT2 server
Matt Roper matthew.d.roper@intel.com drm/i915/display: Handle GMD_ID identification in display code
Feng Tang feng.tang@intel.com x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
Rick Edgecombe rick.p.edgecombe@intel.com x86/fpu: Invalidate FPU state correctly on exec()
Huacai Chen chenhuacai@kernel.org LoongArch: Fix hw_breakpoint_control() for watchpoints
Imre Deak imre.deak@intel.com drm/i915: Fix HPD polling, reenabling the output poll work as needed
Ankit Nautiyal ankit.k.nautiyal@intel.com drm/display/dp: Fix the DP DSC Receiver cap size
Anshuman Gupta anshuman.gupta@intel.com drm/i915/dgfx: Enable d3cold at s2idle
David Michael fedora.dm0@gmail.com drm/panfrost: Skip speed binning on EOPNOTSUPP
Imre Deak imre.deak@intel.com drm: Add an HPD poll helper to reschedule the poll work
Zack Rusin zackr@vmware.com drm/vmwgfx: Fix possible invalid drm gem put calls
Zack Rusin zackr@vmware.com drm/vmwgfx: Fix shader stage validation
David Hildenbrand david@redhat.com mm/gup: handle cont-PTE hugetlb pages correctly in gup_must_unshare() via GUP-fast
David Hildenbrand david@redhat.com mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
Suren Baghdasaryan surenb@google.com mm: enable page walking API to lock vmas during the walk
Ayush Jain ayush.jain3@amd.com selftests/mm: FOLL_LONGTERM need to be updated to 0x100
Takashi Iwai tiwai@suse.de ALSA: ymfpci: Fix the missing snd_card_free() call at probe error
Hugh Dickins hughd@google.com shmem: fix smaps BUG sleeping while atomic
Rik van Riel riel@surriel.com mm,ima,kexec,of: use memblock_free_late from ima_free_kexec_buffer
Andrey Skvortsov andrej.skvortzov@gmail.com clk: Fix slab-out-of-bounds error in devm_clk_release()
Benjamin Coddington bcodding@redhat.com NFSv4: Fix dropped lock for racing OPEN and delegation return
André Apitzsch git@apitzsch.eu platform/x86: ideapad-laptop: Add support for new hotkeys found on ThinkBook 14s Yoga ITL
Swapnil Devesh me@sidevesh.com platform/x86: lenovo-ymc: Add Lenovo Yoga 7 14ACN6 to ec_trigger_quirk_dmi_table
Ping-Ke Shih pkshih@realtek.com wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning
Michael Ellerman mpe@ellerman.id.au ibmveth: Use dcbf rather than dcbfl
Srinivas Goud srinivas.goud@amd.com spi: spi-cadence: Fix data corruption issues in slave mode
Charles Keepax ckeepax@opensource.cirrus.com ASoC: cs35l41: Correct amp_gain_tlv values
BrenoRCBrito brenorcbrito@gmail.com ASoC: amd: yc: Add VivoBook Pro 15 to quirks list for acp6x
Hangbin Liu liuhangbin@gmail.com bonding: fix macvlan over alb bond support
Ido Schimmel idosch@nvidia.com rtnetlink: Reject negative ifindexes in RTM_NEWLINK
Florian Westphal fw@strlen.de netfilter: nf_tables: defer gc run if previous batch is still pending
Florian Westphal fw@strlen.de netfilter: nf_tables: fix out of memory error handling
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: use correct lock to protect gc_list
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: GC transaction race with abort path
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: flush pending destroy work before netlink notifier
Florian Westphal fw@strlen.de netfilter: nf_tables: validate all pending tables
Andrii Staikov andrii.staikov@intel.com i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters()
Jamal Hadi Salim jhs@mojatatu.com net/sched: fix a qdisc modification with ambiguous command request
Sasha Neftin sasha.neftin@intel.com igc: Fix the typo in the PTM Control macro
Alessio Igor Bogani alessio.bogani@elettra.eu igb: Avoid starting unnecessary workqueues
Oliver Hartkopp socketcan@hartkopp.net can: isotp: fix support for transmission of SF without flow control
Daniel Golle daniel@makrotopia.org net: ethernet: mtk_eth_soc: fix NULL pointer on hw reset
Kees Cook keescook@chromium.org tg3: Use slab_build_skb() when needed
Hangbin Liu liuhangbin@gmail.com selftests: bonding: do not set port down before adding to bond
Petr Oros poros@redhat.com ice: Fix NULL pointer deref during VF reset
Petr Oros poros@redhat.com Revert "ice: Fix ice VF reset during iavf initialization"
Jesse Brandeburg jesse.brandeburg@intel.com ice: fix receive buffer size miscalculation
Eric Dumazet edumazet@google.com ipv4: fix data-races around inet->inet_id
Jakub Kicinski kuba@kernel.org net: validate veth and vxcan peer ifindexes
Ruan Jinjie ruanjinjie@huawei.com net: bcmgenet: Fix return value check for fixed_phy_register()
Ruan Jinjie ruanjinjie@huawei.com net: bgmac: Fix return value check for fixed_phy_register()
Serge Semin fancer.lancer@gmail.com net: mdio: mdio-bitbang: Fix C45 read/write protocol
Arınç ÜNAL arinc.unal@arinc9.com net: dsa: mt7530: fix handling of 802.1X PAE frames
Ido Schimmel idosch@nvidia.com selftests: mlxsw: Fix test failure on Spectrum-4
Amit Cohen amcohen@nvidia.com mlxsw: Fix the size of 'VIRT_ROUTER_MSB'
Ido Schimmel idosch@nvidia.com mlxsw: reg: Fix SSPR register layout
Danielle Ratson danieller@nvidia.com mlxsw: pci: Set time stamp fields also when its type is MIRROR_UTC
Lu Wei luwei32@huawei.com ipvlan: Fix a reference count leak warning in ipvlan_ns_exit()
Eric Dumazet edumazet@google.com dccp: annotate data-races in dccp_poll()
Eric Dumazet edumazet@google.com sock: annotate data-races around prot->memory_pressure
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: felix: fix oversize frame dropping for always closed tc-taprio gates
Jiri Pirko jiri@resnulli.us devlink: add missing unregister linecard notification
Hariprasad Kelam hkelam@marvell.com octeontx2-af: SDP: fix receive link config
Zheng Yejian zhengyejian1@huawei.com tracing: Fix memleak due to race between current_tracer and trace
Sven Schnelle svens@linux.ibm.com tracing/synthetic: Allocate one additional element for size
Sven Schnelle svens@linux.ibm.com tracing/synthetic: Skip first entry for stack traces
Sven Schnelle svens@linux.ibm.com tracing/synthetic: Use union instead of casts
Zheng Yejian zhengyejian1@huawei.com tracing: Fix cpu buffers unavailable due to 'record_disabled' missed
Randy Dunlap rdunlap@infradead.org wifi: iwlwifi: mvm: add dependency for PTP clock
Eric Dumazet edumazet@google.com can: raw: fix lockdep issue in raw_release()
Ziyang Xuan william.xuanziyang@huawei.com can: raw: fix receiver memory leak
Zhang Yi yi.zhang@huawei.com jbd2: fix a race when checking checkpoint buffer busy
Zhang Yi yi.zhang@huawei.com jbd2: remove journal_clean_one_cp_list()
Zhang Yi yi.zhang@huawei.com jbd2: remove t_checkpoint_io_list
Igor Mammedov imammedo@redhat.com PCI: acpiphp: Reassign resources on bridge if necessary
Chuck Lever chuck.lever@oracle.com xprtrdma: Remap Receive buffers after a reconnect
Fedor Pchelkin pchelkin@ispras.ru NFSv4: fix out path in __nfs4_get_acl_uncached
Fedor Pchelkin pchelkin@ispras.ru NFSv4.2: fix error handling in nfs42_proc_getxattr
-------------
Diffstat:
Makefile | 4 +- arch/loongarch/kernel/hw_breakpoint.c | 3 +- arch/powerpc/mm/book3s64/subpage_prot.c | 1 + arch/riscv/Kconfig | 28 ++- arch/riscv/kernel/compat_vdso/Makefile | 8 +- arch/riscv/mm/pageattr.c | 1 + arch/s390/mm/gmap.c | 5 + arch/x86/kernel/fpu/context.h | 3 +- arch/x86/kernel/fpu/core.c | 2 +- arch/x86/kernel/fpu/xstate.c | 7 + drivers/acpi/resource.c | 6 +- drivers/clk/clk-devres.c | 13 +- drivers/dma-buf/sw_sync.c | 18 +- drivers/gpio/gpio-sim.c | 15 +- drivers/gpu/drm/drm_probe_helper.c | 68 ++++-- .../gpu/drm/i915/display/intel_display_device.c | 91 +++++++- .../gpu/drm/i915/display/intel_display_device.h | 5 +- drivers/gpu/drm/i915/display/intel_hotplug.c | 4 +- drivers/gpu/drm/i915/i915_driver.c | 50 +++-- drivers/gpu/drm/i915/intel_device_info.c | 13 +- drivers/gpu/drm/panfrost/panfrost_devfreq.c | 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 6 +- drivers/gpu/drm/vmwgfx/vmwgfx_bo.h | 8 + drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 12 + drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 35 ++- drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 6 +- drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c | 3 +- drivers/gpu/drm/vmwgfx/vmwgfx_shader.c | 3 +- .../platform/mediatek/vcodec/mtk_vcodec_enc.c | 2 + drivers/net/bonding/bond_alb.c | 6 +- drivers/net/can/vxcan.c | 7 +- drivers/net/dsa/mt7530.c | 4 + drivers/net/dsa/mt7530.h | 2 + drivers/net/dsa/ocelot/felix_vsc9959.c | 3 + drivers/net/ethernet/broadcom/bgmac.c | 2 +- drivers/net/ethernet/broadcom/genet/bcmmii.c | 2 +- drivers/net/ethernet/broadcom/tg3.c | 5 +- .../chelsio/inline_crypto/chtls/chtls_cm.c | 2 +- drivers/net/ethernet/ibm/ibmveth.c | 2 +- drivers/net/ethernet/intel/i40e/i40e_main.c | 5 +- drivers/net/ethernet/intel/ice/ice_base.c | 3 +- drivers/net/ethernet/intel/ice/ice_sriov.c | 8 +- drivers/net/ethernet/intel/ice/ice_vf_lib.c | 34 +-- drivers/net/ethernet/intel/ice/ice_vf_lib.h | 1 - drivers/net/ethernet/intel/ice/ice_virtchnl.c | 1 - drivers/net/ethernet/intel/igb/igb_ptp.c | 24 +- drivers/net/ethernet/intel/igc/igc_defines.h | 2 +- .../net/ethernet/marvell/octeontx2/af/rvu_nix.c | 3 +- drivers/net/ethernet/mediatek/mtk_wed.c | 12 +- .../ethernet/mellanox/mlxsw/core_acl_flex_keys.c | 4 +- drivers/net/ethernet/mellanox/mlxsw/pci.c | 8 +- drivers/net/ethernet/mellanox/mlxsw/reg.h | 9 - .../ethernet/mellanox/mlxsw/spectrum2_mr_tcam.c | 2 +- .../mellanox/mlxsw/spectrum_acl_flex_keys.c | 4 +- drivers/net/ipvlan/ipvlan_main.c | 3 +- drivers/net/mdio/mdio-bitbang.c | 4 +- drivers/net/veth.c | 5 +- drivers/net/wireless/intel/iwlwifi/Kconfig | 1 + drivers/of/dynamic.c | 31 +-- drivers/of/kexec.c | 3 +- drivers/of/unittest.c | 4 +- drivers/pci/hotplug/acpiphp_glue.c | 9 +- drivers/pinctrl/pinctrl-amd.c | 30 +++ drivers/pinctrl/renesas/pinctrl-rza2.c | 17 +- drivers/pinctrl/renesas/pinctrl-rzg2l.c | 15 +- drivers/pinctrl/renesas/pinctrl-rzv2m.c | 13 +- drivers/platform/x86/ideapad-laptop.c | 5 + drivers/platform/x86/lenovo-ymc.c | 7 + drivers/scsi/raid_class.c | 48 ---- drivers/scsi/snic/snic_disc.c | 3 +- drivers/spi/spi-cadence.c | 19 +- drivers/thunderbolt/tmu.c | 3 +- drivers/tty/Kconfig | 3 + drivers/ufs/host/ufs-qcom.c | 2 +- fs/jbd2/checkpoint.c | 165 +++++--------- fs/jbd2/commit.c | 3 +- fs/jbd2/transaction.c | 17 +- fs/nfs/direct.c | 26 ++- fs/nfs/nfs42proc.c | 5 +- fs/nfs/nfs4proc.c | 14 +- fs/nfsd/nfs4state.c | 2 +- fs/nilfs2/segment.c | 5 + fs/proc/task_mmu.c | 5 + include/drm/display/drm_dp.h | 2 +- include/drm/drm_probe_helper.h | 1 + include/linux/clk.h | 80 +++---- include/linux/cpuset.h | 12 +- include/linux/jbd2.h | 7 +- 
include/linux/mm.h | 21 +- include/linux/mm_types.h | 9 + include/linux/pagewalk.h | 11 + include/linux/raid_class.h | 4 - include/linux/sched.h | 4 +- include/linux/trace_events.h | 11 + include/net/bonding.h | 11 +- include/net/inet_sock.h | 2 +- include/net/ip.h | 15 +- include/net/mac80211.h | 1 + include/net/netfilter/nf_tables.h | 7 + include/net/rtnetlink.h | 4 +- include/net/sock.h | 7 +- include/trace/events/jbd2.h | 12 +- kernel/cgroup/cgroup.c | 4 + kernel/cgroup/cpuset.c | 244 +++++++++++++-------- kernel/sched/core.c | 41 ++-- kernel/sched/deadline.c | 67 ++++-- kernel/sched/sched.h | 2 +- kernel/trace/trace.c | 15 +- kernel/trace/trace.h | 8 + kernel/trace/trace_events_synth.c | 103 ++++----- kernel/trace/trace_irqsoff.c | 3 +- kernel/trace/trace_sched_wakeup.c | 2 + lib/clz_ctz.c | 32 +-- lib/maple_tree.c | 3 + lib/radix-tree.c | 1 - mm/damon/vaddr.c | 2 + mm/gup.c | 30 ++- mm/hmm.c | 1 + mm/huge_memory.c | 3 +- mm/internal.h | 10 + mm/ksm.c | 25 ++- mm/madvise.c | 11 +- mm/memcontrol.c | 2 + mm/memory-failure.c | 12 +- mm/mempolicy.c | 22 +- mm/migrate_device.c | 1 + mm/mincore.c | 1 + mm/mlock.c | 1 + mm/mprotect.c | 1 + mm/pagewalk.c | 36 ++- mm/shmem.c | 6 +- mm/vmalloc.c | 4 + mm/vmscan.c | 14 +- net/batman-adv/bat_v_elp.c | 3 +- net/batman-adv/bat_v_ogm.c | 7 +- net/batman-adv/hard-interface.c | 14 +- net/batman-adv/netlink.c | 3 + net/batman-adv/soft-interface.c | 3 + net/batman-adv/translation-table.c | 1 - net/batman-adv/types.h | 6 + net/can/isotp.c | 22 +- net/can/raw.c | 77 ++++--- net/core/rtnetlink.c | 25 ++- net/dccp/ipv4.c | 4 +- net/dccp/proto.c | 20 +- net/devlink/leftover.c | 3 + net/ipv4/af_inet.c | 2 +- net/ipv4/datagram.c | 2 +- net/ipv4/tcp_ipv4.c | 4 +- net/mac80211/rx.c | 12 +- net/netfilter/nf_tables_api.c | 23 +- net/netfilter/nft_set_hash.c | 3 + net/netfilter/nft_set_pipapo.c | 15 +- net/netfilter/nft_set_rbtree.c | 3 + net/sched/sch_api.c | 53 +++-- net/sctp/socket.c | 4 +- net/sunrpc/xprtrdma/verbs.c | 9 +- security/selinux/ss/policydb.c | 2 +- sound/pci/ymfpci/ymfpci.c | 10 +- sound/soc/amd/Kconfig | 1 + sound/soc/amd/yc/acp6x-mach.c | 9 +- sound/soc/codecs/cs35l41.c | 2 +- sound/soc/codecs/cs35l56.c | 31 +-- sound/soc/sof/ipc4-pcm.c | 3 + .../drivers/net/bonding/bond-break-lacpdu-tx.sh | 4 +- .../selftests/drivers/net/mlxsw/sharedbuffer.sh | 16 +- tools/testing/selftests/mm/hmm-tests.c | 9 +- 167 files changed, 1438 insertions(+), 939 deletions(-)
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fedor Pchelkin pchelkin@ispras.ru
[ Upstream commit 4e3733fd2b0f677faae21cf838a43faf317986d3 ]
There is a slight issue with the error handling code inside nfs42_proc_getxattr(). If the page-allocating loop fails, the cleanup path frees the failing page array element, which is NULL, but __free_page() can't deal with a NULL argument.
Found by Linux Verification Center (linuxtesting.org).
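For illustration, a minimal userspace C sketch of the cleanup pattern the fix switches to (all names here, such as alloc_page_array(), are made up and not part of the kernel code): on allocation failure, free only the entries below the failing index, so a NULL element is never handed to the free routine.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's per-page allocation loop. */
static int alloc_page_array(void **pages, int np)
{
        int i;

        for (i = 0; i < np; i++) {
                pages[i] = malloc(4096);
                if (!pages[i])
                        goto out_free;  /* pages[i] is NULL, do not free it */
        }
        return 0;

out_free:
        /* Free only the entries that were actually allocated. */
        while (--i >= 0)
                free(pages[i]);
        return -1;
}

int main(void)
{
        void *pages[8];

        if (alloc_page_array(pages, 8) == 0) {
                for (int i = 0; i < 8; i++)
                        free(pages[i]);
                puts("allocated and freed 8 pages");
        }
        return 0;
}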
Fixes: a1f26739ccdc ("NFSv4.2: improve page handling for GETXATTR")
Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru
Reviewed-by: Benjamin Coddington bcodding@redhat.com
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/nfs/nfs42proc.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c index 93e306bf4430f..5d7e0511f3513 100644 --- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -1360,7 +1360,6 @@ ssize_t nfs42_proc_getxattr(struct inode *inode, const char *name, for (i = 0; i < np; i++) { pages[i] = alloc_page(GFP_KERNEL); if (!pages[i]) { - np = i + 1; err = -ENOMEM; goto out; } @@ -1384,8 +1383,8 @@ ssize_t nfs42_proc_getxattr(struct inode *inode, const char *name, } while (exception.retry);
out: - while (--np >= 0) - __free_page(pages[np]); + while (--i >= 0) + __free_page(pages[i]); kfree(pages);
return err;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fedor Pchelkin pchelkin@ispras.ru
[ Upstream commit f4e89f1a6dab4c063fc1e823cc9dddc408ff40cf ]
Another highly rare error case where a page-allocating loop (inside __nfs4_get_acl_uncached this time) is not properly unwound on error. Since the pages array is allocated uninitialized, only the lower array indices (the ones actually allocated) need to be freed. The NULL checks were useful before commit 62a1573fcf84 ("NFSv4 fix acl retrieval over krb5i/krb5p mounts"), when the array was still initialized to zero on the stack.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 62a1573fcf84 ("NFSv4 fix acl retrieval over krb5i/krb5p mounts")
Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru
Reviewed-by: Benjamin Coddington bcodding@redhat.com
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/nfs/nfs4proc.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 9faba2dac11dd..f16742b8e0e21 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -6004,9 +6004,8 @@ static ssize_t __nfs4_get_acl_uncached(struct inode *inode, void *buf, out_ok: ret = res.acl_len; out_free: - for (i = 0; i < npages; i++) - if (pages[i]) - __free_page(pages[i]); + while (--i >= 0) + __free_page(pages[i]); if (res.acl_scratch) __free_page(res.acl_scratch); kfree(pages);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 895cedc1791916e8a98864f12b656702fad0bb67 ]
On server-initiated disconnect, rpcrdma_xprt_disconnect() was DMA-unmapping the Receive buffers, but rpcrdma_post_recvs() neglected to remap them after a new connection had been established. The result was immediate failure of the new connection, with the Receives flushing with LOCAL_PROT_ERR.
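As a rough lifecycle sketch only (hypothetical userspace names, no real RDMA verbs involved): after this change the buffer is mapped when it is posted rather than once at creation, so a buffer that was unmapped on disconnect gets mapped again before it is reposted on the new connection.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical receive buffer: we only track whether it is DMA-mapped. */
struct fake_rep { bool mapped; };

static bool dma_map(struct fake_rep *r)   { r->mapped = true;  return true; }
static void dma_unmap(struct fake_rep *r) { r->mapped = false; }

/* Map at post time, not at creation time, so reposting remaps. */
static int post_recv(struct fake_rep *r)
{
        if (!r->mapped && !dma_map(r))
                return -1;
        printf("posted receive (mapped=%d)\n", r->mapped);
        return 0;
}

int main(void)
{
        struct fake_rep rep = { 0 };

        post_recv(&rep);        /* first connection: mapped here */
        dma_unmap(&rep);        /* server-initiated disconnect unmaps it */
        post_recv(&rep);        /* reconnect: mapped again before reuse */
        return 0;
}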
Fixes: 671c450b6fe0 ("xprtrdma: Fix oops in Receive handler after device removal")
Signed-off-by: Chuck Lever chuck.lever@oracle.com
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/sunrpc/xprtrdma/verbs.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index b098fde373abf..28c0771c4e8c3 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -935,9 +935,6 @@ struct rpcrdma_rep *rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt, if (!rep->rr_rdmabuf) goto out_free;
- if (!rpcrdma_regbuf_dma_map(r_xprt, rep->rr_rdmabuf)) - goto out_free_regbuf; - rep->rr_cid.ci_completion_id = atomic_inc_return(&r_xprt->rx_ep->re_completion_ids);
@@ -956,8 +953,6 @@ struct rpcrdma_rep *rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt, spin_unlock(&buf->rb_lock); return rep;
-out_free_regbuf: - rpcrdma_regbuf_free(rep->rr_rdmabuf); out_free: kfree(rep); out: @@ -1363,6 +1358,10 @@ void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed, bool temp) rep = rpcrdma_rep_create(r_xprt, temp); if (!rep) break; + if (!rpcrdma_regbuf_dma_map(r_xprt, rep->rr_rdmabuf)) { + rpcrdma_rep_put(buf, rep); + break; + }
rep->rr_cid.ci_queue_id = ep->re_attr.recv_cq->res.id; trace_xprtrdma_post_recv(rep);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Igor Mammedov imammedo@redhat.com
[ Upstream commit 40613da52b13fb21c5566f10b287e0ca8c12c4e9 ]
When using ACPI PCI hotplug, hotplugging a device with large BARs may fail if bridge windows programmed by firmware are not large enough.
Reproducer:

  $ qemu-kvm -monitor stdio -M q35 -m 4G \
      -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=on \
      -device id=rp1,pcie-root-port,bus=pcie.0,chassis=4 \
      disk_image
wait till linux guest boots, then hotplug device:

  (qemu) device_add qxl,bus=rp1
hotplug on guest side fails with:

  pci 0000:01:00.0: [1b36:0100] type 00 class 0x038000
  pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x03ffffff]
  pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x03ffffff]
  pci 0000:01:00.0: reg 0x18: [mem 0x00000000-0x00001fff]
  pci 0000:01:00.0: reg 0x1c: [io 0x0000-0x001f]
  pci 0000:01:00.0: BAR 0: no space for [mem size 0x04000000]
  pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x04000000]
  pci 0000:01:00.0: BAR 1: no space for [mem size 0x04000000]
  pci 0000:01:00.0: BAR 1: failed to assign [mem size 0x04000000]
  pci 0000:01:00.0: BAR 2: assigned [mem 0xfe800000-0xfe801fff]
  pci 0000:01:00.0: BAR 3: assigned [io 0x1000-0x101f]
  qxl 0000:01:00.0: enabling device (0000 -> 0003)
  Unable to create vram_mapping
  qxl: probe of 0000:01:00.0 failed with error -12
However, when using native PCIe hotplug ('-global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off') it works fine, since the kernel attempts to reassign unused resources.
Use the same machinery as native PCIe hotplug to (re)assign resources.
Link: https://lore.kernel.org/r/20230424191557.2464760-1-imammedo@redhat.com
Signed-off-by: Igor Mammedov imammedo@redhat.com
Signed-off-by: Bjorn Helgaas bhelgaas@google.com
Acked-by: Michael S. Tsirkin mst@redhat.com
Acked-by: Rafael J. Wysocki rafael@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/pci/hotplug/acpiphp_glue.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c index 5b1f271c6034b..328d1e4160147 100644 --- a/drivers/pci/hotplug/acpiphp_glue.c +++ b/drivers/pci/hotplug/acpiphp_glue.c @@ -498,7 +498,6 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) acpiphp_native_scan_bridge(dev); } } else { - LIST_HEAD(add_list); int max, pass;
acpiphp_rescan_slot(slot); @@ -512,12 +511,10 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) if (pass && dev->subordinate) { check_hotplug_bridge(slot, dev); pcibios_resource_survey_bus(dev->subordinate); - __pci_bus_size_bridges(dev->subordinate, - &add_list); } } } - __pci_bus_assign_resources(bus, &add_list, NULL); + pci_assign_unassigned_bridge_resources(bus->self); }
acpiphp_sanitize_bus(bus);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Yi yi.zhang@huawei.com
[ Upstream commit be22255360f80d3af789daad00025171a65424a5 ]
Since t_checkpoint_io_list is no longer used by jbd2_log_do_checkpoint(), it's time to remove the whole t_checkpoint_io_list logic.
Signed-off-by: Zhang Yi yi.zhang@huawei.com
Reviewed-by: Jan Kara jack@suse.cz
Link: https://lore.kernel.org/r/20230606135928.434610-3-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o tytso@mit.edu
Stable-dep-of: 46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/jbd2/checkpoint.c | 42 ++----------------------------------------
 fs/jbd2/commit.c     |  3 +--
 include/linux/jbd2.h |  6 ------
 3 files changed, 3 insertions(+), 48 deletions(-)
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c index c4e0da6db7195..723b4eb112828 100644 --- a/fs/jbd2/checkpoint.c +++ b/fs/jbd2/checkpoint.c @@ -27,7 +27,7 @@ * * Called with j_list_lock held. */ -static inline void __buffer_unlink_first(struct journal_head *jh) +static inline void __buffer_unlink(struct journal_head *jh) { transaction_t *transaction = jh->b_cp_transaction;
@@ -40,23 +40,6 @@ static inline void __buffer_unlink_first(struct journal_head *jh) } }
-/* - * Unlink a buffer from a transaction checkpoint(io) list. - * - * Called with j_list_lock held. - */ -static inline void __buffer_unlink(struct journal_head *jh) -{ - transaction_t *transaction = jh->b_cp_transaction; - - __buffer_unlink_first(jh); - if (transaction->t_checkpoint_io_list == jh) { - transaction->t_checkpoint_io_list = jh->b_cpnext; - if (transaction->t_checkpoint_io_list == jh) - transaction->t_checkpoint_io_list = NULL; - } -} - /* * Check a checkpoint buffer could be release or not. * @@ -505,15 +488,6 @@ unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, break; if (need_resched() || spin_needbreak(&journal->j_list_lock)) break; - if (released) - continue; - - nr_freed += journal_shrink_one_cp_list(transaction->t_checkpoint_io_list, - nr_to_scan, &released); - if (*nr_to_scan == 0) - break; - if (need_resched() || spin_needbreak(&journal->j_list_lock)) - break; } while (transaction != last_transaction);
if (transaction != last_transaction) { @@ -568,17 +542,6 @@ void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy) */ if (need_resched()) return; - if (ret) - continue; - /* - * It is essential that we are as careful as in the case of - * t_checkpoint_list with removing the buffer from the list as - * we can possibly see not yet submitted buffers on io_list - */ - ret = journal_clean_one_cp_list(transaction-> - t_checkpoint_io_list, destroy); - if (need_resched()) - return; /* * Stop scanning if we couldn't free the transaction. This * avoids pointless scanning of transactions which still @@ -663,7 +626,7 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh) jbd2_journal_put_journal_head(jh);
/* Is this transaction empty? */ - if (transaction->t_checkpoint_list || transaction->t_checkpoint_io_list) + if (transaction->t_checkpoint_list) return 0;
/* @@ -755,7 +718,6 @@ void __jbd2_journal_drop_transaction(journal_t *journal, transaction_t *transact J_ASSERT(transaction->t_forget == NULL); J_ASSERT(transaction->t_shadow_list == NULL); J_ASSERT(transaction->t_checkpoint_list == NULL); - J_ASSERT(transaction->t_checkpoint_io_list == NULL); J_ASSERT(atomic_read(&transaction->t_updates) == 0); J_ASSERT(journal->j_committing_transaction != transaction); J_ASSERT(journal->j_running_transaction != transaction); diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index b33155dd70017..1073259902a60 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -1141,8 +1141,7 @@ void jbd2_journal_commit_transaction(journal_t *journal) spin_lock(&journal->j_list_lock); commit_transaction->t_state = T_FINISHED; /* Check if the transaction can be dropped now that we are finished */ - if (commit_transaction->t_checkpoint_list == NULL && - commit_transaction->t_checkpoint_io_list == NULL) { + if (commit_transaction->t_checkpoint_list == NULL) { __jbd2_journal_drop_transaction(journal, commit_transaction); jbd2_journal_free_transaction(commit_transaction); } diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index f619bae1dcc5d..91a2cf4bc5756 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -622,12 +622,6 @@ struct transaction_s */ struct journal_head *t_checkpoint_list;
- /* - * Doubly-linked circular list of all buffers submitted for IO while - * checkpointing. [j_list_lock] - */ - struct journal_head *t_checkpoint_io_list; - /* * Doubly-linked circular list of metadata buffers being * shadowed by log IO. The IO buffers on the iobuf list and
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Yi yi.zhang@huawei.com
[ Upstream commit b98dba273a0e47dbfade89c9af73c5b012a4eabb ]
journal_clean_one_cp_list() and journal_shrink_one_cp_list() are almost the same, so merge them into journal_shrink_one_cp_list(), remove the nr_to_scan parameter, and always scan and try to free the whole checkpoint list.
Signed-off-by: Zhang Yi yi.zhang@huawei.com
Reviewed-by: Jan Kara jack@suse.cz
Link: https://lore.kernel.org/r/20230606135928.434610-4-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o tytso@mit.edu
Stable-dep-of: 46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/jbd2/checkpoint.c        | 75 +++++++++----------------------------
 include/trace/events/jbd2.h | 12 ++----
 2 files changed, 21 insertions(+), 66 deletions(-)
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c index 723b4eb112828..42b34cab64fbd 100644 --- a/fs/jbd2/checkpoint.c +++ b/fs/jbd2/checkpoint.c @@ -349,50 +349,10 @@ int jbd2_cleanup_journal_tail(journal_t *journal)
/* Checkpoint list management */
-/* - * journal_clean_one_cp_list - * - * Find all the written-back checkpoint buffers in the given list and - * release them. If 'destroy' is set, clean all buffers unconditionally. - * - * Called with j_list_lock held. - * Returns 1 if we freed the transaction, 0 otherwise. - */ -static int journal_clean_one_cp_list(struct journal_head *jh, bool destroy) -{ - struct journal_head *last_jh; - struct journal_head *next_jh = jh; - - if (!jh) - return 0; - - last_jh = jh->b_cpprev; - do { - jh = next_jh; - next_jh = jh->b_cpnext; - - if (!destroy && __cp_buffer_busy(jh)) - return 0; - - if (__jbd2_journal_remove_checkpoint(jh)) - return 1; - /* - * This function only frees up some memory - * if possible so we dont have an obligation - * to finish processing. Bail out if preemption - * requested: - */ - if (need_resched()) - return 0; - } while (jh != last_jh); - - return 0; -} - /* * journal_shrink_one_cp_list * - * Find 'nr_to_scan' written-back checkpoint buffers in the given list + * Find all the written-back checkpoint buffers in the given list * and try to release them. If the whole transaction is released, set * the 'released' parameter. Return the number of released checkpointed * buffers. @@ -400,15 +360,15 @@ static int journal_clean_one_cp_list(struct journal_head *jh, bool destroy) * Called with j_list_lock held. */ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh, - unsigned long *nr_to_scan, - bool *released) + bool destroy, bool *released) { struct journal_head *last_jh; struct journal_head *next_jh = jh; unsigned long nr_freed = 0; int ret;
- if (!jh || *nr_to_scan == 0) + *released = false; + if (!jh) return 0;
last_jh = jh->b_cpprev; @@ -416,8 +376,7 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh, jh = next_jh; next_jh = jh->b_cpnext;
- (*nr_to_scan)--; - if (__cp_buffer_busy(jh)) + if (!destroy && __cp_buffer_busy(jh)) continue;
nr_freed++; @@ -429,7 +388,7 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
if (need_resched()) break; - } while (jh != last_jh && *nr_to_scan); + } while (jh != last_jh);
return nr_freed; } @@ -447,11 +406,11 @@ unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan) { transaction_t *transaction, *last_transaction, *next_transaction; - bool released; + bool __maybe_unused released; tid_t first_tid = 0, last_tid = 0, next_tid = 0; tid_t tid = 0; unsigned long nr_freed = 0; - unsigned long nr_scanned = *nr_to_scan; + unsigned long freed;
again: spin_lock(&journal->j_list_lock); @@ -480,10 +439,11 @@ unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, transaction = next_transaction; next_transaction = transaction->t_cpnext; tid = transaction->t_tid; - released = false;
- nr_freed += journal_shrink_one_cp_list(transaction->t_checkpoint_list, - nr_to_scan, &released); + freed = journal_shrink_one_cp_list(transaction->t_checkpoint_list, + false, &released); + nr_freed += freed; + (*nr_to_scan) -= min(*nr_to_scan, freed); if (*nr_to_scan == 0) break; if (need_resched() || spin_needbreak(&journal->j_list_lock)) @@ -504,9 +464,8 @@ unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, if (*nr_to_scan && next_tid) goto again; out: - nr_scanned -= *nr_to_scan; trace_jbd2_shrink_checkpoint_list(journal, first_tid, tid, last_tid, - nr_freed, nr_scanned, next_tid); + nr_freed, next_tid);
return nr_freed; } @@ -522,7 +481,7 @@ unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy) { transaction_t *transaction, *last_transaction, *next_transaction; - int ret; + bool released;
transaction = journal->j_checkpoint_transactions; if (!transaction) @@ -533,8 +492,8 @@ void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy) do { transaction = next_transaction; next_transaction = transaction->t_cpnext; - ret = journal_clean_one_cp_list(transaction->t_checkpoint_list, - destroy); + journal_shrink_one_cp_list(transaction->t_checkpoint_list, + destroy, &released); /* * This function only frees up some memory if possible so we * dont have an obligation to finish processing. Bail out if @@ -547,7 +506,7 @@ void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy) * avoids pointless scanning of transactions which still * weren't checkpointed. */ - if (!ret) + if (!released) return; } while (transaction != last_transaction); } diff --git a/include/trace/events/jbd2.h b/include/trace/events/jbd2.h index 8f5ee380d3093..5646ae15a957a 100644 --- a/include/trace/events/jbd2.h +++ b/include/trace/events/jbd2.h @@ -462,11 +462,9 @@ TRACE_EVENT(jbd2_shrink_scan_exit, TRACE_EVENT(jbd2_shrink_checkpoint_list,
TP_PROTO(journal_t *journal, tid_t first_tid, tid_t tid, tid_t last_tid, - unsigned long nr_freed, unsigned long nr_scanned, - tid_t next_tid), + unsigned long nr_freed, tid_t next_tid),
- TP_ARGS(journal, first_tid, tid, last_tid, nr_freed, - nr_scanned, next_tid), + TP_ARGS(journal, first_tid, tid, last_tid, nr_freed, next_tid),
TP_STRUCT__entry( __field(dev_t, dev) @@ -474,7 +472,6 @@ TRACE_EVENT(jbd2_shrink_checkpoint_list, __field(tid_t, tid) __field(tid_t, last_tid) __field(unsigned long, nr_freed) - __field(unsigned long, nr_scanned) __field(tid_t, next_tid) ),
@@ -484,15 +481,14 @@ TRACE_EVENT(jbd2_shrink_checkpoint_list, __entry->tid = tid; __entry->last_tid = last_tid; __entry->nr_freed = nr_freed; - __entry->nr_scanned = nr_scanned; __entry->next_tid = next_tid; ),
TP_printk("dev %d,%d shrink transaction %u-%u(%u) freed %lu " - "scanned %lu next transaction %u", + "next transaction %u", MAJOR(__entry->dev), MINOR(__entry->dev), __entry->first_tid, __entry->tid, __entry->last_tid, - __entry->nr_freed, __entry->nr_scanned, __entry->next_tid) + __entry->nr_freed, __entry->next_tid) );
#endif /* _TRACE_JBD2_H */
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Yi yi.zhang@huawei.com
[ Upstream commit 46f881b5b1758dc4a35fba4a643c10717d0cf427 ]
Before removing a checkpoint buffer from t_checkpoint_list, we have to check the BH_Dirty and BH_Lock bits together to distinguish buffers that have not yet been written back from buffers that are being written back. But __cp_buffer_busy() checks them separately: it first checks the lock state and then checks dirty, and the window between these two checks can be raced by the writeback procedure, which locks the buffer and clears the dirty bit before the I/O completes. So it cannot guarantee that checkpointed buffers have actually been written back to disk if some error happens later. In the end it may clean checkpoint transactions and lead to an inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() have the same problem (journal_unmap_buffer() escapes this issue since it runs under the buffer lock), so fix them by introducing a new helper that tries to hold the buffer lock and removes the buffer only if it is really clean.
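As a rough illustration only (hypothetical userspace names, atomic flags standing in for the buffer-head state bits): checking the lock and dirty bits as two independent reads leaves a window for writeback to slip in between them, while taking the lock first and checking dirty under it, as the new helper does, closes that window.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a buffer head: a lock bit and a dirty bit. */
struct fake_buffer {
        atomic_bool locked;     /* models BH_Lock */
        atomic_bool dirty;      /* models BH_Dirty */
};

/* Racy variant, like the old __cp_buffer_busy(): writeback can set
 * 'locked' and clear 'dirty' between the two reads, so a buffer whose
 * I/O is still in flight can look neither locked nor dirty. */
static bool buffer_busy_racy(struct fake_buffer *b)
{
        if (atomic_load(&b->locked))
                return true;
        return atomic_load(&b->dirty);
}

/* Fixed variant, mirroring jbd2_journal_try_remove_checkpoint(): take the
 * lock first (fail if already held), then check 'dirty' while holding it. */
static int try_remove_if_clean(struct fake_buffer *b)
{
        bool expected = false;

        if (!atomic_compare_exchange_strong(&b->locked, &expected, true))
                return -1;                      /* -EBUSY: I/O in progress */
        if (atomic_load(&b->dirty)) {
                atomic_store(&b->locked, false);
                return -1;                      /* -EBUSY: not written back */
        }
        atomic_store(&b->locked, false);
        return 0;                               /* clean, safe to remove */
}

int main(void)
{
        struct fake_buffer buf = { false, true };

        printf("racy busy check : %d\n", buffer_busy_racy(&buf));
        printf("try_remove dirty: %d\n", try_remove_if_clean(&buf));
        atomic_store(&buf.dirty, false);
        printf("try_remove clean: %d\n", try_remove_if_clean(&buf));
        return 0;
}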
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable@vger.kernel.org
Suggested-by: Jan Kara jack@suse.cz
Signed-off-by: Zhang Yi yi.zhang@huawei.com
Reviewed-by: Jan Kara jack@suse.cz
Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/jbd2/checkpoint.c  | 38 +++++++++++++++++++++++++++++++++++---
 fs/jbd2/transaction.c | 17 +++++------------
 include/linux/jbd2.h  |  1 +
 3 files changed, 41 insertions(+), 15 deletions(-)
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c index 42b34cab64fbd..9ec91017a7f3c 100644 --- a/fs/jbd2/checkpoint.c +++ b/fs/jbd2/checkpoint.c @@ -376,11 +376,15 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh, jh = next_jh; next_jh = jh->b_cpnext;
- if (!destroy && __cp_buffer_busy(jh)) - continue; + if (destroy) { + ret = __jbd2_journal_remove_checkpoint(jh); + } else { + ret = jbd2_journal_try_remove_checkpoint(jh); + if (ret < 0) + continue; + }
nr_freed++; - ret = __jbd2_journal_remove_checkpoint(jh); if (ret) { *released = true; break; @@ -616,6 +620,34 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh) return 1; }
+/* + * Check the checkpoint buffer and try to remove it from the checkpoint + * list if it's clean. Returns -EBUSY if it is not clean, returns 1 if + * it frees the transaction, 0 otherwise. + * + * This function is called with j_list_lock held. + */ +int jbd2_journal_try_remove_checkpoint(struct journal_head *jh) +{ + struct buffer_head *bh = jh2bh(jh); + + if (!trylock_buffer(bh)) + return -EBUSY; + if (buffer_dirty(bh)) { + unlock_buffer(bh); + return -EBUSY; + } + unlock_buffer(bh); + + /* + * Buffer is clean and the IO has finished (we held the buffer + * lock) so the checkpoint is done. We can safely remove the + * buffer from this transaction. + */ + JBUFFER_TRACE(jh, "remove from checkpoint list"); + return __jbd2_journal_remove_checkpoint(jh); +} + /* * journal_insert_checkpoint: put a committed buffer onto a checkpoint * list so that we know when it is safe to clean the transaction out of diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 18611241f4513..6ef5022949c46 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -1784,8 +1784,7 @@ int jbd2_journal_forget(handle_t *handle, struct buffer_head *bh) * Otherwise, if the buffer has been written to disk, * it is safe to remove the checkpoint and drop it. */ - if (!buffer_dirty(bh)) { - __jbd2_journal_remove_checkpoint(jh); + if (jbd2_journal_try_remove_checkpoint(jh) >= 0) { spin_unlock(&journal->j_list_lock); goto drop; } @@ -2112,20 +2111,14 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
jh = bh2jh(bh);
- if (buffer_locked(bh) || buffer_dirty(bh)) - goto out; - if (jh->b_next_transaction != NULL || jh->b_transaction != NULL) - goto out; + return;
spin_lock(&journal->j_list_lock); - if (jh->b_cp_transaction != NULL) { - /* written-back checkpointed metadata buffer */ - JBUFFER_TRACE(jh, "remove from checkpoint list"); - __jbd2_journal_remove_checkpoint(jh); - } + /* Remove written-back checkpointed metadata buffer */ + if (jh->b_cp_transaction != NULL) + jbd2_journal_try_remove_checkpoint(jh); spin_unlock(&journal->j_list_lock); -out: return; }
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 91a2cf4bc5756..c212da35a052c 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -1443,6 +1443,7 @@ extern void jbd2_journal_commit_transaction(journal_t *); void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy); unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan); int __jbd2_journal_remove_checkpoint(struct journal_head *); +int jbd2_journal_try_remove_checkpoint(struct journal_head *jh); void jbd2_journal_destroy_checkpoint(journal_t *journal); void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ziyang Xuan william.xuanziyang@huawei.com
[ Upstream commit ee8b94c8510ce64afe0b87ef548d23e00915fb10 ]
Got kmemleak errors with the following ltp can_filter testcase:
  for ((i=1; i<=100; i++))
  do
          ./can_filter &
          sleep 0.1
  done
==============================================================
[<00000000db4a4943>] can_rx_register+0x147/0x360 [can]
[<00000000a289549d>] raw_setsockopt+0x5ef/0x853 [can_raw]
[<000000006d3d9ebd>] __sys_setsockopt+0x173/0x2c0
[<00000000407dbfec>] __x64_sys_setsockopt+0x61/0x70
[<00000000fd468496>] do_syscall_64+0x33/0x40
[<00000000b7e47d51>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
It's a bug in the concurrent scenario of unregister_netdevice_many() and raw_release(), as follows:
cpu0                                    cpu1
unregister_netdevice_many(can_dev)
  unlist_netdevice(can_dev)
    // dev_get_by_index() return NULL after this
  net_set_todo(can_dev)
                                        raw_release(can_socket)
                                          dev = dev_get_by_index(, ro->ifindex);
                                          // dev == NULL
                                          if (dev) {
                                            // receivers in dev_rcv_lists not free
                                            // because dev is NULL
                                            raw_disable_allfilters(, dev, );
                                            dev_put(dev);
                                          }
                                          ...
                                          ro->bound = 0;
                                          ...
call_netdevice_notifiers(NETDEV_UNREGISTER, )
  raw_notify(, NETDEV_UNREGISTER, )
    if (ro->bound) // invalid because ro->bound has been set 0
      raw_disable_allfilters(, dev, );
    // receivers in dev_rcv_lists will never be freed
Add a net_device pointer member to struct raw_sock to record the bound can_dev, and use rtnl_lock to serialize the struct raw_sock members between raw_bind(), raw_release(), raw_setsockopt() and raw_notify(). Use ro->dev to decide whether the receivers in dev_rcv_lists need to be freed.
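Purely as a structural sketch (hypothetical userspace types, no real CAN, socket or rtnl code): caching the bound device pointer in the socket lets the release and notifier paths key off that pointer instead of a by-ifindex lookup, which can already fail once the device has been unlisted.

#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for struct net_device and struct raw_sock. */
struct fake_dev  { int ifindex; };
struct fake_sock { int bound; int ifindex; struct fake_dev *dev; };

static void free_filters(struct fake_dev *d)
{
        printf("freeing filters on %s\n", d ? "bound device" : "all devices");
}

static void sock_bind(struct fake_sock *s, struct fake_dev *d)
{
        s->ifindex = d->ifindex;
        s->dev = d;                     /* remember the bound device */
        s->bound = 1;
}

/* Unregister notifier: match on the cached pointer, not the ifindex. */
static void sock_notify_unregister(struct fake_sock *s, struct fake_dev *d)
{
        if (s->dev != d)
                return;
        free_filters(s->dev);
        s->bound = 0;
        s->ifindex = 0;
        s->dev = NULL;
}

static void sock_release(struct fake_sock *s)
{
        if (s->bound)
                free_filters(s->dev);   /* no by-ifindex lookup needed */
        s->bound = 0;
        s->dev = NULL;
}

int main(void)
{
        struct fake_dev dev = { .ifindex = 3 };
        struct fake_sock sk = { 0 };

        sock_bind(&sk, &dev);
        sock_notify_unregister(&sk, &dev);      /* filters freed exactly once */
        sock_release(&sk);                      /* nothing left to free here */
        return 0;
}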
Fixes: 8d0caedb7596 ("can: bcm/raw/isotp: use per module netdevice notifier")
Reviewed-by: Oliver Hartkopp socketcan@hartkopp.net
Acked-by: Oliver Hartkopp socketcan@hartkopp.net
Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com
Link: https://lore.kernel.org/all/20230711011737.1969582-1-william.xuanziyang@huaw...
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/can/raw.c | 57 ++++++++++++++++++++++-----------------------------
 1 file changed, 24 insertions(+), 33 deletions(-)
diff --git a/net/can/raw.c b/net/can/raw.c index f8e3866157a33..9fdad12d16325 100644 --- a/net/can/raw.c +++ b/net/can/raw.c @@ -84,6 +84,7 @@ struct raw_sock { struct sock sk; int bound; int ifindex; + struct net_device *dev; struct list_head notifier; int loopback; int recv_own_msgs; @@ -277,7 +278,7 @@ static void raw_notify(struct raw_sock *ro, unsigned long msg, if (!net_eq(dev_net(dev), sock_net(sk))) return;
- if (ro->ifindex != dev->ifindex) + if (ro->dev != dev) return;
switch (msg) { @@ -292,6 +293,7 @@ static void raw_notify(struct raw_sock *ro, unsigned long msg,
ro->ifindex = 0; ro->bound = 0; + ro->dev = NULL; ro->count = 0; release_sock(sk);
@@ -337,6 +339,7 @@ static int raw_init(struct sock *sk)
ro->bound = 0; ro->ifindex = 0; + ro->dev = NULL;
/* set default filter to single entry dfilter */ ro->dfilter.can_id = 0; @@ -385,19 +388,13 @@ static int raw_release(struct socket *sock)
lock_sock(sk);
+ rtnl_lock(); /* remove current filters & unregister */ if (ro->bound) { - if (ro->ifindex) { - struct net_device *dev; - - dev = dev_get_by_index(sock_net(sk), ro->ifindex); - if (dev) { - raw_disable_allfilters(dev_net(dev), dev, sk); - dev_put(dev); - } - } else { + if (ro->dev) + raw_disable_allfilters(dev_net(ro->dev), ro->dev, sk); + else raw_disable_allfilters(sock_net(sk), NULL, sk); - } }
if (ro->count > 1) @@ -405,8 +402,10 @@ static int raw_release(struct socket *sock)
ro->ifindex = 0; ro->bound = 0; + ro->dev = NULL; ro->count = 0; free_percpu(ro->uniq); + rtnl_unlock();
sock_orphan(sk); sock->sk = NULL; @@ -422,6 +421,7 @@ static int raw_bind(struct socket *sock, struct sockaddr *uaddr, int len) struct sockaddr_can *addr = (struct sockaddr_can *)uaddr; struct sock *sk = sock->sk; struct raw_sock *ro = raw_sk(sk); + struct net_device *dev = NULL; int ifindex; int err = 0; int notify_enetdown = 0; @@ -431,14 +431,13 @@ static int raw_bind(struct socket *sock, struct sockaddr *uaddr, int len) if (addr->can_family != AF_CAN) return -EINVAL;
+ rtnl_lock(); lock_sock(sk);
if (ro->bound && addr->can_ifindex == ro->ifindex) goto out;
if (addr->can_ifindex) { - struct net_device *dev; - dev = dev_get_by_index(sock_net(sk), addr->can_ifindex); if (!dev) { err = -ENODEV; @@ -467,26 +466,20 @@ static int raw_bind(struct socket *sock, struct sockaddr *uaddr, int len) if (!err) { if (ro->bound) { /* unregister old filters */ - if (ro->ifindex) { - struct net_device *dev; - - dev = dev_get_by_index(sock_net(sk), - ro->ifindex); - if (dev) { - raw_disable_allfilters(dev_net(dev), - dev, sk); - dev_put(dev); - } - } else { + if (ro->dev) + raw_disable_allfilters(dev_net(ro->dev), + ro->dev, sk); + else raw_disable_allfilters(sock_net(sk), NULL, sk); - } } ro->ifindex = ifindex; ro->bound = 1; + ro->dev = dev; }
out: release_sock(sk); + rtnl_unlock();
if (notify_enetdown) { sk->sk_err = ENETDOWN; @@ -553,9 +546,9 @@ static int raw_setsockopt(struct socket *sock, int level, int optname, rtnl_lock(); lock_sock(sk);
- if (ro->bound && ro->ifindex) { - dev = dev_get_by_index(sock_net(sk), ro->ifindex); - if (!dev) { + dev = ro->dev; + if (ro->bound && dev) { + if (dev->reg_state != NETREG_REGISTERED) { if (count > 1) kfree(filter); err = -ENODEV; @@ -596,7 +589,6 @@ static int raw_setsockopt(struct socket *sock, int level, int optname, ro->count = count;
out_fil: - dev_put(dev); release_sock(sk); rtnl_unlock();
@@ -614,9 +606,9 @@ static int raw_setsockopt(struct socket *sock, int level, int optname, rtnl_lock(); lock_sock(sk);
- if (ro->bound && ro->ifindex) { - dev = dev_get_by_index(sock_net(sk), ro->ifindex); - if (!dev) { + dev = ro->dev; + if (ro->bound && dev) { + if (dev->reg_state != NETREG_REGISTERED) { err = -ENODEV; goto out_err; } @@ -640,7 +632,6 @@ static int raw_setsockopt(struct socket *sock, int level, int optname, ro->err_mask = err_mask;
out_err: - dev_put(dev); release_sock(sk); rtnl_unlock();
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 11c9027c983e9e4b408ee5613b6504d24ebd85be ]
syzbot complained about a lockdep issue [1]
Since raw_bind() and raw_setsockopt() first get RTNL before locking the socket, we must adopt the same order in raw_release().
[1]
WARNING: possible circular locking dependency detected
6.5.0-rc1-syzkaller-00192-g78adb4bcf99e #0 Not tainted
------------------------------------------------------
syz-executor.0/14110 is trying to acquire lock:
ffff88804e4b6130 (sk_lock-AF_CAN){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1708 [inline]
ffff88804e4b6130 (sk_lock-AF_CAN){+.+.}-{0:0}, at: raw_bind+0xb1/0xab0 net/can/raw.c:435
but task is already holding lock:
ffffffff8e3df368 (rtnl_mutex){+.+.}-{3:3}, at: raw_bind+0xa7/0xab0 net/can/raw.c:434
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (rtnl_mutex){+.+.}-{3:3}:
       __mutex_lock_common kernel/locking/mutex.c:603 [inline]
       __mutex_lock+0x181/0x1340 kernel/locking/mutex.c:747
       raw_release+0x1c6/0x9b0 net/can/raw.c:391
       __sock_release+0xcd/0x290 net/socket.c:654
       sock_close+0x1c/0x20 net/socket.c:1386
       __fput+0x3fd/0xac0 fs/file_table.c:384
       task_work_run+0x14d/0x240 kernel/task_work.c:179
       resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0x210/0x240 kernel/entry/common.c:204
       __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
       syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297
       do_syscall_64+0x44/0xb0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
-> #0 (sk_lock-AF_CAN){+.+.}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3142 [inline]
       check_prevs_add kernel/locking/lockdep.c:3261 [inline]
       validate_chain kernel/locking/lockdep.c:3876 [inline]
       __lock_acquire+0x2e3d/0x5de0 kernel/locking/lockdep.c:5144
       lock_acquire kernel/locking/lockdep.c:5761 [inline]
       lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726
       lock_sock_nested+0x3a/0xf0 net/core/sock.c:3492
       lock_sock include/net/sock.h:1708 [inline]
       raw_bind+0xb1/0xab0 net/can/raw.c:435
       __sys_bind+0x1ec/0x220 net/socket.c:1792
       __do_sys_bind net/socket.c:1803 [inline]
       __se_sys_bind net/socket.c:1801 [inline]
       __x64_sys_bind+0x72/0xb0 net/socket.c:1801
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(rtnl_mutex);
                               lock(sk_lock-AF_CAN);
                               lock(rtnl_mutex);
  lock(sk_lock-AF_CAN);
*** DEADLOCK ***
1 lock held by syz-executor.0/14110:
stack backtrace:
CPU: 0 PID: 14110 Comm: syz-executor.0 Not tainted 6.5.0-rc1-syzkaller-00192-g78adb4bcf99e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
 check_noncircular+0x311/0x3f0 kernel/locking/lockdep.c:2195
 check_prev_add kernel/locking/lockdep.c:3142 [inline]
 check_prevs_add kernel/locking/lockdep.c:3261 [inline]
 validate_chain kernel/locking/lockdep.c:3876 [inline]
 __lock_acquire+0x2e3d/0x5de0 kernel/locking/lockdep.c:5144
 lock_acquire kernel/locking/lockdep.c:5761 [inline]
 lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726
 lock_sock_nested+0x3a/0xf0 net/core/sock.c:3492
 lock_sock include/net/sock.h:1708 [inline]
 raw_bind+0xb1/0xab0 net/can/raw.c:435
 __sys_bind+0x1ec/0x220 net/socket.c:1792
 __do_sys_bind net/socket.c:1803 [inline]
 __se_sys_bind net/socket.c:1801 [inline]
 __x64_sys_bind+0x72/0xb0 net/socket.c:1801
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fd89007cb29
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fd890d2a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 00007fd89019bf80 RCX: 00007fd89007cb29
RDX: 0000000000000010 RSI: 0000000020000040 RDI: 0000000000000003
RBP: 00007fd8900c847a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007fd89019bf80 R15: 00007ffebf8124f8
 </TASK>
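A minimal userspace sketch of the ordering rule the fix applies (hypothetical pthread mutexes standing in for rtnl_mutex and the socket lock): once every path takes the two locks in the same order, the circular dependency reported above cannot form.

#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-ins for rtnl_mutex and the per-socket lock. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t fake_sk   = PTHREAD_MUTEX_INITIALIZER;

/* bind/setsockopt already take rtnl first, then the socket lock. */
static void do_bind(void)
{
        pthread_mutex_lock(&fake_rtnl);
        pthread_mutex_lock(&fake_sk);
        puts("bind:    rtnl -> sk");
        pthread_mutex_unlock(&fake_sk);
        pthread_mutex_unlock(&fake_rtnl);
}

/* release is switched to the same order: rtnl before the socket lock. */
static void do_release(void)
{
        pthread_mutex_lock(&fake_rtnl);
        pthread_mutex_lock(&fake_sk);
        puts("release: rtnl -> sk");
        pthread_mutex_unlock(&fake_sk);
        pthread_mutex_unlock(&fake_rtnl);
}

int main(void)
{
        do_bind();
        do_release();
        return 0;
}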
Fixes: ee8b94c8510c ("can: raw: fix receiver memory leak")
Reported-by: syzbot syzkaller@googlegroups.com
Signed-off-by: Eric Dumazet edumazet@google.com
Cc: Ziyang Xuan william.xuanziyang@huawei.com
Cc: Oliver Hartkopp socketcan@hartkopp.net
Cc: stable@vger.kernel.org
Cc: Marc Kleine-Budde mkl@pengutronix.de
Link: https://lore.kernel.org/all/20230720114438.172434-1-edumazet@google.com
Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/can/raw.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/can/raw.c b/net/can/raw.c index 9fdad12d16325..9fbbf6e00287f 100644 --- a/net/can/raw.c +++ b/net/can/raw.c @@ -386,9 +386,9 @@ static int raw_release(struct socket *sock) list_del(&ro->notifier); spin_unlock(&raw_notifier_lock);
+ rtnl_lock(); lock_sock(sk);
- rtnl_lock(); /* remove current filters & unregister */ if (ro->bound) { if (ro->dev) @@ -405,12 +405,13 @@ static int raw_release(struct socket *sock) ro->dev = NULL; ro->count = 0; free_percpu(ro->uniq); - rtnl_unlock();
sock_orphan(sk); sock->sk = NULL;
release_sock(sk); + rtnl_unlock(); + sock_put(sk);
return 0;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit 609a1bcd7bebac90a1b443e9fed47fd48dac5799 ]
When the code to use the PTP HW clock was added, the Kconfig entry was not updated with the new PTP dependency, leading to build errors. Update the Kconfig entry to depend on PTP_1588_CLOCK_OPTIONAL.
aarch64-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/ptp.o: in function `iwl_mvm_ptp_init':
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:294: undefined reference to `ptp_clock_register'
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:294:(.text+0xce8): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ptp_clock_register'
aarch64-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:301: undefined reference to `ptp_clock_index'
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:301:(.text+0xd18): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ptp_clock_index'
aarch64-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/ptp.o: in function `iwl_mvm_ptp_remove':
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:315: undefined reference to `ptp_clock_index'
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:315:(.text+0xe80): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ptp_clock_index'
aarch64-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:319: undefined reference to `ptp_clock_unregister'
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:319:(.text+0xeac): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ptp_clock_unregister'
Fixes: 1595ecce1cf3 ("wifi: iwlwifi: mvm: add support for PTP HW clock (PHC)")
Signed-off-by: Randy Dunlap rdunlap@infradead.org
Reported-by: kernel test robot lkp@intel.com
Link: https://lore.kernel.org/all/202308110447.4QSJHmFH-lkp@intel.com/
Cc: Krishnanand Prabhu krishnanand.prabhu@intel.com
Cc: Luca Coelho luciano.coelho@intel.com
Cc: Gregory Greenman gregory.greenman@intel.com
Cc: Johannes Berg johannes.berg@intel.com
Cc: Kalle Valo kvalo@kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: "David S. Miller" davem@davemloft.net
Cc: Eric Dumazet edumazet@google.com
Cc: Jakub Kicinski kuba@kernel.org
Cc: Paolo Abeni pabeni@redhat.com
Cc: netdev@vger.kernel.org
Reviewed-by: Simon Horman horms@kernel.org
Tested-by: Simon Horman horms@kernel.org # build-tested
Acked-by: Richard Cochran richardcochran@gmail.com
Acked-by: Gregory Greenman gregory.greenman@intel.com
Link: https://lore.kernel.org/r/20230812052947.22913-1-rdunlap@infradead.org
Signed-off-by: Johannes Berg johannes.berg@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/wireless/intel/iwlwifi/Kconfig | 1 +
 1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/intel/iwlwifi/Kconfig b/drivers/net/wireless/intel/iwlwifi/Kconfig index b20409f8c13ab..20971304fdef4 100644 --- a/drivers/net/wireless/intel/iwlwifi/Kconfig +++ b/drivers/net/wireless/intel/iwlwifi/Kconfig @@ -66,6 +66,7 @@ config IWLMVM tristate "Intel Wireless WiFi MVM Firmware support" select WANT_DEV_COREDUMP depends on MAC80211 + depends on PTP_1588_CLOCK_OPTIONAL help This is the driver that supports the MVM firmware. The list of the devices that use this firmware is available here:
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zheng Yejian zhengyejian1@huawei.com
[ Upstream commit b71645d6af10196c46cbe3732de2ea7d36b3ff6d ]
Trace ring buffer can no longer record anything after executing the following commands at the shell prompt:
 # cd /sys/kernel/tracing
 # cat tracing_cpumask
 fff
 # echo 0 > tracing_cpumask
 # echo 1 > snapshot
 # echo fff > tracing_cpumask
 # echo 1 > tracing_on
 # echo "hello world" > trace_marker
 -bash: echo: write error: Bad file descriptor
The root cause is:
 1. After `echo 0 > tracing_cpumask`, 'record_disabled' of the cpu buffers in 'tr->array_buffer.buffer' became 1 (see tracing_set_cpumask());
 2. After `echo 1 > snapshot`, 'tr->array_buffer.buffer' was swapped with 'tr->max_buffer.buffer', so its 'record_disabled' became 0 (see update_max_tr());
 3. After `echo fff > tracing_cpumask`, 'record_disabled' became -1.
Both array_buffer and max_buffer are then unavailable because their 'record_disabled' value is not 0.
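As a plain user-space illustration (made-up variables, not the kernel code), the counters end up like this after the three steps, once the snapshot swap carries the count along:

  #include <stdio.h>

  int main(void)
  {
          int array_buf = 0, max_buf = 0, tmp;

          array_buf++;            /* echo 0 > tracing_cpumask: disable array_buffer */

          tmp = array_buf;        /* echo 1 > snapshot: the two buffers are swapped */
          array_buf = max_buf;
          max_buf = tmp;

          array_buf--;            /* echo fff > tracing_cpumask: only array_buffer is re-enabled */

          /* array_buffer=-1 max_buffer=1: both non-zero, so neither buffer records */
          printf("array_buffer=%d max_buffer=%d\n", array_buf, max_buf);
          return 0;
  }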
To fix it, enable or disable both array_buffer and max_buffer at the same time in tracing_set_cpumask().
Link: https://lkml.kernel.org/r/20230805033816.3284594-2-zhengyejian1@huawei.com
Cc: mhiramat@kernel.org Cc: vnagarnaik@google.com Cc: shuah@kernel.org Fixes: 71babb2705e2 ("tracing: change CPU ring buffer state from tracing_cpumask") Signed-off-by: Zheng Yejian zhengyejian1@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/trace.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index fd051f85efd4b..17663ce4936a4 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -5260,11 +5260,17 @@ int tracing_set_cpumask(struct trace_array *tr, !cpumask_test_cpu(cpu, tracing_cpumask_new)) { atomic_inc(&per_cpu_ptr(tr->array_buffer.data, cpu)->disabled); ring_buffer_record_disable_cpu(tr->array_buffer.buffer, cpu); +#ifdef CONFIG_TRACER_MAX_TRACE + ring_buffer_record_disable_cpu(tr->max_buffer.buffer, cpu); +#endif } if (!cpumask_test_cpu(cpu, tr->tracing_cpumask) && cpumask_test_cpu(cpu, tracing_cpumask_new)) { atomic_dec(&per_cpu_ptr(tr->array_buffer.data, cpu)->disabled); ring_buffer_record_enable_cpu(tr->array_buffer.buffer, cpu); +#ifdef CONFIG_TRACER_MAX_TRACE + ring_buffer_record_enable_cpu(tr->max_buffer.buffer, cpu); +#endif } } arch_spin_unlock(&tr->max_lock);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sven Schnelle svens@linux.ibm.com
[ Upstream commit ddeea494a16f32522bce16ee65f191d05d4b8282 ]
The current code uses a lot of casts to access the fields member in struct synth_trace_event with different sizes. This makes the code hard to read and has already introduced an endianness bug. Use a union and a struct instead.
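A minimal user-space sketch of the hazard the union removes; it deliberately mirrors the old narrow-cast style:

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          uint64_t slot = 0;

          /* Old style: write through a narrower cast. Which half of the
           * 8-byte slot this touches depends on CPU endianness, so a reader
           * that uses a different width sees different values on LE and BE.
           */
          *(uint32_t *)&slot = 0xabcd;

          /* Prints 0xabcd on little-endian, 0xabcd00000000 on big-endian.
           * Typed union members make writer and reader name the same field,
           * so this kind of mismatch cannot creep in silently.
           */
          printf("slot read back as u64: %#llx\n", (unsigned long long)slot);
          return 0;
  }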
Link: https://lkml.kernel.org/r/20230816154928.4171614-2-svens@linux.ibm.com
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Fixes: 00cf3d672a9dd ("tracing: Allow synthetic events to pass around stacktraces") Signed-off-by: Sven Schnelle svens@linux.ibm.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Stable-dep-of: 887f92e09ef3 ("tracing/synthetic: Skip first entry for stack traces") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/trace_events.h | 11 ++++ kernel/trace/trace.h | 8 +++ kernel/trace/trace_events_synth.c | 87 +++++++++++++------------------ 3 files changed, 56 insertions(+), 50 deletions(-)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 7c4a0b72334eb..c55fc453e33b5 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -59,6 +59,17 @@ int trace_raw_output_prep(struct trace_iterator *iter, extern __printf(2, 3) void trace_event_printf(struct trace_iterator *iter, const char *fmt, ...);
+/* Used to find the offset and length of dynamic fields in trace events */ +struct trace_dynamic_info { +#ifdef CONFIG_CPU_BIG_ENDIAN + u16 offset; + u16 len; +#else + u16 len; + u16 offset; +#endif +}; + /* * The trace entry - the most basic unit of tracing. This is what * is printed in the end as a single line in the trace output, such as: diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index eee1f3ca47494..2daeac8e690a6 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1282,6 +1282,14 @@ static inline void trace_branch_disable(void) /* set ring buffers to default size if not already done so */ int tracing_update_buffers(void);
+union trace_synth_field { + u8 as_u8; + u16 as_u16; + u32 as_u32; + u64 as_u64; + struct trace_dynamic_info as_dynamic; +}; + struct ftrace_event_field { struct list_head link; const char *name; diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c index d6a70aff24101..da0627fa91caf 100644 --- a/kernel/trace/trace_events_synth.c +++ b/kernel/trace/trace_events_synth.c @@ -127,7 +127,7 @@ static bool synth_event_match(const char *system, const char *event,
struct synth_trace_event { struct trace_entry ent; - u64 fields[]; + union trace_synth_field fields[]; };
static int synth_event_define_fields(struct trace_event_call *call) @@ -321,19 +321,19 @@ static const char *synth_field_fmt(char *type)
static void print_synth_event_num_val(struct trace_seq *s, char *print_fmt, char *name, - int size, u64 val, char *space) + int size, union trace_synth_field *val, char *space) { switch (size) { case 1: - trace_seq_printf(s, print_fmt, name, (u8)val, space); + trace_seq_printf(s, print_fmt, name, val->as_u8, space); break;
case 2: - trace_seq_printf(s, print_fmt, name, (u16)val, space); + trace_seq_printf(s, print_fmt, name, val->as_u16, space); break;
case 4: - trace_seq_printf(s, print_fmt, name, (u32)val, space); + trace_seq_printf(s, print_fmt, name, val->as_u32, space); break;
default: @@ -374,36 +374,26 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter, /* parameter values */ if (se->fields[i]->is_string) { if (se->fields[i]->is_dynamic) { - u32 offset, data_offset; - char *str_field; - - offset = (u32)entry->fields[n_u64]; - data_offset = offset & 0xffff; - - str_field = (char *)entry + data_offset; + union trace_synth_field *data = &entry->fields[n_u64];
trace_seq_printf(s, print_fmt, se->fields[i]->name, STR_VAR_LEN_MAX, - str_field, + (char *)entry + data->as_dynamic.offset, i == se->n_fields - 1 ? "" : " "); n_u64++; } else { trace_seq_printf(s, print_fmt, se->fields[i]->name, STR_VAR_LEN_MAX, - (char *)&entry->fields[n_u64], + (char *)&entry->fields[n_u64].as_u64, i == se->n_fields - 1 ? "" : " "); n_u64 += STR_VAR_LEN_MAX / sizeof(u64); } } else if (se->fields[i]->is_stack) { - u32 offset, data_offset, len; unsigned long *p, *end; + union trace_synth_field *data = &entry->fields[n_u64];
- offset = (u32)entry->fields[n_u64]; - data_offset = offset & 0xffff; - len = offset >> 16; - - p = (void *)entry + data_offset; - end = (void *)p + len - (sizeof(long) - 1); + p = (void *)entry + data->as_dynamic.offset; + end = (void *)p + data->as_dynamic.len - (sizeof(long) - 1);
trace_seq_printf(s, "%s=STACK:\n", se->fields[i]->name);
@@ -419,13 +409,13 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter, print_synth_event_num_val(s, print_fmt, se->fields[i]->name, se->fields[i]->size, - entry->fields[n_u64], + &entry->fields[n_u64], space);
if (strcmp(se->fields[i]->type, "gfp_t") == 0) { trace_seq_puts(s, " ("); trace_print_flags_seq(s, "|", - entry->fields[n_u64], + entry->fields[n_u64].as_u64, __flags); trace_seq_putc(s, ')'); } @@ -454,21 +444,16 @@ static unsigned int trace_string(struct synth_trace_event *entry, int ret;
if (is_dynamic) { - u32 data_offset; + union trace_synth_field *data = &entry->fields[*n_u64];
- data_offset = struct_size(entry, fields, event->n_u64); - data_offset += data_size; - - len = fetch_store_strlen((unsigned long)str_val); - - data_offset |= len << 16; - *(u32 *)&entry->fields[*n_u64] = data_offset; + data->as_dynamic.offset = struct_size(entry, fields, event->n_u64) + data_size; + data->as_dynamic.len = fetch_store_strlen((unsigned long)str_val);
ret = fetch_store_string((unsigned long)str_val, &entry->fields[*n_u64], entry);
(*n_u64)++; } else { - str_field = (char *)&entry->fields[*n_u64]; + str_field = (char *)&entry->fields[*n_u64].as_u64;
#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE if ((unsigned long)str_val < TASK_SIZE) @@ -492,6 +477,7 @@ static unsigned int trace_stack(struct synth_trace_event *entry, unsigned int data_size, unsigned int *n_u64) { + union trace_synth_field *data = &entry->fields[*n_u64]; unsigned int len; u32 data_offset; void *data_loc; @@ -515,8 +501,9 @@ static unsigned int trace_stack(struct synth_trace_event *entry, memcpy(data_loc, stack, len);
/* Fill in the field that holds the offset/len combo */ - data_offset |= len << 16; - *(u32 *)&entry->fields[*n_u64] = data_offset; + + data->as_dynamic.offset = data_offset; + data->as_dynamic.len = len;
(*n_u64)++;
@@ -592,19 +579,19 @@ static notrace void trace_event_raw_event_synth(void *__data,
switch (field->size) { case 1: - *(u8 *)&entry->fields[n_u64] = (u8)val; + entry->fields[n_u64].as_u8 = (u8)val; break;
case 2: - *(u16 *)&entry->fields[n_u64] = (u16)val; + entry->fields[n_u64].as_u16 = (u16)val; break;
case 4: - *(u32 *)&entry->fields[n_u64] = (u32)val; + entry->fields[n_u64].as_u32 = (u32)val; break;
default: - entry->fields[n_u64] = val; + entry->fields[n_u64].as_u64 = val; break; } n_u64++; @@ -1790,19 +1777,19 @@ int synth_event_trace(struct trace_event_file *file, unsigned int n_vals, ...)
switch (field->size) { case 1: - *(u8 *)&state.entry->fields[n_u64] = (u8)val; + state.entry->fields[n_u64].as_u8 = (u8)val; break;
case 2: - *(u16 *)&state.entry->fields[n_u64] = (u16)val; + state.entry->fields[n_u64].as_u16 = (u16)val; break;
case 4: - *(u32 *)&state.entry->fields[n_u64] = (u32)val; + state.entry->fields[n_u64].as_u32 = (u32)val; break;
default: - state.entry->fields[n_u64] = val; + state.entry->fields[n_u64].as_u64 = val; break; } n_u64++; @@ -1883,19 +1870,19 @@ int synth_event_trace_array(struct trace_event_file *file, u64 *vals,
switch (field->size) { case 1: - *(u8 *)&state.entry->fields[n_u64] = (u8)val; + state.entry->fields[n_u64].as_u8 = (u8)val; break;
case 2: - *(u16 *)&state.entry->fields[n_u64] = (u16)val; + state.entry->fields[n_u64].as_u16 = (u16)val; break;
case 4: - *(u32 *)&state.entry->fields[n_u64] = (u32)val; + state.entry->fields[n_u64].as_u32 = (u32)val; break;
default: - state.entry->fields[n_u64] = val; + state.entry->fields[n_u64].as_u64 = val; break; } n_u64++; @@ -2030,19 +2017,19 @@ static int __synth_event_add_val(const char *field_name, u64 val, } else { switch (field->size) { case 1: - *(u8 *)&trace_state->entry->fields[field->offset] = (u8)val; + trace_state->entry->fields[field->offset].as_u8 = (u8)val; break;
case 2: - *(u16 *)&trace_state->entry->fields[field->offset] = (u16)val; + trace_state->entry->fields[field->offset].as_u16 = (u16)val; break;
case 4: - *(u32 *)&trace_state->entry->fields[field->offset] = (u32)val; + trace_state->entry->fields[field->offset].as_u32 = (u32)val; break;
default: - trace_state->entry->fields[field->offset] = val; + trace_state->entry->fields[field->offset].as_u64 = val; break; } }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sven Schnelle svens@linux.ibm.com
[ Upstream commit 887f92e09ef34a949745ad26ce82be69e2dabcf6 ]
While debugging another issue I noticed that the stack trace output contains the number of entries on top:
<idle>-0 [000] d..4. 203.322502: wake_lat: pid=0 delta=2268270616 stack=STACK: => 0x10 => __schedule+0xac6/0x1a98 => schedule+0x126/0x2c0 => schedule_timeout+0x242/0x2c0 => __wait_for_common+0x434/0x680 => __wait_rcu_gp+0x198/0x3e0 => synchronize_rcu+0x112/0x138 => ring_buffer_reset_online_cpus+0x140/0x2e0 => tracing_reset_online_cpus+0x15c/0x1d0 => tracing_set_clock+0x180/0x1d8 => hist_register_trigger+0x486/0x670 => event_hist_trigger_parse+0x494/0x1318 => trigger_process_regex+0x1d4/0x258 => event_trigger_write+0xb4/0x170 => vfs_write+0x210/0xad0 => ksys_write+0x122/0x208
Fix this by skipping the first element. Also replace the pointer logic with an index variable which is easier to read.
Link: https://lkml.kernel.org/r/20230816154928.4171614-3-svens@linux.ibm.com
Cc: Masami Hiramatsu mhiramat@kernel.org Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces") Signed-off-by: Sven Schnelle svens@linux.ibm.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/trace_events_synth.c | 17 ++++------------- 1 file changed, 4 insertions(+), 13 deletions(-)
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c index da0627fa91caf..a4c58a932dfa6 100644 --- a/kernel/trace/trace_events_synth.c +++ b/kernel/trace/trace_events_synth.c @@ -350,7 +350,7 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter, struct trace_seq *s = &iter->seq; struct synth_trace_event *entry; struct synth_event *se; - unsigned int i, n_u64; + unsigned int i, j, n_u64; char print_fmt[32]; const char *fmt;
@@ -389,18 +389,13 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter, n_u64 += STR_VAR_LEN_MAX / sizeof(u64); } } else if (se->fields[i]->is_stack) { - unsigned long *p, *end; union trace_synth_field *data = &entry->fields[n_u64]; - - p = (void *)entry + data->as_dynamic.offset; - end = (void *)p + data->as_dynamic.len - (sizeof(long) - 1); + unsigned long *p = (void *)entry + data->as_dynamic.offset;
trace_seq_printf(s, "%s=STACK:\n", se->fields[i]->name); - - for (; *p && p < end; p++) - trace_seq_printf(s, "=> %pS\n", (void *)*p); + for (j = 1; j < data->as_dynamic.len / sizeof(long); j++) + trace_seq_printf(s, "=> %pS\n", (void *)p[j]); n_u64++; - } else { struct trace_print_flags __flags[] = { __def_gfpflag_names, {-1, NULL} }; @@ -490,10 +485,6 @@ static unsigned int trace_stack(struct synth_trace_event *entry, break; }
- /* Include the zero'd element if it fits */ - if (len < HIST_STACKTRACE_DEPTH) - len++; - len *= sizeof(long);
/* Find the dynamic section to copy the stack into. */
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sven Schnelle svens@linux.ibm.com
[ Upstream commit c4d6b5438116c184027b2e911c0f2c7c406fb47c ]
While debugging another issue I noticed that the stack trace contains one invalid entry at the end:
<idle>-0 [008] d..4. 26.484201: wake_lat: pid=0 delta=2629976084 000000009cc24024 stack=STACK: => __schedule+0xac6/0x1a98 => schedule+0x126/0x2c0 => schedule_timeout+0x150/0x2c0 => kcompactd+0x9ca/0xc20 => kthread+0x2f6/0x3d8 => __ret_from_fork+0x8a/0xe8 => 0x6b6b6b6b6b6b6b6b
This is because the code failed to add the one element containing the number of entries to field_size.
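A small user-space illustration of the length arithmetic (made-up frame values): the blob starts with one count word, so the byte size must cover n + 1 longs, otherwise the last real frame is cut off and the printer walks into whatever follows.

  #include <stdio.h>

  int main(void)
  {
          /* count word followed by six frame addresses */
          unsigned long stack[] = { 6, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6 };
          unsigned long n = stack[0];

          printf("without the count word: %lu bytes\n", n * sizeof(unsigned long));        /* 48: last frame lost */
          printf("with the count word:    %lu bytes\n", (n + 1) * sizeof(unsigned long));  /* 56: whole blob copied */
          return 0;
  }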
Link: https://lkml.kernel.org/r/20230816154928.4171614-4-svens@linux.ibm.com
Cc: Masami Hiramatsu mhiramat@kernel.org Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces") Signed-off-by: Sven Schnelle svens@linux.ibm.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/trace_events_synth.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c index a4c58a932dfa6..32109d092b10f 100644 --- a/kernel/trace/trace_events_synth.c +++ b/kernel/trace/trace_events_synth.c @@ -528,7 +528,8 @@ static notrace void trace_event_raw_event_synth(void *__data, str_val = (char *)(long)var_ref_vals[val_idx];
if (event->dynamic_fields[i]->is_stack) { - len = *((unsigned long *)str_val); + /* reserve one extra element for size */ + len = *((unsigned long *)str_val) + 1; len *= sizeof(unsigned long); } else { len = fetch_store_strlen((unsigned long)str_val);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zheng Yejian zhengyejian1@huawei.com
[ Upstream commit eecb91b9f98d6427d4af5fdb8f108f52572a39e7 ]
Kmemleak reports a leak in graph_trace_open():
unreferenced object 0xffff0040b95f4a00 (size 128): comm "cat", pid 204981, jiffies 4301155872 (age 99771.964s) hex dump (first 32 bytes): e0 05 e7 b4 ab 7d 00 00 0b 00 01 00 00 00 00 00 .....}.......... f4 00 01 10 00 a0 ff ff 00 00 00 00 65 00 10 00 ............e... backtrace: [<000000005db27c8b>] kmem_cache_alloc_trace+0x348/0x5f0 [<000000007df90faa>] graph_trace_open+0xb0/0x344 [<00000000737524cd>] __tracing_open+0x450/0xb10 [<0000000098043327>] tracing_open+0x1a0/0x2a0 [<00000000291c3876>] do_dentry_open+0x3c0/0xdc0 [<000000004015bcd6>] vfs_open+0x98/0xd0 [<000000002b5f60c9>] do_open+0x520/0x8d0 [<00000000376c7820>] path_openat+0x1c0/0x3e0 [<00000000336a54b5>] do_filp_open+0x14c/0x324 [<000000002802df13>] do_sys_openat2+0x2c4/0x530 [<0000000094eea458>] __arm64_sys_openat+0x130/0x1c4 [<00000000a71d7881>] el0_svc_common.constprop.0+0xfc/0x394 [<00000000313647bf>] do_el0_svc+0xac/0xec [<000000002ef1c651>] el0_svc+0x20/0x30 [<000000002fd4692a>] el0_sync_handler+0xb0/0xb4 [<000000000c309c35>] el0_sync+0x160/0x180
The root cause is described as follows:
  __tracing_open() {                     // 1. File 'trace' is being opened;
    ...
    *iter->trace = *tr->current_trace;   // 2. Tracer 'function_graph' is
                                         //    currently set;
    ...
    iter->trace->open(iter);             // 3. Call graph_trace_open() here,
                                         //    and memory are allocated in it;
    ...
  }

  s_start() {                            // 4. The opened file is being read;
    ...
    *iter->trace = *tr->current_trace;   // 5. If tracer is switched to
                                         //    'nop' or others, then memory
                                         //    in step 3 are leaked!!!
    ...
  }
To fix it, in s_start(), close the current tracer before switching and reopen the new tracer after switching. Some tracers, like 'wakeup', may not update 'iter->private' when reopened, so it should be cleared to avoid being mistakenly closed again.
Link: https://lore.kernel.org/linux-trace-kernel/20230817125539.1646321-1-zhengyej...
Fixes: d7350c3f4569 ("tracing/core: make the read callbacks reentrants") Signed-off-by: Zheng Yejian zhengyejian1@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/trace.c | 9 ++++++++- kernel/trace/trace_irqsoff.c | 3 ++- kernel/trace/trace_sched_wakeup.c | 2 ++ 3 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 17663ce4936a4..f4855be6ac2b5 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4196,8 +4196,15 @@ static void *s_start(struct seq_file *m, loff_t *pos) * will point to the same string as current_trace->name. */ mutex_lock(&trace_types_lock); - if (unlikely(tr->current_trace && iter->trace->name != tr->current_trace->name)) + if (unlikely(tr->current_trace && iter->trace->name != tr->current_trace->name)) { + /* Close iter->trace before switching to the new current tracer */ + if (iter->trace->close) + iter->trace->close(iter); *iter->trace = *tr->current_trace; + /* Reopen the new current tracer */ + if (iter->trace->open) + iter->trace->open(iter); + } mutex_unlock(&trace_types_lock);
#ifdef CONFIG_TRACER_MAX_TRACE diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index 590b3d51afae9..ba37f768e2f27 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -231,7 +231,8 @@ static void irqsoff_trace_open(struct trace_iterator *iter) { if (is_graph(iter->tr)) graph_trace_open(iter); - + else + iter->private = NULL; }
static void irqsoff_trace_close(struct trace_iterator *iter) diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c index 330aee1c1a49e..0469a04a355f2 100644 --- a/kernel/trace/trace_sched_wakeup.c +++ b/kernel/trace/trace_sched_wakeup.c @@ -168,6 +168,8 @@ static void wakeup_trace_open(struct trace_iterator *iter) { if (is_graph(iter->tr)) graph_trace_open(iter); + else + iter->private = NULL; }
static void wakeup_trace_close(struct trace_iterator *iter)
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hariprasad Kelam hkelam@marvell.com
[ Upstream commit 05f3d5bc23524bed6f043dfe6b44da687584f9fb ]
On SDP interfaces, frame oversize and undersize errors are observed because the driver does not take the packet sizes of all subscribers of the link into account before updating the link config.

This patch fixes that.
Fixes: 9b7dd87ac071 ("octeontx2-af: Support to modify min/max allowed packet lengths") Signed-off-by: Hariprasad Kelam hkelam@marvell.com Signed-off-by: Sunil Goutham sgoutham@marvell.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Link: https://lore.kernel.org/r/20230817063006.10366-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c index 8cdf91a5bf44f..49c1dbe5ec788 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c @@ -4016,9 +4016,10 @@ int rvu_mbox_handler_nix_set_hw_frs(struct rvu *rvu, struct nix_frs_cfg *req, if (link < 0) return NIX_AF_ERR_RX_LINK_INVALID;
- nix_find_link_frs(rvu, req, pcifunc);
linkcfg: + nix_find_link_frs(rvu, req, pcifunc); + cfg = rvu_read64(rvu, blkaddr, NIX_AF_RX_LINKX_CFG(link)); cfg = (cfg & ~(0xFFFFULL << 16)) | ((u64)req->maxlen << 16); if (req->update_minlen)
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Pirko jiri@nvidia.com
[ Upstream commit 2ebbc9752d06bb1d01201fe632cb6da033b0248d ]
The commit cited in the Fixes tag introduced linecard notifications for register, but did not add them for unregister. Fix that by adding them.
Fixes: c246f9b5fd61 ("devlink: add support to create line card and expose to user") Signed-off-by: Jiri Pirko jiri@nvidia.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20230817125240.2144794-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/devlink/leftover.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 790e61b2a9404..6ef6090eeffe5 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -6739,6 +6739,7 @@ void devlink_notify_unregister(struct devlink *devlink) struct devlink_param_item *param_item; struct devlink_trap_item *trap_item; struct devlink_port *devlink_port; + struct devlink_linecard *linecard; struct devlink_rate *rate_node; struct devlink_region *region; unsigned long port_index; @@ -6767,6 +6768,8 @@ void devlink_notify_unregister(struct devlink *devlink)
xa_for_each(&devlink->ports, port_index, devlink_port) devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_DEL); + list_for_each_entry_reverse(linecard, &devlink->linecard_list, list) + devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_DEL); devlink_notify(devlink, DEVLINK_CMD_DEL); }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit d44036cad31170da0cb9c728e80743f84267da6e ]
The blamed commit resolved a bug where frames would still get stuck at egress, even though they're smaller than the maxSDU[tc], because the driver did not take into account the extra 33 ns that the queue system needs for scheduling the frame.
It now takes that into account, but the arithmetic that we perform in vsc9959_tas_remaining_gate_len_ps() is buggy, because we operate on 64-bit unsigned integers, so gate_len_ns - VSC9959_TAS_MIN_GATE_LEN_NS may become a very large integer if gate_len_ns < 33 ns.
In practice, this means that we've introduced a regression where all traffic class gates which are permanently closed will not get detected by the driver, and we won't enable oversize frame dropping for them.
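A standalone user-space example of the wrap-around (VSC9959_TAS_MIN_GATE_LEN_NS replaced by a local define):

  #include <stdint.h>
  #include <stdio.h>
  #include <inttypes.h>

  #define MIN_GATE_LEN_NS 33      /* stands in for VSC9959_TAS_MIN_GATE_LEN_NS */
  #define PSEC_PER_NSEC   1000ULL

  int main(void)
  {
          uint64_t gate_len_ns = 0;       /* permanently closed gate */
          uint64_t remaining;

          /* buggy: unsigned subtraction wraps to an enormous value */
          remaining = (gate_len_ns - MIN_GATE_LEN_NS) * PSEC_PER_NSEC;
          printf("buggy remaining gate length: %" PRIu64 " ps\n", remaining);

          /* fixed: treat anything below the minimum as zero remaining time */
          remaining = gate_len_ns < MIN_GATE_LEN_NS ?
                      0 : (gate_len_ns - MIN_GATE_LEN_NS) * PSEC_PER_NSEC;
          printf("fixed remaining gate length: %" PRIu64 " ps\n", remaining);
          return 0;
  }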
Before: mscc_felix 0000:00:00.5: port 0: max frame size 1526 needs 12400000 ps, 1152000 ps for mPackets at speed 1000 mscc_felix 0000:00:00.5: port 0 tc 0 min gate len 1000000, sending all frames mscc_felix 0000:00:00.5: port 0 tc 1 min gate len 0, sending all frames mscc_felix 0000:00:00.5: port 0 tc 2 min gate len 0, sending all frames mscc_felix 0000:00:00.5: port 0 tc 3 min gate len 0, sending all frames mscc_felix 0000:00:00.5: port 0 tc 4 min gate len 0, sending all frames mscc_felix 0000:00:00.5: port 0 tc 5 min gate len 0, sending all frames mscc_felix 0000:00:00.5: port 0 tc 6 min gate len 0, sending all frames mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 5120 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 615 octets including FCS
After: mscc_felix 0000:00:00.5: port 0: max frame size 1526 needs 12400000 ps, 1152000 ps for mPackets at speed 1000 mscc_felix 0000:00:00.5: port 0 tc 0 min gate len 1000000, sending all frames mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 5120 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 615 octets including FCS
Fixes: 11afdc6526de ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20230817120111.3522827-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/ocelot/felix_vsc9959.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/dsa/ocelot/felix_vsc9959.c b/drivers/net/dsa/ocelot/felix_vsc9959.c index ca69973ae91b9..b1ecd08cec96a 100644 --- a/drivers/net/dsa/ocelot/felix_vsc9959.c +++ b/drivers/net/dsa/ocelot/felix_vsc9959.c @@ -1081,6 +1081,9 @@ static u64 vsc9959_tas_remaining_gate_len_ps(u64 gate_len_ns) if (gate_len_ns == U64_MAX) return U64_MAX;
+ if (gate_len_ns < VSC9959_TAS_MIN_GATE_LEN_NS) + return 0; + return (gate_len_ns - VSC9959_TAS_MIN_GATE_LEN_NS) * PSEC_PER_NSEC; }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 76f33296d2e09f63118db78125c95ef56df438e9 ]
*prot->memory_pressure is read and written locklessly, so we need to add proper annotations.

A recent commit added a new race; it is time to audit all accesses.
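As a generic illustration of the annotation pattern (the variable name here is made up; the macros are the kernel's READ_ONCE()/WRITE_ONCE()):

  static unsigned long memory_pressure_flag;    /* stand-in for *prot->memory_pressure */

  static void enter_memory_pressure(void)
  {
          WRITE_ONCE(memory_pressure_flag, 1);            /* lockless writer: no store tearing/fusing */
  }

  static bool under_memory_pressure(void)
  {
          return READ_ONCE(memory_pressure_flag) != 0;    /* lockless reader: single, non-torn load */
  }

The annotations also document for tools like KCSAN that the race on this flag is intentional and benign.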
Fixes: 2d0c88e84e48 ("sock: Fix misuse of sk_under_memory_pressure()") Fixes: 4d93df0abd50 ("[SCTP]: Rewrite of sctp buffer management code") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Abel Wu wuyun.abel@bytedance.com Reviewed-by: Shakeel Butt shakeelb@google.com Link: https://lore.kernel.org/r/20230818015132.2699348-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/sock.h | 7 ++++--- net/sctp/socket.c | 2 +- 2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h index 415f3840a26aa..d0d796d51a504 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1324,6 +1324,7 @@ struct proto { /* * Pressure flag: try to collapse. * Technical note: it is used by multiple contexts non atomically. + * Make sure to use READ_ONCE()/WRITE_ONCE() for all reads/writes. * All the __sk_mem_schedule() is of this nature: accounting * is strict, actions are advisory and have some latency. */ @@ -1424,7 +1425,7 @@ static inline bool sk_has_memory_pressure(const struct sock *sk) static inline bool sk_under_global_memory_pressure(const struct sock *sk) { return sk->sk_prot->memory_pressure && - !!*sk->sk_prot->memory_pressure; + !!READ_ONCE(*sk->sk_prot->memory_pressure); }
static inline bool sk_under_memory_pressure(const struct sock *sk) @@ -1436,7 +1437,7 @@ static inline bool sk_under_memory_pressure(const struct sock *sk) mem_cgroup_under_socket_pressure(sk->sk_memcg)) return true;
- return !!*sk->sk_prot->memory_pressure; + return !!READ_ONCE(*sk->sk_prot->memory_pressure); }
static inline long @@ -1513,7 +1514,7 @@ proto_memory_pressure(struct proto *prot) { if (!prot->memory_pressure) return false; - return !!*prot->memory_pressure; + return !!READ_ONCE(*prot->memory_pressure); }
diff --git a/net/sctp/socket.c b/net/sctp/socket.c index ee15eff6364ee..de52045774303 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -99,7 +99,7 @@ struct percpu_counter sctp_sockets_allocated;
static void sctp_enter_memory_pressure(struct sock *sk) { - sctp_memory_pressure = 1; + WRITE_ONCE(sctp_memory_pressure, 1); }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit cba3f1786916063261e3e5ccbb803abc325b24ef ]
We changed tcp_poll() over time, but never updated dccp.

Note that we could also remove dccp instead of maintaining it.
Fixes: 7c657876b63c ("[DCCP]: Initial implementation") Signed-off-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20230818015820.2701595-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/dccp/proto.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 18873f2308ec8..f3494cb5fab04 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -315,11 +315,15 @@ EXPORT_SYMBOL_GPL(dccp_disconnect); __poll_t dccp_poll(struct file *file, struct socket *sock, poll_table *wait) { - __poll_t mask; struct sock *sk = sock->sk; + __poll_t mask; + u8 shutdown; + int state;
sock_poll_wait(file, sock, wait); - if (sk->sk_state == DCCP_LISTEN) + + state = inet_sk_state_load(sk); + if (state == DCCP_LISTEN) return inet_csk_listen_poll(sk);
/* Socket is not locked. We are protected from async events @@ -328,20 +332,21 @@ __poll_t dccp_poll(struct file *file, struct socket *sock, */
mask = 0; - if (sk->sk_err) + if (READ_ONCE(sk->sk_err)) mask = EPOLLERR; + shutdown = READ_ONCE(sk->sk_shutdown);
- if (sk->sk_shutdown == SHUTDOWN_MASK || sk->sk_state == DCCP_CLOSED) + if (shutdown == SHUTDOWN_MASK || state == DCCP_CLOSED) mask |= EPOLLHUP; - if (sk->sk_shutdown & RCV_SHUTDOWN) + if (shutdown & RCV_SHUTDOWN) mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
/* Connected? */ - if ((1 << sk->sk_state) & ~(DCCPF_REQUESTING | DCCPF_RESPOND)) { + if ((1 << state) & ~(DCCPF_REQUESTING | DCCPF_RESPOND)) { if (atomic_read(&sk->sk_rmem_alloc) > 0) mask |= EPOLLIN | EPOLLRDNORM;
- if (!(sk->sk_shutdown & SEND_SHUTDOWN)) { + if (!(shutdown & SEND_SHUTDOWN)) { if (sk_stream_is_writeable(sk)) { mask |= EPOLLOUT | EPOLLWRNORM; } else { /* send SIGIO later */ @@ -359,7 +364,6 @@ __poll_t dccp_poll(struct file *file, struct socket *sock, } return mask; } - EXPORT_SYMBOL_GPL(dccp_poll);
int dccp_ioctl(struct sock *sk, int cmd, unsigned long arg)
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lu Wei luwei32@huawei.com
[ Upstream commit 043d5f68d0ccdda91029b4b6dce7eeffdcfad281 ]
There are two network devices (veth1 and veth3) in ns1, and ipvlan1 in L3S mode and ipvlan2 in L2 mode are created on top of them, as shown in figure (1). In this case, ipvlan_register_nf_hook() is called to register the nf hook needed by ipvlans in L3S mode in ns1, and the value of ipvl_nf_hook_refcnt is set to 1.
(1) ns1                        ns2
    ------------               ------------
    veth1--ipvlan1 (L3S)

    veth3--ipvlan2 (L2)

(2) ns1                        ns2
    ------------               ------------
    veth1--ipvlan1 (L3S)

           ipvlan2 (L2)        veth3
              |                  |
              |------->--------->|
                    migrate
When veth3 migrates from ns1 to ns2, as shown in figure (2), veth3 registers in ns2 and call_netdevice_notifiers() is called with the NETDEV_REGISTER event:
  dev_change_net_namespace
    call_netdevice_notifiers
      ipvlan_device_event
        ipvlan_migrate_l3s_hook
          ipvlan_register_nf_hook(newnet)     (I)
          ipvlan_unregister_nf_hook(oldnet)   (II)
In ipvlan_migrate_l3s_hook(), ipvl_nf_hook_refcnt in ns1 is not 0, since veth1 with ipvlan1 is still in ns1, so (I) and (II) are called to register the nf_hook in ns2 and unregister it in ns1. As a result, ipvl_nf_hook_refcnt in ns1 is decreased incorrectly and the one in ns2 is increased incorrectly. When the second net namespace is removed, a reference count leak warning in ipvlan_ns_exit() is triggered.
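As a user-space model of the imbalance (variable names made up), with the L3S ipvlan left behind in ns1:

  #include <stdio.h>

  int main(void)
  {
          int ns1_refcnt = 1;     /* ipvlan1 (L3S) holds the only reference in ns1 */
          int ns2_refcnt = 0;

          /* veth3/ipvlan2 (L2 mode) migrates, yet the old code still ran the L3S hook migration */
          ns2_refcnt++;           /* ipvlan_register_nf_hook(newnet)   (I)  */
          ns1_refcnt--;           /* ipvlan_unregister_nf_hook(oldnet) (II) */

          /* ns1 drops to 0 although ipvlan1 still needs the hook, ns2 rises to 1 with no L3S user */
          printf("ns1=%d ns2=%d\n", ns1_refcnt, ns2_refcnt);
          return 0;
  }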
This patch adds a check before ipvlan_migrate_l3s_hook() is called. The warning can be triggered as follows:
 $ ip netns add ns1
 $ ip netns add ns2
 $ ip netns exec ns1 ip link add veth1 type veth peer name veth2
 $ ip netns exec ns1 ip link add veth3 type veth peer name veth4
 $ ip netns exec ns1 ip link add ipv1 link veth1 type ipvlan mode l3s
 $ ip netns exec ns1 ip link add ipv2 link veth3 type ipvlan mode l2
 $ ip netns exec ns1 ip link set veth3 netns ns2
 $ ip net del ns2
Fixes: 3133822f5ac1 ("ipvlan: use pernet operations and restrict l3s hooks to master netns") Signed-off-by: Lu Wei luwei32@huawei.com Reviewed-by: Florian Westphal fw@strlen.de Link: https://lore.kernel.org/r/20230817145449.141827-1-luwei32@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ipvlan/ipvlan_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c index b15dd9a3ad540..1b55928e89b8a 100644 --- a/drivers/net/ipvlan/ipvlan_main.c +++ b/drivers/net/ipvlan/ipvlan_main.c @@ -748,7 +748,8 @@ static int ipvlan_device_event(struct notifier_block *unused,
write_pnet(&port->pnet, newnet);
- ipvlan_migrate_l3s_hook(oldnet, newnet); + if (port->mode == IPVLAN_MODE_L3S) + ipvlan_migrate_l3s_hook(oldnet, newnet); break; } case NETDEV_UNREGISTER:
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Danielle Ratson danieller@nvidia.com
[ Upstream commit bc2de151ab6ad0762a04563527ec42e54dde572a ]
Currently, in Spectrum-2 and above, time stamps are extracted from the CQE into the time stamp fields in 'struct mlxsw_skb_cb', only when the CQE time stamp type is UTC. The time stamps are read directly from the CQE and software can get the time stamp in UTC format using CQEv2.
From Spectrum-4, the time stamps that are read from the CQE are also allowed to be of the MIRROR_UTC type.
Therefore, we get a warning [1] from the driver that the time stamp fields were not set when an LLDP control packet is sent.
Allow the time stamp type to be MIRROR_UTC and set the time stamp in this case as well.
[1] WARNING: CPU: 11 PID: 0 at drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c:1409 mlxsw_sp2_ptp_hwtstamp_fill+0x1f/0x70 [mlxsw_spectrum] [...] Call Trace: <IRQ> mlxsw_sp2_ptp_receive+0x3c/0x80 [mlxsw_spectrum] mlxsw_core_skb_receive+0x119/0x190 [mlxsw_core] mlxsw_pci_cq_tasklet+0x3c9/0x780 [mlxsw_pci] tasklet_action_common.constprop.0+0x9f/0x110 __do_softirq+0xbb/0x296 irq_exit_rcu+0x79/0xa0 common_interrupt+0x86/0xa0 </IRQ> <TASK>
Fixes: 4735402173e6 ("mlxsw: spectrum: Extend to support Spectrum-4 ASIC") Signed-off-by: Danielle Ratson danieller@nvidia.com Reviewed-by: Ido Schimmel idosch@nvidia.com Signed-off-by: Petr Machata petrm@nvidia.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/bcef4d044ef608a4e258d33a7ec0ecd91f480db5.169226842... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlxsw/pci.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c index c968309657dd1..51eea1f0529c8 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/pci.c +++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c @@ -517,11 +517,15 @@ static void mlxsw_pci_skb_cb_ts_set(struct mlxsw_pci *mlxsw_pci, struct sk_buff *skb, enum mlxsw_pci_cqe_v cqe_v, char *cqe) { + u8 ts_type; + if (cqe_v != MLXSW_PCI_CQE_V2) return;
- if (mlxsw_pci_cqe2_time_stamp_type_get(cqe) != - MLXSW_PCI_CQE_TIME_STAMP_TYPE_UTC) + ts_type = mlxsw_pci_cqe2_time_stamp_type_get(cqe); + + if (ts_type != MLXSW_PCI_CQE_TIME_STAMP_TYPE_UTC && + ts_type != MLXSW_PCI_CQE_TIME_STAMP_TYPE_MIRROR_UTC) return;
mlxsw_skb_cb(skb)->cqe_ts.sec = mlxsw_pci_cqe2_time_stamp_sec_get(cqe);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit 0dc63b9cfd4c5666ced52c829fdd65dcaeb9f0f1 ]
The two most significant bits of the "local_port" field in the SSPR register are always cleared since they are overwritten by the deprecated and overlapping "sub_port" field.
On systems with more than 255 local ports (e.g., Spectrum-4), this results in the firmware maintaining invalid mappings between system port and local port. Specifically, two different systems ports (0x1 and 0x101) point to the same local port (0x1), which eventually leads to firmware errors.
Fix by removing the deprecated "sub_port" field.
Fixes: fd24b29a1b74 ("mlxsw: reg: Align existing registers to use extended local_port field") Signed-off-by: Ido Schimmel idosch@nvidia.com Signed-off-by: Petr Machata petrm@nvidia.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/9b909a3033c8d3d6f67f237306bef4411c5e6ae4.169226842... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlxsw/reg.h | 9 --------- 1 file changed, 9 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h index 8165bf31a99ae..17160e867befb 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/reg.h +++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h @@ -97,14 +97,6 @@ MLXSW_ITEM32(reg, sspr, m, 0x00, 31, 1); */ MLXSW_ITEM32_LP(reg, sspr, 0x00, 16, 0x00, 12);
-/* reg_sspr_sub_port - * Virtual port within the physical port. - * Should be set to 0 when virtual ports are not enabled on the port. - * - * Access: RW - */ -MLXSW_ITEM32(reg, sspr, sub_port, 0x00, 8, 8); - /* reg_sspr_system_port * Unique identifier within the stacking domain that represents all the ports * that are available in the system (external ports). @@ -120,7 +112,6 @@ static inline void mlxsw_reg_sspr_pack(char *payload, u16 local_port) MLXSW_REG_ZERO(sspr, payload); mlxsw_reg_sspr_m_set(payload, 1); mlxsw_reg_sspr_local_port_set(payload, local_port); - mlxsw_reg_sspr_sub_port_set(payload, 0); mlxsw_reg_sspr_system_port_set(payload, local_port); }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amit Cohen amcohen@nvidia.com
[ Upstream commit 348c976be0a599918b88729def198a843701c9fe ]
The field 'virtual router' was extended to 12 bits in Spectrum-4. Therefore, the element 'MLXSW_AFK_ELEMENT_VIRT_ROUTER_MSB' needs 3 bits for Spectrum < 4 and 4 bits for Spectrum >= 4.
The elements are stored in an internal storage scratchpad. Currently, the MSB is defined there as 3 bits, which means that for Spectrum-4 only 2K VRFs can be used for multicast routing, as the highest bit is not really used by the driver. Fix the definition of 'VIRT_ROUTER_MSB' to use 4 bits. Adjust the definitions of the 'virtual router' field in the blocks accordingly - use '_avoid_size_check' for Spectrum-2 instead of for Spectrum-4. Fix the mask in the parse function to use 4 bits.
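A quick user-space illustration of the masking change (0x7 is GENMASK(2, 0), 0xf is GENMASK(3, 0)):

  #include <stdio.h>

  int main(void)
  {
          unsigned int vrid = 0xfff;               /* largest 12-bit VRF id */
          unsigned int lsb  = vrid & 0xff;         /* low 8 bits of the key */
          unsigned int msb3 = (vrid >> 8) & 0x7;   /* old mask: top bit silently dropped */
          unsigned int msb4 = (vrid >> 8) & 0xf;   /* new mask: all 4 remaining bits kept */

          printf("lsb=%#x msb(3 bits)=%#x msb(4 bits)=%#x\n", lsb, msb3, msb4);
          return 0;
  }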
Fixes: 6d5d8ebb881c ("mlxsw: Rename virtual router flex key element") Signed-off-by: Amit Cohen amcohen@nvidia.com Reviewed-by: Ido Schimmel idosch@nvidia.com Signed-off-by: Petr Machata petrm@nvidia.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/79bed2b70f6b9ed58d4df02e9798a23da648015b.169226842... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c | 4 ++-- drivers/net/ethernet/mellanox/mlxsw/spectrum2_mr_tcam.c | 2 +- drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.c | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c index bd1a51a0a5408..f208a237d0b52 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c +++ b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c @@ -32,8 +32,8 @@ static const struct mlxsw_afk_element_info mlxsw_afk_element_infos[] = { MLXSW_AFK_ELEMENT_INFO_U32(IP_TTL_, 0x18, 0, 8), MLXSW_AFK_ELEMENT_INFO_U32(IP_ECN, 0x18, 9, 2), MLXSW_AFK_ELEMENT_INFO_U32(IP_DSCP, 0x18, 11, 6), - MLXSW_AFK_ELEMENT_INFO_U32(VIRT_ROUTER_MSB, 0x18, 17, 3), - MLXSW_AFK_ELEMENT_INFO_U32(VIRT_ROUTER_LSB, 0x18, 20, 8), + MLXSW_AFK_ELEMENT_INFO_U32(VIRT_ROUTER_MSB, 0x18, 17, 4), + MLXSW_AFK_ELEMENT_INFO_U32(VIRT_ROUTER_LSB, 0x18, 21, 8), MLXSW_AFK_ELEMENT_INFO_BUF(SRC_IP_96_127, 0x20, 4), MLXSW_AFK_ELEMENT_INFO_BUF(SRC_IP_64_95, 0x24, 4), MLXSW_AFK_ELEMENT_INFO_BUF(SRC_IP_32_63, 0x28, 4), diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum2_mr_tcam.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum2_mr_tcam.c index e4f4cded2b6f9..b1178b7a7f51a 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum2_mr_tcam.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum2_mr_tcam.c @@ -193,7 +193,7 @@ mlxsw_sp2_mr_tcam_rule_parse(struct mlxsw_sp_acl_rule *rule, key->vrid, GENMASK(7, 0)); mlxsw_sp_acl_rulei_keymask_u32(rulei, MLXSW_AFK_ELEMENT_VIRT_ROUTER_MSB, - key->vrid >> 8, GENMASK(2, 0)); + key->vrid >> 8, GENMASK(3, 0)); switch (key->proto) { case MLXSW_SP_L3_PROTO_IPV4: return mlxsw_sp2_mr_tcam_rule_parse4(rulei, key); diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.c index 00c32320f8915..173808c096bab 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.c @@ -169,7 +169,7 @@ static struct mlxsw_afk_element_inst mlxsw_sp_afk_element_info_ipv4_2[] = {
static struct mlxsw_afk_element_inst mlxsw_sp_afk_element_info_ipv4_4[] = { MLXSW_AFK_ELEMENT_INST_U32(VIRT_ROUTER_LSB, 0x04, 24, 8), - MLXSW_AFK_ELEMENT_INST_U32(VIRT_ROUTER_MSB, 0x00, 0, 3), + MLXSW_AFK_ELEMENT_INST_EXT_U32(VIRT_ROUTER_MSB, 0x00, 0, 3, 0, true), };
static struct mlxsw_afk_element_inst mlxsw_sp_afk_element_info_ipv6_0[] = { @@ -319,7 +319,7 @@ static struct mlxsw_afk_element_inst mlxsw_sp_afk_element_info_mac_5b[] = {
static struct mlxsw_afk_element_inst mlxsw_sp_afk_element_info_ipv4_4b[] = { MLXSW_AFK_ELEMENT_INST_U32(VIRT_ROUTER_LSB, 0x04, 13, 8), - MLXSW_AFK_ELEMENT_INST_EXT_U32(VIRT_ROUTER_MSB, 0x04, 21, 4, 0, true), + MLXSW_AFK_ELEMENT_INST_U32(VIRT_ROUTER_MSB, 0x04, 21, 4), };
static struct mlxsw_afk_element_inst mlxsw_sp_afk_element_info_ipv6_2b[] = {
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit f520489e99a35b0a5257667274fbe9afd2d8c50b ]
Remove assumptions about shared buffer cell size and instead query the cell size from devlink. Adjust the test to send small packets that fit inside a single cell.
Tested on Spectrum-{1,2,3,4}.
Fixes: 4735402173e6 ("mlxsw: spectrum: Extend to support Spectrum-4 ASIC") Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Petr Machata petrm@nvidia.com Signed-off-by: Petr Machata petrm@nvidia.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/f7dfbf3c4d1cb23838d9eb99bab09afaa320c4ca.169226842... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../selftests/drivers/net/mlxsw/sharedbuffer.sh | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/sharedbuffer.sh b/tools/testing/selftests/drivers/net/mlxsw/sharedbuffer.sh index 7d9e73a43a49b..0c47faff9274b 100755 --- a/tools/testing/selftests/drivers/net/mlxsw/sharedbuffer.sh +++ b/tools/testing/selftests/drivers/net/mlxsw/sharedbuffer.sh @@ -98,12 +98,12 @@ sb_occ_etc_check()
port_pool_test() { - local exp_max_occ=288 + local exp_max_occ=$(devlink_cell_size_get) local max_occ
devlink sb occupancy clearmax $DEVLINK_DEV
- $MZ $h1 -c 1 -p 160 -a $h1mac -b $h2mac -A 192.0.1.1 -B 192.0.1.2 \ + $MZ $h1 -c 1 -p 10 -a $h1mac -b $h2mac -A 192.0.1.1 -B 192.0.1.2 \ -t ip -q
devlink sb occupancy snapshot $DEVLINK_DEV @@ -126,12 +126,12 @@ port_pool_test()
port_tc_ip_test() { - local exp_max_occ=288 + local exp_max_occ=$(devlink_cell_size_get) local max_occ
devlink sb occupancy clearmax $DEVLINK_DEV
- $MZ $h1 -c 1 -p 160 -a $h1mac -b $h2mac -A 192.0.1.1 -B 192.0.1.2 \ + $MZ $h1 -c 1 -p 10 -a $h1mac -b $h2mac -A 192.0.1.1 -B 192.0.1.2 \ -t ip -q
devlink sb occupancy snapshot $DEVLINK_DEV @@ -154,16 +154,12 @@ port_tc_ip_test()
port_tc_arp_test() { - local exp_max_occ=96 + local exp_max_occ=$(devlink_cell_size_get) local max_occ
- if [[ $MLXSW_CHIP != "mlxsw_spectrum" ]]; then - exp_max_occ=144 - fi - devlink sb occupancy clearmax $DEVLINK_DEV
- $MZ $h1 -c 1 -p 160 -a $h1mac -A 192.0.1.1 -t arp -q + $MZ $h1 -c 1 -p 10 -a $h1mac -A 192.0.1.1 -t arp -q
devlink sb occupancy snapshot $DEVLINK_DEV
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arınç ÜNAL arinc.unal@arinc9.com
[ Upstream commit e94b590abfff2cdbf0bdaa7d9904364c8d480af5 ]
802.1X PAE frames are link-local frames, therefore they must be trapped to the CPU port. Currently, the MT753X switches treat 802.1X PAE frames as regular multicast frames, therefore flooding them to user ports. To fix this, set 802.1X PAE frames to be trapped to the CPU port(s).
Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Arınç ÜNAL arinc.unal@arinc9.com Reviewed-by: Vladimir Oltean olteanv@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/mt7530.c | 4 ++++ drivers/net/dsa/mt7530.h | 2 ++ 2 files changed, 6 insertions(+)
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c index 7e773c4ba0463..32dc4f19c82c6 100644 --- a/drivers/net/dsa/mt7530.c +++ b/drivers/net/dsa/mt7530.c @@ -1006,6 +1006,10 @@ mt753x_trap_frames(struct mt7530_priv *priv) mt7530_rmw(priv, MT753X_BPC, MT753X_BPDU_PORT_FW_MASK, MT753X_BPDU_CPU_ONLY);
+ /* Trap 802.1X PAE frames to the CPU port(s) */ + mt7530_rmw(priv, MT753X_BPC, MT753X_PAE_PORT_FW_MASK, + MT753X_PAE_PORT_FW(MT753X_BPDU_CPU_ONLY)); + /* Trap LLDP frames with :0E MAC DA to the CPU port(s) */ mt7530_rmw(priv, MT753X_RGAC2, MT753X_R0E_PORT_FW_MASK, MT753X_R0E_PORT_FW(MT753X_BPDU_CPU_ONLY)); diff --git a/drivers/net/dsa/mt7530.h b/drivers/net/dsa/mt7530.h index 08045b035e6ab..17e42d30fff4b 100644 --- a/drivers/net/dsa/mt7530.h +++ b/drivers/net/dsa/mt7530.h @@ -66,6 +66,8 @@ enum mt753x_id { /* Registers for BPDU and PAE frame control*/ #define MT753X_BPC 0x24 #define MT753X_BPDU_PORT_FW_MASK GENMASK(2, 0) +#define MT753X_PAE_PORT_FW_MASK GENMASK(18, 16) +#define MT753X_PAE_PORT_FW(x) FIELD_PREP(MT753X_PAE_PORT_FW_MASK, x)
/* Register for :03 and :0E MAC DA frame control */ #define MT753X_RGAC2 0x2c
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Serge Semin fancer.lancer@gmail.com
[ Upstream commit 2572ce62415cf3b632391091447252e2661ed520 ]
Based on the original code semantics, in the Clause 45 MDIO case the address command is supposed to be followed by the command sending the MMD address, not the CSR address. Commit 002dd3de097c ("net: mdio: mdio-bitbang: Separate C22 and C45 transactions") erroneously broke that: most likely due to an unfortunate variable name, it switched the code to sending the CSR address. In our case this caused a protocol malfunction, so the read operation always failed, with the turnaround bit driven to one by the PHY instead of zero. Fix that by restoring the correct behaviour: send the MMD address command right after the regular address command.
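For reference, the fixed read path now issues the frames in this order (paraphrasing the hunk below; mdiobb_cmd_addr() and mdiobb_cmd() are the driver's own helpers):

  mdiobb_cmd_addr(ctrl, phy, devad, reg);         /* ADDRESS frame: carries the 16-bit register address */
  mdiobb_cmd(ctrl, MDIO_C45_READ, phy, devad);    /* READ frame: addresses the MMD (devad), not the CSR */
  return mdiobb_read_common(bus, phy);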
Fixes: 002dd3de097c ("net: mdio: mdio-bitbang: Separate C22 and C45 transactions") Signed-off-by: Serge Semin fancer.lancer@gmail.com Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/mdio/mdio-bitbang.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/mdio/mdio-bitbang.c b/drivers/net/mdio/mdio-bitbang.c index b83932562be21..81b7748c10ce0 100644 --- a/drivers/net/mdio/mdio-bitbang.c +++ b/drivers/net/mdio/mdio-bitbang.c @@ -186,7 +186,7 @@ int mdiobb_read_c45(struct mii_bus *bus, int phy, int devad, int reg) struct mdiobb_ctrl *ctrl = bus->priv;
mdiobb_cmd_addr(ctrl, phy, devad, reg); - mdiobb_cmd(ctrl, MDIO_C45_READ, phy, reg); + mdiobb_cmd(ctrl, MDIO_C45_READ, phy, devad);
return mdiobb_read_common(bus, phy); } @@ -222,7 +222,7 @@ int mdiobb_write_c45(struct mii_bus *bus, int phy, int devad, int reg, u16 val) struct mdiobb_ctrl *ctrl = bus->priv;
mdiobb_cmd_addr(ctrl, phy, devad, reg); - mdiobb_cmd(ctrl, MDIO_C45_WRITE, phy, reg); + mdiobb_cmd(ctrl, MDIO_C45_WRITE, phy, devad);
return mdiobb_write_common(bus, val); }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ruan Jinjie ruanjinjie@huawei.com
[ Upstream commit 23a14488ea5882dea5851b65c9fce2127ee8fcad ]
The fixed_phy_register() function returns error pointers and never returns NULL. Update the checks accordingly.
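The resulting check follows the usual error-pointer idiom (sketch only; 'dev' stands for whatever struct device the driver logs against):

  phy_dev = fixed_phy_register(PHY_POLL, &fphy_status, NULL);
  if (IS_ERR(phy_dev)) {          /* error pointer, never NULL */
          dev_err(dev, "Failed to register fixed PHY device\n");
          return -ENODEV;         /* or PTR_ERR(phy_dev) to propagate the exact error */
  }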
Fixes: c25b23b8a387 ("bgmac: register fixed PHY for ARM BCM470X / BCM5301X chipsets") Signed-off-by: Ruan Jinjie ruanjinjie@huawei.com Reviewed-by: Andrew Lunn andrew@lunn.ch Reviewed-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bgmac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c index 10c7c232cc4ec..52ee3751187a2 100644 --- a/drivers/net/ethernet/broadcom/bgmac.c +++ b/drivers/net/ethernet/broadcom/bgmac.c @@ -1448,7 +1448,7 @@ int bgmac_phy_connect_direct(struct bgmac *bgmac) int err;
phy_dev = fixed_phy_register(PHY_POLL, &fphy_status, NULL); - if (!phy_dev || IS_ERR(phy_dev)) { + if (IS_ERR(phy_dev)) { dev_err(bgmac->dev, "Failed to register fixed PHY device\n"); return -ENODEV; }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ruan Jinjie ruanjinjie@huawei.com
[ Upstream commit 32bbe64a1386065ab2aef8ce8cae7c689d0add6e ]
The fixed_phy_register() function returns error pointers and never returns NULL. Update the checks accordingly.
Fixes: b0ba512e25d7 ("net: bcmgenet: enable driver to work without a device tree") Signed-off-by: Ruan Jinjie ruanjinjie@huawei.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Acked-by: Doug Berger opendmb@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/genet/bcmmii.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/genet/bcmmii.c b/drivers/net/ethernet/broadcom/genet/bcmmii.c index 0092e46c46f83..cc3afb605b1ec 100644 --- a/drivers/net/ethernet/broadcom/genet/bcmmii.c +++ b/drivers/net/ethernet/broadcom/genet/bcmmii.c @@ -617,7 +617,7 @@ static int bcmgenet_mii_pd_init(struct bcmgenet_priv *priv) };
phydev = fixed_phy_register(PHY_POLL, &fphy_status, NULL); - if (!phydev || IS_ERR(phydev)) { + if (IS_ERR(phydev)) { dev_err(kdev, "failed to register fixed PHY device\n"); return -ENODEV; }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit f534f6581ec084fe94d6759f7672bd009794b07e ]
veth and vxcan need to make sure the ifindexes of the peer are not negative; the core does not validate this.
Using iproute2 with user-space-level checking removed:
Before:
 # ./ip link add index 10 type veth peer index -1
 # ip link show
 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
     link/ether 52:54:00:74:b2:03 brd ff:ff:ff:ff:ff:ff
 10: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether 8a:90:ff:57:6d:5d brd ff:ff:ff:ff:ff:ff
 -1: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether ae:ed:18:e6:fa:7f brd ff:ff:ff:ff:ff:ff
Now:
 $ ./ip link add index 10 type veth peer index -1
 Error: ifindex can't be negative.
This problem surfaced in net-next because an explicit WARN() was added, the root cause is older.
Fixes: e6f8f1a739b6 ("veth: Allow to create peer link with given ifindex") Fixes: a8f820a380a2 ("can: add Virtual CAN Tunnel driver (vxcan)") Reported-by: syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/vxcan.c | 7 +------ drivers/net/veth.c | 5 +---- include/net/rtnetlink.h | 4 ++-- net/core/rtnetlink.c | 22 ++++++++++++++++++---- 4 files changed, 22 insertions(+), 16 deletions(-)
diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c index 4068d962203d6..98c669ad51414 100644 --- a/drivers/net/can/vxcan.c +++ b/drivers/net/can/vxcan.c @@ -192,12 +192,7 @@ static int vxcan_newlink(struct net *net, struct net_device *dev,
nla_peer = data[VXCAN_INFO_PEER]; ifmp = nla_data(nla_peer); - err = rtnl_nla_parse_ifla(peer_tb, - nla_data(nla_peer) + - sizeof(struct ifinfomsg), - nla_len(nla_peer) - - sizeof(struct ifinfomsg), - NULL); + err = rtnl_nla_parse_ifinfomsg(peer_tb, nla_peer, extack); if (err < 0) return err;
diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 76019949e3fe9..c977b704f1342 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -1851,10 +1851,7 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
nla_peer = data[VETH_INFO_PEER]; ifmp = nla_data(nla_peer); - err = rtnl_nla_parse_ifla(peer_tb, - nla_data(nla_peer) + sizeof(struct ifinfomsg), - nla_len(nla_peer) - sizeof(struct ifinfomsg), - NULL); + err = rtnl_nla_parse_ifinfomsg(peer_tb, nla_peer, extack); if (err < 0) return err;
diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h index d9076a7a430c2..6506221c5fe31 100644 --- a/include/net/rtnetlink.h +++ b/include/net/rtnetlink.h @@ -190,8 +190,8 @@ int rtnl_delete_link(struct net_device *dev, u32 portid, const struct nlmsghdr * int rtnl_configure_link(struct net_device *dev, const struct ifinfomsg *ifm, u32 portid, const struct nlmsghdr *nlh);
-int rtnl_nla_parse_ifla(struct nlattr **tb, const struct nlattr *head, int len, - struct netlink_ext_ack *exterr); +int rtnl_nla_parse_ifinfomsg(struct nlattr **tb, const struct nlattr *nla_peer, + struct netlink_ext_ack *exterr); struct net *rtnl_get_net_ns_capable(struct sock *sk, int netnsid);
#define MODULE_ALIAS_RTNL_LINK(kind) MODULE_ALIAS("rtnl-link-" kind) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index aa1743b2b770b..baa323ca37c42 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -2268,13 +2268,27 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) return err; }
-int rtnl_nla_parse_ifla(struct nlattr **tb, const struct nlattr *head, int len, - struct netlink_ext_ack *exterr) +int rtnl_nla_parse_ifinfomsg(struct nlattr **tb, const struct nlattr *nla_peer, + struct netlink_ext_ack *exterr) { - return nla_parse_deprecated(tb, IFLA_MAX, head, len, ifla_policy, + const struct ifinfomsg *ifmp; + const struct nlattr *attrs; + size_t len; + + ifmp = nla_data(nla_peer); + attrs = nla_data(nla_peer) + sizeof(struct ifinfomsg); + len = nla_len(nla_peer) - sizeof(struct ifinfomsg); + + if (ifmp->ifi_index < 0) { + NL_SET_ERR_MSG_ATTR(exterr, nla_peer, + "ifindex can't be negative"); + return -EINVAL; + } + + return nla_parse_deprecated(tb, IFLA_MAX, attrs, len, ifla_policy, exterr); } -EXPORT_SYMBOL(rtnl_nla_parse_ifla); +EXPORT_SYMBOL(rtnl_nla_parse_ifinfomsg);
struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[]) {
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit f866fbc842de5976e41ba874b76ce31710b634b5 ]
UDP sendmsg() is lockless, so ip_select_ident_segs() can very well be run from multiple CPUs [1].
Convert inet->inet_id to an atomic_t, but implement a dedicated path for TCP, avoiding the cost of a locked instruction (atomic_add_return()).
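As a rough user-space illustration of the pattern (C11 atomics standing in for the kernel's atomic_t; the names and values here are made up, this is not the kernel code): lockless callers such as UDP need an atomic read-modify-write, while a path that already holds the socket lock (TCP) can get away with a plain load and store of the same atomic variable.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for inet_sk(sk)->inet_id. */
static atomic_int inet_id;

/* Lockless path (e.g. UDP sendmsg): a locked RMW is required. */
static uint16_t next_id_lockless(int segs)
{
	return (uint16_t)(atomic_fetch_add(&inet_id, segs) + segs);
}

/* Path that already holds the socket lock: the lock serializes writers,
 * so a relaxed load plus a relaxed store is sufficient and avoids the
 * cost of the locked instruction. */
static uint16_t next_id_locked(int segs)
{
	int val = atomic_load_explicit(&inet_id, memory_order_relaxed);

	atomic_store_explicit(&inet_id, val + segs, memory_order_relaxed);
	return (uint16_t)val;
}

int main(void)
{
	atomic_store(&inet_id, 0x184d);
	printf("lockless path id: 0x%x\n", next_id_lockless(1));
	printf("locked path id:   0x%x\n", next_id_locked(1));
	return 0;
}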
Note that this patch will cause a trivial merge conflict because we added inet->flags in the net-next tree.
v2: added missing change in drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c (David Ahern)
[1]
BUG: KCSAN: data-race in __ip_make_skb / __ip_make_skb
read-write to 0xffff888145af952a of 2 bytes by task 7803 on cpu 1: ip_select_ident_segs include/net/ip.h:542 [inline] ip_select_ident include/net/ip.h:556 [inline] __ip_make_skb+0x844/0xc70 net/ipv4/ip_output.c:1446 ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560 udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260 inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg net/socket.c:748 [inline] ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494 ___sys_sendmsg net/socket.c:2548 [inline] __sys_sendmmsg+0x269/0x500 net/socket.c:2634 __do_sys_sendmmsg net/socket.c:2663 [inline] __se_sys_sendmmsg net/socket.c:2660 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff888145af952a of 2 bytes by task 7804 on cpu 0: ip_select_ident_segs include/net/ip.h:541 [inline] ip_select_ident include/net/ip.h:556 [inline] __ip_make_skb+0x817/0xc70 net/ipv4/ip_output.c:1446 ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560 udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260 inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg net/socket.c:748 [inline] ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494 ___sys_sendmsg net/socket.c:2548 [inline] __sys_sendmmsg+0x269/0x500 net/socket.c:2634 __do_sys_sendmmsg net/socket.c:2663 [inline] __se_sys_sendmmsg net/socket.c:2660 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x184d -> 0x184e
Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 7804 Comm: syz-executor.1 Not tainted 6.5.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 ==================================================================
Fixes: 23f57406b82d ("ipv4: avoid using shared IP generator for connected sockets") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../chelsio/inline_crypto/chtls/chtls_cm.c | 2 +- include/net/inet_sock.h | 2 +- include/net/ip.h | 15 +++++++++++++-- net/dccp/ipv4.c | 4 ++-- net/ipv4/af_inet.c | 2 +- net/ipv4/datagram.c | 2 +- net/ipv4/tcp_ipv4.c | 4 ++-- net/sctp/socket.c | 2 +- 8 files changed, 22 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c index c2e7037c7ba1c..7750702900fa6 100644 --- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c +++ b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c @@ -1466,7 +1466,7 @@ static void make_established(struct sock *sk, u32 snd_isn, unsigned int opt) tp->write_seq = snd_isn; tp->snd_nxt = snd_isn; tp->snd_una = snd_isn; - inet_sk(sk)->inet_id = get_random_u16(); + atomic_set(&inet_sk(sk)->inet_id, get_random_u16()); assign_rxopt(sk, opt);
if (tp->rcv_wnd > (RCV_BUFSIZ_M << 10)) diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index 0bb32bfc61832..491ceb7ebe5d1 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -222,8 +222,8 @@ struct inet_sock { __s16 uc_ttl; __u16 cmsg_flags; struct ip_options_rcu __rcu *inet_opt; + atomic_t inet_id; __be16 inet_sport; - __u16 inet_id;
__u8 tos; __u8 min_ttl; diff --git a/include/net/ip.h b/include/net/ip.h index 530e7257e4389..1872f570abeda 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -532,8 +532,19 @@ static inline void ip_select_ident_segs(struct net *net, struct sk_buff *skb, * generator as much as we can. */ if (sk && inet_sk(sk)->inet_daddr) { - iph->id = htons(inet_sk(sk)->inet_id); - inet_sk(sk)->inet_id += segs; + int val; + + /* avoid atomic operations for TCP, + * as we hold socket lock at this point. + */ + if (sk_is_tcp(sk)) { + sock_owned_by_me(sk); + val = atomic_read(&inet_sk(sk)->inet_id); + atomic_set(&inet_sk(sk)->inet_id, val + segs); + } else { + val = atomic_add_return(segs, &inet_sk(sk)->inet_id); + } + iph->id = htons(val); return; } if ((iph->frag_off & htons(IP_DF)) && !skb->ignore_df) { diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index 3ab68415d121c..e7b9703bd1a1a 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -130,7 +130,7 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) inet->inet_daddr, inet->inet_sport, inet->inet_dport); - inet->inet_id = get_random_u16(); + atomic_set(&inet->inet_id, get_random_u16());
err = dccp_connect(sk); rt = NULL; @@ -432,7 +432,7 @@ struct sock *dccp_v4_request_recv_sock(const struct sock *sk, RCU_INIT_POINTER(newinet->inet_opt, rcu_dereference(ireq->ireq_opt)); newinet->mc_index = inet_iif(skb); newinet->mc_ttl = ip_hdr(skb)->ttl; - newinet->inet_id = get_random_u16(); + atomic_set(&newinet->inet_id, get_random_u16());
if (dst == NULL && (dst = inet_csk_route_child_sock(sk, newsk, req)) == NULL) goto put_and_exit; diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 10ebe39dcc873..9dde8e842befe 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -340,7 +340,7 @@ static int inet_create(struct net *net, struct socket *sock, int protocol, else inet->pmtudisc = IP_PMTUDISC_WANT;
- inet->inet_id = 0; + atomic_set(&inet->inet_id, 0);
sock_init_data(sock, sk);
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c index 4d1af0cd7d99e..cb5dbee9e018f 100644 --- a/net/ipv4/datagram.c +++ b/net/ipv4/datagram.c @@ -73,7 +73,7 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len reuseport_has_conns_set(sk); sk->sk_state = TCP_ESTABLISHED; sk_set_txhash(sk); - inet->inet_id = get_random_u16(); + atomic_set(&inet->inet_id, get_random_u16());
sk_dst_set(sk, &rt->dst); err = 0; diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 498dd4acdeec8..caecb4d1e424a 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -312,7 +312,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) inet->inet_daddr)); }
- inet->inet_id = get_random_u16(); + atomic_set(&inet->inet_id, get_random_u16());
if (tcp_fastopen_defer_connect(sk, &err)) return err; @@ -1596,7 +1596,7 @@ struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb, inet_csk(newsk)->icsk_ext_hdr_len = 0; if (inet_opt) inet_csk(newsk)->icsk_ext_hdr_len = inet_opt->opt.optlen; - newinet->inet_id = get_random_u16(); + atomic_set(&newinet->inet_id, get_random_u16());
/* Set ToS of the new socket based upon the value of incoming SYN. * ECT bits are set later in tcp_init_transfer(). diff --git a/net/sctp/socket.c b/net/sctp/socket.c index de52045774303..d77561d97a1ed 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -9479,7 +9479,7 @@ void sctp_copy_sock(struct sock *newsk, struct sock *sk, newinet->inet_rcv_saddr = inet->inet_rcv_saddr; newinet->inet_dport = htons(asoc->peer.port); newinet->pmtudisc = inet->pmtudisc; - newinet->inet_id = get_random_u16(); + atomic_set(&newinet->inet_id, get_random_u16());
newinet->uc_ttl = inet->uc_ttl; newinet->mc_loop = 1;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jesse Brandeburg jesse.brandeburg@intel.com
[ Upstream commit 10083aef784031fa9f06c19a1b182e6fad5338d9 ]
The driver is misconfiguring the hardware for some values of MTU such that it could use multiple descriptors to receive a packet when it could have simply used one.
Change the driver to use a round-up instead of the result of a shift, as the shift can truncate the lower bits of the size, and result in the problem noted above. It also aligns this driver with similar code in i40e.
The insidiousness of this problem is that everything works with the wrong size; it's just not working as well as it could, as some MTU sizes end up using two or more descriptors, and there is no way to tell that is happening without looking at ice_trace or a bus analyzer.
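For illustration (a stand-alone sketch, not the driver code; the buffer length is a made-up example and 128 bytes corresponds to 1 << ICE_RLAN_CTX_DBUF_S), the shift silently drops the remainder while the round-up accounts for it:

#include <stdio.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

int main(void)
{
	unsigned int rx_buf_len = 1986;	/* hypothetical, not a multiple of 128 */
	unsigned int shifted = rx_buf_len >> 7;               /* old: truncates */
	unsigned int rounded = DIV_ROUND_UP(rx_buf_len, 128); /* new: rounds up */

	printf("shift:    %u units = %u bytes\n", shifted, shifted * 128);
	printf("round-up: %u units = %u bytes\n", rounded, rounded * 128);
	return 0;
}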
Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Signed-off-by: Jesse Brandeburg jesse.brandeburg@intel.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_base.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c index 619cb07a40691..25e09ab708ca1 100644 --- a/drivers/net/ethernet/intel/ice/ice_base.c +++ b/drivers/net/ethernet/intel/ice/ice_base.c @@ -393,7 +393,8 @@ static int ice_setup_rx_ctx(struct ice_rx_ring *ring) /* Receive Packet Data Buffer Size. * The Packet Data Buffer Size is defined in 128 byte units. */ - rlan_ctx.dbuf = ring->rx_buf_len >> ICE_RLAN_CTX_DBUF_S; + rlan_ctx.dbuf = DIV_ROUND_UP(ring->rx_buf_len, + BIT_ULL(ICE_RLAN_CTX_DBUF_S));
/* use 32 byte descriptors */ rlan_ctx.dsize = 1;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Petr Oros poros@redhat.com
[ Upstream commit 0ecff05e6c59dd82dbcb9706db911f7fd9f40fb8 ]
This reverts commit 7255355a0636b4eff08d5e8139c77d98f151c4fc.
After this commit we are not able to attach a VF to a VM:
virsh attach-interface v0 hostdev --managed 0000:41:01.0 --mac 52:52:52:52:52:52
error: Failed to attach interface
error: Cannot set interface MAC to 52:52:52:52:52:52 for ifname enp65s0f0np0 vf 0: Resource temporarily unavailable
ice_check_vf_ready_for_cfg() already contains the wait for reset. The new condition in ice_check_vf_ready_for_reset() only causes problems.
Fixes: 7255355a0636 ("ice: Fix ice VF reset during iavf initialization") Signed-off-by: Petr Oros poros@redhat.com Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Tested-by: Rafal Romanowski rafal.romanowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_sriov.c | 8 ++++---- drivers/net/ethernet/intel/ice/ice_vf_lib.c | 19 ------------------- drivers/net/ethernet/intel/ice/ice_vf_lib.h | 1 - drivers/net/ethernet/intel/ice/ice_virtchnl.c | 1 - 4 files changed, 4 insertions(+), 25 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c index 588ad8696756d..f1dca59bd8449 100644 --- a/drivers/net/ethernet/intel/ice/ice_sriov.c +++ b/drivers/net/ethernet/intel/ice/ice_sriov.c @@ -1171,7 +1171,7 @@ int ice_set_vf_spoofchk(struct net_device *netdev, int vf_id, bool ena) if (!vf) return -EINVAL;
- ret = ice_check_vf_ready_for_reset(vf); + ret = ice_check_vf_ready_for_cfg(vf); if (ret) goto out_put_vf;
@@ -1286,7 +1286,7 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac) goto out_put_vf; }
- ret = ice_check_vf_ready_for_reset(vf); + ret = ice_check_vf_ready_for_cfg(vf); if (ret) goto out_put_vf;
@@ -1340,7 +1340,7 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted) return -EOPNOTSUPP; }
- ret = ice_check_vf_ready_for_reset(vf); + ret = ice_check_vf_ready_for_cfg(vf); if (ret) goto out_put_vf;
@@ -1653,7 +1653,7 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos, if (!vf) return -EINVAL;
- ret = ice_check_vf_ready_for_reset(vf); + ret = ice_check_vf_ready_for_cfg(vf); if (ret) goto out_put_vf;
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.c b/drivers/net/ethernet/intel/ice/ice_vf_lib.c index bf74a2f3a4f8c..89fd6982df093 100644 --- a/drivers/net/ethernet/intel/ice/ice_vf_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.c @@ -185,25 +185,6 @@ int ice_check_vf_ready_for_cfg(struct ice_vf *vf) return 0; }
-/** - * ice_check_vf_ready_for_reset - check if VF is ready to be reset - * @vf: VF to check if it's ready to be reset - * - * The purpose of this function is to ensure that the VF is not in reset, - * disabled, and is both initialized and active, thus enabling us to safely - * initialize another reset. - */ -int ice_check_vf_ready_for_reset(struct ice_vf *vf) -{ - int ret; - - ret = ice_check_vf_ready_for_cfg(vf); - if (!ret && !test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) - ret = -EAGAIN; - - return ret; -} - /** * ice_trigger_vf_reset - Reset a VF on HW * @vf: pointer to the VF structure diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h index a38ef00a36794..e3cda6fb71ab1 100644 --- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h +++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h @@ -215,7 +215,6 @@ u16 ice_get_num_vfs(struct ice_pf *pf); struct ice_vsi *ice_get_vf_vsi(struct ice_vf *vf); bool ice_is_vf_disabled(struct ice_vf *vf); int ice_check_vf_ready_for_cfg(struct ice_vf *vf); -int ice_check_vf_ready_for_reset(struct ice_vf *vf); void ice_set_vf_state_dis(struct ice_vf *vf); bool ice_is_any_vf_in_unicast_promisc(struct ice_pf *pf); void diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c index f4a524f80b110..97243c616d5d6 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c @@ -3955,7 +3955,6 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event, ice_vc_notify_vf_link_state(vf); break; case VIRTCHNL_OP_RESET_VF: - clear_bit(ICE_VF_STATE_ACTIVE, vf->vf_states); ops->reset_vf(vf); break; case VIRTCHNL_OP_ADD_ETH_ADDR:
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Petr Oros poros@redhat.com
[ Upstream commit 67f6317dfa609846a227a706532439a22828c24b ]
During a stress test that attached and detached a VF from KVM while simultaneously changing the VF's spoofcheck and trust settings, there was a NULL pointer dereference in ice_reset_vf() because the VF's VSI was NULL.
More than one instance of ice_reset_vf() can be running at a given time. When we rebuild the VSI in ice_reset_vf(), another reset can be triggered from ice_service_task. In this case we can access the currently uninitialized VSI and cause a panic. The window for this race condition has been around for a long time, but it is much worse after commit 227bf4500aaa ("ice: move VSI delete outside deconfig") because the reset runs faster. ice_reset_vf() uses vf->cfg_lock, and by taking this lock before accessing the VF's VSI we can fix the BUG for all cases.
The panic occurs sometimes in ice_vsi_is_rx_queue_active() and sometimes in ice_vsi_stop_all_rx_rings().
With our reproducer, we could hit the BUG in ~8h before commit 227bf4500aaa ("ice: move VSI delete outside deconfig") and in ~20m after it. With this fix we have not been able to reproduce it after ~48h.
There was commit cf90b74341ee ("ice: Fix call trace with null VSI during VF reset") which also tried to fix this issue, but it only partially resolved it and the bug still exists.
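A rough stand-alone sketch of the ordering change (pthreads and made-up fields used only for illustration; this is not the driver code): take the configuration lock before the "already disabled" check, so a concurrent reset cannot rebuild or free the VSI between that check and its use.

#include <pthread.h>
#include <stddef.h>

struct vf {
	pthread_mutex_t cfg_lock;
	int disabled;
	void *vsi;	/* may be torn down by a concurrent reset */
};

static int reset_vf(struct vf *vf)
{
	int err = 0;

	/* Lock first: every access to vf->vsi below is serialized against
	 * other resets, which is the point of the fix. */
	pthread_mutex_lock(&vf->cfg_lock);
	if (vf->disabled) {
		if (!vf->vsi) {
			err = -1;	/* VF already removed */
			goto out_unlock;
		}
		/* stop rings etc., then report "already disabled" */
		goto out_unlock;
	}
	/* ... full reset path runs here, still under the lock ... */
out_unlock:
	pthread_mutex_unlock(&vf->cfg_lock);
	return err;
}

int main(void)
{
	struct vf vf = {
		.cfg_lock = PTHREAD_MUTEX_INITIALIZER,
		.disabled = 1,
		.vsi = NULL,
	};

	return reset_vf(&vf) ? 0 : 1;
}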
[ 6420.658415] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 6420.665382] #PF: supervisor read access in kernel mode [ 6420.670521] #PF: error_code(0x0000) - not-present page [ 6420.675659] PGD 0 [ 6420.677679] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 6420.682038] CPU: 53 PID: 326472 Comm: kworker/53:0 Kdump: loaded Not tainted 5.14.0-317.el9.x86_64 #1 [ 6420.691250] Hardware name: Dell Inc. PowerEdge R750/04V528, BIOS 1.6.5 04/15/2022 [ 6420.698729] Workqueue: ice ice_service_task [ice] [ 6420.703462] RIP: 0010:ice_vsi_is_rx_queue_active+0x2d/0x60 [ice] [ 6420.705860] ice 0000:ca:00.0: VF 0 is now untrusted [ 6420.709494] Code: 00 00 66 83 bf 76 04 00 00 00 48 8b 77 10 74 3e 31 c0 eb 0f 0f b7 97 76 04 00 00 48 83 c0 01 39 c2 7e 2b 48 8b 97 68 04 00 00 <0f> b7 0c 42 48 8b 96 20 13 00 00 48 8d 94 8a 00 00 12 00 8b 12 83 [ 6420.714426] ice 0000:ca:00.0 ens7f0: Setting MAC 22:22:22:22:22:00 on VF 0. VF driver will be reinitialized [ 6420.733120] RSP: 0018:ff778d2ff383fdd8 EFLAGS: 00010246 [ 6420.733123] RAX: 0000000000000000 RBX: ff2acf1916294000 RCX: 0000000000000000 [ 6420.733125] RDX: 0000000000000000 RSI: ff2acf1f2c6401a0 RDI: ff2acf1a27301828 [ 6420.762346] RBP: ff2acf1a27301828 R08: 0000000000000010 R09: 0000000000001000 [ 6420.769476] R10: ff2acf1916286000 R11: 00000000019eba3f R12: ff2acf19066460d0 [ 6420.776611] R13: ff2acf1f2c6401a0 R14: ff2acf1f2c6401a0 R15: 00000000ffffffff [ 6420.783742] FS: 0000000000000000(0000) GS:ff2acf28ffa80000(0000) knlGS:0000000000000000 [ 6420.791829] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6420.797575] CR2: 0000000000000000 CR3: 00000016ad410003 CR4: 0000000000773ee0 [ 6420.804708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 6420.811034] vfio-pci 0000:ca:01.0: enabling device (0000 -> 0002) [ 6420.811840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 6420.811841] PKRU: 55555554 [ 6420.811842] Call Trace: [ 6420.811843] <TASK> [ 6420.811844] ice_reset_vf+0x9a/0x450 [ice] [ 6420.811876] ice_process_vflr_event+0x8f/0xc0 [ice] [ 6420.841343] ice_service_task+0x23b/0x600 [ice] [ 6420.845884] ? __schedule+0x212/0x550 [ 6420.849550] process_one_work+0x1e2/0x3b0 [ 6420.853563] ? rescuer_thread+0x390/0x390 [ 6420.857577] worker_thread+0x50/0x3a0 [ 6420.861242] ? rescuer_thread+0x390/0x390 [ 6420.865253] kthread+0xdd/0x100 [ 6420.868400] ? kthread_complete_and_exit+0x20/0x20 [ 6420.873194] ret_from_fork+0x1f/0x30 [ 6420.876774] </TASK> [ 6420.878967] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iavf vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables bridge stp llc sctp ip6_udp_tunnel udp_tunnel nfp tls nfnetlink bluetooth mlx4_en mlx4_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp irdma kvm_intel i40e kvm iTCO_wdt dcdbas ib_uverbs irqbypass iTCO_vendor_support mgag200 mei_me ib_core dell_smbios isst_if_mmio isst_if_mbox_pci rapl i2c_algo_bit drm_shmem_helper intel_cstate drm_kms_helper syscopyarea sysfillrect isst_if_common sysimgblt intel_uncore fb_sys_fops dell_wmi_descriptor wmi_bmof intel_vsec mei i2c_i801 acpi_ipmi ipmi_si i2c_smbus ipmi_devintf intel_pch_thermal acpi_power_meter pcspk r
Fixes: efe41860008e ("ice: Fix memory corruption in VF driver") Fixes: f23df5220d2b ("ice: Fix spurious interrupt during removal of trusted VF") Signed-off-by: Petr Oros poros@redhat.com Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Tested-by: Rafal Romanowski rafal.romanowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_vf_lib.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.c b/drivers/net/ethernet/intel/ice/ice_vf_lib.c index 89fd6982df093..14da7ebaaead7 100644 --- a/drivers/net/ethernet/intel/ice/ice_vf_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.c @@ -612,11 +612,17 @@ int ice_reset_vf(struct ice_vf *vf, u32 flags) return 0; }
+ if (flags & ICE_VF_RESET_LOCK) + mutex_lock(&vf->cfg_lock); + else + lockdep_assert_held(&vf->cfg_lock); + if (ice_is_vf_disabled(vf)) { vsi = ice_get_vf_vsi(vf); if (!vsi) { dev_dbg(dev, "VF is already removed\n"); - return -EINVAL; + err = -EINVAL; + goto out_unlock; } ice_vsi_stop_lan_tx_rings(vsi, ICE_NO_RESET, vf->vf_id);
@@ -625,14 +631,9 @@ int ice_reset_vf(struct ice_vf *vf, u32 flags)
dev_dbg(dev, "VF is already disabled, there is no need for resetting it, telling VM, all is fine %d\n", vf->vf_id); - return 0; + goto out_unlock; }
- if (flags & ICE_VF_RESET_LOCK) - mutex_lock(&vf->cfg_lock); - else - lockdep_assert_held(&vf->cfg_lock); - /* Set VF disable bit state here, before triggering reset */ set_bit(ICE_VF_STATE_DIS, vf->vf_states); ice_trigger_vf_reset(vf, flags & ICE_VF_RESET_VFLR, false);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit be809424659c2844a2d7ab653aacca4898538023 ]
Before adding a port to a bond, it needs to be set down first. In the lacpdu test the author set the port down explicitly. But commit a4abfa627c38 ("net: rtnetlink: Enslave device before bringing it up") changed the operation order: the kernel now sets the port down _after_ adding it to the bond. So all the ports end up down and the test fails.
In fact, the veth interfaces are already inactive when added. This means there's no need to set them down again before adding to the bond. Let's just remove the link down operation.
Fixes: a4abfa627c38 ("net: rtnetlink: Enslave device before bringing it up") Reported-by: Zhengchao Shao shaozhengchao@huawei.com Closes: https://lore.kernel.org/netdev/a0ef07c7-91b0-94bd-240d-944a330fcabd@huawei.c... Signed-off-by: Hangbin Liu liuhangbin@gmail.com Link: https://lore.kernel.org/r/20230817082459.1685972-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh b/tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh index 47ab90596acb2..6358df5752f90 100755 --- a/tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh +++ b/tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh @@ -57,8 +57,8 @@ ip link add name veth2-bond type veth peer name veth2-end
# add ports ip link set fbond master fab-br0 -ip link set veth1-bond down master fbond -ip link set veth2-bond down master fbond +ip link set veth1-bond master fbond +ip link set veth2-bond master fbond
# bring up ip link set veth1-end up
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
[ Upstream commit 99b415fe8986803ba0eaf6b8897b16edc8fe7ec2 ]
The tg3 driver will use kmalloc() under some conditions. Check the frag_size and use slab_build_skb() when frag_size is 0. Silences the warning introduced by commit ce098da1497c ("skbuff: Introduce slab_build_skb()"):
Use slab_build_skb() instead ... tg3_poll_work+0x638/0xf90 [tg3]
Fixes: ce098da1497c ("skbuff: Introduce slab_build_skb()") Reported-by: Fiona Ebner f.ebner@proxmox.com Closes: https://lore.kernel.org/all/1bd4cb9c-4eb8-3bdb-3e05-8689817242d1@proxmox.com Cc: Siva Reddy Kallam siva.kallam@broadcom.com Cc: Prashant Sreedharan prashant@broadcom.com Cc: Michael Chan mchan@broadcom.com Cc: Bagas Sanjaya bagasdotme@gmail.com Signed-off-by: Kees Cook keescook@chromium.org Reviewed-by: Pavan Chebbi pavan.chebbi@broadcom.com Link: https://lore.kernel.org/r/20230818175417.never.273-kees@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/tg3.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c index 5ef073a79ce94..cb2810f175ccd 100644 --- a/drivers/net/ethernet/broadcom/tg3.c +++ b/drivers/net/ethernet/broadcom/tg3.c @@ -6881,7 +6881,10 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
ri->data = NULL;
- skb = build_skb(data, frag_size); + if (frag_size) + skb = build_skb(data, frag_size); + else + skb = slab_build_skb(data); if (!skb) { tg3_frag_free(frag_size != 0, data); goto drop_it_no_recycle;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Golle daniel@makrotopia.org
[ Upstream commit 604204fcb321abe81238551936ecda5269e81076 ]
When a hardware reset is triggered on devices not initializing WED the calls to mtk_wed_fe_reset and mtk_wed_fe_reset_complete dereference a pointer on uninitialized stack memory. Break out of both functions in case a hw_list entry is 0.
Fixes: 08a764a7c51b ("net: ethernet: mtk_wed: add reset/reset_complete callbacks") Signed-off-by: Daniel Golle daniel@makrotopia.org Reviewed-by: Simon Horman horms@kernel.org Acked-by: Lorenzo Bianconi lorenzo@kernel.org Link: https://lore.kernel.org/r/5465c1609b464cc7407ae1530c40821dcdf9d3e6.169263426... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mediatek/mtk_wed.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/mtk_wed.c b/drivers/net/ethernet/mediatek/mtk_wed.c index 985cff910f30c..3b651efcc25e1 100644 --- a/drivers/net/ethernet/mediatek/mtk_wed.c +++ b/drivers/net/ethernet/mediatek/mtk_wed.c @@ -221,9 +221,13 @@ void mtk_wed_fe_reset(void)
for (i = 0; i < ARRAY_SIZE(hw_list); i++) { struct mtk_wed_hw *hw = hw_list[i]; - struct mtk_wed_device *dev = hw->wed_dev; + struct mtk_wed_device *dev; int err;
+ if (!hw) + break; + + dev = hw->wed_dev; if (!dev || !dev->wlan.reset) continue;
@@ -244,8 +248,12 @@ void mtk_wed_fe_reset_complete(void)
for (i = 0; i < ARRAY_SIZE(hw_list); i++) { struct mtk_wed_hw *hw = hw_list[i]; - struct mtk_wed_device *dev = hw->wed_dev; + struct mtk_wed_device *dev; + + if (!hw) + break;
+ dev = hw->wed_dev; if (!dev || !dev->wlan.reset_complete) continue;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oliver Hartkopp socketcan@hartkopp.net
[ Upstream commit 0bfe71159230bab79ee230225ae12ffecbb69f3e ]
The original implementation had a very simple handling for single frame transmissions as it just sent the single frame without a timeout handling.
With the new echo frame handling the echo frame was also introduced for single frames but the former exception ('simple without timers') has been maintained by accident. This leads to a 1 second timeout when closing the socket and to an -ECOMM error when CAN_ISOTP_WAIT_TX_DONE is selected.
As the echo handling is always active (also for single frames) remove the wrong extra condition for single frames.
Fixes: 9f39d36530e5 ("can: isotp: add support for transmission without flow control") Signed-off-by: Oliver Hartkopp socketcan@hartkopp.net Link: https://lore.kernel.org/r/20230821144547.6658-2-socketcan@hartkopp.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/can/isotp.c | 22 +++++++--------------- 1 file changed, 7 insertions(+), 15 deletions(-)
diff --git a/net/can/isotp.c b/net/can/isotp.c index ca9d728d6d727..9d498a886a586 100644 --- a/net/can/isotp.c +++ b/net/can/isotp.c @@ -188,12 +188,6 @@ static bool isotp_register_rxid(struct isotp_sock *so) return (isotp_bc_flags(so) == 0); }
-static bool isotp_register_txecho(struct isotp_sock *so) -{ - /* all modes but SF_BROADCAST register for tx echo skbs */ - return (isotp_bc_flags(so) != CAN_ISOTP_SF_BROADCAST); -} - static enum hrtimer_restart isotp_rx_timer_handler(struct hrtimer *hrtimer) { struct isotp_sock *so = container_of(hrtimer, struct isotp_sock, @@ -1209,7 +1203,7 @@ static int isotp_release(struct socket *sock) lock_sock(sk);
/* remove current filters & unregister */ - if (so->bound && isotp_register_txecho(so)) { + if (so->bound) { if (so->ifindex) { struct net_device *dev;
@@ -1332,14 +1326,12 @@ static int isotp_bind(struct socket *sock, struct sockaddr *uaddr, int len) can_rx_register(net, dev, rx_id, SINGLE_MASK(rx_id), isotp_rcv, sk, "isotp", sk);
- if (isotp_register_txecho(so)) { - /* no consecutive frame echo skb in flight */ - so->cfecho = 0; + /* no consecutive frame echo skb in flight */ + so->cfecho = 0;
- /* register for echo skb's */ - can_rx_register(net, dev, tx_id, SINGLE_MASK(tx_id), - isotp_rcv_echo, sk, "isotpe", sk); - } + /* register for echo skb's */ + can_rx_register(net, dev, tx_id, SINGLE_MASK(tx_id), + isotp_rcv_echo, sk, "isotpe", sk);
dev_put(dev);
@@ -1560,7 +1552,7 @@ static void isotp_notify(struct isotp_sock *so, unsigned long msg, case NETDEV_UNREGISTER: lock_sock(sk); /* remove current filters & unregister */ - if (so->bound && isotp_register_txecho(so)) { + if (so->bound) { if (isotp_register_rxid(so)) can_rx_unregister(dev_net(dev), dev, so->rxid, SINGLE_MASK(so->rxid),
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alessio Igor Bogani alessio.bogani@elettra.eu
[ Upstream commit b888c510f7b3d64ca75fc0f43b4a4bd1a611312f ]
If ptp_clock_register() fails or CONFIG_PTP isn't enabled, avoid starting PTP related workqueues.
In this way we can fix this: BUG: unable to handle page fault for address: ffffc9000440b6f8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 100000067 P4D 100000067 PUD 1001e0067 PMD 107dc5067 PTE 0 Oops: 0000 [#1] PREEMPT SMP [...] Workqueue: events igb_ptp_overflow_check RIP: 0010:igb_rd32+0x1f/0x60 [...] Call Trace: igb_ptp_read_82580+0x20/0x50 timecounter_read+0x15/0x60 igb_ptp_overflow_check+0x1a/0x50 process_one_work+0x1cb/0x3c0 worker_thread+0x53/0x3f0 ? rescuer_thread+0x370/0x370 kthread+0x142/0x160 ? kthread_associate_blkcg+0xc0/0xc0 ret_from_fork+0x1f/0x30
Fixes: 1f6e8178d685 ("igb: Prevent dropped Tx timestamps via work items and interrupts.") Fixes: d339b1331616 ("igb: add PTP Hardware Clock code") Signed-off-by: Alessio Igor Bogani alessio.bogani@elettra.eu Tested-by: Arpana Arland arpanax.arland@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20230821171927.2203644-1-anthony.l.nguyen@intel.co... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igb/igb_ptp.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c index 405886ee52615..319c544b9f04c 100644 --- a/drivers/net/ethernet/intel/igb/igb_ptp.c +++ b/drivers/net/ethernet/intel/igb/igb_ptp.c @@ -1385,18 +1385,6 @@ void igb_ptp_init(struct igb_adapter *adapter) return; }
- spin_lock_init(&adapter->tmreg_lock); - INIT_WORK(&adapter->ptp_tx_work, igb_ptp_tx_work); - - if (adapter->ptp_flags & IGB_PTP_OVERFLOW_CHECK) - INIT_DELAYED_WORK(&adapter->ptp_overflow_work, - igb_ptp_overflow_check); - - adapter->tstamp_config.rx_filter = HWTSTAMP_FILTER_NONE; - adapter->tstamp_config.tx_type = HWTSTAMP_TX_OFF; - - igb_ptp_reset(adapter); - adapter->ptp_clock = ptp_clock_register(&adapter->ptp_caps, &adapter->pdev->dev); if (IS_ERR(adapter->ptp_clock)) { @@ -1406,6 +1394,18 @@ void igb_ptp_init(struct igb_adapter *adapter) dev_info(&adapter->pdev->dev, "added PHC on %s\n", adapter->netdev->name); adapter->ptp_flags |= IGB_PTP_ENABLED; + + spin_lock_init(&adapter->tmreg_lock); + INIT_WORK(&adapter->ptp_tx_work, igb_ptp_tx_work); + + if (adapter->ptp_flags & IGB_PTP_OVERFLOW_CHECK) + INIT_DELAYED_WORK(&adapter->ptp_overflow_work, + igb_ptp_overflow_check); + + adapter->tstamp_config.rx_filter = HWTSTAMP_FILTER_NONE; + adapter->tstamp_config.tx_type = HWTSTAMP_TX_OFF; + + igb_ptp_reset(adapter); } }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sasha Neftin sasha.neftin@intel.com
[ Upstream commit de43975721b97283d5f17eea4228faddf08f2681 ]
The IGC_PTM_CTRL_SHRT_CYC field defines the time between two consecutive PTM requests. The bit resolution of this field is six bits, but one of those bits was missing from the mask. This patch corrects the typo in the IGC_PTM_CTRL_SHRT_CYC macro.
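A quick stand-alone check of the two masks (illustration only, not driver code): 0x2f is 0b101111, so values with the 0x10 bit set silently lose it, whereas 0x3f keeps all six bits of the field.

#include <stdio.h>

int main(void)
{
	unsigned int usec;

	for (usec = 0; usec < 64; usec++) {
		unsigned int broken = usec & 0x2f;	/* old mask */
		unsigned int fixed  = usec & 0x3f;	/* corrected mask */

		if (broken != fixed)
			printf("usec=%2u: old mask keeps %2u, fixed mask keeps %2u\n",
			       usec, broken, fixed);
	}
	return 0;
}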
Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()") Signed-off-by: Sasha Neftin sasha.neftin@intel.com Tested-by: Naama Meir naamax.meir@linux.intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Kalesh AP kalesh-anakkur.purayil@broadcom.com Link: https://lore.kernel.org/r/20230821171721.2203572-1-anthony.l.nguyen@intel.co... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igc/igc_defines.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h index 44a5070299465..2f780cc90883c 100644 --- a/drivers/net/ethernet/intel/igc/igc_defines.h +++ b/drivers/net/ethernet/intel/igc/igc_defines.h @@ -546,7 +546,7 @@ #define IGC_PTM_CTRL_START_NOW BIT(29) /* Start PTM Now */ #define IGC_PTM_CTRL_EN BIT(30) /* Enable PTM */ #define IGC_PTM_CTRL_TRIG BIT(31) /* PTM Cycle trigger */ -#define IGC_PTM_CTRL_SHRT_CYC(usec) (((usec) & 0x2f) << 2) +#define IGC_PTM_CTRL_SHRT_CYC(usec) (((usec) & 0x3f) << 2) #define IGC_PTM_CTRL_PTM_TO(usec) (((usec) & 0xff) << 8)
#define IGC_PTM_SHORT_CYC_DEFAULT 10 /* Default Short/interrupted cycle interval */
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jamal Hadi Salim jhs@mojatatu.com
[ Upstream commit da71714e359b64bd7aab3bd56ec53f307f058133 ]
When replacing an existing root qdisc with one of the same kind, the request boils down to essentially a parameterization change, i.e. not one that requires allocation and grafting of a new qdisc. syzbot was able to create a scenario in which a taprio qdisc replaced an existing taprio qdisc with a combination of NLM_F_CREATE, NLM_F_REPLACE and NLM_F_EXCL, leading to a create-and-graft scenario. The fix ensures that create and graft is only allowed when the qdisc kinds are different; otherwise the request goes into the "change" codepath.
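As a simplified stand-alone illustration of the decision (the flag constants come from <linux/netlink.h> on a Linux system; decide() is a made-up helper and the real logic in tc_modify_qdisc() handles more cases):

#include <stdio.h>
#include <linux/netlink.h>	/* NLM_F_CREATE, NLM_F_REPLACE, NLM_F_EXCL */

static const char *decide(unsigned int flags, int same_kind)
{
	/* Same qdisc kind as the existing root: treat it as a change. */
	if (same_kind)
		return "change existing qdisc";
	if ((flags & NLM_F_CREATE) && (flags & NLM_F_REPLACE))
		return "create and graft";
	if ((flags & NLM_F_CREATE) && (flags & NLM_F_EXCL))
		return "create and graft";
	if (!(flags & (NLM_F_CREATE | NLM_F_REPLACE | NLM_F_EXCL)))
		return "create and graft (ambiguous plain change request)";
	return "other";
}

int main(void)
{
	/* The syzbot case: same kind, CREATE|REPLACE|EXCL -> now a change. */
	printf("%s\n", decide(NLM_F_CREATE | NLM_F_REPLACE | NLM_F_EXCL, 1));
	printf("%s\n", decide(NLM_F_CREATE | NLM_F_REPLACE, 0));
	return 0;
}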
While at it, fix the code and comments to improve readability.
While syzbot was able to create the issue, it did not zero in on the root cause. Analysis from Vladimir Oltean vladimir.oltean@nxp.com helped narrow it down.
v1->v2 changes:
- remove "inline" function definition (Vladimir)
- remove extraneous braces in branches (Vladimir)
- change inline function names (Pedro)
- run tdc tests (Victor)
v2->v3 changes:
- don't break else/if (Simon)
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+a3618a167af2021433cd@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/20230816225759.g25x76kmgzya2gei@skbuf/T/ Tested-by: Vladimir Oltean vladimir.oltean@nxp.com Tested-by: Victor Nogueira victor@mojatatu.com Reviewed-by: Pedro Tammela pctammela@mojatatu.com Reviewed-by: Victor Nogueira victor@mojatatu.com Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_api.c | 53 ++++++++++++++++++++++++++++++++++----------- 1 file changed, 40 insertions(+), 13 deletions(-)
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index aa6b1fe651519..e9eaf637220e9 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -1547,10 +1547,28 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n, return 0; }
+static bool req_create_or_replace(struct nlmsghdr *n) +{ + return (n->nlmsg_flags & NLM_F_CREATE && + n->nlmsg_flags & NLM_F_REPLACE); +} + +static bool req_create_exclusive(struct nlmsghdr *n) +{ + return (n->nlmsg_flags & NLM_F_CREATE && + n->nlmsg_flags & NLM_F_EXCL); +} + +static bool req_change(struct nlmsghdr *n) +{ + return (!(n->nlmsg_flags & NLM_F_CREATE) && + !(n->nlmsg_flags & NLM_F_REPLACE) && + !(n->nlmsg_flags & NLM_F_EXCL)); +} + /* * Create/change qdisc. */ - static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n, struct netlink_ext_ack *extack) { @@ -1644,27 +1662,35 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n, * * We know, that some child q is already * attached to this parent and have choice: - * either to change it or to create/graft new one. + * 1) change it or 2) create/graft new one. + * If the requested qdisc kind is different + * than the existing one, then we choose graft. + * If they are the same then this is "change" + * operation - just let it fallthrough.. * * 1. We are allowed to create/graft only - * if CREATE and REPLACE flags are set. + * if the request is explicitly stating + * "please create if it doesn't exist". * - * 2. If EXCL is set, requestor wanted to say, - * that qdisc tcm_handle is not expected + * 2. If the request is to exclusive create + * then the qdisc tcm_handle is not expected * to exist, so that we choose create/graft too. * * 3. The last case is when no flags are set. + * This will happen when for example tc + * utility issues a "change" command. * Alas, it is sort of hole in API, we * cannot decide what to do unambiguously. - * For now we select create/graft, if - * user gave KIND, which does not match existing. + * For now we select create/graft. */ - if ((n->nlmsg_flags & NLM_F_CREATE) && - (n->nlmsg_flags & NLM_F_REPLACE) && - ((n->nlmsg_flags & NLM_F_EXCL) || - (tca[TCA_KIND] && - nla_strcmp(tca[TCA_KIND], q->ops->id)))) - goto create_n_graft; + if (tca[TCA_KIND] && + nla_strcmp(tca[TCA_KIND], q->ops->id)) { + if (req_create_or_replace(n) || + req_create_exclusive(n)) + goto create_n_graft; + else if (req_change(n)) + goto create_n_graft2; + } } } } else { @@ -1698,6 +1724,7 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n, NL_SET_ERR_MSG(extack, "Qdisc not found. To create specify NLM_F_CREATE flag"); return -ENOENT; } +create_n_graft2: if (clid == TC_H_INGRESS) { if (dev_ingress_queue(dev)) { q = qdisc_create(dev, dev_ingress_queue(dev),
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrii Staikov andrii.staikov@intel.com
[ Upstream commit 9525a3c38accd2e186f52443e35e633e296cc7f5 ]
Add a check that pf->vf is not NULL before dereferencing pf->vf[vsi->vf_id] when updating the VSI filter sync. Add a similar check before dereferencing !pf->vf[vsi->vf_id].trusted in the condition for clearing the promisc mode bit.
Fixes: c87c938f62d8 ("i40e: Add VF VLAN pruning") Signed-off-by: Andrii Staikov andrii.staikov@intel.com Signed-off-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Tested-by: Rafal Romanowski rafal.romanowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_main.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index b847bd105b16e..5d21cb4ef6301 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -2615,7 +2615,7 @@ int i40e_sync_vsi_filters(struct i40e_vsi *vsi) retval = i40e_correct_mac_vlan_filters (vsi, &tmp_add_list, &tmp_del_list, vlan_filters); - else + else if (pf->vf) retval = i40e_correct_vf_mac_vlan_filters (vsi, &tmp_add_list, &tmp_del_list, vlan_filters, pf->vf[vsi->vf_id].trusted); @@ -2788,7 +2788,8 @@ int i40e_sync_vsi_filters(struct i40e_vsi *vsi) }
/* if the VF is not trusted do not do promisc */ - if ((vsi->type == I40E_VSI_SRIOV) && !pf->vf[vsi->vf_id].trusted) { + if (vsi->type == I40E_VSI_SRIOV && pf->vf && + !pf->vf[vsi->vf_id].trusted) { clear_bit(__I40E_VSI_OVERFLOW_PROMISC, vsi->state); goto out; }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 4b80ced971b0d118f9a11dd503a5833a5016de92 ]
We have to validate all tables in the transaction that are in the VALIDATE_DO state; the blamed commit below did not move the break statement to its right location, so we only validated one table.
Moreover, we can't init table->validate to _SKIP when a table object is allocated.
If we do, then if a transaction creates a new table and then fails, nfnetlink will loop and nft will hang until the user cancels the command.
Add back the pernet state as a place to stash the last state encountered. This is either _DO (we hit an error during commit validation) or _SKIP (transaction passed all checks).
Fixes: 00c320f9b755 ("netfilter: nf_tables: make validation state per table") Reported-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/netfilter/nf_tables.h | 1 + net/netfilter/nf_tables_api.c | 11 +++++++---- 2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index ad97049e28881..a9a730fb9f963 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -1817,6 +1817,7 @@ struct nftables_pernet { u64 table_handle; unsigned int base_seq; unsigned int gc_seq; + u8 validate_state; };
extern unsigned int nf_tables_net_id; diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index b280b151a9e98..5275c4112b57d 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -1372,7 +1372,7 @@ static int nf_tables_newtable(struct sk_buff *skb, const struct nfnl_info *info, if (table == NULL) goto err_kzalloc;
- table->validate_state = NFT_VALIDATE_SKIP; + table->validate_state = nft_net->validate_state; table->name = nla_strdup(attr, GFP_KERNEL_ACCOUNT); if (table->name == NULL) goto err_strdup; @@ -9065,9 +9065,8 @@ static int nf_tables_validate(struct net *net) return -EAGAIN;
nft_validate_state_update(table, NFT_VALIDATE_SKIP); + break; } - - break; }
return 0; @@ -9813,8 +9812,10 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) }
/* 0. Validate ruleset, otherwise roll back for error reporting. */ - if (nf_tables_validate(net) < 0) + if (nf_tables_validate(net) < 0) { + nft_net->validate_state = NFT_VALIDATE_DO; return -EAGAIN; + }
err = nft_flow_rule_offload_commit(net); if (err < 0) @@ -10070,6 +10071,7 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) nf_tables_commit_audit_log(&adl, nft_net->base_seq);
nft_gc_seq_end(nft_net, gc_seq); + nft_net->validate_state = NFT_VALIDATE_SKIP; nf_tables_commit_release(net);
return 0; @@ -11126,6 +11128,7 @@ static int __net_init nf_tables_init_net(struct net *net) mutex_init(&nft_net->commit_mutex); nft_net->base_seq = 1; nft_net->gc_seq = 0; + nft_net->validate_state = NFT_VALIDATE_SKIP;
return 0; }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 2c9f0293280e258606e54ed2b96fa71498432eae ]
Destroy work waits for the RCU grace period and then releases the objects with no mutex held. All object releases follow this path for transactions; therefore, ordering is guaranteed and references to top-level objects in the hierarchy remain valid.
However, the netlink notifier might interfere with pending destroy work. rcu_barrier() is not correct because objects are not released via an RCU callback. Flush the destroy work before releasing objects from the netlink notifier path.
Fixes: d4bc8271db21 ("netfilter: nf_tables: netlink notifier might race to release objects") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 5275c4112b57d..539bc5d5c12fd 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -11084,7 +11084,7 @@ static int nft_rcv_nl_event(struct notifier_block *this, unsigned long event, gc_seq = nft_gc_seq_begin(nft_net);
if (!list_empty(&nf_tables_destroy_list)) - rcu_barrier(); + nf_tables_trans_destroy_flush_work(); again: list_for_each_entry(table, &nft_net->tables, list) { if (nft_table_has_owner(table) &&
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 720344340fb9be2765bbaab7b292ece0a4570eae ]
The abort path is missing a synchronization point with GC transactions. Add a GC sequence number so that any GC transaction that loses the race is discarded.
Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 539bc5d5c12fd..9cd8c14a0faf4 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -10348,8 +10348,12 @@ static int nf_tables_abort(struct net *net, struct sk_buff *skb, enum nfnl_abort_action action) { struct nftables_pernet *nft_net = nft_pernet(net); - int ret = __nf_tables_abort(net, action); + unsigned int gc_seq; + int ret;
+ gc_seq = nft_gc_seq_begin(nft_net); + ret = __nf_tables_abort(net, action); + nft_gc_seq_end(nft_net, gc_seq); mutex_unlock(&nft_net->commit_mutex);
return ret;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 8357bc946a2abc2a10ca40e5a2105d2b4c57515e ]
Use nf_tables_gc_list_lock spinlock, not nf_tables_destroy_list_lock to protect the gc list.
Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 9cd8c14a0faf4..ad38f84a8f11a 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -9470,9 +9470,9 @@ static void nft_trans_gc_work(struct work_struct *work) struct nft_trans_gc *trans, *next; LIST_HEAD(trans_gc_list);
- spin_lock(&nf_tables_destroy_list_lock); + spin_lock(&nf_tables_gc_list_lock); list_splice_init(&nf_tables_gc_list, &trans_gc_list); - spin_unlock(&nf_tables_destroy_list_lock); + spin_unlock(&nf_tables_gc_list_lock);
list_for_each_entry_safe(trans, next, &trans_gc_list, list) { list_del(&trans->list);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 5e1be4cdc98c989d5387ce94ff15b5ad06a5b681 ]
Several callers of pipapo_resize() don't propagate its allocation failures; this causes a crash when fault injection is enabled for gfp_kernel slabs.
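The pattern being enforced, as a stand-alone sketch (made-up names, not the nftables code): grow the storage first, propagate any failure to the caller, and only bump the rule count once the resize succeeded.

#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct table {
	unsigned int rules;
	unsigned int *map;
};

/* Grow the mapping; returns 0 or a negative error the caller must check. */
static int table_resize(struct table *t, unsigned int old_n, unsigned int new_n)
{
	unsigned int *m = realloc(t->map, new_n * sizeof(*m));

	if (!m)
		return -ENOMEM;
	memset(m + old_n, 0, (new_n - old_n) * sizeof(*m));
	t->map = m;
	return 0;
}

static int table_insert(struct table *t, unsigned int value)
{
	/* Resize before bumping t->rules so a failed allocation leaves the
	 * structure consistent, and propagate the error to the caller. */
	int err = table_resize(t, t->rules, t->rules + 1);

	if (err < 0)
		return err;
	t->map[t->rules++] = value;
	return 0;
}

int main(void)
{
	struct table t = { 0, NULL };

	return table_insert(&t, 42) ? 1 : 0;
}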
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Florian Westphal fw@strlen.de Reviewed-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_set_pipapo.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index 352180b123fc7..58bd514260b90 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -902,12 +902,14 @@ static void pipapo_lt_bits_adjust(struct nft_pipapo_field *f) static int pipapo_insert(struct nft_pipapo_field *f, const uint8_t *k, int mask_bits) { - int rule = f->rules++, group, ret, bit_offset = 0; + int rule = f->rules, group, ret, bit_offset = 0;
- ret = pipapo_resize(f, f->rules - 1, f->rules); + ret = pipapo_resize(f, f->rules, f->rules + 1); if (ret) return ret;
+ f->rules++; + for (group = 0; group < f->groups; group++) { int i, v; u8 mask; @@ -1052,7 +1054,9 @@ static int pipapo_expand(struct nft_pipapo_field *f, step++; if (step >= len) { if (!masks) { - pipapo_insert(f, base, 0); + err = pipapo_insert(f, base, 0); + if (err < 0) + return err; masks = 1; } goto out; @@ -1235,6 +1239,9 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set, else ret = pipapo_expand(f, start, end, f->groups * f->bb);
+ if (ret < 0) + return ret; + if (f->bsize > bsize_max) bsize_max = f->bsize;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 8e51830e29e12670b4c10df070a4ea4c9593e961 ]
Don't queue more gc work, else we may queue the same elements multiple times.
If an element is flagged as dead, this can mean that either the previous gc request was invalidated/discarded by a transaction or that the previous request is still pending in the system work queue.
The latter will happen if the gc interval is set to a very low value, e.g. 1ms, and system work queue is backlogged.
The set's refcount is 1 if no previous gc requests are queued, so add a helper for this and skip the gc run if old requests are still pending.
Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Florian Westphal fw@strlen.de Reviewed-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/netfilter/nf_tables.h | 5 +++++ net/netfilter/nft_set_hash.c | 3 +++ net/netfilter/nft_set_rbtree.c | 3 +++ 3 files changed, 11 insertions(+)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index a9a730fb9f963..394b22b44b0e8 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -586,6 +586,11 @@ static inline void *nft_set_priv(const struct nft_set *set) return (void *)set->data; }
+static inline bool nft_set_gc_is_pending(const struct nft_set *s) +{ + return refcount_read(&s->refs) != 1; +} + static inline struct nft_set *nft_set_container_of(const void *priv) { return (void *)priv - offsetof(struct nft_set, data); diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c index cef5df8460009..524763659f251 100644 --- a/net/netfilter/nft_set_hash.c +++ b/net/netfilter/nft_set_hash.c @@ -326,6 +326,9 @@ static void nft_rhash_gc(struct work_struct *work) nft_net = nft_pernet(net); gc_seq = READ_ONCE(nft_net->gc_seq);
+ if (nft_set_gc_is_pending(set)) + goto done; + gc = nft_trans_gc_alloc(set, gc_seq, GFP_KERNEL); if (!gc) goto done; diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c index f9d4c8fcbbf82..c6435e7092319 100644 --- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -611,6 +611,9 @@ static void nft_rbtree_gc(struct work_struct *work) nft_net = nft_pernet(net); gc_seq = READ_ONCE(nft_net->gc_seq);
+ if (nft_set_gc_is_pending(set)) + goto done; + gc = nft_trans_gc_alloc(set, gc_seq, GFP_KERNEL); if (!gc) goto done;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit 30188bd7838c16a98a520db1fe9df01ffc6ed368 ]
Negative ifindexes are illegal, but the kernel does not validate the ifindex in the ancillary header of RTM_NEWLINK messages, resulting in the kernel generating a warning [1] when such an ifindex is specified.
Fix by rejecting negative ifindexes.
[1] WARNING: CPU: 0 PID: 5031 at net/core/dev.c:9593 dev_index_reserve+0x1a2/0x1c0 net/core/dev.c:9593 [...] Call Trace: <TASK> register_netdevice+0x69a/0x1490 net/core/dev.c:10081 br_dev_newlink+0x27/0x110 net/bridge/br_netlink.c:1552 rtnl_newlink_create net/core/rtnetlink.c:3471 [inline] __rtnl_newlink+0x115e/0x18c0 net/core/rtnetlink.c:3688 rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3701 rtnetlink_rcv_msg+0x439/0xd30 net/core/rtnetlink.c:6427 netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2545 netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline] netlink_unicast+0x536/0x810 net/netlink/af_netlink.c:1368 netlink_sendmsg+0x93c/0xe40 net/netlink/af_netlink.c:1910 sock_sendmsg_nosec net/socket.c:728 [inline] sock_sendmsg+0xd9/0x180 net/socket.c:751 ____sys_sendmsg+0x6ac/0x940 net/socket.c:2538 ___sys_sendmsg+0x135/0x1d0 net/socket.c:2592 __sys_sendmsg+0x117/0x1e0 net/socket.c:2621 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
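A minimal stand-alone sketch of the resulting handling of the ancillary ifindex (handle_ifindex() is a made-up name, and the real __rtnl_newlink() also considers IFLA_ALT_IFNAME):

#include <errno.h>
#include <stdio.h>

static int handle_ifindex(int ifi_index, const char *ifname)
{
	if (ifi_index > 0) {
		printf("look up existing device by index %d\n", ifi_index);
	} else if (ifi_index < 0) {
		fprintf(stderr, "ifindex can't be negative\n");
		return -EINVAL;
	} else if (ifname) {
		printf("look up existing device by name %s\n", ifname);
	} else {
		printf("no device referenced, fall through to creation\n");
	}
	return 0;
}

int main(void)
{
	handle_ifindex(10, NULL);
	return handle_ifindex(-1, NULL) == -EINVAL ? 0 : 1;
}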
Fixes: 38f7b870d4a6 ("[RTNETLINK]: Link creation API") Reported-by: syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Jiri Pirko jiri@nvidia.com Reviewed-by: Jakub Kicinski kuba@kernel.org Link: https://lore.kernel.org/r/20230823064348.2252280-1-idosch@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/rtnetlink.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index baa323ca37c42..fd6d2430d40ff 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3560,6 +3560,9 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, if (ifm->ifi_index > 0) { link_specified = true; dev = __dev_get_by_index(net, ifm->ifi_index); + } else if (ifm->ifi_index < 0) { + NL_SET_ERR_MSG(extack, "ifindex can't be negative"); + return -EINVAL; } else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME]) { link_specified = true; dev = rtnl_dev_get(net, tb);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit e74216b8def3803e98ae536de78733e9d7f3b109 ]
The commit 14af9963ba1e ("bonding: Support macvlans on top of tlb/rlb mode bonds") aims to enable the use of macvlans on top of rlb bond mode. However, the current rlb bond mode only handles ARP packets to update remote neighbor entries. This causes an issue when a macvlan is on top of the bond, and remote devices send packets to the macvlan using the bond's MAC address as the destination. After delivering the packets to the macvlan, the macvlan will reject them as the MAC address is incorrect. Consequently, that commit makes macvlan over bond non-functional.
To address this problem, one potential solution is to check for the presence of a macvlan port on the bond device using netif_is_macvlan_port(bond->dev) and return NULL in the rlb_arp_xmit() function. However, this approach doesn't fully resolve the situation when a VLAN exists between the bond and macvlan.
So let's just do a partial revert of commit 14af9963ba1e in rlb_arp_xmit(). As the comment says, "Don't modify or load balance ARPs that do not originate locally."
Fixes: 14af9963ba1e ("bonding: Support macvlans on top of tlb/rlb mode bonds") Reported-by: susan.zheng@veritas.com Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2117816 Signed-off-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Jay Vosburgh jay.vosburgh@canonical.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_alb.c | 6 +++--- include/net/bonding.h | 11 +---------- 2 files changed, 4 insertions(+), 13 deletions(-)
diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index b9dbad3a8af82..fc5da5d7744da 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -660,10 +660,10 @@ static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond) return NULL; arp = (struct arp_pkt *)skb_network_header(skb);
- /* Don't modify or load balance ARPs that do not originate locally - * (e.g.,arrive via a bridge). + /* Don't modify or load balance ARPs that do not originate + * from the bond itself or a VLAN directly above the bond. */ - if (!bond_slave_has_mac_rx(bond, arp->mac_src)) + if (!bond_slave_has_mac_rcu(bond, arp->mac_src)) return NULL;
dev = ip_dev_find(dev_net(bond->dev), arp->ip_src); diff --git a/include/net/bonding.h b/include/net/bonding.h index 59955ac331578..6e4e406d8cd20 100644 --- a/include/net/bonding.h +++ b/include/net/bonding.h @@ -724,23 +724,14 @@ static inline struct slave *bond_slave_has_mac(struct bonding *bond, }
/* Caller must hold rcu_read_lock() for read */ -static inline bool bond_slave_has_mac_rx(struct bonding *bond, const u8 *mac) +static inline bool bond_slave_has_mac_rcu(struct bonding *bond, const u8 *mac) { struct list_head *iter; struct slave *tmp; - struct netdev_hw_addr *ha;
bond_for_each_slave_rcu(bond, tmp, iter) if (ether_addr_equal_64bits(mac, tmp->dev->dev_addr)) return true; - - if (netdev_uc_empty(bond->dev)) - return false; - - netdev_for_each_uc_addr(ha, bond->dev) - if (ether_addr_equal_64bits(mac, ha->addr)) - return true; - return false; }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: BrenoRCBrito brenorcbrito@gmail.com
commit 3b1f08833c45d0167741e4097b0150e7cf086102 upstream.
The VivoBook Pro 15 Ryzen Edition uses the Ryzen 6800H processor; adding it to the acp6x quirks list enables the internal mic.
Signed-off-by: BrenoRCBrito brenorcbrito@gmail.com Link: https://lore.kernel.org/r/20230818211417.32167-1-brenorcbrito@gmail.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/amd/yc/acp6x-mach.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/sound/soc/amd/yc/acp6x-mach.c +++ b/sound/soc/amd/yc/acp6x-mach.c @@ -251,6 +251,13 @@ static const struct dmi_system_id yc_acp { .driver_data = &acp6x_card, .matches = { + DMI_MATCH(DMI_BOARD_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_PRODUCT_NAME, "M6500RC"), + } + }, + { + .driver_data = &acp6x_card, + .matches = { DMI_MATCH(DMI_BOARD_VENDOR, "Alienware"), DMI_MATCH(DMI_PRODUCT_NAME, "Alienware m17 R5 AMD"), }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Charles Keepax ckeepax@opensource.cirrus.com
commit 1613781d7e8a93618ff3a6b37f81f06769b53717 upstream.
The current analog gain TLV seems to have completely incorrect values in it. The gain starts at 0.5dB, proceeds in 1dB steps, and has no mute value; correct the control to match.
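For context (not part of the patch): the DECLARE_TLV_DB_SCALE() arguments are (name, minimum, step, mute), with the minimum and step expressed in 0.01 dB units, so the corrected values encode exactly the range described above. A minimal sketch:

#include <sound/tlv.h>

/* min = 50 (0.5 dB), step = 100 (1 dB per step), mute flag = 0 (no mute value) */
static DECLARE_TLV_DB_SCALE(amp_gain_tlv, 50, 100, 0);

/*
 * The previous arguments (0, 1, 1) encoded a 0 dB minimum in 0.01 dB steps
 * with the lowest value treated as mute, which does not match the amplifier.
 */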
Signed-off-by: Charles Keepax ckeepax@opensource.cirrus.com Link: https://lore.kernel.org/r/20230823085308.753572-1-ckeepax@opensource.cirrus.... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/codecs/cs35l41.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/soc/codecs/cs35l41.c +++ b/sound/soc/codecs/cs35l41.c @@ -168,7 +168,7 @@ static int cs35l41_get_fs_mon_config_ind static const DECLARE_TLV_DB_RANGE(dig_vol_tlv, 0, 0, TLV_DB_SCALE_ITEM(TLV_DB_GAIN_MUTE, 0, 1), 1, 913, TLV_DB_MINMAX_ITEM(-10200, 1200)); -static DECLARE_TLV_DB_SCALE(amp_gain_tlv, 0, 1, 1); +static DECLARE_TLV_DB_SCALE(amp_gain_tlv, 50, 100, 0);
static const struct snd_kcontrol_new dre_ctrl = SOC_DAPM_SINGLE("Switch", CS35L41_PWR_CTRL3, 20, 1, 0);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Srinivas Goud srinivas.goud@amd.com
commit 627d05a41ca1fbb9d390f9513af262f001f261f7 upstream.
Remove the 10us delay in cdns_spi_process_fifo() (called from cdns_spi_irq()) to fix a data corruption issue on the Master side when this driver is configured in Slave mode, as the Slave fails to prepare the data in time due to the above delay.
Add a 10us delay before processing the RX FIFO, as TX empty doesn't guarantee valid data in the RX FIFO.
Signed-off-by: Srinivas Goud srinivas.goud@amd.com Reviewed-by: Charles Keepax ckeepax@opensource.cirrus.com Tested-by: Charles Keepax ckeepax@opensource.cirrus.com Link: https://lore.kernel.org/r/1692610216-217644-1-git-send-email-srinivas.goud@a... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/spi/spi-cadence.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-)
--- a/drivers/spi/spi-cadence.c +++ b/drivers/spi/spi-cadence.c @@ -316,12 +316,6 @@ static void cdns_spi_process_fifo(struct xspi->rx_bytes -= nrx;
while (ntx || nrx) { - /* When xspi in busy condition, bytes may send failed, - * then spi control did't work thoroughly, add one byte delay - */ - if (cdns_spi_read(xspi, CDNS_SPI_ISR) & CDNS_SPI_IXR_TXFULL) - udelay(10); - if (ntx) { if (xspi->txbuf) cdns_spi_write(xspi, CDNS_SPI_TXD, *xspi->txbuf++); @@ -391,6 +385,11 @@ static irqreturn_t cdns_spi_irq(int irq, if (xspi->tx_bytes) { cdns_spi_process_fifo(xspi, trans_cnt, trans_cnt); } else { + /* Fixed delay due to controller limitation with + * RX_NEMPTY incorrect status + * Xilinx AR:65885 contains more details + */ + udelay(10); cdns_spi_process_fifo(xspi, 0, trans_cnt); cdns_spi_write(xspi, CDNS_SPI_IDR, CDNS_SPI_IXR_DEFAULT); @@ -438,12 +437,18 @@ static int cdns_transfer_one(struct spi_ cdns_spi_setup_transfer(spi, transfer); } else { /* Set TX empty threshold to half of FIFO depth - * only if TX bytes are more than half FIFO depth. + * only if TX bytes are more than FIFO depth. */ if (xspi->tx_bytes > xspi->tx_fifo_depth) cdns_spi_write(xspi, CDNS_SPI_THLD, xspi->tx_fifo_depth >> 1); }
+ /* When xspi in busy condition, bytes may send failed, + * then spi control didn't work thoroughly, add one byte delay + */ + if (cdns_spi_read(xspi, CDNS_SPI_ISR) & CDNS_SPI_IXR_TXFULL) + udelay(10); + cdns_spi_process_fifo(xspi, xspi->tx_fifo_depth, 0); spi_transfer_delay_exec(transfer);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Ellerman mpe@ellerman.id.au
commit bfedba3b2c7793ce127680bc8f70711e05ec7a17 upstream.
When building for power4, newer binutils don't recognise the "dcbfl" extended mnemonic.
dcbfl RA, RB is equivalent to dcbf RA, RB, 1.
Switch to "dcbf" to avoid the build error.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/ibm/ibmveth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/ibm/ibmveth.c +++ b/drivers/net/ethernet/ibm/ibmveth.c @@ -203,7 +203,7 @@ static inline void ibmveth_flush_buffer( unsigned long offset;
for (offset = 0; offset < length; offset += SMP_CACHE_BYTES) - asm("dcbfl %0,%1" :: "b" (addr), "r" (offset)); + asm("dcbf %0,%1,1" :: "b" (addr), "r" (offset)); }
/* replenish the buffers for a pool. note that we don't need to
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ping-Ke Shih pkshih@realtek.com
commit b98c16107cc1647242abbd11f234c05a3a5864f6 upstream.
The commit 06470f7468c8 ("mac80211: add API to allow filtering frames in BA sessions") added reorder_buf_filtered to mark frames filtered by firmware, and it can only work correctly if hw.max_rx_aggregation_subframes <= 64 since it stores the bitmap in a u64 variable.
However, new HE or EHT devices can support a BlockAck number of up to 256 or 1024, and then using a higher subframe index leads to a UBSAN warning:
UBSAN: shift-out-of-bounds in net/mac80211/rx.c:1129:39 shift exponent 215 is too large for 64-bit type 'long long unsigned int' Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 dump_stack+0x10/0x20 __ubsan_handle_shift_out_of_bounds+0x1ac/0x360 ieee80211_release_reorder_frame.constprop.0.cold+0x64/0x69 [mac80211] ieee80211_sta_reorder_release+0x9c/0x400 [mac80211] ieee80211_prepare_and_rx_handle+0x1234/0x1420 [mac80211] ieee80211_rx_list+0xaef/0xf60 [mac80211] ieee80211_rx_napi+0x53/0xd0 [mac80211]
Since only old hardware that supports <=64 BlockAck subframes uses ieee80211_mark_rx_ba_filtered_frames(), keep its use limited as it is, and add a WARN_ONCE() and a comment noting that this function must not be used if the hardware capability is not suitable.
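For illustration only (not part of the patch): the warning is an ordinary 64-bit shift overflow, assuming BIT_ULL(n) expands to 1ULL << (n) as in <linux/bits.h>.

#include <linux/bits.h>
#include <linux/types.h>

static void demo_shift_overflow(void)
{
        /* reorder_buf_filtered is a u64 bitmap, so only subframe indexes 0..63 fit */
        u64 filtered = 0;
        unsigned int index = 215;       /* possible in a 256-subframe BA session */

        filtered |= BIT_ULL(index);     /* 1ULL << 215: shift-out-of-bounds (UBSAN) */
        (void)filtered;
}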
Signed-off-by: Ping-Ke Shih pkshih@realtek.com Link: https://lore.kernel.org/r/20230818014004.16177-1-pkshih@realtek.com [edit commit message] Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/mac80211.h | 1 + net/mac80211/rx.c | 12 ++++++++++-- 2 files changed, 11 insertions(+), 2 deletions(-)
--- a/include/net/mac80211.h +++ b/include/net/mac80211.h @@ -6578,6 +6578,7 @@ void ieee80211_stop_rx_ba_session(struct * marks frames marked in the bitmap as having been filtered. Afterwards, it * checks if any frames in the window starting from @ssn can now be released * (in case they were only waiting for frames that were filtered.) + * (Only work correctly if @max_rx_aggregation_subframes <= 64 frames) */ void ieee80211_mark_rx_ba_filtered_frames(struct ieee80211_sta *pubsta, u8 tid, u16 ssn, u64 filtered, --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -1083,7 +1083,8 @@ static inline bool ieee80211_rx_reorder_ struct sk_buff *tail = skb_peek_tail(frames); struct ieee80211_rx_status *status;
- if (tid_agg_rx->reorder_buf_filtered & BIT_ULL(index)) + if (tid_agg_rx->reorder_buf_filtered && + tid_agg_rx->reorder_buf_filtered & BIT_ULL(index)) return true;
if (!tail) @@ -1124,7 +1125,8 @@ static void ieee80211_release_reorder_fr }
no_frame: - tid_agg_rx->reorder_buf_filtered &= ~BIT_ULL(index); + if (tid_agg_rx->reorder_buf_filtered) + tid_agg_rx->reorder_buf_filtered &= ~BIT_ULL(index); tid_agg_rx->head_seq_num = ieee80211_sn_inc(tid_agg_rx->head_seq_num); }
@@ -4245,6 +4247,7 @@ void ieee80211_mark_rx_ba_filtered_frame u16 ssn, u64 filtered, u16 received_mpdus) { + struct ieee80211_local *local; struct sta_info *sta; struct tid_ampdu_rx *tid_agg_rx; struct sk_buff_head frames; @@ -4262,6 +4265,11 @@ void ieee80211_mark_rx_ba_filtered_frame
sta = container_of(pubsta, struct sta_info, sta);
+ local = sta->sdata->local; + WARN_ONCE(local->hw.max_rx_aggregation_subframes > 64, + "RX BA marker can't support max_rx_aggregation_subframes %u > 64\n", + local->hw.max_rx_aggregation_subframes); + if (!ieee80211_rx_data_set_sta(&rx, sta, -1)) return;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Swapnil Devesh me@sidevesh.com
commit db35610a181c18f7a521a2e157f7acdef7ce425f upstream.
This adds my laptop, the Lenovo Yoga 7 14ACN6 with Product Name 82N7 (from `dmidecode -t1 | grep "Product Name"`), to the ec_trigger_quirk_dmi_table. I have tested that this is required for the YMC driver to work correctly on this model.
Signed-off-by: Swapnil Devesh me@sidevesh.com Reviewed-by: Gergő Köteles soyer@irl.hu Link: https://lore.kernel.org/r/18a08a8b173.895ef3b250414.1213194126082324071@side... Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/platform/x86/lenovo-ymc.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/platform/x86/lenovo-ymc.c +++ b/drivers/platform/x86/lenovo-ymc.c @@ -36,6 +36,13 @@ static const struct dmi_system_id ec_tri DMI_MATCH(DMI_PRODUCT_NAME, "82QF"), }, }, + { + /* Lenovo Yoga 7 14ACN6 */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "82N7"), + }, + }, { } };
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: André Apitzsch git@apitzsch.eu
commit a260f7d726fde52c0278bd3fa085a758639bcee2 upstream.
The Lenovo Thinkbook 14s Yoga ITL has 4 new symbols/shortcuts on its F9-F11 and PrtSc keys:
F9: Has a symbol of a head with a headset; the manual says "Service key"
F10: Has a symbol of a telephone handset which has been picked up from the receiver; the manual says "Answer incoming calls"
F11: Has a symbol of a telephone handset which is resting on the receiver; the manual says "Reject incoming calls"
PrtSc: Has a symbol of scissors and a dashed ellipse; the manual says "Open the Windows 'Snipping' Tool app"
This commit adds support for these 4 new hkey events.
Signed-off-by: André Apitzsch git@apitzsch.eu Link: https://lore.kernel.org/r/20230819-lenovo_keys-v1-1-9d34eac88e0a@apitzsch.eu Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/platform/x86/ideapad-laptop.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/platform/x86/ideapad-laptop.c +++ b/drivers/platform/x86/ideapad-laptop.c @@ -1049,6 +1049,11 @@ static const struct key_entry ideapad_ke { KE_IGNORE, 0x03 | IDEAPAD_WMI_KEY }, /* Customizable Lenovo Hotkey ("star" with 'S' inside) */ { KE_KEY, 0x01 | IDEAPAD_WMI_KEY, { KEY_FAVORITES } }, + { KE_KEY, 0x04 | IDEAPAD_WMI_KEY, { KEY_SELECTIVE_SCREENSHOT } }, + /* Lenovo Support */ + { KE_KEY, 0x07 | IDEAPAD_WMI_KEY, { KEY_HELP } }, + { KE_KEY, 0x0e | IDEAPAD_WMI_KEY, { KEY_PICKUP_PHONE } }, + { KE_KEY, 0x0f | IDEAPAD_WMI_KEY, { KEY_HANGUP_PHONE } }, /* Dark mode toggle */ { KE_KEY, 0x13 | IDEAPAD_WMI_KEY, { KEY_PROG1 } }, /* Sound profile switch */
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Coddington bcodding@redhat.com
commit 1cbc11aaa01f80577b67ae02c73ee781112125fd upstream.
Commit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during delegation return") attempted to solve this problem by using nfs4's generic async error handling, but introduced a regression where v4.0 lock recovery would hang. The additional complexity introduced by overloading that error handling is not necessary for this case. This patch expects that commit to be reverted.
The problem as originally explained in the above commit is:
There's a small window where a LOCK sent during a delegation return can race with another OPEN on client, but the open stateid has not yet been updated. In this case, the client doesn't handle the OLD_STATEID error from the server and will lose this lock, emitting: "NFS: nfs4_handle_delegation_recall_error: unhandled error -10024".
Fix this by using the old_stateid refresh helpers if the server replies with OLD_STATEID.
Suggested-by: Trond Myklebust trondmy@hammerspace.com Signed-off-by: Benjamin Coddington bcodding@redhat.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfs/nfs4proc.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -7180,8 +7180,15 @@ static void nfs4_lock_done(struct rpc_ta } else if (!nfs4_update_lock_stateid(lsp, &data->res.stateid)) goto out_restart; break; - case -NFS4ERR_BAD_STATEID: case -NFS4ERR_OLD_STATEID: + if (data->arg.new_lock_owner != 0 && + nfs4_refresh_open_old_stateid(&data->arg.open_stateid, + lsp->ls_state)) + goto out_restart; + if (nfs4_refresh_lock_old_stateid(&data->arg.lock_stateid, lsp)) + goto out_restart; + fallthrough; + case -NFS4ERR_BAD_STATEID: case -NFS4ERR_STALE_STATEID: case -NFS4ERR_EXPIRED: if (data->arg.new_lock_owner != 0) {
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrey Skvortsov andrej.skvortzov@gmail.com
commit 66fbfb35da47f391bdadf9fa7ceb88af4faa9022 upstream.
The problem can be reproduced by unloading snd_soc_simple_card, because in devm_get_clk_from_child() the devres data is allocated with the size of a bare `struct clk` pointer, but devm_clk_release() expects the devres data to be a `struct devm_clk_state`.
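A minimal sketch of the mismatch, with the release callback and state layout reproduced here for illustration (see drivers/clk/clk-devres.c): the callback interprets the allocation that used to hold only a clk pointer as a two-field state, so reading the exit callback goes past the allocation.

#include <linux/clk.h>
#include <linux/device.h>

struct devm_clk_state {
        struct clk *clk;
        void (*exit)(struct clk *clk);
};

static void devm_clk_release(struct device *dev, void *res)
{
        struct devm_clk_state *state = res;     /* old code handed in a bare struct clk * */

        if (state->exit)                        /* read beyond the sizeof(struct clk *) allocation */
                state->exit(state->clk);

        clk_put(state->clk);
}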
KASAN report: ================================================================== BUG: KASAN: slab-out-of-bounds in devm_clk_release+0x20/0x54 Read of size 8 at addr ffffff800ee09688 by task (udev-worker)/287
Call trace: dump_backtrace+0xe8/0x11c show_stack+0x1c/0x30 dump_stack_lvl+0x60/0x78 print_report+0x150/0x450 kasan_report+0xa8/0xf0 __asan_load8+0x78/0xa0 devm_clk_release+0x20/0x54 release_nodes+0x84/0x120 devres_release_all+0x144/0x210 device_unbind_cleanup+0x1c/0xac really_probe+0x2f0/0x5b0 __driver_probe_device+0xc0/0x1f0 driver_probe_device+0x68/0x120 __driver_attach+0x140/0x294 bus_for_each_dev+0xec/0x160 driver_attach+0x38/0x44 bus_add_driver+0x24c/0x300 driver_register+0xf0/0x210 __platform_driver_register+0x48/0x54 asoc_simple_card_init+0x24/0x1000 [snd_soc_simple_card] do_one_initcall+0xac/0x340 do_init_module+0xd0/0x300 load_module+0x2ba4/0x3100 __do_sys_init_module+0x2c8/0x300 __arm64_sys_init_module+0x48/0x5c invoke_syscall+0x64/0x190 el0_svc_common.constprop.0+0x124/0x154 do_el0_svc+0x44/0xdc el0_svc+0x14/0x50 el0t_64_sync_handler+0xec/0x11c el0t_64_sync+0x14c/0x150
Allocated by task 287: kasan_save_stack+0x38/0x60 kasan_set_track+0x28/0x40 kasan_save_alloc_info+0x20/0x30 __kasan_kmalloc+0xac/0xb0 __kmalloc_node_track_caller+0x6c/0x1c4 __devres_alloc_node+0x44/0xb4 devm_get_clk_from_child+0x44/0xa0 asoc_simple_parse_clk+0x1b8/0x1dc [snd_soc_simple_card_utils] simple_parse_node.isra.0+0x1ec/0x230 [snd_soc_simple_card] simple_dai_link_of+0x1bc/0x334 [snd_soc_simple_card] __simple_for_each_link+0x2ec/0x320 [snd_soc_simple_card] asoc_simple_probe+0x468/0x4dc [snd_soc_simple_card] platform_probe+0x90/0xf0 really_probe+0x118/0x5b0 __driver_probe_device+0xc0/0x1f0 driver_probe_device+0x68/0x120 __driver_attach+0x140/0x294 bus_for_each_dev+0xec/0x160 driver_attach+0x38/0x44 bus_add_driver+0x24c/0x300 driver_register+0xf0/0x210 __platform_driver_register+0x48/0x54 asoc_simple_card_init+0x24/0x1000 [snd_soc_simple_card] do_one_initcall+0xac/0x340 do_init_module+0xd0/0x300 load_module+0x2ba4/0x3100 __do_sys_init_module+0x2c8/0x300 __arm64_sys_init_module+0x48/0x5c invoke_syscall+0x64/0x190 el0_svc_common.constprop.0+0x124/0x154 do_el0_svc+0x44/0xdc el0_svc+0x14/0x50 el0t_64_sync_handler+0xec/0x11c el0t_64_sync+0x14c/0x150
The buggy address belongs to the object at ffffff800ee09600 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 136 bytes inside of 256-byte region [ffffff800ee09600, ffffff800ee09700)
The buggy address belongs to the physical page: page:000000002d97303b refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4ee08 head:000000002d97303b order:1 compound_mapcount:0 compound_pincount:0 flags: 0x10200(slab|head|zone=0) raw: 0000000000010200 0000000000000000 dead000000000122 ffffff8002c02480 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffffff800ee09580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffffff800ee09600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffff800ee09680: 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffffff800ee09700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffffff800ee09780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ==================================================================
Fixes: abae8e57e49a ("clk: generalize devm_clk_get() a bit") Signed-off-by: Andrey Skvortsov andrej.skvortzov@gmail.com Link: https://lore.kernel.org/r/20230805084847.3110586-1-andrej.skvortzov@gmail.co... Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/clk/clk-devres.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
--- a/drivers/clk/clk-devres.c +++ b/drivers/clk/clk-devres.c @@ -205,18 +205,19 @@ EXPORT_SYMBOL(devm_clk_put); struct clk *devm_get_clk_from_child(struct device *dev, struct device_node *np, const char *con_id) { - struct clk **ptr, *clk; + struct devm_clk_state *state; + struct clk *clk;
- ptr = devres_alloc(devm_clk_release, sizeof(*ptr), GFP_KERNEL); - if (!ptr) + state = devres_alloc(devm_clk_release, sizeof(*state), GFP_KERNEL); + if (!state) return ERR_PTR(-ENOMEM);
clk = of_clk_get_by_name(np, con_id); if (!IS_ERR(clk)) { - *ptr = clk; - devres_add(dev, ptr); + state->clk = clk; + devres_add(dev, state); } else { - devres_free(ptr); + devres_free(state); }
return clk;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rik van Riel riel@surriel.com
commit f0362a253606e2031f8d61c74195d4d6556e12a4 upstream.
The code calling ima_free_kexec_buffer runs long after the memblock allocator has already been torn down, potentially resulting in a use after free in memblock_isolate_range.
With KASAN or KFENCE, this use after free will result in a BUG from the idle task, and a subsequent kernel panic.
Switch ima_free_kexec_buffer over to memblock_free_late to avoid that issue.
Fixes: fee3ff99bc67 ("powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c") Cc: stable@kernel.org Signed-off-by: Rik van Riel riel@surriel.com Suggested-by: Mike Rappoport rppt@kernel.org Link: https://lore.kernel.org/r/20230817135759.0888e5ef@imladris.surriel.com Signed-off-by: Rob Herring robh@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/of/kexec.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/of/kexec.c +++ b/drivers/of/kexec.c @@ -184,7 +184,8 @@ int __init ima_free_kexec_buffer(void) if (ret) return ret;
- return memblock_phys_free(addr, size); + memblock_free_late(addr, size); + return 0; } #endif
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugh Dickins hughd@google.com
commit e5548f85b4527c4c803b7eae7887c10bf8f90c97 upstream.
smaps_pte_hole_lookup() is calling shmem_partial_swap_usage() with page table lock held: but shmem_partial_swap_usage() does cond_resched_rcu() if need_resched(): "BUG: sleeping function called from invalid context".
Since shmem_partial_swap_usage() is designed to count across a range, but smaps_pte_hole_lookup() only calls it for a single page slot, just break out of the loop on the last or only page, before checking need_resched().
Link: https://lkml.kernel.org/r/6fe3b3ec-abdf-332f-5c23-6a3b3a3b11a9@google.com Fixes: 230100321518 ("mm/smaps: simplify shmem handling of pte holes") Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Peter Xu peterx@redhat.com Cc: stable@vger.kernel.org [5.16+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/shmem.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/shmem.c +++ b/mm/shmem.c @@ -806,14 +806,16 @@ unsigned long shmem_partial_swap_usage(s XA_STATE(xas, &mapping->i_pages, start); struct page *page; unsigned long swapped = 0; + unsigned long max = end - 1;
rcu_read_lock(); - xas_for_each(&xas, page, end - 1) { + xas_for_each(&xas, page, max) { if (xas_retry(&xas, page)) continue; if (xa_is_value(page)) swapped++; - + if (xas.xa_index == max) + break; if (need_resched()) { xas_pause(&xas); cond_resched_rcu();
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 1d0eb6143c1e85d3f9a3f5a616ee7e5dc351d33b upstream.
Like a few other drivers, the YMFPCI driver needs to clean up with a snd_card_free() call on the error path of the probe; otherwise the other devres resources are released before the card, and this results in a UAF.
This patch uses the helper for handling the probe error gracefully.
Fixes: f33fc1576757 ("ALSA: ymfpci: Create card with device-managed snd_devm_card_new()") Cc: stable@vger.kernel.org Reported-and-tested-by: Takashi Yano takashi.yano@nifty.ne.jp Closes: https://lore.kernel.org/r/20230823135846.1812-1-takashi.yano@nifty.ne.jp Link: https://lore.kernel.org/r/20230823161625.5807-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/ymfpci/ymfpci.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/sound/pci/ymfpci/ymfpci.c +++ b/sound/pci/ymfpci/ymfpci.c @@ -152,8 +152,8 @@ static inline int snd_ymfpci_create_game void snd_ymfpci_free_gameport(struct snd_ymfpci *chip) { } #endif /* SUPPORT_JOYSTICK */
-static int snd_card_ymfpci_probe(struct pci_dev *pci, - const struct pci_device_id *pci_id) +static int __snd_card_ymfpci_probe(struct pci_dev *pci, + const struct pci_device_id *pci_id) { static int dev; struct snd_card *card; @@ -348,6 +348,12 @@ static int snd_card_ymfpci_probe(struct return 0; }
+static int snd_card_ymfpci_probe(struct pci_dev *pci, + const struct pci_device_id *pci_id) +{ + return snd_card_free_on_error(&pci->dev, __snd_card_ymfpci_probe(pci, pci_id)); +} + static struct pci_driver ymfpci_driver = { .name = KBUILD_MODNAME, .id_table = snd_ymfpci_ids,
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ayush Jain ayush.jain3@amd.com
commit 1738b949625c7e17a454b25de33f1f415da3db69 upstream.
After commit 2c2241081f7d ("mm/gup: move private gup FOLL_ flags to internal.h") FOLL_LONGTERM flag value got updated from 0x10000 to 0x100 at include/linux/mm_types.h.
As hmm.hmm_device_private.hmm_gup_test uses FOLL_LONGTERM, update it here as well.
Before this change the test goes into an infinite assert loop in hmm.hmm_device_private.hmm_gup_test ========================================================== RUN hmm.hmm_device_private.hmm_gup_test ... hmm-tests.c:1962:hmm_gup_test:Expected HMM_DMIRROR_PROT_WRITE.. ..(2) == m[2] (34) hmm-tests.c:157:hmm_gup_test:Expected ret (-1) == 0 (0) hmm-tests.c:157:hmm_gup_test:Expected ret (-1) == 0 (0) ... ==========================================================
Call Trace: <TASK> ? sched_clock+0xd/0x20 ? __lock_acquire.constprop.0+0x120/0x6c0 ? ktime_get+0x2c/0xd0 ? sched_clock+0xd/0x20 ? local_clock+0x12/0xd0 ? lock_release+0x26e/0x3b0 pin_user_pages_fast+0x4c/0x70 gup_test_ioctl+0x4ff/0xbb0 ? gup_test_ioctl+0x68c/0xbb0 __x64_sys_ioctl+0x99/0xd0 do_syscall_64+0x60/0x90 ? syscall_exit_to_user_mode+0x2a/0x50 ? do_syscall_64+0x6d/0x90 ? syscall_exit_to_user_mode+0x2a/0x50 ? do_syscall_64+0x6d/0x90 ? irqentry_exit_to_user_mode+0xd/0x20 ? irqentry_exit+0x3f/0x50 ? exc_page_fault+0x96/0x200 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f6aaa31aaff
After this change the test passes successfully.
Link: https://lkml.kernel.org/r/20230808124347.79163-1-ayush.jain3@amd.com Fixes: 2c2241081f7d ("mm/gup: move private gup FOLL_ flags to internal.h") Signed-off-by: Ayush Jain ayush.jain3@amd.com Reviewed-by: Raghavendra K T raghavendra.kt@amd.com Reviewed-by: John Hubbard jhubbard@nvidia.com Acked-by: David Hildenbrand david@redhat.com Cc: Jason Gunthorpe jgg@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/mm/hmm-tests.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/hmm-tests.c b/tools/testing/selftests/mm/hmm-tests.c index 4adaad1b822f..20294553a5dd 100644 --- a/tools/testing/selftests/mm/hmm-tests.c +++ b/tools/testing/selftests/mm/hmm-tests.c @@ -57,9 +57,14 @@ enum {
#define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1))) /* Just the flags we need, copied from mm.h: */ -#define FOLL_WRITE 0x01 /* check pte is writable */ -#define FOLL_LONGTERM 0x10000 /* mapping lifetime is indefinite */
+#ifndef FOLL_WRITE +#define FOLL_WRITE 0x01 /* check pte is writable */ +#endif + +#ifndef FOLL_LONGTERM +#define FOLL_LONGTERM 0x100 /* mapping lifetime is indefinite */ +#endif FIXTURE(hmm) { int fd;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Suren Baghdasaryan surenb@google.com
commit 49b0638502da097c15d46cd4e871dbaa022caf7c upstream.
walk_page_range() and friends often operate under write-locked mmap_lock. With introduction of vma locks, the vmas have to be locked as well during such walks to prevent concurrent page faults in these areas. Add an additional member to mm_walk_ops to indicate locking requirements for the walk.
The change ensures that page walks which prevent concurrent page faults by write-locking mmap_lock, operate correctly after introduction of per-vma locks. With per-vma locks page faults can be handled under vma lock without taking mmap_lock at all, so write locking mmap_lock would not stop them. The change ensures vmas are properly locked during such walks.
A sample issue this solves is do_mbind() performing queue_pages_range() to queue pages for migration. Without this change a page can be concurrently faulted into the area and left out of migration.
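As an illustration of the new interface (the callback and ops names here are hypothetical, the fields come from the patch): every mm_walk_ops instance now declares its locking expectation, and the walk core asserts or takes the corresponding lock.

#include <linux/pagewalk.h>

static int my_pmd_entry(pmd_t *pmd, unsigned long addr,
                        unsigned long next, struct mm_walk *walk)
{
        /* ... per-PMD work ... */
        return 0;
}

static const struct mm_walk_ops my_walk_ops = {
        .pmd_entry = my_pmd_entry,
        /*
         * PGWALK_RDLOCK: caller holds mmap_lock for read.  Walks done under
         * the write lock use PGWALK_WRLOCK so each visited VMA is also
         * write-locked, or PGWALK_WRLOCK_VERIFY if the caller has already
         * write-locked the VMAs itself.
         */
        .walk_lock = PGWALK_RDLOCK,
};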
Link: https://lkml.kernel.org/r/20230804152724.3090321-2-surenb@google.com Signed-off-by: Suren Baghdasaryan surenb@google.com Suggested-by: Linus Torvalds torvalds@linuxfoundation.org Suggested-by: Jann Horn jannh@google.com Cc: David Hildenbrand david@redhat.com Cc: Davidlohr Bueso dave@stgolabs.net Cc: Hugh Dickins hughd@google.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Laurent Dufour ldufour@linux.ibm.com Cc: Liam Howlett liam.howlett@oracle.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Michal Hocko mhocko@suse.com Cc: Michel Lespinasse michel@lespinasse.org Cc: Peter Xu peterx@redhat.com Cc: Vlastimil Babka vbabka@suse.cz Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/mm/book3s64/subpage_prot.c | 1 arch/riscv/mm/pageattr.c | 1 arch/s390/mm/gmap.c | 5 ++++ fs/proc/task_mmu.c | 5 ++++ include/linux/pagewalk.h | 11 +++++++++ mm/damon/vaddr.c | 2 + mm/hmm.c | 1 mm/ksm.c | 25 ++++++++++++++-------- mm/madvise.c | 3 ++ mm/memcontrol.c | 2 + mm/memory-failure.c | 1 mm/mempolicy.c | 22 ++++++++++++------- mm/migrate_device.c | 1 mm/mincore.c | 1 mm/mlock.c | 1 mm/mprotect.c | 1 mm/pagewalk.c | 36 +++++++++++++++++++++++++++++--- mm/vmscan.c | 1 18 files changed, 100 insertions(+), 20 deletions(-)
--- a/arch/powerpc/mm/book3s64/subpage_prot.c +++ b/arch/powerpc/mm/book3s64/subpage_prot.c @@ -143,6 +143,7 @@ static int subpage_walk_pmd_entry(pmd_t
static const struct mm_walk_ops subpage_walk_ops = { .pmd_entry = subpage_walk_pmd_entry, + .walk_lock = PGWALK_WRLOCK_VERIFY, };
static void subpage_mark_vma_nohuge(struct mm_struct *mm, unsigned long addr, --- a/arch/riscv/mm/pageattr.c +++ b/arch/riscv/mm/pageattr.c @@ -102,6 +102,7 @@ static const struct mm_walk_ops pageattr .pmd_entry = pageattr_pmd_entry, .pte_entry = pageattr_pte_entry, .pte_hole = pageattr_pte_hole, + .walk_lock = PGWALK_RDLOCK, };
static int __set_memory(unsigned long addr, int numpages, pgprot_t set_mask, --- a/arch/s390/mm/gmap.c +++ b/arch/s390/mm/gmap.c @@ -2514,6 +2514,7 @@ static int thp_split_walk_pmd_entry(pmd_
static const struct mm_walk_ops thp_split_walk_ops = { .pmd_entry = thp_split_walk_pmd_entry, + .walk_lock = PGWALK_WRLOCK_VERIFY, };
static inline void thp_split_mm(struct mm_struct *mm) @@ -2558,6 +2559,7 @@ static int __zap_zero_pages(pmd_t *pmd,
static const struct mm_walk_ops zap_zero_walk_ops = { .pmd_entry = __zap_zero_pages, + .walk_lock = PGWALK_WRLOCK, };
/* @@ -2648,6 +2650,7 @@ static const struct mm_walk_ops enable_s .hugetlb_entry = __s390_enable_skey_hugetlb, .pte_entry = __s390_enable_skey_pte, .pmd_entry = __s390_enable_skey_pmd, + .walk_lock = PGWALK_WRLOCK, };
int s390_enable_skey(void) @@ -2685,6 +2688,7 @@ static int __s390_reset_cmma(pte_t *pte,
static const struct mm_walk_ops reset_cmma_walk_ops = { .pte_entry = __s390_reset_cmma, + .walk_lock = PGWALK_WRLOCK, };
void s390_reset_cmma(struct mm_struct *mm) @@ -2721,6 +2725,7 @@ static int s390_gather_pages(pte_t *ptep
static const struct mm_walk_ops gather_pages_ops = { .pte_entry = s390_gather_pages, + .walk_lock = PGWALK_RDLOCK, };
/* --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -759,12 +759,14 @@ static int smaps_hugetlb_range(pte_t *pt static const struct mm_walk_ops smaps_walk_ops = { .pmd_entry = smaps_pte_range, .hugetlb_entry = smaps_hugetlb_range, + .walk_lock = PGWALK_RDLOCK, };
static const struct mm_walk_ops smaps_shmem_walk_ops = { .pmd_entry = smaps_pte_range, .hugetlb_entry = smaps_hugetlb_range, .pte_hole = smaps_pte_hole, + .walk_lock = PGWALK_RDLOCK, };
/* @@ -1245,6 +1247,7 @@ static int clear_refs_test_walk(unsigned static const struct mm_walk_ops clear_refs_walk_ops = { .pmd_entry = clear_refs_pte_range, .test_walk = clear_refs_test_walk, + .walk_lock = PGWALK_WRLOCK, };
static ssize_t clear_refs_write(struct file *file, const char __user *buf, @@ -1621,6 +1624,7 @@ static const struct mm_walk_ops pagemap_ .pmd_entry = pagemap_pmd_range, .pte_hole = pagemap_pte_hole, .hugetlb_entry = pagemap_hugetlb_range, + .walk_lock = PGWALK_RDLOCK, };
/* @@ -1932,6 +1936,7 @@ static int gather_hugetlb_stats(pte_t *p static const struct mm_walk_ops show_numa_ops = { .hugetlb_entry = gather_hugetlb_stats, .pmd_entry = gather_pte_stats, + .walk_lock = PGWALK_RDLOCK, };
/* --- a/include/linux/pagewalk.h +++ b/include/linux/pagewalk.h @@ -6,6 +6,16 @@
struct mm_walk;
+/* Locking requirement during a page walk. */ +enum page_walk_lock { + /* mmap_lock should be locked for read to stabilize the vma tree */ + PGWALK_RDLOCK = 0, + /* vma will be write-locked during the walk */ + PGWALK_WRLOCK = 1, + /* vma is expected to be already write-locked during the walk */ + PGWALK_WRLOCK_VERIFY = 2, +}; + /** * struct mm_walk_ops - callbacks for walk_page_range * @pgd_entry: if set, called for each non-empty PGD (top-level) entry @@ -66,6 +76,7 @@ struct mm_walk_ops { int (*pre_vma)(unsigned long start, unsigned long end, struct mm_walk *walk); void (*post_vma)(struct mm_walk *walk); + enum page_walk_lock walk_lock; };
/* --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -384,6 +384,7 @@ out: static const struct mm_walk_ops damon_mkold_ops = { .pmd_entry = damon_mkold_pmd_entry, .hugetlb_entry = damon_mkold_hugetlb_entry, + .walk_lock = PGWALK_RDLOCK, };
static void damon_va_mkold(struct mm_struct *mm, unsigned long addr) @@ -519,6 +520,7 @@ out: static const struct mm_walk_ops damon_young_ops = { .pmd_entry = damon_young_pmd_entry, .hugetlb_entry = damon_young_hugetlb_entry, + .walk_lock = PGWALK_RDLOCK, };
static bool damon_va_young(struct mm_struct *mm, unsigned long addr, --- a/mm/hmm.c +++ b/mm/hmm.c @@ -560,6 +560,7 @@ static const struct mm_walk_ops hmm_walk .pte_hole = hmm_vma_walk_hole, .hugetlb_entry = hmm_vma_walk_hugetlb_entry, .test_walk = hmm_vma_walk_test, + .walk_lock = PGWALK_RDLOCK, };
/** --- a/mm/ksm.c +++ b/mm/ksm.c @@ -454,6 +454,12 @@ static int break_ksm_pmd_entry(pmd_t *pm
static const struct mm_walk_ops break_ksm_ops = { .pmd_entry = break_ksm_pmd_entry, + .walk_lock = PGWALK_RDLOCK, +}; + +static const struct mm_walk_ops break_ksm_lock_vma_ops = { + .pmd_entry = break_ksm_pmd_entry, + .walk_lock = PGWALK_WRLOCK, };
/* @@ -469,16 +475,17 @@ static const struct mm_walk_ops break_ks * of the process that owns 'vma'. We also do not want to enforce * protection keys here anyway. */ -static int break_ksm(struct vm_area_struct *vma, unsigned long addr) +static int break_ksm(struct vm_area_struct *vma, unsigned long addr, bool lock_vma) { vm_fault_t ret = 0; + const struct mm_walk_ops *ops = lock_vma ? + &break_ksm_lock_vma_ops : &break_ksm_ops;
do { int ksm_page;
cond_resched(); - ksm_page = walk_page_range_vma(vma, addr, addr + 1, - &break_ksm_ops, NULL); + ksm_page = walk_page_range_vma(vma, addr, addr + 1, ops, NULL); if (WARN_ON_ONCE(ksm_page < 0)) return ksm_page; if (!ksm_page) @@ -564,7 +571,7 @@ static void break_cow(struct ksm_rmap_it mmap_read_lock(mm); vma = find_mergeable_vma(mm, addr); if (vma) - break_ksm(vma, addr); + break_ksm(vma, addr, false); mmap_read_unlock(mm); }
@@ -870,7 +877,7 @@ static void remove_trailing_rmap_items(s * in cmp_and_merge_page on one of the rmap_items we would be removing. */ static int unmerge_ksm_pages(struct vm_area_struct *vma, - unsigned long start, unsigned long end) + unsigned long start, unsigned long end, bool lock_vma) { unsigned long addr; int err = 0; @@ -881,7 +888,7 @@ static int unmerge_ksm_pages(struct vm_a if (signal_pending(current)) err = -ERESTARTSYS; else - err = break_ksm(vma, addr); + err = break_ksm(vma, addr, lock_vma); } return err; } @@ -1028,7 +1035,7 @@ static int unmerge_and_remove_all_rmap_i if (!(vma->vm_flags & VM_MERGEABLE) || !vma->anon_vma) continue; err = unmerge_ksm_pages(vma, - vma->vm_start, vma->vm_end); + vma->vm_start, vma->vm_end, false); if (err) goto error; } @@ -2528,7 +2535,7 @@ static int __ksm_del_vma(struct vm_area_ return 0;
if (vma->anon_vma) { - err = unmerge_ksm_pages(vma, vma->vm_start, vma->vm_end); + err = unmerge_ksm_pages(vma, vma->vm_start, vma->vm_end, true); if (err) return err; } @@ -2666,7 +2673,7 @@ int ksm_madvise(struct vm_area_struct *v return 0; /* just ignore the advice */
if (vma->anon_vma) { - err = unmerge_ksm_pages(vma, start, end); + err = unmerge_ksm_pages(vma, start, end, true); if (err) return err; } --- a/mm/madvise.c +++ b/mm/madvise.c @@ -227,6 +227,7 @@ static int swapin_walk_pmd_entry(pmd_t *
static const struct mm_walk_ops swapin_walk_ops = { .pmd_entry = swapin_walk_pmd_entry, + .walk_lock = PGWALK_RDLOCK, };
static void force_shm_swapin_readahead(struct vm_area_struct *vma, @@ -521,6 +522,7 @@ regular_folio:
static const struct mm_walk_ops cold_walk_ops = { .pmd_entry = madvise_cold_or_pageout_pte_range, + .walk_lock = PGWALK_RDLOCK, };
static void madvise_cold_page_range(struct mmu_gather *tlb, @@ -741,6 +743,7 @@ next:
static const struct mm_walk_ops madvise_free_walk_ops = { .pmd_entry = madvise_free_pte_range, + .walk_lock = PGWALK_RDLOCK, };
static int madvise_free_single_vma(struct vm_area_struct *vma, --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -6072,6 +6072,7 @@ static int mem_cgroup_count_precharge_pt
static const struct mm_walk_ops precharge_walk_ops = { .pmd_entry = mem_cgroup_count_precharge_pte_range, + .walk_lock = PGWALK_RDLOCK, };
static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm) @@ -6351,6 +6352,7 @@ put: /* get_mctgt_type() gets & locks
static const struct mm_walk_ops charge_walk_ops = { .pmd_entry = mem_cgroup_move_charge_pte_range, + .walk_lock = PGWALK_RDLOCK, };
static void mem_cgroup_move_charge(void) --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -836,6 +836,7 @@ static int hwpoison_hugetlb_range(pte_t static const struct mm_walk_ops hwp_walk_ops = { .pmd_entry = hwpoison_pte_range, .hugetlb_entry = hwpoison_hugetlb_range, + .walk_lock = PGWALK_RDLOCK, };
/* --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -715,6 +715,14 @@ static const struct mm_walk_ops queue_pa .hugetlb_entry = queue_folios_hugetlb, .pmd_entry = queue_folios_pte_range, .test_walk = queue_pages_test_walk, + .walk_lock = PGWALK_RDLOCK, +}; + +static const struct mm_walk_ops queue_pages_lock_vma_walk_ops = { + .hugetlb_entry = queue_folios_hugetlb, + .pmd_entry = queue_folios_pte_range, + .test_walk = queue_pages_test_walk, + .walk_lock = PGWALK_WRLOCK, };
/* @@ -735,7 +743,7 @@ static const struct mm_walk_ops queue_pa static int queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end, nodemask_t *nodes, unsigned long flags, - struct list_head *pagelist) + struct list_head *pagelist, bool lock_vma) { int err; struct queue_pages qp = { @@ -746,8 +754,10 @@ queue_pages_range(struct mm_struct *mm, .end = end, .first = NULL, }; + const struct mm_walk_ops *ops = lock_vma ? + &queue_pages_lock_vma_walk_ops : &queue_pages_walk_ops;
- err = walk_page_range(mm, start, end, &queue_pages_walk_ops, &qp); + err = walk_page_range(mm, start, end, ops, &qp);
if (!qp.first) /* whole range in hole */ @@ -1075,7 +1085,7 @@ static int migrate_to_node(struct mm_str vma = find_vma(mm, 0); VM_BUG_ON(!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))); queue_pages_range(mm, vma->vm_start, mm->task_size, &nmask, - flags | MPOL_MF_DISCONTIG_OK, &pagelist); + flags | MPOL_MF_DISCONTIG_OK, &pagelist, false);
if (!list_empty(&pagelist)) { err = migrate_pages(&pagelist, alloc_migration_target, NULL, @@ -1321,12 +1331,8 @@ static long do_mbind(unsigned long start * Lock the VMAs before scanning for pages to migrate, to ensure we don't * miss a concurrently inserted page. */ - vma_iter_init(&vmi, mm, start); - for_each_vma_range(vmi, vma, end) - vma_start_write(vma); - ret = queue_pages_range(mm, start, end, nmask, - flags | MPOL_MF_INVERT, &pagelist); + flags | MPOL_MF_INVERT, &pagelist, true);
if (ret < 0) { err = ret; --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -286,6 +286,7 @@ next: static const struct mm_walk_ops migrate_vma_walk_ops = { .pmd_entry = migrate_vma_collect_pmd, .pte_hole = migrate_vma_collect_hole, + .walk_lock = PGWALK_RDLOCK, };
/* --- a/mm/mincore.c +++ b/mm/mincore.c @@ -177,6 +177,7 @@ static const struct mm_walk_ops mincore_ .pmd_entry = mincore_pte_range, .pte_hole = mincore_unmapped_range, .hugetlb_entry = mincore_hugetlb, + .walk_lock = PGWALK_RDLOCK, };
/* --- a/mm/mlock.c +++ b/mm/mlock.c @@ -365,6 +365,7 @@ static void mlock_vma_pages_range(struct { static const struct mm_walk_ops mlock_walk_ops = { .pmd_entry = mlock_pte_range, + .walk_lock = PGWALK_WRLOCK_VERIFY, };
/* --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -611,6 +611,7 @@ static const struct mm_walk_ops prot_non .pte_entry = prot_none_pte_entry, .hugetlb_entry = prot_none_hugetlb_entry, .test_walk = prot_none_test, + .walk_lock = PGWALK_WRLOCK, };
int --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -384,6 +384,33 @@ static int __walk_page_range(unsigned lo return err; }
+static inline void process_mm_walk_lock(struct mm_struct *mm, + enum page_walk_lock walk_lock) +{ + if (walk_lock == PGWALK_RDLOCK) + mmap_assert_locked(mm); + else + mmap_assert_write_locked(mm); +} + +static inline void process_vma_walk_lock(struct vm_area_struct *vma, + enum page_walk_lock walk_lock) +{ +#ifdef CONFIG_PER_VMA_LOCK + switch (walk_lock) { + case PGWALK_WRLOCK: + vma_start_write(vma); + break; + case PGWALK_WRLOCK_VERIFY: + vma_assert_write_locked(vma); + break; + case PGWALK_RDLOCK: + /* PGWALK_RDLOCK is handled by process_mm_walk_lock */ + break; + } +#endif +} + /** * walk_page_range - walk page table with caller specific callbacks * @mm: mm_struct representing the target process of page table walk @@ -443,7 +470,7 @@ int walk_page_range(struct mm_struct *mm if (!walk.mm) return -EINVAL;
- mmap_assert_locked(walk.mm); + process_mm_walk_lock(walk.mm, ops->walk_lock);
vma = find_vma(walk.mm, start); do { @@ -458,6 +485,7 @@ int walk_page_range(struct mm_struct *mm if (ops->pte_hole) err = ops->pte_hole(start, next, -1, &walk); } else { /* inside vma */ + process_vma_walk_lock(vma, ops->walk_lock); walk.vma = vma; next = min(end, vma->vm_end); vma = find_vma(mm, vma->vm_end); @@ -533,7 +561,8 @@ int walk_page_range_vma(struct vm_area_s if (start < vma->vm_start || end > vma->vm_end) return -EINVAL;
- mmap_assert_locked(walk.mm); + process_mm_walk_lock(walk.mm, ops->walk_lock); + process_vma_walk_lock(vma, ops->walk_lock); return __walk_page_range(start, end, &walk); }
@@ -550,7 +579,8 @@ int walk_page_vma(struct vm_area_struct if (!walk.mm) return -EINVAL;
- mmap_assert_locked(walk.mm); + process_mm_walk_lock(walk.mm, ops->walk_lock); + process_vma_walk_lock(vma, ops->walk_lock); return __walk_page_range(vma->vm_start, vma->vm_end, &walk); }
--- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4249,6 +4249,7 @@ static void walk_mm(struct lruvec *lruve static const struct mm_walk_ops mm_walk_ops = { .test_walk = should_skip_vma, .p4d_entry = walk_pud_range, + .walk_lock = PGWALK_RDLOCK, };
int err;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Hildenbrand david@redhat.com
commit d74943a2f3cdade34e471b36f55f7979be656867 upstream.
Unfortunately commit 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()") missed that follow_page() and follow_trans_huge_pmd() never implicitly set FOLL_NUMA because they really don't want to fail on PROT_NONE-mapped pages -- either due to NUMA hinting or due to inaccessible (PROT_NONE) VMAs.
As spelled out in commit 0b9d705297b2 ("mm: numa: Support NUMA hinting page faults from gup/gup_fast"): "Other follow_page callers like KSM should not use FOLL_NUMA, or they would fail to get the pages if they use follow_page instead of get_user_pages."
liubo reported [1] that smaps_rollup results are imprecise, because they miss accounting of pages that are mapped PROT_NONE. Further, it's easy to reproduce that KSM no longer works on inaccessible VMAs on x86-64, because pte_protnone()/pmd_protnone() also indicate "true" in inaccessible VMAs, and follow_page() refuses to return such pages right now.
As KVM really depends on these NUMA hinting faults, removing the pte_protnone()/pmd_protnone() handling in GUP code completely is not really an option.
To fix the issues at hand, let's revive FOLL_NUMA as FOLL_HONOR_NUMA_FAULT to restore the original behavior for now and add better comments.
Set FOLL_HONOR_NUMA_FAULT independent of FOLL_FORCE in is_valid_gup_args(), to add that flag for all external GUP users.
Note that there are three GUP-internal __get_user_pages() users that don't end up calling is_valid_gup_args() and consequently won't get FOLL_HONOR_NUMA_FAULT set.
1) get_dump_page(): we really don't want to handle NUMA hinting faults. It specifies FOLL_FORCE and wouldn't have honored NUMA hinting faults already.
2) populate_vma_page_range(): we really don't want to handle NUMA hinting faults. It specifies FOLL_FORCE on accessible VMAs, so it wouldn't have honored NUMA hinting faults already.
3) faultin_vma_page_range(): we similarly don't want to handle NUMA hinting faults.
To make the combination of FOLL_FORCE and FOLL_HONOR_NUMA_FAULT work in inaccessible VMAs properly, we have to perform VMA accessibility checks in gup_can_follow_protnone().
As GUP-fast should reject such pages either way in pte_access_permitted()/pmd_access_permitted() -- for example on x86-64 and arm64 that both implement pte_protnone() -- let's just always fall back to ordinary GUP when stumbling over pte_protnone()/pmd_protnone().
As Linus notes [2], honoring NUMA faults might only make sense for selected GUP users.
So we should really see if we can instead let relevant GUP callers specify it manually, and not trigger NUMA hinting faults from GUP by default. Prepare for that by making FOLL_HONOR_NUMA_FAULT an external GUP flag and adding appropriate documentation.
While at it, remove a stale comment from follow_trans_huge_pmd(): That comment for pmd_protnone() was added in commit 2b4847e73004 ("mm: numa: serialise parallel get_user_page against THP migration"), which noted:
THP does not unmap pages due to a lack of support for migration entries at a PMD level. This allows races with get_user_pages
Nowadays, we do have PMD migration entries, so the comment no longer applies. Let's drop it.
[1] https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com [2] https://lore.kernel.org/r/CAHk-=wgRiP_9X0rRdZKT8nhemZGNateMtb366t37d8-x7VRs=...
Link: https://lkml.kernel.org/r/20230803143208.383663-2-david@redhat.com Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()") Signed-off-by: David Hildenbrand david@redhat.com Reported-by: liubo liubo254@huawei.com Closes: https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com Reported-by: Peter Xu peterx@redhat.com Closes: https://lore.kernel.org/all/ZMKJjDaqZ7FW0jfe@x1n/ Acked-by: Mel Gorman mgorman@techsingularity.net Acked-by: Peter Xu peterx@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: John Hubbard jhubbard@nvidia.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Mel Gorman mgorman@suse.de Cc: Paolo Bonzini pbonzini@redhat.com Cc: Shuah Khan shuah@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/mm.h | 21 +++++++++++++++------ include/linux/mm_types.h | 9 +++++++++ mm/gup.c | 30 ++++++++++++++++++++++++------ mm/huge_memory.c | 3 +-- 4 files changed, 49 insertions(+), 14 deletions(-)
--- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3381,15 +3381,24 @@ static inline int vm_fault_to_errno(vm_f * Indicates whether GUP can follow a PROT_NONE mapped page, or whether * a (NUMA hinting) fault is required. */ -static inline bool gup_can_follow_protnone(unsigned int flags) +static inline bool gup_can_follow_protnone(struct vm_area_struct *vma, + unsigned int flags) { /* - * FOLL_FORCE has to be able to make progress even if the VMA is - * inaccessible. Further, FOLL_FORCE access usually does not represent - * application behaviour and we should avoid triggering NUMA hinting - * faults. + * If callers don't want to honor NUMA hinting faults, no need to + * determine if we would actually have to trigger a NUMA hinting fault. */ - return flags & FOLL_FORCE; + if (!(flags & FOLL_HONOR_NUMA_FAULT)) + return true; + + /* + * NUMA hinting faults don't apply in inaccessible (PROT_NONE) VMAs. + * + * Requiring a fault here even for inaccessible VMAs would mean that + * FOLL_FORCE cannot make any progress, because handle_mm_fault() + * refuses to process NUMA hinting faults in inaccessible VMAs. + */ + return !vma_is_accessible(vma); }
typedef int (*pte_fn_t)(pte_t *pte, unsigned long addr, void *data); --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1286,6 +1286,15 @@ enum { FOLL_PCI_P2PDMA = 1 << 10, /* allow interrupts from generic signals */ FOLL_INTERRUPTIBLE = 1 << 11, + /* + * Always honor (trigger) NUMA hinting faults. + * + * FOLL_WRITE implicitly honors NUMA hinting faults because a + * PROT_NONE-mapped page is not writable (exceptions with FOLL_FORCE + * apply). get_user_pages_fast_only() always implicitly honors NUMA + * hinting faults. + */ + FOLL_HONOR_NUMA_FAULT = 1 << 12,
/* See also internal only FOLL flags in mm/internal.h */ }; --- a/mm/gup.c +++ b/mm/gup.c @@ -551,7 +551,7 @@ static struct page *follow_page_pte(stru pte = *ptep; if (!pte_present(pte)) goto no_page; - if (pte_protnone(pte) && !gup_can_follow_protnone(flags)) + if (pte_protnone(pte) && !gup_can_follow_protnone(vma, flags)) goto no_page;
page = vm_normal_page(vma, address, pte); @@ -672,7 +672,7 @@ static struct page *follow_pmd_mask(stru if (likely(!pmd_trans_huge(pmdval))) return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
- if (pmd_protnone(pmdval) && !gup_can_follow_protnone(flags)) + if (pmd_protnone(pmdval) && !gup_can_follow_protnone(vma, flags)) return no_page_table(vma, flags);
ptl = pmd_lock(mm, pmd); @@ -820,6 +820,10 @@ struct page *follow_page(struct vm_area_ if (WARN_ON_ONCE(foll_flags & FOLL_PIN)) return NULL;
+ /* + * We never set FOLL_HONOR_NUMA_FAULT because callers don't expect + * to fail on PROT_NONE-mapped pages. + */ page = follow_page_mask(vma, address, foll_flags, &ctx); if (ctx.pgmap) put_dev_pagemap(ctx.pgmap); @@ -2134,6 +2138,13 @@ static bool is_valid_gup_args(struct pag gup_flags |= FOLL_UNLOCKABLE; }
+ /* + * For now, always trigger NUMA hinting faults. Some GUP users like + * KVM require the hint to be as the calling context of GUP is + * functionally similar to a memory reference from task context. + */ + gup_flags |= FOLL_HONOR_NUMA_FAULT; + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ if (WARN_ON_ONCE((gup_flags & (FOLL_PIN | FOLL_GET)) == (FOLL_PIN | FOLL_GET))) @@ -2394,7 +2405,14 @@ static int gup_pte_range(pmd_t pmd, pmd_ struct page *page; struct folio *folio;
- if (pte_protnone(pte) && !gup_can_follow_protnone(flags)) + /* + * Always fallback to ordinary GUP on PROT_NONE-mapped pages: + * pte_access_permitted() better should reject these pages + * either way: otherwise, GUP-fast might succeed in + * cases where ordinary GUP would fail due to VMA access + * permissions. + */ + if (pte_protnone(pte)) goto pte_unmap;
if (!pte_access_permitted(pte, flags & FOLL_WRITE)) @@ -2784,8 +2802,8 @@ static int gup_pmd_range(pud_t *pudp, pu
if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd) || pmd_devmap(pmd))) { - if (pmd_protnone(pmd) && - !gup_can_follow_protnone(flags)) + /* See gup_pte_range() */ + if (pmd_protnone(pmd)) return 0;
if (!gup_huge_pmd(pmd, pmdp, addr, next, flags, @@ -2965,7 +2983,7 @@ static int internal_get_user_pages_fast( if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM | FOLL_FORCE | FOLL_PIN | FOLL_GET | FOLL_FAST_ONLY | FOLL_NOFAULT | - FOLL_PCI_P2PDMA))) + FOLL_PCI_P2PDMA | FOLL_HONOR_NUMA_FAULT))) return -EINVAL;
if (gup_flags & FOLL_PIN) --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1467,8 +1467,7 @@ struct page *follow_trans_huge_pmd(struc if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd)) return ERR_PTR(-EFAULT);
- /* Full NUMA hinting faults to serialise migration in fault paths */ - if (pmd_protnone(*pmd) && !gup_can_follow_protnone(flags)) + if (pmd_protnone(*pmd) && !gup_can_follow_protnone(vma, flags)) return NULL;
if (!pmd_write(*pmd) && gup_must_unshare(vma, flags, page))
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Hildenbrand david@redhat.com
commit 5805192c7b7257d290474cb1a3897d0567281bbc upstream.
In contrast to most other GUP code, GUP-fast common page table walking code like gup_pte_range() also handles hugetlb pages. But in contrast to other hugetlb page table walking code, it does not look at the hugetlb PTE abstraction whereby we have only a single logical hugetlb PTE per hugetlb page, even when using multiple cont-PTEs underneath -- which is for example what huge_ptep_get() abstracts.
So when we have a hugetlb page that is mapped via cont-PTEs, GUP-fast might stumble over a PTE that does not map the head page of a hugetlb page -- not the first "head" PTE of such a cont mapping.
Logically, the whole hugetlb page is mapped (entire_mapcount == 1), but we might end up calling gup_must_unshare() with a tail page of a hugetlb page.
We only maintain a single PageAnonExclusive flag per hugetlb page (as hugetlb pages cannot get partially COW-shared), stored for the head page. That flag is clear for all tail pages.
So when gup_must_unshare() ends up calling PageAnonExclusive() with a tail page of a hugetlb page:
1) With CONFIG_DEBUG_VM_PGFLAGS
Stumbles over the:
VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page);
For example, when executing the COW selftests with 64k hugetlb pages on arm64:
[ 61.082187] page:00000000829819ff refcount:3 mapcount:1 mapping:0000000000000000 index:0x1 pfn:0x11ee11 [ 61.082842] head:0000000080f79bf7 order:4 entire_mapcount:1 nr_pages_mapped:0 pincount:2 [ 61.083384] anon flags: 0x17ffff80003000e(referenced|uptodate|dirty|head|mappedtodisk|node=0|zone=2|lastcpupid=0xfffff) [ 61.084101] page_type: 0xffffffff() [ 61.084332] raw: 017ffff800000000 fffffc00037b8401 0000000000000402 0000000200000000 [ 61.084840] raw: 0000000000000010 0000000000000000 00000000ffffffff 0000000000000000 [ 61.085359] head: 017ffff80003000e ffffd9e95b09b788 ffffd9e95b09b788 ffff0007ff63cf71 [ 61.085885] head: 0000000000000000 0000000000000002 00000003ffffffff 0000000000000000 [ 61.086415] page dumped because: VM_BUG_ON_PAGE(PageHuge(page) && !PageHead(page)) [ 61.086914] ------------[ cut here ]------------ [ 61.087220] kernel BUG at include/linux/page-flags.h:990! [ 61.087591] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP [ 61.087999] Modules linked in: ... [ 61.089404] CPU: 0 PID: 4612 Comm: cow Kdump: loaded Not tainted 6.5.0-rc4+ #3 [ 61.089917] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 [ 61.090409] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 61.090897] pc : gup_must_unshare.part.0+0x64/0x98 [ 61.091242] lr : gup_must_unshare.part.0+0x64/0x98 [ 61.091592] sp : ffff8000825eb940 [ 61.091826] x29: ffff8000825eb940 x28: 0000000000000000 x27: fffffc00037b8440 [ 61.092329] x26: 0400000000000001 x25: 0000000000080101 x24: 0000000000080000 [ 61.092835] x23: 0000000000080100 x22: ffff0000cffb9588 x21: ffff0000c8ec6b58 [ 61.093341] x20: 0000ffffad6b1000 x19: fffffc00037b8440 x18: ffffffffffffffff [ 61.093850] x17: 2864616548656761 x16: 5021202626202965 x15: 6761702865677548 [ 61.094358] x14: 6567615028454741 x13: 2929656761702864 x12: 6165486567615021 [ 61.094858] x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffffd9e958b7a1c0 [ 61.095359] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 00000000002bffa8 [ 61.095873] x5 : ffff0008bb19e708 x4 : 0000000000000000 x3 : 0000000000000000 [ 61.096380] x2 : 0000000000000000 x1 : ffff0000cf6636c0 x0 : 0000000000000046 [ 61.096894] Call trace: [ 61.097080] gup_must_unshare.part.0+0x64/0x98 [ 61.097392] gup_pte_range+0x3a8/0x3f0 [ 61.097662] gup_pgd_range+0x1ec/0x280 [ 61.097942] lockless_pages_from_mm+0x64/0x1a0 [ 61.098258] internal_get_user_pages_fast+0xe4/0x1d0 [ 61.098612] pin_user_pages_fast+0x58/0x78 [ 61.098917] pin_longterm_test_start+0xf4/0x2b8 [ 61.099243] gup_test_ioctl+0x170/0x3b0 [ 61.099528] __arm64_sys_ioctl+0xa8/0xf0 [ 61.099822] invoke_syscall.constprop.0+0x7c/0xd0 [ 61.100160] el0_svc_common.constprop.0+0xe8/0x100 [ 61.100500] do_el0_svc+0x38/0xa0 [ 61.100736] el0_svc+0x3c/0x198 [ 61.100971] el0t_64_sync_handler+0x134/0x150 [ 61.101280] el0t_64_sync+0x17c/0x180 [ 61.101543] Code: aa1303e0 f00074c1 912b0021 97fffeb2 (d4210000)
2) Without CONFIG_DEBUG_VM_PGFLAGS
Always detects "not exclusive" for passed tail pages and refuses to pin the tail pages R/O, as gup_must_unshare() == true. GUP-fast will fall back to ordinary GUP. As ordinary GUP properly considers the logical hugetlb PTE abstraction in hugetlb_follow_page_mask(), pinning the page will succeed, because it looks at the PageAnonExclusive flag of the head page only.
So the only real effect of this is that with cont-PTE hugetlb pages, we'll always fall back from GUP-fast to ordinary GUP when not working on the head page, which ends up checking the head page and doing the right thing.
Consequently, the cow selftests pass with cont-PTE hugetlb pages as well without CONFIG_DEBUG_VM_PGFLAGS.
Note that this only applies to anon hugetlb pages that are mapped using cont-PTEs: for example 64k hugetlb pages on a 4k arm64 kernel.
... and only when R/O-pinning (FOLL_PIN) such pages that are mapped into the page table R/O using GUP-fast.
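For scale (a back-of-the-envelope calculation, not from the original commit message): with a 4k base page size, a 64k hugetlb page is covered by 64k / 4k = 16 contiguous PTEs, and only the first of them maps the head page, so GUP-fast can land on any of the other 15 tail-page PTEs.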
On production kernels (and even most debug kernels, that don't set CONFIG_DEBUG_VM_PGFLAGS) this patch should theoretically not be required to be backported. But of course, it does not hurt.
Link: https://lkml.kernel.org/r/20230805101256.87306-1-david@redhat.com Fixes: a7f226604170 ("mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared anonymous page") Signed-off-by: David Hildenbrand david@redhat.com Reported-by: Ryan Roberts ryan.roberts@arm.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Tested-by: Ryan Roberts ryan.roberts@arm.com Cc: Vlastimil Babka vbabka@suse.cz Cc: John Hubbard jhubbard@nvidia.com Cc: Jason Gunthorpe jgg@nvidia.com Cc: Peter Xu peterx@redhat.com Cc: Mike Kravetz mike.kravetz@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/internal.h | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/mm/internal.h +++ b/mm/internal.h @@ -995,6 +995,16 @@ static inline bool gup_must_unshare(stru smp_rmb();
/* + * During GUP-fast we might not get called on the head page for a + * hugetlb page that is mapped using cont-PTE, because GUP-fast does + * not work with the abstracted hugetlb PTEs that always point at the + * head page. For hugetlb, PageAnonExclusive only applies on the head + * page (as it cannot be partially COW-shared), so lookup the head page. + */ + if (unlikely(!PageHead(page) && PageHuge(page))) + page = compound_head(page); + + /* * Note that PageKsm() pages cannot be exclusive, and consequently, * cannot get pinned. */
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zack Rusin zackr@vmware.com
commit 14abdfae508228a7307f7491b5c4215ae70c6542 upstream.
For multiple commands the driver was not correctly validating the shader stages, resulting in possible kernel oopses. The validation code was only, if ever, checking the upper bound on the shader stages, but never a lower bound (valid shader stages start at 1, not 0).
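To illustrate the hole being closed, a minimal sketch based on the hunks below (the enum values themselves are only assumed, e.g. that SVGA3D_SHADERTYPE_MIN is the lowest valid stage):

  /* old check: upper bound only, so an invalid type of 0 slips through */
  if (cmd->body.type >= SVGA3D_SHADERTYPE_PREDX_MAX)
          return -EINVAL;

  /* new helper also enforces the lower bound */
  if (!vmw_shadertype_is_valid(VMW_SM_LEGACY, cmd->body.type))
          return -EINVAL;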
Fixes kernel oopses ending up in vmw_binding_add, e.g.: Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 2443 Comm: testcase Not tainted 6.3.0-rc4-vmwgfx #1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:vmw_binding_add+0x4c/0x140 [vmwgfx] Code: 7e 30 49 83 ff 0e 0f 87 ea 00 00 00 4b 8d 04 7f 89 d2 89 cb 48 c1 e0 03 4c 8b b0 40 3d 93 c0 48 8b 80 48 3d 93 c0 49 0f af de <48> 03 1c d0 4c 01 e3 49 8> RSP: 0018:ffffb8014416b968 EFLAGS: 00010206 RAX: ffffffffc0933ec0 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 00000000ffffffff RSI: ffffb8014416b9c0 RDI: ffffb8014316f000 RBP: ffffb8014416b998 R08: 0000000000000003 R09: 746f6c735f726564 R10: ffffffffaaf2bda0 R11: 732e676e69646e69 R12: ffffb8014316f000 R13: ffffb8014416b9c0 R14: 0000000000000040 R15: 0000000000000006 FS: 00007fba8c0af740(0000) GS:ffff8a1277c80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000007c0933eb8 CR3: 0000000118244001 CR4: 00000000003706e0 Call Trace: <TASK> vmw_view_bindings_add+0xf5/0x1b0 [vmwgfx] ? ___drm_dbg+0x8a/0xb0 [drm] vmw_cmd_dx_set_shader_res+0x8f/0xc0 [vmwgfx] vmw_execbuf_process+0x590/0x1360 [vmwgfx] vmw_execbuf_ioctl+0x173/0x370 [vmwgfx] ? __drm_dev_dbg+0xb4/0xe0 [drm] ? __pfx_vmw_execbuf_ioctl+0x10/0x10 [vmwgfx] drm_ioctl_kernel+0xbc/0x160 [drm] drm_ioctl+0x2d2/0x580 [drm] ? __pfx_vmw_execbuf_ioctl+0x10/0x10 [vmwgfx] ? do_fault+0x1a6/0x420 vmw_generic_ioctl+0xbd/0x180 [vmwgfx] vmw_unlocked_ioctl+0x19/0x20 [vmwgfx] __x64_sys_ioctl+0x96/0xd0 do_syscall_64+0x5d/0x90 ? handle_mm_fault+0xe4/0x2f0 ? debug_smp_processor_id+0x1b/0x30 ? fpregs_assert_state_consistent+0x2e/0x50 ? exit_to_user_mode_prepare+0x40/0x180 ? irqentry_exit_to_user_mode+0xd/0x20 ? irqentry_exit+0x3f/0x50 ? exc_page_fault+0x8b/0x180 entry_SYSCALL_64_after_hwframe+0x72/0xdc
Signed-off-by: Zack Rusin zackr@vmware.com Cc: security@openanolis.org Reported-by: Ziming Zhang ezrakiez@gmail.com Testcase-found-by: Niels De Graef ndegraef@redhat.com Fixes: d80efd5cb3de ("drm/vmwgfx: Initial DX support") Cc: stable@vger.kernel.org # v4.3+ Reviewed-by: Maaz Mombasawalamombasawalam@vmware.com Reviewed-by: Martin Krastev krastevm@vmware.com Link: https://patchwork.freedesktop.org/patch/msgid/20230616190934.54828-1-zack@kd... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 12 ++++++++++++ drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 29 +++++++++++------------------ 2 files changed, 23 insertions(+), 18 deletions(-)
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h @@ -1513,4 +1513,16 @@ static inline bool vmw_has_fences(struct return (vmw_fifo_caps(vmw) & SVGA_FIFO_CAP_FENCE) != 0; }
+static inline bool vmw_shadertype_is_valid(enum vmw_sm_type shader_model, + u32 shader_type) +{ + SVGA3dShaderType max_allowed = SVGA3D_SHADERTYPE_PREDX_MAX; + + if (shader_model >= VMW_SM_5) + max_allowed = SVGA3D_SHADERTYPE_MAX; + else if (shader_model >= VMW_SM_4) + max_allowed = SVGA3D_SHADERTYPE_DX10_MAX; + return shader_type >= SVGA3D_SHADERTYPE_MIN && shader_type < max_allowed; +} + #endif --- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c @@ -1992,7 +1992,7 @@ static int vmw_cmd_set_shader(struct vmw
cmd = container_of(header, typeof(*cmd), header);
- if (cmd->body.type >= SVGA3D_SHADERTYPE_PREDX_MAX) { + if (!vmw_shadertype_is_valid(VMW_SM_LEGACY, cmd->body.type)) { VMW_DEBUG_USER("Illegal shader type %u.\n", (unsigned int) cmd->body.type); return -EINVAL; @@ -2115,8 +2115,6 @@ vmw_cmd_dx_set_single_constant_buffer(st SVGA3dCmdHeader *header) { VMW_DECLARE_CMD_VAR(*cmd, SVGA3dCmdDXSetSingleConstantBuffer); - SVGA3dShaderType max_shader_num = has_sm5_context(dev_priv) ? - SVGA3D_NUM_SHADERTYPE : SVGA3D_NUM_SHADERTYPE_DX10;
struct vmw_resource *res = NULL; struct vmw_ctx_validation_info *ctx_node = VMW_GET_CTX_NODE(sw_context); @@ -2133,6 +2131,14 @@ vmw_cmd_dx_set_single_constant_buffer(st if (unlikely(ret != 0)) return ret;
+ if (!vmw_shadertype_is_valid(dev_priv->sm_type, cmd->body.type) || + cmd->body.slot >= SVGA3D_DX_MAX_CONSTBUFFERS) { + VMW_DEBUG_USER("Illegal const buffer shader %u slot %u.\n", + (unsigned int) cmd->body.type, + (unsigned int) cmd->body.slot); + return -EINVAL; + } + binding.bi.ctx = ctx_node->ctx; binding.bi.res = res; binding.bi.bt = vmw_ctx_binding_cb; @@ -2141,14 +2147,6 @@ vmw_cmd_dx_set_single_constant_buffer(st binding.size = cmd->body.sizeInBytes; binding.slot = cmd->body.slot;
- if (binding.shader_slot >= max_shader_num || - binding.slot >= SVGA3D_DX_MAX_CONSTBUFFERS) { - VMW_DEBUG_USER("Illegal const buffer shader %u slot %u.\n", - (unsigned int) cmd->body.type, - (unsigned int) binding.slot); - return -EINVAL; - } - vmw_binding_add(ctx_node->staged, &binding.bi, binding.shader_slot, binding.slot);
@@ -2207,15 +2205,13 @@ static int vmw_cmd_dx_set_shader_res(str { VMW_DECLARE_CMD_VAR(*cmd, SVGA3dCmdDXSetShaderResources) = container_of(header, typeof(*cmd), header); - SVGA3dShaderType max_allowed = has_sm5_context(dev_priv) ? - SVGA3D_SHADERTYPE_MAX : SVGA3D_SHADERTYPE_DX10_MAX;
u32 num_sr_view = (cmd->header.size - sizeof(cmd->body)) / sizeof(SVGA3dShaderResourceViewId);
if ((u64) cmd->body.startView + (u64) num_sr_view > (u64) SVGA3D_DX_MAX_SRVIEWS || - cmd->body.type >= max_allowed) { + !vmw_shadertype_is_valid(dev_priv->sm_type, cmd->body.type)) { VMW_DEBUG_USER("Invalid shader binding.\n"); return -EINVAL; } @@ -2239,8 +2235,6 @@ static int vmw_cmd_dx_set_shader(struct SVGA3dCmdHeader *header) { VMW_DECLARE_CMD_VAR(*cmd, SVGA3dCmdDXSetShader); - SVGA3dShaderType max_allowed = has_sm5_context(dev_priv) ? - SVGA3D_SHADERTYPE_MAX : SVGA3D_SHADERTYPE_DX10_MAX; struct vmw_resource *res = NULL; struct vmw_ctx_validation_info *ctx_node = VMW_GET_CTX_NODE(sw_context); struct vmw_ctx_bindinfo_shader binding; @@ -2251,8 +2245,7 @@ static int vmw_cmd_dx_set_shader(struct
cmd = container_of(header, typeof(*cmd), header);
- if (cmd->body.type >= max_allowed || - cmd->body.type < SVGA3D_SHADERTYPE_MIN) { + if (!vmw_shadertype_is_valid(dev_priv->sm_type, cmd->body.type)) { VMW_DEBUG_USER("Illegal shader type %u.\n", (unsigned int) cmd->body.type); return -EINVAL;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zack Rusin zackr@vmware.com
commit f9e96bf1905479f18e83a3a4c314a8dfa56ede2c upstream.
vmw_bo_unreference sets the input buffer pointer to NULL on exit, resulting in NULL pointer dereferences on the subsequent drm_gem_object_put() calls.
This went unnoticed because only very old userspace would be exercising those paths, but it wouldn't be hard to hit on old distros with brand new kernels.
Introduce a new function that abstracts unrefing of user bo's to make the code cleaner and more explicit.
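The problematic pattern, roughly (a sketch of the code paths removed in the hunks below):

  vmw_bo_unreference(&vbo);            /* drops the reference and sets vbo = NULL */
  drm_gem_object_put(&vbo->tbo.base);  /* vbo is NULL here -> NULL pointer dereference */

The new vmw_user_bo_unref() takes the buffer object by value and drops both references in one place.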
Signed-off-by: Zack Rusin zackr@vmware.com Reported-by: Ian Forbes iforbes@vmware.com Fixes: 9ef8d83e8e25 ("drm/vmwgfx: Do not drop the reference to the handle too soon") Cc: stable@vger.kernel.org # v6.4+ Reviewed-by: Maaz Mombasawalamombasawalam@vmware.com Link: https://patchwork.freedesktop.org/patch/msgid/20230818041301.407636-1-zack@k... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 6 ++---- drivers/gpu/drm/vmwgfx/vmwgfx_bo.h | 8 ++++++++ drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 6 ++---- drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 6 ++---- drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c | 3 +-- drivers/gpu/drm/vmwgfx/vmwgfx_shader.c | 3 +-- 6 files changed, 16 insertions(+), 16 deletions(-)
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c @@ -497,10 +497,9 @@ static int vmw_user_bo_synccpu_release(s if (!(flags & drm_vmw_synccpu_allow_cs)) { atomic_dec(&vmw_bo->cpu_writers); } - ttm_bo_put(&vmw_bo->tbo); + vmw_user_bo_unref(vmw_bo); }
- drm_gem_object_put(&vmw_bo->tbo.base); return ret; }
@@ -540,8 +539,7 @@ int vmw_user_bo_synccpu_ioctl(struct drm return ret;
ret = vmw_user_bo_synccpu_grab(vbo, arg->flags); - vmw_bo_unreference(&vbo); - drm_gem_object_put(&vbo->tbo.base); + vmw_user_bo_unref(vbo); if (unlikely(ret != 0)) { if (ret == -ERESTARTSYS || ret == -EBUSY) return -EBUSY; --- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h @@ -195,6 +195,14 @@ static inline struct vmw_bo *vmw_bo_refe return buf; }
+static inline void vmw_user_bo_unref(struct vmw_bo *vbo) +{ + if (vbo) { + ttm_bo_put(&vbo->tbo); + drm_gem_object_put(&vbo->tbo.base); + } +} + static inline struct vmw_bo *to_vmw_bo(struct drm_gem_object *gobj) { return container_of((gobj), struct vmw_bo, tbo.base); --- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c @@ -1164,8 +1164,7 @@ static int vmw_translate_mob_ptr(struct } vmw_bo_placement_set(vmw_bo, VMW_BO_DOMAIN_MOB, VMW_BO_DOMAIN_MOB); ret = vmw_validation_add_bo(sw_context->ctx, vmw_bo); - ttm_bo_put(&vmw_bo->tbo); - drm_gem_object_put(&vmw_bo->tbo.base); + vmw_user_bo_unref(vmw_bo); if (unlikely(ret != 0)) return ret;
@@ -1221,8 +1220,7 @@ static int vmw_translate_guest_ptr(struc vmw_bo_placement_set(vmw_bo, VMW_BO_DOMAIN_GMR | VMW_BO_DOMAIN_VRAM, VMW_BO_DOMAIN_GMR | VMW_BO_DOMAIN_VRAM); ret = vmw_validation_add_bo(sw_context->ctx, vmw_bo); - ttm_bo_put(&vmw_bo->tbo); - drm_gem_object_put(&vmw_bo->tbo.base); + vmw_user_bo_unref(vmw_bo); if (unlikely(ret != 0)) return ret;
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c @@ -1665,10 +1665,8 @@ static struct drm_framebuffer *vmw_kms_f
err_out: /* vmw_user_lookup_handle takes one ref so does new_fb */ - if (bo) { - vmw_bo_unreference(&bo); - drm_gem_object_put(&bo->tbo.base); - } + if (bo) + vmw_user_bo_unref(bo); if (surface) vmw_surface_unreference(&surface);
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c @@ -451,8 +451,7 @@ int vmw_overlay_ioctl(struct drm_device
ret = vmw_overlay_update_stream(dev_priv, buf, arg, true);
- vmw_bo_unreference(&buf); - drm_gem_object_put(&buf->tbo.base); + vmw_user_bo_unref(buf);
out_unlock: mutex_unlock(&overlay->mutex); --- a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c @@ -809,8 +809,7 @@ static int vmw_shader_define(struct drm_ shader_type, num_input_sig, num_output_sig, tfile, shader_handle); out_bad_arg: - vmw_bo_unreference(&buffer); - drm_gem_object_put(&buffer->tbo.base); + vmw_user_bo_unref(buffer); return ret; }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Imre Deak imre.deak@intel.com
commit a94e7ccfc400c024976f3c2f31689ed843498b7c upstream.
Add a helper to reschedule drm_mode_config::output_poll_work after polling has been enabled for a connector (and hence a reschedule is needed, since previously polling was disabled for all connectors and output_poll_work was not running).
This is needed by the next patch fixing HPD polling on i915.
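Usage, per the kernel-doc added in the patch below (a sketch only; the connector variable is illustrative):

  /* enable polling for one connector, then kick the poll work */
  connector->polled = DRM_CONNECTOR_POLL_CONNECT | DRM_CONNECTOR_POLL_DISCONNECT;
  drm_kms_helper_poll_reschedule(dev);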
CC: stable@vger.kernel.org # 6.4+ Cc: Dmitry Baryshkov dmitry.baryshkov@linaro.org Cc: dri-devel@lists.freedesktop.org Reviewed-by: Jouni Högander jouni.hogander@intel.com Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Signed-off-by: Imre Deak imre.deak@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230822113015.41224-1-imre.de... (cherry picked from commit fe2352fd64029918174de4b460dfe6df0c6911cd) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/drm_probe_helper.c | 68 ++++++++++++++++++++---------- include/drm/drm_probe_helper.h | 1 + 2 files changed, 47 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/drm_probe_helper.c b/drivers/gpu/drm/drm_probe_helper.c index 2fb9bf901a2c..3f479483d7d8 100644 --- a/drivers/gpu/drm/drm_probe_helper.c +++ b/drivers/gpu/drm/drm_probe_helper.c @@ -262,6 +262,26 @@ static bool drm_kms_helper_enable_hpd(struct drm_device *dev) }
#define DRM_OUTPUT_POLL_PERIOD (10*HZ) +static void reschedule_output_poll_work(struct drm_device *dev) +{ + unsigned long delay = DRM_OUTPUT_POLL_PERIOD; + + if (dev->mode_config.delayed_event) + /* + * FIXME: + * + * Use short (1s) delay to handle the initial delayed event. + * This delay should not be needed, but Optimus/nouveau will + * fail in a mysterious way if the delayed event is handled as + * soon as possible like it is done in + * drm_helper_probe_single_connector_modes() in case the poll + * was enabled before. + */ + delay = HZ; + + schedule_delayed_work(&dev->mode_config.output_poll_work, delay); +} + /** * drm_kms_helper_poll_enable - re-enable output polling. * @dev: drm_device @@ -279,37 +299,41 @@ static bool drm_kms_helper_enable_hpd(struct drm_device *dev) */ void drm_kms_helper_poll_enable(struct drm_device *dev) { - bool poll = false; - unsigned long delay = DRM_OUTPUT_POLL_PERIOD; - if (!dev->mode_config.poll_enabled || !drm_kms_helper_poll || dev->mode_config.poll_running) return;
- poll = drm_kms_helper_enable_hpd(dev); - - if (dev->mode_config.delayed_event) { - /* - * FIXME: - * - * Use short (1s) delay to handle the initial delayed event. - * This delay should not be needed, but Optimus/nouveau will - * fail in a mysterious way if the delayed event is handled as - * soon as possible like it is done in - * drm_helper_probe_single_connector_modes() in case the poll - * was enabled before. - */ - poll = true; - delay = HZ; - } - - if (poll) - schedule_delayed_work(&dev->mode_config.output_poll_work, delay); + if (drm_kms_helper_enable_hpd(dev) || + dev->mode_config.delayed_event) + reschedule_output_poll_work(dev);
dev->mode_config.poll_running = true; } EXPORT_SYMBOL(drm_kms_helper_poll_enable);
+/** + * drm_kms_helper_poll_reschedule - reschedule the output polling work + * @dev: drm_device + * + * This function reschedules the output polling work, after polling for a + * connector has been enabled. + * + * Drivers must call this helper after enabling polling for a connector by + * setting %DRM_CONNECTOR_POLL_CONNECT / %DRM_CONNECTOR_POLL_DISCONNECT flags + * in drm_connector::polled. Note that after disabling polling by clearing these + * flags for a connector will stop the output polling work automatically if + * the polling is disabled for all other connectors as well. + * + * The function can be called only after polling has been enabled by calling + * drm_kms_helper_poll_init() / drm_kms_helper_poll_enable(). + */ +void drm_kms_helper_poll_reschedule(struct drm_device *dev) +{ + if (dev->mode_config.poll_running) + reschedule_output_poll_work(dev); +} +EXPORT_SYMBOL(drm_kms_helper_poll_reschedule); + static enum drm_connector_status drm_helper_probe_detect_ctx(struct drm_connector *connector, bool force) { diff --git a/include/drm/drm_probe_helper.h b/include/drm/drm_probe_helper.h index 4977e0ab72db..fad3c4003b2b 100644 --- a/include/drm/drm_probe_helper.h +++ b/include/drm/drm_probe_helper.h @@ -25,6 +25,7 @@ void drm_kms_helper_connector_hotplug_event(struct drm_connector *connector);
void drm_kms_helper_poll_disable(struct drm_device *dev); void drm_kms_helper_poll_enable(struct drm_device *dev); +void drm_kms_helper_poll_reschedule(struct drm_device *dev); bool drm_kms_helper_is_poll_worker(void);
enum drm_mode_status drm_crtc_helper_mode_valid_fixed(struct drm_crtc *crtc,
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Michael fedora.dm0@gmail.com
commit f19df6e4de64b7fc6d71f192aa9ff3b701e4bade upstream.
As encountered on an ARM Mali-T760 MP4, attempting to read the nvmem variable can also return EOPNOTSUPP instead of ENOENT when speed binning is unsupported.
Cc: stable@vger.kernel.org Fixes: 7d690f936e9b ("drm/panfrost: Add basic support for speed binning") Signed-off-by: David Michael fedora.dm0@gmail.com Reviewed-by: Steven Price steven.price@arm.com Signed-off-by: Steven Price steven.price@arm.com Link: https://patchwork.freedesktop.org/patch/msgid/87msyryd7y.fsf@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/panfrost/panfrost_devfreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c index 58dfb15a8757..e78de99e9933 100644 --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c @@ -96,7 +96,7 @@ static int panfrost_read_speedbin(struct device *dev) * keep going without it; any other error means that we are * supposed to read the bin value, but we failed doing so. */ - if (ret != -ENOENT) { + if (ret != -ENOENT && ret != -EOPNOTSUPP) { DRM_DEV_ERROR(dev, "Cannot read speed-bin (%d).", ret); return ret; }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anshuman Gupta anshuman.gupta@intel.com
commit 2872144aec04baa7e43ecd2a60f7f0be3aa843fd upstream.
System-wide suspend already has support for lmem save/restore during suspend, therefore enable d3cold for s2idle and keep it disabled for runtime PM. (Refer to the commit below for the d3cold runtime PM disable justification.) 'commit 66eb93e71a7a ("drm/i915/dgfx: Keep PCI autosuspend control 'on' by default on all dGPU")'
It will reduce the DG2 Card power consumption to ~0 Watt for s2idle power KPI.
v2: - Added "Cc: stable@vger.kernel.org".
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8755 Cc: stable@vger.kernel.org Cc: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Anshuman Gupta anshuman.gupta@intel.com Reviewed-by: Rodrigo Vivi rodrigo.vivi@intel.com Tested-by: Aaron Ma aaron.ma@canonical.com Tested-by: Jianshui Yu Jianshui.yu@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230816125216.1722002-1-anshu... (cherry picked from commit 2643e6d1f2a5e51877be24042d53cf956589be10) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/i915_driver.c | 33 ++++++++++++++++++--------------- 1 file changed, 18 insertions(+), 15 deletions(-)
--- a/drivers/gpu/drm/i915/i915_driver.c +++ b/drivers/gpu/drm/i915/i915_driver.c @@ -433,7 +433,6 @@ static int i915_pcode_init(struct drm_i9 static int i915_driver_hw_probe(struct drm_i915_private *dev_priv) { struct pci_dev *pdev = to_pci_dev(dev_priv->drm.dev); - struct pci_dev *root_pdev; int ret;
if (i915_inject_probe_failure(dev_priv)) @@ -547,15 +546,6 @@ static int i915_driver_hw_probe(struct d
intel_bw_init_hw(dev_priv);
- /* - * FIXME: Temporary hammer to avoid freezing the machine on our DGFX - * This should be totally removed when we handle the pci states properly - * on runtime PM and on s2idle cases. - */ - root_pdev = pcie_find_root_port(pdev); - if (root_pdev) - pci_d3cold_disable(root_pdev); - return 0;
err_opregion: @@ -581,7 +571,6 @@ err_perf: static void i915_driver_hw_remove(struct drm_i915_private *dev_priv) { struct pci_dev *pdev = to_pci_dev(dev_priv->drm.dev); - struct pci_dev *root_pdev;
i915_perf_fini(dev_priv);
@@ -589,10 +578,6 @@ static void i915_driver_hw_remove(struct
if (pdev->msi_enabled) pci_disable_msi(pdev); - - root_pdev = pcie_find_root_port(pdev); - if (root_pdev) - pci_d3cold_enable(root_pdev); }
/** @@ -1499,6 +1484,8 @@ static int intel_runtime_suspend(struct { struct drm_i915_private *dev_priv = kdev_to_i915(kdev); struct intel_runtime_pm *rpm = &dev_priv->runtime_pm; + struct pci_dev *pdev = to_pci_dev(dev_priv->drm.dev); + struct pci_dev *root_pdev; struct intel_gt *gt; int ret, i;
@@ -1550,6 +1537,15 @@ static int intel_runtime_suspend(struct drm_err(&dev_priv->drm, "Unclaimed access detected prior to suspending\n");
+ /* + * FIXME: Temporary hammer to avoid freezing the machine on our DGFX + * This should be totally removed when we handle the pci states properly + * on runtime PM. + */ + root_pdev = pcie_find_root_port(pdev); + if (root_pdev) + pci_d3cold_disable(root_pdev); + rpm->suspended = true;
/* @@ -1588,6 +1584,8 @@ static int intel_runtime_resume(struct d { struct drm_i915_private *dev_priv = kdev_to_i915(kdev); struct intel_runtime_pm *rpm = &dev_priv->runtime_pm; + struct pci_dev *pdev = to_pci_dev(dev_priv->drm.dev); + struct pci_dev *root_pdev; struct intel_gt *gt; int ret, i;
@@ -1601,6 +1599,11 @@ static int intel_runtime_resume(struct d
intel_opregion_notify_adapter(dev_priv, PCI_D0); rpm->suspended = false; + + root_pdev = pcie_find_root_port(pdev); + if (root_pdev) + pci_d3cold_enable(root_pdev); + if (intel_uncore_unclaimed_mmio(&dev_priv->uncore)) drm_dbg(&dev_priv->drm, "Unclaimed access during suspend, bios?\n");
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ankit Nautiyal ankit.k.nautiyal@intel.com
commit 5ad1ab30ac0809d2963ddcf39ac34317a24a2f17 upstream.
DP DSC Receiver Capabilities are exposed via DPCD 60h-6Fh. Fix the DSC RECEIVER CAP SIZE accordingly.
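That range spans 0x6F - 0x60 + 1 = 0x10 registers, i.e. 16 bytes, so the previous value of 0xf was one byte short.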
Fixes: ffddc4363c28 ("drm/dp: Add DP DSC DPCD receiver capability size define and missing SHIFT") Cc: Anusha Srivatsa anusha.srivatsa@intel.com Cc: Manasi Navare manasi.d.navare@intel.com Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Ankit Nautiyal ankit.k.nautiyal@intel.com Reviewed-by: Stanislav Lisovskiy stanislav.lisovskiy@intel.com Signed-off-by: Jani Nikula jani.nikula@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230818044436.177806-1-ankit.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/drm/display/drm_dp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/drm/display/drm_dp.h +++ b/include/drm/display/drm_dp.h @@ -1534,7 +1534,7 @@ enum drm_dp_phy {
#define DP_BRANCH_OUI_HEADER_SIZE 0xc #define DP_RECEIVER_CAP_SIZE 0xf -#define DP_DSC_RECEIVER_CAP_SIZE 0xf +#define DP_DSC_RECEIVER_CAP_SIZE 0x10 /* DSC Capabilities 0x60 through 0x6F */ #define EDP_PSR_RECEIVER_CAP_SIZE 2 #define EDP_DISPLAY_CTL_CAP_SIZE 3 #define DP_LTTPR_COMMON_CAP_SIZE 8
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Imre Deak imre.deak@intel.com
commit 1dcc437427bbcebc8381226352f7ade08a271191 upstream.
After the commit in the Fixes: line below, HPD polling stopped working on i915, since after that change drm_kms_helper_poll_enable() doesn't restart drm_mode_config::output_poll_work if the work was stopped (because no connectors needed polling) and polling is then enabled for a connector (during runtime suspend or after detecting an HPD IRQ storm).
After the above change, calling drm_kms_helper_poll_enable() is a nop once it has already been called and polling for some connectors has since been disabled and re-enabled.
Fix this by calling drm_kms_helper_poll_reschedule() added in the previous patch instead, which reschedules the work whenever expected.
Fixes: d33a54e3991d ("drm/probe_helper: sort out poll_running vs poll_enabled") CC: stable@vger.kernel.org # 6.4+ Cc: Dmitry Baryshkov dmitry.baryshkov@linaro.org Cc: dri-devel@lists.freedesktop.org Reviewed-by: Jouni Högander jouni.hogander@intel.com Signed-off-by: Imre Deak imre.deak@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230822113015.41224-2-imre.de... (cherry picked from commit 50452f2f76852322620b63e62922b85e955abe94) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_hotplug.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/i915/display/intel_hotplug.c +++ b/drivers/gpu/drm/i915/display/intel_hotplug.c @@ -210,7 +210,7 @@ intel_hpd_irq_storm_switch_to_polling(st
/* Enable polling and queue hotplug re-enabling. */ if (hpd_disabled) { - drm_kms_helper_poll_enable(&dev_priv->drm); + drm_kms_helper_poll_reschedule(&dev_priv->drm); mod_delayed_work(system_wq, &dev_priv->display.hotplug.reenable_work, msecs_to_jiffies(HPD_STORM_REENABLE_DELAY)); } @@ -644,7 +644,7 @@ static void i915_hpd_poll_init_work(stru drm_connector_list_iter_end(&conn_iter);
if (enabled) - drm_kms_helper_poll_enable(&dev_priv->drm); + drm_kms_helper_poll_reschedule(&dev_priv->drm);
mutex_unlock(&dev_priv->drm.mode_config.mutex);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Huacai Chen chenhuacai@loongson.cn
commit 9730870b484e9de852b51df08a8b357b1129489e upstream.
In hw_breakpoint_control(), encode_ctrl_reg() has already encoded the MWPnCFG3_LoadEn/MWPnCFG3_StoreEn bits in info->ctrl. We don't need to add (1 << MWPnCFG3_LoadEn | 1 << MWPnCFG3_StoreEn) unconditionally.
Otherwise we can't set read watchpoint and write watchpoint separately.
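For illustration, a sketch of the old behaviour based on the line removed below (the exact bit layout is not spelled out here):

  /* user asks for a read-only watchpoint: encode_ctrl_reg() sets only LoadEn */
  ctrl = encode_ctrl_reg(info->ctrl);
  /* old code OR'ed both enable bits back in, so the watchpoint also fired on stores */
  write_wb_reg(CSR_CFG_CTRL, i, 1, ctrl | CTRL_PLV_ENABLE |
               1 << MWPnCFG3_LoadEn | 1 << MWPnCFG3_StoreEn);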
Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/kernel/hw_breakpoint.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/arch/loongarch/kernel/hw_breakpoint.c +++ b/arch/loongarch/kernel/hw_breakpoint.c @@ -207,8 +207,7 @@ static int hw_breakpoint_control(struct write_wb_reg(CSR_CFG_CTRL, i, 0, CTRL_PLV_ENABLE); } else { ctrl = encode_ctrl_reg(info->ctrl); - write_wb_reg(CSR_CFG_CTRL, i, 1, ctrl | CTRL_PLV_ENABLE | - 1 << MWPnCFG3_LoadEn | 1 << MWPnCFG3_StoreEn); + write_wb_reg(CSR_CFG_CTRL, i, 1, ctrl | CTRL_PLV_ENABLE); } enable = csr_read64(LOONGARCH_CSR_CRMD); csr_write64(CSR_CRMD_WE | enable, LOONGARCH_CSR_CRMD);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rick Edgecombe rick.p.edgecombe@intel.com
commit 1f69383b203e28cf8a4ca9570e572da1699f76cd upstream.
The thread flag TIF_NEED_FPU_LOAD indicates that the FPU saved state is valid and should be reloaded when returning to userspace. However, the kernel will skip doing this if the FPU registers are already valid as determined by fpregs_state_valid(). The logic embedded there considers the state valid if two cases are both true:
1: fpu_fpregs_owner_ctx points to the current task's FPU state
2: the last CPU the registers were live in was the current CPU.
This is usually correct logic. A CPU's fpu_fpregs_owner_ctx is set to the current FPU during the fpregs_restore_userregs() operation, so it indicates that the registers have been restored on this CPU. But this alone doesn't preclude the possibility that the task has since been rescheduled to a different CPU, where the registers were modified, and then migrated back to the current CPU. To rule that out, the logic relies on the second condition. So the assumption is that if the registers have been restored, AND they haven't had the chance to be modified (by being loaded on another CPU), then they MUST be valid on the current CPU.
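Roughly, the validity check in arch/x86/kernel/fpu/context.h looks like this (simplified sketch):

  static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
  {
          return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
  }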
Besides the lazy FPU optimizations, the other cases where the FPU registers might not be valid are when the kernel modifies the FPU register state or the FPU saved buffer. In this case the operation modifying the FPU state needs to let the kernel know the correspondence has been broken. The comment in "arch/x86/kernel/fpu/context.h" has:

/*
 * ...
 * If the FPU register state is valid, the kernel can skip restoring the
 * FPU state from memory.
 *
 * Any code that clobbers the FPU registers or updates the in-memory
 * FPU state for a task MUST let the rest of the kernel know that the
 * FPU registers are no longer valid for this task.
 *
 * Either one of these invalidation functions is enough. Invalidate
 * a resource you control: CPU if using the CPU for something else
 * (with preemption disabled), FPU for the current task, or a task that
 * is prevented from running by the current task.
 */
However, this is not completely true. When the kernel modifies the registers or saved FPU state, it can only rely on __fpu_invalidate_fpregs_state(), which wipes the FPU’s last_cpu tracking. The exec path instead relies on fpregs_deactivate(), which sets the CPU’s FPU context to NULL. This was observed to fail to restore the reset FPU state to the registers when returning to userspace in the following scenario:
1. A task is executing in userspace on CPU0
   - CPU0's FPU context points to the task's FPU state
   - fpu->last_cpu=CPU0
2. The task exec()’s
3. While in the kernel the task is preempted
   - CPU0 gets a thread executing in the kernel (such that no other FPU context is activated)
   - Scheduler sets task's fpu->last_cpu=CPU0 when scheduling out
4. Task is migrated to CPU1
5. Continuing the exec(), the task gets to fpu_flush_thread()->fpu_reset_fpregs()
   - Sets CPU1's fpu context to NULL
   - Copies the init state to the task's FPU buffer
   - Sets TIF_NEED_FPU_LOAD on the task
6. The task reschedules back to CPU0 before completing the exec() and returning to userspace
   - During the reschedule, scheduler finds TIF_NEED_FPU_LOAD is set
   - Skips saving the registers and updating task's fpu->last_cpu, because TIF_NEED_FPU_LOAD is the canonical source.
7. Now CPU0’s FPU context is still pointing to the task’s, and fpu->last_cpu is still CPU0. So fpregs_state_valid() returns true even though the reset FPU state has not been restored.
So the root cause is that exec() is doing the wrong kind of invalidate. It should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further, fpu__drop() doesn't really seem appropriate as the task (and FPU) are not going away, they are just getting reset as part of an exec. So switch to __fpu_invalidate_fpregs_state().
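The two kinds of invalidation, roughly (simplified sketches of the helpers described above, from arch/x86/kernel/fpu/context.h; the real fpregs_deactivate() also emits a tracepoint):

  /* wipes the per-task last_cpu tracking -- what exec() needs here */
  static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)
  {
          fpu->last_cpu = -1;
  }

  /* only clears the CPU's fpu_fpregs_owner_ctx -- not enough in the scenario above */
  static inline void fpregs_deactivate(struct fpu *fpu)
  {
          __this_cpu_write(fpu_fpregs_owner_ctx, NULL);
  }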
Also, delete the misleading comment that says that either kind of invalidate will be enough, because it’s not always the case.
Fixes: 33344368cb08 ("x86/fpu: Clean up the fpu__clear() variants") Reported-by: Lei Wang lei4.wang@intel.com Signed-off-by: Rick Edgecombe rick.p.edgecombe@intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Lijun Pan lijun.pan@intel.com Reviewed-by: Sohil Mehta sohil.mehta@intel.com Acked-by: Lijun Pan lijun.pan@intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230818170305.502891-1-rick.p.edgecombe@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/fpu/context.h | 3 +-- arch/x86/kernel/fpu/core.c | 2 +- 2 files changed, 2 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/fpu/context.h +++ b/arch/x86/kernel/fpu/context.h @@ -19,8 +19,7 @@ * FPU state for a task MUST let the rest of the kernel know that the * FPU registers are no longer valid for this task. * - * Either one of these invalidation functions is enough. Invalidate - * a resource you control: CPU if using the CPU for something else + * Invalidate a resource you control: CPU if using the CPU for something else * (with preemption disabled), FPU for the current task, or a task that * is prevented from running by the current task. */ --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -679,7 +679,7 @@ static void fpu_reset_fpregs(void) struct fpu *fpu = ¤t->thread.fpu;
fpregs_lock(); - fpu__drop(fpu); + __fpu_invalidate_fpregs_state(fpu); /* * This does not change the actual hardware registers. It just * resets the memory image and sets TIF_NEED_FPU_LOAD so a
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Feng Tang feng.tang@intel.com
commit 2c66ca3949dc701da7f4c9407f2140ae425683a5 upstream.
0-Day found a 34.6% regression in stress-ng's 'af-alg' test case, and bisected it to commit b81fac906a8f ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()"), which optimizes the FPU init order, and moves the CR4_OSXSAVE enabling into a later place:
arch_cpu_finalize_init
  identify_boot_cpu
    identify_cpu
      generic_identify
        get_cpu_cap --> setup cpu capability
  ...
  fpu__init_cpu
    fpu__init_cpu_xstate
      cr4_set_bits(X86_CR4_OSXSAVE);
As the FPU is not yet initialized, the CPU capability setup fails to set X86_FEATURE_OSXSAVE. Many modules like 'camellia_aesni_avx_x86_64' depend on this feature and therefore fail to load, causing the regression.
Cure this by setting the X86_FEATURE_OSXSAVE feature bit right after OSXSAVE is enabled.
[ tglx: Moved it into the actual BSP FPU initialization code and added a comment ]
Fixes: b81fac906a8f ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()") Reported-by: kernel test robot oliver.sang@intel.com Signed-off-by: Feng Tang feng.tang@intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/202307192135.203ac24e-oliver.sang@intel.com Link: https://lore.kernel.org/lkml/20230823065747.92257-1-feng.tang@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/fpu/xstate.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -882,6 +882,13 @@ void __init fpu__init_system_xstate(unsi goto out_disable; }
+ /* + * CPU capabilities initialization runs before FPU init. So + * X86_FEATURE_OSXSAVE is not set. Now that XSAVE is completely + * functional, set the feature bit so depending code works. + */ + setup_force_cpu_cap(X86_FEATURE_OSXSAVE); + print_xstate_offset_size(); pr_info("x86/fpu: Enabled xstate features 0x%llx, context size is %d bytes, using '%s' format.\n", fpu_kernel_cfg.max_features,
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matt Roper matthew.d.roper@intel.com
commit 12e6f6dc78e4f4a418648fb1a9c0cd2ae9b3430b upstream.
For platforms with GMD_ID support (i.e., everything MTL and beyond), identification of the display IP present should be based on the contents of the GMD_ID register rather than a PCI devid match.
Note that since GMD_ID readout requires access to the PCI BAR, a slight change to the driver init sequence is needed --- pci_enable_device() is now called before i915_driver_create().
v2:
 - Fix use of uninitialized i915 pointer in error path if pci_enable_device() fails before the i915 device is created. (lkp)
 - Use drm_device parameter to intel_display_device_probe. This goes against i915 conventions, but since the primary goal here is to make it easy to call this function from other drivers (like Xe) and since we don't need anything from the i915 structure, this seems like an exception where drm_device is a more natural fit.
v3:
 - Go back to drm_i915_private for intel_display_device_probe. (Jani)
 - Move forward decl to top of header. (Jani)
Signed-off-by: Matt Roper matthew.d.roper@intel.com Reviewed-by: Andrzej Hajda andrzej.hajda@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230523195609.73627-6-matthew... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_display_device.c | 65 ++++++++++++++++++-- drivers/gpu/drm/i915/display/intel_display_device.h | 5 + drivers/gpu/drm/i915/i915_driver.c | 17 +++-- drivers/gpu/drm/i915/intel_device_info.c | 13 ++-- 4 files changed, 84 insertions(+), 16 deletions(-)
--- a/drivers/gpu/drm/i915/display/intel_display_device.c +++ b/drivers/gpu/drm/i915/display/intel_display_device.c @@ -5,7 +5,10 @@
#include <drm/i915_pciids.h> #include <drm/drm_color_mgmt.h> +#include <linux/pci.h>
+#include "i915_drv.h" +#include "i915_reg.h" #include "intel_display_device.h" #include "intel_display_power.h" #include "intel_display_reg_defs.h" @@ -710,19 +713,73 @@ static const struct { INTEL_RPLP_IDS(&xe_lpd_display), INTEL_DG2_IDS(&xe_hpd_display),
- /* FIXME: Replace this with a GMD_ID lookup */ - INTEL_MTL_IDS(&xe_lpdp_display), + /* + * Do not add any GMD_ID-based platforms to this list. They will + * be probed automatically based on the IP version reported by + * the hardware. + */ };
+static const struct { + u16 ver; + u16 rel; + const struct intel_display_device_info *display; +} gmdid_display_map[] = { + { 14, 0, &xe_lpdp_display }, +}; + +static const struct intel_display_device_info * +probe_gmdid_display(struct drm_i915_private *i915, u16 *ver, u16 *rel, u16 *step) +{ + struct pci_dev *pdev = to_pci_dev(i915->drm.dev); + void __iomem *addr; + u32 val; + int i; + + addr = pci_iomap_range(pdev, 0, i915_mmio_reg_offset(GMD_ID_DISPLAY), sizeof(u32)); + if (!addr) { + drm_err(&i915->drm, "Cannot map MMIO BAR to read display GMD_ID\n"); + return &no_display; + } + + val = ioread32(addr); + pci_iounmap(pdev, addr); + + if (val == 0) + /* Platform doesn't have display */ + return &no_display; + + *ver = REG_FIELD_GET(GMD_ID_ARCH_MASK, val); + *rel = REG_FIELD_GET(GMD_ID_RELEASE_MASK, val); + *step = REG_FIELD_GET(GMD_ID_STEP, val); + + for (i = 0; i < ARRAY_SIZE(gmdid_display_map); i++) + if (*ver == gmdid_display_map[i].ver && + *rel == gmdid_display_map[i].rel) + return gmdid_display_map[i].display; + + drm_err(&i915->drm, "Unrecognized display IP version %d.%02d; disabling display.\n", + *ver, *rel); + return &no_display; +} + const struct intel_display_device_info * -intel_display_device_probe(u16 pci_devid) +intel_display_device_probe(struct drm_i915_private *i915, bool has_gmdid, + u16 *gmdid_ver, u16 *gmdid_rel, u16 *gmdid_step) { + struct pci_dev *pdev = to_pci_dev(i915->drm.dev); int i;
+ if (has_gmdid) + return probe_gmdid_display(i915, gmdid_ver, gmdid_rel, gmdid_step); + for (i = 0; i < ARRAY_SIZE(intel_display_ids); i++) { - if (intel_display_ids[i].devid == pci_devid) + if (intel_display_ids[i].devid == pdev->device) return intel_display_ids[i].info; }
+ drm_dbg(&i915->drm, "No display ID found for device ID %04x; disabling display.\n", + pdev->device); + return &no_display; } --- a/drivers/gpu/drm/i915/display/intel_display_device.h +++ b/drivers/gpu/drm/i915/display/intel_display_device.h @@ -10,6 +10,8 @@
#include "display/intel_display_limits.h"
+struct drm_i915_private; + #define DEV_INFO_DISPLAY_FOR_EACH_FLAG(func) \ /* Keep in alphabetical order */ \ func(cursor_needs_physical); \ @@ -81,6 +83,7 @@ struct intel_display_device_info { };
const struct intel_display_device_info * -intel_display_device_probe(u16 pci_devid); +intel_display_device_probe(struct drm_i915_private *i915, bool has_gmdid, + u16 *ver, u16 *rel, u16 *step);
#endif --- a/drivers/gpu/drm/i915/i915_driver.c +++ b/drivers/gpu/drm/i915/i915_driver.c @@ -739,13 +739,17 @@ int i915_driver_probe(struct pci_dev *pd struct drm_i915_private *i915; int ret;
- i915 = i915_driver_create(pdev, ent); - if (IS_ERR(i915)) - return PTR_ERR(i915); - ret = pci_enable_device(pdev); - if (ret) - goto out_fini; + if (ret) { + pr_err("Failed to enable graphics device: %pe\n", ERR_PTR(ret)); + return ret; + } + + i915 = i915_driver_create(pdev, ent); + if (IS_ERR(i915)) { + ret = PTR_ERR(i915); + goto out_pci_disable; + }
ret = i915_driver_early_probe(i915); if (ret < 0) @@ -828,7 +832,6 @@ out_runtime_pm_put: i915_driver_late_release(i915); out_pci_disable: pci_disable_device(pdev); -out_fini: i915_probe_error(i915, "Device initialization failed (%d)\n", ret); return ret; } --- a/drivers/gpu/drm/i915/intel_device_info.c +++ b/drivers/gpu/drm/i915/intel_device_info.c @@ -345,7 +345,6 @@ static void ip_ver_read(struct drm_i915_ static void intel_ipver_early_init(struct drm_i915_private *i915) { struct intel_runtime_info *runtime = RUNTIME_INFO(i915); - struct intel_display_runtime_info *display_runtime = DISPLAY_RUNTIME_INFO(i915);
if (!HAS_GMD_ID(i915)) { drm_WARN_ON(&i915->drm, RUNTIME_INFO(i915)->graphics.ip.ver > 12); @@ -366,8 +365,6 @@ static void intel_ipver_early_init(struc RUNTIME_INFO(i915)->graphics.ip.ver = 12; RUNTIME_INFO(i915)->graphics.ip.rel = 70; } - ip_ver_read(i915, i915_mmio_reg_offset(GMD_ID_DISPLAY), - (struct intel_ip_version *)&display_runtime->ip); ip_ver_read(i915, i915_mmio_reg_offset(GMD_ID_MEDIA), &runtime->media.ip); } @@ -574,6 +571,7 @@ void intel_device_info_driver_create(str { struct intel_device_info *info; struct intel_runtime_info *runtime; + u16 ver, rel, step;
/* Setup the write-once "constant" device info */ info = mkwrite_device_info(i915); @@ -584,11 +582,18 @@ void intel_device_info_driver_create(str memcpy(runtime, &INTEL_INFO(i915)->__runtime, sizeof(*runtime));
/* Probe display support */ - info->display = intel_display_device_probe(device_id); + info->display = intel_display_device_probe(i915, info->has_gmd_id, + &ver, &rel, &step); memcpy(DISPLAY_RUNTIME_INFO(i915), &DISPLAY_INFO(i915)->__runtime_defaults, sizeof(*DISPLAY_RUNTIME_INFO(i915)));
+ if (info->has_gmd_id) { + DISPLAY_RUNTIME_INFO(i915)->ip.ver = ver; + DISPLAY_RUNTIME_INFO(i915)->ip.rel = rel; + DISPLAY_RUNTIME_INFO(i915)->ip.step = step; + } + runtime->device_id = device_id; }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jani Nikula jani.nikula@intel.com
commit 423ffe62c06ae241ad460f4629dddb9dcf55e060 upstream.
The current display probe is unable to differentiate between IVB Q and IVB D GT2 server, as they both have the same device id, but different subvendor and subdevice. This leads to the latter being misidentified as the former, when it should just end up not having a display. However, the no-display case returns NULL as the display device info, and the driver promptly oopses.
As the IVB Q case is rare, and we're anyway moving towards GMD ID, handle the identification requiring subvendor and subdevice as a special case first, instead of unnecessarily growing the intel_display_ids[] array with subvendor and subdevice.
[ 5.425298] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 5.426059] #PF: supervisor read access in kernel mode [ 5.426810] #PF: error_code(0x0000) - not-present page [ 5.427570] PGD 0 P4D 0 [ 5.428285] Oops: 0000 [#1] PREEMPT SMP PTI [ 5.429035] CPU: 0 PID: 137 Comm: (udev-worker) Not tainted 6.4.0-1-amd64 #1 Debian 6.4.4-1 [ 5.429759] Hardware name: HP HP Z220 SFF Workstation/HP Z220 SFF Workstation, BIOS 4.19-218-gb184e6e0a1 02/02/2023 [ 5.430485] RIP: 0010:intel_device_info_driver_create+0xf1/0x120 [i915] [ 5.431338] Code: 48 8b 97 80 1b 00 00 89 8f c0 1b 00 00 48 89 b7 b0 1b 00 00 48 89 97 b8 1b 00 00 0f b7 fd e8 76 e8 14 00 48 89 83 50 1b 00 00 <48> 8b 08 48 89 8b c4 1b 00 00 48 8b 48 08 48 89 8b cc 1b 00 00 8b [ 5.432920] RSP: 0018:ffffb8254044fb98 EFLAGS: 00010206 [ 5.433707] RAX: 0000000000000000 RBX: ffff923076e80000 RCX: 0000000000000000 [ 5.434494] RDX: 0000000000000260 RSI: 0000000100001000 RDI: 000000000000016a [ 5.435277] RBP: 000000000000016a R08: ffffb8254044fb00 R09: 0000000000000000 [ 5.436055] R10: ffff922d02761de8 R11: 00657361656c6572 R12: ffffffffc0e5d140 [ 5.436867] R13: ffff922d00b720d0 R14: 0000000076e80000 R15: ffff923078c0cae8 [ 5.437646] FS: 00007febd19a18c0(0000) GS:ffff92307c000000(0000) knlGS:0000000000000000 [ 5.438434] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5.439218] CR2: 0000000000000000 CR3: 000000010256e002 CR4: 00000000001706f0 [ 5.440009] Call Trace: [ 5.440824] <TASK> [ 5.441611] ? __die+0x23/0x70 [ 5.442394] ? page_fault_oops+0x17d/0x4c0 [ 5.443173] ? exc_page_fault+0x7f/0x180 [ 5.443949] ? asm_exc_page_fault+0x26/0x30 [ 5.444756] ? intel_device_info_driver_create+0xf1/0x120 [i915] [ 5.445652] ? intel_device_info_driver_create+0xea/0x120 [i915] [ 5.446545] i915_driver_probe+0x7f/0xb60 [i915] [ 5.447431] ? drm_privacy_screen_get+0x15c/0x1a0 [drm] [ 5.448240] local_pci_probe+0x45/0xa0 [ 5.449013] pci_device_probe+0xc7/0x240 [ 5.449748] really_probe+0x19e/0x3e0 [ 5.450464] ? __pfx___driver_attach+0x10/0x10 [ 5.451172] __driver_probe_device+0x78/0x160 [ 5.451870] driver_probe_device+0x1f/0x90 [ 5.452601] __driver_attach+0xd2/0x1c0 [ 5.453293] bus_for_each_dev+0x88/0xd0 [ 5.453989] bus_add_driver+0x116/0x220 [ 5.454672] driver_register+0x59/0x100 [ 5.455336] i915_init+0x25/0xc0 [i915] [ 5.456104] ? __pfx_i915_init+0x10/0x10 [i915] [ 5.456882] do_one_initcall+0x5d/0x240 [ 5.457511] do_init_module+0x60/0x250 [ 5.458126] __do_sys_finit_module+0xac/0x120 [ 5.458721] do_syscall_64+0x60/0xc0 [ 5.459314] ? syscall_exit_to_user_mode+0x1b/0x40 [ 5.459897] ? 
do_syscall_64+0x6c/0xc0 [ 5.460510] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 5.461082] RIP: 0033:0x7febd20b0eb9 [ 5.461648] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2f 1f 0d 00 f7 d8 64 89 01 48 [ 5.462905] RSP: 002b:00007fffabb1ba78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 5.463554] RAX: ffffffffffffffda RBX: 0000561e6304f410 RCX: 00007febd20b0eb9 [ 5.464201] RDX: 0000000000000000 RSI: 00007febd2244f0d RDI: 0000000000000015 [ 5.464869] RBP: 00007febd2244f0d R08: 0000000000000000 R09: 000000000000000a [ 5.465512] R10: 0000000000000015 R11: 0000000000000246 R12: 0000000000020000 [ 5.466124] R13: 0000000000000000 R14: 0000561e63032b60 R15: 000000000000000a [ 5.466700] </TASK> [ 5.467271] Modules linked in: i915(+) drm_buddy video crc32_pclmul sr_mod hid_generic wmi crc32c_intel i2c_algo_bit sd_mod cdrom drm_display_helper cec usbhid rc_core ghash_clmulni_intel hid sha512_ssse3 ttm sha512_generic xhci_pci ehci_pci xhci_hcd ehci_hcd nvme ahci drm_kms_helper nvme_core libahci t10_pi libata psmouse aesni_intel scsi_mod crypto_simd i2c_i801 scsi_common crc64_rocksoft_generic cryptd i2c_smbus drm lpc_ich crc64_rocksoft crc_t10dif e1000e usbcore crct10dif_generic usb_common crct10dif_pclmul crc64 crct10dif_common button [ 5.469750] CR2: 0000000000000000 [ 5.470364] ---[ end trace 0000000000000000 ]--- [ 5.470971] RIP: 0010:intel_device_info_driver_create+0xf1/0x120 [i915] [ 5.471699] Code: 48 8b 97 80 1b 00 00 89 8f c0 1b 00 00 48 89 b7 b0 1b 00 00 48 89 97 b8 1b 00 00 0f b7 fd e8 76 e8 14 00 48 89 83 50 1b 00 00 <48> 8b 08 48 89 8b c4 1b 00 00 48 8b 48 08 48 89 8b cc 1b 00 00 8b [ 5.473034] RSP: 0018:ffffb8254044fb98 EFLAGS: 00010206 [ 5.473698] RAX: 0000000000000000 RBX: ffff923076e80000 RCX: 0000000000000000 [ 5.474371] RDX: 0000000000000260 RSI: 0000000100001000 RDI: 000000000000016a [ 5.475045] RBP: 000000000000016a R08: ffffb8254044fb00 R09: 0000000000000000 [ 5.475725] R10: ffff922d02761de8 R11: 00657361656c6572 R12: ffffffffc0e5d140 [ 5.476405] R13: ffff922d00b720d0 R14: 0000000076e80000 R15: ffff923078c0cae8 [ 5.477124] FS: 00007febd19a18c0(0000) GS:ffff92307c000000(0000) knlGS:0000000000000000 [ 5.477811] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5.478499] CR2: 0000000000000000 CR3: 000000010256e002 CR4: 00000000001706f0
Fixes: 69d439818fe5 ("drm/i915/display: Make display responsible for probing its own IP") Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8991 Cc: Matt Roper matthew.d.roper@intel.com Cc: Andrzej Hajda andrzej.hajda@intel.com Reviewed-by: Luca Coelho luciano.coelho@intel.com Reviewed-by: Matt Roper matthew.d.roper@intel.com Signed-off-by: Jani Nikula jani.nikula@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230804084600.1005818-1-jani.... (cherry picked from commit 1435188307d128671f677eb908e165666dd83652) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_display_device.c | 24 +++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/i915/display/intel_display_device.c +++ b/drivers/gpu/drm/i915/display/intel_display_device.c @@ -660,10 +660,24 @@ static const struct intel_display_device BIT(TRANSCODER_C) | BIT(TRANSCODER_D), };
+/* + * Separate detection for no display cases to keep the display id array simple. + * + * IVB Q requires subvendor and subdevice matching to differentiate from IVB D + * GT2 server. + */ +static bool has_no_display(struct pci_dev *pdev) +{ + static const struct pci_device_id ids[] = { + INTEL_IVB_Q_IDS(0), + {} + }; + + return pci_match_id(ids, pdev); +} + #undef INTEL_VGA_DEVICE -#undef INTEL_QUANTA_VGA_DEVICE #define INTEL_VGA_DEVICE(id, info) { id, info } -#define INTEL_QUANTA_VGA_DEVICE(info) { 0x16a, info }
static const struct { u32 devid; @@ -688,7 +702,6 @@ static const struct { INTEL_IRONLAKE_M_IDS(&ilk_m_display), INTEL_SNB_D_IDS(&snb_display), INTEL_SNB_M_IDS(&snb_display), - INTEL_IVB_Q_IDS(NULL), /* must be first IVB in list */ INTEL_IVB_M_IDS(&ivb_display), INTEL_IVB_D_IDS(&ivb_display), INTEL_HSW_IDS(&hsw_display), @@ -773,6 +786,11 @@ intel_display_device_probe(struct drm_i9 if (has_gmdid) return probe_gmdid_display(i915, gmdid_ver, gmdid_rel, gmdid_step);
+ if (has_no_display(pdev)) { + drm_dbg_kms(&i915->drm, "Device doesn't have display\n"); + return &no_display; + } + for (i = 0; i < ARRAY_SIZE(intel_display_ids); i++) { if (intel_display_ids[i].devid == pdev->device) return intel_display_ids[i].info;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juri Lelli juri.lelli@redhat.com
commit ad3a557daf6915296a43ef97a3e9c48e076c9dd8 upstream.
rebuild_root_domains() and update_tasks_root_domain() have neutral names, but actually deal with DEADLINE bandwidth accounting.
Rename them to use 'dl_' prefix so that intent is more clear.
No functional change.
Suggested-by: Qais Yousef (Google) qyousef@layalina.io Signed-off-by: Juri Lelli juri.lelli@redhat.com Reviewed-by: Waiman Long longman@redhat.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Qais Yousef (Google) qyousef@layalina.io Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/cgroup/cpuset.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1066,7 +1066,7 @@ done: return ndoms; }
-static void update_tasks_root_domain(struct cpuset *cs) +static void dl_update_tasks_root_domain(struct cpuset *cs) { struct css_task_iter it; struct task_struct *task; @@ -1079,7 +1079,7 @@ static void update_tasks_root_domain(str css_task_iter_end(&it); }
-static void rebuild_root_domains(void) +static void dl_rebuild_rd_accounting(void) { struct cpuset *cs = NULL; struct cgroup_subsys_state *pos_css; @@ -1107,7 +1107,7 @@ static void rebuild_root_domains(void)
rcu_read_unlock();
- update_tasks_root_domain(cs); + dl_update_tasks_root_domain(cs);
rcu_read_lock(); css_put(&cs->css); @@ -1121,7 +1121,7 @@ partition_and_rebuild_sched_domains(int { mutex_lock(&sched_domains_mutex); partition_sched_domains_locked(ndoms_new, doms_new, dattr_new); - rebuild_root_domains(); + dl_rebuild_rd_accounting(); mutex_unlock(&sched_domains_mutex); }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juri Lelli juri.lelli@redhat.com
commit 111cd11bbc54850f24191c52ff217da88a5e639b upstream.
Turns out percpu_cpuset_rwsem - commit 1243dc518c9d ("cgroup/cpuset: Convert cpuset_mutex to percpu_rwsem") - wasn't such a brilliant idea, as it has been reported to cause slowdowns in workloads that need to change cpuset configuration frequently, and it also does not implement priority inheritance (which causes trouble with realtime workloads).
Convert percpu_cpuset_rwsem back to regular cpuset_mutex. Also grab it only for SCHED_DEADLINE tasks (other policies don't care about stable cpusets anyway).
Signed-off-by: Juri Lelli juri.lelli@redhat.com Reviewed-by: Waiman Long longman@redhat.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Qais Yousef (Google) qyousef@layalina.io Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/cpuset.h | 8 +- kernel/cgroup/cpuset.c | 159 ++++++++++++++++++++++++------------------------- kernel/sched/core.c | 22 ++++-- 3 files changed, 99 insertions(+), 90 deletions(-)
--- a/include/linux/cpuset.h +++ b/include/linux/cpuset.h @@ -71,8 +71,8 @@ extern void cpuset_init_smp(void); extern void cpuset_force_rebuild(void); extern void cpuset_update_active_cpus(void); extern void cpuset_wait_for_hotplug(void); -extern void cpuset_read_lock(void); -extern void cpuset_read_unlock(void); +extern void cpuset_lock(void); +extern void cpuset_unlock(void); extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask); extern bool cpuset_cpus_allowed_fallback(struct task_struct *p); extern nodemask_t cpuset_mems_allowed(struct task_struct *p); @@ -189,8 +189,8 @@ static inline void cpuset_update_active_
static inline void cpuset_wait_for_hotplug(void) { }
-static inline void cpuset_read_lock(void) { } -static inline void cpuset_read_unlock(void) { } +static inline void cpuset_lock(void) { } +static inline void cpuset_unlock(void) { }
static inline void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask) --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -366,22 +366,23 @@ static struct cpuset top_cpuset = { if (is_cpuset_online(((des_cs) = css_cs((pos_css)))))
/* - * There are two global locks guarding cpuset structures - cpuset_rwsem and + * There are two global locks guarding cpuset structures - cpuset_mutex and * callback_lock. We also require taking task_lock() when dereferencing a * task's cpuset pointer. See "The task_lock() exception", at the end of this - * comment. The cpuset code uses only cpuset_rwsem write lock. Other - * kernel subsystems can use cpuset_read_lock()/cpuset_read_unlock() to - * prevent change to cpuset structures. + * comment. The cpuset code uses only cpuset_mutex. Other kernel subsystems + * can use cpuset_lock()/cpuset_unlock() to prevent change to cpuset + * structures. Note that cpuset_mutex needs to be a mutex as it is used in + * paths that rely on priority inheritance (e.g. scheduler - on RT) for + * correctness. * * A task must hold both locks to modify cpusets. If a task holds - * cpuset_rwsem, it blocks others wanting that rwsem, ensuring that it - * is the only task able to also acquire callback_lock and be able to - * modify cpusets. It can perform various checks on the cpuset structure - * first, knowing nothing will change. It can also allocate memory while - * just holding cpuset_rwsem. While it is performing these checks, various - * callback routines can briefly acquire callback_lock to query cpusets. - * Once it is ready to make the changes, it takes callback_lock, blocking - * everyone else. + * cpuset_mutex, it blocks others, ensuring that it is the only task able to + * also acquire callback_lock and be able to modify cpusets. It can perform + * various checks on the cpuset structure first, knowing nothing will change. + * It can also allocate memory while just holding cpuset_mutex. While it is + * performing these checks, various callback routines can briefly acquire + * callback_lock to query cpusets. Once it is ready to make the changes, it + * takes callback_lock, blocking everyone else. * * Calls to the kernel memory allocator can not be made while holding * callback_lock, as that would risk double tripping on callback_lock @@ -403,16 +404,16 @@ static struct cpuset top_cpuset = { * guidelines for accessing subsystem state in kernel/cgroup.c */
-DEFINE_STATIC_PERCPU_RWSEM(cpuset_rwsem); +static DEFINE_MUTEX(cpuset_mutex);
-void cpuset_read_lock(void) +void cpuset_lock(void) { - percpu_down_read(&cpuset_rwsem); + mutex_lock(&cpuset_mutex); }
-void cpuset_read_unlock(void) +void cpuset_unlock(void) { - percpu_up_read(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex); }
static DEFINE_SPINLOCK(callback_lock); @@ -496,7 +497,7 @@ static inline bool partition_is_populate * One way or another, we guarantee to return some non-empty subset * of cpu_online_mask. * - * Call with callback_lock or cpuset_rwsem held. + * Call with callback_lock or cpuset_mutex held. */ static void guarantee_online_cpus(struct task_struct *tsk, struct cpumask *pmask) @@ -538,7 +539,7 @@ out_unlock: * One way or another, we guarantee to return some non-empty subset * of node_states[N_MEMORY]. * - * Call with callback_lock or cpuset_rwsem held. + * Call with callback_lock or cpuset_mutex held. */ static void guarantee_online_mems(struct cpuset *cs, nodemask_t *pmask) { @@ -550,7 +551,7 @@ static void guarantee_online_mems(struct /* * update task's spread flag if cpuset's page/slab spread flag is set * - * Call with callback_lock or cpuset_rwsem held. The check can be skipped + * Call with callback_lock or cpuset_mutex held. The check can be skipped * if on default hierarchy. */ static void cpuset_update_task_spread_flags(struct cpuset *cs, @@ -575,7 +576,7 @@ static void cpuset_update_task_spread_fl * * One cpuset is a subset of another if all its allowed CPUs and * Memory Nodes are a subset of the other, and its exclusive flags - * are only set if the other's are set. Call holding cpuset_rwsem. + * are only set if the other's are set. Call holding cpuset_mutex. */
static int is_cpuset_subset(const struct cpuset *p, const struct cpuset *q) @@ -713,7 +714,7 @@ out: * If we replaced the flag and mask values of the current cpuset * (cur) with those values in the trial cpuset (trial), would * our various subset and exclusive rules still be valid? Presumes - * cpuset_rwsem held. + * cpuset_mutex held. * * 'cur' is the address of an actual, in-use cpuset. Operations * such as list traversal that depend on the actual address of the @@ -829,7 +830,7 @@ static void update_domain_attr_tree(stru rcu_read_unlock(); }
-/* Must be called with cpuset_rwsem held. */ +/* Must be called with cpuset_mutex held. */ static inline int nr_cpusets(void) { /* jump label reference count + the top-level cpuset */ @@ -855,7 +856,7 @@ static inline int nr_cpusets(void) * domains when operating in the severe memory shortage situations * that could cause allocation failures below. * - * Must be called with cpuset_rwsem held. + * Must be called with cpuset_mutex held. * * The three key local variables below are: * cp - cpuset pointer, used (together with pos_css) to perform a @@ -1084,7 +1085,7 @@ static void dl_rebuild_rd_accounting(voi struct cpuset *cs = NULL; struct cgroup_subsys_state *pos_css;
- percpu_rwsem_assert_held(&cpuset_rwsem); + lockdep_assert_held(&cpuset_mutex); lockdep_assert_cpus_held(); lockdep_assert_held(&sched_domains_mutex);
@@ -1134,7 +1135,7 @@ partition_and_rebuild_sched_domains(int * 'cpus' is removed, then call this routine to rebuild the * scheduler's dynamic sched domains. * - * Call with cpuset_rwsem held. Takes cpus_read_lock(). + * Call with cpuset_mutex held. Takes cpus_read_lock(). */ static void rebuild_sched_domains_locked(void) { @@ -1145,7 +1146,7 @@ static void rebuild_sched_domains_locked int ndoms;
lockdep_assert_cpus_held(); - percpu_rwsem_assert_held(&cpuset_rwsem); + lockdep_assert_held(&cpuset_mutex);
/* * If we have raced with CPU hotplug, return early to avoid @@ -1196,9 +1197,9 @@ static void rebuild_sched_domains_locked void rebuild_sched_domains(void) { cpus_read_lock(); - percpu_down_write(&cpuset_rwsem); + mutex_lock(&cpuset_mutex); rebuild_sched_domains_locked(); - percpu_up_write(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex); cpus_read_unlock(); }
@@ -1208,7 +1209,7 @@ void rebuild_sched_domains(void) * @new_cpus: the temp variable for the new effective_cpus mask * * Iterate through each task of @cs updating its cpus_allowed to the - * effective cpuset's. As this function is called with cpuset_rwsem held, + * effective cpuset's. As this function is called with cpuset_mutex held, * cpuset membership stays stable. For top_cpuset, task_cpu_possible_mask() * is used instead of effective_cpus to make sure all offline CPUs are also * included as hotplug code won't update cpumasks for tasks in top_cpuset. @@ -1322,7 +1323,7 @@ static int update_parent_subparts_cpumas int old_prs, new_prs; int part_error = PERR_NONE; /* Partition error? */
- percpu_rwsem_assert_held(&cpuset_rwsem); + lockdep_assert_held(&cpuset_mutex);
/* * The parent must be a partition root. @@ -1545,7 +1546,7 @@ static int update_parent_subparts_cpumas * * On legacy hierarchy, effective_cpus will be the same with cpu_allowed. * - * Called with cpuset_rwsem held + * Called with cpuset_mutex held */ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp, bool force) @@ -1705,7 +1706,7 @@ static void update_sibling_cpumasks(stru struct cpuset *sibling; struct cgroup_subsys_state *pos_css;
- percpu_rwsem_assert_held(&cpuset_rwsem); + lockdep_assert_held(&cpuset_mutex);
/* * Check all its siblings and call update_cpumasks_hier() @@ -1955,12 +1956,12 @@ static void *cpuset_being_rebound; * @cs: the cpuset in which each task's mems_allowed mask needs to be changed * * Iterate through each task of @cs updating its mems_allowed to the - * effective cpuset's. As this function is called with cpuset_rwsem held, + * effective cpuset's. As this function is called with cpuset_mutex held, * cpuset membership stays stable. */ static void update_tasks_nodemask(struct cpuset *cs) { - static nodemask_t newmems; /* protected by cpuset_rwsem */ + static nodemask_t newmems; /* protected by cpuset_mutex */ struct css_task_iter it; struct task_struct *task;
@@ -1973,7 +1974,7 @@ static void update_tasks_nodemask(struct * take while holding tasklist_lock. Forks can happen - the * mpol_dup() cpuset_being_rebound check will catch such forks, * and rebind their vma mempolicies too. Because we still hold - * the global cpuset_rwsem, we know that no other rebind effort + * the global cpuset_mutex, we know that no other rebind effort * will be contending for the global variable cpuset_being_rebound. * It's ok if we rebind the same mm twice; mpol_rebind_mm() * is idempotent. Also migrate pages in each mm to new nodes. @@ -2019,7 +2020,7 @@ static void update_tasks_nodemask(struct * * On legacy hierarchy, effective_mems will be the same with mems_allowed. * - * Called with cpuset_rwsem held + * Called with cpuset_mutex held */ static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems) { @@ -2072,7 +2073,7 @@ static void update_nodemasks_hier(struct * mempolicies and if the cpuset is marked 'memory_migrate', * migrate the tasks pages to the new memory. * - * Call with cpuset_rwsem held. May take callback_lock during call. + * Call with cpuset_mutex held. May take callback_lock during call. * Will take tasklist_lock, scan tasklist for tasks in cpuset cs, * lock each such tasks mm->mmap_lock, scan its vma's and rebind * their mempolicies to the cpusets new mems_allowed. @@ -2164,7 +2165,7 @@ static int update_relax_domain_level(str * @cs: the cpuset in which each task's spread flags needs to be changed * * Iterate through each task of @cs updating its spread flags. As this - * function is called with cpuset_rwsem held, cpuset membership stays + * function is called with cpuset_mutex held, cpuset membership stays * stable. */ static void update_tasks_flags(struct cpuset *cs) @@ -2184,7 +2185,7 @@ static void update_tasks_flags(struct cp * cs: the cpuset to update * turning_on: whether the flag is being set or cleared * - * Call with cpuset_rwsem held. + * Call with cpuset_mutex held. */
static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, @@ -2234,7 +2235,7 @@ out: * @new_prs: new partition root state * Return: 0 if successful, != 0 if error * - * Call with cpuset_rwsem held. + * Call with cpuset_mutex held. */ static int update_prstate(struct cpuset *cs, int new_prs) { @@ -2472,7 +2473,7 @@ static int cpuset_can_attach_check(struc return 0; }
-/* Called by cgroups to determine if a cpuset is usable; cpuset_rwsem held */ +/* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */ static int cpuset_can_attach(struct cgroup_taskset *tset) { struct cgroup_subsys_state *css; @@ -2484,7 +2485,7 @@ static int cpuset_can_attach(struct cgro cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css)); cs = css_cs(css);
- percpu_down_write(&cpuset_rwsem); + mutex_lock(&cpuset_mutex);
/* Check to see if task is allowed in the cpuset */ ret = cpuset_can_attach_check(cs); @@ -2506,7 +2507,7 @@ static int cpuset_can_attach(struct cgro */ cs->attach_in_progress++; out_unlock: - percpu_up_write(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex); return ret; }
@@ -2518,15 +2519,15 @@ static void cpuset_cancel_attach(struct cgroup_taskset_first(tset, &css); cs = css_cs(css);
- percpu_down_write(&cpuset_rwsem); + mutex_lock(&cpuset_mutex); cs->attach_in_progress--; if (!cs->attach_in_progress) wake_up(&cpuset_attach_wq); - percpu_up_write(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex); }
/* - * Protected by cpuset_rwsem. cpus_attach is used only by cpuset_attach_task() + * Protected by cpuset_mutex. cpus_attach is used only by cpuset_attach_task() * but we can't allocate it dynamically there. Define it global and * allocate from cpuset_init(). */ @@ -2535,7 +2536,7 @@ static nodemask_t cpuset_attach_nodemask
static void cpuset_attach_task(struct cpuset *cs, struct task_struct *task) { - percpu_rwsem_assert_held(&cpuset_rwsem); + lockdep_assert_held(&cpuset_mutex);
if (cs != &top_cpuset) guarantee_online_cpus(task, cpus_attach); @@ -2565,7 +2566,7 @@ static void cpuset_attach(struct cgroup_ cs = css_cs(css);
lockdep_assert_cpus_held(); /* see cgroup_attach_lock() */ - percpu_down_write(&cpuset_rwsem); + mutex_lock(&cpuset_mutex); cpus_updated = !cpumask_equal(cs->effective_cpus, oldcs->effective_cpus); mems_updated = !nodes_equal(cs->effective_mems, oldcs->effective_mems); @@ -2626,7 +2627,7 @@ out: if (!cs->attach_in_progress) wake_up(&cpuset_attach_wq);
- percpu_up_write(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex); }
/* The various types of files and directories in a cpuset file system */ @@ -2658,7 +2659,7 @@ static int cpuset_write_u64(struct cgrou int retval = 0;
cpus_read_lock(); - percpu_down_write(&cpuset_rwsem); + mutex_lock(&cpuset_mutex); if (!is_cpuset_online(cs)) { retval = -ENODEV; goto out_unlock; @@ -2694,7 +2695,7 @@ static int cpuset_write_u64(struct cgrou break; } out_unlock: - percpu_up_write(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex); cpus_read_unlock(); return retval; } @@ -2707,7 +2708,7 @@ static int cpuset_write_s64(struct cgrou int retval = -ENODEV;
cpus_read_lock(); - percpu_down_write(&cpuset_rwsem); + mutex_lock(&cpuset_mutex); if (!is_cpuset_online(cs)) goto out_unlock;
@@ -2720,7 +2721,7 @@ static int cpuset_write_s64(struct cgrou break; } out_unlock: - percpu_up_write(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex); cpus_read_unlock(); return retval; } @@ -2753,7 +2754,7 @@ static ssize_t cpuset_write_resmask(stru * operation like this one can lead to a deadlock through kernfs * active_ref protection. Let's break the protection. Losing the * protection is okay as we check whether @cs is online after - * grabbing cpuset_rwsem anyway. This only happens on the legacy + * grabbing cpuset_mutex anyway. This only happens on the legacy * hierarchies. */ css_get(&cs->css); @@ -2761,7 +2762,7 @@ static ssize_t cpuset_write_resmask(stru flush_work(&cpuset_hotplug_work);
cpus_read_lock(); - percpu_down_write(&cpuset_rwsem); + mutex_lock(&cpuset_mutex); if (!is_cpuset_online(cs)) goto out_unlock;
@@ -2785,7 +2786,7 @@ static ssize_t cpuset_write_resmask(stru
free_cpuset(trialcs); out_unlock: - percpu_up_write(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex); cpus_read_unlock(); kernfs_unbreak_active_protection(of->kn); css_put(&cs->css); @@ -2933,13 +2934,13 @@ static ssize_t sched_partition_write(str
css_get(&cs->css); cpus_read_lock(); - percpu_down_write(&cpuset_rwsem); + mutex_lock(&cpuset_mutex); if (!is_cpuset_online(cs)) goto out_unlock;
retval = update_prstate(cs, val); out_unlock: - percpu_up_write(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex); cpus_read_unlock(); css_put(&cs->css); return retval ?: nbytes; @@ -3156,7 +3157,7 @@ static int cpuset_css_online(struct cgro return 0;
cpus_read_lock(); - percpu_down_write(&cpuset_rwsem); + mutex_lock(&cpuset_mutex);
set_bit(CS_ONLINE, &cs->flags); if (is_spread_page(parent)) @@ -3207,7 +3208,7 @@ static int cpuset_css_online(struct cgro cpumask_copy(cs->effective_cpus, parent->cpus_allowed); spin_unlock_irq(&callback_lock); out_unlock: - percpu_up_write(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex); cpus_read_unlock(); return 0; } @@ -3228,7 +3229,7 @@ static void cpuset_css_offline(struct cg struct cpuset *cs = css_cs(css);
cpus_read_lock(); - percpu_down_write(&cpuset_rwsem); + mutex_lock(&cpuset_mutex);
if (is_partition_valid(cs)) update_prstate(cs, 0); @@ -3247,7 +3248,7 @@ static void cpuset_css_offline(struct cg cpuset_dec(); clear_bit(CS_ONLINE, &cs->flags);
- percpu_up_write(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex); cpus_read_unlock(); }
@@ -3260,7 +3261,7 @@ static void cpuset_css_free(struct cgrou
static void cpuset_bind(struct cgroup_subsys_state *root_css) { - percpu_down_write(&cpuset_rwsem); + mutex_lock(&cpuset_mutex); spin_lock_irq(&callback_lock);
if (is_in_v2_mode()) { @@ -3273,7 +3274,7 @@ static void cpuset_bind(struct cgroup_su }
spin_unlock_irq(&callback_lock); - percpu_up_write(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex); }
/* @@ -3294,7 +3295,7 @@ static int cpuset_can_fork(struct task_s return 0;
lockdep_assert_held(&cgroup_mutex); - percpu_down_write(&cpuset_rwsem); + mutex_lock(&cpuset_mutex);
/* Check to see if task is allowed in the cpuset */ ret = cpuset_can_attach_check(cs); @@ -3315,7 +3316,7 @@ static int cpuset_can_fork(struct task_s */ cs->attach_in_progress++; out_unlock: - percpu_up_write(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex); return ret; }
@@ -3331,11 +3332,11 @@ static void cpuset_cancel_fork(struct ta if (same_cs) return;
- percpu_down_write(&cpuset_rwsem); + mutex_lock(&cpuset_mutex); cs->attach_in_progress--; if (!cs->attach_in_progress) wake_up(&cpuset_attach_wq); - percpu_up_write(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex); }
/* @@ -3363,7 +3364,7 @@ static void cpuset_fork(struct task_stru }
/* CLONE_INTO_CGROUP */ - percpu_down_write(&cpuset_rwsem); + mutex_lock(&cpuset_mutex); guarantee_online_mems(cs, &cpuset_attach_nodemask_to); cpuset_attach_task(cs, task);
@@ -3371,7 +3372,7 @@ static void cpuset_fork(struct task_stru if (!cs->attach_in_progress) wake_up(&cpuset_attach_wq);
- percpu_up_write(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex); }
struct cgroup_subsys cpuset_cgrp_subsys = { @@ -3472,7 +3473,7 @@ hotplug_update_tasks_legacy(struct cpuse is_empty = cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed);
- percpu_up_write(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex);
/* * Move tasks to the nearest ancestor with execution resources, @@ -3482,7 +3483,7 @@ hotplug_update_tasks_legacy(struct cpuse if (is_empty) remove_tasks_in_empty_cpuset(cs);
- percpu_down_write(&cpuset_rwsem); + mutex_lock(&cpuset_mutex); }
static void @@ -3533,14 +3534,14 @@ static void cpuset_hotplug_update_tasks( retry: wait_event(cpuset_attach_wq, cs->attach_in_progress == 0);
- percpu_down_write(&cpuset_rwsem); + mutex_lock(&cpuset_mutex);
/* * We have raced with task attaching. We wait until attaching * is finished, so we won't attach a task to an empty cpuset. */ if (cs->attach_in_progress) { - percpu_up_write(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex); goto retry; }
@@ -3637,7 +3638,7 @@ update_tasks: cpus_updated, mems_updated);
unlock: - percpu_up_write(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex); }
/** @@ -3667,7 +3668,7 @@ static void cpuset_hotplug_workfn(struct if (on_dfl && !alloc_cpumasks(NULL, &tmp)) ptmp = &tmp;
- percpu_down_write(&cpuset_rwsem); + mutex_lock(&cpuset_mutex);
/* fetch the available cpus/mems and find out which changed how */ cpumask_copy(&new_cpus, cpu_active_mask); @@ -3724,7 +3725,7 @@ static void cpuset_hotplug_workfn(struct update_tasks_nodemask(&top_cpuset); }
- percpu_up_write(&cpuset_rwsem); + mutex_unlock(&cpuset_mutex);
/* if cpus or mems changed, we need to propagate to descendants */ if (cpus_updated || mems_updated) { @@ -4155,7 +4156,7 @@ void __cpuset_memory_pressure_bump(void) * - Used for /proc/<pid>/cpuset. * - No need to task_lock(tsk) on this tsk->cpuset reference, as it * doesn't really matter if tsk->cpuset changes after we read it, - * and we take cpuset_rwsem, keeping cpuset_attach() from changing it + * and we take cpuset_mutex, keeping cpuset_attach() from changing it * anyway. */ int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns, --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -7590,6 +7590,7 @@ static int __sched_setscheduler(struct t int reset_on_fork; int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK; struct rq *rq; + bool cpuset_locked = false;
/* The pi code expects interrupts enabled */ BUG_ON(pi && in_interrupt()); @@ -7639,8 +7640,14 @@ recheck: return retval; }
- if (pi) - cpuset_read_lock(); + /* + * SCHED_DEADLINE bandwidth accounting relies on stable cpusets + * information. + */ + if (dl_policy(policy) || dl_policy(p->policy)) { + cpuset_locked = true; + cpuset_lock(); + }
/* * Make sure no PI-waiters arrive (or leave) while we are @@ -7716,8 +7723,8 @@ change: if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) { policy = oldpolicy = -1; task_rq_unlock(rq, p, &rf); - if (pi) - cpuset_read_unlock(); + if (cpuset_locked) + cpuset_unlock(); goto recheck; }
@@ -7784,7 +7791,8 @@ change: task_rq_unlock(rq, p, &rf);
if (pi) { - cpuset_read_unlock(); + if (cpuset_locked) + cpuset_unlock(); rt_mutex_adjust_pi(p); }
@@ -7796,8 +7804,8 @@ change:
unlock: task_rq_unlock(rq, p, &rf); - if (pi) - cpuset_read_unlock(); + if (cpuset_locked) + cpuset_unlock(); return retval; }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juri Lelli juri.lelli@redhat.com
commit 6c24849f5515e4966d94fa5279bdff4acf2e9489 upstream.
Qais reported that iterating over all tasks when rebuilding root domains, in order to find out which ones are DEADLINE and need their bandwidth correctly restored on such root domains, can be a costly operation (10+ ms delays on suspend-resume).
To fix the problem keep track of the number of DEADLINE tasks belonging to each cpuset and then use this information (followup patch) to only perform the above iteration if DEADLINE tasks are actually present in the cpuset for which a corresponding root domain is being rebuilt.
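As a rough sketch of the idea (illustration only, not the literal hunks below): each cpuset carries a counter that the deadline class keeps up to date, and the expensive per-task walk can later be skipped whenever that counter is zero.

struct cpuset {
        /* ... existing members ... */
        int nr_deadline_tasks;  /* SCHED_DEADLINE tasks attached to this cpuset */
};

/* called from the deadline class when a task switches to/from SCHED_DEADLINE */
void inc_dl_tasks_cs(struct task_struct *p)
{
        task_cs(p)->nr_deadline_tasks++;
}

void dec_dl_tasks_cs(struct task_struct *p)
{
        task_cs(p)->nr_deadline_tasks--;
}

/* followup patch: the root domain rebuild can then bail out early */
static void dl_update_tasks_root_domain(struct cpuset *cs)
{
        if (cs->nr_deadline_tasks == 0)
                return;
        /* ... iterate the cpuset's tasks and restore their DL bandwidth ... */
}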
Reported-by: Qais Yousef (Google) qyousef@layalina.io Link: https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@layalina.io/ Signed-off-by: Juri Lelli juri.lelli@redhat.com Reviewed-by: Waiman Long longman@redhat.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Qais Yousef (Google) qyousef@layalina.io Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/cpuset.h | 4 ++++ kernel/cgroup/cgroup.c | 4 ++++ kernel/cgroup/cpuset.c | 25 +++++++++++++++++++++++++ kernel/sched/deadline.c | 14 ++++++++++++++ 4 files changed, 47 insertions(+)
--- a/include/linux/cpuset.h +++ b/include/linux/cpuset.h @@ -71,6 +71,8 @@ extern void cpuset_init_smp(void); extern void cpuset_force_rebuild(void); extern void cpuset_update_active_cpus(void); extern void cpuset_wait_for_hotplug(void); +extern void inc_dl_tasks_cs(struct task_struct *task); +extern void dec_dl_tasks_cs(struct task_struct *task); extern void cpuset_lock(void); extern void cpuset_unlock(void); extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask); @@ -189,6 +191,8 @@ static inline void cpuset_update_active_
static inline void cpuset_wait_for_hotplug(void) { }
+static inline void inc_dl_tasks_cs(struct task_struct *task) { } +static inline void dec_dl_tasks_cs(struct task_struct *task) { } static inline void cpuset_lock(void) { } static inline void cpuset_unlock(void) { }
--- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -57,6 +57,7 @@ #include <linux/file.h> #include <linux/fs_parser.h> #include <linux/sched/cputime.h> +#include <linux/sched/deadline.h> #include <linux/psi.h> #include <net/sock.h>
@@ -6696,6 +6697,9 @@ void cgroup_exit(struct task_struct *tsk list_add_tail(&tsk->cg_list, &cset->dying_tasks); cset->nr_tasks--;
+ if (dl_task(tsk)) + dec_dl_tasks_cs(tsk); + WARN_ON_ONCE(cgroup_task_frozen(tsk)); if (unlikely(!(tsk->flags & PF_KTHREAD) && test_bit(CGRP_FREEZE, &task_dfl_cgroup(tsk)->flags))) --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -193,6 +193,12 @@ struct cpuset { int use_parent_ecpus; int child_ecpus_count;
+ /* + * number of SCHED_DEADLINE tasks attached to this cpuset, so that we + * know when to rebuild associated root domain bandwidth information. + */ + int nr_deadline_tasks; + /* Invalid partition error code, not lock protected */ enum prs_errcode prs_err;
@@ -245,6 +251,20 @@ static inline struct cpuset *parent_cs(s return css_cs(cs->css.parent); }
+void inc_dl_tasks_cs(struct task_struct *p) +{ + struct cpuset *cs = task_cs(p); + + cs->nr_deadline_tasks++; +} + +void dec_dl_tasks_cs(struct task_struct *p) +{ + struct cpuset *cs = task_cs(p); + + cs->nr_deadline_tasks--; +} + /* bits in struct cpuset flags field */ typedef enum { CS_ONLINE, @@ -2499,6 +2519,11 @@ static int cpuset_can_attach(struct cgro ret = security_task_setscheduler(task); if (ret) goto out_unlock; + + if (dl_task(task)) { + cs->nr_deadline_tasks++; + cpuset_attach_old_cs->nr_deadline_tasks--; + } }
/* --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -16,6 +16,8 @@ * Fabio Checconi fchecconi@gmail.com */
+#include <linux/cpuset.h> + /* * Default limits for DL period; on the top end we guard against small util * tasks still getting ridiculously long effective runtimes, on the bottom end we @@ -2596,6 +2598,12 @@ static void switched_from_dl(struct rq * if (task_on_rq_queued(p) && p->dl.dl_runtime) task_non_contending(p);
+ /* + * In case a task is setscheduled out from SCHED_DEADLINE we need to + * keep track of that on its cpuset (for correct bandwidth tracking). + */ + dec_dl_tasks_cs(p); + if (!task_on_rq_queued(p)) { /* * Inactive timer is armed. However, p is leaving DEADLINE and @@ -2636,6 +2644,12 @@ static void switched_to_dl(struct rq *rq if (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1) put_task_struct(p);
+ /* + * In case a task is setscheduled to SCHED_DEADLINE we need to keep + * track of that on its cpuset (for correct bandwidth tracking). + */ + inc_dl_tasks_cs(p); + /* If p is not queued we will update its parameters at next wakeup. */ if (!task_on_rq_queued(p)) { add_rq_bw(&p->dl, &rq->dl);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juri Lelli juri.lelli@redhat.com
commit c0f78fd5edcf29b2822ac165f9248a6c165e8554 upstream.
update_tasks_root_domain currently iterates over all tasks even if no DEADLINE task is present on the cpuset/root domain for which bandwidth accounting is being rebuilt. This has been reported to introduce 10+ ms delays on suspend-resume operations.
Skip the costly iteration for cpusets that don't contain DEADLINE tasks.
Reported-by: Qais Yousef (Google) qyousef@layalina.io Link: https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@layalina.io/ Signed-off-by: Juri Lelli juri.lelli@redhat.com Reviewed-by: Waiman Long longman@redhat.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Qais Yousef (Google) qyousef@layalina.io Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/cgroup/cpuset.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1092,6 +1092,9 @@ static void dl_update_tasks_root_domain( struct css_task_iter it; struct task_struct *task;
+ if (cs->nr_deadline_tasks == 0) + return; + css_task_iter_start(&cs->css, 0, &it);
while ((task = css_task_iter_next(&it)))
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dietmar Eggemann dietmar.eggemann@arm.com
commit 85989106feb734437e2d598b639991b9185a43a6 upstream.
While moving a set of tasks between exclusive cpusets, cpuset_can_attach() -> task_can_attach() calls dl_cpu_busy(..., p) for DL BW overflow checking and per-task DL BW allocation on the destination root_domain for the DL tasks in this set.
This approach has the issue of not freeing already allocated DL BW in the following error cases:
(1) The set of tasks includes multiple DL tasks and DL BW overflow checking fails for one of the subsequent DL tasks.
(2) Another controller next to the cpuset controller which is attached to the same cgroup fails in its can_attach().
To address this problem rework dl_cpu_busy():
(1) Split it into dl_bw_check_overflow() & dl_bw_alloc() and add a dedicated dl_bw_free().
(2) dl_bw_alloc() & dl_bw_free() take a `u64 dl_bw` parameter instead of the `struct task_struct *p` used in dl_cpu_busy(). This allows DL BW to be allocated for a whole set of tasks rather than only for a single task.
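As an illustration of the new interface (hypothetical caller, not part of the patch): bandwidth for a whole set of tasks can be reserved in one go and rolled back if a later step fails. Only dl_bw_alloc()/dl_bw_free() come from this patch; the helper and its name are made up for the example.

/* hypothetical caller sketch */
static int reserve_dl_bw_for_set(int cpu, struct task_struct **tasks, int n)
{
        u64 sum = 0;
        int i, ret;

        for (i = 0; i < n; i++)
                if (dl_task(tasks[i]))
                        sum += tasks[i]->dl.dl_bw;

        if (!sum)
                return 0;

        ret = dl_bw_alloc(cpu, sum);    /* returns -EBUSY on overflow */
        if (ret)
                return ret;

        /* if a later step fails, undo the reservation: dl_bw_free(cpu, sum); */
        return 0;
}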
Signed-off-by: Dietmar Eggemann dietmar.eggemann@arm.com Signed-off-by: Juri Lelli juri.lelli@redhat.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Qais Yousef (Google) qyousef@layalina.io Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/sched.h | 2 + kernel/sched/core.c | 4 +-- kernel/sched/deadline.c | 53 ++++++++++++++++++++++++++++++++++++------------ kernel/sched/sched.h | 2 - 4 files changed, 45 insertions(+), 16 deletions(-)
--- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1853,6 +1853,8 @@ current_restore_flags(unsigned long orig
extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial); extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_effective_cpus); +extern int dl_bw_alloc(int cpu, u64 dl_bw); +extern void dl_bw_free(int cpu, u64 dl_bw); #ifdef CONFIG_SMP extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask); extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask); --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -9319,7 +9319,7 @@ int task_can_attach(struct task_struct *
if (unlikely(cpu >= nr_cpu_ids)) return -EINVAL; - ret = dl_cpu_busy(cpu, p); + ret = dl_bw_alloc(cpu, p->dl.dl_bw); }
out: @@ -9604,7 +9604,7 @@ static void cpuset_cpu_active(void) static int cpuset_cpu_inactive(unsigned int cpu) { if (!cpuhp_tasks_frozen) { - int ret = dl_cpu_busy(cpu, NULL); + int ret = dl_bw_check_overflow(cpu);
if (ret) return ret; --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -3058,26 +3058,38 @@ int dl_cpuset_cpumask_can_shrink(const s return ret; }
-int dl_cpu_busy(int cpu, struct task_struct *p) +enum dl_bw_request { + dl_bw_req_check_overflow = 0, + dl_bw_req_alloc, + dl_bw_req_free +}; + +static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw) { - unsigned long flags, cap; + unsigned long flags; struct dl_bw *dl_b; - bool overflow; + bool overflow = 0;
rcu_read_lock_sched(); dl_b = dl_bw_of(cpu); raw_spin_lock_irqsave(&dl_b->lock, flags); - cap = dl_bw_capacity(cpu); - overflow = __dl_overflow(dl_b, cap, 0, p ? p->dl.dl_bw : 0);
- if (!overflow && p) { - /* - * We reserve space for this task in the destination - * root_domain, as we can't fail after this point. - * We will free resources in the source root_domain - * later on (see set_cpus_allowed_dl()). - */ - __dl_add(dl_b, p->dl.dl_bw, dl_bw_cpus(cpu)); + if (req == dl_bw_req_free) { + __dl_sub(dl_b, dl_bw, dl_bw_cpus(cpu)); + } else { + unsigned long cap = dl_bw_capacity(cpu); + + overflow = __dl_overflow(dl_b, cap, 0, dl_bw); + + if (req == dl_bw_req_alloc && !overflow) { + /* + * We reserve space in the destination + * root_domain, as we can't fail after this point. + * We will free resources in the source root_domain + * later on (see set_cpus_allowed_dl()). + */ + __dl_add(dl_b, dl_bw, dl_bw_cpus(cpu)); + } }
raw_spin_unlock_irqrestore(&dl_b->lock, flags); @@ -3085,6 +3097,21 @@ int dl_cpu_busy(int cpu, struct task_str
return overflow ? -EBUSY : 0; } + +int dl_bw_check_overflow(int cpu) +{ + return dl_bw_manage(dl_bw_req_check_overflow, cpu, 0); +} + +int dl_bw_alloc(int cpu, u64 dl_bw) +{ + return dl_bw_manage(dl_bw_req_alloc, cpu, dl_bw); +} + +void dl_bw_free(int cpu, u64 dl_bw) +{ + dl_bw_manage(dl_bw_req_free, cpu, dl_bw); +} #endif
#ifdef CONFIG_SCHED_DEBUG --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -330,7 +330,7 @@ extern void __getparam_dl(struct task_st extern bool __checkparam_dl(const struct sched_attr *attr); extern bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr); extern int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial); -extern int dl_cpu_busy(int cpu, struct task_struct *p); +extern int dl_bw_check_overflow(int cpu);
#ifdef CONFIG_CGROUP_SCHED
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dietmar Eggemann dietmar.eggemann@arm.com
commit 2ef269ef1ac006acf974793d975539244d77b28f upstream.
cpuset_can_attach() can fail. Postpone DL BW allocation until all tasks have been checked. DL BW is not allocated per-task but as a sum over all DL tasks migrating.
If multiple controllers are attached to the cgroup next to the cpuset controller a non-cpuset can_attach() can fail. In this case free DL BW in cpuset_cancel_attach().
Finally, update cpuset DL task count (nr_deadline_tasks) only in cpuset_attach().
Suggested-by: Waiman Long longman@redhat.com Signed-off-by: Dietmar Eggemann dietmar.eggemann@arm.com Signed-off-by: Juri Lelli juri.lelli@redhat.com Reviewed-by: Waiman Long longman@redhat.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Qais Yousef (Google) qyousef@layalina.io Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/sched.h | 2 - kernel/cgroup/cpuset.c | 53 ++++++++++++++++++++++++++++++++++++++++++++----- kernel/sched/core.c | 17 +-------------- 3 files changed, 51 insertions(+), 21 deletions(-)
--- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1852,7 +1852,7 @@ current_restore_flags(unsigned long orig }
extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial); -extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_effective_cpus); +extern int task_can_attach(struct task_struct *p); extern int dl_bw_alloc(int cpu, u64 dl_bw); extern void dl_bw_free(int cpu, u64 dl_bw); #ifdef CONFIG_SMP --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -198,6 +198,8 @@ struct cpuset { * know when to rebuild associated root domain bandwidth information. */ int nr_deadline_tasks; + int nr_migrate_dl_tasks; + u64 sum_migrate_dl_bw;
/* Invalid partition error code, not lock protected */ enum prs_errcode prs_err; @@ -2496,16 +2498,23 @@ static int cpuset_can_attach_check(struc return 0; }
+static void reset_migrate_dl_data(struct cpuset *cs) +{ + cs->nr_migrate_dl_tasks = 0; + cs->sum_migrate_dl_bw = 0; +} + /* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */ static int cpuset_can_attach(struct cgroup_taskset *tset) { struct cgroup_subsys_state *css; - struct cpuset *cs; + struct cpuset *cs, *oldcs; struct task_struct *task; int ret;
/* used later by cpuset_attach() */ cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css)); + oldcs = cpuset_attach_old_cs; cs = css_cs(css);
mutex_lock(&cpuset_mutex); @@ -2516,7 +2525,7 @@ static int cpuset_can_attach(struct cgro goto out_unlock;
cgroup_taskset_for_each(task, css, tset) { - ret = task_can_attach(task, cs->effective_cpus); + ret = task_can_attach(task); if (ret) goto out_unlock; ret = security_task_setscheduler(task); @@ -2524,11 +2533,31 @@ static int cpuset_can_attach(struct cgro goto out_unlock;
if (dl_task(task)) { - cs->nr_deadline_tasks++; - cpuset_attach_old_cs->nr_deadline_tasks--; + cs->nr_migrate_dl_tasks++; + cs->sum_migrate_dl_bw += task->dl.dl_bw; } }
+ if (!cs->nr_migrate_dl_tasks) + goto out_success; + + if (!cpumask_intersects(oldcs->effective_cpus, cs->effective_cpus)) { + int cpu = cpumask_any_and(cpu_active_mask, cs->effective_cpus); + + if (unlikely(cpu >= nr_cpu_ids)) { + reset_migrate_dl_data(cs); + ret = -EINVAL; + goto out_unlock; + } + + ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw); + if (ret) { + reset_migrate_dl_data(cs); + goto out_unlock; + } + } + +out_success: /* * Mark attach is in progress. This makes validate_change() fail * changes which zero cpus/mems_allowed. @@ -2551,6 +2580,14 @@ static void cpuset_cancel_attach(struct cs->attach_in_progress--; if (!cs->attach_in_progress) wake_up(&cpuset_attach_wq); + + if (cs->nr_migrate_dl_tasks) { + int cpu = cpumask_any(cs->effective_cpus); + + dl_bw_free(cpu, cs->sum_migrate_dl_bw); + reset_migrate_dl_data(cs); + } + mutex_unlock(&cpuset_mutex); }
@@ -2651,6 +2688,12 @@ static void cpuset_attach(struct cgroup_ out: cs->old_mems_allowed = cpuset_attach_nodemask_to;
+ if (cs->nr_migrate_dl_tasks) { + cs->nr_deadline_tasks += cs->nr_migrate_dl_tasks; + oldcs->nr_deadline_tasks -= cs->nr_migrate_dl_tasks; + reset_migrate_dl_data(cs); + } + cs->attach_in_progress--; if (!cs->attach_in_progress) wake_up(&cpuset_attach_wq); @@ -3330,7 +3373,7 @@ static int cpuset_can_fork(struct task_s if (ret) goto out_unlock;
- ret = task_can_attach(task, cs->effective_cpus); + ret = task_can_attach(task); if (ret) goto out_unlock;
--- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -9294,8 +9294,7 @@ int cpuset_cpumask_can_shrink(const stru return ret; }
-int task_can_attach(struct task_struct *p, - const struct cpumask *cs_effective_cpus) +int task_can_attach(struct task_struct *p) { int ret = 0;
@@ -9308,21 +9307,9 @@ int task_can_attach(struct task_struct * * success of set_cpus_allowed_ptr() on all attached tasks * before cpus_mask may be changed. */ - if (p->flags & PF_NO_SETAFFINITY) { + if (p->flags & PF_NO_SETAFFINITY) ret = -EINVAL; - goto out; - }
- if (dl_task(p) && !cpumask_intersects(task_rq(p)->rd->span, - cs_effective_cpus)) { - int cpu = cpumask_any_and(cpu_active_mask, cs_effective_cpus); - - if (unlikely(cpu >= nr_cpu_ids)) - return -EINVAL; - ret = dl_bw_alloc(cpu, p->dl.dl_bw); - } - -out: return ret; }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Ghiti alexghiti@rivosinc.com
commit a50420c79731fc5cf27ad43719c1091e842a2606 upstream.
flush_cache_vmap() must be called after new vmalloc mappings are installed in the page table in order to allow architectures to make sure the new mapping is visible.
It could lead to a panic since on some architectures (like powerpc), the page table walker could see the wrong pte value and trigger a spurious page fault that can not be resolved (see commit f1cb8f9beba8 ("powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags")).
But actually the patch is aiming at riscv: the riscv specification allows the caching of invalid entries in the TLB, and since we recently removed the vmalloc page fault handling, we now need to emit a tlb shootdown whenever a new vmalloc mapping is emitted (https://lore.kernel.org/linux-riscv/20230725132246.817726-1-alexghiti@rivosi...). That's a temporary solution, there are ways to avoid that :)
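A minimal sketch of the required ordering (install_mapping() is a hypothetical helper, not a kernel API): the page-table entries must be installed before flush_cache_vmap() announces the new range to the architecture, which is exactly what the one-line fix below does for vmap_pfn().

/* sketch only: map first, then make the new mapping visible */
void *addr = area->addr;

if (install_mapping(addr, pfns, count))         /* hypothetical helper */
        return NULL;

flush_cache_vmap((unsigned long)addr,
                 (unsigned long)addr + count * PAGE_SIZE);
return addr;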
Link: https://lkml.kernel.org/r/20230809164633.1556126-1-alexghiti@rivosinc.com Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function") Reported-by: Dylan Jhong dylan@andestech.com Closes: https://lore.kernel.org/linux-riscv/ZMytNY2J8iyjbPPy@atctrx.andestech.com/ Signed-off-by: Alexandre Ghiti alexghiti@rivosinc.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Palmer Dabbelt palmer@rivosinc.com Acked-by: Palmer Dabbelt palmer@rivosinc.com Reviewed-by: Dylan Jhong dylan@andestech.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/vmalloc.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2929,6 +2929,10 @@ void *vmap_pfn(unsigned long *pfns, unsi free_vm_area(area); return NULL; } + + flush_cache_vmap((unsigned long)area->addr, + (unsigned long)area->addr + count * PAGE_SIZE); + return area->addr; } EXPORT_SYMBOL_GPL(vmap_pfn);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miaohe Lin linmiaohe@huawei.com
commit e2c1ab070fdc81010ec44634838d24fce9ff9e53 upstream.
When page_handle_poison() fails to handle the hugepage or free page in the retry path, soft_offline_page() will return 0 while -EBUSY is expected in this case.
Consequently the user will think soft_offline_page() succeeded when it in fact failed, and so will not try again later in this case.
Link: https://lkml.kernel.org/r/20230627112808.1275241-1-linmiaohe@huawei.com Fixes: b94e02822deb ("mm,hwpoison: try to narrow window race for free pages") Signed-off-by: Miaohe Lin linmiaohe@huawei.com Acked-by: Naoya Horiguchi naoya.horiguchi@nec.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memory-failure.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -2744,10 +2744,13 @@ retry: if (ret > 0) { ret = soft_offline_in_use_page(page); } else if (ret == 0) { - if (!page_handle_poison(page, true, false) && try_again) { - try_again = false; - flags &= ~MF_COUNT_INCREASED; - goto retry; + if (!page_handle_poison(page, true, false)) { + if (try_again) { + try_again = false; + flags &= ~MF_COUNT_INCREASED; + goto retry; + } + ret = -EBUSY; } }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: T.J. Mercier tjmercier@google.com
commit 6867c7a3320669cbe44b905a3eb35db725c6d470 upstream.
When a memcg is in the process of being released mem_cgroup_tryget will fail because its reference count has already reached 0. This can happen during reclaim if the memcg has already been offlined, and we reclaim all remaining pages attributed to the offlined memcg. shrink_many attempts to skip the empty memcg in this case, and continue reclaiming from the remaining memcgs in the old generation. If there is only one memcg remaining, or if all remaining memcgs are in the process of being released then shrink_many will spin until all memcgs have finished being released. The release occurs through a workqueue, so it can take a while before kswapd is able to make any further progress.
This fix results in reductions in kswapd activity and direct reclaim in a test where 28 apps (working set size > total memory) are repeatedly launched in a random sequence:
                                    A        B    delta  ratio(%)
allocstall_movable               5962     3539    -2423    -40.64
allocstall_normal                2661     2417     -244     -9.17
kswapd_high_wmark_hit_quickly   53152     7594   -45558    -85.71
pageoutrun                      57365    11750   -45615    -79.52
Link: https://lkml.kernel.org/r/20230814151636.1639123-1-tjmercier@google.com Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists") Signed-off-by: T.J. Mercier tjmercier@google.com Acked-by: Yu Zhao yuzhao@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/vmscan.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4818,16 +4818,17 @@ void lru_gen_release_memcg(struct mem_cg
spin_lock_irq(&pgdat->memcg_lru.lock);
- VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list)); + if (hlist_nulls_unhashed(&lruvec->lrugen.list)) + goto unlock;
gen = lruvec->lrugen.gen;
- hlist_nulls_del_rcu(&lruvec->lrugen.list); + hlist_nulls_del_init_rcu(&lruvec->lrugen.list); pgdat->memcg_lru.nr_memcgs[gen]--;
if (!pgdat->memcg_lru.nr_memcgs[gen] && gen == get_memcg_gen(pgdat->memcg_lru.seq)) WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1); - +unlock: spin_unlock_irq(&pgdat->memcg_lru.lock); } } @@ -5398,8 +5399,10 @@ restart: rcu_read_lock();
hlist_nulls_for_each_entry_rcu(lrugen, pos, &pgdat->memcg_lru.fifo[gen][bin], list) { - if (op) + if (op) { lru_gen_rotate_memcg(lruvec, op); + op = 0; + }
mem_cgroup_put(memcg);
@@ -5407,7 +5410,7 @@ restart: memcg = lruvec_memcg(lruvec);
if (!mem_cgroup_tryget(memcg)) { - op = 0; + lru_gen_release_memcg(memcg); memcg = NULL; continue; }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryusuke Konishi konishi.ryusuke@gmail.com
commit f83913f8c5b882a312e72b7669762f8a5c9385e4 upstream.
A syzbot stress test reported that create_empty_buffers() called from nilfs_lookup_dirty_data_buffers() can cause a general protection fault.
Analysis using its reproducer revealed that the back reference "mapping" from a page/folio has been changed to NULL after dirty page/folio gang lookup in nilfs_lookup_dirty_data_buffers().
Fix this issue by excluding pages/folios from being collected if, after acquiring a lock on each page/folio, its back reference "mapping" differs from the pointer to the address space struct that held the page/folio.
Link: https://lkml.kernel.org/r/20230805132038.6435-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Reported-by: syzbot+0ad741797f4565e7e2d2@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/0000000000002930a705fc32b231@google.com Tested-by: Ryusuke Konishi konishi.ryusuke@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/segment.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -725,6 +725,11 @@ static size_t nilfs_lookup_dirty_data_bu struct folio *folio = fbatch.folios[i];
folio_lock(folio); + if (unlikely(folio->mapping != mapping)) { + /* Exclude folios removed from the address space */ + folio_unlock(folio); + continue; + } head = folio_buffers(folio); if (!head) { create_empty_buffers(&folio->page, i_blocksize(inode), 0);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
commit be2fd1560eb57b7298aa3c258ddcca0d53ecdea3 upstream.
Be more careful when tearing down the subrequests of an O_DIRECT write as part of a retransmission.
Reported-by: Chris Mason clm@fb.com Fixes: ed5d588fe47f ("NFS: Try to join page groups before an O_DIRECT retransmission") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfs/direct.c | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-)
--- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -472,20 +472,26 @@ out: return result; }
-static void -nfs_direct_join_group(struct list_head *list, struct inode *inode) +static void nfs_direct_join_group(struct list_head *list, struct inode *inode) { - struct nfs_page *req, *next; + struct nfs_page *req, *subreq;
list_for_each_entry(req, list, wb_list) { - if (req->wb_head != req || req->wb_this_page == req) + if (req->wb_head != req) continue; - for (next = req->wb_this_page; - next != req->wb_head; - next = next->wb_this_page) { - nfs_list_remove_request(next); - nfs_release_request(next); - } + subreq = req->wb_this_page; + if (subreq == req) + continue; + do { + /* + * Remove subrequests from this list before freeing + * them in the call to nfs_join_page_group(). + */ + if (!list_empty(&subreq->wb_list)) { + nfs_list_remove_request(subreq); + nfs_release_request(subreq); + } + } while ((subreq = subreq->wb_this_page) != req); nfs_join_page_group(req, inode); } }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Coddington bcodding@redhat.com
commit 3b816601e279756e781e6c4d9b3f3bd21a72ac67 upstream.
We have some reports of linux NFS clients that cannot satisfy a linux knfsd server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though those clients repeatedly walk all their known state using TEST_STATEID and receive NFS4_OK for all.
It's possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then nfsd4_free_stateid() finds the delegation and returns NFS4_OK to FREE_STATEID. Afterward, revoke_delegation() moves the same delegation to cl_revoked. This would produce the observed client/server effect.
Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID and move to cl_revoked happens within the same cl_lock. This will allow nfsd4_free_stateid() to properly remove the delegation from cl_revoked.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575 Signed-off-by: Benjamin Coddington bcodding@redhat.com Cc: stable@vger.kernel.org # v4.17+ Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfs4state.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1354,9 +1354,9 @@ static void revoke_delegation(struct nfs trace_nfsd_stid_revoke(&dp->dl_stid);
if (clp->cl_minorversion) { + spin_lock(&clp->cl_lock); dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID; refcount_inc(&dp->dl_stid.sc_count); - spin_lock(&clp->cl_lock); list_add(&dp->dl_recall_lru, &clp->cl_revoked); spin_unlock(&clp->cl_lock); }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Göttsche cgzones@googlemail.com
commit 70d91dc9b2ac91327d0eefd86163abc3548effa6 upstream.
Set the next pointer in filename_trans_read_helper() before attaching the new node under construction to the list, otherwise garbage would be dereferenced on subsequent failure during cleanup in the out goto label.
Cc: stable@vger.kernel.org Fixes: 430059024389 ("selinux: implement new format of filename transitions") Signed-off-by: Christian Göttsche cgzones@googlemail.com Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/selinux/ss/policydb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/security/selinux/ss/policydb.c +++ b/security/selinux/ss/policydb.c @@ -2005,6 +2005,7 @@ static int filename_trans_read_helper(st if (!datum) goto out;
+ datum->next = NULL; *dst = datum;
/* ebitmap_read() will at least init the bitmap */ @@ -2017,7 +2018,6 @@ static int filename_trans_read_helper(st goto out;
datum->otype = le32_to_cpu(buf[0]); - datum->next = NULL;
dst = &datum->next; }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sven Eckelmann sven@narfation.org
commit c6a953cce8d0438391e6da48c8d0793d3fbfcfa6 upstream.
If an interface changes the MTU, it is expected that NETDEV_PRECHANGEMTU and NETDEV_CHANGEMTU notification events are triggered. This worked fine for .ndo_change_mtu based changes because core networking code took care of it. But for auto-adjustments after hard-interface changes, these events were simply missing.
Due to this problem, non-batman-adv components weren't aware of MTU changes and thus couldn't perform their own tasks correctly.
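The difference, roughly (illustration only): assigning the field directly bypasses the netdevice notifier chain, while dev_set_mtu() emits the events that other components listen for.

soft_iface->mtu = new_mtu;              /* silent: no notifier ever fires */

dev_set_mtu(soft_iface, new_mtu);       /* fires NETDEV_PRECHANGEMTU and
                                         * NETDEV_CHANGEMTU for listeners
                                         */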
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") Cc: stable@vger.kernel.org Signed-off-by: Sven Eckelmann sven@narfation.org Signed-off-by: Simon Wunderlich sw@simonwunderlich.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/batman-adv/hard-interface.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/batman-adv/hard-interface.c +++ b/net/batman-adv/hard-interface.c @@ -630,7 +630,7 @@ out: */ void batadv_update_min_mtu(struct net_device *soft_iface) { - soft_iface->mtu = batadv_hardif_min_mtu(soft_iface); + dev_set_mtu(soft_iface, batadv_hardif_min_mtu(soft_iface));
/* Check if the local translate table should be cleaned up to match a * new (and smaller) MTU.
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sven Eckelmann sven@narfation.org
commit d8e42a2b0addf238be8b3b37dcd9795a5c1be459 upstream.
If the user set an MTU value, it usually means that there are special requirements for the MTU. But if an interface got activated, the MTU was always recalculated and the user-set value was overwritten.
The only reason why this user-set value has to be overwritten is when the MTU has to be decreased because batman-adv is not able to transfer packets of the user-specified size.
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") Cc: stable@vger.kernel.org Signed-off-by: Sven Eckelmann sven@narfation.org Signed-off-by: Simon Wunderlich sw@simonwunderlich.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/batman-adv/hard-interface.c | 14 +++++++++++++- net/batman-adv/soft-interface.c | 3 +++ net/batman-adv/types.h | 6 ++++++ 3 files changed, 22 insertions(+), 1 deletion(-)
--- a/net/batman-adv/hard-interface.c +++ b/net/batman-adv/hard-interface.c @@ -630,7 +630,19 @@ out: */ void batadv_update_min_mtu(struct net_device *soft_iface) { - dev_set_mtu(soft_iface, batadv_hardif_min_mtu(soft_iface)); + struct batadv_priv *bat_priv = netdev_priv(soft_iface); + int limit_mtu; + int mtu; + + mtu = batadv_hardif_min_mtu(soft_iface); + + if (bat_priv->mtu_set_by_user) + limit_mtu = bat_priv->mtu_set_by_user; + else + limit_mtu = ETH_DATA_LEN; + + mtu = min(mtu, limit_mtu); + dev_set_mtu(soft_iface, mtu);
/* Check if the local translate table should be cleaned up to match a * new (and smaller) MTU. --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -153,11 +153,14 @@ static int batadv_interface_set_mac_addr
static int batadv_interface_change_mtu(struct net_device *dev, int new_mtu) { + struct batadv_priv *bat_priv = netdev_priv(dev); + /* check ranges */ if (new_mtu < 68 || new_mtu > batadv_hardif_min_mtu(dev)) return -EINVAL;
dev->mtu = new_mtu; + bat_priv->mtu_set_by_user = new_mtu;
return 0; } --- a/net/batman-adv/types.h +++ b/net/batman-adv/types.h @@ -1547,6 +1547,12 @@ struct batadv_priv { struct net_device *soft_iface;
/** + * @mtu_set_by_user: MTU was set once by user + * protected by rtnl_lock + */ + int mtu_set_by_user; + + /** * @bat_counters: mesh internal traffic statistic counters (see * batadv_counters) */
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Remi Pommarel repk@triplefau.lt
commit eac27a41ab641de074655d2932fc7f8cdb446881 upstream.
If the received skb in batadv_v_elp_packet_recv or batadv_v_ogm_packet_recv is either cloned or non-linearized, then its data buffer will be reallocated by batadv_check_management_packet when skb_cow or skb_linearize gets called. Thus getting the ethernet header address inside the skb data buffer before batadv_check_management_packet had any chance to reallocate it could lead to the following kernel panic:
Unable to handle kernel paging request at virtual address ffffff8020ab069a
Mem abort info:
  ESR = 0x96000007
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x07: level 3 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000007
  CM = 0, WnR = 0
swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000040f45000
[ffffff8020ab069a] pgd=180000007fffa003, p4d=180000007fffa003, pud=180000007fffa003, pmd=180000007fefe003, pte=0068000020ab0706
Internal error: Oops: 96000007 [#1] SMP
Modules linked in: ahci_mvebu libahci_platform libahci dvb_usb_af9035 dvb_usb_dib0700 dib0070 dib7000m dibx000_common ath11k_pci ath10k_pci ath10k_core mwl8k_new nf_nat_sip nf_conntrack_sip xhci_plat_hcd xhci_hcd nf_nat_pptp nf_conntrack_pptp at24 sbsa_gwdt
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.15.42-00066-g3242268d425c-dirty #550
Hardware name: A8k (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : batadv_is_my_mac+0x60/0xc0
lr : batadv_v_ogm_packet_recv+0x98/0x5d0
sp : ffffff8000183820
x29: ffffff8000183820 x28: 0000000000000001 x27: ffffff8014f9af00
x26: 0000000000000000 x25: 0000000000000543 x24: 0000000000000003
x23: ffffff8020ab0580 x22: 0000000000000110 x21: ffffff80168ae880
x20: 0000000000000000 x19: ffffff800b561000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 00dc098924ae0032
x14: 0f0405433e0054b0 x13: ffffffff00000080 x12: 0000004000000001
x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
x8 : 0000000000000000 x7 : ffffffc076dae000 x6 : ffffff8000183700
x5 : ffffffc00955e698 x4 : ffffff80168ae000 x3 : ffffff80059cf000
x2 : ffffff800b561000 x1 : ffffff8020ab0696 x0 : ffffff80168ae880
Call trace:
 batadv_is_my_mac+0x60/0xc0
 batadv_v_ogm_packet_recv+0x98/0x5d0
 batadv_batman_skb_recv+0x1b8/0x244
 __netif_receive_skb_core.isra.0+0x440/0xc74
 __netif_receive_skb_one_core+0x14/0x20
 netif_receive_skb+0x68/0x140
 br_pass_frame_up+0x70/0x80
 br_handle_frame_finish+0x108/0x284
 br_handle_frame+0x190/0x250
 __netif_receive_skb_core.isra.0+0x240/0xc74
 __netif_receive_skb_list_core+0x6c/0x90
 netif_receive_skb_list_internal+0x1f4/0x310
 napi_complete_done+0x64/0x1d0
 gro_cell_poll+0x7c/0xa0
 __napi_poll+0x34/0x174
 net_rx_action+0xf8/0x2a0
 _stext+0x12c/0x2ac
 run_ksoftirqd+0x4c/0x7c
 smpboot_thread_fn+0x120/0x210
 kthread+0x140/0x150
 ret_from_fork+0x10/0x20
Code: f9403844 eb03009f 54fffee1 f94
Thus the ethernet header address should only be fetched after batadv_check_management_packet() has been called.
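In rough terms (sketch only; hlen stands for the respective header length constant):

/* broken ordering: ethhdr may point into a buffer that
 * batadv_check_management_packet() reallocates via skb_cow()/skb_linearize()
 */
ethhdr = eth_hdr(skb);
if (!batadv_check_management_packet(skb, if_incoming, hlen))
        goto free_skb;
/* ethhdr can now be dangling */

/* fixed ordering, as in the patch below */
if (!batadv_check_management_packet(skb, if_incoming, hlen))
        goto free_skb;
ethhdr = eth_hdr(skb);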
Fixes: 0da0035942d4 ("batman-adv: OGMv2 - add basic infrastructure") Cc: stable@vger.kernel.org Signed-off-by: Remi Pommarel repk@triplefau.lt Signed-off-by: Sven Eckelmann sven@narfation.org Signed-off-by: Simon Wunderlich sw@simonwunderlich.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/batman-adv/bat_v_elp.c | 3 ++- net/batman-adv/bat_v_ogm.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-)
--- a/net/batman-adv/bat_v_elp.c +++ b/net/batman-adv/bat_v_elp.c @@ -505,7 +505,7 @@ int batadv_v_elp_packet_recv(struct sk_b struct batadv_priv *bat_priv = netdev_priv(if_incoming->soft_iface); struct batadv_elp_packet *elp_packet; struct batadv_hard_iface *primary_if; - struct ethhdr *ethhdr = (struct ethhdr *)skb_mac_header(skb); + struct ethhdr *ethhdr; bool res; int ret = NET_RX_DROP;
@@ -513,6 +513,7 @@ int batadv_v_elp_packet_recv(struct sk_b if (!res) goto free_skb;
+ ethhdr = eth_hdr(skb); if (batadv_is_my_mac(bat_priv, ethhdr->h_source)) goto free_skb;
--- a/net/batman-adv/bat_v_ogm.c +++ b/net/batman-adv/bat_v_ogm.c @@ -985,7 +985,7 @@ int batadv_v_ogm_packet_recv(struct sk_b { struct batadv_priv *bat_priv = netdev_priv(if_incoming->soft_iface); struct batadv_ogm2_packet *ogm_packet; - struct ethhdr *ethhdr = eth_hdr(skb); + struct ethhdr *ethhdr; int ogm_offset; u8 *packet_pos; int ret = NET_RX_DROP; @@ -999,6 +999,7 @@ int batadv_v_ogm_packet_recv(struct sk_b if (!batadv_check_management_packet(skb, if_incoming, BATADV_OGM2_HLEN)) goto free_skb;
+ ethhdr = eth_hdr(skb); if (batadv_is_my_mac(bat_priv, ethhdr->h_source)) goto free_skb;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Remi Pommarel repk@triplefau.lt
commit d25ddb7e788d34cf27ff1738d11a87cb4b67d446 upstream.
When a client roamed back to a node before it had time to destroy the pending local entry (i.e. within the same originator interval), the old global one is directly removed from the hash table and left as such.
But because this entry had an extra reference taken at lookup (i.e. using batadv_tt_global_hash_find), there is no way its memory will ever be reclaimed, causing the following memory leak:
unreferenced object 0xffff0000073c8000 (size 18560):
  comm "softirq", pid 0, jiffies 4294907738 (age 228.644s)
  hex dump (first 32 bytes):
    06 31 ac 12 c7 7a 05 00 01 00 00 00 00 00 00 00  .1...z..........
    2c ad be 08 00 80 ff ff 6c b6 be 08 00 80 ff ff  ,.......l.......
  backtrace:
    [<00000000ee6e0ffa>] kmem_cache_alloc+0x1b4/0x300
    [<000000000ff2fdbc>] batadv_tt_global_add+0x700/0xe20
    [<00000000443897c7>] _batadv_tt_update_changes+0x21c/0x790
    [<000000005dd90463>] batadv_tt_update_changes+0x3c/0x110
    [<00000000a2d7fc57>] batadv_tt_tvlv_unicast_handler_v1+0xafc/0xe10
    [<0000000011793f2a>] batadv_tvlv_containers_process+0x168/0x2b0
    [<00000000b7cbe2ef>] batadv_recv_unicast_tvlv+0xec/0x1f4
    [<0000000042aef1d8>] batadv_batman_skb_recv+0x25c/0x3a0
    [<00000000bbd8b0a2>] __netif_receive_skb_core.isra.0+0x7a8/0xe90
    [<000000004033d428>] __netif_receive_skb_one_core+0x64/0x74
    [<000000000f39a009>] __netif_receive_skb+0x48/0xe0
    [<00000000f2cd8888>] process_backlog+0x174/0x344
    [<00000000507d6564>] __napi_poll+0x58/0x1f4
    [<00000000b64ef9eb>] net_rx_action+0x504/0x590
    [<00000000056fa5e4>] _stext+0x1b8/0x418
    [<00000000878879d6>] run_ksoftirqd+0x74/0xa4
unreferenced object 0xffff00000bae1a80 (size 56):
  comm "softirq", pid 0, jiffies 4294910888 (age 216.092s)
  hex dump (first 32 bytes):
    00 78 b1 0b 00 00 ff ff 0d 50 00 00 00 00 00 00  .x.......P......
    00 00 00 00 00 00 00 00 50 c8 3c 07 00 00 ff ff  ........P.<.....
  backtrace:
    [<00000000ee6e0ffa>] kmem_cache_alloc+0x1b4/0x300
    [<00000000d9aaa49e>] batadv_tt_global_add+0x53c/0xe20
    [<00000000443897c7>] _batadv_tt_update_changes+0x21c/0x790
    [<000000005dd90463>] batadv_tt_update_changes+0x3c/0x110
    [<00000000a2d7fc57>] batadv_tt_tvlv_unicast_handler_v1+0xafc/0xe10
    [<0000000011793f2a>] batadv_tvlv_containers_process+0x168/0x2b0
    [<00000000b7cbe2ef>] batadv_recv_unicast_tvlv+0xec/0x1f4
    [<0000000042aef1d8>] batadv_batman_skb_recv+0x25c/0x3a0
    [<00000000bbd8b0a2>] __netif_receive_skb_core.isra.0+0x7a8/0xe90
    [<000000004033d428>] __netif_receive_skb_one_core+0x64/0x74
    [<000000000f39a009>] __netif_receive_skb+0x48/0xe0
    [<00000000f2cd8888>] process_backlog+0x174/0x344
    [<00000000507d6564>] __napi_poll+0x58/0x1f4
    [<00000000b64ef9eb>] net_rx_action+0x504/0x590
    [<00000000056fa5e4>] _stext+0x1b8/0x418
    [<00000000878879d6>] run_ksoftirqd+0x74/0xa4
Releasing the extra reference from batadv_tt_global_hash_find(), even on roam-back when batadv_tt_global_free() is called, fixes this memory leak.
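As an aside, here is a minimal userspace sketch of the reference rule the fix restores (purely illustrative types and names, not the actual batman-adv code): a hash lookup returns the entry with an extra reference, and the caller must drop that reference even when the entry is also removed from the table on roam-back.

#include <stdio.h>
#include <stdlib.h>

struct entry { int refcount; };

static struct entry *hash_find(struct entry *e)   /* lookup takes a reference */
{
	e->refcount++;
	return e;
}

static void entry_put(struct entry *e)            /* drop one reference */
{
	if (--e->refcount == 0) {
		printf("entry freed\n");
		free(e);
	}
}

int main(void)
{
	struct entry *tt_entry = calloc(1, sizeof(*tt_entry));
	struct entry *found;

	if (!tt_entry)
		return 1;

	tt_entry->refcount = 1;        /* reference held by the hash table            */
	found = hash_find(tt_entry);   /* refcount == 2                               */

	entry_put(found);              /* "free" on roam-back drops the table's ref   */
	entry_put(found);              /* lookup reference: skipping this is the leak */
	return 0;
}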
Cc: stable@vger.kernel.org Fixes: 068ee6e204e1 ("batman-adv: roaming handling mechanism redesign") Signed-off-by: Remi Pommarel repk@triplefau.lt Signed-off-by: Sven Eckelmann sven@narfation.org Signed-off-by: Simon Wunderlich sw@simonwunderlich.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/batman-adv/translation-table.c | 1 - 1 file changed, 1 deletion(-)
--- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -774,7 +774,6 @@ check_roaming: if (roamed_back) { batadv_tt_global_free(bat_priv, tt_global, "Roaming canceled"); - tt_global = NULL; } else { /* The global entry has to be marked as ROAMING and * has to be kept for consistency purpose
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Remi Pommarel repk@triplefau.lt
commit 421d467dc2d483175bad4fb76a31b9e5a3d744cf upstream.
When batadv_v_ogm_aggr_send is called for an inactive interface, the skb is silently dropped by batadv_v_ogm_send_to_if() but never freed, causing the following memory leak:
unreferenced object 0xffff00000c164800 (size 512): comm "kworker/u8:1", pid 2648, jiffies 4295122303 (age 97.656s) hex dump (first 32 bytes): 00 80 af 09 00 00 ff ff e1 09 00 00 75 01 60 83 ............u.`. 1f 00 00 00 b8 00 00 00 15 00 05 00 da e3 d3 64 ...............d backtrace: [<0000000007ad20f6>] __kmalloc_track_caller+0x1a8/0x310 [<00000000d1029e55>] kmalloc_reserve.constprop.0+0x70/0x13c [<000000008b9d4183>] __alloc_skb+0xec/0x1fc [<00000000c7af5051>] __netdev_alloc_skb+0x48/0x23c [<00000000642ee5f5>] batadv_v_ogm_aggr_send+0x50/0x36c [<0000000088660bd7>] batadv_v_ogm_aggr_work+0x24/0x40 [<0000000042fc2606>] process_one_work+0x3b0/0x610 [<000000002f2a0b1c>] worker_thread+0xa0/0x690 [<0000000059fae5d4>] kthread+0x1fc/0x210 [<000000000c587d3a>] ret_from_fork+0x10/0x20
Free the skb in that case to fix this leak.
Cc: stable@vger.kernel.org Fixes: 0da0035942d4 ("batman-adv: OGMv2 - add basic infrastructure") Signed-off-by: Remi Pommarel repk@triplefau.lt Signed-off-by: Sven Eckelmann sven@narfation.org Signed-off-by: Simon Wunderlich sw@simonwunderlich.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/batman-adv/bat_v_ogm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/net/batman-adv/bat_v_ogm.c +++ b/net/batman-adv/bat_v_ogm.c @@ -123,8 +123,10 @@ static void batadv_v_ogm_send_to_if(stru { struct batadv_priv *bat_priv = netdev_priv(hard_iface->soft_iface);
- if (hard_iface->if_status != BATADV_IF_ACTIVE) + if (hard_iface->if_status != BATADV_IF_ACTIVE) { + kfree_skb(skb); return; + }
batadv_inc_counter(bat_priv, BATADV_CNT_MGMT_TX); batadv_add_counter(bat_priv, BATADV_CNT_MGMT_TX_BYTES,
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sven Eckelmann sven@narfation.org
commit 987aae75fc1041072941ffb622b45ce2359a99b9 upstream.
The automatic recalculation of the maximum allowed MTU is usually triggered by code sections which are already rtnl-lock protected by callers outside of batman-adv. But when the fragmentation setting is changed via batman-adv's own batadv genl family, the rtnl lock is not yet taken.
But dev_set_mtu requires that the caller holds the rtnl lock because it uses netdevice notifiers. And this code will then fail the check for this lock:
RTNL: assertion failed at net/core/dev.c (1953)
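For context, a simplified stand-in (not the real net/core/dev.c code) of the check that produces the message above; any path that ends up in dev_set_mtu() and fires netdevice notifiers asserts the lock roughly like this:

static int dev_set_mtu_sketch(struct net_device *dev, int new_mtu)   /* illustrative only */
{
	ASSERT_RTNL();   /* warns "RTNL: assertion failed at ..." when rtnl_lock() is not held */
	/* ... notifier calls that require the rtnl lock ... */
	return 0;
}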
Cc: stable@vger.kernel.org Reported-by: syzbot+f8812454d9b3ac00d282@syzkaller.appspotmail.com Fixes: c6a953cce8d0 ("batman-adv: Trigger events for auto adjusted MTU") Signed-off-by: Sven Eckelmann sven@narfation.org Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20230821-batadv-missing-mtu-rtnl-lock-v1-1-1c5a7bf... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/batman-adv/netlink.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/batman-adv/netlink.c +++ b/net/batman-adv/netlink.c @@ -495,7 +495,10 @@ static int batadv_netlink_set_mesh(struc attr = info->attrs[BATADV_ATTR_FRAGMENTATION_ENABLED];
atomic_set(&bat_priv->fragmentation, !!nla_get_u8(attr)); + + rtnl_lock(); batadv_update_min_mtu(bat_priv->soft_iface); + rtnl_unlock(); }
if (info->attrs[BATADV_ATTR_GW_BANDWIDTH_DOWN]) {
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
commit 453b014e2c294abf762d3bce12e91ce4b34055e6 upstream.
It turns out that some PCSpecialist Elimina Pro 16 M models have "GM6BGEQ" as DMI product-name instead of "Elimina Pro 16 M", causing the existing DMI quirk to not work on these models.
The DMI board-name is always "GM6BGEQ", so match on that instead.
Fixes: 56fec0051a69 ("ACPI: resource: Add IRQ override quirk for PCSpecialist Elimina Pro 16 M") Link: https://bugzilla.kernel.org/show_bug.cgi?id=217394#c36 Cc: All applicable stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/resource.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index a4d9f149b48d..32cfa3f4efd3 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -501,9 +501,13 @@ static const struct dmi_system_id maingear_laptop[] = { static const struct dmi_system_id pcspecialist_laptop[] = { { .ident = "PCSpecialist Elimina Pro 16 M", + /* + * Some models have product-name "Elimina Pro 16 M", + * others "GM6BGEQ". Match on board-name to match both. + */ .matches = { DMI_MATCH(DMI_SYS_VENDOR, "PCSpecialist"), - DMI_MATCH(DMI_PRODUCT_NAME, "Elimina Pro 16 M"), + DMI_MATCH(DMI_BOARD_NAME, "GM6BGEQ"), }, }, { }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Helge Deller deller@gmx.de
commit 382d4cd1847517ffcb1800fd462b625db7b2ebea upstream.
On some architectures the gcc compiler translates the 64-bit __builtin_clzll() function to a call to the libgcc function __clzdi2(), which should take a 64-bit parameter on both 32- and 64-bit platforms.
But in the current kernel code, the built-in __clzdi2() function is defined to operate (wrongly) on 32-bit parameters if BITS_PER_LONG == 32, thus the return values on 32-bit kernels are in the range [0..31] instead of the expected [0..63] range.
This patch fixes the in-kernel functions __clzdi2() and __ctzdi2() to take a 64-bit parameter on 32-bit kernels as well, making the functions identical for 32- and 64-bit kernels.
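As a rough userspace illustration of the difference (using compiler builtins to stand in for the kernel's fls()/fls64(); the helper names below are made up for this sketch):

#include <stdio.h>
#include <stdint.h>

/* fls()-style helpers: position of the highest set bit, 0 for an input of 0 */
static int fls_u32(uint32_t x) { return x ? 32 - __builtin_clz(x)   : 0; }
static int fls_u64(uint64_t x) { return x ? 64 - __builtin_clzll(x) : 0; }

int main(void)
{
	uint64_t val = (1ULL << 40) | 1;   /* highest set bit lies above the low 32 bits */

	/* old 32-bit kernel code: operates on the truncated low word */
	printf("broken __clzdi2: %d\n", 32 - fls_u32((uint32_t)val));   /* prints 31 */
	/* fixed code: operates on the full 64-bit value */
	printf("fixed  __clzdi2: %d\n", 64 - fls_u64(val));             /* prints 23 */
	return 0;
}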
This bug went unnoticed since kernel 3.11 for over 10 years, and here are some possible reasons for that:
a) Some architectures have assembly instructions to count the bits, which are used instead of calling __clzdi2(), e.g. the bsr instruction on x86 and cntlz on ppc. On such architectures the wrong __clzdi2() implementation isn't used, so the bug has no effect and won't be noticed.
b) Some architectures link to libgcc.a, and the in-kernel weak functions get replaced by the correct 64-bit variants from libgcc.a.
c) __builtin_clzll() and __clzdi2() don't seem to be used in many places in the kernel, and most likely only in uncritical functions, e.g. when printing hex values via seq_put_hex_ll(). The wrong return value will still print the correct number, just with wrong formatting (e.g. with too many leading zeroes).
d) 32-bit kernels aren't used that much any longer, so they are less tested.
A trivial testcase to verify if the currently running 32-bit kernel is affected by the bug is to look at the output of /proc/self/maps:
Here the kernel uses a correct implementation of __clzdi2():
root@debian:~# cat /proc/self/maps 00010000-00019000 r-xp 00000000 08:05 787324 /usr/bin/cat 00019000-0001a000 rwxp 00009000 08:05 787324 /usr/bin/cat 0001a000-0003b000 rwxp 00000000 00:00 0 [heap] f7551000-f770d000 r-xp 00000000 08:05 794765 /usr/lib/hppa-linux-gnu/libc.so.6 ...
and this kernel uses the broken implementation of __clzdi2():
root@debian:~# cat /proc/self/maps 0000000010000-0000000019000 r-xp 00000000 000000008:000000005 787324 /usr/bin/cat 0000000019000-000000001a000 rwxp 000000009000 000000008:000000005 787324 /usr/bin/cat 000000001a000-000000003b000 rwxp 00000000 00:00 0 [heap] 00000000f73d1000-00000000f758d000 r-xp 00000000 000000008:000000005 794765 /usr/lib/hppa-linux-gnu/libc.so.6 ...
Signed-off-by: Helge Deller deller@gmx.de Fixes: 4df87bb7b6a22 ("lib: add weak clz/ctz functions") Cc: Chanho Min chanho.min@lge.com Cc: Geert Uytterhoeven geert@linux-m68k.org Cc: stable@vger.kernel.org # v3.11+ Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- lib/clz_ctz.c | 32 ++++++-------------------------- 1 file changed, 6 insertions(+), 26 deletions(-)
--- a/lib/clz_ctz.c +++ b/lib/clz_ctz.c @@ -28,36 +28,16 @@ int __weak __clzsi2(int val) } EXPORT_SYMBOL(__clzsi2);
-int __weak __clzdi2(long val); -int __weak __ctzdi2(long val); -#if BITS_PER_LONG == 32 - -int __weak __clzdi2(long val) +int __weak __clzdi2(u64 val); +int __weak __clzdi2(u64 val) { - return 32 - fls((int)val); + return 64 - fls64(val); } EXPORT_SYMBOL(__clzdi2);
-int __weak __ctzdi2(long val) +int __weak __ctzdi2(u64 val); +int __weak __ctzdi2(u64 val) { - return __ffs((u32)val); + return __ffs64(val); } EXPORT_SYMBOL(__ctzdi2); - -#elif BITS_PER_LONG == 64 - -int __weak __clzdi2(long val) -{ - return 64 - fls64((u64)val); -} -EXPORT_SYMBOL(__clzdi2); - -int __weak __ctzdi2(long val) -{ - return __ffs64((u64)val); -} -EXPORT_SYMBOL(__ctzdi2); - -#else -#error BITS_PER_LONG not 32 or 64 -#endif
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mingzheng Xing xingmingzheng@iscas.ac.cn
commit ca09f772cccaeec4cd05a21528c37a260aa2dd2c upstream.
Binutils-2.38 and GCC-12.1.0 bumped[0][1] the default ISA spec to the newer 20191213 version, which moves some instructions from the I extension to the Zicsr and Zifencei extensions. So if either binutils or GCC is at or above that version, we should explicitly specify Zicsr and Zifencei via -march to cope with the new changes. But this is only possible when binutils >= 2.36 and GCC >= 11.1.0. It's a different story when binutils < 2.36.
binutils-2.36 supports the Zifencei extension[2] and splits Zifencei and Zicsr from I[3]. GCC-11.1.0 is particular[4] because it adds support for the Zicsr and Zifencei extensions in -march. binutils-2.35 does not support the Zifencei extension, and Zicsr and Zifencei do not need to be specified when it is used with GCC >= 12.1.0.
To make our lives easier, let's relax the check to binutils >= 2.36 in CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI. For the other two cases, where clang < 17 or GCC < 11.1.0, we will deal with them in CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC.
For more information, please refer to: commit 6df2a016c0c8 ("riscv: fix build with binutils 2.38") commit e89c2e815e76 ("riscv: Handle zicsr/zifencei issues between clang and binutils")
Link: https://sourceware.org/git/?p=binutils-gdb.git%3Ba=commit%3Bh=aed44286efa8ae... [0] Link: https://gcc.gnu.org/git/?p=gcc.git%3Ba=commit%3Bh=98416dbb0a62579d4a7a4a76ba... [1] Link: https://sourceware.org/git/?p=binutils-gdb.git%3Ba=commit%3Bh=5a1b31e1e1cee6... [2] Link: https://sourceware.org/git/?p=binutils-gdb.git%3Ba=commit%3Bh=729a53530e8697... [3] Link: https://gcc.gnu.org/git/?p=gcc.git%3Ba=commit%3Bh=b03be74bad08c382da47e04800... [4] Link: https://lore.kernel.org/all/20230308220842.1231003-1-conor@kernel.org Link: https://lore.kernel.org/all/20230223220546.52879-1-conor@kernel.org Reviewed-by: Conor Dooley conor.dooley@microchip.com Acked-by: Guo Ren guoren@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Mingzheng Xing xingmingzheng@iscas.ac.cn Link: https://lore.kernel.org/r/20230809165648.21071-1-xingmingzheng@iscas.ac.cn Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/Kconfig | 28 +++++++++++++++++----------- arch/riscv/kernel/compat_vdso/Makefile | 8 +++++++- 2 files changed, 24 insertions(+), 12 deletions(-)
--- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -525,24 +525,30 @@ config TOOLCHAIN_HAS_ZIHINTPAUSE config TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI def_bool y # https://sourceware.org/git/?p=binutils-gdb.git%3Ba=commit%3Bh=aed44286efa8ae... - depends on AS_IS_GNU && AS_VERSION >= 23800 + # https://gcc.gnu.org/git/?p=gcc.git%3Ba=commit%3Bh=98416dbb0a62579d4a7a4a76ba... + depends on AS_IS_GNU && AS_VERSION >= 23600 help - Newer binutils versions default to ISA spec version 20191213 which - moves some instructions from the I extension to the Zicsr and Zifencei - extensions. + Binutils-2.38 and GCC-12.1.0 bumped the default ISA spec to the newer + 20191213 version, which moves some instructions from the I extension to + the Zicsr and Zifencei extensions. This requires explicitly specifying + Zicsr and Zifencei when binutils >= 2.38 or GCC >= 12.1.0. Zicsr + and Zifencei are supported in binutils from version 2.36 onwards. + To make life easier, and avoid forcing toolchains that default to a + newer ISA spec to version 2.2, relax the check to binutils >= 2.36. + For clang < 17 or GCC < 11.1.0, for which this is not possible, this is + dealt with in CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC.
config TOOLCHAIN_NEEDS_OLD_ISA_SPEC def_bool y depends on TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI # https://github.com/llvm/llvm-project/commit/22e199e6afb1263c943c0c0d4498694e... - depends on CC_IS_CLANG && CLANG_VERSION < 170000 + # https://gcc.gnu.org/git/?p=gcc.git%3Ba=commit%3Bh=b03be74bad08c382da47e04800... + depends on (CC_IS_CLANG && CLANG_VERSION < 170000) || (CC_IS_GCC && GCC_VERSION < 110100) help - Certain versions of clang do not support zicsr and zifencei via -march - but newer versions of binutils require it for the reasons noted in the - help text of CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI. This - option causes an older ISA spec compatible with these older versions - of clang to be passed to GAS, which has the same result as passing zicsr - and zifencei to -march. + Certain versions of clang and GCC do not support zicsr and zifencei via + -march. This option causes an older ISA spec compatible with these older + versions of clang and GCC to be passed to GAS, which has the same result + as passing zicsr and zifencei to -march.
config FPU bool "FPU support" --- a/arch/riscv/kernel/compat_vdso/Makefile +++ b/arch/riscv/kernel/compat_vdso/Makefile @@ -11,7 +11,13 @@ compat_vdso-syms += flush_icache COMPAT_CC := $(CC) COMPAT_LD := $(LD)
-COMPAT_CC_FLAGS := -march=rv32g -mabi=ilp32 +# binutils 2.35 does not support the zifencei extension, but in the ISA +# spec 20191213, G stands for IMAFD_ZICSR_ZIFENCEI. +ifdef CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI + COMPAT_CC_FLAGS := -march=rv32g -mabi=ilp32 +else + COMPAT_CC_FLAGS := -march=rv32imafd -mabi=ilp32 +endif COMPAT_LD_FLAGS := -melf32lriscv
# Disable attributes, as they're useless and break the build.
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mingzheng Xing xingmingzheng@iscas.ac.cn
commit ef21fa7c198e04f3d3053b1c5b5f2b4b225c3350 upstream.
When building the kernel with binutils 2.37 and GCC-11.1.0/GCC-11.2.0, the following error occurs:
Assembler messages: Error: cannot find default versions of the ISA extension `zicsr' Error: cannot find default versions of the ISA extension `zifencei'
The above error originates from this binutils commit[0]; the issue has been resolved in GCC-12.1.0[1] and backported to GCC-11.3.0[2].
So fix this by changing the GCC version check in CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC to GCC-11.3.0.
Link: https://sourceware.org/git/?p=binutils-gdb.git%3Ba=commit%3Bh=f0bae2552db1dd... [0] Link: https://gcc.gnu.org/git/?p=gcc.git%3Ba=commit%3Bh=ca2bbb88f999f4d3cc40e89bc1... [1] Link: https://gcc.gnu.org/git/?p=gcc.git%3Ba=commit%3Bh=d29f5d6ab513c52fd872f532c4... [2] Fixes: ca09f772ccca ("riscv: Handle zicsr/zifencei issue between gcc and binutils") Reported-by: Conor Dooley conor.dooley@microchip.com Cc: stable@vger.kernel.org Signed-off-by: Mingzheng Xing xingmingzheng@iscas.ac.cn Link: https://lore.kernel.org/r/20230824190852.45470-1-xingmingzheng@iscas.ac.cn Closes: https://lore.kernel.org/all/20230823-captive-abdomen-befd942a4a73@wendy/ Reviewed-by: Conor Dooley conor.dooley@microchip.com Tested-by: Conor Dooley conor.dooley@microchip.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/Kconfig | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -535,15 +535,15 @@ config TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZI and Zifencei are supported in binutils from version 2.36 onwards. To make life easier, and avoid forcing toolchains that default to a newer ISA spec to version 2.2, relax the check to binutils >= 2.36. - For clang < 17 or GCC < 11.1.0, for which this is not possible, this is - dealt with in CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC. + For clang < 17 or GCC < 11.3.0, for which this is not possible or need + special treatment, this is dealt with in TOOLCHAIN_NEEDS_OLD_ISA_SPEC.
config TOOLCHAIN_NEEDS_OLD_ISA_SPEC def_bool y depends on TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI # https://github.com/llvm/llvm-project/commit/22e199e6afb1263c943c0c0d4498694e... - # https://gcc.gnu.org/git/?p=gcc.git%3Ba=commit%3Bh=b03be74bad08c382da47e04800... - depends on (CC_IS_CLANG && CLANG_VERSION < 170000) || (CC_IS_GCC && GCC_VERSION < 110100) + # https://gcc.gnu.org/git/?p=gcc.git%3Ba=commit%3Bh=d29f5d6ab513c52fd872f532c4... + depends on (CC_IS_CLANG && CLANG_VERSION < 170000) || (CC_IS_GCC && GCC_VERSION < 110300) help Certain versions of clang and GCC do not support zicsr and zifencei via -march. This option causes an older ISA spec compatible with these older
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
commit d59070d1076ec5114edb67c87658aeb1d691d381 upstream.
Recent versions of clang warn about an unused variable, though older versions saw the 'slot++' as a use and did not warn:
radix-tree.c:1136:50: error: parameter 'slot' set but not used [-Werror,-Wunused-but-set-parameter]
It's clearly not needed any more, so just remove it.
Link: https://lkml.kernel.org/r/20230811131023.2226509-1-arnd@kernel.org Fixes: 3a08cd52c37c7 ("radix tree: Remove multiorder support") Signed-off-by: Arnd Bergmann arnd@arndb.de Cc: Matthew Wilcox willy@infradead.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: Peng Zhang zhangpeng.00@bytedance.com Cc: Rong Tao rongtao@cestc.cn Cc: Tom Rix trix@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- lib/radix-tree.c | 1 - 1 file changed, 1 deletion(-)
--- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -1136,7 +1136,6 @@ static void set_iter_tags(struct radix_t void __rcu **radix_tree_iter_resume(void __rcu **slot, struct radix_tree_iter *iter) { - slot++; iter->index = __radix_tree_iter_add(iter, 1); iter->next_index = iter->index; iter->tags = 0;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rob Herring robh@kernel.org
commit 0aeae3788e28f64ccb95405d4dc8cd80637ffaea upstream.
Commit 12e17243d8a1 ("of: base: improve error msg in of_phandle_iterator_next()") added printing of the phandle value on error, but failed to update the unittest.
Fixes: 12e17243d8a1 ("of: base: improve error msg in of_phandle_iterator_next()") Cc: stable@vger.kernel.org Reviewed-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://lore.kernel.org/r/20230801-dt-changeset-fixes-v3-1-5f0410e007dd@kern... Signed-off-by: Rob Herring robh@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/of/unittest.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/of/unittest.c +++ b/drivers/of/unittest.c @@ -664,12 +664,12 @@ static void __init of_unittest_parse_pha memset(&args, 0, sizeof(args));
EXPECT_BEGIN(KERN_INFO, - "OF: /testcase-data/phandle-tests/consumer-b: could not find phandle"); + "OF: /testcase-data/phandle-tests/consumer-b: could not find phandle 12345678");
rc = of_parse_phandle_with_args_map(np, "phandle-list-bad-phandle", "phandle", 0, &args); EXPECT_END(KERN_INFO, - "OF: /testcase-data/phandle-tests/consumer-b: could not find phandle"); + "OF: /testcase-data/phandle-tests/consumer-b: could not find phandle 12345678");
unittest(rc == -EINVAL, "expected:%i got:%i\n", -EINVAL, rc);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rob Herring robh@kernel.org
commit 914d9d831e6126a6e7a92e27fcfaa250671be42c upstream.
While originally it was fine to format strings using "%pOF" while holding devtree_lock, this now causes a deadlock. Lockdep reports:
of_get_parent from of_fwnode_get_parent+0x18/0x24 ^^^^^^^^^^^^^ of_fwnode_get_parent from fwnode_count_parents+0xc/0x28 fwnode_count_parents from fwnode_full_name_string+0x18/0xac fwnode_full_name_string from device_node_string+0x1a0/0x404 device_node_string from pointer+0x3c0/0x534 pointer from vsnprintf+0x248/0x36c vsnprintf from vprintk_store+0x130/0x3b4
Fix this by moving the printing in __of_changeset_entry_apply() outside the lock. As the only difference in the multiple prints is the action name, use the existing "action_names" to refactor the prints into a single print.
Fixes: a92eb7621b9fb2c2 ("lib/vsprintf: Make use of fwnode API to obtain node names and separators") Cc: stable@vger.kernel.org Reported-by: Geert Uytterhoeven geert+renesas@glider.be Reviewed-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://lore.kernel.org/r/20230801-dt-changeset-fixes-v3-2-5f0410e007dd@kern... Signed-off-by: Rob Herring robh@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/of/dynamic.c | 31 +++++++++---------------------- 1 file changed, 9 insertions(+), 22 deletions(-)
--- a/drivers/of/dynamic.c +++ b/drivers/of/dynamic.c @@ -63,15 +63,14 @@ int of_reconfig_notifier_unregister(stru } EXPORT_SYMBOL_GPL(of_reconfig_notifier_unregister);
-#ifdef DEBUG -const char *action_names[] = { +static const char *action_names[] = { + [0] = "INVALID", [OF_RECONFIG_ATTACH_NODE] = "ATTACH_NODE", [OF_RECONFIG_DETACH_NODE] = "DETACH_NODE", [OF_RECONFIG_ADD_PROPERTY] = "ADD_PROPERTY", [OF_RECONFIG_REMOVE_PROPERTY] = "REMOVE_PROPERTY", [OF_RECONFIG_UPDATE_PROPERTY] = "UPDATE_PROPERTY", }; -#endif
int of_reconfig_notify(unsigned long action, struct of_reconfig_data *p) { @@ -620,21 +619,9 @@ static int __of_changeset_entry_apply(st }
ret = __of_add_property(ce->np, ce->prop); - if (ret) { - pr_err("changeset: add_property failed @%pOF/%s\n", - ce->np, - ce->prop->name); - break; - } break; case OF_RECONFIG_REMOVE_PROPERTY: ret = __of_remove_property(ce->np, ce->prop); - if (ret) { - pr_err("changeset: remove_property failed @%pOF/%s\n", - ce->np, - ce->prop->name); - break; - } break;
case OF_RECONFIG_UPDATE_PROPERTY: @@ -648,20 +635,17 @@ static int __of_changeset_entry_apply(st }
ret = __of_update_property(ce->np, ce->prop, &old_prop); - if (ret) { - pr_err("changeset: update_property failed @%pOF/%s\n", - ce->np, - ce->prop->name); - break; - } break; default: ret = -EINVAL; } raw_spin_unlock_irqrestore(&devtree_lock, flags);
- if (ret) + if (ret) { + pr_err("changeset: apply failed: %-15s %pOF:%s\n", + action_names[ce->action], ce->np, ce->prop->name); return ret; + }
switch (ce->action) { case OF_RECONFIG_ATTACH_NODE: @@ -947,6 +931,9 @@ int of_changeset_action(struct of_change if (!ce) return -ENOMEM;
+ if (WARN_ON(action >= ARRAY_SIZE(action_names))) + return -EINVAL; + /* get a reference to the node */ ce->action = action; ce->np = of_node_get(np);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
commit 6bc3462a0f5ecaa376a0b3d76dafc55796799e17 upstream.
Shubhra reports that their laptop is heating up over s2idle. Even though it's getting into the deepest state, it appears to be having spurious wakeup events.
While debugging a tangential issue with the RTC, Carsten reports that recent 6.1.y based kernels face a similar problem.
Looking at acpidump and GPIO register comparisons, these spurious wakeup events are from the GPIO associated with the I2C touchpad on both laptops and occur even when the touchpad is not marked as a wake source by the kernel.
This means that the boot firmware has programmed these bits, and because Linux didn't touch them, they lead to spurious wakeup events from that GPIO.
To fix this issue, restore most of the code that previously would clear all the bits associated with wakeup sources. This will allow the kernel to only program the wake up sources that are necessary.
This is similar to what was done previously, but only the wake bits are cleared by default instead of both interrupts and wake bits. If any other problems are reported then it may make sense to clear interrupts again too.
Cc: Sachi King nakato@nakato.io Cc: stable@vger.kernel.org Cc: Thorsten Leemhuis regressions@leemhuis.info Fixes: 65f6c7c91cb2 ("pinctrl: amd: Revert "pinctrl: amd: disable and mask interrupts on probe"") Reported-by: Shubhra Prakash Nandi email2shubhra@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217754 Reported-by: Carsten Hatger xmb8dsv4@gmail.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=217626#c28 Signed-off-by: Mario Limonciello mario.limonciello@amd.com Link: https://lore.kernel.org/r/20230818144850.1439-1-mario.limonciello@amd.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pinctrl/pinctrl-amd.c | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+)
--- a/drivers/pinctrl/pinctrl-amd.c +++ b/drivers/pinctrl/pinctrl-amd.c @@ -862,6 +862,33 @@ static const struct pinconf_ops amd_pinc .pin_config_group_set = amd_pinconf_group_set, };
+static void amd_gpio_irq_init(struct amd_gpio *gpio_dev) +{ + struct pinctrl_desc *desc = gpio_dev->pctrl->desc; + unsigned long flags; + u32 pin_reg, mask; + int i; + + mask = BIT(WAKE_CNTRL_OFF_S0I3) | BIT(WAKE_CNTRL_OFF_S3) | + BIT(WAKE_CNTRL_OFF_S4); + + for (i = 0; i < desc->npins; i++) { + int pin = desc->pins[i].number; + const struct pin_desc *pd = pin_desc_get(gpio_dev->pctrl, pin); + + if (!pd) + continue; + + raw_spin_lock_irqsave(&gpio_dev->lock, flags); + + pin_reg = readl(gpio_dev->base + pin * 4); + pin_reg &= ~mask; + writel(pin_reg, gpio_dev->base + pin * 4); + + raw_spin_unlock_irqrestore(&gpio_dev->lock, flags); + } +} + #ifdef CONFIG_PM_SLEEP static bool amd_gpio_should_save(struct amd_gpio *gpio_dev, unsigned int pin) { @@ -1099,6 +1126,9 @@ static int amd_gpio_probe(struct platfor return PTR_ERR(gpio_dev->pctrl); }
+ /* Disable and mask interrupts */ + amd_gpio_irq_init(gpio_dev); + girq = &gpio_dev->gc.irq; gpio_irq_chip_set_chip(girq, &amd_gpio_irqchip); /* This will let us handle the parent IRQ in the driver */
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wei Chen harperchen1110@gmail.com
commit e7f2e65699e2290fd547ec12a17008764e5d9620 upstream.
The variable *nplanes is provided by the user via a system call argument. The possible value of q_data->fmt->num_planes is 1-3, while the value of *nplanes can be 1-8. The array access by index i can therefore go out of bounds.
Fix this bug by checking *nplanes against the array size.
Fixes: 4e855a6efa54 ("[media] vcodec: mediatek: Add Mediatek V4L2 Video Encoder Driver") Signed-off-by: Wei Chen harperchen1110@gmail.com Cc: stable@vger.kernel.org Reviewed-by: Chen-Yu Tsai wenst@chromium.org Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/platform/mediatek/vcodec/mtk_vcodec_enc.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_enc.c +++ b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_enc.c @@ -821,6 +821,8 @@ static int vb2ops_venc_queue_setup(struc return -EINVAL;
if (*nplanes) { + if (*nplanes != q_data->fmt->num_planes) + return -EINVAL; for (i = 0; i < *nplanes; i++) if (sizes[i] < q_data->sizeimage[i]) return -EINVAL;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Igor Mammedov imammedo@redhat.com
commit cc22522fd55e257c86d340ae9aedc122e705a435 upstream.
40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary") changed acpiphp hotplug to use pci_assign_unassigned_bridge_resources() which depends on bridge being available, however enable_slot() can be called without bridge associated:
1. Legitimate case of hotplug on root bus (widely used in virt world)
2. A (misbehaving) firmware that sends ACPI Bus Check notifications to non-existent root ports (Dell Inspiron 7352/0W6WV0), which end up at enable_slot(..., bridge = 0) where the bus has no bridge assigned to it. acpiphp doesn't know that it's a bridge, and the bus-specific 'PCI subsystem' can't augment the ACPI context with bridge information since the PCI device to get this data from is/was not available.
The issue is easy to reproduce with QEMU's 'pc' machine, which supports PCI hotplug on hostbridge slots. To reproduce, boot a kernel at commit 40613da52b13 in a VM started with the following CLI (assuming the guest root fs is installed on the sda1 partition):
# qemu-system-x86_64 -M pc -m 1G -enable-kvm -cpu host \ -monitor stdio -serial file:serial.log \ -kernel arch/x86/boot/bzImage \ -append "root=/dev/sda1 console=ttyS0" \ guest_disk.img
Once the guest OS is fully booted, at the qemu prompt:
(qemu) device_add e1000
(check serial.log) it will cause NULL pointer dereference at:
void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge) { struct pci_bus *parent = bridge->subordinate;
BUG: kernel NULL pointer dereference, address: 0000000000000018
? pci_assign_unassigned_bridge_resources+0x1f/0x260 enable_slot+0x21f/0x3e0 acpiphp_hotplug_notify+0x13d/0x260 acpi_device_hotplug+0xbc/0x540 acpi_hotplug_work_fn+0x15/0x20 process_one_work+0x1f7/0x370 worker_thread+0x45/0x3b0
The issue was discovered on Dell Inspiron 7352/0W6WV0 laptop with following sequence:
1. Suspend to RAM
2. Wake up, with the same backtrace being observed
3. A 2nd suspend-to-RAM attempt makes the laptop freeze
Fix it by using __pci_bus_assign_resources(), as we used to do, instead of pci_assign_unassigned_bridge_resources(), but only in the case where the bus doesn't have a bridge associated with it (to cover the case of an ACPI event on a hostbridge or non-existent root port).
That lets us keep hotplug on the root bus working like it used to, and at the same time keeps the resource reassignment that was fixed by 40613da52b13 usable on root ports (and other 1st-level bridges).
Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary") Link: https://lore.kernel.org/r/20230726123518.2361181-2-imammedo@redhat.com Reported-by: Woody Suwalski terraluna977@gmail.com Tested-by: Woody Suwalski terraluna977@gmail.com Tested-by: Michal Koutný mkoutny@suse.com Link: https://lore.kernel.org/r/11fc981c-af49-ce64-6b43-3e282728bd1a@gmail.com Signed-off-by: Igor Mammedov imammedo@redhat.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Acked-by: Rafael J. Wysocki rafael@kernel.org Acked-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/hotplug/acpiphp_glue.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/drivers/pci/hotplug/acpiphp_glue.c +++ b/drivers/pci/hotplug/acpiphp_glue.c @@ -498,6 +498,7 @@ static void enable_slot(struct acpiphp_s acpiphp_native_scan_bridge(dev); } } else { + LIST_HEAD(add_list); int max, pass;
acpiphp_rescan_slot(slot); @@ -511,10 +512,15 @@ static void enable_slot(struct acpiphp_s if (pass && dev->subordinate) { check_hotplug_bridge(slot, dev); pcibios_resource_survey_bus(dev->subordinate); + if (pci_is_root_bus(bus)) + __pci_bus_size_bridges(dev->subordinate, &add_list); } } } - pci_assign_unassigned_bridge_resources(bus->self); + if (pci_is_root_bus(bus)) + __pci_bus_assign_resources(bus, &add_list, NULL); + else + pci_assign_unassigned_bridge_resources(bus->self); }
acpiphp_sanitize_bus(bus);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sanjay R Mehta sanju.mehta@amd.com
commit 583893a66d731f5da010a3fa38a0460e05f0149b upstream.
Previously, on unplug events, the TMU mode was disabled first followed by the Time Synchronization Handshake, irrespective of whether the tb_switch_tmu_rate_write() API was successful or not.
However, this caused a problem with Thunderbolt 3 (TBT3) devices, as the TSPacketInterval bits were always enabled by default, leading the host router to assume that the device router's TMU was already enabled and preventing it from initiating the Time Synchronization Handshake. As a result, TBT3 monitors experienced display flickering from the second hot plug onwards.
To address this issue, we have modified the code to only disable the Time Synchronization Handshake during TMU disable if the tb_switch_tmu_rate_write() function is successful. This ensures that the TBT3 devices function correctly and eliminates the display flickering issue.
Co-developed-by: Sanath S Sanath.S@amd.com Signed-off-by: Sanath S Sanath.S@amd.com Signed-off-by: Sanjay R Mehta sanju.mehta@amd.com Cc: stable@vger.kernel.org Signed-off-by: Mika Westerberg mika.westerberg@linux.intel.com [ USB4v2 introduced support for uni-directional TMU mode as part of d49b4f043d63 ("thunderbolt: Add support for enhanced uni-directional TMU mode") This is not a stable candidate commit, so adjust the code for backport. ] Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/thunderbolt/tmu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/thunderbolt/tmu.c +++ b/drivers/thunderbolt/tmu.c @@ -415,7 +415,8 @@ int tb_switch_tmu_disable(struct tb_swit * uni-directional mode and we don't want to change it's TMU * mode. */ - tb_switch_tmu_rate_write(sw, TB_SWITCH_TMU_RATE_OFF); + ret = tb_switch_tmu_rate_write(sw, TB_SWITCH_TMU_RATE_OFF); + return ret;
tb_port_tmu_time_sync_disable(up); ret = tb_port_tmu_time_sync_disable(down);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oliver Hartkopp socketcan@hartkopp.net
commit c275a176e4b69868576e543409927ae75e3a3288 upstream.
Commit ee8b94c8510c ("can: raw: fix receiver memory leak") introduced a new reference to the CAN netdevice that has assigned CAN filters. But this new ro->dev reference did not maintain its own refcount, which led to another KASAN use-after-free splat found by Eric Dumazet.
This patch ensures a proper refcount for the CAN netdevice.
Fixes: ee8b94c8510c ("can: raw: fix receiver memory leak") Reported-by: Eric Dumazet edumazet@google.com Cc: Ziyang Xuan william.xuanziyang@huawei.com Signed-off-by: Oliver Hartkopp socketcan@hartkopp.net Link: https://lore.kernel.org/r/20230821144547.6658-3-socketcan@hartkopp.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/can/raw.c | 35 ++++++++++++++++++++++++++--------- 1 file changed, 26 insertions(+), 9 deletions(-)
--- a/net/can/raw.c +++ b/net/can/raw.c @@ -85,6 +85,7 @@ struct raw_sock { int bound; int ifindex; struct net_device *dev; + netdevice_tracker dev_tracker; struct list_head notifier; int loopback; int recv_own_msgs; @@ -285,8 +286,10 @@ static void raw_notify(struct raw_sock * case NETDEV_UNREGISTER: lock_sock(sk); /* remove current filters & unregister */ - if (ro->bound) + if (ro->bound) { raw_disable_allfilters(dev_net(dev), dev, sk); + netdev_put(dev, &ro->dev_tracker); + }
if (ro->count > 1) kfree(ro->filter); @@ -391,10 +394,12 @@ static int raw_release(struct socket *so
/* remove current filters & unregister */ if (ro->bound) { - if (ro->dev) + if (ro->dev) { raw_disable_allfilters(dev_net(ro->dev), ro->dev, sk); - else + netdev_put(ro->dev, &ro->dev_tracker); + } else { raw_disable_allfilters(sock_net(sk), NULL, sk); + } }
if (ro->count > 1) @@ -445,10 +450,10 @@ static int raw_bind(struct socket *sock, goto out; } if (dev->type != ARPHRD_CAN) { - dev_put(dev); err = -ENODEV; - goto out; + goto out_put_dev; } + if (!(dev->flags & IFF_UP)) notify_enetdown = 1;
@@ -456,7 +461,9 @@ static int raw_bind(struct socket *sock,
/* filters set by default/setsockopt */ err = raw_enable_allfilters(sock_net(sk), dev, sk); - dev_put(dev); + if (err) + goto out_put_dev; + } else { ifindex = 0;
@@ -467,18 +474,28 @@ static int raw_bind(struct socket *sock, if (!err) { if (ro->bound) { /* unregister old filters */ - if (ro->dev) + if (ro->dev) { raw_disable_allfilters(dev_net(ro->dev), ro->dev, sk); - else + /* drop reference to old ro->dev */ + netdev_put(ro->dev, &ro->dev_tracker); + } else { raw_disable_allfilters(sock_net(sk), NULL, sk); + } } ro->ifindex = ifindex; ro->bound = 1; + /* bind() ok -> hold a reference for new ro->dev */ ro->dev = dev; + if (ro->dev) + netdev_hold(ro->dev, &ro->dev_tracker, GFP_KERNEL); }
- out: +out_put_dev: + /* remove potential reference from dev_get_by_index() */ + if (dev) + dev_put(dev); +out: release_sock(sk); rtnl_unlock();
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matt Roper matthew.d.roper@intel.com
commit 718551bbed3ca5308a9f9429305dd074727e8d46 upstream.
If i915_driver_create() fails to create a valid 'i915' object, we should just disable the PCI device and return immediately without trying to call i915_probe_error() that relies on a valid i915 pointer.
Fixes: 12e6f6dc78e4 ("drm/i915/display: Handle GMD_ID identification in display code") Reported-by: Dan Carpenter dan.carpenter@linaro.org Closes: https://lore.kernel.org/all/55236f93-dcc5-481e-b788-9f7e95b129d8@kili.mounta... Signed-off-by: Matt Roper matthew.d.roper@intel.com Reviewed-by: Gustavo Sousa gustavo.sousa@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230601173804.557756-1-matthe... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/i915_driver.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/i915/i915_driver.c +++ b/drivers/gpu/drm/i915/i915_driver.c @@ -747,8 +747,8 @@ int i915_driver_probe(struct pci_dev *pd
i915 = i915_driver_create(pdev, ent); if (IS_ERR(i915)) { - ret = PTR_ERR(i915); - goto out_pci_disable; + pci_disable_device(pdev); + return PTR_ERR(i915); }
ret = i915_driver_early_probe(i915);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yin Fengwei fengwei.yin@intel.com
commit 2f406263e3e954aa24c1248edcfa9be0c1bb30fa upstream.
Patch series "don't use mapcount() to check large folio sharing", v2.
In madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(), folio_mapcount() is used to check whether the folio is shared. But it's not correct, as folio_mapcount() returns the total mapcount of a large folio.
Use folio_estimated_sharers() here as the estimated number is enough.
This patchset fixes the following case: a user space application calls madvise() with MADV_FREE, MADV_COLD or MADV_PAGEOUT for a specific address range that has THPs mapped to it. Without the patchset, the THPs are skipped. With the patch, the THPs will be split and handled accordingly.
David reported that the cow self test skips some cases because MADV_PAGEOUT skips THPs: https://lore.kernel.org/linux-mm/9e92e42d-488f-47db-ac9d-75b24cd0d037@intel.... and I confirmed this patchset makes it work again.
This patch (of 3):
Commit 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios") replaced the page_mapcount() with folio_mapcount() to check whether the folio is shared by other mapping.
It's not correct for large folios. folio_mapcount() returns the total mapcount of a large folio, which is not suitable for detecting whether the folio is shared.
Use folio_estimated_sharers(), which returns an estimated number of sharers. That means it's not 100% correct. It should be OK for the madvise case here.
The user-visible effect is that the THP is skipped when the user calls madvise, but the correct behavior is that the THP should be split and processed instead.
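For context, folio_estimated_sharers() at this point in the tree is (as assumed here, matching the 6.4-era definition) a one-liner that only samples the first page of the folio, which is why it is an estimate:

/* Sketch of the helper as assumed in this kernel version: it looks at the
 * mapcount of the folio's first page only, so a THP that is PTE-mapped many
 * times by a single process still reports 1, unlike folio_mapcount() which
 * sums the mapcounts of all of its pages. */
static inline int folio_estimated_sharers(struct folio *folio)
{
	return page_mapcount(folio_page(folio, 0));
}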
NOTE: this change is a temporary fix to reduce the user-visible effects before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-1-fengwei.yin@intel.com Link: https://lkml.kernel.org/r/20230808020917.2230692-2-fengwei.yin@intel.com Fixes: 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios") Signed-off-by: Yin Fengwei fengwei.yin@intel.com Reviewed-by: Yu Zhao yuzhao@google.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Cc: David Hildenbrand david@redhat.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Matthew Wilcox willy@infradead.org Cc: Minchan Kim minchan@kernel.org Cc: Vishal Moola (Oracle) vishal.moola@gmail.com Cc: Yang Shi shy828301@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/madvise.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/madvise.c +++ b/mm/madvise.c @@ -376,7 +376,7 @@ static int madvise_cold_or_pageout_pte_r folio = pfn_folio(pmd_pfn(orig_pmd));
/* Do not interfere with other mappings of this folio */ - if (folio_mapcount(folio) != 1) + if (folio_estimated_sharers(folio) != 1) goto huge_unlock;
if (pageout_anon_only_filter && !folio_test_anon(folio)) @@ -448,7 +448,7 @@ regular_folio: * are sure it's worth. Split it if we are only owner. */ if (folio_test_large(folio)) { - if (folio_mapcount(folio) != 1) + if (folio_estimated_sharers(folio) != 1) break; if (pageout_anon_only_filter && !folio_test_anon(folio)) break;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yin Fengwei fengwei.yin@intel.com
commit 0e0e9bd5f7b9d40fd03b70092367247d52da1db0 upstream.
Commit 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a folio") replaced the page_mapcount() with folio_mapcount() to check whether the folio is shared by other mapping.
It's not correct for large folios. folio_mapcount() returns the total mapcount of a large folio, which is not suitable for detecting whether the folio is shared.
Use folio_estimated_sharers(), which returns an estimated number of sharers. That means it's not 100% correct. It should be OK for the madvise case here.
The user-visible effect is that the THP is skipped when the user calls madvise, but the correct behavior is that the THP should be split and processed instead.
NOTE: this change is a temporary fix to reduce the user-visible effects before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-4-fengwei.yin@intel.com Fixes: 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a folio") Signed-off-by: Yin Fengwei fengwei.yin@intel.com Reviewed-by: Yu Zhao yuzhao@google.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Cc: David Hildenbrand david@redhat.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Matthew Wilcox willy@infradead.org Cc: Minchan Kim minchan@kernel.org Cc: Vishal Moola (Oracle) vishal.moola@gmail.com Cc: Yang Shi shy828301@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/madvise.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/madvise.c +++ b/mm/madvise.c @@ -666,8 +666,8 @@ static int madvise_free_pte_range(pmd_t * deactivate all pages. */ if (folio_test_large(folio)) { - if (folio_mapcount(folio) != 1) - goto out; + if (folio_estimated_sharers(folio) != 1) + break; folio_get(folio); if (!folio_trylock(folio)) { folio_put(folio);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhu Wang wangzhu9@huawei.com
commit 1bd3a76880b2bce017987cf53780b372cf59528e upstream.
Commit 41320b18a0e0 ("scsi: snic: Fix possible memory leak if device_add() fails") fixed the memory leak caused by dev_set_name() when device_add() failed. However, it did not consider that 'tgt' has already been released when put_device(&tgt->dev) is called. Remove kfree(tgt) in the error path to avoid a double free of 'tgt', and move put_device(&tgt->dev) to after the removed kfree(tgt) to avoid a use-after-free.
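For background, a hedged sketch of why put_device() must be the call that frees the object (the exact snic release callback may differ from this illustration): once device_initialize() has been used, the final put_device() invokes the device's ->release() callback, and that callback owns the kfree().

static void tgt_dev_release(struct device *dev)   /* illustrative name */
{
	struct snic_tgt *tgt = container_of(dev, struct snic_tgt, dev);

	kfree(tgt);            /* the only place the object should be freed */
}

/* ... so in the error path: */
put_device(&tgt->dev);         /* last reference dropped -> ->release() -> kfree */
/* a separate kfree(tgt) after this point would free the memory twice */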
Fixes: 41320b18a0e0 ("scsi: snic: Fix possible memory leak if device_add() fails") Signed-off-by: Zhu Wang wangzhu9@huawei.com Link: https://lore.kernel.org/r/20230819083941.164365-1-wangzhu9@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/snic/snic_disc.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/scsi/snic/snic_disc.c +++ b/drivers/scsi/snic/snic_disc.c @@ -303,12 +303,11 @@ snic_tgt_create(struct snic *snic, struc "Snic Tgt: device_add, with err = %d\n", ret);
- put_device(&tgt->dev); put_device(&snic->shost->shost_gendev); spin_lock_irqsave(snic->shost->host_lock, flags); list_del(&tgt->list); spin_unlock_irqrestore(snic->shost->host_lock, flags); - kfree(tgt); + put_device(&tgt->dev); tgt = NULL;
return tgt;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Neil Armstrong neil.armstrong@linaro.org
commit c422fbd5cb58c9a078172ae1e9750971b738a197 upstream.
The qunipro_g4_sel clear is also needed for new platforms with major version > 5. Fix the version check to take this into account.
Fixes: 9c02aa24bf40 ("scsi: ufs: ufs-qcom: Clear qunipro_g4_sel for HW version major 5") Acked-by: Manivannan Sadhasivam mani@kernel.org Reviewed-by: Nitin Rawat quic_nitirawa@quicinc.com Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Link: https://lore.kernel.org/r/20230821-topic-sm8x50-upstream-ufs-major-5-plus-v2... Reviewed-by: "Bao D. Nguyen" quic_nguyenb@quicinc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ufs/host/ufs-qcom.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/ufs/host/ufs-qcom.c +++ b/drivers/ufs/host/ufs-qcom.c @@ -225,7 +225,7 @@ static void ufs_qcom_select_unipro_mode( ufs_qcom_cap_qunipro(host) ? QUNIPRO_SEL : 0, REG_UFS_CFG1);
- if (host->hw_ver.major == 0x05) + if (host->hw_ver.major >= 0x05) ufshcd_rmwl(host->hba, QUNIPRO_G4_SEL, 0, REG_UFS_CFG0);
/* make sure above configuration is applied before we return */
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhu Wang wangzhu9@huawei.com
commit 60c5fd2e8f3c42a5abc565ba9876ead1da5ad2b7 upstream.
The raid_component_add() function was added to the kernel tree via the patch "[SCSI] embryonic RAID class" (2005). Remove this function since it has never had any callers in the Linux kernel. raid_component_release() is only used by raid_component_add(), so it is removed as well.
Signed-off-by: Zhu Wang wangzhu9@huawei.com Link: https://lore.kernel.org/r/20230822015254.184270-1-wangzhu9@huawei.com Reviewed-by: Bart Van Assche bvanassche@acm.org Fixes: 04b5b5cb0136 ("scsi: core: Fix possible memory leak if device_add() fails") Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/raid_class.c | 48 --------------------------------------------- include/linux/raid_class.h | 4 --- 2 files changed, 52 deletions(-)
--- a/drivers/scsi/raid_class.c +++ b/drivers/scsi/raid_class.c @@ -209,54 +209,6 @@ raid_attr_ro_state(level); raid_attr_ro_fn(resync); raid_attr_ro_state_fn(state);
-static void raid_component_release(struct device *dev) -{ - struct raid_component *rc = - container_of(dev, struct raid_component, dev); - dev_printk(KERN_ERR, rc->dev.parent, "COMPONENT RELEASE\n"); - put_device(rc->dev.parent); - kfree(rc); -} - -int raid_component_add(struct raid_template *r,struct device *raid_dev, - struct device *component_dev) -{ - struct device *cdev = - attribute_container_find_class_device(&r->raid_attrs.ac, - raid_dev); - struct raid_component *rc; - struct raid_data *rd = dev_get_drvdata(cdev); - int err; - - rc = kzalloc(sizeof(*rc), GFP_KERNEL); - if (!rc) - return -ENOMEM; - - INIT_LIST_HEAD(&rc->node); - device_initialize(&rc->dev); - rc->dev.release = raid_component_release; - rc->dev.parent = get_device(component_dev); - rc->num = rd->component_count++; - - dev_set_name(&rc->dev, "component-%d", rc->num); - list_add_tail(&rc->node, &rd->component_list); - rc->dev.class = &raid_class.class; - err = device_add(&rc->dev); - if (err) - goto err_out; - - return 0; - -err_out: - put_device(&rc->dev); - list_del(&rc->node); - rd->component_count--; - put_device(component_dev); - kfree(rc); - return err; -} -EXPORT_SYMBOL(raid_component_add); - struct raid_template * raid_class_attach(struct raid_function_template *ft) { --- a/include/linux/raid_class.h +++ b/include/linux/raid_class.h @@ -77,7 +77,3 @@ DEFINE_RAID_ATTRIBUTE(enum raid_state, s struct raid_template *raid_class_attach(struct raid_function_template *); void raid_class_release(struct raid_template *); - -int __must_check raid_component_add(struct raid_template *, struct device *, - struct device *); -
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Biju Das biju.das.jz@bp.renesas.com
[ Upstream commit 2746f13f6f1df7999001d6595b16f789ecc28ad1 ]
The COMMON_CLK config is not enabled on some architectures. This causes the build issues below:
pwm-rz-mtu3.c:(.text+0x114): undefined reference to `clk_rate_exclusive_put' pwm-rz-mtu3.c:(.text+0x32c): undefined reference to `clk_rate_exclusive_get'
Fix these issues by moving clk_rate_exclusive_{get,put} inside the COMMON_CLK code block, as clk.c is only built when COMMON_CLK is enabled.
Fixes: 55e9b8b7b806 ("clk: add clk_rate_exclusive api") Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/all/202307251752.vLfmmhYm-lkp@intel.com/ Signed-off-by: Biju Das biju.das.jz@bp.renesas.com Link: https://lore.kernel.org/r/20230725175140.361479-1-biju.das.jz@bp.renesas.com Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/clk.h | 80 ++++++++++++++++++++++----------------------- 1 file changed, 40 insertions(+), 40 deletions(-)
diff --git a/include/linux/clk.h b/include/linux/clk.h index 1ef0133242374..06f1b292f8a00 100644 --- a/include/linux/clk.h +++ b/include/linux/clk.h @@ -183,6 +183,39 @@ int clk_get_scaled_duty_cycle(struct clk *clk, unsigned int scale); */ bool clk_is_match(const struct clk *p, const struct clk *q);
+/** + * clk_rate_exclusive_get - get exclusivity over the rate control of a + * producer + * @clk: clock source + * + * This function allows drivers to get exclusive control over the rate of a + * provider. It prevents any other consumer to execute, even indirectly, + * opereation which could alter the rate of the provider or cause glitches + * + * If exlusivity is claimed more than once on clock, even by the same driver, + * the rate effectively gets locked as exclusivity can't be preempted. + * + * Must not be called from within atomic context. + * + * Returns success (0) or negative errno. + */ +int clk_rate_exclusive_get(struct clk *clk); + +/** + * clk_rate_exclusive_put - release exclusivity over the rate control of a + * producer + * @clk: clock source + * + * This function allows drivers to release the exclusivity it previously got + * from clk_rate_exclusive_get() + * + * The caller must balance the number of clk_rate_exclusive_get() and + * clk_rate_exclusive_put() calls. + * + * Must not be called from within atomic context. + */ +void clk_rate_exclusive_put(struct clk *clk); + #else
static inline int clk_notifier_register(struct clk *clk, @@ -236,6 +269,13 @@ static inline bool clk_is_match(const struct clk *p, const struct clk *q) return p == q; }
+static inline int clk_rate_exclusive_get(struct clk *clk) +{ + return 0; +} + +static inline void clk_rate_exclusive_put(struct clk *clk) {} + #endif
#ifdef CONFIG_HAVE_CLK_PREPARE @@ -583,38 +623,6 @@ struct clk *devm_clk_get_optional_enabled(struct device *dev, const char *id); */ struct clk *devm_get_clk_from_child(struct device *dev, struct device_node *np, const char *con_id); -/** - * clk_rate_exclusive_get - get exclusivity over the rate control of a - * producer - * @clk: clock source - * - * This function allows drivers to get exclusive control over the rate of a - * provider. It prevents any other consumer to execute, even indirectly, - * opereation which could alter the rate of the provider or cause glitches - * - * If exlusivity is claimed more than once on clock, even by the same driver, - * the rate effectively gets locked as exclusivity can't be preempted. - * - * Must not be called from within atomic context. - * - * Returns success (0) or negative errno. - */ -int clk_rate_exclusive_get(struct clk *clk); - -/** - * clk_rate_exclusive_put - release exclusivity over the rate control of a - * producer - * @clk: clock source - * - * This function allows drivers to release the exclusivity it previously got - * from clk_rate_exclusive_get() - * - * The caller must balance the number of clk_rate_exclusive_get() and - * clk_rate_exclusive_put() calls. - * - * Must not be called from within atomic context. - */ -void clk_rate_exclusive_put(struct clk *clk);
/** * clk_enable - inform the system when the clock source should be running. @@ -974,14 +982,6 @@ static inline void clk_bulk_put_all(int num_clks, struct clk_bulk_data *clks) {}
static inline void devm_clk_put(struct device *dev, struct clk *clk) {}
- -static inline int clk_rate_exclusive_get(struct clk *clk) -{ - return 0; -} - -static inline void clk_rate_exclusive_put(struct clk *clk) {} - static inline int clk_enable(struct clk *clk) { return 0;
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chao Song chao.song@linux.intel.com
[ Upstream commit 2d218b45848b92b03b220bf4d9bef29f058f866f ]
The call to snd_sof_find_spcm_dai() could return NULL; add a NULL check for the return value to avoid a NULL pointer dereference.
Fixes: 7cb19007baba ("ASoC: SOF: ipc4-pcm: add hw_params") Signed-off-by: Chao Song chao.song@linux.intel.com Reviewed-by: Bard Liao yung-chuan.liao@linux.intel.com Reviewed-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Signed-off-by: Peter Ujfalusi peter.ujfalusi@linux.intel.com Link: https://lore.kernel.org/r/20230816133311.7523-1-peter.ujfalusi@linux.intel.c... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/sof/ipc4-pcm.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/sound/soc/sof/ipc4-pcm.c b/sound/soc/sof/ipc4-pcm.c index 9e2b6c45080dd..49eb98605518a 100644 --- a/sound/soc/sof/ipc4-pcm.c +++ b/sound/soc/sof/ipc4-pcm.c @@ -708,6 +708,9 @@ static int sof_ipc4_pcm_hw_params(struct snd_soc_component *component, struct snd_sof_pcm *spcm;
spcm = snd_sof_find_spcm_dai(component, rtd); + if (!spcm) + return -EINVAL; + time_info = spcm->stream[substream->stream].private; /* delay calculation is not supported by current fw_reg ABI */ if (!time_info)
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Strozek mstrozek@opensource.cirrus.com
[ Upstream commit 897a6b5a030e62c21566551c870d81740f82ca13 ]
Use a device property "cirrus,firmware-uid" to get the unique firmware identifier instead of using ACPI _SUB. There aren't any products that use _SUB.
There will not usually be a _SUB in Soundwire nodes. The ACPI can use a _DSD section for custom properties.
There is also a need to support instantiating this driver using software nodes. This is for systems where the CS35L56 is a back-end device and the ACPI refers only to the front-end audio device - there will not be any ACPI references to CS35L56.
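As a purely illustrative sketch (the property value and variable names below are made up), a parent machine driver could provide that identifier through a software node, which device_property_read_string() will then pick up:

#include <linux/property.h>

static const struct property_entry cs35l56_props[] = {
	PROPERTY_ENTRY_STRING("cirrus,firmware-uid", "EXAMPLEUID"),   /* value is illustrative */
	{ }
};

static const struct software_node cs35l56_swnode = {
	.properties = cs35l56_props,
};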
Fixes: e49611252900 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56") Signed-off-by: Maciej Strozek mstrozek@opensource.cirrus.com Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Link: https://lore.kernel.org/r/20230817112712.16637-2-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs35l56.c | 31 ++++++++++++------------------- 1 file changed, 12 insertions(+), 19 deletions(-)
diff --git a/sound/soc/codecs/cs35l56.c b/sound/soc/codecs/cs35l56.c index f3fee448d759e..6a2b0797f3c7d 100644 --- a/sound/soc/codecs/cs35l56.c +++ b/sound/soc/codecs/cs35l56.c @@ -5,7 +5,6 @@ // Copyright (C) 2023 Cirrus Logic, Inc. and // Cirrus Logic International Semiconductor Ltd.
-#include <linux/acpi.h> #include <linux/completion.h> #include <linux/debugfs.h> #include <linux/delay.h> @@ -1327,26 +1326,22 @@ static int cs35l56_dsp_init(struct cs35l56_private *cs35l56) return 0; }
-static int cs35l56_acpi_get_name(struct cs35l56_private *cs35l56) +static int cs35l56_get_firmware_uid(struct cs35l56_private *cs35l56) { - acpi_handle handle = ACPI_HANDLE(cs35l56->dev); - const char *sub; + struct device *dev = cs35l56->dev; + const char *prop; + int ret;
- /* If there is no ACPI_HANDLE, there is no ACPI for this system, return 0 */ - if (!handle) + ret = device_property_read_string(dev, "cirrus,firmware-uid", &prop); + /* If bad sw node property, return 0 and fallback to legacy firmware path */ + if (ret < 0) return 0;
- sub = acpi_get_subsystem_id(handle); - if (IS_ERR(sub)) { - /* If bad ACPI, return 0 and fallback to legacy firmware path, otherwise fail */ - if (PTR_ERR(sub) == -ENODATA) - return 0; - else - return PTR_ERR(sub); - } + cs35l56->dsp.system_name = devm_kstrdup(dev, prop, GFP_KERNEL); + if (cs35l56->dsp.system_name == NULL) + return -ENOMEM;
- cs35l56->dsp.system_name = sub; - dev_dbg(cs35l56->dev, "Subsystem ID: %s\n", cs35l56->dsp.system_name); + dev_dbg(dev, "Firmware UID: %s\n", cs35l56->dsp.system_name);
return 0; } @@ -1390,7 +1385,7 @@ int cs35l56_common_probe(struct cs35l56_private *cs35l56) gpiod_set_value_cansleep(cs35l56->reset_gpio, 1); }
- ret = cs35l56_acpi_get_name(cs35l56); + ret = cs35l56_get_firmware_uid(cs35l56); if (ret != 0) goto err;
@@ -1577,8 +1572,6 @@ void cs35l56_remove(struct cs35l56_private *cs35l56)
regcache_cache_only(cs35l56->regmap, true);
- kfree(cs35l56->dsp.system_name); - gpiod_set_value_cansleep(cs35l56->reset_gpio, 0); regulator_bulk_disable(ARRAY_SIZE(cs35l56->supplies), cs35l56->supplies); }
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Biju Das biju.das.jz@bp.renesas.com
[ Upstream commit 661efa2284bbc2338da0424e219603f034072c74 ]
Fix the below random NULL pointer crash during boot by serializing pinctrl group and function creation/remove calls in rzg2l_dt_subnode_to_map() with a mutex.
Crash log:
pc : __pi_strcmp+0x20/0x140
lr : pinmux_func_name_to_selector+0x68/0xa4
Call trace:
 __pi_strcmp+0x20/0x140
 pinmux_generic_add_function+0x34/0xcc
 rzg2l_dt_subnode_to_map+0x314/0x44c
 rzg2l_dt_node_to_map+0x164/0x194
 pinctrl_dt_to_map+0x218/0x37c
 create_pinctrl+0x70/0x3d8
While at it, add comments for bitmap_lock and lock.
Fixes: c4c4637eb57f ("pinctrl: renesas: Add RZ/G2L pin and gpio controller driver") Tested-by: Chris Paterson Chris.Paterson2@renesas.com Signed-off-by: Biju Das biju.das.jz@bp.renesas.com Reviewed-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://lore.kernel.org/r/20230815131558.33787-2-biju.das.jz@bp.renesas.com Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pinctrl/renesas/pinctrl-rzg2l.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/pinctrl/renesas/pinctrl-rzg2l.c b/drivers/pinctrl/renesas/pinctrl-rzg2l.c index b53d26167da52..6e8a76556e238 100644 --- a/drivers/pinctrl/renesas/pinctrl-rzg2l.c +++ b/drivers/pinctrl/renesas/pinctrl-rzg2l.c @@ -11,6 +11,7 @@ #include <linux/interrupt.h> #include <linux/io.h> #include <linux/module.h> +#include <linux/mutex.h> #include <linux/of_device.h> #include <linux/of_irq.h> #include <linux/seq_file.h> @@ -149,10 +150,11 @@ struct rzg2l_pinctrl { struct gpio_chip gpio_chip; struct pinctrl_gpio_range gpio_range; DECLARE_BITMAP(tint_slot, RZG2L_TINT_MAX_INTERRUPT); - spinlock_t bitmap_lock; + spinlock_t bitmap_lock; /* protect tint_slot bitmap */ unsigned int hwirq[RZG2L_TINT_MAX_INTERRUPT];
- spinlock_t lock; + spinlock_t lock; /* lock read/write registers */ + struct mutex mutex; /* serialize adding groups and functions */ };
static const unsigned int iolh_groupa_mA[] = { 2, 4, 8, 12 }; @@ -362,11 +364,13 @@ static int rzg2l_dt_subnode_to_map(struct pinctrl_dev *pctldev, name = np->name; }
+ mutex_lock(&pctrl->mutex); + /* Register a single pin group listing all the pins we read from DT */ gsel = pinctrl_generic_add_group(pctldev, name, pins, num_pinmux, NULL); if (gsel < 0) { ret = gsel; - goto done; + goto unlock; }
/* @@ -380,6 +384,8 @@ static int rzg2l_dt_subnode_to_map(struct pinctrl_dev *pctldev, goto remove_group; }
+ mutex_unlock(&pctrl->mutex); + maps[idx].type = PIN_MAP_TYPE_MUX_GROUP; maps[idx].data.mux.group = name; maps[idx].data.mux.function = name; @@ -391,6 +397,8 @@ static int rzg2l_dt_subnode_to_map(struct pinctrl_dev *pctldev,
remove_group: pinctrl_generic_remove_group(pctldev, gsel); +unlock: + mutex_unlock(&pctrl->mutex); done: *index = idx; kfree(configs); @@ -1509,6 +1517,7 @@ static int rzg2l_pinctrl_probe(struct platform_device *pdev)
spin_lock_init(&pctrl->lock); spin_lock_init(&pctrl->bitmap_lock); + mutex_init(&pctrl->mutex);
platform_set_drvdata(pdev, pctrl);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Biju Das biju.das.jz@bp.renesas.com
[ Upstream commit f982b9d57e7f834138fc908804fe66f646f2b108 ]
Fix the below random NULL pointer crash during boot by serializing pinctrl group and function creation/remove calls in rzv2m_dt_subnode_to_map() with a mutex.
Crash logs:
pc : __pi_strcmp+0x20/0x140
lr : pinmux_func_name_to_selector+0x68/0xa4
Call trace:
 __pi_strcmp+0x20/0x140
 pinmux_generic_add_function+0x34/0xcc
 rzv2m_dt_subnode_to_map+0x2e4/0x418
 rzv2m_dt_node_to_map+0x15c/0x18c
 pinctrl_dt_to_map+0x218/0x37c
 create_pinctrl+0x70/0x3d8
While at it, add a comment for lock.
Fixes: 92a9b8252576 ("pinctrl: renesas: Add RZ/V2M pin and gpio controller driver") Signed-off-by: Biju Das biju.das.jz@bp.renesas.com Reviewed-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://lore.kernel.org/r/20230815131558.33787-3-biju.das.jz@bp.renesas.com Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pinctrl/renesas/pinctrl-rzv2m.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/pinctrl/renesas/pinctrl-rzv2m.c b/drivers/pinctrl/renesas/pinctrl-rzv2m.c index 35b23c1a5684d..9146101ea9e2f 100644 --- a/drivers/pinctrl/renesas/pinctrl-rzv2m.c +++ b/drivers/pinctrl/renesas/pinctrl-rzv2m.c @@ -14,6 +14,7 @@ #include <linux/gpio/driver.h> #include <linux/io.h> #include <linux/module.h> +#include <linux/mutex.h> #include <linux/of_device.h> #include <linux/spinlock.h>
@@ -123,7 +124,8 @@ struct rzv2m_pinctrl { struct gpio_chip gpio_chip; struct pinctrl_gpio_range gpio_range;
- spinlock_t lock; + spinlock_t lock; /* lock read/write registers */ + struct mutex mutex; /* serialize adding groups and functions */ };
static const unsigned int drv_1_8V_group2_uA[] = { 1800, 3800, 7800, 11000 }; @@ -322,11 +324,13 @@ static int rzv2m_dt_subnode_to_map(struct pinctrl_dev *pctldev, name = np->name; }
+ mutex_lock(&pctrl->mutex); + /* Register a single pin group listing all the pins we read from DT */ gsel = pinctrl_generic_add_group(pctldev, name, pins, num_pinmux, NULL); if (gsel < 0) { ret = gsel; - goto done; + goto unlock; }
/* @@ -340,6 +344,8 @@ static int rzv2m_dt_subnode_to_map(struct pinctrl_dev *pctldev, goto remove_group; }
+ mutex_unlock(&pctrl->mutex); + maps[idx].type = PIN_MAP_TYPE_MUX_GROUP; maps[idx].data.mux.group = name; maps[idx].data.mux.function = name; @@ -351,6 +357,8 @@ static int rzv2m_dt_subnode_to_map(struct pinctrl_dev *pctldev,
remove_group: pinctrl_generic_remove_group(pctldev, gsel); +unlock: + mutex_unlock(&pctrl->mutex); done: *index = idx; kfree(configs); @@ -1071,6 +1079,7 @@ static int rzv2m_pinctrl_probe(struct platform_device *pdev) }
spin_lock_init(&pctrl->lock); + mutex_init(&pctrl->mutex);
platform_set_drvdata(pdev, pctrl);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Biju Das biju.das.jz@bp.renesas.com
[ Upstream commit 8fcc1c40b747069644db6102c1d84c942c9d4d86 ]
The pinctrl group and function creation/remove calls expect the caller to take care of locking. Add a lock around these calls.
Fixes: b59d0e782706 ("pinctrl: Add RZ/A2 pin and gpio controller") Signed-off-by: Biju Das biju.das.jz@bp.renesas.com Reviewed-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://lore.kernel.org/r/20230815131558.33787-4-biju.das.jz@bp.renesas.com Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pinctrl/renesas/pinctrl-rza2.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/drivers/pinctrl/renesas/pinctrl-rza2.c b/drivers/pinctrl/renesas/pinctrl-rza2.c index 40b1326a10776..5591ddf16fdfd 100644 --- a/drivers/pinctrl/renesas/pinctrl-rza2.c +++ b/drivers/pinctrl/renesas/pinctrl-rza2.c @@ -14,6 +14,7 @@ #include <linux/gpio/driver.h> #include <linux/io.h> #include <linux/module.h> +#include <linux/mutex.h> #include <linux/of_device.h> #include <linux/pinctrl/pinmux.h>
@@ -46,6 +47,7 @@ struct rza2_pinctrl_priv { struct pinctrl_dev *pctl; struct pinctrl_gpio_range gpio_range; int npins; + struct mutex mutex; /* serialize adding groups and functions */ };
#define RZA2_PDR(port) (0x0000 + (port) * 2) /* Direction 16-bit */ @@ -358,10 +360,14 @@ static int rza2_dt_node_to_map(struct pinctrl_dev *pctldev, psel_val[i] = MUX_FUNC(value); }
+ mutex_lock(&priv->mutex); + /* Register a single pin group listing all the pins we read from DT */ gsel = pinctrl_generic_add_group(pctldev, np->name, pins, npins, NULL); - if (gsel < 0) - return gsel; + if (gsel < 0) { + ret = gsel; + goto unlock; + }
/* * Register a single group function where the 'data' is an array PSEL @@ -390,6 +396,8 @@ static int rza2_dt_node_to_map(struct pinctrl_dev *pctldev, (*map)->data.mux.function = np->name; *num_maps = 1;
+ mutex_unlock(&priv->mutex); + return 0;
remove_function: @@ -398,6 +406,9 @@ static int rza2_dt_node_to_map(struct pinctrl_dev *pctldev, remove_group: pinctrl_generic_remove_group(pctldev, gsel);
+unlock: + mutex_unlock(&priv->mutex); + dev_err(priv->dev, "Unable to parse DT node %s\n", np->name);
return ret; @@ -473,6 +484,8 @@ static int rza2_pinctrl_probe(struct platform_device *pdev) if (IS_ERR(priv->base)) return PTR_ERR(priv->base);
+ mutex_init(&priv->mutex); + platform_set_drvdata(pdev, priv);
priv->npins = (int)(uintptr_t)of_device_get_match_data(&pdev->dev) *
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rob Clark robdclark@chromium.org
[ Upstream commit e531fdb5cd5ee2564b7fe10c8a9219e2b2fac61e ]
If a signal callback releases the sw_sync fence, that will trigger a deadlock, as timeline_fence_release recurses onto the fence->lock (used both for signaling and for the timeline tree).
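As a rough sketch of how the recursion can be triggered (this is illustrative user-of-the-API code, not part of the fix below; the callback name is made up and it is assumed to drop the last fence reference):

#include <linux/dma-fence.h>

static void example_signal_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
{
        /*
         * Runs from dma_fence_signal_locked() with fence->lock held by the
         * signalling path. If this put drops the last reference, the fence
         * release handler tries to take the same lock again.
         */
        dma_fence_put(fence);
}

/* Registered earlier with: dma_fence_add_callback(fence, &cb, example_signal_cb); */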
To avoid that, temporarily hold an extra reference to the signalled fences until after we drop the lock.
(This is an alternative implementation of https://patchwork.kernel.org/patch/11664717/ which avoids some potential UAF issues with the original patch.)
v2: Remove now obsolete comment, use list_move_tail() and list_del_init()
Reported-by: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl Fixes: d3c6dd1fb30d ("dma-buf/sw_sync: Synchronize signal vs syncpt free") Signed-off-by: Rob Clark robdclark@chromium.org Link: https://patchwork.freedesktop.org/patch/msgid/20230818145939.39697-1-robdcla... Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Christian König christian.koenig@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma-buf/sw_sync.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c index 348b3a9170fa4..7f5ed1aa7a9f8 100644 --- a/drivers/dma-buf/sw_sync.c +++ b/drivers/dma-buf/sw_sync.c @@ -191,6 +191,7 @@ static const struct dma_fence_ops timeline_fence_ops = { */ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc) { + LIST_HEAD(signalled); struct sync_pt *pt, *next;
trace_sync_timeline(obj); @@ -203,21 +204,20 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc) if (!timeline_fence_signaled(&pt->base)) break;
- list_del_init(&pt->link); + dma_fence_get(&pt->base); + + list_move_tail(&pt->link, &signalled); rb_erase(&pt->node, &obj->pt_tree);
- /* - * A signal callback may release the last reference to this - * fence, causing it to be freed. That operation has to be - * last to avoid a use after free inside this loop, and must - * be after we remove the fence from the timeline in order to - * prevent deadlocking on timeline->lock inside - * timeline_fence_release(). - */ dma_fence_signal_locked(&pt->base); }
spin_unlock_irq(&obj->lock); + + list_for_each_entry_safe(pt, next, &signalled, link) { + list_del_init(&pt->link); + dma_fence_put(&pt->base); + } }
/**
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bartosz Golaszewski bartosz.golaszewski@linaro.org
[ Upstream commit ab4109f91b328ff5cb5e1279f64d443241add2d1 ]
If a GPIO simulator device is unbound with interrupts still requested, we will hit a use-after-free issue in __irq_domain_deactivate_irq(). The owner of the irq domain must dispose of all mappings before destroying the domain object.
Fixes: cb8c474e79be ("gpio: sim: new testing module") Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-sim.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/drivers/gpio/gpio-sim.c b/drivers/gpio/gpio-sim.c index f1f6f1c329877..8fb11a5395eb8 100644 --- a/drivers/gpio/gpio-sim.c +++ b/drivers/gpio/gpio-sim.c @@ -291,6 +291,15 @@ static void gpio_sim_mutex_destroy(void *data) mutex_destroy(lock); }
+static void gpio_sim_dispose_mappings(void *data) +{ + struct gpio_sim_chip *chip = data; + unsigned int i; + + for (i = 0; i < chip->gc.ngpio; i++) + irq_dispose_mapping(irq_find_mapping(chip->irq_sim, i)); +} + static void gpio_sim_sysfs_remove(void *data) { struct gpio_sim_chip *chip = data; @@ -406,6 +415,10 @@ static int gpio_sim_add_bank(struct fwnode_handle *swnode, struct device *dev) if (IS_ERR(chip->irq_sim)) return PTR_ERR(chip->irq_sim);
+ ret = devm_add_action_or_reset(dev, gpio_sim_dispose_mappings, chip); + if (ret) + return ret; + mutex_init(&chip->lock); ret = devm_add_action_or_reset(dev, gpio_sim_mutex_destroy, &chip->lock);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bartosz Golaszewski bartosz.golaszewski@linaro.org
[ Upstream commit 6e39c1ac688161b4db3617aabbca589b395242bc ]
Associate the swnode of the GPIO device (which is the interrupt controller here) with the irq domain. Otherwise the interrupt-controller device attribute is a no-op.
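A minimal sketch of why the association matters, under the assumption that a consumer wants to locate the simulated domain through the standard fwnode-based lookup (the function name is illustrative, not gpio-sim code):

#include <linux/irqdomain.h>

static struct irq_domain *example_find_sim_domain(struct fwnode_handle *swnode)
{
        /* Finds nothing unless the domain was created with this fwnode. */
        return irq_find_matching_fwnode(swnode, DOMAIN_BUS_ANY);
}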
Fixes: cb8c474e79be ("gpio: sim: new testing module") Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-sim.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpio/gpio-sim.c b/drivers/gpio/gpio-sim.c index 8fb11a5395eb8..533d815725794 100644 --- a/drivers/gpio/gpio-sim.c +++ b/drivers/gpio/gpio-sim.c @@ -411,7 +411,7 @@ static int gpio_sim_add_bank(struct fwnode_handle *swnode, struct device *dev) if (!chip->pull_map) return -ENOMEM;
- chip->irq_sim = devm_irq_domain_create_sim(dev, NULL, num_lines); + chip->irq_sim = devm_irq_domain_create_sim(dev, swnode, num_lines); if (IS_ERR(chip->irq_sim)) return PTR_ERR(chip->irq_sim);
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
[ Upstream commit c008323fe361bd62a43d9fb29737dacd5c067fb7 ]
Lenovo 82SJ doesn't have DMIC connected like 82V2 does. Narrow the match down to only cover 82V2.
Reported-by: prosenfeld@Yuhsbstudents.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217063 Fixes: 2232b2dd8cd4 ("ASoC: amd: yc: Add Lenovo Yoga Slim 7 Pro X to quirks table") Signed-off-by: Mario Limonciello mario.limonciello@amd.com Link: https://lore.kernel.org/r/20230824011149.1395-1-mario.limonciello@amd.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/amd/yc/acp6x-mach.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/amd/yc/acp6x-mach.c b/sound/soc/amd/yc/acp6x-mach.c index d80adbea05219..5310ba0734b14 100644 --- a/sound/soc/amd/yc/acp6x-mach.c +++ b/sound/soc/amd/yc/acp6x-mach.c @@ -217,7 +217,7 @@ static const struct dmi_system_id yc_acp_quirk_table[] = { .driver_data = &acp6x_card, .matches = { DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"), - DMI_MATCH(DMI_PRODUCT_NAME, "82"), + DMI_MATCH(DMI_PRODUCT_NAME, "82V2"), } }, {
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Liam R. Howlett Liam.Howlett@oracle.com
[ Upstream commit cfeb6ae8bcb96ccf674724f223661bbcef7b0d0b ]
The current implementation of append may cause duplicate data and/or incorrect ranges to be returned to a reader during an update. Although this has not been reported or seen, disable the append write operation while the tree is in rcu mode out of an abundance of caution.
During the analysis of mas_next_slot(), the following interleaving was artificially created by separating the writer and reader code:
Writer (mas_wr_append):
    set end pivot
    updates end metadata
    detects write to last slot
    last slot write is to start of slot
    store current contents in slot
    overwrite old end pivot
Reader (mas_next_slot()):
    read end metadata
    read old end pivot
    return with incorrect range
Writer:
    store new value
Alternatively:
Writer (mas_wr_append):
    set end pivot
    updates end metadata
    detects write to last slot
    last slot write is to end of slot
    store value
Reader (mas_next_slot()):
    read end metadata
    read old end pivot
    read new end pivot
    return with incorrect range
Writer:
    set old end pivot
There may be other accesses that are not safe since we are now updating both metadata and pointers, so disabling append if there could be rcu readers is the safest action.
Link: https://lkml.kernel.org/r/20230819004356.1454718-2-Liam.Howlett@oracle.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett Liam.Howlett@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- lib/maple_tree.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c index bb28a49d173c0..3315eaf93f563 100644 --- a/lib/maple_tree.c +++ b/lib/maple_tree.c @@ -4315,6 +4315,9 @@ static inline bool mas_wr_append(struct ma_wr_state *wr_mas) struct ma_state *mas = wr_mas->mas; unsigned char node_pivots = mt_pivots[wr_mas->type];
+ if (mt_in_rcu(mas->tree)) + return false; + if ((mas->index != wr_mas->r_min) && (mas->last == wr_mas->r_max)) { if (new_end < node_pivots) wr_mas->pivots[new_end] = wr_mas->pivots[end];
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
commit fd0a7ec379dbf21b7bfd81914381ae5281706ef5 upstream.
The vangogh driver just gained a link time dependency that now causes randconfig builds to fail:
x86_64-linux-ld: sound/soc/amd/vangogh/pci-acp5x.o: in function `snd_acp5x_probe':
pci-acp5x.c:(.text+0xbb): undefined reference to `snd_amd_acp_find_config'
Fixes: e89f45edb747e ("ASoC: amd: vangogh: Add check for acp config flags in vangogh platform") Signed-off-by: Arnd Bergmann arnd@arndb.de Link: https://lore.kernel.org/r/20230605085839.2157268-1-arnd@kernel.org Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/amd/Kconfig | 1 + 1 file changed, 1 insertion(+)
--- a/sound/soc/amd/Kconfig +++ b/sound/soc/amd/Kconfig @@ -71,6 +71,7 @@ config SND_SOC_AMD_RENOIR_MACH config SND_SOC_AMD_ACP5x tristate "AMD Audio Coprocessor-v5.x I2S support" depends on X86 && PCI + select SND_AMD_ACP_CONFIG help This option enables ACP v5.x support on AMD platform
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Günther Noack gnoack3000@gmail.com
commit 3f29d9ee323ae5cda59d144d1f8b0b10ea065be0 upstream.
Clarifies that the LEGACY_TIOCSTI setting is safe to turn off even when running BRLTTY, thanks to the CAP_SYS_ADMIN exception introduced in commit 690c8b804ad2 ("TIOCSTI: always enable for CAP_SYS_ADMIN").
Signed-off-by: Günther Noack gnoack3000@gmail.com Reviewed-by: Samuel Thibault samuel.thibault@ens-lyon.org Link: https://lore.kernel.org/r/20230808201115.23993-1-gnoack3000@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/Kconfig | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/tty/Kconfig +++ b/drivers/tty/Kconfig @@ -164,6 +164,9 @@ config LEGACY_TIOCSTI userspace depends on this functionality to continue operating normally.
+ Processes which run with CAP_SYS_ADMIN, such as BRLTTY, can + use TIOCSTI even when this is set to N. + This functionality can be changed at runtime with the dev.tty.legacy_tiocsti sysctl. This configuration option sets the default value of the sysctl.
6.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
commit 08713cb006b6f07434f276c5ee214fb20c7fd965 upstream.
Jakub Kicinski says:
We've got some new kdoc warnings here:

net/netfilter/nft_set_pipapo.c:1557: warning: Function parameter or member '_set' not described in 'pipapo_gc'
net/netfilter/nft_set_pipapo.c:1557: warning: Excess function parameter 'set' description in 'pipapo_gc'
include/net/netfilter/nf_tables.h:577: warning: Function parameter or member 'dead' not described in 'nft_set'
Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Reported-by: Jakub Kicinski kuba@kernel.org Closes: https://lore.kernel.org/netdev/20230810104638.746e46f1@kernel.org/ Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/netfilter/nf_tables.h | 1 + net/netfilter/nft_set_pipapo.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-)
--- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -534,6 +534,7 @@ struct nft_set_elem_expr { * @expr: stateful expression * @ops: set ops * @flags: set flags + * @dead: set will be freed, never cleared * @genmask: generation mask * @klen: key length * @dlen: data length --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -1550,7 +1550,7 @@ static void nft_pipapo_gc_deactivate(str
/** * pipapo_gc() - Drop expired entries from set, destroy start and end elements - * @set: nftables API set representation + * @_set: nftables API set representation * @m: Matching data */ static void pipapo_gc(const struct nft_set *_set, struct nft_pipapo_match *m)
On Mon, Aug 28, 2023 at 12:11:19PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.13 release. There are 129 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 30 Aug 2023 10:11:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.13-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
thanks,
greg k-h
Tested rc1 against the Fedora build system (aarch64, ppc64le, s390x, x86_64), and boot tested x86_64. No regressions noted.
Tested-by: Justin M. Forbes jforbes@fedoraproject.org
On Mon, 28 Aug 2023 at 15:47, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.4.13 release. There are 129 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 30 Aug 2023 10:11:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.13-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build
* kernel: 6.4.13-rc1
* git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
* git branch: linux-6.4.y
* git commit: 8c2717f278c54f64673603d8157130144141d4c7
* git describe: v6.4.12-130-g8c2717f278c5
* test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.4.y/build/v6.4.12...
## Test Regressions (compared to v6.4.12)
## Metric Regressions (compared to v6.4.12)
## Test Fixes (compared to v6.4.12)
## Metric Fixes (compared to v6.4.12)
## Test result summary
total: 157320, pass: 136320, fail: 1847, skip: 18983, xfail: 170
## Build Summary
* arc: 5 total, 5 passed, 0 failed
* arm: 141 total, 139 passed, 2 failed
* arm64: 53 total, 50 passed, 3 failed
* i386: 41 total, 39 passed, 2 failed
* mips: 29 total, 27 passed, 2 failed
* parisc: 4 total, 4 passed, 0 failed
* powerpc: 36 total, 34 passed, 2 failed
* riscv: 26 total, 23 passed, 3 failed
* s390: 16 total, 14 passed, 2 failed
* sh: 14 total, 12 passed, 2 failed
* sparc: 8 total, 8 passed, 0 failed
* x86_64: 44 total, 40 passed, 4 failed
## Test suites summary
* boot
* kselftest-android
* kselftest-arm64
* kselftest-breakpoints
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-drivers-dma-buf
* kselftest-efivarfs
* kselftest-exec
* kselftest-filesystems
* kselftest-filesystems-binderfs
* kselftest-filesystems-epoll
* kselftest-firmware
* kselftest-fpu
* kselftest-ftrace
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-ir
* kselftest-kcmp
* kselftest-kexec
* kselftest-kvm
* kselftest-lib
* kselftest-livepatch
* kselftest-membarrier
* kselftest-memfd
* kselftest-memory-hotplug
* kselftest-mincore
* kselftest-mount
* kselftest-mqueue
* kselftest-net
* kselftest-net-forwarding
* kselftest-net-mptcp
* kselftest-netfilter
* kselftest-nsfs
* kselftest-openat2
* kselftest-pid_namespace
* kselftest-pidfd
* kselftest-proc
* kselftest-pstore
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-splice
* kselftest-static_keys
* kselftest-sync
* kselftest-sysctl
* kselftest-tc-testing
* kselftest-timens
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user
* kselftest-user_events
* kselftest-vDSO
* kselftest-vm
* kselftest-watchdog
* kselftest-x86
* kselftest-zram
* kunit
* kvm-unit-tests
* libgpiod
* libhugetlbfs
* log-parser-boot
* log-parser-test
* ltp-cap_bounds
* ltp-commands
* ltp-containers
* ltp-controllers
* ltp-cpuhotplug
* ltp-crypto
* ltp-cve
* ltp-dio
* ltp-fcntl-locktests
* ltp-filecaps
* ltp-fs
* ltp-fs_bind
* ltp-fs_perms_simple
* ltp-fsx
* ltp-hugetlb
* ltp-io
* ltp-ipc
* ltp-math
* ltp-mm
* ltp-nptl
* ltp-pty
* ltp-sched
* ltp-securebits
* ltp-smoke
* ltp-syscalls
* ltp-tracing
* network-basic-tests
* perf
* rcutorture
* v4l2-compliance
-- Linaro LKFT https://lkft.linaro.org
On 8/28/23 3:11 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.13 release. There are 129 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 30 Aug 2023 10:11:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.13-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
On Mon, Aug 28, 2023 at 12:11:19PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.13 release. There are 129 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 30 Aug 2023 10:11:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.13-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
For RCU, Tested-by: Joel Fernandes (Google) joel@joelfernandes.org
thanks,
- Joel
thanks,
greg k-h
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.4.13-rc1
Florian Westphal fw@strlen.de netfilter: nf_tables: fix kdoc warnings after gc rework
Günther Noack gnoack3000@gmail.com TIOCSTI: Document CAP_SYS_ADMIN behaviour in Kconfig
Arnd Bergmann arnd@arndb.de ASoC: amd: vangogh: select CONFIG_SND_AMD_ACP_CONFIG
Liam R. Howlett Liam.Howlett@oracle.com maple_tree: disable mas_wr_append() when other readers are possible
Mario Limonciello mario.limonciello@amd.com ASoC: amd: yc: Fix a non-functional mic on Lenovo 82SJ
Bartosz Golaszewski bartosz.golaszewski@linaro.org gpio: sim: pass the GPIO device's software node to irq domain
Bartosz Golaszewski bartosz.golaszewski@linaro.org gpio: sim: dispose of irq mappings before destroying the irq_sim domain
Rob Clark robdclark@chromium.org dma-buf/sw_sync: Avoid recursive lock during fence signal
Biju Das biju.das.jz@bp.renesas.com pinctrl: renesas: rza2: Add lock around pinctrl_generic{{add,remove}_group,{add,remove}_function}
Biju Das biju.das.jz@bp.renesas.com pinctrl: renesas: rzv2m: Fix NULL pointer dereference in rzv2m_dt_subnode_to_map()
Biju Das biju.das.jz@bp.renesas.com pinctrl: renesas: rzg2l: Fix NULL pointer dereference in rzg2l_dt_subnode_to_map()
Maciej Strozek mstrozek@opensource.cirrus.com ASoC: cs35l56: Read firmware uuid from a device property instead of _SUB
Chao Song chao.song@linux.intel.com ASoC: SOF: ipc4-pcm: fix possible null pointer deference
Biju Das biju.das.jz@bp.renesas.com clk: Fix undefined reference to `clk_rate_exclusive_{get,put}'
Zhu Wang wangzhu9@huawei.com scsi: core: raid_class: Remove raid_component_add()
Neil Armstrong neil.armstrong@linaro.org scsi: ufs: ufs-qcom: Clear qunipro_g4_sel for HW major version > 5
Zhu Wang wangzhu9@huawei.com scsi: snic: Fix double free in snic_tgt_create()
Yin Fengwei fengwei.yin@intel.com madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check
Yin Fengwei fengwei.yin@intel.com madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check
Matt Roper matthew.d.roper@intel.com drm/i915: Fix error handling if driver creation fails during probe
Oliver Hartkopp socketcan@hartkopp.net can: raw: add missing refcount for memory leak fix
Sanjay R Mehta sanju.mehta@amd.com thunderbolt: Fix Thunderbolt 3 display flickering issue on 2nd hot plug onwards
Igor Mammedov imammedo@redhat.com PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus
Wei Chen harperchen1110@gmail.com media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
Mario Limonciello mario.limonciello@amd.com pinctrl: amd: Mask wake bits on probe again
Rob Herring robh@kernel.org of: dynamic: Refactor action prints to not use "%pOF" inside devtree_lock
Rob Herring robh@kernel.org of: unittest: Fix EXPECT for parse_phandle_with_args_map() test
Arnd Bergmann arnd@arndb.de radix tree: remove unused variable
Mingzheng Xing xingmingzheng@iscas.ac.cn riscv: Fix build errors using binutils2.37 toolchains
Mingzheng Xing xingmingzheng@iscas.ac.cn riscv: Handle zicsr/zifencei issue between gcc and binutils
Helge Deller deller@gmx.de lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels
Hans de Goede hdegoede@redhat.com ACPI: resource: Fix IRQ override quirk for PCSpecialist Elimina Pro 16 M
Sven Eckelmann sven@narfation.org batman-adv: Hold rtnl lock during MTU update via netlink
Remi Pommarel repk@triplefau.lt batman-adv: Fix batadv_v_ogm_aggr_send memory leak
Remi Pommarel repk@triplefau.lt batman-adv: Fix TT global entry leak when client roamed back
Remi Pommarel repk@triplefau.lt batman-adv: Do not get eth header before batadv_check_management_packet
Sven Eckelmann sven@narfation.org batman-adv: Don't increase MTU when set by user
Sven Eckelmann sven@narfation.org batman-adv: Trigger events for auto adjusted MTU
Christian Göttsche cgzones@googlemail.com selinux: set next pointer before attaching to list
Benjamin Coddington bcodding@redhat.com nfsd: Fix race to FREE_STATEID and cl_revoked
Trond Myklebust trond.myklebust@hammerspace.com NFS: Fix a use after free in nfs_direct_join_group()
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers()
T.J. Mercier tjmercier@google.com mm: multi-gen LRU: don't spin during memcg release
Miaohe Lin linmiaohe@huawei.com mm: memory-failure: fix unexpected return value in soft_offline_page()
Alexandre Ghiti alexghiti@rivosinc.com mm: add a call to flush_cache_vmap() in vmap_pfn()
Dietmar Eggemann dietmar.eggemann@arm.com cgroup/cpuset: Free DL BW in case can_attach() fails
Dietmar Eggemann dietmar.eggemann@arm.com sched/deadline: Create DL BW alloc, free & check overflow interface
Juri Lelli juri.lelli@redhat.com cgroup/cpuset: Iterate only if DEADLINE tasks are present
Juri Lelli juri.lelli@redhat.com sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
Juri Lelli juri.lelli@redhat.com sched/cpuset: Bring back cpuset_mutex
Juri Lelli juri.lelli@redhat.com cgroup/cpuset: Rename functions dealing with DEADLINE accounting
Jani Nikula jani.nikula@intel.com drm/i915: fix display probe for IVB Q and IVB D GT2 server
Matt Roper matthew.d.roper@intel.com drm/i915/display: Handle GMD_ID identification in display code
Feng Tang feng.tang@intel.com x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
Rick Edgecombe rick.p.edgecombe@intel.com x86/fpu: Invalidate FPU state correctly on exec()
Huacai Chen chenhuacai@kernel.org LoongArch: Fix hw_breakpoint_control() for watchpoints
Imre Deak imre.deak@intel.com drm/i915: Fix HPD polling, reenabling the output poll work as needed
Ankit Nautiyal ankit.k.nautiyal@intel.com drm/display/dp: Fix the DP DSC Receiver cap size
Anshuman Gupta anshuman.gupta@intel.com drm/i915/dgfx: Enable d3cold at s2idle
David Michael fedora.dm0@gmail.com drm/panfrost: Skip speed binning on EOPNOTSUPP
Imre Deak imre.deak@intel.com drm: Add an HPD poll helper to reschedule the poll work
Zack Rusin zackr@vmware.com drm/vmwgfx: Fix possible invalid drm gem put calls
Zack Rusin zackr@vmware.com drm/vmwgfx: Fix shader stage validation
David Hildenbrand david@redhat.com mm/gup: handle cont-PTE hugetlb pages correctly in gup_must_unshare() via GUP-fast
David Hildenbrand david@redhat.com mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
Suren Baghdasaryan surenb@google.com mm: enable page walking API to lock vmas during the walk
Ayush Jain ayush.jain3@amd.com selftests/mm: FOLL_LONGTERM need to be updated to 0x100
Takashi Iwai tiwai@suse.de ALSA: ymfpci: Fix the missing snd_card_free() call at probe error
Hugh Dickins hughd@google.com shmem: fix smaps BUG sleeping while atomic
Rik van Riel riel@surriel.com mm,ima,kexec,of: use memblock_free_late from ima_free_kexec_buffer
Andrey Skvortsov andrej.skvortzov@gmail.com clk: Fix slab-out-of-bounds error in devm_clk_release()
Benjamin Coddington bcodding@redhat.com NFSv4: Fix dropped lock for racing OPEN and delegation return
André Apitzsch git@apitzsch.eu platform/x86: ideapad-laptop: Add support for new hotkeys found on ThinkBook 14s Yoga ITL
Swapnil Devesh me@sidevesh.com platform/x86: lenovo-ymc: Add Lenovo Yoga 7 14ACN6 to ec_trigger_quirk_dmi_table
Ping-Ke Shih pkshih@realtek.com wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning
Michael Ellerman mpe@ellerman.id.au ibmveth: Use dcbf rather than dcbfl
Srinivas Goud srinivas.goud@amd.com spi: spi-cadence: Fix data corruption issues in slave mode
Charles Keepax ckeepax@opensource.cirrus.com ASoC: cs35l41: Correct amp_gain_tlv values
BrenoRCBrito brenorcbrito@gmail.com ASoC: amd: yc: Add VivoBook Pro 15 to quirks list for acp6x
Hangbin Liu liuhangbin@gmail.com bonding: fix macvlan over alb bond support
Ido Schimmel idosch@nvidia.com rtnetlink: Reject negative ifindexes in RTM_NEWLINK
Florian Westphal fw@strlen.de netfilter: nf_tables: defer gc run if previous batch is still pending
Florian Westphal fw@strlen.de netfilter: nf_tables: fix out of memory error handling
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: use correct lock to protect gc_list
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: GC transaction race with abort path
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: flush pending destroy work before netlink notifier
Florian Westphal fw@strlen.de netfilter: nf_tables: validate all pending tables
Andrii Staikov andrii.staikov@intel.com i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters()
Jamal Hadi Salim jhs@mojatatu.com net/sched: fix a qdisc modification with ambiguous command request
Sasha Neftin sasha.neftin@intel.com igc: Fix the typo in the PTM Control macro
Alessio Igor Bogani alessio.bogani@elettra.eu igb: Avoid starting unnecessary workqueues
Oliver Hartkopp socketcan@hartkopp.net can: isotp: fix support for transmission of SF without flow control
Daniel Golle daniel@makrotopia.org net: ethernet: mtk_eth_soc: fix NULL pointer on hw reset
Kees Cook keescook@chromium.org tg3: Use slab_build_skb() when needed
Hangbin Liu liuhangbin@gmail.com selftests: bonding: do not set port down before adding to bond
Petr Oros poros@redhat.com ice: Fix NULL pointer deref during VF reset
Petr Oros poros@redhat.com Revert "ice: Fix ice VF reset during iavf initialization"
Jesse Brandeburg jesse.brandeburg@intel.com ice: fix receive buffer size miscalculation
Eric Dumazet edumazet@google.com ipv4: fix data-races around inet->inet_id
Jakub Kicinski kuba@kernel.org net: validate veth and vxcan peer ifindexes
Ruan Jinjie ruanjinjie@huawei.com net: bcmgenet: Fix return value check for fixed_phy_register()
Ruan Jinjie ruanjinjie@huawei.com net: bgmac: Fix return value check for fixed_phy_register()
Serge Semin fancer.lancer@gmail.com net: mdio: mdio-bitbang: Fix C45 read/write protocol
Arınç ÜNAL arinc.unal@arinc9.com net: dsa: mt7530: fix handling of 802.1X PAE frames
Ido Schimmel idosch@nvidia.com selftests: mlxsw: Fix test failure on Spectrum-4
Amit Cohen amcohen@nvidia.com mlxsw: Fix the size of 'VIRT_ROUTER_MSB'
Ido Schimmel idosch@nvidia.com mlxsw: reg: Fix SSPR register layout
Danielle Ratson danieller@nvidia.com mlxsw: pci: Set time stamp fields also when its type is MIRROR_UTC
Lu Wei luwei32@huawei.com ipvlan: Fix a reference count leak warning in ipvlan_ns_exit()
Eric Dumazet edumazet@google.com dccp: annotate data-races in dccp_poll()
Eric Dumazet edumazet@google.com sock: annotate data-races around prot->memory_pressure
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: felix: fix oversize frame dropping for always closed tc-taprio gates
Jiri Pirko jiri@resnulli.us devlink: add missing unregister linecard notification
Hariprasad Kelam hkelam@marvell.com octeontx2-af: SDP: fix receive link config
Zheng Yejian zhengyejian1@huawei.com tracing: Fix memleak due to race between current_tracer and trace
Sven Schnelle svens@linux.ibm.com tracing/synthetic: Allocate one additional element for size
Sven Schnelle svens@linux.ibm.com tracing/synthetic: Skip first entry for stack traces
Sven Schnelle svens@linux.ibm.com tracing/synthetic: Use union instead of casts
Zheng Yejian zhengyejian1@huawei.com tracing: Fix cpu buffers unavailable due to 'record_disabled' missed
Randy Dunlap rdunlap@infradead.org wifi: iwlwifi: mvm: add dependency for PTP clock
Eric Dumazet edumazet@google.com can: raw: fix lockdep issue in raw_release()
Ziyang Xuan william.xuanziyang@huawei.com can: raw: fix receiver memory leak
Zhang Yi yi.zhang@huawei.com jbd2: fix a race when checking checkpoint buffer busy
Zhang Yi yi.zhang@huawei.com jbd2: remove journal_clean_one_cp_list()
Zhang Yi yi.zhang@huawei.com jbd2: remove t_checkpoint_io_list
Igor Mammedov imammedo@redhat.com PCI: acpiphp: Reassign resources on bridge if necessary
Chuck Lever chuck.lever@oracle.com xprtrdma: Remap Receive buffers after a reconnect
Fedor Pchelkin pchelkin@ispras.ru NFSv4: fix out path in __nfs4_get_acl_uncached
Fedor Pchelkin pchelkin@ispras.ru NFSv4.2: fix error handling in nfs42_proc_getxattr
Diffstat:
Makefile | 4 +- arch/loongarch/kernel/hw_breakpoint.c | 3 +- arch/powerpc/mm/book3s64/subpage_prot.c | 1 + arch/riscv/Kconfig | 28 ++- arch/riscv/kernel/compat_vdso/Makefile | 8 +- arch/riscv/mm/pageattr.c | 1 + arch/s390/mm/gmap.c | 5 + arch/x86/kernel/fpu/context.h | 3 +- arch/x86/kernel/fpu/core.c | 2 +- arch/x86/kernel/fpu/xstate.c | 7 + drivers/acpi/resource.c | 6 +- drivers/clk/clk-devres.c | 13 +- drivers/dma-buf/sw_sync.c | 18 +- drivers/gpio/gpio-sim.c | 15 +- drivers/gpu/drm/drm_probe_helper.c | 68 ++++-- .../gpu/drm/i915/display/intel_display_device.c | 91 +++++++- .../gpu/drm/i915/display/intel_display_device.h | 5 +- drivers/gpu/drm/i915/display/intel_hotplug.c | 4 +- drivers/gpu/drm/i915/i915_driver.c | 50 +++-- drivers/gpu/drm/i915/intel_device_info.c | 13 +- drivers/gpu/drm/panfrost/panfrost_devfreq.c | 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 6 +- drivers/gpu/drm/vmwgfx/vmwgfx_bo.h | 8 + drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 12 + drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 35 ++- drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 6 +- drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c | 3 +- drivers/gpu/drm/vmwgfx/vmwgfx_shader.c | 3 +- .../platform/mediatek/vcodec/mtk_vcodec_enc.c | 2 + drivers/net/bonding/bond_alb.c | 6 +- drivers/net/can/vxcan.c | 7 +- drivers/net/dsa/mt7530.c | 4 + drivers/net/dsa/mt7530.h | 2 + drivers/net/dsa/ocelot/felix_vsc9959.c | 3 + drivers/net/ethernet/broadcom/bgmac.c | 2 +- drivers/net/ethernet/broadcom/genet/bcmmii.c | 2 +- drivers/net/ethernet/broadcom/tg3.c | 5 +- .../chelsio/inline_crypto/chtls/chtls_cm.c | 2 +- drivers/net/ethernet/ibm/ibmveth.c | 2 +- drivers/net/ethernet/intel/i40e/i40e_main.c | 5 +- drivers/net/ethernet/intel/ice/ice_base.c | 3 +- drivers/net/ethernet/intel/ice/ice_sriov.c | 8 +- drivers/net/ethernet/intel/ice/ice_vf_lib.c | 34 +-- drivers/net/ethernet/intel/ice/ice_vf_lib.h | 1 - drivers/net/ethernet/intel/ice/ice_virtchnl.c | 1 - drivers/net/ethernet/intel/igb/igb_ptp.c | 24 +- drivers/net/ethernet/intel/igc/igc_defines.h | 2 +- .../net/ethernet/marvell/octeontx2/af/rvu_nix.c | 3 +- drivers/net/ethernet/mediatek/mtk_wed.c | 12 +- .../ethernet/mellanox/mlxsw/core_acl_flex_keys.c | 4 +- drivers/net/ethernet/mellanox/mlxsw/pci.c | 8 +- drivers/net/ethernet/mellanox/mlxsw/reg.h | 9 - .../ethernet/mellanox/mlxsw/spectrum2_mr_tcam.c | 2 +- .../mellanox/mlxsw/spectrum_acl_flex_keys.c | 4 +- drivers/net/ipvlan/ipvlan_main.c | 3 +- drivers/net/mdio/mdio-bitbang.c | 4 +- drivers/net/veth.c | 5 +- drivers/net/wireless/intel/iwlwifi/Kconfig | 1 + drivers/of/dynamic.c | 31 +-- drivers/of/kexec.c | 3 +- drivers/of/unittest.c | 4 +- drivers/pci/hotplug/acpiphp_glue.c | 9 +- drivers/pinctrl/pinctrl-amd.c | 30 +++ drivers/pinctrl/renesas/pinctrl-rza2.c | 17 +- drivers/pinctrl/renesas/pinctrl-rzg2l.c | 15 +- drivers/pinctrl/renesas/pinctrl-rzv2m.c | 13 +- drivers/platform/x86/ideapad-laptop.c | 5 + drivers/platform/x86/lenovo-ymc.c | 7 + drivers/scsi/raid_class.c | 48 ---- drivers/scsi/snic/snic_disc.c | 3 +- drivers/spi/spi-cadence.c | 19 +- drivers/thunderbolt/tmu.c | 3 +- drivers/tty/Kconfig | 3 + drivers/ufs/host/ufs-qcom.c | 2 +- fs/jbd2/checkpoint.c | 165 +++++--------- fs/jbd2/commit.c | 3 +- fs/jbd2/transaction.c | 17 +- fs/nfs/direct.c | 26 ++- fs/nfs/nfs42proc.c | 5 +- fs/nfs/nfs4proc.c | 14 +- fs/nfsd/nfs4state.c | 2 +- fs/nilfs2/segment.c | 5 + fs/proc/task_mmu.c | 5 + include/drm/display/drm_dp.h | 2 +- include/drm/drm_probe_helper.h | 1 + include/linux/clk.h | 80 +++---- include/linux/cpuset.h | 12 +- include/linux/jbd2.h | 7 +- 
include/linux/mm.h | 21 +- include/linux/mm_types.h | 9 + include/linux/pagewalk.h | 11 + include/linux/raid_class.h | 4 - include/linux/sched.h | 4 +- include/linux/trace_events.h | 11 + include/net/bonding.h | 11 +- include/net/inet_sock.h | 2 +- include/net/ip.h | 15 +- include/net/mac80211.h | 1 + include/net/netfilter/nf_tables.h | 7 + include/net/rtnetlink.h | 4 +- include/net/sock.h | 7 +- include/trace/events/jbd2.h | 12 +- kernel/cgroup/cgroup.c | 4 + kernel/cgroup/cpuset.c | 244 +++++++++++++-------- kernel/sched/core.c | 41 ++-- kernel/sched/deadline.c | 67 ++++-- kernel/sched/sched.h | 2 +- kernel/trace/trace.c | 15 +- kernel/trace/trace.h | 8 + kernel/trace/trace_events_synth.c | 103 ++++----- kernel/trace/trace_irqsoff.c | 3 +- kernel/trace/trace_sched_wakeup.c | 2 + lib/clz_ctz.c | 32 +-- lib/maple_tree.c | 3 + lib/radix-tree.c | 1 - mm/damon/vaddr.c | 2 + mm/gup.c | 30 ++- mm/hmm.c | 1 + mm/huge_memory.c | 3 +- mm/internal.h | 10 + mm/ksm.c | 25 ++- mm/madvise.c | 11 +- mm/memcontrol.c | 2 + mm/memory-failure.c | 12 +- mm/mempolicy.c | 22 +- mm/migrate_device.c | 1 + mm/mincore.c | 1 + mm/mlock.c | 1 + mm/mprotect.c | 1 + mm/pagewalk.c | 36 ++- mm/shmem.c | 6 +- mm/vmalloc.c | 4 + mm/vmscan.c | 14 +- net/batman-adv/bat_v_elp.c | 3 +- net/batman-adv/bat_v_ogm.c | 7 +- net/batman-adv/hard-interface.c | 14 +- net/batman-adv/netlink.c | 3 + net/batman-adv/soft-interface.c | 3 + net/batman-adv/translation-table.c | 1 - net/batman-adv/types.h | 6 + net/can/isotp.c | 22 +- net/can/raw.c | 77 ++++--- net/core/rtnetlink.c | 25 ++- net/dccp/ipv4.c | 4 +- net/dccp/proto.c | 20 +- net/devlink/leftover.c | 3 + net/ipv4/af_inet.c | 2 +- net/ipv4/datagram.c | 2 +- net/ipv4/tcp_ipv4.c | 4 +- net/mac80211/rx.c | 12 +- net/netfilter/nf_tables_api.c | 23 +- net/netfilter/nft_set_hash.c | 3 + net/netfilter/nft_set_pipapo.c | 15 +- net/netfilter/nft_set_rbtree.c | 3 + net/sched/sch_api.c | 53 +++-- net/sctp/socket.c | 4 +- net/sunrpc/xprtrdma/verbs.c | 9 +- security/selinux/ss/policydb.c | 2 +- sound/pci/ymfpci/ymfpci.c | 10 +- sound/soc/amd/Kconfig | 1 + sound/soc/amd/yc/acp6x-mach.c | 9 +- sound/soc/codecs/cs35l41.c | 2 +- sound/soc/codecs/cs35l56.c | 31 +-- sound/soc/sof/ipc4-pcm.c | 3 + .../drivers/net/bonding/bond-break-lacpdu-tx.sh | 4 +- .../selftests/drivers/net/mlxsw/sharedbuffer.sh | 16 +- tools/testing/selftests/mm/hmm-tests.c | 9 +- 167 files changed, 1438 insertions(+), 939 deletions(-)
Hello Greg,
On Mon, 28 Aug 2023 12:11:19 +0200 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.4.13 release. There are 129 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 30 Aug 2023 10:11:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.13-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
This rc kernel passes DAMON functionality test[1] on my test machine. Attaching the test results summary below. Please note that I retrieved the kernel from linux-stable-rc tree[2].
Tested-by: SeongJae Park sj@kernel.org
[1] https://github.com/awslabs/damon-tests/tree/next/corr
[2] 8c2717f278c5 ("Linux 6.4.13-rc1")
Thanks, SJ
[...]
---
ok 1 selftests: damon: debugfs_attrs.sh
ok 2 selftests: damon: debugfs_schemes.sh
ok 3 selftests: damon: debugfs_target_ids.sh
ok 4 selftests: damon: debugfs_empty_targets.sh
ok 5 selftests: damon: debugfs_huge_count_read_write.sh
ok 6 selftests: damon: debugfs_duplicate_context_creation.sh
ok 7 selftests: damon: debugfs_rm_non_contexts.sh
ok 8 selftests: damon: sysfs.sh
ok 9 selftests: damon: sysfs_update_removed_scheme_dir.sh
ok 10 selftests: damon: reclaim.sh
ok 11 selftests: damon: lru_sort.sh
ok 1 selftests: damon-tests: kunit.sh
ok 2 selftests: damon-tests: huge_count_read_write.sh
ok 3 selftests: damon-tests: buffer_overflow.sh
ok 4 selftests: damon-tests: rm_contexts.sh
ok 5 selftests: damon-tests: record_null_deref.sh
ok 6 selftests: damon-tests: dbgfs_target_ids_read_before_terminate_race.sh
ok 7 selftests: damon-tests: dbgfs_target_ids_pid_leak.sh
ok 8 selftests: damon-tests: damo_tests.sh
ok 9 selftests: damon-tests: masim-record.sh
ok 10 selftests: damon-tests: build_i386.sh
ok 11 selftests: damon-tests: build_m68k.sh
ok 12 selftests: damon-tests: build_arm64.sh
ok 13 selftests: damon-tests: build_i386_idle_flag.sh
ok 14 selftests: damon-tests: build_i386_highpte.sh
ok 15 selftests: damon-tests: build_nomemcg.sh
PASS _remote_run_corr.sh SUCCESS
On Mon, Aug 28, 2023 at 12:11:19PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.13 release. There are 129 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Tested-by: Conor Dooley conor.dooley@microchip.com
Thanks, Conor.
On Mon, Aug 28, 2023 at 12:11:19PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.13 release. There are 129 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Successfully compiled and installed bindeb-pkgs on my computer (Acer Aspire E15, Intel Core i3 Haswell). No noticeable regressions.
Tested-by: Bagas Sanjaya bagasdotme@gmail.com
Hi Greg,
On Mon, Aug 28, 2023 at 12:11:19PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.13 release. There are 129 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 30 Aug 2023 10:11:30 +0000. Anything received after that time might be too late.
Build test (gcc version 12.3.1 20230829):
mips: 52 configs -> no failure
arm: 71 configs -> no failure
arm64: 3 configs -> no failure
x86_64: 4 configs -> no failure
alpha allmodconfig -> no failure
csky allmodconfig -> no failure
powerpc allmodconfig -> no failure
riscv allmodconfig -> no failure
s390 allmodconfig -> no failure
xtensa allmodconfig -> no failure
Boot test:
x86_64: Booted on my test laptop. No regression.
x86_64: Booted on qemu. No regression. [1]
arm64: Booted on rpi4b (4GB model). No regression. [2]
mips: Booted on ci20 board. No regression. [3]
[1]. https://openqa.qa.codethink.co.uk/tests/4841
[2]. https://openqa.qa.codethink.co.uk/tests/4864
[3]. https://openqa.qa.codethink.co.uk/tests/4863
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
On 8/28/23 04:11, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.13 release. There are 129 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 30 Aug 2023 10:11:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.13-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On 8/28/23 03:11, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.13 release. There are 129 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 30 Aug 2023 10:11:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.13-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
On Mon, Aug 28, 2023 at 12:11:19PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.13 release. There are 129 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 30 Aug 2023 10:11:30 +0000. Anything received after that time might be too late.
Build results:
    total: 157 pass: 157 fail: 0
Qemu test results:
    total: 522 pass: 522 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On Mon, 28 Aug 2023 12:11:19 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.13 release. There are 129 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 30 Aug 2023 10:11:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.13-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v6.4:
    11 builds: 11 pass, 0 fail
    28 boots: 28 pass, 0 fail
    130 tests: 130 pass, 0 fail
Linux version: 6.4.13-rc1-g8c2717f278c5
Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon