This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.5.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.5.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.5.4-rc1
Wesley Chalmers wesley.chalmers@amd.com drm/amd/display: Fix a bug when searching for insert_above_mpcc
Linus Torvalds torvalds@linux-foundation.org vm: fix move_vma() memory accounting being off
Kuniyuki Iwashima kuniyu@amazon.com kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg().
Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com net: renesas: rswitch: Fix unmasking irq condition
Corinna Vinschen vinschen@redhat.com igb: clean up in all error paths when enabling SR-IOV
Vadim Fedorenko vadim.fedorenko@linux.dev ixgbe: fix timestamp configuration code
Kuniyuki Iwashima kuniyu@amazon.com selftest: tcp: Fix address length in bind_wildcard.c.
Kuniyuki Iwashima kuniyu@amazon.com tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address.
Kuniyuki Iwashima kuniyu@amazon.com tcp: Fix bind() regression for v4-mapped-v6 wildcard address.
Kuniyuki Iwashima kuniyu@amazon.com tcp: Factorise sk_family-independent comparison in inet_bind2_bucket_match(_addr_any).
Eric Dumazet edumazet@google.com ipv6: fix ip6_sock_set_addr_preferences() typo
Toke Høiland-Jørgensen toke@redhat.com veth: Update XDP feature set when bringing up device
Sascha Hauer s.hauer@pengutronix.de net: macb: fix sleep inside spinlock
Liu Jian liujian56@huawei.com net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict()
Geert Uytterhoeven geert+renesas@glider.be platform/mellanox: NVSW_SN2201 should depend on ACPI
Shravan Kumar Ramani shravankr@nvidia.com platform/mellanox: mlxbf-pmc: Fix reading of unprogrammed events
Shravan Kumar Ramani shravankr@nvidia.com platform/mellanox: mlxbf-pmc: Fix potential buffer overflows
Liming Sun limings@nvidia.com platform/mellanox: mlxbf-tmfifo: Drop jumbo frames
Liming Sun limings@nvidia.com platform/mellanox: mlxbf-tmfifo: Drop the Rx packet if no more descriptors
Shigeru Yoshida syoshida@redhat.com kcm: Fix memory leak in error path of kcm_sendmsg()
Hayes Wang hayeswang@realtek.com r8152: check budget for r8152_poll()
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: sja1105: block FDB accesses that are concurrent with a switch reset
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: sja1105: serialize sja1105_port_mcast_flood() with other FDB accesses
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: sja1105: fix multicast forwarding working only for last added mdb entry
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: sja1105: propagate exact error code from sja1105_dynamic_config_poll_valid()
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: sja1105: hide all multicast addresses from "bridge fdb show"
Ciprian Regus ciprian.regus@analog.com net:ethernet:adi:adin1110: Fix forwarding offload
Yang Yingliang yangyingliang@huawei.com net: ethernet: adi: adin1110: use eth_broadcast_addr() to assign broadcast address
Ziyang Xuan william.xuanziyang@huawei.com hsr: Fix uninit-value access in fill_frame_info()
Hangyu Hua hbh25y@gmail.com net: ethernet: mtk_eth_soc: fix possible NULL pointer dereference in mtk_hwlro_get_fdir_all()
Hangyu Hua hbh25y@gmail.com net: ethernet: mvpp2_main: fix possible OOB write in mvpp2_ethtool_get_rxnfc()
Vincent Whitchurch vincent.whitchurch@axis.com net: stmmac: fix handling of zero coalescing tx-usecs
Guangguan Wang guangguan.wang@linux.alibaba.com net/smc: use smc_lgr_list.lock to protect smc_lgr_list.list iterate in smcr_port_add
Ratheesh Kannoth rkannoth@marvell.com octeontx2-pf: Fix page pool cache index corruption.
Jinjie Ruan ruanjinjie@huawei.com net: microchip: vcap api: Fix possible memory leak for vcap_dup_rule()
Naveen N Rao naveen@kernel.org selftests/ftrace: Fix dependencies for some of the synthetic event tests
Björn Töpel bjorn@rivosinc.com selftests: Keep symlinks, when possible
Björn Töpel bjorn@rivosinc.com kselftest/runner.sh: Propagate SIGTERM to runner child
Liu Jian liujian56@huawei.com net: ipv4: fix one memleak in __inet_del_ifa()
Jinjie Ruan ruanjinjie@huawei.com kunit: Fix wild-memory-access bug in kunit_free_suite_set()
Helge Deller deller@gmx.de parisc: sba_iommu: Fix build warning if procfs if disabled
Biju Das biju.das.jz@bp.renesas.com regulator: raa215300: Fix resource leak in case of error
Biju Das biju.das.jz@bp.renesas.com regulator: raa215300: Change the scope of the variables {clkin_name, xin_name}
Arnd Bergmann arnd@arndb.de bpf: fix bpf_probe_read_kernel prototype mismatch
Hamza Mahfooz hamza.mahfooz@amd.com drm/amdgpu: register a dirty framebuffer callback for fbcon
Jay Cornwall jay.cornwall@amd.com drm/amdkfd: Add missing gfx11 MQD manager callbacks
Gabe Teeger gabe.teeger@amd.com drm/amd/display: Remove wait while locked
Wenjing Liu wenjing.liu@amd.com drm/amd/display: always switch off ODM before committing more streams
Namhyung Kim namhyung@kernel.org perf hists browser: Fix the number of entries for 'e' key
Namhyung Kim namhyung@kernel.org perf build: Include generated header files properly
Namhyung Kim namhyung@kernel.org perf tools: Handle old data in PERF_RECORD_ATTR
Namhyung Kim namhyung@kernel.org perf test shell stat_bpf_counters: Fix test on Intel
Namhyung Kim namhyung@kernel.org perf build: Update build rule for generated files
Namhyung Kim namhyung@kernel.org perf hists browser: Fix hierarchy mode header
Maciej W. Rozycki macro@orcam.me.uk MIPS: Fix CONFIG_CPU_DADDI_WORKAROUNDS `modules_install' regression
Maciej W. Rozycki macro@orcam.me.uk MIPS: Only fiddle with CHECKFLAGS if `need-compiler'
Sean Christopherson seanjc@google.com KVM: SVM: Skip VMSA init in sev_es_init_vmcb() if pointer is NULL
Sean Christopherson seanjc@google.com KVM: SVM: Set target pCPU during IRTE update if target vCPU is running
Sean Christopherson seanjc@google.com KVM: nSVM: Load L1's TSC multiplier based on L1 state, not L2 state
Sean Christopherson seanjc@google.com KVM: nSVM: Check instead of asserting on nested TSC scaling support
Sean Christopherson seanjc@google.com KVM: SVM: Get source vCPUs from source VM for SEV-ES intrahost migration
Sean Christopherson seanjc@google.com KVM: SVM: Don't inject #UD if KVM attempts to skip SEV guest insn
Sean Christopherson seanjc@google.com KVM: SVM: Take and hold ir_list_lock when updating vCPU's Physical ID entry
Sean Christopherson seanjc@google.com KVM: VMX: Refresh available regs and IDT vectoring info before NMI handling
Hamza Mahfooz hamza.mahfooz@amd.com drm/amd/display: prevent potential division by zero errors
Hamza Mahfooz hamza.mahfooz@amd.com drm/amd/display: limit the v_startup workaround to ASICs older than DCN3.1
Melissa Wen mwen@igalia.com drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma
Hamza Mahfooz hamza.mahfooz@amd.com Revert "drm/amd/display: Remove v_startup workaround for dcn3+"
William Zhang william.zhang@broadcom.com mtd: rawnand: brcmnand: Fix ECC level field setting for v7.2 controller
William Zhang william.zhang@broadcom.com mtd: rawnand: brcmnand: Fix potential false time out warning
Linus Walleij linus.walleij@linaro.org mtd: spi-nor: Correct flags for Winbond w25q128
William Zhang william.zhang@broadcom.com mtd: rawnand: brcmnand: Fix potential out-of-bounds access in oob write
William Zhang william.zhang@broadcom.com mtd: rawnand: brcmnand: Fix crash during the panic_write
Liu Ying victor.liu@nxp.com drm/mxsfb: Disable overlay plane in mxsfb_plane_overlay_atomic_disable()
Qu Wenruo wqu@suse.com btrfs: scrub: fix grouping of read IO
Qu Wenruo wqu@suse.com btrfs: scrub: avoid unnecessary csum tree search preparing stripes
Qu Wenruo wqu@suse.com btrfs: scrub: avoid unnecessary extent tree search preparing stripes
Anand Jain anand.jain@oracle.com btrfs: use the correct superblock to compare fsid in btrfs_validate_super
Naohiro Aota naohiro.aota@wdc.com btrfs: zoned: re-enable metadata over-commit for zoned mode
Josef Bacik josef@toxicpanda.com btrfs: set page extent mapped after read_folio in relocate_one_page
Filipe Manana fdmanana@suse.com btrfs: don't start transaction when joining with TRANS_JOIN_NOSTART
Boris Burkov boris@bur.io btrfs: free qgroup rsv on io failure
Boris Burkov boris@bur.io btrfs: fix start transaction qgroup rsv double free
Naohiro Aota naohiro.aota@wdc.com btrfs: zoned: do not zone finish data relocation block group
ruanmeisi ruan.meisi@zte.com.cn fuse: nlookup missing decrement in fuse_direntplus_link
Damien Le Moal dlemoal@kernel.org ata: pata_ftide010: Add missing MODULE_DESCRIPTION
Damien Le Moal dlemoal@kernel.org ata: sata_gemini: Add missing MODULE_DESCRIPTION
Michael Schmitz schmitzmic@gmail.com ata: pata_falcon: fix IO base selection for Q40
Werner Fischer devlists@wefi.net ata: ahci: Add Elkhart Lake AHCI controller
Johannes Weiner hannes@cmpxchg.org memcontrol: ensure memcg acquired by id is properly set up
Christian Marangi ansuelsmth@gmail.com hwspinlock: qcom: add missing regmap config for SFPB MMIO implementation
Nathan Chancellor nathan@kernel.org lib: test_scanf: Add explicit type cast to result initialization in test_number_prefix()
Jaegeuk Kim jaegeuk@kernel.org f2fs: avoid false alarm of circular locking
Jaegeuk Kim jaegeuk@kernel.org f2fs: flush inode if atomic file is aborted
Jaegeuk Kim jaegeuk@kernel.org f2fs: get out of a repeat loop when getting a locked data page
Brian Foster bfoster@redhat.com ext4: drop dio overwrite only flag and associated warning
Luís Henriques lhenriques@suse.de ext4: fix memory leaks in ext4_fname_{setup_filename,prepare_lookup}
Wang Jianjian wangjianjian0@foxmail.com ext4: add correct group descriptors and reserved GDT blocks to system zone
Baokun Li libaokun1@huawei.com ext4: fix slab-use-after-free in ext4_es_insert_extent()
Zhang Yi yi.zhang@huawei.com jbd2: correct the end of the journal recovery scan range
Zhihao Cheng chengzhihao1@huawei.com jbd2: check 'jh->b_transaction' before removing it from checkpoint
Zhang Yi yi.zhang@huawei.com jbd2: fix checkpoint cleanup performance regression
Ekansh Gupta quic_ekangupt@quicinc.com misc: fastrpc: Fix incorrect DMA mapping unmap request
Ekansh Gupta quic_ekangupt@quicinc.com misc: fastrpc: Fix remote heap allocation request
Hien Huynh hien.huynh.px@renesas.com dmaengine: sh: rz-dmac: Fix destination and source data size setting
Walter Chang walter.chang@mediatek.com clocksource/drivers/arm_arch_timer: Disable timer before programming CVAL
Pavel Kozlov pavel.kozlov@synopsys.com ARC: atomics: Add compiler barrier to atomic operations...
Fangzhi Zuo jerry.zuo@amd.com drm/amd/display: Temporary Disable MST DP Colorspace Property
Florent CARLI fcarli@gmail.com watchdog: advantech_ec_wdt: fix Kconfig dependencies
Masahiro Yamada masahiroy@kernel.org linux/export: fix reference to exported functions for parisc64
Duoming Zhou duoming@zju.edu.cn sh: push-switch: Reorder cleanup operations to avoid use-after-free bug
Petr Tesarik petr.tesarik.ext@huawei.com sh: boards: Fix CEU buffer size passed to dma_declare_coherent_memory()
Vladimir Oltean vladimir.oltean@nxp.com net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs()
Jie Wang wangjie125@huawei.com net: hns3: remove GSO partial feature bit
Yisen Zhuang yisen.zhuang@huawei.com net: hns3: fix the port information display when sfp is absent
Jijie Shao shaojijie@huawei.com net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue
Hao Chen chenhao418@huawei.com net: hns3: fix debugfs concurrency issue between kfree buffer and read
Hao Chen chenhao418@huawei.com net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read()
Jian Shen shenjian15@huawei.com net: hns3: fix tx timeout issue
Lukasz Majewski lukma@denx.de net: phy: Provide Module 4 KSZ9477 errata (DS80000754C)
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: Unbreak audit log reset
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
Wander Lairson Costa wander@redhat.com netfilter: nfnetlink_osf: avoid OOB read
Florian Westphal fw@strlen.de netfilter: nftables: exthdr: fix 4-byte stack OOB write
Martin KaFai Lau martin.lau@kernel.org bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
Martin KaFai Lau martin.lau@kernel.org bpf: bpf_sk_storage: Fix invalid wait context lockdep report
Ilya Leoshkevich iii@linux.ibm.com s390/bpf: Pass through tail call counter in trampolines
Sebastian Andrzej Siewior bigeasy@linutronix.de bpf: Assign bpf_tramp_run_ctx::saved_run_ctx before recursion check.
Sebastian Andrzej Siewior bigeasy@linutronix.de bpf: Invoke __bpf_prog_exit_sleepable_recur() on recursion in kern_sys_bpf().
Jakub Kicinski kuba@kernel.org net: phylink: fix sphinx complaint about invalid literal
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: sja1105: complete tc-cbs offload support on SJA1110
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: sja1105: fix -ENOSPC when replacing the same tc-cbs too many times
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: sja1105: fix bandwidth discrepancy between tc-cbs software and offload
Bodong Wang bodong@nvidia.com mlx5/core: E-Switch, Create ACL FT for eswitch manager in switchdev mode
Jiri Pirko jiri@resnulli.us net/mlx5: Push devlink port PF/VF init/cleanup calls out of devlink_port_register/unregister()
Jiri Pirko jiri@resnulli.us net/mlx5: Rework devlink port alloc/free into init/cleanup
Jiri Pirko jiri@resnulli.us net/mlx5: Give esw_offloads_load/unload_rep() "mlx5_" prefix
Jianbo Liu jianbol@nvidia.com net/mlx5e: Clear mirred devices array if the rule is split
Eric Dumazet edumazet@google.com ip_tunnels: use DEV_STATS_INC()
Ariel Marcovitch arielmarcovitch@gmail.com idr: fix param name in idr_alloc_cyclic() doc
Jerome Neanne jneanne@baylibre.com regulator: tps6594-regulator: Fix random kernel crash
Andy Shevchenko andriy.shevchenko@linux.intel.com s390/zcrypt: don't leak memory if dev_set_name() fails
Olga Zaborska olga.zaborska@intel.com igb: Change IGB_MIN to allow set rx/tx value between 64 and 80
Olga Zaborska olga.zaborska@intel.com igbvf: Change IGBVF_MIN to allow set rx/tx value between 64 and 80
Olga Zaborska olga.zaborska@intel.com igc: Change IGC_MIN to allow set rx/tx value between 64 and 80
Geetha sowjanya gakula@marvell.com octeontx2-af: Fix truncation of smq in CN10K NIX AQ enqueue mbox handler
Shigeru Yoshida syoshida@redhat.com kcm: Destroy mutex in kcm_exit_net()
valis sec@valis.email net: sched: sch_qfq: Fix UAF in qfq_dequeue()
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Fix data race around sk->sk_err.
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Fix data-races around sk->sk_shutdown.
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Fix data-race around unix_tot_inflight.
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Fix data-races around user->unix_inflight.
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Fix msg_controllen test in scm_pidfd_recv() for MSG_CMSG_COMPAT.
John Fastabend john.fastabend@gmail.com bpf, sockmap: Fix skb refcnt race after locking changes
Oleksij Rempel linux@rempel-privat.de net: phy: micrel: Correct bit assignments for phy_device flags
Alex Henrie alexhenrie24@gmail.com net: ipv6/addrconf: avoid integer underflow in ipv6_create_tempaddr
Liang Chen liangchen.linux@gmail.com veth: Fixing transmit return status for dropped packets
Eric Dumazet edumazet@google.com gve: fix frag_list chaining
Corinna Vinschen vinschen@redhat.com igb: disable virtualization features on 82580
Xu Kuohai xukuohai@huawei.com selftests/bpf: Fix a CI failure caused by vsock write
Sriram Yagnaraman sriram.yagnaraman@est.tech ipv6: ignore dst hint for multipath routes
Sriram Yagnaraman sriram.yagnaraman@est.tech ipv4: ignore dst hint for multipath routes
Eric Dumazet edumazet@google.com net: annotate data-races around sk->sk_bind_phc
Eric Dumazet edumazet@google.com net: annotate data-races around sk->sk_tsflags
Eric Dumazet edumazet@google.com mptcp: annotate data-races around msk->rmem_fwd_alloc
Eric Dumazet edumazet@google.com net: annotate data-races around sk->sk_forward_alloc
Eric Dumazet edumazet@google.com net: use sk_forward_alloc_get() in sk_get_meminfo()
Eric Dumazet edumazet@google.com net/handshake: fix null-ptr-deref in handshake_nl_done_doit()
Hamza Mahfooz hamza.mahfooz@amd.com drm/amd/display: fix mode scaling (RMX_.*)
Sean Christopherson seanjc@google.com drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt()
Sean Christopherson seanjc@google.com drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn()
Sean Christopherson seanjc@google.com drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
Xiubo Li xiubli@redhat.com ceph: make members in struct ceph_mds_request_args_ext a union
Magnus Karlsson magnus.karlsson@intel.com xsk: Fix xsk_diag use-after-free error during socket cleanup
Florian Westphal fw@strlen.de net: fib: avoid warn splat in flow dissector
Eric Dumazet edumazet@google.com net: read sk->sk_family once in sk_mc_loop()
Eric Dumazet edumazet@google.com ipv4: annotate data-races around fi->fib_dead
Eric Dumazet edumazet@google.com sctp: annotate data-races around sk->sk_wmem_queued
Eric Dumazet edumazet@google.com net/sched: fq_pie: avoid stalls in fq_pie_timer()
Katya Orlova e.orlova@ispras.ru smb: propagate error code of extract_sharename()
Phil Sutter phil@nwl.cc netfilter: nf_tables: Audit log rule reset
Phil Sutter phil@nwl.cc netfilter: nf_tables: Audit log setelem reset
Yu Kuai yukuai3@huawei.com blk-throttle: consider 'carryover_ios/bytes' in throtl_trim_slice()
Yu Kuai yukuai3@huawei.com blk-throttle: use calculate_io/bytes_allowed() for throtl_trim_slice()
Andrzej Hajda andrzej.hajda@intel.com drm/i915: mark requests for GuC virtual engines to avoid use-after-free
Yonghong Song yonghong.song@linux.dev selftests/bpf: Fix flaky cgroup_iter_sleepable subtest
Vincent Whitchurch vincent.whitchurch@axis.com regulator: tps6287x: Fix n_voltages
Namhyung Kim namhyung@kernel.org perf test stat_bpf_counters_cgrp: Enhance perf stat cgroup BPF counter test
Kajol Jain kjain@linux.ibm.com perf test stat_bpf_counters_cgrp: Fix shellcheck issue about logical operators
Miquel Raynal miquel.raynal@bootlin.com i3c: master: svc: Describe member 'saved_regs'
Ian Rogers irogers@google.com perf header: Fix missing PMU caps
Justin Stitt justinstitt@google.com accel/ivpu: refactor deprecated strncpy
Vladimir Zapolskiy vz@mleia.com pwm: lpc32xx: Remove handling of PWM channels
Ilkka Koskinen ilkka@os.amperecomputing.com perf vendor events arm64: Remove L1D_CACHE_LMISS from AmpereOne list
Raag Jadav raag.jadav@intel.com watchdog: intel-mid_wdt: add MODULE_ALIAS() to allow auto-load
Arnaldo Carvalho de Melo acme@redhat.com perf lock: Don't pass an ERR_PTR() directly to perf_session__delete()
Arnaldo Carvalho de Melo acme@redhat.com perf top: Don't pass an ERR_PTR() directly to perf_session__delete()
Kajol Jain kjain@linux.ibm.com perf vendor events: Update metric event names for power10 platform
Kajol Jain kjain@linux.ibm.com perf vendor events: Move JSON/events to appropriate files for power10 platform
Kajol Jain kjain@linux.ibm.com perf vendor events: Drop STORES_PER_INST metric event for power10 platform
Kajol Jain kjain@linux.ibm.com perf vendor events: Drop some of the JSON/events for power10 platform
Kajol Jain kjain@linux.ibm.com perf vendor events: Update the JSON/events descriptions for power10 platform
Adrian Hunter adrian.hunter@intel.com perf dlfilter: Add al_cleanup()
Arnaldo Carvalho de Melo acme@kernel.org perf dlfilter: Initialize addr_location before passing it to thread__find_symbol_fb()
Namhyung Kim namhyung@kernel.org perf bpf-filter: Fix sample flag check with ||
Ivan Babrou ivan@cloudflare.com perf script: Print "cgroup" field on the same line as "comm"
Sean Christopherson seanjc@google.com x86/virt: Drop unnecessary check on extended CPUID level in cpu_has_svm()
Arnaldo Carvalho de Melo acme@redhat.com perf annotate bpf: Don't enclose non-debug code with an assert()
Dmitry Torokhov dmitry.torokhov@gmail.com Input: tca6416-keypad - fix interrupt enable disbalance
Dmitry Torokhov dmitry.torokhov@gmail.com Input: tca6416-keypad - always expect proper IRQ number in i2c client
Sean Christopherson seanjc@google.com KVM: SVM: Don't defer NMI unblocking until next exit for SEV-ES guests
Ian Rogers irogers@google.com perf parse-events: Additional error reporting
Ian Rogers irogers@google.com perf parse-events: Separate ENOMEM memory handling
Ian Rogers irogers@google.com perf parse-events: Move instances of YYABORT to YYNOMEM
Ian Rogers irogers@google.com perf parse-events: Separate YYABORT and YYNOMEM cases
Ying Liu victor.liu@nxp.com backlight: gpio_backlight: Drop output GPIO direction check for initial power state
Artur Weber aweber.kernel@gmail.com backlight: lp855x: Initialize PWM state on first brightness change
Uwe Kleine-König u.kleine-koenig@pengutronix.de pwm: atmel-tcb: Fix resource freeing in error path and remove
Uwe Kleine-König u.kleine-koenig@pengutronix.de pwm: atmel-tcb: Harmonize resource allocation order
Arnaldo Carvalho de Melo acme@redhat.com perf trace: Really free the evsel->priv area
Jeff LaBundy jeff@labundy.com Input: iqs7222 - configure power mode before triggering ATI
Xie XiuQi xiexiuqi@huawei.com tools/mm: fix undefined reference to pthread_once
Konstantin Meskhidze konstantin.meskhidze@huawei.com kconfig: fix possible buffer overflow
Jonathan Marek jonathan@marek.ca mailbox: qcom-ipcc: fix incorrect num_chans counting
Andreas Gruenbacher agruenba@redhat.com gfs2: low-memory forced flush fixes
Andreas Gruenbacher agruenba@redhat.com gfs2: Switch to wait_event in gfs2_logd
Christophe JAILLET christophe.jaillet@wanadoo.fr tpm_crb: Fix an error handling path in crb_acpi_add()
Jiri Slaby jirislaby@kernel.org kbuild: dummy-tools: make MPROFILE_KERNEL checks work on BE
Masahiro Yamada masahiroy@kernel.org kbuild: do not run depmod for 'make modules_sign'
Masahiro Yamada masahiroy@kernel.org kbuild: rpm-pkg: define _arch conditionally
Qiang Yu quic_qianyu@quicinc.com bus: mhi: host: Skip MHI reset if device is in RDDM
Fedor Pchelkin pchelkin@ispras.ru NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_info
Trond Myklebust trond.myklebust@hammerspace.com NFS: Fix a potential data corruption
Johan Hovold johan+linaro@kernel.org clk: qcom: mss-sc7180: fix missing resume during probe
Johan Hovold johan+linaro@kernel.org clk: qcom: q6sstop-qcs404: fix missing resume during probe
Johan Hovold johan+linaro@kernel.org clk: qcom: lpasscc-sc7280: fix missing resume during probe
Johan Hovold johan+linaro@kernel.org clk: qcom: dispcc-sm8550: fix runtime PM imbalance on probe errors
Johan Hovold johan+linaro@kernel.org clk: qcom: dispcc-sm8450: fix runtime PM imbalance on probe errors
Chris Lew quic_clew@quicinc.com soc: qcom: qmi_encdec: Restrict string length in decode
Dmitry Baryshkov dmitry.baryshkov@linaro.org clk: qcom: gcc-mdm9615: use proper parent for pll0_vote clock
Marco Felsch m.felsch@pengutronix.de clk: imx: pll14xx: align pdiv with reference manual
Ahmad Fatoum a.fatoum@pengutronix.de clk: imx: pll14xx: dynamically configure PLL for 393216000/361267200Hz
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org dt-bindings: clock: xlnx,versal-clk: drop select:false
Raag Jadav raag.jadav@intel.com pinctrl: cherryview: fix address_space_handler() argument
Bharath SM bharathsm@microsoft.com cifs: update desired access while requesting for directory lease
Helge Deller deller@gmx.de parisc: led: Reduce CPU overhead for disk & lan LED computation
Helge Deller deller@gmx.de parisc: led: Fix LAN receive and transmit LEDs
Kalesh Singh kaleshsingh@google.com Multi-gen LRU: avoid race in inc_min_seq()
Andrew Donnellan ajd@linux.ibm.com lib/test_meminit: allocate pages up to order MAX_ORDER
Muchun Song muchun.song@linux.dev mm: hugetlb_vmemmap: fix a race between vmemmap pmd split
Michal Hocko mhocko@suse.com memcg: drop kmem.limit_in_bytes
Steve French stfrench@microsoft.com send channel sequence number in SMB3 requests after reconnects
Aleksey Nasibulin alealexpro100@ya.ru ARM: dts: BCM5301X: Extend RAM to full 256MB for Linksys EA6500 V2
Chris Paterson chris.paterson2@renesas.com arm64: dts: renesas: rzg2l: Fix txdv-skew-psec typos
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org ARM: dts: qcom: msm8974pro-castor: correct touchscreen syna,nosleep-mode
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org ARM: dts: qcom: msm8974pro-castor: correct touchscreen function names
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org arm64: dts: qcom: msm8953-vince: drop duplicated touschreen parent interrupt
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org ARM: dts: qcom: msm8974pro-castor: correct inverted X of touchscreen
Johan Hovold johan+linaro@kernel.org clk: qcom: turingcc-qcs404: fix missing resume during probe
Sameer Pujar spujar@nvidia.com arm64: tegra: Update AHUB clock parent and rate
Sheetal sheetal@nvidia.com arm64: tegra: Update AHUB clock parent and rate on Tegra234
Paul Cercueil paul@crapouillou.net ARM: dts: samsung: exynos4210-i9100: Fix LCD screen's physical size
Sheetal sheetal@nvidia.com ASoC: tegra: Fix SFC conversion for few rates
Thomas Zimmermann tzimmermann@suse.de drm/ast: Fix DRAM init on AST2200
Johan Hovold johan+linaro@kernel.org clk: qcom: camcc-sc7180: fix async resume during probe
Thomas Zimmermann tzimmermann@suse.de fbdev/ep93xx-fb: Do not assign to struct fb_info.dev
Ian Kent raven@themaw.net kernfs: fix missing kernfs_iattr_rwsem locking
Chengming Zhou zhouchengming@bytedance.com null_blk: fix poll request timeout handling
Quinn Tran qutran@marvell.com scsi: qla2xxx: Fix firmware resource tracking
Quinn Tran qutran@marvell.com scsi: qla2xxx: Error code did not return to upper layer
Nilesh Javali njavali@marvell.com scsi: qla2xxx: Fix smatch warn for qla_init_iocb_limit()
Quinn Tran qutran@marvell.com scsi: qla2xxx: Flush mailbox commands on chip reset
Manish Rangankar mrangankar@marvell.com scsi: qla2xxx: Remove unsupported ql2xenabledif option
Quinn Tran qutran@marvell.com scsi: qla2xxx: Fix TMF leak through
Quinn Tran qutran@marvell.com scsi: qla2xxx: Fix session hang in gnl
Quinn Tran qutran@marvell.com scsi: qla2xxx: Turn off noisy message log
Quinn Tran qutran@marvell.com scsi: qla2xxx: Fix erroneous link up failure
Quinn Tran qutran@marvell.com scsi: qla2xxx: Fix command flush during TMF
Quinn Tran qutran@marvell.com scsi: qla2xxx: fix inconsistent TMF timeout
Quinn Tran qutran@marvell.com scsi: qla2xxx: Fix deletion race condition
Quinn Tran qutran@marvell.com scsi: qla2xxx: Limit TMF to 8 per function
Quinn Tran qutran@marvell.com scsi: qla2xxx: Adjust IOCB resource on qpair create
Bean Huo beanhuo@micron.com scsi: ufs: core: Add advanced RPMB support where UFSHCI 4.0 does not support EHS length in UTRD
Gurchetan Singh gurchetansingh@chromium.org drm/virtio: Conditionally allocate virtio_gpu_fence
Quan Tian qtian@vmware.com net/ipv6: SKB symmetric hash should incorporate transport ports
-------------
Diffstat:
Documentation/admin-guide/cgroup-v1/memory.rst | 2 - .../devicetree/bindings/clock/xlnx,versal-clk.yaml | 2 - Makefile | 6 +- arch/arc/include/asm/atomic-llsc.h | 6 +- arch/arc/include/asm/atomic64-arcv2.h | 6 +- .../dts/broadcom/bcm4708-linksys-ea6500-v2.dts | 3 +- .../qcom-msm8974pro-sony-xperia-shinano-castor.dts | 8 +- arch/arm/boot/dts/samsung/exynos4210-i9100.dts | 4 +- arch/arm64/boot/dts/nvidia/tegra186.dtsi | 3 +- arch/arm64/boot/dts/nvidia/tegra194.dtsi | 3 +- arch/arm64/boot/dts/nvidia/tegra210.dtsi | 3 +- arch/arm64/boot/dts/nvidia/tegra234.dtsi | 3 +- arch/arm64/boot/dts/qcom/msm8953-xiaomi-vince.dts | 1 - arch/arm64/boot/dts/renesas/rzg2l-smarc-som.dtsi | 4 +- arch/arm64/boot/dts/renesas/rzg2lc-smarc-som.dtsi | 2 +- arch/arm64/boot/dts/renesas/rzg2ul-smarc-som.dtsi | 4 +- arch/mips/Makefile | 6 +- arch/parisc/include/asm/led.h | 4 +- arch/parisc/include/asm/mckinley.h | 8 - arch/s390/net/bpf_jit_comp.c | 10 + arch/sh/boards/mach-ap325rxa/setup.c | 2 +- arch/sh/boards/mach-ecovec24/setup.c | 6 +- arch/sh/boards/mach-kfr2r09/setup.c | 2 +- arch/sh/boards/mach-migor/setup.c | 2 +- arch/sh/boards/mach-se/7724/setup.c | 6 +- arch/sh/drivers/push-switch.c | 2 +- arch/x86/include/asm/virtext.h | 6 - arch/x86/kvm/svm/avic.c | 59 +++- arch/x86/kvm/svm/nested.c | 9 +- arch/x86/kvm/svm/sev.c | 14 +- arch/x86/kvm/svm/svm.c | 45 ++- arch/x86/kvm/vmx/vmx.c | 21 +- block/blk-throttle.c | 99 +++--- drivers/accel/ivpu/ivpu_jsm_msg.c | 3 +- drivers/ata/ahci.c | 2 + drivers/ata/pata_falcon.c | 50 +-- drivers/ata/pata_ftide010.c | 1 + drivers/ata/sata_gemini.c | 1 + drivers/block/null_blk/main.c | 12 +- drivers/bus/mhi/host/pm.c | 5 + drivers/char/tpm/tpm_crb.c | 5 +- drivers/clk/imx/clk-pll14xx.c | 13 +- drivers/clk/qcom/camcc-sc7180.c | 2 +- drivers/clk/qcom/dispcc-sm8450.c | 13 +- drivers/clk/qcom/dispcc-sm8550.c | 13 +- drivers/clk/qcom/gcc-mdm9615.c | 2 +- drivers/clk/qcom/lpasscc-sc7280.c | 16 +- drivers/clk/qcom/mss-sc7180.c | 13 +- drivers/clk/qcom/q6sstop-qcs404.c | 15 +- drivers/clk/qcom/turingcc-qcs404.c | 13 +- drivers/clocksource/arm_arch_timer.c | 7 + drivers/dma/sh/rz-dmac.c | 11 +- drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 26 +- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c | 3 + drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 4 +- .../drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 7 + drivers/gpu/drm/amd/display/dc/Makefile | 1 + drivers/gpu/drm/amd/display/dc/core/dc.c | 68 ++-- drivers/gpu/drm/amd/display/dc/dcn10/dcn10_mpc.c | 5 +- drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 11 - .../gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c | 25 +- .../drm/amd/display/modules/freesync/freesync.c | 9 +- drivers/gpu/drm/ast/ast_post.c | 2 +- drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 + drivers/gpu/drm/i915/gvt/gtt.c | 27 +- drivers/gpu/drm/i915/gvt/gtt.h | 1 - drivers/gpu/drm/i915/i915_request.c | 7 +- drivers/gpu/drm/mxsfb/mxsfb_kms.c | 9 + drivers/gpu/drm/virtio/virtgpu_submit.c | 32 +- drivers/hwspinlock/qcom_hwspinlock.c | 9 + drivers/i3c/master/svc-i3c-master.c | 1 + drivers/input/keyboard/tca6416-keypad.c | 31 +- drivers/input/misc/iqs7222.c | 8 +- drivers/mailbox/qcom-ipcc.c | 4 +- drivers/misc/fastrpc.c | 22 +- drivers/mtd/nand/raw/brcmnand/brcmnand.c | 112 ++++-- drivers/mtd/spi-nor/winbond.c | 5 +- drivers/net/dsa/microchip/ksz_common.c | 16 +- drivers/net/dsa/sja1105/sja1105.h | 4 + drivers/net/dsa/sja1105/sja1105_dynamic_config.c | 93 +++-- drivers/net/dsa/sja1105/sja1105_main.c | 120 +++++-- drivers/net/dsa/sja1105/sja1105_spi.c | 4 + drivers/net/ethernet/adi/adin1110.c | 10 +- drivers/net/ethernet/cadence/macb_main.c | 5 +- drivers/net/ethernet/freescale/enetc/enetc_pf.c | 2 +- drivers/net/ethernet/google/gve/gve_rx_dqo.c | 5 +- drivers/net/ethernet/hisilicon/hns3/hnae3.h | 1 + drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c | 7 +- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 19 +- drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 4 +- .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 20 +- .../ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c | 14 +- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 5 +- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h | 2 - drivers/net/ethernet/intel/igb/igb.h | 4 +- drivers/net/ethernet/intel/igb/igb_main.c | 10 +- drivers/net/ethernet/intel/igbvf/igbvf.h | 4 +- drivers/net/ethernet/intel/igc/igc.h | 4 +- drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 28 +- drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 5 + .../net/ethernet/marvell/octeontx2/af/rvu_nix.c | 21 +- drivers/net/ethernet/marvell/octeontx2/nic/cn10k.c | 6 +- drivers/net/ethernet/marvell/octeontx2/nic/cn10k.h | 2 +- .../ethernet/marvell/octeontx2/nic/otx2_common.c | 43 +-- .../ethernet/marvell/octeontx2/nic/otx2_common.h | 3 +- .../net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 7 +- .../net/ethernet/marvell/octeontx2/nic/otx2_txrx.c | 30 +- .../net/ethernet/marvell/octeontx2/nic/otx2_txrx.h | 4 +- drivers/net/ethernet/mediatek/mtk_eth_soc.c | 3 + .../net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c | 4 +- .../ethernet/mellanox/mlx5/core/en/tc/act/mirred.c | 1 + .../ethernet/mellanox/mlx5/core/en/tc/act/pedit.c | 4 +- .../mlx5/core/en/tc/act/redirect_ingress.c | 1 + .../ethernet/mellanox/mlx5/core/en/tc/act/vlan.c | 1 + .../mellanox/mlx5/core/en/tc/act/vlan_mangle.c | 4 +- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 1 + .../ethernet/mellanox/mlx5/core/esw/devlink_port.c | 62 ++-- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 64 +++- drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 8 +- .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 75 ++-- drivers/net/ethernet/microchip/vcap/vcap_api.c | 18 +- drivers/net/ethernet/renesas/rswitch.c | 8 +- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 10 +- drivers/net/phy/micrel.c | 9 +- drivers/net/usb/r8152.c | 3 + drivers/net/veth.c | 6 +- drivers/parisc/led.c | 4 +- drivers/parisc/sba_iommu.c | 10 +- drivers/pinctrl/intel/pinctrl-cherryview.c | 5 +- drivers/platform/mellanox/Kconfig | 4 +- drivers/platform/mellanox/mlxbf-pmc.c | 41 +-- drivers/platform/mellanox/mlxbf-tmfifo.c | 90 +++-- drivers/pwm/pwm-atmel-tcb.c | 64 ++-- drivers/pwm/pwm-lpc32xx.c | 16 +- drivers/regulator/raa215300.c | 32 +- drivers/regulator/tps6287x-regulator.c | 2 +- drivers/regulator/tps6594-regulator.c | 31 +- drivers/s390/crypto/zcrypt_api.c | 1 + drivers/scsi/qla2xxx/qla_attr.c | 2 - drivers/scsi/qla2xxx/qla_dbg.c | 2 +- drivers/scsi/qla2xxx/qla_def.h | 21 +- drivers/scsi/qla2xxx/qla_dfs.c | 10 + drivers/scsi/qla2xxx/qla_gbl.h | 1 + drivers/scsi/qla2xxx/qla_init.c | 234 ++++++++----- drivers/scsi/qla2xxx/qla_inline.h | 57 +++- drivers/scsi/qla2xxx/qla_iocb.c | 1 + drivers/scsi/qla2xxx/qla_isr.c | 7 +- drivers/scsi/qla2xxx/qla_mbx.c | 7 +- drivers/scsi/qla2xxx/qla_nvme.c | 3 +- drivers/scsi/qla2xxx/qla_os.c | 26 +- drivers/scsi/qla2xxx/qla_target.c | 14 +- drivers/soc/qcom/qmi_encdec.c | 4 +- drivers/ufs/core/ufs_bsg.c | 3 +- drivers/ufs/core/ufshcd.c | 10 +- drivers/video/backlight/gpio_backlight.c | 3 +- drivers/video/backlight/lp855x_bl.c | 20 +- drivers/video/fbdev/ep93xx-fb.c | 1 - drivers/watchdog/Kconfig | 2 + drivers/watchdog/intel-mid_wdt.c | 1 + fs/btrfs/disk-io.c | 5 +- fs/btrfs/extent-tree.c | 43 +-- fs/btrfs/file-item.c | 34 +- fs/btrfs/file-item.h | 6 +- fs/btrfs/inode.c | 7 + fs/btrfs/raid56.c | 4 +- fs/btrfs/relocation.c | 12 +- fs/btrfs/scrub.c | 152 ++++++--- fs/btrfs/space-info.c | 6 +- fs/btrfs/transaction.c | 26 +- fs/btrfs/zoned.c | 16 +- fs/ext4/balloc.c | 15 +- fs/ext4/block_validity.c | 8 +- fs/ext4/crypto.c | 4 + fs/ext4/ext4.h | 2 + fs/ext4/extents_status.c | 44 ++- fs/ext4/file.c | 25 +- fs/f2fs/data.c | 8 +- fs/f2fs/f2fs.h | 24 +- fs/f2fs/inline.c | 3 +- fs/f2fs/segment.c | 2 + fs/fuse/readdir.c | 10 +- fs/gfs2/aops.c | 4 +- fs/gfs2/log.c | 25 +- fs/jbd2/checkpoint.c | 22 +- fs/jbd2/recovery.c | 12 +- fs/kernfs/dir.c | 4 + fs/nfs/direct.c | 20 +- fs/nfs/pnfs_dev.c | 2 +- fs/smb/client/cached_dir.c | 2 +- fs/smb/client/cifsglob.h | 1 + fs/smb/client/connect.c | 1 + fs/smb/client/fscache.c | 2 +- fs/smb/client/smb2ops.c | 11 +- fs/smb/client/smb2pdu.c | 11 + fs/smb/common/smb2pdu.h | 22 ++ include/linux/audit.h | 2 + include/linux/bpf.h | 12 + include/linux/ceph/ceph_fs.h | 24 +- include/linux/export-internal.h | 2 + include/linux/ipv6.h | 1 + include/linux/micrel_phy.h | 7 +- include/linux/phylink.h | 4 +- include/linux/tca6416_keypad.h | 1 - include/net/ip.h | 3 +- include/net/ip6_fib.h | 5 +- include/net/ip_fib.h | 5 +- include/net/ip_tunnels.h | 15 +- include/net/ipv6.h | 7 +- include/net/scm.h | 14 +- include/net/sock.h | 29 +- kernel/auditsc.c | 2 + kernel/bpf/bpf_local_storage.c | 49 +-- kernel/bpf/core.c | 10 +- kernel/bpf/syscall.c | 2 +- kernel/bpf/trampoline.c | 5 +- kernel/trace/bpf_trace.c | 11 - lib/idr.c | 2 +- lib/kunit/test.c | 3 +- lib/test_meminit.c | 2 +- lib/test_scanf.c | 2 +- mm/hugetlb_vmemmap.c | 34 +- mm/memcontrol.c | 32 +- mm/mremap.c | 2 +- mm/vmscan.c | 13 +- net/can/j1939/socket.c | 10 +- net/core/flow_dissector.c | 3 +- net/core/skbuff.c | 10 +- net/core/skmsg.c | 12 +- net/core/sock.c | 27 +- net/handshake/netlink.c | 18 +- net/hsr/hsr_forward.c | 1 + net/ipv4/devinet.c | 10 +- net/ipv4/fib_semantics.c | 5 +- net/ipv4/fib_trie.c | 3 +- net/ipv4/inet_hashtables.c | 36 +- net/ipv4/ip_input.c | 3 +- net/ipv4/ip_output.c | 2 +- net/ipv4/ip_sockglue.c | 2 +- net/ipv4/route.c | 1 + net/ipv4/tcp.c | 4 +- net/ipv4/tcp_output.c | 2 +- net/ipv4/udp.c | 6 +- net/ipv6/addrconf.c | 2 +- net/ipv6/ip6_input.c | 3 +- net/ipv6/ip6_output.c | 2 +- net/ipv6/ping.c | 2 +- net/ipv6/raw.c | 2 +- net/ipv6/route.c | 3 + net/ipv6/udp.c | 2 +- net/kcm/kcmsock.c | 15 +- net/mptcp/protocol.c | 23 +- net/netfilter/nf_tables_api.c | 54 ++- net/netfilter/nfnetlink_osf.c | 8 + net/netfilter/nft_exthdr.c | 22 +- net/netfilter/nft_set_rbtree.c | 8 +- net/sched/sch_fq_pie.c | 27 +- net/sched/sch_plug.c | 2 +- net/sched/sch_qfq.c | 22 +- net/sctp/proc.c | 2 +- net/sctp/socket.c | 10 +- net/smc/smc_core.c | 2 + net/socket.c | 15 +- net/tls/tls_sw.c | 4 +- net/unix/af_unix.c | 2 +- net/unix/scm.c | 6 +- net/xdp/xsk_diag.c | 3 + scripts/dummy-tools/gcc | 3 +- scripts/kconfig/preprocess.c | 3 + scripts/mod/modpost.c | 9 + scripts/package/mkspec | 2 +- sound/soc/tegra/tegra210_sfc.c | 31 +- sound/soc/tegra/tegra210_sfc.h | 4 +- tools/build/Makefile.build | 10 + tools/mm/Makefile | 4 +- tools/perf/Documentation/perf-dlfilter.txt | 22 +- tools/perf/Makefile.perf | 2 +- tools/perf/builtin-lock.c | 1 + tools/perf/builtin-script.c | 22 +- tools/perf/builtin-top.c | 1 + tools/perf/builtin-trace.c | 9 +- tools/perf/dlfilters/dlfilter-test-api-v2.c | 377 +++++++++++++++++++++ tools/perf/include/perf/perf_dlfilter.h | 11 +- tools/perf/pmu-events/Build | 6 + .../arch/arm64/ampere/ampereone/cache.json | 3 - .../pmu-events/arch/powerpc/power10/cache.json | 47 +-- .../arch/powerpc/power10/floating_point.json | 66 +++- .../pmu-events/arch/powerpc/power10/frontend.json | 188 +--------- .../pmu-events/arch/powerpc/power10/marked.json | 194 ++++++++--- .../pmu-events/arch/powerpc/power10/memory.json | 91 +---- .../pmu-events/arch/powerpc/power10/metrics.json | 56 ++- .../pmu-events/arch/powerpc/power10/others.json | 209 ++---------- .../pmu-events/arch/powerpc/power10/pipeline.json | 269 +++++++++++---- .../perf/pmu-events/arch/powerpc/power10/pmc.json | 193 ++++++++++- .../arch/powerpc/power10/translation.json | 42 +-- tools/perf/pmu-events/jevents.py | 2 +- tools/perf/tests/dlfilter-test.c | 38 ++- tools/perf/tests/shell/stat_bpf_counters.sh | 4 +- tools/perf/tests/shell/stat_bpf_counters_cgrp.sh | 28 +- tools/perf/ui/browsers/hists.c | 60 ++-- tools/perf/util/annotate.c | 10 +- tools/perf/util/bpf-filter.c | 14 +- tools/perf/util/dlfilter.c | 30 ++ tools/perf/util/expr.c | 4 +- tools/perf/util/header.c | 42 +-- tools/perf/util/parse-events.c | 4 +- tools/perf/util/parse-events.y | 256 +++++++++----- tools/perf/util/pmu.c | 4 +- .../selftests/bpf/prog_tests/bpf_obj_pinning.c | 5 +- .../selftests/bpf/prog_tests/sockmap_helpers.h | 26 ++ .../selftests/bpf/prog_tests/sockmap_listen.c | 7 + .../trigger-synthetic-event-dynstring.tc | 2 +- .../trigger-synthetic_event_syntax_errors.tc | 2 +- tools/testing/selftests/kselftest/runner.sh | 3 +- tools/testing/selftests/lib.mk | 4 +- tools/testing/selftests/net/bind_wildcard.c | 2 +- 316 files changed, 3898 insertions(+), 2319 deletions(-)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quan Tian qtian@vmware.com
commit a5e2151ff9d5852d0ababbbcaeebd9646af9c8d9 upstream.
__skb_get_hash_symmetric() was added to compute a symmetric hash over the protocol, addresses and transport ports, by commit eb70db875671 ("packet: Use symmetric hash for PACKET_FANOUT_HASH."). It uses flow_keys_dissector_symmetric_keys as the flow_dissector to incorporate IPv4 addresses, IPv6 addresses and ports. However, it should not specify the flag as FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL, which stops further dissection when an IPv6 flow label is encountered, making transport ports not being incorporated in such case.
As a consequence, the symmetric hash is based on 5-tuple for IPv4 but 3-tuple for IPv6 when flow label is present. It caused a few problems, e.g. when nft symhash and openvswitch l4_sym rely on the symmetric hash to perform load balancing as different L4 flows between two given IPv6 addresses would always get the same symmetric hash, leading to uneven traffic distribution.
Removing the use of FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL makes sure the symmetric hash is based on 5-tuple for both IPv4 and IPv6 consistently.
Fixes: eb70db875671 ("packet: Use symmetric hash for PACKET_FANOUT_HASH.") Reported-by: Lars Ekman uablrek@gmail.com Closes: https://github.com/antrea-io/antrea/issues/5457 Signed-off-by: Quan Tian qtian@vmware.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/flow_dissector.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -1780,8 +1780,7 @@ u32 __skb_get_hash_symmetric(const struc
memset(&keys, 0, sizeof(keys)); __skb_flow_dissect(NULL, skb, &flow_keys_dissector_symmetric, - &keys, NULL, 0, 0, 0, - FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL); + &keys, NULL, 0, 0, 0, 0);
return __flow_hash_from_keys(&keys, &hashrnd); }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gurchetan Singh gurchetansingh@chromium.org
commit 70d1ace56db6c79d39dbe9c0d5244452b67e2fde upstream.
We don't want to create a fence for every command submission. It's only necessary when userspace provides a waitable token for submission. This could be:
1) bo_handles, to be used with VIRTGPU_WAIT 2) out_fence_fd, to be used with dma_fence apis 3) a ring_idx provided with VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK + DRM event API 4) syncobjs in the future
The use case for just submitting a command to the host, and expecting no response. For example, gfxstream has GFXSTREAM_CONTEXT_PING that just wakes up the host side worker threads. There's also CROSS_DOMAIN_CMD_SEND which just sends data to the Wayland server.
This prevents the need to signal the automatically created virtio_gpu_fence.
In addition, VIRTGPU_EXECBUF_RING_IDX is checked when creating a DRM event object. VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK is already defined in terms of per-context rings. It was theoretically possible to create a DRM event on the global timeline (ring_idx == 0), if the context enabled DRM event polling. However, that wouldn't work and userspace (Sommelier). Explicitly disallow it for clarity.
Signed-off-by: Gurchetan Singh gurchetansingh@chromium.org Reviewed-by: Dmitry Osipenko dmitry.osipenko@collabora.com Tested-by: Dmitry Osipenko dmitry.osipenko@collabora.com Signed-off-by: Dmitry Osipenko dmitry.osipenko@collabora.com # edited coding style Link: https://patchwork.freedesktop.org/patch/msgid/20230707213124.494-1-gurchetan... Signed-off-by: Alyssa Ross hi@alyssa.is Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/virtio/virtgpu_submit.c | 32 ++++++++++++++++++++------------ 1 file changed, 20 insertions(+), 12 deletions(-)
--- a/drivers/gpu/drm/virtio/virtgpu_submit.c +++ b/drivers/gpu/drm/virtio/virtgpu_submit.c @@ -64,13 +64,9 @@ static int virtio_gpu_fence_event_create struct virtio_gpu_fence *fence, u32 ring_idx) { - struct virtio_gpu_fpriv *vfpriv = file->driver_priv; struct virtio_gpu_fence_event *e = NULL; int ret;
- if (!(vfpriv->ring_idx_mask & BIT_ULL(ring_idx))) - return 0; - e = kzalloc(sizeof(*e), GFP_KERNEL); if (!e) return -ENOMEM; @@ -164,18 +160,30 @@ static int virtio_gpu_init_submit(struct struct virtio_gpu_fpriv *vfpriv = file->driver_priv; struct virtio_gpu_device *vgdev = dev->dev_private; struct virtio_gpu_fence *out_fence; + bool drm_fence_event; int err;
memset(submit, 0, sizeof(*submit));
- out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx); - if (!out_fence) - return -ENOMEM; - - err = virtio_gpu_fence_event_create(dev, file, out_fence, ring_idx); - if (err) { - dma_fence_put(&out_fence->f); - return err; + if ((exbuf->flags & VIRTGPU_EXECBUF_RING_IDX) && + (vfpriv->ring_idx_mask & BIT_ULL(ring_idx))) + drm_fence_event = true; + else + drm_fence_event = false; + + if ((exbuf->flags & VIRTGPU_EXECBUF_FENCE_FD_OUT) || + exbuf->num_bo_handles || + drm_fence_event) + out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx); + else + out_fence = NULL; + + if (drm_fence_event) { + err = virtio_gpu_fence_event_create(dev, file, out_fence, ring_idx); + if (err) { + dma_fence_put(&out_fence->f); + return err; + } }
submit->out_fence = out_fence;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bean Huo beanhuo@micron.com
commit c91e585cfb3dd7d076e9ba0967908fc504d32def upstream.
According to UFSHCI 4.0 specification:
5.2 Host Controller Capabilities Registers 5.2.1 Offset 00h: CAP – Controller Capabilities:
"EHS Length in UTRD Supported (EHSLUTRDS): Indicates whether the host controller supports EHS Length field in UTRD.
0 – Host controller takes EHS length from CMD UPIU, and SW driver use EHS Length field in CMD UPIU.
1 – HW controller takes EHS length from UTRD, and SW driver use EHS Length field in UTRD.
NOTE Recommend Host controllers move to taking EHS length from UTRD, and in UFS-5, it will be mandatory."
So, when UFSHCI 4.0 doesn't support EHS Length field in UTRD, we could use EHS Length field in CMD UPIU. Remove the limitation that advanced RPMB only works when EHS length is supported in UTRD.
Fixes: 6ff265fc5ef6 ("scsi: ufs: core: bsg: Add advanced RPMB support in ufs_bsg") Co-developed-by: "jonghwi.rha" jonghwi.rha@samsung.com Signed-off-by: "jonghwi.rha" jonghwi.rha@samsung.com Signed-off-by: Bean Huo beanhuo@micron.com Link: https://lore.kernel.org/r/20230809181847.102123-2-beanhuo@iokpp.de Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ufs/core/ufs_bsg.c | 3 +-- drivers/ufs/core/ufshcd.c | 10 +++++++++- 2 files changed, 10 insertions(+), 3 deletions(-)
--- a/drivers/ufs/core/ufs_bsg.c +++ b/drivers/ufs/core/ufs_bsg.c @@ -76,8 +76,7 @@ static int ufs_bsg_exec_advanced_rpmb_re int ret; int data_len;
- if (hba->ufs_version < ufshci_version(4, 0) || !hba->dev_info.b_advanced_rpmb_en || - !(hba->capabilities & MASK_EHSLUTRD_SUPPORTED)) + if (hba->ufs_version < ufshci_version(4, 0) || !hba->dev_info.b_advanced_rpmb_en) return -EINVAL;
if (rpmb_request->ehs_req.length != 2 || rpmb_request->ehs_req.ehs_type != 1) --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -7296,7 +7296,15 @@ int ufshcd_advanced_rpmb_req_handler(str /* Advanced RPMB starts from UFS 4.0, so its command type is UTP_CMD_TYPE_UFS_STORAGE */ lrbp->command_type = UTP_CMD_TYPE_UFS_STORAGE;
- ufshcd_prepare_req_desc_hdr(lrbp, &upiu_flags, dir, 2); + /* + * According to UFSHCI 4.0 specification page 24, if EHSLUTRDS is 0, host controller takes + * EHS length from CMD UPIU, and SW driver use EHS Length field in CMD UPIU. if it is 1, + * HW controller takes EHS length from UTRD. + */ + if (hba->capabilities & MASK_EHSLUTRD_SUPPORTED) + ufshcd_prepare_req_desc_hdr(lrbp, &upiu_flags, dir, 2); + else + ufshcd_prepare_req_desc_hdr(lrbp, &upiu_flags, dir, 0);
/* update the task tag and LUN in the request upiu */ req_upiu->header.dword_0 |= cpu_to_be32(upiu_flags << 16 | UFS_UPIU_RPMB_WLUN << 8 | tag);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quinn Tran qutran@marvell.com
commit efa74a62aaa2429c04fe6cb277b3bf6739747d86 upstream.
During NVMe queue creation, a new qpair is created. FW resource limit needs to be re-adjusted to take into account the new qpair. Otherwise, NVMe command can not go through. This issue was discovered while testing/forcing FW execution to fail at load time.
Add call to readjust IOCB and exchange limit.
In addition, get FW state command and require FW to be running. Otherwise, error is generated.
Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran qutran@marvell.com Signed-off-by: Nilesh Javali njavali@marvell.com Link: https://lore.kernel.org/r/20230714070104.40052-3-njavali@marvell.com Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/qla2xxx/qla_gbl.h | 1 drivers/scsi/qla2xxx/qla_init.c | 52 +++++++++++++++++++++++++--------------- drivers/scsi/qla2xxx/qla_mbx.c | 3 ++ drivers/scsi/qla2xxx/qla_nvme.c | 1 4 files changed, 38 insertions(+), 19 deletions(-)
--- a/drivers/scsi/qla2xxx/qla_gbl.h +++ b/drivers/scsi/qla2xxx/qla_gbl.h @@ -143,6 +143,7 @@ void qla_edif_sess_down(struct scsi_qla_ void qla_edif_clear_appdata(struct scsi_qla_host *vha, struct fc_port *fcport); const char *sc_to_str(uint16_t cmd); +void qla_adjust_iocb_limit(scsi_qla_host_t *vha);
/* * Global Data in qla_os.c source file. --- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -4147,41 +4147,55 @@ out: return ha->flags.lr_detected; }
-void qla_init_iocb_limit(scsi_qla_host_t *vha) +static void __qla_adjust_iocb_limit(struct qla_qpair *qpair) { - u16 i, num_qps; - u32 limit; - struct qla_hw_data *ha = vha->hw; + u8 num_qps; + u16 limit; + struct qla_hw_data *ha = qpair->vha->hw;
num_qps = ha->num_qpairs + 1; limit = (ha->orig_fw_iocb_count * QLA_IOCB_PCT_LIMIT) / 100;
- ha->base_qpair->fwres.iocbs_total = ha->orig_fw_iocb_count; - ha->base_qpair->fwres.iocbs_limit = limit; - ha->base_qpair->fwres.iocbs_qp_limit = limit / num_qps; - ha->base_qpair->fwres.iocbs_used = 0; + qpair->fwres.iocbs_total = ha->orig_fw_iocb_count; + qpair->fwres.iocbs_limit = limit; + qpair->fwres.iocbs_qp_limit = limit / num_qps; + + qpair->fwres.exch_total = ha->orig_fw_xcb_count; + qpair->fwres.exch_limit = (ha->orig_fw_xcb_count * + QLA_IOCB_PCT_LIMIT) / 100; +}
- ha->base_qpair->fwres.exch_total = ha->orig_fw_xcb_count; - ha->base_qpair->fwres.exch_limit = (ha->orig_fw_xcb_count * - QLA_IOCB_PCT_LIMIT) / 100; +void qla_init_iocb_limit(scsi_qla_host_t *vha) +{ + u8 i; + struct qla_hw_data *ha = vha->hw; + + __qla_adjust_iocb_limit(ha->base_qpair); + ha->base_qpair->fwres.iocbs_used = 0; ha->base_qpair->fwres.exch_used = 0;
for (i = 0; i < ha->max_qpairs; i++) { if (ha->queue_pair_map[i]) { - ha->queue_pair_map[i]->fwres.iocbs_total = - ha->orig_fw_iocb_count; - ha->queue_pair_map[i]->fwres.iocbs_limit = limit; - ha->queue_pair_map[i]->fwres.iocbs_qp_limit = - limit / num_qps; + __qla_adjust_iocb_limit(ha->queue_pair_map[i]); ha->queue_pair_map[i]->fwres.iocbs_used = 0; - ha->queue_pair_map[i]->fwres.exch_total = ha->orig_fw_xcb_count; - ha->queue_pair_map[i]->fwres.exch_limit = - (ha->orig_fw_xcb_count * QLA_IOCB_PCT_LIMIT) / 100; ha->queue_pair_map[i]->fwres.exch_used = 0; } } }
+void qla_adjust_iocb_limit(scsi_qla_host_t *vha) +{ + u8 i; + struct qla_hw_data *ha = vha->hw; + + __qla_adjust_iocb_limit(ha->base_qpair); + + for (i = 0; i < ha->max_qpairs; i++) { + if (ha->queue_pair_map[i]) + __qla_adjust_iocb_limit(ha->queue_pair_map[i]); + } +} + /** * qla2x00_setup_chip() - Load and start RISC firmware. * @vha: HA context --- a/drivers/scsi/qla2xxx/qla_mbx.c +++ b/drivers/scsi/qla2xxx/qla_mbx.c @@ -2213,6 +2213,9 @@ qla2x00_get_firmware_state(scsi_qla_host ql_dbg(ql_dbg_mbx + ql_dbg_verbose, vha, 0x1054, "Entered %s.\n", __func__);
+ if (!ha->flags.fw_started) + return QLA_FUNCTION_FAILED; + mcp->mb[0] = MBC_GET_FIRMWARE_STATE; mcp->out_mb = MBX_0; if (IS_FWI2_CAPABLE(vha->hw)) --- a/drivers/scsi/qla2xxx/qla_nvme.c +++ b/drivers/scsi/qla2xxx/qla_nvme.c @@ -132,6 +132,7 @@ static int qla_nvme_alloc_queue(struct n "Failed to allocate qpair\n"); return -EINVAL; } + qla_adjust_iocb_limit(vha); } *handle = qpair;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quinn Tran qutran@marvell.com
commit a8ec192427e0516436e61f9ca9eb49c54eadfe0a upstream.
Per FW recommendation, 8 TMF's can be outstanding for each function. Previously, it allowed 8 per target.
Limit TMF to 8 per function.
Cc: stable@vger.kernel.org Fixes: 6a87679626b5 ("scsi: qla2xxx: Fix task management cmd fail due to unavailable resource") Signed-off-by: Quinn Tran qutran@marvell.com Signed-off-by: Nilesh Javali njavali@marvell.com Link: https://lore.kernel.org/r/20230714070104.40052-4-njavali@marvell.com Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/qla2xxx/qla_def.h | 9 +++--- drivers/scsi/qla2xxx/qla_init.c | 55 ++++++++++++++++++++++++---------------- drivers/scsi/qla2xxx/qla_os.c | 2 + 3 files changed, 41 insertions(+), 25 deletions(-)
--- a/drivers/scsi/qla2xxx/qla_def.h +++ b/drivers/scsi/qla2xxx/qla_def.h @@ -466,6 +466,7 @@ static inline be_id_t port_id_to_be_id(p }
struct tmf_arg { + struct list_head tmf_elem; struct qla_qpair *qpair; struct fc_port *fcport; struct scsi_qla_host *vha; @@ -2541,7 +2542,6 @@ enum rscn_addr_format { typedef struct fc_port { struct list_head list; struct scsi_qla_host *vha; - struct list_head tmf_pending;
unsigned int conf_compl_supported:1; unsigned int deleted:2; @@ -2562,9 +2562,6 @@ typedef struct fc_port { unsigned int do_prli_nvme:1;
uint8_t nvme_flag; - uint8_t active_tmf; -#define MAX_ACTIVE_TMF 8 - uint8_t node_name[WWN_SIZE]; uint8_t port_name[WWN_SIZE]; port_id_t d_id; @@ -4656,6 +4653,8 @@ struct qla_hw_data { uint32_t flt_region_aux_img_status_sec; }; uint8_t active_image; + uint8_t active_tmf; +#define MAX_ACTIVE_TMF 8
/* Needed for BEACON */ uint16_t beacon_blink_led; @@ -4670,6 +4669,8 @@ struct qla_hw_data {
struct qla_msix_entry *msix_entries;
+ struct list_head tmf_pending; + struct list_head tmf_active; struct list_head vp_list; /* list of VP */ unsigned long vp_idx_map[(MAX_MULTI_ID_FABRIC / 8) / sizeof(unsigned long)]; --- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -2186,30 +2186,42 @@ done: return rval; }
-static void qla_put_tmf(fc_port_t *fcport) +static void qla_put_tmf(struct tmf_arg *arg) { - struct scsi_qla_host *vha = fcport->vha; + struct scsi_qla_host *vha = arg->vha; struct qla_hw_data *ha = vha->hw; unsigned long flags;
spin_lock_irqsave(&ha->tgt.sess_lock, flags); - fcport->active_tmf--; + ha->active_tmf--; + list_del(&arg->tmf_elem); spin_unlock_irqrestore(&ha->tgt.sess_lock, flags); }
static -int qla_get_tmf(fc_port_t *fcport) +int qla_get_tmf(struct tmf_arg *arg) { - struct scsi_qla_host *vha = fcport->vha; + struct scsi_qla_host *vha = arg->vha; struct qla_hw_data *ha = vha->hw; unsigned long flags; + fc_port_t *fcport = arg->fcport; int rc = 0; - LIST_HEAD(tmf_elem); + struct tmf_arg *t;
spin_lock_irqsave(&ha->tgt.sess_lock, flags); - list_add_tail(&tmf_elem, &fcport->tmf_pending); + list_for_each_entry(t, &ha->tmf_active, tmf_elem) { + if (t->fcport == arg->fcport && t->lun == arg->lun) { + /* reject duplicate TMF */ + ql_log(ql_log_warn, vha, 0x802c, + "found duplicate TMF. Nexus=%ld:%06x:%llu.\n", + vha->host_no, fcport->d_id.b24, arg->lun); + spin_unlock_irqrestore(&ha->tgt.sess_lock, flags); + return -EINVAL; + } + }
- while (fcport->active_tmf >= MAX_ACTIVE_TMF) { + list_add_tail(&arg->tmf_elem, &ha->tmf_pending); + while (ha->active_tmf >= MAX_ACTIVE_TMF) { spin_unlock_irqrestore(&ha->tgt.sess_lock, flags);
msleep(1); @@ -2221,15 +2233,17 @@ int qla_get_tmf(fc_port_t *fcport) rc = EIO; break; } - if (fcport->active_tmf < MAX_ACTIVE_TMF && - list_is_first(&tmf_elem, &fcport->tmf_pending)) + if (ha->active_tmf < MAX_ACTIVE_TMF && + list_is_first(&arg->tmf_elem, &ha->tmf_pending)) break; }
- list_del(&tmf_elem); + list_del(&arg->tmf_elem);
- if (!rc) - fcport->active_tmf++; + if (!rc) { + ha->active_tmf++; + list_add_tail(&arg->tmf_elem, &ha->tmf_active); + }
spin_unlock_irqrestore(&ha->tgt.sess_lock, flags);
@@ -2251,15 +2265,18 @@ qla2x00_async_tm_cmd(fc_port_t *fcport, a.vha = fcport->vha; a.fcport = fcport; a.lun = lun; + a.flags = flags; + INIT_LIST_HEAD(&a.tmf_elem); + if (flags & (TCF_LUN_RESET|TCF_ABORT_TASK_SET|TCF_CLEAR_TASK_SET|TCF_CLEAR_ACA)) { a.modifier = MK_SYNC_ID_LUN; - - if (qla_get_tmf(fcport)) - return QLA_FUNCTION_FAILED; } else { a.modifier = MK_SYNC_ID; }
+ if (qla_get_tmf(&a)) + return QLA_FUNCTION_FAILED; + if (vha->hw->mqenable) { for (i = 0; i < vha->hw->num_qpairs; i++) { qpair = vha->hw->queue_pair_map[i]; @@ -2285,13 +2302,10 @@ qla2x00_async_tm_cmd(fc_port_t *fcport, goto bailout;
a.qpair = vha->hw->base_qpair; - a.flags = flags; rval = __qla2x00_async_tm_cmd(&a);
bailout: - if (a.modifier == MK_SYNC_ID_LUN) - qla_put_tmf(fcport); - + qla_put_tmf(&a); return rval; }
@@ -5520,7 +5534,6 @@ qla2x00_alloc_fcport(scsi_qla_host_t *vh INIT_WORK(&fcport->reg_work, qla_register_fcport_fn); INIT_LIST_HEAD(&fcport->gnl_entry); INIT_LIST_HEAD(&fcport->list); - INIT_LIST_HEAD(&fcport->tmf_pending);
INIT_LIST_HEAD(&fcport->sess_cmd_list); spin_lock_init(&fcport->sess_cmd_lock); --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -3009,6 +3009,8 @@ qla2x00_probe_one(struct pci_dev *pdev, atomic_set(&ha->num_pend_mbx_stage3, 0); atomic_set(&ha->zio_threshold, DEFAULT_ZIO_THRESHOLD); ha->last_zio_threshold = DEFAULT_ZIO_THRESHOLD; + INIT_LIST_HEAD(&ha->tmf_pending); + INIT_LIST_HEAD(&ha->tmf_active);
/* Assign ISP specific operations. */ if (IS_QLA2100(ha)) {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quinn Tran qutran@marvell.com
commit 6dfe4344c168c6ca20fe7640649aacfcefcccb26 upstream.
System crash when using debug kernel due to link list corruption. The cause of the link list corruption is due to session deletion was allowed to queue up twice. Here's the internal trace that show the same port was allowed to double queue for deletion on different cpu.
20808683956 015 qla2xxx [0000:13:00.1]-e801:4: Scheduling sess ffff93ebf9306800 for deletion 50:06:0e:80:12:48:ff:50 fc4_type 1 20808683957 027 qla2xxx [0000:13:00.1]-e801:4: Scheduling sess ffff93ebf9306800 for deletion 50:06:0e:80:12:48:ff:50 fc4_type 1
Move the clearing/setting of deleted flag lock.
Cc: stable@vger.kernel.org Fixes: 726b85487067 ("qla2xxx: Add framework for async fabric discovery") Signed-off-by: Quinn Tran qutran@marvell.com Signed-off-by: Nilesh Javali njavali@marvell.com Link: https://lore.kernel.org/r/20230714070104.40052-2-njavali@marvell.com Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/qla2xxx/qla_init.c | 16 ++++++++++++++-- drivers/scsi/qla2xxx/qla_target.c | 14 +++++++------- 2 files changed, 21 insertions(+), 9 deletions(-)
--- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -508,6 +508,7 @@ static void qla24xx_handle_adisc_event(scsi_qla_host_t *vha, struct event_arg *ea) { struct fc_port *fcport = ea->fcport; + unsigned long flags;
ql_dbg(ql_dbg_disc, vha, 0x20d2, "%s %8phC DS %d LS %d rc %d login %d|%d rscn %d|%d lid %d\n", @@ -522,9 +523,15 @@ void qla24xx_handle_adisc_event(scsi_qla ql_dbg(ql_dbg_disc, vha, 0x2066, "%s %8phC: adisc fail: post delete\n", __func__, ea->fcport->port_name); + + spin_lock_irqsave(&vha->work_lock, flags); /* deleted = 0 & logout_on_delete = force fw cleanup */ - fcport->deleted = 0; + if (fcport->deleted == QLA_SESS_DELETED) + fcport->deleted = 0; + fcport->logout_on_delete = 1; + spin_unlock_irqrestore(&vha->work_lock, flags); + qlt_schedule_sess_for_deletion(ea->fcport); return; } @@ -1446,7 +1453,6 @@ void __qla24xx_handle_gpdb_event(scsi_ql
spin_lock_irqsave(&vha->hw->tgt.sess_lock, flags); ea->fcport->login_gen++; - ea->fcport->deleted = 0; ea->fcport->logout_on_delete = 1;
if (!ea->fcport->login_succ && !IS_SW_RESV_ADDR(ea->fcport->d_id)) { @@ -6117,6 +6123,8 @@ qla2x00_reg_remote_port(scsi_qla_host_t void qla2x00_update_fcport(scsi_qla_host_t *vha, fc_port_t *fcport) { + unsigned long flags; + if (IS_SW_RESV_ADDR(fcport->d_id)) return;
@@ -6126,7 +6134,11 @@ qla2x00_update_fcport(scsi_qla_host_t *v qla2x00_set_fcport_disc_state(fcport, DSC_UPD_FCPORT); fcport->login_retry = vha->hw->login_retry_count; fcport->flags &= ~(FCF_LOGIN_NEEDED | FCF_ASYNC_SENT); + + spin_lock_irqsave(&vha->work_lock, flags); fcport->deleted = 0; + spin_unlock_irqrestore(&vha->work_lock, flags); + if (vha->hw->current_topology == ISP_CFG_NL) fcport->logout_on_delete = 0; else --- a/drivers/scsi/qla2xxx/qla_target.c +++ b/drivers/scsi/qla2xxx/qla_target.c @@ -1068,10 +1068,6 @@ void qlt_free_session_done(struct work_s (struct imm_ntfy_from_isp *)sess->iocb, SRB_NACK_LOGO); }
- spin_lock_irqsave(&vha->work_lock, flags); - sess->flags &= ~FCF_ASYNC_SENT; - spin_unlock_irqrestore(&vha->work_lock, flags); - spin_lock_irqsave(&ha->tgt.sess_lock, flags); if (sess->se_sess) { sess->se_sess = NULL; @@ -1081,7 +1077,6 @@ void qlt_free_session_done(struct work_s
qla2x00_set_fcport_disc_state(sess, DSC_DELETED); sess->fw_login_state = DSC_LS_PORT_UNAVAIL; - sess->deleted = QLA_SESS_DELETED;
if (sess->login_succ && !IS_SW_RESV_ADDR(sess->d_id)) { vha->fcport_count--; @@ -1133,10 +1128,15 @@ void qlt_free_session_done(struct work_s
sess->explicit_logout = 0; spin_unlock_irqrestore(&ha->tgt.sess_lock, flags); - sess->free_pending = 0;
qla2x00_dfs_remove_rport(vha, sess);
+ spin_lock_irqsave(&vha->work_lock, flags); + sess->flags &= ~FCF_ASYNC_SENT; + sess->deleted = QLA_SESS_DELETED; + sess->free_pending = 0; + spin_unlock_irqrestore(&vha->work_lock, flags); + ql_dbg(ql_dbg_disc, vha, 0xf001, "Unregistration of sess %p %8phC finished fcp_cnt %d\n", sess, sess->port_name, vha->fcport_count); @@ -1185,12 +1185,12 @@ void qlt_unreg_sess(struct fc_port *sess * management from being sent. */ sess->flags |= FCF_ASYNC_SENT; + sess->deleted = QLA_SESS_DELETION_IN_PROGRESS; spin_unlock_irqrestore(&sess->vha->work_lock, flags);
if (sess->se_sess) vha->hw->tgt.tgt_ops->clear_nacl_from_fcport_map(sess);
- sess->deleted = QLA_SESS_DELETION_IN_PROGRESS; qla2x00_set_fcport_disc_state(sess, DSC_DELETE_PEND); sess->last_rscn_gen = sess->rscn_gen; sess->last_login_gen = sess->login_gen;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quinn Tran qutran@marvell.com
commit 009e7fe4a1ed52276b332842a6b6e23b07200f2d upstream.
Different behavior were experienced of session being torn down vs not when TMF is timed out. When FW detects the time out, the session is torn down. When driver detects the time out, the session is not torn down.
Allow TMF error to return to upper layer without session tear down.
Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran qutran@marvell.com Signed-off-by: Nilesh Javali njavali@marvell.com Link: https://lore.kernel.org/r/20230714070104.40052-10-njavali@marvell.com Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/qla2xxx/qla_isr.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/scsi/qla2xxx/qla_isr.c +++ b/drivers/scsi/qla2xxx/qla_isr.c @@ -2539,7 +2539,6 @@ qla24xx_tm_iocb_entry(scsi_qla_host_t *v case CS_PORT_BUSY: case CS_INCOMPLETE: case CS_PORT_UNAVAILABLE: - case CS_TIMEOUT: case CS_RESET: if (atomic_read(&fcport->state) == FCS_ONLINE) { ql_dbg(ql_dbg_disc, fcport->vha, 0x3021,
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quinn Tran qutran@marvell.com
commit da7c21b72aa86e990af5f73bce6590b8d8d148d0 upstream.
For each TMF request, driver iterates through each qpair and flushes commands associated to the TMF. At the end of the qpair flush, a Marker is used to complete the flush transaction. This process was repeated for each qpair. The multiple flush and marker for this TMF request seems to cause confusion for FW.
Instead, 1 flush is sent to FW. Driver would wait for FW to go through all the I/Os on each qpair to be read then return. Driver then closes out the transaction with a Marker.
Cc: stable@vger.kernel.org Fixes: d90171dd0da5 ("scsi: qla2xxx: Multi-que support for TMF") Signed-off-by: Quinn Tran qutran@marvell.com Signed-off-by: Nilesh Javali njavali@marvell.com Link: https://lore.kernel.org/r/20230714070104.40052-5-njavali@marvell.com Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/qla2xxx/qla_init.c | 74 +++++++++++++++++++++------------------- drivers/scsi/qla2xxx/qla_iocb.c | 1 drivers/scsi/qla2xxx/qla_os.c | 9 ++-- 3 files changed, 45 insertions(+), 39 deletions(-)
--- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -2002,12 +2002,11 @@ qla2x00_tmf_iocb_timeout(void *data) int rc, h; unsigned long flags;
- if (sp->type == SRB_MARKER) { - complete(&tmf->u.tmf.comp); - return; - } + if (sp->type == SRB_MARKER) + rc = QLA_FUNCTION_FAILED; + else + rc = qla24xx_async_abort_cmd(sp, false);
- rc = qla24xx_async_abort_cmd(sp, false); if (rc) { spin_lock_irqsave(sp->qpair->qp_lock_ptr, flags); for (h = 1; h < sp->qpair->req->num_outstanding_cmds; h++) { @@ -2129,6 +2128,17 @@ static void qla2x00_tmf_sp_done(srb_t *s complete(&tmf->u.tmf.comp); }
+static int qla_tmf_wait(struct tmf_arg *arg) +{ + /* there are only 2 types of error handling that reaches here, lun or target reset */ + if (arg->flags & (TCF_LUN_RESET | TCF_ABORT_TASK_SET | TCF_CLEAR_TASK_SET)) + return qla2x00_eh_wait_for_pending_commands(arg->vha, + arg->fcport->d_id.b24, arg->lun, WAIT_LUN); + else + return qla2x00_eh_wait_for_pending_commands(arg->vha, + arg->fcport->d_id.b24, arg->lun, WAIT_TARGET); +} + static int __qla2x00_async_tm_cmd(struct tmf_arg *arg) { @@ -2136,8 +2146,9 @@ __qla2x00_async_tm_cmd(struct tmf_arg *a struct srb_iocb *tm_iocb; srb_t *sp; int rval = QLA_FUNCTION_FAILED; - fc_port_t *fcport = arg->fcport; + u32 chip_gen, login_gen; + u64 jif;
if (TMF_NOT_READY(arg->fcport)) { ql_dbg(ql_dbg_taskm, vha, 0x8032, @@ -2182,8 +2193,27 @@ __qla2x00_async_tm_cmd(struct tmf_arg *a "TM IOCB failed (%x).\n", rval); }
- if (!test_bit(UNLOADING, &vha->dpc_flags) && !IS_QLAFX00(vha->hw)) - rval = qla26xx_marker(arg); + if (!test_bit(UNLOADING, &vha->dpc_flags) && !IS_QLAFX00(vha->hw)) { + chip_gen = vha->hw->chip_reset; + login_gen = fcport->login_gen; + + jif = jiffies; + if (qla_tmf_wait(arg)) { + ql_log(ql_log_info, vha, 0x803e, + "Waited %u ms Nexus=%ld:%06x:%llu.\n", + jiffies_to_msecs(jiffies - jif), vha->host_no, + fcport->d_id.b24, arg->lun); + } + + if (chip_gen == vha->hw->chip_reset && login_gen == fcport->login_gen) { + rval = qla26xx_marker(arg); + } else { + ql_log(ql_log_info, vha, 0x803e, + "Skip Marker due to disruption. Nexus=%ld:%06x:%llu.\n", + vha->host_no, fcport->d_id.b24, arg->lun); + rval = QLA_FUNCTION_FAILED; + } + }
done_free_sp: /* ref: INIT */ @@ -2261,9 +2291,8 @@ qla2x00_async_tm_cmd(fc_port_t *fcport, uint32_t tag) { struct scsi_qla_host *vha = fcport->vha; - struct qla_qpair *qpair; struct tmf_arg a; - int i, rval = QLA_SUCCESS; + int rval = QLA_SUCCESS;
if (TMF_NOT_READY(fcport)) return QLA_SUSPENDED; @@ -2283,34 +2312,9 @@ qla2x00_async_tm_cmd(fc_port_t *fcport, if (qla_get_tmf(&a)) return QLA_FUNCTION_FAILED;
- if (vha->hw->mqenable) { - for (i = 0; i < vha->hw->num_qpairs; i++) { - qpair = vha->hw->queue_pair_map[i]; - if (!qpair) - continue; - - if (TMF_NOT_READY(fcport)) { - ql_log(ql_log_warn, vha, 0x8026, - "Unable to send TM due to disruption.\n"); - rval = QLA_SUSPENDED; - break; - } - - a.qpair = qpair; - a.flags = flags|TCF_NOTMCMD_TO_TARGET; - rval = __qla2x00_async_tm_cmd(&a); - if (rval) - break; - } - } - - if (rval) - goto bailout; - a.qpair = vha->hw->base_qpair; rval = __qla2x00_async_tm_cmd(&a);
-bailout: qla_put_tmf(&a); return rval; } --- a/drivers/scsi/qla2xxx/qla_iocb.c +++ b/drivers/scsi/qla2xxx/qla_iocb.c @@ -3882,6 +3882,7 @@ qla_marker_iocb(srb_t *sp, struct mrk_en { mrk->entry_type = MARKER_TYPE; mrk->modifier = sp->u.iocb_cmd.u.tmf.modifier; + mrk->handle = make_handle(sp->qpair->req->id, sp->handle); if (sp->u.iocb_cmd.u.tmf.modifier != MK_SYNC_ALL) { mrk->nport_handle = cpu_to_le16(sp->u.iocb_cmd.u.tmf.loop_id); int_to_scsilun(sp->u.iocb_cmd.u.tmf.lun, (struct scsi_lun *)&mrk->lun); --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -1488,8 +1488,9 @@ qla2xxx_eh_device_reset(struct scsi_cmnd goto eh_reset_failed; } err = 3; - if (qla2x00_eh_wait_for_pending_commands(vha, sdev->id, - sdev->lun, WAIT_LUN) != QLA_SUCCESS) { + if (qla2x00_eh_wait_for_pending_commands(vha, fcport->d_id.b24, + cmd->device->lun, + WAIT_LUN) != QLA_SUCCESS) { ql_log(ql_log_warn, vha, 0x800d, "wait for pending cmds failed for cmd=%p.\n", cmd); goto eh_reset_failed; @@ -1555,8 +1556,8 @@ qla2xxx_eh_target_reset(struct scsi_cmnd goto eh_reset_failed; } err = 3; - if (qla2x00_eh_wait_for_pending_commands(vha, sdev->id, - 0, WAIT_TARGET) != QLA_SUCCESS) { + if (qla2x00_eh_wait_for_pending_commands(vha, fcport->d_id.b24, 0, + WAIT_TARGET) != QLA_SUCCESS) { ql_log(ql_log_warn, vha, 0x800d, "wait for pending cmds failed for cmd=%p.\n", cmd); goto eh_reset_failed;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quinn Tran qutran@marvell.com
commit 5b51f35d127e7bef55fa869d2465e2bca4636454 upstream.
Link up failure occurred where driver failed to see certain events from FW indicating link up (AEN 8011) and fabric login completion (AEN 8014). Without these 2 events, driver would not proceed forward to scan the fabric. The cause of this is due to delay in the receive of interrupt for Mailbox 60 that causes qla to set the fw_started flag late. The late setting of this flag causes other interrupts to be dropped. These dropped interrupts happen to be the link up (AEN 8011) and fabric login completion (AEN 8014).
Set fw_started flag early to prevent interrupts being dropped.
Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran qutran@marvell.com Signed-off-by: Nilesh Javali njavali@marvell.com Link: https://lore.kernel.org/r/20230714070104.40052-6-njavali@marvell.com Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/qla2xxx/qla_init.c | 3 ++- drivers/scsi/qla2xxx/qla_isr.c | 6 +++++- 2 files changed, 7 insertions(+), 2 deletions(-)
--- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -4815,15 +4815,16 @@ qla2x00_init_rings(scsi_qla_host_t *vha) if (ha->flags.edif_enabled) mid_init_cb->init_cb.frame_payload_size = cpu_to_le16(ELS_MAX_PAYLOAD);
+ QLA_FW_STARTED(ha); rval = qla2x00_init_firmware(vha, ha->init_cb_size); next_check: if (rval) { + QLA_FW_STOPPED(ha); ql_log(ql_log_fatal, vha, 0x00d2, "Init Firmware **** FAILED ****.\n"); } else { ql_dbg(ql_dbg_init, vha, 0x00d3, "Init Firmware -- success.\n"); - QLA_FW_STARTED(ha); vha->u_ql2xexchoffld = vha->u_ql2xiniexchg = 0; }
--- a/drivers/scsi/qla2xxx/qla_isr.c +++ b/drivers/scsi/qla2xxx/qla_isr.c @@ -1121,8 +1121,12 @@ qla2x00_async_event(scsi_qla_host_t *vha unsigned long flags; fc_port_t *fcport = NULL;
- if (!vha->hw->flags.fw_started) + if (!vha->hw->flags.fw_started) { + ql_log(ql_log_warn, vha, 0x50ff, + "Dropping AEN - %04x %04x %04x %04x.\n", + mb[0], mb[1], mb[2], mb[3]); return; + }
/* Setup to process RIO completion. */ handle_cnt = 0;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quinn Tran qutran@marvell.com
commit 8ebaa45163a3fedc885c1dc7d43ea987a2f00a06 upstream.
Some consider noisy log as test failure. Turn off noisy message log.
Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran qutran@marvell.com Signed-off-by: Nilesh Javali njavali@marvell.com Link: https://lore.kernel.org/r/20230714070104.40052-8-njavali@marvell.com Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/qla2xxx/qla_nvme.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/scsi/qla2xxx/qla_nvme.c +++ b/drivers/scsi/qla2xxx/qla_nvme.c @@ -668,7 +668,7 @@ static int qla_nvme_post_cmd(struct nvme
rval = qla2x00_start_nvme_mq(sp); if (rval != QLA_SUCCESS) { - ql_log(ql_log_warn, vha, 0x212d, + ql_dbg(ql_dbg_io + ql_dbg_verbose, vha, 0x212d, "qla2x00_start_nvme_mq failed = %d\n", rval); sp->priv = NULL; priv->sp = NULL;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quinn Tran qutran@marvell.com
commit 39d22740712c7563a2e18c08f033deeacdaf66e7 upstream.
Connection does not resume after a host reset / chip reset. The cause of the blockage is due to the FCF_ASYNC_ACTIVE left on. The gnl command was interrupted by the chip reset. On exiting the command, this flag should be turn off to allow relogin to reoccur. Clear this flag to prevent blockage.
Cc: stable@vger.kernel.org Fixes: 17e64648aa47 ("scsi: qla2xxx: Correct fcport flags handling") Signed-off-by: Quinn Tran qutran@marvell.com Signed-off-by: Nilesh Javali njavali@marvell.com Link: https://lore.kernel.org/r/20230714070104.40052-7-njavali@marvell.com Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/qla2xxx/qla_init.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -1141,7 +1141,7 @@ int qla24xx_async_gnl(struct scsi_qla_ho u16 *mb;
if (!vha->flags.online || (fcport->flags & FCF_ASYNC_SENT)) - return rval; + goto done;
ql_dbg(ql_dbg_disc, vha, 0x20d9, "Async-gnlist WWPN %8phC \n", fcport->port_name); @@ -1195,8 +1195,9 @@ int qla24xx_async_gnl(struct scsi_qla_ho done_free_sp: /* ref: INIT */ kref_put(&sp->cmd_kref, qla2x00_sp_release); + fcport->flags &= ~(FCF_ASYNC_SENT); done: - fcport->flags &= ~(FCF_ASYNC_ACTIVE | FCF_ASYNC_SENT); + fcport->flags &= ~(FCF_ASYNC_ACTIVE); return rval; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quinn Tran qutran@marvell.com
commit 5d3148d8e8b05f084e607ac3bd55a4c317a9f934 upstream.
Task management can retry up to 5 times when FW resource becomes bottle neck. Between the retries, there is a short sleep. Current code assumes the chip has not reset or session has not changed.
Check for chip reset or session change before sending Task management.
Cc: stable@vger.kernel.org Fixes: 9803fb5d2759 ("scsi: qla2xxx: Fix task management cmd failure") Signed-off-by: Quinn Tran qutran@marvell.com Signed-off-by: Nilesh Javali njavali@marvell.com Link: https://lore.kernel.org/r/20230714070104.40052-9-njavali@marvell.com Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/qla2xxx/qla_init.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-)
--- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -2038,10 +2038,14 @@ static void qla_marker_sp_done(srb_t *sp complete(&tmf->u.tmf.comp); }
-#define START_SP_W_RETRIES(_sp, _rval) \ +#define START_SP_W_RETRIES(_sp, _rval, _chip_gen, _login_gen) \ {\ int cnt = 5; \ do { \ + if (_chip_gen != sp->vha->hw->chip_reset || _login_gen != sp->fcport->login_gen) {\ + _rval = EINVAL; \ + break; \ + } \ _rval = qla2x00_start_sp(_sp); \ if (_rval == EAGAIN) \ msleep(1); \ @@ -2064,6 +2068,7 @@ qla26xx_marker(struct tmf_arg *arg) srb_t *sp; int rval = QLA_FUNCTION_FAILED; fc_port_t *fcport = arg->fcport; + u32 chip_gen, login_gen;
if (TMF_NOT_READY(arg->fcport)) { ql_dbg(ql_dbg_taskm, vha, 0x8039, @@ -2073,6 +2078,9 @@ qla26xx_marker(struct tmf_arg *arg) return QLA_SUSPENDED; }
+ chip_gen = vha->hw->chip_reset; + login_gen = fcport->login_gen; + /* ref: INIT */ sp = qla2xxx_get_qpair_sp(vha, arg->qpair, fcport, GFP_KERNEL); if (!sp) @@ -2090,7 +2098,7 @@ qla26xx_marker(struct tmf_arg *arg) tm_iocb->u.tmf.loop_id = fcport->loop_id; tm_iocb->u.tmf.vp_index = vha->vp_idx;
- START_SP_W_RETRIES(sp, rval); + START_SP_W_RETRIES(sp, rval, chip_gen, login_gen);
ql_dbg(ql_dbg_taskm, vha, 0x8006, "Async-marker hdl=%x loop-id=%x portid=%06x modifier=%x lun=%lld qp=%d rval %d.\n", @@ -2159,6 +2167,9 @@ __qla2x00_async_tm_cmd(struct tmf_arg *a return QLA_SUSPENDED; }
+ chip_gen = vha->hw->chip_reset; + login_gen = fcport->login_gen; + /* ref: INIT */ sp = qla2xxx_get_qpair_sp(vha, arg->qpair, fcport, GFP_KERNEL); if (!sp) @@ -2176,7 +2187,7 @@ __qla2x00_async_tm_cmd(struct tmf_arg *a tm_iocb->u.tmf.flags = arg->flags; tm_iocb->u.tmf.lun = arg->lun;
- START_SP_W_RETRIES(sp, rval); + START_SP_W_RETRIES(sp, rval, chip_gen, login_gen);
ql_dbg(ql_dbg_taskm, vha, 0x802f, "Async-tmf hdl=%x loop-id=%x portid=%06x ctrl=%x lun=%lld qp=%d rval=%x.\n", @@ -2195,9 +2206,6 @@ __qla2x00_async_tm_cmd(struct tmf_arg *a }
if (!test_bit(UNLOADING, &vha->dpc_flags) && !IS_QLAFX00(vha->hw)) { - chip_gen = vha->hw->chip_reset; - login_gen = fcport->login_gen; - jif = jiffies; if (qla_tmf_wait(arg)) { ql_log(ql_log_info, vha, 0x803e,
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Manish Rangankar mrangankar@marvell.com
commit e9105c4b7a9208a21a9bda133707624f12ddabc2 upstream.
User accidently passed module parameter ql2xenabledif=1 which is unsupported. However, driver still initialized which lead to guard tag errors during device discovery.
Remove unsupported ql2xenabledif=1 option and validate the user input.
Cc: stable@vger.kernel.org Signed-off-by: Manish Rangankar mrangankar@marvell.com Signed-off-by: Nilesh Javali njavali@marvell.com Link: https://lore.kernel.org/r/20230821130045.34850-7-njavali@marvell.com Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/qla2xxx/qla_attr.c | 2 -- drivers/scsi/qla2xxx/qla_dbg.c | 2 +- drivers/scsi/qla2xxx/qla_os.c | 9 +++++++-- 3 files changed, 8 insertions(+), 5 deletions(-)
--- a/drivers/scsi/qla2xxx/qla_attr.c +++ b/drivers/scsi/qla2xxx/qla_attr.c @@ -3093,8 +3093,6 @@ qla24xx_vport_create(struct fc_vport *fc vha->flags.difdix_supported = 1; ql_dbg(ql_dbg_user, vha, 0x7082, "Registered for DIF/DIX type 1 and 3 protection.\n"); - if (ql2xenabledif == 1) - prot = SHOST_DIX_TYPE0_PROTECTION; scsi_host_set_prot(vha->host, prot | SHOST_DIF_TYPE1_PROTECTION | SHOST_DIF_TYPE2_PROTECTION --- a/drivers/scsi/qla2xxx/qla_dbg.c +++ b/drivers/scsi/qla2xxx/qla_dbg.c @@ -18,7 +18,7 @@ * | Queue Command and IO tracing | 0x3074 | 0x300b | * | | | 0x3027-0x3028 | * | | | 0x303d-0x3041 | - * | | | 0x302d,0x3033 | + * | | | 0x302e,0x3033 | * | | | 0x3036,0x3038 | * | | | 0x303a | * | DPC Thread | 0x4023 | 0x4002,0x4013 | --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -3288,6 +3288,13 @@ qla2x00_probe_one(struct pci_dev *pdev, host->max_id = ha->max_fibre_devices; host->cmd_per_lun = 3; host->unique_id = host->host_no; + + if (ql2xenabledif && ql2xenabledif != 2) { + ql_log(ql_log_warn, base_vha, 0x302d, + "Invalid value for ql2xenabledif, resetting it to default (2)\n"); + ql2xenabledif = 2; + } + if (IS_T10_PI_CAPABLE(ha) && ql2xenabledif) host->max_cmd_len = 32; else @@ -3524,8 +3531,6 @@ skip_dpc: base_vha->flags.difdix_supported = 1; ql_dbg(ql_dbg_init, base_vha, 0x00f1, "Registering for DIF/DIX type 1 and 3 protection.\n"); - if (ql2xenabledif == 1) - prot = SHOST_DIX_TYPE0_PROTECTION; if (ql2xprotmask) scsi_host_set_prot(host, ql2xprotmask); else
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quinn Tran qutran@marvell.com
commit 6d0b65569c0a10b27c49bacd8d25bcd406003533 upstream.
Fix race condition between Interrupt thread and Chip reset thread in trying to flush the same mailbox. With the race condition, the "ha->mbx_intr_comp" will get an extra complete() call. The extra complete call create erroneous mailbox timeout condition when the next mailbox is sent where the mailbox call does not wait for interrupt to arrive. Instead, it advances without waiting.
Add lock protection around the check for mailbox completion.
Cc: stable@vger.kernel.org Fixes: b2000805a975 ("scsi: qla2xxx: Flush mailbox commands on chip reset") Signed-off-by: Quinn Tran quinn.tran@marvell.com Signed-off-by: Nilesh Javali njavali@marvell.com Link: https://lore.kernel.org/r/20230821130045.34850-3-njavali@marvell.com Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/qla2xxx/qla_def.h | 1 - drivers/scsi/qla2xxx/qla_init.c | 7 ++++--- drivers/scsi/qla2xxx/qla_mbx.c | 4 ---- drivers/scsi/qla2xxx/qla_os.c | 1 - 4 files changed, 4 insertions(+), 9 deletions(-)
--- a/drivers/scsi/qla2xxx/qla_def.h +++ b/drivers/scsi/qla2xxx/qla_def.h @@ -4384,7 +4384,6 @@ struct qla_hw_data { uint8_t aen_mbx_count; atomic_t num_pend_mbx_stage1; atomic_t num_pend_mbx_stage2; - atomic_t num_pend_mbx_stage3; uint16_t frame_payload_size;
uint32_t login_retry_count; --- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -7390,14 +7390,15 @@ qla2x00_abort_isp_cleanup(scsi_qla_host_ }
/* purge MBox commands */ - if (atomic_read(&ha->num_pend_mbx_stage3)) { + spin_lock_irqsave(&ha->hardware_lock, flags); + if (test_bit(MBX_INTR_WAIT, &ha->mbx_cmd_flags)) { clear_bit(MBX_INTR_WAIT, &ha->mbx_cmd_flags); complete(&ha->mbx_intr_comp); } + spin_unlock_irqrestore(&ha->hardware_lock, flags);
i = 0; - while (atomic_read(&ha->num_pend_mbx_stage3) || - atomic_read(&ha->num_pend_mbx_stage2) || + while (atomic_read(&ha->num_pend_mbx_stage2) || atomic_read(&ha->num_pend_mbx_stage1)) { msleep(20); i++; --- a/drivers/scsi/qla2xxx/qla_mbx.c +++ b/drivers/scsi/qla2xxx/qla_mbx.c @@ -273,7 +273,6 @@ qla2x00_mailbox_command(scsi_qla_host_t spin_unlock_irqrestore(&ha->hardware_lock, flags);
wait_time = jiffies; - atomic_inc(&ha->num_pend_mbx_stage3); if (!wait_for_completion_timeout(&ha->mbx_intr_comp, mcp->tov * HZ)) { ql_dbg(ql_dbg_mbx, vha, 0x117a, @@ -290,7 +289,6 @@ qla2x00_mailbox_command(scsi_qla_host_t spin_unlock_irqrestore(&ha->hardware_lock, flags); atomic_dec(&ha->num_pend_mbx_stage2); - atomic_dec(&ha->num_pend_mbx_stage3); rval = QLA_ABORTED; goto premature_exit; } @@ -302,11 +300,9 @@ qla2x00_mailbox_command(scsi_qla_host_t ha->flags.mbox_busy = 0; spin_unlock_irqrestore(&ha->hardware_lock, flags); atomic_dec(&ha->num_pend_mbx_stage2); - atomic_dec(&ha->num_pend_mbx_stage3); rval = QLA_ABORTED; goto premature_exit; } - atomic_dec(&ha->num_pend_mbx_stage3);
if (time_after(jiffies, wait_time + 5 * HZ)) ql_log(ql_log_warn, vha, 0x1015, "cmd=0x%x, waited %d msecs\n", --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -3007,7 +3007,6 @@ qla2x00_probe_one(struct pci_dev *pdev, ha->max_exchg = FW_MAX_EXCHANGES_CNT; atomic_set(&ha->num_pend_mbx_stage1, 0); atomic_set(&ha->num_pend_mbx_stage2, 0); - atomic_set(&ha->num_pend_mbx_stage3, 0); atomic_set(&ha->zio_threshold, DEFAULT_ZIO_THRESHOLD); ha->last_zio_threshold = DEFAULT_ZIO_THRESHOLD; INIT_LIST_HEAD(&ha->tmf_pending);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nilesh Javali njavali@marvell.com
commit b496953dd0444001b12f425ea07d78c1f47e3193 upstream.
Fix indentation for warning reported by smatch:
drivers/scsi/qla2xxx/qla_init.c:4199 qla_init_iocb_limit() warn: inconsistent indenting
Fixes: efa74a62aaa2 ("scsi: qla2xxx: Adjust IOCB resource on qpair create") Signed-off-by: Nilesh Javali njavali@marvell.com Link: https://lore.kernel.org/r/20230821130045.34850-8-njavali@marvell.com Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/qla2xxx/qla_init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -4203,7 +4203,7 @@ void qla_init_iocb_limit(scsi_qla_host_t u8 i; struct qla_hw_data *ha = vha->hw;
- __qla_adjust_iocb_limit(ha->base_qpair); + __qla_adjust_iocb_limit(ha->base_qpair); ha->base_qpair->fwres.iocbs_used = 0; ha->base_qpair->fwres.exch_used = 0;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quinn Tran qutran@marvell.com
commit 0ba0b018f94525a6b32f5930f980ce9b62b72e6f upstream.
TMF was returned with an error code. The error code was not preserved to be returned to upper layer. Instead, the error code from the Marker was returned.
Preserve error code from TMF and return it to upper layer.
Cc: stable@vger.kernel.org Fixes: da7c21b72aa8 ("scsi: qla2xxx: Fix command flush during TMF") Signed-off-by: Quinn Tran qutran@marvell.com Signed-off-by: Nilesh Javali njavali@marvell.com Link: https://lore.kernel.org/r/20230821130045.34850-6-njavali@marvell.com Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/qla2xxx/qla_init.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -2223,6 +2223,8 @@ __qla2x00_async_tm_cmd(struct tmf_arg *a rval = QLA_FUNCTION_FAILED; } } + if (tm_iocb->u.tmf.data) + rval = tm_iocb->u.tmf.data;
done_free_sp: /* ref: INIT */
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quinn Tran qutran@marvell.com
commit e370b64c7db96384a0886a09a9d80406e4c663d7 upstream.
The storage was not draining I/Os and the work load was not spread out across different CPUs evenly. This led to firmware resource counters getting overrun on the busy CPU. This overrun prevented error recovery from happening in a timely manner.
By switching the counter to atomic, it allows the count to be little more accurate to prevent the overrun.
Cc: stable@vger.kernel.org Fixes: da7c21b72aa8 ("scsi: qla2xxx: Fix command flush during TMF") Signed-off-by: Quinn Tran qutran@marvell.com Signed-off-by: Nilesh Javali njavali@marvell.com Link: https://lore.kernel.org/r/20230821130045.34850-4-njavali@marvell.com Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/qla2xxx/qla_def.h | 11 +++++++ drivers/scsi/qla2xxx/qla_dfs.c | 10 ++++++ drivers/scsi/qla2xxx/qla_init.c | 8 +++++ drivers/scsi/qla2xxx/qla_inline.h | 57 +++++++++++++++++++++++++++++++++++++- drivers/scsi/qla2xxx/qla_os.c | 5 ++- 5 files changed, 88 insertions(+), 3 deletions(-)
--- a/drivers/scsi/qla2xxx/qla_def.h +++ b/drivers/scsi/qla2xxx/qla_def.h @@ -3742,6 +3742,16 @@ struct qla_fw_resources { u16 pad; };
+struct qla_fw_res { + u16 iocb_total; + u16 iocb_limit; + atomic_t iocb_used; + + u16 exch_total; + u16 exch_limit; + atomic_t exch_used; +}; + #define QLA_IOCB_PCT_LIMIT 95
struct qla_buf_pool { @@ -4799,6 +4809,7 @@ struct qla_hw_data { struct els_reject elsrej; u8 edif_post_stop_cnt_down; struct qla_vp_map *vp_map; + struct qla_fw_res fwres ____cacheline_aligned; };
#define RX_ELS_SIZE (roundup(sizeof(struct enode) + ELS_MAX_PAYLOAD, SMP_CACHE_BYTES)) --- a/drivers/scsi/qla2xxx/qla_dfs.c +++ b/drivers/scsi/qla2xxx/qla_dfs.c @@ -276,6 +276,16 @@ qla_dfs_fw_resource_cnt_show(struct seq_
seq_printf(s, "estimate exchange used[%d] high water limit [%d] n", exch_used, ha->base_qpair->fwres.exch_limit); + + if (ql2xenforce_iocb_limit == 2) { + iocbs_used = atomic_read(&ha->fwres.iocb_used); + exch_used = atomic_read(&ha->fwres.exch_used); + seq_printf(s, " estimate iocb2 used [%d] high water limit [%d]\n", + iocbs_used, ha->fwres.iocb_limit); + + seq_printf(s, " estimate exchange2 used[%d] high water limit [%d] \n", + exch_used, ha->fwres.exch_limit); + } }
return 0; --- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -4216,6 +4216,14 @@ void qla_init_iocb_limit(scsi_qla_host_t ha->queue_pair_map[i]->fwres.exch_used = 0; } } + + ha->fwres.iocb_total = ha->orig_fw_iocb_count; + ha->fwres.iocb_limit = (ha->orig_fw_iocb_count * QLA_IOCB_PCT_LIMIT) / 100; + ha->fwres.exch_total = ha->orig_fw_xcb_count; + ha->fwres.exch_limit = (ha->orig_fw_xcb_count * QLA_IOCB_PCT_LIMIT) / 100; + + atomic_set(&ha->fwres.iocb_used, 0); + atomic_set(&ha->fwres.exch_used, 0); }
void qla_adjust_iocb_limit(scsi_qla_host_t *vha) --- a/drivers/scsi/qla2xxx/qla_inline.h +++ b/drivers/scsi/qla2xxx/qla_inline.h @@ -386,6 +386,7 @@ enum { RESOURCE_IOCB = BIT_0, RESOURCE_EXCH = BIT_1, /* exchange */ RESOURCE_FORCE = BIT_2, + RESOURCE_HA = BIT_3, };
static inline int @@ -393,7 +394,7 @@ qla_get_fw_resources(struct qla_qpair *q { u16 iocbs_used, i; u16 exch_used; - struct qla_hw_data *ha = qp->vha->hw; + struct qla_hw_data *ha = qp->hw;
if (!ql2xenforce_iocb_limit) { iores->res_type = RESOURCE_NONE; @@ -428,15 +429,69 @@ qla_get_fw_resources(struct qla_qpair *q return -ENOSPC; } } + + if (ql2xenforce_iocb_limit == 2) { + if ((iores->iocb_cnt + atomic_read(&ha->fwres.iocb_used)) >= + ha->fwres.iocb_limit) { + iores->res_type = RESOURCE_NONE; + return -ENOSPC; + } + + if (iores->res_type & RESOURCE_EXCH) { + if ((iores->exch_cnt + atomic_read(&ha->fwres.exch_used)) >= + ha->fwres.exch_limit) { + iores->res_type = RESOURCE_NONE; + return -ENOSPC; + } + } + } + force: qp->fwres.iocbs_used += iores->iocb_cnt; qp->fwres.exch_used += iores->exch_cnt; + if (ql2xenforce_iocb_limit == 2) { + atomic_add(iores->iocb_cnt, &ha->fwres.iocb_used); + atomic_add(iores->exch_cnt, &ha->fwres.exch_used); + iores->res_type |= RESOURCE_HA; + } return 0; }
+/* + * decrement to zero. This routine will not decrement below zero + * @v: pointer of type atomic_t + * @amount: amount to decrement from v + */ +static void qla_atomic_dtz(atomic_t *v, int amount) +{ + int c, old, dec; + + c = atomic_read(v); + for (;;) { + dec = c - amount; + if (unlikely(dec < 0)) + dec = 0; + + old = atomic_cmpxchg((v), c, dec); + if (likely(old == c)) + break; + c = old; + } +} + static inline void qla_put_fw_resources(struct qla_qpair *qp, struct iocb_resource *iores) { + struct qla_hw_data *ha = qp->hw; + + if (iores->res_type & RESOURCE_HA) { + if (iores->res_type & RESOURCE_IOCB) + qla_atomic_dtz(&ha->fwres.iocb_used, iores->iocb_cnt); + + if (iores->res_type & RESOURCE_EXCH) + qla_atomic_dtz(&ha->fwres.exch_used, iores->exch_cnt); + } + if (iores->res_type & RESOURCE_IOCB) { if (qp->fwres.iocbs_used >= iores->iocb_cnt) { qp->fwres.iocbs_used -= iores->iocb_cnt; --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -44,10 +44,11 @@ module_param(ql2xfulldump_on_mpifail, in MODULE_PARM_DESC(ql2xfulldump_on_mpifail, "Set this to take full dump on MPI hang.");
-int ql2xenforce_iocb_limit = 1; +int ql2xenforce_iocb_limit = 2; module_param(ql2xenforce_iocb_limit, int, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(ql2xenforce_iocb_limit, - "Enforce IOCB throttling, to avoid FW congestion. (default: 1)"); + "Enforce IOCB throttling, to avoid FW congestion. (default: 2) " + "1: track usage per queue, 2: track usage per adapter");
/* * CT6 CTX allocation cache
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chengming Zhou zhouchengming@bytedance.com
commit 5a26e45edb4690d58406178b5a9ea4c6dcf2c105 upstream.
When doing io_uring benchmark on /dev/nullb0, it's easy to crash the kernel if poll requests timeout triggered, as reported by David. [1]
BUG: kernel NULL pointer dereference, address: 0000000000000008 Workqueue: kblockd blk_mq_timeout_work RIP: 0010:null_timeout_rq+0x4e/0x91 Call Trace: ? null_timeout_rq+0x4e/0x91 blk_mq_handle_expired+0x31/0x4b bt_iter+0x68/0x84 ? bt_tags_iter+0x81/0x81 __sbitmap_for_each_set.constprop.0+0xb0/0xf2 ? __blk_mq_complete_request_remote+0xf/0xf bt_for_each+0x46/0x64 ? __blk_mq_complete_request_remote+0xf/0xf ? percpu_ref_get_many+0xc/0x2a blk_mq_queue_tag_busy_iter+0x14d/0x18e blk_mq_timeout_work+0x95/0x127 process_one_work+0x185/0x263 worker_thread+0x1b5/0x227
This is indeed a race problem between null_timeout_rq() and null_poll().
null_poll() null_timeout_rq() spin_lock(&nq->poll_lock) list_splice_init(&nq->poll_list, &list) spin_unlock(&nq->poll_lock)
while (!list_empty(&list)) req = list_first_entry() list_del_init() ... blk_mq_add_to_batch() // req->rq_next = NULL spin_lock(&nq->poll_lock)
// rq->queuelist->next == NULL list_del_init(&rq->queuelist)
spin_unlock(&nq->poll_lock)
Fix these problems by setting requests state to MQ_RQ_COMPLETE under nq->poll_lock protection, in which null_timeout_rq() can safely detect this race and early return.
Note this patch just fix the kernel panic when request timeout happen.
[1] https://lore.kernel.org/all/3893581.1691785261@warthog.procyon.org.uk/
Fixes: 0a593fbbc245 ("null_blk: poll queue support") Reported-by: David Howells dhowells@redhat.com Tested-by: David Howells dhowells@redhat.com Reviewed-by: Ming Lei ming.lei@redhat.com Signed-off-by: Chengming Zhou zhouchengming@bytedance.com Link: https://lore.kernel.org/r/20230901120306.170520-2-chengming.zhou@linux.dev Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/null_blk/main.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
--- a/drivers/block/null_blk/main.c +++ b/drivers/block/null_blk/main.c @@ -1643,9 +1643,12 @@ static int null_poll(struct blk_mq_hw_ct struct nullb_queue *nq = hctx->driver_data; LIST_HEAD(list); int nr = 0; + struct request *rq;
spin_lock(&nq->poll_lock); list_splice_init(&nq->poll_list, &list); + list_for_each_entry(rq, &list, queuelist) + blk_mq_set_request_complete(rq); spin_unlock(&nq->poll_lock);
while (!list_empty(&list)) { @@ -1671,16 +1674,21 @@ static enum blk_eh_timer_return null_tim struct blk_mq_hw_ctx *hctx = rq->mq_hctx; struct nullb_cmd *cmd = blk_mq_rq_to_pdu(rq);
- pr_info("rq %p timed out\n", rq); - if (hctx->type == HCTX_TYPE_POLL) { struct nullb_queue *nq = hctx->driver_data;
spin_lock(&nq->poll_lock); + /* The request may have completed meanwhile. */ + if (blk_mq_request_completed(rq)) { + spin_unlock(&nq->poll_lock); + return BLK_EH_DONE; + } list_del_init(&rq->queuelist); spin_unlock(&nq->poll_lock); }
+ pr_info("rq %p timed out\n", rq); + /* * If the device is marked as blocking (i.e. memory backed or zoned * device), the submission path may be blocked waiting for resources
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ian Kent raven@themaw.net
commit 0559f63057f927d298d68294d6ff77ce09b99255 upstream.
When the kernfs_iattr_rwsem was introduced a case was missed.
The update of the kernfs directory node child count was also protected by the kernfs_rwsem and needs to be included in the change so that the child count (and so the inode n_link attribute) does not change while holding the rwsem for read.
Fixes: 9caf69614225 ("kernfs: Introduce separate rwsem to protect inode attributes.") Cc: stable stable@kernel.org Signed-off-by: Ian Kent raven@themaw.net Reviewed-By: Imran Khan imran.f.khan@oracle.com Acked-by: Miklos Szeredi mszeredi@redhat.com Cc: Anders Roxell anders.roxell@linaro.org Cc: Arnd Bergmann arnd@arndb.de Cc: Minchan Kim minchan@kernel.org Cc: Eric Sandeen sandeen@sandeen.net Link: https://lore.kernel.org/r/169128520941.68052.15749253469930138901.stgit@dona... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/kernfs/dir.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index 5a1a4af9d3d2..bf243015834e 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node *kn) rb_insert_color(&kn->rb, &kn->parent->dir.children);
/* successfully added, account subdir number */ + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); if (kernfs_type(kn) == KERNFS_DIR) kn->parent->dir.subdirs++; kernfs_inc_rev(kn->parent); + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
return 0; } @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct kernfs_node *kn) if (RB_EMPTY_NODE(&kn->rb)) return false;
+ down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); if (kernfs_type(kn) == KERNFS_DIR) kn->parent->dir.subdirs--; kernfs_inc_rev(kn->parent); + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
rb_erase(&kn->rb, &kn->parent->dir.children); RB_CLEAR_NODE(&kn->rb);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Zimmermann tzimmermann@suse.de
commit f90a0e5265b60cdd3c77990e8105f79aa2fac994 upstream.
Do not assing the Linux device to struct fb_info.dev. The call to register_framebuffer() initializes the field to the fbdev device. Drivers should not override its value.
Fixes a bug where the driver incorrectly decreases the hardware device's reference counter and leaks the fbdev device.
v2: * add Fixes tag (Dan)
Signed-off-by: Thomas Zimmermann tzimmermann@suse.de Fixes: 88017bda96a5 ("ep93xx video driver") Cc: stable@vger.kernel.org # v2.6.32+ Reviewed-by: Javier Martinez Canillas javierm@redhat.com Reviewed-by: Sam Ravnborg sam@ravnborg.org Link: https://patchwork.freedesktop.org/patch/msgid/20230613110953.24176-15-tzimme... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/video/fbdev/ep93xx-fb.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/video/fbdev/ep93xx-fb.c +++ b/drivers/video/fbdev/ep93xx-fb.c @@ -474,7 +474,6 @@ static int ep93xxfb_probe(struct platfor if (!info) return -ENOMEM;
- info->dev = &pdev->dev; platform_set_drvdata(pdev, info); fbi = info->par; fbi->mach_info = mach_info;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit c948ff727e25297f3a703eb5349dd66aabf004e4 upstream.
To make sure that the controller is runtime resumed and its power domain is enabled before accessing its registers during probe, the synchronous runtime PM interface must be used.
Fixes: 8d4025943e13 ("clk: qcom: camcc-sc7180: Use runtime PM ops instead of clk ones") Cc: stable@vger.kernel.org # 5.11 Cc: Stephen Boyd sboyd@kernel.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Link: https://lore.kernel.org/r/20230718132902.21430-2-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/clk/qcom/camcc-sc7180.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/clk/qcom/camcc-sc7180.c +++ b/drivers/clk/qcom/camcc-sc7180.c @@ -1664,7 +1664,7 @@ static int cam_cc_sc7180_probe(struct pl return ret; }
- ret = pm_runtime_get(&pdev->dev); + ret = pm_runtime_resume_and_get(&pdev->dev); if (ret) return ret;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Zimmermann tzimmermann@suse.de
commit 4cfe75f0f14f044dae66ad0e6eea812d038465d9 upstream.
Fix the test for the AST2200 in the DRAM initialization. The value in ast->chip has to be compared against an enum constant instead of a numerical value.
This bug got introduced when the driver was first imported into the kernel.
Signed-off-by: Thomas Zimmermann tzimmermann@suse.de Fixes: 312fec1405dd ("drm: Initial KMS driver for AST (ASpeed Technologies) 2000 series (v2)") Cc: Dave Airlie airlied@redhat.com Cc: dri-devel@lists.freedesktop.org Cc: stable@vger.kernel.org # v3.5+ Reviewed-by: Sui Jingfeng suijingfeng@loongson.cn Reviewed-by: Jocelyn Falempe jfalempe@redhat.com Tested-by: Jocelyn Falempe jfalempe@redhat.com # AST2600 Link: https://patchwork.freedesktop.org/patch/msgid/20230621130032.3568-2-tzimmerm... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/ast/ast_post.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/ast/ast_post.c +++ b/drivers/gpu/drm/ast/ast_post.c @@ -291,7 +291,7 @@ static void ast_init_dram_reg(struct drm ; } while (ast_read32(ast, 0x10100) != 0xa8); } else {/* AST2100/1100 */ - if (ast->chip == AST2100 || ast->chip == 2200) + if (ast->chip == AST2100 || ast->chip == AST2200) dram_reg_info = ast2100_dram_table_data; else dram_reg_info = ast1100_dram_table_data;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sheetal sheetal@nvidia.com
commit d900d9a435ca95a386f49424f3689cd17ec201da upstream.
Sample rate conversions for rates greater than 48kHz are found to be failing. It means x->y conversions fail when either x or y is greater than 48kHz.
This happens because, tegra210_sfc_rate_to_idx() returns incorrect index for rates greater than 48kHz. This actually depends on the tegra210_sfc_rates[] array and it is not in sync with frequency values of SFC TX/RX register. To be precise, 64kHz entry is missing in above array defined in the driver. Due to this wrong index is returned and this results in incorrect programming of coefficients.
To fix this, align the tegra210_sfc_rates[] array with SFC register specification and thus add 64kHz entry to it. Also, the coefficient table is updated to reflect that none of the conversions are supported for 64kHz.
Fixes: b2f74ec53a6c ("ASoC: tegra: Add Tegra210 based SFC driver") Cc: stable@vger.kernel.org Signed-off-by: Sheetal sheetal@nvidia.com Reviewed-by: Mohan Kumar D mkumard@nvidia.com Reviewed-by: Sameer Pujar spujar@nvidia.com Link: https://lore.kernel.org/r/Message-Id: 1687433656-7892-2-git-send-email-spujar@nvidia.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/tegra/tegra210_sfc.c | 31 ++++++++++++++++++++++++++++++- sound/soc/tegra/tegra210_sfc.h | 4 ++-- 2 files changed, 32 insertions(+), 3 deletions(-)
--- a/sound/soc/tegra/tegra210_sfc.c +++ b/sound/soc/tegra/tegra210_sfc.c @@ -2,7 +2,7 @@ // // tegra210_sfc.c - Tegra210 SFC driver // -// Copyright (c) 2021 NVIDIA CORPORATION. All rights reserved. +// Copyright (c) 2021-2023 NVIDIA CORPORATION. All rights reserved.
#include <linux/clk.h> #include <linux/device.h> @@ -42,6 +42,7 @@ static const int tegra210_sfc_rates[TEGR 32000, 44100, 48000, + 64000, 88200, 96000, 176400, @@ -2857,6 +2858,7 @@ static s32 *coef_addr_table[TEGRA210_SFC coef_8to32, coef_8to44, coef_8to48, + UNSUPP_CONV, coef_8to88, coef_8to96, UNSUPP_CONV, @@ -2872,6 +2874,7 @@ static s32 *coef_addr_table[TEGRA210_SFC coef_11to32, coef_11to44, coef_11to48, + UNSUPP_CONV, coef_11to88, coef_11to96, UNSUPP_CONV, @@ -2887,6 +2890,7 @@ static s32 *coef_addr_table[TEGRA210_SFC coef_16to32, coef_16to44, coef_16to48, + UNSUPP_CONV, coef_16to88, coef_16to96, coef_16to176, @@ -2902,6 +2906,7 @@ static s32 *coef_addr_table[TEGRA210_SFC coef_22to32, coef_22to44, coef_22to48, + UNSUPP_CONV, coef_22to88, coef_22to96, coef_22to176, @@ -2917,6 +2922,7 @@ static s32 *coef_addr_table[TEGRA210_SFC coef_24to32, coef_24to44, coef_24to48, + UNSUPP_CONV, coef_24to88, coef_24to96, coef_24to176, @@ -2932,6 +2938,7 @@ static s32 *coef_addr_table[TEGRA210_SFC BYPASS_CONV, coef_32to44, coef_32to48, + UNSUPP_CONV, coef_32to88, coef_32to96, coef_32to176, @@ -2947,6 +2954,7 @@ static s32 *coef_addr_table[TEGRA210_SFC coef_44to32, BYPASS_CONV, coef_44to48, + UNSUPP_CONV, coef_44to88, coef_44to96, coef_44to176, @@ -2962,11 +2970,28 @@ static s32 *coef_addr_table[TEGRA210_SFC coef_48to32, coef_48to44, BYPASS_CONV, + UNSUPP_CONV, coef_48to88, coef_48to96, coef_48to176, coef_48to192, }, + /* Convertions from 64 kHz */ + { + UNSUPP_CONV, + UNSUPP_CONV, + UNSUPP_CONV, + UNSUPP_CONV, + UNSUPP_CONV, + UNSUPP_CONV, + UNSUPP_CONV, + UNSUPP_CONV, + UNSUPP_CONV, + UNSUPP_CONV, + UNSUPP_CONV, + UNSUPP_CONV, + UNSUPP_CONV, + }, /* Convertions from 88.2 kHz */ { coef_88to8, @@ -2977,6 +3002,7 @@ static s32 *coef_addr_table[TEGRA210_SFC coef_88to32, coef_88to44, coef_88to48, + UNSUPP_CONV, BYPASS_CONV, coef_88to96, coef_88to176, @@ -2991,6 +3017,7 @@ static s32 *coef_addr_table[TEGRA210_SFC coef_96to32, coef_96to44, coef_96to48, + UNSUPP_CONV, coef_96to88, BYPASS_CONV, coef_96to176, @@ -3006,6 +3033,7 @@ static s32 *coef_addr_table[TEGRA210_SFC coef_176to32, coef_176to44, coef_176to48, + UNSUPP_CONV, coef_176to88, coef_176to96, BYPASS_CONV, @@ -3021,6 +3049,7 @@ static s32 *coef_addr_table[TEGRA210_SFC coef_192to32, coef_192to44, coef_192to48, + UNSUPP_CONV, coef_192to88, coef_192to96, coef_192to176, --- a/sound/soc/tegra/tegra210_sfc.h +++ b/sound/soc/tegra/tegra210_sfc.h @@ -2,7 +2,7 @@ /* * tegra210_sfc.h - Definitions for Tegra210 SFC driver * - * Copyright (c) 2021 NVIDIA CORPORATION. All rights reserved. + * Copyright (c) 2021-2023 NVIDIA CORPORATION. All rights reserved. * */
@@ -47,7 +47,7 @@ #define TEGRA210_SFC_EN_SHIFT 0 #define TEGRA210_SFC_EN (1 << TEGRA210_SFC_EN_SHIFT)
-#define TEGRA210_SFC_NUM_RATES 12 +#define TEGRA210_SFC_NUM_RATES 13
/* Fields in TEGRA210_SFC_COEF_RAM */ #define TEGRA210_SFC_COEF_RAM_EN BIT(0)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul Cercueil paul@crapouillou.net
commit b3f3fc32e5ff1e848555af8616318cc667457f90 upstream.
The previous values were completely bogus, and resulted in the computed DPI ratio being much lower than reality, causing applications and UIs to misbehave.
The new values were measured by myself with a ruler.
Signed-off-by: Paul Cercueil paul@crapouillou.net Acked-by: Sam Ravnborg sam@ravnborg.org Fixes: 8620cc2f99b7 ("ARM: dts: exynos: Add devicetree file for the Galaxy S2") Cc: stable@vger.kernel.org # v5.8+ Link: https://lore.kernel.org/r/20230714153720.336990-1-paul@crapouillou.net Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/samsung/exynos4210-i9100.dts | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm/boot/dts/samsung/exynos4210-i9100.dts +++ b/arch/arm/boot/dts/samsung/exynos4210-i9100.dts @@ -207,8 +207,8 @@ power-on-delay = <10>; reset-delay = <10>;
- panel-width-mm = <90>; - panel-height-mm = <154>; + panel-width-mm = <56>; + panel-height-mm = <93>;
display-timings { timing {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sheetal sheetal@nvidia.com
commit e483fe34adab3197558b7284044c1b26f5ede20e upstream.
I2S data sanity tests fail beyond a bit clock frequency of 6.144MHz. This happens because the AHUB clock rate is too low and it shows 9.83MHz on boot.
The maximum rate of PLLA_OUT0 is 49.152MHz and is used to serve I/O clocks. It is recommended that AHUB clock operates higher than this. Thus fix this by using PLLP_OUT0 as parent clock for AHUB instead of PLLA_OUT0 and fix the rate to 81.6MHz.
Fixes: dc94a94daa39 ("arm64: tegra: Add audio devices on Tegra234") Cc: stable@vger.kernel.org Signed-off-by: Sheetal sheetal@nvidia.com Signed-off-by: Sameer Pujar spujar@nvidia.com Reviewed-by: Mohan Kumar D mkumard@nvidia.com Signed-off-by: Thierry Reding treding@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/nvidia/tegra234.dtsi | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/arm64/boot/dts/nvidia/tegra234.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra234.dtsi @@ -180,7 +180,8 @@ clocks = <&bpmp TEGRA234_CLK_AHUB>; clock-names = "ahub"; assigned-clocks = <&bpmp TEGRA234_CLK_AHUB>; - assigned-clock-parents = <&bpmp TEGRA234_CLK_PLLA_OUT0>; + assigned-clock-parents = <&bpmp TEGRA234_CLK_PLLP_OUT0>; + assigned-clock-rates = <81600000>; status = "disabled";
#address-cells = <2>;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sameer Pujar spujar@nvidia.com
commit dc6d5d85ed3a3fe566314f388bce4c71a26b1677 upstream.
I2S data sanity test failures are seen at lower AHUB clock rates on Tegra234. The Tegra194 uses the same clock relationship for AHUB and it is likely that similar issues would be seen. Thus update the AHUB clock parent and rates here as well for Tegra194, Tegra186 and Tegra210.
Fixes: 177208f7b06d ("arm64: tegra: Add DT binding for AHUB components") Cc: stable@vger.kernel.org Signed-off-by: Sameer Pujar spujar@nvidia.com Signed-off-by: Thierry Reding treding@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/nvidia/tegra186.dtsi | 3 ++- arch/arm64/boot/dts/nvidia/tegra194.dtsi | 3 ++- arch/arm64/boot/dts/nvidia/tegra210.dtsi | 3 ++- 3 files changed, 6 insertions(+), 3 deletions(-)
--- a/arch/arm64/boot/dts/nvidia/tegra186.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra186.dtsi @@ -135,7 +135,8 @@ clocks = <&bpmp TEGRA186_CLK_AHUB>; clock-names = "ahub"; assigned-clocks = <&bpmp TEGRA186_CLK_AHUB>; - assigned-clock-parents = <&bpmp TEGRA186_CLK_PLL_A_OUT0>; + assigned-clock-parents = <&bpmp TEGRA186_CLK_PLLP_OUT0>; + assigned-clock-rates = <81600000>; #address-cells = <1>; #size-cells = <1>; ranges = <0x02900800 0x02900800 0x11800>; --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -231,7 +231,8 @@ clocks = <&bpmp TEGRA194_CLK_AHUB>; clock-names = "ahub"; assigned-clocks = <&bpmp TEGRA194_CLK_AHUB>; - assigned-clock-parents = <&bpmp TEGRA194_CLK_PLLA_OUT0>; + assigned-clock-parents = <&bpmp TEGRA194_CLK_PLLP_OUT0>; + assigned-clock-rates = <81600000>; status = "disabled";
#address-cells = <2>; --- a/arch/arm64/boot/dts/nvidia/tegra210.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra210.dtsi @@ -1386,7 +1386,8 @@ clocks = <&tegra_car TEGRA210_CLK_D_AUDIO>; clock-names = "ahub"; assigned-clocks = <&tegra_car TEGRA210_CLK_D_AUDIO>; - assigned-clock-parents = <&tegra_car TEGRA210_CLK_PLL_A_OUT0>; + assigned-clock-parents = <&tegra_car TEGRA210_CLK_PLL_P>; + assigned-clock-rates = <81600000>; #address-cells = <1>; #size-cells = <1>; ranges = <0x702d0000 0x702d0000 0x0000e400>;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit a9f71a033587c9074059132d34c74eabbe95ef26 upstream.
Drivers that enable runtime PM must make sure that the controller is runtime resumed before accessing its registers to prevent the power domain from being disabled.
Fixes: 892df0191b29 ("clk: qcom: Add QCS404 TuringCC") Cc: stable@vger.kernel.org # 5.2 Cc: Bjorn Andersson andersson@kernel.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Link: https://lore.kernel.org/r/20230718132902.21430-9-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/clk/qcom/turingcc-qcs404.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
--- a/drivers/clk/qcom/turingcc-qcs404.c +++ b/drivers/clk/qcom/turingcc-qcs404.c @@ -125,11 +125,22 @@ static int turingcc_probe(struct platfor return ret; }
+ ret = pm_runtime_resume_and_get(&pdev->dev); + if (ret) + return ret; + ret = qcom_cc_probe(pdev, &turingcc_desc); if (ret < 0) - return ret; + goto err_put_rpm; + + pm_runtime_put(&pdev->dev);
return 0; + +err_put_rpm: + pm_runtime_put_sync(&pdev->dev); + + return ret; }
static const struct dev_pm_ops turingcc_pm_ops = {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
commit 43db69268149049540b1d2bbe8a69e59d5cb43b6 upstream.
There is no syna,f11-flip-x property, so assume intention was to use touchscreen-inverted-x.
Fixes: ab80661883de ("ARM: dts: qcom: msm8974: Add Sony Xperia Z2 Tablet") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Link: https://lore.kernel.org/r/20230720115335.137354-4-krzysztof.kozlowski@linaro... Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts +++ b/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts @@ -132,8 +132,8 @@
rmi-f11@11 { reg = <0x11>; - syna,f11-flip-x = <1>; syna,sensor-type = <1>; + touchscreen-inverted-x; }; }; };
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
commit b019cf7e5fbaa7d25f716cb936a9237b47156f2d upstream.
Interrupts extended already define a parent interrupt controller:
msm8953-xiaomi-vince.dtb: touchscreen@20: Unevaluated properties are not allowed ('interrupts-parent' was unexpected)
Fixes: aa17e707e04a ("arm64: dts: qcom: msm8953: Add device tree for Xiaomi Redmi 5 Plus") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Link: https://lore.kernel.org/r/20230720115335.137354-1-krzysztof.kozlowski@linaro... Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/msm8953-xiaomi-vince.dts | 1 - 1 file changed, 1 deletion(-)
diff --git a/arch/arm64/boot/dts/qcom/msm8953-xiaomi-vince.dts b/arch/arm64/boot/dts/qcom/msm8953-xiaomi-vince.dts index 0956c866d6cb..1a1d3f92a511 100644 --- a/arch/arm64/boot/dts/qcom/msm8953-xiaomi-vince.dts +++ b/arch/arm64/boot/dts/qcom/msm8953-xiaomi-vince.dts @@ -132,7 +132,6 @@ &i2c_3 { touchscreen@20 { reg = <0x20>; compatible = "syna,rmi4-i2c"; - interrupts-parent = <&tlmm>; interrupts-extended = <&tlmm 65 IRQ_TYPE_EDGE_FALLING>;
#address-cells = <1>;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
commit 31fba16c19c45b2b3a7c23b0bfef80aed1b29050 upstream.
The node names for functions of Synaptics RMI4 touchscreen must be as "rmi4-fXX", as required by bindings and Linux driver.
qcom-msm8974pro-sony-xperia-shinano-castor.dtb: synaptics@2c: Unevaluated properties are not allowed ('rmi-f01@1', 'rmi-f11@11' were unexpected)
Fixes: ab80661883de ("ARM: dts: qcom: msm8974: Add Sony Xperia Z2 Tablet") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Link: https://lore.kernel.org/r/20230720115335.137354-5-krzysztof.kozlowski@linaro... Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts +++ b/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts @@ -125,12 +125,12 @@
syna,startup-delay-ms = <100>;
- rmi-f01@1 { + rmi4-f01@1 { reg = <0x1>; syna,nosleep = <1>; };
- rmi-f11@11 { + rmi4-f11@11 { reg = <0x11>; syna,sensor-type = <1>; touchscreen-inverted-x;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
commit 7c74379afdfee7b13f1cd8ff1ad6e0f986aec96c upstream.
There is no syna,nosleep property in Synaptics RMI4 touchscreen:
qcom-msm8974pro-sony-xperia-shinano-castor.dtb: synaptics@2c: rmi4-f01@1: 'syna,nosleep' does not match any of the regexes: 'pinctrl-[0-9]+'
Fixes: ab80661883de ("ARM: dts: qcom: msm8974: Add Sony Xperia Z2 Tablet") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Link: https://lore.kernel.org/r/20230720115335.137354-6-krzysztof.kozlowski@linaro... Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts +++ b/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts @@ -127,7 +127,7 @@
rmi4-f01@1 { reg = <0x1>; - syna,nosleep = <1>; + syna,nosleep-mode = <1>; };
rmi4-f11@11 {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chris Paterson chris.paterson2@renesas.com
commit db67345716a52abb750ec8f76d6a5675218715f9 upstream.
It looks like txdv-skew-psec is a typo from a copy+paste. txdv-skew-psec is not present in the PHY bindings nor is it in the driver.
Correct to txen-skew-psec which is clearly what it was meant to be.
Given that the default for txen-skew-psec is 0, and the device tree is only trying to set it to 0 anyway, there should not be any functional change from this fix.
Fixes: 361b0dcbd7f9 ("arm64: dts: renesas: rzg2l-smarc-som: Enable Ethernet") Fixes: 6494e4f90503 ("arm64: dts: renesas: rzg2ul-smarc-som: Enable Ethernet on SMARC platform") Fixes: ce0c63b6a5ef ("arm64: dts: renesas: Add initial device tree for RZ/G2LC SMARC EVK") Cc: stable@vger.kernel.org # 6.1.y Reported-by: Tomohiro Komagata tomohiro.komagata.aj@renesas.com Signed-off-by: Chris Paterson chris.paterson2@renesas.com Reviewed-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://lore.kernel.org/r/20230609221136.7431-1-chris.paterson2@renesas.com Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/renesas/rzg2l-smarc-som.dtsi | 4 ++-- arch/arm64/boot/dts/renesas/rzg2lc-smarc-som.dtsi | 2 +- arch/arm64/boot/dts/renesas/rzg2ul-smarc-som.dtsi | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-)
--- a/arch/arm64/boot/dts/renesas/rzg2l-smarc-som.dtsi +++ b/arch/arm64/boot/dts/renesas/rzg2l-smarc-som.dtsi @@ -100,7 +100,7 @@ rxc-skew-psec = <2400>; txc-skew-psec = <2400>; rxdv-skew-psec = <0>; - txdv-skew-psec = <0>; + txen-skew-psec = <0>; rxd0-skew-psec = <0>; rxd1-skew-psec = <0>; rxd2-skew-psec = <0>; @@ -128,7 +128,7 @@ rxc-skew-psec = <2400>; txc-skew-psec = <2400>; rxdv-skew-psec = <0>; - txdv-skew-psec = <0>; + txen-skew-psec = <0>; rxd0-skew-psec = <0>; rxd1-skew-psec = <0>; rxd2-skew-psec = <0>; --- a/arch/arm64/boot/dts/renesas/rzg2lc-smarc-som.dtsi +++ b/arch/arm64/boot/dts/renesas/rzg2lc-smarc-som.dtsi @@ -77,7 +77,7 @@ rxc-skew-psec = <2400>; txc-skew-psec = <2400>; rxdv-skew-psec = <0>; - txdv-skew-psec = <0>; + txen-skew-psec = <0>; rxd0-skew-psec = <0>; rxd1-skew-psec = <0>; rxd2-skew-psec = <0>; --- a/arch/arm64/boot/dts/renesas/rzg2ul-smarc-som.dtsi +++ b/arch/arm64/boot/dts/renesas/rzg2ul-smarc-som.dtsi @@ -83,7 +83,7 @@ rxc-skew-psec = <2400>; txc-skew-psec = <2400>; rxdv-skew-psec = <0>; - txdv-skew-psec = <0>; + txen-skew-psec = <0>; rxd0-skew-psec = <0>; rxd1-skew-psec = <0>; rxd2-skew-psec = <0>; @@ -112,7 +112,7 @@ rxc-skew-psec = <2400>; txc-skew-psec = <2400>; rxdv-skew-psec = <0>; - txdv-skew-psec = <0>; + txen-skew-psec = <0>; rxd0-skew-psec = <0>; rxd1-skew-psec = <0>; rxd2-skew-psec = <0>;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aleksey Nasibulin alealexpro100@ya.ru
commit 91994e59079dcb455783d3f9ea338eea6f671af3 upstream.
Linksys ea6500-v2 have 256MB of ram. Currently we only use 128MB. Expand the definition to use all the available RAM.
Fixes: 03e96644d7a8 ("ARM: dts: BCM5301X: Add basic DT for Linksys EA6500 V2") Signed-off-by: Aleksey Nasibulin alealexpro100@ya.ru Signed-off-by: Christian Marangi ansuelsmth@gmail.com Cc: stable@vger.kernel.org Acked-by: Rafał Miłecki rafal@milecki.pl Link: https://lore.kernel.org/r/20230712014017.28123-1-ansuelsmth@gmail.com Signed-off-by: Florian Fainelli florian.fainelli@broadcom.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/broadcom/bcm4708-linksys-ea6500-v2.dts | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/arm/boot/dts/broadcom/bcm4708-linksys-ea6500-v2.dts +++ b/arch/arm/boot/dts/broadcom/bcm4708-linksys-ea6500-v2.dts @@ -19,7 +19,8 @@
memory@0 { device_type = "memory"; - reg = <0x00000000 0x08000000>; + reg = <0x00000000 0x08000000>, + <0x88000000 0x08000000>; };
gpio-keys {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steve French stfrench@microsoft.com
commit 09ee7a3bf866c0fa5ee1914d2c65958559eb5b4c upstream.
The ChannelSequence field in the SMB3 header is supposed to be increased after reconnect to allow the server to distinguish requests from before and after the reconnect. We had always been setting it to zero. There are cases where incrementing ChannelSequence on requests after network reconnects can reduce the chance of data corruptions.
See MS-SMB2 3.2.4.1 and 3.2.7.1
Signed-off-by: Steve French stfrench@microsoft.com Cc: stable@vger.kernel.org # 5.16+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/cifsglob.h | 1 + fs/smb/client/connect.c | 1 + fs/smb/client/smb2ops.c | 11 ++++++++++- fs/smb/client/smb2pdu.c | 11 +++++++++++ fs/smb/common/smb2pdu.h | 22 ++++++++++++++++++++++ 5 files changed, 45 insertions(+), 1 deletion(-)
--- a/fs/smb/client/cifsglob.h +++ b/fs/smb/client/cifsglob.h @@ -729,6 +729,7 @@ struct TCP_Server_Info { */ #define CIFS_SERVER_IS_CHAN(server) (!!(server)->primary_server) struct TCP_Server_Info *primary_server; + __u16 channel_sequence_num; /* incremented on primary channel on each chan reconnect */
#ifdef CONFIG_CIFS_SWN_UPCALL bool use_swn_dstaddr; --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -1686,6 +1686,7 @@ cifs_get_tcp_session(struct smb3_fs_cont ctx->target_rfc1001_name, RFC1001_NAME_LEN_WITH_NULL); tcp_ses->session_estab = false; tcp_ses->sequence_number = 0; + tcp_ses->channel_sequence_num = 0; /* only tracked for primary channel */ tcp_ses->reconnect_instance = 1; tcp_ses->lstrp = jiffies; tcp_ses->compress_algorithm = cpu_to_le16(ctx->compression); --- a/fs/smb/client/smb2ops.c +++ b/fs/smb/client/smb2ops.c @@ -172,8 +172,17 @@ smb2_set_credits(struct TCP_Server_Info
spin_lock(&server->req_lock); server->credits = val; - if (val == 1) + if (val == 1) { server->reconnect_instance++; + /* + * ChannelSequence updated for all channels in primary channel so that consistent + * across SMB3 requests sent on any channel. See MS-SMB2 3.2.4.1 and 3.2.7.1 + */ + if (CIFS_SERVER_IS_CHAN(server)) + server->primary_server->channel_sequence_num++; + else + server->channel_sequence_num++; + } scredits = server->credits; in_flight = server->in_flight; spin_unlock(&server->req_lock); --- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -88,9 +88,20 @@ smb2_hdr_assemble(struct smb2_hdr *shdr, const struct cifs_tcon *tcon, struct TCP_Server_Info *server) { + struct smb3_hdr_req *smb3_hdr; shdr->ProtocolId = SMB2_PROTO_NUMBER; shdr->StructureSize = cpu_to_le16(64); shdr->Command = smb2_cmd; + if (server->dialect >= SMB30_PROT_ID) { + /* After reconnect SMB3 must set ChannelSequence on subsequent reqs */ + smb3_hdr = (struct smb3_hdr_req *)shdr; + /* if primary channel is not set yet, use default channel for chan sequence num */ + if (CIFS_SERVER_IS_CHAN(server)) + smb3_hdr->ChannelSequence = + cpu_to_le16(server->primary_server->channel_sequence_num); + else + smb3_hdr->ChannelSequence = cpu_to_le16(server->channel_sequence_num); + } if (server) { spin_lock(&server->req_lock); /* Request up to 10 credits but don't go over the limit. */ --- a/fs/smb/common/smb2pdu.h +++ b/fs/smb/common/smb2pdu.h @@ -153,6 +153,28 @@ struct smb2_hdr { __u8 Signature[16]; } __packed;
+struct smb3_hdr_req { + __le32 ProtocolId; /* 0xFE 'S' 'M' 'B' */ + __le16 StructureSize; /* 64 */ + __le16 CreditCharge; /* MBZ */ + __le16 ChannelSequence; /* See MS-SMB2 3.2.4.1 and 3.2.7.1 */ + __le16 Reserved; + __le16 Command; + __le16 CreditRequest; /* CreditResponse */ + __le32 Flags; + __le32 NextCommand; + __le64 MessageId; + union { + struct { + __le32 ProcessId; + __le32 TreeId; + } __packed SyncId; + __le64 AsyncId; + } __packed Id; + __le64 SessionId; + __u8 Signature[16]; +} __packed; + struct smb2_pdu { struct smb2_hdr hdr; __le16 StructureSize2; /* size of wct area (varies, request specific) */
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Hocko mhocko@suse.com
commit 86327e8eb94c52eca4f93cfece2e29d1bf52acbf upstream.
kmem.limit_in_bytes (v1 way to limit kernel memory usage) has been deprecated since 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes") merged in 5.16. We haven't heard about any serious users since then but it seems that the mere presence of the file is causing more harm thatn good. We (SUSE) have had several bug reports from customers where Docker based containers started to fail because a write to kmem.limit_in_bytes has failed.
This was unexpected because runc code only expects ENOENT (kmem disabled) or EBUSY (tasks already running within cgroup). So a new error code was unexpected and the whole container startup failed. This has been later addressed by https://github.com/opencontainers/runc/commit/52390d68040637dfc77f9fda6bbe70... so current Docker runtimes do not suffer from the problem anymore. There are still older version of Docker in use and likely hard to get rid of completely.
Address this by wiping out the file completely and effectively get back to pre 4.5 era and CONFIG_MEMCG_KMEM=n configuration.
I would recommend backporting to stable trees which have picked up 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes").
[mhocko@suse.com: restore _KMEM switch case] Link: https://lkml.kernel.org/r/ZKe5wxdbvPi5Cwd7@dhcp22.suse.cz Link: https://lkml.kernel.org/r/20230704115240.14672-1-mhocko@kernel.org Signed-off-by: Michal Hocko mhocko@suse.com Acked-by: Shakeel Butt shakeelb@google.com Acked-by: Johannes Weiner hannes@cmpxchg.org Acked-by: Roman Gushchin roman.gushchin@linux.dev Cc: Muchun Song muchun.song@linux.dev Cc: Tejun Heo tj@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/admin-guide/cgroup-v1/memory.rst | 2 -- mm/memcontrol.c | 10 ---------- 2 files changed, 12 deletions(-)
--- a/Documentation/admin-guide/cgroup-v1/memory.rst +++ b/Documentation/admin-guide/cgroup-v1/memory.rst @@ -92,8 +92,6 @@ Brief summary of control files. memory.oom_control set/show oom controls. memory.numa_stat show the number of memory usage per numa node - memory.kmem.limit_in_bytes This knob is deprecated and writing to - it will return -ENOTSUPP. memory.kmem.usage_in_bytes show current kernel memory allocation memory.kmem.failcnt show the number of kernel memory usage hits limits --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3871,10 +3871,6 @@ static ssize_t mem_cgroup_write(struct k case _MEMSWAP: ret = mem_cgroup_resize_max(memcg, nr_pages, true); break; - case _KMEM: - /* kmem.limit_in_bytes is deprecated. */ - ret = -EOPNOTSUPP; - break; case _TCP: ret = memcg_update_tcp_max(memcg, nr_pages); break; @@ -5086,12 +5082,6 @@ static struct cftype mem_cgroup_legacy_f }, #endif { - .name = "kmem.limit_in_bytes", - .private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT), - .write = mem_cgroup_write, - .read_u64 = mem_cgroup_read_u64, - }, - { .name = "kmem.usage_in_bytes", .private = MEMFILE_PRIVATE(_KMEM, RES_USAGE), .read_u64 = mem_cgroup_read_u64,
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Muchun Song songmuchun@bytedance.com
commit 3ce2c24cb68f228590a053d6058a5901cd31af61 upstream.
The local variable @page in __split_vmemmap_huge_pmd() to obtain a pmd page without holding page_table_lock may possiblely get the page table page instead of a huge pmd page.
The effect may be in set_pte_at() since we may pass an invalid page struct, if set_pte_at() wants to access the page struct (e.g. CONFIG_PAGE_TABLE_CHECK is enabled), it may crash the kernel.
So fix it. And inline __split_vmemmap_huge_pmd() since it only has one user.
Link: https://lkml.kernel.org/r/20230707033859.16148-1-songmuchun@bytedance.com Fixes: d8d55f5616cf ("mm: sparsemem: use page table lock to protect kernel pmd operations") Signed-off-by: Muchun Song songmuchun@bytedance.com Cc: Mike Kravetz mike.kravetz@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/hugetlb_vmemmap.c | 34 ++++++++++++++-------------------- 1 file changed, 14 insertions(+), 20 deletions(-)
--- a/mm/hugetlb_vmemmap.c +++ b/mm/hugetlb_vmemmap.c @@ -36,14 +36,22 @@ struct vmemmap_remap_walk { struct list_head *vmemmap_pages; };
-static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start) +static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start) { pmd_t __pmd; int i; unsigned long addr = start; - struct page *page = pmd_page(*pmd); - pte_t *pgtable = pte_alloc_one_kernel(&init_mm); + struct page *head; + pte_t *pgtable; + + spin_lock(&init_mm.page_table_lock); + head = pmd_leaf(*pmd) ? pmd_page(*pmd) : NULL; + spin_unlock(&init_mm.page_table_lock);
+ if (!head) + return 0; + + pgtable = pte_alloc_one_kernel(&init_mm); if (!pgtable) return -ENOMEM;
@@ -53,7 +61,7 @@ static int __split_vmemmap_huge_pmd(pmd_ pte_t entry, *pte; pgprot_t pgprot = PAGE_KERNEL;
- entry = mk_pte(page + i, pgprot); + entry = mk_pte(head + i, pgprot); pte = pte_offset_kernel(&__pmd, addr); set_pte_at(&init_mm, addr, pte, entry); } @@ -65,8 +73,8 @@ static int __split_vmemmap_huge_pmd(pmd_ * be treated as indepdenent small pages (as they can be freed * individually). */ - if (!PageReserved(page)) - split_page(page, get_order(PMD_SIZE)); + if (!PageReserved(head)) + split_page(head, get_order(PMD_SIZE));
/* Make pte visible before pmd. See comment in pmd_install(). */ smp_wmb(); @@ -80,20 +88,6 @@ static int __split_vmemmap_huge_pmd(pmd_ return 0; }
-static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start) -{ - int leaf; - - spin_lock(&init_mm.page_table_lock); - leaf = pmd_leaf(*pmd); - spin_unlock(&init_mm.page_table_lock); - - if (!leaf) - return 0; - - return __split_vmemmap_huge_pmd(pmd, start); -} - static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, struct vmemmap_remap_walk *walk)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrew Donnellan ajd@linux.ibm.com
commit efb78fa86e95832b78ca0ba60f3706788a818938 upstream.
test_pages() tests the page allocator by calling alloc_pages() with different orders up to order 10.
However, different architectures and platforms support different maximum contiguous allocation sizes. The default maximum allocation order (MAX_ORDER) is 10, but architectures can use CONFIG_ARCH_FORCE_MAX_ORDER to override this. On platforms where this is less than 10, test_meminit() will blow up with a WARN(). This is expected, so let's not do that.
Replace the hardcoded "10" with the MAX_ORDER macro so that we test allocations up to the expected platform limit.
Link: https://lkml.kernel.org/r/20230714015238.47931-1-ajd@linux.ibm.com Fixes: 5015a300a522 ("lib: introduce test_meminit module") Signed-off-by: Andrew Donnellan ajd@linux.ibm.com Reviewed-by: Alexander Potapenko glider@google.com Cc: Xiaoke Wang xkernel.wang@foxmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- lib/test_meminit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/lib/test_meminit.c +++ b/lib/test_meminit.c @@ -93,7 +93,7 @@ static int __init test_pages(int *total_ int failures = 0, num_tests = 0; int i;
- for (i = 0; i < 10; i++) + for (i = 0; i <= MAX_ORDER; i++) num_tests += do_alloc_pages_order(i, &failures);
REPORT_FAILURES_IN_FN();
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kalesh Singh kaleshsingh@google.com
commit bb5e7f234eacf34b65be67ebb3613e3b8cf11b87 upstream.
inc_max_seq() will try to inc_min_seq() if nr_gens == MAX_NR_GENS. This is because the generations are reused (the last oldest now empty generation will become the next youngest generation).
inc_min_seq() is retried until successful, dropping the lru_lock and yielding the CPU on each failure, and retaking the lock before trying again:
while (!inc_min_seq(lruvec, type, can_swap)) { spin_unlock_irq(&lruvec->lru_lock); cond_resched(); spin_lock_irq(&lruvec->lru_lock); }
However, the initial condition that required incrementing the min_seq (nr_gens == MAX_NR_GENS) is not retested. This can change by another call to inc_max_seq() from run_aging() with force_scan=true from the debugfs interface.
Since the eviction stalls when the nr_gens == MIN_NR_GENS, avoid unnecessarily incrementing the min_seq by rechecking the number of generations before each attempt.
This issue was uncovered in previous discussion on the list by Yu Zhao and Aneesh Kumar [1].
[1] https://lore.kernel.org/linux-mm/CAOUHufbO7CaVm=xjEb1avDhHVvnC8pJmGyKcFf2iY_...
Link: https://lkml.kernel.org/r/20230802025606.346758-2-kaleshsingh@google.com Fixes: d6c3af7d8a2b ("mm: multi-gen LRU: debugfs interface") Signed-off-by: Kalesh Singh kaleshsingh@google.com Tested-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com [mediatek] Tested-by: Charan Teja Kalla quic_charante@quicinc.com Cc: Yu Zhao yuzhao@google.com Cc: Aneesh Kumar K V aneesh.kumar@linux.ibm.com Cc: Barry Song baohua@kernel.org Cc: Brian Geffon bgeffon@google.com Cc: Jan Alexander Steffens (heftig) heftig@archlinux.org Cc: Lecopzer Chen lecopzer.chen@mediatek.com Cc: Matthias Brugger matthias.bgg@gmail.com Cc: Oleksandr Natalenko oleksandr@natalenko.name Cc: Qi Zheng zhengqi.arch@bytedance.com Cc: Steven Barrett steven@liquorix.net Cc: Suleiman Souhlal suleiman@google.com Cc: Suren Baghdasaryan surenb@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/vmscan.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
--- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4440,7 +4440,7 @@ static void inc_max_seq(struct lruvec *l int prev, next; int type, zone; struct lru_gen_folio *lrugen = &lruvec->lrugen; - +restart: spin_lock_irq(&lruvec->lru_lock);
VM_WARN_ON_ONCE(!seq_is_valid(lruvec)); @@ -4451,11 +4451,12 @@ static void inc_max_seq(struct lruvec *l
VM_WARN_ON_ONCE(!force_scan && (type == LRU_GEN_FILE || can_swap));
- while (!inc_min_seq(lruvec, type, can_swap)) { - spin_unlock_irq(&lruvec->lru_lock); - cond_resched(); - spin_lock_irq(&lruvec->lru_lock); - } + if (inc_min_seq(lruvec, type, can_swap)) + continue; + + spin_unlock_irq(&lruvec->lru_lock); + cond_resched(); + goto restart; }
/*
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Helge Deller deller@gmx.de
commit 4db89524b084f712a887256391fc19d9f66c8e55 upstream.
Fix the LAN receive and LAN transmit LEDs, which where swapped up to now.
Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/include/asm/led.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/parisc/include/asm/led.h +++ b/arch/parisc/include/asm/led.h @@ -11,8 +11,8 @@ #define LED1 0x02 #define LED0 0x01 /* bottom (or furthest left) LED */
-#define LED_LAN_TX LED0 /* for LAN transmit activity */ -#define LED_LAN_RCV LED1 /* for LAN receive activity */ +#define LED_LAN_RCV LED0 /* for LAN receive activity */ +#define LED_LAN_TX LED1 /* for LAN transmit activity */ #define LED_DISK_IO LED2 /* for disk activity */ #define LED_HEARTBEAT LED3 /* heartbeat */
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Helge Deller deller@gmx.de
commit 358ad816e52d4253b38c2f312e6b1cbd89e0dbf7 upstream.
Older PA-RISC machines have LEDs which show the disk- and LAN-activity. The computation is done in software and takes quite some time, e.g. on a J6500 this may take up to 60% time of one CPU if the machine is loaded via network traffic.
Since most people don't care about the LEDs, start with LEDs disabled and just show a CPU heartbeat LED. The disk and LAN LEDs can be turned on manually via /proc/pdc/led.
Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/parisc/led.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/parisc/led.c +++ b/drivers/parisc/led.c @@ -56,8 +56,8 @@ static int led_type __read_mostly = -1; static unsigned char lastleds; /* LED state from most recent update */ static unsigned int led_heartbeat __read_mostly = 1; -static unsigned int led_diskio __read_mostly = 1; -static unsigned int led_lanrxtx __read_mostly = 1; +static unsigned int led_diskio __read_mostly; +static unsigned int led_lanrxtx __read_mostly; static char lcd_text[32] __read_mostly; static char lcd_text_default[32] __read_mostly; static int lcd_no_led_support __read_mostly = 0; /* KittyHawk doesn't support LED on its LCD */
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bharath SM bharathsm@microsoft.com
commit b6d44d42313baa45a81ce9b299aeee2ccf3d0ee1 upstream.
We read and cache directory contents when we get directory lease, so we should ask for read permission to read contents of directory.
Signed-off-by: Bharath SM bharathsm@microsoft.com Reviewed-by: Shyam Prasad N sprasad@microsoft.com Cc: stable@vger.kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/cached_dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/smb/client/cached_dir.c +++ b/fs/smb/client/cached_dir.c @@ -218,7 +218,7 @@ int open_cached_dir(unsigned int xid, st .tcon = tcon, .path = path, .create_options = cifs_create_options(cifs_sb, CREATE_NOT_FILE), - .desired_access = FILE_READ_ATTRIBUTES, + .desired_access = FILE_READ_DATA | FILE_READ_ATTRIBUTES, .disposition = FILE_OPEN, .fid = pfid, };
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Raag Jadav raag.jadav@intel.com
commit d5301c90716a8e20bc961a348182daca00c8e8f0 upstream.
First argument of acpi_*_address_space_handler() APIs is acpi_handle of the device, which is incorrectly passed in driver ->remove() path here. Fix it by passing the appropriate argument and while at it, make both API calls consistent using ACPI_HANDLE().
Fixes: a0b028597d59 ("pinctrl: cherryview: Add support for GMMR GPIO opregion") Cc: stable@vger.kernel.org Signed-off-by: Raag Jadav raag.jadav@intel.com Acked-by: Mika Westerberg mika.westerberg@linux.intel.com Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pinctrl/intel/pinctrl-cherryview.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/drivers/pinctrl/intel/pinctrl-cherryview.c +++ b/drivers/pinctrl/intel/pinctrl-cherryview.c @@ -1701,7 +1701,6 @@ static int chv_pinctrl_probe(struct plat struct intel_community_context *cctx; struct intel_community *community; struct device *dev = &pdev->dev; - struct acpi_device *adev = ACPI_COMPANION(dev); struct intel_pinctrl *pctrl; acpi_status status; unsigned int i; @@ -1769,7 +1768,7 @@ static int chv_pinctrl_probe(struct plat if (ret) return ret;
- status = acpi_install_address_space_handler(adev->handle, + status = acpi_install_address_space_handler(ACPI_HANDLE(dev), community->acpi_space_id, chv_pinctrl_mmio_access_handler, NULL, pctrl); @@ -1786,7 +1785,7 @@ static int chv_pinctrl_remove(struct pla struct intel_pinctrl *pctrl = platform_get_drvdata(pdev); const struct intel_community *community = &pctrl->communities[0];
- acpi_remove_address_space_handler(ACPI_COMPANION(&pdev->dev), + acpi_remove_address_space_handler(ACPI_HANDLE(&pdev->dev), community->acpi_space_id, chv_pinctrl_mmio_access_handler);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
commit 172044e30b00977784269e8ab72132a48293c654 upstream.
select:false makes the schema basically ignored and not effective, which is clearly not what we want for a device binding.
Fixes: 352546805a44 ("dt-bindings: clock: Add bindings for versal clock driver") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Link: https://lore.kernel.org/r/20230728165923.108589-1-krzysztof.kozlowski@linaro... Reviewed-by: Conor Dooley conor.dooley@microchip.com Reviewed-by: Shubhrajyoti Datta shubhrajyoti.datta@amd.com Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/devicetree/bindings/clock/xlnx,versal-clk.yaml | 2 -- 1 file changed, 2 deletions(-)
--- a/Documentation/devicetree/bindings/clock/xlnx,versal-clk.yaml +++ b/Documentation/devicetree/bindings/clock/xlnx,versal-clk.yaml @@ -14,8 +14,6 @@ description: | reads required input clock frequencies from the devicetree and acts as clock provider for all clock consumers of PS clocks.
-select: false - properties: compatible: const: xlnx,versal-clk
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ahmad Fatoum a.fatoum@pengutronix.de
commit 72d00e560d10665e6139c9431956a87ded6e9880 upstream.
Since commit b09c68dc57c9 ("clk: imx: pll14xx: Support dynamic rates"), the driver has the ability to dynamically compute PLL parameters to approximate the requested rates. This is not always used, because the logic is as follows:
- Check if the target rate is hardcoded in the frequency table - Check if varying only kdiv is possible, so switch over is glitch free - Compute rate dynamically by iterating over pdiv range
If we skip the frequency table for the 1443x PLL, we find that the computed values differ to the hardcoded ones. This can be valid if the hardcoded values guarantee for example an earlier lock-in or if the divisors are chosen, so that other important rates are more likely to be reached glitch-free.
For rates (393216000 and 361267200, this doesn't seem to be the case: They are only approximated by existing parameters (393215995 and 361267196 Hz, respectively) and they aren't reachable glitch-free from other hardcoded frequencies. Dropping them from the table allows us to lock-in to these frequencies exactly.
This is immediately noticeable because they are the assigned-clock-rates for IMX8MN_AUDIO_PLL1 and IMX8MN_AUDIO_PLL2, respectively and a look into clk_summary so far showed that they were a few Hz short of the target:
imx8mn-board:~# grep audio_pll[12]_out /sys/kernel/debug/clk/clk_summary audio_pll2_out 0 0 0 361267196 0 0 50000 N audio_pll1_out 1 1 0 393215995 0 0 50000 Y
and afterwards:
imx8mn-board:~# grep audio_pll[12]_out /sys/kernel/debug/clk/clk_summary audio_pll2_out 0 0 0 361267200 0 0 50000 N audio_pll1_out 1 1 0 393216000 0 0 50000 Y
This change is equivalent to adding following hardcoded values:
/* rate mdiv pdiv sdiv kdiv */ PLL_1443X_RATE(393216000, 655, 5, 3, 23593), PLL_1443X_RATE(361267200, 497, 33, 0, -16882),
Fixes: 053a4ffe2988 ("clk: imx: imx8mm: fix audio pll setting") Cc: stable@vger.kernel.org # v5.18+ Signed-off-by: Ahmad Fatoum a.fatoum@pengutronix.de Signed-off-by: Marco Felsch m.felsch@pengutronix.de Link: https://lore.kernel.org/r/20230807084744.1184791-2-m.felsch@pengutronix.de Signed-off-by: Abel Vesa abel.vesa@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/clk/imx/clk-pll14xx.c | 2 -- 1 file changed, 2 deletions(-)
--- a/drivers/clk/imx/clk-pll14xx.c +++ b/drivers/clk/imx/clk-pll14xx.c @@ -64,8 +64,6 @@ static const struct imx_pll14xx_rate_tab PLL_1443X_RATE(650000000U, 325, 3, 2, 0), PLL_1443X_RATE(594000000U, 198, 2, 2, 0), PLL_1443X_RATE(519750000U, 173, 2, 2, 16384), - PLL_1443X_RATE(393216000U, 262, 2, 3, 9437), - PLL_1443X_RATE(361267200U, 361, 3, 3, 17511), };
struct imx_pll14xx_clk imx_1443x_pll = {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marco Felsch m.felsch@pengutronix.de
commit 37cfd5e457cbdcd030f378127ff2d62776f641e7 upstream.
The PLL14xx hardware can be found on i.MX8M{M,N,P} SoCs and always come with a 6-bit pre-divider. Neither the reference manuals nor the datasheets of these SoCs do mention any restrictions. Furthermore the current code doesn't respect the restrictions from the comment too.
Therefore drop the restriction and align the max pre-divider (pdiv) value to 63 to get more accurate frequencies.
Fixes: b09c68dc57c9 ("clk: imx: pll14xx: Support dynamic rates") Cc: stable@vger.kernel.org Signed-off-by: Marco Felsch m.felsch@pengutronix.de Reviewed-by: Abel Vesa abel.vesa@linaro.org Reviewed-by: Adam Ford aford173@gmail.com Signed-off-by: Philipp Zabel p.zabel@pengutronix.de Acked-by: Sascha Hauer s.hauer@pengutronix.de Tested-by: Ahmad Fatoum a.fatoum@pengutronix.de Link: https://lore.kernel.org/r/20230807084744.1184791-1-m.felsch@pengutronix.de Signed-off-by: Abel Vesa abel.vesa@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/clk/imx/clk-pll14xx.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
--- a/drivers/clk/imx/clk-pll14xx.c +++ b/drivers/clk/imx/clk-pll14xx.c @@ -137,11 +137,10 @@ static void imx_pll14xx_calc_settings(st /* * Fractional PLL constrains: * - * a) 6MHz <= prate <= 25MHz - * b) 1 <= p <= 63 (1 <= p <= 4 prate = 24MHz) - * c) 64 <= m <= 1023 - * d) 0 <= s <= 6 - * e) -32768 <= k <= 32767 + * a) 1 <= p <= 63 + * b) 64 <= m <= 1023 + * c) 0 <= s <= 6 + * d) -32768 <= k <= 32767 * * fvco = (m * 65536 + k) * prate / (p * 65536) */ @@ -184,7 +183,7 @@ static void imx_pll14xx_calc_settings(st }
/* Finally calculate best values */ - for (pdiv = 1; pdiv <= 7; pdiv++) { + for (pdiv = 1; pdiv <= 63; pdiv++) { for (sdiv = 0; sdiv <= 6; sdiv++) { /* calc mdiv = round(rate * pdiv * 2^sdiv) / prate) */ mdiv = DIV_ROUND_CLOSEST(rate * (pdiv << sdiv), prate);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Baryshkov dmitry.baryshkov@linaro.org
commit 1583694bb4eaf186f17131dbc1b83d6057d2749b upstream.
The pll0_vote clock definitely should have pll0 as a parent (instead of pll8).
Fixes: 7792a8d6713c ("clk: mdm9615: Add support for MDM9615 Clock Controllers") Cc: stable@kernel.org Reviewed-by: Neil Armstrong neil.armstrong@linaro.org Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Link: https://lore.kernel.org/r/20230512211727.3445575-7-dmitry.baryshkov@linaro.o... Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/clk/qcom/gcc-mdm9615.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/clk/qcom/gcc-mdm9615.c +++ b/drivers/clk/qcom/gcc-mdm9615.c @@ -58,7 +58,7 @@ static struct clk_regmap pll0_vote = { .enable_mask = BIT(0), .hw.init = &(struct clk_init_data){ .name = "pll0_vote", - .parent_names = (const char *[]){ "pll8" }, + .parent_names = (const char *[]){ "pll0" }, .num_parents = 1, .ops = &clk_pll_vote_ops, },
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chris Lew quic_clew@quicinc.com
commit 8d207400fd6b79c92aeb2f33bb79f62dff904ea2 upstream.
The QMI TLV value for strings in a lot of qmi element info structures account for null terminated strings with MAX_LEN + 1. If a string is actually MAX_LEN + 1 length, this will cause an out of bounds access when the NULL character is appended in decoding.
Fixes: 9b8a11e82615 ("soc: qcom: Introduce QMI encoder/decoder") Cc: stable@vger.kernel.org Signed-off-by: Chris Lew quic_clew@quicinc.com Signed-off-by: Praveenkumar I quic_ipkumar@quicinc.com Link: https://lore.kernel.org/r/20230801064712.3590128-1-quic_ipkumar@quicinc.com Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/soc/qcom/qmi_encdec.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/soc/qcom/qmi_encdec.c +++ b/drivers/soc/qcom/qmi_encdec.c @@ -534,8 +534,8 @@ static int qmi_decode_string_elem(const decoded_bytes += rc; }
- if (string_len > temp_ei->elem_len) { - pr_err("%s: String len %d > Max Len %d\n", + if (string_len >= temp_ei->elem_len) { + pr_err("%s: String len %d >= Max Len %d\n", __func__, string_len, temp_ei->elem_len); return -ETOOSMALL; } else if (string_len > tlv_len) {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit b0f3d01bda6c3f6f811e70f76d2040ae81f64565 upstream.
Make sure to decrement the runtime PM usage count before returning in case regmap initialisation fails.
Fixes: 16fb89f92ec4 ("clk: qcom: Add support for Display Clock Controller on SM8450") Cc: stable@vger.kernel.org # 6.1 Cc: Dmitry Baryshkov dmitry.baryshkov@linaro.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Link: https://lore.kernel.org/r/20230718132902.21430-3-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/clk/qcom/dispcc-sm8450.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
--- a/drivers/clk/qcom/dispcc-sm8450.c +++ b/drivers/clk/qcom/dispcc-sm8450.c @@ -1776,8 +1776,10 @@ static int disp_cc_sm8450_probe(struct p return ret;
regmap = qcom_cc_map(pdev, &disp_cc_sm8450_desc); - if (IS_ERR(regmap)) - return PTR_ERR(regmap); + if (IS_ERR(regmap)) { + ret = PTR_ERR(regmap); + goto err_put_rpm; + }
clk_lucid_evo_pll_configure(&disp_cc_pll0, regmap, &disp_cc_pll0_config); clk_lucid_evo_pll_configure(&disp_cc_pll1, regmap, &disp_cc_pll1_config); @@ -1792,9 +1794,16 @@ static int disp_cc_sm8450_probe(struct p regmap_update_bits(regmap, 0xe05c, BIT(0), BIT(0));
ret = qcom_cc_really_probe(pdev, &disp_cc_sm8450_desc, regmap); + if (ret) + goto err_put_rpm;
pm_runtime_put(&pdev->dev);
+ return 0; + +err_put_rpm: + pm_runtime_put_sync(&pdev->dev); + return ret; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit acaf1b3296a504d4a61b685f78baae771421608d upstream.
Make sure to decrement the runtime PM usage count before returning in case regmap initialisation fails.
Fixes: 90114ca11476 ("clk: qcom: add SM8550 DISPCC driver") Cc: stable@vger.kernel.org # 6.3 Cc: Neil Armstrong neil.armstrong@linaro.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Link: https://lore.kernel.org/r/20230718132902.21430-4-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/clk/qcom/dispcc-sm8550.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
--- a/drivers/clk/qcom/dispcc-sm8550.c +++ b/drivers/clk/qcom/dispcc-sm8550.c @@ -1761,8 +1761,10 @@ static int disp_cc_sm8550_probe(struct p return ret;
regmap = qcom_cc_map(pdev, &disp_cc_sm8550_desc); - if (IS_ERR(regmap)) - return PTR_ERR(regmap); + if (IS_ERR(regmap)) { + ret = PTR_ERR(regmap); + goto err_put_rpm; + }
clk_lucid_evo_pll_configure(&disp_cc_pll0, regmap, &disp_cc_pll0_config); clk_lucid_evo_pll_configure(&disp_cc_pll1, regmap, &disp_cc_pll1_config); @@ -1777,9 +1779,16 @@ static int disp_cc_sm8550_probe(struct p regmap_update_bits(regmap, 0xe054, BIT(0), BIT(0));
ret = qcom_cc_really_probe(pdev, &disp_cc_sm8550_desc, regmap); + if (ret) + goto err_put_rpm;
pm_runtime_put(&pdev->dev);
+ return 0; + +err_put_rpm: + pm_runtime_put_sync(&pdev->dev); + return ret; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 66af5339d4f8e20c6d89a490570bd94d40f1a7f6 upstream.
Drivers that enable runtime PM must make sure that the controller is runtime resumed before accessing its registers to prevent the power domain from being disabled.
Fixes: 4ab43d171181 ("clk: qcom: Add lpass clock controller driver for SC7280") Cc: stable@vger.kernel.org # 5.16 Cc: Taniya Das quic_tdas@quicinc.com Signed-off-by: Johan Hovold johan+linaro@kernel.org Link: https://lore.kernel.org/r/20230718132902.21430-6-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/clk/qcom/lpasscc-sc7280.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
--- a/drivers/clk/qcom/lpasscc-sc7280.c +++ b/drivers/clk/qcom/lpasscc-sc7280.c @@ -118,9 +118,13 @@ static int lpass_cc_sc7280_probe(struct ret = pm_clk_add(&pdev->dev, "iface"); if (ret < 0) { dev_err(&pdev->dev, "failed to acquire iface clock\n"); - goto destroy_pm_clk; + goto err_destroy_pm_clk; }
+ ret = pm_runtime_resume_and_get(&pdev->dev); + if (ret) + goto err_destroy_pm_clk; + if (!of_property_read_bool(pdev->dev.of_node, "qcom,adsp-pil-mode")) { lpass_regmap_config.name = "qdsp6ss"; lpass_regmap_config.max_register = 0x3f; @@ -128,7 +132,7 @@ static int lpass_cc_sc7280_probe(struct
ret = qcom_cc_probe_by_index(pdev, 0, desc); if (ret) - goto destroy_pm_clk; + goto err_put_rpm; }
lpass_regmap_config.name = "top_cc"; @@ -137,11 +141,15 @@ static int lpass_cc_sc7280_probe(struct
ret = qcom_cc_probe_by_index(pdev, 1, desc); if (ret) - goto destroy_pm_clk; + goto err_put_rpm; + + pm_runtime_put(&pdev->dev);
return 0;
-destroy_pm_clk: +err_put_rpm: + pm_runtime_put_sync(&pdev->dev); +err_destroy_pm_clk: pm_clk_destroy(&pdev->dev);
return ret;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 97112c83f4671a4a722f99a53be4e91fac4091bc upstream.
Drivers that enable runtime PM must make sure that the controller is runtime resumed before accessing its registers to prevent the power domain from being disabled.
Fixes: 6cdef2738db0 ("clk: qcom: Add Q6SSTOP clock controller for QCS404") Cc: stable@vger.kernel.org # 5.5 Signed-off-by: Johan Hovold johan+linaro@kernel.org Link: https://lore.kernel.org/r/20230718132902.21430-7-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/clk/qcom/q6sstop-qcs404.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)
--- a/drivers/clk/qcom/q6sstop-qcs404.c +++ b/drivers/clk/qcom/q6sstop-qcs404.c @@ -174,21 +174,32 @@ static int q6sstopcc_qcs404_probe(struct return ret; }
+ ret = pm_runtime_resume_and_get(&pdev->dev); + if (ret) + return ret; + q6sstop_regmap_config.name = "q6sstop_tcsr"; desc = &tcsr_qcs404_desc;
ret = qcom_cc_probe_by_index(pdev, 1, desc); if (ret) - return ret; + goto err_put_rpm;
q6sstop_regmap_config.name = "q6sstop_cc"; desc = &q6sstop_qcs404_desc;
ret = qcom_cc_probe_by_index(pdev, 0, desc); if (ret) - return ret; + goto err_put_rpm; + + pm_runtime_put(&pdev->dev);
return 0; + +err_put_rpm: + pm_runtime_put_sync(&pdev->dev); + + return ret; }
static const struct dev_pm_ops q6sstopcc_pm_ops = {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit e2349da0fa7ca822cda72f427345b95795358fe7 upstream.
Drivers that enable runtime PM must make sure that the controller is runtime resumed before accessing its registers to prevent the power domain from being disabled.
Fixes: 8def929c4097 ("clk: qcom: Add modem clock controller driver for SC7180") Cc: stable@vger.kernel.org # 5.7 Cc: Taniya Das quic_tdas@quicinc.com Signed-off-by: Johan Hovold johan+linaro@kernel.org Link: https://lore.kernel.org/r/20230718132902.21430-8-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/clk/qcom/mss-sc7180.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
--- a/drivers/clk/qcom/mss-sc7180.c +++ b/drivers/clk/qcom/mss-sc7180.c @@ -87,11 +87,22 @@ static int mss_sc7180_probe(struct platf return ret; }
+ ret = pm_runtime_resume_and_get(&pdev->dev); + if (ret) + return ret; + ret = qcom_cc_probe(pdev, &mss_sc7180_desc); if (ret < 0) - return ret; + goto err_put_rpm; + + pm_runtime_put(&pdev->dev);
return 0; + +err_put_rpm: + pm_runtime_put_sync(&pdev->dev); + + return ret; }
static const struct dev_pm_ops mss_sc7180_pm_ops = {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
commit 88975a55969e11f26fe3846bf4fbf8e7dc8cbbd4 upstream.
We must ensure that the subrequests are joined back into the head before we can retransmit a request. If the head was not on the commit lists, because the server wrote it synchronously, we still need to add it back to the retransmission list. Add a call that mirrors the effect of nfs_cancel_remove_inode() for O_DIRECT.
Fixes: ed5d588fe47f ("NFS: Try to join page groups before an O_DIRECT retransmission") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfs/direct.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-)
--- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -472,13 +472,31 @@ out: return result; }
+static void nfs_direct_add_page_head(struct list_head *list, + struct nfs_page *req) +{ + struct nfs_page *head = req->wb_head; + + if (!list_empty(&head->wb_list) || !nfs_lock_request(head)) + return; + if (!list_empty(&head->wb_list)) { + nfs_unlock_request(head); + return; + } + list_add(&head->wb_list, list); + kref_get(&head->wb_kref); + kref_get(&head->wb_kref); +} + static void nfs_direct_join_group(struct list_head *list, struct inode *inode) { struct nfs_page *req, *subreq;
list_for_each_entry(req, list, wb_list) { - if (req->wb_head != req) + if (req->wb_head != req) { + nfs_direct_add_page_head(&req->wb_list, req); continue; + } subreq = req->wb_this_page; if (subreq == req) continue;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fedor Pchelkin pchelkin@ispras.ru
commit 96562c45af5c31b89a197af28f79bfa838fb8391 upstream.
It is an almost improbable error case but when page allocating loop in nfs4_get_device_info() fails then we should only free the already allocated pages, as __free_page() can't deal with NULL arguments.
Found by Linux Verification Center (linuxtesting.org).
Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru Reviewed-by: Benjamin Coddington bcodding@redhat.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfs/pnfs_dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/nfs/pnfs_dev.c +++ b/fs/nfs/pnfs_dev.c @@ -154,7 +154,7 @@ nfs4_get_device_info(struct nfs_server * set_bit(NFS_DEVICEID_NOCACHE, &d->flags);
out_free_pages: - for (i = 0; i < max_pages; i++) + while (--i >= 0) __free_page(pages[i]); kfree(pages); out_free_pdev:
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qiang Yu quic_qianyu@quicinc.com
commit cabce92dd805945a090dc6fc73b001bb35ed083a upstream.
In RDDM EE, device can not process MHI reset issued by host. In case of MHI power off, host is issuing MHI reset and polls for it to get cleared until it times out. Since this timeout can not be avoided in case of RDDM, skip the MHI reset in this scenarios.
Cc: stable@vger.kernel.org Fixes: a6e2e3522f29 ("bus: mhi: core: Add support for PM state transitions") Signed-off-by: Qiang Yu quic_qianyu@quicinc.com Reviewed-by: Jeffrey Hugo quic_jhugo@quicinc.com Reviewed-by: Manivannan Sadhasivam mani@kernel.org Link: https://lore.kernel.org/r/1684390959-17836-1-git-send-email-quic_qianyu@quic... Signed-off-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/bus/mhi/host/pm.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/bus/mhi/host/pm.c +++ b/drivers/bus/mhi/host/pm.c @@ -470,6 +470,10 @@ static void mhi_pm_disable_transition(st
/* Trigger MHI RESET so that the device will not access host memory */ if (!MHI_PM_IN_FATAL_STATE(mhi_cntrl->pm_state)) { + /* Skip MHI RESET if in RDDM state */ + if (mhi_cntrl->rddm_image && mhi_get_exec_env(mhi_cntrl) == MHI_EE_RDDM) + goto skip_mhi_reset; + dev_dbg(dev, "Triggering MHI Reset in device\n"); mhi_set_mhi_state(mhi_cntrl, MHI_STATE_RESET);
@@ -495,6 +499,7 @@ static void mhi_pm_disable_transition(st } }
+skip_mhi_reset: dev_dbg(dev, "Waiting for all pending event ring processing to complete\n"); mhi_event = mhi_cntrl->mhi_event;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masahiro Yamada masahiroy@kernel.org
[ Upstream commit 233046a2afd12a4f699305b92ee634eebf1e4f31 ]
Commit 3089b2be0cce ("kbuild: rpm-pkg: fix build error when _arch is undefined") does not work as intended; _arch is always defined as $UTS_MACHINE.
The intention was to define _arch to $UTS_MACHINE only when it is not defined.
Fixes: 3089b2be0cce ("kbuild: rpm-pkg: fix build error when _arch is undefined") Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/package/mkspec | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/scripts/package/mkspec b/scripts/package/mkspec index 8049f0e2c110f..c9299f9c1f3e4 100755 --- a/scripts/package/mkspec +++ b/scripts/package/mkspec @@ -57,7 +57,7 @@ $S BuildRequires: gcc make openssl openssl-devel perl python3 rsync
# $UTS_MACHINE as a fallback of _arch in case # /usr/lib/rpm/platform/*/macros was not included. - %define _arch %{?_arch:$UTS_MACHINE} + %{!?_arch: %define _arch $UTS_MACHINE} %define __spec_install_post /usr/lib/rpm/brp-compress || : %define debug_package %{nil}
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masahiro Yamada masahiroy@kernel.org
[ Upstream commit 2429742e506a2b5939a62c629c4a46d91df0ada8 ]
Commit 961ab4a3cd66 ("kbuild: merge scripts/Makefile.modsign to scripts/Makefile.modinst") started to run depmod at the end of 'make modules_sign'.
Move the depmod rule to scripts/Makefile.modinst and run it only when $(modules_sign_only) is empty.
Fixes: 961ab4a3cd66 ("kbuild: merge scripts/Makefile.modsign to scripts/Makefile.modinst") Signed-off-by: Masahiro Yamada masahiroy@kernel.org Reviewed-by: Nicolas Schier nicolas@fjasle.eu Signed-off-by: Sasha Levin sashal@kernel.org --- Makefile | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/Makefile b/Makefile index 901cdfa5e7d3b..a5178b9863fb2 100644 --- a/Makefile +++ b/Makefile @@ -1962,7 +1962,9 @@ quiet_cmd_depmod = DEPMOD $(MODLIB)
modules_install: $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modinst +ifndef modules_sign_only $(call cmd,depmod) +endif
else # CONFIG_MODULES
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Slaby jslaby@suse.cz
[ Upstream commit bfb41e46d0b040ae83c1c4a50292298208b10f73 ]
Commit 2eab791f940b ("kbuild: dummy-tools: support MPROFILE_KERNEL checks for ppc") added support for ppc64le's checks for -mprofile-kernel.
Now, commit aec0ba7472a7 ("powerpc/64: Use -mprofile-kernel for big endian ELFv2 kernels") added support for -mprofile-kernel even on big-endian ppc.
So lift the check in gcc-check-mprofile-kernel.sh to support big-endian too.
Fixes: aec0ba7472a7 ("powerpc/64: Use -mprofile-kernel for big endian ELFv2 kernels") Signed-off-by: Jiri Slaby jslaby@suse.cz Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/dummy-tools/gcc | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/scripts/dummy-tools/gcc b/scripts/dummy-tools/gcc index 1db1889f6d81e..07f6dc4c5cf69 100755 --- a/scripts/dummy-tools/gcc +++ b/scripts/dummy-tools/gcc @@ -85,8 +85,7 @@ if arg_contain -S "$@"; then fi
# For arch/powerpc/tools/gcc-check-mprofile-kernel.sh - if arg_contain -m64 "$@" && arg_contain -mlittle-endian "$@" && - arg_contain -mprofile-kernel "$@"; then + if arg_contain -m64 "$@" && arg_contain -mprofile-kernel "$@"; then if ! test -t 0 && ! grep -q notrace; then echo "_mcount" fi
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit 9c377852ddfdc557b1370f196b0cfdf28d233460 ]
Some error paths don't call acpi_put_table() before returning. Branch to the correct place instead of doing some direct return.
Fixes: 4d2732882703 ("tpm_crb: Add support for CRB devices based on Pluton") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Acked-by: Matthew Garrett mgarrett@aurora.tech Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/char/tpm/tpm_crb.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c index a5dbebb1acfcf..ea085b14ab7c9 100644 --- a/drivers/char/tpm/tpm_crb.c +++ b/drivers/char/tpm/tpm_crb.c @@ -775,12 +775,13 @@ static int crb_acpi_add(struct acpi_device *device) FW_BUG "TPM2 ACPI table has wrong size %u for start method type %d\n", buf->header.length, ACPI_TPM2_COMMAND_BUFFER_WITH_PLUTON); - return -EINVAL; + rc = -EINVAL; + goto out; } crb_pluton = ACPI_ADD_PTR(struct tpm2_crb_pluton, buf, sizeof(*buf)); rc = crb_map_pluton(dev, priv, buf, crb_pluton); if (rc) - return rc; + goto out; }
priv->sm = sm;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Gruenbacher agruenba@redhat.com
[ Upstream commit 6df373b09b1dcf2f7d579f515f653f89a896d417 ]
In gfs2_logd(), switch from an open-coded wait loop to wait_event_interruptible_timeout().
Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Stable-dep-of: b74cd55aa9a9 ("gfs2: low-memory forced flush fixes") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/gfs2/log.c | 17 +++++------------ 1 file changed, 5 insertions(+), 12 deletions(-)
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c index aa568796207c0..d3da259820e30 100644 --- a/fs/gfs2/log.c +++ b/fs/gfs2/log.c @@ -1301,7 +1301,6 @@ int gfs2_logd(void *data) { struct gfs2_sbd *sdp = data; unsigned long t = 1; - DEFINE_WAIT(wait);
while (!kthread_should_stop()) {
@@ -1338,17 +1337,11 @@ int gfs2_logd(void *data)
try_to_freeze();
- do { - prepare_to_wait(&sdp->sd_logd_waitq, &wait, - TASK_INTERRUPTIBLE); - if (!gfs2_ail_flush_reqd(sdp) && - !gfs2_jrnl_flush_reqd(sdp) && - !kthread_should_stop()) - t = schedule_timeout(t); - } while(t && !gfs2_ail_flush_reqd(sdp) && - !gfs2_jrnl_flush_reqd(sdp) && - !kthread_should_stop()); - finish_wait(&sdp->sd_logd_waitq, &wait); + t = wait_event_interruptible_timeout(sdp->sd_logd_waitq, + gfs2_ail_flush_reqd(sdp) || + gfs2_jrnl_flush_reqd(sdp) || + kthread_should_stop(), + t); }
return 0;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Gruenbacher agruenba@redhat.com
[ Upstream commit b74cd55aa9a9d0aca760028a51343ec79812e410 ]
First, function gfs2_ail_flush_reqd checks the SDF_FORCE_AIL_FLUSH flag to determine if an AIL flush should be forced in low-memory situations. However, it also immediately clears the flag, and when called repeatedly as in function gfs2_logd, the flag will be lost. Fix that by pulling the SDF_FORCE_AIL_FLUSH flag check out of gfs2_ail_flush_reqd.
Second, function gfs2_writepages sets the SDF_FORCE_AIL_FLUSH flag whether or not enough pages were written. If enough pages could be written, flushing the AIL is unnecessary, though.
Third, gfs2_writepages doesn't wake up logd after setting the SDF_FORCE_AIL_FLUSH flag, so it can take a long time for logd to react. It would be preferable to wake up logd, but that hurts the performance of some workloads and we don't quite understand why so far, so don't wake up logd so far.
Fixes: b066a4eebd4f ("gfs2: forcibly flush ail to relieve memory pressure") Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/gfs2/aops.c | 4 ++-- fs/gfs2/log.c | 8 ++++---- 2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c index ae49256b7c8c6..be2759a974f9e 100644 --- a/fs/gfs2/aops.c +++ b/fs/gfs2/aops.c @@ -183,13 +183,13 @@ static int gfs2_writepages(struct address_space *mapping, int ret;
/* - * Even if we didn't write any pages here, we might still be holding + * Even if we didn't write enough pages here, we might still be holding * dirty pages in the ail. We forcibly flush the ail because we don't * want balance_dirty_pages() to loop indefinitely trying to write out * pages held in the ail that it can't find. */ ret = iomap_writepages(mapping, wbc, &wpc, &gfs2_writeback_ops); - if (ret == 0) + if (ret == 0 && wbc->nr_to_write > 0) set_bit(SDF_FORCE_AIL_FLUSH, &sdp->sd_flags); return ret; } diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c index d3da259820e30..aaca22f2aa2d1 100644 --- a/fs/gfs2/log.c +++ b/fs/gfs2/log.c @@ -1282,9 +1282,6 @@ static inline int gfs2_ail_flush_reqd(struct gfs2_sbd *sdp) { unsigned int used_blocks = sdp->sd_jdesc->jd_blocks - atomic_read(&sdp->sd_log_blks_free);
- if (test_and_clear_bit(SDF_FORCE_AIL_FLUSH, &sdp->sd_flags)) - return 1; - return used_blocks + atomic_read(&sdp->sd_log_blks_needed) >= atomic_read(&sdp->sd_log_thresh2); } @@ -1325,7 +1322,9 @@ int gfs2_logd(void *data) GFS2_LFC_LOGD_JFLUSH_REQD); }
- if (gfs2_ail_flush_reqd(sdp)) { + if (test_bit(SDF_FORCE_AIL_FLUSH, &sdp->sd_flags) || + gfs2_ail_flush_reqd(sdp)) { + clear_bit(SDF_FORCE_AIL_FLUSH, &sdp->sd_flags); gfs2_ail1_start(sdp); gfs2_ail1_wait(sdp); gfs2_ail1_empty(sdp, 0); @@ -1338,6 +1337,7 @@ int gfs2_logd(void *data) try_to_freeze();
t = wait_event_interruptible_timeout(sdp->sd_logd_waitq, + test_bit(SDF_FORCE_AIL_FLUSH, &sdp->sd_flags) || gfs2_ail_flush_reqd(sdp) || gfs2_jrnl_flush_reqd(sdp) || kthread_should_stop(),
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonathan Marek jonathan@marek.ca
[ Upstream commit a493208079e299aefdc15169dc80e3da3ebb718a ]
Breaking out early when a match is found leads to an incorrect num_chans value when more than one ipcc mailbox channel is used by the same device.
Fixes: e9d50e4b4d04 ("mailbox: qcom-ipcc: Dynamic alloc for channel arrangement") Signed-off-by: Jonathan Marek jonathan@marek.ca Signed-off-by: Jassi Brar jaswinder.singh@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mailbox/qcom-ipcc.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/mailbox/qcom-ipcc.c b/drivers/mailbox/qcom-ipcc.c index 7e27acf6c0cca..f597a1bd56847 100644 --- a/drivers/mailbox/qcom-ipcc.c +++ b/drivers/mailbox/qcom-ipcc.c @@ -227,10 +227,8 @@ static int qcom_ipcc_setup_mbox(struct qcom_ipcc *ipcc, ret = of_parse_phandle_with_args(client_dn, "mboxes", "#mbox-cells", j, &curr_ph); of_node_put(curr_ph.np); - if (!ret && curr_ph.np == controller_dn) { + if (!ret && curr_ph.np == controller_dn) ipcc->num_chans++; - break; - } } }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Konstantin Meskhidze konstantin.meskhidze@huawei.com
[ Upstream commit a3b7039bb2b22fcd2ad20d59c00ed4e606ce3754 ]
Buffer 'new_argv' is accessed without bound check after accessing with bound check via 'new_argc' index.
Fixes: e298f3b49def ("kconfig: add built-in function support") Co-developed-by: Ivanov Mikhail ivanov.mikhail1@huawei-partners.com Signed-off-by: Konstantin Meskhidze konstantin.meskhidze@huawei.com Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/kconfig/preprocess.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/scripts/kconfig/preprocess.c b/scripts/kconfig/preprocess.c index 748da578b418c..d1f5bcff4b62d 100644 --- a/scripts/kconfig/preprocess.c +++ b/scripts/kconfig/preprocess.c @@ -396,6 +396,9 @@ static char *eval_clause(const char *str, size_t len, int argc, char *argv[])
p++; } + + if (new_argc >= FUNCTION_MAX_ARGS) + pperror("too many function arguments"); new_argv[new_argc++] = prev;
/*
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xie XiuQi xiexiuqi@huawei.com
[ Upstream commit 7f33105cdd59a99d068d3d147723a865d10e2260 ]
Commit 97d5f2e9ee12 ("tools api fs: More thread safety for global filesystem variables") introduces pthread_once, so the libpthread should be added at link time, or we'll meet the following compile error when 'make -C tools/mm':
gcc -Wall -Wextra -I../lib/ -o page-types page-types.c ../lib/api/libapi.a ~/linux/tools/lib/api/fs/fs.c:146: undefined reference to `pthread_once' ~/linux/tools/lib/api/fs/fs.c:147: undefined reference to `pthread_once' ~/linux/tools/lib/api/fs/fs.c:148: undefined reference to `pthread_once' ~/linux/tools/lib/api/fs/fs.c:149: undefined reference to `pthread_once' ~/linux/tools/lib/api/fs/fs.c:150: undefined reference to `pthread_once' /usr/bin/ld: ../lib/api/libapi.a(libapi-in.o):~/linux/tools/lib/api/fs/fs.c:151: more undefined references to `pthread_once' follow collect2: error: ld returned 1 exit status make: *** [Makefile:22: page-types] Error 1
Link: https://lkml.kernel.org/r/20230831034205.2376653-1-xiexiuqi@huaweicloud.com Fixes: 97d5f2e9ee12 ("tools api fs: More thread safety for global filesystem variables") Signed-off-by: Xie XiuQi xiexiuqi@huawei.com Acked-by: Ian Rogers irogers@google.com Cc: Matthew Wilcox willy@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/mm/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/mm/Makefile b/tools/mm/Makefile index 6c1da51f4177c..1c5606cc33346 100644 --- a/tools/mm/Makefile +++ b/tools/mm/Makefile @@ -8,8 +8,8 @@ TARGETS=page-types slabinfo page_owner_sort LIB_DIR = ../lib/api LIBS = $(LIB_DIR)/libapi.a
-CFLAGS += -Wall -Wextra -I../lib/ -LDFLAGS += $(LIBS) +CFLAGS += -Wall -Wextra -I../lib/ -pthread +LDFLAGS += $(LIBS) -pthread
all: $(TARGETS)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff LaBundy jeff@labundy.com
[ Upstream commit 2e00b8bf5624767f6be7427b6eb532524793463e ]
If the device drops into ultra-low-power mode before being placed into normal-power mode as part of ATI being triggered, the device does not assert any interrupts until the ATI routine is restarted two seconds later.
Solve this problem by adopting the vendor's recommendation, which calls for the device to be placed into normal-power mode prior to being configured and ATI being triggered.
The original implementation followed this sequence, but the order was inadvertently changed as part of the resolution of a separate erratum.
Fixes: 1e4189d8af27 ("Input: iqs7222 - protect volatile registers") Signed-off-by: Jeff LaBundy jeff@labundy.com Link: https://lore.kernel.org/r/ZKrpHc2Ji9qR25r2@nixie71 Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/input/misc/iqs7222.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/input/misc/iqs7222.c b/drivers/input/misc/iqs7222.c index 096b0925f41ba..acb95048e8230 100644 --- a/drivers/input/misc/iqs7222.c +++ b/drivers/input/misc/iqs7222.c @@ -1381,9 +1381,6 @@ static int iqs7222_ati_trigger(struct iqs7222_private *iqs7222) if (error) return error;
- sys_setup &= ~IQS7222_SYS_SETUP_INTF_MODE_MASK; - sys_setup &= ~IQS7222_SYS_SETUP_PWR_MODE_MASK; - for (i = 0; i < IQS7222_NUM_RETRIES; i++) { /* * Trigger ATI from streaming and normal-power modes so that @@ -1561,8 +1558,11 @@ static int iqs7222_dev_init(struct iqs7222_private *iqs7222, int dir) return error; }
- if (dir == READ) + if (dir == READ) { + iqs7222->sys_setup[0] &= ~IQS7222_SYS_SETUP_INTF_MODE_MASK; + iqs7222->sys_setup[0] &= ~IQS7222_SYS_SETUP_PWR_MODE_MASK; return 0; + }
return iqs7222_ati_trigger(iqs7222); }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnaldo Carvalho de Melo acme@redhat.com
[ Upstream commit 7962ef13651a9163f07b530607392ea123482e8a ]
In 3cb4d5e00e037c70 ("perf trace: Free syscall tp fields in evsel->priv") it only was freeing if strcmp(evsel->tp_format->system, "syscalls") returned zero, while the corresponding initialization of evsel->priv was being performed if it was _not_ zero, i.e. if the tp system wasn't 'syscalls'.
Just stop looking for that and free it if evsel->priv was set, which should be equivalent.
Also use the pre-existing evsel_trace__delete() function.
This resolves these leaks, detected with:
$ make EXTRA_CFLAGS="-fsanitize=address" BUILD_BPF_SKEL=1 CORESIGHT=1 O=/tmp/build/perf-tools-next -C tools/perf install-bin
================================================================= ==481565==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f7343cba097 in calloc (/lib64/libasan.so.8+0xba097) #1 0x987966 in zalloc (/home/acme/bin/perf+0x987966) #2 0x52f9b9 in evsel_trace__new /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:307 #3 0x52f9b9 in evsel__syscall_tp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:333 #4 0x52f9b9 in evsel__init_raw_syscall_tp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:458 #5 0x52f9b9 in perf_evsel__raw_syscall_newtp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:480 #6 0x540e8b in trace__add_syscall_newtp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3212 #7 0x540e8b in trace__run /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3891 #8 0x540e8b in cmd_trace /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:5156 #9 0x5ef262 in run_builtin /home/acme/git/perf-tools-next/tools/perf/perf.c:323 #10 0x4196da in handle_internal_command /home/acme/git/perf-tools-next/tools/perf/perf.c:377 #11 0x4196da in run_argv /home/acme/git/perf-tools-next/tools/perf/perf.c:421 #12 0x4196da in main /home/acme/git/perf-tools-next/tools/perf/perf.c:537 #13 0x7f7342c4a50f in __libc_start_call_main (/lib64/libc.so.6+0x2750f)
Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f7343cba097 in calloc (/lib64/libasan.so.8+0xba097) #1 0x987966 in zalloc (/home/acme/bin/perf+0x987966) #2 0x52f9b9 in evsel_trace__new /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:307 #3 0x52f9b9 in evsel__syscall_tp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:333 #4 0x52f9b9 in evsel__init_raw_syscall_tp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:458 #5 0x52f9b9 in perf_evsel__raw_syscall_newtp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:480 #6 0x540dd1 in trace__add_syscall_newtp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3205 #7 0x540dd1 in trace__run /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3891 #8 0x540dd1 in cmd_trace /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:5156 #9 0x5ef262 in run_builtin /home/acme/git/perf-tools-next/tools/perf/perf.c:323 #10 0x4196da in handle_internal_command /home/acme/git/perf-tools-next/tools/perf/perf.c:377 #11 0x4196da in run_argv /home/acme/git/perf-tools-next/tools/perf/perf.c:421 #12 0x4196da in main /home/acme/git/perf-tools-next/tools/perf/perf.c:537 #13 0x7f7342c4a50f in __libc_start_call_main (/lib64/libc.so.6+0x2750f)
SUMMARY: AddressSanitizer: 80 byte(s) leaked in 2 allocation(s). [root@quaco ~]#
With this we plug all leaks with "perf trace sleep 1".
Fixes: 3cb4d5e00e037c70 ("perf trace: Free syscall tp fields in evsel->priv") Acked-by: Ian Rogers irogers@google.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Jiri Olsa jolsa@kernel.org Cc: Namhyung Kim namhyung@kernel.org Cc: Riccardo Mancini rickyman7@gmail.com Link: https://lore.kernel.org/lkml/20230719202951.534582-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/builtin-trace.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c index 6e73d0e957152..4aba576512a15 100644 --- a/tools/perf/builtin-trace.c +++ b/tools/perf/builtin-trace.c @@ -3136,13 +3136,8 @@ static void evlist__free_syscall_tp_fields(struct evlist *evlist) struct evsel *evsel;
evlist__for_each_entry(evlist, evsel) { - struct evsel_trace *et = evsel->priv; - - if (!et || !evsel->tp_format || strcmp(evsel->tp_format->system, "syscalls")) - continue; - - zfree(&et->fmt); - free(et); + evsel_trace__delete(evsel->priv); + evsel->priv = NULL; } }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Uwe Kleine-König u.kleine-koenig@pengutronix.de
[ Upstream commit 0323e8fedd1ef25342cf7abf3a2024f5670362b8 ]
Allocate driver data as first resource in the probe function. This way it can be used during allocation of the other resources (instead of assigning these to local variables first and update driver data only when it's allocated). Also as driver data is allocated using a devm function this should happen first to have the order of freeing resources in the error path and the remove function in reverse.
Signed-off-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Signed-off-by: Thierry Reding thierry.reding@gmail.com Stable-dep-of: c11622324c02 ("pwm: atmel-tcb: Fix resource freeing in error path and remove") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pwm/pwm-atmel-tcb.c | 49 +++++++++++++++---------------------- 1 file changed, 20 insertions(+), 29 deletions(-)
diff --git a/drivers/pwm/pwm-atmel-tcb.c b/drivers/pwm/pwm-atmel-tcb.c index 4a116dc44f6e7..613dd1810fb53 100644 --- a/drivers/pwm/pwm-atmel-tcb.c +++ b/drivers/pwm/pwm-atmel-tcb.c @@ -422,13 +422,14 @@ static int atmel_tcb_pwm_probe(struct platform_device *pdev) struct atmel_tcb_pwm_chip *tcbpwm; const struct atmel_tcb_config *config; struct device_node *np = pdev->dev.of_node; - struct regmap *regmap; - struct clk *clk, *gclk = NULL; - struct clk *slow_clk; char clk_name[] = "t0_clk"; int err; int channel;
+ tcbpwm = devm_kzalloc(&pdev->dev, sizeof(*tcbpwm), GFP_KERNEL); + if (tcbpwm == NULL) + return -ENOMEM; + err = of_property_read_u32(np, "reg", &channel); if (err < 0) { dev_err(&pdev->dev, @@ -437,47 +438,37 @@ static int atmel_tcb_pwm_probe(struct platform_device *pdev) return err; }
- regmap = syscon_node_to_regmap(np->parent); - if (IS_ERR(regmap)) - return PTR_ERR(regmap); + tcbpwm->regmap = syscon_node_to_regmap(np->parent); + if (IS_ERR(tcbpwm->regmap)) + return PTR_ERR(tcbpwm->regmap);
- slow_clk = of_clk_get_by_name(np->parent, "slow_clk"); - if (IS_ERR(slow_clk)) - return PTR_ERR(slow_clk); + tcbpwm->slow_clk = of_clk_get_by_name(np->parent, "slow_clk"); + if (IS_ERR(tcbpwm->slow_clk)) + return PTR_ERR(tcbpwm->slow_clk);
clk_name[1] += channel; - clk = of_clk_get_by_name(np->parent, clk_name); - if (IS_ERR(clk)) - clk = of_clk_get_by_name(np->parent, "t0_clk"); - if (IS_ERR(clk)) - return PTR_ERR(clk); + tcbpwm->clk = of_clk_get_by_name(np->parent, clk_name); + if (IS_ERR(tcbpwm->clk)) + tcbpwm->clk = of_clk_get_by_name(np->parent, "t0_clk"); + if (IS_ERR(tcbpwm->clk)) + return PTR_ERR(tcbpwm->clk);
match = of_match_node(atmel_tcb_of_match, np->parent); config = match->data;
if (config->has_gclk) { - gclk = of_clk_get_by_name(np->parent, "gclk"); - if (IS_ERR(gclk)) - return PTR_ERR(gclk); - } - - tcbpwm = devm_kzalloc(&pdev->dev, sizeof(*tcbpwm), GFP_KERNEL); - if (tcbpwm == NULL) { - err = -ENOMEM; - goto err_slow_clk; + tcbpwm->gclk = of_clk_get_by_name(np->parent, "gclk"); + if (IS_ERR(tcbpwm->gclk)) + return PTR_ERR(tcbpwm->gclk); }
tcbpwm->chip.dev = &pdev->dev; tcbpwm->chip.ops = &atmel_tcb_pwm_ops; tcbpwm->chip.npwm = NPWM; tcbpwm->channel = channel; - tcbpwm->regmap = regmap; - tcbpwm->clk = clk; - tcbpwm->gclk = gclk; - tcbpwm->slow_clk = slow_clk; tcbpwm->width = config->counter_width;
- err = clk_prepare_enable(slow_clk); + err = clk_prepare_enable(tcbpwm->slow_clk); if (err) goto err_slow_clk;
@@ -495,7 +486,7 @@ static int atmel_tcb_pwm_probe(struct platform_device *pdev) clk_disable_unprepare(tcbpwm->slow_clk);
err_slow_clk: - clk_put(slow_clk); + clk_put(tcbpwm->slow_clk);
return err; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Uwe Kleine-König u.kleine-koenig@pengutronix.de
[ Upstream commit c11622324c023415fb69196c5fc3782d2b8cced0 ]
Several resources were not freed in the error path and the remove function. Add the forgotten items.
Fixes: 34cbcd72588f ("pwm: atmel-tcb: Add sama5d2 support") Fixes: 061f8572a31c ("pwm: atmel-tcb: Switch to new binding") Signed-off-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Reviewed-by: Claudiu Beznea claudiu.beznea@tuxon.dev Signed-off-by: Thierry Reding thierry.reding@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pwm/pwm-atmel-tcb.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/drivers/pwm/pwm-atmel-tcb.c b/drivers/pwm/pwm-atmel-tcb.c index 613dd1810fb53..2826fc216d291 100644 --- a/drivers/pwm/pwm-atmel-tcb.c +++ b/drivers/pwm/pwm-atmel-tcb.c @@ -450,16 +450,20 @@ static int atmel_tcb_pwm_probe(struct platform_device *pdev) tcbpwm->clk = of_clk_get_by_name(np->parent, clk_name); if (IS_ERR(tcbpwm->clk)) tcbpwm->clk = of_clk_get_by_name(np->parent, "t0_clk"); - if (IS_ERR(tcbpwm->clk)) - return PTR_ERR(tcbpwm->clk); + if (IS_ERR(tcbpwm->clk)) { + err = PTR_ERR(tcbpwm->clk); + goto err_slow_clk; + }
match = of_match_node(atmel_tcb_of_match, np->parent); config = match->data;
if (config->has_gclk) { tcbpwm->gclk = of_clk_get_by_name(np->parent, "gclk"); - if (IS_ERR(tcbpwm->gclk)) - return PTR_ERR(tcbpwm->gclk); + if (IS_ERR(tcbpwm->gclk)) { + err = PTR_ERR(tcbpwm->gclk); + goto err_clk; + } }
tcbpwm->chip.dev = &pdev->dev; @@ -470,7 +474,7 @@ static int atmel_tcb_pwm_probe(struct platform_device *pdev)
err = clk_prepare_enable(tcbpwm->slow_clk); if (err) - goto err_slow_clk; + goto err_gclk;
spin_lock_init(&tcbpwm->lock);
@@ -485,6 +489,12 @@ static int atmel_tcb_pwm_probe(struct platform_device *pdev) err_disable_clk: clk_disable_unprepare(tcbpwm->slow_clk);
+err_gclk: + clk_put(tcbpwm->gclk); + +err_clk: + clk_put(tcbpwm->clk); + err_slow_clk: clk_put(tcbpwm->slow_clk);
@@ -498,8 +508,9 @@ static void atmel_tcb_pwm_remove(struct platform_device *pdev) pwmchip_remove(&tcbpwm->chip);
clk_disable_unprepare(tcbpwm->slow_clk); - clk_put(tcbpwm->slow_clk); + clk_put(tcbpwm->gclk); clk_put(tcbpwm->clk); + clk_put(tcbpwm->slow_clk); }
static const struct of_device_id atmel_tcb_pwm_dt_ids[] = {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Artur Weber aweber.kernel@gmail.com
[ Upstream commit 4c09e20b3c85f60353ace21092e34f35f5e3ab00 ]
As pointed out by Uwe Kleine-König[1], the changes introduced in commit c1ff7da03e16 ("video: backlight: lp855x: Get PWM for PWM mode during probe") caused the PWM state set up by the bootloader to be re-set when the driver is probed. This differs from the behavior from before that patch, where the PWM state would be initialized on the first brightness change.
Fix this by moving the PWM state initialization into the PWM control function. Add a new variable, needs_pwm_init, to the device info struct to allow us to check whether we need the initialization, or whether it has already been done.
[1] https://lore.kernel.org/lkml/20230614083953.e4kkweddjz7wztby@pengutronix.de/
Fixes: c1ff7da03e16 ("video: backlight: lp855x: Get PWM for PWM mode during probe") Signed-off-by: Artur Weber aweber.kernel@gmail.com Reviewed-by: Daniel Thompson daniel.thompson@linaro.org Link: https://lore.kernel.org/r/20230714121440.7717-2-aweber.kernel@gmail.com Signed-off-by: Lee Jones lee@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/video/backlight/lp855x_bl.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/drivers/video/backlight/lp855x_bl.c b/drivers/video/backlight/lp855x_bl.c index 1c9e921bca14a..349ec324bc1ea 100644 --- a/drivers/video/backlight/lp855x_bl.c +++ b/drivers/video/backlight/lp855x_bl.c @@ -71,6 +71,7 @@ struct lp855x { struct device *dev; struct lp855x_platform_data *pdata; struct pwm_device *pwm; + bool needs_pwm_init; struct regulator *supply; /* regulator for VDD input */ struct regulator *enable; /* regulator for EN/VDDIO input */ }; @@ -220,7 +221,15 @@ static void lp855x_pwm_ctrl(struct lp855x *lp, int br, int max_br) { struct pwm_state state;
- pwm_get_state(lp->pwm, &state); + if (lp->needs_pwm_init) { + pwm_init_state(lp->pwm, &state); + /* Legacy platform data compatibility */ + if (lp->pdata->period_ns > 0) + state.period = lp->pdata->period_ns; + lp->needs_pwm_init = false; + } else { + pwm_get_state(lp->pwm, &state); + }
state.duty_cycle = div_u64(br * state.period, max_br); state.enabled = state.duty_cycle; @@ -387,7 +396,6 @@ static int lp855x_probe(struct i2c_client *cl) const struct i2c_device_id *id = i2c_client_get_device_id(cl); const struct acpi_device_id *acpi_id = NULL; struct device *dev = &cl->dev; - struct pwm_state pwmstate; struct lp855x *lp; int ret;
@@ -470,15 +478,11 @@ static int lp855x_probe(struct i2c_client *cl) else return dev_err_probe(dev, ret, "getting PWM\n");
+ lp->needs_pwm_init = false; lp->mode = REGISTER_BASED; dev_dbg(dev, "mode: register based\n"); } else { - pwm_init_state(lp->pwm, &pwmstate); - /* Legacy platform data compatibility */ - if (lp->pdata->period_ns > 0) - pwmstate.period = lp->pdata->period_ns; - pwm_apply_state(lp->pwm, &pwmstate); - + lp->needs_pwm_init = true; lp->mode = PWM_BASED; dev_dbg(dev, "mode: PWM based\n"); }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ying Liu victor.liu@nxp.com
[ Upstream commit fe1328b5b2a087221e31da77e617f4c2b70f3b7f ]
So, let's drop output GPIO direction check and only check GPIO value to set the initial power state.
Fixes: 706dc68102bc ("backlight: gpio: Explicitly set the direction of the GPIO") Signed-off-by: Liu Ying victor.liu@nxp.com Reviewed-by: Andy Shevchenko andy@kernel.org Acked-by: Linus Walleij linus.walleij@linaro.org Acked-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Link: https://lore.kernel.org/r/20230721093342.1532531-1-victor.liu@nxp.com Signed-off-by: Lee Jones lee@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/video/backlight/gpio_backlight.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/video/backlight/gpio_backlight.c b/drivers/video/backlight/gpio_backlight.c index 5c5c99f7979e3..30ec5b6845335 100644 --- a/drivers/video/backlight/gpio_backlight.c +++ b/drivers/video/backlight/gpio_backlight.c @@ -87,8 +87,7 @@ static int gpio_backlight_probe(struct platform_device *pdev) /* Not booted with device tree or no phandle link to the node */ bl->props.power = def_value ? FB_BLANK_UNBLANK : FB_BLANK_POWERDOWN; - else if (gpiod_get_direction(gbl->gpiod) == 0 && - gpiod_get_value_cansleep(gbl->gpiod) == 0) + else if (gpiod_get_value_cansleep(gbl->gpiod) == 0) bl->props.power = FB_BLANK_POWERDOWN; else bl->props.power = FB_BLANK_UNBLANK;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ian Rogers irogers@google.com
[ Upstream commit a7a3252dad354a9e5c173156dab959e4019b9467 ]
Split cases in event_pmu for greater accuracy.
Signed-off-by: Ian Rogers irogers@google.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Athira Rajeev atrajeev@linux.vnet.ibm.com Cc: Ingo Molnar mingo@redhat.com Cc: Jiri Olsa jolsa@kernel.org Cc: Kan Liang kan.liang@linux.intel.com Cc: Mark Rutland mark.rutland@arm.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Stable-dep-of: b30d4f0b6954 ("perf parse-events: Additional error reporting") Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/parse-events.y | 45 ++++++++++++++++++++-------------- 1 file changed, 26 insertions(+), 19 deletions(-)
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y index 9f28d4b5502f1..6b996f22dee3a 100644 --- a/tools/perf/util/parse-events.y +++ b/tools/perf/util/parse-events.y @@ -285,37 +285,42 @@ event_pmu: PE_NAME opt_pmu_config { struct parse_events_state *parse_state = _parse_state; - struct parse_events_error *error = parse_state->error; struct list_head *list = NULL, *orig_terms = NULL, *terms= NULL; + struct parse_events_error *error = parse_state->error; char *pattern = NULL;
-#define CLEANUP_YYABORT \ +#define CLEANUP \ do { \ parse_events_terms__delete($2); \ parse_events_terms__delete(orig_terms); \ free(list); \ free($1); \ free(pattern); \ - YYABORT; \ } while(0)
- if (parse_events_copy_term_list($2, &orig_terms)) - CLEANUP_YYABORT; - if (error) error->idx = @1.first_column;
+ if (parse_events_copy_term_list($2, &orig_terms)) { + CLEANUP; + YYNOMEM; + } + list = alloc_list(); - if (!list) - CLEANUP_YYABORT; + if (!list) { + CLEANUP; + YYNOMEM; + } /* Attempt to add to list assuming $1 is a PMU name. */ if (parse_events_add_pmu(parse_state, list, $1, $2, /*auto_merge_stats=*/false)) { struct perf_pmu *pmu = NULL; int ok = 0;
/* Failure to add, try wildcard expansion of $1 as a PMU name. */ - if (asprintf(&pattern, "%s*", $1) < 0) - CLEANUP_YYABORT; + if (asprintf(&pattern, "%s*", $1) < 0) { + CLEANUP; + YYNOMEM; + }
while ((pmu = perf_pmus__scan(pmu)) != NULL) { char *name = pmu->name; @@ -330,8 +335,10 @@ PE_NAME opt_pmu_config !perf_pmu__match(pattern, pmu->alias_name, $1)) { bool auto_merge_stats = perf_pmu__auto_merge_stats(pmu);
- if (parse_events_copy_term_list(orig_terms, &terms)) - CLEANUP_YYABORT; + if (parse_events_copy_term_list(orig_terms, &terms)) { + CLEANUP; + YYNOMEM; + } if (!parse_events_add_pmu(parse_state, list, pmu->name, terms, auto_merge_stats)) { ok++; @@ -347,15 +354,15 @@ PE_NAME opt_pmu_config ok = !parse_events_multi_pmu_add(parse_state, $1, $2, &list); $2 = NULL; } - if (!ok) - CLEANUP_YYABORT; + if (!ok) { + CLEANUP; + YYABORT; + } } - parse_events_terms__delete($2); - parse_events_terms__delete(orig_terms); - free(pattern); - free($1); $$ = list; -#undef CLEANUP_YYABORT + list = NULL; + CLEANUP; +#undef CLEANUP } | PE_KERNEL_PMU_EVENT sep_dc
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ian Rogers irogers@google.com
[ Upstream commit 77cdd787fc45e3426b8e0b5038b85c276540dfb4 ]
Migration to improve error reporting as YYABORT cases should carry event parsing errors.
Signed-off-by: Ian Rogers irogers@google.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Athira Rajeev atrajeev@linux.vnet.ibm.com Cc: Ingo Molnar mingo@redhat.com Cc: Jiri Olsa jolsa@kernel.org Cc: Kan Liang kan.liang@linux.intel.com Cc: Mark Rutland mark.rutland@arm.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Stable-dep-of: b30d4f0b6954 ("perf parse-events: Additional error reporting") Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/parse-events.y | 58 +++++++++++++++++++++++----------- 1 file changed, 40 insertions(+), 18 deletions(-)
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y index 6b996f22dee3a..78c1f49d8d7e4 100644 --- a/tools/perf/util/parse-events.y +++ b/tools/perf/util/parse-events.y @@ -455,7 +455,8 @@ value_sym '/' event_config '/' bool wildcard = (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE);
list = alloc_list(); - ABORT_ON(!list); + if (!list) + YYNOMEM; err = parse_events_add_numeric(_parse_state, list, type, config, $3, wildcard); parse_events_terms__delete($3); if (err) { @@ -473,7 +474,8 @@ value_sym sep_slash_slash_dc bool wildcard = (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE);
list = alloc_list(); - ABORT_ON(!list); + if (!list) + YYNOMEM; ABORT_ON(parse_events_add_numeric(_parse_state, list, type, config, /*head_config=*/NULL, wildcard)); $$ = list; @@ -484,7 +486,8 @@ PE_VALUE_SYM_TOOL sep_slash_slash_dc struct list_head *list;
list = alloc_list(); - ABORT_ON(!list); + if (!list) + YYNOMEM; ABORT_ON(parse_events_add_tool(_parse_state, list, $1)); $$ = list; } @@ -497,7 +500,9 @@ PE_LEGACY_CACHE opt_event_config int err;
list = alloc_list(); - ABORT_ON(!list); + if (!list) + YYNOMEM; + err = parse_events_add_cache(list, &parse_state->idx, $1, parse_state, $2);
parse_events_terms__delete($2); @@ -516,7 +521,9 @@ PE_PREFIX_MEM PE_VALUE PE_BP_SLASH PE_VALUE PE_BP_COLON PE_MODIFIER_BP opt_event int err;
list = alloc_list(); - ABORT_ON(!list); + if (!list) + YYNOMEM; + err = parse_events_add_breakpoint(_parse_state, list, $2, $6, $4, $7); parse_events_terms__delete($7); @@ -534,7 +541,9 @@ PE_PREFIX_MEM PE_VALUE PE_BP_SLASH PE_VALUE opt_event_config int err;
list = alloc_list(); - ABORT_ON(!list); + if (!list) + YYNOMEM; + err = parse_events_add_breakpoint(_parse_state, list, $2, NULL, $4, $5); parse_events_terms__delete($5); @@ -551,7 +560,9 @@ PE_PREFIX_MEM PE_VALUE PE_BP_COLON PE_MODIFIER_BP opt_event_config int err;
list = alloc_list(); - ABORT_ON(!list); + if (!list) + YYNOMEM; + err = parse_events_add_breakpoint(_parse_state, list, $2, $4, 0, $5); parse_events_terms__delete($5); @@ -569,7 +580,8 @@ PE_PREFIX_MEM PE_VALUE opt_event_config int err;
list = alloc_list(); - ABORT_ON(!list); + if (!list) + YYNOMEM; err = parse_events_add_breakpoint(_parse_state, list, $2, NULL, 0, $3); parse_events_terms__delete($3); @@ -589,7 +601,8 @@ tracepoint_name opt_event_config int err;
list = alloc_list(); - ABORT_ON(!list); + if (!list) + YYNOMEM; if (error) error->idx = @1.first_column;
@@ -621,7 +634,8 @@ PE_VALUE ':' PE_VALUE opt_event_config int err;
list = alloc_list(); - ABORT_ON(!list); + if (!list) + YYNOMEM; err = parse_events_add_numeric(_parse_state, list, (u32)$1, $3, $4, /*wildcard=*/false); parse_events_terms__delete($4); @@ -640,7 +654,8 @@ PE_RAW opt_event_config u64 num;
list = alloc_list(); - ABORT_ON(!list); + if (!list) + YYNOMEM; errno = 0; num = strtoull($1 + 1, NULL, 16); ABORT_ON(errno); @@ -663,7 +678,8 @@ PE_BPF_OBJECT opt_event_config int err;
list = alloc_list(); - ABORT_ON(!list); + if (!list) + YYNOMEM; err = parse_events_load_bpf(parse_state, list, $1, false, $2); parse_events_terms__delete($2); free($1); @@ -680,7 +696,8 @@ PE_BPF_SOURCE opt_event_config int err;
list = alloc_list(); - ABORT_ON(!list); + if (!list) + YYNOMEM; err = parse_events_load_bpf(_parse_state, list, $1, true, $2); parse_events_terms__delete($2); if (err) { @@ -745,7 +762,8 @@ event_term struct list_head *head = malloc(sizeof(*head)); struct parse_events_term *term = $1;
- ABORT_ON(!head); + if (!head) + YYNOMEM; INIT_LIST_HEAD(head); list_add_tail(&term->list, head); $$ = head; @@ -922,7 +940,8 @@ PE_DRV_CFG_TERM struct parse_events_term *term; char *config = strdup($1);
- ABORT_ON(!config); + if (!config) + YYNOMEM; if (parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_DRV_CFG, config, $1, &@1, NULL)) { free($1); @@ -953,7 +972,8 @@ array_terms ',' array_term new_array.ranges = realloc($1.ranges, sizeof(new_array.ranges[0]) * new_array.nr_ranges); - ABORT_ON(!new_array.ranges); + if (!new_array.ranges) + YYNOMEM; memcpy(&new_array.ranges[$1.nr_ranges], $3.ranges, $3.nr_ranges * sizeof(new_array.ranges[0])); free($3.ranges); @@ -969,7 +989,8 @@ PE_VALUE
array.nr_ranges = 1; array.ranges = malloc(sizeof(array.ranges[0])); - ABORT_ON(!array.ranges); + if (!array.ranges) + YYNOMEM; array.ranges[0].start = $1; array.ranges[0].length = 1; $$ = array; @@ -982,7 +1003,8 @@ PE_VALUE PE_ARRAY_RANGE PE_VALUE ABORT_ON($3 < $1); array.nr_ranges = 1; array.ranges = malloc(sizeof(array.ranges[0])); - ABORT_ON(!array.ranges); + if (!array.ranges) + YYNOMEM; array.ranges[0].start = $1; array.ranges[0].length = $3 - $1 + 1; $$ = array;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ian Rogers irogers@google.com
[ Upstream commit b52cb995f1a559bc6e1a7cdc0ed0375503528541 ]
Add PE_ABORT that will YYNOMEM or YYABORT accordingly.
Signed-off-by: Ian Rogers irogers@google.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Athira Rajeev atrajeev@linux.vnet.ibm.com Cc: Ingo Molnar mingo@redhat.com Cc: Jiri Olsa jolsa@kernel.org Cc: Kan Liang kan.liang@linux.intel.com Cc: Mark Rutland mark.rutland@arm.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Stable-dep-of: b30d4f0b6954 ("perf parse-events: Additional error reporting") Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/parse-events.y | 134 ++++++++++++++++++++------------- 1 file changed, 82 insertions(+), 52 deletions(-)
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y index 78c1f49d8d7e4..24274c6cf85f1 100644 --- a/tools/perf/util/parse-events.y +++ b/tools/perf/util/parse-events.y @@ -28,6 +28,13 @@ do { \ YYABORT; \ } while (0)
+#define PE_ABORT(val) \ +do { \ + if (val == -ENOMEM) \ + YYNOMEM; \ + YYABORT; \ +} while (0) + static struct list_head* alloc_list(void) { struct list_head *list; @@ -385,7 +392,7 @@ PE_NAME sep_dc err = parse_events_multi_pmu_add(_parse_state, $1, NULL, &list); free($1); if (err < 0) - YYABORT; + PE_ABORT(err); $$ = list; } | @@ -461,7 +468,7 @@ value_sym '/' event_config '/' parse_events_terms__delete($3); if (err) { free_list_evsel(list); - YYABORT; + PE_ABORT(err); } $$ = list; } @@ -472,23 +479,28 @@ value_sym sep_slash_slash_dc int type = $1 >> 16; int config = $1 & 255; bool wildcard = (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE); + int err;
list = alloc_list(); if (!list) YYNOMEM; - ABORT_ON(parse_events_add_numeric(_parse_state, list, type, config, - /*head_config=*/NULL, wildcard)); + err = parse_events_add_numeric(_parse_state, list, type, config, /*head_config=*/NULL, wildcard); + if (err) + PE_ABORT(err); $$ = list; } | PE_VALUE_SYM_TOOL sep_slash_slash_dc { struct list_head *list; + int err;
list = alloc_list(); if (!list) YYNOMEM; - ABORT_ON(parse_events_add_tool(_parse_state, list, $1)); + err = parse_events_add_tool(_parse_state, list, $1); + if (err) + YYNOMEM; $$ = list; }
@@ -509,7 +521,7 @@ PE_LEGACY_CACHE opt_event_config free($1); if (err) { free_list_evsel(list); - YYABORT; + PE_ABORT(err); } $$ = list; } @@ -530,7 +542,7 @@ PE_PREFIX_MEM PE_VALUE PE_BP_SLASH PE_VALUE PE_BP_COLON PE_MODIFIER_BP opt_event free($6); if (err) { free(list); - YYABORT; + PE_ABORT(err); } $$ = list; } @@ -549,7 +561,7 @@ PE_PREFIX_MEM PE_VALUE PE_BP_SLASH PE_VALUE opt_event_config parse_events_terms__delete($5); if (err) { free(list); - YYABORT; + PE_ABORT(err); } $$ = list; } @@ -569,7 +581,7 @@ PE_PREFIX_MEM PE_VALUE PE_BP_COLON PE_MODIFIER_BP opt_event_config free($4); if (err) { free(list); - YYABORT; + PE_ABORT(err); } $$ = list; } @@ -587,7 +599,7 @@ PE_PREFIX_MEM PE_VALUE opt_event_config parse_events_terms__delete($3); if (err) { free(list); - YYABORT; + PE_ABORT(err); } $$ = list; } @@ -614,7 +626,7 @@ tracepoint_name opt_event_config free($1.event); if (err) { free(list); - YYABORT; + PE_ABORT(err); } $$ = list; } @@ -641,7 +653,7 @@ PE_VALUE ':' PE_VALUE opt_event_config parse_events_terms__delete($4); if (err) { free(list); - YYABORT; + PE_ABORT(err); } $$ = list; } @@ -665,7 +677,7 @@ PE_RAW opt_event_config parse_events_terms__delete($2); if (err) { free(list); - YYABORT; + PE_ABORT(err); } $$ = list; } @@ -685,7 +697,7 @@ PE_BPF_OBJECT opt_event_config free($1); if (err) { free(list); - YYABORT; + PE_ABORT(err); } $$ = list; } @@ -702,7 +714,7 @@ PE_BPF_SOURCE opt_event_config parse_events_terms__delete($2); if (err) { free(list); - YYABORT; + PE_ABORT(err); } $$ = list; } @@ -777,11 +789,12 @@ event_term: PE_RAW { struct parse_events_term *term; + int err = parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_RAW, + strdup("raw"), $1, &@1, &@1);
- if (parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_RAW, - strdup("raw"), $1, &@1, &@1)) { + if (err) { free($1); - YYABORT; + PE_ABORT(err); } $$ = term; } @@ -789,12 +802,12 @@ PE_RAW name_or_raw '=' name_or_legacy { struct parse_events_term *term; + int err = parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_USER, $1, $3, &@1, &@3);
- if (parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_USER, - $1, $3, &@1, &@3)) { + if (err) { free($1); free($3); - YYABORT; + PE_ABORT(err); } $$ = term; } @@ -802,11 +815,12 @@ name_or_raw '=' name_or_legacy name_or_raw '=' PE_VALUE { struct parse_events_term *term; + int err = parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_USER, + $1, $3, false, &@1, &@3);
- if (parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_USER, - $1, $3, false, &@1, &@3)) { + if (err) { free($1); - YYABORT; + PE_ABORT(err); } $$ = term; } @@ -814,12 +828,13 @@ name_or_raw '=' PE_VALUE name_or_raw '=' PE_TERM_HW { struct parse_events_term *term; + int err = parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_USER, + $1, $3.str, &@1, &@3);
- if (parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_USER, - $1, $3.str, &@1, &@3)) { + if (err) { free($1); free($3.str); - YYABORT; + PE_ABORT(err); } $$ = term; } @@ -827,11 +842,12 @@ name_or_raw '=' PE_TERM_HW PE_LEGACY_CACHE { struct parse_events_term *term; + int err = parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE, + $1, 1, true, &@1, NULL);
- if (parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE, - $1, 1, true, &@1, NULL)) { + if (err) { free($1); - YYABORT; + PE_ABORT(err); } $$ = term; } @@ -839,11 +855,12 @@ PE_LEGACY_CACHE PE_NAME { struct parse_events_term *term; + int err = parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_USER, + $1, 1, true, &@1, NULL);
- if (parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_USER, - $1, 1, true, &@1, NULL)) { + if (err) { free($1); - YYABORT; + PE_ABORT(err); } $$ = term; } @@ -851,11 +868,12 @@ PE_NAME PE_TERM_HW { struct parse_events_term *term; + int err = parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_HARDWARE, + $1.str, $1.num & 255, false, &@1, NULL);
- if (parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_HARDWARE, - $1.str, $1.num & 255, false, &@1, NULL)) { + if (err) { free($1.str); - YYABORT; + PE_ABORT(err); } $$ = term; } @@ -863,10 +881,11 @@ PE_TERM_HW PE_TERM '=' name_or_legacy { struct parse_events_term *term; + int err = parse_events_term__str(&term, (int)$1, NULL, $3, &@1, &@3);
- if (parse_events_term__str(&term, (int)$1, NULL, $3, &@1, &@3)) { + if (err) { free($3); - YYABORT; + PE_ABORT(err); } $$ = term; } @@ -874,10 +893,11 @@ PE_TERM '=' name_or_legacy PE_TERM '=' PE_TERM_HW { struct parse_events_term *term; + int err = parse_events_term__str(&term, (int)$1, NULL, $3.str, &@1, &@3);
- if (parse_events_term__str(&term, (int)$1, NULL, $3.str, &@1, &@3)) { + if (err) { free($3.str); - YYABORT; + PE_ABORT(err); } $$ = term; } @@ -885,37 +905,46 @@ PE_TERM '=' PE_TERM_HW PE_TERM '=' PE_TERM { struct parse_events_term *term; + int err = parse_events_term__term(&term, (int)$1, (int)$3, &@1, &@3); + + if (err) + PE_ABORT(err);
- ABORT_ON(parse_events_term__term(&term, (int)$1, (int)$3, &@1, &@3)); $$ = term; } | PE_TERM '=' PE_VALUE { struct parse_events_term *term; + int err = parse_events_term__num(&term, (int)$1, NULL, $3, false, &@1, &@3); + + if (err) + PE_ABORT(err);
- ABORT_ON(parse_events_term__num(&term, (int)$1, NULL, $3, false, &@1, &@3)); $$ = term; } | PE_TERM { struct parse_events_term *term; + int err = parse_events_term__num(&term, (int)$1, NULL, 1, true, &@1, NULL); + + if (err) + PE_ABORT(err);
- ABORT_ON(parse_events_term__num(&term, (int)$1, NULL, 1, true, &@1, NULL)); $$ = term; } | name_or_raw array '=' name_or_legacy { struct parse_events_term *term; + int err = parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_USER, $1, $4, &@1, &@4);
- if (parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_USER, - $1, $4, &@1, &@4)) { + if (err) { free($1); free($4); free($2.ranges); - YYABORT; + PE_ABORT(err); } term->array = $2; $$ = term; @@ -924,12 +953,12 @@ name_or_raw array '=' name_or_legacy name_or_raw array '=' PE_VALUE { struct parse_events_term *term; + int err = parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_USER, $1, $4, false, &@1, &@4);
- if (parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_USER, - $1, $4, false, &@1, &@4)) { + if (err) { free($1); free($2.ranges); - YYABORT; + PE_ABORT(err); } term->array = $2; $$ = term; @@ -939,14 +968,15 @@ PE_DRV_CFG_TERM { struct parse_events_term *term; char *config = strdup($1); + int err;
if (!config) YYNOMEM; - if (parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_DRV_CFG, - config, $1, &@1, NULL)) { + err = parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_DRV_CFG, config, $1, &@1, NULL); + if (err) { free($1); free(config); - YYABORT; + PE_ABORT(err); } $$ = term; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ian Rogers irogers@google.com
[ Upstream commit b30d4f0b695428f513c561eeaea52e042ef48550 ]
When no events or PMUs match report an error for event_pmu:
Before: ``` $ perf stat -e 'asdfasdf' -a sleep 1 Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events ```
After: ``` $ perf stat -e 'asdfasdf' -a sleep 1 event syntax error: 'asdfasdf' ___ Bad event name
Unabled to find PMU or event on a PMU of 'asdfasdf' Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events ```
Fixes the inadvertent removal when hybrid parsing was modified.
Fixes: 70c90e4a6b2fbe77 ("perf parse-events: Avoid scanning PMUs before parsing") Signed-off-by: Ian Rogers irogers@google.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Athira Rajeev atrajeev@linux.vnet.ibm.com Cc: Ian Rogers irogers@google.com Cc: Ingo Molnar mingo@redhat.com Cc: Jiri Olsa jolsa@kernel.org Cc: Kan Liang kan.liang@linux.intel.com Cc: Mark Rutland mark.rutland@arm.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-11-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/parse-events.y | 25 +++++++++++++++++++------ 1 file changed, 19 insertions(+), 6 deletions(-)
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y index 24274c6cf85f1..c590cf7f02a45 100644 --- a/tools/perf/util/parse-events.y +++ b/tools/perf/util/parse-events.y @@ -293,7 +293,6 @@ PE_NAME opt_pmu_config { struct parse_events_state *parse_state = _parse_state; struct list_head *list = NULL, *orig_terms = NULL, *terms= NULL; - struct parse_events_error *error = parse_state->error; char *pattern = NULL;
#define CLEANUP \ @@ -305,9 +304,6 @@ PE_NAME opt_pmu_config free(pattern); \ } while(0)
- if (error) - error->idx = @1.first_column; - if (parse_events_copy_term_list($2, &orig_terms)) { CLEANUP; YYNOMEM; @@ -362,6 +358,14 @@ PE_NAME opt_pmu_config $2 = NULL; } if (!ok) { + struct parse_events_error *error = parse_state->error; + char *help; + + if (asprintf(&help, "Unabled to find PMU or event on a PMU of '%s'", $1) < 0) + help = NULL; + parse_events_error__handle(error, @1.first_column, + strdup("Bad event or PMU"), + help); CLEANUP; YYABORT; } @@ -390,9 +394,18 @@ PE_NAME sep_dc int err;
err = parse_events_multi_pmu_add(_parse_state, $1, NULL, &list); - free($1); - if (err < 0) + if (err < 0) { + struct parse_events_state *parse_state = _parse_state; + struct parse_events_error *error = parse_state->error; + char *help; + + if (asprintf(&help, "Unabled to find PMU or event on a PMU of '%s'", $1) < 0) + help = NULL; + parse_events_error__handle(error, @1.first_column, strdup("Bad event name"), help); + free($1); PE_ABORT(err); + } + free($1); $$ = list; } |
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
[ Upstream commit 389fbbec261b2842fd0e34b26a2b288b122cc406 ]
Immediately mark NMIs as unmasked in response to #VMGEXIT(NMI complete) instead of setting awaiting_iret_completion and waiting until the *next* VM-Exit to unmask NMIs. The whole point of "NMI complete" is that the guest is responsible for telling the hypervisor when it's safe to inject an NMI, i.e. there's no need to wait. And because there's no IRET to single-step, the next VM-Exit could be a long time coming, i.e. KVM could incorrectly hold an NMI pending for far longer than what is required and expected.
Opportunistically fix a stale reference to HF_IRET_MASK.
Fixes: 916b54a7688b ("KVM: x86: Move HF_NMI_MASK and HF_IRET_MASK into "struct vcpu_svm"") Fixes: 4444dfe4050b ("KVM: SVM: Add NMI support for an SEV-ES guest") Cc: Tom Lendacky thomas.lendacky@amd.com Link: https://lore.kernel.org/r/20230615063757.3039121-9-aik@amd.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/svm/sev.c | 5 ++++- arch/x86/kvm/svm/svm.c | 10 +++++----- 2 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index d3aec1f2cad20..42630f5b11875 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -2881,7 +2881,10 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu) svm->sev_es.ghcb_sa); break; case SVM_VMGEXIT_NMI_COMPLETE: - ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET); + ++vcpu->stat.nmi_window_exits; + svm->nmi_masked = false; + kvm_make_request(KVM_REQ_EVENT, vcpu); + ret = 1; break; case SVM_VMGEXIT_AP_HLT_LOOP: ret = kvm_emulate_ap_reset_hold(vcpu); diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index d4bfdc607fe7f..dfb8a3f504322 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -2510,12 +2510,13 @@ static int iret_interception(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu);
+ WARN_ON_ONCE(sev_es_guest(vcpu->kvm)); + ++vcpu->stat.nmi_window_exits; svm->awaiting_iret_completion = true;
svm_clr_iret_intercept(svm); - if (!sev_es_guest(vcpu->kvm)) - svm->nmi_iret_rip = kvm_rip_read(vcpu); + svm->nmi_iret_rip = kvm_rip_read(vcpu);
kvm_make_request(KVM_REQ_EVENT, vcpu); return 1; @@ -3918,12 +3919,11 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu) svm->soft_int_injected = false;
/* - * If we've made progress since setting HF_IRET_MASK, we've + * If we've made progress since setting awaiting_iret_completion, we've * executed an IRET and can allow NMI injection. */ if (svm->awaiting_iret_completion && - (sev_es_guest(vcpu->kvm) || - kvm_rip_read(vcpu) != svm->nmi_iret_rip)) { + kvm_rip_read(vcpu) != svm->nmi_iret_rip) { svm->awaiting_iret_completion = false; svm->nmi_masked = false; kvm_make_request(KVM_REQ_EVENT, vcpu);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Torokhov dmitry.torokhov@gmail.com
[ Upstream commit 687fe7dfb736b03ab820d172ea5dbfc1ec447135 ]
Remove option having i2c client contain raw gpio number instead of proper IRQ number. There are no users of this facility in mainline and it will allow cleaning up the driver code with regard to wakeup handling, etc.
Link: https://lore.kernel.org/r/20230724053024.352054-1-dmitry.torokhov@gmail.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Stable-dep-of: cc141c35af87 ("Input: tca6416-keypad - fix interrupt enable disbalance") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/input/keyboard/tca6416-keypad.c | 27 +++++++++---------------- include/linux/tca6416_keypad.h | 1 - 2 files changed, 10 insertions(+), 18 deletions(-)
diff --git a/drivers/input/keyboard/tca6416-keypad.c b/drivers/input/keyboard/tca6416-keypad.c index 2f745cabf4f24..01bc0b8811882 100644 --- a/drivers/input/keyboard/tca6416-keypad.c +++ b/drivers/input/keyboard/tca6416-keypad.c @@ -148,7 +148,7 @@ static int tca6416_keys_open(struct input_dev *dev) if (chip->use_polling) schedule_delayed_work(&chip->dwork, msecs_to_jiffies(100)); else - enable_irq(chip->irqnum); + enable_irq(chip->client->irq);
return 0; } @@ -160,7 +160,7 @@ static void tca6416_keys_close(struct input_dev *dev) if (chip->use_polling) cancel_delayed_work_sync(&chip->dwork); else - disable_irq(chip->irqnum); + disable_irq(chip->client->irq); }
static int tca6416_setup_registers(struct tca6416_keypad_chip *chip) @@ -266,12 +266,7 @@ static int tca6416_keypad_probe(struct i2c_client *client) goto fail1;
if (!chip->use_polling) { - if (pdata->irq_is_gpio) - chip->irqnum = gpio_to_irq(client->irq); - else - chip->irqnum = client->irq; - - error = request_threaded_irq(chip->irqnum, NULL, + error = request_threaded_irq(client->irq, NULL, tca6416_keys_isr, IRQF_TRIGGER_FALLING | IRQF_ONESHOT | IRQF_NO_AUTOEN, @@ -279,7 +274,7 @@ static int tca6416_keypad_probe(struct i2c_client *client) if (error) { dev_dbg(&client->dev, "Unable to claim irq %d; error %d\n", - chip->irqnum, error); + client->irq, error); goto fail1; } } @@ -298,8 +293,8 @@ static int tca6416_keypad_probe(struct i2c_client *client)
fail2: if (!chip->use_polling) { - free_irq(chip->irqnum, chip); - enable_irq(chip->irqnum); + free_irq(client->irq, chip); + enable_irq(client->irq); } fail1: input_free_device(input); @@ -312,8 +307,8 @@ static void tca6416_keypad_remove(struct i2c_client *client) struct tca6416_keypad_chip *chip = i2c_get_clientdata(client);
if (!chip->use_polling) { - free_irq(chip->irqnum, chip); - enable_irq(chip->irqnum); + free_irq(client->irq, chip); + enable_irq(client->irq); }
input_unregister_device(chip->input); @@ -323,10 +318,9 @@ static void tca6416_keypad_remove(struct i2c_client *client) static int tca6416_keypad_suspend(struct device *dev) { struct i2c_client *client = to_i2c_client(dev); - struct tca6416_keypad_chip *chip = i2c_get_clientdata(client);
if (device_may_wakeup(dev)) - enable_irq_wake(chip->irqnum); + enable_irq_wake(client->irq);
return 0; } @@ -334,10 +328,9 @@ static int tca6416_keypad_suspend(struct device *dev) static int tca6416_keypad_resume(struct device *dev) { struct i2c_client *client = to_i2c_client(dev); - struct tca6416_keypad_chip *chip = i2c_get_clientdata(client);
if (device_may_wakeup(dev)) - disable_irq_wake(chip->irqnum); + disable_irq_wake(client->irq);
return 0; } diff --git a/include/linux/tca6416_keypad.h b/include/linux/tca6416_keypad.h index b0d36a9934ccd..5cf6f6f82aa70 100644 --- a/include/linux/tca6416_keypad.h +++ b/include/linux/tca6416_keypad.h @@ -25,7 +25,6 @@ struct tca6416_keys_platform_data { unsigned int rep:1; /* enable input subsystem auto repeat */ uint16_t pinmask; uint16_t invert; - int irq_is_gpio; int use_polling; /* use polling if Interrupt is not connected*/ }; #endif
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Torokhov dmitry.torokhov@gmail.com
[ Upstream commit cc141c35af873c6796e043adcb820833bd8ef8c5 ]
The driver has been switched to use IRQF_NO_AUTOEN, but in the error unwinding and remove paths calls to enable_irq() were left in place, which will lead to an incorrect enable counter value.
Fixes: bcd9730a04a1 ("Input: move to use request_irq by IRQF_NO_AUTOEN flag") Link: https://lore.kernel.org/r/20230724053024.352054-3-dmitry.torokhov@gmail.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/input/keyboard/tca6416-keypad.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/input/keyboard/tca6416-keypad.c b/drivers/input/keyboard/tca6416-keypad.c index 01bc0b8811882..d20cbddfae68c 100644 --- a/drivers/input/keyboard/tca6416-keypad.c +++ b/drivers/input/keyboard/tca6416-keypad.c @@ -292,10 +292,8 @@ static int tca6416_keypad_probe(struct i2c_client *client) return 0;
fail2: - if (!chip->use_polling) { + if (!chip->use_polling) free_irq(client->irq, chip); - enable_irq(client->irq); - } fail1: input_free_device(input); kfree(chip); @@ -306,10 +304,8 @@ static void tca6416_keypad_remove(struct i2c_client *client) { struct tca6416_keypad_chip *chip = i2c_get_clientdata(client);
- if (!chip->use_polling) { + if (!chip->use_polling) free_irq(client->irq, chip); - enable_irq(client->irq); - }
input_unregister_device(chip->input); kfree(chip);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnaldo Carvalho de Melo acme@redhat.com
[ Upstream commit 979e9c9fc9c2a761303585e07fe2699bdd88182f ]
In 616b14b47a86d880 ("perf build: Conditionally define NDEBUG") we started using NDEBUG=1 when DEBUG=1 isn't present, so code that is enclosed with assert() is not called.
In dd317df072071903 ("perf build: Make binutil libraries opt in") we stopped linking against binutils-devel, for licensing reasons.
Recently people asked me why annotation of BPF programs wasn't working, i.e. this:
$ perf annotate bpf_prog_5280546344e3f45c_kfree_skb
was returning:
case SYMBOL_ANNOTATE_ERRNO__NO_LIBOPCODES_FOR_BPF: scnprintf(buf, buflen, "Please link with binutils's libopcode to enable BPF annotation");
This was on a fedora rpm, so its new enough that I had to try to test by rebuilding using BUILD_NONDISTRO=1, only to get it segfaulting on me.
This combination made this libopcode function not to be called:
assert(bfd_check_format(bfdf, bfd_object));
Changing it to:
if (!bfd_check_format(bfdf, bfd_object)) abort();
Made it work, looking at this "check" function made me realize it changes the 'bfdf' internal state, i.e. we better call it.
So stop using assert() on it, just call it and abort if it fails.
Probably it is better to propagate the error, etc, but it seems it is unlikely to fail from the usage done so far and we really need to stop using libopcodes, so do the quick fix above and move on.
With it we have BPF annotation back working when built with BUILD_NONDISTRO=1:
⬢[acme@toolbox perf-tools-next]$ perf annotate --stdio2 bpf_prog_5280546344e3f45c_kfree_skb | head No kallsyms or vmlinux with build-id 939bc71a1a51cdc434e60af93c7e734f7d5c0e7e was found Samples: 12 of event 'cpu-clock:ppp', 4000 Hz, Event count (approx.): 3000000, [percent: local period] bpf_prog_5280546344e3f45c_kfree_skb() bpf_prog_5280546344e3f45c_kfree_skb Percent int kfree_skb(struct trace_event_raw_kfree_skb *args) { nop 33.33 xchg %ax,%ax push %rbp mov %rsp,%rbp sub $0x180,%rsp push %rbx push %r13 ⬢[acme@toolbox perf-tools-next]$
Fixes: 6987561c9e86eace ("perf annotate: Enable annotation of BPF programs") Cc: Adrian Hunter adrian.hunter@intel.com Cc: Ian Rogers irogers@google.com Cc: Jiri Olsa jolsa@kernel.org Cc: Mohamed Mahmoud mmahmoud@redhat.com Cc: Namhyung Kim namhyung@kernel.org Cc: Dave Tucker datucker@redhat.com Cc: Derek Barbosa debarbos@redhat.com Cc: Song Liu songliubraving@fb.com Link: https://lore.kernel.org/lkml/ZMrMzoQBe0yqMek1@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/annotate.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index ba988a13dacb6..82956adf99632 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -1846,8 +1846,11 @@ static int symbol__disassemble_bpf(struct symbol *sym, perf_exe(tpath, sizeof(tpath));
bfdf = bfd_openr(tpath, NULL); - assert(bfdf); - assert(bfd_check_format(bfdf, bfd_object)); + if (bfdf == NULL) + abort(); + + if (!bfd_check_format(bfdf, bfd_object)) + abort();
s = open_memstream(&buf, &buf_size); if (!s) { @@ -1895,7 +1898,8 @@ static int symbol__disassemble_bpf(struct symbol *sym, #else disassemble = disassembler(bfdf); #endif - assert(disassemble); + if (disassemble == NULL) + abort();
fflush(s); do {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
[ Upstream commit 5df8ecfe3632d5879d1f154f7aa8de441b5d1c89 ]
Drop the explicit check on the extended CPUID level in cpu_has_svm(), the kernel's cached CPUID info will leave the entire SVM leaf unset if said leaf is not supported by hardware. Prior to using cached information, the check was needed to avoid false positives due to Intel's rather crazy CPUID behavior of returning the values of the maximum supported leaf if the specified leaf is unsupported.
Fixes: 682a8108872f ("x86/kvm/svm: Simplify cpu_has_svm()") Link: https://lore.kernel.org/r/20230721201859.2307736-13-seanjc@google.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/virtext.h | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h index 3b12e6b994123..6c2e3ff3cb28f 100644 --- a/arch/x86/include/asm/virtext.h +++ b/arch/x86/include/asm/virtext.h @@ -101,12 +101,6 @@ static inline int cpu_has_svm(const char **msg) return 0; }
- if (boot_cpu_data.extended_cpuid_level < SVM_CPUID_FUNC) { - if (msg) - *msg = "can't execute cpuid_8000000a"; - return 0; - } - if (!boot_cpu_has(X86_FEATURE_SVM)) { if (msg) *msg = "svm not available";
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Babrou ivan@cloudflare.com
[ Upstream commit 8c49c6e1a7b790c4cb9f464c5485117451d91c60 ]
Commit 3fd7a168bf51 ("perf script: Add 'cgroup' field for output") added support for printing cgroup path in perf script output.
It was okay if you didn't want any stacks:
$ sudo perf script --comms jpegtran:23f4bf -F comm,tid,cpu,time,cgroup jpegtran:23f4bf 3321915 [013] 404718.587488: /idle.slice/polish.service jpegtran:23f4bf 3321915 [031] 404718.592073: /idle.slice/polish.service
With stacks it gets messier as cgroup is printed after the stack:
$ perf script --comms jpegtran:23f4bf -F comm,tid,cpu,time,cgroup,ip,sym jpegtran:23f4bf 3321915 [013] 404718.587488: 5c554 compress_output 570d9 jpeg_finish_compress 3476e jpegtran_main 330ee jpegtran::main 326e2 core::ops::function::FnOnce::call_once (inlined) 326e2 std::sys_common::backtrace::__rust_begin_short_backtrace /idle.slice/polish.service jpegtran:23f4bf 3321915 [031] 404718.592073: 8474d jsimd_encode_mcu_AC_first_prepare_sse2.PADDING 55af68e62fff [unknown] /idle.slice/polish.service
Let's instead print cgroup on the same line as comm:
$ perf script --comms jpegtran:23f4bf -F comm,tid,cpu,time,cgroup,ip,sym jpegtran:23f4bf 3321915 [013] 404718.587488: /idle.slice/polish.service 5c554 compress_output 570d9 jpeg_finish_compress 3476e jpegtran_main 330ee jpegtran::main 326e2 core::ops::function::FnOnce::call_once (inlined) 326e2 std::sys_common::backtrace::__rust_begin_short_backtrace
jpegtran:23f4bf 3321915 [031] 404718.592073: /idle.slice/polish.service 8474d jsimd_encode_mcu_AC_first_prepare_sse2.PADDING 55af68e62fff [unknown]
Fixes: 3fd7a168bf514979 ("perf script: Add 'cgroup' field for output") Signed-off-by: Ivan Babrou ivan@cloudflare.com Acked-by: Ian Rogers irogers@google.com Acked-by: Namhyung Kim namhyung@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Jiri Olsa jolsa@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Peter Zijlstra peterz@infradead.org Cc: kernel-team@cloudflare.com Link: https://lore.kernel.org/r/20230718000737.49077-1-ivan@cloudflare.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/builtin-script.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 200b3e7ea8dad..517bf25750c8b 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -2199,6 +2199,17 @@ static void process_event(struct perf_script *script, if (PRINT_FIELD(RETIRE_LAT)) fprintf(fp, "%16" PRIu16, sample->retire_lat);
+ if (PRINT_FIELD(CGROUP)) { + const char *cgrp_name; + struct cgroup *cgrp = cgroup__find(machine->env, + sample->cgroup); + if (cgrp != NULL) + cgrp_name = cgrp->name; + else + cgrp_name = "unknown"; + fprintf(fp, " %s", cgrp_name); + } + if (PRINT_FIELD(IP)) { struct callchain_cursor *cursor = NULL;
@@ -2243,17 +2254,6 @@ static void process_event(struct perf_script *script, if (PRINT_FIELD(CODE_PAGE_SIZE)) fprintf(fp, " %s", get_page_size_name(sample->code_page_size, str));
- if (PRINT_FIELD(CGROUP)) { - const char *cgrp_name; - struct cgroup *cgrp = cgroup__find(machine->env, - sample->cgroup); - if (cgrp != NULL) - cgrp_name = cgrp->name; - else - cgrp_name = "unknown"; - fprintf(fp, " %s", cgrp_name); - } - perf_sample__fprintf_ipc(sample, attr, fp);
fprintf(fp, "\n");
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namhyung Kim namhyung@kernel.org
[ Upstream commit dc7f01f1bceca38839992b3371e0be8a3c9d5acf ]
For logical OR operator, the actual sample_flags are in the 'groups' list so it needs to check entries in the list instead. Otherwise it would show the following error message.
$ sudo perf record -a -e cycles:p --filter 'period > 100 || weight > 0' sleep 1 Error: cycles:p event does not have sample flags 0 failed to set filter "BPF" on event cycles:p with 2 (No such file or directory)
Actually it should warn on 'weight' is used without WEIGHT flag.
Error: cycles:p event does not have PERF_SAMPLE_WEIGHT Hint: please add -W option to perf record failed to set filter "BPF" on event cycles:p with 2 (No such file or directory)
Fixes: 4310551b76e0d676 ("perf bpf filter: Show warning for missing sample flags") Reviewed-by: Ian Rogers irogers@google.com Signed-off-by: Namhyung Kim namhyung@kernel.org Tested-by: Arnaldo Carvalho de Melo acme@redhat.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Ingo Molnar mingo@kernel.org Cc: Jiri Olsa jolsa@kernel.org Cc: Peter Zijlstra peterz@infradead.org Link: https://lore.kernel.org/r/20230811025822.3859771-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/bpf-filter.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/tools/perf/util/bpf-filter.c b/tools/perf/util/bpf-filter.c index 0b30688d78a7f..a1f076ef653d3 100644 --- a/tools/perf/util/bpf-filter.c +++ b/tools/perf/util/bpf-filter.c @@ -62,6 +62,16 @@ static int check_sample_flags(struct evsel *evsel, struct perf_bpf_filter_expr * if (evsel->core.attr.sample_type & expr->sample_flags) return 0;
+ if (expr->op == PBF_OP_GROUP_BEGIN) { + struct perf_bpf_filter_expr *group; + + list_for_each_entry(group, &expr->groups, list) { + if (check_sample_flags(evsel, group) < 0) + return -1; + } + return 0; + } + info = get_sample_info(expr->sample_flags); if (info == NULL) { pr_err("Error: %s event does not have sample flags %lx\n",
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnaldo Carvalho de Melo acme@kernel.org
[ Upstream commit 42c6dd9d23019ff339d0aca80a444eb71087050e ]
As thread__find_symbol_fb() will end up calling thread__find_map() and it in turn will call these on uninitialized memory:
maps__zput(al->maps); map__zput(al->map); thread__zput(al->thread);
Fixes: 0dd5041c9a0eaf8c ("perf addr_location: Add init/exit/copy functions") Reviewed-by: Ian Rogers irogers@google.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Cc: Athira Rajeev atrajeev@linux.vnet.ibm.com Cc: Disha Goel disgoel@linux.vnet.ibm.com Cc: Jiri Olsa jolsa@kernel.org Cc: Kajol Jain kjain@linux.ibm.com Cc: Madhavan Srinivasan maddy@linux.ibm.com Cc: Namhyung Kim namhyung@kernel.org Link: https://lore.kernel.org/r/20230731091857.10681-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/dlfilter.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c index 46f74b2344dbb..798a53d7e6c9d 100644 --- a/tools/perf/util/dlfilter.c +++ b/tools/perf/util/dlfilter.c @@ -166,6 +166,7 @@ static __s32 dlfilter__resolve_address(void *ctx, __u64 address, struct perf_dlf if (!thread) return -1;
+ addr_location__init(&al); thread__find_symbol_fb(thread, d->sample->cpumode, address, &al);
al_to_d_al(&al, &d_al);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Adrian Hunter adrian.hunter@intel.com
[ Upstream commit 82b0a10390e5f198a4e23c9cc6a7307d2cf099f3 ]
Add perf_dlfilter_fns.al_cleanup() to do addr_location__exit() on data passed via perf_dlfilter_fns.resolve_address().
Add dlfilter-test-api-v2 to the "dlfilter C API" test to test it.
Update documentation, clarifying that data returned by APIs should not be dereferenced after filter_event() and filter_event_early() return.
Fixes: 0dd5041c9a0eaf8c ("perf addr_location: Add init/exit/copy functions") Reviewed-by: Ian Rogers irogers@google.com Signed-off-by: Adrian Hunter adrian.hunter@intel.com Cc: Jiri Olsa jolsa@kernel.org Cc: Namhyung Kim namhyung@kernel.org Link: https://lore.kernel.org/r/20230731091857.10681-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/Documentation/perf-dlfilter.txt | 22 +- tools/perf/Makefile.perf | 2 +- tools/perf/dlfilters/dlfilter-test-api-v2.c | 377 ++++++++++++++++++++ tools/perf/include/perf/perf_dlfilter.h | 11 +- tools/perf/tests/dlfilter-test.c | 38 +- tools/perf/util/dlfilter.c | 29 ++ 6 files changed, 464 insertions(+), 15 deletions(-) create mode 100644 tools/perf/dlfilters/dlfilter-test-api-v2.c
diff --git a/tools/perf/Documentation/perf-dlfilter.txt b/tools/perf/Documentation/perf-dlfilter.txt index fb22e3b31dc5c..8887cc20a809e 100644 --- a/tools/perf/Documentation/perf-dlfilter.txt +++ b/tools/perf/Documentation/perf-dlfilter.txt @@ -64,6 +64,12 @@ internal filtering. If implemented, 'filter_description' should return a one-line description of the filter, and optionally a longer description.
+Do not assume the 'sample' argument is valid (dereferenceable) +after 'filter_event' and 'filter_event_early' return. + +Do not assume data referenced by pointers in struct perf_dlfilter_sample +is valid (dereferenceable) after 'filter_event' and 'filter_event_early' return. + The perf_dlfilter_sample structure ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -150,7 +156,8 @@ struct perf_dlfilter_fns { const char *(*srcline)(void *ctx, __u32 *line_number); struct perf_event_attr *(*attr)(void *ctx); __s32 (*object_code)(void *ctx, __u64 ip, void *buf, __u32 len); - void *(*reserved[120])(void *); + void (*al_cleanup)(void *ctx, struct perf_dlfilter_al *al); + void *(*reserved[119])(void *); }; ----
@@ -161,7 +168,8 @@ struct perf_dlfilter_fns { 'args' returns arguments from --dlarg options.
'resolve_address' provides information about 'address'. al->size must be set -before calling. Returns 0 on success, -1 otherwise. +before calling. Returns 0 on success, -1 otherwise. Call al_cleanup() (if present, +see below) when 'al' data is no longer needed.
'insn' returns instruction bytes and length.
@@ -171,6 +179,12 @@ before calling. Returns 0 on success, -1 otherwise.
'object_code' reads object code and returns the number of bytes read.
+'al_cleanup' must be called (if present, so check perf_dlfilter_fns.al_cleanup != NULL) +after resolve_address() to free any associated resources. + +Do not assume pointers obtained via perf_dlfilter_fns are valid (dereferenceable) +after 'filter_event' and 'filter_event_early' return. + The perf_dlfilter_al structure ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -197,9 +211,13 @@ struct perf_dlfilter_al { /* Below members are only populated by resolve_ip() */ __u8 filtered; /* true if this sample event will be filtered out */ const char *comm; + void *priv; /* Private data. Do not change */ }; ----
+Do not assume data referenced by pointers in struct perf_dlfilter_al +is valid (dereferenceable) after 'filter_event' and 'filter_event_early' return. + perf_dlfilter_sample flags ~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf index 097316ef38e6a..f178b36c69402 100644 --- a/tools/perf/Makefile.perf +++ b/tools/perf/Makefile.perf @@ -381,7 +381,7 @@ ifndef NO_JVMTI PROGRAMS += $(OUTPUT)$(LIBJVMTI) endif
-DLFILTERS := dlfilter-test-api-v0.so dlfilter-show-cycles.so +DLFILTERS := dlfilter-test-api-v0.so dlfilter-test-api-v2.so dlfilter-show-cycles.so DLFILTERS := $(patsubst %,$(OUTPUT)dlfilters/%,$(DLFILTERS))
# what 'all' will build and 'install' will install, in perfexecdir diff --git a/tools/perf/dlfilters/dlfilter-test-api-v2.c b/tools/perf/dlfilters/dlfilter-test-api-v2.c new file mode 100644 index 0000000000000..38e593d92920c --- /dev/null +++ b/tools/perf/dlfilters/dlfilter-test-api-v2.c @@ -0,0 +1,377 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Test v2 API for perf --dlfilter shared object + * Copyright (c) 2023, Intel Corporation. + */ +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <stdbool.h> + +/* + * Copy v2 API instead of including current API + */ +#include <linux/perf_event.h> +#include <linux/types.h> + +/* + * The following macro can be used to determine if this header defines + * perf_dlfilter_sample machine_pid and vcpu. + */ +#define PERF_DLFILTER_HAS_MACHINE_PID + +/* Definitions for perf_dlfilter_sample flags */ +enum { + PERF_DLFILTER_FLAG_BRANCH = 1ULL << 0, + PERF_DLFILTER_FLAG_CALL = 1ULL << 1, + PERF_DLFILTER_FLAG_RETURN = 1ULL << 2, + PERF_DLFILTER_FLAG_CONDITIONAL = 1ULL << 3, + PERF_DLFILTER_FLAG_SYSCALLRET = 1ULL << 4, + PERF_DLFILTER_FLAG_ASYNC = 1ULL << 5, + PERF_DLFILTER_FLAG_INTERRUPT = 1ULL << 6, + PERF_DLFILTER_FLAG_TX_ABORT = 1ULL << 7, + PERF_DLFILTER_FLAG_TRACE_BEGIN = 1ULL << 8, + PERF_DLFILTER_FLAG_TRACE_END = 1ULL << 9, + PERF_DLFILTER_FLAG_IN_TX = 1ULL << 10, + PERF_DLFILTER_FLAG_VMENTRY = 1ULL << 11, + PERF_DLFILTER_FLAG_VMEXIT = 1ULL << 12, +}; + +/* + * perf sample event information (as per perf script and <linux/perf_event.h>) + */ +struct perf_dlfilter_sample { + __u32 size; /* Size of this structure (for compatibility checking) */ + __u16 ins_lat; /* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */ + __u16 p_stage_cyc; /* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */ + __u64 ip; + __s32 pid; + __s32 tid; + __u64 time; + __u64 addr; + __u64 id; + __u64 stream_id; + __u64 period; + __u64 weight; /* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */ + __u64 transaction; /* Refer PERF_SAMPLE_TRANSACTION in <linux/perf_event.h> */ + __u64 insn_cnt; /* For instructions-per-cycle (IPC) */ + __u64 cyc_cnt; /* For instructions-per-cycle (IPC) */ + __s32 cpu; + __u32 flags; /* Refer PERF_DLFILTER_FLAG_* above */ + __u64 data_src; /* Refer PERF_SAMPLE_DATA_SRC in <linux/perf_event.h> */ + __u64 phys_addr; /* Refer PERF_SAMPLE_PHYS_ADDR in <linux/perf_event.h> */ + __u64 data_page_size; /* Refer PERF_SAMPLE_DATA_PAGE_SIZE in <linux/perf_event.h> */ + __u64 code_page_size; /* Refer PERF_SAMPLE_CODE_PAGE_SIZE in <linux/perf_event.h> */ + __u64 cgroup; /* Refer PERF_SAMPLE_CGROUP in <linux/perf_event.h> */ + __u8 cpumode; /* Refer CPUMODE_MASK etc in <linux/perf_event.h> */ + __u8 addr_correlates_sym; /* True => resolve_addr() can be called */ + __u16 misc; /* Refer perf_event_header in <linux/perf_event.h> */ + __u32 raw_size; /* Refer PERF_SAMPLE_RAW in <linux/perf_event.h> */ + const void *raw_data; /* Refer PERF_SAMPLE_RAW in <linux/perf_event.h> */ + __u64 brstack_nr; /* Number of brstack entries */ + const struct perf_branch_entry *brstack; /* Refer <linux/perf_event.h> */ + __u64 raw_callchain_nr; /* Number of raw_callchain entries */ + const __u64 *raw_callchain; /* Refer <linux/perf_event.h> */ + const char *event; + __s32 machine_pid; + __s32 vcpu; +}; + +/* + * Address location (as per perf script) + */ +struct perf_dlfilter_al { + __u32 size; /* Size of this structure (for compatibility checking) */ + __u32 symoff; + const char *sym; + __u64 addr; /* Mapped address (from dso) */ + __u64 sym_start; + __u64 sym_end; + const char *dso; + __u8 sym_binding; /* STB_LOCAL, STB_GLOBAL or STB_WEAK, refer <elf.h> */ + __u8 is_64_bit; /* Only valid if dso is not NULL */ + __u8 is_kernel_ip; /* True if in kernel space */ + __u32 buildid_size; + __u8 *buildid; + /* Below members are only populated by resolve_ip() */ + __u8 filtered; /* True if this sample event will be filtered out */ + const char *comm; + void *priv; /* Private data (v2 API) */ +}; + +struct perf_dlfilter_fns { + /* Return information about ip */ + const struct perf_dlfilter_al *(*resolve_ip)(void *ctx); + /* Return information about addr (if addr_correlates_sym) */ + const struct perf_dlfilter_al *(*resolve_addr)(void *ctx); + /* Return arguments from --dlarg option */ + char **(*args)(void *ctx, int *dlargc); + /* + * Return information about address (al->size must be set before + * calling). Returns 0 on success, -1 otherwise. Call al_cleanup() + * when 'al' data is no longer needed. + */ + __s32 (*resolve_address)(void *ctx, __u64 address, struct perf_dlfilter_al *al); + /* Return instruction bytes and length */ + const __u8 *(*insn)(void *ctx, __u32 *length); + /* Return source file name and line number */ + const char *(*srcline)(void *ctx, __u32 *line_number); + /* Return perf_event_attr, refer <linux/perf_event.h> */ + struct perf_event_attr *(*attr)(void *ctx); + /* Read object code, return numbers of bytes read */ + __s32 (*object_code)(void *ctx, __u64 ip, void *buf, __u32 len); + /* + * If present (i.e. must check al_cleanup != NULL), call after + * resolve_address() to free any associated resources. (v2 API) + */ + void (*al_cleanup)(void *ctx, struct perf_dlfilter_al *al); + /* Reserved */ + void *(*reserved[119])(void *); +}; + +struct perf_dlfilter_fns perf_dlfilter_fns; + +static int verbose; + +#define pr_debug(fmt, ...) do { \ + if (verbose > 0) \ + fprintf(stderr, fmt, ##__VA_ARGS__); \ + } while (0) + +static int test_fail(const char *msg) +{ + pr_debug("%s\n", msg); + return -1; +} + +#define CHECK(x) do { \ + if (!(x)) \ + return test_fail("Check '" #x "' failed\n"); \ + } while (0) + +struct filter_data { + __u64 ip; + __u64 addr; + int do_early; + int early_filter_cnt; + int filter_cnt; +}; + +static struct filter_data *filt_dat; + +int start(void **data, void *ctx) +{ + int dlargc; + char **dlargv; + struct filter_data *d; + static bool called; + + verbose = 1; + + CHECK(!filt_dat && !called); + called = true; + + d = calloc(1, sizeof(*d)); + if (!d) + test_fail("Failed to allocate memory"); + filt_dat = d; + *data = d; + + dlargv = perf_dlfilter_fns.args(ctx, &dlargc); + + CHECK(dlargc == 6); + CHECK(!strcmp(dlargv[0], "first")); + verbose = strtol(dlargv[1], NULL, 0); + d->ip = strtoull(dlargv[2], NULL, 0); + d->addr = strtoull(dlargv[3], NULL, 0); + d->do_early = strtol(dlargv[4], NULL, 0); + CHECK(!strcmp(dlargv[5], "last")); + + pr_debug("%s API\n", __func__); + + return 0; +} + +#define CHECK_SAMPLE(x) do { \ + if (sample->x != expected.x) \ + return test_fail("'" #x "' not expected value\n"); \ + } while (0) + +static int check_sample(struct filter_data *d, const struct perf_dlfilter_sample *sample) +{ + struct perf_dlfilter_sample expected = { + .ip = d->ip, + .pid = 12345, + .tid = 12346, + .time = 1234567890, + .addr = d->addr, + .id = 99, + .stream_id = 101, + .period = 543212345, + .cpu = 31, + .cpumode = PERF_RECORD_MISC_USER, + .addr_correlates_sym = 1, + .misc = PERF_RECORD_MISC_USER, + }; + + CHECK(sample->size >= sizeof(struct perf_dlfilter_sample)); + + CHECK_SAMPLE(ip); + CHECK_SAMPLE(pid); + CHECK_SAMPLE(tid); + CHECK_SAMPLE(time); + CHECK_SAMPLE(addr); + CHECK_SAMPLE(id); + CHECK_SAMPLE(stream_id); + CHECK_SAMPLE(period); + CHECK_SAMPLE(cpu); + CHECK_SAMPLE(cpumode); + CHECK_SAMPLE(addr_correlates_sym); + CHECK_SAMPLE(misc); + + CHECK(!sample->raw_data); + CHECK_SAMPLE(brstack_nr); + CHECK(!sample->brstack); + CHECK_SAMPLE(raw_callchain_nr); + CHECK(!sample->raw_callchain); + +#define EVENT_NAME "branches:" + CHECK(!strncmp(sample->event, EVENT_NAME, strlen(EVENT_NAME))); + + return 0; +} + +static int check_al(void *ctx) +{ + const struct perf_dlfilter_al *al; + + al = perf_dlfilter_fns.resolve_ip(ctx); + if (!al) + return test_fail("resolve_ip() failed"); + + CHECK(al->sym && !strcmp("foo", al->sym)); + CHECK(!al->symoff); + + return 0; +} + +static int check_addr_al(void *ctx) +{ + const struct perf_dlfilter_al *addr_al; + + addr_al = perf_dlfilter_fns.resolve_addr(ctx); + if (!addr_al) + return test_fail("resolve_addr() failed"); + + CHECK(addr_al->sym && !strcmp("bar", addr_al->sym)); + CHECK(!addr_al->symoff); + + return 0; +} + +static int check_address_al(void *ctx, const struct perf_dlfilter_sample *sample) +{ + struct perf_dlfilter_al address_al; + const struct perf_dlfilter_al *al; + + al = perf_dlfilter_fns.resolve_ip(ctx); + if (!al) + return test_fail("resolve_ip() failed"); + + address_al.size = sizeof(address_al); + if (perf_dlfilter_fns.resolve_address(ctx, sample->ip, &address_al)) + return test_fail("resolve_address() failed"); + + CHECK(address_al.sym && al->sym); + CHECK(!strcmp(address_al.sym, al->sym)); + CHECK(address_al.addr == al->addr); + CHECK(address_al.sym_start == al->sym_start); + CHECK(address_al.sym_end == al->sym_end); + CHECK(address_al.dso && al->dso); + CHECK(!strcmp(address_al.dso, al->dso)); + + /* al_cleanup() is v2 API so may not be present */ + if (perf_dlfilter_fns.al_cleanup) + perf_dlfilter_fns.al_cleanup(ctx, &address_al); + + return 0; +} + +static int check_attr(void *ctx) +{ + struct perf_event_attr *attr = perf_dlfilter_fns.attr(ctx); + + CHECK(attr); + CHECK(attr->type == PERF_TYPE_HARDWARE); + CHECK(attr->config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS); + + return 0; +} + +static int do_checks(void *data, const struct perf_dlfilter_sample *sample, void *ctx, bool early) +{ + struct filter_data *d = data; + + CHECK(data && filt_dat == data); + + if (early) { + CHECK(!d->early_filter_cnt); + d->early_filter_cnt += 1; + } else { + CHECK(!d->filter_cnt); + CHECK(d->early_filter_cnt); + CHECK(d->do_early != 2); + d->filter_cnt += 1; + } + + if (check_sample(data, sample)) + return -1; + + if (check_attr(ctx)) + return -1; + + if (early && !d->do_early) + return 0; + + if (check_al(ctx) || check_addr_al(ctx) || check_address_al(ctx, sample)) + return -1; + + if (early) + return d->do_early == 2; + + return 1; +} + +int filter_event_early(void *data, const struct perf_dlfilter_sample *sample, void *ctx) +{ + pr_debug("%s API\n", __func__); + + return do_checks(data, sample, ctx, true); +} + +int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx) +{ + pr_debug("%s API\n", __func__); + + return do_checks(data, sample, ctx, false); +} + +int stop(void *data, void *ctx) +{ + static bool called; + + pr_debug("%s API\n", __func__); + + CHECK(data && filt_dat == data && !called); + called = true; + + free(data); + filt_dat = NULL; + return 0; +} + +const char *filter_description(const char **long_description) +{ + *long_description = "Filter used by the 'dlfilter C API' perf test"; + return "dlfilter to test v2 C API"; +} diff --git a/tools/perf/include/perf/perf_dlfilter.h b/tools/perf/include/perf/perf_dlfilter.h index a26e2f129f83e..16fc4568ac53b 100644 --- a/tools/perf/include/perf/perf_dlfilter.h +++ b/tools/perf/include/perf/perf_dlfilter.h @@ -91,6 +91,7 @@ struct perf_dlfilter_al { /* Below members are only populated by resolve_ip() */ __u8 filtered; /* True if this sample event will be filtered out */ const char *comm; + void *priv; /* Private data. Do not change */ };
struct perf_dlfilter_fns { @@ -102,7 +103,8 @@ struct perf_dlfilter_fns { char **(*args)(void *ctx, int *dlargc); /* * Return information about address (al->size must be set before - * calling). Returns 0 on success, -1 otherwise. + * calling). Returns 0 on success, -1 otherwise. Call al_cleanup() + * when 'al' data is no longer needed. */ __s32 (*resolve_address)(void *ctx, __u64 address, struct perf_dlfilter_al *al); /* Return instruction bytes and length */ @@ -113,8 +115,13 @@ struct perf_dlfilter_fns { struct perf_event_attr *(*attr)(void *ctx); /* Read object code, return numbers of bytes read */ __s32 (*object_code)(void *ctx, __u64 ip, void *buf, __u32 len); + /* + * If present (i.e. must check al_cleanup != NULL), call after + * resolve_address() to free any associated resources. + */ + void (*al_cleanup)(void *ctx, struct perf_dlfilter_al *al); /* Reserved */ - void *(*reserved[120])(void *); + void *(*reserved[119])(void *); };
/* diff --git a/tools/perf/tests/dlfilter-test.c b/tools/perf/tests/dlfilter-test.c index 086fd2179e41f..da3a9b50b1b1f 100644 --- a/tools/perf/tests/dlfilter-test.c +++ b/tools/perf/tests/dlfilter-test.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 /* * Test dlfilter C API. A perf.data file is synthesized and then processed - * by perf script with a dlfilter named dlfilter-test-api-v0.so. Also a C file + * by perf script with dlfilters named dlfilter-test-api-v*.so. Also a C file * is compiled to provide a dso to match the synthesized perf.data file. */
@@ -37,6 +37,8 @@
#define MAP_START 0x400000
+#define DLFILTER_TEST_NAME_MAX 128 + struct test_data { struct perf_tool tool; struct machine *machine; @@ -45,6 +47,8 @@ struct test_data { u64 bar; u64 ip; u64 addr; + char name[DLFILTER_TEST_NAME_MAX]; + char desc[DLFILTER_TEST_NAME_MAX]; char perf[PATH_MAX]; char perf_data_file_name[PATH_MAX]; char c_file_name[PATH_MAX]; @@ -215,7 +219,7 @@ static int write_prog(char *file_name) return err ? -1 : 0; }
-static int get_dlfilters_path(char *buf, size_t sz) +static int get_dlfilters_path(const char *name, char *buf, size_t sz) { char perf[PATH_MAX]; char path[PATH_MAX]; @@ -224,12 +228,12 @@ static int get_dlfilters_path(char *buf, size_t sz)
perf_exe(perf, sizeof(perf)); perf_path = dirname(perf); - snprintf(path, sizeof(path), "%s/dlfilters/dlfilter-test-api-v0.so", perf_path); + snprintf(path, sizeof(path), "%s/dlfilters/%s", perf_path, name); if (access(path, R_OK)) { exec_path = get_argv_exec_path(); if (!exec_path) return -1; - snprintf(path, sizeof(path), "%s/dlfilters/dlfilter-test-api-v0.so", exec_path); + snprintf(path, sizeof(path), "%s/dlfilters/%s", exec_path, name); free(exec_path); if (access(path, R_OK)) return -1; @@ -244,9 +248,9 @@ static int check_filter_desc(struct test_data *td) char *desc = NULL; int ret;
- if (get_filter_desc(td->dlfilters, "dlfilter-test-api-v0.so", &desc, &long_desc) && + if (get_filter_desc(td->dlfilters, td->name, &desc, &long_desc) && long_desc && !strcmp(long_desc, "Filter used by the 'dlfilter C API' perf test") && - desc && !strcmp(desc, "dlfilter to test v0 C API")) + desc && !strcmp(desc, td->desc)) ret = 0; else ret = -1; @@ -284,7 +288,7 @@ static int get_ip_addr(struct test_data *td) static int do_run_perf_script(struct test_data *td, int do_early) { return system_cmd("%s script -i %s " - "--dlfilter %s/dlfilter-test-api-v0.so " + "--dlfilter %s/%s " "--dlarg first " "--dlarg %d " "--dlarg %" PRIu64 " " @@ -292,7 +296,7 @@ static int do_run_perf_script(struct test_data *td, int do_early) "--dlarg %d " "--dlarg last", td->perf, td->perf_data_file_name, td->dlfilters, - verbose, td->ip, td->addr, do_early); + td->name, verbose, td->ip, td->addr, do_early); }
static int run_perf_script(struct test_data *td) @@ -321,7 +325,7 @@ static int test__dlfilter_test(struct test_data *td) u64 id = 99; int err;
- if (get_dlfilters_path(td->dlfilters, PATH_MAX)) + if (get_dlfilters_path(td->name, td->dlfilters, PATH_MAX)) return test_result("dlfilters not found", TEST_SKIP);
if (check_filter_desc(td)) @@ -399,14 +403,18 @@ static void test_data__free(struct test_data *td) } }
-static int test__dlfilter(struct test_suite *test __maybe_unused, int subtest __maybe_unused) +static int test__dlfilter_ver(int ver) { struct test_data td = {.fd = -1}; int pid = getpid(); int err;
+ pr_debug("\n-- Testing version %d API --\n", ver); + perf_exe(td.perf, sizeof(td.perf));
+ snprintf(td.name, sizeof(td.name), "dlfilter-test-api-v%d.so", ver); + snprintf(td.desc, sizeof(td.desc), "dlfilter to test v%d C API", ver); snprintf(td.perf_data_file_name, PATH_MAX, "/tmp/dlfilter-test-%u-perf-data", pid); snprintf(td.c_file_name, PATH_MAX, "/tmp/dlfilter-test-%u-prog.c", pid); snprintf(td.prog_file_name, PATH_MAX, "/tmp/dlfilter-test-%u-prog", pid); @@ -416,4 +424,14 @@ static int test__dlfilter(struct test_suite *test __maybe_unused, int subtest __ return err; }
+static int test__dlfilter(struct test_suite *test __maybe_unused, int subtest __maybe_unused) +{ + int err = test__dlfilter_ver(0); + + if (err) + return err; + /* No test for version 1 */ + return test__dlfilter_ver(2); +} + DEFINE_SUITE("dlfilter C API", dlfilter); diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c index 798a53d7e6c9d..e0f822ebb9b97 100644 --- a/tools/perf/util/dlfilter.c +++ b/tools/perf/util/dlfilter.c @@ -10,6 +10,8 @@ #include <subcmd/exec-cmd.h> #include <linux/zalloc.h> #include <linux/build_bug.h> +#include <linux/kernel.h> +#include <linux/string.h>
#include "debug.h" #include "event.h" @@ -63,6 +65,7 @@ static void al_to_d_al(struct addr_location *al, struct perf_dlfilter_al *d_al) d_al->addr = al->addr; d_al->comm = NULL; d_al->filtered = 0; + d_al->priv = NULL; }
static struct addr_location *get_al(struct dlfilter *d) @@ -151,6 +154,11 @@ static char **dlfilter__args(void *ctx, int *dlargc) return d->dlargv; }
+static bool has_priv(struct perf_dlfilter_al *d_al_p) +{ + return d_al_p->size >= offsetof(struct perf_dlfilter_al, priv) + sizeof(d_al_p->priv); +} + static __s32 dlfilter__resolve_address(void *ctx, __u64 address, struct perf_dlfilter_al *d_al_p) { struct dlfilter *d = (struct dlfilter *)ctx; @@ -177,9 +185,29 @@ static __s32 dlfilter__resolve_address(void *ctx, __u64 address, struct perf_dlf memcpy(d_al_p, &d_al, min((size_t)sz, sizeof(d_al))); d_al_p->size = sz;
+ if (has_priv(d_al_p)) + d_al_p->priv = memdup(&al, sizeof(al)); + return 0; }
+static void dlfilter__al_cleanup(void *ctx __maybe_unused, struct perf_dlfilter_al *d_al_p) +{ + struct addr_location *al; + + /* Ensure backward compatibility */ + if (!has_priv(d_al_p) || !d_al_p->priv) + return; + + al = d_al_p->priv; + + d_al_p->priv = NULL; + + addr_location__exit(al); + + free(al); +} + static const __u8 *dlfilter__insn(void *ctx, __u32 *len) { struct dlfilter *d = (struct dlfilter *)ctx; @@ -297,6 +325,7 @@ static const struct perf_dlfilter_fns perf_dlfilter_fns = { .resolve_addr = dlfilter__resolve_addr, .args = dlfilter__args, .resolve_address = dlfilter__resolve_address, + .al_cleanup = dlfilter__al_cleanup, .insn = dlfilter__insn, .srcline = dlfilter__srcline, .attr = dlfilter__attr,
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kajol Jain kjain@linux.ibm.com
[ Upstream commit 3286f88f31da060ac2789cee247153961ba57e49 ]
Update the description for some of the JSON/events for power10 platform.
Fixes: 32daa5d7899e0343 ("perf vendor events: Initial JSON/events list for power10 platform") Signed-off-by: Kajol Jain kjain@linux.ibm.com Cc: Athira Rajeev atrajeev@linux.vnet.ibm.com Cc: Disha Goel disgoel@linux.ibm.com Cc: Ian Rogers irogers@google.com Cc: Kajol Jain kjain@linux.ibm.com Cc: Madhavan Srinivasan maddy@linux.ibm.com Cc: Namhyung Kim namhyung@kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230814112803.1508296-1-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../arch/powerpc/power10/cache.json | 4 +- .../arch/powerpc/power10/frontend.json | 30 ++++++------ .../arch/powerpc/power10/marked.json | 20 ++++---- .../arch/powerpc/power10/memory.json | 6 +-- .../arch/powerpc/power10/others.json | 48 +++++++++---------- .../arch/powerpc/power10/pipeline.json | 20 ++++---- .../pmu-events/arch/powerpc/power10/pmc.json | 4 +- .../arch/powerpc/power10/translation.json | 6 +-- 8 files changed, 69 insertions(+), 69 deletions(-)
diff --git a/tools/perf/pmu-events/arch/powerpc/power10/cache.json b/tools/perf/pmu-events/arch/powerpc/power10/cache.json index 605be14f441c8..9cb929bb64afd 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/cache.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/cache.json @@ -17,7 +17,7 @@ { "EventCode": "0x34056", "EventName": "PM_EXEC_STALL_LOAD_FINISH", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was finishing a load after its data was reloaded from a data source beyond the local L1; cycles in which the LSU was processing an L1-hit; cycles in which the NTF instruction merged with another load in the LMQ; cycles in which the NTF instruction is waiting for a data reload for a load miss, but the data comes back with a non-NTF instruction." + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was finishing a load after its data was reloaded from a data source beyond the local L1; cycles in which the LSU was processing an L1-hit; cycles in which the next-to-finish (NTF) instruction merged with another load in the LMQ; cycles in which the NTF instruction is waiting for a data reload for a load miss, but the data comes back with a non-NTF instruction." }, { "EventCode": "0x3006C", @@ -27,7 +27,7 @@ { "EventCode": "0x300F4", "EventName": "PM_RUN_INST_CMPL_CONC", - "BriefDescription": "PowerPC instructions completed by this thread when all threads in the core had the run-latch set." + "BriefDescription": "PowerPC instruction completed by this thread when all threads in the core had the run-latch set." }, { "EventCode": "0x4C016", diff --git a/tools/perf/pmu-events/arch/powerpc/power10/frontend.json b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json index 558f9530f54ec..61e9e0222c873 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/frontend.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json @@ -7,7 +7,7 @@ { "EventCode": "0x10006", "EventName": "PM_DISP_STALL_HELD_OTHER_CYC", - "BriefDescription": "Cycles in which the NTC instruction is held at dispatch for any other reason." + "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch for any other reason." }, { "EventCode": "0x10010", @@ -32,12 +32,12 @@ { "EventCode": "0x1D05E", "EventName": "PM_DISP_STALL_HELD_HALT_CYC", - "BriefDescription": "Cycles in which the NTC instruction is held at dispatch because of power management." + "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch because of power management." }, { "EventCode": "0x1E050", "EventName": "PM_DISP_STALL_HELD_STF_MAPPER_CYC", - "BriefDescription": "Cycles in which the NTC instruction is held at dispatch because the STF mapper/SRB was full. Includes GPR (count, link, tar), VSR, VMR, FPR." + "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch because the STF mapper/SRB was full. Includes GPR (count, link, tar), VSR, VMR, FPR." }, { "EventCode": "0x1F054", @@ -67,7 +67,7 @@ { "EventCode": "0x100F6", "EventName": "PM_IERAT_MISS", - "BriefDescription": "IERAT Reloaded to satisfy an IERAT miss. All page sizes are counted by this event." + "BriefDescription": "IERAT Reloaded to satisfy an IERAT miss. All page sizes are counted by this event. This event only counts instruction demand access." }, { "EventCode": "0x100F8", @@ -77,7 +77,7 @@ { "EventCode": "0x20006", "EventName": "PM_DISP_STALL_HELD_ISSQ_FULL_CYC", - "BriefDescription": "Cycles in which the NTC instruction is held at dispatch due to Issue queue full. Includes issue queue and branch queue." + "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch due to Issue queue full. Includes issue queue and branch queue." }, { "EventCode": "0x20114", @@ -102,7 +102,7 @@ { "EventCode": "0x2D01A", "EventName": "PM_DISP_STALL_IC_MISS", - "BriefDescription": "Cycles when dispatch was stalled for this thread due to an Icache Miss." + "BriefDescription": "Cycles when dispatch was stalled for this thread due to an instruction cache miss." }, { "EventCode": "0x2E018", @@ -112,7 +112,7 @@ { "EventCode": "0x2E01A", "EventName": "PM_DISP_STALL_HELD_XVFC_MAPPER_CYC", - "BriefDescription": "Cycles in which the NTC instruction is held at dispatch because the XVFC mapper/SRB was full." + "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch because the XVFC mapper/SRB was full." }, { "EventCode": "0x2C142", @@ -137,7 +137,7 @@ { "EventCode": "0x30004", "EventName": "PM_DISP_STALL_FLUSH", - "BriefDescription": "Cycles when dispatch was stalled because of a flush that happened to an instruction(s) that was not yet NTC. PM_EXEC_STALL_NTC_FLUSH only includes instructions that were flushed after becoming NTC." + "BriefDescription": "Cycles when dispatch was stalled because of a flush that happened to an instruction(s) that was not yet next-to-complete (NTC). PM_EXEC_STALL_NTC_FLUSH only includes instructions that were flushed after becoming NTC." }, { "EventCode": "0x3000A", @@ -157,7 +157,7 @@ { "EventCode": "0x30018", "EventName": "PM_DISP_STALL_HELD_SCOREBOARD_CYC", - "BriefDescription": "Cycles in which the NTC instruction is held at dispatch while waiting on the Scoreboard. This event combines VSCR and FPSCR together." + "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch while waiting on the Scoreboard. This event combines VSCR and FPSCR together." }, { "EventCode": "0x30026", @@ -182,7 +182,7 @@ { "EventCode": "0x3D05C", "EventName": "PM_DISP_STALL_HELD_RENAME_CYC", - "BriefDescription": "Cycles in which the NTC instruction is held at dispatch because the mapper/SRB was full. Includes GPR (count, link, tar), VSR, VMR, FPR and XVFC." + "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch because the mapper/SRB was full. Includes GPR (count, link, tar), VSR, VMR, FPR and XVFC." }, { "EventCode": "0x3E052", @@ -192,7 +192,7 @@ { "EventCode": "0x3E054", "EventName": "PM_LD_MISS_L1", - "BriefDescription": "Load Missed L1, counted at execution time (can be greater than loads finished). LMQ merges are not included in this count. i.e. if a load instruction misses on an address that is already allocated on the LMQ, this event will not increment for that load). Note that this count is per slice, so if a load spans multiple slices this event will increment multiple times for a single load." + "BriefDescription": "Load missed L1, counted at finish time. LMQ merges are not included in this count. i.e. if a load instruction misses on an address that is already allocated on the LMQ, this event will not increment for that load). Note that this count is per slice, so if a load spans multiple slices this event will increment multiple times for a single load." }, { "EventCode": "0x301EA", @@ -202,7 +202,7 @@ { "EventCode": "0x300FA", "EventName": "PM_INST_FROM_L3MISS", - "BriefDescription": "The processor's instruction cache was reloaded from a source other than the local core's L1, L2, or L3 due to a demand miss." + "BriefDescription": "The processor's instruction cache was reloaded from beyond the local core's L3 due to a demand miss." }, { "EventCode": "0x40006", @@ -232,16 +232,16 @@ { "EventCode": "0x4E01A", "EventName": "PM_DISP_STALL_HELD_CYC", - "BriefDescription": "Cycles in which the NTC instruction is held at dispatch for any reason." + "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch for any reason." }, { "EventCode": "0x4003C", "EventName": "PM_DISP_STALL_HELD_SYNC_CYC", - "BriefDescription": "Cycles in which the NTC instruction is held at dispatch because of a synchronizing instruction that requires the ICT to be empty before dispatch." + "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch because of a synchronizing instruction that requires the ICT to be empty before dispatch." }, { "EventCode": "0x44056", "EventName": "PM_VECTOR_ST_CMPL", - "BriefDescription": "Vector store instructions completed." + "BriefDescription": "Vector store instruction completed." } ] diff --git a/tools/perf/pmu-events/arch/powerpc/power10/marked.json b/tools/perf/pmu-events/arch/powerpc/power10/marked.json index 58b5dfe3a2731..131f8d0e88317 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/marked.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/marked.json @@ -62,7 +62,7 @@ { "EventCode": "0x200FD", "EventName": "PM_L1_ICACHE_MISS", - "BriefDescription": "Demand iCache Miss." + "BriefDescription": "Demand instruction cache miss." }, { "EventCode": "0x30130", @@ -72,7 +72,7 @@ { "EventCode": "0x34146", "EventName": "PM_MRK_LD_CMPL", - "BriefDescription": "Marked loads completed." + "BriefDescription": "Marked load instruction completed." }, { "EventCode": "0x3E158", @@ -82,12 +82,12 @@ { "EventCode": "0x3E15A", "EventName": "PM_MRK_ST_FIN", - "BriefDescription": "The marked instruction was a store of any kind." + "BriefDescription": "Marked store instruction finished." }, { "EventCode": "0x30068", "EventName": "PM_L1_ICACHE_RELOADED_PREF", - "BriefDescription": "Counts all Icache prefetch reloads ( includes demand turned into prefetch)." + "BriefDescription": "Counts all instruction cache prefetch reloads (includes demand turned into prefetch)." }, { "EventCode": "0x301E4", @@ -102,12 +102,12 @@ { "EventCode": "0x300FE", "EventName": "PM_DATA_FROM_L3MISS", - "BriefDescription": "The processor's data cache was reloaded from a source other than the local core's L1, L2, or L3 due to a demand miss." + "BriefDescription": "The processor's L1 data cache was reloaded from beyond the local core's L3 due to a demand miss." }, { "EventCode": "0x40012", "EventName": "PM_L1_ICACHE_RELOADED_ALL", - "BriefDescription": "Counts all Icache reloads includes demand, prefetch, prefetch turned into demand and demand turned into prefetch." + "BriefDescription": "Counts all instruction cache reloads includes demand, prefetch, prefetch turned into demand and demand turned into prefetch." }, { "EventCode": "0x40134", @@ -117,22 +117,22 @@ { "EventCode": "0x4505A", "EventName": "PM_SP_FLOP_CMPL", - "BriefDescription": "Single Precision floating point instructions completed." + "BriefDescription": "Single Precision floating point instruction completed." }, { "EventCode": "0x4D058", "EventName": "PM_VECTOR_FLOP_CMPL", - "BriefDescription": "Vector floating point instructions completed." + "BriefDescription": "Vector floating point instruction completed." }, { "EventCode": "0x4D05A", "EventName": "PM_NON_MATH_FLOP_CMPL", - "BriefDescription": "Non Math instructions completed." + "BriefDescription": "Non Math instruction completed." }, { "EventCode": "0x401E0", "EventName": "PM_MRK_INST_CMPL", - "BriefDescription": "marked instruction completed." + "BriefDescription": "Marked instruction completed." }, { "EventCode": "0x400FE", diff --git a/tools/perf/pmu-events/arch/powerpc/power10/memory.json b/tools/perf/pmu-events/arch/powerpc/power10/memory.json index 843b51f531e95..c4c10ca98cad7 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/memory.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/memory.json @@ -47,7 +47,7 @@ { "EventCode": "0x10062", "EventName": "PM_LD_L3MISS_PEND_CYC", - "BriefDescription": "Cycles L3 miss was pending for this thread." + "BriefDescription": "Cycles in which an L3 miss was pending for this thread." }, { "EventCode": "0x20010", @@ -132,7 +132,7 @@ { "EventCode": "0x300FC", "EventName": "PM_DTLB_MISS", - "BriefDescription": "The DPTEG required for the load/store instruction in execution was missing from the TLB. It includes pages of all sizes for demand and prefetch activity." + "BriefDescription": "The DPTEG required for the load/store instruction in execution was missing from the TLB. This event only counts for demand misses." }, { "EventCode": "0x4D02C", @@ -142,7 +142,7 @@ { "EventCode": "0x4003E", "EventName": "PM_LD_CMPL", - "BriefDescription": "Loads completed." + "BriefDescription": "Load instruction completed." }, { "EventCode": "0x4C040", diff --git a/tools/perf/pmu-events/arch/powerpc/power10/others.json b/tools/perf/pmu-events/arch/powerpc/power10/others.json index a771e4b6bec58..a5319cdba89b3 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/others.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/others.json @@ -2,12 +2,12 @@ { "EventCode": "0x10016", "EventName": "PM_VSU0_ISSUE", - "BriefDescription": "VSU instructions issued to VSU pipe 0." + "BriefDescription": "VSU instruction issued to VSU pipe 0." }, { "EventCode": "0x1001C", "EventName": "PM_ULTRAVISOR_INST_CMPL", - "BriefDescription": "PowerPC instructions that completed while the thread was in ultravisor state." + "BriefDescription": "PowerPC instruction completed while the thread was in ultravisor state." }, { "EventCode": "0x100F0", @@ -17,12 +17,12 @@ { "EventCode": "0x10134", "EventName": "PM_MRK_ST_DONE_L2", - "BriefDescription": "Marked stores completed in L2 (RC machine done)." + "BriefDescription": "Marked store completed in L2." }, { "EventCode": "0x1505E", "EventName": "PM_LD_HIT_L1", - "BriefDescription": "Loads that finished without experiencing an L1 miss." + "BriefDescription": "Load finished without experiencing an L1 miss." }, { "EventCode": "0x1F056", @@ -42,7 +42,7 @@ { "EventCode": "0x101E4", "EventName": "PM_MRK_L1_ICACHE_MISS", - "BriefDescription": "Marked Instruction suffered an icache Miss." + "BriefDescription": "Marked instruction suffered an instruction cache miss." }, { "EventCode": "0x101EA", @@ -72,7 +72,7 @@ { "EventCode": "0x2E010", "EventName": "PM_ADJUNCT_INST_CMPL", - "BriefDescription": "PowerPC instructions that completed while the thread is in Adjunct state." + "BriefDescription": "PowerPC instruction completed while the thread was in Adjunct state." }, { "EventCode": "0x2E014", @@ -122,7 +122,7 @@ { "EventCode": "0x201E4", "EventName": "PM_MRK_DATA_FROM_L3MISS", - "BriefDescription": "The processor's data cache was reloaded from a source other than the local core's L1, L2, or L3 due to a demand miss for a marked load." + "BriefDescription": "The processor's L1 data cache was reloaded from beyond the local core's L3 due to a demand miss for a marked instruction." }, { "EventCode": "0x201E8", @@ -132,17 +132,17 @@ { "EventCode": "0x200F2", "EventName": "PM_INST_DISP", - "BriefDescription": "PowerPC instructions dispatched." + "BriefDescription": "PowerPC instruction dispatched." }, { "EventCode": "0x30132", "EventName": "PM_MRK_VSU_FIN", - "BriefDescription": "VSU marked instructions finished. Excludes simple FX instructions issued to the Store Unit." + "BriefDescription": "VSU marked instruction finished. Excludes simple FX instructions issued to the Store Unit." }, { "EventCode": "0x30038", "EventName": "PM_EXEC_STALL_DMISS_LMEM", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from the local memory, local OpenCapp cache, or local OpenCapp memory." + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from the local memory, local OpenCAPI cache, or local OpenCAPI memory." }, { "EventCode": "0x3F04A", @@ -152,12 +152,12 @@ { "EventCode": "0x3405A", "EventName": "PM_PRIVILEGED_INST_CMPL", - "BriefDescription": "PowerPC Instructions that completed while the thread is in Privileged state." + "BriefDescription": "PowerPC instruction completed while the thread was in Privileged state." }, { "EventCode": "0x3F150", "EventName": "PM_MRK_ST_DRAIN_CYC", - "BriefDescription": "cycles to drain st from core to L2." + "BriefDescription": "Cycles in which the marked store drained from the core to the L2." }, { "EventCode": "0x3F054", @@ -182,7 +182,7 @@ { "EventCode": "0x4001C", "EventName": "PM_VSU_FIN", - "BriefDescription": "VSU instructions finished." + "BriefDescription": "VSU instruction finished." }, { "EventCode": "0x4C01A", @@ -197,7 +197,7 @@ { "EventCode": "0x4D022", "EventName": "PM_HYPERVISOR_INST_CMPL", - "BriefDescription": "PowerPC instructions that completed while the thread is in hypervisor state." + "BriefDescription": "PowerPC instruction completed while the thread was in hypervisor state." }, { "EventCode": "0x4D026", @@ -212,32 +212,32 @@ { "EventCode": "0x40030", "EventName": "PM_INST_FIN", - "BriefDescription": "Instructions finished." + "BriefDescription": "Instruction finished." }, { "EventCode": "0x44146", "EventName": "PM_MRK_STCX_CORE_CYC", - "BriefDescription": "Cycles spent in the core portion of a marked Stcx instruction. It starts counting when the instruction is decoded and stops counting when it drains into the L2." + "BriefDescription": "Cycles spent in the core portion of a marked STCX instruction. It starts counting when the instruction is decoded and stops counting when it drains into the L2." }, { "EventCode": "0x44054", "EventName": "PM_VECTOR_LD_CMPL", - "BriefDescription": "Vector load instructions completed." + "BriefDescription": "Vector load instruction completed." }, { "EventCode": "0x45054", "EventName": "PM_FMA_CMPL", - "BriefDescription": "Two floating point instructions completed (FMA class of instructions: fmadd, fnmadd, fmsub, fnmsub). Scalar instructions only." + "BriefDescription": "Two floating point instruction completed (FMA class of instructions: fmadd, fnmadd, fmsub, fnmsub). Scalar instructions only." }, { "EventCode": "0x45056", "EventName": "PM_SCALAR_FLOP_CMPL", - "BriefDescription": "Scalar floating point instructions completed." + "BriefDescription": "Scalar floating point instruction completed." }, { "EventCode": "0x4505C", "EventName": "PM_MATH_FLOP_CMPL", - "BriefDescription": "Math floating point instructions completed." + "BriefDescription": "Math floating point instruction completed." }, { "EventCode": "0x4D05E", @@ -252,21 +252,21 @@ { "EventCode": "0x401E6", "EventName": "PM_MRK_INST_FROM_L3MISS", - "BriefDescription": "The processor's instruction cache was reloaded from a source other than the local core's L1, L2, or L3 due to a demand miss for a marked instruction." + "BriefDescription": "The processor's instruction cache was reloaded from beyond the local core's L3 due to a demand miss for a marked instruction." }, { "EventCode": "0x401E8", "EventName": "PM_MRK_DATA_FROM_L2MISS", - "BriefDescription": "The processor's data cache was reloaded from a source other than the local core's L1 or L2 due to a demand miss for a marked load." + "BriefDescription": "The processor's L1 data cache was reloaded from a source beyond the local core's L2 due to a demand miss for a marked instruction." }, { "EventCode": "0x400F0", "EventName": "PM_LD_DEMAND_MISS_L1_FIN", - "BriefDescription": "Load Missed L1, counted at finish time." + "BriefDescription": "Load missed L1, counted at finish time." }, { "EventCode": "0x500FA", "EventName": "PM_RUN_INST_CMPL", - "BriefDescription": "Completed PowerPC instructions gated by the run latch." + "BriefDescription": "PowerPC instruction completed while the run latch is set." } ] diff --git a/tools/perf/pmu-events/arch/powerpc/power10/pipeline.json b/tools/perf/pmu-events/arch/powerpc/power10/pipeline.json index b8aded6045faa..449f57e8ba6af 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/pipeline.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/pipeline.json @@ -2,7 +2,7 @@ { "EventCode": "0x100FE", "EventName": "PM_INST_CMPL", - "BriefDescription": "PowerPC instructions completed." + "BriefDescription": "PowerPC instruction completed." }, { "EventCode": "0x1000C", @@ -12,7 +12,7 @@ { "EventCode": "0x1000E", "EventName": "PM_MMA_ISSUED", - "BriefDescription": "MMA instructions issued." + "BriefDescription": "MMA instruction issued." }, { "EventCode": "0x10012", @@ -107,7 +107,7 @@ { "EventCode": "0x2D012", "EventName": "PM_VSU1_ISSUE", - "BriefDescription": "VSU instructions issued to VSU pipe 1." + "BriefDescription": "VSU instruction issued to VSU pipe 1." }, { "EventCode": "0x2D018", @@ -122,7 +122,7 @@ { "EventCode": "0x2E01E", "EventName": "PM_EXEC_STALL_NTC_FLUSH", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was executing in any unit before it was flushed. Note that if the flush of the oldest instruction happens after finish, the cycles from dispatch to issue will be included in PM_DISP_STALL and the cycles from issue to finish will be included in PM_EXEC_STALL and its corresponding children. This event will also count cycles when the previous NTF instruction is still completing and the new NTF instruction is stalled at dispatch." + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was executing in any unit before it was flushed. Note that if the flush of the oldest instruction happens after finish, the cycles from dispatch to issue will be included in PM_DISP_STALL and the cycles from issue to finish will be included in PM_EXEC_STALL and its corresponding children. This event will also count cycles when the previous next-to-finish (NTF) instruction is still completing and the new NTF instruction is stalled at dispatch." }, { "EventCode": "0x2013C", @@ -137,7 +137,7 @@ { "EventCode": "0x201E2", "EventName": "PM_MRK_LD_MISS_L1", - "BriefDescription": "Marked DL1 Demand Miss counted at finish time." + "BriefDescription": "Marked demand data load miss counted at finish time." }, { "EventCode": "0x200F4", @@ -172,7 +172,7 @@ { "EventCode": "0x30028", "EventName": "PM_CMPL_STALL_MEM_ECC", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for the non-speculative finish of either a stcx waiting for its result or a load waiting for non-critical sectors of data and ECC." + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for the non-speculative finish of either a STCX waiting for its result or a load waiting for non-critical sectors of data and ECC." }, { "EventCode": "0x30036", @@ -187,12 +187,12 @@ { "EventCode": "0x3F044", "EventName": "PM_VSU2_ISSUE", - "BriefDescription": "VSU instructions issued to VSU pipe 2." + "BriefDescription": "VSU instruction issued to VSU pipe 2." }, { "EventCode": "0x30058", "EventName": "PM_TLBIE_FIN", - "BriefDescription": "TLBIE instructions finished in the LSU. Two TLBIEs can finish each cycle. All will be counted." + "BriefDescription": "TLBIE instruction finished in the LSU. Two TLBIEs can finish each cycle. All will be counted." }, { "EventCode": "0x3D058", @@ -252,7 +252,7 @@ { "EventCode": "0x4E012", "EventName": "PM_EXEC_STALL_UNKNOWN", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline completed without an ntf_type pulse. The ntf_pulse was missed by the ISU because the NTF finishes and completions came too close together." + "BriefDescription": "Cycles in which the oldest instruction in the pipeline completed without an ntf_type pulse. The ntf_pulse was missed by the ISU because the next-to-finish (NTF) instruction finishes and completions came too close together." }, { "EventCode": "0x4D020", @@ -267,7 +267,7 @@ { "EventCode": "0x45058", "EventName": "PM_IC_MISS_CMPL", - "BriefDescription": "Non-speculative icache miss, counted at completion." + "BriefDescription": "Non-speculative instruction cache miss, counted at completion." }, { "EventCode": "0x4D050", diff --git a/tools/perf/pmu-events/arch/powerpc/power10/pmc.json b/tools/perf/pmu-events/arch/powerpc/power10/pmc.json index b5d1bd39cfb22..364fedbfb490b 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/pmc.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/pmc.json @@ -12,11 +12,11 @@ { "EventCode": "0x45052", "EventName": "PM_4FLOP_CMPL", - "BriefDescription": "Four floating point instructions completed (fadd, fmul, fsub, fcmp, fsel, fabs, fnabs, fres, fsqrte, fneg)." + "BriefDescription": "Four floating point instruction completed (fadd, fmul, fsub, fcmp, fsel, fabs, fnabs, fres, fsqrte, fneg)." }, { "EventCode": "0x4D054", "EventName": "PM_8FLOP_CMPL", - "BriefDescription": "Four Double Precision vector instructions completed." + "BriefDescription": "Four Double Precision vector instruction completed." } ] diff --git a/tools/perf/pmu-events/arch/powerpc/power10/translation.json b/tools/perf/pmu-events/arch/powerpc/power10/translation.json index db3766dca07c5..3e47b804a0a8f 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/translation.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/translation.json @@ -17,7 +17,7 @@ { "EventCode": "0x2011C", "EventName": "PM_MRK_NTF_CYC", - "BriefDescription": "Cycles during which the marked instruction is the oldest in the pipeline (NTF or NTC)." + "BriefDescription": "Cycles in which the marked instruction is the oldest in the pipeline (next-to-finish or next-to-complete)." }, { "EventCode": "0x2E01C", @@ -37,7 +37,7 @@ { "EventCode": "0x200FE", "EventName": "PM_DATA_FROM_L2MISS", - "BriefDescription": "The processor's data cache was reloaded from a source other than the local core's L1 or L2 due to a demand miss." + "BriefDescription": "The processor's L1 data cache was reloaded from a source beyond the local core's L2 due to a demand miss." }, { "EventCode": "0x30010", @@ -52,6 +52,6 @@ { "EventCode": "0x4D05C", "EventName": "PM_DPP_FLOP_CMPL", - "BriefDescription": "Double-Precision or Quad-Precision instructions completed." + "BriefDescription": "Double-Precision or Quad-Precision instruction completed." } ]
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kajol Jain kjain@linux.ibm.com
[ Upstream commit e104df97b8dcfbab2e42de634b99bf03f0805d85 ]
Drop some of the JSON/events for power10 platform due to counter data mismatch.
Fixes: 32daa5d7899e0343 ("perf vendor events: Initial JSON/events list for power10 platform") Signed-off-by: Kajol Jain kjain@linux.ibm.com Cc: Athira Rajeev atrajeev@linux.vnet.ibm.com Cc: Disha Goel disgoel@linux.ibm.com Cc: Ian Rogers irogers@google.com Cc: Kajol Jain kjain@linux.ibm.com Cc: Madhavan Srinivasan maddy@linux.ibm.com Cc: Namhyung Kim namhyung@kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230814112803.1508296-2-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../arch/powerpc/power10/floating_point.json | 7 ------- tools/perf/pmu-events/arch/powerpc/power10/marked.json | 10 ---------- tools/perf/pmu-events/arch/powerpc/power10/others.json | 5 ----- .../perf/pmu-events/arch/powerpc/power10/pipeline.json | 10 ---------- .../pmu-events/arch/powerpc/power10/translation.json | 5 ----- 5 files changed, 37 deletions(-) delete mode 100644 tools/perf/pmu-events/arch/powerpc/power10/floating_point.json
diff --git a/tools/perf/pmu-events/arch/powerpc/power10/floating_point.json b/tools/perf/pmu-events/arch/powerpc/power10/floating_point.json deleted file mode 100644 index 54acb55e2c8c6..0000000000000 --- a/tools/perf/pmu-events/arch/powerpc/power10/floating_point.json +++ /dev/null @@ -1,7 +0,0 @@ -[ - { - "EventCode": "0x4016E", - "EventName": "PM_THRESH_NOT_MET", - "BriefDescription": "Threshold counter did not meet threshold." - } -] diff --git a/tools/perf/pmu-events/arch/powerpc/power10/marked.json b/tools/perf/pmu-events/arch/powerpc/power10/marked.json index 131f8d0e88317..f2436fc5537ce 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/marked.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/marked.json @@ -19,11 +19,6 @@ "EventName": "PM_MRK_BR_TAKEN_CMPL", "BriefDescription": "Marked Branch Taken instruction completed." }, - { - "EventCode": "0x20112", - "EventName": "PM_MRK_NTF_FIN", - "BriefDescription": "The marked instruction became the oldest in the pipeline before it finished. It excludes instructions that finish at dispatch." - }, { "EventCode": "0x2C01C", "EventName": "PM_EXEC_STALL_DMISS_OFF_CHIP", @@ -64,11 +59,6 @@ "EventName": "PM_L1_ICACHE_MISS", "BriefDescription": "Demand instruction cache miss." }, - { - "EventCode": "0x30130", - "EventName": "PM_MRK_INST_FIN", - "BriefDescription": "marked instruction finished. Excludes instructions that finish at dispatch. Note that stores always finish twice since the address gets issued to the LSU and the data gets issued to the VSU." - }, { "EventCode": "0x34146", "EventName": "PM_MRK_LD_CMPL", diff --git a/tools/perf/pmu-events/arch/powerpc/power10/others.json b/tools/perf/pmu-events/arch/powerpc/power10/others.json index a5319cdba89b3..17c5424ef1ac1 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/others.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/others.json @@ -29,11 +29,6 @@ "EventName": "PM_DISP_SS0_2_INSTR_CYC", "BriefDescription": "Cycles in which Superslice 0 dispatches either 1 or 2 instructions." }, - { - "EventCode": "0x1F15C", - "EventName": "PM_MRK_STCX_L2_CYC", - "BriefDescription": "Cycles spent in the nest portion of a marked Stcx instruction. It starts counting when the operation starts to drain to the L2 and it stops counting when the instruction retires from the Instruction Completion Table (ICT) in the Instruction Sequencing Unit (ISU)." - }, { "EventCode": "0x10066", "EventName": "PM_ADJUNCT_CYC", diff --git a/tools/perf/pmu-events/arch/powerpc/power10/pipeline.json b/tools/perf/pmu-events/arch/powerpc/power10/pipeline.json index 449f57e8ba6af..799893c56f32b 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/pipeline.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/pipeline.json @@ -194,11 +194,6 @@ "EventName": "PM_TLBIE_FIN", "BriefDescription": "TLBIE instruction finished in the LSU. Two TLBIEs can finish each cycle. All will be counted." }, - { - "EventCode": "0x3D058", - "EventName": "PM_SCALAR_FSQRT_FDIV_ISSUE", - "BriefDescription": "Scalar versions of four floating point operations: fdiv,fsqrt (xvdivdp, xvdivsp, xvsqrtdp, xvsqrtsp)." - }, { "EventCode": "0x30066", "EventName": "PM_LSU_FIN", @@ -269,11 +264,6 @@ "EventName": "PM_IC_MISS_CMPL", "BriefDescription": "Non-speculative instruction cache miss, counted at completion." }, - { - "EventCode": "0x4D050", - "EventName": "PM_VSU_NON_FLOP_CMPL", - "BriefDescription": "Non-floating point VSU instructions completed." - }, { "EventCode": "0x4D052", "EventName": "PM_2FLOP_CMPL", diff --git a/tools/perf/pmu-events/arch/powerpc/power10/translation.json b/tools/perf/pmu-events/arch/powerpc/power10/translation.json index 3e47b804a0a8f..961e2491e73f6 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/translation.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/translation.json @@ -4,11 +4,6 @@ "EventName": "PM_MRK_START_PROBE_NOP_CMPL", "BriefDescription": "Marked Start probe nop (AND R0,R0,R0) completed." }, - { - "EventCode": "0x20016", - "EventName": "PM_ST_FIN", - "BriefDescription": "Store finish count. Includes speculative activity." - }, { "EventCode": "0x20018", "EventName": "PM_ST_FWD",
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kajol Jain kjain@linux.ibm.com
[ Upstream commit 4836b9a85ef148c7c9779b66fab3f7279e488d90 ]
Drop STORES_PER_INST metric event for the power10 platform, as the metric expression of STORES_PER_INST metric event using dropped event PM_ST_FIN.
Fixes: 3ca3af7d1f230d1f ("perf vendor events power10: Add metric events JSON file for power10 platform") Signed-off-by: Kajol Jain kjain@linux.ibm.com Cc: Athira Rajeev atrajeev@linux.vnet.ibm.com Cc: Disha Goel disgoel@linux.ibm.com Cc: Ian Rogers irogers@google.com Cc: Kajol Jain kjain@linux.ibm.com Cc: Madhavan Srinivasan maddy@linux.ibm.com Cc: Namhyung Kim namhyung@kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230814112803.1508296-3-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/pmu-events/arch/powerpc/power10/metrics.json | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/tools/perf/pmu-events/arch/powerpc/power10/metrics.json b/tools/perf/pmu-events/arch/powerpc/power10/metrics.json index 6f53583a0c62c..e3087eb1ccff8 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/metrics.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/metrics.json @@ -453,12 +453,6 @@ "MetricGroup": "General", "MetricName": "LOADS_PER_INST" }, - { - "BriefDescription": "Average number of finished stores per completed instruction", - "MetricExpr": "PM_ST_FIN / PM_RUN_INST_CMPL", - "MetricGroup": "General", - "MetricName": "STORES_PER_INST" - }, { "BriefDescription": "Percentage of demand loads that reloaded from beyond the L2 per completed instruction", "MetricExpr": "PM_DATA_FROM_L2MISS / PM_RUN_INST_CMPL * 100",
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kajol Jain kjain@linux.ibm.com
[ Upstream commit 7d473f475b2aff7e7c5d63b6f701c54590f84781 ]
Move some of the power10 JSON/events to appropriate files.
Fixes: 32daa5d7899e0343 ("perf vendor events: Initial JSON/events list for power10 platform") Signed-off-by: Kajol Jain kjain@linux.ibm.com Cc: Athira Rajeev atrajeev@linux.vnet.ibm.com Cc: Disha Goel disgoel@linux.ibm.com Cc: Ian Rogers irogers@google.com Cc: Kajol Jain kjain@linux.ibm.com Cc: Madhavan Srinivasan maddy@linux.ibm.com Cc: Namhyung Kim namhyung@kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230814112803.1508296-4-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../arch/powerpc/power10/cache.json | 45 ---- .../arch/powerpc/power10/floating_point.json | 67 +++++ .../arch/powerpc/power10/frontend.json | 180 ------------- .../arch/powerpc/power10/marked.json | 186 ++++++++++--- .../arch/powerpc/power10/memory.json | 85 ------ .../arch/powerpc/power10/others.json | 192 ++------------ .../arch/powerpc/power10/pipeline.json | 247 ++++++++++++++---- .../pmu-events/arch/powerpc/power10/pmc.json | 193 +++++++++++++- .../arch/powerpc/power10/translation.json | 35 --- 9 files changed, 616 insertions(+), 614 deletions(-) create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/floating_point.json
diff --git a/tools/perf/pmu-events/arch/powerpc/power10/cache.json b/tools/perf/pmu-events/arch/powerpc/power10/cache.json index 9cb929bb64afd..839ae26945fb2 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/cache.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/cache.json @@ -1,54 +1,9 @@ [ - { - "EventCode": "0x1003C", - "EventName": "PM_EXEC_STALL_DMISS_L2L3", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from either the local L2 or local L3." - }, - { - "EventCode": "0x1E054", - "EventName": "PM_EXEC_STALL_DMISS_L21_L31", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from another core's L2 or L3 on the same chip." - }, - { - "EventCode": "0x34054", - "EventName": "PM_EXEC_STALL_DMISS_L2L3_NOCONFLICT", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from the local L2 or local L3, without a dispatch conflict." - }, - { - "EventCode": "0x34056", - "EventName": "PM_EXEC_STALL_LOAD_FINISH", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was finishing a load after its data was reloaded from a data source beyond the local L1; cycles in which the LSU was processing an L1-hit; cycles in which the next-to-finish (NTF) instruction merged with another load in the LMQ; cycles in which the NTF instruction is waiting for a data reload for a load miss, but the data comes back with a non-NTF instruction." - }, - { - "EventCode": "0x3006C", - "EventName": "PM_RUN_CYC_SMT2_MODE", - "BriefDescription": "Cycles when this thread's run latch is set and the core is in SMT2 mode." - }, { "EventCode": "0x300F4", "EventName": "PM_RUN_INST_CMPL_CONC", "BriefDescription": "PowerPC instruction completed by this thread when all threads in the core had the run-latch set." }, - { - "EventCode": "0x4C016", - "EventName": "PM_EXEC_STALL_DMISS_L2L3_CONFLICT", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from the local L2 or local L3, with a dispatch conflict." - }, - { - "EventCode": "0x4D014", - "EventName": "PM_EXEC_STALL_LOAD", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was a load instruction executing in the Load Store Unit." - }, - { - "EventCode": "0x4D016", - "EventName": "PM_EXEC_STALL_PTESYNC", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was a PTESYNC instruction executing in the Load Store Unit." - }, - { - "EventCode": "0x401EA", - "EventName": "PM_THRESH_EXC_128", - "BriefDescription": "Threshold counter exceeded a value of 128." - }, { "EventCode": "0x400F6", "EventName": "PM_BR_MPRED_CMPL", diff --git a/tools/perf/pmu-events/arch/powerpc/power10/floating_point.json b/tools/perf/pmu-events/arch/powerpc/power10/floating_point.json new file mode 100644 index 0000000000000..e816cd10c1293 --- /dev/null +++ b/tools/perf/pmu-events/arch/powerpc/power10/floating_point.json @@ -0,0 +1,67 @@ +[ + { + "EventCode": "0x100F4", + "EventName": "PM_FLOP_CMPL", + "BriefDescription": "Floating Point Operations Completed. Includes any type. It counts once for each 1, 2, 4 or 8 flop instruction. Use PM_1|2|4|8_FLOP_CMPL events to count flops." + }, + { + "EventCode": "0x45050", + "EventName": "PM_1FLOP_CMPL", + "BriefDescription": "One floating point instruction completed (fadd, fmul, fsub, fcmp, fsel, fabs, fnabs, fres, fsqrte, fneg)." + }, + { + "EventCode": "0x45052", + "EventName": "PM_4FLOP_CMPL", + "BriefDescription": "Four floating point instruction completed (fadd, fmul, fsub, fcmp, fsel, fabs, fnabs, fres, fsqrte, fneg)." + }, + { + "EventCode": "0x45054", + "EventName": "PM_FMA_CMPL", + "BriefDescription": "Two floating point instruction completed (FMA class of instructions: fmadd, fnmadd, fmsub, fnmsub). Scalar instructions only." + }, + { + "EventCode": "0x45056", + "EventName": "PM_SCALAR_FLOP_CMPL", + "BriefDescription": "Scalar floating point instruction completed." + }, + { + "EventCode": "0x4505A", + "EventName": "PM_SP_FLOP_CMPL", + "BriefDescription": "Single Precision floating point instruction completed." + }, + { + "EventCode": "0x4505C", + "EventName": "PM_MATH_FLOP_CMPL", + "BriefDescription": "Math floating point instruction completed." + }, + { + "EventCode": "0x4D052", + "EventName": "PM_2FLOP_CMPL", + "BriefDescription": "Double Precision vector version of fmul, fsub, fcmp, fsel, fabs, fnabs, fres, fsqrte, fneg completed." + }, + { + "EventCode": "0x4D054", + "EventName": "PM_8FLOP_CMPL", + "BriefDescription": "Four Double Precision vector instruction completed." + }, + { + "EventCode": "0x4D056", + "EventName": "PM_NON_FMA_FLOP_CMPL", + "BriefDescription": "Non FMA instruction completed." + }, + { + "EventCode": "0x4D058", + "EventName": "PM_VECTOR_FLOP_CMPL", + "BriefDescription": "Vector floating point instruction completed." + }, + { + "EventCode": "0x4D05A", + "EventName": "PM_NON_MATH_FLOP_CMPL", + "BriefDescription": "Non Math instruction completed." + }, + { + "EventCode": "0x4D05C", + "EventName": "PM_DPP_FLOP_CMPL", + "BriefDescription": "Double-Precision or Quad-Precision instruction completed." + } +] diff --git a/tools/perf/pmu-events/arch/powerpc/power10/frontend.json b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json index 61e9e0222c873..dc0bb6c6338bf 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/frontend.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json @@ -1,64 +1,9 @@ [ - { - "EventCode": "0x10004", - "EventName": "PM_EXEC_STALL_TRANSLATION", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline suffered a TLB miss or ERAT miss and waited for it to resolve." - }, - { - "EventCode": "0x10006", - "EventName": "PM_DISP_STALL_HELD_OTHER_CYC", - "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch for any other reason." - }, - { - "EventCode": "0x10010", - "EventName": "PM_PMC4_OVERFLOW", - "BriefDescription": "The event selected for PMC4 caused the event counter to overflow." - }, - { - "EventCode": "0x10020", - "EventName": "PM_PMC4_REWIND", - "BriefDescription": "The speculative event selected for PMC4 rewinds and the counter for PMC4 is not charged." - }, - { - "EventCode": "0x10038", - "EventName": "PM_DISP_STALL_TRANSLATION", - "BriefDescription": "Cycles when dispatch was stalled for this thread because the MMU was handling a translation miss." - }, - { - "EventCode": "0x1003A", - "EventName": "PM_DISP_STALL_BR_MPRED_IC_L2", - "BriefDescription": "Cycles when dispatch was stalled while the instruction was fetched from the local L2 after suffering a branch mispredict." - }, - { - "EventCode": "0x1D05E", - "EventName": "PM_DISP_STALL_HELD_HALT_CYC", - "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch because of power management." - }, - { - "EventCode": "0x1E050", - "EventName": "PM_DISP_STALL_HELD_STF_MAPPER_CYC", - "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch because the STF mapper/SRB was full. Includes GPR (count, link, tar), VSR, VMR, FPR." - }, { "EventCode": "0x1F054", "EventName": "PM_DTLB_HIT", "BriefDescription": "The PTE required by the instruction was resident in the TLB (data TLB access). When MMCR1[16]=0 this event counts only demand hits. When MMCR1[16]=1 this event includes demand and prefetch. Applies to both HPT and RPT." }, - { - "EventCode": "0x10064", - "EventName": "PM_DISP_STALL_IC_L2", - "BriefDescription": "Cycles when dispatch was stalled while the instruction was fetched from the local L2." - }, - { - "EventCode": "0x101E8", - "EventName": "PM_THRESH_EXC_256", - "BriefDescription": "Threshold counter exceeded a count of 256." - }, - { - "EventCode": "0x101EC", - "EventName": "PM_THRESH_MET", - "BriefDescription": "Threshold exceeded." - }, { "EventCode": "0x100F2", "EventName": "PM_1PLUS_PPC_CMPL", @@ -69,56 +14,6 @@ "EventName": "PM_IERAT_MISS", "BriefDescription": "IERAT Reloaded to satisfy an IERAT miss. All page sizes are counted by this event. This event only counts instruction demand access." }, - { - "EventCode": "0x100F8", - "EventName": "PM_DISP_STALL_CYC", - "BriefDescription": "Cycles the ICT has no itags assigned to this thread (no instructions were dispatched during these cycles)." - }, - { - "EventCode": "0x20006", - "EventName": "PM_DISP_STALL_HELD_ISSQ_FULL_CYC", - "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch due to Issue queue full. Includes issue queue and branch queue." - }, - { - "EventCode": "0x20114", - "EventName": "PM_MRK_L2_RC_DISP", - "BriefDescription": "Marked instruction RC dispatched in L2." - }, - { - "EventCode": "0x2C010", - "EventName": "PM_EXEC_STALL_LSU", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was executing in the Load Store Unit. This does not include simple fixed point instructions." - }, - { - "EventCode": "0x2C016", - "EventName": "PM_DISP_STALL_IERAT_ONLY_MISS", - "BriefDescription": "Cycles when dispatch was stalled while waiting to resolve an instruction ERAT miss." - }, - { - "EventCode": "0x2C01E", - "EventName": "PM_DISP_STALL_BR_MPRED_IC_L3", - "BriefDescription": "Cycles when dispatch was stalled while the instruction was fetched from the local L3 after suffering a branch mispredict." - }, - { - "EventCode": "0x2D01A", - "EventName": "PM_DISP_STALL_IC_MISS", - "BriefDescription": "Cycles when dispatch was stalled for this thread due to an instruction cache miss." - }, - { - "EventCode": "0x2E018", - "EventName": "PM_DISP_STALL_FETCH", - "BriefDescription": "Cycles when dispatch was stalled for this thread because Fetch was being held." - }, - { - "EventCode": "0x2E01A", - "EventName": "PM_DISP_STALL_HELD_XVFC_MAPPER_CYC", - "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch because the XVFC mapper/SRB was full." - }, - { - "EventCode": "0x2C142", - "EventName": "PM_MRK_XFER_FROM_SRC_PMC2", - "BriefDescription": "For a marked data transfer instruction, the processor's L1 data cache was reloaded from the source specified in MMCR3[15:27]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads." - }, { "EventCode": "0x24050", "EventName": "PM_IOPS_DISP", @@ -134,11 +29,6 @@ "EventName": "PM_BR_TAKEN_CMPL", "BriefDescription": "Branch Taken instruction completed." }, - { - "EventCode": "0x30004", - "EventName": "PM_DISP_STALL_FLUSH", - "BriefDescription": "Cycles when dispatch was stalled because of a flush that happened to an instruction(s) that was not yet next-to-complete (NTC). PM_EXEC_STALL_NTC_FLUSH only includes instructions that were flushed after becoming NTC." - }, { "EventCode": "0x3000A", "EventName": "PM_DISP_STALL_ITLB_MISS", @@ -149,56 +39,16 @@ "EventName": "PM_FLUSH_COMPLETION", "BriefDescription": "The instruction that was next to complete (oldest in the pipeline) did not complete because it suffered a flush." }, - { - "EventCode": "0x30014", - "EventName": "PM_EXEC_STALL_STORE", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was a store instruction executing in the Load Store Unit." - }, - { - "EventCode": "0x30018", - "EventName": "PM_DISP_STALL_HELD_SCOREBOARD_CYC", - "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch while waiting on the Scoreboard. This event combines VSCR and FPSCR together." - }, - { - "EventCode": "0x30026", - "EventName": "PM_EXEC_STALL_STORE_MISS", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was a store whose cache line was not resident in the L1 and was waiting for allocation of the missing line into the L1." - }, - { - "EventCode": "0x3012A", - "EventName": "PM_MRK_L2_RC_DONE", - "BriefDescription": "L2 RC machine completed the transaction for the marked instruction." - }, { "EventCode": "0x3F046", "EventName": "PM_ITLB_HIT_1G", "BriefDescription": "Instruction TLB hit (IERAT reload) page size 1G, which implies Radix Page Table translation is in use. When MMCR1[17]=0 this event counts only for demand misses. When MMCR1[17]=1 this event includes demand misses and prefetches." }, - { - "EventCode": "0x34058", - "EventName": "PM_DISP_STALL_BR_MPRED_ICMISS", - "BriefDescription": "Cycles when dispatch was stalled after a mispredicted branch resulted in an instruction cache miss." - }, - { - "EventCode": "0x3D05C", - "EventName": "PM_DISP_STALL_HELD_RENAME_CYC", - "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch because the mapper/SRB was full. Includes GPR (count, link, tar), VSR, VMR, FPR and XVFC." - }, - { - "EventCode": "0x3E052", - "EventName": "PM_DISP_STALL_IC_L3", - "BriefDescription": "Cycles when dispatch was stalled while the instruction was fetched from the local L3." - }, { "EventCode": "0x3E054", "EventName": "PM_LD_MISS_L1", "BriefDescription": "Load missed L1, counted at finish time. LMQ merges are not included in this count. i.e. if a load instruction misses on an address that is already allocated on the LMQ, this event will not increment for that load). Note that this count is per slice, so if a load spans multiple slices this event will increment multiple times for a single load." }, - { - "EventCode": "0x301EA", - "EventName": "PM_THRESH_EXC_1024", - "BriefDescription": "Threshold counter exceeded a value of 1024." - }, { "EventCode": "0x300FA", "EventName": "PM_INST_FROM_L3MISS", @@ -209,36 +59,6 @@ "EventName": "PM_ISSUE_KILL", "BriefDescription": "Cycles in which an instruction or group of instructions were cancelled after being issued. This event increments once per occurrence, regardless of how many instructions are included in the issue group." }, - { - "EventCode": "0x40116", - "EventName": "PM_MRK_LARX_FIN", - "BriefDescription": "Marked load and reserve instruction (LARX) finished. LARX and STCX are instructions used to acquire a lock." - }, - { - "EventCode": "0x4C010", - "EventName": "PM_DISP_STALL_BR_MPRED_IC_L3MISS", - "BriefDescription": "Cycles when dispatch was stalled while the instruction was fetched from sources beyond the local L3 after suffering a mispredicted branch." - }, - { - "EventCode": "0x4D01E", - "EventName": "PM_DISP_STALL_BR_MPRED", - "BriefDescription": "Cycles when dispatch was stalled for this thread due to a mispredicted branch." - }, - { - "EventCode": "0x4E010", - "EventName": "PM_DISP_STALL_IC_L3MISS", - "BriefDescription": "Cycles when dispatch was stalled while the instruction was fetched from any source beyond the local L3." - }, - { - "EventCode": "0x4E01A", - "EventName": "PM_DISP_STALL_HELD_CYC", - "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch for any reason." - }, - { - "EventCode": "0x4003C", - "EventName": "PM_DISP_STALL_HELD_SYNC_CYC", - "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch because of a synchronizing instruction that requires the ICT to be empty before dispatch." - }, { "EventCode": "0x44056", "EventName": "PM_VECTOR_ST_CMPL", diff --git a/tools/perf/pmu-events/arch/powerpc/power10/marked.json b/tools/perf/pmu-events/arch/powerpc/power10/marked.json index f2436fc5537ce..913b6515b8701 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/marked.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/marked.json @@ -1,14 +1,29 @@ [ - { - "EventCode": "0x1002C", - "EventName": "PM_LD_PREFETCH_CACHE_LINE_MISS", - "BriefDescription": "The L1 cache was reloaded with a line that fulfills a prefetch request." - }, { "EventCode": "0x10132", "EventName": "PM_MRK_INST_ISSUED", "BriefDescription": "Marked instruction issued. Note that stores always get issued twice, the address gets issued to the LSU and the data gets issued to the VSU. Also, issues can sometimes get killed/cancelled and cause multiple sequential issues for the same instruction." }, + { + "EventCode": "0x10134", + "EventName": "PM_MRK_ST_DONE_L2", + "BriefDescription": "Marked store completed in L2." + }, + { + "EventCode": "0x1C142", + "EventName": "PM_MRK_XFER_FROM_SRC_PMC1", + "BriefDescription": "For a marked data transfer instruction, the processor's L1 data cache was reloaded from the source specified in MMCR3[0:12]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads." + }, + { + "EventCode": "0x1C144", + "EventName": "PM_MRK_XFER_FROM_SRC_CYC_PMC1", + "BriefDescription": "Cycles taken for a marked demand miss to reload a line from the source specified in MMCR3[0:12]." + }, + { + "EventCode": "0x1F150", + "EventName": "PM_MRK_ST_L2_CYC", + "BriefDescription": "Cycles from L2 RC dispatch to L2 RC completion." + }, { "EventCode": "0x101E0", "EventName": "PM_MRK_INST_DISP", @@ -20,9 +35,39 @@ "BriefDescription": "Marked Branch Taken instruction completed." }, { - "EventCode": "0x2C01C", - "EventName": "PM_EXEC_STALL_DMISS_OFF_CHIP", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from a remote chip." + "EventCode": "0x101E4", + "EventName": "PM_MRK_L1_ICACHE_MISS", + "BriefDescription": "Marked instruction suffered an instruction cache miss." + }, + { + "EventCode": "0x101EA", + "EventName": "PM_MRK_L1_RELOAD_VALID", + "BriefDescription": "Marked demand reload." + }, + { + "EventCode": "0x20114", + "EventName": "PM_MRK_L2_RC_DISP", + "BriefDescription": "Marked instruction RC dispatched in L2." + }, + { + "EventCode": "0x2011C", + "EventName": "PM_MRK_NTF_CYC", + "BriefDescription": "Cycles in which the marked instruction is the oldest in the pipeline (next-to-finish or next-to-complete)." + }, + { + "EventCode": "0x20130", + "EventName": "PM_MRK_INST_DECODED", + "BriefDescription": "An instruction was marked at decode time. Random Instruction Sampling (RIS) only." + }, + { + "EventCode": "0x20132", + "EventName": "PM_MRK_DFU_ISSUE", + "BriefDescription": "The marked instruction was a decimal floating point operation issued to the VSU. Measured at issue time." + }, + { + "EventCode": "0x20134", + "EventName": "PM_MRK_FXU_ISSUE", + "BriefDescription": "The marked instruction was a fixed point operation issued to the VSU. Measured at issue time." }, { "EventCode": "0x20138", @@ -34,6 +79,16 @@ "EventName": "PM_MRK_BRU_FIN", "BriefDescription": "Marked Branch instruction finished." }, + { + "EventCode": "0x2013C", + "EventName": "PM_MRK_FX_LSU_FIN", + "BriefDescription": "The marked instruction was simple fixed point that was issued to the store unit. Measured at finish time." + }, + { + "EventCode": "0x2C142", + "EventName": "PM_MRK_XFER_FROM_SRC_PMC2", + "BriefDescription": "For a marked data transfer instruction, the processor's L1 data cache was reloaded from the source specified in MMCR3[15:27]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads." + }, { "EventCode": "0x2C144", "EventName": "PM_MRK_XFER_FROM_SRC_CYC_PMC2", @@ -55,15 +110,50 @@ "BriefDescription": "A marked branch completed. All branches are included." }, { - "EventCode": "0x200FD", - "EventName": "PM_L1_ICACHE_MISS", - "BriefDescription": "Demand instruction cache miss." + "EventCode": "0x2D154", + "EventName": "PM_MRK_DERAT_MISS_64K", + "BriefDescription": "Data ERAT Miss (Data TLB Access) page size 64K for a marked instruction. When MMCR1[16]=0 this event counts only DERAT reloads for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches." + }, + { + "EventCode": "0x201E0", + "EventName": "PM_MRK_DATA_FROM_MEMORY", + "BriefDescription": "The processor's data cache was reloaded from local, remote, or distant memory due to a demand miss for a marked load." + }, + { + "EventCode": "0x201E2", + "EventName": "PM_MRK_LD_MISS_L1", + "BriefDescription": "Marked demand data load miss counted at finish time." + }, + { + "EventCode": "0x201E4", + "EventName": "PM_MRK_DATA_FROM_L3MISS", + "BriefDescription": "The processor's data cache was reloaded from a source other than the local core's L1, L2, or L3 due to a demand miss for a marked load." + }, + { + "EventCode": "0x3012A", + "EventName": "PM_MRK_L2_RC_DONE", + "BriefDescription": "L2 RC machine completed the transaction for the marked instruction." + }, + { + "EventCode": "0x30132", + "EventName": "PM_MRK_VSU_FIN", + "BriefDescription": "VSU marked instruction finished. Excludes simple FX instructions issued to the Store Unit." }, { "EventCode": "0x34146", "EventName": "PM_MRK_LD_CMPL", "BriefDescription": "Marked load instruction completed." }, + { + "EventCode": "0x3C142", + "EventName": "PM_MRK_XFER_FROM_SRC_PMC3", + "BriefDescription": "For a marked data transfer instruction, the processor's L1 data cache was reloaded from the source specified in MMCR3[30:42]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads." + }, + { + "EventCode": "0x3C144", + "EventName": "PM_MRK_XFER_FROM_SRC_CYC_PMC3", + "BriefDescription": "Cycles taken for a marked demand miss to reload a line from the source specified in MMCR3[30:42]." + }, { "EventCode": "0x3E158", "EventName": "PM_MRK_STCX_FAIL", @@ -75,9 +165,19 @@ "BriefDescription": "Marked store instruction finished." }, { - "EventCode": "0x30068", - "EventName": "PM_L1_ICACHE_RELOADED_PREF", - "BriefDescription": "Counts all instruction cache prefetch reloads (includes demand turned into prefetch)." + "EventCode": "0x3F150", + "EventName": "PM_MRK_ST_DRAIN_CYC", + "BriefDescription": "Cycles in which the marked store drained from the core to the L2." + }, + { + "EventCode": "0x30162", + "EventName": "PM_MRK_ISSUE_DEPENDENT_LOAD", + "BriefDescription": "The marked instruction was dependent on a load. It is eligible for issue kill." + }, + { + "EventCode": "0x301E2", + "EventName": "PM_MRK_ST_CMPL", + "BriefDescription": "Marked store completed and sent to nest. Note that this count excludes cache-inhibited stores." }, { "EventCode": "0x301E4", @@ -85,39 +185,44 @@ "BriefDescription": "Marked Branch Mispredicted. Includes direction and target." }, { - "EventCode": "0x300F6", - "EventName": "PM_LD_DEMAND_MISS_L1", - "BriefDescription": "The L1 cache was reloaded with a line that fulfills a demand miss request. Counted at reload time, before finish." + "EventCode": "0x40116", + "EventName": "PM_MRK_LARX_FIN", + "BriefDescription": "Marked load and reserve instruction (LARX) finished. LARX and STCX are instructions used to acquire a lock." + }, + { + "EventCode": "0x40132", + "EventName": "PM_MRK_LSU_FIN", + "BriefDescription": "LSU marked instruction finish." }, { - "EventCode": "0x300FE", - "EventName": "PM_DATA_FROM_L3MISS", - "BriefDescription": "The processor's L1 data cache was reloaded from beyond the local core's L3 due to a demand miss." + "EventCode": "0x44146", + "EventName": "PM_MRK_STCX_CORE_CYC", + "BriefDescription": "Cycles spent in the core portion of a marked STCX instruction. It starts counting when the instruction is decoded and stops counting when it drains into the L2." }, { - "EventCode": "0x40012", - "EventName": "PM_L1_ICACHE_RELOADED_ALL", - "BriefDescription": "Counts all instruction cache reloads includes demand, prefetch, prefetch turned into demand and demand turned into prefetch." + "EventCode": "0x4C142", + "EventName": "PM_MRK_XFER_FROM_SRC_PMC4", + "BriefDescription": "For a marked data transfer instruction, the processor's L1 data cache was reloaded from the source specified in MMCR3[45:57]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads." }, { - "EventCode": "0x40134", - "EventName": "PM_MRK_INST_TIMEO", - "BriefDescription": "Marked instruction finish timeout (instruction was lost)." + "EventCode": "0x4C144", + "EventName": "PM_MRK_XFER_FROM_SRC_CYC_PMC4", + "BriefDescription": "Cycles taken for a marked demand miss to reload a line from the source specified in MMCR3[45:57]." }, { - "EventCode": "0x4505A", - "EventName": "PM_SP_FLOP_CMPL", - "BriefDescription": "Single Precision floating point instruction completed." + "EventCode": "0x4C15E", + "EventName": "PM_MRK_DTLB_MISS_64K", + "BriefDescription": "Marked Data TLB reload (after a miss) page size 64K. When MMCR1[16]=0 this event counts only for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches." }, { - "EventCode": "0x4D058", - "EventName": "PM_VECTOR_FLOP_CMPL", - "BriefDescription": "Vector floating point instruction completed." + "EventCode": "0x4E15E", + "EventName": "PM_MRK_INST_FLUSHED", + "BriefDescription": "The marked instruction was flushed." }, { - "EventCode": "0x4D05A", - "EventName": "PM_NON_MATH_FLOP_CMPL", - "BriefDescription": "Non Math instruction completed." + "EventCode": "0x40164", + "EventName": "PM_MRK_DERAT_MISS_2M", + "BriefDescription": "Data ERAT Miss (Data TLB Access) page size 2M for a marked instruction. When MMCR1[16]=0 this event counts only DERAT reloads for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches." }, { "EventCode": "0x401E0", @@ -125,8 +230,13 @@ "BriefDescription": "Marked instruction completed." }, { - "EventCode": "0x400FE", - "EventName": "PM_DATA_FROM_MEMORY", - "BriefDescription": "The processor's data cache was reloaded from local, remote, or distant memory due to a demand miss." + "EventCode": "0x401E6", + "EventName": "PM_MRK_INST_FROM_L3MISS", + "BriefDescription": "The processor's instruction cache was reloaded from beyond the local core's L3 due to a demand miss for a marked instruction." + }, + { + "EventCode": "0x401E8", + "EventName": "PM_MRK_DATA_FROM_L2MISS", + "BriefDescription": "The processor's L1 data cache was reloaded from a source beyond the local core's L2 due to a demand miss for a marked instruction." } ] diff --git a/tools/perf/pmu-events/arch/powerpc/power10/memory.json b/tools/perf/pmu-events/arch/powerpc/power10/memory.json index c4c10ca98cad7..b95a547a704b3 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/memory.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/memory.json @@ -1,24 +1,9 @@ [ - { - "EventCode": "0x1000A", - "EventName": "PM_PMC3_REWIND", - "BriefDescription": "The speculative event selected for PMC3 rewinds and the counter for PMC3 is not charged." - }, { "EventCode": "0x1C040", "EventName": "PM_XFER_FROM_SRC_PMC1", "BriefDescription": "The processor's L1 data cache was reloaded from the source specified in MMCR3[0:12]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads." }, - { - "EventCode": "0x1C142", - "EventName": "PM_MRK_XFER_FROM_SRC_PMC1", - "BriefDescription": "For a marked data transfer instruction, the processor's L1 data cache was reloaded from the source specified in MMCR3[0:12]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads." - }, - { - "EventCode": "0x1C144", - "EventName": "PM_MRK_XFER_FROM_SRC_CYC_PMC1", - "BriefDescription": "Cycles taken for a marked demand miss to reload a line from the source specified in MMCR3[0:12]." - }, { "EventCode": "0x1C056", "EventName": "PM_DERAT_MISS_4K", @@ -34,26 +19,11 @@ "EventName": "PM_DTLB_MISS_2M", "BriefDescription": "Data TLB reload (after a miss) page size 2M. Implies radix translation was used. When MMCR1[16]=0 this event counts only for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches." }, - { - "EventCode": "0x1E056", - "EventName": "PM_EXEC_STALL_STORE_PIPE", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was executing in the store unit. This does not include cycles spent handling store misses, PTESYNC instructions or TLBIE instructions." - }, - { - "EventCode": "0x1F150", - "EventName": "PM_MRK_ST_L2_CYC", - "BriefDescription": "Cycles from L2 RC dispatch to L2 RC completion." - }, { "EventCode": "0x10062", "EventName": "PM_LD_L3MISS_PEND_CYC", "BriefDescription": "Cycles in which an L3 miss was pending for this thread." }, - { - "EventCode": "0x20010", - "EventName": "PM_PMC1_OVERFLOW", - "BriefDescription": "The event selected for PMC1 caused the event counter to overflow." - }, { "EventCode": "0x2001A", "EventName": "PM_ITLB_HIT", @@ -79,36 +49,16 @@ "EventName": "PM_DTLB_MISS_4K", "BriefDescription": "Data TLB reload (after a miss) page size 4K. When MMCR1[16]=0 this event counts only for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches." }, - { - "EventCode": "0x2D154", - "EventName": "PM_MRK_DERAT_MISS_64K", - "BriefDescription": "Data ERAT Miss (Data TLB Access) page size 64K for a marked instruction. When MMCR1[16]=0 this event counts only DERAT reloads for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches." - }, { "EventCode": "0x200F6", "EventName": "PM_DERAT_MISS", "BriefDescription": "DERAT Reloaded to satisfy a DERAT miss. All page sizes are counted by this event. When MMCR1[16]=0 this event counts only DERAT reloads for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches." }, - { - "EventCode": "0x30016", - "EventName": "PM_EXEC_STALL_DERAT_DTLB_MISS", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline suffered a TLB miss and waited for it resolve." - }, { "EventCode": "0x3C040", "EventName": "PM_XFER_FROM_SRC_PMC3", "BriefDescription": "The processor's L1 data cache was reloaded from the source specified in MMCR3[30:42]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads." }, - { - "EventCode": "0x3C142", - "EventName": "PM_MRK_XFER_FROM_SRC_PMC3", - "BriefDescription": "For a marked data transfer instruction, the processor's L1 data cache was reloaded from the source specified in MMCR3[30:42]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads." - }, - { - "EventCode": "0x3C144", - "EventName": "PM_MRK_XFER_FROM_SRC_CYC_PMC3", - "BriefDescription": "Cycles taken for a marked demand miss to reload a line from the source specified in MMCR3[30:42]." - }, { "EventCode": "0x3C054", "EventName": "PM_DERAT_MISS_16M", @@ -124,21 +74,11 @@ "EventName": "PM_LARX_FIN", "BriefDescription": "Load and reserve instruction (LARX) finished. LARX and STCX are instructions used to acquire a lock." }, - { - "EventCode": "0x301E2", - "EventName": "PM_MRK_ST_CMPL", - "BriefDescription": "Marked store completed and sent to nest. Note that this count excludes cache-inhibited stores." - }, { "EventCode": "0x300FC", "EventName": "PM_DTLB_MISS", "BriefDescription": "The DPTEG required for the load/store instruction in execution was missing from the TLB. This event only counts for demand misses." }, - { - "EventCode": "0x4D02C", - "EventName": "PM_PMC1_REWIND", - "BriefDescription": "The speculative event selected for PMC1 rewinds and the counter for PMC1 is not charged." - }, { "EventCode": "0x4003E", "EventName": "PM_LD_CMPL", @@ -149,16 +89,6 @@ "EventName": "PM_XFER_FROM_SRC_PMC4", "BriefDescription": "The processor's L1 data cache was reloaded from the source specified in MMCR3[45:57]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads." }, - { - "EventCode": "0x4C142", - "EventName": "PM_MRK_XFER_FROM_SRC_PMC4", - "BriefDescription": "For a marked data transfer instruction, the processor's L1 data cache was reloaded from the source specified in MMCR3[45:57]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads." - }, - { - "EventCode": "0x4C144", - "EventName": "PM_MRK_XFER_FROM_SRC_CYC_PMC4", - "BriefDescription": "Cycles taken for a marked demand miss to reload a line from the source specified in MMCR3[45:57]." - }, { "EventCode": "0x4C056", "EventName": "PM_DTLB_MISS_16M", @@ -168,20 +98,5 @@ "EventCode": "0x4C05A", "EventName": "PM_DTLB_MISS_1G", "BriefDescription": "Data TLB reload (after a miss) page size 1G. Implies radix translation was used. When MMCR1[16]=0 this event counts only for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches." - }, - { - "EventCode": "0x4C15E", - "EventName": "PM_MRK_DTLB_MISS_64K", - "BriefDescription": "Marked Data TLB reload (after a miss) page size 64K. When MMCR1[16]=0 this event counts only for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches." - }, - { - "EventCode": "0x4D056", - "EventName": "PM_NON_FMA_FLOP_CMPL", - "BriefDescription": "Non FMA instruction completed." - }, - { - "EventCode": "0x40164", - "EventName": "PM_MRK_DERAT_MISS_2M", - "BriefDescription": "Data ERAT Miss (Data TLB Access) page size 2M for a marked instruction. When MMCR1[16]=0 this event counts only DERAT reloads for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches." } ] diff --git a/tools/perf/pmu-events/arch/powerpc/power10/others.json b/tools/perf/pmu-events/arch/powerpc/power10/others.json index 17c5424ef1ac1..f09c00c89322e 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/others.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/others.json @@ -1,23 +1,8 @@ [ { - "EventCode": "0x10016", - "EventName": "PM_VSU0_ISSUE", - "BriefDescription": "VSU instruction issued to VSU pipe 0." - }, - { - "EventCode": "0x1001C", - "EventName": "PM_ULTRAVISOR_INST_CMPL", - "BriefDescription": "PowerPC instruction completed while the thread was in ultravisor state." - }, - { - "EventCode": "0x100F0", - "EventName": "PM_CYC", - "BriefDescription": "Processor cycles." - }, - { - "EventCode": "0x10134", - "EventName": "PM_MRK_ST_DONE_L2", - "BriefDescription": "Marked store completed in L2." + "EventCode": "0x1002C", + "EventName": "PM_LD_PREFETCH_CACHE_LINE_MISS", + "BriefDescription": "The L1 cache was reloaded with a line that fulfills a prefetch request." }, { "EventCode": "0x1505E", @@ -34,36 +19,11 @@ "EventName": "PM_ADJUNCT_CYC", "BriefDescription": "Cycles in which the thread is in Adjunct state. MSR[S HV PR] bits = 011." }, - { - "EventCode": "0x101E4", - "EventName": "PM_MRK_L1_ICACHE_MISS", - "BriefDescription": "Marked instruction suffered an instruction cache miss." - }, - { - "EventCode": "0x101EA", - "EventName": "PM_MRK_L1_RELOAD_VALID", - "BriefDescription": "Marked demand reload." - }, - { - "EventCode": "0x100F4", - "EventName": "PM_FLOP_CMPL", - "BriefDescription": "Floating Point Operations Completed. Includes any type. It counts once for each 1, 2, 4 or 8 flop instruction. Use PM_1|2|4|8_FLOP_CMPL events to count flops." - }, - { - "EventCode": "0x100FA", - "EventName": "PM_RUN_LATCH_ANY_THREAD_CYC", - "BriefDescription": "Cycles when at least one thread has the run latch set." - }, { "EventCode": "0x100FC", "EventName": "PM_LD_REF_L1", "BriefDescription": "All L1 D cache load references counted at finish, gated by reject. In P9 and earlier this event counted only cacheable loads but in P10 both cacheable and non-cacheable loads are included." }, - { - "EventCode": "0x2000C", - "EventName": "PM_RUN_LATCH_ALL_THREADS_CYC", - "BriefDescription": "Cycles when the run latch is set for all threads." - }, { "EventCode": "0x2E010", "EventName": "PM_ADJUNCT_INST_CMPL", @@ -74,26 +34,6 @@ "EventName": "PM_STCX_FIN", "BriefDescription": "Conditional store instruction (STCX) finished. LARX and STCX are instructions used to acquire a lock." }, - { - "EventCode": "0x20130", - "EventName": "PM_MRK_INST_DECODED", - "BriefDescription": "An instruction was marked at decode time. Random Instruction Sampling (RIS) only." - }, - { - "EventCode": "0x20132", - "EventName": "PM_MRK_DFU_ISSUE", - "BriefDescription": "The marked instruction was a decimal floating point operation issued to the VSU. Measured at issue time." - }, - { - "EventCode": "0x20134", - "EventName": "PM_MRK_FXU_ISSUE", - "BriefDescription": "The marked instruction was a fixed point operation issued to the VSU. Measured at issue time." - }, - { - "EventCode": "0x2505C", - "EventName": "PM_VSU_ISSUE", - "BriefDescription": "At least one VSU instruction was issued to one of the VSU pipes. Up to 4 per cycle. Includes fixed point operations." - }, { "EventCode": "0x2F054", "EventName": "PM_DISP_SS1_2_INSTR_CYC", @@ -104,40 +44,15 @@ "EventName": "PM_DISP_SS1_4_INSTR_CYC", "BriefDescription": "Cycles in which Superslice 1 dispatches either 3 or 4 instructions." }, - { - "EventCode": "0x2006C", - "EventName": "PM_RUN_CYC_SMT4_MODE", - "BriefDescription": "Cycles when this thread's run latch is set and the core is in SMT4 mode." - }, - { - "EventCode": "0x201E0", - "EventName": "PM_MRK_DATA_FROM_MEMORY", - "BriefDescription": "The processor's data cache was reloaded from local, remote, or distant memory due to a demand miss for a marked load." - }, - { - "EventCode": "0x201E4", - "EventName": "PM_MRK_DATA_FROM_L3MISS", - "BriefDescription": "The processor's L1 data cache was reloaded from beyond the local core's L3 due to a demand miss for a marked instruction." - }, - { - "EventCode": "0x201E8", - "EventName": "PM_THRESH_EXC_512", - "BriefDescription": "Threshold counter exceeded a value of 512." - }, { "EventCode": "0x200F2", "EventName": "PM_INST_DISP", "BriefDescription": "PowerPC instruction dispatched." }, { - "EventCode": "0x30132", - "EventName": "PM_MRK_VSU_FIN", - "BriefDescription": "VSU marked instruction finished. Excludes simple FX instructions issued to the Store Unit." - }, - { - "EventCode": "0x30038", - "EventName": "PM_EXEC_STALL_DMISS_LMEM", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from the local memory, local OpenCAPI cache, or local OpenCAPI memory." + "EventCode": "0x200FD", + "EventName": "PM_L1_ICACHE_MISS", + "BriefDescription": "Demand instruction cache miss." }, { "EventCode": "0x3F04A", @@ -149,11 +64,6 @@ "EventName": "PM_PRIVILEGED_INST_CMPL", "BriefDescription": "PowerPC instruction completed while the thread was in Privileged state." }, - { - "EventCode": "0x3F150", - "EventName": "PM_MRK_ST_DRAIN_CYC", - "BriefDescription": "Cycles in which the marked store drained from the core to the L2." - }, { "EventCode": "0x3F054", "EventName": "PM_DISP_SS0_4_INSTR_CYC", @@ -165,103 +75,43 @@ "BriefDescription": "Cycles in which Superslice 0 dispatches either 5, 6, 7 or 8 instructions." }, { - "EventCode": "0x30162", - "EventName": "PM_MRK_ISSUE_DEPENDENT_LOAD", - "BriefDescription": "The marked instruction was dependent on a load. It is eligible for issue kill." - }, - { - "EventCode": "0x40114", - "EventName": "PM_MRK_START_PROBE_NOP_DISP", - "BriefDescription": "Marked Start probe nop dispatched. Instruction AND R0,R0,R0." - }, - { - "EventCode": "0x4001C", - "EventName": "PM_VSU_FIN", - "BriefDescription": "VSU instruction finished." - }, - { - "EventCode": "0x4C01A", - "EventName": "PM_EXEC_STALL_DMISS_OFF_NODE", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from a distant chip." - }, - { - "EventCode": "0x4D012", - "EventName": "PM_PMC3_SAVED", - "BriefDescription": "The conditions for the speculative event selected for PMC3 are met and PMC3 is charged." - }, - { - "EventCode": "0x4D022", - "EventName": "PM_HYPERVISOR_INST_CMPL", - "BriefDescription": "PowerPC instruction completed while the thread was in hypervisor state." - }, - { - "EventCode": "0x4D026", - "EventName": "PM_ULTRAVISOR_CYC", - "BriefDescription": "Cycles when the thread is in Ultravisor state. MSR[S HV PR]=110." + "EventCode": "0x30068", + "EventName": "PM_L1_ICACHE_RELOADED_PREF", + "BriefDescription": "Counts all instruction cache prefetch reloads (includes demand turned into prefetch)." }, { - "EventCode": "0x4D028", - "EventName": "PM_PRIVILEGED_CYC", - "BriefDescription": "Cycles when the thread is in Privileged state. MSR[S HV PR]=x00." + "EventCode": "0x300F6", + "EventName": "PM_LD_DEMAND_MISS_L1", + "BriefDescription": "The L1 cache was reloaded with a line that fulfills a demand miss request. Counted at reload time, before finish." }, { - "EventCode": "0x40030", - "EventName": "PM_INST_FIN", - "BriefDescription": "Instruction finished." + "EventCode": "0x300FE", + "EventName": "PM_DATA_FROM_L3MISS", + "BriefDescription": "The processor's data cache was reloaded from a source other than the local core's L1, L2, or L3 due to a demand miss." }, { - "EventCode": "0x44146", - "EventName": "PM_MRK_STCX_CORE_CYC", - "BriefDescription": "Cycles spent in the core portion of a marked STCX instruction. It starts counting when the instruction is decoded and stops counting when it drains into the L2." + "EventCode": "0x40012", + "EventName": "PM_L1_ICACHE_RELOADED_ALL", + "BriefDescription": "Counts all instruction cache reloads includes demand, prefetch, prefetch turned into demand and demand turned into prefetch." }, { "EventCode": "0x44054", "EventName": "PM_VECTOR_LD_CMPL", "BriefDescription": "Vector load instruction completed." }, - { - "EventCode": "0x45054", - "EventName": "PM_FMA_CMPL", - "BriefDescription": "Two floating point instruction completed (FMA class of instructions: fmadd, fnmadd, fmsub, fnmsub). Scalar instructions only." - }, - { - "EventCode": "0x45056", - "EventName": "PM_SCALAR_FLOP_CMPL", - "BriefDescription": "Scalar floating point instruction completed." - }, - { - "EventCode": "0x4505C", - "EventName": "PM_MATH_FLOP_CMPL", - "BriefDescription": "Math floating point instruction completed." - }, { "EventCode": "0x4D05E", "EventName": "PM_BR_CMPL", "BriefDescription": "A branch completed. All branches are included." }, - { - "EventCode": "0x4E15E", - "EventName": "PM_MRK_INST_FLUSHED", - "BriefDescription": "The marked instruction was flushed." - }, - { - "EventCode": "0x401E6", - "EventName": "PM_MRK_INST_FROM_L3MISS", - "BriefDescription": "The processor's instruction cache was reloaded from beyond the local core's L3 due to a demand miss for a marked instruction." - }, - { - "EventCode": "0x401E8", - "EventName": "PM_MRK_DATA_FROM_L2MISS", - "BriefDescription": "The processor's L1 data cache was reloaded from a source beyond the local core's L2 due to a demand miss for a marked instruction." - }, { "EventCode": "0x400F0", "EventName": "PM_LD_DEMAND_MISS_L1_FIN", "BriefDescription": "Load missed L1, counted at finish time." }, { - "EventCode": "0x500FA", - "EventName": "PM_RUN_INST_CMPL", - "BriefDescription": "PowerPC instruction completed while the run latch is set." + "EventCode": "0x400FE", + "EventName": "PM_DATA_FROM_MEMORY", + "BriefDescription": "The processor's data cache was reloaded from local, remote, or distant memory due to a demand miss." } ] diff --git a/tools/perf/pmu-events/arch/powerpc/power10/pipeline.json b/tools/perf/pmu-events/arch/powerpc/power10/pipeline.json index 799893c56f32b..a8272a2f05174 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/pipeline.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/pipeline.json @@ -1,8 +1,13 @@ [ { - "EventCode": "0x100FE", - "EventName": "PM_INST_CMPL", - "BriefDescription": "PowerPC instruction completed." + "EventCode": "0x10004", + "EventName": "PM_EXEC_STALL_TRANSLATION", + "BriefDescription": "Cycles in which the oldest instruction in the pipeline suffered a TLB miss or ERAT miss and waited for it to resolve." + }, + { + "EventCode": "0x10006", + "EventName": "PM_DISP_STALL_HELD_OTHER_CYC", + "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch for any other reason." }, { "EventCode": "0x1000C", @@ -30,14 +35,19 @@ "BriefDescription": "Cycles in which an instruction reload is pending to satisfy a demand miss." }, { - "EventCode": "0x10022", - "EventName": "PM_PMC2_SAVED", - "BriefDescription": "The conditions for the speculative event selected for PMC2 are met and PMC2 is charged." + "EventCode": "0x10038", + "EventName": "PM_DISP_STALL_TRANSLATION", + "BriefDescription": "Cycles when dispatch was stalled for this thread because the MMU was handling a translation miss." }, { - "EventCode": "0x10024", - "EventName": "PM_PMC5_OVERFLOW", - "BriefDescription": "The event selected for PMC5 caused the event counter to overflow." + "EventCode": "0x1003A", + "EventName": "PM_DISP_STALL_BR_MPRED_IC_L2", + "BriefDescription": "Cycles when dispatch was stalled while the instruction was fetched from the local L2 after suffering a branch mispredict." + }, + { + "EventCode": "0x1003C", + "EventName": "PM_EXEC_STALL_DMISS_L2L3", + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from either the local L2 or local L3." }, { "EventCode": "0x10058", @@ -54,11 +64,36 @@ "EventName": "PM_DERAT_MISS_2M", "BriefDescription": "Data ERAT Miss (Data TLB Access) page size 2M. Implies radix translation. When MMCR1[16]=0 this event counts only DERAT reloads for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches." }, + { + "EventCode": "0x1D05E", + "EventName": "PM_DISP_STALL_HELD_HALT_CYC", + "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch because of power management." + }, + { + "EventCode": "0x1E050", + "EventName": "PM_DISP_STALL_HELD_STF_MAPPER_CYC", + "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch because the STF mapper/SRB was full. Includes GPR (count, link, tar), VSR, VMR, FPR." + }, + { + "EventCode": "0x1E054", + "EventName": "PM_EXEC_STALL_DMISS_L21_L31", + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from another core's L2 or L3 on the same chip." + }, + { + "EventCode": "0x1E056", + "EventName": "PM_EXEC_STALL_STORE_PIPE", + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was executing in the store unit. This does not include cycles spent handling store misses, PTESYNC instructions or TLBIE instructions." + }, { "EventCode": "0x1E05A", "EventName": "PM_CMPL_STALL_LWSYNC", "BriefDescription": "Cycles in which the oldest instruction in the pipeline was a lwsync waiting to complete." }, + { + "EventCode": "0x10064", + "EventName": "PM_DISP_STALL_IC_L2", + "BriefDescription": "Cycles when dispatch was stalled while the instruction was fetched from the local L2." + }, { "EventCode": "0x10068", "EventName": "PM_BR_FIN", @@ -70,9 +105,9 @@ "BriefDescription": "Simple fixed point instruction issued to the store unit. Measured at finish time." }, { - "EventCode": "0x1006C", - "EventName": "PM_RUN_CYC_ST_MODE", - "BriefDescription": "Cycles when the run latch is set and the core is in ST mode." + "EventCode": "0x100F8", + "EventName": "PM_DISP_STALL_CYC", + "BriefDescription": "Cycles the ICT has no itags assigned to this thread (no instructions were dispatched during these cycles)." }, { "EventCode": "0x20004", @@ -80,69 +115,114 @@ "BriefDescription": "Cycles in which the oldest instruction in the pipeline was dispatched but not issued yet." }, { - "EventCode": "0x2000A", - "EventName": "PM_HYPERVISOR_CYC", - "BriefDescription": "Cycles when the thread is in Hypervisor state. MSR[S HV PR]=010." + "EventCode": "0x20006", + "EventName": "PM_DISP_STALL_HELD_ISSQ_FULL_CYC", + "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch due to Issue queue full. Includes issue queue and branch queue." }, { "EventCode": "0x2000E", "EventName": "PM_LSU_LD1_FIN", "BriefDescription": "LSU Finished an internal operation in LD1 port." }, + { + "EventCode": "0x2C010", + "EventName": "PM_EXEC_STALL_LSU", + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was executing in the Load Store Unit. This does not include simple fixed point instructions." + }, { "EventCode": "0x2C014", "EventName": "PM_CMPL_STALL_SPECIAL", "BriefDescription": "Cycles in which the oldest instruction in the pipeline required special handling before completing." }, + { + "EventCode": "0x2C016", + "EventName": "PM_DISP_STALL_IERAT_ONLY_MISS", + "BriefDescription": "Cycles when dispatch was stalled while waiting to resolve an instruction ERAT miss." + }, { "EventCode": "0x2C018", "EventName": "PM_EXEC_STALL_DMISS_L3MISS", "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from a source beyond the local L2 or local L3." }, + { + "EventCode": "0x2C01C", + "EventName": "PM_EXEC_STALL_DMISS_OFF_CHIP", + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from a remote chip." + }, + { + "EventCode": "0x2C01E", + "EventName": "PM_DISP_STALL_BR_MPRED_IC_L3", + "BriefDescription": "Cycles when dispatch was stalled while the instruction was fetched from the local L3 after suffering a branch mispredict." + }, { "EventCode": "0x2D010", "EventName": "PM_LSU_ST1_FIN", "BriefDescription": "LSU Finished an internal operation in ST1 port." }, + { + "EventCode": "0x10016", + "EventName": "PM_VSU0_ISSUE", + "BriefDescription": "VSU instruction issued to VSU pipe 0." + }, { "EventCode": "0x2D012", "EventName": "PM_VSU1_ISSUE", "BriefDescription": "VSU instruction issued to VSU pipe 1." }, + { + "EventCode": "0x2505C", + "EventName": "PM_VSU_ISSUE", + "BriefDescription": "At least one VSU instruction was issued to one of the VSU pipes. Up to 4 per cycle. Includes fixed point operations." + }, + { + "EventCode": "0x4001C", + "EventName": "PM_VSU_FIN", + "BriefDescription": "VSU instruction finished." + }, { "EventCode": "0x2D018", "EventName": "PM_EXEC_STALL_VSU", "BriefDescription": "Cycles in which the oldest instruction in the pipeline was executing in the VSU (includes FXU, VSU, CRU)." }, + { + "EventCode": "0x2D01A", + "EventName": "PM_DISP_STALL_IC_MISS", + "BriefDescription": "Cycles when dispatch was stalled for this thread due to an instruction cache miss." + }, { "EventCode": "0x2D01C", "EventName": "PM_CMPL_STALL_STCX", "BriefDescription": "Cycles in which the oldest instruction in the pipeline was a stcx waiting for resolution from the nest before completing." }, + { + "EventCode": "0x2E018", + "EventName": "PM_DISP_STALL_FETCH", + "BriefDescription": "Cycles when dispatch was stalled for this thread because Fetch was being held." + }, + { + "EventCode": "0x2E01A", + "EventName": "PM_DISP_STALL_HELD_XVFC_MAPPER_CYC", + "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch because the XVFC mapper/SRB was full." + }, + { + "EventCode": "0x2E01C", + "EventName": "PM_EXEC_STALL_TLBIE", + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was a TLBIE instruction executing in the Load Store Unit." + }, { "EventCode": "0x2E01E", "EventName": "PM_EXEC_STALL_NTC_FLUSH", "BriefDescription": "Cycles in which the oldest instruction in the pipeline was executing in any unit before it was flushed. Note that if the flush of the oldest instruction happens after finish, the cycles from dispatch to issue will be included in PM_DISP_STALL and the cycles from issue to finish will be included in PM_EXEC_STALL and its corresponding children. This event will also count cycles when the previous next-to-finish (NTF) instruction is still completing and the new NTF instruction is stalled at dispatch." }, - { - "EventCode": "0x2013C", - "EventName": "PM_MRK_FX_LSU_FIN", - "BriefDescription": "The marked instruction was simple fixed point that was issued to the store unit. Measured at finish time." - }, { "EventCode": "0x2405A", "EventName": "PM_NTC_FIN", "BriefDescription": "Cycles in which the oldest instruction in the pipeline (NTC) finishes. Note that instructions can finish out of order, therefore not all the instructions that finish have a Next-to-complete status." }, { - "EventCode": "0x201E2", - "EventName": "PM_MRK_LD_MISS_L1", - "BriefDescription": "Marked demand data load miss counted at finish time." - }, - { - "EventCode": "0x200F4", - "EventName": "PM_RUN_CYC", - "BriefDescription": "Processor cycles gated by the run latch." + "EventCode": "0x30004", + "EventName": "PM_DISP_STALL_FLUSH", + "BriefDescription": "Cycles when dispatch was stalled because of a flush that happened to an instruction(s) that was not yet next-to-complete (NTC). PM_EXEC_STALL_NTC_FLUSH only includes instructions that were flushed after becoming NTC." }, { "EventCode": "0x30008", @@ -150,24 +230,29 @@ "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting to finish in one of the execution units (BRU, LSU, VSU). Only cycles between issue and finish are counted in this category." }, { - "EventCode": "0x3001A", - "EventName": "PM_LSU_ST2_FIN", - "BriefDescription": "LSU Finished an internal operation in ST2 port." + "EventCode": "0x30014", + "EventName": "PM_EXEC_STALL_STORE", + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was a store instruction executing in the Load Store Unit." }, { - "EventCode": "0x30020", - "EventName": "PM_PMC2_REWIND", - "BriefDescription": "The speculative event selected for PMC2 rewinds and the counter for PMC2 is not charged." + "EventCode": "0x30016", + "EventName": "PM_EXEC_STALL_DERAT_DTLB_MISS", + "BriefDescription": "Cycles in which the oldest instruction in the pipeline suffered a TLB miss and waited for it resolve." }, { - "EventCode": "0x30022", - "EventName": "PM_PMC4_SAVED", - "BriefDescription": "The conditions for the speculative event selected for PMC4 are met and PMC4 is charged." + "EventCode": "0x30018", + "EventName": "PM_DISP_STALL_HELD_SCOREBOARD_CYC", + "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch while waiting on the Scoreboard. This event combines VSCR and FPSCR together." + }, + { + "EventCode": "0x3001A", + "EventName": "PM_LSU_ST2_FIN", + "BriefDescription": "LSU Finished an internal operation in ST2 port." }, { - "EventCode": "0x30024", - "EventName": "PM_PMC6_OVERFLOW", - "BriefDescription": "The event selected for PMC6 caused the event counter to overflow." + "EventCode": "0x30026", + "EventName": "PM_EXEC_STALL_STORE_MISS", + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was a store whose cache line was not resident in the L1 and was waiting for allocation of the missing line into the L1." }, { "EventCode": "0x30028", @@ -179,6 +264,11 @@ "EventName": "PM_EXEC_STALL_SIMPLE_FX", "BriefDescription": "Cycles in which the oldest instruction in the pipeline was a simple fixed point instruction executing in the Load Store Unit." }, + { + "EventCode": "0x30038", + "EventName": "PM_EXEC_STALL_DMISS_LMEM", + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from the local memory, local OpenCAPI cache, or local OpenCAPI memory." + }, { "EventCode": "0x3003A", "EventName": "PM_CMPL_STALL_EXCEPTION", @@ -194,6 +284,31 @@ "EventName": "PM_TLBIE_FIN", "BriefDescription": "TLBIE instruction finished in the LSU. Two TLBIEs can finish each cycle. All will be counted." }, + { + "EventCode": "0x34054", + "EventName": "PM_EXEC_STALL_DMISS_L2L3_NOCONFLICT", + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from the local L2 or local L3, without a dispatch conflict." + }, + { + "EventCode": "0x34056", + "EventName": "PM_EXEC_STALL_LOAD_FINISH", + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was finishing a load after its data was reloaded from a data source beyond the local L1; cycles in which the LSU was processing an L1-hit; cycles in which the next-to-finish (NTF) instruction merged with another load in the LMQ; cycles in which the NTF instruction is waiting for a data reload for a load miss, but the data comes back with a non-NTF instruction." + }, + { + "EventCode": "0x34058", + "EventName": "PM_DISP_STALL_BR_MPRED_ICMISS", + "BriefDescription": "Cycles when dispatch was stalled after a mispredicted branch resulted in an instruction cache miss." + }, + { + "EventCode": "0x3D05C", + "EventName": "PM_DISP_STALL_HELD_RENAME_CYC", + "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch because the mapper/SRB was full. Includes GPR (count, link, tar), VSR, VMR, FPR and XVFC." + }, + { + "EventCode": "0x3E052", + "EventName": "PM_DISP_STALL_IC_L3", + "BriefDescription": "Cycles when dispatch was stalled while the instruction was fetched from the local L3." + }, { "EventCode": "0x30066", "EventName": "PM_LSU_FIN", @@ -210,25 +325,45 @@ "BriefDescription": "Cycles in which both instructions in the ICT entry pair show as finished. These are the cycles between finish and completion for the oldest pair of instructions in the pipeline." }, { - "EventCode": "0x40010", - "EventName": "PM_PMC3_OVERFLOW", - "BriefDescription": "The event selected for PMC3 caused the event counter to overflow." + "EventCode": "0x4C010", + "EventName": "PM_DISP_STALL_BR_MPRED_IC_L3MISS", + "BriefDescription": "Cycles when dispatch was stalled while the instruction was fetched from sources beyond the local L3 after suffering a mispredicted branch." }, { "EventCode": "0x4C012", "EventName": "PM_EXEC_STALL_DERAT_ONLY_MISS", "BriefDescription": "Cycles in which the oldest instruction in the pipeline suffered an ERAT miss and waited for it resolve." }, + { + "EventCode": "0x4C016", + "EventName": "PM_EXEC_STALL_DMISS_L2L3_CONFLICT", + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from the local L2 or local L3, with a dispatch conflict." + }, { "EventCode": "0x4C018", "EventName": "PM_CMPL_STALL", "BriefDescription": "Cycles in which the oldest instruction in the pipeline cannot complete because the thread was blocked for any reason." }, + { + "EventCode": "0x4C01A", + "EventName": "PM_EXEC_STALL_DMISS_OFF_NODE", + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from a distant chip." + }, { "EventCode": "0x4C01E", "EventName": "PM_LSU_ST3_FIN", "BriefDescription": "LSU Finished an internal operation in ST3 port." }, + { + "EventCode": "0x4D014", + "EventName": "PM_EXEC_STALL_LOAD", + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was a load instruction executing in the Load Store Unit." + }, + { + "EventCode": "0x4D016", + "EventName": "PM_EXEC_STALL_PTESYNC", + "BriefDescription": "Cycles in which the oldest instruction in the pipeline was a PTESYNC instruction executing in the Load Store Unit." + }, { "EventCode": "0x4D018", "EventName": "PM_EXEC_STALL_BRU", @@ -244,31 +379,41 @@ "EventName": "PM_EXEC_STALL_TLBIEL", "BriefDescription": "Cycles in which the oldest instruction in the pipeline was a TLBIEL instruction executing in the Load Store Unit. TLBIEL instructions have lower overhead than TLBIE instructions because they don't get set to the nest." }, + { + "EventCode": "0x4D01E", + "EventName": "PM_DISP_STALL_BR_MPRED", + "BriefDescription": "Cycles when dispatch was stalled for this thread due to a mispredicted branch." + }, + { + "EventCode": "0x4E010", + "EventName": "PM_DISP_STALL_IC_L3MISS", + "BriefDescription": "Cycles when dispatch was stalled while the instruction was fetched from any source beyond the local L3." + }, { "EventCode": "0x4E012", "EventName": "PM_EXEC_STALL_UNKNOWN", "BriefDescription": "Cycles in which the oldest instruction in the pipeline completed without an ntf_type pulse. The ntf_pulse was missed by the ISU because the next-to-finish (NTF) instruction finishes and completions came too close together." }, + { + "EventCode": "0x4E01A", + "EventName": "PM_DISP_STALL_HELD_CYC", + "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch for any reason." + }, { "EventCode": "0x4D020", "EventName": "PM_VSU3_ISSUE", "BriefDescription": "VSU instruction was issued to VSU pipe 3." }, { - "EventCode": "0x40132", - "EventName": "PM_MRK_LSU_FIN", - "BriefDescription": "LSU marked instruction finish." + "EventCode": "0x4003C", + "EventName": "PM_DISP_STALL_HELD_SYNC_CYC", + "BriefDescription": "Cycles in which the next-to-complete (NTC) instruction is held at dispatch because of a synchronizing instruction that requires the ICT to be empty before dispatch." }, { "EventCode": "0x45058", "EventName": "PM_IC_MISS_CMPL", "BriefDescription": "Non-speculative instruction cache miss, counted at completion." }, - { - "EventCode": "0x4D052", - "EventName": "PM_2FLOP_CMPL", - "BriefDescription": "Double Precision vector version of fmul, fsub, fcmp, fsel, fabs, fnabs, fres, fsqrte, fneg completed." - }, { "EventCode": "0x400F2", "EventName": "PM_1PLUS_PPC_DISP", diff --git a/tools/perf/pmu-events/arch/powerpc/power10/pmc.json b/tools/perf/pmu-events/arch/powerpc/power10/pmc.json index 364fedbfb490b..0a2bf56ee7c10 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/pmc.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/pmc.json @@ -1,22 +1,197 @@ [ + { + "EventCode": "0x100FE", + "EventName": "PM_INST_CMPL", + "BriefDescription": "PowerPC instruction completed." + }, + { + "EventCode": "0x1000A", + "EventName": "PM_PMC3_REWIND", + "BriefDescription": "The speculative event selected for PMC3 rewinds and the counter for PMC3 is not charged." + }, + { + "EventCode": "0x10010", + "EventName": "PM_PMC4_OVERFLOW", + "BriefDescription": "The event selected for PMC4 caused the event counter to overflow." + }, + { + "EventCode": "0x1001C", + "EventName": "PM_ULTRAVISOR_INST_CMPL", + "BriefDescription": "PowerPC instruction completed while the thread was in ultravisor state." + }, + { + "EventCode": "0x100F0", + "EventName": "PM_CYC", + "BriefDescription": "Processor cycles." + }, + { + "EventCode": "0x10020", + "EventName": "PM_PMC4_REWIND", + "BriefDescription": "The speculative event selected for PMC4 rewinds and the counter for PMC4 is not charged." + }, + { + "EventCode": "0x10022", + "EventName": "PM_PMC2_SAVED", + "BriefDescription": "The conditions for the speculative event selected for PMC2 are met and PMC2 is charged." + }, + { + "EventCode": "0x10024", + "EventName": "PM_PMC5_OVERFLOW", + "BriefDescription": "The event selected for PMC5 caused the event counter to overflow." + }, + { + "EventCode": "0x1F15E", + "EventName": "PM_MRK_START_PROBE_NOP_CMPL", + "BriefDescription": "Marked Start probe nop (AND R0,R0,R0) completed." + }, + { + "EventCode": "0x1006C", + "EventName": "PM_RUN_CYC_ST_MODE", + "BriefDescription": "Cycles when the run latch is set and the core is in ST mode." + }, + { + "EventCode": "0x101E8", + "EventName": "PM_THRESH_EXC_256", + "BriefDescription": "Threshold counter exceeded a count of 256." + }, + { + "EventCode": "0x101EC", + "EventName": "PM_THRESH_MET", + "BriefDescription": "Threshold exceeded." + }, + { + "EventCode": "0x100FA", + "EventName": "PM_RUN_LATCH_ANY_THREAD_CYC", + "BriefDescription": "Cycles when at least one thread has the run latch set." + }, + { + "EventCode": "0x2000A", + "EventName": "PM_HYPERVISOR_CYC", + "BriefDescription": "Cycles when the thread is in Hypervisor state. MSR[S HV PR]=010." + }, + { + "EventCode": "0x2000C", + "EventName": "PM_RUN_LATCH_ALL_THREADS_CYC", + "BriefDescription": "Cycles when the run latch is set for all threads." + }, + { + "EventCode": "0x20010", + "EventName": "PM_PMC1_OVERFLOW", + "BriefDescription": "The event selected for PMC1 caused the event counter to overflow." + }, + { + "EventCode": "0x2006C", + "EventName": "PM_RUN_CYC_SMT4_MODE", + "BriefDescription": "Cycles when this thread's run latch is set and the core is in SMT4 mode." + }, + { + "EventCode": "0x201E6", + "EventName": "PM_THRESH_EXC_32", + "BriefDescription": "Threshold counter exceeded a value of 32." + }, + { + "EventCode": "0x201E8", + "EventName": "PM_THRESH_EXC_512", + "BriefDescription": "Threshold counter exceeded a value of 512." + }, + { + "EventCode": "0x200F4", + "EventName": "PM_RUN_CYC", + "BriefDescription": "Processor cycles gated by the run latch." + }, + { + "EventCode": "0x30010", + "EventName": "PM_PMC2_OVERFLOW", + "BriefDescription": "The event selected for PMC2 caused the event counter to overflow." + }, + { + "EventCode": "0x30020", + "EventName": "PM_PMC2_REWIND", + "BriefDescription": "The speculative event selected for PMC2 rewinds and the counter for PMC2 is not charged." + }, + { + "EventCode": "0x30022", + "EventName": "PM_PMC4_SAVED", + "BriefDescription": "The conditions for the speculative event selected for PMC4 are met and PMC4 is charged." + }, + { + "EventCode": "0x30024", + "EventName": "PM_PMC6_OVERFLOW", + "BriefDescription": "The event selected for PMC6 caused the event counter to overflow." + }, + { + "EventCode": "0x3006C", + "EventName": "PM_RUN_CYC_SMT2_MODE", + "BriefDescription": "Cycles when this thread's run latch is set and the core is in SMT2 mode." + }, { "EventCode": "0x301E8", "EventName": "PM_THRESH_EXC_64", "BriefDescription": "Threshold counter exceeded a value of 64." }, { - "EventCode": "0x45050", - "EventName": "PM_1FLOP_CMPL", - "BriefDescription": "One floating point instruction completed (fadd, fmul, fsub, fcmp, fsel, fabs, fnabs, fres, fsqrte, fneg)." + "EventCode": "0x301EA", + "EventName": "PM_THRESH_EXC_1024", + "BriefDescription": "Threshold counter exceeded a value of 1024." + }, + { + "EventCode": "0x40010", + "EventName": "PM_PMC3_OVERFLOW", + "BriefDescription": "The event selected for PMC3 caused the event counter to overflow." + }, + { + "EventCode": "0x40114", + "EventName": "PM_MRK_START_PROBE_NOP_DISP", + "BriefDescription": "Marked Start probe nop dispatched. Instruction AND R0,R0,R0." + }, + { + "EventCode": "0x4D010", + "EventName": "PM_PMC1_SAVED", + "BriefDescription": "The conditions for the speculative event selected for PMC1 are met and PMC1 is charged." + }, + { + "EventCode": "0x4D012", + "EventName": "PM_PMC3_SAVED", + "BriefDescription": "The conditions for the speculative event selected for PMC3 are met and PMC3 is charged." + }, + { + "EventCode": "0x4D022", + "EventName": "PM_HYPERVISOR_INST_CMPL", + "BriefDescription": "PowerPC instruction completed while the thread was in hypervisor state." + }, + { + "EventCode": "0x4D026", + "EventName": "PM_ULTRAVISOR_CYC", + "BriefDescription": "Cycles when the thread is in Ultravisor state. MSR[S HV PR]=110." + }, + { + "EventCode": "0x4D028", + "EventName": "PM_PRIVILEGED_CYC", + "BriefDescription": "Cycles when the thread is in Privileged state. MSR[S HV PR]=x00." + }, + { + "EventCode": "0x4D02C", + "EventName": "PM_PMC1_REWIND", + "BriefDescription": "The speculative event selected for PMC1 rewinds and the counter for PMC1 is not charged." + }, + { + "EventCode": "0x40030", + "EventName": "PM_INST_FIN", + "BriefDescription": "Instruction finished." + }, + { + "EventCode": "0x40134", + "EventName": "PM_MRK_INST_TIMEO", + "BriefDescription": "Marked instruction finish timeout (instruction was lost)." }, { - "EventCode": "0x45052", - "EventName": "PM_4FLOP_CMPL", - "BriefDescription": "Four floating point instruction completed (fadd, fmul, fsub, fcmp, fsel, fabs, fnabs, fres, fsqrte, fneg)." + "EventCode": "0x401EA", + "EventName": "PM_THRESH_EXC_128", + "BriefDescription": "Threshold counter exceeded a value of 128." }, { - "EventCode": "0x4D054", - "EventName": "PM_8FLOP_CMPL", - "BriefDescription": "Four Double Precision vector instruction completed." + "EventCode": "0x400FA", + "EventName": "PM_RUN_INST_CMPL", + "BriefDescription": "PowerPC instruction completed while the run latch is set." } ] diff --git a/tools/perf/pmu-events/arch/powerpc/power10/translation.json b/tools/perf/pmu-events/arch/powerpc/power10/translation.json index 961e2491e73f6..170c9aeb30d83 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/translation.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/translation.json @@ -1,29 +1,9 @@ [ - { - "EventCode": "0x1F15E", - "EventName": "PM_MRK_START_PROBE_NOP_CMPL", - "BriefDescription": "Marked Start probe nop (AND R0,R0,R0) completed." - }, { "EventCode": "0x20018", "EventName": "PM_ST_FWD", "BriefDescription": "Store forwards that finished." }, - { - "EventCode": "0x2011C", - "EventName": "PM_MRK_NTF_CYC", - "BriefDescription": "Cycles in which the marked instruction is the oldest in the pipeline (next-to-finish or next-to-complete)." - }, - { - "EventCode": "0x2E01C", - "EventName": "PM_EXEC_STALL_TLBIE", - "BriefDescription": "Cycles in which the oldest instruction in the pipeline was a TLBIE instruction executing in the Load Store Unit." - }, - { - "EventCode": "0x201E6", - "EventName": "PM_THRESH_EXC_32", - "BriefDescription": "Threshold counter exceeded a value of 32." - }, { "EventCode": "0x200F0", "EventName": "PM_ST_CMPL", @@ -33,20 +13,5 @@ "EventCode": "0x200FE", "EventName": "PM_DATA_FROM_L2MISS", "BriefDescription": "The processor's L1 data cache was reloaded from a source beyond the local core's L2 due to a demand miss." - }, - { - "EventCode": "0x30010", - "EventName": "PM_PMC2_OVERFLOW", - "BriefDescription": "The event selected for PMC2 caused the event counter to overflow." - }, - { - "EventCode": "0x4D010", - "EventName": "PM_PMC1_SAVED", - "BriefDescription": "The conditions for the speculative event selected for PMC1 are met and PMC1 is charged." - }, - { - "EventCode": "0x4D05C", - "EventName": "PM_DPP_FLOP_CMPL", - "BriefDescription": "Double-Precision or Quad-Precision instruction completed." } ]
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kajol Jain kjain@linux.ibm.com
[ Upstream commit edd65d2bc55fb84d7b80c2ffe3b74d9b11ac4e2f ]
Update metric event name for some of the JSON/metric events for power10 platform.
Fixes: 3ca3af7d1f230d1f ("perf vendor events power10: Add metric events JSON file for power10 platform") Signed-off-by: Kajol Jain kjain@linux.ibm.com Cc: Athira Rajeev atrajeev@linux.vnet.ibm.com Cc: Disha Goel disgoel@linux.ibm.com Cc: Ian Rogers irogers@google.com Cc: Kajol Jain kjain@linux.ibm.com Cc: Madhavan Srinivasan maddy@linux.ibm.com Cc: Namhyung Kim namhyung@kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230814112803.1508296-6-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../arch/powerpc/power10/metrics.json | 50 +++++++++---------- 1 file changed, 25 insertions(+), 25 deletions(-)
diff --git a/tools/perf/pmu-events/arch/powerpc/power10/metrics.json b/tools/perf/pmu-events/arch/powerpc/power10/metrics.json index e3087eb1ccff8..182369076d956 100644 --- a/tools/perf/pmu-events/arch/powerpc/power10/metrics.json +++ b/tools/perf/pmu-events/arch/powerpc/power10/metrics.json @@ -16,133 +16,133 @@ "BriefDescription": "Average cycles per completed instruction when dispatch was stalled for any reason", "MetricExpr": "PM_DISP_STALL_CYC / PM_RUN_INST_CMPL", "MetricGroup": "CPI;CPI_STALL_RATIO", - "MetricName": "DISPATCHED_CPI" + "MetricName": "DISPATCH_STALL_CPI" }, { "BriefDescription": "Average cycles per completed instruction when dispatch was stalled because there was a flush", "MetricExpr": "PM_DISP_STALL_FLUSH / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_FLUSH_CPI" + "MetricName": "DISPATCH_STALL_FLUSH_CPI" }, { "BriefDescription": "Average cycles per completed instruction when dispatch was stalled because the MMU was handling a translation miss", "MetricExpr": "PM_DISP_STALL_TRANSLATION / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_TRANSLATION_CPI" + "MetricName": "DISPATCH_STALL_TRANSLATION_CPI" }, { "BriefDescription": "Average cycles per completed instruction when dispatch was stalled waiting to resolve an instruction ERAT miss", "MetricExpr": "PM_DISP_STALL_IERAT_ONLY_MISS / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_IERAT_ONLY_MISS_CPI" + "MetricName": "DISPATCH_STALL_IERAT_ONLY_MISS_CPI" }, { "BriefDescription": "Average cycles per completed instruction when dispatch was stalled waiting to resolve an instruction TLB miss", "MetricExpr": "PM_DISP_STALL_ITLB_MISS / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_ITLB_MISS_CPI" + "MetricName": "DISPATCH_STALL_ITLB_MISS_CPI" }, { "BriefDescription": "Average cycles per completed instruction when dispatch was stalled due to an icache miss", "MetricExpr": "PM_DISP_STALL_IC_MISS / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_IC_MISS_CPI" + "MetricName": "DISPATCH_STALL_IC_MISS_CPI" }, { "BriefDescription": "Average cycles per completed instruction when dispatch was stalled while the instruction was fetched from the local L2", "MetricExpr": "PM_DISP_STALL_IC_L2 / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_IC_L2_CPI" + "MetricName": "DISPATCH_STALL_IC_L2_CPI" }, { "BriefDescription": "Average cycles per completed instruction when dispatch was stalled while the instruction was fetched from the local L3", "MetricExpr": "PM_DISP_STALL_IC_L3 / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_IC_L3_CPI" + "MetricName": "DISPATCH_STALL_IC_L3_CPI" }, { "BriefDescription": "Average cycles per completed instruction when dispatch was stalled while the instruction was fetched from any source beyond the local L3", "MetricExpr": "PM_DISP_STALL_IC_L3MISS / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_IC_L3MISS_CPI" + "MetricName": "DISPATCH_STALL_IC_L3MISS_CPI" }, { "BriefDescription": "Average cycles per completed instruction when dispatch was stalled due to an icache miss after a branch mispredict", "MetricExpr": "PM_DISP_STALL_BR_MPRED_ICMISS / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_BR_MPRED_ICMISS_CPI" + "MetricName": "DISPATCH_STALL_BR_MPRED_ICMISS_CPI" }, { "BriefDescription": "Average cycles per completed instruction when dispatch was stalled while instruction was fetched from the local L2 after suffering a branch mispredict", "MetricExpr": "PM_DISP_STALL_BR_MPRED_IC_L2 / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_BR_MPRED_IC_L2_CPI" + "MetricName": "DISPATCH_STALL_BR_MPRED_IC_L2_CPI" }, { "BriefDescription": "Average cycles per completed instruction when dispatch was stalled while instruction was fetched from the local L3 after suffering a branch mispredict", "MetricExpr": "PM_DISP_STALL_BR_MPRED_IC_L3 / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_BR_MPRED_IC_L3_CPI" + "MetricName": "DISPATCH_STALL_BR_MPRED_IC_L3_CPI" }, { "BriefDescription": "Average cycles per completed instruction when dispatch was stalled while instruction was fetched from any source beyond the local L3 after suffering a branch mispredict", "MetricExpr": "PM_DISP_STALL_BR_MPRED_IC_L3MISS / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_BR_MPRED_IC_L3MISS_CPI" + "MetricName": "DISPATCH_STALL_BR_MPRED_IC_L3MISS_CPI" }, { "BriefDescription": "Average cycles per completed instruction when dispatch was stalled due to a branch mispredict", "MetricExpr": "PM_DISP_STALL_BR_MPRED / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_BR_MPRED_CPI" + "MetricName": "DISPATCH_STALL_BR_MPRED_CPI" }, { "BriefDescription": "Average cycles per completed instruction when the NTC instruction was held at dispatch for any reason", "MetricExpr": "PM_DISP_STALL_HELD_CYC / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_HELD_CPI" + "MetricName": "DISPATCH_STALL_HELD_CPI" }, { "BriefDescription": "Average cycles per completed instruction when the NTC instruction was held at dispatch because of a synchronizing instruction that requires the ICT to be empty before dispatch", "MetricExpr": "PM_DISP_STALL_HELD_SYNC_CYC / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISP_HELD_STALL_SYNC_CPI" + "MetricName": "DISPATCH_STALL_HELD_SYNC_CPI" }, { "BriefDescription": "Average cycles per completed instruction when the NTC instruction was held at dispatch while waiting on the scoreboard", "MetricExpr": "PM_DISP_STALL_HELD_SCOREBOARD_CYC / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISP_HELD_STALL_SCOREBOARD_CPI" + "MetricName": "DISPATCH_STALL_HELD_SCOREBOARD_CPI" }, { "BriefDescription": "Average cycles per completed instruction when the NTC instruction was held at dispatch due to issue queue full", "MetricExpr": "PM_DISP_STALL_HELD_ISSQ_FULL_CYC / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISP_HELD_STALL_ISSQ_FULL_CPI" + "MetricName": "DISPATCH_STALL_HELD_ISSQ_FULL_CPI" }, { "BriefDescription": "Average cycles per completed instruction when the NTC instruction was held at dispatch because the mapper/SRB was full", "MetricExpr": "PM_DISP_STALL_HELD_RENAME_CYC / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_HELD_RENAME_CPI" + "MetricName": "DISPATCH_STALL_HELD_RENAME_CPI" }, { "BriefDescription": "Average cycles per completed instruction when the NTC instruction was held at dispatch because the STF mapper/SRB was full", "MetricExpr": "PM_DISP_STALL_HELD_STF_MAPPER_CYC / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_HELD_STF_MAPPER_CPI" + "MetricName": "DISPATCH_STALL_HELD_STF_MAPPER_CPI" }, { "BriefDescription": "Average cycles per completed instruction when the NTC instruction was held at dispatch because the XVFC mapper/SRB was full", "MetricExpr": "PM_DISP_STALL_HELD_XVFC_MAPPER_CYC / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_HELD_XVFC_MAPPER_CPI" + "MetricName": "DISPATCH_STALL_HELD_XVFC_MAPPER_CPI" }, { "BriefDescription": "Average cycles per completed instruction when the NTC instruction was held at dispatch for any other reason", "MetricExpr": "PM_DISP_STALL_HELD_OTHER_CYC / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_HELD_OTHER_CPI" + "MetricName": "DISPATCH_STALL_HELD_OTHER_CPI" }, { "BriefDescription": "Average cycles per completed instruction when the NTC instruction has been dispatched but not issued for any reason", @@ -352,13 +352,13 @@ "BriefDescription": "Average cycles per completed instruction when dispatch was stalled because fetch was being held, so there was nothing in the pipeline for this thread", "MetricExpr": "PM_DISP_STALL_FETCH / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_FETCH_CPI" + "MetricName": "DISPATCH_STALL_FETCH_CPI" }, { "BriefDescription": "Average cycles per completed instruction when the NTC instruction was held at dispatch because of power management", "MetricExpr": "PM_DISP_STALL_HELD_HALT_CYC / PM_RUN_INST_CMPL", "MetricGroup": "CPI", - "MetricName": "DISPATCHED_HELD_HALT_CPI" + "MetricName": "DISPATCH_STALL_HELD_HALT_CPI" }, { "BriefDescription": "Percentage of flushes per completed instruction", @@ -560,7 +560,7 @@ "BriefDescription": "Average number of STCX instructions finshed per completed instruction", "MetricExpr": "PM_STCX_FIN / PM_RUN_INST_CMPL", "MetricGroup": "General", - "MetricName": "STXC_PER_INST" + "MetricName": "STCX_PER_INST" }, { "BriefDescription": "Average number of LARX instructions finshed per completed instruction",
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnaldo Carvalho de Melo acme@redhat.com
[ Upstream commit ef23cb593304bde0cc046fd4cc83ae7ea2e24f16 ]
While debugging a segfault on 'perf lock contention' without an available perf.data file I noticed that it was basically calling:
perf_session__delete(ERR_PTR(-1))
Resulting in:
(gdb) run lock contention Starting program: /root/bin/perf lock contention [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". failed to open perf.data: No such file or directory (try 'perf record' first) Initializing perf session failed
Program received signal SIGSEGV, Segmentation fault. 0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858 2858 if (!session->auxtrace) (gdb) p session $1 = (struct perf_session *) 0xffffffffffffffff (gdb) bt #0 0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858 #1 0x000000000057bb4d in perf_session__delete (session=0xffffffffffffffff) at util/session.c:300 #2 0x000000000047c421 in __cmd_contention (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2161 #3 0x000000000047dc95 in cmd_lock (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2604 #4 0x0000000000501466 in run_builtin (p=0xe597a8 <commands+552>, argc=2, argv=0x7fffffffe200) at perf.c:322 #5 0x00000000005016d5 in handle_internal_command (argc=2, argv=0x7fffffffe200) at perf.c:375 #6 0x0000000000501824 in run_argv (argcp=0x7fffffffe02c, argv=0x7fffffffe020) at perf.c:419 #7 0x0000000000501b11 in main (argc=2, argv=0x7fffffffe200) at perf.c:535 (gdb)
So just set it to NULL after using PTR_ERR(session) to decode the error as perf_session__delete(NULL) is supported.
The same problem was found in 'perf top' after an audit of all perf_session__new() failure handling.
Fixes: 6ef81c55a2b6584c ("perf session: Return error code for perf_session__new() function on failure") Cc: Adrian Hunter adrian.hunter@intel.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Alexey Budankov alexey.budankov@linux.intel.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Jeremie Galarneau jeremie.galarneau@efficios.com Cc: Jiri Olsa jolsa@kernel.org Cc: Kate Stewart kstewart@linuxfoundation.org Cc: Mamatha Inamdar mamatha4@linux.vnet.ibm.com Cc: Mukesh Ojha mojha@codeaurora.org Cc: Nageswara R Sastry rnsastry@linux.vnet.ibm.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi Bangoria ravi.bangoria@linux.ibm.com Cc: Shawn Landden shawn@git.icu Cc: Song Liu songliubraving@fb.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Tzvetomir Stoyanov tstoyanov@vmware.com Link: https://lore.kernel.org/lkml/ZN4Q2rxxsL08A8rd@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/builtin-top.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index 1baa2acb3cedd..ea8c7eca5eeed 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -1805,6 +1805,7 @@ int cmd_top(int argc, const char **argv) top.session = perf_session__new(NULL, NULL); if (IS_ERR(top.session)) { status = PTR_ERR(top.session); + top.session = NULL; goto out_delete_evlist; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnaldo Carvalho de Melo acme@redhat.com
[ Upstream commit abaf1e0355abb050f9c11d2d13a513caec80f7ad ]
While debugging a segfault on 'perf lock contention' without an available perf.data file I noticed that it was basically calling:
perf_session__delete(ERR_PTR(-1))
Resulting in:
(gdb) run lock contention Starting program: /root/bin/perf lock contention [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". failed to open perf.data: No such file or directory (try 'perf record' first) Initializing perf session failed
Program received signal SIGSEGV, Segmentation fault. 0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858 2858 if (!session->auxtrace) (gdb) p session $1 = (struct perf_session *) 0xffffffffffffffff (gdb) bt #0 0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858 #1 0x000000000057bb4d in perf_session__delete (session=0xffffffffffffffff) at util/session.c:300 #2 0x000000000047c421 in __cmd_contention (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2161 #3 0x000000000047dc95 in cmd_lock (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2604 #4 0x0000000000501466 in run_builtin (p=0xe597a8 <commands+552>, argc=2, argv=0x7fffffffe200) at perf.c:322 #5 0x00000000005016d5 in handle_internal_command (argc=2, argv=0x7fffffffe200) at perf.c:375 #6 0x0000000000501824 in run_argv (argcp=0x7fffffffe02c, argv=0x7fffffffe020) at perf.c:419 #7 0x0000000000501b11 in main (argc=2, argv=0x7fffffffe200) at perf.c:535 (gdb)
So just set it to NULL after using PTR_ERR(session) to decode the error as perf_session__delete(NULL) is supported.
Fixes: eef4fee5e52071d5 ("perf lock: Dynamically allocate lockhash_table") Cc: Adrian Hunter adrian.hunter@intel.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Ian Rogers irogers@google.com Cc: Ingo Molnar mingo@redhat.com Cc: Jiri Olsa jolsa@kernel.org Cc: K Prateek Nayak kprateek.nayak@amd.com Cc: Kan Liang kan.liang@linux.intel.com Cc: Leo Yan leo.yan@linaro.org Cc: Mamatha Inamdar mamatha4@linux.vnet.ibm.com Cc: Mark Rutland mark.rutland@arm.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Namhyung Kim namhyung@kernel.org Cc: Paolo Bonzini pbonzini@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi Bangoria ravi.bangoria@amd.com Cc: Ross Zwisler zwisler@chromium.org Cc: Sean Christopherson seanjc@google.com Cc: Steven Rostedt (VMware) rostedt@goodmis.org Cc: Tiezhu Yang yangtiezhu@loongson.cn Cc: Yang Jihong yangjihong1@huawei.com Link: https://lore.kernel.org/lkml/ZN4R1AYfsD2J8lRs@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/builtin-lock.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c index c15386cb10331..0cf1c5a2e0323 100644 --- a/tools/perf/builtin-lock.c +++ b/tools/perf/builtin-lock.c @@ -2052,6 +2052,7 @@ static int __cmd_contention(int argc, const char **argv) if (IS_ERR(session)) { pr_err("Initializing perf session failed\n"); err = PTR_ERR(session); + session = NULL; goto out_delete; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Raag Jadav raag.jadav@intel.com
[ Upstream commit cf38e7691c85f1b09973b22a0b89bf1e1228d2f9 ]
When built with CONFIG_INTEL_MID_WATCHDOG=m, currently the driver needs to be loaded manually, for the lack of module alias. This causes unintended resets in cases where watchdog timer is set-up by bootloader and the driver is not explicitly loaded. Add MODULE_ALIAS() to load the driver automatically at boot and avoid this issue.
Fixes: 87a1ef8058d9 ("watchdog: add Intel MID watchdog driver support") Signed-off-by: Raag Jadav raag.jadav@intel.com Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Reviewed-by: Guenter Roeck linux@roeck-us.net Link: https://lore.kernel.org/r/20230811120220.31578-1-raag.jadav@intel.com Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Wim Van Sebroeck wim@linux-watchdog.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/watchdog/intel-mid_wdt.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/watchdog/intel-mid_wdt.c b/drivers/watchdog/intel-mid_wdt.c index 9b2173f765c8c..fb7fae750181b 100644 --- a/drivers/watchdog/intel-mid_wdt.c +++ b/drivers/watchdog/intel-mid_wdt.c @@ -203,3 +203,4 @@ module_platform_driver(mid_wdt_driver); MODULE_AUTHOR("David Cohen david.a.cohen@linux.intel.com"); MODULE_DESCRIPTION("Watchdog Driver for Intel MID platform"); MODULE_LICENSE("GPL"); +MODULE_ALIAS("platform:intel_mid_wdt");
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilkka Koskinen ilkka@os.amperecomputing.com
[ Upstream commit b8af10062df3c23fe002c3f187389bb263b3eb20 ]
amperene/cache.json file tried to include L1D_CACHE_LMISS while it doesn't exist in common-and-microarch.json. While this bug doesn't seem to cause issue in newer kernels with jevents.py script, it prevents building older perf tools with the backported patch.
Fixes: a9650b7f6fc09d16 ("perf vendor events arm64: Add AmpereOne core PMU events") Reported-by: Dave Kleikamp dave.kleikamp@oracle.com Reviewed-by: Ian Rogers irogers@google.com Reviewed-by: John Garry john.g.garry@oracle.com Signed-off-by: Ilkka Koskinen ilkka@os.amperecomputing.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Ilkka Koskinen ilkka@os.amperecomputing.com Cc: Ingo Molnar mingo@redhat.com Cc: James Clark james.clark@arm.com Cc: Jiri Olsa jolsa@kernel.org Cc: Leo Yan leo.yan@linaro.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mike Leach mike.leach@linaro.org Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Will Deacon will@kernel.org Cc: linux-arm-kernel@lists.infradead.org Closes: https://lore.kernel.org/all/76bb2e47-ce44-76ae-838e-53279047084d@oracle.com/ Link: https://lore.kernel.org/r/20230803211331.140553-2-ilkka@os.amperecomputing.c... Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/pmu-events/arch/arm64/ampere/ampereone/cache.json | 3 --- 1 file changed, 3 deletions(-)
diff --git a/tools/perf/pmu-events/arch/arm64/ampere/ampereone/cache.json b/tools/perf/pmu-events/arch/arm64/ampere/ampereone/cache.json index fc06330542116..7a2b7b200f144 100644 --- a/tools/perf/pmu-events/arch/arm64/ampere/ampereone/cache.json +++ b/tools/perf/pmu-events/arch/arm64/ampere/ampereone/cache.json @@ -92,9 +92,6 @@ { "ArchStdEvent": "L1D_CACHE_LMISS_RD" }, - { - "ArchStdEvent": "L1D_CACHE_LMISS" - }, { "ArchStdEvent": "L1I_CACHE_LMISS" },
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Zapolskiy vz@mleia.com
[ Upstream commit 4aae44f65827f0213a7361cf9c32cfe06114473f ]
Because LPC32xx PWM controllers have only a single output which is registered as the only PWM device/channel per controller, it is known in advance that pwm->hwpwm value is always 0. On basis of this fact simplify the code by removing operations with pwm->hwpwm, there is no controls which require channel number as input.
Even though I wasn't aware at the time when I forward ported that patch, this fixes a null pointer dereference as lpc32xx->chip.pwms is NULL before devm_pwmchip_add() is called.
Reported-by: Dan Carpenter dan.carpenter@linaro.org Signed-off-by: Vladimir Zapolskiy vz@mleia.com Signed-off-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Fixes: 3d2813fb17e5 ("pwm: lpc32xx: Don't modify HW state in .probe() after the PWM chip was registered") Signed-off-by: Thierry Reding thierry.reding@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pwm/pwm-lpc32xx.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/pwm/pwm-lpc32xx.c b/drivers/pwm/pwm-lpc32xx.c index 86a0ea0f6955c..806f0bb3ad6d8 100644 --- a/drivers/pwm/pwm-lpc32xx.c +++ b/drivers/pwm/pwm-lpc32xx.c @@ -51,10 +51,10 @@ static int lpc32xx_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm, if (duty_cycles > 255) duty_cycles = 255;
- val = readl(lpc32xx->base + (pwm->hwpwm << 2)); + val = readl(lpc32xx->base); val &= ~0xFFFF; val |= (period_cycles << 8) | duty_cycles; - writel(val, lpc32xx->base + (pwm->hwpwm << 2)); + writel(val, lpc32xx->base);
return 0; } @@ -69,9 +69,9 @@ static int lpc32xx_pwm_enable(struct pwm_chip *chip, struct pwm_device *pwm) if (ret) return ret;
- val = readl(lpc32xx->base + (pwm->hwpwm << 2)); + val = readl(lpc32xx->base); val |= PWM_ENABLE; - writel(val, lpc32xx->base + (pwm->hwpwm << 2)); + writel(val, lpc32xx->base);
return 0; } @@ -81,9 +81,9 @@ static void lpc32xx_pwm_disable(struct pwm_chip *chip, struct pwm_device *pwm) struct lpc32xx_pwm_chip *lpc32xx = to_lpc32xx_pwm_chip(chip); u32 val;
- val = readl(lpc32xx->base + (pwm->hwpwm << 2)); + val = readl(lpc32xx->base); val &= ~PWM_ENABLE; - writel(val, lpc32xx->base + (pwm->hwpwm << 2)); + writel(val, lpc32xx->base);
clk_disable_unprepare(lpc32xx->clk); } @@ -141,9 +141,9 @@ static int lpc32xx_pwm_probe(struct platform_device *pdev) lpc32xx->chip.npwm = 1;
/* If PWM is disabled, configure the output to the default value */ - val = readl(lpc32xx->base + (lpc32xx->chip.pwms[0].hwpwm << 2)); + val = readl(lpc32xx->base); val &= ~PWM_PIN_LEVEL; - writel(val, lpc32xx->base + (lpc32xx->chip.pwms[0].hwpwm << 2)); + writel(val, lpc32xx->base);
ret = devm_pwmchip_add(&pdev->dev, &lpc32xx->chip); if (ret < 0) {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Justin Stitt justinstitt@google.com
[ Upstream commit 4b2fd81f2af7147e844ecec0c5c07a16bca6b86e ]
`strncpy` is deprecated for use on NUL-terminated destination strings [1].
A suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on its destination buffer argument which is _not_ the case for `strncpy`!
Also remove extraneous if-statement as it can never be entered. The return value from `strncpy` is it's first argument. In this case, `...dyndbg_cmd` is an array: | char dyndbg_cmd[VPU_DYNDBG_CMD_MAX_LEN]; ^^^^^^^^^^ This can never be NULL which means `strncpy`'s return value cannot be NULL here. Just use `strscpy` which is more robust and results in simpler and less ambiguous code.
Moreover, remove needless `... - 1` as `strscpy`'s implementation ensures NUL-termination and we do not need to carefully dance around ending boundaries with a "- 1" anymore.
Fixes: 5d7422cfb498 ("accel/ivpu: Add IPC driver and JSM messages") Link: www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt justinstitt@google.com Reviewed-by: Stanislaw Gruszka stanislaw.gruszka@linux.intel.com Signed-off-by: Stanislaw Gruszka stanislaw.gruszka@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230824-strncpy-drivers-accel... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/accel/ivpu/ivpu_jsm_msg.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/accel/ivpu/ivpu_jsm_msg.c b/drivers/accel/ivpu/ivpu_jsm_msg.c index 831bfd2b2d39d..bdddef2c59eec 100644 --- a/drivers/accel/ivpu/ivpu_jsm_msg.c +++ b/drivers/accel/ivpu/ivpu_jsm_msg.c @@ -118,8 +118,7 @@ int ivpu_jsm_dyndbg_control(struct ivpu_device *vdev, char *command, size_t size struct vpu_jsm_msg resp; int ret;
- if (!strncpy(req.payload.dyndbg_control.dyndbg_cmd, command, VPU_DYNDBG_CMD_MAX_LEN - 1)) - return -ENOMEM; + strscpy(req.payload.dyndbg_control.dyndbg_cmd, command, VPU_DYNDBG_CMD_MAX_LEN);
ret = ivpu_ipc_send_receive(vdev, &req, VPU_JSM_MSG_DYNDBG_CONTROL_RSP, &resp, VPU_IPC_CHAN_ASYNC_CMD, vdev->timeout.jsm);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ian Rogers irogers@google.com
[ Upstream commit 9897009eecae821efc684ecdd1d04584f5501509 ]
PMU caps are written as HEADER_PMU_CAPS or for the special case of the PMU "cpu" as HEADER_CPU_PMU_CAPS. As the PMU "cpu" is special, and not any "core" PMU, the logic had become broken and core PMUs not called "cpu" were not having their caps written.
This affects ARM and s390 non-hybrid PMUs.
Simplify the PMU caps writing logic to scan one fewer time and to be more explicit in its behavior.
Fixes: 178ddf3bad981380 ("perf header: Avoid hybrid PMU list in write_pmu_caps") Reported-by: Wei Li liwei391@huawei.com Signed-off-by: Ian Rogers irogers@google.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Athira Rajeev atrajeev@linux.vnet.ibm.com Cc: Huacai Chen chenhuacai@kernel.org Cc: Ian Rogers irogers@google.com Cc: Ingo Molnar mingo@redhat.com Cc: James Clark james.clark@arm.com Cc: Jiri Olsa jolsa@kernel.org Cc: John Garry john.g.garry@oracle.com Cc: K Prateek Nayak kprateek.nayak@amd.com Cc: Kajol Jain kjain@linux.ibm.com Cc: Kan Liang kan.liang@linux.intel.com Cc: Leo Yan leo.yan@linaro.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mike Leach mike.leach@linaro.org Cc: Ming Wang wangming01@loongson.cn Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi Bangoria ravi.bangoria@amd.com Cc: Sean Christopherson seanjc@google.com Cc: Suzuki Poulouse suzuki.poulose@arm.com Cc: Will Deacon will@kernel.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230825024002.801955-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/header.c | 31 ++++++++++++++++--------------- 1 file changed, 16 insertions(+), 15 deletions(-)
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c index 52fbf526fe74a..13c71d28e0eb3 100644 --- a/tools/perf/util/header.c +++ b/tools/perf/util/header.c @@ -1605,8 +1605,15 @@ static int write_pmu_caps(struct feat_fd *ff, int ret;
while ((pmu = perf_pmus__scan(pmu))) { - if (!pmu->name || !strcmp(pmu->name, "cpu") || - perf_pmu__caps_parse(pmu) <= 0) + if (!strcmp(pmu->name, "cpu")) { + /* + * The "cpu" PMU is special and covered by + * HEADER_CPU_PMU_CAPS. Note, core PMUs are + * counted/written here for ARM, s390 and Intel hybrid. + */ + continue; + } + if (perf_pmu__caps_parse(pmu) <= 0) continue; nr_pmu++; } @@ -1619,23 +1626,17 @@ static int write_pmu_caps(struct feat_fd *ff, return 0;
/* - * Write hybrid pmu caps first to maintain compatibility with - * older perf tool. + * Note older perf tools assume core PMUs come first, this is a property + * of perf_pmus__scan. */ - if (perf_pmus__num_core_pmus() > 1) { - pmu = NULL; - while ((pmu = perf_pmus__scan_core(pmu))) { - ret = __write_pmu_caps(ff, pmu, true); - if (ret < 0) - return ret; - } - } - pmu = NULL; while ((pmu = perf_pmus__scan(pmu))) { - if (pmu->is_core || !pmu->nr_caps) + if (!strcmp(pmu->name, "cpu")) { + /* Skip as above. */ + continue; + } + if (perf_pmu__caps_parse(pmu) <= 0) continue; - ret = __write_pmu_caps(ff, pmu, true); if (ret < 0) return ret;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miquel Raynal miquel.raynal@bootlin.com
[ Upstream commit 5496eac6ad7428fa06811a8c34b3a15beb93b86d ]
The 'saved_regs' member of the 'svc_i3c_master' structure is not described in the kernel doc, which produces the following warning:
Function parameter or member 'saved_regs' not described in 'svc_i3c_master'
Add the missing line in the kernel documentation of the parent structure.
Fixes: 1c5ee2a77b1b ("i3c: master: svc: fix i3c suspend/resume issue") Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202308171435.0xQ82lvu-lkp@intel.com/ Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/r/20230817101853.16805-1-miquel.raynal@bootlin.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i3c/master/svc-i3c-master.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/i3c/master/svc-i3c-master.c b/drivers/i3c/master/svc-i3c-master.c index 2fefbe55c1675..6c43992c8cf6b 100644 --- a/drivers/i3c/master/svc-i3c-master.c +++ b/drivers/i3c/master/svc-i3c-master.c @@ -156,6 +156,7 @@ struct svc_i3c_regs_save { * @base: I3C master controller * @dev: Corresponding device * @regs: Memory mapping + * @saved_regs: Volatile values for PM operations * @free_slots: Bit array of available slots * @addrs: Array containing the dynamic addresses of each attached device * @descs: Array of descriptors, one per attached device
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kajol Jain kjain@linux.ibm.com
[ Upstream commit 0dd1f815545d7210150642741c364521cc5cf116 ]
Running shellcheck on lock_contention.sh generates below warning:
In stat_bpf_counters_cgrp.sh line 28: if [ -d /sys/fs/cgroup/system.slice -a -d /sys/fs/cgroup/user.slice ]; then ^-- SC2166 (warning): Prefer [ p ] && [ q ] as [ p -a q ] is not well defined.
In stat_bpf_counters_cgrp.sh line 34: local self_cgrp=$(grep perf_event /proc/self/cgroup | cut -d: -f3) ^-------------^ SC3043 (warning): In POSIX sh, 'local' is undefined. ^-------^ SC2155 (warning): Declare and assign separately to avoid masking return values. ^-- SC2046 (warning): Quote this to prevent word splitting.
In stat_bpf_counters_cgrp.sh line 51: local output ^----------^ SC3043 (warning): In POSIX sh, 'local' is undefined.
In stat_bpf_counters_cgrp.sh line 65: local output ^----------^ SC3043 (warning): In POSIX sh, 'local' is undefined.
Fixed above warnings by: - Changing the expression [p -a q] to [p] && [q]. - Fixing shellcheck warnings for local usage, by prefixing function name to the variable.
Signed-off-by: Kajol Jain kjain@linux.ibm.com Acked-by: Ian Rogers irogers@google.com Cc: Disha Goel disgoel@linux.vnet.ibm.com Cc: Jiri Olsa jolsa@kernel.org Cc: Madhavan Srinivasan maddy@linux.ibm.com Cc: Namhyung Kim namhyung@kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-6-atrajeev@linux.vnet.ibm.com Signed-off-by: Athira Rajeev atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Stable-dep-of: a84260e31402 ("perf test stat_bpf_counters_cgrp: Enhance perf stat cgroup BPF counter test") Signed-off-by: Sasha Levin sashal@kernel.org --- .../tests/shell/stat_bpf_counters_cgrp.sh | 28 ++++++++----------- 1 file changed, 12 insertions(+), 16 deletions(-)
diff --git a/tools/perf/tests/shell/stat_bpf_counters_cgrp.sh b/tools/perf/tests/shell/stat_bpf_counters_cgrp.sh index d724855d097c2..a74440a00b6b6 100755 --- a/tools/perf/tests/shell/stat_bpf_counters_cgrp.sh +++ b/tools/perf/tests/shell/stat_bpf_counters_cgrp.sh @@ -25,22 +25,22 @@ check_bpf_counter() find_cgroups() { # try usual systemd slices first - if [ -d /sys/fs/cgroup/system.slice -a -d /sys/fs/cgroup/user.slice ]; then + if [ -d /sys/fs/cgroup/system.slice ] && [ -d /sys/fs/cgroup/user.slice ]; then test_cgroups="system.slice,user.slice" return fi
# try root and self cgroups - local self_cgrp=$(grep perf_event /proc/self/cgroup | cut -d: -f3) - if [ -z ${self_cgrp} ]; then + find_cgroups_self_cgrp=$(grep perf_event /proc/self/cgroup | cut -d: -f3) + if [ -z ${find_cgroups_self_cgrp} ]; then # cgroup v2 doesn't specify perf_event - self_cgrp=$(grep ^0: /proc/self/cgroup | cut -d: -f3) + find_cgroups_self_cgrp=$(grep ^0: /proc/self/cgroup | cut -d: -f3) fi
- if [ -z ${self_cgrp} ]; then + if [ -z ${find_cgroups_self_cgrp} ]; then test_cgroups="/" else - test_cgroups="/,${self_cgrp}" + test_cgroups="/,${find_cgroups_self_cgrp}" fi }
@@ -48,13 +48,11 @@ find_cgroups() # Just check if it runs without failure and has non-zero results. check_system_wide_counted() { - local output - - output=$(perf stat -a --bpf-counters --for-each-cgroup ${test_cgroups} -e cpu-clock -x, sleep 1 2>&1) - if echo ${output} | grep -q -F "<not "; then + check_system_wide_counted_output=$(perf stat -a --bpf-counters --for-each-cgroup ${test_cgroups} -e cpu-clock -x, sleep 1 2>&1) + if echo ${check_system_wide_counted_output} | grep -q -F "<not "; then echo "Some system-wide events are not counted" if [ "${verbose}" = "1" ]; then - echo ${output} + echo ${check_system_wide_counted_output} fi exit 1 fi @@ -62,13 +60,11 @@ check_system_wide_counted()
check_cpu_list_counted() { - local output - - output=$(perf stat -C 1 --bpf-counters --for-each-cgroup ${test_cgroups} -e cpu-clock -x, taskset -c 1 sleep 1 2>&1) - if echo ${output} | grep -q -F "<not "; then + check_cpu_list_counted_output=$(perf stat -C 1 --bpf-counters --for-each-cgroup ${test_cgroups} -e cpu-clock -x, taskset -c 1 sleep 1 2>&1) + if echo ${check_cpu_list_counted_output} | grep -q -F "<not "; then echo "Some CPU events are not counted" if [ "${verbose}" = "1" ]; then - echo ${output} + echo ${check_cpu_list_counted_output} fi exit 1 fi
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namhyung Kim namhyung@kernel.org
[ Upstream commit a84260e314029e6dc9904fd6eabf8d9fd7965351 ]
It has system-wide test and cpu-list test but the cpu-list test fails sometimes. It runs sleep command on CPU1 and measure both user.slice and system.slice cgroups by default (on systemd-based systems).
But if the system was idle enough, sometime the system.slice gets no count and it makes the test failing. Maybe that's because it only looks at the CPU1, let's add CPU0 to increase the chance it finds some tasks.
Fixes: 7901086014bbaa3a ("perf test: Add a new test for perf stat cgroup BPF counter") Reported-by: Arnaldo Carvalho de Melo acme@kernel.org Signed-off-by: Namhyung Kim namhyung@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Ian Rogers irogers@google.com Cc: Ingo Molnar mingo@kernel.org Cc: Jiri Olsa jolsa@kernel.org Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230825164152.165610-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/tests/shell/stat_bpf_counters_cgrp.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/tests/shell/stat_bpf_counters_cgrp.sh b/tools/perf/tests/shell/stat_bpf_counters_cgrp.sh index a74440a00b6b6..e75d0780dc788 100755 --- a/tools/perf/tests/shell/stat_bpf_counters_cgrp.sh +++ b/tools/perf/tests/shell/stat_bpf_counters_cgrp.sh @@ -60,7 +60,7 @@ check_system_wide_counted()
check_cpu_list_counted() { - check_cpu_list_counted_output=$(perf stat -C 1 --bpf-counters --for-each-cgroup ${test_cgroups} -e cpu-clock -x, taskset -c 1 sleep 1 2>&1) + check_cpu_list_counted_output=$(perf stat -C 0,1 --bpf-counters --for-each-cgroup ${test_cgroups} -e cpu-clock -x, taskset -c 1 sleep 1 2>&1) if echo ${check_cpu_list_counted_output} | grep -q -F "<not "; then echo "Some CPU events are not counted" if [ "${verbose}" = "1" ]; then
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vincent Whitchurch vincent.whitchurch@axis.com
[ Upstream commit c69290557c7571dff3d995fa27619b965915e8a1 ]
There are 256 possible voltage settings for each range, not 256 possible voltage settings in total.
Fixes: 15a1cd245d5b ("regulator: tps6287x: Fix missing .n_voltages setting") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com Link: https://lore.kernel.org/r/20230829-tps-voltages-v1-1-7ba4f958a194@axis.com Signed-off-by: Mark Brown <broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/tps6287x-regulator.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/regulator/tps6287x-regulator.c b/drivers/regulator/tps6287x-regulator.c index b1c0963586ace..e45579a4498c6 100644 --- a/drivers/regulator/tps6287x-regulator.c +++ b/drivers/regulator/tps6287x-regulator.c @@ -119,7 +119,7 @@ static struct regulator_desc tps6287x_reg = { .ramp_mask = TPS6287X_CTRL1_VRAMP, .ramp_delay_table = tps6287x_ramp_table, .n_ramp_values = ARRAY_SIZE(tps6287x_ramp_table), - .n_voltages = 256, + .n_voltages = 256 * ARRAY_SIZE(tps6287x_voltage_ranges), .linear_ranges = tps6287x_voltage_ranges, .n_linear_ranges = ARRAY_SIZE(tps6287x_voltage_ranges), .linear_range_selectors = tps6287x_voltage_range_sel,
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yonghong Song yonghong.song@linux.dev
[ Upstream commit 5439cfa7fe612e7d02d5a1234feda3fa6e483ba7 ]
Occasionally, with './test_progs -j' on my vm, I will hit the following failure:
test_cgrp_local_storage:PASS:join_cgroup /cgrp_local_storage 0 nsec test_cgroup_iter_sleepable:PASS:skel_open 0 nsec test_cgroup_iter_sleepable:PASS:skel_load 0 nsec test_cgroup_iter_sleepable:PASS:attach_iter 0 nsec test_cgroup_iter_sleepable:PASS:iter_create 0 nsec test_cgroup_iter_sleepable:FAIL:cgroup_id unexpected cgroup_id: actual 1 != expected 2812 #48/5 cgrp_local_storage/cgroup_iter_sleepable:FAIL #48 cgrp_local_storage:FAIL
Finally, I decided to do some investigation since the test is introduced by myself. It turns out the reason is due to cgroup_fd with value 0. In cgroup_iter, a cgroup_fd of value 0 means the root cgroup.
/* from cgroup_iter.c */ if (fd) cgrp = cgroup_v1v2_get_from_fd(fd); else if (id) cgrp = cgroup_get_from_id(id); else /* walk the entire hierarchy by default. */ cgrp = cgroup_get_from_path("/");
That is why we got cgroup_id 1 instead of expected 2812.
Why we got a cgroup_fd 0? Nobody should really touch 'stdin' (fd 0) in test_progs. I traced 'close' syscall with stack trace and found the root cause, which is a bug in bpf_obj_pinning.c. Basically, the code closed fd 0 although it should not. Fixing the bug in bpf_obj_pinning.c also resolved the above cgroup_iter_sleepable subtest failure.
Fixes: 3b22f98e5a05 ("selftests/bpf: Add path_fd-based BPF_OBJ_PIN and BPF_OBJ_GET tests") Signed-off-by: Yonghong Song yonghong.song@linux.dev Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20230827150551.1743497-1-yonghong.song@linux.dev Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/bpf/prog_tests/bpf_obj_pinning.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_obj_pinning.c b/tools/testing/selftests/bpf/prog_tests/bpf_obj_pinning.c index 31f1e815f6719..ee0458a5ce789 100644 --- a/tools/testing/selftests/bpf/prog_tests/bpf_obj_pinning.c +++ b/tools/testing/selftests/bpf/prog_tests/bpf_obj_pinning.c @@ -8,6 +8,7 @@ #include <linux/unistd.h> #include <linux/mount.h> #include <sys/syscall.h> +#include "bpf/libbpf_internal.h"
static inline int sys_fsopen(const char *fsname, unsigned flags) { @@ -155,7 +156,7 @@ static void validate_pin(int map_fd, const char *map_name, int src_value, ASSERT_OK(err, "obj_pin");
/* cleanup */ - if (pin_opts.path_fd >= 0) + if (path_kind == PATH_FD_REL && pin_opts.path_fd >= 0) close(pin_opts.path_fd); if (old_cwd[0]) ASSERT_OK(chdir(old_cwd), "restore_cwd"); @@ -220,7 +221,7 @@ static void validate_get(int map_fd, const char *map_name, int src_value, goto cleanup;
/* cleanup */ - if (get_opts.path_fd >= 0) + if (path_kind == PATH_FD_REL && get_opts.path_fd >= 0) close(get_opts.path_fd); if (old_cwd[0]) ASSERT_OK(chdir(old_cwd), "restore_cwd");
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrzej Hajda andrzej.hajda@intel.com
[ Upstream commit 5eefc5307c983b59344a4cb89009819f580c84fa ]
References to i915_requests may be trapped by userspace inside a sync_file or dmabuf (dma-resv) and held indefinitely across different proceses. To counter-act the memory leaks, we try to not to keep references from the request past their completion. On the other side on fence release we need to know if rq->engine is valid and points to hw engine (true for non-virtual requests). To make it possible extra bit has been added to rq->execution_mask, for marking virtual engines.
Fixes: bcb9aa45d5a0 ("Revert "drm/i915: Hold reference to intel_context over life of i915_request"") Signed-off-by: Chris Wilson chris.p.wilson@linux.intel.com Signed-off-by: Andrzej Hajda andrzej.hajda@intel.com Reviewed-by: Andi Shyti andi.shyti@linux.intel.com Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230821153035.3903006-1-andrz... (cherry picked from commit 280410677af763f3871b93e794a199cfcf6fb580) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 +++ drivers/gpu/drm/i915/i915_request.c | 7 ++----- 3 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index e99a6fa03d453..a7e6775980043 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -58,6 +58,7 @@ struct i915_perf_group;
typedef u32 intel_engine_mask_t; #define ALL_ENGINES ((intel_engine_mask_t)~0ul) +#define VIRTUAL_ENGINES BIT(BITS_PER_TYPE(intel_engine_mask_t) - 1)
struct intel_hw_status_page { struct list_head timelines; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index a0e3ef1c65d24..b5b7f2fe8c78e 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -5470,6 +5470,9 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count,
ve->base.flags = I915_ENGINE_IS_VIRTUAL;
+ BUILD_BUG_ON(ilog2(VIRTUAL_ENGINES) < I915_NUM_ENGINES); + ve->base.mask = VIRTUAL_ENGINES; + intel_context_init(&ve->context, &ve->base);
for (n = 0; n < count; n++) { diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 833b73edefdbb..eb2a3000ad66b 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -134,9 +134,7 @@ static void i915_fence_release(struct dma_fence *fence) i915_sw_fence_fini(&rq->semaphore);
/* - * Keep one request on each engine for reserved use under mempressure - * do not use with virtual engines as this really is only needed for - * kernel contexts. + * Keep one request on each engine for reserved use under mempressure. * * We do not hold a reference to the engine here and so have to be * very careful in what rq->engine we poke. The virtual engine is @@ -166,8 +164,7 @@ static void i915_fence_release(struct dma_fence *fence) * know that if the rq->execution_mask is a single bit, rq->engine * can be a physical engine with the exact corresponding mask. */ - if (!intel_engine_is_virtual(rq->engine) && - is_power_of_2(rq->execution_mask) && + if (is_power_of_2(rq->execution_mask) && !cmpxchg(&rq->engine->request_pool, NULL, rq)) return;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yu Kuai yukuai3@huawei.com
[ Upstream commit e8368b57c006dc0e02dcd8a9dc9f2060ff5476fe ]
There are no functional changes, just make the code cleaner.
Signed-off-by: Yu Kuai yukuai3@huawei.com Acked-by: Tejun Heo tj@kernel.org Link: https://lore.kernel.org/r/20230816012708.1193747-4-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe axboe@kernel.dk Stable-dep-of: eead0056648c ("blk-throttle: consider 'carryover_ios/bytes' in throtl_trim_slice()") Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-throttle.c | 86 +++++++++++++++++++++----------------------- 1 file changed, 41 insertions(+), 45 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 7397ff199d669..b0d9573f1911b 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -697,11 +697,40 @@ static bool throtl_slice_used(struct throtl_grp *tg, bool rw) return true; }
+static unsigned int calculate_io_allowed(u32 iops_limit, + unsigned long jiffy_elapsed) +{ + unsigned int io_allowed; + u64 tmp; + + /* + * jiffy_elapsed should not be a big value as minimum iops can be + * 1 then at max jiffy elapsed should be equivalent of 1 second as we + * will allow dispatch after 1 second and after that slice should + * have been trimmed. + */ + + tmp = (u64)iops_limit * jiffy_elapsed; + do_div(tmp, HZ); + + if (tmp > UINT_MAX) + io_allowed = UINT_MAX; + else + io_allowed = tmp; + + return io_allowed; +} + +static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed) +{ + return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ); +} + /* Trim the used slices and adjust slice start accordingly */ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw) { - unsigned long nr_slices, time_elapsed, io_trim; - u64 bytes_trim, tmp; + unsigned long time_elapsed, io_trim; + u64 bytes_trim;
BUG_ON(time_before(tg->slice_end[rw], tg->slice_start[rw]));
@@ -723,19 +752,14 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
throtl_set_slice_end(tg, rw, jiffies + tg->td->throtl_slice);
- time_elapsed = jiffies - tg->slice_start[rw]; - - nr_slices = time_elapsed / tg->td->throtl_slice; - - if (!nr_slices) + time_elapsed = rounddown(jiffies - tg->slice_start[rw], + tg->td->throtl_slice); + if (!time_elapsed) return; - tmp = tg_bps_limit(tg, rw) * tg->td->throtl_slice * nr_slices; - do_div(tmp, HZ); - bytes_trim = tmp; - - io_trim = (tg_iops_limit(tg, rw) * tg->td->throtl_slice * nr_slices) / - HZ;
+ bytes_trim = calculate_bytes_allowed(tg_bps_limit(tg, rw), + time_elapsed); + io_trim = calculate_io_allowed(tg_iops_limit(tg, rw), time_elapsed); if (!bytes_trim && !io_trim) return;
@@ -749,41 +773,13 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw) else tg->io_disp[rw] = 0;
- tg->slice_start[rw] += nr_slices * tg->td->throtl_slice; + tg->slice_start[rw] += time_elapsed;
throtl_log(&tg->service_queue, "[%c] trim slice nr=%lu bytes=%llu io=%lu start=%lu end=%lu jiffies=%lu", - rw == READ ? 'R' : 'W', nr_slices, bytes_trim, io_trim, - tg->slice_start[rw], tg->slice_end[rw], jiffies); -} - -static unsigned int calculate_io_allowed(u32 iops_limit, - unsigned long jiffy_elapsed) -{ - unsigned int io_allowed; - u64 tmp; - - /* - * jiffy_elapsed should not be a big value as minimum iops can be - * 1 then at max jiffy elapsed should be equivalent of 1 second as we - * will allow dispatch after 1 second and after that slice should - * have been trimmed. - */ - - tmp = (u64)iops_limit * jiffy_elapsed; - do_div(tmp, HZ); - - if (tmp > UINT_MAX) - io_allowed = UINT_MAX; - else - io_allowed = tmp; - - return io_allowed; -} - -static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed) -{ - return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ); + rw == READ ? 'R' : 'W', time_elapsed / tg->td->throtl_slice, + bytes_trim, io_trim, tg->slice_start[rw], tg->slice_end[rw], + jiffies); }
static void __tg_update_carryover(struct throtl_grp *tg, bool rw)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yu Kuai yukuai3@huawei.com
[ Upstream commit eead0056648cef49d7b15c07ae612fa217083165 ]
Currently, 'carryover_ios/bytes' is not handled in throtl_trim_slice(), for consequence, 'carryover_ios/bytes' will be used to throttle bio multiple times, for example:
1) set iops limit to 100, and slice start is 0, slice end is 100ms; 2) current time is 0, and 10 ios are dispatched, those io won't be throttled and io_disp is 10; 3) still at current time 0, update iops limit to 1000, carryover_ios is updated to (0 - 10) = -10; 4) in this slice(0 - 100ms), io_allowed = 100 + (-10) = 90, which means only 90 ios can be dispatched without waiting; 5) assume that io is throttled in slice(0 - 100ms), and throtl_trim_slice() update silce to (100ms - 200ms). In this case, 'carryover_ios/bytes' is not cleared and still only 90 ios can be dispatched between 100ms - 200ms.
Fix this problem by updating 'carryover_ios/bytes' in throtl_trim_slice().
Fixes: a880ae93e5b5 ("blk-throttle: fix io hung due to configuration updates") Reported-by: zhuxiaohui zhuxiaohui.400@bytedance.com Link: https://lore.kernel.org/all/20230812072116.42321-1-zhuxiaohui.400@bytedance.... Signed-off-by: Yu Kuai yukuai3@huawei.com Acked-by: Tejun Heo tj@kernel.org Link: https://lore.kernel.org/r/20230816012708.1193747-5-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-throttle.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c index b0d9573f1911b..e78bc3b65ec80 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -729,8 +729,9 @@ static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed) /* Trim the used slices and adjust slice start accordingly */ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw) { - unsigned long time_elapsed, io_trim; - u64 bytes_trim; + unsigned long time_elapsed; + long long bytes_trim; + int io_trim;
BUG_ON(time_before(tg->slice_end[rw], tg->slice_start[rw]));
@@ -758,17 +759,21 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw) return;
bytes_trim = calculate_bytes_allowed(tg_bps_limit(tg, rw), - time_elapsed); - io_trim = calculate_io_allowed(tg_iops_limit(tg, rw), time_elapsed); - if (!bytes_trim && !io_trim) + time_elapsed) + + tg->carryover_bytes[rw]; + io_trim = calculate_io_allowed(tg_iops_limit(tg, rw), time_elapsed) + + tg->carryover_ios[rw]; + if (bytes_trim <= 0 && io_trim <= 0) return;
- if (tg->bytes_disp[rw] >= bytes_trim) + tg->carryover_bytes[rw] = 0; + if ((long long)tg->bytes_disp[rw] >= bytes_trim) tg->bytes_disp[rw] -= bytes_trim; else tg->bytes_disp[rw] = 0;
- if (tg->io_disp[rw] >= io_trim) + tg->carryover_ios[rw] = 0; + if ((int)tg->io_disp[rw] >= io_trim) tg->io_disp[rw] -= io_trim; else tg->io_disp[rw] = 0; @@ -776,7 +781,7 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw) tg->slice_start[rw] += time_elapsed;
throtl_log(&tg->service_queue, - "[%c] trim slice nr=%lu bytes=%llu io=%lu start=%lu end=%lu jiffies=%lu", + "[%c] trim slice nr=%lu bytes=%lld io=%d start=%lu end=%lu jiffies=%lu", rw == READ ? 'R' : 'W', time_elapsed / tg->td->throtl_slice, bytes_trim, io_trim, tg->slice_start[rw], tg->slice_end[rw], jiffies);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Sutter phil@nwl.cc
[ Upstream commit 7e9be1124dbe7888907e82cab20164578e3f9ab7 ]
Since set element reset is not integrated into nf_tables' transaction logic, an explicit log call is needed, similar to NFT_MSG_GETOBJ_RESET handling.
For the sake of simplicity, catchall element reset will always generate a dedicated log entry. This relieves nf_tables_dump_set() from having to adjust the logged element count depending on whether a catchall element was found or not.
Fixes: 079cd633219d7 ("netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET") Signed-off-by: Phil Sutter phil@nwl.cc Reviewed-by: Richard Guy Briggs rgb@redhat.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/audit.h | 1 + kernel/auditsc.c | 1 + net/netfilter/nf_tables_api.c | 31 ++++++++++++++++++++++++++++--- 3 files changed, 30 insertions(+), 3 deletions(-)
diff --git a/include/linux/audit.h b/include/linux/audit.h index 6a3a9e122bb5e..192bf03aacc52 100644 --- a/include/linux/audit.h +++ b/include/linux/audit.h @@ -117,6 +117,7 @@ enum audit_nfcfgop { AUDIT_NFT_OP_OBJ_RESET, AUDIT_NFT_OP_FLOWTABLE_REGISTER, AUDIT_NFT_OP_FLOWTABLE_UNREGISTER, + AUDIT_NFT_OP_SETELEM_RESET, AUDIT_NFT_OP_INVALID, };
diff --git a/kernel/auditsc.c b/kernel/auditsc.c index 8dfd581cd5543..87342b7126bcd 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -143,6 +143,7 @@ static const struct audit_nfcfgop_tab audit_nfcfgs[] = { { AUDIT_NFT_OP_OBJ_RESET, "nft_reset_obj" }, { AUDIT_NFT_OP_FLOWTABLE_REGISTER, "nft_register_flowtable" }, { AUDIT_NFT_OP_FLOWTABLE_UNREGISTER, "nft_unregister_flowtable" }, + { AUDIT_NFT_OP_SETELEM_RESET, "nft_reset_setelem" }, { AUDIT_NFT_OP_INVALID, "nft_invalid" }, };
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index eb8b1167dced2..2e3844d5923f5 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -102,6 +102,7 @@ static const u8 nft2audit_op[NFT_MSG_MAX] = { // enum nf_tables_msg_types [NFT_MSG_NEWFLOWTABLE] = AUDIT_NFT_OP_FLOWTABLE_REGISTER, [NFT_MSG_GETFLOWTABLE] = AUDIT_NFT_OP_INVALID, [NFT_MSG_DELFLOWTABLE] = AUDIT_NFT_OP_FLOWTABLE_UNREGISTER, + [NFT_MSG_GETSETELEM_RESET] = AUDIT_NFT_OP_SETELEM_RESET, };
static void nft_validate_state_update(struct nft_table *table, u8 new_validate_state) @@ -5621,13 +5622,25 @@ static int nf_tables_dump_setelem(const struct nft_ctx *ctx, return nf_tables_fill_setelem(args->skb, set, elem, args->reset); }
+static void audit_log_nft_set_reset(const struct nft_table *table, + unsigned int base_seq, + unsigned int nentries) +{ + char *buf = kasprintf(GFP_ATOMIC, "%s:%u", table->name, base_seq); + + audit_log_nfcfg(buf, table->family, nentries, + AUDIT_NFT_OP_SETELEM_RESET, GFP_ATOMIC); + kfree(buf); +} + struct nft_set_dump_ctx { const struct nft_set *set; struct nft_ctx ctx; };
static int nft_set_catchall_dump(struct net *net, struct sk_buff *skb, - const struct nft_set *set, bool reset) + const struct nft_set *set, bool reset, + unsigned int base_seq) { struct nft_set_elem_catchall *catchall; u8 genmask = nft_genmask_cur(net); @@ -5643,6 +5656,8 @@ static int nft_set_catchall_dump(struct net *net, struct sk_buff *skb,
elem.priv = catchall->elem; ret = nf_tables_fill_setelem(skb, set, &elem, reset); + if (reset && !ret) + audit_log_nft_set_reset(set->table, base_seq, 1); break; }
@@ -5722,12 +5737,17 @@ static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb) set->ops->walk(&dump_ctx->ctx, set, &args.iter);
if (!args.iter.err && args.iter.count == cb->args[0]) - args.iter.err = nft_set_catchall_dump(net, skb, set, reset); + args.iter.err = nft_set_catchall_dump(net, skb, set, + reset, cb->seq); rcu_read_unlock();
nla_nest_end(skb, nest); nlmsg_end(skb, nlh);
+ if (reset && args.iter.count > args.iter.skip) + audit_log_nft_set_reset(table, cb->seq, + args.iter.count - args.iter.skip); + if (args.iter.err && args.iter.err != -EMSGSIZE) return args.iter.err; if (args.iter.count == cb->args[0]) @@ -5952,13 +5972,13 @@ static int nf_tables_getsetelem(struct sk_buff *skb, struct netlink_ext_ack *extack = info->extack; u8 genmask = nft_genmask_cur(info->net); u8 family = info->nfmsg->nfgen_family; + int rem, err = 0, nelems = 0; struct net *net = info->net; struct nft_table *table; struct nft_set *set; struct nlattr *attr; struct nft_ctx ctx; bool reset = false; - int rem, err = 0;
table = nft_table_lookup(net, nla[NFTA_SET_ELEM_LIST_TABLE], family, genmask, 0); @@ -6001,8 +6021,13 @@ static int nf_tables_getsetelem(struct sk_buff *skb, NL_SET_BAD_ATTR(extack, attr); break; } + nelems++; }
+ if (reset) + audit_log_nft_set_reset(table, nft_pernet(net)->base_seq, + nelems); + return err; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Sutter phil@nwl.cc
[ Upstream commit ea078ae9108e25fc881c84369f7c03931d22e555 ]
Resetting rules' stateful data happens outside of the transaction logic, so 'get' and 'dump' handlers have to emit audit log entries themselves.
Fixes: 8daa8fde3fc3f ("netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET") Signed-off-by: Phil Sutter phil@nwl.cc Reviewed-by: Richard Guy Briggs rgb@redhat.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/audit.h | 1 + kernel/auditsc.c | 1 + net/netfilter/nf_tables_api.c | 18 ++++++++++++++++++ 3 files changed, 20 insertions(+)
diff --git a/include/linux/audit.h b/include/linux/audit.h index 192bf03aacc52..51b1b7054a233 100644 --- a/include/linux/audit.h +++ b/include/linux/audit.h @@ -118,6 +118,7 @@ enum audit_nfcfgop { AUDIT_NFT_OP_FLOWTABLE_REGISTER, AUDIT_NFT_OP_FLOWTABLE_UNREGISTER, AUDIT_NFT_OP_SETELEM_RESET, + AUDIT_NFT_OP_RULE_RESET, AUDIT_NFT_OP_INVALID, };
diff --git a/kernel/auditsc.c b/kernel/auditsc.c index 87342b7126bcd..eae5dfe9b9a01 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -144,6 +144,7 @@ static const struct audit_nfcfgop_tab audit_nfcfgs[] = { { AUDIT_NFT_OP_FLOWTABLE_REGISTER, "nft_register_flowtable" }, { AUDIT_NFT_OP_FLOWTABLE_UNREGISTER, "nft_unregister_flowtable" }, { AUDIT_NFT_OP_SETELEM_RESET, "nft_reset_setelem" }, + { AUDIT_NFT_OP_RULE_RESET, "nft_reset_rule" }, { AUDIT_NFT_OP_INVALID, "nft_invalid" }, };
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 2e3844d5923f5..cc70482b94907 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -3422,6 +3422,18 @@ static void nf_tables_rule_notify(const struct nft_ctx *ctx, nfnetlink_set_err(ctx->net, ctx->portid, NFNLGRP_NFTABLES, -ENOBUFS); }
+static void audit_log_rule_reset(const struct nft_table *table, + unsigned int base_seq, + unsigned int nentries) +{ + char *buf = kasprintf(GFP_ATOMIC, "%s:%u", + table->name, base_seq); + + audit_log_nfcfg(buf, table->family, nentries, + AUDIT_NFT_OP_RULE_RESET, GFP_ATOMIC); + kfree(buf); +} + struct nft_rule_dump_ctx { char *table; char *chain; @@ -3528,6 +3540,9 @@ static int nf_tables_dump_rules(struct sk_buff *skb, done: rcu_read_unlock();
+ if (reset && idx > cb->args[0]) + audit_log_rule_reset(table, cb->seq, idx - cb->args[0]); + cb->args[0] = idx; return skb->len; } @@ -3635,6 +3650,9 @@ static int nf_tables_getrule(struct sk_buff *skb, const struct nfnl_info *info, if (err < 0) goto err_fill_rule_info;
+ if (reset) + audit_log_rule_reset(table, nft_pernet(net)->base_seq, 1); + return nfnetlink_unicast(skb2, net, NETLINK_CB(skb).portid);
err_fill_rule_info:
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Katya Orlova e.orlova@ispras.ru
[ Upstream commit efc0b0bcffcba60d9c6301063d25a22a4744b499 ]
In addition to the EINVAL, there may be an ENOMEM.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 70431bfd825d ("cifs: Support fscache indexing rewrite") Signed-off-by: Katya Orlova e.orlova@ispras.ru Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/fscache.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/smb/client/fscache.c b/fs/smb/client/fscache.c index 8f6909d633da8..34e20c4cd507f 100644 --- a/fs/smb/client/fscache.c +++ b/fs/smb/client/fscache.c @@ -48,7 +48,7 @@ int cifs_fscache_get_super_cookie(struct cifs_tcon *tcon) sharename = extract_sharename(tcon->tree_name); if (IS_ERR(sharename)) { cifs_dbg(FYI, "%s: couldn't extract sharename\n", __func__); - return -EINVAL; + return PTR_ERR(sharename); }
slen = strlen(sharename);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 8c21ab1bae945686c602c5bfa4e3f3352c2452c5 ]
When setting a high number of flows (limit being 65536), fq_pie_timer() is currently using too much time as syzbot reported.
Add logic to yield the cpu every 2048 flows (less than 150 usec on debug kernels). It should also help by not blocking qdisc fast paths for too long. Worst case (65536 flows) would need 31 jiffies for a complete scan.
Relevant extract from syzbot report:
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-.... } 2663 jiffies s: 873 root: 0x1/. rcu: blocking rcu_node structures (internal RCU debug): Sending NMI from CPU 1 to CPUs 0: NMI backtrace for cpu 0 CPU: 0 PID: 5177 Comm: syz-executor273 Not tainted 6.5.0-syzkaller-00453-g727dbda16b83 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline] RIP: 0010:write_comp_data+0x21/0x90 kernel/kcov.c:236 Code: 2e 0f 1f 84 00 00 00 00 00 65 8b 05 01 b2 7d 7e 49 89 f1 89 c6 49 89 d2 81 e6 00 01 00 00 49 89 f8 65 48 8b 14 25 80 b9 03 00 <a9> 00 01 ff 00 74 0e 85 f6 74 59 8b 82 04 16 00 00 85 c0 74 4f 8b RSP: 0018:ffffc90000007bb8 EFLAGS: 00000206 RAX: 0000000000000101 RBX: ffffc9000dc0d140 RCX: ffffffff885893b0 RDX: ffff88807c075940 RSI: 0000000000000100 RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9000dc0d178 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 0000555555d54380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6b442f6130 CR3: 000000006fe1c000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <NMI> </NMI> <IRQ> pie_calculate_probability+0x480/0x850 net/sched/sch_pie.c:415 fq_pie_timer+0x1da/0x4f0 net/sched/sch_fq_pie.c:387 call_timer_fn+0x1a0/0x580 kernel/time/timer.c:1700
Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler") Link: https://lore.kernel.org/lkml/00000000000017ad3f06040bf394@google.com/ Reported-by: syzbot+e46fbd5289363464bc13@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Michal Kubiak michal.kubiak@intel.com Reviewed-by: Jamal Hadi Salim jhs@mojatatu.com Link: https://lore.kernel.org/r/20230829123541.3745013-1-edumazet@google.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_fq_pie.c | 27 +++++++++++++++++++-------- 1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c index 591d87d5e5c0f..68e6acd0f130d 100644 --- a/net/sched/sch_fq_pie.c +++ b/net/sched/sch_fq_pie.c @@ -61,6 +61,7 @@ struct fq_pie_sched_data { struct pie_params p_params; u32 ecn_prob; u32 flows_cnt; + u32 flows_cursor; u32 quantum; u32 memory_limit; u32 new_flow_count; @@ -375,22 +376,32 @@ static int fq_pie_change(struct Qdisc *sch, struct nlattr *opt, static void fq_pie_timer(struct timer_list *t) { struct fq_pie_sched_data *q = from_timer(q, t, adapt_timer); + unsigned long next, tupdate; struct Qdisc *sch = q->sch; spinlock_t *root_lock; /* to lock qdisc for probability calculations */ - u32 idx; + int max_cnt, i;
rcu_read_lock(); root_lock = qdisc_lock(qdisc_root_sleeping(sch)); spin_lock(root_lock);
- for (idx = 0; idx < q->flows_cnt; idx++) - pie_calculate_probability(&q->p_params, &q->flows[idx].vars, - q->flows[idx].backlog); - - /* reset the timer to fire after 'tupdate' jiffies. */ - if (q->p_params.tupdate) - mod_timer(&q->adapt_timer, jiffies + q->p_params.tupdate); + /* Limit this expensive loop to 2048 flows per round. */ + max_cnt = min_t(int, q->flows_cnt - q->flows_cursor, 2048); + for (i = 0; i < max_cnt; i++) { + pie_calculate_probability(&q->p_params, + &q->flows[q->flows_cursor].vars, + q->flows[q->flows_cursor].backlog); + q->flows_cursor++; + }
+ tupdate = q->p_params.tupdate; + next = 0; + if (q->flows_cursor >= q->flows_cnt) { + q->flows_cursor = 0; + next = tupdate; + } + if (tupdate) + mod_timer(&q->adapt_timer, jiffies + next); spin_unlock(root_lock); rcu_read_unlock(); }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit dc9511dd6f37fe803f6b15b61b030728d7057417 ]
sk->sk_wmem_queued can be read locklessly from sctp_poll()
Use sk_wmem_queued_add() when the field is changed, and add READ_ONCE() annotations in sctp_writeable() and sctp_assocs_seq_show()
syzbot reported:
BUG: KCSAN: data-race in sctp_poll / sctp_wfree
read-write to 0xffff888149d77810 of 4 bytes by interrupt on cpu 0: sctp_wfree+0x170/0x4a0 net/sctp/socket.c:9147 skb_release_head_state+0xb7/0x1a0 net/core/skbuff.c:988 skb_release_all net/core/skbuff.c:1000 [inline] __kfree_skb+0x16/0x140 net/core/skbuff.c:1016 consume_skb+0x57/0x180 net/core/skbuff.c:1232 sctp_chunk_destroy net/sctp/sm_make_chunk.c:1503 [inline] sctp_chunk_put+0xcd/0x130 net/sctp/sm_make_chunk.c:1530 sctp_datamsg_put+0x29a/0x300 net/sctp/chunk.c:128 sctp_chunk_free+0x34/0x50 net/sctp/sm_make_chunk.c:1515 sctp_outq_sack+0xafa/0xd70 net/sctp/outqueue.c:1381 sctp_cmd_process_sack net/sctp/sm_sideeffect.c:834 [inline] sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1366 [inline] sctp_side_effects net/sctp/sm_sideeffect.c:1198 [inline] sctp_do_sm+0x12c7/0x31b0 net/sctp/sm_sideeffect.c:1169 sctp_assoc_bh_rcv+0x2b2/0x430 net/sctp/associola.c:1051 sctp_inq_push+0x108/0x120 net/sctp/inqueue.c:80 sctp_rcv+0x116e/0x1340 net/sctp/input.c:243 sctp6_rcv+0x25/0x40 net/sctp/ipv6.c:1120 ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437 ip6_input_finish net/ipv6/ip6_input.c:482 [inline] NF_HOOK include/linux/netfilter.h:303 [inline] ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491 dst_input include/net/dst.h:468 [inline] ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79 NF_HOOK include/linux/netfilter.h:303 [inline] ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309 __netif_receive_skb_one_core net/core/dev.c:5452 [inline] __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566 process_backlog+0x21f/0x380 net/core/dev.c:5894 __napi_poll+0x60/0x3b0 net/core/dev.c:6460 napi_poll net/core/dev.c:6527 [inline] net_rx_action+0x32b/0x750 net/core/dev.c:6660 __do_softirq+0xc1/0x265 kernel/softirq.c:553 run_ksoftirqd+0x17/0x20 kernel/softirq.c:921 smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164 kthread+0x1d7/0x210 kernel/kthread.c:389 ret_from_fork+0x2e/0x40 arch/x86/kernel/process.c:145 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
read to 0xffff888149d77810 of 4 bytes by task 17828 on cpu 1: sctp_writeable net/sctp/socket.c:9304 [inline] sctp_poll+0x265/0x410 net/sctp/socket.c:8671 sock_poll+0x253/0x270 net/socket.c:1374 vfs_poll include/linux/poll.h:88 [inline] do_pollfd fs/select.c:873 [inline] do_poll fs/select.c:921 [inline] do_sys_poll+0x636/0xc00 fs/select.c:1015 __do_sys_ppoll fs/select.c:1121 [inline] __se_sys_ppoll+0x1af/0x1f0 fs/select.c:1101 __x64_sys_ppoll+0x67/0x80 fs/select.c:1101 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x00019e80 -> 0x0000cc80
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 17828 Comm: syz-executor.1 Not tainted 6.5.0-rc7-syzkaller-00185-g28f20a19294d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Cc: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Acked-by: Xin Long lucien.xin@gmail.com Link: https://lore.kernel.org/r/20230830094519.950007-1-edumazet@google.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sctp/proc.c | 2 +- net/sctp/socket.c | 10 +++++----- 2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/sctp/proc.c b/net/sctp/proc.c index f13d6a34f32f2..ec00ee75d59a6 100644 --- a/net/sctp/proc.c +++ b/net/sctp/proc.c @@ -282,7 +282,7 @@ static int sctp_assocs_seq_show(struct seq_file *seq, void *v) assoc->init_retries, assoc->shutdown_retries, assoc->rtx_data_chunks, refcount_read(&sk->sk_wmem_alloc), - sk->sk_wmem_queued, + READ_ONCE(sk->sk_wmem_queued), sk->sk_sndbuf, sk->sk_rcvbuf); seq_printf(seq, "\n"); diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 423dc400992ba..7cf207706eb66 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -69,7 +69,7 @@ #include <net/sctp/stream_sched.h>
/* Forward declarations for internal helper functions. */ -static bool sctp_writeable(struct sock *sk); +static bool sctp_writeable(const struct sock *sk); static void sctp_wfree(struct sk_buff *skb); static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p, size_t msg_len); @@ -140,7 +140,7 @@ static inline void sctp_set_owner_w(struct sctp_chunk *chunk)
refcount_add(sizeof(struct sctp_chunk), &sk->sk_wmem_alloc); asoc->sndbuf_used += chunk->skb->truesize + sizeof(struct sctp_chunk); - sk->sk_wmem_queued += chunk->skb->truesize + sizeof(struct sctp_chunk); + sk_wmem_queued_add(sk, chunk->skb->truesize + sizeof(struct sctp_chunk)); sk_mem_charge(sk, chunk->skb->truesize); }
@@ -9144,7 +9144,7 @@ static void sctp_wfree(struct sk_buff *skb) struct sock *sk = asoc->base.sk;
sk_mem_uncharge(sk, skb->truesize); - sk->sk_wmem_queued -= skb->truesize + sizeof(struct sctp_chunk); + sk_wmem_queued_add(sk, -(skb->truesize + sizeof(struct sctp_chunk))); asoc->sndbuf_used -= skb->truesize + sizeof(struct sctp_chunk); WARN_ON(refcount_sub_and_test(sizeof(struct sctp_chunk), &sk->sk_wmem_alloc)); @@ -9299,9 +9299,9 @@ void sctp_write_space(struct sock *sk) * UDP-style sockets or TCP-style sockets, this code should work. * - Daisy */ -static bool sctp_writeable(struct sock *sk) +static bool sctp_writeable(const struct sock *sk) { - return sk->sk_sndbuf > sk->sk_wmem_queued; + return READ_ONCE(sk->sk_sndbuf) > READ_ONCE(sk->sk_wmem_queued); }
/* Wait for an association to go into ESTABLISHED state. If timeout is 0,
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit fce92af1c29d90184dfec638b5738831097d66e9 ]
syzbot complained about a data-race in fib_table_lookup() [1]
Add appropriate annotations to document it.
[1] BUG: KCSAN: data-race in fib_release_info / fib_table_lookup
write to 0xffff888150f31744 of 1 bytes by task 1189 on cpu 0: fib_release_info+0x3a0/0x460 net/ipv4/fib_semantics.c:281 fib_table_delete+0x8d2/0x900 net/ipv4/fib_trie.c:1777 fib_magic+0x1c1/0x1f0 net/ipv4/fib_frontend.c:1106 fib_del_ifaddr+0x8cf/0xa60 net/ipv4/fib_frontend.c:1317 fib_inetaddr_event+0x77/0x200 net/ipv4/fib_frontend.c:1448 notifier_call_chain kernel/notifier.c:93 [inline] blocking_notifier_call_chain+0x90/0x200 kernel/notifier.c:388 __inet_del_ifa+0x4df/0x800 net/ipv4/devinet.c:432 inet_del_ifa net/ipv4/devinet.c:469 [inline] inetdev_destroy net/ipv4/devinet.c:322 [inline] inetdev_event+0x553/0xaf0 net/ipv4/devinet.c:1606 notifier_call_chain kernel/notifier.c:93 [inline] raw_notifier_call_chain+0x6b/0x1c0 kernel/notifier.c:461 call_netdevice_notifiers_info net/core/dev.c:1962 [inline] call_netdevice_notifiers_mtu+0xd2/0x130 net/core/dev.c:2037 dev_set_mtu_ext+0x30b/0x3e0 net/core/dev.c:8673 do_setlink+0x5be/0x2430 net/core/rtnetlink.c:2837 rtnl_setlink+0x255/0x300 net/core/rtnetlink.c:3177 rtnetlink_rcv_msg+0x807/0x8c0 net/core/rtnetlink.c:6445 netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2549 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6463 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1914 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg net/socket.c:748 [inline] sock_write_iter+0x1aa/0x230 net/socket.c:1129 do_iter_write+0x4b4/0x7b0 fs/read_write.c:860 vfs_writev+0x1a8/0x320 fs/read_write.c:933 do_writev+0xf8/0x220 fs/read_write.c:976 __do_sys_writev fs/read_write.c:1049 [inline] __se_sys_writev fs/read_write.c:1046 [inline] __x64_sys_writev+0x45/0x50 fs/read_write.c:1046 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff888150f31744 of 1 bytes by task 21839 on cpu 1: fib_table_lookup+0x2bf/0xd50 net/ipv4/fib_trie.c:1585 fib_lookup include/net/ip_fib.h:383 [inline] ip_route_output_key_hash_rcu+0x38c/0x12c0 net/ipv4/route.c:2751 ip_route_output_key_hash net/ipv4/route.c:2641 [inline] __ip_route_output_key include/net/route.h:134 [inline] ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2869 send4+0x1e7/0x500 drivers/net/wireguard/socket.c:61 wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175 wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200 wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline] wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51 process_one_work+0x434/0x860 kernel/workqueue.c:2600 worker_thread+0x5f2/0xa10 kernel/workqueue.c:2751 kthread+0x1d7/0x210 kernel/kthread.c:389 ret_from_fork+0x2e/0x40 arch/x86/kernel/process.c:145 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
value changed: 0x00 -> 0x01
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 21839 Comm: kworker/u4:18 Tainted: G W 6.5.0-syzkaller #0
Fixes: dccd9ecc3744 ("ipv4: Do not use dead fib_info entries.") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: David Ahern dsahern@kernel.org Link: https://lore.kernel.org/r/20230830095520.1046984-1-edumazet@google.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/fib_semantics.c | 5 ++++- net/ipv4/fib_trie.c | 3 ++- 2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 65ba18a91865a..eafa4a0335157 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -278,7 +278,8 @@ void fib_release_info(struct fib_info *fi) hlist_del(&nexthop_nh->nh_hash); } endfor_nexthops(fi) } - fi->fib_dead = 1; + /* Paired with READ_ONCE() from fib_table_lookup() */ + WRITE_ONCE(fi->fib_dead, 1); fib_info_put(fi); } spin_unlock_bh(&fib_info_lock); @@ -1581,6 +1582,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg, link_it: ofi = fib_find_info(fi); if (ofi) { + /* fib_table_lookup() should not see @fi yet. */ fi->fib_dead = 1; free_fib_info(fi); refcount_inc(&ofi->fib_treeref); @@ -1619,6 +1621,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
failure: if (fi) { + /* fib_table_lookup() should not see @fi yet. */ fi->fib_dead = 1; free_fib_info(fi); } diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c index 74d403dbd2b4e..d13fb9e76b971 100644 --- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -1582,7 +1582,8 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp, if (fa->fa_dscp && inet_dscp_to_dsfield(fa->fa_dscp) != flp->flowi4_tos) continue; - if (fi->fib_dead) + /* Paired with WRITE_ONCE() in fib_release_info() */ + if (READ_ONCE(fi->fib_dead)) continue; if (fa->fa_info->fib_scope < flp->flowi4_scope) continue;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit a3e0fdf71bbe031de845e8e08ed7fba49f9c702c ]
syzbot is playing with IPV6_ADDRFORM quite a lot these days, and managed to hit the WARN_ON_ONCE(1) in sk_mc_loop()
We have many more similar issues to fix.
WARNING: CPU: 1 PID: 1593 at net/core/sock.c:782 sk_mc_loop+0x165/0x260 Modules linked in: CPU: 1 PID: 1593 Comm: kworker/1:3 Not tainted 6.1.40-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 Workqueue: events_power_efficient gc_worker RIP: 0010:sk_mc_loop+0x165/0x260 net/core/sock.c:782 Code: 34 1b fd 49 81 c7 18 05 00 00 4c 89 f8 48 c1 e8 03 42 80 3c 20 00 74 08 4c 89 ff e8 25 36 6d fd 4d 8b 37 eb 13 e8 db 33 1b fd <0f> 0b b3 01 eb 34 e8 d0 33 1b fd 45 31 f6 49 83 c6 38 4c 89 f0 48 RSP: 0018:ffffc90000388530 EFLAGS: 00010246 RAX: ffffffff846d9b55 RBX: 0000000000000011 RCX: ffff88814f884980 RDX: 0000000000000102 RSI: ffffffff87ae5160 RDI: 0000000000000011 RBP: ffffc90000388550 R08: 0000000000000003 R09: ffffffff846d9a65 R10: 0000000000000002 R11: ffff88814f884980 R12: dffffc0000000000 R13: ffff88810dbee000 R14: 0000000000000010 R15: ffff888150084000 FS: 0000000000000000(0000) GS:ffff8881f6b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000180 CR3: 000000014ee5b000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> [<ffffffff8507734f>] ip6_finish_output2+0x33f/0x1ae0 net/ipv6/ip6_output.c:83 [<ffffffff85062766>] __ip6_finish_output net/ipv6/ip6_output.c:200 [inline] [<ffffffff85062766>] ip6_finish_output+0x6c6/0xb10 net/ipv6/ip6_output.c:211 [<ffffffff85061f8c>] NF_HOOK_COND include/linux/netfilter.h:298 [inline] [<ffffffff85061f8c>] ip6_output+0x2bc/0x3d0 net/ipv6/ip6_output.c:232 [<ffffffff852071cf>] dst_output include/net/dst.h:444 [inline] [<ffffffff852071cf>] ip6_local_out+0x10f/0x140 net/ipv6/output_core.c:161 [<ffffffff83618fb4>] ipvlan_process_v6_outbound drivers/net/ipvlan/ipvlan_core.c:483 [inline] [<ffffffff83618fb4>] ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:529 [inline] [<ffffffff83618fb4>] ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:602 [inline] [<ffffffff83618fb4>] ipvlan_queue_xmit+0x1174/0x1be0 drivers/net/ipvlan/ipvlan_core.c:677 [<ffffffff8361ddd9>] ipvlan_start_xmit+0x49/0x100 drivers/net/ipvlan/ipvlan_main.c:229 [<ffffffff84763fc0>] netdev_start_xmit include/linux/netdevice.h:4925 [inline] [<ffffffff84763fc0>] xmit_one net/core/dev.c:3644 [inline] [<ffffffff84763fc0>] dev_hard_start_xmit+0x320/0x980 net/core/dev.c:3660 [<ffffffff8494c650>] sch_direct_xmit+0x2a0/0x9c0 net/sched/sch_generic.c:342 [<ffffffff8494d883>] qdisc_restart net/sched/sch_generic.c:407 [inline] [<ffffffff8494d883>] __qdisc_run+0xb13/0x1e70 net/sched/sch_generic.c:415 [<ffffffff8478c426>] qdisc_run+0xd6/0x260 include/net/pkt_sched.h:125 [<ffffffff84796eac>] net_tx_action+0x7ac/0x940 net/core/dev.c:5247 [<ffffffff858002bd>] __do_softirq+0x2bd/0x9bd kernel/softirq.c:599 [<ffffffff814c3fe8>] invoke_softirq kernel/softirq.c:430 [inline] [<ffffffff814c3fe8>] __irq_exit_rcu+0xc8/0x170 kernel/softirq.c:683 [<ffffffff814c3f09>] irq_exit_rcu+0x9/0x20 kernel/softirq.c:695
Fixes: 7ad6848c7e81 ("ip: fix mc_loop checks for tunnels with multicast outer addresses") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20230830101244.1146934-1-edumazet@google.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/core/sock.c b/net/core/sock.c index 1c5c01b116e6f..4ae68aa07e9fe 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -765,7 +765,8 @@ bool sk_mc_loop(struct sock *sk) return false; if (!sk) return true; - switch (sk->sk_family) { + /* IPV6_ADDRFORM can change sk->sk_family under us. */ + switch (READ_ONCE(sk->sk_family)) { case AF_INET: return inet_sk(sk)->mc_loop; #if IS_ENABLED(CONFIG_IPV6)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 8aae7625ff3f0bd5484d01f1b8d5af82e44bec2d ]
New skbs allocated via nf_send_reset() have skb->dev == NULL.
fib*_rules_early_flow_dissect helpers already have a 'struct net' argument but its not passed down to the flow dissector core, which will then WARN as it can't derive a net namespace to use:
WARNING: CPU: 0 PID: 0 at net/core/flow_dissector.c:1016 __skb_flow_dissect+0xa91/0x1cd0 [..] ip_route_me_harder+0x143/0x330 nf_send_reset+0x17c/0x2d0 [nf_reject_ipv4] nft_reject_inet_eval+0xa9/0xf2 [nft_reject_inet] nft_do_chain+0x198/0x5d0 [nf_tables] nft_do_chain_inet+0xa4/0x110 [nf_tables] nf_hook_slow+0x41/0xc0 ip_local_deliver+0xce/0x110 ..
Cc: Stanislav Fomichev sdf@google.com Cc: David Ahern dsahern@kernel.org Cc: Ido Schimmel idosch@nvidia.com Fixes: 812fa71f0d96 ("netfilter: Dissect flow after packet mangling") Link: https://bugzilla.kernel.org/show_bug.cgi?id=217826 Signed-off-by: Florian Westphal fw@strlen.de Reviewed-by: Ido Schimmel idosch@nvidia.com Reviewed-by: David Ahern dsahern@kernel.org Link: https://lore.kernel.org/r/20230830110043.30497-1-fw@strlen.de Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/ip6_fib.h | 5 ++++- include/net/ip_fib.h | 5 ++++- 2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h index 05e6f756feafe..9ba6413fd2e3e 100644 --- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -604,7 +604,10 @@ static inline bool fib6_rules_early_flow_dissect(struct net *net, if (!net->ipv6.fib6_rules_require_fldissect) return false;
- skb_flow_dissect_flow_keys(skb, flkeys, flag); + memset(flkeys, 0, sizeof(*flkeys)); + __skb_flow_dissect(net, skb, &flow_keys_dissector, + flkeys, NULL, 0, 0, 0, flag); + fl6->fl6_sport = flkeys->ports.src; fl6->fl6_dport = flkeys->ports.dst; fl6->flowi6_proto = flkeys->basic.ip_proto; diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index a378eff827c74..f0c13864180e2 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -418,7 +418,10 @@ static inline bool fib4_rules_early_flow_dissect(struct net *net, if (!net->ipv4.fib_rules_require_fldissect) return false;
- skb_flow_dissect_flow_keys(skb, flkeys, flag); + memset(flkeys, 0, sizeof(*flkeys)); + __skb_flow_dissect(net, skb, &flow_keys_dissector, + flkeys, NULL, 0, 0, 0, flag); + fl4->fl4_sport = flkeys->ports.src; fl4->fl4_dport = flkeys->ports.dst; fl4->flowi4_proto = flkeys->basic.ip_proto;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Magnus Karlsson magnus.karlsson@intel.com
[ Upstream commit 3e019d8a05a38abb5c85d4f1e85fda964610aa14 ]
Fix a use-after-free error that is possible if the xsk_diag interface is used after the socket has been unbound from the device. This can happen either due to the socket being closed or the device disappearing. In the early days of AF_XDP, the way we tested that a socket was not bound to a device was to simply check if the netdevice pointer in the xsk socket structure was NULL. Later, a better system was introduced by having an explicit state variable in the xsk socket struct. For example, the state of a socket that is on the way to being closed and has been unbound from the device is XSK_UNBOUND.
The commit in the Fixes tag below deleted the old way of signalling that a socket is unbound, setting dev to NULL. This in the belief that all code using the old way had been exterminated. That was unfortunately not true as the xsk diagnostics code was still using the old way and thus does not work as intended when a socket is going down. Fix this by introducing a test against the state variable. If the socket is in the state XSK_UNBOUND, simply abort the diagnostic's netlink operation.
Fixes: 18b1ab7aa76b ("xsk: Fix race at socket teardown") Reported-by: syzbot+822d1359297e2694f873@syzkaller.appspotmail.com Signed-off-by: Magnus Karlsson magnus.karlsson@intel.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Tested-by: syzbot+822d1359297e2694f873@syzkaller.appspotmail.com Tested-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Reviewed-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Link: https://lore.kernel.org/bpf/20230831100119.17408-1-magnus.karlsson@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/xdp/xsk_diag.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c index c014217f5fa7d..22b36c8143cfd 100644 --- a/net/xdp/xsk_diag.c +++ b/net/xdp/xsk_diag.c @@ -111,6 +111,9 @@ static int xsk_diag_fill(struct sock *sk, struct sk_buff *nlskb, sock_diag_save_cookie(sk, msg->xdiag_cookie);
mutex_lock(&xs->mutex); + if (READ_ONCE(xs->state) == XSK_UNBOUND) + goto out_nlmsg_trim; + if ((req->xdiag_show & XDP_SHOW_INFO) && xsk_diag_put_info(xs, nlskb)) goto out_nlmsg_trim;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xiubo Li xiubli@redhat.com
[ Upstream commit 3af5ae22030cb59fab4fba35f5a2b62f47e14df9 ]
In ceph mainline it will allow to set the btime in the setattr request and just add a 'btime' member in the union 'ceph_mds_request_args' and then bump up the header version to 4. That means the total size of union 'ceph_mds_request_args' will increase sizeof(struct ceph_timespec) bytes, but in kclient it will increase the sizeof(setattr_ext) bytes for each request.
Since the MDS will always depend on the header's vesion and front_len members to decode the 'ceph_mds_request_head' struct, at the same time kclient hasn't supported the 'btime' feature yet in setattr request, so it's safe to do this change here.
This will save 48 bytes memories for each request.
Fixes: 4f1ddb1ea874 ("ceph: implement updated ceph_mds_request_head structure") Signed-off-by: Xiubo Li xiubli@redhat.com Reviewed-by: Milind Changire mchangir@redhat.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/ceph/ceph_fs.h | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h index 49586ff261520..b4fa2a25b7d95 100644 --- a/include/linux/ceph/ceph_fs.h +++ b/include/linux/ceph/ceph_fs.h @@ -462,17 +462,19 @@ union ceph_mds_request_args { } __attribute__ ((packed));
union ceph_mds_request_args_ext { - union ceph_mds_request_args old; - struct { - __le32 mode; - __le32 uid; - __le32 gid; - struct ceph_timespec mtime; - struct ceph_timespec atime; - __le64 size, old_size; /* old_size needed by truncate */ - __le32 mask; /* CEPH_SETATTR_* */ - struct ceph_timespec btime; - } __attribute__ ((packed)) setattr_ext; + union { + union ceph_mds_request_args old; + struct { + __le32 mode; + __le32 uid; + __le32 gid; + struct ceph_timespec mtime; + struct ceph_timespec atime; + __le64 size, old_size; /* old_size needed by truncate */ + __le32 mask; /* CEPH_SETATTR_* */ + struct ceph_timespec btime; + } __attribute__ ((packed)) setattr_ext; + }; };
#define CEPH_MDS_FLAG_REPLAY 1 /* this is a replayed op */
On Sun, Sep 17, 2023 at 9:49 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
6.5-stable review patch. If anyone has any objections, please let me know.
From: Xiubo Li xiubli@redhat.com
[ Upstream commit 3af5ae22030cb59fab4fba35f5a2b62f47e14df9 ]
In ceph mainline it will allow to set the btime in the setattr request and just add a 'btime' member in the union 'ceph_mds_request_args' and then bump up the header version to 4. That means the total size of union 'ceph_mds_request_args' will increase sizeof(struct ceph_timespec) bytes, but in kclient it will increase the sizeof(setattr_ext) bytes for each request.
Since the MDS will always depend on the header's vesion and front_len members to decode the 'ceph_mds_request_head' struct, at the same time kclient hasn't supported the 'btime' feature yet in setattr request, so it's safe to do this change here.
This will save 48 bytes memories for each request.
Fixes: 4f1ddb1ea874 ("ceph: implement updated ceph_mds_request_head structure") Signed-off-by: Xiubo Li xiubli@redhat.com Reviewed-by: Milind Changire mchangir@redhat.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org
include/linux/ceph/ceph_fs.h | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h index 49586ff261520..b4fa2a25b7d95 100644 --- a/include/linux/ceph/ceph_fs.h +++ b/include/linux/ceph/ceph_fs.h @@ -462,17 +462,19 @@ union ceph_mds_request_args { } __attribute__ ((packed));
union ceph_mds_request_args_ext {
union ceph_mds_request_args old;
struct {
__le32 mode;
__le32 uid;
__le32 gid;
struct ceph_timespec mtime;
struct ceph_timespec atime;
__le64 size, old_size; /* old_size needed by truncate */
__le32 mask; /* CEPH_SETATTR_* */
struct ceph_timespec btime;
} __attribute__ ((packed)) setattr_ext;
union {
union ceph_mds_request_args old;
struct {
__le32 mode;
__le32 uid;
__le32 gid;
struct ceph_timespec mtime;
struct ceph_timespec atime;
__le64 size, old_size; /* old_size needed by truncate */
__le32 mask; /* CEPH_SETATTR_* */
struct ceph_timespec btime;
} __attribute__ ((packed)) setattr_ext;
};
Hi Xiubo,
I was going to ask whether it makes sense to backport this change, but, after looking at it, the change seems bogus to me even in mainline. You added a union inside siting memory use but ceph_mds_request_args_ext was already a union before:
union ceph_mds_request_args_ext { union ceph_mds_request_args old; struct { ... } __attribute__ ((packed)) setattr_ext; }
What is being achieved here?
union ceph_mds_request_args_ext { union { union ceph_mds_request_args old; struct { ... } __attribute__ ((packed)) setattr_ext; }; }
Thanks,
Ilya
On 9/18/23 16:04, Ilya Dryomov wrote:
On Sun, Sep 17, 2023 at 9:49 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
6.5-stable review patch. If anyone has any objections, please let me know.
From: Xiubo Li xiubli@redhat.com
[ Upstream commit 3af5ae22030cb59fab4fba35f5a2b62f47e14df9 ]
In ceph mainline it will allow to set the btime in the setattr request and just add a 'btime' member in the union 'ceph_mds_request_args' and then bump up the header version to 4. That means the total size of union 'ceph_mds_request_args' will increase sizeof(struct ceph_timespec) bytes, but in kclient it will increase the sizeof(setattr_ext) bytes for each request.
Since the MDS will always depend on the header's vesion and front_len members to decode the 'ceph_mds_request_head' struct, at the same time kclient hasn't supported the 'btime' feature yet in setattr request, so it's safe to do this change here.
This will save 48 bytes memories for each request.
Fixes: 4f1ddb1ea874 ("ceph: implement updated ceph_mds_request_head structure") Signed-off-by: Xiubo Li xiubli@redhat.com Reviewed-by: Milind Changire mchangir@redhat.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org
include/linux/ceph/ceph_fs.h | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h index 49586ff261520..b4fa2a25b7d95 100644 --- a/include/linux/ceph/ceph_fs.h +++ b/include/linux/ceph/ceph_fs.h @@ -462,17 +462,19 @@ union ceph_mds_request_args { } __attribute__ ((packed));
union ceph_mds_request_args_ext {
union ceph_mds_request_args old;
struct {
__le32 mode;
__le32 uid;
__le32 gid;
struct ceph_timespec mtime;
struct ceph_timespec atime;
__le64 size, old_size; /* old_size needed by truncate */
__le32 mask; /* CEPH_SETATTR_* */
struct ceph_timespec btime;
} __attribute__ ((packed)) setattr_ext;
union {
union ceph_mds_request_args old;
struct {
__le32 mode;
__le32 uid;
__le32 gid;
struct ceph_timespec mtime;
struct ceph_timespec atime;
__le64 size, old_size; /* old_size needed by truncate */
__le32 mask; /* CEPH_SETATTR_* */
struct ceph_timespec btime;
} __attribute__ ((packed)) setattr_ext;
};
Hi Xiubo,
I was going to ask whether it makes sense to backport this change, but, after looking at it, the change seems bogus to me even in mainline. You added a union inside siting memory use but ceph_mds_request_args_ext was already a union before:
union ceph_mds_request_args_ext { union ceph_mds_request_args old; struct { ... } __attribute__ ((packed)) setattr_ext; }
What is being achieved here?
As I remembered there has other changes in this union in the beginning. And that patch seems being abandoned and missing this one.
Let's skip backporting this one and in the upstream just revert it.
Thanks
- Xiubo
union ceph_mds_request_args_ext { union { union ceph_mds_request_args old; struct { ... } __attribute__ ((packed)) setattr_ext; }; }
Thanks,
Ilya
On Mon, Sep 18, 2023 at 10:20 AM Xiubo Li xiubli@redhat.com wrote:
On 9/18/23 16:04, Ilya Dryomov wrote:
On Sun, Sep 17, 2023 at 9:49 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
6.5-stable review patch. If anyone has any objections, please let me know.
From: Xiubo Li xiubli@redhat.com
[ Upstream commit 3af5ae22030cb59fab4fba35f5a2b62f47e14df9 ]
In ceph mainline it will allow to set the btime in the setattr request and just add a 'btime' member in the union 'ceph_mds_request_args' and then bump up the header version to 4. That means the total size of union 'ceph_mds_request_args' will increase sizeof(struct ceph_timespec) bytes, but in kclient it will increase the sizeof(setattr_ext) bytes for each request.
Since the MDS will always depend on the header's vesion and front_len members to decode the 'ceph_mds_request_head' struct, at the same time kclient hasn't supported the 'btime' feature yet in setattr request, so it's safe to do this change here.
This will save 48 bytes memories for each request.
Fixes: 4f1ddb1ea874 ("ceph: implement updated ceph_mds_request_head structure") Signed-off-by: Xiubo Li xiubli@redhat.com Reviewed-by: Milind Changire mchangir@redhat.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org
include/linux/ceph/ceph_fs.h | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h index 49586ff261520..b4fa2a25b7d95 100644 --- a/include/linux/ceph/ceph_fs.h +++ b/include/linux/ceph/ceph_fs.h @@ -462,17 +462,19 @@ union ceph_mds_request_args { } __attribute__ ((packed));
union ceph_mds_request_args_ext {
union ceph_mds_request_args old;
struct {
__le32 mode;
__le32 uid;
__le32 gid;
struct ceph_timespec mtime;
struct ceph_timespec atime;
__le64 size, old_size; /* old_size needed by truncate */
__le32 mask; /* CEPH_SETATTR_* */
struct ceph_timespec btime;
} __attribute__ ((packed)) setattr_ext;
union {
union ceph_mds_request_args old;
struct {
__le32 mode;
__le32 uid;
__le32 gid;
struct ceph_timespec mtime;
struct ceph_timespec atime;
__le64 size, old_size; /* old_size needed by truncate */
__le32 mask; /* CEPH_SETATTR_* */
struct ceph_timespec btime;
} __attribute__ ((packed)) setattr_ext;
};
Hi Xiubo,
I was going to ask whether it makes sense to backport this change, but, after looking at it, the change seems bogus to me even in mainline. You added a union inside siting memory use but ceph_mds_request_args_ext was already a union before:
union ceph_mds_request_args_ext { union ceph_mds_request_args old; struct { ... } __attribute__ ((packed)) setattr_ext; }
What is being achieved here?
As I remembered there has other changes in this union in the beginning. And that patch seems being abandoned and missing this one.
Let's skip backporting this one and in the upstream just revert it.
OK, I will send a revert to ceph-devel list.
Greg, please drop this one from all stable branches.
Thanks,
Ilya
On Mon, Sep 18, 2023 at 10:43:08AM +0200, Ilya Dryomov wrote:
On Mon, Sep 18, 2023 at 10:20 AM Xiubo Li xiubli@redhat.com wrote:
On 9/18/23 16:04, Ilya Dryomov wrote:
On Sun, Sep 17, 2023 at 9:49 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
6.5-stable review patch. If anyone has any objections, please let me know.
From: Xiubo Li xiubli@redhat.com
[ Upstream commit 3af5ae22030cb59fab4fba35f5a2b62f47e14df9 ]
In ceph mainline it will allow to set the btime in the setattr request and just add a 'btime' member in the union 'ceph_mds_request_args' and then bump up the header version to 4. That means the total size of union 'ceph_mds_request_args' will increase sizeof(struct ceph_timespec) bytes, but in kclient it will increase the sizeof(setattr_ext) bytes for each request.
Since the MDS will always depend on the header's vesion and front_len members to decode the 'ceph_mds_request_head' struct, at the same time kclient hasn't supported the 'btime' feature yet in setattr request, so it's safe to do this change here.
This will save 48 bytes memories for each request.
Fixes: 4f1ddb1ea874 ("ceph: implement updated ceph_mds_request_head structure") Signed-off-by: Xiubo Li xiubli@redhat.com Reviewed-by: Milind Changire mchangir@redhat.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org
include/linux/ceph/ceph_fs.h | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h index 49586ff261520..b4fa2a25b7d95 100644 --- a/include/linux/ceph/ceph_fs.h +++ b/include/linux/ceph/ceph_fs.h @@ -462,17 +462,19 @@ union ceph_mds_request_args { } __attribute__ ((packed));
union ceph_mds_request_args_ext {
union ceph_mds_request_args old;
struct {
__le32 mode;
__le32 uid;
__le32 gid;
struct ceph_timespec mtime;
struct ceph_timespec atime;
__le64 size, old_size; /* old_size needed by truncate */
__le32 mask; /* CEPH_SETATTR_* */
struct ceph_timespec btime;
} __attribute__ ((packed)) setattr_ext;
union {
union ceph_mds_request_args old;
struct {
__le32 mode;
__le32 uid;
__le32 gid;
struct ceph_timespec mtime;
struct ceph_timespec atime;
__le64 size, old_size; /* old_size needed by truncate */
__le32 mask; /* CEPH_SETATTR_* */
struct ceph_timespec btime;
} __attribute__ ((packed)) setattr_ext;
};
Hi Xiubo,
I was going to ask whether it makes sense to backport this change, but, after looking at it, the change seems bogus to me even in mainline. You added a union inside siting memory use but ceph_mds_request_args_ext was already a union before:
union ceph_mds_request_args_ext { union ceph_mds_request_args old; struct { ... } __attribute__ ((packed)) setattr_ext; }
What is being achieved here?
As I remembered there has other changes in this union in the beginning. And that patch seems being abandoned and missing this one.
Let's skip backporting this one and in the upstream just revert it.
OK, I will send a revert to ceph-devel list.
Greg, please drop this one from all stable branches.
Now dropped, thanks.
greg k-h
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
[ Upstream commit f046923af79158361295ed4f0a588c80b9fdcc1d ]
Check that the pfn found by gfn_to_pfn() is actually backed by "struct page" memory prior to retrieving and dereferencing the page. KVM supports backing guest memory with VM_PFNMAP, VM_IO, etc., and so there is no guarantee the pfn returned by gfn_to_pfn() has an associated "struct page".
Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support") Reviewed-by: Yan Zhao yan.y.zhao@intel.com Tested-by: Yongwei Ma yongwei.ma@intel.com Reviewed-by: Zhi Wang zhi.a.wang@intel.com Link: https://lore.kernel.org/r/20230729013535.1070024-2-seanjc@google.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gvt/gtt.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c index 4ec85308379a4..58b9b316ae462 100644 --- a/drivers/gpu/drm/i915/gvt/gtt.c +++ b/drivers/gpu/drm/i915/gvt/gtt.c @@ -1183,6 +1183,10 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu, pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry)); if (is_error_noslot_pfn(pfn)) return -EINVAL; + + if (!pfn_valid(pfn)) + return -EINVAL; + return PageTransHuge(pfn_to_page(pfn)); }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
[ Upstream commit 708e49583d7da863898b25dafe4bcd799c414278 ]
Put the struct page reference acquired by gfn_to_pfn(), KVM's API is that the caller is ultimately responsible for dropping any reference.
Note, kvm_release_pfn_clean() ensures the pfn is actually a refcounted struct page before trying to put any references.
Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support") Reviewed-by: Yan Zhao yan.y.zhao@intel.com Tested-by: Yongwei Ma yongwei.ma@intel.com Reviewed-by: Zhi Wang zhi.a.wang@intel.com Link: https://lore.kernel.org/r/20230729013535.1070024-6-seanjc@google.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gvt/gtt.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c index 58b9b316ae462..8ba6c8668f033 100644 --- a/drivers/gpu/drm/i915/gvt/gtt.c +++ b/drivers/gpu/drm/i915/gvt/gtt.c @@ -1174,6 +1174,7 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu, { const struct intel_gvt_gtt_pte_ops *ops = vgpu->gvt->gtt.pte_ops; kvm_pfn_t pfn; + int ret;
if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M)) return 0; @@ -1187,7 +1188,9 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu, if (!pfn_valid(pfn)) return -EINVAL;
- return PageTransHuge(pfn_to_page(pfn)); + ret = PageTransHuge(pfn_to_page(pfn)); + kvm_release_pfn_clean(pfn); + return ret; }
static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
[ Upstream commit a90c367e5af63880008e21dd199dac839e0e9e0f ]
Drop intel_vgpu_reset_gtt() as it no longer has any callers. In addition to eliminating dead code, this eliminates the last possible scenario where __kvmgt_protect_table_find() can be reached without holding vgpu_lock. Requiring vgpu_lock to be held when calling __kvmgt_protect_table_find() will allow a protecting the gfn hash with vgpu_lock without too much fuss.
No functional change intended.
Fixes: ba25d977571e ("drm/i915/gvt: Do not destroy ppgtt_mm during vGPU D3->D0.") Reviewed-by: Yan Zhao yan.y.zhao@intel.com Tested-by: Yongwei Ma yongwei.ma@intel.com Reviewed-by: Zhi Wang zhi.a.wang@intel.com Link: https://lore.kernel.org/r/20230729013535.1070024-11-seanjc@google.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gvt/gtt.c | 18 ------------------ drivers/gpu/drm/i915/gvt/gtt.h | 1 - 2 files changed, 19 deletions(-)
diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c index 8ba6c8668f033..ef5517ecd9a0c 100644 --- a/drivers/gpu/drm/i915/gvt/gtt.c +++ b/drivers/gpu/drm/i915/gvt/gtt.c @@ -2882,24 +2882,6 @@ void intel_vgpu_reset_ggtt(struct intel_vgpu *vgpu, bool invalidate_old) ggtt_invalidate(gvt->gt); }
-/** - * intel_vgpu_reset_gtt - reset the all GTT related status - * @vgpu: a vGPU - * - * This function is called from vfio core to reset reset all - * GTT related status, including GGTT, PPGTT, scratch page. - * - */ -void intel_vgpu_reset_gtt(struct intel_vgpu *vgpu) -{ - /* Shadow pages are only created when there is no page - * table tracking data, so remove page tracking data after - * removing the shadow pages. - */ - intel_vgpu_destroy_all_ppgtt_mm(vgpu); - intel_vgpu_reset_ggtt(vgpu, true); -} - /** * intel_gvt_restore_ggtt - restore all vGPU's ggtt entries * @gvt: intel gvt device diff --git a/drivers/gpu/drm/i915/gvt/gtt.h b/drivers/gpu/drm/i915/gvt/gtt.h index a3b0f59ec8bd9..4cb183e06e95a 100644 --- a/drivers/gpu/drm/i915/gvt/gtt.h +++ b/drivers/gpu/drm/i915/gvt/gtt.h @@ -224,7 +224,6 @@ void intel_vgpu_reset_ggtt(struct intel_vgpu *vgpu, bool invalidate_old); void intel_vgpu_invalidate_ppgtt(struct intel_vgpu *vgpu);
int intel_gvt_init_gtt(struct intel_gvt *gvt); -void intel_vgpu_reset_gtt(struct intel_vgpu *vgpu); void intel_gvt_clean_gtt(struct intel_gvt *gvt);
struct intel_vgpu_mm *intel_gvt_find_ppgtt_mm(struct intel_vgpu *vgpu,
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hamza Mahfooz hamza.mahfooz@amd.com
[ Upstream commit ea7971af7a911a7a388b4c47db2a231a6b8dcc29 ]
As made mention of in commit 4a2df0d1f28e ("drm/amd/display: Fixed non-native modes not lighting up"), we shouldn't call drm_mode_set_crtcinfo() once the crtc timings have been decided. Since, it can cause settings to be unintentionally overwritten. So, since dm_state is never NULL now, we can use old_stream to determine if we should call drm_mode_set_crtcinfo() because we only need to set the crtc timing parameters for entirely new streams.
Cc: Harry Wentland harry.wentland@amd.com Cc: Rodrigo Siqueira rodrigo.siqueira@amd.com Fixes: bd49f19039c1 ("drm/amd/display: Always set crtcinfo from create_stream_for_sink") Reviewed-by: Harry Wentland harry.wentland@amd.com Signed-off-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 3a7e7d2ce847b..e8e238865b021 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -5990,7 +5990,7 @@ create_stream_for_sink(struct amdgpu_dm_connector *aconnector,
if (recalculate_timing) drm_mode_set_crtcinfo(&saved_mode, 0); - else + else if (!old_stream) drm_mode_set_crtcinfo(&mode, 0);
/*
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 82ba0ff7bf0483d962e592017bef659ae022d754 ]
We should not call trace_handshake_cmd_done_err() if socket lookup has failed.
Also we should call trace_handshake_cmd_done_err() before releasing the file, otherwise dereferencing sock->sk can return garbage.
This also reverts 7afc6d0a107f ("net/handshake: Fix uninitialized local variable")
Unable to handle kernel paging request at virtual address dfff800000000003 KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] Mem abort info: ESR = 0x0000000096000005 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x05: level 1 translation fault Data abort info: ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [dfff800000000003] address between user and kernel address ranges Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 5986 Comm: syz-executor292 Not tainted 6.5.0-rc7-syzkaller-gfe4469582053 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : handshake_nl_done_doit+0x198/0x9c8 net/handshake/netlink.c:193 lr : handshake_nl_done_doit+0x180/0x9c8 sp : ffff800096e37180 x29: ffff800096e37200 x28: 1ffff00012dc6e34 x27: dfff800000000000 x26: ffff800096e373d0 x25: 0000000000000000 x24: 00000000ffffffa8 x23: ffff800096e373f0 x22: 1ffff00012dc6e38 x21: 0000000000000000 x20: ffff800096e371c0 x19: 0000000000000018 x18: 0000000000000000 x17: 0000000000000000 x16: ffff800080516cc4 x15: 0000000000000001 x14: 1fffe0001b14aa3b x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000003 x8 : 0000000000000003 x7 : ffff800080afe47c x6 : 0000000000000000 x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff800080a88078 x2 : 0000000000000001 x1 : 00000000ffffffa8 x0 : 0000000000000000 Call trace: handshake_nl_done_doit+0x198/0x9c8 net/handshake/netlink.c:193 genl_family_rcv_msg_doit net/netlink/genetlink.c:970 [inline] genl_family_rcv_msg net/netlink/genetlink.c:1050 [inline] genl_rcv_msg+0x96c/0xc50 net/netlink/genetlink.c:1067 netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2549 genl_rcv+0x38/0x50 net/netlink/genetlink.c:1078 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x660/0x8d4 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x834/0xb18 net/netlink/af_netlink.c:1914 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg net/socket.c:748 [inline] ____sys_sendmsg+0x56c/0x840 net/socket.c:2494 ___sys_sendmsg net/socket.c:2548 [inline] __sys_sendmsg+0x26c/0x33c net/socket.c:2577 __do_sys_sendmsg net/socket.c:2586 [inline] __se_sys_sendmsg net/socket.c:2584 [inline] __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2584 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155 el0_svc+0x58/0x16c arch/arm64/kernel/entry-common.c:678 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:696 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591 Code: 12800108 b90043e8 910062b3 d343fe68 (387b6908)
Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Cc: Chuck Lever chuck.lever@oracle.com Reviewed-by: Michal Kubiak michal.kubiak@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/handshake/netlink.c | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/net/handshake/netlink.c b/net/handshake/netlink.c index 1086653e1fada..d0bc1dd8e65a8 100644 --- a/net/handshake/netlink.c +++ b/net/handshake/netlink.c @@ -157,26 +157,24 @@ int handshake_nl_accept_doit(struct sk_buff *skb, struct genl_info *info) int handshake_nl_done_doit(struct sk_buff *skb, struct genl_info *info) { struct net *net = sock_net(skb->sk); - struct handshake_req *req = NULL; - struct socket *sock = NULL; + struct handshake_req *req; + struct socket *sock; int fd, status, err;
if (GENL_REQ_ATTR_CHECK(info, HANDSHAKE_A_DONE_SOCKFD)) return -EINVAL; fd = nla_get_u32(info->attrs[HANDSHAKE_A_DONE_SOCKFD]);
- err = 0; sock = sockfd_lookup(fd, &err); - if (err) { - err = -EBADF; - goto out_status; - } + if (!sock) + return err;
req = handshake_req_hash_lookup(sock->sk); if (!req) { err = -EBUSY; + trace_handshake_cmd_done_err(net, req, sock->sk, err); fput(sock->file); - goto out_status; + return err; }
trace_handshake_cmd_done(net, req, sock->sk, fd); @@ -188,10 +186,6 @@ int handshake_nl_done_doit(struct sk_buff *skb, struct genl_info *info) handshake_complete(req, status, info); fput(sock->file); return 0; - -out_status: - trace_handshake_cmd_done_err(net, req, sock->sk, err); - return err; }
static unsigned int handshake_net_id;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 66d58f046c9d3a8f996b7138d02e965fd0617de0 ]
inet_sk_diag_fill() has been changed to use sk_forward_alloc_get(), but sk_get_meminfo() was forgotten.
Fixes: 292e6077b040 ("net: introduce sk_forward_alloc_get()") Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/sock.c b/net/core/sock.c index 4ae68aa07e9fe..3109eb0cd512e 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -3742,7 +3742,7 @@ void sk_get_meminfo(const struct sock *sk, u32 *mem) mem[SK_MEMINFO_RCVBUF] = READ_ONCE(sk->sk_rcvbuf); mem[SK_MEMINFO_WMEM_ALLOC] = sk_wmem_alloc_get(sk); mem[SK_MEMINFO_SNDBUF] = READ_ONCE(sk->sk_sndbuf); - mem[SK_MEMINFO_FWD_ALLOC] = sk->sk_forward_alloc; + mem[SK_MEMINFO_FWD_ALLOC] = sk_forward_alloc_get(sk); mem[SK_MEMINFO_WMEM_QUEUED] = READ_ONCE(sk->sk_wmem_queued); mem[SK_MEMINFO_OPTMEM] = atomic_read(&sk->sk_omem_alloc); mem[SK_MEMINFO_BACKLOG] = READ_ONCE(sk->sk_backlog.len);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 5e6300e7b3a4ab5b72a82079753868e91fbf9efc ]
Every time sk->sk_forward_alloc is read locklessly, add a READ_ONCE().
Add sk_forward_alloc_add() helper to centralize updates, to reduce number of WRITE_ONCE().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/sock.h | 12 +++++++++--- net/core/sock.c | 8 ++++---- net/ipv4/tcp_output.c | 2 +- net/ipv4/udp.c | 6 +++--- net/mptcp/protocol.c | 6 +++--- 5 files changed, 20 insertions(+), 14 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h index e8927f2d47a3c..85f9dffde05d0 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1053,6 +1053,12 @@ static inline void sk_wmem_queued_add(struct sock *sk, int val) WRITE_ONCE(sk->sk_wmem_queued, sk->sk_wmem_queued + val); }
+static inline void sk_forward_alloc_add(struct sock *sk, int val) +{ + /* Paired with lockless reads of sk->sk_forward_alloc */ + WRITE_ONCE(sk->sk_forward_alloc, sk->sk_forward_alloc + val); +} + void sk_stream_write_space(struct sock *sk);
/* OOB backlog add */ @@ -1377,7 +1383,7 @@ static inline int sk_forward_alloc_get(const struct sock *sk) if (sk->sk_prot->forward_alloc_get) return sk->sk_prot->forward_alloc_get(sk); #endif - return sk->sk_forward_alloc; + return READ_ONCE(sk->sk_forward_alloc); }
static inline bool __sk_stream_memory_free(const struct sock *sk, int wake) @@ -1673,14 +1679,14 @@ static inline void sk_mem_charge(struct sock *sk, int size) { if (!sk_has_account(sk)) return; - sk->sk_forward_alloc -= size; + sk_forward_alloc_add(sk, -size); }
static inline void sk_mem_uncharge(struct sock *sk, int size) { if (!sk_has_account(sk)) return; - sk->sk_forward_alloc += size; + sk_forward_alloc_add(sk, size); sk_mem_reclaim(sk); }
diff --git a/net/core/sock.c b/net/core/sock.c index 3109eb0cd512e..5edb17de6d1df 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1045,7 +1045,7 @@ static int sock_reserve_memory(struct sock *sk, int bytes) mem_cgroup_uncharge_skmem(sk->sk_memcg, pages); return -ENOMEM; } - sk->sk_forward_alloc += pages << PAGE_SHIFT; + sk_forward_alloc_add(sk, pages << PAGE_SHIFT);
WRITE_ONCE(sk->sk_reserved_mem, sk->sk_reserved_mem + (pages << PAGE_SHIFT)); @@ -3138,10 +3138,10 @@ int __sk_mem_schedule(struct sock *sk, int size, int kind) { int ret, amt = sk_mem_pages(size);
- sk->sk_forward_alloc += amt << PAGE_SHIFT; + sk_forward_alloc_add(sk, amt << PAGE_SHIFT); ret = __sk_mem_raise_allocated(sk, size, amt, kind); if (!ret) - sk->sk_forward_alloc -= amt << PAGE_SHIFT; + sk_forward_alloc_add(sk, -(amt << PAGE_SHIFT)); return ret; } EXPORT_SYMBOL(__sk_mem_schedule); @@ -3173,7 +3173,7 @@ void __sk_mem_reduce_allocated(struct sock *sk, int amount) void __sk_mem_reclaim(struct sock *sk, int amount) { amount >>= PAGE_SHIFT; - sk->sk_forward_alloc -= amount << PAGE_SHIFT; + sk_forward_alloc_add(sk, -(amount << PAGE_SHIFT)); __sk_mem_reduce_allocated(sk, amount); } EXPORT_SYMBOL(__sk_mem_reclaim); diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 51d8638d4b4c6..9f9ca68c47026 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -3459,7 +3459,7 @@ void sk_forced_mem_schedule(struct sock *sk, int size) if (delta <= 0) return; amt = sk_mem_pages(delta); - sk->sk_forward_alloc += amt << PAGE_SHIFT; + sk_forward_alloc_add(sk, amt << PAGE_SHIFT); sk_memory_allocated_add(sk, amt);
if (mem_cgroup_sockets_enabled && sk->sk_memcg) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index b3aa68ea29de2..4c847baf52d1c 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1443,9 +1443,9 @@ static void udp_rmem_release(struct sock *sk, int size, int partial, spin_lock(&sk_queue->lock);
- sk->sk_forward_alloc += size; + sk_forward_alloc_add(sk, size); amt = (sk->sk_forward_alloc - partial) & ~(PAGE_SIZE - 1); - sk->sk_forward_alloc -= amt; + sk_forward_alloc_add(sk, -amt);
if (amt) __sk_mem_reduce_allocated(sk, amt >> PAGE_SHIFT); @@ -1556,7 +1556,7 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb) goto uncharge_drop; }
- sk->sk_forward_alloc -= size; + sk_forward_alloc_add(sk, -size);
/* no need to setup a destructor, we will explicitly release the * forward allocated memory on dequeue diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 0efc52c640b59..996e031dff78a 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1773,7 +1773,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) }
/* data successfully copied into the write queue */ - sk->sk_forward_alloc -= total_ts; + sk_forward_alloc_add(sk, -total_ts); copied += psize; dfrag->data_len += psize; frag_truesize += psize; @@ -3242,7 +3242,7 @@ void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags) /* move all the rx fwd alloc into the sk_mem_reclaim_final in * inet_sock_destruct() will dispose it */ - sk->sk_forward_alloc += msk->rmem_fwd_alloc; + sk_forward_alloc_add(sk, msk->rmem_fwd_alloc); msk->rmem_fwd_alloc = 0; mptcp_token_destroy(msk); mptcp_pm_free_anno_list(msk); @@ -3513,7 +3513,7 @@ static void mptcp_shutdown(struct sock *sk, int how)
static int mptcp_forward_alloc_get(const struct sock *sk) { - return sk->sk_forward_alloc + mptcp_sk(sk)->rmem_fwd_alloc; + return READ_ONCE(sk->sk_forward_alloc) + mptcp_sk(sk)->rmem_fwd_alloc; }
static int mptcp_ioctl_outq(const struct mptcp_sock *msk, u64 v)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 9531e4a83febc3fb47ac77e24cfb5ea97e50034d ]
msk->rmem_fwd_alloc can be read locklessly.
Add mptcp_rmem_fwd_alloc_add(), similar to sk_forward_alloc_add(), and appropriate READ_ONCE()/WRITE_ONCE() annotations.
Fixes: 6511882cdd82 ("mptcp: allocate fwd memory separately on the rx and tx path") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Paolo Abeni pabeni@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/protocol.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 996e031dff78a..40258d9f8c799 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -136,9 +136,15 @@ static void mptcp_drop(struct sock *sk, struct sk_buff *skb) __kfree_skb(skb); }
+static void mptcp_rmem_fwd_alloc_add(struct sock *sk, int size) +{ + WRITE_ONCE(mptcp_sk(sk)->rmem_fwd_alloc, + mptcp_sk(sk)->rmem_fwd_alloc + size); +} + static void mptcp_rmem_charge(struct sock *sk, int size) { - mptcp_sk(sk)->rmem_fwd_alloc -= size; + mptcp_rmem_fwd_alloc_add(sk, -size); }
static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to, @@ -179,7 +185,7 @@ static bool mptcp_ooo_try_coalesce(struct mptcp_sock *msk, struct sk_buff *to, static void __mptcp_rmem_reclaim(struct sock *sk, int amount) { amount >>= PAGE_SHIFT; - mptcp_sk(sk)->rmem_fwd_alloc -= amount << PAGE_SHIFT; + mptcp_rmem_charge(sk, amount << PAGE_SHIFT); __sk_mem_reduce_allocated(sk, amount); }
@@ -188,7 +194,7 @@ static void mptcp_rmem_uncharge(struct sock *sk, int size) struct mptcp_sock *msk = mptcp_sk(sk); int reclaimable;
- msk->rmem_fwd_alloc += size; + mptcp_rmem_fwd_alloc_add(sk, size); reclaimable = msk->rmem_fwd_alloc - sk_unused_reserved_mem(sk);
/* see sk_mem_uncharge() for the rationale behind the following schema */ @@ -343,7 +349,7 @@ static bool mptcp_rmem_schedule(struct sock *sk, struct sock *ssk, int size) if (!__sk_mem_raise_allocated(sk, size, amt, SK_MEM_RECV)) return false;
- msk->rmem_fwd_alloc += amount; + mptcp_rmem_fwd_alloc_add(sk, amount); return true; }
@@ -3243,7 +3249,7 @@ void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags) * inet_sock_destruct() will dispose it */ sk_forward_alloc_add(sk, msk->rmem_fwd_alloc); - msk->rmem_fwd_alloc = 0; + WRITE_ONCE(msk->rmem_fwd_alloc, 0); mptcp_token_destroy(msk); mptcp_pm_free_anno_list(msk); mptcp_free_local_addr_list(msk); @@ -3513,7 +3519,8 @@ static void mptcp_shutdown(struct sock *sk, int how)
static int mptcp_forward_alloc_get(const struct sock *sk) { - return READ_ONCE(sk->sk_forward_alloc) + mptcp_sk(sk)->rmem_fwd_alloc; + return READ_ONCE(sk->sk_forward_alloc) + + READ_ONCE(mptcp_sk(sk)->rmem_fwd_alloc); }
static int mptcp_ioctl_outq(const struct mptcp_sock *msk, u64 v)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit e3390b30a5dfb112e8e802a59c0f68f947b638b2 ]
sk->sk_tsflags can be read locklessly, add corresponding annotations.
Fixes: b9f40e21ef42 ("net-timestamp: move timestamp flags out of sk_flags") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/ip.h | 2 +- include/net/sock.h | 17 ++++++++++------- net/can/j1939/socket.c | 10 ++++++---- net/core/skbuff.c | 10 ++++++---- net/core/sock.c | 4 ++-- net/ipv4/ip_output.c | 2 +- net/ipv4/ip_sockglue.c | 2 +- net/ipv4/tcp.c | 4 ++-- net/ipv6/ip6_output.c | 2 +- net/ipv6/ping.c | 2 +- net/ipv6/raw.c | 2 +- net/ipv6/udp.c | 2 +- net/socket.c | 13 +++++++------ 13 files changed, 40 insertions(+), 32 deletions(-)
diff --git a/include/net/ip.h b/include/net/ip.h index 19adacd5ece03..9276cea775cc2 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -94,7 +94,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm, ipcm_init(ipcm);
ipcm->sockc.mark = READ_ONCE(inet->sk.sk_mark); - ipcm->sockc.tsflags = inet->sk.sk_tsflags; + ipcm->sockc.tsflags = READ_ONCE(inet->sk.sk_tsflags); ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if); ipcm->addr = inet->inet_saddr; ipcm->protocol = inet->inet_num; diff --git a/include/net/sock.h b/include/net/sock.h index 85f9dffde05d0..4e787285fc66b 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1906,7 +1906,9 @@ struct sockcm_cookie { static inline void sockcm_init(struct sockcm_cookie *sockc, const struct sock *sk) { - *sockc = (struct sockcm_cookie) { .tsflags = sk->sk_tsflags }; + *sockc = (struct sockcm_cookie) { + .tsflags = READ_ONCE(sk->sk_tsflags) + }; }
int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg, @@ -2701,9 +2703,9 @@ void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk, static inline void sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) { - ktime_t kt = skb->tstamp; struct skb_shared_hwtstamps *hwtstamps = skb_hwtstamps(skb); - + u32 tsflags = READ_ONCE(sk->sk_tsflags); + ktime_t kt = skb->tstamp; /* * generate control messages if * - receive time stamping in software requested @@ -2711,10 +2713,10 @@ sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) * - hardware time stamps available and wanted */ if (sock_flag(sk, SOCK_RCVTSTAMP) || - (sk->sk_tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) || - (kt && sk->sk_tsflags & SOF_TIMESTAMPING_SOFTWARE) || + (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) || + (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) || (hwtstamps->hwtstamp && - (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE))) + (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE))) __sock_recv_timestamp(msg, sk, skb); else sock_write_timestamp(sk, kt); @@ -2736,7 +2738,8 @@ static inline void sock_recv_cmsgs(struct msghdr *msg, struct sock *sk, #define TSFLAGS_ANY (SOF_TIMESTAMPING_SOFTWARE | \ SOF_TIMESTAMPING_RAW_HARDWARE)
- if (sk->sk_flags & FLAGS_RECV_CMSGS || sk->sk_tsflags & TSFLAGS_ANY) + if (sk->sk_flags & FLAGS_RECV_CMSGS || + READ_ONCE(sk->sk_tsflags) & TSFLAGS_ANY) __sock_recv_cmsgs(msg, sk, skb); else if (unlikely(sock_flag(sk, SOCK_TIMESTAMP))) sock_write_timestamp(sk, skb->tstamp); diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c index feaec4ad6d163..b28c976f52a0a 100644 --- a/net/can/j1939/socket.c +++ b/net/can/j1939/socket.c @@ -974,6 +974,7 @@ static void __j1939_sk_errqueue(struct j1939_session *session, struct sock *sk, struct sock_exterr_skb *serr; struct sk_buff *skb; char *state = "UNK"; + u32 tsflags; int err;
jsk = j1939_sk(sk); @@ -981,13 +982,14 @@ static void __j1939_sk_errqueue(struct j1939_session *session, struct sock *sk, if (!(jsk->state & J1939_SOCK_ERRQUEUE)) return;
+ tsflags = READ_ONCE(sk->sk_tsflags); switch (type) { case J1939_ERRQUEUE_TX_ACK: - if (!(sk->sk_tsflags & SOF_TIMESTAMPING_TX_ACK)) + if (!(tsflags & SOF_TIMESTAMPING_TX_ACK)) return; break; case J1939_ERRQUEUE_TX_SCHED: - if (!(sk->sk_tsflags & SOF_TIMESTAMPING_TX_SCHED)) + if (!(tsflags & SOF_TIMESTAMPING_TX_SCHED)) return; break; case J1939_ERRQUEUE_TX_ABORT: @@ -997,7 +999,7 @@ static void __j1939_sk_errqueue(struct j1939_session *session, struct sock *sk, case J1939_ERRQUEUE_RX_DPO: fallthrough; case J1939_ERRQUEUE_RX_ABORT: - if (!(sk->sk_tsflags & SOF_TIMESTAMPING_RX_SOFTWARE)) + if (!(tsflags & SOF_TIMESTAMPING_RX_SOFTWARE)) return; break; default: @@ -1054,7 +1056,7 @@ static void __j1939_sk_errqueue(struct j1939_session *session, struct sock *sk, }
serr->opt_stats = true; - if (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID) + if (tsflags & SOF_TIMESTAMPING_OPT_ID) serr->ee.ee_data = session->tskey;
netdev_dbg(session->priv->ndev, "%s: 0x%p tskey: %i, state: %s\n", diff --git a/net/core/skbuff.c b/net/core/skbuff.c index acdf94bb54c80..7dfae58055c2b 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -5149,7 +5149,7 @@ static void __skb_complete_tx_timestamp(struct sk_buff *skb, serr->ee.ee_info = tstype; serr->opt_stats = opt_stats; serr->header.h4.iif = skb->dev ? skb->dev->ifindex : 0; - if (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID) { + if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) { serr->ee.ee_data = skb_shinfo(skb)->tskey; if (sk_is_tcp(sk)) serr->ee.ee_data -= atomic_read(&sk->sk_tskey); @@ -5205,21 +5205,23 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb, { struct sk_buff *skb; bool tsonly, opt_stats = false; + u32 tsflags;
if (!sk) return;
- if (!hwtstamps && !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) && + tsflags = READ_ONCE(sk->sk_tsflags); + if (!hwtstamps && !(tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) && skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS) return;
- tsonly = sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TSONLY; + tsonly = tsflags & SOF_TIMESTAMPING_OPT_TSONLY; if (!skb_may_tx_timestamp(sk, tsonly)) return;
if (tsonly) { #ifdef CONFIG_INET - if ((sk->sk_tsflags & SOF_TIMESTAMPING_OPT_STATS) && + if ((tsflags & SOF_TIMESTAMPING_OPT_STATS) && sk_is_tcp(sk)) { skb = tcp_get_timestamping_opt_stats(sk, orig_skb, ack_skb); diff --git a/net/core/sock.c b/net/core/sock.c index 5edb17de6d1df..fea5961c51fd1 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -937,7 +937,7 @@ int sock_set_timestamping(struct sock *sk, int optname, return ret; }
- sk->sk_tsflags = val; + WRITE_ONCE(sk->sk_tsflags, val); sock_valbool_flag(sk, SOCK_TSTAMP_NEW, optname == SO_TIMESTAMPING_NEW);
if (val & SOF_TIMESTAMPING_RX_SOFTWARE) @@ -1718,7 +1718,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
case SO_TIMESTAMPING_OLD: lv = sizeof(v.timestamping); - v.timestamping.flags = sk->sk_tsflags; + v.timestamping.flags = READ_ONCE(sk->sk_tsflags); v.timestamping.bind_phc = sk->sk_bind_phc; break;
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index a6e4c82615d7e..6935d07a60c35 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -982,7 +982,7 @@ static int __ip_append_data(struct sock *sk, paged = !!cork->gso_size;
if (cork->tx_flags & SKBTX_ANY_TSTAMP && - sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID) + READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) tskey = atomic_inc_return(&sk->sk_tskey) - 1;
hh_len = LL_RESERVED_SPACE(rt->dst.dev); diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index d41bce8927b2c..d7006942fc2f9 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -510,7 +510,7 @@ static bool ipv4_datagram_support_cmsg(const struct sock *sk, * or without payload (SOF_TIMESTAMPING_OPT_TSONLY). */ info = PKTINFO_SKB_CB(skb); - if (!(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_CMSG) || + if (!(READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_CMSG) || !info->ipi_ifindex) return false;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 8ed52e1e3c99a..75f24b931a185 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2256,14 +2256,14 @@ void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk, } }
- if (sk->sk_tsflags & SOF_TIMESTAMPING_SOFTWARE) + if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_SOFTWARE) has_timestamping = true; else tss->ts[0] = (struct timespec64) {0}; }
if (tss->ts[2].tv_sec || tss->ts[2].tv_nsec) { - if (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) + if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_RAW_HARDWARE) has_timestamping = true; else tss->ts[2] = (struct timespec64) {0}; diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 016b0a513259f..9270ef7f8e98b 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1502,7 +1502,7 @@ static int __ip6_append_data(struct sock *sk, orig_mtu = mtu;
if (cork->tx_flags & SKBTX_ANY_TSTAMP && - sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID) + READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) tskey = atomic_inc_return(&sk->sk_tskey) - 1;
hh_len = LL_RESERVED_SPACE(rt->dst.dev); diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c index 1b27728349725..5831aaa53d75e 100644 --- a/net/ipv6/ping.c +++ b/net/ipv6/ping.c @@ -119,7 +119,7 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) return -EINVAL;
ipcm6_init_sk(&ipc6, np); - ipc6.sockc.tsflags = sk->sk_tsflags; + ipc6.sockc.tsflags = READ_ONCE(sk->sk_tsflags); ipc6.sockc.mark = READ_ONCE(sk->sk_mark);
fl6.flowi6_oif = oif; diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index ea16734f5e1f7..d52d5e34c12ae 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -778,7 +778,7 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) fl6.flowi6_uid = sk->sk_uid;
ipcm6_init(&ipc6); - ipc6.sockc.tsflags = sk->sk_tsflags; + ipc6.sockc.tsflags = READ_ONCE(sk->sk_tsflags); ipc6.sockc.mark = fl6.flowi6_mark;
if (sin6) { diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 3ffca158d3e11..24d3c5c791218 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1368,7 +1368,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
ipcm6_init(&ipc6); ipc6.gso_size = READ_ONCE(up->gso_size); - ipc6.sockc.tsflags = sk->sk_tsflags; + ipc6.sockc.tsflags = READ_ONCE(sk->sk_tsflags); ipc6.sockc.mark = READ_ONCE(sk->sk_mark);
/* destination address check */ diff --git a/net/socket.c b/net/socket.c index f49edb9b49185..6bba7818b593d 100644 --- a/net/socket.c +++ b/net/socket.c @@ -821,7 +821,7 @@ static bool skb_is_swtx_tstamp(const struct sk_buff *skb, int false_tstamp)
static ktime_t get_timestamp(struct sock *sk, struct sk_buff *skb, int *if_index) { - bool cycles = sk->sk_tsflags & SOF_TIMESTAMPING_BIND_PHC; + bool cycles = READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_BIND_PHC; struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb); struct net_device *orig_dev; ktime_t hwtstamp; @@ -873,12 +873,12 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk, int need_software_tstamp = sock_flag(sk, SOCK_RCVTSTAMP); int new_tstamp = sock_flag(sk, SOCK_TSTAMP_NEW); struct scm_timestamping_internal tss; - int empty = 1, false_tstamp = 0; struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb); int if_index; ktime_t hwtstamp; + u32 tsflags;
/* Race occurred between timestamp enabling and packet receiving. Fill in the current time for now. */ @@ -920,11 +920,12 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk, }
memset(&tss, 0, sizeof(tss)); - if ((sk->sk_tsflags & SOF_TIMESTAMPING_SOFTWARE) && + tsflags = READ_ONCE(sk->sk_tsflags); + if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) && ktime_to_timespec64_cond(skb->tstamp, tss.ts + 0)) empty = 0; if (shhwtstamps && - (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) && + (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) && !skb_is_swtx_tstamp(skb, false_tstamp)) { if_index = 0; if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP_NETDEV) @@ -932,14 +933,14 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk, else hwtstamp = shhwtstamps->hwtstamp;
- if (sk->sk_tsflags & SOF_TIMESTAMPING_BIND_PHC) + if (tsflags & SOF_TIMESTAMPING_BIND_PHC) hwtstamp = ptp_convert_timestamp(&hwtstamp, sk->sk_bind_phc);
if (ktime_to_timespec64_cond(hwtstamp, tss.ts + 2)) { empty = 0;
- if ((sk->sk_tsflags & SOF_TIMESTAMPING_OPT_PKTINFO) && + if ((tsflags & SOF_TIMESTAMPING_OPT_PKTINFO) && !skb_is_err_queue(skb)) put_ts_pktinfo(msg, skb, if_index); }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 251cd405a9e6e70b92fe5afbdd17fd5caf9d3266 ]
sk->sk_bind_phc is read locklessly. Add corresponding annotations.
Fixes: d463126e23f1 ("net: sock: extend SO_TIMESTAMPING for PHC binding") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Yangbo Lu yangbo.lu@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock.c | 4 ++-- net/socket.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/core/sock.c b/net/core/sock.c index fea5961c51fd1..0a687c8fbed7f 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -894,7 +894,7 @@ static int sock_timestamping_bind_phc(struct sock *sk, int phc_index) if (!match) return -EINVAL;
- sk->sk_bind_phc = phc_index; + WRITE_ONCE(sk->sk_bind_phc, phc_index);
return 0; } @@ -1719,7 +1719,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname, case SO_TIMESTAMPING_OLD: lv = sizeof(v.timestamping); v.timestamping.flags = READ_ONCE(sk->sk_tsflags); - v.timestamping.bind_phc = sk->sk_bind_phc; + v.timestamping.bind_phc = READ_ONCE(sk->sk_bind_phc); break;
case SO_RCVTIMEO_OLD: diff --git a/net/socket.c b/net/socket.c index 6bba7818b593d..b5639a6500158 100644 --- a/net/socket.c +++ b/net/socket.c @@ -935,7 +935,7 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
if (tsflags & SOF_TIMESTAMPING_BIND_PHC) hwtstamp = ptp_convert_timestamp(&hwtstamp, - sk->sk_bind_phc); + READ_ONCE(sk->sk_bind_phc));
if (ktime_to_timespec64_cond(hwtstamp, tss.ts + 2)) { empty = 0;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sriram Yagnaraman sriram.yagnaraman@est.tech
[ Upstream commit 6ac66cb03ae306c2e288a9be18226310529f5b25 ]
Route hints when the nexthop is part of a multipath group causes packets in the same receive batch to be sent to the same nexthop irrespective of the multipath hash of the packet. So, do not extract route hint for packets whose destination is part of a multipath group.
A new SKB flag IPSKB_MULTIPATH is introduced for this purpose, set the flag when route is looked up in ip_mkroute_input() and use it in ip_extract_route_hint() to check for the existence of the flag.
Fixes: 02b24941619f ("ipv4: use dst hint for ipv4 list receive") Signed-off-by: Sriram Yagnaraman sriram.yagnaraman@est.tech Reviewed-by: Ido Schimmel idosch@nvidia.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/ip.h | 1 + net/ipv4/ip_input.c | 3 ++- net/ipv4/route.c | 1 + 3 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/include/net/ip.h b/include/net/ip.h index 9276cea775cc2..3489a1cca5e7b 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -57,6 +57,7 @@ struct inet_skb_parm { #define IPSKB_FRAG_PMTU BIT(6) #define IPSKB_L3SLAVE BIT(7) #define IPSKB_NOPOLICY BIT(8) +#define IPSKB_MULTIPATH BIT(9)
u16 frag_max_size; }; diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c index fe9ead9ee863d..5e9c8156656a7 100644 --- a/net/ipv4/ip_input.c +++ b/net/ipv4/ip_input.c @@ -584,7 +584,8 @@ static void ip_sublist_rcv_finish(struct list_head *head) static struct sk_buff *ip_extract_route_hint(const struct net *net, struct sk_buff *skb, int rt_type) { - if (fib4_has_custom_rules(net) || rt_type == RTN_BROADCAST) + if (fib4_has_custom_rules(net) || rt_type == RTN_BROADCAST || + IPCB(skb)->flags & IPSKB_MULTIPATH) return NULL;
return skb; diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 92fede388d520..33626619aee79 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2144,6 +2144,7 @@ static int ip_mkroute_input(struct sk_buff *skb, int h = fib_multipath_hash(res->fi->fib_net, NULL, skb, hkeys);
fib_select_multipath(res, h); + IPCB(skb)->flags |= IPSKB_MULTIPATH; } #endif
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sriram Yagnaraman sriram.yagnaraman@est.tech
[ Upstream commit 8423be8926aa82cd2e28bba5cc96ccb72c7ce6be ]
Route hints when the nexthop is part of a multipath group causes packets in the same receive batch to be sent to the same nexthop irrespective of the multipath hash of the packet. So, do not extract route hint for packets whose destination is part of a multipath group.
A new SKB flag IP6SKB_MULTIPATH is introduced for this purpose, set the flag when route is looked up in fib6_select_path() and use it in ip6_can_use_hint() to check for the existence of the flag.
Fixes: 197dbf24e360 ("ipv6: introduce and uses route look hints for list input.") Signed-off-by: Sriram Yagnaraman sriram.yagnaraman@est.tech Reviewed-by: Ido Schimmel idosch@nvidia.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/ipv6.h | 1 + net/ipv6/ip6_input.c | 3 ++- net/ipv6/route.c | 3 +++ 3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index 660012997f54c..644e69354cba6 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -146,6 +146,7 @@ struct inet6_skb_parm { #define IP6SKB_JUMBOGRAM 128 #define IP6SKB_SEG6 256 #define IP6SKB_FAKEJUMBO 512 +#define IP6SKB_MULTIPATH 1024 };
#if defined(CONFIG_NET_L3_MASTER_DEV) diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c index d94041bb42872..b8378814532ce 100644 --- a/net/ipv6/ip6_input.c +++ b/net/ipv6/ip6_input.c @@ -99,7 +99,8 @@ static bool ip6_can_use_hint(const struct sk_buff *skb, static struct sk_buff *ip6_extract_route_hint(const struct net *net, struct sk_buff *skb) { - if (fib6_routes_require_src(net) || fib6_has_custom_rules(net)) + if (fib6_routes_require_src(net) || fib6_has_custom_rules(net) || + IP6CB(skb)->flags & IP6SKB_MULTIPATH) return NULL;
return skb; diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 56a55585eb798..a02328c93a537 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -424,6 +424,9 @@ void fib6_select_path(const struct net *net, struct fib6_result *res, if (match->nh && have_oif_match && res->nh) return;
+ if (skb) + IP6CB(skb)->flags |= IP6SKB_MULTIPATH; + /* We might have already computed the hash for ICMPv6 errors. In such * case it will always be non-zero. Otherwise now is the time to do it. */
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xu Kuohai xukuohai@huawei.com
[ Upstream commit c1970e26bdc1209974bb5cf31cc23f2b7ad6ce50 ]
While commit 90f0074cd9f9 ("selftests/bpf: fix a CI failure caused by vsock sockmap test") fixes a receive failure of vsock sockmap test, there is still a write failure:
Error: #211/79 sockmap_listen/sockmap VSOCK test_vsock_redir Error: #211/79 sockmap_listen/sockmap VSOCK test_vsock_redir ./test_progs:vsock_unix_redir_connectible:1501: egress: write: Transport endpoint is not connected vsock_unix_redir_connectible:FAIL:1501 ./test_progs:vsock_unix_redir_connectible:1501: ingress: write: Transport endpoint is not connected vsock_unix_redir_connectible:FAIL:1501 ./test_progs:vsock_unix_redir_connectible:1501: egress: write: Transport endpoint is not connected vsock_unix_redir_connectible:FAIL:1501
The reason is that the vsock connection in the test is set to ESTABLISHED state by function virtio_transport_recv_pkt, which is executed in a workqueue thread, so when the user space test thread runs before the workqueue thread, this problem occurs.
To fix it, before writing the connection, wait for it to be connected.
Fixes: d61bd8c1fd02 ("selftests/bpf: add a test case for vsock sockmap") Signed-off-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20230901031037.3314007-1-xukuohai@huaweicloud.co... Signed-off-by: Sasha Levin sashal@kernel.org --- .../bpf/prog_tests/sockmap_helpers.h | 26 +++++++++++++++++++ .../selftests/bpf/prog_tests/sockmap_listen.c | 7 +++++ 2 files changed, 33 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h b/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h index d12665490a905..36d829a65aa44 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h @@ -179,6 +179,32 @@ __ret; \ })
+static inline int poll_connect(int fd, unsigned int timeout_sec) +{ + struct timeval timeout = { .tv_sec = timeout_sec }; + fd_set wfds; + int r, eval; + socklen_t esize = sizeof(eval); + + FD_ZERO(&wfds); + FD_SET(fd, &wfds); + + r = select(fd + 1, NULL, &wfds, NULL, &timeout); + if (r == 0) + errno = ETIME; + if (r != 1) + return -1; + + if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &eval, &esize) < 0) + return -1; + if (eval != 0) { + errno = eval; + return -1; + } + + return 0; +} + static inline int poll_read(int fd, unsigned int timeout_sec) { struct timeval timeout = { .tv_sec = timeout_sec }; diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c index 5674a9d0cacf0..8df8cbb447f10 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c @@ -1452,11 +1452,18 @@ static int vsock_socketpair_connectible(int sotype, int *v0, int *v1) if (p < 0) goto close_cli;
+ if (poll_connect(c, IO_TIMEOUT_SEC) < 0) { + FAIL_ERRNO("poll_connect"); + goto close_acc; + } + *v0 = p; *v1 = c;
return 0;
+close_acc: + close(p); close_cli: close(c); close_srv:
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Corinna Vinschen vinschen@redhat.com
[ Upstream commit fa09bc40b21a33937872c4c4cf0f266ec9fa4869 ]
Disable virtualization features on 82580 just as on i210/i211. This avoids that virt functions are acidentally called on 82850.
Fixes: 55cac248caa4 ("igb: Add full support for 82580 devices") Signed-off-by: Corinna Vinschen vinschen@redhat.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igb/igb_main.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 08e3df37089fe..ac19730e8db91 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -3933,8 +3933,9 @@ static void igb_probe_vfs(struct igb_adapter *adapter) struct pci_dev *pdev = adapter->pdev; struct e1000_hw *hw = &adapter->hw;
- /* Virtualization features not supported on i210 family. */ - if ((hw->mac.type == e1000_i210) || (hw->mac.type == e1000_i211)) + /* Virtualization features not supported on i210 and 82580 family. */ + if ((hw->mac.type == e1000_i210) || (hw->mac.type == e1000_i211) || + (hw->mac.type == e1000_82580)) return;
/* Of the below we really only want the effect of getting
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 817c7cd2043a83a3d8147f40eea1505ac7300b62 ]
gve_rx_append_frags() is able to build skbs chained with frag_list, like GRO engine.
Problem is that shinfo->frag_list should only be used for the head of the chain.
All other links should use skb->next pointer.
Otherwise, built skbs are not valid and can cause crashes.
Equivalent code in GRO (skb_gro_receive()) is:
if (NAPI_GRO_CB(p)->last == p) skb_shinfo(p)->frag_list = skb; else NAPI_GRO_CB(p)->last->next = skb; NAPI_GRO_CB(p)->last = skb;
Fixes: 9b8dd5e5ea48 ("gve: DQO: Add RX path") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Bailey Forrest bcf@google.com Cc: Willem de Bruijn willemb@google.com Cc: Catherine Sullivan csully@google.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/google/gve/gve_rx_dqo.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/google/gve/gve_rx_dqo.c b/drivers/net/ethernet/google/gve/gve_rx_dqo.c index e57b73eb70f62..ac041cc5714c0 100644 --- a/drivers/net/ethernet/google/gve/gve_rx_dqo.c +++ b/drivers/net/ethernet/google/gve/gve_rx_dqo.c @@ -492,7 +492,10 @@ static int gve_rx_append_frags(struct napi_struct *napi, if (!skb) return -1;
- skb_shinfo(rx->ctx.skb_tail)->frag_list = skb; + if (rx->ctx.skb_tail == rx->ctx.skb_head) + skb_shinfo(rx->ctx.skb_head)->frag_list = skb; + else + rx->ctx.skb_tail->next = skb; rx->ctx.skb_tail = skb; num_frags = 0; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Liang Chen liangchen.linux@gmail.com
[ Upstream commit 151e887d8ff97e2e42110ffa1fb1e6a2128fb364 ]
The veth_xmit function returns NETDEV_TX_OK even when packets are dropped. This behavior leads to incorrect calculations of statistics counts, as well as things like txq->trans_start updates.
Fixes: e314dbdc1c0d ("[NET]: Virtual ethernet device driver.") Signed-off-by: Liang Chen liangchen.linux@gmail.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/veth.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c index ef8eacb596f73..2db678c0082a3 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -344,6 +344,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) { struct veth_priv *rcv_priv, *priv = netdev_priv(dev); struct veth_rq *rq = NULL; + int ret = NETDEV_TX_OK; struct net_device *rcv; int length = skb->len; bool use_napi = false; @@ -376,6 +377,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) } else { drop: atomic64_inc(&priv->dropped); + ret = NET_XMIT_DROP; }
if (use_napi) @@ -383,7 +385,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
rcu_read_unlock();
- return NETDEV_TX_OK; + return ret; }
static u64 veth_stats_tx(struct net_device *dev, u64 *packets, u64 *bytes)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Henrie alexhenrie24@gmail.com
[ Upstream commit f31867d0d9d82af757c1e0178b659438f4c1ea3c ]
The existing code incorrectly casted a negative value (the result of a subtraction) to an unsigned value without checking. For example, if /proc/sys/net/ipv6/conf/*/temp_prefered_lft was set to 1, the preferred lifetime would jump to 4 billion seconds. On my machine and network the shortest lifetime that avoided underflow was 3 seconds.
Fixes: 76506a986dc3 ("IPv6: fix DESYNC_FACTOR") Signed-off-by: Alex Henrie alexhenrie24@gmail.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/addrconf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 94cec2075eee8..c93a2b9a91723 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -1368,7 +1368,7 @@ static int ipv6_create_tempaddr(struct inet6_ifaddr *ifp, bool block) * idev->desync_factor if it's larger */ cnf_temp_preferred_lft = READ_ONCE(idev->cnf.temp_prefered_lft); - max_desync_factor = min_t(__u32, + max_desync_factor = min_t(long, idev->cnf.max_desync_factor, cnf_temp_preferred_lft - regen_advance);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oleksij Rempel o.rempel@pengutronix.de
[ Upstream commit 719c5e37e99d2fd588d1c994284d17650a66354c ]
Previously, the defines for phy_device flags in the Micrel driver were ambiguous in their representation. They were intended to be bit masks but were mistakenly defined as bit positions. This led to the following issues:
- MICREL_KSZ8_P1_ERRATA, designated for KSZ88xx switches, overlapped with MICREL_PHY_FXEN and MICREL_PHY_50MHZ_CLK. - Due to this overlap, the code path for MICREL_PHY_FXEN, tailored for the KSZ8041 PHY, was not executed for KSZ88xx PHYs. - Similarly, the code associated with MICREL_PHY_50MHZ_CLK wasn't triggered for KSZ88xx.
To rectify this, all three flags have now been explicitly converted to use the `BIT()` macro, ensuring they are defined as bit masks and preventing potential overlaps in the future.
Fixes: 49011e0c1555 ("net: phy: micrel: ksz886x/ksz8081: add cabletest support") Signed-off-by: Oleksij Rempel o.rempel@pengutronix.de Reviewed-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/micrel_phy.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/linux/micrel_phy.h b/include/linux/micrel_phy.h index 8bef1ab62bba3..322d872559847 100644 --- a/include/linux/micrel_phy.h +++ b/include/linux/micrel_phy.h @@ -41,9 +41,9 @@ #define PHY_ID_KSZ9477 0x00221631
/* struct phy_device dev_flags definitions */ -#define MICREL_PHY_50MHZ_CLK 0x00000001 -#define MICREL_PHY_FXEN 0x00000002 -#define MICREL_KSZ8_P1_ERRATA 0x00000003 +#define MICREL_PHY_50MHZ_CLK BIT(0) +#define MICREL_PHY_FXEN BIT(1) +#define MICREL_KSZ8_P1_ERRATA BIT(2)
#define MICREL_KSZ9021_EXTREG_CTRL 0xB #define MICREL_KSZ9021_EXTREG_DATA_WRITE 0xC
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: John Fastabend john.fastabend@gmail.com
[ Upstream commit a454d84ee20baf7bd7be90721b9821f73c7d23d9 ]
There is a race where skb's from the sk_psock_backlog can be referenced after userspace side has already skb_consumed() the sk_buff and its refcnt dropped to zer0 causing use after free.
The flow is the following:
while ((skb = skb_peek(&psock->ingress_skb)) sk_psock_handle_Skb(psock, skb, ..., ingress) if (!ingress) ... sk_psock_skb_ingress sk_psock_skb_ingress_enqueue(skb) msg->skb = skb sk_psock_queue_msg(psock, msg) skb_dequeue(&psock->ingress_skb)
The sk_psock_queue_msg() puts the msg on the ingress_msg queue. This is what the application reads when recvmsg() is called. An application can read this anytime after the msg is placed on the queue. The recvmsg hook will also read msg->skb and then after user space reads the msg will call consume_skb(skb) on it effectively free'ing it.
But, the race is in above where backlog queue still has a reference to the skb and calls skb_dequeue(). If the skb_dequeue happens after the user reads and free's the skb we have a use after free.
The !ingress case does not suffer from this problem because it uses sendmsg_*(sk, msg) which does not pass the sk_buff further down the stack.
The following splat was observed with 'test_progs -t sockmap_listen':
[ 1022.710250][ T2556] general protection fault, ... [...] [ 1022.712830][ T2556] Workqueue: events sk_psock_backlog [ 1022.713262][ T2556] RIP: 0010:skb_dequeue+0x4c/0x80 [ 1022.713653][ T2556] Code: ... [...] [ 1022.720699][ T2556] Call Trace: [ 1022.720984][ T2556] <TASK> [ 1022.721254][ T2556] ? die_addr+0x32/0x80^M [ 1022.721589][ T2556] ? exc_general_protection+0x25a/0x4b0 [ 1022.722026][ T2556] ? asm_exc_general_protection+0x22/0x30 [ 1022.722489][ T2556] ? skb_dequeue+0x4c/0x80 [ 1022.722854][ T2556] sk_psock_backlog+0x27a/0x300 [ 1022.723243][ T2556] process_one_work+0x2a7/0x5b0 [ 1022.723633][ T2556] worker_thread+0x4f/0x3a0 [ 1022.723998][ T2556] ? __pfx_worker_thread+0x10/0x10 [ 1022.724386][ T2556] kthread+0xfd/0x130 [ 1022.724709][ T2556] ? __pfx_kthread+0x10/0x10 [ 1022.725066][ T2556] ret_from_fork+0x2d/0x50 [ 1022.725409][ T2556] ? __pfx_kthread+0x10/0x10 [ 1022.725799][ T2556] ret_from_fork_asm+0x1b/0x30 [ 1022.726201][ T2556] </TASK>
To fix we add an skb_get() before passing the skb to be enqueued in the engress queue. This bumps the skb->users refcnt so that consume_skb() and kfree_skb will not immediately free the sk_buff. With this we can be sure the skb is still around when we do the dequeue. Then we just need to decrement the refcnt or free the skb in the backlog case which we do by calling kfree_skb() on the ingress case as well as the sendmsg case.
Before locking change from fixes tag we had the sock locked so we couldn't race with user and there was no issue here.
Fixes: 799aa7f98d53e ("skmsg: Avoid lock_sock() in sk_psock_backlog()") Reported-by: Jiri Olsa jolsa@kernel.org Signed-off-by: John Fastabend john.fastabend@gmail.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Tested-by: Xu Kuohai xukuohai@huawei.com Tested-by: Jiri Olsa jolsa@kernel.org Link: https://lore.kernel.org/bpf/20230901202137.214666-1-john.fastabend@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/skmsg.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c index ef1a2eb6520bf..a746dbc2f8877 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -612,12 +612,18 @@ static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb, u32 off, u32 len, bool ingress) { + int err = 0; + if (!ingress) { if (!sock_writeable(psock->sk)) return -EAGAIN; return skb_send_sock(psock->sk, skb, off, len); } - return sk_psock_skb_ingress(psock, skb, off, len); + skb_get(skb); + err = sk_psock_skb_ingress(psock, skb, off, len); + if (err < 0) + kfree_skb(skb); + return err; }
static void sk_psock_skb_state(struct sk_psock *psock, @@ -685,9 +691,7 @@ static void sk_psock_backlog(struct work_struct *work) } while (len);
skb = skb_dequeue(&psock->ingress_skb); - if (!ingress) { - kfree_skb(skb); - } + kfree_skb(skb); } end: mutex_unlock(&psock->work_mutex);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 718e6b51298e0f254baca0d40ab52a00e004e014 ]
Heiko Carstens reported that SCM_PIDFD does not work with MSG_CMSG_COMPAT because scm_pidfd_recv() always checks msg_controllen against sizeof(struct cmsghdr).
We need to use sizeof(struct compat_cmsghdr) for the compat case.
Fixes: 5e2ff6704a27 ("scm: add SO_PASSPIDFD and SCM_PIDFD") Reported-by: Heiko Carstens hca@linux.ibm.com Closes: https://lore.kernel.org/netdev/20230901200517.8742-A-hca@linux.ibm.com/ Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Tested-by: Heiko Carstens hca@linux.ibm.com Reviewed-by: Alexander Mikhalitsyn aleksandr.mikhalitsyn@canonical.com Reviewed-by: Michal Swiatkowski michal.swiatkowski@linux.intel.com Acked-by: Christian Brauner brauner@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/scm.h | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/include/net/scm.h b/include/net/scm.h index c5bcdf65f55c9..e8c76b4be2fe7 100644 --- a/include/net/scm.h +++ b/include/net/scm.h @@ -9,6 +9,7 @@ #include <linux/pid.h> #include <linux/nsproxy.h> #include <linux/sched/signal.h> +#include <net/compat.h>
/* Well, we should have at least one descriptor open * to accept passed FDs 8) @@ -123,14 +124,17 @@ static inline bool scm_has_secdata(struct socket *sock) static __inline__ void scm_pidfd_recv(struct msghdr *msg, struct scm_cookie *scm) { struct file *pidfd_file = NULL; - int pidfd; + int len, pidfd;
- /* - * put_cmsg() doesn't return an error if CMSG is truncated, + /* put_cmsg() doesn't return an error if CMSG is truncated, * that's why we need to opencode these checks here. */ - if ((msg->msg_controllen <= sizeof(struct cmsghdr)) || - (msg->msg_controllen - sizeof(struct cmsghdr)) < sizeof(int)) { + if (msg->msg_flags & MSG_CMSG_COMPAT) + len = sizeof(struct compat_cmsghdr) + sizeof(int); + else + len = sizeof(struct cmsghdr) + sizeof(int); + + if (msg->msg_controllen < len) { msg->msg_flags |= MSG_CTRUNC; return; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 0bc36c0650b21df36fbec8136add83936eaf0607 ]
user->unix_inflight is changed under spin_lock(unix_gc_lock), but too_many_unix_fds() reads it locklessly.
Let's annotate the write/read accesses to user->unix_inflight.
BUG: KCSAN: data-race in unix_attach_fds / unix_inflight
write to 0xffffffff8546f2d0 of 8 bytes by task 44798 on cpu 1: unix_inflight+0x157/0x180 net/unix/scm.c:66 unix_attach_fds+0x147/0x1e0 net/unix/scm.c:123 unix_scm_to_skb net/unix/af_unix.c:1827 [inline] unix_dgram_sendmsg+0x46a/0x14f0 net/unix/af_unix.c:1950 unix_seqpacket_sendmsg net/unix/af_unix.c:2308 [inline] unix_seqpacket_sendmsg+0xba/0x130 net/unix/af_unix.c:2292 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg+0x148/0x160 net/socket.c:748 ____sys_sendmsg+0x4e4/0x610 net/socket.c:2494 ___sys_sendmsg+0xc6/0x140 net/socket.c:2548 __sys_sendmsg+0x94/0x140 net/socket.c:2577 __do_sys_sendmsg net/socket.c:2586 [inline] __se_sys_sendmsg net/socket.c:2584 [inline] __x64_sys_sendmsg+0x45/0x50 net/socket.c:2584 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
read to 0xffffffff8546f2d0 of 8 bytes by task 44814 on cpu 0: too_many_unix_fds net/unix/scm.c:101 [inline] unix_attach_fds+0x54/0x1e0 net/unix/scm.c:110 unix_scm_to_skb net/unix/af_unix.c:1827 [inline] unix_dgram_sendmsg+0x46a/0x14f0 net/unix/af_unix.c:1950 unix_seqpacket_sendmsg net/unix/af_unix.c:2308 [inline] unix_seqpacket_sendmsg+0xba/0x130 net/unix/af_unix.c:2292 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg+0x148/0x160 net/socket.c:748 ____sys_sendmsg+0x4e4/0x610 net/socket.c:2494 ___sys_sendmsg+0xc6/0x140 net/socket.c:2548 __sys_sendmsg+0x94/0x140 net/socket.c:2577 __do_sys_sendmsg net/socket.c:2586 [inline] __se_sys_sendmsg net/socket.c:2584 [inline] __x64_sys_sendmsg+0x45/0x50 net/socket.c:2584 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
value changed: 0x000000000000000c -> 0x000000000000000d
Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 44814 Comm: systemd-coredum Not tainted 6.4.0-11989-g6843306689af #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes: 712f4aad406b ("unix: properly account for FDs passed over unix sockets") Reported-by: syzkaller syzkaller@googlegroups.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Willy Tarreau w@1wt.eu Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/scm.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/unix/scm.c b/net/unix/scm.c index f9152881d77f6..033e21e5c4df6 100644 --- a/net/unix/scm.c +++ b/net/unix/scm.c @@ -63,7 +63,7 @@ void unix_inflight(struct user_struct *user, struct file *fp) /* Paired with READ_ONCE() in wait_for_unix_gc() */ WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + 1); } - user->unix_inflight++; + WRITE_ONCE(user->unix_inflight, user->unix_inflight + 1); spin_unlock(&unix_gc_lock); }
@@ -84,7 +84,7 @@ void unix_notinflight(struct user_struct *user, struct file *fp) /* Paired with READ_ONCE() in wait_for_unix_gc() */ WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - 1); } - user->unix_inflight--; + WRITE_ONCE(user->unix_inflight, user->unix_inflight - 1); spin_unlock(&unix_gc_lock); }
@@ -98,7 +98,7 @@ static inline bool too_many_unix_fds(struct task_struct *p) { struct user_struct *user = current_user();
- if (unlikely(user->unix_inflight > task_rlimit(p, RLIMIT_NOFILE))) + if (unlikely(READ_ONCE(user->unix_inflight) > task_rlimit(p, RLIMIT_NOFILE))) return !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN); return false; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit ade32bd8a738d7497ffe9743c46728db26740f78 ]
unix_tot_inflight is changed under spin_lock(unix_gc_lock), but unix_release_sock() reads it locklessly.
Let's use READ_ONCE() for unix_tot_inflight.
Note that the writer side was marked by commit 9d6d7f1cb67c ("af_unix: annote lockless accesses to unix_tot_inflight & gc_in_progress")
BUG: KCSAN: data-race in unix_inflight / unix_release_sock
write (marked) to 0xffffffff871852b8 of 4 bytes by task 123 on cpu 1: unix_inflight+0x130/0x180 net/unix/scm.c:64 unix_attach_fds+0x137/0x1b0 net/unix/scm.c:123 unix_scm_to_skb net/unix/af_unix.c:1832 [inline] unix_dgram_sendmsg+0x46a/0x14f0 net/unix/af_unix.c:1955 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0x148/0x160 net/socket.c:747 ____sys_sendmsg+0x4e4/0x610 net/socket.c:2493 ___sys_sendmsg+0xc6/0x140 net/socket.c:2547 __sys_sendmsg+0x94/0x140 net/socket.c:2576 __do_sys_sendmsg net/socket.c:2585 [inline] __se_sys_sendmsg net/socket.c:2583 [inline] __x64_sys_sendmsg+0x45/0x50 net/socket.c:2583 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc
read to 0xffffffff871852b8 of 4 bytes by task 4891 on cpu 0: unix_release_sock+0x608/0x910 net/unix/af_unix.c:671 unix_release+0x59/0x80 net/unix/af_unix.c:1058 __sock_release+0x7d/0x170 net/socket.c:653 sock_close+0x19/0x30 net/socket.c:1385 __fput+0x179/0x5e0 fs/file_table.c:321 ____fput+0x15/0x20 fs/file_table.c:349 task_work_run+0x116/0x1a0 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297 do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x72/0xdc
value changed: 0x00000000 -> 0x00000001
Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 4891 Comm: systemd-coredum Not tainted 6.4.0-rc5-01219-gfa0e21fa4443 #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes: 9305cfa4443d ("[AF_UNIX]: Make unix_tot_inflight counter non-atomic") Reported-by: syzkaller syzkaller@googlegroups.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/af_unix.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 86930a8ed012b..3e8a04a136688 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -680,7 +680,7 @@ static void unix_release_sock(struct sock *sk, int embrion) * What the above comment does talk about? --ANK(980817) */
- if (unix_tot_inflight) + if (READ_ONCE(unix_tot_inflight)) unix_gc(); /* Garbage collect fds */ }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit afe8764f76346ba838d4f162883e23d2fcfaa90e ]
sk->sk_shutdown is changed under unix_state_lock(sk), but unix_dgram_sendmsg() calls two functions to read sk_shutdown locklessly.
sock_alloc_send_pskb `- sock_wait_for_wmem
Let's use READ_ONCE() there.
Note that the writer side was marked by commit e1d09c2c2f57 ("af_unix: Fix data races around sk->sk_shutdown.").
BUG: KCSAN: data-race in sock_alloc_send_pskb / unix_release_sock
write (marked) to 0xffff8880069af12c of 1 bytes by task 1 on cpu 1: unix_release_sock+0x75c/0x910 net/unix/af_unix.c:631 unix_release+0x59/0x80 net/unix/af_unix.c:1053 __sock_release+0x7d/0x170 net/socket.c:654 sock_close+0x19/0x30 net/socket.c:1386 __fput+0x2a3/0x680 fs/file_table.c:384 ____fput+0x15/0x20 fs/file_table.c:412 task_work_run+0x116/0x1a0 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297 do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
read to 0xffff8880069af12c of 1 bytes by task 28650 on cpu 0: sock_alloc_send_pskb+0xd2/0x620 net/core/sock.c:2767 unix_dgram_sendmsg+0x2f8/0x14f0 net/unix/af_unix.c:1944 unix_seqpacket_sendmsg net/unix/af_unix.c:2308 [inline] unix_seqpacket_sendmsg+0xba/0x130 net/unix/af_unix.c:2292 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg+0x148/0x160 net/socket.c:748 ____sys_sendmsg+0x4e4/0x610 net/socket.c:2494 ___sys_sendmsg+0xc6/0x140 net/socket.c:2548 __sys_sendmsg+0x94/0x140 net/socket.c:2577 __do_sys_sendmsg net/socket.c:2586 [inline] __se_sys_sendmsg net/socket.c:2584 [inline] __x64_sys_sendmsg+0x45/0x50 net/socket.c:2584 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
value changed: 0x00 -> 0x03
Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 28650 Comm: systemd-coredum Not tainted 6.4.0-11989-g6843306689af #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzkaller syzkaller@googlegroups.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/sock.c b/net/core/sock.c index 0a687c8fbed7f..2a78e47f76dba 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2746,7 +2746,7 @@ static long sock_wait_for_wmem(struct sock *sk, long timeo) prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); if (refcount_read(&sk->sk_wmem_alloc) < READ_ONCE(sk->sk_sndbuf)) break; - if (sk->sk_shutdown & SEND_SHUTDOWN) + if (READ_ONCE(sk->sk_shutdown) & SEND_SHUTDOWN) break; if (sk->sk_err) break; @@ -2776,7 +2776,7 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len, goto failure;
err = -EPIPE; - if (sk->sk_shutdown & SEND_SHUTDOWN) + if (READ_ONCE(sk->sk_shutdown) & SEND_SHUTDOWN) goto failure;
if (sk_wmem_alloc_get(sk) < READ_ONCE(sk->sk_sndbuf))
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit b192812905e4b134f7b7994b079eb647e9d2d37e ]
As with sk->sk_shutdown shown in the previous patch, sk->sk_err can be read locklessly by unix_dgram_sendmsg().
Let's use READ_ONCE() for sk_err as well.
Note that the writer side is marked by commit cc04410af7de ("af_unix: annotate lockless accesses to sk->sk_err").
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/sock.c b/net/core/sock.c index 2a78e47f76dba..29c6cb030818b 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2748,7 +2748,7 @@ static long sock_wait_for_wmem(struct sock *sk, long timeo) break; if (READ_ONCE(sk->sk_shutdown) & SEND_SHUTDOWN) break; - if (sk->sk_err) + if (READ_ONCE(sk->sk_err)) break; timeo = schedule_timeout(timeo); }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: valis sec@valis.email
[ Upstream commit 8fc134fee27f2263988ae38920bc03da416b03d8 ]
When the plug qdisc is used as a class of the qfq qdisc it could trigger a UAF. This issue can be reproduced with following commands:
tc qdisc add dev lo root handle 1: qfq tc class add dev lo parent 1: classid 1:1 qfq weight 1 maxpkt 512 tc qdisc add dev lo parent 1:1 handle 2: plug tc filter add dev lo parent 1: basic classid 1:1 ping -c1 127.0.0.1
and boom:
[ 285.353793] BUG: KASAN: slab-use-after-free in qfq_dequeue+0xa7/0x7f0 [ 285.354910] Read of size 4 at addr ffff8880bad312a8 by task ping/144 [ 285.355903] [ 285.356165] CPU: 1 PID: 144 Comm: ping Not tainted 6.5.0-rc3+ #4 [ 285.357112] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 [ 285.358376] Call Trace: [ 285.358773] <IRQ> [ 285.359109] dump_stack_lvl+0x44/0x60 [ 285.359708] print_address_description.constprop.0+0x2c/0x3c0 [ 285.360611] kasan_report+0x10c/0x120 [ 285.361195] ? qfq_dequeue+0xa7/0x7f0 [ 285.361780] qfq_dequeue+0xa7/0x7f0 [ 285.362342] __qdisc_run+0xf1/0x970 [ 285.362903] net_tx_action+0x28e/0x460 [ 285.363502] __do_softirq+0x11b/0x3de [ 285.364097] do_softirq.part.0+0x72/0x90 [ 285.364721] </IRQ> [ 285.365072] <TASK> [ 285.365422] __local_bh_enable_ip+0x77/0x90 [ 285.366079] __dev_queue_xmit+0x95f/0x1550 [ 285.366732] ? __pfx_csum_and_copy_from_iter+0x10/0x10 [ 285.367526] ? __pfx___dev_queue_xmit+0x10/0x10 [ 285.368259] ? __build_skb_around+0x129/0x190 [ 285.368960] ? ip_generic_getfrag+0x12c/0x170 [ 285.369653] ? __pfx_ip_generic_getfrag+0x10/0x10 [ 285.370390] ? csum_partial+0x8/0x20 [ 285.370961] ? raw_getfrag+0xe5/0x140 [ 285.371559] ip_finish_output2+0x539/0xa40 [ 285.372222] ? __pfx_ip_finish_output2+0x10/0x10 [ 285.372954] ip_output+0x113/0x1e0 [ 285.373512] ? __pfx_ip_output+0x10/0x10 [ 285.374130] ? icmp_out_count+0x49/0x60 [ 285.374739] ? __pfx_ip_finish_output+0x10/0x10 [ 285.375457] ip_push_pending_frames+0xf3/0x100 [ 285.376173] raw_sendmsg+0xef5/0x12d0 [ 285.376760] ? do_syscall_64+0x40/0x90 [ 285.377359] ? __static_call_text_end+0x136578/0x136578 [ 285.378173] ? do_syscall_64+0x40/0x90 [ 285.378772] ? kasan_enable_current+0x11/0x20 [ 285.379469] ? __pfx_raw_sendmsg+0x10/0x10 [ 285.380137] ? __sock_create+0x13e/0x270 [ 285.380673] ? __sys_socket+0xf3/0x180 [ 285.381174] ? __x64_sys_socket+0x3d/0x50 [ 285.381725] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 285.382425] ? __rcu_read_unlock+0x48/0x70 [ 285.382975] ? ip4_datagram_release_cb+0xd8/0x380 [ 285.383608] ? __pfx_ip4_datagram_release_cb+0x10/0x10 [ 285.384295] ? preempt_count_sub+0x14/0xc0 [ 285.384844] ? __list_del_entry_valid+0x76/0x140 [ 285.385467] ? _raw_spin_lock_bh+0x87/0xe0 [ 285.386014] ? __pfx__raw_spin_lock_bh+0x10/0x10 [ 285.386645] ? release_sock+0xa0/0xd0 [ 285.387148] ? preempt_count_sub+0x14/0xc0 [ 285.387712] ? freeze_secondary_cpus+0x348/0x3c0 [ 285.388341] ? aa_sk_perm+0x177/0x390 [ 285.388856] ? __pfx_aa_sk_perm+0x10/0x10 [ 285.389441] ? check_stack_object+0x22/0x70 [ 285.390032] ? inet_send_prepare+0x2f/0x120 [ 285.390603] ? __pfx_inet_sendmsg+0x10/0x10 [ 285.391172] sock_sendmsg+0xcc/0xe0 [ 285.391667] __sys_sendto+0x190/0x230 [ 285.392168] ? __pfx___sys_sendto+0x10/0x10 [ 285.392727] ? kvm_clock_get_cycles+0x14/0x30 [ 285.393328] ? set_normalized_timespec64+0x57/0x70 [ 285.393980] ? _raw_spin_unlock_irq+0x1b/0x40 [ 285.394578] ? __x64_sys_clock_gettime+0x11c/0x160 [ 285.395225] ? __pfx___x64_sys_clock_gettime+0x10/0x10 [ 285.395908] ? _copy_to_user+0x3e/0x60 [ 285.396432] ? exit_to_user_mode_prepare+0x1a/0x120 [ 285.397086] ? syscall_exit_to_user_mode+0x22/0x50 [ 285.397734] ? do_syscall_64+0x71/0x90 [ 285.398258] __x64_sys_sendto+0x74/0x90 [ 285.398786] do_syscall_64+0x64/0x90 [ 285.399273] ? exit_to_user_mode_prepare+0x1a/0x120 [ 285.399949] ? syscall_exit_to_user_mode+0x22/0x50 [ 285.400605] ? do_syscall_64+0x71/0x90 [ 285.401124] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 285.401807] RIP: 0033:0x495726 [ 285.402233] Code: ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 09 [ 285.404683] RSP: 002b:00007ffcc25fb618 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 285.405677] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 0000000000495726 [ 285.406628] RDX: 0000000000000040 RSI: 0000000002518750 RDI: 0000000000000000 [ 285.407565] RBP: 00000000005205ef R08: 00000000005f8838 R09: 000000000000001c [ 285.408523] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000002517634 [ 285.409460] R13: 00007ffcc25fb6f0 R14: 0000000000000003 R15: 0000000000000000 [ 285.410403] </TASK> [ 285.410704] [ 285.410929] Allocated by task 144: [ 285.411402] kasan_save_stack+0x1e/0x40 [ 285.411926] kasan_set_track+0x21/0x30 [ 285.412442] __kasan_slab_alloc+0x55/0x70 [ 285.412973] kmem_cache_alloc_node+0x187/0x3d0 [ 285.413567] __alloc_skb+0x1b4/0x230 [ 285.414060] __ip_append_data+0x17f7/0x1b60 [ 285.414633] ip_append_data+0x97/0xf0 [ 285.415144] raw_sendmsg+0x5a8/0x12d0 [ 285.415640] sock_sendmsg+0xcc/0xe0 [ 285.416117] __sys_sendto+0x190/0x230 [ 285.416626] __x64_sys_sendto+0x74/0x90 [ 285.417145] do_syscall_64+0x64/0x90 [ 285.417624] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 285.418306] [ 285.418531] Freed by task 144: [ 285.418960] kasan_save_stack+0x1e/0x40 [ 285.419469] kasan_set_track+0x21/0x30 [ 285.419988] kasan_save_free_info+0x27/0x40 [ 285.420556] ____kasan_slab_free+0x109/0x1a0 [ 285.421146] kmem_cache_free+0x1c2/0x450 [ 285.421680] __netif_receive_skb_core+0x2ce/0x1870 [ 285.422333] __netif_receive_skb_one_core+0x97/0x140 [ 285.423003] process_backlog+0x100/0x2f0 [ 285.423537] __napi_poll+0x5c/0x2d0 [ 285.424023] net_rx_action+0x2be/0x560 [ 285.424510] __do_softirq+0x11b/0x3de [ 285.425034] [ 285.425254] The buggy address belongs to the object at ffff8880bad31280 [ 285.425254] which belongs to the cache skbuff_head_cache of size 224 [ 285.426993] The buggy address is located 40 bytes inside of [ 285.426993] freed 224-byte region [ffff8880bad31280, ffff8880bad31360) [ 285.428572] [ 285.428798] The buggy address belongs to the physical page: [ 285.429540] page:00000000f4b77674 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xbad31 [ 285.430758] flags: 0x100000000000200(slab|node=0|zone=1) [ 285.431447] page_type: 0xffffffff() [ 285.431934] raw: 0100000000000200 ffff88810094a8c0 dead000000000122 0000000000000000 [ 285.432757] raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000 [ 285.433562] page dumped because: kasan: bad access detected [ 285.434144] [ 285.434320] Memory state around the buggy address: [ 285.434828] ffff8880bad31180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 285.435580] ffff8880bad31200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 285.436264] >ffff8880bad31280: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 285.436777] ^ [ 285.437106] ffff8880bad31300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ 285.437616] ffff8880bad31380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 285.438126] ================================================================== [ 285.438662] Disabling lock debugging due to kernel taint
Fix this by: 1. Changing sch_plug's .peek handler to qdisc_peek_dequeued(), a function compatible with non-work-conserving qdiscs 2. Checking the return value of qdisc_dequeue_peeked() in sch_qfq.
Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost") Reported-by: valis sec@valis.email Signed-off-by: valis sec@valis.email Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com Link: https://lore.kernel.org/r/20230901162237.11525-1-jhs@mojatatu.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_plug.c | 2 +- net/sched/sch_qfq.c | 22 +++++++++++++++++----- 2 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/net/sched/sch_plug.c b/net/sched/sch_plug.c index ea8c4a7174bba..35f49edf63dbf 100644 --- a/net/sched/sch_plug.c +++ b/net/sched/sch_plug.c @@ -207,7 +207,7 @@ static struct Qdisc_ops plug_qdisc_ops __read_mostly = { .priv_size = sizeof(struct plug_sched_data), .enqueue = plug_enqueue, .dequeue = plug_dequeue, - .peek = qdisc_peek_head, + .peek = qdisc_peek_dequeued, .init = plug_init, .change = plug_change, .reset = qdisc_reset_queue, diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c index befaf74b33caa..09d2955baab10 100644 --- a/net/sched/sch_qfq.c +++ b/net/sched/sch_qfq.c @@ -974,10 +974,13 @@ static void qfq_update_eligible(struct qfq_sched *q) }
/* Dequeue head packet of the head class in the DRR queue of the aggregate. */ -static void agg_dequeue(struct qfq_aggregate *agg, - struct qfq_class *cl, unsigned int len) +static struct sk_buff *agg_dequeue(struct qfq_aggregate *agg, + struct qfq_class *cl, unsigned int len) { - qdisc_dequeue_peeked(cl->qdisc); + struct sk_buff *skb = qdisc_dequeue_peeked(cl->qdisc); + + if (!skb) + return NULL;
cl->deficit -= (int) len;
@@ -987,6 +990,8 @@ static void agg_dequeue(struct qfq_aggregate *agg, cl->deficit += agg->lmax; list_move_tail(&cl->alist, &agg->active); } + + return skb; }
static inline struct sk_buff *qfq_peek_skb(struct qfq_aggregate *agg, @@ -1132,11 +1137,18 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch) if (!skb) return NULL;
- qdisc_qstats_backlog_dec(sch, skb); sch->q.qlen--; + + skb = agg_dequeue(in_serv_agg, cl, len); + + if (!skb) { + sch->q.qlen++; + return NULL; + } + + qdisc_qstats_backlog_dec(sch, skb); qdisc_bstats_update(sch, skb);
- agg_dequeue(in_serv_agg, cl, len); /* If lmax is lowered, through qfq_change_class, for a class * owning pending packets with larger size than the new value * of lmax, then the following condition may hold.
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shigeru Yoshida syoshida@redhat.com
[ Upstream commit 6ad40b36cd3b04209e2d6c89d252c873d8082a59 ]
kcm_exit_net() should call mutex_destroy() on knet->mutex. This is especially needed if CONFIG_DEBUG_MUTEXES is enabled.
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Signed-off-by: Shigeru Yoshida syoshida@redhat.com Link: https://lore.kernel.org/r/20230902170708.1727999-1-syoshida@redhat.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/kcm/kcmsock.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c index 393f01b2a7e6d..4580f61426bb8 100644 --- a/net/kcm/kcmsock.c +++ b/net/kcm/kcmsock.c @@ -1859,6 +1859,8 @@ static __net_exit void kcm_exit_net(struct net *net) * that all multiplexors and psocks have been destroyed. */ WARN_ON(!list_empty(&knet->mux_list)); + + mutex_destroy(&knet->mutex); }
static struct pernet_operations kcm_net_ops = {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geetha sowjanya gakula@marvell.com
[ Upstream commit 29fe7a1b62717d58f033009874554d99d71f7d37 ]
The smq value used in the CN10K NIX AQ instruction enqueue mailbox handler was truncated to 9-bit value from 10-bit value because of typecasting the CN10K mbox request structure to the CN9K structure. Though this hasn't caused any problems when programming the NIX SQ context to the HW because the context structure is the same size. However, this causes a problem when accessing the structure parameters. This patch reads the right smq value for each platform.
Fixes: 30077d210c83 ("octeontx2-af: cn10k: Update NIX/NPA context structure") Signed-off-by: Geetha sowjanya gakula@marvell.com Signed-off-by: Sunil Kovvuri Goutham sgoutham@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/marvell/octeontx2/af/rvu_nix.c | 21 +++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c index c2f68678e947e..23c2f2ed2fb83 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c @@ -846,6 +846,21 @@ static int nix_aq_enqueue_wait(struct rvu *rvu, struct rvu_block *block, return 0; }
+static void nix_get_aq_req_smq(struct rvu *rvu, struct nix_aq_enq_req *req, + u16 *smq, u16 *smq_mask) +{ + struct nix_cn10k_aq_enq_req *aq_req; + + if (!is_rvu_otx2(rvu)) { + aq_req = (struct nix_cn10k_aq_enq_req *)req; + *smq = aq_req->sq.smq; + *smq_mask = aq_req->sq_mask.smq; + } else { + *smq = req->sq.smq; + *smq_mask = req->sq_mask.smq; + } +} + static int rvu_nix_blk_aq_enq_inst(struct rvu *rvu, struct nix_hw *nix_hw, struct nix_aq_enq_req *req, struct nix_aq_enq_rsp *rsp) @@ -857,6 +872,7 @@ static int rvu_nix_blk_aq_enq_inst(struct rvu *rvu, struct nix_hw *nix_hw, struct rvu_block *block; struct admin_queue *aq; struct rvu_pfvf *pfvf; + u16 smq, smq_mask; void *ctx, *mask; bool ena; u64 cfg; @@ -928,13 +944,14 @@ static int rvu_nix_blk_aq_enq_inst(struct rvu *rvu, struct nix_hw *nix_hw, if (rc) return rc;
+ nix_get_aq_req_smq(rvu, req, &smq, &smq_mask); /* Check if SQ pointed SMQ belongs to this PF/VF or not */ if (req->ctype == NIX_AQ_CTYPE_SQ && ((req->op == NIX_AQ_INSTOP_INIT && req->sq.ena) || (req->op == NIX_AQ_INSTOP_WRITE && - req->sq_mask.ena && req->sq_mask.smq && req->sq.ena))) { + req->sq_mask.ena && req->sq.ena && smq_mask))) { if (!is_valid_txschq(rvu, blkaddr, NIX_TXSCH_LVL_SMQ, - pcifunc, req->sq.smq)) + pcifunc, smq)) return NIX_AF_ERR_AQ_ENQUEUE; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Olga Zaborska olga.zaborska@intel.com
[ Upstream commit 5aa48279712e1f134aac908acde4df798955a955 ]
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx value between 64 and 80. All igc devices can use as low as 64 descriptors. This change will unify igc with other drivers. Based on commit 7b1be1987c1e ("e1000e: lower ring minimum size to 64")
Fixes: 0507ef8a0372 ("igc: Add transmit and receive fastpath and interrupt handlers") Signed-off-by: Olga Zaborska olga.zaborska@intel.com Tested-by: Naama Meir naamax.meir@linux.intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igc/igc.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h index 38901d2a46807..b4077c3f62ed1 100644 --- a/drivers/net/ethernet/intel/igc/igc.h +++ b/drivers/net/ethernet/intel/igc/igc.h @@ -368,11 +368,11 @@ static inline u32 igc_rss_type(const union igc_adv_rx_desc *rx_desc) /* TX/RX descriptor defines */ #define IGC_DEFAULT_TXD 256 #define IGC_DEFAULT_TX_WORK 128 -#define IGC_MIN_TXD 80 +#define IGC_MIN_TXD 64 #define IGC_MAX_TXD 4096
#define IGC_DEFAULT_RXD 256 -#define IGC_MIN_RXD 80 +#define IGC_MIN_RXD 64 #define IGC_MAX_RXD 4096
/* Supported Rx Buffer Sizes */
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Olga Zaborska olga.zaborska@intel.com
[ Upstream commit 8360717524a24a421c36ef8eb512406dbd42160a ]
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx value between 64 and 80. All igbvf devices can use as low as 64 descriptors. This change will unify igbvf with other drivers. Based on commit 7b1be1987c1e ("e1000e: lower ring minimum size to 64")
Fixes: d4e0fe01a38a ("igbvf: add new driver to support 82576 virtual functions") Signed-off-by: Olga Zaborska olga.zaborska@intel.com Tested-by: Rafal Romanowski rafal.romanowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igbvf/igbvf.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/igbvf/igbvf.h b/drivers/net/ethernet/intel/igbvf/igbvf.h index 57d39ee00b585..7b83678ba83a6 100644 --- a/drivers/net/ethernet/intel/igbvf/igbvf.h +++ b/drivers/net/ethernet/intel/igbvf/igbvf.h @@ -39,11 +39,11 @@ enum latency_range { /* Tx/Rx descriptor defines */ #define IGBVF_DEFAULT_TXD 256 #define IGBVF_MAX_TXD 4096 -#define IGBVF_MIN_TXD 80 +#define IGBVF_MIN_TXD 64
#define IGBVF_DEFAULT_RXD 256 #define IGBVF_MAX_RXD 4096 -#define IGBVF_MIN_RXD 80 +#define IGBVF_MIN_RXD 64
#define IGBVF_MIN_ITR_USECS 10 /* 100000 irq/sec */ #define IGBVF_MAX_ITR_USECS 10000 /* 100 irq/sec */
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Olga Zaborska olga.zaborska@intel.com
[ Upstream commit 6319685bdc8ad5310890add907b7c42f89302886 ]
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx value between 64 and 80. All igb devices can use as low as 64 descriptors. This change will unify igb with other drivers. Based on commit 7b1be1987c1e ("e1000e: lower ring minimum size to 64")
Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver") Signed-off-by: Olga Zaborska olga.zaborska@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igb/igb.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h index 015b781441149..a2b759531cb7b 100644 --- a/drivers/net/ethernet/intel/igb/igb.h +++ b/drivers/net/ethernet/intel/igb/igb.h @@ -34,11 +34,11 @@ struct igb_adapter; /* TX/RX descriptor defines */ #define IGB_DEFAULT_TXD 256 #define IGB_DEFAULT_TX_WORK 128 -#define IGB_MIN_TXD 80 +#define IGB_MIN_TXD 64 #define IGB_MAX_TXD 4096
#define IGB_DEFAULT_RXD 256 -#define IGB_MIN_RXD 80 +#define IGB_MIN_RXD 64 #define IGB_MAX_RXD 4096
#define IGB_DEFAULT_ITR 3 /* dynamic */
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
[ Upstream commit 6252f47b78031979ad919f971dc8468b893488bd ]
When dev_set_name() fails, zcdn_create() doesn't free the newly allocated resources. Do it.
Fixes: 00fab2350e6b ("s390/zcrypt: multiple zcrypt device nodes support") Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Link: https://lore.kernel.org/r/20230831110000.24279-1-andriy.shevchenko@linux.int... Signed-off-by: Harald Freudenberger freude@linux.ibm.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/s390/crypto/zcrypt_api.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/s390/crypto/zcrypt_api.c b/drivers/s390/crypto/zcrypt_api.c index 4b23c9f7f3e54..6b99f7dd06433 100644 --- a/drivers/s390/crypto/zcrypt_api.c +++ b/drivers/s390/crypto/zcrypt_api.c @@ -413,6 +413,7 @@ static int zcdn_create(const char *name) ZCRYPT_NAME "_%d", (int)MINOR(devt)); nodename[sizeof(nodename) - 1] = '\0'; if (dev_set_name(&zcdndev->device, nodename)) { + kfree(zcdndev); rc = -EINVAL; goto unlockout; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jerome Neanne jneanne@baylibre.com
[ Upstream commit ca0e36e3e39a4e8b5a4b647dff8c5938ca6ccbec ]
Random kernel crash detected in TI CICD when regulator driver is added. This is root caused to irq index increment being done twice causing irq_data being allocated outside of the range.
- Rework tps6594_request_reg_irqs with correct index increment - Adjust irq_data kmalloc size to the exact size needed for the device
This has been reported on TI mainline. No public bug report associated.
Reported-by: Udit Kumar u-kumar1@ti.com Fixes: f17ccc5deb4d ("regulator: tps6594-regulator: Add driver for TI TPS6594 regulators") Signed-off-by: Jerome Neanne jneanne@baylibre.com Link: https://lore.kernel.org/r/20230828-tps6594_random_boot_crash_fix-v1-1-f29cbf... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/tps6594-regulator.c | 31 +++++++++++++-------------- 1 file changed, 15 insertions(+), 16 deletions(-)
diff --git a/drivers/regulator/tps6594-regulator.c b/drivers/regulator/tps6594-regulator.c index d5a574ec6d12f..47c3b7efe145e 100644 --- a/drivers/regulator/tps6594-regulator.c +++ b/drivers/regulator/tps6594-regulator.c @@ -384,21 +384,19 @@ static int tps6594_request_reg_irqs(struct platform_device *pdev, if (irq < 0) return -EINVAL;
- irq_data[*irq_idx + j].dev = tps->dev; - irq_data[*irq_idx + j].type = irq_type; - irq_data[*irq_idx + j].rdev = rdev; + irq_data[*irq_idx].dev = tps->dev; + irq_data[*irq_idx].type = irq_type; + irq_data[*irq_idx].rdev = rdev;
error = devm_request_threaded_irq(tps->dev, irq, NULL, - tps6594_regulator_irq_handler, - IRQF_ONESHOT, - irq_type->irq_name, - &irq_data[*irq_idx]); - (*irq_idx)++; + tps6594_regulator_irq_handler, IRQF_ONESHOT, + irq_type->irq_name, &irq_data[*irq_idx]); if (error) { dev_err(tps->dev, "tps6594 failed to request %s IRQ %d: %d\n", irq_type->irq_name, irq, error); return error; } + (*irq_idx)++; } return 0; } @@ -420,8 +418,8 @@ static int tps6594_regulator_probe(struct platform_device *pdev) int error, i, irq, multi, delta; int irq_idx = 0; int buck_idx = 0; - int ext_reg_irq_nb = 2; - + size_t ext_reg_irq_nb = 2; + size_t reg_irq_nb; enum { MULTI_BUCK12, MULTI_BUCK123, @@ -484,15 +482,16 @@ static int tps6594_regulator_probe(struct platform_device *pdev) } }
- if (tps->chip_id == LP8764) + if (tps->chip_id == LP8764) { /* There is only 4 buck on LP8764 */ buck_configured[4] = 1; + reg_irq_nb = size_mul(REGS_INT_NB, (BUCK_NB - 1)); + } else { + reg_irq_nb = size_mul(REGS_INT_NB, (size_add(BUCK_NB, LDO_NB))); + }
- irq_data = devm_kmalloc_array(tps->dev, - REGS_INT_NB * sizeof(struct tps6594_regulator_irq_data), - ARRAY_SIZE(tps6594_bucks_irq_types) + - ARRAY_SIZE(tps6594_ldos_irq_types), - GFP_KERNEL); + irq_data = devm_kmalloc_array(tps->dev, reg_irq_nb, + sizeof(struct tps6594_regulator_irq_data), GFP_KERNEL); if (!irq_data) return -ENOMEM;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ariel Marcovitch arielmarcovitch@gmail.com
[ Upstream commit 2a15de80dd0f7e04a823291aa9eb49c5294f56af ]
The relevant parameter is 'start' and not 'nextid'
Fixes: 460488c58ca8 ("idr: Remove idr_alloc_ext") Signed-off-by: Ariel Marcovitch arielmarcovitch@gmail.com Signed-off-by: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Sasha Levin sashal@kernel.org --- lib/idr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/idr.c b/lib/idr.c index 7ecdfdb5309e7..13f2758c23773 100644 --- a/lib/idr.c +++ b/lib/idr.c @@ -100,7 +100,7 @@ EXPORT_SYMBOL_GPL(idr_alloc); * @end: The maximum ID (exclusive). * @gfp: Memory allocation flags. * - * Allocates an unused ID in the range specified by @nextid and @end. If + * Allocates an unused ID in the range specified by @start and @end. If * @end is <= 0, it is treated as one larger than %INT_MAX. This allows * callers to use @start + N as @end as long as N is within integer range. * The search for an unused ID will start at the last ID allocated and will
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 9b271ebaf9a2c5c566a54bc6cd915962e8241130 ]
syzbot/KCSAN reported data-races in iptunnel_xmit_stats() [1]
This can run from multiple cpus without mutual exclusion.
Adopt SMP safe DEV_STATS_INC() to update dev->stats fields.
[1] BUG: KCSAN: data-race in iptunnel_xmit / iptunnel_xmit
read-write to 0xffff8881353df170 of 8 bytes by task 30263 on cpu 1: iptunnel_xmit_stats include/net/ip_tunnels.h:493 [inline] iptunnel_xmit+0x432/0x4a0 net/ipv4/ip_tunnel_core.c:87 ip_tunnel_xmit+0x1477/0x1750 net/ipv4/ip_tunnel.c:831 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:662 __netdev_start_xmit include/linux/netdevice.h:4889 [inline] netdev_start_xmit include/linux/netdevice.h:4903 [inline] xmit_one net/core/dev.c:3544 [inline] dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3560 __dev_queue_xmit+0xeee/0x1de0 net/core/dev.c:4340 dev_queue_xmit include/linux/netdevice.h:3082 [inline] __bpf_tx_skb net/core/filter.c:2129 [inline] __bpf_redirect_no_mac net/core/filter.c:2159 [inline] __bpf_redirect+0x723/0x9c0 net/core/filter.c:2182 ____bpf_clone_redirect net/core/filter.c:2453 [inline] bpf_clone_redirect+0x16c/0x1d0 net/core/filter.c:2425 ___bpf_prog_run+0xd7d/0x41e0 kernel/bpf/core.c:1954 __bpf_prog_run512+0x74/0xa0 kernel/bpf/core.c:2195 bpf_dispatcher_nop_func include/linux/bpf.h:1181 [inline] __bpf_prog_run include/linux/filter.h:609 [inline] bpf_prog_run include/linux/filter.h:616 [inline] bpf_test_run+0x15d/0x3d0 net/bpf/test_run.c:423 bpf_prog_test_run_skb+0x77b/0xa00 net/bpf/test_run.c:1045 bpf_prog_test_run+0x265/0x3d0 kernel/bpf/syscall.c:3996 __sys_bpf+0x3af/0x780 kernel/bpf/syscall.c:5353 __do_sys_bpf kernel/bpf/syscall.c:5439 [inline] __se_sys_bpf kernel/bpf/syscall.c:5437 [inline] __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5437 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
read-write to 0xffff8881353df170 of 8 bytes by task 30249 on cpu 0: iptunnel_xmit_stats include/net/ip_tunnels.h:493 [inline] iptunnel_xmit+0x432/0x4a0 net/ipv4/ip_tunnel_core.c:87 ip_tunnel_xmit+0x1477/0x1750 net/ipv4/ip_tunnel.c:831 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:662 __netdev_start_xmit include/linux/netdevice.h:4889 [inline] netdev_start_xmit include/linux/netdevice.h:4903 [inline] xmit_one net/core/dev.c:3544 [inline] dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3560 __dev_queue_xmit+0xeee/0x1de0 net/core/dev.c:4340 dev_queue_xmit include/linux/netdevice.h:3082 [inline] __bpf_tx_skb net/core/filter.c:2129 [inline] __bpf_redirect_no_mac net/core/filter.c:2159 [inline] __bpf_redirect+0x723/0x9c0 net/core/filter.c:2182 ____bpf_clone_redirect net/core/filter.c:2453 [inline] bpf_clone_redirect+0x16c/0x1d0 net/core/filter.c:2425 ___bpf_prog_run+0xd7d/0x41e0 kernel/bpf/core.c:1954 __bpf_prog_run512+0x74/0xa0 kernel/bpf/core.c:2195 bpf_dispatcher_nop_func include/linux/bpf.h:1181 [inline] __bpf_prog_run include/linux/filter.h:609 [inline] bpf_prog_run include/linux/filter.h:616 [inline] bpf_test_run+0x15d/0x3d0 net/bpf/test_run.c:423 bpf_prog_test_run_skb+0x77b/0xa00 net/bpf/test_run.c:1045 bpf_prog_test_run+0x265/0x3d0 kernel/bpf/syscall.c:3996 __sys_bpf+0x3af/0x780 kernel/bpf/syscall.c:5353 __do_sys_bpf kernel/bpf/syscall.c:5439 [inline] __se_sys_bpf kernel/bpf/syscall.c:5437 [inline] __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5437 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x0000000000018830 -> 0x0000000000018831
Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 30249 Comm: syz-executor.4 Not tainted 6.5.0-syzkaller-11704-g3f86ed6ec0b3 #0
Fixes: 039f50629b7f ("ip_tunnel: Move stats update to iptunnel_xmit()") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/ip_tunnels.h | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h index ed4b6ad3fcac8..cd0e2744f66f3 100644 --- a/include/net/ip_tunnels.h +++ b/include/net/ip_tunnels.h @@ -482,15 +482,14 @@ static inline void iptunnel_xmit_stats(struct net_device *dev, int pkt_len) u64_stats_inc(&tstats->tx_packets); u64_stats_update_end(&tstats->syncp); put_cpu_ptr(tstats); + return; + } + + if (pkt_len < 0) { + DEV_STATS_INC(dev, tx_errors); + DEV_STATS_INC(dev, tx_aborted_errors); } else { - struct net_device_stats *err_stats = &dev->stats; - - if (pkt_len < 0) { - err_stats->tx_errors++; - err_stats->tx_aborted_errors++; - } else { - err_stats->tx_dropped++; - } + DEV_STATS_INC(dev, tx_dropped); } }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jianbo Liu jianbol@nvidia.com
[ Upstream commit b7558a77529fef60e7992f40fb5353fed8be0cf8 ]
In the cited commit, the mirred devices are recorded and checked while parsing the actions. In order to avoid system crash, the duplicate action in a single rule is not allowed.
But the rule is actually break down into several FTEs in different tables, for either mirroring, or the specified types of actions which use post action infrastructure.
It will reject certain action list by mistake, for example: actions:enp8s0f0_1,set(ipv4(ttl=63)),enp8s0f0_0,enp8s0f0_1. Here the rule is split to two FTEs because of pedit action.
To fix this issue, when parsing the rule actions, reset if_count to clear the mirred devices array if the rule is split to multiple FTEs, and then the duplicate checking is restarted.
Fixes: 554fe75c1b3f ("net/mlx5e: Avoid duplicating rule destinations") Signed-off-by: Jianbo Liu jianbol@nvidia.com Reviewed-by: Vlad Buslov vladbu@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c | 4 +++- drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/mirred.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/pedit.c | 4 +++- .../ethernet/mellanox/mlx5/core/en/tc/act/redirect_ingress.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/vlan.c | 1 + .../net/ethernet/mellanox/mlx5/core/en/tc/act/vlan_mangle.c | 4 +++- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 1 + 7 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c index 92d3952dfa8b7..feeb41693c176 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c @@ -17,8 +17,10 @@ tc_act_parse_ct(struct mlx5e_tc_act_parse_state *parse_state, if (err) return err;
- if (mlx5e_is_eswitch_flow(parse_state->flow)) + if (mlx5e_is_eswitch_flow(parse_state->flow)) { attr->esw_attr->split_count = attr->esw_attr->out_count; + parse_state->if_count = 0; + }
attr->flags |= MLX5_ATTR_FLAG_CT;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/mirred.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/mirred.c index 291193f7120d5..f63402c480280 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/mirred.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/mirred.c @@ -294,6 +294,7 @@ parse_mirred_ovs_master(struct mlx5e_tc_act_parse_state *parse_state, if (err) return err;
+ parse_state->if_count = 0; esw_attr->out_count++; return 0; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/pedit.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/pedit.c index 3b272bbf4c538..368a95fa77d32 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/pedit.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/pedit.c @@ -98,8 +98,10 @@ tc_act_parse_pedit(struct mlx5e_tc_act_parse_state *parse_state,
attr->action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
- if (ns_type == MLX5_FLOW_NAMESPACE_FDB) + if (ns_type == MLX5_FLOW_NAMESPACE_FDB) { esw_attr->split_count = esw_attr->out_count; + parse_state->if_count = 0; + }
return 0; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/redirect_ingress.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/redirect_ingress.c index ad09a8a5f36e0..2d1d4a04501b4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/redirect_ingress.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/redirect_ingress.c @@ -66,6 +66,7 @@ tc_act_parse_redirect_ingress(struct mlx5e_tc_act_parse_state *parse_state, if (err) return err;
+ parse_state->if_count = 0; esw_attr->out_count++;
return 0; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/vlan.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/vlan.c index c8a3eaf189f6a..a13c5e707b83c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/vlan.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/vlan.c @@ -166,6 +166,7 @@ tc_act_parse_vlan(struct mlx5e_tc_act_parse_state *parse_state, return err;
esw_attr->split_count = esw_attr->out_count; + parse_state->if_count = 0;
return 0; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/vlan_mangle.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/vlan_mangle.c index 310b992307607..f17575b09788d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/vlan_mangle.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/vlan_mangle.c @@ -65,8 +65,10 @@ tc_act_parse_vlan_mangle(struct mlx5e_tc_act_parse_state *parse_state, if (err) return err;
- if (ns_type == MLX5_FLOW_NAMESPACE_FDB) + if (ns_type == MLX5_FLOW_NAMESPACE_FDB) { attr->esw_attr->split_count = attr->esw_attr->out_count; + parse_state->if_count = 0; + }
return 0; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 31708d5aa6087..4b22a91482cec 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -3939,6 +3939,7 @@ parse_tc_actions(struct mlx5e_tc_act_parse_state *parse_state, }
i_split = i + 1; + parse_state->if_count = 0; list_add(&attr->list, &flow->attrs); }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Pirko jiri@nvidia.com
[ Upstream commit 9eca8bb8da4385b02bd02b6876af8d4225bf4713 ]
As esw_offloads_load/unload_rep() are used outside eswitch.c it is nicer for them to have "mlx5_" prefix. Add it.
Signed-off-by: Jiri Pirko jiri@nvidia.com Reviewed-by: Shay Drory shayd@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Stable-dep-of: 344134609a56 ("mlx5/core: E-Switch, Create ACL FT for eswitch manager in switchdev mode") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 4 ++-- drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 4 ++-- .../net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 10 +++++----- 3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index 243c455f10297..85ddf6f7e37df 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -1078,7 +1078,7 @@ int mlx5_eswitch_load_vport(struct mlx5_eswitch *esw, u16 vport_num, if (err) return err;
- err = esw_offloads_load_rep(esw, vport_num); + err = mlx5_esw_offloads_load_rep(esw, vport_num); if (err) goto err_rep;
@@ -1091,7 +1091,7 @@ int mlx5_eswitch_load_vport(struct mlx5_eswitch *esw, u16 vport_num,
void mlx5_eswitch_unload_vport(struct mlx5_eswitch *esw, u16 vport_num) { - esw_offloads_unload_rep(esw, vport_num); + mlx5_esw_offloads_unload_rep(esw, vport_num); mlx5_esw_vport_disable(esw, vport_num); }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h index ae0dc8a3060d7..040ed6d79258f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -725,8 +725,8 @@ void mlx5_esw_set_spec_source_port(struct mlx5_eswitch *esw, u16 vport, struct mlx5_flow_spec *spec);
-int esw_offloads_load_rep(struct mlx5_eswitch *esw, u16 vport_num); -void esw_offloads_unload_rep(struct mlx5_eswitch *esw, u16 vport_num); +int mlx5_esw_offloads_load_rep(struct mlx5_eswitch *esw, u16 vport_num); +void mlx5_esw_offloads_unload_rep(struct mlx5_eswitch *esw, u16 vport_num);
int mlx5_esw_offloads_rep_load(struct mlx5_eswitch *esw, u16 vport_num); void mlx5_esw_offloads_rep_unload(struct mlx5_eswitch *esw, u16 vport_num); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index e59380ee1ead3..1eb49784c0e11 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -2424,7 +2424,7 @@ void mlx5_esw_offloads_rep_unload(struct mlx5_eswitch *esw, u16 vport_num) __esw_offloads_unload_rep(esw, rep, rep_type); }
-int esw_offloads_load_rep(struct mlx5_eswitch *esw, u16 vport_num) +int mlx5_esw_offloads_load_rep(struct mlx5_eswitch *esw, u16 vport_num) { int err;
@@ -2448,7 +2448,7 @@ int esw_offloads_load_rep(struct mlx5_eswitch *esw, u16 vport_num) return err; }
-void esw_offloads_unload_rep(struct mlx5_eswitch *esw, u16 vport_num) +void mlx5_esw_offloads_unload_rep(struct mlx5_eswitch *esw, u16 vport_num) { if (esw->mode != MLX5_ESWITCH_OFFLOADS) return; @@ -3355,7 +3355,7 @@ int esw_offloads_enable(struct mlx5_eswitch *esw) vport->info.link_state = MLX5_VPORT_ADMIN_STATE_DOWN;
/* Uplink vport rep must load first. */ - err = esw_offloads_load_rep(esw, MLX5_VPORT_UPLINK); + err = mlx5_esw_offloads_load_rep(esw, MLX5_VPORT_UPLINK); if (err) goto err_uplink;
@@ -3366,7 +3366,7 @@ int esw_offloads_enable(struct mlx5_eswitch *esw) return 0;
err_vports: - esw_offloads_unload_rep(esw, MLX5_VPORT_UPLINK); + mlx5_esw_offloads_unload_rep(esw, MLX5_VPORT_UPLINK); err_uplink: esw_offloads_steering_cleanup(esw); err_steering_init: @@ -3404,7 +3404,7 @@ static int esw_offloads_stop(struct mlx5_eswitch *esw, void esw_offloads_disable(struct mlx5_eswitch *esw) { mlx5_eswitch_disable_pf_vf_vports(esw); - esw_offloads_unload_rep(esw, MLX5_VPORT_UPLINK); + mlx5_esw_offloads_unload_rep(esw, MLX5_VPORT_UPLINK); esw_set_passing_vport_metadata(esw, false); esw_offloads_steering_cleanup(esw); mapping_destroy(esw->offloads.reg_c0_obj_pool);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Pirko jiri@nvidia.com
[ Upstream commit 4c0dac1ef8abc6295a91197884f5ceb5d11c2bd9 ]
In order to prepare the devlink port registration function to be common for PFs/VFs and SFs, change the existing devlink port allocation and free functions into PF/VF init and cleanup, so similar helpers could be later on introduced for SFs. Make the init/cleanup helpers responsible for setting/clearing the vport->dl_port pointer.
Signed-off-by: Jiri Pirko jiri@nvidia.com Reviewed-by: Shay Drory shayd@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Stable-dep-of: 344134609a56 ("mlx5/core: E-Switch, Create ACL FT for eswitch manager in switchdev mode") Signed-off-by: Sasha Levin sashal@kernel.org --- .../mellanox/mlx5/core/esw/devlink_port.c | 65 ++++++++++++------- 1 file changed, 43 insertions(+), 22 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c index fdf2be548e855..463bde802e45e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c @@ -22,20 +22,17 @@ static bool mlx5_esw_devlink_port_supported(struct mlx5_eswitch *esw, u16 vport_ mlx5_core_is_ec_vf_vport(esw->dev, vport_num); }
-static struct devlink_port *mlx5_esw_dl_port_alloc(struct mlx5_eswitch *esw, u16 vport_num) +static void mlx5_esw_offloads_pf_vf_devlink_port_attrs_set(struct mlx5_eswitch *esw, + u16 vport_num, + struct devlink_port *dl_port) { struct mlx5_core_dev *dev = esw->dev; struct devlink_port_attrs attrs = {}; struct netdev_phys_item_id ppid = {}; - struct devlink_port *dl_port; u32 controller_num = 0; bool external; u16 pfnum;
- dl_port = kzalloc(sizeof(*dl_port), GFP_KERNEL); - if (!dl_port) - return NULL; - mlx5_esw_get_port_parent_id(dev, &ppid); pfnum = mlx5_get_dev_index(dev); external = mlx5_core_is_ecpf_esw_manager(dev); @@ -63,12 +60,40 @@ static struct devlink_port *mlx5_esw_dl_port_alloc(struct mlx5_eswitch *esw, u16 devlink_port_attrs_pci_vf_set(dl_port, 0, pfnum, vport_num - 1, false); } - return dl_port; }
-static void mlx5_esw_dl_port_free(struct devlink_port *dl_port) +static int mlx5_esw_offloads_pf_vf_devlink_port_init(struct mlx5_eswitch *esw, u16 vport_num) +{ + struct devlink_port *dl_port; + struct mlx5_vport *vport; + + if (!mlx5_esw_devlink_port_supported(esw, vport_num)) + return 0; + + vport = mlx5_eswitch_get_vport(esw, vport_num); + if (IS_ERR(vport)) + return PTR_ERR(vport); + + dl_port = kzalloc(sizeof(*dl_port), GFP_KERNEL); + if (!dl_port) + return -ENOMEM; + + mlx5_esw_offloads_pf_vf_devlink_port_attrs_set(esw, vport_num, dl_port); + + vport->dl_port = dl_port; + return 0; +} + +static void mlx5_esw_offloads_pf_vf_devlink_port_cleanup(struct mlx5_eswitch *esw, u16 vport_num) { - kfree(dl_port); + struct mlx5_vport *vport; + + vport = mlx5_eswitch_get_vport(esw, vport_num); + if (IS_ERR(vport) || !vport->dl_port) + return; + + kfree(vport->dl_port); + vport->dl_port = NULL; }
static const struct devlink_port_ops mlx5_esw_dl_port_ops = { @@ -89,16 +114,17 @@ int mlx5_esw_offloads_devlink_port_register(struct mlx5_eswitch *esw, u16 vport_ struct devlink *devlink; int err;
- if (!mlx5_esw_devlink_port_supported(esw, vport_num)) - return 0; - vport = mlx5_eswitch_get_vport(esw, vport_num); if (IS_ERR(vport)) return PTR_ERR(vport);
- dl_port = mlx5_esw_dl_port_alloc(esw, vport_num); + err = mlx5_esw_offloads_pf_vf_devlink_port_init(esw, vport_num); + if (err) + return err; + + dl_port = vport->dl_port; if (!dl_port) - return -ENOMEM; + return 0;
devlink = priv_to_devlink(dev); dl_port_index = mlx5_esw_vport_to_devlink_port_index(dev, vport_num); @@ -111,13 +137,12 @@ int mlx5_esw_offloads_devlink_port_register(struct mlx5_eswitch *esw, u16 vport_ if (err) goto rate_err;
- vport->dl_port = dl_port; return 0;
rate_err: devl_port_unregister(dl_port); reg_err: - mlx5_esw_dl_port_free(dl_port); + mlx5_esw_offloads_pf_vf_devlink_port_cleanup(esw, vport_num); return err; }
@@ -125,11 +150,8 @@ void mlx5_esw_offloads_devlink_port_unregister(struct mlx5_eswitch *esw, u16 vpo { struct mlx5_vport *vport;
- if (!mlx5_esw_devlink_port_supported(esw, vport_num)) - return; - vport = mlx5_eswitch_get_vport(esw, vport_num); - if (IS_ERR(vport)) + if (IS_ERR(vport) || !vport->dl_port) return;
if (vport->dl_port->devlink_rate) { @@ -138,8 +160,7 @@ void mlx5_esw_offloads_devlink_port_unregister(struct mlx5_eswitch *esw, u16 vpo }
devl_port_unregister(vport->dl_port); - mlx5_esw_dl_port_free(vport->dl_port); - vport->dl_port = NULL; + mlx5_esw_offloads_pf_vf_devlink_port_cleanup(esw, vport_num); }
struct devlink_port *mlx5_esw_offloads_devlink_port(struct mlx5_eswitch *esw, u16 vport_num)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Pirko jiri@nvidia.com
[ Upstream commit d9833bcfe840fff5d368b1c7c68e05c95be8d19c ]
In order to prepare for mlx5_esw_offloads_devlink_port_register/unregister() to be used for SFs as well, push out the PF/VF specific init/cleanup calls outside. Introduce mlx5_eswitch_load/unload_pf_vf_vport() and call them from there. Use these new helpers of PF/VF loading and make mlx5_eswitch_local/unload_vport() reusable for SFs.
Signed-off-by: Jiri Pirko jiri@nvidia.com Reviewed-by: Shay Drory shayd@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Stable-dep-of: 344134609a56 ("mlx5/core: E-Switch, Create ACL FT for eswitch manager in switchdev mode") Signed-off-by: Sasha Levin sashal@kernel.org --- .../mellanox/mlx5/core/esw/devlink_port.c | 13 ++---- .../net/ethernet/mellanox/mlx5/core/eswitch.c | 45 ++++++++++++++----- .../net/ethernet/mellanox/mlx5/core/eswitch.h | 4 ++ .../mellanox/mlx5/core/eswitch_offloads.c | 16 +++++++ 4 files changed, 58 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c index 463bde802e45e..2170539461fa2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c @@ -62,7 +62,7 @@ static void mlx5_esw_offloads_pf_vf_devlink_port_attrs_set(struct mlx5_eswitch * } }
-static int mlx5_esw_offloads_pf_vf_devlink_port_init(struct mlx5_eswitch *esw, u16 vport_num) +int mlx5_esw_offloads_pf_vf_devlink_port_init(struct mlx5_eswitch *esw, u16 vport_num) { struct devlink_port *dl_port; struct mlx5_vport *vport; @@ -84,7 +84,7 @@ static int mlx5_esw_offloads_pf_vf_devlink_port_init(struct mlx5_eswitch *esw, u return 0; }
-static void mlx5_esw_offloads_pf_vf_devlink_port_cleanup(struct mlx5_eswitch *esw, u16 vport_num) +void mlx5_esw_offloads_pf_vf_devlink_port_cleanup(struct mlx5_eswitch *esw, u16 vport_num) { struct mlx5_vport *vport;
@@ -118,10 +118,6 @@ int mlx5_esw_offloads_devlink_port_register(struct mlx5_eswitch *esw, u16 vport_ if (IS_ERR(vport)) return PTR_ERR(vport);
- err = mlx5_esw_offloads_pf_vf_devlink_port_init(esw, vport_num); - if (err) - return err; - dl_port = vport->dl_port; if (!dl_port) return 0; @@ -131,7 +127,7 @@ int mlx5_esw_offloads_devlink_port_register(struct mlx5_eswitch *esw, u16 vport_ err = devl_port_register_with_ops(devlink, dl_port, dl_port_index, &mlx5_esw_dl_port_ops); if (err) - goto reg_err; + return err;
err = devl_rate_leaf_create(dl_port, vport, NULL); if (err) @@ -141,8 +137,6 @@ int mlx5_esw_offloads_devlink_port_register(struct mlx5_eswitch *esw, u16 vport_
rate_err: devl_port_unregister(dl_port); -reg_err: - mlx5_esw_offloads_pf_vf_devlink_port_cleanup(esw, vport_num); return err; }
@@ -160,7 +154,6 @@ void mlx5_esw_offloads_devlink_port_unregister(struct mlx5_eswitch *esw, u16 vpo }
devl_port_unregister(vport->dl_port); - mlx5_esw_offloads_pf_vf_devlink_port_cleanup(esw, vport_num); }
struct devlink_port *mlx5_esw_offloads_devlink_port(struct mlx5_eswitch *esw, u16 vport_num) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index 85ddf6f7e37df..591184d892af6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -1095,6 +1095,31 @@ void mlx5_eswitch_unload_vport(struct mlx5_eswitch *esw, u16 vport_num) mlx5_esw_vport_disable(esw, vport_num); }
+static int mlx5_eswitch_load_pf_vf_vport(struct mlx5_eswitch *esw, u16 vport_num, + enum mlx5_eswitch_vport_event enabled_events) +{ + int err; + + err = mlx5_esw_offloads_init_pf_vf_rep(esw, vport_num); + if (err) + return err; + + err = mlx5_eswitch_load_vport(esw, vport_num, enabled_events); + if (err) + goto err_load; + return 0; + +err_load: + mlx5_esw_offloads_cleanup_pf_vf_rep(esw, vport_num); + return err; +} + +static void mlx5_eswitch_unload_pf_vf_vport(struct mlx5_eswitch *esw, u16 vport_num) +{ + mlx5_eswitch_unload_vport(esw, vport_num); + mlx5_esw_offloads_cleanup_pf_vf_rep(esw, vport_num); +} + void mlx5_eswitch_unload_vf_vports(struct mlx5_eswitch *esw, u16 num_vfs) { struct mlx5_vport *vport; @@ -1103,7 +1128,7 @@ void mlx5_eswitch_unload_vf_vports(struct mlx5_eswitch *esw, u16 num_vfs) mlx5_esw_for_each_vf_vport(esw, i, vport, num_vfs) { if (!vport->enabled) continue; - mlx5_eswitch_unload_vport(esw, vport->vport); + mlx5_eswitch_unload_pf_vf_vport(esw, vport->vport); } }
@@ -1116,7 +1141,7 @@ static void mlx5_eswitch_unload_ec_vf_vports(struct mlx5_eswitch *esw, mlx5_esw_for_each_ec_vf_vport(esw, i, vport, num_ec_vfs) { if (!vport->enabled) continue; - mlx5_eswitch_unload_vport(esw, vport->vport); + mlx5_eswitch_unload_pf_vf_vport(esw, vport->vport); } }
@@ -1128,7 +1153,7 @@ int mlx5_eswitch_load_vf_vports(struct mlx5_eswitch *esw, u16 num_vfs, int err;
mlx5_esw_for_each_vf_vport(esw, i, vport, num_vfs) { - err = mlx5_eswitch_load_vport(esw, vport->vport, enabled_events); + err = mlx5_eswitch_load_pf_vf_vport(esw, vport->vport, enabled_events); if (err) goto vf_err; } @@ -1148,7 +1173,7 @@ static int mlx5_eswitch_load_ec_vf_vports(struct mlx5_eswitch *esw, u16 num_ec_v int err;
mlx5_esw_for_each_ec_vf_vport(esw, i, vport, num_ec_vfs) { - err = mlx5_eswitch_load_vport(esw, vport->vport, enabled_events); + err = mlx5_eswitch_load_pf_vf_vport(esw, vport->vport, enabled_events); if (err) goto vf_err; } @@ -1190,7 +1215,7 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw, int ret;
/* Enable PF vport */ - ret = mlx5_eswitch_load_vport(esw, MLX5_VPORT_PF, enabled_events); + ret = mlx5_eswitch_load_pf_vf_vport(esw, MLX5_VPORT_PF, enabled_events); if (ret) return ret;
@@ -1201,7 +1226,7 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw,
/* Enable ECPF vport */ if (mlx5_ecpf_vport_exists(esw->dev)) { - ret = mlx5_eswitch_load_vport(esw, MLX5_VPORT_ECPF, enabled_events); + ret = mlx5_eswitch_load_pf_vf_vport(esw, MLX5_VPORT_ECPF, enabled_events); if (ret) goto ecpf_err; if (mlx5_core_ec_sriov_enabled(esw->dev)) { @@ -1224,11 +1249,11 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw, mlx5_eswitch_unload_ec_vf_vports(esw, esw->esw_funcs.num_ec_vfs); ec_vf_err: if (mlx5_ecpf_vport_exists(esw->dev)) - mlx5_eswitch_unload_vport(esw, MLX5_VPORT_ECPF); + mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_ECPF); ecpf_err: host_pf_disable_hca(esw->dev); pf_hca_err: - mlx5_eswitch_unload_vport(esw, MLX5_VPORT_PF); + mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_PF); return ret; }
@@ -1242,11 +1267,11 @@ void mlx5_eswitch_disable_pf_vf_vports(struct mlx5_eswitch *esw) if (mlx5_ecpf_vport_exists(esw->dev)) { if (mlx5_core_ec_sriov_enabled(esw->dev)) mlx5_eswitch_unload_ec_vf_vports(esw, esw->esw_funcs.num_vfs); - mlx5_eswitch_unload_vport(esw, MLX5_VPORT_ECPF); + mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_ECPF); }
host_pf_disable_hca(esw->dev); - mlx5_eswitch_unload_vport(esw, MLX5_VPORT_PF); + mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_PF); }
static void mlx5_eswitch_get_devlink_param(struct mlx5_eswitch *esw) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h index 040ed6d79258f..56d9a261a5c80 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -725,6 +725,8 @@ void mlx5_esw_set_spec_source_port(struct mlx5_eswitch *esw, u16 vport, struct mlx5_flow_spec *spec);
+int mlx5_esw_offloads_init_pf_vf_rep(struct mlx5_eswitch *esw, u16 vport_num); +void mlx5_esw_offloads_cleanup_pf_vf_rep(struct mlx5_eswitch *esw, u16 vport_num); int mlx5_esw_offloads_load_rep(struct mlx5_eswitch *esw, u16 vport_num); void mlx5_esw_offloads_unload_rep(struct mlx5_eswitch *esw, u16 vport_num);
@@ -739,6 +741,8 @@ int mlx5_eswitch_load_vf_vports(struct mlx5_eswitch *esw, u16 num_vfs, enum mlx5_eswitch_vport_event enabled_events); void mlx5_eswitch_unload_vf_vports(struct mlx5_eswitch *esw, u16 num_vfs);
+int mlx5_esw_offloads_pf_vf_devlink_port_init(struct mlx5_eswitch *esw, u16 vport_num); +void mlx5_esw_offloads_pf_vf_devlink_port_cleanup(struct mlx5_eswitch *esw, u16 vport_num); int mlx5_esw_offloads_devlink_port_register(struct mlx5_eswitch *esw, u16 vport_num); void mlx5_esw_offloads_devlink_port_unregister(struct mlx5_eswitch *esw, u16 vport_num); struct devlink_port *mlx5_esw_offloads_devlink_port(struct mlx5_eswitch *esw, u16 vport_num); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index 1eb49784c0e11..2f8bab73643e2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -2424,6 +2424,22 @@ void mlx5_esw_offloads_rep_unload(struct mlx5_eswitch *esw, u16 vport_num) __esw_offloads_unload_rep(esw, rep, rep_type); }
+int mlx5_esw_offloads_init_pf_vf_rep(struct mlx5_eswitch *esw, u16 vport_num) +{ + if (esw->mode != MLX5_ESWITCH_OFFLOADS) + return 0; + + return mlx5_esw_offloads_pf_vf_devlink_port_init(esw, vport_num); +} + +void mlx5_esw_offloads_cleanup_pf_vf_rep(struct mlx5_eswitch *esw, u16 vport_num) +{ + if (esw->mode != MLX5_ESWITCH_OFFLOADS) + return; + + mlx5_esw_offloads_pf_vf_devlink_port_cleanup(esw, vport_num); +} + int mlx5_esw_offloads_load_rep(struct mlx5_eswitch *esw, u16 vport_num) { int err;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bodong Wang bodong@nvidia.com
[ Upstream commit 344134609a564f28b3cc81ca6650319ccd5d8961 ]
ACL flow table is required in switchdev mode when metadata is enabled, driver creates such table when loading each vport. However, not every vport is loaded in switchdev mode. Such as ECPF if it's the eswitch manager. In this case, ACL flow table is still needed.
To make it modularized, create ACL flow table for eswitch manager as default and skip such operations when loading manager vport.
Also, there is no need to load the eswitch manager vport in switchdev mode. This means there is no need to load it on regular connect-x HCAs where the PF is the eswitch manager. This will avoid creating duplicate ACL flow table for host PF vport.
Fixes: 29bcb6e4fe70 ("net/mlx5e: E-Switch, Use metadata for vport matching in send-to-vport rules") Fixes: eb8e9fae0a22 ("mlx5/core: E-Switch, Allocate ECPF vport if it's an eswitch manager") Fixes: 5019833d661f ("net/mlx5: E-switch, Introduce helper function to enable/disable vports") Signed-off-by: Bodong Wang bodong@nvidia.com Reviewed-by: Mark Bloch mbloch@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/eswitch.c | 21 ++++++-- .../mellanox/mlx5/core/eswitch_offloads.c | 49 +++++++++++++------ 2 files changed, 51 insertions(+), 19 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index 591184d892af6..6e9b1b183190d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -1212,12 +1212,19 @@ int mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw, enum mlx5_eswitch_vport_event enabled_events) { + bool pf_needed; int ret;
+ pf_needed = mlx5_core_is_ecpf_esw_manager(esw->dev) || + esw->mode == MLX5_ESWITCH_LEGACY; + /* Enable PF vport */ - ret = mlx5_eswitch_load_pf_vf_vport(esw, MLX5_VPORT_PF, enabled_events); - if (ret) - return ret; + if (pf_needed) { + ret = mlx5_eswitch_load_pf_vf_vport(esw, MLX5_VPORT_PF, + enabled_events); + if (ret) + return ret; + }
/* Enable external host PF HCA */ ret = host_pf_enable_hca(esw->dev); @@ -1253,7 +1260,8 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw, ecpf_err: host_pf_disable_hca(esw->dev); pf_hca_err: - mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_PF); + if (pf_needed) + mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_PF); return ret; }
@@ -1271,7 +1279,10 @@ void mlx5_eswitch_disable_pf_vf_vports(struct mlx5_eswitch *esw) }
host_pf_disable_hca(esw->dev); - mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_PF); + + if (mlx5_core_is_ecpf_esw_manager(esw->dev) || + esw->mode == MLX5_ESWITCH_LEGACY) + mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_PF); }
static void mlx5_eswitch_get_devlink_param(struct mlx5_eswitch *esw) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index 2f8bab73643e2..1ad5a72dcc3fd 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -3092,26 +3092,47 @@ esw_vport_destroy_offloads_acl_tables(struct mlx5_eswitch *esw, esw_acl_ingress_ofld_cleanup(esw, vport); }
-static int esw_create_uplink_offloads_acl_tables(struct mlx5_eswitch *esw) +static int esw_create_offloads_acl_tables(struct mlx5_eswitch *esw) { - struct mlx5_vport *vport; + struct mlx5_vport *uplink, *manager; + int ret;
- vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_UPLINK); - if (IS_ERR(vport)) - return PTR_ERR(vport); + uplink = mlx5_eswitch_get_vport(esw, MLX5_VPORT_UPLINK); + if (IS_ERR(uplink)) + return PTR_ERR(uplink); + + ret = esw_vport_create_offloads_acl_tables(esw, uplink); + if (ret) + return ret; + + manager = mlx5_eswitch_get_vport(esw, esw->manager_vport); + if (IS_ERR(manager)) { + ret = PTR_ERR(manager); + goto err_manager; + }
- return esw_vport_create_offloads_acl_tables(esw, vport); + ret = esw_vport_create_offloads_acl_tables(esw, manager); + if (ret) + goto err_manager; + + return 0; + +err_manager: + esw_vport_destroy_offloads_acl_tables(esw, uplink); + return ret; }
-static void esw_destroy_uplink_offloads_acl_tables(struct mlx5_eswitch *esw) +static void esw_destroy_offloads_acl_tables(struct mlx5_eswitch *esw) { struct mlx5_vport *vport;
- vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_UPLINK); - if (IS_ERR(vport)) - return; + vport = mlx5_eswitch_get_vport(esw, esw->manager_vport); + if (!IS_ERR(vport)) + esw_vport_destroy_offloads_acl_tables(esw, vport);
- esw_vport_destroy_offloads_acl_tables(esw, vport); + vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_UPLINK); + if (!IS_ERR(vport)) + esw_vport_destroy_offloads_acl_tables(esw, vport); }
int mlx5_eswitch_reload_reps(struct mlx5_eswitch *esw) @@ -3156,7 +3177,7 @@ static int esw_offloads_steering_init(struct mlx5_eswitch *esw) } esw->fdb_table.offloads.indir = indir;
- err = esw_create_uplink_offloads_acl_tables(esw); + err = esw_create_offloads_acl_tables(esw); if (err) goto create_acl_err;
@@ -3197,7 +3218,7 @@ static int esw_offloads_steering_init(struct mlx5_eswitch *esw) create_restore_err: esw_destroy_offloads_table(esw); create_offloads_err: - esw_destroy_uplink_offloads_acl_tables(esw); + esw_destroy_offloads_acl_tables(esw); create_acl_err: mlx5_esw_indir_table_destroy(esw->fdb_table.offloads.indir); create_indir_err: @@ -3213,7 +3234,7 @@ static void esw_offloads_steering_cleanup(struct mlx5_eswitch *esw) esw_destroy_offloads_fdb_tables(esw); esw_destroy_restore_table(esw); esw_destroy_offloads_table(esw); - esw_destroy_uplink_offloads_acl_tables(esw); + esw_destroy_offloads_acl_tables(esw); mlx5_esw_indir_table_destroy(esw->fdb_table.offloads.indir); mutex_destroy(&esw->fdb_table.offloads.vports.lock); }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 954ad9bf13c4f95a4958b5f8433301f2ab99e1f5 ]
More careful measurement of the tc-cbs bandwidth shows that the stream bandwidth (effectively idleslope) increases, there is a larger and larger discrepancy between the rate limit obtained by the software Qdisc, and the rate limit obtained by its offloaded counterpart.
The discrepancy becomes so large, that e.g. at an idleslope of 40000 (40Mbps), the offloaded cbs does not actually rate limit anything, and traffic will pass at line rate through a 100 Mbps port.
The reason for the discrepancy is that the hardware documentation I've been following is incorrect. UM11040.pdf (for SJA1105P/Q/R/S) states about IDLE_SLOPE that it is "the rate (in unit of bytes/sec) at which the credit counter is increased".
Cross-checking with UM10944.pdf (for SJA1105E/T) and UM11107.pdf (for SJA1110), the wording is different: "This field specifies the value, in bytes per second times link speed, by which the credit counter is increased".
So there's an extra scaling for link speed that the driver is currently not accounting for, and apparently (empirically), that link speed is expressed in Kbps.
I've pondered whether to pollute the sja1105_mac_link_up() implementation with CBS shaper reprogramming, but I don't think it is worth it. IMO, the UAPI exposed by tc-cbs requires user space to recalculate the sendslope anyway, since the formula for that depends on port_transmit_rate (see man tc-cbs), which is not an invariant from tc's perspective.
So we use the offload->sendslope and offload->idleslope to deduce the original port_transmit_rate from the CBS formula, and use that value to scale the offload->sendslope and offload->idleslope to values that the hardware understands.
Some numerical data points:
40Mbps stream, max interfering frame size 1500, port speed 100M ---------------------------------------------------------------
tc-cbs parameters: idleslope 40000 sendslope -60000 locredit -900 hicredit 600
which result in hardware values:
Before (doesn't work) After (works) credit_hi 600 600 credit_lo 900 900 send_slope 7500000 75 idle_slope 5000000 50
40Mbps stream, max interfering frame size 1500, port speed 1G -------------------------------------------------------------
tc-cbs parameters: idleslope 40000 sendslope -960000 locredit -1440 hicredit 60
which result in hardware values:
Before (doesn't work) After (works) credit_hi 60 60 credit_lo 1440 1440 send_slope 120000000 120 idle_slope 5000000 5
5.12Mbps stream, max interfering frame size 1522, port speed 100M -----------------------------------------------------------------
tc-cbs parameters: idleslope 5120 sendslope -94880 locredit -1444 hicredit 77
which result in hardware values:
Before (doesn't work) After (works) credit_hi 77 77 credit_lo 1444 1444 send_slope 11860000 118 idle_slope 640000 6
Tested on SJA1105T, SJA1105S and SJA1110A, at 1Gbps and 100Mbps.
Fixes: 4d7525085a9b ("net: dsa: sja1105: offload the Credit-Based Shaper qdisc") Reported-by: Yanan Yang yanan.yang@nxp.com Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/sja1105/sja1105_main.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c index 3529a565b4aaf..fe7f09519653a 100644 --- a/drivers/net/dsa/sja1105/sja1105_main.c +++ b/drivers/net/dsa/sja1105/sja1105_main.c @@ -2157,6 +2157,7 @@ static int sja1105_setup_tc_cbs(struct dsa_switch *ds, int port, { struct sja1105_private *priv = ds->priv; struct sja1105_cbs_entry *cbs; + s64 port_transmit_rate_kbps; int index;
if (!offload->enable) @@ -2174,9 +2175,17 @@ static int sja1105_setup_tc_cbs(struct dsa_switch *ds, int port, */ cbs->credit_hi = offload->hicredit; cbs->credit_lo = abs(offload->locredit); - /* User space is in kbits/sec, hardware in bytes/sec */ - cbs->idle_slope = offload->idleslope * BYTES_PER_KBIT; - cbs->send_slope = abs(offload->sendslope * BYTES_PER_KBIT); + /* User space is in kbits/sec, while the hardware in bytes/sec times + * link speed. Since the given offload->sendslope is good only for the + * current link speed anyway, and user space is likely to reprogram it + * when that changes, don't even bother to track the port's link speed, + * but deduce the port transmit rate from idleslope - sendslope. + */ + port_transmit_rate_kbps = offload->idleslope - offload->sendslope; + cbs->idle_slope = div_s64(offload->idleslope * BYTES_PER_KBIT, + port_transmit_rate_kbps); + cbs->send_slope = div_s64(abs(offload->sendslope * BYTES_PER_KBIT), + port_transmit_rate_kbps); /* Convert the negative values from 64-bit 2's complement * to 32-bit 2's complement (for the case of 0x80000000 whose * negative is still negative).
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 894cafc5c62ccced758077bd4e970dc714c42637 ]
After running command [2] too many times in a row:
[1] $ tc qdisc add dev sw2p0 root handle 1: mqprio num_tc 8 \ map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0 [2] $ tc qdisc replace dev sw2p0 parent 1:1 cbs offload 1 \ idleslope 120000 sendslope -880000 locredit -1320 hicredit 180
(aka more than priv->info->num_cbs_shapers times)
we start seeing the following error message:
Error: Specified device failed to setup cbs hardware offload.
This comes from the fact that ndo_setup_tc(TC_SETUP_QDISC_CBS) presents the same API for the qdisc create and replace cases, and the sja1105 driver fails to distinguish between the 2. Thus, it always thinks that it must allocate the same shaper for a {port, queue} pair, when it may instead have to replace an existing one.
Fixes: 4d7525085a9b ("net: dsa: sja1105: offload the Credit-Based Shaper qdisc") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/sja1105/sja1105_main.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c index fe7f09519653a..e6d82ac106bee 100644 --- a/drivers/net/dsa/sja1105/sja1105_main.c +++ b/drivers/net/dsa/sja1105/sja1105_main.c @@ -2123,6 +2123,18 @@ static void sja1105_bridge_leave(struct dsa_switch *ds, int port,
#define BYTES_PER_KBIT (1000LL / 8)
+static int sja1105_find_cbs_shaper(struct sja1105_private *priv, + int port, int prio) +{ + int i; + + for (i = 0; i < priv->info->num_cbs_shapers; i++) + if (priv->cbs[i].port == port && priv->cbs[i].prio == prio) + return i; + + return -1; +} + static int sja1105_find_unused_cbs_shaper(struct sja1105_private *priv) { int i; @@ -2163,9 +2175,14 @@ static int sja1105_setup_tc_cbs(struct dsa_switch *ds, int port, if (!offload->enable) return sja1105_delete_cbs_shaper(priv, port, offload->queue);
- index = sja1105_find_unused_cbs_shaper(priv); - if (index < 0) - return -ENOSPC; + /* The user may be replacing an existing shaper */ + index = sja1105_find_cbs_shaper(priv, port, offload->queue); + if (index < 0) { + /* That isn't the case - see if we can allocate a new one */ + index = sja1105_find_unused_cbs_shaper(priv); + if (index < 0) + return -ENOSPC; + }
cbs = &priv->cbs[index]; cbs->port = port;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 180a7419fe4adc8d9c8e0ef0fd17bcdd0cf78acd ]
The blamed commit left this delta behind:
struct sja1105_cbs_entry { - u64 port; - u64 prio; + u64 port; /* Not used for SJA1110 */ + u64 prio; /* Not used for SJA1110 */ u64 credit_hi; u64 credit_lo; u64 send_slope; u64 idle_slope; };
but did not actually implement tc-cbs offload fully for the new switch. The offload is accepted, but it doesn't work.
The difference compared to earlier switch generations is that now, the table of CBS shapers is sparse, because there are many more shapers, so the mapping between a {port, prio} and a table index is static, rather than requiring us to store the port and prio into the sja1105_cbs_entry.
So, the problem is that the code programs the CBS shaper parameters at a dynamic table index which is incorrect.
All that needs to be done for SJA1110 CBS shapers to work is to bypass the logic which allocates shapers in a dense manner, as for SJA1105, and use the fixed mapping instead.
Fixes: 3e77e59bf8cf ("net: dsa: sja1105: add support for the SJA1110 switch family") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/sja1105/sja1105.h | 2 ++ drivers/net/dsa/sja1105/sja1105_main.c | 13 +++++++++++++ drivers/net/dsa/sja1105/sja1105_spi.c | 4 ++++ 3 files changed, 19 insertions(+)
diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h index dee35ba924ad2..0617d5ccd3ff1 100644 --- a/drivers/net/dsa/sja1105/sja1105.h +++ b/drivers/net/dsa/sja1105/sja1105.h @@ -132,6 +132,8 @@ struct sja1105_info { int max_frame_mem; int num_ports; bool multiple_cascade_ports; + /* Every {port, TXQ} has its own CBS shaper */ + bool fixed_cbs_mapping; enum dsa_tag_protocol tag_proto; const struct sja1105_dynamic_table_ops *dyn_ops; const struct sja1105_table_ops *static_ops; diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c index e6d82ac106bee..b6deba4a75121 100644 --- a/drivers/net/dsa/sja1105/sja1105_main.c +++ b/drivers/net/dsa/sja1105/sja1105_main.c @@ -2122,12 +2122,22 @@ static void sja1105_bridge_leave(struct dsa_switch *ds, int port, }
#define BYTES_PER_KBIT (1000LL / 8) +/* Port 0 (the uC port) does not have CBS shapers */ +#define SJA1110_FIXED_CBS(port, prio) ((((port) - 1) * SJA1105_NUM_TC) + (prio))
static int sja1105_find_cbs_shaper(struct sja1105_private *priv, int port, int prio) { int i;
+ if (priv->info->fixed_cbs_mapping) { + i = SJA1110_FIXED_CBS(port, prio); + if (i >= 0 && i < priv->info->num_cbs_shapers) + return i; + + return -1; + } + for (i = 0; i < priv->info->num_cbs_shapers; i++) if (priv->cbs[i].port == port && priv->cbs[i].prio == prio) return i; @@ -2139,6 +2149,9 @@ static int sja1105_find_unused_cbs_shaper(struct sja1105_private *priv) { int i;
+ if (priv->info->fixed_cbs_mapping) + return -1; + for (i = 0; i < priv->info->num_cbs_shapers; i++) if (!priv->cbs[i].idle_slope && !priv->cbs[i].send_slope) return i; diff --git a/drivers/net/dsa/sja1105/sja1105_spi.c b/drivers/net/dsa/sja1105/sja1105_spi.c index 5ce29c8057a41..834b5c1b4db0c 100644 --- a/drivers/net/dsa/sja1105/sja1105_spi.c +++ b/drivers/net/dsa/sja1105/sja1105_spi.c @@ -781,6 +781,7 @@ const struct sja1105_info sja1110a_info = { .tag_proto = DSA_TAG_PROTO_SJA1110, .can_limit_mcast_flood = true, .multiple_cascade_ports = true, + .fixed_cbs_mapping = true, .ptp_ts_bits = 32, .ptpegr_ts_bytes = 8, .max_frame_mem = SJA1110_MAX_FRAME_MEMORY, @@ -831,6 +832,7 @@ const struct sja1105_info sja1110b_info = { .tag_proto = DSA_TAG_PROTO_SJA1110, .can_limit_mcast_flood = true, .multiple_cascade_ports = true, + .fixed_cbs_mapping = true, .ptp_ts_bits = 32, .ptpegr_ts_bytes = 8, .max_frame_mem = SJA1110_MAX_FRAME_MEMORY, @@ -881,6 +883,7 @@ const struct sja1105_info sja1110c_info = { .tag_proto = DSA_TAG_PROTO_SJA1110, .can_limit_mcast_flood = true, .multiple_cascade_ports = true, + .fixed_cbs_mapping = true, .ptp_ts_bits = 32, .ptpegr_ts_bytes = 8, .max_frame_mem = SJA1110_MAX_FRAME_MEMORY, @@ -931,6 +934,7 @@ const struct sja1105_info sja1110d_info = { .tag_proto = DSA_TAG_PROTO_SJA1110, .can_limit_mcast_flood = true, .multiple_cascade_ports = true, + .fixed_cbs_mapping = true, .ptp_ts_bits = 32, .ptpegr_ts_bytes = 8, .max_frame_mem = SJA1110_MAX_FRAME_MEMORY,
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 1a961e74d5abbea049588a3d74b759955b4ed9d5 ]
sphinx complains about the use of "%PHYLINK_PCS_NEG_*":
Documentation/networking/kapi:144: ./include/linux/phylink.h:601: WARNING: Inline literal start-string without end-string. Documentation/networking/kapi:144: ./include/linux/phylink.h:633: WARNING: Inline literal start-string without end-string.
These are not valid symbols so drop the '%' prefix.
Alternatively we could use %PHYLINK_PCS_NEG_* (escape the *) or use normal literal ``PHYLINK_PCS_NEG_*`` but there is already a handful of un-adorned DEFINE_* in this file.
Fixes: f99d471afa03 ("net: phylink: add PCS negotiation mode") Reported-by: Stephen Rothwell sfr@canb.auug.org.au Link: https://lore.kernel.org/all/20230626162908.2f149f98@canb.auug.org.au/ Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Bagas Sanjaya bagasdotme@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/phylink.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 1817940a3418e..4dff1eba425ba 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -615,7 +615,7 @@ void pcs_get_state(struct phylink_pcs *pcs, * * The %neg_mode argument should be tested via the phylink_mode_*() family of * functions, or for PCS that set pcs->neg_mode true, should be tested - * against the %PHYLINK_PCS_NEG_* definitions. + * against the PHYLINK_PCS_NEG_* definitions. */ int pcs_config(struct phylink_pcs *pcs, unsigned int neg_mode, phy_interface_t interface, const unsigned long *advertising, @@ -645,7 +645,7 @@ void pcs_an_restart(struct phylink_pcs *pcs); * * The %mode argument should be tested via the phylink_mode_*() family of * functions, or for PCS that set pcs->neg_mode true, should be tested - * against the %PHYLINK_PCS_NEG_* definitions. + * against the PHYLINK_PCS_NEG_* definitions. */ void pcs_link_up(struct phylink_pcs *pcs, unsigned int neg_mode, phy_interface_t interface, int speed, int duplex);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
[ Upstream commit 7645629f7dc88cd777f98970134bf1a54c8d77e3 ]
If __bpf_prog_enter_sleepable_recur() detects recursion then it returns 0 without undoing rcu_read_lock_trace(), migrate_disable() or decrementing the recursion counter. This is fine in the JIT case because the JIT code will jump in the 0 case to the end and invoke the matching exit trampoline (__bpf_prog_exit_sleepable_recur()).
This is not the case in kern_sys_bpf() which returns directly to the caller with an error code.
Add __bpf_prog_exit_sleepable_recur() as clean up in the recursion case.
Fixes: b1d18a7574d0d ("bpf: Extend sys_bpf commands for bpf_syscall programs.") Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Jiri Olsa jolsa@kernel.org Link: https://lore.kernel.org/bpf/20230830080405.251926-2-bigeasy@linutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/syscall.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index a2aef900519c2..c925c270ed8b4 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -5307,6 +5307,7 @@ int kern_sys_bpf(int cmd, union bpf_attr *attr, unsigned int size) run_ctx.saved_run_ctx = NULL; if (!__bpf_prog_enter_sleepable_recur(prog, &run_ctx)) { /* recursion detected */ + __bpf_prog_exit_sleepable_recur(prog, 0, &run_ctx); bpf_prog_put(prog); return -EBUSY; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
[ Upstream commit 6764e767f4af1e35f87f3497e1182d945de37f93 ]
__bpf_prog_enter_recur() assigns bpf_tramp_run_ctx::saved_run_ctx before performing the recursion check which means in case of a recursion __bpf_prog_exit_recur() uses the previously set bpf_tramp_run_ctx::saved_run_ctx value.
__bpf_prog_enter_sleepable_recur() assigns bpf_tramp_run_ctx::saved_run_ctx after the recursion check which means in case of a recursion __bpf_prog_exit_sleepable_recur() uses an uninitialized value. This does not look right. If I read the entry trampoline code right, then bpf_tramp_run_ctx isn't initialized upfront.
Align __bpf_prog_enter_sleepable_recur() with __bpf_prog_enter_recur() and set bpf_tramp_run_ctx::saved_run_ctx before the recursion check is made. Remove the assignment of saved_run_ctx in kern_sys_bpf() since it happens a few cycles later.
Fixes: e384c7b7b46d0 ("bpf, x86: Create bpf_tramp_run_ctx on the caller thread's stack") Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Jiri Olsa jolsa@kernel.org Link: https://lore.kernel.org/bpf/20230830080405.251926-3-bigeasy@linutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/syscall.c | 1 - kernel/bpf/trampoline.c | 5 ++--- 2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index c925c270ed8b4..1480b6cf12f06 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -5304,7 +5304,6 @@ int kern_sys_bpf(int cmd, union bpf_attr *attr, unsigned int size) }
run_ctx.bpf_cookie = 0; - run_ctx.saved_run_ctx = NULL; if (!__bpf_prog_enter_sleepable_recur(prog, &run_ctx)) { /* recursion detected */ __bpf_prog_exit_sleepable_recur(prog, 0, &run_ctx); diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index 78acf28d48732..53ff50cac61ea 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -926,13 +926,12 @@ u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog, migrate_disable(); might_fault();
+ run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx); + if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) { bpf_prog_inc_misses_counter(prog); return 0; } - - run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx); - return bpf_prog_start_time(); }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilya Leoshkevich iii@linux.ibm.com
[ Upstream commit a192103a11465e9d517975c50f9944dc80e44d61 ]
s390x eBPF programs use the following extension to the s390x calling convention: tail call counter is passed on stack at offset STK_OFF_TCCNT, which callees otherwise use as scratch space.
Currently trampoline does not respect this and clobbers tail call counter. This breaks enforcing tail call limits in eBPF programs, which have trampolines attached to them.
Fix by forwarding a copy of the tail call counter to the original eBPF program in the trampoline (for fexit), and by restoring it at the end of the trampoline (for fentry).
Fixes: 528eb2cb87bc ("s390/bpf: Implement arch_prepare_bpf_trampoline()") Reported-by: Leon Hwang hffilwlqm@gmail.com Signed-off-by: Ilya Leoshkevich iii@linux.ibm.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20230906004448.111674-1-iii@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/net/bpf_jit_comp.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index 5e9371fbf3d5f..de2fb12120d2e 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -2088,6 +2088,7 @@ struct bpf_tramp_jit { */ int r14_off; /* Offset of saved %r14 */ int run_ctx_off; /* Offset of struct bpf_tramp_run_ctx */ + int tccnt_off; /* Offset of saved tailcall counter */ int do_fexit; /* do_fexit: label */ };
@@ -2258,12 +2259,16 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, tjit->r14_off = alloc_stack(tjit, sizeof(u64)); tjit->run_ctx_off = alloc_stack(tjit, sizeof(struct bpf_tramp_run_ctx)); + tjit->tccnt_off = alloc_stack(tjit, sizeof(u64)); /* The caller has already reserved STACK_FRAME_OVERHEAD bytes. */ tjit->stack_size -= STACK_FRAME_OVERHEAD; tjit->orig_stack_args_off = tjit->stack_size + STACK_FRAME_OVERHEAD;
/* aghi %r15,-stack_size */ EMIT4_IMM(0xa70b0000, REG_15, -tjit->stack_size); + /* mvc tccnt_off(4,%r15),stack_size+STK_OFF_TCCNT(%r15) */ + _EMIT6(0xd203f000 | tjit->tccnt_off, + 0xf000 | (tjit->stack_size + STK_OFF_TCCNT)); /* stmg %r2,%rN,fwd_reg_args_off(%r15) */ if (nr_reg_args) EMIT6_DISP_LH(0xeb000000, 0x0024, REG_2, @@ -2400,6 +2405,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, (nr_stack_args * sizeof(u64) - 1) << 16 | tjit->stack_args_off, 0xf000 | tjit->orig_stack_args_off); + /* mvc STK_OFF_TCCNT(4,%r15),tccnt_off(%r15) */ + _EMIT6(0xd203f000 | STK_OFF_TCCNT, 0xf000 | tjit->tccnt_off); /* lgr %r1,%r8 */ EMIT4(0xb9040000, REG_1, REG_8); /* %r1() */ @@ -2456,6 +2463,9 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, if (flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET)) EMIT6_DISP_LH(0xe3000000, 0x0004, REG_2, REG_0, REG_15, tjit->retval_off); + /* mvc stack_size+STK_OFF_TCCNT(4,%r15),tccnt_off(%r15) */ + _EMIT6(0xd203f000 | (tjit->stack_size + STK_OFF_TCCNT), + 0xf000 | tjit->tccnt_off); /* aghi %r15,stack_size */ EMIT4_IMM(0xa70b0000, REG_15, tjit->stack_size); /* Emit an expoline for the following indirect jump. */
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin KaFai Lau martin.lau@kernel.org
[ Upstream commit a96a44aba556c42b432929d37d60158aca21ad4c ]
'./test_progs -t test_local_storage' reported a splat:
[ 27.137569] ============================= [ 27.138122] [ BUG: Invalid wait context ] [ 27.138650] 6.5.0-03980-gd11ae1b16b0a #247 Tainted: G O [ 27.139542] ----------------------------- [ 27.140106] test_progs/1729 is trying to lock: [ 27.140713] ffff8883ef047b88 (stock_lock){-.-.}-{3:3}, at: local_lock_acquire+0x9/0x130 [ 27.141834] other info that might help us debug this: [ 27.142437] context-{5:5} [ 27.142856] 2 locks held by test_progs/1729: [ 27.143352] #0: ffffffff84bcd9c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x40 [ 27.144492] #1: ffff888107deb2c0 (&storage->lock){..-.}-{2:2}, at: bpf_local_storage_update+0x39e/0x8e0 [ 27.145855] stack backtrace: [ 27.146274] CPU: 0 PID: 1729 Comm: test_progs Tainted: G O 6.5.0-03980-gd11ae1b16b0a #247 [ 27.147550] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 27.149127] Call Trace: [ 27.149490] <TASK> [ 27.149867] dump_stack_lvl+0x130/0x1d0 [ 27.152609] dump_stack+0x14/0x20 [ 27.153131] __lock_acquire+0x1657/0x2220 [ 27.153677] lock_acquire+0x1b8/0x510 [ 27.157908] local_lock_acquire+0x29/0x130 [ 27.159048] obj_cgroup_charge+0xf4/0x3c0 [ 27.160794] slab_pre_alloc_hook+0x28e/0x2b0 [ 27.161931] __kmem_cache_alloc_node+0x51/0x210 [ 27.163557] __kmalloc+0xaa/0x210 [ 27.164593] bpf_map_kzalloc+0xbc/0x170 [ 27.165147] bpf_selem_alloc+0x130/0x510 [ 27.166295] bpf_local_storage_update+0x5aa/0x8e0 [ 27.167042] bpf_fd_sk_storage_update_elem+0xdb/0x1a0 [ 27.169199] bpf_map_update_value+0x415/0x4f0 [ 27.169871] map_update_elem+0x413/0x550 [ 27.170330] __sys_bpf+0x5e9/0x640 [ 27.174065] __x64_sys_bpf+0x80/0x90 [ 27.174568] do_syscall_64+0x48/0xa0 [ 27.175201] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 27.175932] RIP: 0033:0x7effb40e41ad [ 27.176357] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d8 [ 27.179028] RSP: 002b:00007ffe64c21fc8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141 [ 27.180088] RAX: ffffffffffffffda RBX: 00007ffe64c22768 RCX: 00007effb40e41ad [ 27.181082] RDX: 0000000000000020 RSI: 00007ffe64c22008 RDI: 0000000000000002 [ 27.182030] RBP: 00007ffe64c21ff0 R08: 0000000000000000 R09: 00007ffe64c22788 [ 27.183038] R10: 0000000000000064 R11: 0000000000000202 R12: 0000000000000000 [ 27.184006] R13: 00007ffe64c22788 R14: 00007effb42a1000 R15: 0000000000000000 [ 27.184958] </TASK>
It complains about acquiring a local_lock while holding a raw_spin_lock. It means it should not allocate memory while holding a raw_spin_lock since it is not safe for RT.
raw_spin_lock is needed because bpf_local_storage supports tracing context. In particular for task local storage, it is easy to get a "current" task PTR_TO_BTF_ID in tracing bpf prog. However, task (and cgroup) local storage has already been moved to bpf mem allocator which can be used after raw_spin_lock.
The splat is for the sk storage. For sk (and inode) storage, it has not been moved to bpf mem allocator. Using raw_spin_lock or not, kzalloc(GFP_ATOMIC) could theoretically be unsafe in tracing context. However, the local storage helper requires a verifier accepted sk pointer (PTR_TO_BTF_ID), it is hypothetical if that (mean running a bpf prog in a kzalloc unsafe context and also able to hold a verifier accepted sk pointer) could happen.
This patch avoids kzalloc after raw_spin_lock to silent the splat. There is an existing kzalloc before the raw_spin_lock. At that point, a kzalloc is very likely required because a lookup has just been done before. Thus, this patch always does the kzalloc before acquiring the raw_spin_lock and remove the later kzalloc usage after the raw_spin_lock. After this change, it will have a charge and then uncharge during the syscall bpf_map_update_elem() code path. This patch opts for simplicity and not continue the old optimization to save one charge and uncharge.
This issue is dated back to the very first commit of bpf_sk_storage which had been refactored multiple times to create task, inode, and cgroup storage. This patch uses a Fixes tag with a more recent commit that should be easier to do backport.
Fixes: b00fa38a9c1c ("bpf: Enable non-atomic allocations in local storage") Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20230901231129.578493-2-martin.lau@linux.dev Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/bpf_local_storage.c | 47 ++++++++++------------------------ 1 file changed, 14 insertions(+), 33 deletions(-)
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c index b5149cfce7d4d..37ad47d52dc55 100644 --- a/kernel/bpf/bpf_local_storage.c +++ b/kernel/bpf/bpf_local_storage.c @@ -553,7 +553,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, void *value, u64 map_flags, gfp_t gfp_flags) { struct bpf_local_storage_data *old_sdata = NULL; - struct bpf_local_storage_elem *selem = NULL; + struct bpf_local_storage_elem *alloc_selem, *selem = NULL; struct bpf_local_storage *local_storage; unsigned long flags; int err; @@ -607,11 +607,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, } }
- if (gfp_flags == GFP_KERNEL) { - selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags); - if (!selem) - return ERR_PTR(-ENOMEM); - } + /* A lookup has just been done before and concluded a new selem is + * needed. The chance of an unnecessary alloc is unlikely. + */ + alloc_selem = selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags); + if (!alloc_selem) + return ERR_PTR(-ENOMEM);
raw_spin_lock_irqsave(&local_storage->lock, flags);
@@ -623,13 +624,13 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, * simple. */ err = -EAGAIN; - goto unlock_err; + goto unlock; }
old_sdata = bpf_local_storage_lookup(local_storage, smap, false); err = check_flags(old_sdata, map_flags); if (err) - goto unlock_err; + goto unlock;
if (old_sdata && (map_flags & BPF_F_LOCK)) { copy_map_value_locked(&smap->map, old_sdata->data, value, @@ -638,23 +639,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, goto unlock; }
- if (gfp_flags != GFP_KERNEL) { - /* local_storage->lock is held. Hence, we are sure - * we can unlink and uncharge the old_sdata successfully - * later. Hence, instead of charging the new selem now - * and then uncharge the old selem later (which may cause - * a potential but unnecessary charge failure), avoid taking - * a charge at all here (the "!old_sdata" check) and the - * old_sdata will not be uncharged later during - * bpf_selem_unlink_storage_nolock(). - */ - selem = bpf_selem_alloc(smap, owner, value, !old_sdata, gfp_flags); - if (!selem) { - err = -ENOMEM; - goto unlock_err; - } - } - + alloc_selem = NULL; /* First, link the new selem to the map */ bpf_selem_link_map(smap, selem);
@@ -665,20 +650,16 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, if (old_sdata) { bpf_selem_unlink_map(SELEM(old_sdata)); bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata), - false, false); + true, false); }
unlock: raw_spin_unlock_irqrestore(&local_storage->lock, flags); - return SDATA(selem); - -unlock_err: - raw_spin_unlock_irqrestore(&local_storage->lock, flags); - if (selem) { + if (alloc_selem) { mem_uncharge(smap, owner, smap->elem_size); - bpf_selem_free(selem, smap, true); + bpf_selem_free(alloc_selem, smap, true); } - return ERR_PTR(err); + return err ? ERR_PTR(err) : SDATA(selem); }
static u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin KaFai Lau martin.lau@kernel.org
[ Upstream commit 55d49f750b1cb1f177fb1b00ae02cba4613bcfb7 ]
The commit c83597fa5dc6 ("bpf: Refactor some inode/task/sk storage functions for reuse"), refactored the bpf_{sk,task,inode}_storage_free() into bpf_local_storage_unlink_nolock() which then later renamed to bpf_local_storage_destroy(). The commit accidentally passed the "bool uncharge_mem = false" argument to bpf_selem_unlink_storage_nolock() which then stopped the uncharge from happening to the sk->sk_omem_alloc.
This missing uncharge only happens when the sk is going away (during __sk_destruct).
This patch fixes it by always passing "uncharge_mem = true". It is a noop to the task/inode/cgroup storage because they do not have the map_local_storage_(un)charge enabled in the map_ops. A followup patch will be done in bpf-next to remove the uncharge_mem argument.
A selftest is added in the next patch.
Fixes: c83597fa5dc6 ("bpf: Refactor some inode/task/sk storage functions for reuse") Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20230901231129.578493-3-martin.lau@linux.dev Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/bpf_local_storage.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c index 37ad47d52dc55..146824cc96893 100644 --- a/kernel/bpf/bpf_local_storage.c +++ b/kernel/bpf/bpf_local_storage.c @@ -760,7 +760,7 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage) * of the loop will set the free_cgroup_storage to true. */ free_storage = bpf_selem_unlink_storage_nolock( - local_storage, selem, false, true); + local_storage, selem, true, true); } raw_spin_unlock_irqrestore(&local_storage->lock, flags);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit fd94d9dadee58e09b49075240fe83423eb1dcd36 ]
If priv->len is a multiple of 4, then dst[len / 4] can write past the destination array which leads to stack corruption.
This construct is necessary to clean the remainder of the register in case ->len is NOT a multiple of the register size, so make it conditional just like nft_payload.c does.
The bug was added in 4.1 cycle and then copied/inherited when tcp/sctp and ip option support was added.
Bug reported by Zero Day Initiative project (ZDI-CAN-21950, ZDI-CAN-21951, ZDI-CAN-21961).
Fixes: 49499c3e6e18 ("netfilter: nf_tables: switch registers to 32 bit addressing") Fixes: 935b7f643018 ("netfilter: nft_exthdr: add TCP option matching") Fixes: 133dc203d77d ("netfilter: nft_exthdr: Support SCTP chunks") Fixes: dbb5281a1f84 ("netfilter: nf_tables: add support for matching IPv4 options") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_exthdr.c | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-)
diff --git a/net/netfilter/nft_exthdr.c b/net/netfilter/nft_exthdr.c index a9844eefedebc..3fbaa7bf41f9c 100644 --- a/net/netfilter/nft_exthdr.c +++ b/net/netfilter/nft_exthdr.c @@ -35,6 +35,14 @@ static unsigned int optlen(const u8 *opt, unsigned int offset) return opt[offset + 1]; }
+static int nft_skb_copy_to_reg(const struct sk_buff *skb, int offset, u32 *dest, unsigned int len) +{ + if (len % NFT_REG32_SIZE) + dest[len / NFT_REG32_SIZE] = 0; + + return skb_copy_bits(skb, offset, dest, len); +} + static void nft_exthdr_ipv6_eval(const struct nft_expr *expr, struct nft_regs *regs, const struct nft_pktinfo *pkt) @@ -56,8 +64,7 @@ static void nft_exthdr_ipv6_eval(const struct nft_expr *expr, } offset += priv->offset;
- dest[priv->len / NFT_REG32_SIZE] = 0; - if (skb_copy_bits(pkt->skb, offset, dest, priv->len) < 0) + if (nft_skb_copy_to_reg(pkt->skb, offset, dest, priv->len) < 0) goto err; return; err: @@ -153,8 +160,7 @@ static void nft_exthdr_ipv4_eval(const struct nft_expr *expr, } offset += priv->offset;
- dest[priv->len / NFT_REG32_SIZE] = 0; - if (skb_copy_bits(pkt->skb, offset, dest, priv->len) < 0) + if (nft_skb_copy_to_reg(pkt->skb, offset, dest, priv->len) < 0) goto err; return; err: @@ -210,7 +216,8 @@ static void nft_exthdr_tcp_eval(const struct nft_expr *expr, if (priv->flags & NFT_EXTHDR_F_PRESENT) { *dest = 1; } else { - dest[priv->len / NFT_REG32_SIZE] = 0; + if (priv->len % NFT_REG32_SIZE) + dest[priv->len / NFT_REG32_SIZE] = 0; memcpy(dest, opt + offset, priv->len); }
@@ -388,9 +395,8 @@ static void nft_exthdr_sctp_eval(const struct nft_expr *expr, offset + ntohs(sch->length) > pkt->skb->len) break;
- dest[priv->len / NFT_REG32_SIZE] = 0; - if (skb_copy_bits(pkt->skb, offset + priv->offset, - dest, priv->len) < 0) + if (nft_skb_copy_to_reg(pkt->skb, offset + priv->offset, + dest, priv->len) < 0) break; return; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wander Lairson Costa wander@redhat.com
[ Upstream commit f4f8a7803119005e87b716874bec07c751efafec ]
The opt_num field is controlled by user mode and is not currently validated inside the kernel. An attacker can take advantage of this to trigger an OOB read and potentially leak information.
BUG: KASAN: slab-out-of-bounds in nf_osf_match_one+0xbed/0xd10 net/netfilter/nfnetlink_osf.c:88 Read of size 2 at addr ffff88804bc64272 by task poc/6431
CPU: 1 PID: 6431 Comm: poc Not tainted 6.0.0-rc4 #1 Call Trace: nf_osf_match_one+0xbed/0xd10 net/netfilter/nfnetlink_osf.c:88 nf_osf_find+0x186/0x2f0 net/netfilter/nfnetlink_osf.c:281 nft_osf_eval+0x37f/0x590 net/netfilter/nft_osf.c:47 expr_call_ops_eval net/netfilter/nf_tables_core.c:214 nft_do_chain+0x2b0/0x1490 net/netfilter/nf_tables_core.c:264 nft_do_chain_ipv4+0x17c/0x1f0 net/netfilter/nft_chain_filter.c:23 [..]
Also add validation to genre, subtype and version fields.
Fixes: 11eeef41d5f6 ("netfilter: passive OS fingerprint xtables match") Reported-by: Lucas Leong wmliang@infosec.exchange Signed-off-by: Wander Lairson Costa wander@redhat.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nfnetlink_osf.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/net/netfilter/nfnetlink_osf.c b/net/netfilter/nfnetlink_osf.c index 8f1bfa6ccc2d9..50723ba082890 100644 --- a/net/netfilter/nfnetlink_osf.c +++ b/net/netfilter/nfnetlink_osf.c @@ -315,6 +315,14 @@ static int nfnl_osf_add_callback(struct sk_buff *skb,
f = nla_data(osf_attrs[OSF_ATTR_FINGER]);
+ if (f->opt_num > ARRAY_SIZE(f->opt)) + return -EINVAL; + + if (!memchr(f->genre, 0, MAXGENRELEN) || + !memchr(f->subtype, 0, MAXGENRELEN) || + !memchr(f->version, 0, MAXGENRELEN)) + return -EINVAL; + kf = kmalloc(sizeof(struct nf_osf_finger), GFP_KERNEL); if (!kf) return -ENOMEM;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 2ee52ae94baabf7ee09cf2a8d854b990dac5d0e4 ]
New elements in this transaction might expired before such transaction ends. Skip sync GC for such elements otherwise commit path might walk over an already released object. Once transaction is finished, async GC will collect such expired element.
Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_set_rbtree.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c index c6435e7092319..f250b5399344a 100644 --- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -312,6 +312,7 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set, struct nft_rbtree_elem *rbe, *rbe_le = NULL, *rbe_ge = NULL; struct rb_node *node, *next, *parent, **p, *first = NULL; struct nft_rbtree *priv = nft_set_priv(set); + u8 cur_genmask = nft_genmask_cur(net); u8 genmask = nft_genmask_next(net); int d, err;
@@ -357,8 +358,11 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set, if (!nft_set_elem_active(&rbe->ext, genmask)) continue;
- /* perform garbage collection to avoid bogus overlap reports. */ - if (nft_set_elem_expired(&rbe->ext)) { + /* perform garbage collection to avoid bogus overlap reports + * but skip new elements in this transaction. + */ + if (nft_set_elem_expired(&rbe->ext) && + nft_set_elem_active(&rbe->ext, cur_genmask)) { err = nft_rbtree_gc_elem(set, priv, rbe, genmask); if (err < 0) return err;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 9b5ba5c9c5109bf89dc64a3f4734bd125d1ce52e ]
Deliver audit log from __nf_tables_dump_rules(), table dereference at the end of the table list loop might point to the list head, leading to this crash.
[ 4137.407349] BUG: unable to handle page fault for address: 00000000001f3c50 [ 4137.407357] #PF: supervisor read access in kernel mode [ 4137.407359] #PF: error_code(0x0000) - not-present page [ 4137.407360] PGD 0 P4D 0 [ 4137.407363] Oops: 0000 [#1] PREEMPT SMP PTI [ 4137.407365] CPU: 4 PID: 500177 Comm: nft Not tainted 6.5.0+ #277 [ 4137.407369] RIP: 0010:string+0x49/0xd0 [ 4137.407374] Code: ff 77 36 45 89 d1 31 f6 49 01 f9 66 45 85 d2 75 19 eb 1e 49 39 f8 76 02 88 07 48 83 c7 01 83 c6 01 48 83 c2 01 4c 39 cf 74 07 <0f> b6 02 84 c0 75 e2 4c 89 c2 e9 58 e5 ff ff 48 c7 c0 0e b2 ff 81 [ 4137.407377] RSP: 0018:ffff8881179737f0 EFLAGS: 00010286 [ 4137.407379] RAX: 00000000001f2c50 RBX: ffff888117973848 RCX: ffff0a00ffffff04 [ 4137.407380] RDX: 00000000001f3c50 RSI: 0000000000000000 RDI: 0000000000000000 [ 4137.407381] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff [ 4137.407383] R10: ffffffffffffffff R11: ffff88813584d200 R12: 0000000000000000 [ 4137.407384] R13: ffffffffa15cf709 R14: 0000000000000000 R15: ffffffffa15cf709 [ 4137.407385] FS: 00007fcfc18bb580(0000) GS:ffff88840e700000(0000) knlGS:0000000000000000 [ 4137.407387] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4137.407388] CR2: 00000000001f3c50 CR3: 00000001055b2001 CR4: 00000000001706e0 [ 4137.407390] Call Trace: [ 4137.407392] <TASK> [ 4137.407393] ? __die+0x1b/0x60 [ 4137.407397] ? page_fault_oops+0x6b/0xa0 [ 4137.407399] ? exc_page_fault+0x60/0x120 [ 4137.407403] ? asm_exc_page_fault+0x22/0x30 [ 4137.407408] ? string+0x49/0xd0 [ 4137.407410] vsnprintf+0x257/0x4f0 [ 4137.407414] kvasprintf+0x3e/0xb0 [ 4137.407417] kasprintf+0x3e/0x50 [ 4137.407419] nf_tables_dump_rules+0x1c0/0x360 [nf_tables] [ 4137.407439] ? __alloc_skb+0xc3/0x170 [ 4137.407442] netlink_dump+0x170/0x330 [ 4137.407447] __netlink_dump_start+0x227/0x300 [ 4137.407449] nf_tables_getrule+0x205/0x390 [nf_tables]
Deliver audit log only once at the end of the rule dump+reset for consistency with the set dump+reset.
Ensure audit reset access to table under rcu read side lock. The table list iteration holds rcu read lock side, but recent audit code dereferences table object out of the rcu read lock side.
Fixes: ea078ae9108e ("netfilter: nf_tables: Audit log rule reset") Fixes: 7e9be1124dbe ("netfilter: nf_tables: Audit log setelem reset") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Acked-by: Phil Sutter phil@nwl.cc Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index cc70482b94907..a72934f00804e 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -3480,6 +3480,10 @@ static int __nf_tables_dump_rules(struct sk_buff *skb, cont_skip: (*idx)++; } + + if (reset && *idx) + audit_log_rule_reset(table, cb->seq, *idx); + return 0; }
@@ -3540,9 +3544,6 @@ static int nf_tables_dump_rules(struct sk_buff *skb, done: rcu_read_unlock();
- if (reset && idx > cb->args[0]) - audit_log_rule_reset(table, cb->seq, idx - cb->args[0]); - cb->args[0] = idx; return skb->len; } @@ -5757,8 +5758,6 @@ static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb) if (!args.iter.err && args.iter.count == cb->args[0]) args.iter.err = nft_set_catchall_dump(net, skb, set, reset, cb->seq); - rcu_read_unlock(); - nla_nest_end(skb, nest); nlmsg_end(skb, nlh);
@@ -5766,6 +5765,8 @@ static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb) audit_log_nft_set_reset(table, cb->seq, args.iter.count - args.iter.skip);
+ rcu_read_unlock(); + if (args.iter.err && args.iter.err != -EMSGSIZE) return args.iter.err; if (args.iter.count == cb->args[0])
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lukasz Majewski lukma@denx.de
[ Upstream commit 08c6d8bae48c2c28f7017d7b61b5d5a1518ceb39 ]
The KSZ9477 errata points out (in 'Module 4') the link up/down problems when EEE (Energy Efficient Ethernet) is enabled in the device to which the KSZ9477 tries to auto negotiate.
The suggested workaround is to clear advertisement of EEE for PHYs in this chip driver.
To avoid regressions with other switch ICs the new MICREL_NO_EEE flag has been introduced.
Moreover, the in-register disablement of MMD_DEVICE_ID_EEE_ADV.MMD_EEE_ADV MMD register is removed, as this code is both; now executed too late (after previous rework of the PHY and DSA for KSZ switches) and not required as setting all members of eee_broken_modes bit field prevents the KSZ9477 from advertising EEE.
Fixes: 69d3b36ca045 ("net: dsa: microchip: enable EEE support") # for KSZ9477 Signed-off-by: Lukasz Majewski lukma@denx.de Tested-by: Oleksij Rempel o.rempel@pengutronix.de # Confirmed disabled EEE with oscilloscope. Reviewed-by: Oleksij Rempel o.rempel@pengutronix.de Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Link: https://lore.kernel.org/r/20230905093315.784052-1-lukma@denx.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/microchip/ksz_common.c | 16 +++++++++++++++- drivers/net/phy/micrel.c | 9 ++++++--- include/linux/micrel_phy.h | 1 + 3 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c index 6c0623f88654e..d9d843efd111f 100644 --- a/drivers/net/dsa/microchip/ksz_common.c +++ b/drivers/net/dsa/microchip/ksz_common.c @@ -2337,13 +2337,27 @@ static u32 ksz_get_phy_flags(struct dsa_switch *ds, int port) { struct ksz_device *dev = ds->priv;
- if (dev->chip_id == KSZ8830_CHIP_ID) { + switch (dev->chip_id) { + case KSZ8830_CHIP_ID: /* Silicon Errata Sheet (DS80000830A): * Port 1 does not work with LinkMD Cable-Testing. * Port 1 does not respond to received PAUSE control frames. */ if (!port) return MICREL_KSZ8_P1_ERRATA; + break; + case KSZ9477_CHIP_ID: + /* KSZ9477 Errata DS80000754C + * + * Module 4: Energy Efficient Ethernet (EEE) feature select must + * be manually disabled + * The EEE feature is enabled by default, but it is not fully + * operational. It must be manually disabled through register + * controls. If not disabled, the PHY ports can auto-negotiate + * to enable EEE, and this feature can cause link drops when + * linked to another device supporting EEE. + */ + return MICREL_NO_EEE; }
return 0; diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index b6d7981b2d1ee..927d3d54658ef 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -1800,9 +1800,6 @@ static const struct ksz9477_errata_write ksz9477_errata_writes[] = { /* Transmit waveform amplitude can be improved (1000BASE-T, 100BASE-TX, 10BASE-Te) */ {0x1c, 0x04, 0x00d0},
- /* Energy Efficient Ethernet (EEE) feature select must be manually disabled */ - {0x07, 0x3c, 0x0000}, - /* Register settings are required to meet data sheet supply current specifications */ {0x1c, 0x13, 0x6eff}, {0x1c, 0x14, 0xe6ff}, @@ -1847,6 +1844,12 @@ static int ksz9477_config_init(struct phy_device *phydev) return err; }
+ /* According to KSZ9477 Errata DS80000754C (Module 4) all EEE modes + * in this switch shall be regarded as broken. + */ + if (phydev->dev_flags & MICREL_NO_EEE) + phydev->eee_broken_modes = -1; + err = genphy_restart_aneg(phydev); if (err) return err; diff --git a/include/linux/micrel_phy.h b/include/linux/micrel_phy.h index 322d872559847..4e27ca7c49def 100644 --- a/include/linux/micrel_phy.h +++ b/include/linux/micrel_phy.h @@ -44,6 +44,7 @@ #define MICREL_PHY_50MHZ_CLK BIT(0) #define MICREL_PHY_FXEN BIT(1) #define MICREL_KSZ8_P1_ERRATA BIT(2) +#define MICREL_NO_EEE BIT(3)
#define MICREL_KSZ9021_EXTREG_CTRL 0xB #define MICREL_KSZ9021_EXTREG_DATA_WRITE 0xC
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jian Shen shenjian15@huawei.com
[ Upstream commit 61a1deacc3d4fd3d57d7fda4d935f7f7503e8440 ]
Currently, the driver knocks the ring doorbell before updating the ring->last_to_use in tx flow. if the hardware transmiting packet and napi poll scheduling are fast enough, it may get the old ring->last_to_use in drivers' napi poll. In this case, the driver will think the tx is not completed, and return directly without clear the flag __QUEUE_STATE_STACK_XOFF, which may cause tx timeout.
Fixes: 20d06ca2679c ("net: hns3: optimize the tx clean process") Signed-off-by: Jian Shen shenjian15@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index b7b51e56b0308..71772213b4448 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -2102,8 +2102,12 @@ static void hns3_tx_doorbell(struct hns3_enet_ring *ring, int num, */ if (test_bit(HNS3_NIC_STATE_TX_PUSH_ENABLE, &priv->state) && num && !ring->pending_buf && num <= HNS3_MAX_PUSH_BD_NUM && doorbell) { + /* This smp_store_release() pairs with smp_load_aquire() in + * hns3_nic_reclaim_desc(). Ensure that the BD valid bit + * is updated. + */ + smp_store_release(&ring->last_to_use, ring->next_to_use); hns3_tx_push_bd(ring, num); - WRITE_ONCE(ring->last_to_use, ring->next_to_use); return; }
@@ -2114,6 +2118,11 @@ static void hns3_tx_doorbell(struct hns3_enet_ring *ring, int num, return; }
+ /* This smp_store_release() pairs with smp_load_aquire() in + * hns3_nic_reclaim_desc(). Ensure that the BD valid bit is updated. + */ + smp_store_release(&ring->last_to_use, ring->next_to_use); + if (ring->tqp->mem_base) hns3_tx_mem_doorbell(ring); else @@ -2121,7 +2130,6 @@ static void hns3_tx_doorbell(struct hns3_enet_ring *ring, int num, ring->tqp->io_base + HNS3_RING_TX_RING_TAIL_REG);
ring->pending_buf = 0; - WRITE_ONCE(ring->last_to_use, ring->next_to_use); }
static void hns3_tsyn(struct net_device *netdev, struct sk_buff *skb, @@ -3562,9 +3570,8 @@ static void hns3_reuse_buffer(struct hns3_enet_ring *ring, int i) static bool hns3_nic_reclaim_desc(struct hns3_enet_ring *ring, int *bytes, int *pkts, int budget) { - /* pair with ring->last_to_use update in hns3_tx_doorbell(), - * smp_store_release() is not used in hns3_tx_doorbell() because - * the doorbell operation already have the needed barrier operation. + /* This smp_load_acquire() pairs with smp_store_release() in + * hns3_tx_doorbell(). */ int ltu = smp_load_acquire(&ring->last_to_use); int ntc = ring->next_to_clean;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hao Chen chenhao418@huawei.com
[ Upstream commit efccf655e99b6907ca07a466924e91805892e7d3 ]
req1->tcam_data is defined as "u8 tcam_data[8]", and we convert it as (u32 *) without considerring byte order conversion, it may result in printing wrong data for tcam_data.
Convert tcam_data to (__le32 *) first to fix it.
Fixes: b5a0b70d77b9 ("net: hns3: refactor dump fd tcam of debugfs") Signed-off-by: Hao Chen chenhao418@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c index f01a7a9ee02ca..ff3f8f424ad90 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c @@ -1519,7 +1519,7 @@ static int hclge_dbg_fd_tcam_read(struct hclge_dev *hdev, bool sel_x, struct hclge_desc desc[3]; int pos = 0; int ret, i; - u32 *req; + __le32 *req;
hclge_cmd_setup_basic_desc(&desc[0], HCLGE_OPC_FD_TCAM_OP, true); desc[0].flag |= cpu_to_le16(HCLGE_COMM_CMD_FLAG_NEXT); @@ -1544,22 +1544,22 @@ static int hclge_dbg_fd_tcam_read(struct hclge_dev *hdev, bool sel_x, tcam_msg.loc);
/* tcam_data0 ~ tcam_data1 */ - req = (u32 *)req1->tcam_data; + req = (__le32 *)req1->tcam_data; for (i = 0; i < 2; i++) pos += scnprintf(tcam_buf + pos, HCLGE_DBG_TCAM_BUF_SIZE - pos, - "%08x\n", *req++); + "%08x\n", le32_to_cpu(*req++));
/* tcam_data2 ~ tcam_data7 */ - req = (u32 *)req2->tcam_data; + req = (__le32 *)req2->tcam_data; for (i = 0; i < 6; i++) pos += scnprintf(tcam_buf + pos, HCLGE_DBG_TCAM_BUF_SIZE - pos, - "%08x\n", *req++); + "%08x\n", le32_to_cpu(*req++));
/* tcam_data8 ~ tcam_data12 */ - req = (u32 *)req3->tcam_data; + req = (__le32 *)req3->tcam_data; for (i = 0; i < 5; i++) pos += scnprintf(tcam_buf + pos, HCLGE_DBG_TCAM_BUF_SIZE - pos, - "%08x\n", *req++); + "%08x\n", le32_to_cpu(*req++));
return ret; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hao Chen chenhao418@huawei.com
[ Upstream commit c295160b1d95e885f1af4586a221cb221d232d10 ]
Now in hns3_dbg_uninit(), there may be concurrency between kfree buffer and read, it may result in memory error.
Moving debugfs_remove_recursive() in front of kfree buffer to ensure they don't happen at the same time.
Fixes: 5e69ea7ee2a6 ("net: hns3: refactor the debugfs process") Signed-off-by: Hao Chen chenhao418@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c index f276b5ecb431f..26fb6fefcb9d9 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c @@ -1411,9 +1411,9 @@ int hns3_dbg_init(struct hnae3_handle *handle) return 0;
out: - mutex_destroy(&handle->dbgfs_lock); debugfs_remove_recursive(handle->hnae3_dbgfs); handle->hnae3_dbgfs = NULL; + mutex_destroy(&handle->dbgfs_lock); return ret; }
@@ -1421,6 +1421,9 @@ void hns3_dbg_uninit(struct hnae3_handle *handle) { u32 i;
+ debugfs_remove_recursive(handle->hnae3_dbgfs); + handle->hnae3_dbgfs = NULL; + for (i = 0; i < ARRAY_SIZE(hns3_dbg_cmd); i++) if (handle->dbgfs_buf[i]) { kvfree(handle->dbgfs_buf[i]); @@ -1428,8 +1431,6 @@ void hns3_dbg_uninit(struct hnae3_handle *handle) }
mutex_destroy(&handle->dbgfs_lock); - debugfs_remove_recursive(handle->hnae3_dbgfs); - handle->hnae3_dbgfs = NULL; }
void hns3_dbg_register_debugfs(const char *debugfs_dir_name)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jijie Shao shaojijie@huawei.com
[ Upstream commit fa5564945f7d15ae2390b00c08b6abaef0165cda ]
We hope that tc qdisc and dcb ets commands can not be used crosswise. If we want to use any of the commands to configure tc, We must use the other command to clear the existing configuration.
However, when we configure a single tc with tc qdisc, we can still configure it with dcb ets. Because we use mqprio_active as the tag of tc qdisc configuration, but with dcb ets, we do not check mqprio_active.
This patch fix this issue by check mqprio_active before executing the dcb ets command. and add dcb_ets_active to replace HCLGE_FLAG_DCB_ENABLE and HCLGE_FLAG_MQPRIO_ENABLE at the hclge layer,
Fixes: cacde272dd00 ("net: hns3: Add hclge_dcb module for the support of DCB feature") Signed-off-by: Jijie Shao shaojijie@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hnae3.h | 1 + .../hisilicon/hns3/hns3pf/hclge_dcb.c | 20 +++++-------------- .../hisilicon/hns3/hns3pf/hclge_main.c | 5 +++-- .../hisilicon/hns3/hns3pf/hclge_main.h | 2 -- 4 files changed, 9 insertions(+), 19 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h index a4b43bcd2f0c9..aaf1f42624a79 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h +++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h @@ -814,6 +814,7 @@ struct hnae3_tc_info { u8 max_tc; /* Total number of TCs */ u8 num_tc; /* Total number of enabled TCs */ bool mqprio_active; + bool dcb_ets_active; };
#define HNAE3_MAX_DSCP 64 diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c index fad5a5ff3cda5..b98301e205f7f 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c @@ -259,7 +259,7 @@ static int hclge_ieee_setets(struct hnae3_handle *h, struct ieee_ets *ets) int ret;
if (!(hdev->dcbx_cap & DCB_CAP_DCBX_VER_IEEE) || - hdev->flag & HCLGE_FLAG_MQPRIO_ENABLE) + h->kinfo.tc_info.mqprio_active) return -EINVAL;
ret = hclge_ets_validate(hdev, ets, &num_tc, &map_changed); @@ -275,10 +275,7 @@ static int hclge_ieee_setets(struct hnae3_handle *h, struct ieee_ets *ets) }
hclge_tm_schd_info_update(hdev, num_tc); - if (num_tc > 1) - hdev->flag |= HCLGE_FLAG_DCB_ENABLE; - else - hdev->flag &= ~HCLGE_FLAG_DCB_ENABLE; + h->kinfo.tc_info.dcb_ets_active = num_tc > 1;
ret = hclge_ieee_ets_to_tm_info(hdev, ets); if (ret) @@ -487,7 +484,7 @@ static u8 hclge_getdcbx(struct hnae3_handle *h) struct hclge_vport *vport = hclge_get_vport(h); struct hclge_dev *hdev = vport->back;
- if (hdev->flag & HCLGE_FLAG_MQPRIO_ENABLE) + if (h->kinfo.tc_info.mqprio_active) return 0;
return hdev->dcbx_cap; @@ -611,7 +608,8 @@ static int hclge_setup_tc(struct hnae3_handle *h, if (!test_bit(HCLGE_STATE_NIC_REGISTERED, &hdev->state)) return -EBUSY;
- if (hdev->flag & HCLGE_FLAG_DCB_ENABLE) + kinfo = &vport->nic.kinfo; + if (kinfo->tc_info.dcb_ets_active) return -EINVAL;
ret = hclge_mqprio_qopt_check(hdev, mqprio_qopt); @@ -625,7 +623,6 @@ static int hclge_setup_tc(struct hnae3_handle *h, if (ret) return ret;
- kinfo = &vport->nic.kinfo; memcpy(&old_tc_info, &kinfo->tc_info, sizeof(old_tc_info)); hclge_sync_mqprio_qopt(&kinfo->tc_info, mqprio_qopt); kinfo->tc_info.mqprio_active = tc > 0; @@ -634,13 +631,6 @@ static int hclge_setup_tc(struct hnae3_handle *h, if (ret) goto err_out;
- hdev->flag &= ~HCLGE_FLAG_DCB_ENABLE; - - if (tc > 1) - hdev->flag |= HCLGE_FLAG_MQPRIO_ENABLE; - else - hdev->flag &= ~HCLGE_FLAG_MQPRIO_ENABLE; - return hclge_notify_init_up(hdev);
err_out: diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 2d5a2e1ef664d..ce6b658a930cc 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -11026,6 +11026,7 @@ static void hclge_get_mdix_mode(struct hnae3_handle *handle,
static void hclge_info_show(struct hclge_dev *hdev) { + struct hnae3_handle *handle = &hdev->vport->nic; struct device *dev = &hdev->pdev->dev;
dev_info(dev, "PF info begin:\n"); @@ -11042,9 +11043,9 @@ static void hclge_info_show(struct hclge_dev *hdev) dev_info(dev, "This is %s PF\n", hdev->flag & HCLGE_FLAG_MAIN ? "main" : "not main"); dev_info(dev, "DCB %s\n", - hdev->flag & HCLGE_FLAG_DCB_ENABLE ? "enable" : "disable"); + handle->kinfo.tc_info.dcb_ets_active ? "enable" : "disable"); dev_info(dev, "MQPRIO %s\n", - hdev->flag & HCLGE_FLAG_MQPRIO_ENABLE ? "enable" : "disable"); + handle->kinfo.tc_info.mqprio_active ? "enable" : "disable"); dev_info(dev, "Default tx spare buffer size: %u\n", hdev->tx_spare_buf_size);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h index 8f76b568c1bf6..70319ce49a1d2 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h @@ -919,8 +919,6 @@ struct hclge_dev {
#define HCLGE_FLAG_MAIN BIT(0) #define HCLGE_FLAG_DCB_CAPABLE BIT(1) -#define HCLGE_FLAG_DCB_ENABLE BIT(2) -#define HCLGE_FLAG_MQPRIO_ENABLE BIT(3) u32 flag;
u32 pkt_buf_size; /* Total pf buf size for tx/rx */
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yisen Zhuang yisen.zhuang@huawei.com
[ Upstream commit 674d9591a32d01df75d6b5fffed4ef942a294376 ]
When sfp is absent or unidentified, the port type should be displayed as PORT_OTHERS, rather than PORT_FIBRE.
Fixes: 88d10bd6f730 ("net: hns3: add support for multiple media type") Signed-off-by: Yisen Zhuang yisen.zhuang@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c index 407d30ee55d2e..64858b3114ac9 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c @@ -773,7 +773,9 @@ static int hns3_get_link_ksettings(struct net_device *netdev, hns3_get_ksettings(h, cmd); break; case HNAE3_MEDIA_TYPE_FIBER: - if (module_type == HNAE3_MODULE_TYPE_CR) + if (module_type == HNAE3_MODULE_TYPE_UNKNOWN) + cmd->base.port = PORT_OTHER; + else if (module_type == HNAE3_MODULE_TYPE_CR) cmd->base.port = PORT_DA; else cmd->base.port = PORT_FIBRE;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jie Wang wangjie125@huawei.com
[ Upstream commit 60326634f6c54528778de18bfef1e8a7a93b3771 ]
HNS3 NIC does not support GSO partial packets segmentation. Actually tunnel packets for example NvGRE packets segment offload and checksum offload is already supported. There is no need to keep gso partial feature bit. So this patch removes it.
Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Jie Wang wangjie125@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index 71772213b4448..613d0a779cef2 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -3315,8 +3315,6 @@ static void hns3_set_default_feature(struct net_device *netdev)
netdev->priv_flags |= IFF_UNICAST_FLT;
- netdev->gso_partial_features |= NETIF_F_GSO_GRE_CSUM; - netdev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_RXCSUM | NETIF_F_SG | NETIF_F_GSO |
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 1b36955cc048c8ff6ba448dbf4be0e52f59f2963 ]
enetc_psi_create() returns an ERR_PTR() or a valid station interface pointer, but checking for the non-NULL quality of the return code blurs that difference away. So if enetc_psi_create() fails, we call enetc_psi_destroy() when we shouldn't. This will likely result in crashes, since enetc_psi_create() cleans up everything after itself when it returns an ERR_PTR().
Fixes: f0168042a212 ("net: enetc: reimplement RFS/RSS memory clearing as PCI quirk") Reported-by: Dan Carpenter dan.carpenter@linaro.org Closes: https://lore.kernel.org/netdev/582183ef-e03b-402b-8e2d-6d9bb3c83bd9@moroto.m... Suggested-by: Dan Carpenter dan.carpenter@linaro.org Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20230906141609.247579-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/freescale/enetc/enetc_pf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c index e0a4cb7e3f501..c153dc083aff0 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c @@ -1402,7 +1402,7 @@ static void enetc_fixup_clear_rss_rfs(struct pci_dev *pdev) return;
si = enetc_psi_create(pdev); - if (si) + if (!IS_ERR(si)) enetc_psi_destroy(pdev); } DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_FREESCALE, ENETC_DEV_ID_PF,
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Petr Tesarik petr.tesarik.ext@huawei.com
[ Upstream commit fb60211f377b69acffead3147578f86d0092a7a5 ]
In all these cases, the last argument to dma_declare_coherent_memory() is the buffer end address, but the expected value should be the size of the reserved region.
Fixes: 39fb993038e1 ("media: arch: sh: ap325rxa: Use new renesas-ceu camera driver") Fixes: c2f9b05fd5c1 ("media: arch: sh: ecovec: Use new renesas-ceu camera driver") Fixes: f3590dc32974 ("media: arch: sh: kfr2r09: Use new renesas-ceu camera driver") Fixes: 186c446f4b84 ("media: arch: sh: migor: Use new renesas-ceu camera driver") Fixes: 1a3c230b4151 ("media: arch: sh: ms7724se: Use new renesas-ceu camera driver") Signed-off-by: Petr Tesarik petr.tesarik.ext@huawei.com Reviewed-by: Geert Uytterhoeven geert+renesas@glider.be Reviewed-by: Jacopo Mondi jacopo.mondi@ideasonboard.com Reviewed-by: John Paul Adrian Glaubitz glaubitz@physik.fu-berlin.de Reviewed-by: Laurent Pinchart laurent.pinchart+renesas@ideasonboard.com Link: https://lore.kernel.org/r/20230724120742.2187-1-petrtesarik@huaweicloud.com Signed-off-by: John Paul Adrian Glaubitz glaubitz@physik.fu-berlin.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/sh/boards/mach-ap325rxa/setup.c | 2 +- arch/sh/boards/mach-ecovec24/setup.c | 6 ++---- arch/sh/boards/mach-kfr2r09/setup.c | 2 +- arch/sh/boards/mach-migor/setup.c | 2 +- arch/sh/boards/mach-se/7724/setup.c | 6 ++---- 5 files changed, 7 insertions(+), 11 deletions(-)
diff --git a/arch/sh/boards/mach-ap325rxa/setup.c b/arch/sh/boards/mach-ap325rxa/setup.c index 151792162152c..645cccf3da88e 100644 --- a/arch/sh/boards/mach-ap325rxa/setup.c +++ b/arch/sh/boards/mach-ap325rxa/setup.c @@ -531,7 +531,7 @@ static int __init ap325rxa_devices_setup(void) device_initialize(&ap325rxa_ceu_device.dev); dma_declare_coherent_memory(&ap325rxa_ceu_device.dev, ceu_dma_membase, ceu_dma_membase, - ceu_dma_membase + CEU_BUFFER_MEMORY_SIZE - 1); + CEU_BUFFER_MEMORY_SIZE);
platform_device_add(&ap325rxa_ceu_device);
diff --git a/arch/sh/boards/mach-ecovec24/setup.c b/arch/sh/boards/mach-ecovec24/setup.c index 674da7ebd8b7f..7ec03d4a4edf0 100644 --- a/arch/sh/boards/mach-ecovec24/setup.c +++ b/arch/sh/boards/mach-ecovec24/setup.c @@ -1454,15 +1454,13 @@ static int __init arch_setup(void) device_initialize(&ecovec_ceu_devices[0]->dev); dma_declare_coherent_memory(&ecovec_ceu_devices[0]->dev, ceu0_dma_membase, ceu0_dma_membase, - ceu0_dma_membase + - CEU_BUFFER_MEMORY_SIZE - 1); + CEU_BUFFER_MEMORY_SIZE); platform_device_add(ecovec_ceu_devices[0]);
device_initialize(&ecovec_ceu_devices[1]->dev); dma_declare_coherent_memory(&ecovec_ceu_devices[1]->dev, ceu1_dma_membase, ceu1_dma_membase, - ceu1_dma_membase + - CEU_BUFFER_MEMORY_SIZE - 1); + CEU_BUFFER_MEMORY_SIZE); platform_device_add(ecovec_ceu_devices[1]);
gpiod_add_lookup_table(&cn12_power_gpiod_table); diff --git a/arch/sh/boards/mach-kfr2r09/setup.c b/arch/sh/boards/mach-kfr2r09/setup.c index 20f4db778ed6a..c6d556dfbbbe6 100644 --- a/arch/sh/boards/mach-kfr2r09/setup.c +++ b/arch/sh/boards/mach-kfr2r09/setup.c @@ -603,7 +603,7 @@ static int __init kfr2r09_devices_setup(void) device_initialize(&kfr2r09_ceu_device.dev); dma_declare_coherent_memory(&kfr2r09_ceu_device.dev, ceu_dma_membase, ceu_dma_membase, - ceu_dma_membase + CEU_BUFFER_MEMORY_SIZE - 1); + CEU_BUFFER_MEMORY_SIZE);
platform_device_add(&kfr2r09_ceu_device);
diff --git a/arch/sh/boards/mach-migor/setup.c b/arch/sh/boards/mach-migor/setup.c index f60061283c482..773ee767d0c4e 100644 --- a/arch/sh/boards/mach-migor/setup.c +++ b/arch/sh/boards/mach-migor/setup.c @@ -604,7 +604,7 @@ static int __init migor_devices_setup(void) device_initialize(&migor_ceu_device.dev); dma_declare_coherent_memory(&migor_ceu_device.dev, ceu_dma_membase, ceu_dma_membase, - ceu_dma_membase + CEU_BUFFER_MEMORY_SIZE - 1); + CEU_BUFFER_MEMORY_SIZE);
platform_device_add(&migor_ceu_device);
diff --git a/arch/sh/boards/mach-se/7724/setup.c b/arch/sh/boards/mach-se/7724/setup.c index b60a2626e18b2..6495f93540654 100644 --- a/arch/sh/boards/mach-se/7724/setup.c +++ b/arch/sh/boards/mach-se/7724/setup.c @@ -940,15 +940,13 @@ static int __init devices_setup(void) device_initialize(&ms7724se_ceu_devices[0]->dev); dma_declare_coherent_memory(&ms7724se_ceu_devices[0]->dev, ceu0_dma_membase, ceu0_dma_membase, - ceu0_dma_membase + - CEU_BUFFER_MEMORY_SIZE - 1); + CEU_BUFFER_MEMORY_SIZE); platform_device_add(ms7724se_ceu_devices[0]);
device_initialize(&ms7724se_ceu_devices[1]->dev); dma_declare_coherent_memory(&ms7724se_ceu_devices[1]->dev, ceu1_dma_membase, ceu1_dma_membase, - ceu1_dma_membase + - CEU_BUFFER_MEMORY_SIZE - 1); + CEU_BUFFER_MEMORY_SIZE); platform_device_add(ms7724se_ceu_devices[1]);
return platform_add_devices(ms7724se_devices,
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Duoming Zhou duoming@zju.edu.cn
[ Upstream commit 246f80a0b17f8f582b2c0996db02998239057c65 ]
The original code puts flush_work() before timer_shutdown_sync() in switch_drv_remove(). Although we use flush_work() to stop the worker, it could be rescheduled in switch_timer(). As a result, a use-after-free bug can occur. The details are shown below:
(cpu 0) | (cpu 1) switch_drv_remove() | flush_work() | ... | switch_timer // timer | schedule_work(&psw->work) timer_shutdown_sync() | ... | switch_work_handler // worker kfree(psw) // free | | psw->state = 0 // use
This patch puts timer_shutdown_sync() before flush_work() to mitigate the bugs. As a result, the worker and timer will be stopped safely before the deallocate operations.
Fixes: 9f5e8eee5cfe ("sh: generic push-switch framework.") Signed-off-by: Duoming Zhou duoming@zju.edu.cn Reviewed-by: Geert Uytterhoeven geert+renesas@glider.be Reviewed-by: John Paul Adrian Glaubitz glaubitz@physik.fu-berlin.de Link: https://lore.kernel.org/r/20230802033737.9738-1-duoming@zju.edu.cn Signed-off-by: John Paul Adrian Glaubitz glaubitz@physik.fu-berlin.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/sh/drivers/push-switch.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/sh/drivers/push-switch.c b/arch/sh/drivers/push-switch.c index c95f48ff3f6fb..6ecba5f521eb6 100644 --- a/arch/sh/drivers/push-switch.c +++ b/arch/sh/drivers/push-switch.c @@ -101,8 +101,8 @@ static int switch_drv_remove(struct platform_device *pdev) device_remove_file(&pdev->dev, &dev_attr_switch);
platform_set_drvdata(pdev, NULL); - flush_work(&psw->work); timer_shutdown_sync(&psw->debounce); + flush_work(&psw->work); free_irq(irq, pdev);
kfree(psw);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masahiro Yamada masahiroy@kernel.org
commit 08700ec705043eb0cee01b35cf5b9d63f0230d12 upstream.
John David Anglin reported parisc has been broken since commit ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost").
Like ia64, parisc64 uses a function descriptor. The function references must be prefixed with P%.
Also, symbols prefixed $$ from the library have the symbol type STT_LOPROC instead of STT_FUNC. They should be handled as functions too.
Fixes: ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost") Reported-by: John David Anglin dave.anglin@bell.net Tested-by: John David Anglin dave.anglin@bell.net Tested-by: Helge Deller deller@gmx.de Closes: https://lore.kernel.org/linux-parisc/1901598a-e11d-f7dd-a5d9-9a69d06e6b6e@be... Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/export-internal.h | 2 ++ scripts/mod/modpost.c | 9 +++++++++ 2 files changed, 11 insertions(+)
--- a/include/linux/export-internal.h +++ b/include/linux/export-internal.h @@ -52,6 +52,8 @@
#ifdef CONFIG_IA64 #define KSYM_FUNC(name) @fptr(name) +#elif defined(CONFIG_PARISC) && defined(CONFIG_64BIT) +#define KSYM_FUNC(name) P%name #else #define KSYM_FUNC(name) name #endif --- a/scripts/mod/modpost.c +++ b/scripts/mod/modpost.c @@ -1226,6 +1226,15 @@ static void check_export_symbol(struct m */ s->is_func = (ELF_ST_TYPE(sym->st_info) == STT_FUNC);
+ /* + * For parisc64, symbols prefixed $$ from the library have the symbol type + * STT_LOPROC. They should be handled as functions too. + */ + if (elf->hdr->e_ident[EI_CLASS] == ELFCLASS64 && + elf->hdr->e_machine == EM_PARISC && + ELF_ST_TYPE(sym->st_info) == STT_LOPROC) + s->is_func = true; + if (match(secname, PATTERNS(INIT_SECTIONS))) warn("%s: %s: EXPORT_SYMBOL used for init symbol. Remove __init or EXPORT_SYMBOL.\n", mod->name, name);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florent CARLI fcarli@gmail.com
commit 6eb28a38f6478a650c7e76b2d6910669615d8a62 upstream.
This driver uses the WATCHDOG_CORE framework and ISA_BUS_API. This commit has these dependencies correctly selected.
Signed-off-by: Florent CARLI fcarli@gmail.com Co-authored-by: Yoann Congal yoann.congal@smile.fr Reviewed-by: Guenter Roeck linux@roeck-us.net Link: https://lore.kernel.org/r/20230721081347.52069-1-fcarli@gmail.com Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Wim Van Sebroeck wim@linux-watchdog.org Cc: Yoann Congal yoann.congal@smile.fr Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/watchdog/Kconfig | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/watchdog/Kconfig +++ b/drivers/watchdog/Kconfig @@ -1075,6 +1075,8 @@ config ADVANTECH_WDT config ADVANTECH_EC_WDT tristate "Advantech Embedded Controller Watchdog Timer" depends on X86 + select ISA_BUS_API + select WATCHDOG_CORE help This driver supports Advantech products with ITE based Embedded Controller. It does not support Advantech products with other ECs or without EC.
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fangzhi Zuo jerry.zuo@amd.com
commit 69a959610229ec31b534eaa5f6ec75965f321bed upstream.
Create MST colorsapce property for downstream device would trigger warning message "RIP: 0010:drm_mode_object_add+0x8e/0xa0 [drm]"
After driver is loaded and drm device is registered, create dp colorspace property triggers warning storm at WARN_ON(!dev->driver->load && dev->registered && !obj_free_cb);
Temporary disabling MST dp colorspace property for now.
Reviewed-by: Aurabindo Pillai aurabindo.pillai@amd.com Acked-by: Aurabindo Pillai aurabindo.pillai@amd.com Signed-off-by: Fangzhi Zuo jerry.zuo@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -7295,7 +7295,7 @@ void amdgpu_dm_connector_init_helper(str if (connector_type == DRM_MODE_CONNECTOR_HDMIA) { if (!drm_mode_create_hdmi_colorspace_property(&aconnector->base, supported_colorspaces)) drm_connector_attach_colorspace_property(&aconnector->base); - } else if (connector_type == DRM_MODE_CONNECTOR_DisplayPort || + } else if ((connector_type == DRM_MODE_CONNECTOR_DisplayPort && !aconnector->mst_root) || connector_type == DRM_MODE_CONNECTOR_eDP) { if (!drm_mode_create_dp_colorspace_property(&aconnector->base, supported_colorspaces)) drm_connector_attach_colorspace_property(&aconnector->base);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Kozlov pavel.kozlov@synopsys.com
commit 42f51fb24fd39cc547c086ab3d8a314cc603a91c upstream.
... to avoid unwanted gcc optimizations
SMP kernels fail to boot with commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask optimizations").
| | percpu: BUG: failure at mm/percpu.c:2981/pcpu_build_alloc_info()! |
The write operation performed by the SCOND instruction in the atomic inline asm code is not properly passed to the compiler. The compiler cannot correctly optimize a nested loop that runs through the cpumask in the pcpu_build_alloc_info() function.
Fix this by add a compiler barrier (memory clobber in inline asm).
Apparently atomic ops used to have memory clobber implicitly via surrounding smp_mb(). However commit b64be6836993c431e ("ARC: atomics: implement relaxed variants") removed the smp_mb() for the relaxed variants, but failed to add the explicit compiler barrier.
Link: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/135 Cc: stable@vger.kernel.org # v6.3+ Fixes: b64be6836993c43 ("ARC: atomics: implement relaxed variants") Signed-off-by: Pavel Kozlov pavel.kozlov@synopsys.com Signed-off-by: Vineet Gupta vgupta@kernel.org [vgupta: tweaked the changelog and added Fixes tag] Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arc/include/asm/atomic-llsc.h | 6 +++--- arch/arc/include/asm/atomic64-arcv2.h | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-)
--- a/arch/arc/include/asm/atomic-llsc.h +++ b/arch/arc/include/asm/atomic-llsc.h @@ -18,7 +18,7 @@ static inline void arch_atomic_##op(int : [val] "=&r" (val) /* Early clobber to prevent reg reuse */ \ : [ctr] "r" (&v->counter), /* Not "m": llock only supports reg direct addr mode */ \ [i] "ir" (i) \ - : "cc"); \ + : "cc", "memory"); \ } \
#define ATOMIC_OP_RETURN(op, asm_op) \ @@ -34,7 +34,7 @@ static inline int arch_atomic_##op##_ret : [val] "=&r" (val) \ : [ctr] "r" (&v->counter), \ [i] "ir" (i) \ - : "cc"); \ + : "cc", "memory"); \ \ return val; \ } @@ -56,7 +56,7 @@ static inline int arch_atomic_fetch_##op [orig] "=&r" (orig) \ : [ctr] "r" (&v->counter), \ [i] "ir" (i) \ - : "cc"); \ + : "cc", "memory"); \ \ return orig; \ } --- a/arch/arc/include/asm/atomic64-arcv2.h +++ b/arch/arc/include/asm/atomic64-arcv2.h @@ -60,7 +60,7 @@ static inline void arch_atomic64_##op(s6 " bnz 1b \n" \ : "=&r"(val) \ : "r"(&v->counter), "ir"(a) \ - : "cc"); \ + : "cc", "memory"); \ } \
#define ATOMIC64_OP_RETURN(op, op1, op2) \ @@ -77,7 +77,7 @@ static inline s64 arch_atomic64_##op##_r " bnz 1b \n" \ : [val] "=&r"(val) \ : "r"(&v->counter), "ir"(a) \ - : "cc"); /* memory clobber comes from smp_mb() */ \ + : "cc", "memory"); \ \ return val; \ } @@ -99,7 +99,7 @@ static inline s64 arch_atomic64_fetch_## " bnz 1b \n" \ : "=&r"(orig), "=&r"(val) \ : "r"(&v->counter), "ir"(a) \ - : "cc"); /* memory clobber comes from smp_mb() */ \ + : "cc", "memory"); \ \ return orig; \ }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Walter Chang walter.chang@mediatek.com
commit e7d65e40ab5a5940785c5922f317602d0268caaf upstream.
Due to the fact that the use of `writeq_relaxed()` to program CVAL is not guaranteed to be atomic, it is necessary to disable the timer before programming CVAL.
However, if the MMIO timer is already enabled and has not yet expired, there is a possibility of unexpected behavior occurring: when the CPU enters the idle state during this period, and if the CPU's local event is earlier than the broadcast event, the following process occurs:
tick_broadcast_enter() tick_broadcast_oneshot_control(TICK_BROADCAST_ENTER) __tick_broadcast_oneshot_control() ___tick_broadcast_oneshot_control() tick_broadcast_set_event() clockevents_program_event() set_next_event_mem()
During this process, the MMIO timer remains enabled while programming CVAL. To prevent such behavior, disable timer explicitly prior to programming CVAL.
Fixes: 8b82c4f883a7 ("clocksource/drivers/arm_arch_timer: Move MMIO timer programming over to CVAL") Cc: stable@vger.kernel.org Signed-off-by: Walter Chang walter.chang@mediatek.com Acked-by: Marc Zyngier maz@kernel.org Reviewed-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org Link: https://lore.kernel.org/r/20230717090735.19370-1-walter.chang@mediatek.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/clocksource/arm_arch_timer.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/clocksource/arm_arch_timer.c +++ b/drivers/clocksource/arm_arch_timer.c @@ -792,6 +792,13 @@ static __always_inline void set_next_eve u64 cnt;
ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk); + + /* Timer must be disabled before programming CVAL */ + if (ctrl & ARCH_TIMER_CTRL_ENABLE) { + ctrl &= ~ARCH_TIMER_CTRL_ENABLE; + arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk); + } + ctrl |= ARCH_TIMER_CTRL_ENABLE; ctrl &= ~ARCH_TIMER_CTRL_IT_MASK;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hien Huynh hien.huynh.px@renesas.com
commit c6ec8c83a29fb3aec3efa6fabbf5344498f57c7f upstream.
Before setting DDS and SDS values, we need to clear its value first otherwise, we get incorrect results when we change/update the DMA bus width several times due to the 'OR' expression.
Fixes: 5000d37042a6 ("dmaengine: sh: Add DMAC driver for RZ/G2L SoC") Cc: stable@kernel.org Signed-off-by: Hien Huynh hien.huynh.px@renesas.com Signed-off-by: Biju Das biju.das.jz@bp.renesas.com Reviewed-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://lore.kernel.org/r/20230706112150.198941-3-biju.das.jz@bp.renesas.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/sh/rz-dmac.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/drivers/dma/sh/rz-dmac.c +++ b/drivers/dma/sh/rz-dmac.c @@ -9,6 +9,7 @@ * Copyright 2012 Javier Martin, Vista Silicon javier.martin@vista-silicon.com */
+#include <linux/bitfield.h> #include <linux/dma-mapping.h> #include <linux/dmaengine.h> #include <linux/interrupt.h> @@ -145,8 +146,8 @@ struct rz_dmac { #define CHCFG_REQD BIT(3) #define CHCFG_SEL(bits) ((bits) & 0x07) #define CHCFG_MEM_COPY (0x80400008) -#define CHCFG_FILL_DDS(a) (((a) << 16) & GENMASK(19, 16)) -#define CHCFG_FILL_SDS(a) (((a) << 12) & GENMASK(15, 12)) +#define CHCFG_FILL_DDS_MASK GENMASK(19, 16) +#define CHCFG_FILL_SDS_MASK GENMASK(15, 12) #define CHCFG_FILL_TM(a) (((a) & BIT(5)) << 22) #define CHCFG_FILL_AM(a) (((a) & GENMASK(4, 2)) << 6) #define CHCFG_FILL_LVL(a) (((a) & BIT(1)) << 5) @@ -607,13 +608,15 @@ static int rz_dmac_config(struct dma_cha if (val == CHCFG_DS_INVALID) return -EINVAL;
- channel->chcfg |= CHCFG_FILL_DDS(val); + channel->chcfg &= ~CHCFG_FILL_DDS_MASK; + channel->chcfg |= FIELD_PREP(CHCFG_FILL_DDS_MASK, val);
val = rz_dmac_ds_to_val_mapping(config->src_addr_width); if (val == CHCFG_DS_INVALID) return -EINVAL;
- channel->chcfg |= CHCFG_FILL_SDS(val); + channel->chcfg &= ~CHCFG_FILL_SDS_MASK; + channel->chcfg |= FIELD_PREP(CHCFG_FILL_SDS_MASK, val);
return 0; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ekansh Gupta quic_ekangupt@quicinc.com
commit ada6c2d99aedd1eac2f633d03c652e070bc2ea74 upstream.
Remote heap is used by DSP audioPD on need basis. This memory is allocated from reserved CMA memory region and is then shared with audioPD to use it for it's functionality.
Current implementation of remote heap is not allocating the memory from CMA region, instead it is allocating the memory from SMMU context bank. The arguments passed to scm call for the reassignment of ownership is also not correct. Added changes to allocate CMA memory and have a proper ownership reassignment.
Fixes: 532ad70c6d44 ("misc: fastrpc: Add mmap request assigning for static PD pool") Cc: stable stable@kernel.org Tested-by: Ekansh Gupta quic_ekangupt@quicinc.com Signed-off-by: Ekansh Gupta quic_ekangupt@quicinc.com Signed-off-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org Link: https://lore.kernel.org/r/20230811115643.38578-2-srinivas.kandagatla@linaro.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/misc/fastrpc.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
--- a/drivers/misc/fastrpc.c +++ b/drivers/misc/fastrpc.c @@ -1871,7 +1871,11 @@ static int fastrpc_req_mmap(struct fastr return -EINVAL; }
- err = fastrpc_buf_alloc(fl, fl->sctx->dev, req.size, &buf); + if (req.flags == ADSP_MMAP_REMOTE_HEAP_ADDR) + err = fastrpc_remote_heap_alloc(fl, dev, req.size, &buf); + else + err = fastrpc_buf_alloc(fl, dev, req.size, &buf); + if (err) { dev_err(dev, "failed to allocate buffer\n"); return err; @@ -1910,12 +1914,8 @@ static int fastrpc_req_mmap(struct fastr
/* Add memory to static PD pool, protection thru hypervisor */ if (req.flags == ADSP_MMAP_REMOTE_HEAP_ADDR && fl->cctx->vmcount) { - struct qcom_scm_vmperm perm; - - perm.vmid = QCOM_SCM_VMID_HLOS; - perm.perm = QCOM_SCM_PERM_RWX; - err = qcom_scm_assign_mem(buf->phys, buf->size, - &fl->cctx->perms, &perm, 1); + err = qcom_scm_assign_mem(buf->phys, (u64)buf->size, + &fl->cctx->perms, fl->cctx->vmperms, fl->cctx->vmcount); if (err) { dev_err(fl->sctx->dev, "Failed to assign memory phys 0x%llx size 0x%llx err %d", buf->phys, buf->size, err);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ekansh Gupta quic_ekangupt@quicinc.com
commit a2cb9cd6a3949a3804ad9fd7da234892ce6719ec upstream.
Scatterlist table is obtained during map create request and the same table is used for DMA mapping unmap. In case there is any failure while getting the sg_table, ERR_PTR is returned instead of sg_table.
When the map is getting freed, there is only a non-NULL check of sg_table which will also be true in case failure was returned instead of sg_table. This would result in improper unmap request. Add proper check before setting map table to avoid bad unmap request.
Fixes: c68cfb718c8f ("misc: fastrpc: Add support for context Invoke method") Cc: stable stable@kernel.org Signed-off-by: Ekansh Gupta quic_ekangupt@quicinc.com Signed-off-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org Link: https://lore.kernel.org/r/20230811115643.38578-3-srinivas.kandagatla@linaro.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/misc/fastrpc.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/misc/fastrpc.c +++ b/drivers/misc/fastrpc.c @@ -756,6 +756,7 @@ static int fastrpc_map_create(struct fas { struct fastrpc_session_ctx *sess = fl->sctx; struct fastrpc_map *map = NULL; + struct sg_table *table; int err = 0;
if (!fastrpc_map_lookup(fl, fd, ppmap, true)) @@ -783,11 +784,12 @@ static int fastrpc_map_create(struct fas goto attach_err; }
- map->table = dma_buf_map_attachment_unlocked(map->attach, DMA_BIDIRECTIONAL); - if (IS_ERR(map->table)) { - err = PTR_ERR(map->table); + table = dma_buf_map_attachment_unlocked(map->attach, DMA_BIDIRECTIONAL); + if (IS_ERR(table)) { + err = PTR_ERR(table); goto map_err; } + map->table = table;
if (attr & FASTRPC_ATTR_SECUREMAP) { map->phys = sg_phys(map->table->sgl);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Yi yi.zhang@huawei.com
commit 373ac521799d9e97061515aca6ec6621789036bb upstream.
journal_clean_one_cp_list() has been merged into journal_shrink_one_cp_list(), but do chekpoint buffer cleanup from the committing process is just a best effort, it should stop scan once it meet a busy buffer, or else it will cause a lot of invalid buffer scan and checks. We catch a performance regression when doing fs_mark tests below.
Test cmd: ./fs_mark -d scratch -s 1024 -n 10000 -t 1 -D 100 -N 100
Before merging checkpoint buffer cleanup: FSUse% Count Size Files/sec App Overhead 95 10000 1024 8304.9 49033
After merging checkpoint buffer cleanup: FSUse% Count Size Files/sec App Overhead 95 10000 1024 7649.0 50012 FSUse% Count Size Files/sec App Overhead 95 10000 1024 2107.1 50871
After merging checkpoint buffer cleanup, the total loop count in journal_shrink_one_cp_list() could be up to 6,261,600+ (50,000+ ~ 100,000+ in general), most of them are invalid. This patch fix it through passing 'shrink_type' into journal_shrink_one_cp_list() and add a new 'SHRINK_BUSY_STOP' to indicate it should stop once meet a busy buffer. After fix, the loop count descending back to 10,000+.
After this fix: FSUse% Count Size Files/sec App Overhead 95 10000 1024 8558.4 49109
Cc: stable@kernel.org Fixes: b98dba273a0e ("jbd2: remove journal_clean_one_cp_list()") Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20230714025528.564988-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/jbd2/checkpoint.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-)
--- a/fs/jbd2/checkpoint.c +++ b/fs/jbd2/checkpoint.c @@ -349,6 +349,8 @@ int jbd2_cleanup_journal_tail(journal_t
/* Checkpoint list management */
+enum shrink_type {SHRINK_DESTROY, SHRINK_BUSY_STOP, SHRINK_BUSY_SKIP}; + /* * journal_shrink_one_cp_list * @@ -360,7 +362,8 @@ int jbd2_cleanup_journal_tail(journal_t * Called with j_list_lock held. */ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh, - bool destroy, bool *released) + enum shrink_type type, + bool *released) { struct journal_head *last_jh; struct journal_head *next_jh = jh; @@ -376,12 +379,15 @@ static unsigned long journal_shrink_one_ jh = next_jh; next_jh = jh->b_cpnext;
- if (destroy) { + if (type == SHRINK_DESTROY) { ret = __jbd2_journal_remove_checkpoint(jh); } else { ret = jbd2_journal_try_remove_checkpoint(jh); - if (ret < 0) - continue; + if (ret < 0) { + if (type == SHRINK_BUSY_SKIP) + continue; + break; + } }
nr_freed++; @@ -445,7 +451,7 @@ again: tid = transaction->t_tid;
freed = journal_shrink_one_cp_list(transaction->t_checkpoint_list, - false, &released); + SHRINK_BUSY_SKIP, &released); nr_freed += freed; (*nr_to_scan) -= min(*nr_to_scan, freed); if (*nr_to_scan == 0) @@ -485,19 +491,21 @@ out: void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy) { transaction_t *transaction, *last_transaction, *next_transaction; + enum shrink_type type; bool released;
transaction = journal->j_checkpoint_transactions; if (!transaction) return;
+ type = destroy ? SHRINK_DESTROY : SHRINK_BUSY_STOP; last_transaction = transaction->t_cpprev; next_transaction = transaction; do { transaction = next_transaction; next_transaction = transaction->t_cpnext; journal_shrink_one_cp_list(transaction->t_checkpoint_list, - destroy, &released); + type, &released); /* * This function only frees up some memory if possible so we * dont have an obligation to finish processing. Bail out if
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhihao Cheng chengzhihao1@huawei.com
commit 590a809ff743e7bd890ba5fb36bc38e20a36de53 upstream.
Following process will corrupt ext4 image: Step 1: jbd2_journal_commit_transaction __jbd2_journal_insert_checkpoint(jh, commit_transaction) // Put jh into trans1->t_checkpoint_list journal->j_checkpoint_transactions = commit_transaction // Put trans1 into journal->j_checkpoint_transactions
Step 2: do_get_write_access test_clear_buffer_dirty(bh) // clear buffer dirty,set jbd dirty __jbd2_journal_file_buffer(jh, transaction) // jh belongs to trans2
Step 3: drop_cache journal_shrink_one_cp_list jbd2_journal_try_remove_checkpoint if (!trylock_buffer(bh)) // lock bh, true if (buffer_dirty(bh)) // buffer is not dirty __jbd2_journal_remove_checkpoint(jh) // remove jh from trans1->t_checkpoint_list
Step 4: jbd2_log_do_checkpoint trans1 = journal->j_checkpoint_transactions // jh is not in trans1->t_checkpoint_list jbd2_cleanup_journal_tail(journal) // trans1 is done
Step 5: Power cut, trans2 is not committed, jh is lost in next mounting.
Fix it by checking 'jh->b_transaction' before remove it from checkpoint.
Cc: stable@kernel.org Fixes: 46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy") Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20230714025528.564988-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/jbd2/checkpoint.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/jbd2/checkpoint.c +++ b/fs/jbd2/checkpoint.c @@ -639,6 +639,8 @@ int jbd2_journal_try_remove_checkpoint(s { struct buffer_head *bh = jh2bh(jh);
+ if (jh->b_transaction) + return -EBUSY; if (!trylock_buffer(bh)) return -EBUSY; if (buffer_dirty(bh)) {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Yi yi.zhang@huawei.com
commit 2dfba3bb40ad8536b9fa802364f2d40da31aa88e upstream.
We got a filesystem inconsistency issue below while running generic/475 I/O failure pressure test with fast_commit feature enabled.
Symlink /p3/d3/d1c/d6c/dd6/dce/l101 (inode #132605) is invalid.
If fast_commit feature is enabled, a special fast_commit journal area is appended to the end of the normal journal area. The journal->j_last point to the first unused block behind the normal journal area instead of the whole log area, and the journal->j_fc_last point to the first unused block behind the fast_commit journal area. While doing journal recovery, do_one_pass(PASS_SCAN) should first scan the normal journal area and turn around to the first block once it meet journal->j_last, but the wrap() macro misuse the journal->j_fc_last, so the recovering could not read the next magic block (commit block perhaps) and would end early mistakenly and missing tN and every transaction after it in the following example. Finally, it could lead to filesystem inconsistency.
| normal journal area | fast commit area | +-------------------------------------------------+------------------+ | tN(rere) | tN+1 |~| tN-x |...| tN-1 | tN(front) | .... | +-------------------------------------------------+------------------+ / / / start journal->j_last journal->j_fc_last
This patch fix it by use the correct ending journal->j_last.
Fixes: 5b849b5f96b4 ("jbd2: fast commit recovery path") Cc: stable@kernel.org Reported-by: Theodore Ts'o tytso@mit.edu Link: https://lore.kernel.org/linux-ext4/20230613043120.GB1584772@mit.edu/ Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20230626073322.3956567-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/jbd2/recovery.c | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-)
--- a/fs/jbd2/recovery.c +++ b/fs/jbd2/recovery.c @@ -230,12 +230,8 @@ static int count_tags(journal_t *journal /* Make sure we wrap around the log correctly! */ #define wrap(journal, var) \ do { \ - unsigned long _wrap_last = \ - jbd2_has_feature_fast_commit(journal) ? \ - (journal)->j_fc_last : (journal)->j_last; \ - \ - if (var >= _wrap_last) \ - var -= (_wrap_last - (journal)->j_first); \ + if (var >= (journal)->j_last) \ + var -= ((journal)->j_last - (journal)->j_first); \ } while (0)
static int fc_do_one_pass(journal_t *journal, @@ -524,9 +520,7 @@ static int do_one_pass(journal_t *journa break;
jbd2_debug(2, "Scanning for sequence ID %u at %lu/%lu\n", - next_commit_ID, next_log_block, - jbd2_has_feature_fast_commit(journal) ? - journal->j_fc_last : journal->j_last); + next_commit_ID, next_log_block, journal->j_last);
/* Skip over each chunk of the transaction looking * either the next descriptor block or the final commit
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
commit 768d612f79822d30a1e7d132a4d4b05337ce42ec upstream.
Yikebaer reported an issue: ================================================================== BUG: KASAN: slab-use-after-free in ext4_es_insert_extent+0xc68/0xcb0 fs/ext4/extents_status.c:894 Read of size 4 at addr ffff888112ecc1a4 by task syz-executor/8438
CPU: 1 PID: 8438 Comm: syz-executor Not tainted 6.5.0-rc5 #1 Call Trace: [...] kasan_report+0xba/0xf0 mm/kasan/report.c:588 ext4_es_insert_extent+0xc68/0xcb0 fs/ext4/extents_status.c:894 ext4_map_blocks+0x92a/0x16f0 fs/ext4/inode.c:680 ext4_alloc_file_blocks.isra.0+0x2df/0xb70 fs/ext4/extents.c:4462 ext4_zero_range fs/ext4/extents.c:4622 [inline] ext4_fallocate+0x251c/0x3ce0 fs/ext4/extents.c:4721 [...]
Allocated by task 8438: [...] kmem_cache_zalloc include/linux/slab.h:693 [inline] __es_alloc_extent fs/ext4/extents_status.c:469 [inline] ext4_es_insert_extent+0x672/0xcb0 fs/ext4/extents_status.c:873 ext4_map_blocks+0x92a/0x16f0 fs/ext4/inode.c:680 ext4_alloc_file_blocks.isra.0+0x2df/0xb70 fs/ext4/extents.c:4462 ext4_zero_range fs/ext4/extents.c:4622 [inline] ext4_fallocate+0x251c/0x3ce0 fs/ext4/extents.c:4721 [...]
Freed by task 8438: [...] kmem_cache_free+0xec/0x490 mm/slub.c:3823 ext4_es_try_to_merge_right fs/ext4/extents_status.c:593 [inline] __es_insert_extent+0x9f4/0x1440 fs/ext4/extents_status.c:802 ext4_es_insert_extent+0x2ca/0xcb0 fs/ext4/extents_status.c:882 ext4_map_blocks+0x92a/0x16f0 fs/ext4/inode.c:680 ext4_alloc_file_blocks.isra.0+0x2df/0xb70 fs/ext4/extents.c:4462 ext4_zero_range fs/ext4/extents.c:4622 [inline] ext4_fallocate+0x251c/0x3ce0 fs/ext4/extents.c:4721 [...] ==================================================================
The flow of issue triggering is as follows: 1. remove es raw es es removed es1 |-------------------| -> |----|.......|------|
2. insert es es insert es1 merge with es es1 merge with es and free es1 |----|.......|------| -> |------------|------| -> |-------------------|
es merges with newes, then merges with es1, frees es1, then determines if es1->es_len is 0 and triggers a UAF.
The code flow is as follows: ext4_es_insert_extent es1 = __es_alloc_extent(true); es2 = __es_alloc_extent(true); __es_remove_extent(inode, lblk, end, NULL, es1) __es_insert_extent(inode, &newes, es1) ---> insert es1 to es tree __es_insert_extent(inode, &newes, es2) ext4_es_try_to_merge_right ext4_es_free_extent(inode, es1) ---> es1 is freed if (es1 && !es1->es_len) // Trigger UAF by determining if es1 is used.
We determine whether es1 or es2 is used immediately after calling __es_remove_extent() or __es_insert_extent() to avoid triggering a UAF if es1 or es2 is freed.
Reported-by: Yikebaer Aizezi yikebaer61@gmail.com Closes: https://lore.kernel.org/lkml/CALcu4raD4h9coiyEBL4Bm0zjDwxC2CyPiTwsP3zFuhot6y... Fixes: 2a69c450083d ("ext4: using nofail preallocation in ext4_es_insert_extent()") Cc: stable@kernel.org Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20230815070808.3377171-1-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/extents_status.c | 44 +++++++++++++++++++++++++++------------- 1 file changed, 30 insertions(+), 14 deletions(-)
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c index 9b5b8951afb4..6f7de14c0fa8 100644 --- a/fs/ext4/extents_status.c +++ b/fs/ext4/extents_status.c @@ -878,23 +878,29 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk, err1 = __es_remove_extent(inode, lblk, end, NULL, es1); if (err1 != 0) goto error; + /* Free preallocated extent if it didn't get used. */ + if (es1) { + if (!es1->es_len) + __es_free_extent(es1); + es1 = NULL; + }
err2 = __es_insert_extent(inode, &newes, es2); if (err2 == -ENOMEM && !ext4_es_must_keep(&newes)) err2 = 0; if (err2 != 0) goto error; + /* Free preallocated extent if it didn't get used. */ + if (es2) { + if (!es2->es_len) + __es_free_extent(es2); + es2 = NULL; + }
if (sbi->s_cluster_ratio > 1 && test_opt(inode->i_sb, DELALLOC) && (status & EXTENT_STATUS_WRITTEN || status & EXTENT_STATUS_UNWRITTEN)) __revise_pending(inode, lblk, len); - - /* es is pre-allocated but not used, free it. */ - if (es1 && !es1->es_len) - __es_free_extent(es1); - if (es2 && !es2->es_len) - __es_free_extent(es2); error: write_unlock(&EXT4_I(inode)->i_es_lock); if (err1 || err2) @@ -1491,8 +1497,12 @@ void ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk, */ write_lock(&EXT4_I(inode)->i_es_lock); err = __es_remove_extent(inode, lblk, end, &reserved, es); - if (es && !es->es_len) - __es_free_extent(es); + /* Free preallocated extent if it didn't get used. */ + if (es) { + if (!es->es_len) + __es_free_extent(es); + es = NULL; + } write_unlock(&EXT4_I(inode)->i_es_lock); if (err) goto retry; @@ -2047,19 +2057,25 @@ void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk, err1 = __es_remove_extent(inode, lblk, lblk, NULL, es1); if (err1 != 0) goto error; + /* Free preallocated extent if it didn't get used. */ + if (es1) { + if (!es1->es_len) + __es_free_extent(es1); + es1 = NULL; + }
err2 = __es_insert_extent(inode, &newes, es2); if (err2 != 0) goto error; + /* Free preallocated extent if it didn't get used. */ + if (es2) { + if (!es2->es_len) + __es_free_extent(es2); + es2 = NULL; + }
if (allocated) __insert_pending(inode, lblk); - - /* es is pre-allocated but not used, free it. */ - if (es1 && !es1->es_len) - __es_free_extent(es1); - if (es2 && !es2->es_len) - __es_free_extent(es2); error: write_unlock(&EXT4_I(inode)->i_es_lock); if (err1 || err2)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wang Jianjian wangjianjian0@foxmail.com
commit 68228da51c9a436872a4ef4b5a7692e29f7e5bc7 upstream.
When setup_system_zone, flex_bg is not initialized so it is always 1. Use a new helper function, ext4_num_base_meta_blocks() which does not depend on sbi->s_log_groups_per_flex being initialized.
[ Squashed two patches in the Link URL's below together into a single commit, which is simpler to review/understand. Also fix checkpatch warnings. --TYT ]
Cc: stable@kernel.org Signed-off-by: Wang Jianjian wangjianjian0@foxmail.com Link: https://lore.kernel.org/r/tencent_21AF0D446A9916ED5C51492CC6C9A0A77B05@qq.co... Link: https://lore.kernel.org/r/tencent_D744D1450CC169AEA77FCF0A64719909ED05@qq.co... Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/balloc.c | 15 +++++++++++---- fs/ext4/block_validity.c | 8 ++++---- fs/ext4/ext4.h | 2 ++ 3 files changed, 17 insertions(+), 8 deletions(-)
--- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -913,11 +913,11 @@ unsigned long ext4_bg_num_gdb(struct sup }
/* - * This function returns the number of file system metadata clusters at + * This function returns the number of file system metadata blocks at * the beginning of a block group, including the reserved gdt blocks. */ -static unsigned ext4_num_base_meta_clusters(struct super_block *sb, - ext4_group_t block_group) +unsigned int ext4_num_base_meta_blocks(struct super_block *sb, + ext4_group_t block_group) { struct ext4_sb_info *sbi = EXT4_SB(sb); unsigned num; @@ -935,8 +935,15 @@ static unsigned ext4_num_base_meta_clust } else { /* For META_BG_BLOCK_GROUPS */ num += ext4_bg_num_gdb_meta(sb, block_group); } - return EXT4_NUM_B2C(sbi, num); + return num; } + +static unsigned int ext4_num_base_meta_clusters(struct super_block *sb, + ext4_group_t block_group) +{ + return EXT4_NUM_B2C(EXT4_SB(sb), ext4_num_base_meta_blocks(sb, block_group)); +} + /** * ext4_inode_to_goal_block - return a hint for block allocation * @inode: inode for block allocation --- a/fs/ext4/block_validity.c +++ b/fs/ext4/block_validity.c @@ -215,7 +215,6 @@ int ext4_setup_system_zone(struct super_ struct ext4_system_blocks *system_blks; struct ext4_group_desc *gdp; ext4_group_t i; - int flex_size = ext4_flex_bg_size(sbi); int ret;
system_blks = kzalloc(sizeof(*system_blks), GFP_KERNEL); @@ -223,12 +222,13 @@ int ext4_setup_system_zone(struct super_ return -ENOMEM;
for (i=0; i < ngroups; i++) { + unsigned int meta_blks = ext4_num_base_meta_blocks(sb, i); + cond_resched(); - if (ext4_bg_has_super(sb, i) && - ((i < 5) || ((i % flex_size) == 0))) { + if (meta_blks != 0) { ret = add_system_zone(system_blks, ext4_group_first_block_no(sb, i), - ext4_bg_num_gdb(sb, i) + 1, 0); + meta_blks, 0); if (ret) goto err; } --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3084,6 +3084,8 @@ extern const char *ext4_decode_error(str extern void ext4_mark_group_bitmap_corrupted(struct super_block *sb, ext4_group_t block_group, unsigned int flags); +extern unsigned int ext4_num_base_meta_blocks(struct super_block *sb, + ext4_group_t block_group);
extern __printf(7, 8) void __ext4_error(struct super_block *, const char *, unsigned int, bool,
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luís Henriques lhenriques@suse.de
commit 7ca4b085f430f3774c3838b3da569ceccd6a0177 upstream.
If the filename casefolding fails, we'll be leaking memory from the fscrypt_name struct, namely from the 'crypto_buf.name' member.
Make sure we free it in the error path on both ext4_fname_setup_filename() and ext4_fname_prepare_lookup() functions.
Cc: stable@kernel.org Fixes: 1ae98e295fa2 ("ext4: optimize match for casefolded encrypted dirs") Signed-off-by: Luís Henriques lhenriques@suse.de Reviewed-by: Eric Biggers ebiggers@google.com Link: https://lore.kernel.org/r/20230803091713.13239-1-lhenriques@suse.de Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/crypto.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/fs/ext4/crypto.c +++ b/fs/ext4/crypto.c @@ -33,6 +33,8 @@ int ext4_fname_setup_filename(struct ino
#if IS_ENABLED(CONFIG_UNICODE) err = ext4_fname_setup_ci_filename(dir, iname, fname); + if (err) + ext4_fname_free_filename(fname); #endif return err; } @@ -51,6 +53,8 @@ int ext4_fname_prepare_lookup(struct ino
#if IS_ENABLED(CONFIG_UNICODE) err = ext4_fname_setup_ci_filename(dir, &dentry->d_name, fname); + if (err) + ext4_fname_free_filename(fname); #endif return err; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Brian Foster bfoster@redhat.com
commit 194505b55dd7899da114a4d47825204eefc0fff5 upstream.
The commit referenced below opened up concurrent unaligned dio under shared locking for pure overwrites. In doing so, it enabled use of the IOMAP_DIO_OVERWRITE_ONLY flag and added a warning on unexpected -EAGAIN returns as an extra precaution, since ext4 does not retry writes in such cases. The flag itself is advisory in this case since ext4 checks for unaligned I/Os and uses appropriate locking up front, rather than on a retry in response to -EAGAIN.
As it turns out, the warning check is susceptible to false positives because there are scenarios where -EAGAIN can be expected from lower layers without necessarily having IOCB_NOWAIT set on the iocb. For example, one instance of the warning has been seen where io_uring sets IOCB_HIPRI, which in turn results in REQ_POLLED|REQ_NOWAIT on the bio. This results in -EAGAIN if the block layer is unable to allocate a request, etc. [Note that there is an outstanding patch to untangle REQ_POLLED and REQ_NOWAIT such that the latter relies on IOCB_NOWAIT, which would also address this instance of the warning.]
Another instance of the warning has been reproduced by syzbot. A dio write is interrupted down in __get_user_pages_locked() waiting on the mm lock and returns -EAGAIN up the stack. If the iomap dio iteration layer has made no progress on the write to this point, -EAGAIN returns up to the filesystem and triggers the warning.
This use of the overwrite flag in ext4 is precautionary and half-baked. I.e., ext4 doesn't actually implement overwrite checking in the iomap callbacks when the flag is set, so the only extra verification it provides are i_size checks in the generic iomap dio layer. Combined with the tendency for false positives, the added verification is not worth the extra trouble. Remove the flag, associated warning, and update the comments to document when concurrent unaligned dio writes are allowed and why said flag is not used.
Cc: stable@kernel.org Reported-by: syzbot+5050ad0fb47527b1808a@syzkaller.appspotmail.com Reported-by: Pengfei Xu pengfei.xu@intel.com Fixes: 310ee0902b8d ("ext4: allow concurrent unaligned dio overwrites") Signed-off-by: Brian Foster bfoster@redhat.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20230810165559.946222-1-bfoster@redhat.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/file.c | 25 ++++++++++--------------- 1 file changed, 10 insertions(+), 15 deletions(-)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 2071b1e4322c..e99cc17b6bd2 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -476,6 +476,11 @@ static ssize_t ext4_dio_write_checks(struct kiocb *iocb, struct iov_iter *from, * required to change security info in file_modified(), for extending * I/O, any form of non-overwrite I/O, and unaligned I/O to unwritten * extents (as partial block zeroing may be required). + * + * Note that unaligned writes are allowed under shared lock so long as + * they are pure overwrites. Otherwise, concurrent unaligned writes risk + * data corruption due to partial block zeroing in the dio layer, and so + * the I/O must occur exclusively. */ if (*ilock_shared && ((!IS_NOSEC(inode) || *extend || !overwrite || @@ -492,21 +497,12 @@ static ssize_t ext4_dio_write_checks(struct kiocb *iocb, struct iov_iter *from,
/* * Now that locking is settled, determine dio flags and exclusivity - * requirements. Unaligned writes are allowed under shared lock so long - * as they are pure overwrites. Set the iomap overwrite only flag as an - * added precaution in this case. Even though this is unnecessary, we - * can detect and warn on unexpected -EAGAIN if an unsafe unaligned - * write is ever submitted. - * - * Otherwise, concurrent unaligned writes risk data corruption due to - * partial block zeroing in the dio layer, and so the I/O must occur - * exclusively. The inode lock is already held exclusive if the write is - * non-overwrite or extending, so drain all outstanding dio and set the - * force wait dio flag. + * requirements. We don't use DIO_OVERWRITE_ONLY because we enforce + * behavior already. The inode lock is already held exclusive if the + * write is non-overwrite or extending, so drain all outstanding dio and + * set the force wait dio flag. */ - if (*ilock_shared && unaligned_io) { - *dio_flags = IOMAP_DIO_OVERWRITE_ONLY; - } else if (!*ilock_shared && (unaligned_io || *extend)) { + if (!*ilock_shared && (unaligned_io || *extend)) { if (iocb->ki_flags & IOCB_NOWAIT) { ret = -EAGAIN; goto out; @@ -608,7 +604,6 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) iomap_ops = &ext4_iomap_overwrite_ops; ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops, dio_flags, NULL, 0); - WARN_ON_ONCE(ret == -EAGAIN && !(iocb->ki_flags & IOCB_NOWAIT)); if (ret == -ENOTBLK) ret = 0;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jaegeuk Kim jaegeuk@kernel.org
commit d2d9bb3b6d2fbccb5b33d3a85a2830971625a4ea upstream.
https://bugzilla.kernel.org/show_bug.cgi?id=216050
Somehow we're getting a page which has a different mapping. Let's avoid the infinite loop.
Cc: stable@vger.kernel.org Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/f2fs/data.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
--- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -1389,18 +1389,14 @@ struct page *f2fs_get_lock_data_page(str { struct address_space *mapping = inode->i_mapping; struct page *page; -repeat: + page = f2fs_get_read_data_page(inode, index, 0, for_write, NULL); if (IS_ERR(page)) return page;
/* wait for read completion */ lock_page(page); - if (unlikely(page->mapping != mapping)) { - f2fs_put_page(page, 1); - goto repeat; - } - if (unlikely(!PageUptodate(page))) { + if (unlikely(page->mapping != mapping || !PageUptodate(page))) { f2fs_put_page(page, 1); return ERR_PTR(-EIO); }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jaegeuk Kim jaegeuk@kernel.org
commit a3ab55746612247ce3dcaac6de66f5ffc055b9df upstream.
Let's flush the inode being aborted atomic operation to avoid stale dirty inode during eviction in this call stack:
f2fs_mark_inode_dirty_sync+0x22/0x40 [f2fs] f2fs_abort_atomic_write+0xc4/0xf0 [f2fs] f2fs_evict_inode+0x3f/0x690 [f2fs] ? sugov_start+0x140/0x140 evict+0xc3/0x1c0 evict_inodes+0x17b/0x210 generic_shutdown_super+0x32/0x120 kill_block_super+0x21/0x50 deactivate_locked_super+0x31/0x90 cleanup_mnt+0x100/0x160 task_work_run+0x59/0x90 do_exit+0x33b/0xa50 do_group_exit+0x2d/0x80 __x64_sys_exit_group+0x14/0x20 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd
This triggers f2fs_bug_on() in f2fs_evict_inode: f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE));
This fixes the syzbot report:
loop0: detected capacity change from 0 to 131072 F2FS-fs (loop0): invalid crc value F2FS-fs (loop0): Found nat_bits in checkpoint F2FS-fs (loop0): Mounted with checkpoint version = 48b305e4 ------------[ cut here ]------------ kernel BUG at fs/f2fs/inode.c:869! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 5014 Comm: syz-executor220 Not tainted 6.4.0-syzkaller-11479-g6cd06ab12d1a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023 RIP: 0010:f2fs_evict_inode+0x172d/0x1e00 fs/f2fs/inode.c:869 Code: ff df 48 c1 ea 03 80 3c 02 00 0f 85 6a 06 00 00 8b 75 40 ba 01 00 00 00 4c 89 e7 e8 6d ce 06 00 e9 aa fc ff ff e8 63 22 e2 fd <0f> 0b e8 5c 22 e2 fd 48 c7 c0 a8 3a 18 8d 48 ba 00 00 00 00 00 fc RSP: 0018:ffffc90003a6fa00 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 RDX: ffff8880273b8000 RSI: ffffffff83a2bd0d RDI: 0000000000000007 RBP: ffff888077db91b0 R08: 0000000000000007 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff888029a3c000 R13: ffff888077db9660 R14: ffff888029a3c0b8 R15: ffff888077db9c50 FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1909bb9000 CR3: 00000000276a9000 CR4: 0000000000350ef0 Call Trace: <TASK> evict+0x2ed/0x6b0 fs/inode.c:665 dispose_list+0x117/0x1e0 fs/inode.c:698 evict_inodes+0x345/0x440 fs/inode.c:748 generic_shutdown_super+0xaf/0x480 fs/super.c:478 kill_block_super+0x64/0xb0 fs/super.c:1417 kill_f2fs_super+0x2af/0x3c0 fs/f2fs/super.c:4704 deactivate_locked_super+0x98/0x160 fs/super.c:330 deactivate_super+0xb1/0xd0 fs/super.c:361 cleanup_mnt+0x2ae/0x3d0 fs/namespace.c:1254 task_work_run+0x16f/0x270 kernel/task_work.c:179 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xa9a/0x29a0 kernel/exit.c:874 do_group_exit+0xd4/0x2a0 kernel/exit.c:1024 __do_sys_exit_group kernel/exit.c:1035 [inline] __se_sys_exit_group kernel/exit.c:1033 [inline] __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1033 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f309be71a09 Code: Unable to access opcode bytes at 0x7f309be719df. RSP: 002b:00007fff171df518 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 00007f309bef7330 RCX: 00007f309be71a09 RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001 RBP: 0000000000000001 R08: ffffffffffffffc0 R09: 00007f309bef1e40 R10: 0000000000010600 R11: 0000000000000246 R12: 00007f309bef7330 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:f2fs_evict_inode+0x172d/0x1e00 fs/f2fs/inode.c:869 Code: ff df 48 c1 ea 03 80 3c 02 00 0f 85 6a 06 00 00 8b 75 40 ba 01 00 00 00 4c 89 e7 e8 6d ce 06 00 e9 aa fc ff ff e8 63 22 e2 fd <0f> 0b e8 5c 22 e2 fd 48 c7 c0 a8 3a 18 8d 48 ba 00 00 00 00 00 fc RSP: 0018:ffffc90003a6fa00 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 RDX: ffff8880273b8000 RSI: ffffffff83a2bd0d RDI: 0000000000000007 RBP: ffff888077db91b0 R08: 0000000000000007 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff888029a3c000 R13: ffff888077db9660 R14: ffff888029a3c0b8 R15: ffff888077db9c50 FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1909bb9000 CR3: 00000000276a9000 CR4: 0000000000350ef0
Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+e1246909d526a9d470fa@syzkaller.appspotmail.com Reviewed-by: Chao Yu chao@kernel.org Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/f2fs/segment.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -205,6 +205,8 @@ void f2fs_abort_atomic_write(struct inod f2fs_i_size_write(inode, fi->original_i_size); fi->original_i_size = 0; } + /* avoid stale dirty inode during eviction */ + sync_inode_metadata(inode, 0); }
static int __replace_atomic_write_block(struct inode *inode, pgoff_t index,
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jaegeuk Kim jaegeuk@kernel.org
commit 5c13e2388bf3426fd69a89eb46e50469e9624e56 upstream.
====================================================== WARNING: possible circular locking dependency detected 6.5.0-rc5-syzkaller-00353-gae545c3283dc #0 Not tainted ------------------------------------------------------ syz-executor273/5027 is trying to acquire lock: ffff888077fe1fb0 (&fi->i_sem){+.+.}-{3:3}, at: f2fs_down_write fs/f2fs/f2fs.h:2133 [inline] ffff888077fe1fb0 (&fi->i_sem){+.+.}-{3:3}, at: f2fs_add_inline_entry+0x300/0x6f0 fs/f2fs/inline.c:644
but task is already holding lock: ffff888077fe07c8 (&fi->i_xattr_sem){.+.+}-{3:3}, at: f2fs_down_read fs/f2fs/f2fs.h:2108 [inline] ffff888077fe07c8 (&fi->i_xattr_sem){.+.+}-{3:3}, at: f2fs_add_dentry+0x92/0x230 fs/f2fs/dir.c:783
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&fi->i_xattr_sem){.+.+}-{3:3}: down_read+0x9c/0x470 kernel/locking/rwsem.c:1520 f2fs_down_read fs/f2fs/f2fs.h:2108 [inline] f2fs_getxattr+0xb1e/0x12c0 fs/f2fs/xattr.c:532 __f2fs_get_acl+0x5a/0x900 fs/f2fs/acl.c:179 f2fs_acl_create fs/f2fs/acl.c:377 [inline] f2fs_init_acl+0x15c/0xb30 fs/f2fs/acl.c:420 f2fs_init_inode_metadata+0x159/0x1290 fs/f2fs/dir.c:558 f2fs_add_regular_entry+0x79e/0xb90 fs/f2fs/dir.c:740 f2fs_add_dentry+0x1de/0x230 fs/f2fs/dir.c:788 f2fs_do_add_link+0x190/0x280 fs/f2fs/dir.c:827 f2fs_add_link fs/f2fs/f2fs.h:3554 [inline] f2fs_mkdir+0x377/0x620 fs/f2fs/namei.c:781 vfs_mkdir+0x532/0x7e0 fs/namei.c:4117 do_mkdirat+0x2a9/0x330 fs/namei.c:4140 __do_sys_mkdir fs/namei.c:4160 [inline] __se_sys_mkdir fs/namei.c:4158 [inline] __x64_sys_mkdir+0xf2/0x140 fs/namei.c:4158 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
-> #0 (&fi->i_sem){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:3142 [inline] check_prevs_add kernel/locking/lockdep.c:3261 [inline] validate_chain kernel/locking/lockdep.c:3876 [inline] __lock_acquire+0x2e3d/0x5de0 kernel/locking/lockdep.c:5144 lock_acquire kernel/locking/lockdep.c:5761 [inline] lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726 down_write+0x93/0x200 kernel/locking/rwsem.c:1573 f2fs_down_write fs/f2fs/f2fs.h:2133 [inline] f2fs_add_inline_entry+0x300/0x6f0 fs/f2fs/inline.c:644 f2fs_add_dentry+0xa6/0x230 fs/f2fs/dir.c:784 f2fs_do_add_link+0x190/0x280 fs/f2fs/dir.c:827 f2fs_add_link fs/f2fs/f2fs.h:3554 [inline] f2fs_mkdir+0x377/0x620 fs/f2fs/namei.c:781 vfs_mkdir+0x532/0x7e0 fs/namei.c:4117 ovl_do_mkdir fs/overlayfs/overlayfs.h:196 [inline] ovl_mkdir_real+0xb5/0x370 fs/overlayfs/dir.c:146 ovl_workdir_create+0x3de/0x820 fs/overlayfs/super.c:309 ovl_make_workdir fs/overlayfs/super.c:711 [inline] ovl_get_workdir fs/overlayfs/super.c:864 [inline] ovl_fill_super+0xdab/0x6180 fs/overlayfs/super.c:1400 vfs_get_super+0xf9/0x290 fs/super.c:1152 vfs_get_tree+0x88/0x350 fs/super.c:1519 do_new_mount fs/namespace.c:3335 [inline] path_mount+0x1492/0x1ed0 fs/namespace.c:3662 do_mount fs/namespace.c:3675 [inline] __do_sys_mount fs/namespace.c:3884 [inline] __se_sys_mount fs/namespace.c:3861 [inline] __x64_sys_mount+0x293/0x310 fs/namespace.c:3861 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1 ---- ---- rlock(&fi->i_xattr_sem); lock(&fi->i_sem); lock(&fi->i_xattr_sem); lock(&fi->i_sem);
Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+e5600587fa9cbf8e3826@syzkaller.appspotmail.com Fixes: 5eda1ad1aaff "f2fs: fix deadlock in i_xattr_sem and inode page lock" Tested-by: Guenter Roeck linux@roeck-us.net Reviewed-by: Chao Yu chao@kernel.org Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/f2fs/f2fs.h | 24 +++++++++++++++--------- fs/f2fs/inline.c | 3 ++- 2 files changed, 17 insertions(+), 10 deletions(-)
--- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -2122,15 +2122,6 @@ static inline int f2fs_down_read_trylock return down_read_trylock(&sem->internal_rwsem); }
-#ifdef CONFIG_DEBUG_LOCK_ALLOC -static inline void f2fs_down_read_nested(struct f2fs_rwsem *sem, int subclass) -{ - down_read_nested(&sem->internal_rwsem, subclass); -} -#else -#define f2fs_down_read_nested(sem, subclass) f2fs_down_read(sem) -#endif - static inline void f2fs_up_read(struct f2fs_rwsem *sem) { up_read(&sem->internal_rwsem); @@ -2141,6 +2132,21 @@ static inline void f2fs_down_write(struc down_write(&sem->internal_rwsem); }
+#ifdef CONFIG_DEBUG_LOCK_ALLOC +static inline void f2fs_down_read_nested(struct f2fs_rwsem *sem, int subclass) +{ + down_read_nested(&sem->internal_rwsem, subclass); +} + +static inline void f2fs_down_write_nested(struct f2fs_rwsem *sem, int subclass) +{ + down_write_nested(&sem->internal_rwsem, subclass); +} +#else +#define f2fs_down_read_nested(sem, subclass) f2fs_down_read(sem) +#define f2fs_down_write_nested(sem, subclass) f2fs_down_write(sem) +#endif + static inline int f2fs_down_write_trylock(struct f2fs_rwsem *sem) { return down_write_trylock(&sem->internal_rwsem); --- a/fs/f2fs/inline.c +++ b/fs/f2fs/inline.c @@ -641,7 +641,8 @@ int f2fs_add_inline_entry(struct inode * }
if (inode) { - f2fs_down_write(&F2FS_I(inode)->i_sem); + f2fs_down_write_nested(&F2FS_I(inode)->i_sem, + SINGLE_DEPTH_NESTING); page = f2fs_init_inode_metadata(inode, dir, fname, ipage); if (IS_ERR(page)) { err = PTR_ERR(page);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nathan Chancellor nathan@kernel.org
commit 92382d744176f230101d54f5c017bccd62770f01 upstream.
A recent change in clang allows it to consider more expressions as compile time constants, which causes it to point out an implicit conversion in the scanf tests:
lib/test_scanf.c:661:2: warning: implicit conversion from 'int' to 'unsigned char' changes value from -168 to 88 [-Wconstant-conversion] 661 | test_number_prefix(unsigned char, "0xA7", "%2hhx%hhx", 0, 0xa7, 2, check_uchar); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ lib/test_scanf.c:609:29: note: expanded from macro 'test_number_prefix' 609 | T result[2] = {~expect[0], ~expect[1]}; \ | ~ ^~~~~~~~~~ 1 warning generated.
The result of the bitwise negation is the type of the operand after going through the integer promotion rules, so this truncation is expected but harmless, as the initial values in the result array get overwritten by _test() anyways. Add an explicit cast to the expected type in test_number_prefix() to silence the warning. There is no functional change, as all the tests still pass with GCC 13.1.0 and clang 18.0.0.
Cc: stable@vger.kernel.org Link: https://github.com/ClangBuiltLinux/linuxq/issues/1899 Link: https://github.com/llvm/llvm-project/commit/610ec954e1f81c0e8fcadedcd25afe64... Suggested-by: Nick Desaulniers ndesaulniers@google.com Signed-off-by: Nathan Chancellor nathan@kernel.org Reviewed-by: Petr Mladek pmladek@suse.com Signed-off-by: Petr Mladek pmladek@suse.com Link: https://lore.kernel.org/r/20230807-test_scanf-wconstant-conversion-v2-1-839c... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- lib/test_scanf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/lib/test_scanf.c +++ b/lib/test_scanf.c @@ -606,7 +606,7 @@ static void __init numbers_slice(void) #define test_number_prefix(T, str, scan_fmt, expect0, expect1, n_args, fn) \ do { \ const T expect[2] = { expect0, expect1 }; \ - T result[2] = {~expect[0], ~expect[1]}; \ + T result[2] = { (T)~expect[0], (T)~expect[1] }; \ \ _test(fn, &expect, str, scan_fmt, n_args, &result[0], &result[1]); \ } while (0)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Marangi ansuelsmth@gmail.com
commit 23316be8a9d450f33a21f1efe7d89570becbec58 upstream.
Commit 5d4753f741d8 ("hwspinlock: qcom: add support for MMIO on older SoCs") introduced and made regmap_config mandatory in the of_data struct but didn't add the regmap_config for sfpb based devices.
SFPB based devices can both use the legacy syscon way to probe or the new MMIO way and currently device that use the MMIO way are broken as they lack the definition of the now required regmap_config and always return -EINVAL (and indirectly makes fail probing everything that depends on it, smem, nandc with smem-parser...)
Fix this by correctly adding the missing regmap_config and restore function of hwspinlock on SFPB based devices with MMIO implementation.
Cc: stable@vger.kernel.org Fixes: 5d4753f741d8 ("hwspinlock: qcom: add support for MMIO on older SoCs") Signed-off-by: Christian Marangi ansuelsmth@gmail.com Link: https://lore.kernel.org/r/20230716022804.21239-1-ansuelsmth@gmail.com Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hwspinlock/qcom_hwspinlock.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/drivers/hwspinlock/qcom_hwspinlock.c +++ b/drivers/hwspinlock/qcom_hwspinlock.c @@ -69,9 +69,18 @@ static const struct hwspinlock_ops qcom_ .unlock = qcom_hwspinlock_unlock, };
+static const struct regmap_config sfpb_mutex_config = { + .reg_bits = 32, + .reg_stride = 4, + .val_bits = 32, + .max_register = 0x100, + .fast_io = true, +}; + static const struct qcom_hwspinlock_of_data of_sfpb_mutex = { .offset = 0x4, .stride = 0x4, + .regmap_config = &sfpb_mutex_config, };
static const struct regmap_config tcsr_msm8226_mutex_config = {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Weiner hannes@cmpxchg.org
commit 6f0df8e16eb543167f2929cb756e695709a3551d upstream.
In the eviction recency check, we attempt to retrieve the memcg to which the folio belonged when it was evicted, by the memcg id stored in the shadow entry. However, there is a chance that the retrieved memcg is not the original memcg that has been killed, but a new one which happens to have the same id.
This is a somewhat unfortunate, but acceptable and rare inaccuracy in the heuristics. However, if we retrieve this new memcg between its allocation and when it is properly attached to the memcg hierarchy, we could run into the following NULL pointer exception during the memcg hierarchy traversal done in mem_cgroup_get_nr_swap_pages():
[ 155757.793456] BUG: kernel NULL pointer dereference, address: 00000000000000c0 [ 155757.807568] #PF: supervisor read access in kernel mode [ 155757.818024] #PF: error_code(0x0000) - not-present page [ 155757.828482] PGD 401f77067 P4D 401f77067 PUD 401f76067 PMD 0 [ 155757.839985] Oops: 0000 [#1] SMP [ 155757.887870] RIP: 0010:mem_cgroup_get_nr_swap_pages+0x3d/0xb0 [ 155757.899377] Code: 29 19 4a 02 48 39 f9 74 63 48 8b 97 c0 00 00 00 48 8b b7 58 02 00 00 48 2b b7 c0 01 00 00 48 39 f0 48 0f 4d c6 48 39 d1 74 42 <48> 8b b2 c0 00 00 00 48 8b ba 58 02 00 00 48 2b ba c0 01 00 00 48 [ 155757.937125] RSP: 0018:ffffc9002ecdfbc8 EFLAGS: 00010286 [ 155757.947755] RAX: 00000000003a3b1c RBX: 000007ffffffffff RCX: ffff888280183000 [ 155757.962202] RDX: 0000000000000000 RSI: 0007ffffffffffff RDI: ffff888bbc2d1000 [ 155757.976648] RBP: 0000000000000001 R08: 000000000000000b R09: ffff888ad9cedba0 [ 155757.991094] R10: ffffea0039c07900 R11: 0000000000000010 R12: ffff888b23a7b000 [ 155758.005540] R13: 0000000000000000 R14: ffff888bbc2d1000 R15: 000007ffffc71354 [ 155758.019991] FS: 00007f6234c68640(0000) GS:ffff88903f9c0000(0000) knlGS:0000000000000000 [ 155758.036356] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 155758.048023] CR2: 00000000000000c0 CR3: 0000000a83eb8004 CR4: 00000000007706e0 [ 155758.062473] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 155758.076924] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 155758.091376] PKRU: 55555554 [ 155758.096957] Call Trace: [ 155758.102016] <TASK> [ 155758.106502] ? __die+0x78/0xc0 [ 155758.112793] ? page_fault_oops+0x286/0x380 [ 155758.121175] ? exc_page_fault+0x5d/0x110 [ 155758.129209] ? asm_exc_page_fault+0x22/0x30 [ 155758.137763] ? mem_cgroup_get_nr_swap_pages+0x3d/0xb0 [ 155758.148060] workingset_test_recent+0xda/0x1b0 [ 155758.157133] workingset_refault+0xca/0x1e0 [ 155758.165508] filemap_add_folio+0x4d/0x70 [ 155758.173538] page_cache_ra_unbounded+0xed/0x190 [ 155758.182919] page_cache_sync_ra+0xd6/0x1e0 [ 155758.191738] filemap_read+0x68d/0xdf0 [ 155758.199495] ? mlx5e_napi_poll+0x123/0x940 [ 155758.207981] ? __napi_schedule+0x55/0x90 [ 155758.216095] __x64_sys_pread64+0x1d6/0x2c0 [ 155758.224601] do_syscall_64+0x3d/0x80 [ 155758.232058] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 155758.242473] RIP: 0033:0x7f62c29153b5 [ 155758.249938] Code: e8 48 89 75 f0 89 7d f8 48 89 4d e0 e8 b4 e6 f7 ff 41 89 c0 4c 8b 55 e0 48 8b 55 e8 48 8b 75 f0 8b 7d f8 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 45 f8 e8 e7 e6 f7 ff 48 8b [ 155758.288005] RSP: 002b:00007f6234c5ffd0 EFLAGS: 00000293 ORIG_RAX: 0000000000000011 [ 155758.303474] RAX: ffffffffffffffda RBX: 00007f628c4e70c0 RCX: 00007f62c29153b5 [ 155758.318075] RDX: 000000000003c041 RSI: 00007f61d2986000 RDI: 0000000000000076 [ 155758.332678] RBP: 00007f6234c5fff0 R08: 0000000000000000 R09: 0000000064d5230c [ 155758.347452] R10: 000000000027d450 R11: 0000000000000293 R12: 000000000003c041 [ 155758.362044] R13: 00007f61d2986000 R14: 00007f629e11b060 R15: 000000000027d450 [ 155758.376661] </TASK>
This patch fixes the issue by moving the memcg's id publication from the alloc stage to online stage, ensuring that any memcg acquired via id must be connected to the memcg tree.
Link: https://lkml.kernel.org/r/20230823225430.166925-1-nphamcs@gmail.com Fixes: f78dfc7b77d5 ("workingset: fix confusion around eviction vs refault container") Signed-off-by: Johannes Weiner hannes@cmpxchg.org Co-developed-by: Nhat Pham nphamcs@gmail.com Signed-off-by: Nhat Pham nphamcs@gmail.com Acked-by: Shakeel Butt shakeelb@google.com Cc: Yosry Ahmed yosryahmed@google.com Cc: Michal Hocko mhocko@suse.com Cc: Roman Gushchin roman.gushchin@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memcontrol.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-)
--- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5329,7 +5329,6 @@ static struct mem_cgroup *mem_cgroup_all INIT_LIST_HEAD(&memcg->deferred_split_queue.split_queue); memcg->deferred_split_queue.split_queue_len = 0; #endif - idr_replace(&mem_cgroup_idr, memcg, memcg->id.id); lru_gen_init_memcg(memcg); return memcg; fail: @@ -5401,14 +5400,27 @@ static int mem_cgroup_css_online(struct if (alloc_shrinker_info(memcg)) goto offline_kmem;
- /* Online state pins memcg ID, memcg ID pins CSS */ - refcount_set(&memcg->id.ref, 1); - css_get(css); - if (unlikely(mem_cgroup_is_root(memcg))) queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME); lru_gen_online_memcg(memcg); + + /* Online state pins memcg ID, memcg ID pins CSS */ + refcount_set(&memcg->id.ref, 1); + css_get(css); + + /* + * Ensure mem_cgroup_from_id() works once we're fully online. + * + * We could do this earlier and require callers to filter with + * css_tryget_online(). But right now there are no users that + * need earlier access, and the workingset code relies on the + * cgroup tree linkage (mem_cgroup_get_nr_swap_pages()). So + * publish it here at the end of onlining. This matches the + * regular ID destruction during offlining. + */ + idr_replace(&mem_cgroup_idr, memcg, memcg->id.id); + return 0; offline_kmem: memcg_offline_kmem(memcg);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Werner Fischer devlists@wefi.net
commit 2a2df98ec592667927b5c1351afa6493ea125c9f upstream.
Elkhart Lake is the successor of Apollo Lake and Gemini Lake. These CPUs and their PCHs are used in mobile and embedded environments.
With this patch I suggest that Elkhart Lake SATA controllers [1] should use the default LPM policy for mobile chipsets. The disadvantage of missing hot-plug support with this setting should not be an issue, as those CPUs are used in embedded environments and not in servers with hot-plug backplanes.
We discovered that the Elkhart Lake SATA controllers have been missing in ahci.c after a customer reported the throttling of his SATA SSD after a short period of higher I/O. We determined the high temperature of the SSD controller in idle mode as the root cause for that.
Depending on the used SSD, we have seen up to 1.8 Watt lower system idle power usage and up to 30°C lower SSD controller temperatures in our tests, when we set med_power_with_dipm manually. I have provided a table showing seven different SATA SSDs from ATP, Intel/Solidigm and Samsung [2].
Intel lists a total of 3 SATA controller IDs (4B60, 4B62, 4B63) in [1] for those mobile PCHs. This commit just adds 0x4b63 as I do not have test systems with 0x4b60 and 0x4b62 SATA controllers. I have tested this patch with a system which uses 0x4b63 as SATA controller.
[1] https://sata-io.org/product/8803 [2] https://www.thomas-krenn.com/en/wiki/SATA_Link_Power_Management#Example_LES_...
Signed-off-by: Werner Fischer devlists@wefi.net Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal dlemoal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/ahci.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -421,6 +421,8 @@ static const struct pci_device_id ahci_p { PCI_VDEVICE(INTEL, 0x34d3), board_ahci_low_power }, /* Ice Lake LP AHCI */ { PCI_VDEVICE(INTEL, 0x02d3), board_ahci_low_power }, /* Comet Lake PCH-U AHCI */ { PCI_VDEVICE(INTEL, 0x02d7), board_ahci_low_power }, /* Comet Lake PCH RAID */ + /* Elkhart Lake IDs 0x4b60 & 0x4b62 https://sata-io.org/product/8803 not tested yet */ + { PCI_VDEVICE(INTEL, 0x4b63), board_ahci_low_power }, /* Elkhart Lake AHCI */
/* JMicron 360/1/3/5/6, match class to avoid IDE function */ { PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Schmitz schmitzmic@gmail.com
commit 8a1f00b753ecfdb117dc1a07e68c46d80e7923ea upstream.
With commit 44b1fbc0f5f3 ("m68k/q40: Replace q40ide driver with pata_falcon and falconide"), the Q40 IDE driver was replaced by pata_falcon.c.
Both IO and memory resources were defined for the Q40 IDE platform device, but definition of the IDE register addresses was modeled after the Falcon case, both in use of the memory resources and in including register shift and byte vs. word offset in the address.
This was correct for the Falcon case, which does not apply any address translation to the register addresses. In the Q40 case, all of device base address, byte access offset and register shift is included in the platform specific ISA access translation (in asm/mm_io.h).
As a consequence, such address translation gets applied twice, and register addresses are mangled.
Use the device base address from the platform IO resource for Q40 (the IO address translation will then add the correct ISA window base address and byte access offset), with register shift 1. Use MMIO base address and register shift 2 as before for Falcon.
Encode PIO_OFFSET into IO port addresses for all registers for Q40 except the data transfer register. Encode the MMIO offset there (pata_falcon_data_xfer() directly uses raw IO with no address translation).
Reported-by: William R Sowerbutts will@sowerbutts.com Closes: https://lore.kernel.org/r/CAMuHMdUU62jjunJh9cqSqHT87B0H0A4udOOPs=WN7WZKpcagV... Link: https://lore.kernel.org/r/CAMuHMdUU62jjunJh9cqSqHT87B0H0A4udOOPs=WN7WZKpcagV... Fixes: 44b1fbc0f5f3 ("m68k/q40: Replace q40ide driver with pata_falcon and falconide") Cc: stable@vger.kernel.org Cc: Finn Thain fthain@linux-m68k.org Cc: Geert Uytterhoeven geert@linux-m68k.org Tested-by: William R Sowerbutts will@sowerbutts.com Signed-off-by: Michael Schmitz schmitzmic@gmail.com Reviewed-by: Sergey Shtylyov s.shtylyov@omp.ru Reviewed-by: Geert Uytterhoeven geert@linux-m68k.org Signed-off-by: Damien Le Moal dlemoal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/pata_falcon.c | 50 ++++++++++++++++++++++++++-------------------- 1 file changed, 29 insertions(+), 21 deletions(-)
--- a/drivers/ata/pata_falcon.c +++ b/drivers/ata/pata_falcon.c @@ -123,8 +123,8 @@ static int __init pata_falcon_init_one(s struct resource *base_res, *ctl_res, *irq_res; struct ata_host *host; struct ata_port *ap; - void __iomem *base; - int irq = 0; + void __iomem *base, *ctl_base; + int irq = 0, io_offset = 1, reg_shift = 2; /* Falcon defaults */
dev_info(&pdev->dev, "Atari Falcon and Q40/Q60 PATA controller\n");
@@ -165,26 +165,34 @@ static int __init pata_falcon_init_one(s ap->pio_mask = ATA_PIO4; ap->flags |= ATA_FLAG_SLAVE_POSS | ATA_FLAG_NO_IORDY;
- base = (void __iomem *)base_mem_res->start; /* N.B. this assumes data_addr will be used for word-sized I/O only */ - ap->ioaddr.data_addr = base + 0 + 0 * 4; - ap->ioaddr.error_addr = base + 1 + 1 * 4; - ap->ioaddr.feature_addr = base + 1 + 1 * 4; - ap->ioaddr.nsect_addr = base + 1 + 2 * 4; - ap->ioaddr.lbal_addr = base + 1 + 3 * 4; - ap->ioaddr.lbam_addr = base + 1 + 4 * 4; - ap->ioaddr.lbah_addr = base + 1 + 5 * 4; - ap->ioaddr.device_addr = base + 1 + 6 * 4; - ap->ioaddr.status_addr = base + 1 + 7 * 4; - ap->ioaddr.command_addr = base + 1 + 7 * 4; - - base = (void __iomem *)ctl_mem_res->start; - ap->ioaddr.altstatus_addr = base + 1; - ap->ioaddr.ctl_addr = base + 1; - - ata_port_desc(ap, "cmd 0x%lx ctl 0x%lx", - (unsigned long)base_mem_res->start, - (unsigned long)ctl_mem_res->start); + ap->ioaddr.data_addr = (void __iomem *)base_mem_res->start; + + if (base_res) { /* only Q40 has IO resources */ + io_offset = 0x10000; + reg_shift = 0; + base = (void __iomem *)base_res->start; + ctl_base = (void __iomem *)ctl_res->start; + } else { + base = (void __iomem *)base_mem_res->start; + ctl_base = (void __iomem *)ctl_mem_res->start; + } + + ap->ioaddr.error_addr = base + io_offset + (1 << reg_shift); + ap->ioaddr.feature_addr = base + io_offset + (1 << reg_shift); + ap->ioaddr.nsect_addr = base + io_offset + (2 << reg_shift); + ap->ioaddr.lbal_addr = base + io_offset + (3 << reg_shift); + ap->ioaddr.lbam_addr = base + io_offset + (4 << reg_shift); + ap->ioaddr.lbah_addr = base + io_offset + (5 << reg_shift); + ap->ioaddr.device_addr = base + io_offset + (6 << reg_shift); + ap->ioaddr.status_addr = base + io_offset + (7 << reg_shift); + ap->ioaddr.command_addr = base + io_offset + (7 << reg_shift); + + ap->ioaddr.altstatus_addr = ctl_base + io_offset; + ap->ioaddr.ctl_addr = ctl_base + io_offset; + + ata_port_desc(ap, "cmd %px ctl %px data %px", + base, ctl_base, ap->ioaddr.data_addr);
irq_res = platform_get_resource(pdev, IORESOURCE_IRQ, 0); if (irq_res && irq_res->start > 0) {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Damien Le Moal dlemoal@kernel.org
commit 8566572bf3b4d6e416a4bf2110dbb4817d11ba59 upstream.
Add the missing MODULE_DESCRIPTION() to avoid warnings such as:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/ata/sata_gemini.o
when compiling with W=1.
Fixes: be4e456ed3a5 ("ata: Add driver for Faraday Technology FTIDE010") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal dlemoal@kernel.org Reviewed-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/sata_gemini.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/ata/sata_gemini.c +++ b/drivers/ata/sata_gemini.c @@ -428,6 +428,7 @@ static struct platform_driver gemini_sat }; module_platform_driver(gemini_sata_driver);
+MODULE_DESCRIPTION("low level driver for Cortina Systems Gemini SATA bridge"); MODULE_AUTHOR("Linus Walleij linus.walleij@linaro.org"); MODULE_LICENSE("GPL"); MODULE_ALIAS("platform:" DRV_NAME);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Damien Le Moal dlemoal@kernel.org
commit 7274eef5729037300f29d14edeb334a47a098f65 upstream.
Add the missing MODULE_DESCRIPTION() to avoid warnings such as:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/ata/pata_ftide010.o
when compiling with W=1.
Fixes: be4e456ed3a5 ("ata: Add driver for Faraday Technology FTIDE010") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal dlemoal@kernel.org Reviewed-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/pata_ftide010.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/ata/pata_ftide010.c +++ b/drivers/ata/pata_ftide010.c @@ -567,6 +567,7 @@ static struct platform_driver pata_ftide }; module_platform_driver(pata_ftide010_driver);
+MODULE_DESCRIPTION("low level driver for Faraday Technology FTIDE010"); MODULE_AUTHOR("Linus Walleij linus.walleij@linaro.org"); MODULE_LICENSE("GPL"); MODULE_ALIAS("platform:" DRV_NAME);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: ruanmeisi ruan.meisi@zte.com.cn
commit b8bd342d50cbf606666488488f9fea374aceb2d5 upstream.
During our debugging of glusterfs, we found an Assertion failed error: inode_lookup >= nlookup, which was caused by the nlookup value in the kernel being greater than that in the FUSE file system.
The issue was introduced by fuse_direntplus_link, where in the function, fuse_iget increments nlookup, and if d_splice_alias returns failure, fuse_direntplus_link returns failure without decrementing nlookup https://github.com/gluster/glusterfs/pull/4081
Signed-off-by: ruanmeisi ruan.meisi@zte.com.cn Fixes: 0b05b18381ee ("fuse: implement NFS-like readdirplus support") Cc: stable@vger.kernel.org # v3.9 Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/fuse/readdir.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/fs/fuse/readdir.c +++ b/fs/fuse/readdir.c @@ -243,8 +243,16 @@ retry: dput(dentry); dentry = alias; } - if (IS_ERR(dentry)) + if (IS_ERR(dentry)) { + if (!IS_ERR(inode)) { + struct fuse_inode *fi = get_fuse_inode(inode); + + spin_lock(&fi->lock); + fi->nlookup--; + spin_unlock(&fi->lock); + } return PTR_ERR(dentry); + } } if (fc->readdirplus_auto) set_bit(FUSE_I_INIT_RDPLUS, &get_fuse_inode(inode)->state);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Naohiro Aota naohiro.aota@wdc.com
commit 332581bde2a419d5f12a93a1cdc2856af649a3cc upstream.
When multiple writes happen at once, we may need to sacrifice a currently active block group to be zone finished for a new allocation. We choose a block group with the least free space left, and zone finish it.
To do the finishing, we need to send IOs for already allocated region and wait for them and on-going IOs. Otherwise, these IOs fail because the zone is already finished at the time the IO reach a device.
However, if a block group dedicated to the data relocation is zone finished, there is a chance that finishing it before an ongoing write IO reaches the device. That is because there is timing gap between an allocation is done (block_group->reservations == 0, as pre-allocation is done) and an ordered extent is created when the relocation IO starts. Thus, if we finish the zone between them, we can fail the IOs.
We cannot simply use "fs_info->data_reloc_bg == block_group->start" to avoid the zone finishing. Because, the data_reloc_bg may already switch to a new block group, while there are still ongoing write IOs to the old data_reloc_bg.
So, this patch reworks the BLOCK_GROUP_FLAG_ZONED_DATA_RELOC bit to indicate there is a data relocation allocation and/or ongoing write to the block group. The bit is set on allocation and cleared in end_io function of the last IO for the currently allocated region.
To change the timing of the bit setting also solves the issue that the bit being left even after there is no IO going on. With the current code, if the data_reloc_bg switches after the last IO to the current data_reloc_bg, the bit is set at this timing and there is no one clearing that bit. As a result, that block group is kept unallocatable for anything.
Fixes: 343d8a30851c ("btrfs: zoned: prevent allocation from previous data relocation BG") Fixes: 74e91b12b115 ("btrfs: zoned: zone finish unused block group") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Naohiro Aota naohiro.aota@wdc.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/extent-tree.c | 43 +++++++++++++++++++++++-------------------- fs/btrfs/zoned.c | 16 +++++++++++++--- 2 files changed, 36 insertions(+), 23 deletions(-)
--- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3709,7 +3709,8 @@ static int do_allocation_zoned(struct bt fs_info->data_reloc_bg == 0);
if (block_group->ro || - test_bit(BLOCK_GROUP_FLAG_ZONED_DATA_RELOC, &block_group->runtime_flags)) { + (!ffe_ctl->for_data_reloc && + test_bit(BLOCK_GROUP_FLAG_ZONED_DATA_RELOC, &block_group->runtime_flags))) { ret = 1; goto out; } @@ -3752,8 +3753,26 @@ static int do_allocation_zoned(struct bt if (ffe_ctl->for_treelog && !fs_info->treelog_bg) fs_info->treelog_bg = block_group->start;
- if (ffe_ctl->for_data_reloc && !fs_info->data_reloc_bg) - fs_info->data_reloc_bg = block_group->start; + if (ffe_ctl->for_data_reloc) { + if (!fs_info->data_reloc_bg) + fs_info->data_reloc_bg = block_group->start; + /* + * Do not allow allocations from this block group, unless it is + * for data relocation. Compared to increasing the ->ro, setting + * the ->zoned_data_reloc_ongoing flag still allows nocow + * writers to come in. See btrfs_inc_nocow_writers(). + * + * We need to disable an allocation to avoid an allocation of + * regular (non-relocation data) extent. With mix of relocation + * extents and regular extents, we can dispatch WRITE commands + * (for relocation extents) and ZONE APPEND commands (for + * regular extents) at the same time to the same zone, which + * easily break the write pointer. + * + * Also, this flag avoids this block group to be zone finished. + */ + set_bit(BLOCK_GROUP_FLAG_ZONED_DATA_RELOC, &block_group->runtime_flags); + }
ffe_ctl->found_offset = start + block_group->alloc_offset; block_group->alloc_offset += num_bytes; @@ -3771,24 +3790,8 @@ static int do_allocation_zoned(struct bt out: if (ret && ffe_ctl->for_treelog) fs_info->treelog_bg = 0; - if (ret && ffe_ctl->for_data_reloc && - fs_info->data_reloc_bg == block_group->start) { - /* - * Do not allow further allocations from this block group. - * Compared to increasing the ->ro, setting the - * ->zoned_data_reloc_ongoing flag still allows nocow - * writers to come in. See btrfs_inc_nocow_writers(). - * - * We need to disable an allocation to avoid an allocation of - * regular (non-relocation data) extent. With mix of relocation - * extents and regular extents, we can dispatch WRITE commands - * (for relocation extents) and ZONE APPEND commands (for - * regular extents) at the same time to the same zone, which - * easily break the write pointer. - */ - set_bit(BLOCK_GROUP_FLAG_ZONED_DATA_RELOC, &block_group->runtime_flags); + if (ret && ffe_ctl->for_data_reloc) fs_info->data_reloc_bg = 0; - } spin_unlock(&fs_info->relocation_bg_lock); spin_unlock(&fs_info->treelog_bg_lock); spin_unlock(&block_group->lock); --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -2017,6 +2017,10 @@ static int do_zone_finish(struct btrfs_b * and block_group->meta_write_pointer for metadata. */ if (!fully_written) { + if (test_bit(BLOCK_GROUP_FLAG_ZONED_DATA_RELOC, &block_group->runtime_flags)) { + spin_unlock(&block_group->lock); + return -EAGAIN; + } spin_unlock(&block_group->lock);
ret = btrfs_inc_block_group_ro(block_group, false); @@ -2045,7 +2049,9 @@ static int do_zone_finish(struct btrfs_b return 0; }
- if (block_group->reserved) { + if (block_group->reserved || + test_bit(BLOCK_GROUP_FLAG_ZONED_DATA_RELOC, + &block_group->runtime_flags)) { spin_unlock(&block_group->lock); btrfs_dec_block_group_ro(block_group); return -EAGAIN; @@ -2276,7 +2282,10 @@ void btrfs_zoned_release_data_reloc_bg(s
/* All relocation extents are written. */ if (block_group->start + block_group->alloc_offset == logical + length) { - /* Now, release this block group for further allocations. */ + /* + * Now, release this block group for further allocations and + * zone finish. + */ clear_bit(BLOCK_GROUP_FLAG_ZONED_DATA_RELOC, &block_group->runtime_flags); } @@ -2300,7 +2309,8 @@ int btrfs_zone_finish_one_bg(struct btrf
spin_lock(&block_group->lock); if (block_group->reserved || block_group->alloc_offset == 0 || - (block_group->flags & BTRFS_BLOCK_GROUP_SYSTEM)) { + (block_group->flags & BTRFS_BLOCK_GROUP_SYSTEM) || + test_bit(BLOCK_GROUP_FLAG_ZONED_DATA_RELOC, &block_group->runtime_flags)) { spin_unlock(&block_group->lock); continue; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boris Burkov boris@bur.io
commit a6496849671a5bc9218ecec25a983253b34351b1 upstream.
btrfs_start_transaction reserves metadata space of the PERTRANS type before it identifies a transaction to start/join. This allows flushing when reserving that space without a deadlock. However, it results in a race which temporarily breaks qgroup rsv accounting.
T1 T2 start_transaction do_stuff start_transaction qgroup_reserve_meta_pertrans commit_transaction qgroup_free_meta_all_pertrans hit an error starting txn goto reserve_fail qgroup_free_meta_pertrans (already freed!)
The basic issue is that there is nothing preventing another commit from committing before start_transaction finishes (in fact sometimes we intentionally wait for it) so any error path that frees the reserve is at risk of this race.
While this exact space was getting freed anyway, and it's not a huge deal to double free it (just a warning, the free code catches this), it can result in incorrectly freeing some other pertrans reservation in this same reservation, which could then lead to spuriously granting reservations we might not have the space for. Therefore, I do believe it is worth fixing.
To fix it, use the existing prealloc->pertrans conversion mechanism. When we first reserve the space, we reserve prealloc space and only when we are sure we have a transaction do we convert it to pertrans. This way any racing commits do not blow away our reservation, but we still get a pertrans reservation that is freed when _this_ transaction gets committed.
This issue can be reproduced by running generic/269 with either qgroups or squotas enabled via mkfs on the scratch device.
Reviewed-by: Josef Bacik josef@toxicpanda.com CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Boris Burkov boris@bur.io Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/transaction.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-)
--- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -591,8 +591,13 @@ start_transaction(struct btrfs_root *roo u64 delayed_refs_bytes = 0;
qgroup_reserved = num_items * fs_info->nodesize; - ret = btrfs_qgroup_reserve_meta_pertrans(root, qgroup_reserved, - enforce_qgroups); + /* + * Use prealloc for now, as there might be a currently running + * transaction that could free this reserved space prematurely + * by committing. + */ + ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_reserved, + enforce_qgroups, false); if (ret) return ERR_PTR(ret);
@@ -705,6 +710,14 @@ again: h->reloc_reserved = reloc_reserved; }
+ /* + * Now that we have found a transaction to be a part of, convert the + * qgroup reservation from prealloc to pertrans. A different transaction + * can't race in and free our pertrans out from under us. + */ + if (qgroup_reserved) + btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved); + got_it: if (!current->journal_info) current->journal_info = h; @@ -752,7 +765,7 @@ alloc_fail: btrfs_block_rsv_release(fs_info, &fs_info->trans_block_rsv, num_bytes, NULL); reserve_fail: - btrfs_qgroup_free_meta_pertrans(root, qgroup_reserved); + btrfs_qgroup_free_meta_prealloc(root, qgroup_reserved); return ERR_PTR(ret); }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boris Burkov boris@bur.io
commit e28b02118b94e42be3355458a2406c6861e2dd32 upstream.
If we do a write whose bio suffers an error, we will never reclaim the qgroup reserved space for it. We allocate the space in the write_iter codepath, then release the reservation as we allocate the ordered extent, but we only create a delayed ref if the ordered extent finishes. If it has an error, we simply leak the rsv. This is apparent in running any error injecting (dmerror) fstests like btrfs/146 or btrfs/160. Such tests fail due to dmesg on umount complaining about the leaked qgroup data space.
When we clean up other aspects of space on failed ordered_extents, also free the qgroup rsv.
Reviewed-by: Josef Bacik josef@toxicpanda.com CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Boris Burkov boris@bur.io Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/inode.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3359,6 +3359,13 @@ out: btrfs_free_reserved_extent(fs_info, ordered_extent->disk_bytenr, ordered_extent->disk_num_bytes, 1); + /* + * Actually free the qgroup rsv which was released when + * the ordered extent was created. + */ + btrfs_qgroup_free_refroot(fs_info, inode->root->root_key.objectid, + ordered_extent->qgroup_rsv, + BTRFS_QGROUP_RSV_DATA); } }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit 4490e803e1fe9fab8db5025e44e23b55df54078b upstream.
When joining a transaction with TRANS_JOIN_NOSTART, if we don't find a running transaction we end up creating one. This goes against the purpose of TRANS_JOIN_NOSTART which is to join a running transaction if its state is at or below the state TRANS_STATE_COMMIT_START, otherwise return an -ENOENT error and don't start a new transaction. So fix this to not create a new transaction if there's no running transaction at or below that state.
CC: stable@vger.kernel.org # 4.14+ Fixes: a6d155d2e363 ("Btrfs: fix deadlock between fiemap and transaction commits") Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/transaction.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
--- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -292,10 +292,11 @@ loop: spin_unlock(&fs_info->trans_lock);
/* - * If we are ATTACH, we just want to catch the current transaction, - * and commit it. If there is no transaction, just return ENOENT. + * If we are ATTACH or TRANS_JOIN_NOSTART, we just want to catch the + * current transaction, and commit it. If there is no transaction, just + * return ENOENT. */ - if (type == TRANS_ATTACH) + if (type == TRANS_ATTACH || type == TRANS_JOIN_NOSTART) return -ENOENT;
/*
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik josef@toxicpanda.com
commit e7f1326cc24e22b38afc3acd328480a1183f9e79 upstream.
One of the CI runs triggered the following panic
assertion failed: PagePrivate(page) && page->private, in fs/btrfs/subpage.c:229 ------------[ cut here ]------------ kernel BUG at fs/btrfs/subpage.c:229! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP CPU: 0 PID: 923660 Comm: btrfs Not tainted 6.5.0-rc3+ #1 pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : btrfs_subpage_assert+0xbc/0xf0 lr : btrfs_subpage_assert+0xbc/0xf0 sp : ffff800093213720 x29: ffff800093213720 x28: ffff8000932138b4 x27: 000000000c280000 x26: 00000001b5d00000 x25: 000000000c281000 x24: 000000000c281fff x23: 0000000000001000 x22: 0000000000000000 x21: ffffff42b95bf880 x20: ffff42b9528e0000 x19: 0000000000001000 x18: ffffffffffffffff x17: 667274622f736620 x16: 6e69202c65746176 x15: 0000000000000028 x14: 0000000000000003 x13: 00000000002672d7 x12: 0000000000000000 x11: ffffcd3f0ccd9204 x10: ffffcd3f0554ae50 x9 : ffffcd3f0379528c x8 : ffff800093213428 x7 : 0000000000000000 x6 : ffffcd3f091771e8 x5 : ffff42b97f333948 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 0000000000000000 x1 : ffff42b9556cde80 x0 : 000000000000004f Call trace: btrfs_subpage_assert+0xbc/0xf0 btrfs_subpage_set_dirty+0x38/0xa0 btrfs_page_set_dirty+0x58/0x88 relocate_one_page+0x204/0x5f0 relocate_file_extent_cluster+0x11c/0x180 relocate_data_extent+0xd0/0xf8 relocate_block_group+0x3d0/0x4e8 btrfs_relocate_block_group+0x2d8/0x490 btrfs_relocate_chunk+0x54/0x1a8 btrfs_balance+0x7f4/0x1150 btrfs_ioctl+0x10f0/0x20b8 __arm64_sys_ioctl+0x120/0x11d8 invoke_syscall.constprop.0+0x80/0xd8 do_el0_svc+0x6c/0x158 el0_svc+0x50/0x1b0 el0t_64_sync_handler+0x120/0x130 el0t_64_sync+0x194/0x198 Code: 91098021 b0007fa0 91346000 97e9c6d2 (d4210000)
This is the same problem outlined in 17b17fcd6d44 ("btrfs: set_page_extent_mapped after read_folio in btrfs_cont_expand") , and the fix is the same. I originally looked for the same pattern elsewhere in our code, but mistakenly skipped over this code because I saw the page cache readahead before we set_page_extent_mapped, not realizing that this was only in the !page case, that we can still end up with a !uptodate page and then do the btrfs_read_folio further down.
The fix here is the same as the above mentioned patch, move the set_page_extent_mapped call to after the btrfs_read_folio() block to make sure that we have the subpage blocksize stuff setup properly before using the page.
CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Josef Bacik josef@toxicpanda.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/relocation.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
--- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -3006,9 +3006,6 @@ static int relocate_one_page(struct inod if (!page) return -ENOMEM; } - ret = set_page_extent_mapped(page); - if (ret < 0) - goto release_page;
if (PageReadahead(page)) page_cache_async_readahead(inode->i_mapping, ra, NULL, @@ -3024,6 +3021,15 @@ static int relocate_one_page(struct inod } }
+ /* + * We could have lost page private when we dropped the lock to read the + * page above, make sure we set_page_extent_mapped here so we have any + * of the subpage blocksize stuff we need in place. + */ + ret = set_page_extent_mapped(page); + if (ret < 0) + goto release_page; + page_start = page_offset(page); page_end = page_start + PAGE_SIZE - 1;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Naohiro Aota naohiro.aota@wdc.com
commit 5b135b382a360f4c87cf8896d1465b0b07f10cb0 upstream.
Now that, we can re-enable metadata over-commit. As we moved the activation from the reservation time to the write time, we no longer need to ensure all the reserved bytes is properly activated.
Without the metadata over-commit, it suffers from lower performance because it needs to flush the delalloc items more often and allocate more block groups. Re-enabling metadata over-commit will solve the issue.
Fixes: 79417d040f4f ("btrfs: zoned: disable metadata overcommit for zoned") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Naohiro Aota naohiro.aota@wdc.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/space-info.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)
--- a/fs/btrfs/space-info.c +++ b/fs/btrfs/space-info.c @@ -389,11 +389,7 @@ int btrfs_can_overcommit(struct btrfs_fs return 0;
used = btrfs_space_info_used(space_info, true); - if (test_bit(BTRFS_FS_ACTIVE_ZONE_TRACKING, &fs_info->flags) && - (space_info->flags & BTRFS_BLOCK_GROUP_METADATA)) - avail = 0; - else - avail = calc_available_free_space(fs_info, space_info, flush); + avail = calc_available_free_space(fs_info, space_info, flush);
if (used + bytes < space_info->total_bytes + avail) return 1;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anand Jain anand.jain@oracle.com
commit d167aa76dc0683828588c25767da07fb549e4f48 upstream.
The function btrfs_validate_super() should verify the fsid in the provided superblock argument. Because, all its callers expect it to do that.
Such as in the following stack:
write_all_supers() sb = fs_info->super_for_commit; btrfs_validate_write_super(.., sb) btrfs_validate_super(.., sb, ..)
scrub_one_super() btrfs_validate_super(.., sb, ..)
And check_dev_super() btrfs_validate_super(.., sb, ..)
However, it currently verifies the fs_info::super_copy::fsid instead, which is not correct. Fix this using the correct fsid in the superblock argument.
CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Tested-by: Guilherme G. Piccoli gpiccoli@igalia.com Signed-off-by: Anand Jain anand.jain@oracle.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/disk-io.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2384,11 +2384,10 @@ int btrfs_validate_super(struct btrfs_fs ret = -EINVAL; }
- if (memcmp(fs_info->fs_devices->fsid, fs_info->super_copy->fsid, - BTRFS_FSID_SIZE)) { + if (memcmp(fs_info->fs_devices->fsid, sb->fsid, BTRFS_FSID_SIZE) != 0) { btrfs_err(fs_info, "superblock fsid doesn't match fsid of fs_devices: %pU != %pU", - fs_info->super_copy->fsid, fs_info->fs_devices->fsid); + sb->fsid, fs_info->fs_devices->fsid); ret = -EINVAL; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
commit 1dc4888e725dc748b82858984f2a5bd41efc5201 upstream.
Since commit e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure"), scrub no longer re-use the same path for extent tree search.
This can lead to unnecessary extent tree search, especially for the new stripe based scrub, as we have way more stripes to prepare.
This patch would re-introduce a shared path for extent tree search, and properly release it when the block group is scrubbed.
This change alone can improve scrub performance slightly by reducing the time spend preparing the stripe thus improving the queue depth.
Before (with regression):
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util nvme0n1p3 15578.00 993616.00 5.00 0.03 0.09 63.78 1.32 100.00
After (with this patch):
nvme0n1p3 15875.00 1013328.00 12.00 0.08 0.08 63.83 1.35 100.00
Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure") CC: stable@vger.kernel.org # 6.4+ Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/scrub.c | 41 +++++++++++++++++++++++++++++------------ 1 file changed, 29 insertions(+), 12 deletions(-)
--- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -175,6 +175,7 @@ struct scrub_ctx { struct scrub_stripe stripes[SCRUB_STRIPES_PER_SCTX]; struct scrub_stripe *raid56_data_stripes; struct btrfs_fs_info *fs_info; + struct btrfs_path extent_path; int first_free; int cur_stripe; atomic_t cancel_req; @@ -339,6 +340,8 @@ static noinline_for_stack struct scrub_c refcount_set(&sctx->refs, 1); sctx->is_dev_replace = is_dev_replace; sctx->fs_info = fs_info; + sctx->extent_path.search_commit_root = 1; + sctx->extent_path.skip_locking = 1; for (i = 0; i < SCRUB_STRIPES_PER_SCTX; i++) { int ret;
@@ -1468,6 +1471,7 @@ static void scrub_stripe_reset_bitmaps(s * Return <0 for error. */ static int scrub_find_fill_first_stripe(struct btrfs_block_group *bg, + struct btrfs_path *extent_path, struct btrfs_device *dev, u64 physical, int mirror_num, u64 logical_start, u32 logical_len, @@ -1477,7 +1481,6 @@ static int scrub_find_fill_first_stripe( struct btrfs_root *extent_root = btrfs_extent_root(fs_info, bg->start); struct btrfs_root *csum_root = btrfs_csum_root(fs_info, bg->start); const u64 logical_end = logical_start + logical_len; - struct btrfs_path path = { 0 }; u64 cur_logical = logical_start; u64 stripe_end; u64 extent_start; @@ -1493,14 +1496,13 @@ static int scrub_find_fill_first_stripe( /* The range must be inside the bg. */ ASSERT(logical_start >= bg->start && logical_end <= bg->start + bg->length);
- path.search_commit_root = 1; - path.skip_locking = 1; - - ret = find_first_extent_item(extent_root, &path, logical_start, logical_len); + ret = find_first_extent_item(extent_root, extent_path, logical_start, + logical_len); /* Either error or not found. */ if (ret) goto out; - get_extent_info(&path, &extent_start, &extent_len, &extent_flags, &extent_gen); + get_extent_info(extent_path, &extent_start, &extent_len, &extent_flags, + &extent_gen); if (extent_flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) stripe->nr_meta_extents++; if (extent_flags & BTRFS_EXTENT_FLAG_DATA) @@ -1528,7 +1530,7 @@ static int scrub_find_fill_first_stripe(
/* Fill the extent info for the remaining sectors. */ while (cur_logical <= stripe_end) { - ret = find_first_extent_item(extent_root, &path, cur_logical, + ret = find_first_extent_item(extent_root, extent_path, cur_logical, stripe_end - cur_logical + 1); if (ret < 0) goto out; @@ -1536,7 +1538,7 @@ static int scrub_find_fill_first_stripe( ret = 0; break; } - get_extent_info(&path, &extent_start, &extent_len, + get_extent_info(extent_path, &extent_start, &extent_len, &extent_flags, &extent_gen); if (extent_flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) stripe->nr_meta_extents++; @@ -1576,7 +1578,6 @@ static int scrub_find_fill_first_stripe( } set_bit(SCRUB_STRIPE_FLAG_INITIALIZED, &stripe->state); out: - btrfs_release_path(&path); return ret; }
@@ -1766,8 +1767,9 @@ static int queue_scrub_stripe(struct scr
/* We can queue one stripe using the remaining slot. */ scrub_reset_stripe(stripe); - ret = scrub_find_fill_first_stripe(bg, dev, physical, mirror_num, - logical, length, stripe); + ret = scrub_find_fill_first_stripe(bg, &sctx->extent_path, dev, + physical, mirror_num, logical, + length, stripe); /* Either >0 as no more extents or <0 for error. */ if (ret) return ret; @@ -1785,6 +1787,7 @@ static int scrub_raid56_parity_stripe(st struct btrfs_fs_info *fs_info = sctx->fs_info; struct btrfs_raid_bio *rbio; struct btrfs_io_context *bioc = NULL; + struct btrfs_path extent_path = { 0 }; struct bio *bio; struct scrub_stripe *stripe; bool all_empty = true; @@ -1795,6 +1798,14 @@ static int scrub_raid56_parity_stripe(st
ASSERT(sctx->raid56_data_stripes);
+ /* + * For data stripe search, we cannot re-use the same extent path, as + * the data stripe bytenr may be smaller than previous extent. Thus we + * have to use our own extent path. + */ + extent_path.search_commit_root = 1; + extent_path.skip_locking = 1; + for (int i = 0; i < data_stripes; i++) { int stripe_index; int rot; @@ -1809,7 +1820,7 @@ static int scrub_raid56_parity_stripe(st
scrub_reset_stripe(stripe); set_bit(SCRUB_STRIPE_FLAG_NO_REPORT, &stripe->state); - ret = scrub_find_fill_first_stripe(bg, + ret = scrub_find_fill_first_stripe(bg, &extent_path, map->stripes[stripe_index].dev, physical, 1, full_stripe_start + btrfs_stripe_nr_to_offset(i), BTRFS_STRIPE_LEN, stripe); @@ -1937,6 +1948,7 @@ static int scrub_raid56_parity_stripe(st bio_put(bio); btrfs_bio_counter_dec(fs_info);
+ btrfs_release_path(&extent_path); out: return ret; } @@ -2109,6 +2121,9 @@ static noinline_for_stack int scrub_stri u64 stripe_logical; int stop_loop = 0;
+ /* Extent_path should be released by now. */ + ASSERT(sctx->extent_path.nodes[0] == NULL); + scrub_blocked_if_needed(fs_info);
if (sctx->is_dev_replace && @@ -2227,6 +2242,8 @@ out: ret2 = flush_scrub_stripes(sctx); if (!ret) ret = ret2; + btrfs_release_path(&sctx->extent_path); + if (sctx->raid56_data_stripes) { for (int i = 0; i < nr_data_stripes(map); i++) release_scrub_stripe(&sctx->raid56_data_stripes[i]);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
commit 3c771c194402ffe20d4de68d9fc21e703179a9ce upstream.
One of the bottleneck of the new scrub code is the extra csum tree search.
The old code would only do the csum tree search for each scrub bio, which can be as large as 512KiB, thus they can afford to allocate a new path each time.
But the new scrub code is doing csum tree search for each stripe, which is only 64KiB, this means we'd better re-use the same csum path during each search.
This patch would introduce a per-sctx path for csum tree search, as we don't need to re-allocate the path every time we need to do a csum tree search.
With this change we can further improve the queue depth and improve the scrub read performance:
Before (with regression and cached extent tree path):
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util nvme0n1p3 15875.00 1013328.00 12.00 0.08 0.08 63.83 1.35 100.00
After (with both cached extent/csum tree path):
nvme0n1p3 17759.00 1133280.00 10.00 0.06 0.08 63.81 1.50 100.00
Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure") CC: stable@vger.kernel.org # 6.4+ Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/file-item.c | 36 +++++++++++++++++++++++------------- fs/btrfs/file-item.h | 6 +++--- fs/btrfs/raid56.c | 4 ++-- fs/btrfs/scrub.c | 29 +++++++++++++++++++---------- 4 files changed, 47 insertions(+), 28 deletions(-)
--- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -597,29 +597,37 @@ fail: * Each bit represents a sector. Thus caller should ensure @csum_buf passed * in is large enough to contain all csums. */ -int btrfs_lookup_csums_bitmap(struct btrfs_root *root, u64 start, u64 end, - u8 *csum_buf, unsigned long *csum_bitmap, - bool search_commit) +int btrfs_lookup_csums_bitmap(struct btrfs_root *root, struct btrfs_path *path, + u64 start, u64 end, u8 *csum_buf, + unsigned long *csum_bitmap) { struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_key key; - struct btrfs_path *path; struct extent_buffer *leaf; struct btrfs_csum_item *item; const u64 orig_start = start; + bool free_path = false; int ret;
ASSERT(IS_ALIGNED(start, fs_info->sectorsize) && IS_ALIGNED(end + 1, fs_info->sectorsize));
- path = btrfs_alloc_path(); - if (!path) - return -ENOMEM; - - if (search_commit) { - path->skip_locking = 1; - path->reada = READA_FORWARD; - path->search_commit_root = 1; + if (!path) { + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + free_path = true; + } + + /* Check if we can reuse the previous path. */ + if (path->nodes[0]) { + btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]); + + if (key.objectid == BTRFS_EXTENT_CSUM_OBJECTID && + key.type == BTRFS_EXTENT_CSUM_KEY && + key.offset <= start) + goto search_forward; + btrfs_release_path(path); }
key.objectid = BTRFS_EXTENT_CSUM_OBJECTID; @@ -656,6 +664,7 @@ int btrfs_lookup_csums_bitmap(struct btr } }
+search_forward: while (start <= end) { u64 csum_end;
@@ -712,7 +721,8 @@ int btrfs_lookup_csums_bitmap(struct btr } ret = 0; fail: - btrfs_free_path(path); + if (free_path) + btrfs_free_path(path); return ret; }
--- a/fs/btrfs/file-item.h +++ b/fs/btrfs/file-item.h @@ -57,9 +57,9 @@ int btrfs_lookup_csums_range(struct btrf int btrfs_lookup_csums_list(struct btrfs_root *root, u64 start, u64 end, struct list_head *list, int search_commit, bool nowait); -int btrfs_lookup_csums_bitmap(struct btrfs_root *root, u64 start, u64 end, - u8 *csum_buf, unsigned long *csum_bitmap, - bool search_commit); +int btrfs_lookup_csums_bitmap(struct btrfs_root *root, struct btrfs_path *path, + u64 start, u64 end, u8 *csum_buf, + unsigned long *csum_bitmap); void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode, const struct btrfs_path *path, struct btrfs_file_extent_item *fi, --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -2112,8 +2112,8 @@ static void fill_data_csums(struct btrfs goto error; }
- ret = btrfs_lookup_csums_bitmap(csum_root, start, start + len - 1, - rbio->csum_buf, rbio->csum_bitmap, false); + ret = btrfs_lookup_csums_bitmap(csum_root, NULL, start, start + len - 1, + rbio->csum_buf, rbio->csum_bitmap); if (ret < 0) goto error; if (bitmap_empty(rbio->csum_bitmap, len >> fs_info->sectorsize_bits)) --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -176,6 +176,7 @@ struct scrub_ctx { struct scrub_stripe *raid56_data_stripes; struct btrfs_fs_info *fs_info; struct btrfs_path extent_path; + struct btrfs_path csum_path; int first_free; int cur_stripe; atomic_t cancel_req; @@ -342,6 +343,8 @@ static noinline_for_stack struct scrub_c sctx->fs_info = fs_info; sctx->extent_path.search_commit_root = 1; sctx->extent_path.skip_locking = 1; + sctx->csum_path.search_commit_root = 1; + sctx->csum_path.skip_locking = 1; for (i = 0; i < SCRUB_STRIPES_PER_SCTX; i++) { int ret;
@@ -1472,6 +1475,7 @@ static void scrub_stripe_reset_bitmaps(s */ static int scrub_find_fill_first_stripe(struct btrfs_block_group *bg, struct btrfs_path *extent_path, + struct btrfs_path *csum_path, struct btrfs_device *dev, u64 physical, int mirror_num, u64 logical_start, u32 logical_len, @@ -1563,9 +1567,9 @@ static int scrub_find_fill_first_stripe( */ ASSERT(BITS_PER_LONG >= BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits);
- ret = btrfs_lookup_csums_bitmap(csum_root, stripe->logical, - stripe_end, stripe->csums, - &csum_bitmap, true); + ret = btrfs_lookup_csums_bitmap(csum_root, csum_path, + stripe->logical, stripe_end, + stripe->csums, &csum_bitmap); if (ret < 0) goto out; if (ret > 0) @@ -1767,9 +1771,9 @@ static int queue_scrub_stripe(struct scr
/* We can queue one stripe using the remaining slot. */ scrub_reset_stripe(stripe); - ret = scrub_find_fill_first_stripe(bg, &sctx->extent_path, dev, - physical, mirror_num, logical, - length, stripe); + ret = scrub_find_fill_first_stripe(bg, &sctx->extent_path, + &sctx->csum_path, dev, physical, + mirror_num, logical, length, stripe); /* Either >0 as no more extents or <0 for error. */ if (ret) return ret; @@ -1788,6 +1792,7 @@ static int scrub_raid56_parity_stripe(st struct btrfs_raid_bio *rbio; struct btrfs_io_context *bioc = NULL; struct btrfs_path extent_path = { 0 }; + struct btrfs_path csum_path = { 0 }; struct bio *bio; struct scrub_stripe *stripe; bool all_empty = true; @@ -1799,12 +1804,14 @@ static int scrub_raid56_parity_stripe(st ASSERT(sctx->raid56_data_stripes);
/* - * For data stripe search, we cannot re-use the same extent path, as - * the data stripe bytenr may be smaller than previous extent. Thus we - * have to use our own extent path. + * For data stripe search, we cannot re-use the same extent/csum paths, + * as the data stripe bytenr may be smaller than previous extent. Thus + * we have to use our own extent/csum paths. */ extent_path.search_commit_root = 1; extent_path.skip_locking = 1; + csum_path.search_commit_root = 1; + csum_path.skip_locking = 1;
for (int i = 0; i < data_stripes; i++) { int stripe_index; @@ -1820,7 +1827,7 @@ static int scrub_raid56_parity_stripe(st
scrub_reset_stripe(stripe); set_bit(SCRUB_STRIPE_FLAG_NO_REPORT, &stripe->state); - ret = scrub_find_fill_first_stripe(bg, &extent_path, + ret = scrub_find_fill_first_stripe(bg, &extent_path, &csum_path, map->stripes[stripe_index].dev, physical, 1, full_stripe_start + btrfs_stripe_nr_to_offset(i), BTRFS_STRIPE_LEN, stripe); @@ -1949,6 +1956,7 @@ static int scrub_raid56_parity_stripe(st btrfs_bio_counter_dec(fs_info);
btrfs_release_path(&extent_path); + btrfs_release_path(&csum_path); out: return ret; } @@ -2243,6 +2251,7 @@ out: if (!ret) ret = ret2; btrfs_release_path(&sctx->extent_path); + btrfs_release_path(&sctx->csum_path);
if (sctx->raid56_data_stripes) { for (int i = 0; i < nr_data_stripes(map); i++)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
commit ae76d8e3e1351aa1ba09cc68dab6866d356f2e17 upstream.
[REGRESSION] There are several regression reports about the scrub performance with v6.4 kernel.
On a PCIe 3.0 device, the old v6.3 kernel can go 3GB/s scrub speed, but v6.4 can only go 1GB/s, an obvious 66% performance drop.
[CAUSE] Iostat shows a very different behavior between v6.3 and v6.4 kernel:
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util nvme0n1p3 9731.00 3425544.00 17237.00 63.92 2.18 352.02 21.18 100.00 nvme0n1p3 15578.00 993616.00 5.00 0.03 0.09 63.78 1.32 100.00
The upper one is v6.3 while the lower one is v6.4.
There are several obvious differences:
- Very few read merges This turns out to be a behavior change that we no longer do bio plug/unplug.
- Very low aqu-sz This is due to the submit-and-wait behavior of flush_scrub_stripes(), and extra extent/csum tree search.
Both behaviors are not that obvious on SATA SSDs, as SATA SSDs have NCQ to merge the reads, while SATA SSDs can not handle high queue depth well either.
[FIX] For now this patch focuses on the read speed fix. Dev-replace replace speed needs more work.
For the read part, we go two directions to fix the problems:
- Re-introduce blk plug/unplug to merge read requests This is pretty simple, and the behavior is pretty easy to observe.
This would enlarge the average read request size to 512K.
- Introduce multi-group reads and no longer wait for each group Instead of the old behavior, which submits 8 stripes and waits for them, here we would enlarge the total number of stripes to 16 * 8. Which is 8M per device, the same limit as the old scrub in-flight bios size limit.
Now every time we fill a group (8 stripes), we submit them and continue to next stripes.
Only when the full 16 * 8 stripes are all filled, we submit the remaining ones (the last group), and wait for all groups to finish. Then submit the repair writes and dev-replace writes.
This should enlarge the queue depth.
This would greatly improve the merge rate (thus read block size) and queue depth:
Before (with regression, and cached extent/csum path):
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util nvme0n1p3 20666.00 1318240.00 10.00 0.05 0.08 63.79 1.63 100.00
After (with all patches applied):
nvme0n1p3 5165.00 2278304.00 30557.00 85.54 0.55 441.10 2.81 100.00
i.e. 1287 to 2224 MB/s.
CC: stable@vger.kernel.org # 6.4+ Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/scrub.c | 96 ++++++++++++++++++++++++++++++++++++++++--------------- 1 file changed, 71 insertions(+), 25 deletions(-)
--- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -43,9 +43,20 @@ struct scrub_ctx; /* * The following value only influences the performance. * - * This determines the batch size for stripe submitted in one go. + * This detemines how many stripes would be submitted in one go, + * which is 512KiB (BTRFS_STRIPE_LEN * SCRUB_STRIPES_PER_GROUP). */ -#define SCRUB_STRIPES_PER_SCTX 8 /* That would be 8 64K stripe per-device. */ +#define SCRUB_STRIPES_PER_GROUP 8 + +/* + * How many groups we have for each sctx. + * + * This would be 8M per device, the same value as the old scrub in-flight bios + * size limit. + */ +#define SCRUB_GROUPS_PER_SCTX 16 + +#define SCRUB_TOTAL_STRIPES (SCRUB_GROUPS_PER_SCTX * SCRUB_STRIPES_PER_GROUP)
/* * The following value times PAGE_SIZE needs to be large enough to match the @@ -172,7 +183,7 @@ struct scrub_stripe { };
struct scrub_ctx { - struct scrub_stripe stripes[SCRUB_STRIPES_PER_SCTX]; + struct scrub_stripe stripes[SCRUB_TOTAL_STRIPES]; struct scrub_stripe *raid56_data_stripes; struct btrfs_fs_info *fs_info; struct btrfs_path extent_path; @@ -317,10 +328,10 @@ static noinline_for_stack void scrub_fre if (!sctx) return;
- for (i = 0; i < SCRUB_STRIPES_PER_SCTX; i++) + for (i = 0; i < SCRUB_TOTAL_STRIPES; i++) release_scrub_stripe(&sctx->stripes[i]);
- kfree(sctx); + kvfree(sctx); }
static void scrub_put_ctx(struct scrub_ctx *sctx) @@ -335,7 +346,10 @@ static noinline_for_stack struct scrub_c struct scrub_ctx *sctx; int i;
- sctx = kzalloc(sizeof(*sctx), GFP_KERNEL); + /* Since sctx has inline 128 stripes, it can go beyond 64K easily. Use + * kvzalloc(). + */ + sctx = kvzalloc(sizeof(*sctx), GFP_KERNEL); if (!sctx) goto nomem; refcount_set(&sctx->refs, 1); @@ -345,7 +359,7 @@ static noinline_for_stack struct scrub_c sctx->extent_path.skip_locking = 1; sctx->csum_path.search_commit_root = 1; sctx->csum_path.skip_locking = 1; - for (i = 0; i < SCRUB_STRIPES_PER_SCTX; i++) { + for (i = 0; i < SCRUB_TOTAL_STRIPES; i++) { int ret;
ret = init_scrub_stripe(fs_info, &sctx->stripes[i]); @@ -1659,6 +1673,28 @@ static bool stripe_has_metadata_error(st return false; }
+static void submit_initial_group_read(struct scrub_ctx *sctx, + unsigned int first_slot, + unsigned int nr_stripes) +{ + struct blk_plug plug; + + ASSERT(first_slot < SCRUB_TOTAL_STRIPES); + ASSERT(first_slot + nr_stripes <= SCRUB_TOTAL_STRIPES); + + scrub_throttle_dev_io(sctx, sctx->stripes[0].dev, + btrfs_stripe_nr_to_offset(nr_stripes)); + blk_start_plug(&plug); + for (int i = 0; i < nr_stripes; i++) { + struct scrub_stripe *stripe = &sctx->stripes[first_slot + i]; + + /* Those stripes should be initialized. */ + ASSERT(test_bit(SCRUB_STRIPE_FLAG_INITIALIZED, &stripe->state)); + scrub_submit_initial_read(sctx, stripe); + } + blk_finish_plug(&plug); +} + static int flush_scrub_stripes(struct scrub_ctx *sctx) { struct btrfs_fs_info *fs_info = sctx->fs_info; @@ -1671,11 +1707,11 @@ static int flush_scrub_stripes(struct sc
ASSERT(test_bit(SCRUB_STRIPE_FLAG_INITIALIZED, &sctx->stripes[0].state));
- scrub_throttle_dev_io(sctx, sctx->stripes[0].dev, - btrfs_stripe_nr_to_offset(nr_stripes)); - for (int i = 0; i < nr_stripes; i++) { - stripe = &sctx->stripes[i]; - scrub_submit_initial_read(sctx, stripe); + /* Submit the stripes which are populated but not submitted. */ + if (nr_stripes % SCRUB_STRIPES_PER_GROUP) { + const int first_slot = round_down(nr_stripes, SCRUB_STRIPES_PER_GROUP); + + submit_initial_group_read(sctx, first_slot, nr_stripes - first_slot); }
for (int i = 0; i < nr_stripes; i++) { @@ -1755,21 +1791,19 @@ static void raid56_scrub_wait_endio(stru
static int queue_scrub_stripe(struct scrub_ctx *sctx, struct btrfs_block_group *bg, struct btrfs_device *dev, int mirror_num, - u64 logical, u32 length, u64 physical) + u64 logical, u32 length, u64 physical, + u64 *found_logical_ret) { struct scrub_stripe *stripe; int ret;
- /* No available slot, submit all stripes and wait for them. */ - if (sctx->cur_stripe >= SCRUB_STRIPES_PER_SCTX) { - ret = flush_scrub_stripes(sctx); - if (ret < 0) - return ret; - } + /* + * There should always be one slot left, as caller filling the last + * slot should flush them all. + */ + ASSERT(sctx->cur_stripe < SCRUB_TOTAL_STRIPES);
stripe = &sctx->stripes[sctx->cur_stripe]; - - /* We can queue one stripe using the remaining slot. */ scrub_reset_stripe(stripe); ret = scrub_find_fill_first_stripe(bg, &sctx->extent_path, &sctx->csum_path, dev, physical, @@ -1777,7 +1811,20 @@ static int queue_scrub_stripe(struct scr /* Either >0 as no more extents or <0 for error. */ if (ret) return ret; + if (found_logical_ret) + *found_logical_ret = stripe->logical; sctx->cur_stripe++; + + /* We filled one group, submit it. */ + if (sctx->cur_stripe % SCRUB_STRIPES_PER_GROUP == 0) { + const int first_slot = sctx->cur_stripe - SCRUB_STRIPES_PER_GROUP; + + submit_initial_group_read(sctx, first_slot, SCRUB_STRIPES_PER_GROUP); + } + + /* Last slot used, flush them all. */ + if (sctx->cur_stripe == SCRUB_TOTAL_STRIPES) + return flush_scrub_stripes(sctx); return 0; }
@@ -1990,6 +2037,7 @@ static int scrub_simple_mirror(struct sc path.skip_locking = 1; /* Go through each extent items inside the logical range */ while (cur_logical < logical_end) { + u64 found_logical; u64 cur_physical = physical + cur_logical - logical_start;
/* Canceled? */ @@ -2014,7 +2062,7 @@ static int scrub_simple_mirror(struct sc
ret = queue_scrub_stripe(sctx, bg, device, mirror_num, cur_logical, logical_end - cur_logical, - cur_physical); + cur_physical, &found_logical); if (ret > 0) { /* No more extent, just update the accounting */ sctx->stat.last_physical = physical + logical_length; @@ -2024,9 +2072,7 @@ static int scrub_simple_mirror(struct sc if (ret < 0) break;
- ASSERT(sctx->cur_stripe > 0); - cur_logical = sctx->stripes[sctx->cur_stripe - 1].logical - + BTRFS_STRIPE_LEN; + cur_logical = found_logical + BTRFS_STRIPE_LEN;
/* Don't hold CPU for too long time */ cond_resched();
'btrfs: scrub: fix grouping of read IO' seems to intorduce a -Wmaybe-uninitialized warning (which becomes fatal with the kernel's passed -Werror=...) with 6.5.9:
``` # CC fs/btrfs/scrub.o x86_64-pc-linux-gnu-gcc -Wp,-MMD,fs/btrfs/.scrub.o.d -nostdinc -I/var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/arch/x86/include -I./arch/x86/include/generated -I/var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/include -I./include -I/var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/include/uapi -I./include/generated/uapi -include /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/include/linux/compiler-version.h -include /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/include/linux/kconfig.h -include /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/include/linux/compiler_types.h -D__KERNEL__ -Werror -fmacro-prefix-map=/var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/= -std=gnu11 -fshort-wchar -funsigned-char -fno-common -fno-PIE -fno-strict-aliasing -Wall -Wundef - Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type -Werror=strict-prototypes -Wno-format-security -Wno-trigraphs -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -fcf-protection=branch -fno-jump-tables -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -march=core2 -mno-red-zone -mcmodel=kernel -Wno-sign-compare -fno-asynchronous-unwind-tables -mindirect-branch=thunk-extern -mindirect-branch-register -mindirect-branch-cs-prefix -mfunction-return=thunk-extern -fno-jump-tables -mharden-sls=all -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation -Wno-format-overflow -Wno-address-of-packed-member -O2 -fno-allow-store-data-races -Wframe-larger-than=2048 -fstack-protector-strong -Wno-main -Wno-unused-but-set-variable -Wno-unused-const-variable -Wno-dangling-pointer -fomit-frame-pointer -ftrivial-auto-var-init=zero -fno-stack-clash-protection -fzero-call-used-regs=used-gpr -falign- functions=16 -Wvla -Wno-pointer-sign -Wcast-function-type -fstrict-flex-arrays=3 -Wno-stringop-truncation -Wno-stringop-overflow -Wno-restrict -Wno-maybe-uninitialized -Wno-array-bounds -Wno-alloc-size-larger-than -Wimplicit-fallthrough=5 -fno-strict-overflow -fno-stack-check -fconserve-stack -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init -Wno-packed-not-aligned -DRANDSTRUCT -fplugin=./scripts/gcc-plugins/randomize_layout_plugin.so -fplugin-arg-randomize_layout_plugin-performance-mode -fplugin=./scripts/gcc-plugins/latent_entropy_plugin.so -fplugin=./scripts/gcc-plugins/structleak_plugin.so -fplugin=./scripts/gcc-plugins/stackleak_plugin.so -DLATENT_ENTROPY_PLUGIN -DSTRUCTLEAK_PLUGIN -DSTACKLEAK_PLUGIN -fplugin-arg-stackleak_plugin-track-min-size=100 -fplugin-arg-stackleak_plugin-arch=x86 -Wextra -Wunused -Wno-unused-parameter -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wmissing-include-dirs -Wunused-bu t-set-variable -Wunused-const-variable -Wpacked-not-aligned -Wstringop-truncation -Wmaybe-uninitialized -Wno-missing-field-initializers -Wno-sign-compare -Wno-type-limits -Wno-shift-negative-value -I /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs -I ./fs/btrfs -DKBUILD_MODFILE='"fs/btrfs/btrfs"' -DKBUILD_BASENAME='"scrub"' -DKBUILD_MODNAME='"btrfs"' -D__KBUILD_MODNAME=kmod_btrfs -c -o fs/btrfs/scrub.o /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c: In function ‘scrub_simple_mirror.isra’: /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2075:29: error: ‘found_logical’ may be used uninitialized [-Werror=maybe-uninitialized[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wmaybe-uniniti...]] 2075 | cur_logical = found_logical + BTRFS_STRIPE_LEN; /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2040:21: note: ‘found_logical’ was declared here 2040 | u64 found_logical; | ^~~~~~~~~~~~~ ```
On 2023-10-26 15:31, Sam James wrote:
'btrfs: scrub: fix grouping of read IO' seems to intorduce a -Wmaybe-uninitialized warning (which becomes fatal with the kernel's passed -Werror=...) with 6.5.9:
/var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c: In function ‘scrub_simple_mirror.isra’: /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2075:29: error: ‘found_logical’ may be used uninitialized [-Werror=maybe-uninitialized[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wmaybe-uninitialized]] 2075 | cur_logical = found_logical + BTRFS_STRIPE_LEN; /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2040:21: note: ‘found_logical’ was declared here 2040 | u64 found_logical; | ^~~~~~~~~~~~~
Good find! found_logical is passed by reference to queue_scrub_stripe(..) (inlined) where it is used without ever being set:
... /* Either >0 as no more extents or <0 for error. */ if (ret) return ret; if (found_logical_ret) *found_logical_ret = stripe->logical; sctx->cur_stripe++; ...
Something is missing here, and somehow I don't think it's just the top-level initialisation of found_logical.
-h
On 2023/10/27 00:30, Holger Hoffstätte wrote:
On 2023-10-26 15:31, Sam James wrote:
'btrfs: scrub: fix grouping of read IO' seems to intorduce a -Wmaybe-uninitialized warning (which becomes fatal with the kernel's passed -Werror=...) with 6.5.9:
/var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c: In function ‘scrub_simple_mirror.isra’: /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2075:29: error: ‘found_logical’ may be used uninitialized [-Werror=maybe-uninitialized[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wmaybe-uninitialized]] 2075 | cur_logical = found_logical + BTRFS_STRIPE_LEN; /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2040:21: note: ‘found_logical’ was declared here 2040 | u64 found_logical; | ^~~~~~~~~~~~~
Good find! found_logical is passed by reference to queue_scrub_stripe(..) (inlined) where it is used without ever being set:
... /* Either >0 as no more extents or <0 for error. */ if (ret) return ret; if (found_logical_ret) *found_logical_ret = stripe->logical; sctx->cur_stripe++; ...
Something is missing here, and somehow I don't think it's just the top-level initialisation of found_logical.
This looks like a false alert for me.
@found_logical is intentionally uninitialized to catch any uninitialized usage by compiler.
It would be set by queue_scrub_stripe() when there is any stripe found.
If there is no stripe found, queue_scrub_stripe() would return >0, then we know there is no more extent and break the loop. If there is any error, we error out too, thus no problem with that.
So to me this looks like a false alert.
Mind to share the compiler and its version?
Thanks, Qu
-h
Qu Wenruo wqu@suse.com writes:
On 2023/10/27 00:30, Holger Hoffstätte wrote:
On 2023-10-26 15:31, Sam James wrote:
'btrfs: scrub: fix grouping of read IO' seems to intorduce a -Wmaybe-uninitialized warning (which becomes fatal with the kernel's passed -Werror=...) with 6.5.9:
/var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c: In function ‘scrub_simple_mirror.isra’: /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2075:29: error: ‘found_logical’ may be used uninitialized [-Werror=maybe-uninitialized[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wmaybe-uninitialized]] 2075 | cur_logical = found_logical + BTRFS_STRIPE_LEN; /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2040:21: note: ‘found_logical’ was declared here 2040 | u64 found_logical; | ^~~~~~~~~~~~~
Good find! found_logical is passed by reference to queue_scrub_stripe(..) (inlined) where it is used without ever being set: ... /* Either >0 as no more extents or <0 for error. */ if (ret) return ret; if (found_logical_ret) *found_logical_ret = stripe->logical; sctx->cur_stripe++; ... Something is missing here, and somehow I don't think it's just the top-level initialisation of found_logical.
This looks like a false alert for me.
@found_logical is intentionally uninitialized to catch any uninitialized usage by compiler.
I feel like this sort of thing is going to be inevitable with -Wmaybe-uninitialized.
It would be set by queue_scrub_stripe() when there is any stripe found.
If there is no stripe found, queue_scrub_stripe() would return >0, then we know there is no more extent and break the loop. If there is any error, we error out too, thus no problem with that.
So to me this looks like a false alert.
Mind to share the compiler and its version?
Sure, GCC 14.0.0 20231022 (experimental). Sorry, I'd meant to include that in the original post..
Thanks, Qu
-h
On 2023/10/27 07:42, Sam James wrote:
Qu Wenruo wqu@suse.com writes:
On 2023/10/27 00:30, Holger Hoffstätte wrote:
On 2023-10-26 15:31, Sam James wrote:
'btrfs: scrub: fix grouping of read IO' seems to intorduce a -Wmaybe-uninitialized warning (which becomes fatal with the kernel's passed -Werror=...) with 6.5.9:
/var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c: In function ‘scrub_simple_mirror.isra’: /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2075:29: error: ‘found_logical’ may be used uninitialized [-Werror=maybe-uninitialized[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wmaybe-uninitialized]] 2075 | cur_logical = found_logical + BTRFS_STRIPE_LEN; /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2040:21: note: ‘found_logical’ was declared here 2040 | u64 found_logical; | ^~~~~~~~~~~~~
Good find! found_logical is passed by reference to queue_scrub_stripe(..) (inlined) where it is used without ever being set: ... /* Either >0 as no more extents or <0 for error. */ if (ret) return ret; if (found_logical_ret) *found_logical_ret = stripe->logical; sctx->cur_stripe++; ... Something is missing here, and somehow I don't think it's just the top-level initialisation of found_logical.
This looks like a false alert for me.
@found_logical is intentionally uninitialized to catch any uninitialized usage by compiler.
I feel like this sort of thing is going to be inevitable with -Wmaybe-uninitialized.
Indeed, it can be tricky for compilers to handle.
But it's not that frequent, thus the btrfs community is mostly fine to handle those reports manually.
It would be set by queue_scrub_stripe() when there is any stripe found.
If there is no stripe found, queue_scrub_stripe() would return >0, then we know there is no more extent and break the loop. If there is any error, we error out too, thus no problem with that.
So to me this looks like a false alert.
Mind to share the compiler and its version?
Sure, GCC 14.0.0 20231022 (experimental). Sorry, I'd meant to include that in the original post..
Wow, the first time I can not catch up on the toolchain version...
And considering it's still experimental, I'd say if we got another report on this particular call site, we may consider add a fix for it, by simply initializing @found_logical to zero.
Thanks, Qu
Thanks, Qu
-h
On 2023-10-26 23:01, Qu Wenruo wrote:
On 2023/10/27 00:30, Holger Hoffstätte wrote:
On 2023-10-26 15:31, Sam James wrote:
'btrfs: scrub: fix grouping of read IO' seems to intorduce a -Wmaybe-uninitialized warning (which becomes fatal with the kernel's passed -Werror=...) with 6.5.9:
/var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c: In function ‘scrub_simple_mirror.isra’: /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2075:29: error: ‘found_logical’ may be used uninitialized [-Werror=maybe-uninitialized[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wmaybe-uninitialized]] 2075 | cur_logical = found_logical + BTRFS_STRIPE_LEN; /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2040:21: note: ‘found_logical’ was declared here 2040 | u64 found_logical; | ^~~~~~~~~~~~~
Good find! found_logical is passed by reference to queue_scrub_stripe(..) (inlined) where it is used without ever being set:
... /* Either >0 as no more extents or <0 for error. */ if (ret) return ret; if (found_logical_ret) *found_logical_ret = stripe->logical; sctx->cur_stripe++; ...
Something is missing here, and somehow I don't think it's just the top-level initialisation of found_logical.
This looks like a false alert for me.
@found_logical is intentionally uninitialized to catch any uninitialized usage by compiler.
It would be set by queue_scrub_stripe() when there is any stripe found.
Can you show me where the reference is set before the quoted if block? The warning happens because according to the compiler this is exactly not what's happening, and I did not see any initializing write either.
thanks, Holger
On 2023/10/27 17:25, Holger Hoffstätte wrote:
On 2023-10-26 23:01, Qu Wenruo wrote:
On 2023/10/27 00:30, Holger Hoffstätte wrote:
On 2023-10-26 15:31, Sam James wrote:
'btrfs: scrub: fix grouping of read IO' seems to intorduce a -Wmaybe-uninitialized warning (which becomes fatal with the kernel's passed -Werror=...) with 6.5.9:
/var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c: In function ‘scrub_simple_mirror.isra’: /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2075:29: error: ‘found_logical’ may be used uninitialized [-Werror=maybe-uninitialized[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wmaybe-uninitialized]] 2075 | cur_logical = found_logical + BTRFS_STRIPE_LEN; /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2040:21: note: ‘found_logical’ was declared here 2040 | u64 found_logical; | ^~~~~~~~~~~~~
Good find! found_logical is passed by reference to queue_scrub_stripe(..) (inlined) where it is used without ever being set:
... /* Either >0 as no more extents or <0 for error. */ if (ret) return ret; if (found_logical_ret) *found_logical_ret = stripe->logical; sctx->cur_stripe++; ...
Something is missing here, and somehow I don't think it's just the top-level initialisation of found_logical.
This looks like a false alert for me.
@found_logical is intentionally uninitialized to catch any uninitialized usage by compiler.
It would be set by queue_scrub_stripe() when there is any stripe found.
Can you show me where the reference is set before the quoted if block?
Sure.
Firstly inside queue_scrub_stripe():
``` ret = scrub_find_fill_first_stripe(bg, &sctx->extent_path, &sctx->csum_path, dev, physical, mirror_num, logical, length, stripe); /* Either >0 as no more extents or <0 for error. */ if (ret) return ret; if (found_logical_ret) *found_logical_ret = stripe->logical; ```
In this case, we would only set @found_logical_ret to the found stripe's logical.
Either we got ret > 0, meaning no more extent in the chunk, or we got some critical error.
Then back to scrub_simple_mirrors():
``` ret = queue_scrub_stripe(sctx, bg, device, mirror_num, cur_logical, logical_end - cur_logical, cur_physical, &found_logical); if (ret > 0) { /* No more extent, just update the accounting */ sctx->stat.last_physical = physical + logical_length; ret = 0; break; } if (ret < 0) break;
cur_logical = found_logical + BTRFS_STRIPE_LEN; ```
We got to the if block I mentioned.
Either we got ret != 0, then we would break out the whole loop of scrub_simple_mirror(). Or we got ret == 0, which would initialize @found_logical.
Thanks, Qu
The warning happens because according to the compiler this is exactly not what's happening, and I did not see any initializing write either.
thanks, Holger
On 2023/10/27 17:30, Qu Wenruo wrote:
On 2023/10/27 17:25, Holger Hoffstätte wrote:
On 2023-10-26 23:01, Qu Wenruo wrote:
On 2023/10/27 00:30, Holger Hoffstätte wrote:
On 2023-10-26 15:31, Sam James wrote:
'btrfs: scrub: fix grouping of read IO' seems to intorduce a -Wmaybe-uninitialized warning (which becomes fatal with the kernel's passed -Werror=...) with 6.5.9:
/var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c: In function ‘scrub_simple_mirror.isra’: /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2075:29: error: ‘found_logical’ may be used uninitialized [-Werror=maybe-uninitialized[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wmaybe-uninitialized]] 2075 | cur_logical = found_logical + BTRFS_STRIPE_LEN; /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2040:21: note: ‘found_logical’ was declared here 2040 | u64 found_logical; | ^~~~~~~~~~~~~
Good find! found_logical is passed by reference to queue_scrub_stripe(..) (inlined) where it is used without ever being set:
... /* Either >0 as no more extents or <0 for error. */ if (ret) return ret; if (found_logical_ret) *found_logical_ret = stripe->logical; sctx->cur_stripe++; ...
Something is missing here, and somehow I don't think it's just the top-level initialisation of found_logical.
This looks like a false alert for me.
@found_logical is intentionally uninitialized to catch any uninitialized usage by compiler.
It would be set by queue_scrub_stripe() when there is any stripe found.
Can you show me where the reference is set before the quoted if block?
Sure.
Firstly inside queue_scrub_stripe():
ret = scrub_find_fill_first_stripe(bg, &sctx->extent_path, &sctx->csum_path, dev, physical, mirror_num, logical, length, stripe); /* Either >0 as no more extents or <0 for error. */ if (ret) return ret; if (found_logical_ret) *found_logical_ret = stripe->logical;
In this case, we would only set @found_logical_ret to the found stripe's logical.
Either we got ret > 0, meaning no more extent in the chunk, or we got some critical error.
Sorry, missing the latter part of the sentence:
In either case, the caller would handle it.
Then back to scrub_simple_mirrors():
ret = queue_scrub_stripe(sctx, bg, device, mirror_num, cur_logical, logical_end - cur_logical, cur_physical, &found_logical); if (ret > 0) { /* No more extent, just update the accounting */ sctx->stat.last_physical = physical + logical_length; ret = 0; break; } if (ret < 0) break; cur_logical = found_logical + BTRFS_STRIPE_LEN;
We got to the if block I mentioned.
Either we got ret != 0, then we would break out the whole loop of scrub_simple_mirror(). Or we got ret == 0, which would initialize @found_logical.
Thanks, Qu
The warning happens because according to the compiler this is exactly not what's happening, and I did not see any initializing write either.
thanks, Holger
On 2023-10-27 09:00, Qu Wenruo wrote:
On 2023/10/27 17:25, Holger Hoffstätte wrote:
On 2023-10-26 23:01, Qu Wenruo wrote:
On 2023/10/27 00:30, Holger Hoffstätte wrote:
On 2023-10-26 15:31, Sam James wrote:
'btrfs: scrub: fix grouping of read IO' seems to intorduce a -Wmaybe-uninitialized warning (which becomes fatal with the kernel's passed -Werror=...) with 6.5.9:
/var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c: In function ‘scrub_simple_mirror.isra’: /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2075:29: error: ‘found_logical’ may be used uninitialized [-Werror=maybe-uninitialized[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wmaybe-uninitialized]] 2075 | cur_logical = found_logical + BTRFS_STRIPE_LEN; /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2040:21: note: ‘found_logical’ was declared here 2040 | u64 found_logical; | ^~~~~~~~~~~~~
Good find! found_logical is passed by reference to queue_scrub_stripe(..) (inlined) where it is used without ever being set:
... /* Either >0 as no more extents or <0 for error. */ if (ret) return ret; if (found_logical_ret) *found_logical_ret = stripe->logical; sctx->cur_stripe++; ...
Something is missing here, and somehow I don't think it's just the top-level initialisation of found_logical.
This looks like a false alert for me.
@found_logical is intentionally uninitialized to catch any uninitialized usage by compiler.
It would be set by queue_scrub_stripe() when there is any stripe found.
Can you show me where the reference is set before the quoted if block?
Sure.
Firstly inside queue_scrub_stripe():
ret = scrub_find_fill_first_stripe(bg, &sctx->extent_path, &sctx->csum_path, dev, physical, mirror_num, logical, length, stripe); /* Either >0 as no more extents or <0 for error. */ if (ret) return ret; if (found_logical_ret) *found_logical_ret = stripe->logical;
In this case, we would only set @found_logical_ret to the found stripe's logical.
Either we got ret > 0, meaning no more extent in the chunk, or we got some critical error.
Then back to scrub_simple_mirrors():
I think I now understand the comment. This is a terrible foot gun, and in fact shows that this is not a false alert at all.
Within queue_scrub_stripe() you are "possibly" using (checking) a reference that has not been initialized, neither in the caller nor within queue_scrub_stripe(). To the compiler this reference may not exist yet (even if we know that it is on the stack..) and point to la-la land. The fact that the logic depends on ret always being != 0 is not visible to the compiler, so the warning is completely correct: 0 is a valid possible value for ret, so this potentially allows the uninitialized read to happen.
What happens after that is not the point. You cannot safely check an uninitialized reference, even when the control flow "cannot happen" due to a data dependency.
It seems to me the safest way forward is to simply initialize found_logical in the parent and remove the "if (found_logical_ret)" check, since it does nothing useful. Would that work?
thanks, Holger
On 2023/10/27 18:22, Holger Hoffstätte wrote:
On 2023-10-27 09:00, Qu Wenruo wrote:
On 2023/10/27 17:25, Holger Hoffstätte wrote:
On 2023-10-26 23:01, Qu Wenruo wrote:
On 2023/10/27 00:30, Holger Hoffstätte wrote:
On 2023-10-26 15:31, Sam James wrote:
'btrfs: scrub: fix grouping of read IO' seems to intorduce a -Wmaybe-uninitialized warning (which becomes fatal with the kernel's passed -Werror=...) with 6.5.9:
/var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c: In function ‘scrub_simple_mirror.isra’: /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2075:29: error: ‘found_logical’ may be used uninitialized [-Werror=maybe-uninitialized[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wmaybe-uninitialized]] 2075 | cur_logical = found_logical + BTRFS_STRIPE_LEN; /var/tmp/portage/sys-kernel/gentoo-kernel-6.5.9/work/linux-6.5/fs/btrfs/scrub.c:2040:21: note: ‘found_logical’ was declared here 2040 | u64 found_logical; | ^~~~~~~~~~~~~
Good find! found_logical is passed by reference to queue_scrub_stripe(..) (inlined) where it is used without ever being set:
... /* Either >0 as no more extents or <0 for error. */ if (ret) return ret; if (found_logical_ret) *found_logical_ret = stripe->logical; sctx->cur_stripe++; ...
Something is missing here, and somehow I don't think it's just the top-level initialisation of found_logical.
This looks like a false alert for me.
@found_logical is intentionally uninitialized to catch any uninitialized usage by compiler.
It would be set by queue_scrub_stripe() when there is any stripe found.
Can you show me where the reference is set before the quoted if block?
Sure.
Firstly inside queue_scrub_stripe():
ret = scrub_find_fill_first_stripe(bg, &sctx->extent_path, &sctx->csum_path, dev, physical, mirror_num, logical, length, stripe); /* Either >0 as no more extents or <0 for error. */ if (ret) return ret; if (found_logical_ret) *found_logical_ret = stripe->logical;
In this case, we would only set @found_logical_ret to the found stripe's logical.
Either we got ret > 0, meaning no more extent in the chunk, or we got some critical error.
Then back to scrub_simple_mirrors():
I think I now understand the comment. This is a terrible foot gun, and in fact shows that this is not a false alert at all.
Within queue_scrub_stripe() you are "possibly" using (checking) a reference that has not been initialized, neither in the caller nor within queue_scrub_stripe(). To the compiler this reference may not exist yet (even if we know that it is on the stack..) and point to la-la land. The fact that the logic depends on ret always being != 0 is not visible to the compiler, so the warning is completely correct: 0 is a valid possible value for ret, so this potentially allows the uninitialized read to happen.
What happens after that is not the point. You cannot safely check an uninitialized reference, even when the control flow "cannot happen" due to a data dependency.
It seems to me the safest way forward is to simply initialize found_logical in the parent and remove the "if (found_logical_ret)" check, since it does nothing useful. Would that work?
Sure, that sounds pretty reasonable.
The "if (found_logical_ret)" is totally duplicated, if following the regular btrfs way, I'd convert it to an "ASSERT(found_logical_ret);" at the beginning of the function queue_scrub_stripe().
In that case, I'm totally fine to submit a patch to fix this warning.
Thanks for the report, Qu
thanks, Holger
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Liu Ying victor.liu@nxp.com
commit aa656d48e871a1b062e1bbf9474d8b831c35074c upstream.
When disabling overlay plane in mxsfb_plane_overlay_atomic_update(), overlay plane's framebuffer pointer is NULL. So, dereferencing it would cause a kernel Oops(NULL pointer dereferencing). Fix the issue by disabling overlay plane in mxsfb_plane_overlay_atomic_disable() instead.
Fixes: cb285a5348e7 ("drm: mxsfb: Replace mxsfb_get_fb_paddr() with drm_fb_cma_get_gem_addr()") Cc: stable@vger.kernel.org # 5.19+ Signed-off-by: Liu Ying victor.liu@nxp.com Reviewed-by: Marek Vasut marex@denx.de Signed-off-by: Marek Vasut marex@denx.de Link: https://patchwork.freedesktop.org/patch/msgid/20230612092359.784115-1-victor... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/mxsfb/mxsfb_kms.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/drivers/gpu/drm/mxsfb/mxsfb_kms.c +++ b/drivers/gpu/drm/mxsfb/mxsfb_kms.c @@ -611,6 +611,14 @@ static void mxsfb_plane_overlay_atomic_u writel(ctrl, mxsfb->base + LCDC_AS_CTRL); }
+static void mxsfb_plane_overlay_atomic_disable(struct drm_plane *plane, + struct drm_atomic_state *state) +{ + struct mxsfb_drm_private *mxsfb = to_mxsfb_drm_private(plane->dev); + + writel(0, mxsfb->base + LCDC_AS_CTRL); +} + static bool mxsfb_format_mod_supported(struct drm_plane *plane, uint32_t format, uint64_t modifier) @@ -626,6 +634,7 @@ static const struct drm_plane_helper_fun static const struct drm_plane_helper_funcs mxsfb_plane_overlay_helper_funcs = { .atomic_check = mxsfb_plane_atomic_check, .atomic_update = mxsfb_plane_overlay_atomic_update, + .atomic_disable = mxsfb_plane_overlay_atomic_disable, };
static const struct drm_plane_funcs mxsfb_plane_funcs = {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: William Zhang william.zhang@broadcom.com
commit e66dd317194daae0475fe9e5577c80aa97f16cb9 upstream.
When executing a NAND command within the panic write path, wait for any pending command instead of calling BUG_ON to avoid crashing while already crashing.
Fixes: 27c5b17cd1b1 ("mtd: nand: add NAND driver "library" for Broadcom STB NAND controller") Signed-off-by: William Zhang william.zhang@broadcom.com Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Reviewed-by: Kursad Oney kursad.oney@broadcom.com Reviewed-by: Kamal Dasu kamal.dasu@broadcom.com Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/20230706182909.79151-4-william.zhang@broad... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/brcmnand/brcmnand.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
--- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c +++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c @@ -1592,7 +1592,17 @@ static void brcmnand_send_cmd(struct brc
dev_dbg(ctrl->dev, "send native cmd %d addr 0x%llx\n", cmd, cmd_addr);
- BUG_ON(ctrl->cmd_pending != 0); + /* + * If we came here through _panic_write and there is a pending + * command, try to wait for it. If it times out, rather than + * hitting BUG_ON, just return so we don't crash while crashing. + */ + if (oops_in_progress) { + if (ctrl->cmd_pending && + bcmnand_ctrl_poll_status(ctrl, NAND_CTRL_RDY, NAND_CTRL_RDY, 0)) + return; + } else + BUG_ON(ctrl->cmd_pending != 0); ctrl->cmd_pending = cmd;
ret = bcmnand_ctrl_poll_status(ctrl, NAND_CTRL_RDY, NAND_CTRL_RDY, 0);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: William Zhang william.zhang@broadcom.com
commit 5d53244186c9ac58cb88d76a0958ca55b83a15cd upstream.
When the oob buffer length is not in multiple of words, the oob write function does out-of-bounds read on the oob source buffer at the last iteration. Fix that by always checking length limit on the oob buffer read and fill with 0xff when reaching the end of the buffer to the oob registers.
Fixes: 27c5b17cd1b1 ("mtd: nand: add NAND driver "library" for Broadcom STB NAND controller") Signed-off-by: William Zhang william.zhang@broadcom.com Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/20230706182909.79151-5-william.zhang@broad... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/brcmnand/brcmnand.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)
--- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c +++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c @@ -1461,19 +1461,33 @@ static int write_oob_to_regs(struct brcm const u8 *oob, int sas, int sector_1k) { int tbytes = sas << sector_1k; - int j; + int j, k = 0; + u32 last = 0xffffffff; + u8 *plast = (u8 *)&last;
/* Adjust OOB values for 1K sector size */ if (sector_1k && (i & 0x01)) tbytes = max(0, tbytes - (int)ctrl->max_oob); tbytes = min_t(int, tbytes, ctrl->max_oob);
- for (j = 0; j < tbytes; j += 4) + /* + * tbytes may not be multiple of words. Make sure we don't read out of + * the boundary and stop at last word. + */ + for (j = 0; (j + 3) < tbytes; j += 4) oob_reg_write(ctrl, j, (oob[j + 0] << 24) | (oob[j + 1] << 16) | (oob[j + 2] << 8) | (oob[j + 3] << 0)); + + /* handle the remaing bytes */ + while (j < tbytes) + plast[k++] = oob[j++]; + + if (tbytes & 0x3) + oob_reg_write(ctrl, (tbytes & ~0x3), (__force u32)cpu_to_be32(last)); + return tbytes; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Walleij linus.walleij@linaro.org
commit 83e824a4a595132f9bd7ac4f5afff857bfc5991e upstream.
The Winbond "w25q128" (actual vendor name W25Q128JV) has exactly the same flags as the sibling device "w25q128jv". The devices both require unlocking to enable write access.
The actual product naming between devices vs the Linux strings in winbond.c:
0xef4018: "w25q128" W25Q128JV-IN/IQ/JQ 0xef7018: "w25q128jv" W25Q128JV-IM/JM
The latter device, "w25q128jv" supports features named DTQ and QPI, otherwise it is the same.
Not having the right flags has the annoying side effect that write access does not work.
After this patch I can write to the flash on the Inteno XG6846 router.
The flash memory also supports dual and quad SPI modes. This does not currently manifest, but by turning on SFDP parsing, the right SPI modes are emitted in /sys/kernel/debug/spi-nor/spi1.0/capabilities for this chip, so we also turn on this.
Since we now have determined that SFDP parsing works on the device, we also detect the geometry using SFDP.
After this dmesg and sysfs says: [ 1.062401] spi-nor spi1.0: w25q128 (16384 Kbytes) cat erasesize 65536 (16384*1024)/65536 = 256 sectors
spi-nor sysfs: cat jedec_id ef4018 cat manufacturer winbond cat partname w25q128 hexdump -v -C sfdp 00000000 53 46 44 50 05 01 00 ff 00 05 01 10 80 00 00 ff 00000010 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00000020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00000030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00000040 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00000050 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00000060 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00000070 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00000080 e5 20 f9 ff ff ff ff 07 44 eb 08 6b 08 3b 42 bb 00000090 fe ff ff ff ff ff 00 00 ff ff 40 eb 0c 20 0f 52 000000a0 10 d8 00 00 36 02 a6 00 82 ea 14 c9 e9 63 76 33 000000b0 7a 75 7a 75 f7 a2 d5 5c 19 f7 4d ff e9 30 f8 80
Cc: stable@vger.kernel.org Suggested-by: Michael Walle michael@walle.cc Reviewed-by: Michael Walle michael@walle.cc Signed-off-by: Linus Walleij linus.walleij@linaro.org Link: https://lore.kernel.org/r/20230718-spi-nor-winbond-w25q128-v5-1-a73653ee46c3... Signed-off-by: Tudor Ambarus tudor.ambarus@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/spi-nor/winbond.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/mtd/spi-nor/winbond.c +++ b/drivers/mtd/spi-nor/winbond.c @@ -120,8 +120,9 @@ static const struct flash_info winbond_n NO_SFDP_FLAGS(SECT_4K) }, { "w25q80bl", INFO(0xef4014, 0, 64 * 1024, 16) NO_SFDP_FLAGS(SECT_4K) }, - { "w25q128", INFO(0xef4018, 0, 64 * 1024, 256) - NO_SFDP_FLAGS(SECT_4K) }, + { "w25q128", INFO(0xef4018, 0, 0, 0) + PARSE_SFDP + FLAGS(SPI_NOR_HAS_LOCK | SPI_NOR_HAS_TB) }, { "w25q256", INFO(0xef4019, 0, 64 * 1024, 512) NO_SFDP_FLAGS(SECT_4K | SPI_NOR_DUAL_READ | SPI_NOR_QUAD_READ) .fixups = &w25q256_fixups },
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: William Zhang william.zhang@broadcom.com
commit 9cc0a598b944816f2968baf2631757f22721b996 upstream.
If system is busy during the command status polling function, the driver may not get the chance to poll the status register till the end of time out and return the premature status. Do a final check after time out happens to ensure reading the correct status.
Fixes: 9d2ee0a60b8b ("mtd: nand: brcmnand: Check flash #WP pin status before nand erase/program") Signed-off-by: William Zhang william.zhang@broadcom.com Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/20230706182909.79151-3-william.zhang@broad... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/brcmnand/brcmnand.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c +++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c @@ -1072,6 +1072,14 @@ static int bcmnand_ctrl_poll_status(stru cpu_relax(); } while (time_after(limit, jiffies));
+ /* + * do a final check after time out in case the CPU was busy and the driver + * did not get enough time to perform the polling to avoid false alarms + */ + val = brcmnand_read_reg(ctrl, BRCMNAND_INTFC_STATUS); + if ((val & mask) == expected_val) + return 0; + dev_warn(ctrl->dev, "timeout on status poll (expected %x got %x)\n", expected_val, val & mask);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: William Zhang william.zhang@broadcom.com
commit 2ec2839a9062db8a592525a3fdabd42dcd9a3a9b upstream.
v7.2 controller has different ECC level field size and shift in the acc control register than its predecessor and successor controller. It needs to be set specifically.
Fixes: decba6d47869 ("mtd: brcmnand: Add v7.2 controller support") Signed-off-by: William Zhang william.zhang@broadcom.com Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/20230706182909.79151-2-william.zhang@broad... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/brcmnand/brcmnand.c | 74 +++++++++++++++++-------------- 1 file changed, 41 insertions(+), 33 deletions(-)
--- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c +++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c @@ -272,6 +272,7 @@ struct brcmnand_controller { const unsigned int *page_sizes; unsigned int page_size_shift; unsigned int max_oob; + u32 ecc_level_shift; u32 features;
/* for low-power standby/resume only */ @@ -596,6 +597,34 @@ enum { INTFC_CTLR_READY = BIT(31), };
+/*********************************************************************** + * NAND ACC CONTROL bitfield + * + * Some bits have remained constant throughout hardware revision, while + * others have shifted around. + ***********************************************************************/ + +/* Constant for all versions (where supported) */ +enum { + /* See BRCMNAND_HAS_CACHE_MODE */ + ACC_CONTROL_CACHE_MODE = BIT(22), + + /* See BRCMNAND_HAS_PREFETCH */ + ACC_CONTROL_PREFETCH = BIT(23), + + ACC_CONTROL_PAGE_HIT = BIT(24), + ACC_CONTROL_WR_PREEMPT = BIT(25), + ACC_CONTROL_PARTIAL_PAGE = BIT(26), + ACC_CONTROL_RD_ERASED = BIT(27), + ACC_CONTROL_FAST_PGM_RDIN = BIT(28), + ACC_CONTROL_WR_ECC = BIT(30), + ACC_CONTROL_RD_ECC = BIT(31), +}; + +#define ACC_CONTROL_ECC_SHIFT 16 +/* Only for v7.2 */ +#define ACC_CONTROL_ECC_EXT_SHIFT 13 + static inline bool brcmnand_non_mmio_ops(struct brcmnand_controller *ctrl) { #if IS_ENABLED(CONFIG_MTD_NAND_BRCMNAND_BCMA) @@ -737,6 +766,12 @@ static int brcmnand_revision_init(struct else if (of_property_read_bool(ctrl->dev->of_node, "brcm,nand-has-wp")) ctrl->features |= BRCMNAND_HAS_WP;
+ /* v7.2 has different ecc level shift in the acc register */ + if (ctrl->nand_version == 0x0702) + ctrl->ecc_level_shift = ACC_CONTROL_ECC_EXT_SHIFT; + else + ctrl->ecc_level_shift = ACC_CONTROL_ECC_SHIFT; + return 0; }
@@ -931,30 +966,6 @@ static inline int brcmnand_cmd_shift(str return 0; }
-/*********************************************************************** - * NAND ACC CONTROL bitfield - * - * Some bits have remained constant throughout hardware revision, while - * others have shifted around. - ***********************************************************************/ - -/* Constant for all versions (where supported) */ -enum { - /* See BRCMNAND_HAS_CACHE_MODE */ - ACC_CONTROL_CACHE_MODE = BIT(22), - - /* See BRCMNAND_HAS_PREFETCH */ - ACC_CONTROL_PREFETCH = BIT(23), - - ACC_CONTROL_PAGE_HIT = BIT(24), - ACC_CONTROL_WR_PREEMPT = BIT(25), - ACC_CONTROL_PARTIAL_PAGE = BIT(26), - ACC_CONTROL_RD_ERASED = BIT(27), - ACC_CONTROL_FAST_PGM_RDIN = BIT(28), - ACC_CONTROL_WR_ECC = BIT(30), - ACC_CONTROL_RD_ECC = BIT(31), -}; - static inline u32 brcmnand_spare_area_mask(struct brcmnand_controller *ctrl) { if (ctrl->nand_version == 0x0702) @@ -967,18 +978,15 @@ static inline u32 brcmnand_spare_area_ma return GENMASK(4, 0); }
-#define NAND_ACC_CONTROL_ECC_SHIFT 16 -#define NAND_ACC_CONTROL_ECC_EXT_SHIFT 13 - static inline u32 brcmnand_ecc_level_mask(struct brcmnand_controller *ctrl) { u32 mask = (ctrl->nand_version >= 0x0600) ? 0x1f : 0x0f;
- mask <<= NAND_ACC_CONTROL_ECC_SHIFT; + mask <<= ACC_CONTROL_ECC_SHIFT;
/* v7.2 includes additional ECC levels */ - if (ctrl->nand_version >= 0x0702) - mask |= 0x7 << NAND_ACC_CONTROL_ECC_EXT_SHIFT; + if (ctrl->nand_version == 0x0702) + mask |= 0x7 << ACC_CONTROL_ECC_EXT_SHIFT;
return mask; } @@ -992,8 +1000,8 @@ static void brcmnand_set_ecc_enabled(str
if (en) { acc_control |= ecc_flags; /* enable RD/WR ECC */ - acc_control |= host->hwcfg.ecc_level - << NAND_ACC_CONTROL_ECC_SHIFT; + acc_control &= ~brcmnand_ecc_level_mask(ctrl); + acc_control |= host->hwcfg.ecc_level << ctrl->ecc_level_shift; } else { acc_control &= ~ecc_flags; /* disable RD/WR ECC */ acc_control &= ~brcmnand_ecc_level_mask(ctrl); @@ -2593,7 +2601,7 @@ static int brcmnand_set_cfg(struct brcmn tmp &= ~brcmnand_ecc_level_mask(ctrl); tmp &= ~brcmnand_spare_area_mask(ctrl); if (ctrl->nand_version >= 0x0302) { - tmp |= cfg->ecc_level << NAND_ACC_CONTROL_ECC_SHIFT; + tmp |= cfg->ecc_level << ctrl->ecc_level_shift; tmp |= cfg->spare_area_size; } nand_writereg(ctrl, acc_control_offs, tmp);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hamza Mahfooz hamza.mahfooz@amd.com
commit a81de4a22bbe3183b7f0d6f13f592b8f5b5a3c18 upstream.
This reverts commit 3a31e8b89b7240d9a17ace8a1ed050bdcb560f9e.
We still need to call dcn20_adjust_freesync_v_startup() for older DCN3+ ASICs. Otherwise, it can cause DP to HDMI 2.1 PCONs to fail to light up.
Cc: stable@vger.kernel.org Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2809 Reviewed-by: Fangzhi Zuo jerry.zuo@amd.com Reviewed-by: Harry Wentland harry.wentland@amd.com Signed-off-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- .../drm/amd/display/dc/dml/dcn20/dcn20_fpu.c | 24 ++++--------------- 1 file changed, 4 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c index 8afda5ecc0cd..d01bc2dff49b 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c @@ -1099,6 +1099,10 @@ void dcn20_calculate_dlg_params(struct dc *dc, context->res_ctx.pipe_ctx[i].plane_res.bw.dppclk_khz = pipes[pipe_idx].clks_cfg.dppclk_mhz * 1000; context->res_ctx.pipe_ctx[i].pipe_dlg_param = pipes[pipe_idx].pipe.dest; + if (context->res_ctx.pipe_ctx[i].stream->adaptive_sync_infopacket.valid) + dcn20_adjust_freesync_v_startup( + &context->res_ctx.pipe_ctx[i].stream->timing, + &context->res_ctx.pipe_ctx[i].pipe_dlg_param.vstartup_start);
pipe_idx++; } @@ -1927,7 +1931,6 @@ static bool dcn20_validate_bandwidth_internal(struct dc *dc, struct dc_state *co int vlevel = 0; int pipe_split_from[MAX_PIPES]; int pipe_cnt = 0; - int i = 0; display_e2e_pipe_params_st *pipes = kzalloc(dc->res_pool->pipe_count * sizeof(display_e2e_pipe_params_st), GFP_ATOMIC); DC_LOGGER_INIT(dc->ctx->logger);
@@ -1951,15 +1954,6 @@ static bool dcn20_validate_bandwidth_internal(struct dc *dc, struct dc_state *co dcn20_calculate_wm(dc, context, pipes, &pipe_cnt, pipe_split_from, vlevel, fast_validate); dcn20_calculate_dlg_params(dc, context, pipes, pipe_cnt, vlevel);
- for (i = 0; i < dc->res_pool->pipe_count; i++) { - if (!context->res_ctx.pipe_ctx[i].stream) - continue; - if (context->res_ctx.pipe_ctx[i].stream->adaptive_sync_infopacket.valid) - dcn20_adjust_freesync_v_startup( - &context->res_ctx.pipe_ctx[i].stream->timing, - &context->res_ctx.pipe_ctx[i].pipe_dlg_param.vstartup_start); - } - BW_VAL_TRACE_END_WATERMARKS();
goto validate_out; @@ -2232,7 +2226,6 @@ bool dcn21_validate_bandwidth_fp(struct dc *dc, int vlevel = 0; int pipe_split_from[MAX_PIPES]; int pipe_cnt = 0; - int i = 0; display_e2e_pipe_params_st *pipes = kzalloc(dc->res_pool->pipe_count * sizeof(display_e2e_pipe_params_st), GFP_ATOMIC); DC_LOGGER_INIT(dc->ctx->logger);
@@ -2261,15 +2254,6 @@ bool dcn21_validate_bandwidth_fp(struct dc *dc, dcn21_calculate_wm(dc, context, pipes, &pipe_cnt, pipe_split_from, vlevel, fast_validate); dcn20_calculate_dlg_params(dc, context, pipes, pipe_cnt, vlevel);
- for (i = 0; i < dc->res_pool->pipe_count; i++) { - if (!context->res_ctx.pipe_ctx[i].stream) - continue; - if (context->res_ctx.pipe_ctx[i].stream->adaptive_sync_infopacket.valid) - dcn20_adjust_freesync_v_startup( - &context->res_ctx.pipe_ctx[i].stream->timing, - &context->res_ctx.pipe_ctx[i].pipe_dlg_param.vstartup_start); - } - BW_VAL_TRACE_END_WATERMARKS();
goto validate_out;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Melissa Wen mwen@igalia.com
commit 57a943ebfcdb4a97fbb409640234bdb44bfa1953 upstream.
For DRM legacy gamma, AMD display manager applies implicit sRGB degamma using a pre-defined sRGB transfer function. It works fine for DCN2 family where degamma ROM and custom curves go to the same color block. But, on DCN3+, degamma is split into two blocks: degamma ROM for pre-defined TFs and `gamma correction` for user/custom curves and degamma ROM settings doesn't apply to cursor plane. To get DRM legacy gamma working as expected, enable cursor degamma ROM for implict sRGB degamma on HW with this configuration.
Cc: stable@vger.kernel.org Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2803 Fixes: 96b020e2163f ("drm/amd/display: check attr flag before set cursor degamma on DCN3+") Signed-off-by: Melissa Wen mwen@igalia.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c @@ -1260,6 +1260,13 @@ void amdgpu_dm_plane_handle_cursor_updat attributes.rotation_angle = 0; attributes.attribute_flags.value = 0;
+ /* Enable cursor degamma ROM on DCN3+ for implicit sRGB degamma in DRM + * legacy gamma setup. + */ + if (crtc_state->cm_is_degamma_srgb && + adev->dm.dc->caps.color.dpp.gamma_corr) + attributes.attribute_flags.bits.ENABLE_CURSOR_DEGAMMA = 1; + attributes.pitch = afb->base.pitches[0] / afb->base.format->cpp[0];
if (crtc_state->stream) {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hamza Mahfooz hamza.mahfooz@amd.com
commit 47428f4b638d3b3264a2efa1a567b0bbddbb6107 upstream.
Since, calling dcn20_adjust_freesync_v_startup() on DCN3.1+ ASICs can cause the display to flicker and underflow to occur, we shouldn't call it for them. So, ensure that the DCN version is less than DCN_VERSION_3_1 before calling dcn20_adjust_freesync_v_startup().
Cc: stable@vger.kernel.org Reviewed-by: Fangzhi Zuo jerry.zuo@amd.com Signed-off-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c @@ -1099,7 +1099,8 @@ void dcn20_calculate_dlg_params(struct d context->res_ctx.pipe_ctx[i].plane_res.bw.dppclk_khz = pipes[pipe_idx].clks_cfg.dppclk_mhz * 1000; context->res_ctx.pipe_ctx[i].pipe_dlg_param = pipes[pipe_idx].pipe.dest; - if (context->res_ctx.pipe_ctx[i].stream->adaptive_sync_infopacket.valid) + if (dc->ctx->dce_version < DCN_VERSION_3_1 && + context->res_ctx.pipe_ctx[i].stream->adaptive_sync_infopacket.valid) dcn20_adjust_freesync_v_startup( &context->res_ctx.pipe_ctx[i].stream->timing, &context->res_ctx.pipe_ctx[i].pipe_dlg_param.vstartup_start);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hamza Mahfooz hamza.mahfooz@amd.com
commit 07e388aab042774f284a2ad75a70a194517cdad4 upstream.
There are two places in apply_below_the_range() where it's possible for a divide by zero error to occur. So, to fix this make sure the divisor is non-zero before attempting the computation in both cases.
Cc: stable@vger.kernel.org Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2637 Fixes: a463b263032f ("drm/amd/display: Fix frames_to_insert math") Fixes: ded6119e825a ("drm/amd/display: Reinstate LFC optimization") Reviewed-by: Aurabindo Pillai aurabindo.pillai@amd.com Signed-off-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/modules/freesync/freesync.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c +++ b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c @@ -338,7 +338,9 @@ static void apply_below_the_range(struct * - Delta for CEIL: delta_from_mid_point_in_us_1 * - Delta for FLOOR: delta_from_mid_point_in_us_2 */ - if ((last_render_time_in_us / mid_point_frames_ceil) < in_out_vrr->min_duration_in_us) { + if (mid_point_frames_ceil && + (last_render_time_in_us / mid_point_frames_ceil) < + in_out_vrr->min_duration_in_us) { /* Check for out of range. * If using CEIL produces a value that is out of range, * then we are forced to use FLOOR. @@ -385,8 +387,9 @@ static void apply_below_the_range(struct /* Either we've calculated the number of frames to insert, * or we need to insert min duration frames */ - if (last_render_time_in_us / frames_to_insert < - in_out_vrr->min_duration_in_us){ + if (frames_to_insert && + (last_render_time_in_us / frames_to_insert) < + in_out_vrr->min_duration_in_us){ frames_to_insert -= (frames_to_insert > 1) ? 1 : 0; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
commit 50011c2a245792993f2756e5b5b571512bfa409e upstream.
Reset the mask of available "registers" and refresh the IDT vectoring info snapshot in vmx_vcpu_enter_exit(), before KVM potentially handles a an NMI VM-Exit. One of the "registers" that KVM VMX lazily loads is the vmcs.VM_EXIT_INTR_INFO field, which is holds the vector+type on "exception or NMI" VM-Exits, i.e. is needed to identify NMIs. Clearing the available registers bitmask after handling NMIs results in KVM querying info from the last VM-Exit that read vmcs.VM_EXIT_INTR_INFO, and leads to both missed NMIs and spurious NMIs in the host.
Opportunistically grab vmcs.IDT_VECTORING_INFO_FIELD early in the VM-Exit path too, e.g. to guard against similar consumption of stale data. The field is read on every "normal" VM-Exit, and there's no point in delaying the inevitable.
Reported-by: Like Xu like.xu.linux@gmail.com Fixes: 11df586d774f ("KVM: VMX: Handle NMI VM-Exits in noinstr region") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230825014532.2846714-1-seanjc@google.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/vmx/vmx.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-)
--- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7243,13 +7243,20 @@ static noinstr void vmx_vcpu_enter_exit( flags);
vcpu->arch.cr2 = native_read_cr2(); + vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET; + + vmx->idt_vectoring_info = 0;
vmx_enable_fb_clear(vmx);
- if (unlikely(vmx->fail)) + if (unlikely(vmx->fail)) { vmx->exit_reason.full = 0xdead; - else - vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON); + goto out; + } + + vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON); + if (likely(!vmx->exit_reason.failed_vmentry)) + vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI && is_nmi(vmx_get_intr_info(vcpu))) { @@ -7258,6 +7265,7 @@ static noinstr void vmx_vcpu_enter_exit( kvm_after_interrupt(vcpu); }
+out: guest_state_exit_irqoff(); }
@@ -7379,8 +7387,6 @@ static fastpath_t vmx_vcpu_run(struct kv loadsegment(es, __USER_DS); #endif
- vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET; - pt_guest_exit(vmx);
kvm_load_host_xsave_state(vcpu); @@ -7397,17 +7403,12 @@ static fastpath_t vmx_vcpu_run(struct kv vmx->nested.nested_run_pending = 0; }
- vmx->idt_vectoring_info = 0; - if (unlikely(vmx->fail)) return EXIT_FASTPATH_NONE;
if (unlikely((u16)vmx->exit_reason.basic == EXIT_REASON_MCE_DURING_VMENTRY)) kvm_machine_check();
- if (likely(!vmx->exit_reason.failed_vmentry)) - vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD); - trace_kvm_exit(vcpu, KVM_ISA_VMX);
if (unlikely(vmx->exit_reason.failed_vmentry))
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
commit 4c08e737f056fec930b416a2bd37ed266d724f95 upstream.
Hoist the acquisition of ir_list_lock from avic_update_iommu_vcpu_affinity() to its two callers, avic_vcpu_load() and avic_vcpu_put(), specifically to encapsulate the write to the vCPU's entry in the AVIC Physical ID table. This will allow a future fix to pull information from the Physical ID entry when updating the IRTE, without potentially consuming stale information, i.e. without racing with the vCPU being (un)loaded.
Add a comment to call out that ir_list_lock does NOT protect against multiple writers, specifically that reading the Physical ID entry in avic_vcpu_put() outside of the lock is safe.
To preserve some semblance of independence from ir_list_lock, keep the READ_ONCE() in avic_vcpu_load() even though acuiring the spinlock effectively ensures the load(s) will be generated after acquiring the lock.
Cc: stable@vger.kernel.org Tested-by: Alejandro Jimenez alejandro.j.jimenez@oracle.com Reviewed-by: Joao Martins joao.m.martins@oracle.com Link: https://lore.kernel.org/r/20230808233132.2499764-2-seanjc@google.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/avic.c | 31 +++++++++++++++++++++++-------- 1 file changed, 23 insertions(+), 8 deletions(-)
--- a/arch/x86/kvm/svm/avic.c +++ b/arch/x86/kvm/svm/avic.c @@ -986,10 +986,11 @@ static inline int avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r) { int ret = 0; - unsigned long flags; struct amd_svm_iommu_ir *ir; struct vcpu_svm *svm = to_svm(vcpu);
+ lockdep_assert_held(&svm->ir_list_lock); + if (!kvm_arch_has_assigned_device(vcpu->kvm)) return 0;
@@ -997,19 +998,15 @@ avic_update_iommu_vcpu_affinity(struct k * Here, we go through the per-vcpu ir_list to update all existing * interrupt remapping table entry targeting this vcpu. */ - spin_lock_irqsave(&svm->ir_list_lock, flags); - if (list_empty(&svm->ir_list)) - goto out; + return 0;
list_for_each_entry(ir, &svm->ir_list, node) { ret = amd_iommu_update_ga(cpu, r, ir->data); if (ret) - break; + return ret; } -out: - spin_unlock_irqrestore(&svm->ir_list_lock, flags); - return ret; + return 0; }
void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu) @@ -1017,6 +1014,7 @@ void avic_vcpu_load(struct kvm_vcpu *vcp u64 entry; int h_physical_id = kvm_cpu_get_apicid(cpu); struct vcpu_svm *svm = to_svm(vcpu); + unsigned long flags;
lockdep_assert_preemption_disabled();
@@ -1033,6 +1031,8 @@ void avic_vcpu_load(struct kvm_vcpu *vcp if (kvm_vcpu_is_blocking(vcpu)) return;
+ spin_lock_irqsave(&svm->ir_list_lock, flags); + entry = READ_ONCE(*(svm->avic_physical_id_cache)); WARN_ON_ONCE(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
@@ -1042,25 +1042,40 @@ void avic_vcpu_load(struct kvm_vcpu *vcp
WRITE_ONCE(*(svm->avic_physical_id_cache), entry); avic_update_iommu_vcpu_affinity(vcpu, h_physical_id, true); + + spin_unlock_irqrestore(&svm->ir_list_lock, flags); }
void avic_vcpu_put(struct kvm_vcpu *vcpu) { u64 entry; struct vcpu_svm *svm = to_svm(vcpu); + unsigned long flags;
lockdep_assert_preemption_disabled();
+ /* + * Note, reading the Physical ID entry outside of ir_list_lock is safe + * as only the pCPU that has loaded (or is loading) the vCPU is allowed + * to modify the entry, and preemption is disabled. I.e. the vCPU + * can't be scheduled out and thus avic_vcpu_{put,load}() can't run + * recursively. + */ entry = READ_ONCE(*(svm->avic_physical_id_cache));
/* Nothing to do if IsRunning == '0' due to vCPU blocking. */ if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)) return;
+ spin_lock_irqsave(&svm->ir_list_lock, flags); + avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK; WRITE_ONCE(*(svm->avic_physical_id_cache), entry); + + spin_unlock_irqrestore(&svm->ir_list_lock, flags); + }
void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
commit cb49631ad111570f1bad37702c11c2ae07fa2e3c upstream.
Don't inject a #UD if KVM attempts to "emulate" to skip an instruction for an SEV guest, and instead resume the guest and hope that it can make forward progress. When commit 04c40f344def ("KVM: SVM: Inject #UD on attempted emulation for SEV guest w/o insn buffer") added the completely arbitrary #UD behavior, there were no known scenarios where a well-behaved guest would induce a VM-Exit that triggered emulation, i.e. it was thought that injecting #UD would be helpful.
However, now that KVM (correctly) attempts to re-inject INT3/INTO, e.g. if a #NPF is encountered when attempting to deliver the INT3/INTO, an SEV guest can trigger emulation without a buffer, through no fault of its own. Resuming the guest and retrying the INT3/INTO is architecturally wrong, e.g. the vCPU will incorrectly re-hit code #DBs, but for SEV guests there is literally no other option that has a chance of making forward progress.
Drop the #UD injection for all "skip" emulation, not just those related to INT3/INTO, even though that means that the guest will likely end up in an infinite loop instead of getting a #UD (the vCPU may also crash, e.g. if KVM emulated everything about an instruction except for advancing RIP). There's no evidence that suggests that an unexpected #UD is actually better than hanging the vCPU, e.g. a soft-hung vCPU can still respond to IRQs and NMIs to generate a backtrace.
Reported-by: Wu Zongyo wuzongyo@mail.ustc.edu.cn Closes: https://lore.kernel.org/all/8eb933fd-2cf3-d7a9-32fe-2a1d82eac42a@mail.ustc.e... Fixes: 6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction") Cc: stable@vger.kernel.org Cc: Tom Lendacky thomas.lendacky@amd.com Link: https://lore.kernel.org/r/20230825013621.2845700-2-seanjc@google.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/svm.c | 35 +++++++++++++++++++++++++++-------- 1 file changed, 27 insertions(+), 8 deletions(-)
--- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -365,6 +365,8 @@ static void svm_set_interrupt_shadow(str svm->vmcb->control.int_state |= SVM_INTERRUPT_SHADOW_MASK;
} +static bool svm_can_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type, + void *insn, int insn_len);
static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu, bool commit_side_effects) @@ -385,6 +387,14 @@ static int __svm_skip_emulated_instructi }
if (!svm->next_rip) { + /* + * FIXME: Drop this when kvm_emulate_instruction() does the + * right thing and treats "can't emulate" as outright failure + * for EMULTYPE_SKIP. + */ + if (!svm_can_emulate_instruction(vcpu, EMULTYPE_SKIP, NULL, 0)) + return 0; + if (unlikely(!commit_side_effects)) old_rflags = svm->vmcb->save.rflags;
@@ -4651,16 +4661,25 @@ static bool svm_can_emulate_instruction( * and cannot be decrypted by KVM, i.e. KVM would read cyphertext and * decode garbage. * - * Inject #UD if KVM reached this point without an instruction buffer. - * In practice, this path should never be hit by a well-behaved guest, - * e.g. KVM doesn't intercept #UD or #GP for SEV guests, but this path - * is still theoretically reachable, e.g. via unaccelerated fault-like - * AVIC access, and needs to be handled by KVM to avoid putting the - * guest into an infinite loop. Injecting #UD is somewhat arbitrary, - * but its the least awful option given lack of insight into the guest. + * If KVM is NOT trying to simply skip an instruction, inject #UD if + * KVM reached this point without an instruction buffer. In practice, + * this path should never be hit by a well-behaved guest, e.g. KVM + * doesn't intercept #UD or #GP for SEV guests, but this path is still + * theoretically reachable, e.g. via unaccelerated fault-like AVIC + * access, and needs to be handled by KVM to avoid putting the guest + * into an infinite loop. Injecting #UD is somewhat arbitrary, but + * its the least awful option given lack of insight into the guest. + * + * If KVM is trying to skip an instruction, simply resume the guest. + * If a #NPF occurs while the guest is vectoring an INT3/INTO, then KVM + * will attempt to re-inject the INT3/INTO and skip the instruction. + * In that scenario, retrying the INT3/INTO and hoping the guest will + * make forward progress is the only option that has a chance of + * success (and in practice it will work the vast majority of the time). */ if (unlikely(!insn)) { - kvm_queue_exception(vcpu, UD_VECTOR); + if (!(emul_type & EMULTYPE_SKIP)) + kvm_queue_exception(vcpu, UD_VECTOR); return false; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
commit f1187ef24eb8f36e8ad8106d22615ceddeea6097 upstream.
Fix a goof where KVM tries to grab source vCPUs from the destination VM when doing intrahost migration. Grabbing the wrong vCPU not only hoses the guest, it also crashes the host due to the VMSA pointer being left NULL.
BUG: unable to handle page fault for address: ffffe38687000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 39 PID: 17143 Comm: sev_migrate_tes Tainted: GO 6.5.0-smp--fff2e47e6c3b-next #151 Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.28.0 07/10/2023 RIP: 0010:__free_pages+0x15/0xd0 RSP: 0018:ffff923fcf6e3c78 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffe38687000000 RCX: 0000000000000100 RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffffe38687000000 RBP: ffff923fcf6e3c88 R08: ffff923fcafb0000 R09: 0000000000000000 R10: 0000000000000000 R11: ffffffff83619b90 R12: ffff923fa9540000 R13: 0000000000080007 R14: ffff923f6d35d000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff929d0d7c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffe38687000000 CR3: 0000005224c34005 CR4: 0000000000770ee0 PKRU: 55555554 Call Trace: <TASK> sev_free_vcpu+0xcb/0x110 [kvm_amd] svm_vcpu_free+0x75/0xf0 [kvm_amd] kvm_arch_vcpu_destroy+0x36/0x140 [kvm] kvm_destroy_vcpus+0x67/0x100 [kvm] kvm_arch_destroy_vm+0x161/0x1d0 [kvm] kvm_put_kvm+0x276/0x560 [kvm] kvm_vm_release+0x25/0x30 [kvm] __fput+0x106/0x280 ____fput+0x12/0x20 task_work_run+0x86/0xb0 do_exit+0x2e3/0x9c0 do_group_exit+0xb1/0xc0 __x64_sys_exit_group+0x1b/0x20 do_syscall_64+0x41/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd </TASK> CR2: ffffe38687000000
Fixes: 6defa24d3b12 ("KVM: SEV: Init target VMCBs in sev_migrate_from") Cc: stable@vger.kernel.org Cc: Peter Gonda pgonda@google.com Reviewed-by: Peter Gonda pgonda@google.com Reviewed-by: Pankaj Gupta pankaj.gupta@amd.com Link: https://lore.kernel.org/r/20230825022357.2852133-2-seanjc@google.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/sev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -1725,7 +1725,7 @@ static void sev_migrate_from(struct kvm * Note, the source is not required to have the same number of * vCPUs as the destination when migrating a vanilla SEV VM. */ - src_vcpu = kvm_get_vcpu(dst_kvm, i); + src_vcpu = kvm_get_vcpu(src_kvm, i); src_svm = to_svm(src_vcpu);
/*
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
commit 7cafe9b8e22bb3d77f130c461aedf6868c4aaf58 upstream.
Check for nested TSC scaling support on nested SVM VMRUN instead of asserting that TSC scaling is exposed to L1 if L1's MSR_AMD64_TSC_RATIO has diverged from KVM's default. Userspace can trigger the WARN at will by writing the MSR and then updating guest CPUID to hide the feature (modifying guest CPUID is allowed anytime before KVM_RUN). E.g. hacking KVM's state_test selftest to do
vcpu_set_msr(vcpu, MSR_AMD64_TSC_RATIO, 0); vcpu_clear_cpuid_feature(vcpu, X86_FEATURE_TSCRATEMSR);
after restoring state in a new VM+vCPU yields an endless supply of:
------------[ cut here ]------------ WARNING: CPU: 164 PID: 62565 at arch/x86/kvm/svm/nested.c:699 nested_vmcb02_prepare_control+0x3d6/0x3f0 [kvm_amd] Call Trace: <TASK> enter_svm_guest_mode+0x114/0x560 [kvm_amd] nested_svm_vmrun+0x260/0x330 [kvm_amd] vmrun_interception+0x29/0x30 [kvm_amd] svm_invoke_exit_handler+0x35/0x100 [kvm_amd] svm_handle_exit+0xe7/0x180 [kvm_amd] kvm_arch_vcpu_ioctl_run+0x1eab/0x2570 [kvm] kvm_vcpu_ioctl+0x4c9/0x5b0 [kvm] __se_sys_ioctl+0x7a/0xc0 __x64_sys_ioctl+0x21/0x30 do_syscall_64+0x41/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x45ca1b
Note, the nested #VMEXIT path has the same flaw, but needs a different fix and will be handled separately.
Fixes: 5228eb96a487 ("KVM: x86: nSVM: implement nested TSC scaling") Cc: Maxim Levitsky mlevitsk@redhat.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230729011608.1065019-2-seanjc@google.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/nested.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -695,10 +695,9 @@ static void nested_vmcb02_prepare_contro
vmcb02->control.tsc_offset = vcpu->arch.tsc_offset;
- if (svm->tsc_ratio_msr != kvm_caps.default_tsc_scaling_ratio) { - WARN_ON(!svm->tsc_scaling_enabled); + if (svm->tsc_scaling_enabled && + svm->tsc_ratio_msr != kvm_caps.default_tsc_scaling_ratio) nested_svm_update_tsc_ratio_msr(vcpu); - }
vmcb02->control.int_ctl = (svm->nested.ctl.int_ctl & int_ctl_vmcb12_bits) |
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
commit 0c94e2468491cbf0754f49a5136ab51294a96b69 upstream.
When emulating nested VM-Exit, load L1's TSC multiplier if L1's desired ratio doesn't match the current ratio, not if the ratio L1 is using for L2 diverges from the default. Functionally, the end result is the same as KVM will run L2 with L1's multiplier if L2's multiplier is the default, i.e. checking that L1's multiplier is loaded is equivalent to checking if L2 has a non-default multiplier.
However, the assertion that TSC scaling is exposed to L1 is flawed, as userspace can trigger the WARN at will by writing the MSR and then updating guest CPUID to hide the feature (modifying guest CPUID is allowed anytime before KVM_RUN). E.g. hacking KVM's state_test selftest to do
vcpu_set_msr(vcpu, MSR_AMD64_TSC_RATIO, 0); vcpu_clear_cpuid_feature(vcpu, X86_FEATURE_TSCRATEMSR);
after restoring state in a new VM+vCPU yields an endless supply of:
------------[ cut here ]------------ WARNING: CPU: 10 PID: 206939 at arch/x86/kvm/svm/nested.c:1105 nested_svm_vmexit+0x6af/0x720 [kvm_amd] Call Trace: nested_svm_exit_handled+0x102/0x1f0 [kvm_amd] svm_handle_exit+0xb9/0x180 [kvm_amd] kvm_arch_vcpu_ioctl_run+0x1eab/0x2570 [kvm] kvm_vcpu_ioctl+0x4c9/0x5b0 [kvm] ? trace_hardirqs_off+0x4d/0xa0 __se_sys_ioctl+0x7a/0xc0 __x64_sys_ioctl+0x21/0x30 do_syscall_64+0x41/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Unlike the nested VMRUN path, hoisting the svm->tsc_scaling_enabled check into the if-statement is wrong as KVM needs to ensure L1's multiplier is loaded in the above scenario. Alternatively, the WARN_ON() could simply be deleted, but that would make KVM's behavior even more subtle, e.g. it's not immediately obvious why it's safe to write MSR_AMD64_TSC_RATIO when checking only tsc_ratio_msr.
Fixes: 5228eb96a487 ("KVM: x86: nSVM: implement nested TSC scaling") Cc: Maxim Levitsky mlevitsk@redhat.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230729011608.1065019-3-seanjc@google.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/nested.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -1100,8 +1100,8 @@ int nested_svm_vmexit(struct vcpu_svm *s vmcb_mark_dirty(vmcb01, VMCB_INTERCEPTS); }
- if (svm->tsc_ratio_msr != kvm_caps.default_tsc_scaling_ratio) { - WARN_ON(!svm->tsc_scaling_enabled); + if (kvm_caps.has_tsc_control && + vcpu->arch.tsc_scaling_ratio != vcpu->arch.l1_tsc_scaling_ratio) { vcpu->arch.tsc_scaling_ratio = vcpu->arch.l1_tsc_scaling_ratio; __svm_write_tsc_multiplier(vcpu->arch.tsc_scaling_ratio); }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
commit f3cebc75e7425d6949d726bb8e937095b0aef025 upstream.
Update the target pCPU for IOMMU doorbells when updating IRTE routing if KVM is actively running the associated vCPU. KVM currently only updates the pCPU when loading the vCPU (via avic_vcpu_load()), and so doorbell events will be delayed until the vCPU goes through a put+load cycle (which might very well "never" happen for the lifetime of the VM).
To avoid inserting a stale pCPU, e.g. due to racing between updating IRTE routing and vCPU load/put, get the pCPU information from the vCPU's Physical APIC ID table entry (a.k.a. avic_physical_id_cache in KVM) and update the IRTE while holding ir_list_lock. Add comments with --verbose enabled to explain exactly what is and isn't protected by ir_list_lock.
Fixes: 411b44ba80ab ("svm: Implements update_pi_irte hook to setup posted interrupt") Reported-by: dengqiao.joey dengqiao.joey@bytedance.com Cc: stable@vger.kernel.org Cc: Alejandro Jimenez alejandro.j.jimenez@oracle.com Cc: Joao Martins joao.m.martins@oracle.com Cc: Maxim Levitsky mlevitsk@redhat.com Cc: Suravee Suthikulpanit suravee.suthikulpanit@amd.com Tested-by: Alejandro Jimenez alejandro.j.jimenez@oracle.com Reviewed-by: Joao Martins joao.m.martins@oracle.com Link: https://lore.kernel.org/r/20230808233132.2499764-3-seanjc@google.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/avic.c | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+)
--- a/arch/x86/kvm/svm/avic.c +++ b/arch/x86/kvm/svm/avic.c @@ -791,6 +791,7 @@ static int svm_ir_list_add(struct vcpu_s int ret = 0; unsigned long flags; struct amd_svm_iommu_ir *ir; + u64 entry;
/** * In some cases, the existing irte is updated and re-set, @@ -824,6 +825,18 @@ static int svm_ir_list_add(struct vcpu_s ir->data = pi->ir_data;
spin_lock_irqsave(&svm->ir_list_lock, flags); + + /* + * Update the target pCPU for IOMMU doorbells if the vCPU is running. + * If the vCPU is NOT running, i.e. is blocking or scheduled out, KVM + * will update the pCPU info when the vCPU awkened and/or scheduled in. + * See also avic_vcpu_load(). + */ + entry = READ_ONCE(*(svm->avic_physical_id_cache)); + if (entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK) + amd_iommu_update_ga(entry & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK, + true, pi->ir_data); + list_add(&ir->node, &svm->ir_list); spin_unlock_irqrestore(&svm->ir_list_lock, flags); out: @@ -1031,6 +1044,13 @@ void avic_vcpu_load(struct kvm_vcpu *vcp if (kvm_vcpu_is_blocking(vcpu)) return;
+ /* + * Grab the per-vCPU interrupt remapping lock even if the VM doesn't + * _currently_ have assigned devices, as that can change. Holding + * ir_list_lock ensures that either svm_ir_list_add() will consume + * up-to-date entry information, or that this task will wait until + * svm_ir_list_add() completes to set the new target pCPU. + */ spin_lock_irqsave(&svm->ir_list_lock, flags);
entry = READ_ONCE(*(svm->avic_physical_id_cache)); @@ -1067,6 +1087,14 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)) return;
+ /* + * Take and hold the per-vCPU interrupt remapping lock while updating + * the Physical ID entry even though the lock doesn't protect against + * multiple writers (see above). Holding ir_list_lock ensures that + * either svm_ir_list_add() will consume up-to-date entry information, + * or that this task will wait until svm_ir_list_add() completes to + * mark the vCPU as not running. + */ spin_lock_irqsave(&svm->ir_list_lock, flags);
avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
commit 1952e74da96fb3e48b72a2d0ece78c688a5848c1 upstream.
Skip initializing the VMSA physical address in the VMCB if the VMSA is NULL, which occurs during intrahost migration as KVM initializes the VMCB before copying over state from the source to the destination (including the VMSA and its physical address).
In normal builds, __pa() is just math, so the bug isn't fatal, but with CONFIG_DEBUG_VIRTUAL=y, the validity of the virtual address is verified and passing in NULL will make the kernel unhappy.
Fixes: 6defa24d3b12 ("KVM: SEV: Init target VMCBs in sev_migrate_from") Cc: stable@vger.kernel.org Cc: Peter Gonda pgonda@google.com Reviewed-by: Peter Gonda pgonda@google.com Reviewed-by: Pankaj Gupta pankaj.gupta@amd.com Link: https://lore.kernel.org/r/20230825022357.2852133-3-seanjc@google.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/sev.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -2955,9 +2955,12 @@ static void sev_es_init_vmcb(struct vcpu /* * An SEV-ES guest requires a VMSA area that is a separate from the * VMCB page. Do not include the encryption mask on the VMSA physical - * address since hardware will access it using the guest key. + * address since hardware will access it using the guest key. Note, + * the VMSA will be NULL if this vCPU is the destination for intrahost + * migration, and will be copied later. */ - svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa); + if (svm->sev_es.vmsa) + svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
/* Can't intercept CR register access, HV can't modify CR registers */ svm_clr_intercept(svm, INTERCEPT_CR0_READ);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej W. Rozycki macro@orcam.me.uk
commit 4fe4a6374c4db9ae2b849b61e84b58685dca565a upstream.
We have originally guarded fiddling with CHECKFLAGS in our arch Makefile by checking for the CONFIG_MIPS variable, not set for targets such as `distclean', etc. that neither include `.config' nor use the compiler.
Starting from commit 805b2e1d427a ("kbuild: include Makefile.compiler only when compiler is needed") we have had a generic `need-compiler' variable explicitly telling us if the compiler will be used and thus its capabilities need to be checked and expressed in the form of compilation flags. If this variable is not set, then `make' functions such as `cc-option' are undefined, causing all kinds of weirdness to happen if we expect specific results to be returned, most recently:
cc1: error: '-mloongson-mmi' must be used with '-mhard-float'
messages with configurations such as `fuloong2e_defconfig' and the `modules_install' target, which does include `.config' and yet does not use the compiler.
Replace the check for CONFIG_MIPS with one for `need-compiler' instead, so as to prevent the compiler from being ever called for CHECKFLAGS when not needed.
Reported-by: Guillaume Tucker guillaume.tucker@collabora.com Closes: https://lore.kernel.org/r/85031c0c-d981-031e-8a50-bc4fad2ddcd8@collabora.com... Signed-off-by: Maciej W. Rozycki macro@orcam.me.uk Fixes: 805b2e1d427a ("kbuild: include Makefile.compiler only when compiler is needed") Cc: stable@vger.kernel.org # v5.13+ Reported-by: "kernelci.org bot" bot@kernelci.org Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/mips/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/mips/Makefile +++ b/arch/mips/Makefile @@ -341,7 +341,7 @@ KBUILD_CFLAGS += -fno-asynchronous-unwin
KBUILD_LDFLAGS += -m $(ld-emul)
-ifdef CONFIG_MIPS +ifdef need-compiler CHECKFLAGS += $(shell $(CC) $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS) -dM -E -x c /dev/null | \ grep -E -vw '__GNUC_(MINOR_|PATCHLEVEL_)?_' | \ sed -e "s/^#define /-D'/" -e "s/ /'='/" -e "s/$$/'/" -e 's/$$/&&/g')
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej W. Rozycki macro@orcam.me.uk
commit a79a404e6c2241ebc528b9ebf4c0832457b498c3 upstream.
Remove a build-time check for the presence of the GCC `-msym32' option. This option has been there since GCC 4.1.0, which is below the minimum required as at commit 805b2e1d427a ("kbuild: include Makefile.compiler only when compiler is needed"), when an error message:
arch/mips/Makefile:306: *** CONFIG_CPU_DADDI_WORKAROUNDS unsupported without -msym32. Stop.
started to trigger for the `modules_install' target with configurations such as `decstation_64_defconfig' that set CONFIG_CPU_DADDI_WORKAROUNDS, because said commit has made `cc-option-yn' an undefined function for non-build targets.
Reported-by: Jan-Benedict Glaw jbglaw@lug-owl.de Signed-off-by: Maciej W. Rozycki macro@orcam.me.uk Fixes: 805b2e1d427a ("kbuild: include Makefile.compiler only when compiler is needed") Cc: stable@vger.kernel.org # v5.13+ Reviewed-by: Philippe Mathieu-Daudé philmd@linaro.org Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/mips/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/mips/Makefile +++ b/arch/mips/Makefile @@ -299,8 +299,8 @@ ifdef CONFIG_64BIT endif endif
- ifeq ($(KBUILD_SYM32)$(call cc-option-yn,-msym32), yy) - cflags-y += -msym32 -DKBUILD_64BIT_SYM32 + ifeq ($(KBUILD_SYM32), y) + cflags-$(KBUILD_SYM32) += -msym32 -DKBUILD_64BIT_SYM32 else ifeq ($(CONFIG_CPU_DADDI_WORKAROUNDS), y) $(error CONFIG_CPU_DADDI_WORKAROUNDS unsupported without -msym32)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namhyung Kim namhyung@kernel.org
commit e2cabf2a44791f01c21f8d5189b946926e34142e upstream.
The commit ef9ff6017e3c4593 ("perf ui browser: Move the extra title lines from the hists browser") introduced ui_browser__gotorc_title() to help moving non-title lines easily. But it missed to update the title for the hierarchy mode so it won't print the header line on TUI at all.
$ perf report --hierarchy
Fixes: ef9ff6017e3c4593 ("perf ui browser: Move the extra title lines from the hists browser") Signed-off-by: Namhyung Kim namhyung@kernel.org Tested-by: Arnaldo Carvalho de Melo acme@redhat.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Ian Rogers irogers@google.com Cc: Ingo Molnar mingo@kernel.org Cc: Jiri Olsa jolsa@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230731094934.1616495-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/perf/ui/browsers/hists.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/perf/ui/browsers/hists.c +++ b/tools/perf/ui/browsers/hists.c @@ -1779,7 +1779,7 @@ static void hists_browser__hierarchy_hea hists_browser__scnprintf_hierarchy_headers(browser, headers, sizeof(headers));
- ui_browser__gotorc(&browser->b, 0, 0); + ui_browser__gotorc_title(&browser->b, 0, 0); ui_browser__set_color(&browser->b, HE_COLORSET_ROOT); ui_browser__write_nstring(&browser->b, headers, browser->b.width + 1); }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namhyung Kim namhyung@kernel.org
commit 7822a8913f4c51c7d1aff793b525d60c3384fb5b upstream.
The bison and flex generate C files from the source (.y and .l) files. When O= option is used, they are saved in a separate directory but the default build rule assumes the .C files are in the source directory. So it might read invalid file if there are generated files from an old version. The same is true for the pmu-events files.
For example, the following command would cause a build failure:
$ git checkout v6.3 $ make -C tools/perf # build in the same directory
$ git checkout v6.5-rc2 $ mkdir build # create a build directory $ make -C tools/perf O=build # build in a different directory but it # refers files in the source directory
Let's update the build rule to specify those cases explicitly to depend on the files in the output directory.
Note that it's not a complete fix and it needs the next patch for the include path too.
Fixes: 80eeb67fe577aa76 ("perf jevents: Program to convert JSON file") Signed-off-by: Namhyung Kim namhyung@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Anup Sharma anupnewsmail@gmail.com Cc: Ian Rogers irogers@google.com Cc: Ingo Molnar mingo@kernel.org Cc: Jiri Olsa jolsa@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230728022447.1323563-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/build/Makefile.build | 10 ++++++++++ tools/perf/pmu-events/Build | 6 ++++++ 2 files changed, 16 insertions(+)
--- a/tools/build/Makefile.build +++ b/tools/build/Makefile.build @@ -117,6 +117,16 @@ $(OUTPUT)%.s: %.c FORCE $(call rule_mkdir) $(call if_changed_dep,cc_s_c)
+# bison and flex files are generated in the OUTPUT directory +# so it needs a separate rule to depend on them properly +$(OUTPUT)%-bison.o: $(OUTPUT)%-bison.c FORCE + $(call rule_mkdir) + $(call if_changed_dep,$(host)cc_o_c) + +$(OUTPUT)%-flex.o: $(OUTPUT)%-flex.c FORCE + $(call rule_mkdir) + $(call if_changed_dep,$(host)cc_o_c) + # Gather build data: # obj-y - list of build objects # subdir-y - list of directories to nest --- a/tools/perf/pmu-events/Build +++ b/tools/perf/pmu-events/Build @@ -35,3 +35,9 @@ $(PMU_EVENTS_C): $(JSON) $(JSON_TEST) $( $(call rule_mkdir) $(Q)$(call echo-cmd,gen)$(PYTHON) $(JEVENTS_PY) $(JEVENTS_ARCH) $(JEVENTS_MODEL) pmu-events/arch $@ endif + +# pmu-events.c file is generated in the OUTPUT directory so it needs a +# separate rule to depend on it properly +$(OUTPUT)pmu-events/pmu-events.o: $(PMU_EVENTS_C) + $(call rule_mkdir) + $(call if_changed_dep,cc_o_c)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namhyung Kim namhyung@kernel.org
commit 68ca249c964f520af7f8763e22f12bd26b57b870 upstream.
As of now, bpf counters (bperf) don't support event groups. But the default perf stat includes topdown metrics if supported (on recent Intel machines) which require groups. That makes perf stat exiting.
$ sudo perf stat --bpf-counter true bpf managed perf events do not yet support groups.
Actually the test explicitly uses cycles event only, but it missed to pass the option when it checks the availability of the command.
Fixes: 2c0cb9f56020d2ea ("perf test: Add a shell test for 'perf stat --bpf-counters' new option") Reviewed-by: Song Liu song@kernel.org Signed-off-by: Namhyung Kim namhyung@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Ian Rogers irogers@google.com Cc: Ingo Molnar mingo@kernel.org Cc: Jiri Olsa jolsa@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: bpf@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230825164152.165610-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/perf/tests/shell/stat_bpf_counters.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/tools/perf/tests/shell/stat_bpf_counters.sh +++ b/tools/perf/tests/shell/stat_bpf_counters.sh @@ -22,10 +22,10 @@ compare_number() }
# skip if --bpf-counters is not supported -if ! perf stat --bpf-counters true > /dev/null 2>&1; then +if ! perf stat -e cycles --bpf-counters true > /dev/null 2>&1; then if [ "$1" = "-v" ]; then echo "Skipping: --bpf-counters not supported" - perf --no-pager stat --bpf-counters true || true + perf --no-pager stat -e cycles --bpf-counters true || true fi exit 2 fi
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namhyung Kim namhyung@kernel.org
commit 9bf63282ea77a531ea58acb42fb3f40d2d1e4497 upstream.
The PERF_RECORD_ATTR is used for a pipe mode to describe an event with attribute and IDs. The ID table comes after the attr and it calculate size of the table using the total record size and the attr size.
n_ids = (total_record_size - end_of_the_attr_field) / sizeof(u64)
This is fine for most use cases, but sometimes it saves the pipe output in a file and then process it later. And it becomes a problem if there is a change in attr size between the record and report.
$ perf record -o- > perf-pipe.data # old version $ perf report -i- < perf-pipe.data # new version
For example, if the attr size is 128 and it has 4 IDs, then it would save them in 168 byte like below:
8 byte: perf event header { .type = PERF_RECORD_ATTR, .size = 168 }, 128 byte: perf event attr { .size = 128, ... }, 32 byte: event IDs [] = { 1234, 1235, 1236, 1237 },
But when report later, it thinks the attr size is 136 then it only read the last 3 entries as ID.
8 byte: perf event header { .type = PERF_RECORD_ATTR, .size = 168 }, 136 byte: perf event attr { .size = 136, ... }, 24 byte: event IDs [] = { 1235, 1236, 1237 }, // 1234 is missing
So it should use the recorded version of the attr. The attr has the size field already then it should honor the size when reading data.
Fixes: 2c46dbb517a10b18 ("perf: Convert perf header attrs into attr events") Signed-off-by: Namhyung Kim namhyung@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Ian Rogers irogers@google.com Cc: Ingo Molnar mingo@kernel.org Cc: Jiri Olsa jolsa@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Tom Zanussi zanussi@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230825152552.112913-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/perf/util/header.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
--- a/tools/perf/util/header.c +++ b/tools/perf/util/header.c @@ -4382,7 +4382,8 @@ int perf_event__process_attr(struct perf union perf_event *event, struct evlist **pevlist) { - u32 i, ids, n_ids; + u32 i, n_ids; + u64 *ids; struct evsel *evsel; struct evlist *evlist = *pevlist;
@@ -4398,9 +4399,8 @@ int perf_event__process_attr(struct perf
evlist__add(evlist, evsel);
- ids = event->header.size; - ids -= (void *)&event->attr.id - (void *)event; - n_ids = ids / sizeof(u64); + n_ids = event->header.size - sizeof(event->header) - event->attr.attr.size; + n_ids = n_ids / sizeof(u64); /* * We don't have the cpu and thread maps on the header, so * for allocating the perf_sample_id table we fake 1 cpu and @@ -4409,8 +4409,9 @@ int perf_event__process_attr(struct perf if (perf_evsel__alloc_id(&evsel->core, 1, n_ids)) return -ENOMEM;
+ ids = (void *)&event->attr.attr + event->attr.attr.size; for (i = 0; i < n_ids; i++) { - perf_evlist__id_add(&evlist->core, &evsel->core, 0, i, event->attr.id[i]); + perf_evlist__id_add(&evlist->core, &evsel->core, 0, i, ids[i]); }
return 0;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namhyung Kim namhyung@kernel.org
commit c7e97f215a4ad634b746804679f5937d25f77e29 upstream.
The flex and bison generate header files from the source. When user specified a build directory with O= option, it'd generate files under the directory. The build command has -I option to specify the header include directory.
But the -I option only affects the files included like <...>. Let's change the flex and bison headers to use it instead of "...".
Fixes: 80eeb67fe577aa76 ("perf jevents: Program to convert JSON file") Signed-off-by: Namhyung Kim namhyung@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Anup Sharma anupnewsmail@gmail.com Cc: Ian Rogers irogers@google.com Cc: Ingo Molnar mingo@kernel.org Cc: Jiri Olsa jolsa@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230728022447.1323563-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/perf/pmu-events/jevents.py | 2 +- tools/perf/util/bpf-filter.c | 4 ++-- tools/perf/util/expr.c | 4 ++-- tools/perf/util/parse-events.c | 4 ++-- tools/perf/util/pmu.c | 4 ++-- 5 files changed, 9 insertions(+), 9 deletions(-)
--- a/tools/perf/pmu-events/jevents.py +++ b/tools/perf/pmu-events/jevents.py @@ -999,7 +999,7 @@ such as "arm/cortex-a34".''', _args = ap.parse_args()
_args.output_file.write(""" -#include "pmu-events/pmu-events.h" +#include <pmu-events/pmu-events.h> #include "util/header.h" #include "util/pmu.h" #include <string.h> --- a/tools/perf/util/bpf-filter.c +++ b/tools/perf/util/bpf-filter.c @@ -9,8 +9,8 @@ #include "util/evsel.h"
#include "util/bpf-filter.h" -#include "util/bpf-filter-flex.h" -#include "util/bpf-filter-bison.h" +#include <util/bpf-filter-flex.h> +#include <util/bpf-filter-bison.h>
#include "bpf_skel/sample-filter.h" #include "bpf_skel/sample_filter.skel.h" --- a/tools/perf/util/expr.c +++ b/tools/perf/util/expr.c @@ -10,8 +10,8 @@ #include "debug.h" #include "evlist.h" #include "expr.h" -#include "expr-bison.h" -#include "expr-flex.h" +#include <util/expr-bison.h> +#include <util/expr-flex.h> #include "util/hashmap.h" #include "smt.h" #include "tsc.h" --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -18,8 +18,8 @@ #include "debug.h" #include <api/fs/tracing_path.h> #include <perf/cpumap.h> -#include "parse-events-bison.h" -#include "parse-events-flex.h" +#include <util/parse-events-bison.h> +#include <util/parse-events-flex.h> #include "pmu.h" #include "pmus.h" #include "asm/bug.h" --- a/tools/perf/util/pmu.c +++ b/tools/perf/util/pmu.c @@ -19,8 +19,8 @@ #include "evsel.h" #include "pmu.h" #include "pmus.h" -#include "pmu-bison.h" -#include "pmu-flex.h" +#include <util/pmu-bison.h> +#include <util/pmu-flex.h> #include "parse-events.h" #include "print-events.h" #include "header.h"
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namhyung Kim namhyung@kernel.org
commit f6b8436bede3e80226e8b2100279c4450c73806a upstream.
The 'e' key is to toggle expand/collapse the selected entry only. But the current code has a bug that it only increases the number of entries by 1 in the hierarchy mode so users cannot move under the current entry after the key stroke. This is due to a wrong assumption in the hist_entry__set_folding().
The commit b33f922651011eff ("perf hists browser: Put hist_entry folding logic into single function") factored out the code, but actually it should be handled separately. The hist_browser__set_folding() is to update fold state for each entry so it needs to traverse all (child) entries regardless of the current fold state. So it increases the number of entries by 1.
But the hist_entry__set_folding() only cares the currently selected entry and its all children. So it should count all unfolded child entries. This code is implemented in hist_browser__toggle_fold() already so we can just call it.
Fixes: b33f922651011eff ("perf hists browser: Put hist_entry folding logic into single function") Signed-off-by: Namhyung Kim namhyung@kernel.org Tested-by: Arnaldo Carvalho de Melo acme@redhat.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Ian Rogers irogers@google.com Cc: Ingo Molnar mingo@kernel.org Cc: Jiri Olsa jolsa@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230731094934.1616495-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/perf/ui/browsers/hists.c | 58 ++++++++++++++++------------------------- 1 file changed, 24 insertions(+), 34 deletions(-)
--- a/tools/perf/ui/browsers/hists.c +++ b/tools/perf/ui/browsers/hists.c @@ -407,11 +407,6 @@ static bool hist_browser__selection_has_ return container_of(ms, struct callchain_list, ms)->has_children; }
-static bool hist_browser__he_selection_unfolded(struct hist_browser *browser) -{ - return browser->he_selection ? browser->he_selection->unfolded : false; -} - static bool hist_browser__selection_unfolded(struct hist_browser *browser) { struct hist_entry *he = browser->he_selection; @@ -584,8 +579,8 @@ static int hierarchy_set_folding(struct return n; }
-static void __hist_entry__set_folding(struct hist_entry *he, - struct hist_browser *hb, bool unfold) +static void hist_entry__set_folding(struct hist_entry *he, + struct hist_browser *hb, bool unfold) { hist_entry__init_have_children(he); he->unfolded = unfold ? he->has_children : false; @@ -603,34 +598,12 @@ static void __hist_entry__set_folding(st he->nr_rows = 0; }
-static void hist_entry__set_folding(struct hist_entry *he, - struct hist_browser *browser, bool unfold) -{ - double percent; - - percent = hist_entry__get_percent_limit(he); - if (he->filtered || percent < browser->min_pcnt) - return; - - __hist_entry__set_folding(he, browser, unfold); - - if (!he->depth || unfold) - browser->nr_hierarchy_entries++; - if (he->leaf) - browser->nr_callchain_rows += he->nr_rows; - else if (unfold && !hist_entry__has_hierarchy_children(he, browser->min_pcnt)) { - browser->nr_hierarchy_entries++; - he->has_no_entry = true; - he->nr_rows = 1; - } else - he->has_no_entry = false; -} - static void __hist_browser__set_folding(struct hist_browser *browser, bool unfold) { struct rb_node *nd; struct hist_entry *he; + double percent;
nd = rb_first_cached(&browser->hists->entries); while (nd) { @@ -640,6 +613,21 @@ __hist_browser__set_folding(struct hist_ nd = __rb_hierarchy_next(nd, HMD_FORCE_CHILD);
hist_entry__set_folding(he, browser, unfold); + + percent = hist_entry__get_percent_limit(he); + if (he->filtered || percent < browser->min_pcnt) + continue; + + if (!he->depth || unfold) + browser->nr_hierarchy_entries++; + if (he->leaf) + browser->nr_callchain_rows += he->nr_rows; + else if (unfold && !hist_entry__has_hierarchy_children(he, browser->min_pcnt)) { + browser->nr_hierarchy_entries++; + he->has_no_entry = true; + he->nr_rows = 1; + } else + he->has_no_entry = false; } }
@@ -659,8 +647,10 @@ static void hist_browser__set_folding_se if (!browser->he_selection) return;
- hist_entry__set_folding(browser->he_selection, browser, unfold); - browser->b.nr_entries = hist_browser__nr_entries(browser); + if (unfold == browser->he_selection->unfolded) + return; + + hist_browser__toggle_fold(browser); }
static void ui_browser__warn_lost_events(struct ui_browser *browser) @@ -732,8 +722,8 @@ static int hist_browser__handle_hotkey(s hist_browser__set_folding(browser, true); break; case 'e': - /* Expand the selected entry. */ - hist_browser__set_folding_selected(browser, !hist_browser__he_selection_unfolded(browser)); + /* Toggle expand/collapse the selected entry. */ + hist_browser__toggle_fold(browser); break; case 'H': browser->show_headers = !browser->show_headers;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wenjing Liu wenjing.liu@amd.com
commit 49a30c3d1a2258fc93cfe6eea8e4951dabadc824 upstream.
ODM power optimization is only supported with single stream. When ODM power optimization is enabled, we might not have enough free pipes for enabling other stream. So when we are committing more than 1 stream we should first switch off ODM power optimization to make room for new stream and then allocating pipe resource for the new stream.
Cc: stable@vger.kernel.org Fixes: 59de751e3845 ("drm/amd/display: add ODM case when looking for first split pipe") Reviewed-by: Dillon Varone dillon.varone@amd.com Acked-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Wenjing Liu wenjing.liu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/core/dc.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -2061,12 +2061,12 @@ enum dc_status dc_commit_streams(struct } }
- /* Check for case where we are going from odm 2:1 to max - * pipe scenario. For these cases, we will call - * commit_minimal_transition_state() to exit out of odm 2:1 - * first before processing new streams + /* ODM Combine 2:1 power optimization is only applied for single stream + * scenario, it uses extra pipes than needed to reduce power consumption + * We need to switch off this feature to make room for new streams. */ - if (stream_count == dc->res_pool->pipe_count) { + if (stream_count > dc->current_state->stream_count && + dc->current_state->stream_count == 1) { for (i = 0; i < dc->res_pool->pipe_count; i++) { pipe = &dc->current_state->res_ctx.pipe_ctx[i]; if (pipe->next_odm_pipe)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabe Teeger gabe.teeger@amd.com
commit 5a3ccb1400339268c5e3dc1fa044a7f6c7f59a02 upstream.
[Why] We wait for mpc idle while in a locked state, leading to potential deadlock.
[What] Move the wait_for_idle call to outside of HW lock. This and a call to wait_drr_doublebuffer_pending_clear are moved added to a new static helper function called wait_for_outstanding_hw_updates, to make the interface clearer.
Cc: stable@vger.kernel.org Fixes: 8f0d304d21b3 ("drm/amd/display: Do not commit pipe when updating DRR") Reviewed-by: Jun Lei jun.lei@amd.com Acked-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Gabe Teeger gabe.teeger@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/Makefile | 1 drivers/gpu/drm/amd/display/dc/core/dc.c | 58 ++++++++++++++------- drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 11 --- 3 files changed, 42 insertions(+), 28 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/Makefile +++ b/drivers/gpu/drm/amd/display/dc/Makefile @@ -78,3 +78,4 @@ DC_EDID += dc_edid_parser.o AMD_DISPLAY_DMUB = $(addprefix $(AMDDALPATH)/dc/,$(DC_DMUB)) AMD_DISPLAY_EDID = $(addprefix $(AMDDALPATH)/dc/,$(DC_EDID)) AMD_DISPLAY_FILES += $(AMD_DISPLAY_DMUB) $(AMD_DISPLAY_EDID) + --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -3589,6 +3589,45 @@ static void commit_planes_for_stream_fas top_pipe_to_program->stream->update_flags.raw = 0; }
+static void wait_for_outstanding_hw_updates(struct dc *dc, const struct dc_state *dc_context) +{ +/* + * This function calls HWSS to wait for any potentially double buffered + * operations to complete. It should be invoked as a pre-amble prior + * to full update programming before asserting any HW locks. + */ + int pipe_idx; + int opp_inst; + int opp_count = dc->res_pool->pipe_count; + struct hubp *hubp; + int mpcc_inst; + const struct pipe_ctx *pipe_ctx; + + for (pipe_idx = 0; pipe_idx < dc->res_pool->pipe_count; pipe_idx++) { + pipe_ctx = &dc_context->res_ctx.pipe_ctx[pipe_idx]; + + if (!pipe_ctx->stream) + continue; + + if (pipe_ctx->stream_res.tg->funcs->wait_drr_doublebuffer_pending_clear) + pipe_ctx->stream_res.tg->funcs->wait_drr_doublebuffer_pending_clear(pipe_ctx->stream_res.tg); + + hubp = pipe_ctx->plane_res.hubp; + if (!hubp) + continue; + + mpcc_inst = hubp->inst; + // MPCC inst is equal to pipe index in practice + for (opp_inst = 0; opp_inst < opp_count; opp_inst++) { + if (dc->res_pool->opps[opp_inst]->mpcc_disconnect_pending[mpcc_inst]) { + dc->res_pool->mpc->funcs->wait_for_idle(dc->res_pool->mpc, mpcc_inst); + dc->res_pool->opps[opp_inst]->mpcc_disconnect_pending[mpcc_inst] = false; + break; + } + } + } +} + static void commit_planes_for_stream(struct dc *dc, struct dc_surface_update *srf_updates, int surface_count, @@ -3607,24 +3646,9 @@ static void commit_planes_for_stream(str // dc->current_state anymore, so we have to cache it before we apply // the new SubVP context subvp_prev_use = false; - - dc_z10_restore(dc); - - if (update_type == UPDATE_TYPE_FULL) { - /* wait for all double-buffer activity to clear on all pipes */ - int pipe_idx; - - for (pipe_idx = 0; pipe_idx < dc->res_pool->pipe_count; pipe_idx++) { - struct pipe_ctx *pipe_ctx = &context->res_ctx.pipe_ctx[pipe_idx]; - - if (!pipe_ctx->stream) - continue; - - if (pipe_ctx->stream_res.tg->funcs->wait_drr_doublebuffer_pending_clear) - pipe_ctx->stream_res.tg->funcs->wait_drr_doublebuffer_pending_clear(pipe_ctx->stream_res.tg); - } - } + if (update_type == UPDATE_TYPE_FULL) + wait_for_outstanding_hw_updates(dc, context);
if (update_type == UPDATE_TYPE_FULL) { dc_allow_idle_optimizations(dc, false); --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c @@ -1580,17 +1580,6 @@ static void dcn20_update_dchubp_dpp( || plane_state->update_flags.bits.global_alpha_change || plane_state->update_flags.bits.per_pixel_alpha_change) { // MPCC inst is equal to pipe index in practice - int mpcc_inst = hubp->inst; - int opp_inst; - int opp_count = dc->res_pool->pipe_count; - - for (opp_inst = 0; opp_inst < opp_count; opp_inst++) { - if (dc->res_pool->opps[opp_inst]->mpcc_disconnect_pending[mpcc_inst]) { - dc->res_pool->mpc->funcs->wait_for_idle(dc->res_pool->mpc, mpcc_inst); - dc->res_pool->opps[opp_inst]->mpcc_disconnect_pending[mpcc_inst] = false; - break; - } - } hws->funcs.update_mpcc(dc, pipe_ctx); }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jay Cornwall jay.cornwall@amd.com
commit e9dca969b2426702a73719ab9207e43c6d80b581 upstream.
mqd_stride function was introduced in commit 2f77b9a242a2 ("drm/amdkfd: Update MQD management on multi XCC setup") but not assigned for gfx11. Fixes a NULL dereference in debugfs.
Signed-off-by: Jay Cornwall jay.cornwall@amd.com Signed-off-by: Harish Kasiviswanathan Harish.Kasiviswanathan@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org # 6.5.x Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c @@ -437,6 +437,7 @@ struct mqd_manager *mqd_manager_init_v11 mqd->is_occupied = kfd_is_occupied_cp; mqd->mqd_size = sizeof(struct v11_compute_mqd); mqd->get_wave_state = get_wave_state; + mqd->mqd_stride = kfd_mqd_stride; #if defined(CONFIG_DEBUG_FS) mqd->debugfs_show_mqd = debugfs_show_mqd; #endif @@ -452,6 +453,7 @@ struct mqd_manager *mqd_manager_init_v11 mqd->destroy_mqd = kfd_destroy_mqd_cp; mqd->is_occupied = kfd_is_occupied_cp; mqd->mqd_size = sizeof(struct v11_compute_mqd); + mqd->mqd_stride = kfd_mqd_stride; #if defined(CONFIG_DEBUG_FS) mqd->debugfs_show_mqd = debugfs_show_mqd; #endif @@ -481,6 +483,7 @@ struct mqd_manager *mqd_manager_init_v11 mqd->destroy_mqd = kfd_destroy_mqd_sdma; mqd->is_occupied = kfd_is_occupied_sdma; mqd->mqd_size = sizeof(struct v11_sdma_mqd); + mqd->mqd_stride = kfd_mqd_stride; #if defined(CONFIG_DEBUG_FS) mqd->debugfs_show_mqd = debugfs_show_mqd_sdma; #endif
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hamza Mahfooz hamza.mahfooz@amd.com
commit 0a611560f53bfd489e33f4a718c915f1a6123d03 upstream.
fbcon requires that we implement &drm_framebuffer_funcs.dirty. Otherwise, the framebuffer might take a while to flush (which would manifest as noticeable lag). However, we can't enable this callback for non-fbcon cases since it may cause too many atomic commits to be made at once. So, implement amdgpu_dirtyfb() and only enable it for fbcon framebuffers (we can use the "struct drm_file file" parameter in the callback to check for this since it is only NULL when called by fbcon, at least in the mainline kernel) on devices that support atomic KMS.
Cc: Aurabindo Pillai aurabindo.pillai@amd.com Cc: Mario Limonciello mario.limonciello@amd.com Cc: stable@vger.kernel.org # 6.1+ Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2519 Reviewed-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 26 +++++++++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c @@ -38,6 +38,8 @@ #include <linux/pci.h> #include <linux/pm_runtime.h> #include <drm/drm_crtc_helper.h> +#include <drm/drm_damage_helper.h> +#include <drm/drm_drv.h> #include <drm/drm_edid.h> #include <drm/drm_fb_helper.h> #include <drm/drm_gem_framebuffer_helper.h> @@ -529,11 +531,29 @@ bool amdgpu_display_ddc_probe(struct amd return true; }
+static int amdgpu_dirtyfb(struct drm_framebuffer *fb, struct drm_file *file, + unsigned int flags, unsigned int color, + struct drm_clip_rect *clips, unsigned int num_clips) +{ + + if (file) + return -ENOSYS; + + return drm_atomic_helper_dirtyfb(fb, file, flags, color, clips, + num_clips); +} + static const struct drm_framebuffer_funcs amdgpu_fb_funcs = { .destroy = drm_gem_fb_destroy, .create_handle = drm_gem_fb_create_handle, };
+static const struct drm_framebuffer_funcs amdgpu_fb_funcs_atomic = { + .destroy = drm_gem_fb_destroy, + .create_handle = drm_gem_fb_create_handle, + .dirty = amdgpu_dirtyfb +}; + uint32_t amdgpu_display_supported_domains(struct amdgpu_device *adev, uint64_t bo_flags) { @@ -1136,7 +1156,11 @@ static int amdgpu_display_gem_fb_verify_ if (ret) goto err;
- ret = drm_framebuffer_init(dev, &rfb->base, &amdgpu_fb_funcs); + if (drm_drv_uses_atomic_modeset(dev)) + ret = drm_framebuffer_init(dev, &rfb->base, + &amdgpu_fb_funcs_atomic); + else + ret = drm_framebuffer_init(dev, &rfb->base, &amdgpu_fb_funcs);
if (ret) goto err;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit 6a5a148aaf14747570cc634f9cdfcb0393f5617f ]
bpf_probe_read_kernel() has a __weak definition in core.c and another definition with an incompatible prototype in kernel/trace/bpf_trace.c, when CONFIG_BPF_EVENTS is enabled.
Since the two are incompatible, there cannot be a shared declaration in a header file, but the lack of a prototype causes a W=1 warning:
kernel/bpf/core.c:1638:12: error: no previous prototype for 'bpf_probe_read_kernel' [-Werror=missing-prototypes]
On 32-bit architectures, the local prototype
u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
passes arguments in other registers as the one in bpf_trace.c
BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size, const void *, unsafe_ptr)
which uses 64-bit arguments in pairs of registers.
As both versions of the function are fairly simple and only really differ in one line, just move them into a header file as an inline function that does not add any overhead for the bpf_trace.c callers and actually avoids a function call for the other one.
Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/ac25cb0f-b804-1649-3afb-1dc6138c2716@iogearbox.n... Signed-off-by: Arnd Bergmann arnd@arndb.de Acked-by: Yonghong Song yonghong.song@linux.dev Link: https://lore.kernel.org/r/20230801111449.185301-1-arnd@kernel.org Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/bpf.h | 12 ++++++++++++ kernel/bpf/core.c | 10 ++-------- kernel/trace/bpf_trace.c | 11 ----------- 3 files changed, 14 insertions(+), 19 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h index f58895830adae..f316affcd2e13 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2619,6 +2619,18 @@ static inline void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr) } #endif /* CONFIG_BPF_SYSCALL */
+static __always_inline int +bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr) +{ + int ret = -EFAULT; + + if (IS_ENABLED(CONFIG_BPF_EVENTS)) + ret = copy_from_kernel_nofault(dst, unsafe_ptr, size); + if (unlikely(ret < 0)) + memset(dst, 0, size); + return ret; +} + void __bpf_free_used_btfs(struct bpf_prog_aux *aux, struct btf_mod_pair *used_btfs, u32 len);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index dc85240a01342..e3e45b651cd40 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -1635,12 +1635,6 @@ bool bpf_opcode_in_insntable(u8 code) }
#ifndef CONFIG_BPF_JIT_ALWAYS_ON -u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr) -{ - memset(dst, 0, size); - return -EFAULT; -} - /** * ___bpf_prog_run - run eBPF program on a given context * @regs: is the array of MAX_BPF_EXT_REG eBPF pseudo-registers @@ -1931,8 +1925,8 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn) DST = *(SIZE *)(unsigned long) (SRC + insn->off); \ CONT; \ LDX_PROBE_MEM_##SIZEOP: \ - bpf_probe_read_kernel(&DST, sizeof(SIZE), \ - (const void *)(long) (SRC + insn->off)); \ + bpf_probe_read_kernel_common(&DST, sizeof(SIZE), \ + (const void *)(long) (SRC + insn->off)); \ DST = *((SIZE *)&DST); \ CONT;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 30d8db47c1e2f..abf287b2678a1 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -223,17 +223,6 @@ const struct bpf_func_proto bpf_probe_read_user_str_proto = { .arg3_type = ARG_ANYTHING, };
-static __always_inline int -bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr) -{ - int ret; - - ret = copy_from_kernel_nofault(dst, unsafe_ptr, size); - if (unlikely(ret < 0)) - memset(dst, 0, size); - return ret; -} - BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size, const void *, unsafe_ptr) {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Biju Das biju.das.jz@bp.renesas.com
[ Upstream commit 42a95739c5bc4d7a6e93a43117e9283598ba2287 ]
Change the scope of the variables {clkin_name, xin_name} from global->local to fix the below warning.
drivers/regulator/raa215300.c:42:12: sparse: sparse: symbol 'xin_name' was not declared. Should it be static?
Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202306250552.Fan9WTiN-lkp@intel.com/ Signed-off-by: Biju Das biju.das.jz@bp.renesas.com Reviewed-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://lore.kernel.org/r/20230629104200.102663-1-biju.das.jz@bp.renesas.com Signed-off-by: Mark Brown broonie@kernel.org Stable-dep-of: e21ac64e669e ("regulator: raa215300: Fix resource leak in case of error") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/raa215300.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/regulator/raa215300.c b/drivers/regulator/raa215300.c index 24a1c89f5dbc9..8e1a4c86b9789 100644 --- a/drivers/regulator/raa215300.c +++ b/drivers/regulator/raa215300.c @@ -38,8 +38,6 @@ #define RAA215300_REG_BLOCK_EN_RTC_EN BIT(6) #define RAA215300_RTC_DEFAULT_ADDR 0x6f
-const char *clkin_name = "clkin"; -const char *xin_name = "xin"; static struct clk *clk;
static const struct regmap_config raa215300_regmap_config = { @@ -71,8 +69,10 @@ static int raa215300_clk_present(struct i2c_client *client, const char *name) static int raa215300_i2c_probe(struct i2c_client *client) { struct device *dev = &client->dev; - const char *clk_name = xin_name; + const char *clkin_name = "clkin"; unsigned int pmic_version, val; + const char *xin_name = "xin"; + const char *clk_name = NULL; struct regmap *regmap; int ret;
@@ -114,15 +114,17 @@ static int raa215300_i2c_probe(struct i2c_client *client) ret = raa215300_clk_present(client, xin_name); if (ret < 0) { return ret; - } else if (!ret) { + } else if (ret) { + clk_name = xin_name; + } else { ret = raa215300_clk_present(client, clkin_name); if (ret < 0) return ret; - - clk_name = clkin_name; + if (ret) + clk_name = clkin_name; }
- if (ret) { + if (clk_name) { char *name = pmic_version >= 0x12 ? "isl1208" : "raa215300_a0"; struct device_node *np = client->dev.of_node; u32 addr = RAA215300_RTC_DEFAULT_ADDR;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Biju Das biju.das.jz@bp.renesas.com
[ Upstream commit e21ac64e669e960688e79bf5babeed63132dac8a ]
The clk_register_clkdev() allocates memory by calling vclkdev_alloc() and this memory is not freed in the error path. Similarly, resources allocated by clk_register_fixed_rate() are not freed in the error path.
Fix these issues by using devm_clk_hw_register_fixed_rate() and devm_clk_hw_register_clkdev().
After this, the static variable clk is not needed. Replace it with local variable hw in probe() and drop calling clk_unregister_fixed_rate() from raa215300_rtc_unregister_device().
Fixes: 7bce16630837 ("regulator: Add Renesas PMIC RAA215300 driver") Cc: stable@kernel.org Signed-off-by: Biju Das biju.das.jz@bp.renesas.com Link: https://lore.kernel.org/r/20230816135550.146657-2-biju.das.jz@bp.renesas.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/raa215300.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/regulator/raa215300.c b/drivers/regulator/raa215300.c index 8e1a4c86b9789..253645696d0bb 100644 --- a/drivers/regulator/raa215300.c +++ b/drivers/regulator/raa215300.c @@ -38,8 +38,6 @@ #define RAA215300_REG_BLOCK_EN_RTC_EN BIT(6) #define RAA215300_RTC_DEFAULT_ADDR 0x6f
-static struct clk *clk; - static const struct regmap_config raa215300_regmap_config = { .reg_bits = 8, .val_bits = 8, @@ -49,10 +47,6 @@ static const struct regmap_config raa215300_regmap_config = { static void raa215300_rtc_unregister_device(void *data) { i2c_unregister_device(data); - if (!clk) { - clk_unregister_fixed_rate(clk); - clk = NULL; - } }
static int raa215300_clk_present(struct i2c_client *client, const char *name) @@ -130,10 +124,16 @@ static int raa215300_i2c_probe(struct i2c_client *client) u32 addr = RAA215300_RTC_DEFAULT_ADDR; struct i2c_board_info info = {}; struct i2c_client *rtc_client; + struct clk_hw *hw; ssize_t size;
- clk = clk_register_fixed_rate(NULL, clk_name, NULL, 0, 32000); - clk_register_clkdev(clk, clk_name, NULL); + hw = devm_clk_hw_register_fixed_rate(dev, clk_name, NULL, 0, 32000); + if (IS_ERR(hw)) + return PTR_ERR(hw); + + ret = devm_clk_hw_register_clkdev(dev, hw, clk_name, NULL); + if (ret) + return dev_err_probe(dev, ret, "Failed to initialize clkdev\n");
if (np) { int i;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Helge Deller deller@gmx.de
[ Upstream commit 6428bc7bd3f35e43c8cb7359cb89d83248d339d2 ]
Clean up the code, e.g. make proc_mckinley_root static, drop the now empty mckinley header file and remove some unneeded ifdefs around procfs functions.
Signed-off-by: Helge Deller deller@gmx.de Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202308300800.Jod4sHzM-lkp@intel.com/ Fixes: 77e0ddf097d6 ("parisc: ccio-dma: Create private runway procfs root entry") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/parisc/include/asm/mckinley.h | 8 -------- drivers/parisc/sba_iommu.c | 10 ++-------- 2 files changed, 2 insertions(+), 16 deletions(-) delete mode 100644 arch/parisc/include/asm/mckinley.h
diff --git a/arch/parisc/include/asm/mckinley.h b/arch/parisc/include/asm/mckinley.h deleted file mode 100644 index 1314390b9034b..0000000000000 --- a/arch/parisc/include/asm/mckinley.h +++ /dev/null @@ -1,8 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef ASM_PARISC_MCKINLEY_H -#define ASM_PARISC_MCKINLEY_H - -/* declared in arch/parisc/kernel/setup.c */ -extern struct proc_dir_entry * proc_mckinley_root; - -#endif /*ASM_PARISC_MCKINLEY_H*/ diff --git a/drivers/parisc/sba_iommu.c b/drivers/parisc/sba_iommu.c index 8f28f8696bf32..b8e91cbb60567 100644 --- a/drivers/parisc/sba_iommu.c +++ b/drivers/parisc/sba_iommu.c @@ -46,8 +46,6 @@ #include <linux/module.h>
#include <asm/ropes.h> -#include <asm/mckinley.h> /* for proc_mckinley_root */ -#include <asm/runway.h> /* for proc_runway_root */ #include <asm/page.h> /* for PAGE0 */ #include <asm/pdc.h> /* for PDC_MODEL_* */ #include <asm/pdcpat.h> /* for is_pdc_pat() */ @@ -122,7 +120,7 @@ MODULE_PARM_DESC(sba_reserve_agpgart, "Reserve half of IO pdir as AGPGART"); #endif
static struct proc_dir_entry *proc_runway_root __ro_after_init; -struct proc_dir_entry *proc_mckinley_root __ro_after_init; +static struct proc_dir_entry *proc_mckinley_root __ro_after_init;
/************************************ ** SBA register read and write support @@ -1899,9 +1897,7 @@ static int __init sba_driver_callback(struct parisc_device *dev) int i; char *version; void __iomem *sba_addr = ioremap(dev->hpa.start, SBA_FUNC_SIZE); -#ifdef CONFIG_PROC_FS - struct proc_dir_entry *root; -#endif + struct proc_dir_entry *root __maybe_unused;
sba_dump_ranges(sba_addr);
@@ -1967,7 +1963,6 @@ static int __init sba_driver_callback(struct parisc_device *dev)
hppa_dma_ops = &sba_ops;
-#ifdef CONFIG_PROC_FS switch (dev->id.hversion) { case PLUTO_MCKINLEY_PORT: if (!proc_mckinley_root) @@ -1985,7 +1980,6 @@ static int __init sba_driver_callback(struct parisc_device *dev)
proc_create_single("sba_iommu", 0, root, sba_proc_info); proc_create_single("sba_iommu-bitmap", 0, root, sba_proc_bitmap_info); -#endif return 0; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jinjie Ruan ruanjinjie@huawei.com
[ Upstream commit 2810c1e99867a811e631dd24e63e6c1e3b78a59d ]
Inject fault while probing kunit-example-test.ko, if kstrdup() fails in mod_sysfs_setup() in load_module(), the mod->state will switch from MODULE_STATE_COMING to MODULE_STATE_GOING instead of from MODULE_STATE_LIVE to MODULE_STATE_GOING, so only kunit_module_exit() will be called without kunit_module_init(), and the mod->kunit_suites is no set correctly and the free in kunit_free_suite_set() will cause below wild-memory-access bug.
The mod->state state machine when load_module() succeeds:
MODULE_STATE_UNFORMED ---> MODULE_STATE_COMING ---> MODULE_STATE_LIVE ^ | | | delete_module +---------------- MODULE_STATE_GOING <---------+
The mod->state state machine when load_module() fails at mod_sysfs_setup():
MODULE_STATE_UNFORMED ---> MODULE_STATE_COMING ---> MODULE_STATE_GOING ^ | | | +-----------------------------------------------+
Call kunit_module_init() at MODULE_STATE_COMING state to fix the issue because MODULE_STATE_LIVE is transformed from it.
Unable to handle kernel paging request at virtual address ffffff341e942a88 KASAN: maybe wild-memory-access in range [0x0003f9a0f4a15440-0x0003f9a0f4a15447] Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000441ea000 [ffffff341e942a88] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP Modules linked in: kunit_example_test(-) cfg80211 rfkill 8021q garp mrp stp llc ipv6 [last unloaded: kunit_example_test] CPU: 3 PID: 2035 Comm: modprobe Tainted: G W N 6.5.0-next-20230828+ #136 Hardware name: linux,dummy-virt (DT) pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : kfree+0x2c/0x70 lr : kunit_free_suite_set+0xcc/0x13c sp : ffff8000829b75b0 x29: ffff8000829b75b0 x28: ffff8000829b7b90 x27: 0000000000000000 x26: dfff800000000000 x25: ffffcd07c82a7280 x24: ffffcd07a50ab300 x23: ffffcd07a50ab2e8 x22: 1ffff00010536ec0 x21: dfff800000000000 x20: ffffcd07a50ab2f0 x19: ffffcd07a50ab2f0 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: ffffcd07c24b6764 x14: ffffcd07c24b63c0 x13: ffffcd07c4cebb94 x12: ffff700010536ec7 x11: 1ffff00010536ec6 x10: ffff700010536ec6 x9 : dfff800000000000 x8 : 00008fffefac913a x7 : 0000000041b58ab3 x6 : 0000000000000000 x5 : 1ffff00010536ec5 x4 : ffff8000829b7628 x3 : dfff800000000000 x2 : ffffff341e942a80 x1 : ffffcd07a50aa000 x0 : fffffc0000000000 Call trace: kfree+0x2c/0x70 kunit_free_suite_set+0xcc/0x13c kunit_module_notify+0xd8/0x360 blocking_notifier_call_chain+0xc4/0x128 load_module+0x382c/0x44a4 init_module_from_file+0xd4/0x128 idempotent_init_module+0x2c8/0x524 __arm64_sys_finit_module+0xac/0x100 invoke_syscall+0x6c/0x258 el0_svc_common.constprop.0+0x160/0x22c do_el0_svc+0x44/0x5c el0_svc+0x38/0x78 el0t_64_sync_handler+0x13c/0x158 el0t_64_sync+0x190/0x194 Code: aa0003e1 b25657e0 d34cfc42 8b021802 (f9400440) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal exception SMP: stopping secondary CPUs Kernel Offset: 0x4d0742200000 from 0xffff800080000000 PHYS_OFFSET: 0xffffee43c0000000 CPU features: 0x88000203,3c020000,1000421b Memory Limit: none Rebooting in 1 seconds..
Fixes: 3d6e44623841 ("kunit: unify module and builtin suite definitions") Signed-off-by: Jinjie Ruan ruanjinjie@huawei.com Reviewed-by: Rae Moar rmoar@google.com Reviewed-by: David Gow davidgow@google.com Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- lib/kunit/test.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c index 84e4666555c94..e8c9dd9d73a30 100644 --- a/lib/kunit/test.c +++ b/lib/kunit/test.c @@ -744,12 +744,13 @@ static int kunit_module_notify(struct notifier_block *nb, unsigned long val,
switch (val) { case MODULE_STATE_LIVE: - kunit_module_init(mod); break; case MODULE_STATE_GOING: kunit_module_exit(mod); break; case MODULE_STATE_COMING: + kunit_module_init(mod); + break; case MODULE_STATE_UNFORMED: break; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Liu Jian liujian56@huawei.com
[ Upstream commit ac28b1ec6135649b5d78b028e47264cb3ebca5ea ]
I got the below warning when do fuzzing test: unregister_netdevice: waiting for bond0 to become free. Usage count = 2
It can be repoduced via:
ip link add bond0 type bond sysctl -w net.ipv4.conf.bond0.promote_secondaries=1 ip addr add 4.117.174.103/0 scope 0x40 dev bond0 ip addr add 192.168.100.111/255.255.255.254 scope 0 dev bond0 ip addr add 0.0.0.4/0 scope 0x40 secondary dev bond0 ip addr del 4.117.174.103/0 scope 0x40 dev bond0 ip link delete bond0 type bond
In this reproduction test case, an incorrect 'last_prim' is found in __inet_del_ifa(), as a result, the secondary address(0.0.0.4/0 scope 0x40) is lost. The memory of the secondary address is leaked and the reference of in_device and net_device is leaked.
Fix this problem: Look for 'last_prim' starting at location of the deleted IP and inserting the promoted IP into the location of 'last_prim'.
Fixes: 0ff60a45678e ("[IPV4]: Fix secondary IP addresses after promotion") Signed-off-by: Liu Jian liujian56@huawei.com Signed-off-by: Julian Anastasov ja@ssi.bg Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/devinet.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 5deac0517ef70..37be82496322d 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -355,14 +355,14 @@ static void __inet_del_ifa(struct in_device *in_dev, { struct in_ifaddr *promote = NULL; struct in_ifaddr *ifa, *ifa1; - struct in_ifaddr *last_prim; + struct in_ifaddr __rcu **last_prim; struct in_ifaddr *prev_prom = NULL; int do_promote = IN_DEV_PROMOTE_SECONDARIES(in_dev);
ASSERT_RTNL();
ifa1 = rtnl_dereference(*ifap); - last_prim = rtnl_dereference(in_dev->ifa_list); + last_prim = ifap; if (in_dev->dead) goto no_promotions;
@@ -376,7 +376,7 @@ static void __inet_del_ifa(struct in_device *in_dev, while ((ifa = rtnl_dereference(*ifap1)) != NULL) { if (!(ifa->ifa_flags & IFA_F_SECONDARY) && ifa1->ifa_scope <= ifa->ifa_scope) - last_prim = ifa; + last_prim = &ifa->ifa_next;
if (!(ifa->ifa_flags & IFA_F_SECONDARY) || ifa1->ifa_mask != ifa->ifa_mask || @@ -440,9 +440,9 @@ static void __inet_del_ifa(struct in_device *in_dev,
rcu_assign_pointer(prev_prom->ifa_next, next_sec);
- last_sec = rtnl_dereference(last_prim->ifa_next); + last_sec = rtnl_dereference(*last_prim); rcu_assign_pointer(promote->ifa_next, last_sec); - rcu_assign_pointer(last_prim->ifa_next, promote); + rcu_assign_pointer(*last_prim, promote); }
promote->ifa_flags &= ~IFA_F_SECONDARY;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Björn Töpel bjorn@rivosinc.com
[ Upstream commit 9616cb34b08ec86642b162eae75c5a7ca8debe3c ]
Timeouts in kselftest are done using the "timeout" command with the "--foreground" option. Without the "foreground" option, it is not possible for a user to cancel the runner using SIGINT, because the signal is not propagated to timeout which is running in a different process group. The "forground" options places the timeout in the same process group as its parent, but only sends the SIGTERM (on timeout) signal to the forked process. Unfortunately, this does not play nice with all kselftests, e.g. "net:fcnal-test.sh", where the child processes will linger because timeout does not send SIGTERM to the group.
Some users have noted these hangs [1].
Fix this by nesting the timeout with an additional timeout without the foreground option.
Link: https://lore.kernel.org/all/7650b2eb-0aee-a2b0-2e64-c9bc63210f67@alu.unizg.h... # [1] Fixes: 651e0d881461 ("kselftest/runner: allow to properly deliver signals to tests") Signed-off-by: Björn Töpel bjorn@rivosinc.com Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/kselftest/runner.sh | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh index 1c952d1401d46..70e0a465e30da 100644 --- a/tools/testing/selftests/kselftest/runner.sh +++ b/tools/testing/selftests/kselftest/runner.sh @@ -36,7 +36,8 @@ tap_timeout() { # Make sure tests will time out if utility is available. if [ -x /usr/bin/timeout ] ; then - /usr/bin/timeout --foreground "$kselftest_timeout" $1 + /usr/bin/timeout --foreground "$kselftest_timeout" \ + /usr/bin/timeout "$kselftest_timeout" $1 else $1 fi
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Björn Töpel bjorn@rivosinc.com
[ Upstream commit 3f3f384139ed147c71e1d770accf610133d5309b ]
When kselftest is built/installed with the 'gen_tar' target, rsync is used for the installation step to copy files. Extra care is needed for tests that have symlinks. Commit ae108c48b5d2 ("selftests: net: Fix cross-tree inclusion of scripts") added '-L' (transform symlink into referent file/dir) to rsync, to fix dangling links. However, that broke some tests where the symlink (being a symlink) is part of the test (e.g. exec:execveat).
Use rsync's '--copy-unsafe-links' that does right thing.
Fixes: ae108c48b5d2 ("selftests: net: Fix cross-tree inclusion of scripts") Signed-off-by: Björn Töpel bjorn@rivosinc.com Reviewed-by: Benjamin Poirier bpoirier@nvidia.com Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/lib.mk | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk index d17854285f2b6..118e0964bda94 100644 --- a/tools/testing/selftests/lib.mk +++ b/tools/testing/selftests/lib.mk @@ -106,7 +106,7 @@ endef run_tests: all ifdef building_out_of_srctree @if [ "X$(TEST_PROGS)$(TEST_PROGS_EXTENDED)$(TEST_FILES)" != "X" ]; then \ - rsync -aLq $(TEST_PROGS) $(TEST_PROGS_EXTENDED) $(TEST_FILES) $(OUTPUT); \ + rsync -aq --copy-unsafe-links $(TEST_PROGS) $(TEST_PROGS_EXTENDED) $(TEST_FILES) $(OUTPUT); \ fi @if [ "X$(TEST_PROGS)" != "X" ]; then \ $(call RUN_TESTS, $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS) \ @@ -120,7 +120,7 @@ endif
define INSTALL_SINGLE_RULE $(if $(INSTALL_LIST),@mkdir -p $(INSTALL_PATH)) - $(if $(INSTALL_LIST),rsync -aL $(INSTALL_LIST) $(INSTALL_PATH)/) + $(if $(INSTALL_LIST),rsync -a --copy-unsafe-links $(INSTALL_LIST) $(INSTALL_PATH)/) endef
define INSTALL_RULE
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Naveen N Rao naveen@kernel.org
[ Upstream commit 145036f88d693d7ef3aa8537a4b1aa22f8764647 ]
Commit b81a3a100cca1b ("tracing/histogram: Add simple tests for stacktrace usage of synthetic events") changed the output text in tracefs README, but missed updating some of the dependencies specified in selftests. This causes some of the tests to exit as unsupported.
Fix this by changing the grep pattern. Since we want these tests to work on older kernels, match only against the common last part of the pattern.
Link: https://lore.kernel.org/linux-trace-kernel/20230614091046.2178539-1-naveen@k...
Cc: linux-kselftest@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Shuah Khan shuah@kernel.org Fixes: b81a3a100cca ("tracing/histogram: Add simple tests for stacktrace usage of synthetic events") Signed-off-by: Naveen N Rao naveen@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../trigger/inter-event/trigger-synthetic-event-dynstring.tc | 2 +- .../inter-event/trigger-synthetic_event_syntax_errors.tc | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-dynstring.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-dynstring.tc index 213d890ed1886..174376ddbc6c7 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-dynstring.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-dynstring.tc @@ -1,7 +1,7 @@ #!/bin/sh # SPDX-License-Identifier: GPL-2.0 # description: event trigger - test inter-event histogram trigger trace action with dynamic string param -# requires: set_event synthetic_events events/sched/sched_process_exec/hist "char name[]' >> synthetic_events":README ping:program +# requires: set_event synthetic_events events/sched/sched_process_exec/hist "' >> synthetic_events":README ping:program
fail() { #msg echo $1 diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic_event_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic_event_syntax_errors.tc index 955e3ceea44b5..b927ee54c02da 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic_event_syntax_errors.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic_event_syntax_errors.tc @@ -1,7 +1,7 @@ #!/bin/sh # SPDX-License-Identifier: GPL-2.0 # description: event trigger - test synthetic_events syntax parser errors -# requires: synthetic_events error_log "char name[]' >> synthetic_events":README +# requires: synthetic_events error_log "' >> synthetic_events":README
check_error() { # command-with-error-pos-by-^ ftrace_errlog_check 'synthetic_events' "$1" 'synthetic_events'
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jinjie Ruan ruanjinjie@huawei.com
[ Upstream commit 281f65d29d6da1a9b6907fb0b145aaf34f4e4822 ]
Inject fault When select CONFIG_VCAP_KUNIT_TEST, the below memory leak occurs. If kzalloc() for duprule succeeds, but the following kmemdup() fails, the duprule, ckf and caf memory will be leaked. So kfree them in the error path.
unreferenced object 0xffff122744c50600 (size 192): comm "kunit_try_catch", pid 346, jiffies 4294896122 (age 911.812s) hex dump (first 32 bytes): 10 27 00 00 04 00 00 00 1e 00 00 00 2c 01 00 00 .'..........,... 00 00 00 00 00 00 00 00 18 06 c5 44 27 12 ff ff ...........D'... backtrace: [<00000000394b0db8>] __kmem_cache_alloc_node+0x274/0x2f8 [<0000000001bedc67>] kmalloc_trace+0x38/0x88 [<00000000b0612f98>] vcap_dup_rule+0x50/0x460 [<000000005d2d3aca>] vcap_add_rule+0x8cc/0x1038 [<00000000eef9d0f8>] test_vcap_xn_rule_creator.constprop.0.isra.0+0x238/0x494 [<00000000cbda607b>] vcap_api_rule_remove_in_front_test+0x1ac/0x698 [<00000000c8766299>] kunit_try_run_case+0xe0/0x20c [<00000000c4fe9186>] kunit_generic_run_threadfn_adapter+0x50/0x94 [<00000000f6864acf>] kthread+0x2e8/0x374 [<0000000022e639b3>] ret_from_fork+0x10/0x20
Fixes: 814e7693207f ("net: microchip: vcap api: Add a storage state to a VCAP rule") Signed-off-by: Jinjie Ruan ruanjinjie@huawei.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/microchip/vcap/vcap_api.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/microchip/vcap/vcap_api.c b/drivers/net/ethernet/microchip/vcap/vcap_api.c index a418ad8e8770a..15a3a31b15d45 100644 --- a/drivers/net/ethernet/microchip/vcap/vcap_api.c +++ b/drivers/net/ethernet/microchip/vcap/vcap_api.c @@ -1021,18 +1021,32 @@ static struct vcap_rule_internal *vcap_dup_rule(struct vcap_rule_internal *ri, list_for_each_entry(ckf, &ri->data.keyfields, ctrl.list) { newckf = kmemdup(ckf, sizeof(*newckf), GFP_KERNEL); if (!newckf) - return ERR_PTR(-ENOMEM); + goto err; list_add_tail(&newckf->ctrl.list, &duprule->data.keyfields); }
list_for_each_entry(caf, &ri->data.actionfields, ctrl.list) { newcaf = kmemdup(caf, sizeof(*newcaf), GFP_KERNEL); if (!newcaf) - return ERR_PTR(-ENOMEM); + goto err; list_add_tail(&newcaf->ctrl.list, &duprule->data.actionfields); }
return duprule; + +err: + list_for_each_entry_safe(ckf, newckf, &duprule->data.keyfields, ctrl.list) { + list_del(&ckf->ctrl.list); + kfree(ckf); + } + + list_for_each_entry_safe(caf, newcaf, &duprule->data.actionfields, ctrl.list) { + list_del(&caf->ctrl.list); + kfree(caf); + } + + kfree(duprule); + return ERR_PTR(-ENOMEM); }
static void vcap_apply_width(u8 *dst, int width, int bytes)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ratheesh Kannoth rkannoth@marvell.com
[ Upstream commit 88e69af061f2e061a68751ef9cad47a674527a1b ]
The access to page pool `cache' array and the `count' variable is not locked. Page pool cache access is fine as long as there is only one consumer per pool.
octeontx2 driver fills in rx buffers from page pool in NAPI context. If system is stressed and could not allocate buffers, refiiling work will be delegated to a delayed workqueue. This means that there are two cosumers to the page pool cache.
Either workqueue or IRQ/NAPI can be run on other CPU. This will lead to lock less access, hence corruption of cache pool indexes.
To fix this issue, NAPI is rescheduled from workqueue context to refill rx buffers.
Fixes: b2e3406a38f0 ("octeontx2-pf: Add support for page pool") Signed-off-by: Ratheesh Kannoth rkannoth@marvell.com Reported-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Reviewed-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/marvell/octeontx2/nic/cn10k.c | 6 ++- .../ethernet/marvell/octeontx2/nic/cn10k.h | 2 +- .../marvell/octeontx2/nic/otx2_common.c | 43 +++---------------- .../marvell/octeontx2/nic/otx2_common.h | 3 +- .../ethernet/marvell/octeontx2/nic/otx2_pf.c | 7 +-- .../marvell/octeontx2/nic/otx2_txrx.c | 30 ++++++++++--- .../marvell/octeontx2/nic/otx2_txrx.h | 4 +- 7 files changed, 44 insertions(+), 51 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k.c b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k.c index 826f691de2595..a4a258da8dd59 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k.c @@ -107,12 +107,13 @@ int cn10k_sq_aq_init(void *dev, u16 qidx, u16 sqb_aura) }
#define NPA_MAX_BURST 16 -void cn10k_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq) +int cn10k_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq) { struct otx2_nic *pfvf = dev; + int cnt = cq->pool_ptrs; u64 ptrs[NPA_MAX_BURST]; - int num_ptrs = 1; dma_addr_t bufptr; + int num_ptrs = 1;
/* Refill pool with new buffers */ while (cq->pool_ptrs) { @@ -131,6 +132,7 @@ void cn10k_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq) num_ptrs = 1; } } + return cnt - cq->pool_ptrs; }
void cn10k_sqe_flush(void *dev, struct otx2_snd_queue *sq, int size, int qidx) diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k.h b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k.h index 8ae96815865e6..c1861f7de2545 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k.h +++ b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k.h @@ -24,7 +24,7 @@ static inline int mtu_to_dwrr_weight(struct otx2_nic *pfvf, int mtu) return weight; }
-void cn10k_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq); +int cn10k_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq); void cn10k_sqe_flush(void *dev, struct otx2_snd_queue *sq, int size, int qidx); int cn10k_sq_aq_init(void *dev, u16 qidx, u16 sqb_aura); int cn10k_lmtst_init(struct otx2_nic *pfvf); diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c index b9712040a0bc2..20ecc90d203e0 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c @@ -573,20 +573,8 @@ int otx2_alloc_rbuf(struct otx2_nic *pfvf, struct otx2_pool *pool, int otx2_alloc_buffer(struct otx2_nic *pfvf, struct otx2_cq_queue *cq, dma_addr_t *dma) { - if (unlikely(__otx2_alloc_rbuf(pfvf, cq->rbpool, dma))) { - struct refill_work *work; - struct delayed_work *dwork; - - work = &pfvf->refill_wrk[cq->cq_idx]; - dwork = &work->pool_refill_work; - /* Schedule a task if no other task is running */ - if (!cq->refill_task_sched) { - cq->refill_task_sched = true; - schedule_delayed_work(dwork, - msecs_to_jiffies(100)); - } + if (unlikely(__otx2_alloc_rbuf(pfvf, cq->rbpool, dma))) return -ENOMEM; - } return 0; }
@@ -1080,39 +1068,20 @@ static int otx2_cq_init(struct otx2_nic *pfvf, u16 qidx) static void otx2_pool_refill_task(struct work_struct *work) { struct otx2_cq_queue *cq; - struct otx2_pool *rbpool; struct refill_work *wrk; - int qidx, free_ptrs = 0; struct otx2_nic *pfvf; - dma_addr_t bufptr; + int qidx;
wrk = container_of(work, struct refill_work, pool_refill_work.work); pfvf = wrk->pf; qidx = wrk - pfvf->refill_wrk; cq = &pfvf->qset.cq[qidx]; - rbpool = cq->rbpool; - free_ptrs = cq->pool_ptrs;
- while (cq->pool_ptrs) { - if (otx2_alloc_rbuf(pfvf, rbpool, &bufptr)) { - /* Schedule a WQ if we fails to free atleast half of the - * pointers else enable napi for this RQ. - */ - if (!((free_ptrs - cq->pool_ptrs) > free_ptrs / 2)) { - struct delayed_work *dwork; - - dwork = &wrk->pool_refill_work; - schedule_delayed_work(dwork, - msecs_to_jiffies(100)); - } else { - cq->refill_task_sched = false; - } - return; - } - pfvf->hw_ops->aura_freeptr(pfvf, qidx, bufptr + OTX2_HEAD_ROOM); - cq->pool_ptrs--; - } cq->refill_task_sched = false; + + local_bh_disable(); + napi_schedule(wrk->napi); + local_bh_enable(); }
int otx2_config_nix_queues(struct otx2_nic *pfvf) diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h index ba8091131ec08..0e81849db3538 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h @@ -301,6 +301,7 @@ struct flr_work { struct refill_work { struct delayed_work pool_refill_work; struct otx2_nic *pf; + struct napi_struct *napi; };
/* PTPv2 originTimestamp structure */ @@ -373,7 +374,7 @@ struct dev_hw_ops { int (*sq_aq_init)(void *dev, u16 qidx, u16 sqb_aura); void (*sqe_flush)(void *dev, struct otx2_snd_queue *sq, int size, int qidx); - void (*refill_pool_ptrs)(void *dev, struct otx2_cq_queue *cq); + int (*refill_pool_ptrs)(void *dev, struct otx2_cq_queue *cq); void (*aura_freeptr)(void *dev, int aura, u64 buf); };
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c index 9551b422622a4..9ded98bb1c890 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c @@ -1942,6 +1942,10 @@ int otx2_stop(struct net_device *netdev)
netif_tx_disable(netdev);
+ for (wrk = 0; wrk < pf->qset.cq_cnt; wrk++) + cancel_delayed_work_sync(&pf->refill_wrk[wrk].pool_refill_work); + devm_kfree(pf->dev, pf->refill_wrk); + otx2_free_hw_resources(pf); otx2_free_cints(pf, pf->hw.cint_cnt); otx2_disable_napi(pf); @@ -1949,9 +1953,6 @@ int otx2_stop(struct net_device *netdev) for (qidx = 0; qidx < netdev->num_tx_queues; qidx++) netdev_tx_reset_queue(netdev_get_tx_queue(netdev, qidx));
- for (wrk = 0; wrk < pf->qset.cq_cnt; wrk++) - cancel_delayed_work_sync(&pf->refill_wrk[wrk].pool_refill_work); - devm_kfree(pf->dev, pf->refill_wrk);
kfree(qset->sq); kfree(qset->cq); diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c index e369baf115301..e77d438489557 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c @@ -424,9 +424,10 @@ static int otx2_rx_napi_handler(struct otx2_nic *pfvf, return processed_cqe; }
-void otx2_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq) +int otx2_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq) { struct otx2_nic *pfvf = dev; + int cnt = cq->pool_ptrs; dma_addr_t bufptr;
while (cq->pool_ptrs) { @@ -435,6 +436,8 @@ void otx2_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq) otx2_aura_freeptr(pfvf, cq->cq_idx, bufptr + OTX2_HEAD_ROOM); cq->pool_ptrs--; } + + return cnt - cq->pool_ptrs; }
static int otx2_tx_napi_handler(struct otx2_nic *pfvf, @@ -521,6 +524,7 @@ int otx2_napi_handler(struct napi_struct *napi, int budget) struct otx2_cq_queue *cq; struct otx2_qset *qset; struct otx2_nic *pfvf; + int filled_cnt = -1;
cq_poll = container_of(napi, struct otx2_cq_poll, napi); pfvf = (struct otx2_nic *)cq_poll->dev; @@ -541,7 +545,7 @@ int otx2_napi_handler(struct napi_struct *napi, int budget) }
if (rx_cq && rx_cq->pool_ptrs) - pfvf->hw_ops->refill_pool_ptrs(pfvf, rx_cq); + filled_cnt = pfvf->hw_ops->refill_pool_ptrs(pfvf, rx_cq); /* Clear the IRQ */ otx2_write64(pfvf, NIX_LF_CINTX_INT(cq_poll->cint_idx), BIT_ULL(0));
@@ -561,9 +565,25 @@ int otx2_napi_handler(struct napi_struct *napi, int budget) otx2_config_irq_coalescing(pfvf, i); }
- /* Re-enable interrupts */ - otx2_write64(pfvf, NIX_LF_CINTX_ENA_W1S(cq_poll->cint_idx), - BIT_ULL(0)); + if (unlikely(!filled_cnt)) { + struct refill_work *work; + struct delayed_work *dwork; + + work = &pfvf->refill_wrk[cq->cq_idx]; + dwork = &work->pool_refill_work; + /* Schedule a task if no other task is running */ + if (!cq->refill_task_sched) { + work->napi = napi; + cq->refill_task_sched = true; + schedule_delayed_work(dwork, + msecs_to_jiffies(100)); + } + } else { + /* Re-enable interrupts */ + otx2_write64(pfvf, + NIX_LF_CINTX_ENA_W1S(cq_poll->cint_idx), + BIT_ULL(0)); + } } return workdone; } diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.h b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.h index 9e3bfbe5c4809..a82ffca8ce1b1 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.h +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.h @@ -170,6 +170,6 @@ void cn10k_sqe_flush(void *dev, struct otx2_snd_queue *sq, int size, int qidx); void otx2_sqe_flush(void *dev, struct otx2_snd_queue *sq, int size, int qidx); -void otx2_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq); -void cn10k_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq); +int otx2_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq); +int cn10k_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq); #endif /* OTX2_TXRX_H */
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guangguan Wang guangguan.wang@linux.alibaba.com
[ Upstream commit f5146e3ef0a9eea405874b36178c19a4863b8989 ]
While doing smcr_port_add, there maybe linkgroup add into or delete from smc_lgr_list.list at the same time, which may result kernel crash. So, use smc_lgr_list.lock to protect smc_lgr_list.list iterate in smcr_port_add.
The crash calltrace show below: BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 0 PID: 559726 Comm: kworker/0:92 Kdump: loaded Tainted: G Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 449e491 04/01/2014 Workqueue: events smc_ib_port_event_work [smc] RIP: 0010:smcr_port_add+0xa6/0xf0 [smc] RSP: 0000:ffffa5a2c8f67de0 EFLAGS: 00010297 RAX: 0000000000000001 RBX: ffff9935e0650000 RCX: 0000000000000000 RDX: 0000000000000010 RSI: ffff9935e0654290 RDI: ffff9935c8560000 RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9934c0401918 R10: 0000000000000000 R11: ffffffffb4a5c278 R12: ffff99364029aae4 R13: ffff99364029aa00 R14: 00000000ffffffed R15: ffff99364029ab08 FS: 0000000000000000(0000) GS:ffff994380600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000f06a10003 CR4: 0000000002770ef0 PKRU: 55555554 Call Trace: smc_ib_port_event_work+0x18f/0x380 [smc] process_one_work+0x19b/0x340 worker_thread+0x30/0x370 ? process_one_work+0x340/0x340 kthread+0x114/0x130 ? __kthread_cancel_work+0x50/0x50 ret_from_fork+0x1f/0x30
Fixes: 1f90a05d9ff9 ("net/smc: add smcr_port_add() and smcr_link_up() processing") Signed-off-by: Guangguan Wang guangguan.wang@linux.alibaba.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/smc/smc_core.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index 6b78075404d7d..3e89bb9b7c56c 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -1654,6 +1654,7 @@ void smcr_port_add(struct smc_ib_device *smcibdev, u8 ibport) { struct smc_link_group *lgr, *n;
+ spin_lock_bh(&smc_lgr_list.lock); list_for_each_entry_safe(lgr, n, &smc_lgr_list.list, list) { struct smc_link *link;
@@ -1669,6 +1670,7 @@ void smcr_port_add(struct smc_ib_device *smcibdev, u8 ibport) if (link) smc_llc_add_link_local(link); } + spin_unlock_bh(&smc_lgr_list.lock); }
/* link is down - switch connections to alternate link,
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vincent Whitchurch vincent.whitchurch@axis.com
[ Upstream commit fa60b8163816f194786f3ee334c9a458da7699c6 ]
Setting ethtool -C eth0 tx-usecs 0 is supposed to disable the use of the coalescing timer but currently it gets programmed with zero delay instead.
Disable the use of the coalescing timer if tx-usecs is zero by preventing it from being restarted. Note that to keep things simple we don't start/stop the timer when the coalescing settings are changed, but just let that happen on the next transmit or timer expiry.
Fixes: 8fce33317023 ("net: stmmac: Rework coalesce timer and fix multi-queue races") Signed-off-by: Vincent Whitchurch vincent.whitchurch@axis.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 4727f7be4f86e..6931973028aef 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -2703,9 +2703,7 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue)
/* We still have pending packets, let's call for a new scheduling */ if (tx_q->dirty_tx != tx_q->cur_tx) - hrtimer_start(&tx_q->txtimer, - STMMAC_COAL_TIMER(priv->tx_coal_timer[queue]), - HRTIMER_MODE_REL); + stmmac_tx_timer_arm(priv, queue);
__netif_tx_unlock_bh(netdev_get_tx_queue(priv->dev, queue));
@@ -2986,9 +2984,13 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv) static void stmmac_tx_timer_arm(struct stmmac_priv *priv, u32 queue) { struct stmmac_tx_queue *tx_q = &priv->dma_conf.tx_queue[queue]; + u32 tx_coal_timer = priv->tx_coal_timer[queue]; + + if (!tx_coal_timer) + return;
hrtimer_start(&tx_q->txtimer, - STMMAC_COAL_TIMER(priv->tx_coal_timer[queue]), + STMMAC_COAL_TIMER(tx_coal_timer), HRTIMER_MODE_REL); }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangyu Hua hbh25y@gmail.com
[ Upstream commit 51fe0a470543f345e3c62b6798929de3ddcedc1d ]
rules is allocated in ethtool_get_rxnfc and the size is determined by rule_cnt from user space. So rule_cnt needs to be check before using rules to avoid OOB writing or NULL pointer dereference.
Fixes: 90b509b39ac9 ("net: mvpp2: cls: Add Classification offload support") Signed-off-by: Hangyu Hua hbh25y@gmail.com Reviewed-by: Marcin Wojtas mw@semihalf.com Reviewed-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c index 1fec84b4c068d..0129afa1210e6 100644 --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c @@ -5586,6 +5586,11 @@ static int mvpp2_ethtool_get_rxnfc(struct net_device *dev, break; case ETHTOOL_GRXCLSRLALL: for (i = 0; i < MVPP2_N_RFS_ENTRIES_PER_FLOW; i++) { + if (loc == info->rule_cnt) { + ret = -EMSGSIZE; + break; + } + if (port->rfs_rules[i]) rules[loc++] = i; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangyu Hua hbh25y@gmail.com
[ Upstream commit e4c79810755f66c9a933ca810da2724133b1165a ]
rule_locs is allocated in ethtool_get_rxnfc and the size is determined by rule_cnt from user space. So rule_cnt needs to be check before using rule_locs to avoid NULL pointer dereference.
Fixes: 7aab747e5563 ("net: ethernet: mediatek: add ethtool functions to configure RX flows of HW LRO") Signed-off-by: Hangyu Hua hbh25y@gmail.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mediatek/mtk_eth_soc.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c index 2d15342c260ae..7f0807672071f 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c @@ -2860,6 +2860,9 @@ static int mtk_hwlro_get_fdir_all(struct net_device *dev, int i;
for (i = 0; i < MTK_MAX_LRO_IP_CNT; i++) { + if (cnt == cmd->rule_cnt) + return -EMSGSIZE; + if (mac->hwlro_ip[i]) { rule_locs[cnt] = i; cnt++;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ziyang Xuan william.xuanziyang@huawei.com
[ Upstream commit 484b4833c604c0adcf19eac1ca14b60b757355b5 ]
Syzbot reports the following uninit-value access problem.
===================================================== BUG: KMSAN: uninit-value in fill_frame_info net/hsr/hsr_forward.c:601 [inline] BUG: KMSAN: uninit-value in hsr_forward_skb+0x9bd/0x30f0 net/hsr/hsr_forward.c:616 fill_frame_info net/hsr/hsr_forward.c:601 [inline] hsr_forward_skb+0x9bd/0x30f0 net/hsr/hsr_forward.c:616 hsr_dev_xmit+0x192/0x330 net/hsr/hsr_device.c:223 __netdev_start_xmit include/linux/netdevice.h:4889 [inline] netdev_start_xmit include/linux/netdevice.h:4903 [inline] xmit_one net/core/dev.c:3544 [inline] dev_hard_start_xmit+0x247/0xa10 net/core/dev.c:3560 __dev_queue_xmit+0x34d0/0x52a0 net/core/dev.c:4340 dev_queue_xmit include/linux/netdevice.h:3082 [inline] packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3087 [inline] packet_sendmsg+0x8b1d/0x9f30 net/packet/af_packet.c:3119 sock_sendmsg_nosec net/socket.c:730 [inline] sock_sendmsg net/socket.c:753 [inline] __sys_sendto+0x781/0xa30 net/socket.c:2176 __do_sys_sendto net/socket.c:2188 [inline] __se_sys_sendto net/socket.c:2184 [inline] __ia32_sys_sendto+0x11f/0x1c0 net/socket.c:2184 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178 do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246 entry_SYSENTER_compat_after_hwframe+0x70/0x82
Uninit was created at: slab_post_alloc_hook+0x12f/0xb70 mm/slab.h:767 slab_alloc_node mm/slub.c:3478 [inline] kmem_cache_alloc_node+0x577/0xa80 mm/slub.c:3523 kmalloc_reserve+0x148/0x470 net/core/skbuff.c:559 __alloc_skb+0x318/0x740 net/core/skbuff.c:644 alloc_skb include/linux/skbuff.h:1286 [inline] alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6299 sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2794 packet_alloc_skb net/packet/af_packet.c:2936 [inline] packet_snd net/packet/af_packet.c:3030 [inline] packet_sendmsg+0x70e8/0x9f30 net/packet/af_packet.c:3119 sock_sendmsg_nosec net/socket.c:730 [inline] sock_sendmsg net/socket.c:753 [inline] __sys_sendto+0x781/0xa30 net/socket.c:2176 __do_sys_sendto net/socket.c:2188 [inline] __se_sys_sendto net/socket.c:2184 [inline] __ia32_sys_sendto+0x11f/0x1c0 net/socket.c:2184 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178 do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246 entry_SYSENTER_compat_after_hwframe+0x70/0x82
It is because VLAN not yet supported in hsr driver. Return error when protocol is ETH_P_8021Q in fill_frame_info() now to fix it.
Fixes: 451d8123f897 ("net: prp: add packet handling support") Reported-by: syzbot+bf7e6250c7ce248f3ec9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=bf7e6250c7ce248f3ec9 Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/hsr/hsr_forward.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/hsr/hsr_forward.c b/net/hsr/hsr_forward.c index 629daacc96071..b71dab630a873 100644 --- a/net/hsr/hsr_forward.c +++ b/net/hsr/hsr_forward.c @@ -594,6 +594,7 @@ static int fill_frame_info(struct hsr_frame_info *frame, proto = vlan_hdr->vlanhdr.h_vlan_encapsulated_proto; /* FIXME: */ netdev_warn_once(skb->dev, "VLAN not yet supported"); + return -EINVAL; }
frame->is_from_san = false;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yang Yingliang yangyingliang@huawei.com
[ Upstream commit 54024dbec95585243391caeb9f04a2620e630765 ]
Use eth_broadcast_addr() to assign broadcast address instead of memset().
Signed-off-by: Yang Yingliang yangyingliang@huawei.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: 32530dba1bd4 ("net:ethernet:adi:adin1110: Fix forwarding offload") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/adi/adin1110.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/adi/adin1110.c b/drivers/net/ethernet/adi/adin1110.c index f5c2d7a9abc10..1c009b485188d 100644 --- a/drivers/net/ethernet/adi/adin1110.c +++ b/drivers/net/ethernet/adi/adin1110.c @@ -739,7 +739,7 @@ static int adin1110_broadcasts_filter(struct adin1110_port_priv *port_priv, u32 port_rules = 0; u8 mask[ETH_ALEN];
- memset(mask, 0xFF, ETH_ALEN); + eth_broadcast_addr(mask);
if (accept_broadcast && port_priv->state == BR_STATE_FORWARDING) port_rules = adin1110_port_rules(port_priv, true, true); @@ -760,7 +760,7 @@ static int adin1110_set_mac_address(struct net_device *netdev, return -EADDRNOTAVAIL;
eth_hw_addr_set(netdev, dev_addr); - memset(mask, 0xFF, ETH_ALEN); + eth_broadcast_addr(mask);
mac_slot = (!port_priv->nr) ? ADIN_MAC_P1_ADDR_SLOT : ADIN_MAC_P2_ADDR_SLOT; port_rules = adin1110_port_rules(port_priv, true, false); @@ -1271,7 +1271,7 @@ static int adin1110_port_set_blocking_state(struct adin1110_port_priv *port_priv goto out;
/* Allow only BPDUs to be passed to the CPU */ - memset(mask, 0xFF, ETH_ALEN); + eth_broadcast_addr(mask); port_rules = adin1110_port_rules(port_priv, true, false); ret = adin1110_write_mac_address(port_priv, mac_slot, mac, mask, port_rules); @@ -1386,7 +1386,7 @@ static int adin1110_fdb_add(struct adin1110_port_priv *port_priv,
other_port = priv->ports[!port_priv->nr]; port_rules = adin1110_port_rules(port_priv, false, true); - memset(mask, 0xFF, ETH_ALEN); + eth_broadcast_addr(mask);
return adin1110_write_mac_address(other_port, mac_nr, (u8 *)fdb->addr, mask, port_rules);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ciprian Regus ciprian.regus@analog.com
[ Upstream commit 32530dba1bd48da4437d18d9a8dbc9d2826938a6 ]
Currently, when a new fdb entry is added (with both ports of the ADIN2111 bridged), the driver configures the MAC filters for the wrong port, which results in the forwarding being done by the host, and not actually hardware offloaded.
The ADIN2111 offloads the forwarding by setting filters on the destination MAC address of incoming frames. Based on these, they may be routed to the other port. Thus, if a frame has to be forwarded from port 1 to port 2, the required configuration for the ADDR_FILT_UPRn register should set the APPLY2PORT1 bit (instead of APPLY2PORT2, as it's currently the case).
Fixes: bc93e19d088b ("net: ethernet: adi: Add ADIN1110 support") Signed-off-by: Ciprian Regus ciprian.regus@analog.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/adi/adin1110.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/adi/adin1110.c b/drivers/net/ethernet/adi/adin1110.c index 1c009b485188d..ca66b747b7c5d 100644 --- a/drivers/net/ethernet/adi/adin1110.c +++ b/drivers/net/ethernet/adi/adin1110.c @@ -1385,7 +1385,7 @@ static int adin1110_fdb_add(struct adin1110_port_priv *port_priv, return -ENOMEM;
other_port = priv->ports[!port_priv->nr]; - port_rules = adin1110_port_rules(port_priv, false, true); + port_rules = adin1110_port_rules(other_port, false, true); eth_broadcast_addr(mask);
return adin1110_write_mac_address(other_port, mac_nr, (u8 *)fdb->addr,
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 02c652f5465011126152bbd93b6a582a1d0c32f1 ]
Commit 4d9423549501 ("net: dsa: sja1105: offload bridge port flags to device") has partially hidden some multicast entries from showing up in the "bridge fdb show" output, but it wasn't enough. Addresses which are added through "bridge mdb add" still show up. Hide them all.
Fixes: 291d1e72b756 ("net: dsa: sja1105: Add support for FDB and MDB management") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/sja1105/sja1105_main.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c index b6deba4a75121..ba65a95b0c372 100644 --- a/drivers/net/dsa/sja1105/sja1105_main.c +++ b/drivers/net/dsa/sja1105/sja1105_main.c @@ -1875,13 +1875,14 @@ static int sja1105_fdb_dump(struct dsa_switch *ds, int port, if (!(l2_lookup.destports & BIT(port))) continue;
- /* We need to hide the FDB entry for unknown multicast */ - if (l2_lookup.macaddr == SJA1105_UNKNOWN_MULTICAST && - l2_lookup.mask_macaddr == SJA1105_UNKNOWN_MULTICAST) - continue; - u64_to_ether_addr(l2_lookup.macaddr, macaddr);
+ /* Hardware FDB is shared for fdb and mdb, "bridge fdb show" + * only wants to see unicast + */ + if (is_multicast_ether_addr(macaddr)) + continue; + /* We need to hide the dsa_8021q VLANs from the user. */ if (vid_is_dsa_8021q(l2_lookup.vlanid)) l2_lookup.vlanid = 0;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit c956798062b5a308db96e75157747291197f0378 ]
Currently, sja1105_dynamic_config_wait_complete() returns either 0 or -ETIMEDOUT, because it just looks at the read_poll_timeout() return code.
There will be future changes which move some more checks to sja1105_dynamic_config_poll_valid(). It is important that we propagate their exact return code (-ENOENT, -EINVAL), because callers of sja1105_dynamic_config_read() depend on them.
Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: 7cef293b9a63 ("net: dsa: sja1105: fix multicast forwarding working only for last added mdb entry") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/sja1105/sja1105_dynamic_config.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c index 7729d3f8b7f50..93d47dab8d3e9 100644 --- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c +++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c @@ -1211,13 +1211,14 @@ sja1105_dynamic_config_wait_complete(struct sja1105_private *priv, struct sja1105_dyn_cmd *cmd, const struct sja1105_dynamic_table_ops *ops) { - int rc; - - return read_poll_timeout(sja1105_dynamic_config_poll_valid, - rc, rc != -EAGAIN, - SJA1105_DYNAMIC_CONFIG_SLEEP_US, - SJA1105_DYNAMIC_CONFIG_TIMEOUT_US, - false, priv, cmd, ops); + int err, rc; + + err = read_poll_timeout(sja1105_dynamic_config_poll_valid, + rc, rc != -EAGAIN, + SJA1105_DYNAMIC_CONFIG_SLEEP_US, + SJA1105_DYNAMIC_CONFIG_TIMEOUT_US, + false, priv, cmd, ops); + return err < 0 ? err : rc; }
/* Provides read access to the settings through the dynamic interface
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 7cef293b9a634a05fcce9e1df4aee3aeed023345 ]
The commit cited in Fixes: did 2 things: it refactored the read-back polling from sja1105_dynamic_config_read() into a new function, sja1105_dynamic_config_wait_complete(), and it called that from sja1105_dynamic_config_write() too.
What is problematic is the refactoring.
The refactored code from sja1105_dynamic_config_poll_valid() works like the previous one, but the problem is that it uses another packed_buf[] SPI buffer, and there was code at the end of sja1105_dynamic_config_read() which was relying on the read-back packed_buf[]:
/* Don't dereference possibly NULL pointer - maybe caller * only wanted to see whether the entry existed or not. */ if (entry) ops->entry_packing(packed_buf, entry, UNPACK);
After the change, the packed_buf[] that this code sees is no longer the entry read back from hardware, but the original entry that the caller passed to the sja1105_dynamic_config_read(), packed into this buffer.
This difference is the most notable with the SJA1105_SEARCH uses from sja1105pqrs_fdb_add() - used for both fdb and mdb. There, we have logic added by commit 728db843df88 ("net: dsa: sja1105: ignore the FDB entry for unknown multicast when adding a new address") to figure out whether the address we're trying to add matches on any existing hardware entry, with the exception of the catch-all multicast address.
That logic was broken, because with sja1105_dynamic_config_read() not working properly, it doesn't return us the entry read back from hardware, but the entry that we passed to it. And, since for multicast, a match will always exist, it will tell us that any mdb entry already exists at index=0 L2 Address Lookup table. It is index=0 because the caller doesn't know the index - it wants to find it out, and sja1105_dynamic_config_read() does:
if (index < 0) { // SJA1105_SEARCH /* Avoid copying a signed negative number to an u64 */ cmd.index = 0; // <- this cmd.search = true; } else { cmd.index = index; cmd.search = false; }
So, to the caller of sja1105_dynamic_config_read(), the returned info looks entirely legit, and it will add all mdb entries to FDB index 0. There, they will always overwrite each other (not to mention, potentially they can also overwrite a pre-existing bridge fdb entry), and the user-visible impact will be that only the last mdb entry will be forwarded as it should. The others won't (will be flooded or dropped, depending on the egress flood settings).
Fixing is a bit more complicated, and involves either passing the same packed_buf[] to sja1105_dynamic_config_wait_complete(), or moving all the extra processing on the packed_buf[] to sja1105_dynamic_config_wait_complete(). I've opted for the latter, because it makes sja1105_dynamic_config_wait_complete() a bit more self-contained.
Fixes: df405910ab9f ("net: dsa: sja1105: wait for dynamic config command completion on writes too") Reported-by: Yanan Yang yanan.yang@nxp.com Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/dsa/sja1105/sja1105_dynamic_config.c | 80 +++++++++---------- 1 file changed, 37 insertions(+), 43 deletions(-)
diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c index 93d47dab8d3e9..984c0e604e8de 100644 --- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c +++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c @@ -1175,18 +1175,15 @@ const struct sja1105_dynamic_table_ops sja1110_dyn_ops[BLK_IDX_MAX_DYN] = {
static int sja1105_dynamic_config_poll_valid(struct sja1105_private *priv, - struct sja1105_dyn_cmd *cmd, - const struct sja1105_dynamic_table_ops *ops) + const struct sja1105_dynamic_table_ops *ops, + void *entry, bool check_valident, + bool check_errors) { u8 packed_buf[SJA1105_MAX_DYN_CMD_SIZE] = {}; + struct sja1105_dyn_cmd cmd = {}; int rc;
- /* We don't _need_ to read the full entry, just the command area which - * is a fixed SJA1105_SIZE_DYN_CMD. But our cmd_packing() API expects a - * buffer that contains the full entry too. Additionally, our API - * doesn't really know how many bytes into the buffer does the command - * area really begin. So just read back the whole entry. - */ + /* Read back the whole entry + command structure. */ rc = sja1105_xfer_buf(priv, SPI_READ, ops->addr, packed_buf, ops->packed_size); if (rc) @@ -1195,11 +1192,25 @@ sja1105_dynamic_config_poll_valid(struct sja1105_private *priv, /* Unpack the command structure, and return it to the caller in case it * needs to perform further checks on it (VALIDENT). */ - memset(cmd, 0, sizeof(*cmd)); - ops->cmd_packing(packed_buf, cmd, UNPACK); + ops->cmd_packing(packed_buf, &cmd, UNPACK);
/* Hardware hasn't cleared VALID => still working on it */ - return cmd->valid ? -EAGAIN : 0; + if (cmd.valid) + return -EAGAIN; + + if (check_valident && !cmd.valident && !(ops->access & OP_VALID_ANYWAY)) + return -ENOENT; + + if (check_errors && cmd.errors) + return -EINVAL; + + /* Don't dereference possibly NULL pointer - maybe caller + * only wanted to see whether the entry existed or not. + */ + if (entry) + ops->entry_packing(packed_buf, entry, UNPACK); + + return 0; }
/* Poll the dynamic config entry's control area until the hardware has @@ -1208,8 +1219,9 @@ sja1105_dynamic_config_poll_valid(struct sja1105_private *priv, */ static int sja1105_dynamic_config_wait_complete(struct sja1105_private *priv, - struct sja1105_dyn_cmd *cmd, - const struct sja1105_dynamic_table_ops *ops) + const struct sja1105_dynamic_table_ops *ops, + void *entry, bool check_valident, + bool check_errors) { int err, rc;
@@ -1217,7 +1229,8 @@ sja1105_dynamic_config_wait_complete(struct sja1105_private *priv, rc, rc != -EAGAIN, SJA1105_DYNAMIC_CONFIG_SLEEP_US, SJA1105_DYNAMIC_CONFIG_TIMEOUT_US, - false, priv, cmd, ops); + false, priv, ops, entry, check_valident, + check_errors); return err < 0 ? err : rc; }
@@ -1287,25 +1300,14 @@ int sja1105_dynamic_config_read(struct sja1105_private *priv, mutex_lock(&priv->dynamic_config_lock); rc = sja1105_xfer_buf(priv, SPI_WRITE, ops->addr, packed_buf, ops->packed_size); - if (rc < 0) { - mutex_unlock(&priv->dynamic_config_lock); - return rc; - } - - rc = sja1105_dynamic_config_wait_complete(priv, &cmd, ops); - mutex_unlock(&priv->dynamic_config_lock); if (rc < 0) - return rc; + goto out;
- if (!cmd.valident && !(ops->access & OP_VALID_ANYWAY)) - return -ENOENT; + rc = sja1105_dynamic_config_wait_complete(priv, ops, entry, true, false); +out: + mutex_unlock(&priv->dynamic_config_lock);
- /* Don't dereference possibly NULL pointer - maybe caller - * only wanted to see whether the entry existed or not. - */ - if (entry) - ops->entry_packing(packed_buf, entry, UNPACK); - return 0; + return rc; }
int sja1105_dynamic_config_write(struct sja1105_private *priv, @@ -1357,22 +1359,14 @@ int sja1105_dynamic_config_write(struct sja1105_private *priv, mutex_lock(&priv->dynamic_config_lock); rc = sja1105_xfer_buf(priv, SPI_WRITE, ops->addr, packed_buf, ops->packed_size); - if (rc < 0) { - mutex_unlock(&priv->dynamic_config_lock); - return rc; - } - - rc = sja1105_dynamic_config_wait_complete(priv, &cmd, ops); - mutex_unlock(&priv->dynamic_config_lock); if (rc < 0) - return rc; + goto out;
- cmd = (struct sja1105_dyn_cmd) {0}; - ops->cmd_packing(packed_buf, &cmd, UNPACK); - if (cmd.errors) - return -EINVAL; + rc = sja1105_dynamic_config_wait_complete(priv, ops, NULL, false, true); +out: + mutex_unlock(&priv->dynamic_config_lock);
- return 0; + return rc; }
static u8 sja1105_crc8_add(u8 crc, u8 byte, u8 poly)
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit ea32690daf4fa525dc5a4d164bd00ed8c756e1c6 ]
sja1105_fdb_add() runs from the dsa_owq, and sja1105_port_mcast_flood() runs from switchdev_deferred_process_work(). Prior to the blamed commit, they used to be indirectly serialized through the rtnl_lock(), which no longer holds true because dsa_owq dropped that.
So, it is now possible that we traverse the static config BLK_IDX_L2_LOOKUP elements concurrently compared to when we change them, in sja1105_static_fdb_change(). That is not ideal, since it might result in data corruption.
Introduce a mutex which serializes accesses to the hardware FDB and to the static config elements for the L2 Address Lookup table.
I can't find a good reason to add locking around sja1105_fdb_dump(). I'll add it later if needed.
Fixes: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/sja1105/sja1105.h | 2 + drivers/net/dsa/sja1105/sja1105_main.c | 56 ++++++++++++++++++++------ 2 files changed, 45 insertions(+), 13 deletions(-)
diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h index 0617d5ccd3ff1..8c66d3bf61f02 100644 --- a/drivers/net/dsa/sja1105/sja1105.h +++ b/drivers/net/dsa/sja1105/sja1105.h @@ -266,6 +266,8 @@ struct sja1105_private { * the switch doesn't confuse them with one another. */ struct mutex mgmt_lock; + /* Serializes accesses to the FDB */ + struct mutex fdb_lock; /* PTP two-step TX timestamp ID, and its serialization lock */ spinlock_t ts_id_lock; u8 ts_id; diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c index ba65a95b0c372..79927191ac623 100644 --- a/drivers/net/dsa/sja1105/sja1105_main.c +++ b/drivers/net/dsa/sja1105/sja1105_main.c @@ -1805,6 +1805,7 @@ static int sja1105_fdb_add(struct dsa_switch *ds, int port, struct dsa_db db) { struct sja1105_private *priv = ds->priv; + int rc;
if (!vid) { switch (db.type) { @@ -1819,12 +1820,16 @@ static int sja1105_fdb_add(struct dsa_switch *ds, int port, } }
- return priv->info->fdb_add_cmd(ds, port, addr, vid); + mutex_lock(&priv->fdb_lock); + rc = priv->info->fdb_add_cmd(ds, port, addr, vid); + mutex_unlock(&priv->fdb_lock); + + return rc; }
-static int sja1105_fdb_del(struct dsa_switch *ds, int port, - const unsigned char *addr, u16 vid, - struct dsa_db db) +static int __sja1105_fdb_del(struct dsa_switch *ds, int port, + const unsigned char *addr, u16 vid, + struct dsa_db db) { struct sja1105_private *priv = ds->priv;
@@ -1844,6 +1849,20 @@ static int sja1105_fdb_del(struct dsa_switch *ds, int port, return priv->info->fdb_del_cmd(ds, port, addr, vid); }
+static int sja1105_fdb_del(struct dsa_switch *ds, int port, + const unsigned char *addr, u16 vid, + struct dsa_db db) +{ + struct sja1105_private *priv = ds->priv; + int rc; + + mutex_lock(&priv->fdb_lock); + rc = __sja1105_fdb_del(ds, port, addr, vid, db); + mutex_unlock(&priv->fdb_lock); + + return rc; +} + static int sja1105_fdb_dump(struct dsa_switch *ds, int port, dsa_fdb_dump_cb_t *cb, void *data) { @@ -1906,6 +1925,8 @@ static void sja1105_fast_age(struct dsa_switch *ds, int port) }; int i;
+ mutex_lock(&priv->fdb_lock); + for (i = 0; i < SJA1105_MAX_L2_LOOKUP_COUNT; i++) { struct sja1105_l2_lookup_entry l2_lookup = {0}; u8 macaddr[ETH_ALEN]; @@ -1919,7 +1940,7 @@ static void sja1105_fast_age(struct dsa_switch *ds, int port) if (rc) { dev_err(ds->dev, "Failed to read FDB: %pe\n", ERR_PTR(rc)); - return; + break; }
if (!(l2_lookup.destports & BIT(port))) @@ -1931,14 +1952,16 @@ static void sja1105_fast_age(struct dsa_switch *ds, int port)
u64_to_ether_addr(l2_lookup.macaddr, macaddr);
- rc = sja1105_fdb_del(ds, port, macaddr, l2_lookup.vlanid, db); + rc = __sja1105_fdb_del(ds, port, macaddr, l2_lookup.vlanid, db); if (rc) { dev_err(ds->dev, "Failed to delete FDB entry %pM vid %lld: %pe\n", macaddr, l2_lookup.vlanid, ERR_PTR(rc)); - return; + break; } } + + mutex_unlock(&priv->fdb_lock); }
static int sja1105_mdb_add(struct dsa_switch *ds, int port, @@ -2962,7 +2985,9 @@ static int sja1105_port_mcast_flood(struct sja1105_private *priv, int to, { struct sja1105_l2_lookup_entry *l2_lookup; struct sja1105_table *table; - int match; + int match, rc; + + mutex_lock(&priv->fdb_lock);
table = &priv->static_config.tables[BLK_IDX_L2_LOOKUP]; l2_lookup = table->entries; @@ -2975,7 +3000,8 @@ static int sja1105_port_mcast_flood(struct sja1105_private *priv, int to, if (match == table->entry_count) { NL_SET_ERR_MSG_MOD(extack, "Could not find FDB entry for unknown multicast"); - return -ENOSPC; + rc = -ENOSPC; + goto out; }
if (flags.val & BR_MCAST_FLOOD) @@ -2983,10 +3009,13 @@ static int sja1105_port_mcast_flood(struct sja1105_private *priv, int to, else l2_lookup[match].destports &= ~BIT(to);
- return sja1105_dynamic_config_write(priv, BLK_IDX_L2_LOOKUP, - l2_lookup[match].index, - &l2_lookup[match], - true); + rc = sja1105_dynamic_config_write(priv, BLK_IDX_L2_LOOKUP, + l2_lookup[match].index, + &l2_lookup[match], true); +out: + mutex_unlock(&priv->fdb_lock); + + return rc; }
static int sja1105_port_pre_bridge_flags(struct dsa_switch *ds, int port, @@ -3356,6 +3385,7 @@ static int sja1105_probe(struct spi_device *spi) mutex_init(&priv->ptp_data.lock); mutex_init(&priv->dynamic_config_lock); mutex_init(&priv->mgmt_lock); + mutex_init(&priv->fdb_lock); spin_lock_init(&priv->ts_id_lock);
rc = sja1105_parse_dt(priv);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 86899e9e1e29e854b5f6dcc24ba4f75f792c89aa ]
Currently, when we add the first sja1105 port to a bridge with vlan_filtering 1, then we sometimes see this output:
sja1105 spi2.2: port 4 failed to read back entry for be:79:b4:9e:9e:96 vid 3088: -ENOENT sja1105 spi2.2: Reset switch and programmed static config. Reason: VLAN filtering sja1105 spi2.2: port 0 failed to add be:79:b4:9e:9e:96 vid 0 to fdb: -2
It is because sja1105_fdb_add() runs from the dsa_owq which is no longer serialized with switch resets since it dropped the rtnl_lock() in the blamed commit.
Either performing the FDB accesses before the reset, or after the reset, is equally fine, because sja1105_static_fdb_change() backs up those changes in the static config, but FDB access during reset isn't ok.
Make sja1105_static_config_reload() take the fdb_lock to fix that.
Fixes: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/sja1105/sja1105_main.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c index 79927191ac623..013976b0af9f1 100644 --- a/drivers/net/dsa/sja1105/sja1105_main.c +++ b/drivers/net/dsa/sja1105/sja1105_main.c @@ -2304,6 +2304,7 @@ int sja1105_static_config_reload(struct sja1105_private *priv, int rc, i; s64 now;
+ mutex_lock(&priv->fdb_lock); mutex_lock(&priv->mgmt_lock);
mac = priv->static_config.tables[BLK_IDX_MAC_CONFIG].entries; @@ -2416,6 +2417,7 @@ int sja1105_static_config_reload(struct sja1105_private *priv, goto out; out: mutex_unlock(&priv->mgmt_lock); + mutex_unlock(&priv->fdb_lock);
return rc; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hayes Wang hayeswang@realtek.com
[ Upstream commit a7b8d60b37237680009dd0b025fe8c067aba0ee3 ]
According to the document of napi, there is no rx process when the budget is 0. Therefore, r8152_poll() has to return 0 directly when the budget is equal to 0.
Fixes: d2187f8e4454 ("r8152: divide the tx and rx bottom functions") Signed-off-by: Hayes Wang hayeswang@realtek.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/r8152.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c index 0738baa5b82e4..e88bedca8f32f 100644 --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -2629,6 +2629,9 @@ static int r8152_poll(struct napi_struct *napi, int budget) struct r8152 *tp = container_of(napi, struct r8152, napi); int work_done;
+ if (!budget) + return 0; + work_done = rx_bottom(tp, budget);
if (work_done < budget) {
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shigeru Yoshida syoshida@redhat.com
[ Upstream commit c821a88bd720b0046433173185fd841a100d44ad ]
syzbot reported a memory leak like below:
BUG: memory leak unreferenced object 0xffff88810b088c00 (size 240): comm "syz-executor186", pid 5012, jiffies 4294943306 (age 13.680s) hex dump (first 32 bytes): 00 89 08 0b 81 88 ff ff 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff83e5d5ff>] __alloc_skb+0x1ef/0x230 net/core/skbuff.c:634 [<ffffffff84606e59>] alloc_skb include/linux/skbuff.h:1289 [inline] [<ffffffff84606e59>] kcm_sendmsg+0x269/0x1050 net/kcm/kcmsock.c:815 [<ffffffff83e479c6>] sock_sendmsg_nosec net/socket.c:725 [inline] [<ffffffff83e479c6>] sock_sendmsg+0x56/0xb0 net/socket.c:748 [<ffffffff83e47f55>] ____sys_sendmsg+0x365/0x470 net/socket.c:2494 [<ffffffff83e4c389>] ___sys_sendmsg+0xc9/0x130 net/socket.c:2548 [<ffffffff83e4c536>] __sys_sendmsg+0xa6/0x120 net/socket.c:2577 [<ffffffff84ad7bb8>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff84ad7bb8>] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 [<ffffffff84c0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
In kcm_sendmsg(), kcm_tx_msg(head)->last_skb is used as a cursor to append newly allocated skbs to 'head'. If some bytes are copied, an error occurred, and jumped to out_error label, 'last_skb' is left unmodified. A later kcm_sendmsg() will use an obsoleted 'last_skb' reference, corrupting the 'head' frag_list and causing the leak.
This patch fixes this issue by properly updating the last allocated skb in 'last_skb'.
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reported-and-tested-by: syzbot+6f98de741f7dbbfc4ccb@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6f98de741f7dbbfc4ccb Signed-off-by: Shigeru Yoshida syoshida@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/kcm/kcmsock.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c index 4580f61426bb8..740539a218b7c 100644 --- a/net/kcm/kcmsock.c +++ b/net/kcm/kcmsock.c @@ -939,6 +939,8 @@ static int kcm_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
if (head != kcm->seq_skb) kfree_skb(head); + else if (copied) + kcm_tx_msg(head)->last_skb = skb;
err = sk_stream_error(sk, msg->msg_flags, err);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Liming Sun limings@nvidia.com
[ Upstream commit 78034cbece79c2d730ad0770b3b7f23eedbbecf5 ]
This commit fixes tmfifo console stuck issue when the virtual networking interface is in down state. In such case, the network Rx descriptors runs out and causes the Rx network packet staying in the head of the tmfifo thus blocking the console packets. The fix is to drop the Rx network packet when no more Rx descriptors. Function name mlxbf_tmfifo_release_pending_pkt() is also renamed to mlxbf_tmfifo_release_pkt() to be more approperiate.
Fixes: 1357dfd7261f ("platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc") Signed-off-by: Liming Sun limings@nvidia.com Reviewed-by: Vadim Pasternak vadimp@nvidia.com Reviewed-by: David Thompson davthompson@nvidia.com Link: https://lore.kernel.org/r/8c0177dc938ae03f52ff7e0b62dbeee74b7bec09.169332254... Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/mellanox/mlxbf-tmfifo.c | 66 ++++++++++++++++++------ 1 file changed, 49 insertions(+), 17 deletions(-)
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c b/drivers/platform/mellanox/mlxbf-tmfifo.c index b600b77d91ef2..5c1f859b682a8 100644 --- a/drivers/platform/mellanox/mlxbf-tmfifo.c +++ b/drivers/platform/mellanox/mlxbf-tmfifo.c @@ -59,6 +59,7 @@ struct mlxbf_tmfifo; * @vq: pointer to the virtio virtqueue * @desc: current descriptor of the pending packet * @desc_head: head descriptor of the pending packet + * @drop_desc: dummy desc for packet dropping * @cur_len: processed length of the current descriptor * @rem_len: remaining length of the pending packet * @pkt_len: total length of the pending packet @@ -75,6 +76,7 @@ struct mlxbf_tmfifo_vring { struct virtqueue *vq; struct vring_desc *desc; struct vring_desc *desc_head; + struct vring_desc drop_desc; int cur_len; int rem_len; u32 pkt_len; @@ -86,6 +88,14 @@ struct mlxbf_tmfifo_vring { struct mlxbf_tmfifo *fifo; };
+/* Check whether vring is in drop mode. */ +#define IS_VRING_DROP(_r) ({ \ + typeof(_r) (r) = (_r); \ + (r->desc_head == &r->drop_desc ? true : false); }) + +/* A stub length to drop maximum length packet. */ +#define VRING_DROP_DESC_MAX_LEN GENMASK(15, 0) + /* Interrupt types. */ enum { MLXBF_TM_RX_LWM_IRQ, @@ -262,6 +272,7 @@ static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo, vring->align = SMP_CACHE_BYTES; vring->index = i; vring->vdev_id = tm_vdev->vdev.id.device; + vring->drop_desc.len = VRING_DROP_DESC_MAX_LEN; dev = &tm_vdev->vdev.dev;
size = vring_size(vring->num, vring->align); @@ -367,7 +378,7 @@ static u32 mlxbf_tmfifo_get_pkt_len(struct mlxbf_tmfifo_vring *vring, return len; }
-static void mlxbf_tmfifo_release_pending_pkt(struct mlxbf_tmfifo_vring *vring) +static void mlxbf_tmfifo_release_pkt(struct mlxbf_tmfifo_vring *vring) { struct vring_desc *desc_head; u32 len = 0; @@ -596,19 +607,25 @@ static void mlxbf_tmfifo_rxtx_word(struct mlxbf_tmfifo_vring *vring,
if (vring->cur_len + sizeof(u64) <= len) { /* The whole word. */ - if (is_rx) - memcpy(addr + vring->cur_len, &data, sizeof(u64)); - else - memcpy(&data, addr + vring->cur_len, sizeof(u64)); + if (!IS_VRING_DROP(vring)) { + if (is_rx) + memcpy(addr + vring->cur_len, &data, + sizeof(u64)); + else + memcpy(&data, addr + vring->cur_len, + sizeof(u64)); + } vring->cur_len += sizeof(u64); } else { /* Leftover bytes. */ - if (is_rx) - memcpy(addr + vring->cur_len, &data, - len - vring->cur_len); - else - memcpy(&data, addr + vring->cur_len, - len - vring->cur_len); + if (!IS_VRING_DROP(vring)) { + if (is_rx) + memcpy(addr + vring->cur_len, &data, + len - vring->cur_len); + else + memcpy(&data, addr + vring->cur_len, + len - vring->cur_len); + } vring->cur_len = len; }
@@ -709,8 +726,16 @@ static bool mlxbf_tmfifo_rxtx_one_desc(struct mlxbf_tmfifo_vring *vring, /* Get the descriptor of the next packet. */ if (!vring->desc) { desc = mlxbf_tmfifo_get_next_pkt(vring, is_rx); - if (!desc) - return false; + if (!desc) { + /* Drop next Rx packet to avoid stuck. */ + if (is_rx) { + desc = &vring->drop_desc; + vring->desc_head = desc; + vring->desc = desc; + } else { + return false; + } + } } else { desc = vring->desc; } @@ -743,17 +768,24 @@ static bool mlxbf_tmfifo_rxtx_one_desc(struct mlxbf_tmfifo_vring *vring, vring->rem_len -= len;
/* Get the next desc on the chain. */ - if (vring->rem_len > 0 && + if (!IS_VRING_DROP(vring) && vring->rem_len > 0 && (virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT)) { idx = virtio16_to_cpu(vdev, desc->next); desc = &vr->desc[idx]; goto mlxbf_tmfifo_desc_done; }
- /* Done and release the pending packet. */ - mlxbf_tmfifo_release_pending_pkt(vring); + /* Done and release the packet. */ desc = NULL; fifo->vring[is_rx] = NULL; + if (!IS_VRING_DROP(vring)) { + mlxbf_tmfifo_release_pkt(vring); + } else { + vring->pkt_len = 0; + vring->desc_head = NULL; + vring->desc = NULL; + return false; + }
/* * Make sure the load/store are in order before @@ -933,7 +965,7 @@ static void mlxbf_tmfifo_virtio_del_vqs(struct virtio_device *vdev)
/* Release the pending packet. */ if (vring->desc) - mlxbf_tmfifo_release_pending_pkt(vring); + mlxbf_tmfifo_release_pkt(vring); vq = vring->vq; if (vq) { vring->vq = NULL;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Liming Sun limings@nvidia.com
[ Upstream commit fc4c655821546239abb3cf4274d66b9747aa87dd ]
This commit drops over-sized network packets to avoid tmfifo queue stuck.
Fixes: 1357dfd7261f ("platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc") Signed-off-by: Liming Sun limings@nvidia.com Reviewed-by: Vadim Pasternak vadimp@nvidia.com Reviewed-by: David Thompson davthompson@nvidia.com Link: https://lore.kernel.org/r/9318936c2447f76db475c985ca6d91f057efcd41.169332254... Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/mellanox/mlxbf-tmfifo.c | 24 +++++++++++++++++------- 1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c b/drivers/platform/mellanox/mlxbf-tmfifo.c index 5c1f859b682a8..f3696a54a2bd7 100644 --- a/drivers/platform/mellanox/mlxbf-tmfifo.c +++ b/drivers/platform/mellanox/mlxbf-tmfifo.c @@ -224,7 +224,7 @@ static u8 mlxbf_tmfifo_net_default_mac[ETH_ALEN] = { static efi_char16_t mlxbf_tmfifo_efi_name[] = L"RshimMacAddr";
/* Maximum L2 header length. */ -#define MLXBF_TMFIFO_NET_L2_OVERHEAD 36 +#define MLXBF_TMFIFO_NET_L2_OVERHEAD (ETH_HLEN + VLAN_HLEN)
/* Supported virtio-net features. */ #define MLXBF_TMFIFO_NET_FEATURES \ @@ -642,13 +642,14 @@ static void mlxbf_tmfifo_rxtx_word(struct mlxbf_tmfifo_vring *vring, * flag is set. */ static void mlxbf_tmfifo_rxtx_header(struct mlxbf_tmfifo_vring *vring, - struct vring_desc *desc, + struct vring_desc **desc, bool is_rx, bool *vring_change) { struct mlxbf_tmfifo *fifo = vring->fifo; struct virtio_net_config *config; struct mlxbf_tmfifo_msg_hdr hdr; int vdev_id, hdr_len; + bool drop_rx = false;
/* Read/Write packet header. */ if (is_rx) { @@ -668,8 +669,8 @@ static void mlxbf_tmfifo_rxtx_header(struct mlxbf_tmfifo_vring *vring, if (ntohs(hdr.len) > __virtio16_to_cpu(virtio_legacy_is_little_endian(), config->mtu) + - MLXBF_TMFIFO_NET_L2_OVERHEAD) - return; + MLXBF_TMFIFO_NET_L2_OVERHEAD) + drop_rx = true; } else { vdev_id = VIRTIO_ID_CONSOLE; hdr_len = 0; @@ -684,16 +685,25 @@ static void mlxbf_tmfifo_rxtx_header(struct mlxbf_tmfifo_vring *vring,
if (!tm_dev2) return; - vring->desc = desc; + vring->desc = *desc; vring = &tm_dev2->vrings[MLXBF_TMFIFO_VRING_RX]; *vring_change = true; } + + if (drop_rx && !IS_VRING_DROP(vring)) { + if (vring->desc_head) + mlxbf_tmfifo_release_pkt(vring); + *desc = &vring->drop_desc; + vring->desc_head = *desc; + vring->desc = *desc; + } + vring->pkt_len = ntohs(hdr.len) + hdr_len; } else { /* Network virtio has an extra header. */ hdr_len = (vring->vdev_id == VIRTIO_ID_NET) ? sizeof(struct virtio_net_hdr) : 0; - vring->pkt_len = mlxbf_tmfifo_get_pkt_len(vring, desc); + vring->pkt_len = mlxbf_tmfifo_get_pkt_len(vring, *desc); hdr.type = (vring->vdev_id == VIRTIO_ID_NET) ? VIRTIO_ID_NET : VIRTIO_ID_CONSOLE; hdr.len = htons(vring->pkt_len - hdr_len); @@ -742,7 +752,7 @@ static bool mlxbf_tmfifo_rxtx_one_desc(struct mlxbf_tmfifo_vring *vring,
/* Beginning of a packet. Start to Rx/Tx packet header. */ if (vring->pkt_len == 0) { - mlxbf_tmfifo_rxtx_header(vring, desc, is_rx, &vring_change); + mlxbf_tmfifo_rxtx_header(vring, &desc, is_rx, &vring_change); (*avail)--;
/* Return if new packet is for another ring. */
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shravan Kumar Ramani shravankr@nvidia.com
[ Upstream commit 80ccd40568bcd3655b0fd0be1e9b3379fd6e1056 ]
Replace sprintf with sysfs_emit where possible. Size check in mlxbf_pmc_event_list_show should account for "\0".
Fixes: 1a218d312e65 ("platform/mellanox: mlxbf-pmc: Add Mellanox BlueField PMC driver") Signed-off-by: Shravan Kumar Ramani shravankr@nvidia.com Reviewed-by: Vadim Pasternak vadimp@nvidia.com Reviewed-by: David Thompson davthompson@nvidia.com Link: https://lore.kernel.org/r/bef39ef32319a31b32f999065911f61b0d3b17c3.169391773... Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/mellanox/mlxbf-pmc.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/platform/mellanox/mlxbf-pmc.c b/drivers/platform/mellanox/mlxbf-pmc.c index be967d797c28e..95afcae7b9fa9 100644 --- a/drivers/platform/mellanox/mlxbf-pmc.c +++ b/drivers/platform/mellanox/mlxbf-pmc.c @@ -1008,7 +1008,7 @@ static ssize_t mlxbf_pmc_counter_show(struct device *dev, } else return -EINVAL;
- return sprintf(buf, "0x%llx\n", value); + return sysfs_emit(buf, "0x%llx\n", value); }
/* Store function for "counter" sysfs files */ @@ -1078,13 +1078,13 @@ static ssize_t mlxbf_pmc_event_show(struct device *dev,
err = mlxbf_pmc_read_event(blk_num, cnt_num, is_l3, &evt_num); if (err) - return sprintf(buf, "No event being monitored\n"); + return sysfs_emit(buf, "No event being monitored\n");
evt_name = mlxbf_pmc_get_event_name(pmc->block_name[blk_num], evt_num); if (!evt_name) return -EINVAL;
- return sprintf(buf, "0x%llx: %s\n", evt_num, evt_name); + return sysfs_emit(buf, "0x%llx: %s\n", evt_num, evt_name); }
/* Store function for "event" sysfs files */ @@ -1139,9 +1139,9 @@ static ssize_t mlxbf_pmc_event_list_show(struct device *dev, return -EINVAL;
for (i = 0, buf[0] = '\0'; i < size; ++i) { - len += sprintf(e_info, "0x%x: %s\n", events[i].evt_num, - events[i].evt_name); - if (len > PAGE_SIZE) + len += snprintf(e_info, sizeof(e_info), "0x%x: %s\n", + events[i].evt_num, events[i].evt_name); + if (len >= PAGE_SIZE) break; strcat(buf, e_info); ret = len; @@ -1168,7 +1168,7 @@ static ssize_t mlxbf_pmc_enable_show(struct device *dev,
value = FIELD_GET(MLXBF_PMC_L3C_PERF_CNT_CFG_EN, perfcnt_cfg);
- return sprintf(buf, "%d\n", value); + return sysfs_emit(buf, "%d\n", value); }
/* Store function for "enable" sysfs files - only for l3cache */
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shravan Kumar Ramani shravankr@nvidia.com
[ Upstream commit 0f5969452e162efc50bdc98968fb62b424a9874b ]
This fix involves 2 changes: - All event regs have a reset value of 0, which is not a valid event_number as per the event_list for most blocks and hence seen as an error. Add a "disable" event with event_number 0 for all blocks.
- The enable bit for each counter need not be checked before reading the event info, and hence removed.
Fixes: 1a218d312e65 ("platform/mellanox: mlxbf-pmc: Add Mellanox BlueField PMC driver") Signed-off-by: Shravan Kumar Ramani shravankr@nvidia.com Reviewed-by: Vadim Pasternak vadimp@nvidia.com Reviewed-by: David Thompson davthompson@nvidia.com Link: https://lore.kernel.org/r/04d0213932d32681de1c716b54320ed894e52425.169391773... Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/mellanox/mlxbf-pmc.c | 27 +++++++-------------------- 1 file changed, 7 insertions(+), 20 deletions(-)
diff --git a/drivers/platform/mellanox/mlxbf-pmc.c b/drivers/platform/mellanox/mlxbf-pmc.c index 95afcae7b9fa9..2d4bbe99959ef 100644 --- a/drivers/platform/mellanox/mlxbf-pmc.c +++ b/drivers/platform/mellanox/mlxbf-pmc.c @@ -191,6 +191,7 @@ static const struct mlxbf_pmc_events mlxbf_pmc_smgen_events[] = { };
static const struct mlxbf_pmc_events mlxbf_pmc_trio_events_1[] = { + { 0x0, "DISABLE" }, { 0xa0, "TPIO_DATA_BEAT" }, { 0xa1, "TDMA_DATA_BEAT" }, { 0xa2, "MAP_DATA_BEAT" }, @@ -214,6 +215,7 @@ static const struct mlxbf_pmc_events mlxbf_pmc_trio_events_1[] = { };
static const struct mlxbf_pmc_events mlxbf_pmc_trio_events_2[] = { + { 0x0, "DISABLE" }, { 0xa0, "TPIO_DATA_BEAT" }, { 0xa1, "TDMA_DATA_BEAT" }, { 0xa2, "MAP_DATA_BEAT" }, @@ -246,6 +248,7 @@ static const struct mlxbf_pmc_events mlxbf_pmc_trio_events_2[] = { };
static const struct mlxbf_pmc_events mlxbf_pmc_ecc_events[] = { + { 0x0, "DISABLE" }, { 0x100, "ECC_SINGLE_ERROR_CNT" }, { 0x104, "ECC_DOUBLE_ERROR_CNT" }, { 0x114, "SERR_INJ" }, @@ -258,6 +261,7 @@ static const struct mlxbf_pmc_events mlxbf_pmc_ecc_events[] = { };
static const struct mlxbf_pmc_events mlxbf_pmc_mss_events[] = { + { 0x0, "DISABLE" }, { 0xc0, "RXREQ_MSS" }, { 0xc1, "RXDAT_MSS" }, { 0xc2, "TXRSP_MSS" }, @@ -265,6 +269,7 @@ static const struct mlxbf_pmc_events mlxbf_pmc_mss_events[] = { };
static const struct mlxbf_pmc_events mlxbf_pmc_hnf_events[] = { + { 0x0, "DISABLE" }, { 0x45, "HNF_REQUESTS" }, { 0x46, "HNF_REJECTS" }, { 0x47, "ALL_BUSY" }, @@ -323,6 +328,7 @@ static const struct mlxbf_pmc_events mlxbf_pmc_hnf_events[] = { };
static const struct mlxbf_pmc_events mlxbf_pmc_hnfnet_events[] = { + { 0x0, "DISABLE" }, { 0x12, "CDN_REQ" }, { 0x13, "DDN_REQ" }, { 0x14, "NDN_REQ" }, @@ -892,7 +898,7 @@ static int mlxbf_pmc_read_event(int blk_num, uint32_t cnt_num, bool is_l3, uint64_t *result) { uint32_t perfcfg_offset, perfval_offset; - uint64_t perfmon_cfg, perfevt, perfctl; + uint64_t perfmon_cfg, perfevt;
if (cnt_num >= pmc->block[blk_num].counters) return -EINVAL; @@ -904,25 +910,6 @@ static int mlxbf_pmc_read_event(int blk_num, uint32_t cnt_num, bool is_l3, perfval_offset = perfcfg_offset + pmc->block[blk_num].counters * MLXBF_PMC_REG_SIZE;
- /* Set counter in "read" mode */ - perfmon_cfg = FIELD_PREP(MLXBF_PMC_PERFMON_CONFIG_ADDR, - MLXBF_PMC_PERFCTL); - perfmon_cfg |= FIELD_PREP(MLXBF_PMC_PERFMON_CONFIG_STROBE, 1); - perfmon_cfg |= FIELD_PREP(MLXBF_PMC_PERFMON_CONFIG_WR_R_B, 0); - - if (mlxbf_pmc_write(pmc->block[blk_num].mmio_base + perfcfg_offset, - MLXBF_PMC_WRITE_REG_64, perfmon_cfg)) - return -EFAULT; - - /* Check if the counter is enabled */ - - if (mlxbf_pmc_read(pmc->block[blk_num].mmio_base + perfval_offset, - MLXBF_PMC_READ_REG_64, &perfctl)) - return -EFAULT; - - if (!FIELD_GET(MLXBF_PMC_PERFCTL_EN0, perfctl)) - return -EINVAL; - /* Set counter in "read" mode */ perfmon_cfg = FIELD_PREP(MLXBF_PMC_PERFMON_CONFIG_ADDR, MLXBF_PMC_PERFEVT);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geert Uytterhoeven geert+renesas@glider.be
[ Upstream commit 0a138f1670bd1af13ba6949c48ea86ddd4bf557e ]
The only probing method supported by the Nvidia SN2201 platform driver is probing through an ACPI match table. Hence add a dependency on ACPI, to prevent asking the user about this driver when configuring a kernel without ACPI support.
Fixes: 662f24826f95 ("platform/mellanox: Add support for new SN2201 system") Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Acked-by: Vadim Pasternak vadimp@nvidia.com Acked-by: Andi Shyti andi.shyti@kernel.org Link: https://lore.kernel.org/r/ec5a4071691ab08d58771b7732a9988e89779268.169382836... Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/mellanox/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/platform/mellanox/Kconfig b/drivers/platform/mellanox/Kconfig index 382793e73a60a..30b50920b278c 100644 --- a/drivers/platform/mellanox/Kconfig +++ b/drivers/platform/mellanox/Kconfig @@ -80,8 +80,8 @@ config MLXBF_PMC
config NVSW_SN2201 tristate "Nvidia SN2201 platform driver support" - depends on HWMON - depends on I2C + depends on HWMON && I2C + depends on ACPI || COMPILE_TEST select REGMAP_I2C help This driver provides support for the Nvidia SN2201 platform.
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Liu Jian liujian56@huawei.com
[ Upstream commit cfaa80c91f6f99b9342b6557f0f0e1143e434066 ]
I got the below warning when do fuzzing test: BUG: KASAN: null-ptr-deref in scatterwalk_copychunks+0x320/0x470 Read of size 4 at addr 0000000000000008 by task kworker/u8:1/9
CPU: 0 PID: 9 Comm: kworker/u8:1 Tainted: G OE Hardware name: linux,dummy-virt (DT) Workqueue: pencrypt_parallel padata_parallel_worker Call trace: dump_backtrace+0x0/0x420 show_stack+0x34/0x44 dump_stack+0x1d0/0x248 __kasan_report+0x138/0x140 kasan_report+0x44/0x6c __asan_load4+0x94/0xd0 scatterwalk_copychunks+0x320/0x470 skcipher_next_slow+0x14c/0x290 skcipher_walk_next+0x2fc/0x480 skcipher_walk_first+0x9c/0x110 skcipher_walk_aead_common+0x380/0x440 skcipher_walk_aead_encrypt+0x54/0x70 ccm_encrypt+0x13c/0x4d0 crypto_aead_encrypt+0x7c/0xfc pcrypt_aead_enc+0x28/0x84 padata_parallel_worker+0xd0/0x2dc process_one_work+0x49c/0xbdc worker_thread+0x124/0x880 kthread+0x210/0x260 ret_from_fork+0x10/0x18
This is because the value of rec_seq of tls_crypto_info configured by the user program is too large, for example, 0xffffffffffffff. In addition, TLS is asynchronously accelerated. When tls_do_encryption() returns -EINPROGRESS and sk->sk_err is set to EBADMSG due to rec_seq overflow, skmsg is released before the asynchronous encryption process ends. As a result, the UAF problem occurs during the asynchronous processing of the encryption module.
If the operation is asynchronous and the encryption module returns EINPROGRESS, do not free the record information.
Fixes: 635d93981786 ("net/tls: free record only on encryption error") Signed-off-by: Liu Jian liujian56@huawei.com Reviewed-by: Sabrina Dubroca sd@queasysnail.net Link: https://lore.kernel.org/r/20230909081434.2324940-1-liujian56@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 53f944e6d8ef2..e047abc600893 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -817,7 +817,7 @@ static int bpf_exec_tx_verdict(struct sk_msg *msg, struct sock *sk, psock = sk_psock_get(sk); if (!psock || !policy) { err = tls_push_record(sk, flags, record_type); - if (err && sk->sk_err == EBADMSG) { + if (err && err != -EINPROGRESS && sk->sk_err == EBADMSG) { *copied -= sk_msg_free(sk, msg); tls_free_open_rec(sk); err = -sk->sk_err; @@ -846,7 +846,7 @@ static int bpf_exec_tx_verdict(struct sk_msg *msg, struct sock *sk, switch (psock->eval) { case __SK_PASS: err = tls_push_record(sk, flags, record_type); - if (err && sk->sk_err == EBADMSG) { + if (err && err != -EINPROGRESS && sk->sk_err == EBADMSG) { *copied -= sk_msg_free(sk, msg); tls_free_open_rec(sk); err = -sk->sk_err;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sascha Hauer s.hauer@pengutronix.de
[ Upstream commit 403f0e771457e2b8811dc280719d11b9bacf10f4 ]
macb_set_tx_clk() is called under a spinlock but itself calls clk_set_rate() which can sleep. This results in:
| BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580 | pps pps1: new PPS source ptp1 | in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 40, name: kworker/u4:3 | preempt_count: 1, expected: 0 | RCU nest depth: 0, expected: 0 | 4 locks held by kworker/u4:3/40: | #0: ffff000003409148 | macb ff0c0000.ethernet: gem-ptp-timer ptp clock registered. | ((wq_completion)events_power_efficient){+.+.}-{0:0}, at: process_one_work+0x14c/0x51c | #1: ffff8000833cbdd8 ((work_completion)(&pl->resolve)){+.+.}-{0:0}, at: process_one_work+0x14c/0x51c | #2: ffff000004f01578 (&pl->state_mutex){+.+.}-{4:4}, at: phylink_resolve+0x44/0x4e8 | #3: ffff000004f06f50 (&bp->lock){....}-{3:3}, at: macb_mac_link_up+0x40/0x2ac | irq event stamp: 113998 | hardirqs last enabled at (113997): [<ffff800080e8503c>] _raw_spin_unlock_irq+0x30/0x64 | hardirqs last disabled at (113998): [<ffff800080e84478>] _raw_spin_lock_irqsave+0xac/0xc8 | softirqs last enabled at (113608): [<ffff800080010630>] __do_softirq+0x430/0x4e4 | softirqs last disabled at (113597): [<ffff80008001614c>] ____do_softirq+0x10/0x1c | CPU: 0 PID: 40 Comm: kworker/u4:3 Not tainted 6.5.0-11717-g9355ce8b2f50-dirty #368 | Hardware name: ... ZynqMP ... (DT) | Workqueue: events_power_efficient phylink_resolve | Call trace: | dump_backtrace+0x98/0xf0 | show_stack+0x18/0x24 | dump_stack_lvl+0x60/0xac | dump_stack+0x18/0x24 | __might_resched+0x144/0x24c | __might_sleep+0x48/0x98 | __mutex_lock+0x58/0x7b0 | mutex_lock_nested+0x24/0x30 | clk_prepare_lock+0x4c/0xa8 | clk_set_rate+0x24/0x8c | macb_mac_link_up+0x25c/0x2ac | phylink_resolve+0x178/0x4e8 | process_one_work+0x1ec/0x51c | worker_thread+0x1ec/0x3e4 | kthread+0x120/0x124 | ret_from_fork+0x10/0x20
The obvious fix is to move the call to macb_set_tx_clk() out of the protected area. This seems safe as rx and tx are both disabled anyway at this point. It is however not entirely clear what the spinlock shall protect. It could be the read-modify-write access to the NCFGR register, but this is accessed in macb_set_rx_mode() and macb_set_rxcsum_feature() as well without holding the spinlock. It could also be the register accesses done in mog_init_rings() or macb_init_buffers(), but again these functions are called without holding the spinlock in macb_hresp_error_task(). The locking seems fishy in this driver and it might deserve another look before this patch is applied.
Fixes: 633e98a711ac0 ("net: macb: use resolved link config in mac_link_up()") Signed-off-by: Sascha Hauer s.hauer@pengutronix.de Link: https://lore.kernel.org/r/20230908112913.1701766-1-s.hauer@pengutronix.de Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/cadence/macb_main.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index 82929ee76739d..2d18acf3d5eae 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -757,8 +757,6 @@ static void macb_mac_link_up(struct phylink_config *config, if (rx_pause) ctrl |= MACB_BIT(PAE);
- macb_set_tx_clk(bp, speed); - /* Initialize rings & buffers as clearing MACB_BIT(TE) in link down * cleared the pipeline and control registers. */ @@ -778,6 +776,9 @@ static void macb_mac_link_up(struct phylink_config *config,
spin_unlock_irqrestore(&bp->lock, flags);
+ if (!(bp->caps & MACB_CAPS_MACB_IS_EMAC)) + macb_set_tx_clk(bp, speed); + /* Enable Rx and Tx; Enable PTP unicast */ ctrl = macb_readl(bp, NCR); if (gem_has_ptp(bp))
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toke Høiland-Jørgensen toke@redhat.com
[ Upstream commit 7a6102aa6df0d5d032b4cbc51935d1d4cda17254 ]
There's an early return in veth_set_features() if the device is in a down state, which leads to the XDP feature flags not being updated when enabling GRO while the device is down. Which in turn leads to XDP_REDIRECT not working, because the redirect code now checks the flags.
Fix this by updating the feature flags after bringing the device up.
Before this patch:
NETDEV_XDP_ACT_BASIC: yes NETDEV_XDP_ACT_REDIRECT: yes NETDEV_XDP_ACT_NDO_XMIT: no NETDEV_XDP_ACT_XSK_ZEROCOPY: no NETDEV_XDP_ACT_HW_OFFLOAD: no NETDEV_XDP_ACT_RX_SG: yes NETDEV_XDP_ACT_NDO_XMIT_SG: no
After this patch:
NETDEV_XDP_ACT_BASIC: yes NETDEV_XDP_ACT_REDIRECT: yes NETDEV_XDP_ACT_NDO_XMIT: yes NETDEV_XDP_ACT_XSK_ZEROCOPY: no NETDEV_XDP_ACT_HW_OFFLOAD: no NETDEV_XDP_ACT_RX_SG: yes NETDEV_XDP_ACT_NDO_XMIT_SG: yes
Fixes: fccca038f300 ("veth: take into account device reconfiguration for xdp_features flag") Fixes: 66c0e13ad236 ("drivers: net: turn on XDP features") Signed-off-by: Toke Høiland-Jørgensen toke@redhat.com Link: https://lore.kernel.org/r/20230911135826.722295-1-toke@redhat.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/veth.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 2db678c0082a3..fc0d0114d8c27 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -1447,6 +1447,8 @@ static int veth_open(struct net_device *dev) netif_carrier_on(peer); }
+ veth_set_xdp_features(dev); + return 0; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 8cdd9f1aaedf823006449faa4e540026c692ac43 ]
ip6_sock_set_addr_preferences() second argument should be an integer.
SUNRPC attempts to set IPV6_PREFER_SRC_PUBLIC were translated to IPV6_PREFER_SRC_TMP
Fixes: 18d5ad623275 ("ipv6: add ip6_sock_set_addr_preferences") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Christoph Hellwig hch@lst.de Cc: Chuck Lever chuck.lever@oracle.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20230911154213.713941-1-edumazet@google.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/ipv6.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 2acc4c808d45d..b08bd694385aa 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -1356,7 +1356,7 @@ static inline int __ip6_sock_set_addr_preferences(struct sock *sk, int val) return 0; }
-static inline int ip6_sock_set_addr_preferences(struct sock *sk, bool val) +static inline int ip6_sock_set_addr_preferences(struct sock *sk, int val) { int ret;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit c6d277064b1da7f9015b575a562734de87a7e463 ]
This is a prep patch to make the following patches cleaner that touch inet_bind2_bucket_match() and inet_bind2_bucket_match_addr_any().
Both functions have duplicated comparison for netns, port, and l3mdev. Let's factorise them.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: aa99e5f87bd5 ("tcp: Fix bind() regression for v4-mapped-v6 wildcard address.") Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/inet_hashtables.c | 28 +++++++++++++--------------- 1 file changed, 13 insertions(+), 15 deletions(-)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 0819d6001b9ab..1ab81534f8c55 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -795,41 +795,39 @@ static bool inet_bind2_bucket_match(const struct inet_bind2_bucket *tb, const struct net *net, unsigned short port, int l3mdev, const struct sock *sk) { + if (!net_eq(ib2_net(tb), net) || tb->port != port || + tb->l3mdev != l3mdev) + return false; + #if IS_ENABLED(CONFIG_IPV6) if (sk->sk_family != tb->family) return false;
if (sk->sk_family == AF_INET6) - return net_eq(ib2_net(tb), net) && tb->port == port && - tb->l3mdev == l3mdev && - ipv6_addr_equal(&tb->v6_rcv_saddr, &sk->sk_v6_rcv_saddr); - else + return ipv6_addr_equal(&tb->v6_rcv_saddr, &sk->sk_v6_rcv_saddr); #endif - return net_eq(ib2_net(tb), net) && tb->port == port && - tb->l3mdev == l3mdev && tb->rcv_saddr == sk->sk_rcv_saddr; + return tb->rcv_saddr == sk->sk_rcv_saddr; }
bool inet_bind2_bucket_match_addr_any(const struct inet_bind2_bucket *tb, const struct net *net, unsigned short port, int l3mdev, const struct sock *sk) { + if (!net_eq(ib2_net(tb), net) || tb->port != port || + tb->l3mdev != l3mdev) + return false; + #if IS_ENABLED(CONFIG_IPV6) if (sk->sk_family != tb->family) { if (sk->sk_family == AF_INET) - return net_eq(ib2_net(tb), net) && tb->port == port && - tb->l3mdev == l3mdev && - ipv6_addr_any(&tb->v6_rcv_saddr); + return ipv6_addr_any(&tb->v6_rcv_saddr);
return false; }
if (sk->sk_family == AF_INET6) - return net_eq(ib2_net(tb), net) && tb->port == port && - tb->l3mdev == l3mdev && - ipv6_addr_any(&tb->v6_rcv_saddr); - else + return ipv6_addr_any(&tb->v6_rcv_saddr); #endif - return net_eq(ib2_net(tb), net) && tb->port == port && - tb->l3mdev == l3mdev && tb->rcv_saddr == 0; + return tb->rcv_saddr == 0; }
/* The socket's bhash2 hashbucket spinlock must be held when this is called */
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit aa99e5f87bd54db55dd37cb130bd5eb55933027f ]
Andrei Vagin reported bind() regression with strace logs.
If we bind() a TCPv6 socket to ::FFFF:0.0.0.0 and then bind() a TCPv4 socket to 127.0.0.1, the 2nd bind() should fail but now succeeds.
from socket import *
s1 = socket(AF_INET6, SOCK_STREAM) s1.bind(('::ffff:0.0.0.0', 0))
s2 = socket(AF_INET, SOCK_STREAM) s2.bind(('127.0.0.1', s1.getsockname()[1]))
During the 2nd bind(), if tb->family is AF_INET6 and sk->sk_family is AF_INET in inet_bind2_bucket_match_addr_any(), we still need to check if tb has the v4-mapped-v6 wildcard address.
The example above does not work after commit 5456262d2baa ("net: Fix incorrect address comparison when searching for a bind2 bucket"), but the blamed change is not the commit.
Before the commit, the leading zeros of ::FFFF:0.0.0.0 were treated as 0.0.0.0, and the sequence above worked by chance. Technically, this case has been broken since bhash2 was introduced.
Note that if we bind() two sockets to 127.0.0.1 and then ::FFFF:0.0.0.0, the 2nd bind() fails properly because we fall back to using bhash to detect conflicts for the v4-mapped-v6 address.
Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") Reported-by: Andrei Vagin avagin@google.com Closes: https://lore.kernel.org/netdev/ZPuYBOFC8zsK6r9T@google.com/ Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/ipv6.h | 5 +++++ net/ipv4/inet_hashtables.c | 3 ++- 2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/include/net/ipv6.h b/include/net/ipv6.h index b08bd694385aa..a14ac821fb36f 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -784,6 +784,11 @@ static inline bool ipv6_addr_v4mapped(const struct in6_addr *a) cpu_to_be32(0x0000ffff))) == 0UL; }
+static inline bool ipv6_addr_v4mapped_any(const struct in6_addr *a) +{ + return ipv6_addr_v4mapped(a) && ipv4_is_zeronet(a->s6_addr32[3]); +} + static inline bool ipv6_addr_v4mapped_loopback(const struct in6_addr *a) { return ipv6_addr_v4mapped(a) && ipv4_is_loopback(a->s6_addr32[3]); diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 1ab81534f8c55..fb13a28577b01 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -819,7 +819,8 @@ bool inet_bind2_bucket_match_addr_any(const struct inet_bind2_bucket *tb, const #if IS_ENABLED(CONFIG_IPV6) if (sk->sk_family != tb->family) { if (sk->sk_family == AF_INET) - return ipv6_addr_any(&tb->v6_rcv_saddr); + return ipv6_addr_any(&tb->v6_rcv_saddr) || + ipv6_addr_v4mapped_any(&tb->v6_rcv_saddr);
return false; }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit c48ef9c4aed3632566b57ba66cec6ec78624d4cb ]
Since bhash2 was introduced, the example below does not work as expected. These two bind() should conflict, but the 2nd bind() now succeeds.
from socket import *
s1 = socket(AF_INET6, SOCK_STREAM) s1.bind(('::ffff:127.0.0.1', 0))
s2 = socket(AF_INET, SOCK_STREAM) s2.bind(('127.0.0.1', s1.getsockname()[1]))
During the 2nd bind() in inet_csk_get_port(), inet_bind2_bucket_find() fails to find the 1st socket's tb2, so inet_bind2_bucket_create() allocates a new tb2 for the 2nd socket. Then, we call inet_csk_bind_conflict() that checks conflicts in the new tb2 by inet_bhash2_conflict(). However, the new tb2 does not include the 1st socket, thus the bind() finally succeeds.
In this case, inet_bind2_bucket_match() must check if AF_INET6 tb2 has the conflicting v4-mapped-v6 address so that inet_bind2_bucket_find() returns the 1st socket's tb2.
Note that if we bind two sockets to 127.0.0.1 and then ::FFFF:127.0.0.1, the 2nd bind() fails properly for the same reason mentinoed in the previous commit.
Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Eric Dumazet edumazet@google.com Acked-by: Andrei Vagin avagin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/inet_hashtables.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index fb13a28577b01..ae5e786a0598d 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -800,8 +800,13 @@ static bool inet_bind2_bucket_match(const struct inet_bind2_bucket *tb, return false;
#if IS_ENABLED(CONFIG_IPV6) - if (sk->sk_family != tb->family) + if (sk->sk_family != tb->family) { + if (sk->sk_family == AF_INET) + return ipv6_addr_v4mapped(&tb->v6_rcv_saddr) && + tb->v6_rcv_saddr.s6_addr32[3] == sk->sk_rcv_saddr; + return false; + }
if (sk->sk_family == AF_INET6) return ipv6_addr_equal(&tb->v6_rcv_saddr, &sk->sk_v6_rcv_saddr);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 0071d15517b4a3d265abc00395beb1138e7236c7 ]
The selftest passes the IPv6 address length for an IPv4 address. We should pass the correct length.
Note inet_bind_sk() does not check if the size is larger than sizeof(struct sockaddr_in), so there is no real bug in this selftest.
Fixes: 13715acf8ab5 ("selftest: Add test for bind() conflicts.") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/bind_wildcard.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/bind_wildcard.c b/tools/testing/selftests/net/bind_wildcard.c index 58edfc15d28bd..e7ebe72e879d7 100644 --- a/tools/testing/selftests/net/bind_wildcard.c +++ b/tools/testing/selftests/net/bind_wildcard.c @@ -100,7 +100,7 @@ void bind_sockets(struct __test_metadata *_metadata, TEST_F(bind_wildcard, v4_v6) { bind_sockets(_metadata, self, - (struct sockaddr *)&self->addr4, sizeof(self->addr6), + (struct sockaddr *)&self->addr4, sizeof(self->addr4), (struct sockaddr *)&self->addr6, sizeof(self->addr6)); }
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vadim Fedorenko vadim.fedorenko@linux.dev
[ Upstream commit 3c44191dd76cf9c0cc49adaf34384cbd42ef8ad2 ]
The commit in fixes introduced flags to control the status of hardware configuration while processing packets. At the same time another structure is used to provide configuration of timestamper to user-space applications. The way it was coded makes this structures go out of sync easily. The repro is easy for 82599 chips:
[root@hostname ~]# hwstamp_ctl -i eth0 -r 12 -t 1 current settings: tx_type 0 rx_filter 0 new settings: tx_type 1 rx_filter 12
The eth0 device is properly configured to timestamp any PTPv2 events.
[root@hostname ~]# hwstamp_ctl -i eth0 -r 1 -t 1 current settings: tx_type 1 rx_filter 12 SIOCSHWTSTAMP failed: Numerical result out of range The requested time stamping mode is not supported by the hardware.
The error is properly returned because HW doesn't support all packets timestamping. But the adapter->flags is cleared of timestamp flags even though no HW configuration was done. From that point no RX timestamps are received by user-space application. But configuration shows good values:
[root@hostname ~]# hwstamp_ctl -i eth0 current settings: tx_type 1 rx_filter 12
Fix the issue by applying new flags only when the HW was actually configured.
Fixes: a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices") Signed-off-by: Vadim Fedorenko vadim.fedorenko@linux.dev Reviewed-by: Simon Horman horms@kernel.org Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 28 +++++++++++--------- 1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c index 0310af851086b..9339edbd90821 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c @@ -979,6 +979,7 @@ static int ixgbe_ptp_set_timestamp_mode(struct ixgbe_adapter *adapter, u32 tsync_tx_ctl = IXGBE_TSYNCTXCTL_ENABLED; u32 tsync_rx_ctl = IXGBE_TSYNCRXCTL_ENABLED; u32 tsync_rx_mtrl = PTP_EV_PORT << 16; + u32 aflags = adapter->flags; bool is_l2 = false; u32 regval;
@@ -996,20 +997,20 @@ static int ixgbe_ptp_set_timestamp_mode(struct ixgbe_adapter *adapter, case HWTSTAMP_FILTER_NONE: tsync_rx_ctl = 0; tsync_rx_mtrl = 0; - adapter->flags &= ~(IXGBE_FLAG_RX_HWTSTAMP_ENABLED | - IXGBE_FLAG_RX_HWTSTAMP_IN_REGISTER); + aflags &= ~(IXGBE_FLAG_RX_HWTSTAMP_ENABLED | + IXGBE_FLAG_RX_HWTSTAMP_IN_REGISTER); break; case HWTSTAMP_FILTER_PTP_V1_L4_SYNC: tsync_rx_ctl |= IXGBE_TSYNCRXCTL_TYPE_L4_V1; tsync_rx_mtrl |= IXGBE_RXMTRL_V1_SYNC_MSG; - adapter->flags |= (IXGBE_FLAG_RX_HWTSTAMP_ENABLED | - IXGBE_FLAG_RX_HWTSTAMP_IN_REGISTER); + aflags |= (IXGBE_FLAG_RX_HWTSTAMP_ENABLED | + IXGBE_FLAG_RX_HWTSTAMP_IN_REGISTER); break; case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ: tsync_rx_ctl |= IXGBE_TSYNCRXCTL_TYPE_L4_V1; tsync_rx_mtrl |= IXGBE_RXMTRL_V1_DELAY_REQ_MSG; - adapter->flags |= (IXGBE_FLAG_RX_HWTSTAMP_ENABLED | - IXGBE_FLAG_RX_HWTSTAMP_IN_REGISTER); + aflags |= (IXGBE_FLAG_RX_HWTSTAMP_ENABLED | + IXGBE_FLAG_RX_HWTSTAMP_IN_REGISTER); break; case HWTSTAMP_FILTER_PTP_V2_EVENT: case HWTSTAMP_FILTER_PTP_V2_L2_EVENT: @@ -1023,8 +1024,8 @@ static int ixgbe_ptp_set_timestamp_mode(struct ixgbe_adapter *adapter, tsync_rx_ctl |= IXGBE_TSYNCRXCTL_TYPE_EVENT_V2; is_l2 = true; config->rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT; - adapter->flags |= (IXGBE_FLAG_RX_HWTSTAMP_ENABLED | - IXGBE_FLAG_RX_HWTSTAMP_IN_REGISTER); + aflags |= (IXGBE_FLAG_RX_HWTSTAMP_ENABLED | + IXGBE_FLAG_RX_HWTSTAMP_IN_REGISTER); break; case HWTSTAMP_FILTER_PTP_V1_L4_EVENT: case HWTSTAMP_FILTER_NTP_ALL: @@ -1035,7 +1036,7 @@ static int ixgbe_ptp_set_timestamp_mode(struct ixgbe_adapter *adapter, if (hw->mac.type >= ixgbe_mac_X550) { tsync_rx_ctl |= IXGBE_TSYNCRXCTL_TYPE_ALL; config->rx_filter = HWTSTAMP_FILTER_ALL; - adapter->flags |= IXGBE_FLAG_RX_HWTSTAMP_ENABLED; + aflags |= IXGBE_FLAG_RX_HWTSTAMP_ENABLED; break; } fallthrough; @@ -1046,8 +1047,6 @@ static int ixgbe_ptp_set_timestamp_mode(struct ixgbe_adapter *adapter, * Delay_Req messages and hardware does not support * timestamping all packets => return error */ - adapter->flags &= ~(IXGBE_FLAG_RX_HWTSTAMP_ENABLED | - IXGBE_FLAG_RX_HWTSTAMP_IN_REGISTER); config->rx_filter = HWTSTAMP_FILTER_NONE; return -ERANGE; } @@ -1079,8 +1078,8 @@ static int ixgbe_ptp_set_timestamp_mode(struct ixgbe_adapter *adapter, IXGBE_TSYNCRXCTL_TYPE_ALL | IXGBE_TSYNCRXCTL_TSIP_UT_EN; config->rx_filter = HWTSTAMP_FILTER_ALL; - adapter->flags |= IXGBE_FLAG_RX_HWTSTAMP_ENABLED; - adapter->flags &= ~IXGBE_FLAG_RX_HWTSTAMP_IN_REGISTER; + aflags |= IXGBE_FLAG_RX_HWTSTAMP_ENABLED; + aflags &= ~IXGBE_FLAG_RX_HWTSTAMP_IN_REGISTER; is_l2 = true; break; default: @@ -1113,6 +1112,9 @@ static int ixgbe_ptp_set_timestamp_mode(struct ixgbe_adapter *adapter,
IXGBE_WRITE_FLUSH(hw);
+ /* configure adapter flags only when HW is actually configured */ + adapter->flags = aflags; + /* clear TX/RX time stamp registers, just to be sure */ ixgbe_ptp_clear_tx_timestamp(adapter); IXGBE_READ_REG(hw, IXGBE_RXSTMPH);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Corinna Vinschen vinschen@redhat.com
[ Upstream commit bc6ed2fa24b14e40e1005488bbe11268ce7108fa ]
After commit 50f303496d92 ("igb: Enable SR-IOV after reinit"), removing the igb module could hang or crash (depending on the machine) when the module has been loaded with the max_vfs parameter set to some value != 0.
In case of one test machine with a dual port 82580, this hang occurred:
[ 232.480687] igb 0000:41:00.1: removed PHC on enp65s0f1 [ 233.093257] igb 0000:41:00.1: IOV Disabled [ 233.329969] pcieport 0000:40:01.0: AER: Multiple Uncorrected (Non-Fatal) err0 [ 233.340302] igb 0000:41:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fata) [ 233.352248] igb 0000:41:00.0: device [8086:1516] error status/mask=00100000 [ 233.361088] igb 0000:41:00.0: [20] UnsupReq (First) [ 233.368183] igb 0000:41:00.0: AER: TLP Header: 40000001 0000040f cdbfc00c c [ 233.376846] igb 0000:41:00.1: PCIe Bus Error: severity=Uncorrected (Non-Fata) [ 233.388779] igb 0000:41:00.1: device [8086:1516] error status/mask=00100000 [ 233.397629] igb 0000:41:00.1: [20] UnsupReq (First) [ 233.404736] igb 0000:41:00.1: AER: TLP Header: 40000001 0000040f cdbfc00c c [ 233.538214] pci 0000:41:00.1: AER: can't recover (no error_detected callback) [ 233.538401] igb 0000:41:00.0: removed PHC on enp65s0f0 [ 233.546197] pcieport 0000:40:01.0: AER: device recovery failed [ 234.157244] igb 0000:41:00.0: IOV Disabled [ 371.619705] INFO: task irq/35-aerdrv:257 blocked for more than 122 seconds. [ 371.627489] Not tainted 6.4.0-dirty #2 [ 371.632257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this. [ 371.641000] task:irq/35-aerdrv state:D stack:0 pid:257 ppid:2 f0 [ 371.650330] Call Trace: [ 371.653061] <TASK> [ 371.655407] __schedule+0x20e/0x660 [ 371.659313] schedule+0x5a/0xd0 [ 371.662824] schedule_preempt_disabled+0x11/0x20 [ 371.667983] __mutex_lock.constprop.0+0x372/0x6c0 [ 371.673237] ? __pfx_aer_root_reset+0x10/0x10 [ 371.678105] report_error_detected+0x25/0x1c0 [ 371.682974] ? __pfx_report_normal_detected+0x10/0x10 [ 371.688618] pci_walk_bus+0x72/0x90 [ 371.692519] pcie_do_recovery+0xb2/0x330 [ 371.696899] aer_process_err_devices+0x117/0x170 [ 371.702055] aer_isr+0x1c0/0x1e0 [ 371.705661] ? __set_cpus_allowed_ptr+0x54/0xa0 [ 371.710723] ? __pfx_irq_thread_fn+0x10/0x10 [ 371.715496] irq_thread_fn+0x20/0x60 [ 371.719491] irq_thread+0xe6/0x1b0 [ 371.723291] ? __pfx_irq_thread_dtor+0x10/0x10 [ 371.728255] ? __pfx_irq_thread+0x10/0x10 [ 371.732731] kthread+0xe2/0x110 [ 371.736243] ? __pfx_kthread+0x10/0x10 [ 371.740430] ret_from_fork+0x2c/0x50 [ 371.744428] </TASK>
The reproducer was a simple script:
#!/bin/sh for i in `seq 1 5`; do modprobe -rv igb modprobe -v igb max_vfs=1 sleep 1 modprobe -rv igb done
It turned out that this could only be reproduce on 82580 (quad and dual-port), but not on 82576, i350 and i210. Further debugging showed that igb_enable_sriov()'s call to pci_enable_sriov() is failing, because dev->is_physfn is 0 on 82580.
Prior to commit 50f303496d92 ("igb: Enable SR-IOV after reinit"), igb_enable_sriov() jumped into the "err_out" cleanup branch. After this commit it only returned the error code.
So the cleanup didn't take place, and the incorrect VF setup in the igb_adapter structure fooled the igb driver into assuming that VFs have been set up where no VF actually existed.
Fix this problem by cleaning up again if pci_enable_sriov() fails.
Fixes: 50f303496d92 ("igb: Enable SR-IOV after reinit") Signed-off-by: Corinna Vinschen vinschen@redhat.com Reviewed-by: Akihiko Odaki akihiko.odaki@daynix.com Tested-by: Rafal Romanowski rafal.romanowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igb/igb_main.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index ac19730e8db91..12f106b3a878b 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -3827,8 +3827,11 @@ static int igb_enable_sriov(struct pci_dev *pdev, int num_vfs, bool reinit) }
/* only call pci_enable_sriov() if no VFs are allocated already */ - if (!old_vfs) + if (!old_vfs) { err = pci_enable_sriov(pdev, adapter->vfs_allocated_count); + if (err) + goto err_out; + }
goto out;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com
[ Upstream commit e7b1ef29420fe52c2c1a273a9b4b36103a522625 ]
Fix unmasking irq condition by using napi_complete_done(). Otherwise, redundant interrupts happen.
Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"") Signed-off-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/renesas/rswitch.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/renesas/rswitch.c b/drivers/net/ethernet/renesas/rswitch.c index 4e412ac0965a4..449ed1f5624c9 100644 --- a/drivers/net/ethernet/renesas/rswitch.c +++ b/drivers/net/ethernet/renesas/rswitch.c @@ -816,10 +816,10 @@ static int rswitch_poll(struct napi_struct *napi, int budget)
netif_wake_subqueue(ndev, 0);
- napi_complete(napi); - - rswitch_enadis_data_irq(priv, rdev->tx_queue->index, true); - rswitch_enadis_data_irq(priv, rdev->rx_queue->index, true); + if (napi_complete_done(napi, budget - quota)) { + rswitch_enadis_data_irq(priv, rdev->tx_queue->index, true); + rswitch_enadis_data_irq(priv, rdev->rx_queue->index, true); + }
out: return budget - quota;
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit a22730b1b4bf437c6bbfdeff5feddf54be4aeada ]
syzkaller found a memory leak in kcm_sendmsg(), and commit c821a88bd720 ("kcm: Fix memory leak in error path of kcm_sendmsg()") suppressed it by updating kcm_tx_msg(head)->last_skb if partial data is copied so that the following sendmsg() will resume from the skb.
However, we cannot know how many bytes were copied when we get the error. Thus, we could mess up the MSG_MORE queue.
When kcm_sendmsg() fails for SOCK_DGRAM, we should purge the queue as we do so for UDP by udp_flush_pending_frames().
Even without this change, when the error occurred, the following sendmsg() resumed from a wrong skb and the queue was messed up. However, we have yet to get such a report, and only syzkaller stumbled on it. So, this can be changed safely.
Note this does not change SOCK_SEQPACKET behaviour.
Fixes: c821a88bd720 ("kcm: Fix memory leak in error path of kcm_sendmsg()") Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20230912022753.33327-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/kcm/kcmsock.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c index 740539a218b7c..dd1d8ffd5f594 100644 --- a/net/kcm/kcmsock.c +++ b/net/kcm/kcmsock.c @@ -930,17 +930,18 @@ static int kcm_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) out_error: kcm_push(kcm);
- if (copied && sock->type == SOCK_SEQPACKET) { + if (sock->type == SOCK_SEQPACKET) { /* Wrote some bytes before encountering an * error, return partial success. */ - goto partial_message; - } - - if (head != kcm->seq_skb) + if (copied) + goto partial_message; + if (head != kcm->seq_skb) + kfree_skb(head); + } else { kfree_skb(head); - else if (copied) - kcm_tx_msg(head)->last_skb = skb; + kcm->seq_skb = NULL; + }
err = sk_stream_error(sk, msg->msg_flags, err);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Torvalds torvalds@linux-foundation.org
commit 3cec50490969afd4a76ccee441f747d869ccff77 upstream.
Commit 408579cd627a ("mm: Update do_vmi_align_munmap() return semantics") seems to have updated one of the callers of do_vmi_munmap() incorrectly: it used to check for the error case (which didn't change: negative means error).
That commit changed the check to the success case (which did change: before that commit, 0 was success, and 1 was "success and lock downgraded". After the change, it's always 0 for success, and the lock will have been released if requested).
This didn't change any actual VM behavior _except_ for memory accounting when 'VM_ACCOUNT' was set on the vma. Which made the wrong return value test fairly subtle, since everything continues to work.
Or rather - it continues to work but the "Committed memory" accounting goes all wonky (Committed_AS value in /proc/meminfo), and depending on settings that then causes problems much much later as the VM relies on bogus statistics for its heuristics.
Revert that one line of the change back to the original logic.
Fixes: 408579cd627a ("mm: Update do_vmi_align_munmap() return semantics") Reported-by: Christoph Biedl linux-kernel.bfrz@manchmal.in-ulm.de Reported-bisected-and-tested-by: Michael Labiuk michael.labiuk@virtuozzo.com Cc: Bagas Sanjaya bagasdotme@gmail.com Cc: Liam R. Howlett Liam.Howlett@oracle.com Link: https://lore.kernel.org/all/1694366957@msgid.manchmal.in-ulm.de/ Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Cc: Helge Deller deller@gmx.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/mremap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/mremap.c +++ b/mm/mremap.c @@ -715,7 +715,7 @@ static unsigned long move_vma(struct vm_ }
vma_iter_init(&vmi, mm, old_addr); - if (!do_vmi_munmap(&vmi, mm, old_addr, old_len, uf_unmap, false)) { + if (do_vmi_munmap(&vmi, mm, old_addr, old_len, uf_unmap, false) < 0) { /* OOM: unable to split vma, just get accounts right */ if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP)) vm_acct_memory(old_len >> PAGE_SHIFT);
6.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wesley Chalmers wesley.chalmers@amd.com
commit 3d028d5d60d516c536de1ddd3ebf3d55f3f8983b upstream.
[WHY] Currently, when insert_plane is called with insert_above_mpcc parameter that is equal to tree->opp_list, the function returns NULL.
[HOW] Instead, the function should insert the plane at the top of the tree.
Cc: Mario Limonciello mario.limonciello@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Reviewed-by: Jun Lei jun.lei@amd.com Acked-by: Tom Chung chiahsuan.chung@amd.com Signed-off-by: Wesley Chalmers wesley.chalmers@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/dcn10/dcn10_mpc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_mpc.c +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_mpc.c @@ -212,8 +212,9 @@ struct mpcc *mpc1_insert_plane( /* check insert_above_mpcc exist in tree->opp_list */ struct mpcc *temp_mpcc = tree->opp_list;
- while (temp_mpcc && temp_mpcc->mpcc_bot != insert_above_mpcc) - temp_mpcc = temp_mpcc->mpcc_bot; + if (temp_mpcc != insert_above_mpcc) + while (temp_mpcc && temp_mpcc->mpcc_bot != insert_above_mpcc) + temp_mpcc = temp_mpcc->mpcc_bot; if (temp_mpcc == NULL) return NULL; }
Hello,
On Sun, 17 Sep 2023 21:10:00 +0200 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.5.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.5.y and the diffstat can be found below.
This rc kernel passes DAMON functionality test[1] on my test machine. Attaching the test results summary below. Please note that I retrieved the kernel from linux-stable-rc tree[2].
Tested-by: SeongJae Park sj@kernel.org
[1] https://github.com/awslabs/damon-tests/tree/next/corr [2] e5d077fb1eae ("Linux 6.5.4-rc1")
Thanks, SJ
[...]
---
[32m ok 1 selftests: damon: debugfs_attrs.sh ok 2 selftests: damon: debugfs_schemes.sh ok 3 selftests: damon: debugfs_target_ids.sh ok 4 selftests: damon: debugfs_empty_targets.sh ok 5 selftests: damon: debugfs_huge_count_read_write.sh ok 6 selftests: damon: debugfs_duplicate_context_creation.sh ok 7 selftests: damon: debugfs_rm_non_contexts.sh ok 8 selftests: damon: sysfs.sh ok 9 selftests: damon: sysfs_update_removed_scheme_dir.sh ok 10 selftests: damon: reclaim.sh ok 11 selftests: damon: lru_sort.sh ok 1 selftests: damon-tests: kunit.sh ok 2 selftests: damon-tests: huge_count_read_write.sh ok 3 selftests: damon-tests: buffer_overflow.sh ok 4 selftests: damon-tests: rm_contexts.sh ok 5 selftests: damon-tests: record_null_deref.sh ok 6 selftests: damon-tests: dbgfs_target_ids_read_before_terminate_race.sh ok 7 selftests: damon-tests: dbgfs_target_ids_pid_leak.sh ok 8 selftests: damon-tests: damo_tests.sh ok 9 selftests: damon-tests: masim-record.sh ok 10 selftests: damon-tests: build_i386.sh ok 11 selftests: damon-tests: build_m68k.sh ok 12 selftests: damon-tests: build_arm64.sh ok 13 selftests: damon-tests: build_i386_idle_flag.sh ok 14 selftests: damon-tests: build_i386_highpte.sh ok 15 selftests: damon-tests: build_nomemcg.sh [33m [92mPASS [39m _remote_run_corr.sh SUCCESS
On Sun, Sep 17, 2023 at 09:10:00PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Successfully compiled and installed bindeb-pkgs on my computer (Acer Aspire E15, Intel Core i3 Haswell). No noticeable regressions.
Tested-by: Bagas Sanjaya bagasdotme@gmail.com
On 9/17/23 12:10 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.5.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.5.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
Hi Greg,
On 17/09/2023 20:10, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.5.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.5.y and the diffstat can be found below.
thanks,
greg k-h
Pseudo-Shortlog of commits:
...
Sameer Pujar spujar@nvidia.com arm64: tegra: Update AHUB clock parent and rate
Unfortunately, the above change is causing a regression in one of our audio tests and we are looking into why this is.
Can we drop this from stable for now?
Sorry about this.
Jon
On Mon, Sep 18, 2023 at 01:52:10PM +0100, Jon Hunter wrote:
Hi Greg,
On 17/09/2023 20:10, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.5.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.5.y and the diffstat can be found below.
thanks,
greg k-h
Pseudo-Shortlog of commits:
...
Sameer Pujar spujar@nvidia.com arm64: tegra: Update AHUB clock parent and rate
Unfortunately, the above change is causing a regression in one of our audio tests and we are looking into why this is.
Can we drop this from stable for now?
Is it also a problem in Linus's tree? Keeping bug-compatible is always good :)
thanks,
greg k-h
On 18/09/2023 13:56, Greg Kroah-Hartman wrote:
On Mon, Sep 18, 2023 at 01:52:10PM +0100, Jon Hunter wrote:
Hi Greg,
On 17/09/2023 20:10, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.5.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.5.y and the diffstat can be found below.
thanks,
greg k-h
Pseudo-Shortlog of commits:
...
Sameer Pujar spujar@nvidia.com arm64: tegra: Update AHUB clock parent and rate
Unfortunately, the above change is causing a regression in one of our audio tests and we are looking into why this is.
Can we drop this from stable for now?
Is it also a problem in Linus's tree? Keeping bug-compatible is always good :)
Yes this is also seen in Linus' tree, so we are compatible :-)
Jon
On 9/18/23 05:56, Greg Kroah-Hartman wrote:
On Mon, Sep 18, 2023 at 01:52:10PM +0100, Jon Hunter wrote:
Hi Greg,
On 17/09/2023 20:10, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.5.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.5.y and the diffstat can be found below.
thanks,
greg k-h
Pseudo-Shortlog of commits:
...
Sameer Pujar spujar@nvidia.com arm64: tegra: Update AHUB clock parent and rate
Unfortunately, the above change is causing a regression in one of our audio tests and we are looking into why this is.
Can we drop this from stable for now?
Is it also a problem in Linus's tree? Keeping bug-compatible is always good :)
Sorry, no, it isn't, especially in the context of at the same time suggesting that everyone should start using the most recent stable release immediately (instead of, say, selectively picking security patches).
I don't think Tegra users would be happy if their audio stopped working, and it seems unlikely that they would accept the argument that they should be happy to be bug-compatible with the latest upstream kernel - even more so if that latest upstream kernel is a release candidate and the problem was introduced in the commit window.
Guenter
On Mon, Sep 18, 2023 at 08:17:43AM -0700, Guenter Roeck wrote:
On 9/18/23 05:56, Greg Kroah-Hartman wrote:
On Mon, Sep 18, 2023 at 01:52:10PM +0100, Jon Hunter wrote:
Hi Greg,
On 17/09/2023 20:10, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.5.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.5.y and the diffstat can be found below.
thanks,
greg k-h
Pseudo-Shortlog of commits:
...
Sameer Pujar spujar@nvidia.com arm64: tegra: Update AHUB clock parent and rate
Unfortunately, the above change is causing a regression in one of our audio tests and we are looking into why this is.
Can we drop this from stable for now?
Is it also a problem in Linus's tree? Keeping bug-compatible is always good :)
Sorry, no, it isn't, especially in the context of at the same time suggesting that everyone should start using the most recent stable release immediately (instead of, say, selectively picking security patches).
I don't think Tegra users would be happy if their audio stopped working, and it seems unlikely that they would accept the argument that they should be happy to be bug-compatible with the latest upstream kernel - even more so if that latest upstream kernel is a release candidate and the problem was introduced in the commit window.
Very good point, I've now dropped it.
greg k-h
Hi!
Sameer Pujar spujar@nvidia.com arm64: tegra: Update AHUB clock parent and rate
Unfortunately, the above change is causing a regression in one of our audio tests and we are looking into why this is.
Can we drop this from stable for now?
Is it also a problem in Linus's tree? Keeping bug-compatible is always good :)
Please update Documentation/process/stable-kernel-rules.rst accordingly. People expect certain properties from stable tree, and of the rules written there only "It or an equivalent fix must already exist in Linus' tree (upstream)." is followed.
Best regards, Pavel
On 18/09/2023 13:52, Jon Hunter wrote:
Hi Greg,
On 17/09/2023 20:10, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.5.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.5.y and the diffstat can be found below.
thanks,
greg k-h
Pseudo-Shortlog of commits:
...
Sameer Pujar spujar@nvidia.com arm64: tegra: Update AHUB clock parent and rate
Unfortunately, the above change is causing a regression in one of our audio tests and we are looking into why this is.
Can we drop this from stable for now?
An alternative to dropping this change is to pull in the following fixes ...
https://lore.kernel.org/linux-tegra/169447691068.2390116.1051850521758046996...
Thanks Jon
On Mon, Sep 18, 2023 at 02:04:27PM +0100, Jon Hunter wrote:
On 18/09/2023 13:52, Jon Hunter wrote:
Hi Greg,
On 17/09/2023 20:10, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.5.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.5.y and the diffstat can be found below.
thanks,
greg k-h
Pseudo-Shortlog of commits:
...
Sameer Pujar spujar@nvidia.com arm64: tegra: Update AHUB clock parent and rate
Unfortunately, the above change is causing a regression in one of our audio tests and we are looking into why this is.
Can we drop this from stable for now?
An alternative to dropping this change is to pull in the following fixes ...
https://lore.kernel.org/linux-tegra/169447691068.2390116.1051850521758046996...
they aren't in Linus's tree yet, any plans on when that will happen?
And to confirm, what are the git ids of these in linux-next?
thanks,
greg k-h
On 18/09/2023 14:14, Greg Kroah-Hartman wrote:
On Mon, Sep 18, 2023 at 02:04:27PM +0100, Jon Hunter wrote:
On 18/09/2023 13:52, Jon Hunter wrote:
Hi Greg,
On 17/09/2023 20:10, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.5.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.5.y and the diffstat can be found below.
thanks,
greg k-h
Pseudo-Shortlog of commits:
...
Sameer Pujar spujar@nvidia.com arm64: tegra: Update AHUB clock parent and rate
Unfortunately, the above change is causing a regression in one of our audio tests and we are looking into why this is.
Can we drop this from stable for now?
An alternative to dropping this change is to pull in the following fixes ...
https://lore.kernel.org/linux-tegra/169447691068.2390116.1051850521758046996...
they aren't in Linus's tree yet, any plans on when that will happen?
Not clear at the moment, I see Mark has them queued up here ...
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git/log/?h=for...
And to confirm, what are the git ids of these in linux-next?
In next I see the following git ids ...
e765886249c5 ASoC: tegra: Fix redundant PLLA and PLLA_OUT0 updates f101583fa9f8 ASoC: soc-utils: Export snd_soc_dai_is_dummy() symbol
Thanks Jon
On Mon, Sep 18, 2023 at 01:52:10PM +0100, Jon Hunter wrote:
Hi Greg,
On 17/09/2023 20:10, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.5.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.5.y and the diffstat can be found below.
thanks,
greg k-h
Pseudo-Shortlog of commits:
...
Sameer Pujar spujar@nvidia.com arm64: tegra: Update AHUB clock parent and rate
Unfortunately, the above change is causing a regression in one of our audio tests and we are looking into why this is.
Can we drop this from stable for now?
Ok, now dropped, thanks all for the report.
greg k-h
On Sun, Sep 17, 2023 at 09:10:00PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.5.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.5.y and the diffstat can be found below.
thanks,
greg k-h
Tested rc1 against the Fedora build system (aarch64, ppc64le, s390x, x86_64), and boot tested x86_64. No regressions noted.
Tested-by: Justin M. Forbes jforbes@fedoraproject.org
On Mon, 18 Sept 2023 at 01:13, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.5.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.5.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 6.5.4-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-6.5.y * git commit: e5d077fb1eae958bde2a55b3065972193e501b78 * git describe: v6.5.2-1016-ge5d077fb1eae * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.5.y/build/v6.5.2-...
## Test Regressions (compared to v6.5.2-740-g7bfd1316ceae)
## Metric Regressions (compared to v6.5.2-740-g7bfd1316ceae)
## Test Fixes (compared to v6.5.2-740-g7bfd1316ceae)
## Metric Fixes (compared to v6.5.2-740-g7bfd1316ceae)
## Test result summary total: 245581, pass: 211062, fail: 3143, skip: 31023, xfail: 353
## Build Summary * arc: 10 total, 10 passed, 0 failed * arm: 278 total, 276 passed, 2 failed * arm64: 94 total, 90 passed, 4 failed * i386: 69 total, 69 passed, 0 failed * mips: 59 total, 55 passed, 4 failed * parisc: 8 total, 8 passed, 0 failed * powerpc: 75 total, 71 passed, 4 failed * riscv: 52 total, 48 passed, 4 failed * s390: 32 total, 26 passed, 6 failed * sh: 28 total, 24 passed, 4 failed * sparc: 16 total, 16 passed, 0 failed * x86_64: 81 total, 76 passed, 5 failed
## Test suites summary * boot * kselftest-android * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers-dma-buf * kselftest-efivarfs * kselftest-exec * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-filesystems-epoll * kselftest-firmware * kselftest-fpu * kselftest-ftrace * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-net-forwarding * kselftest-net-mptcp * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-user_events * kselftest-vDSO * kselftest-vm * kselftest-watchdog * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-fsx * ltp-hugetlb * ltp-io * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-pty * ltp-sched * ltp-securebits * ltp-smoke * ltp-syscalls * ltp-tracing * network-basic-tests * perf * rcutorture * v4l2-compliance
-- Linaro LKFT https://lkft.linaro.org
On Sun, Sep 17, 2023 at 09:10:00PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000. Anything received after that time might be too late.
Build results: total: 157 pass: 157 fail: 0 Qemu test results: total: 530 pass: 530 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On 9/17/2023 12:10 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.5.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.5.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
On 9/17/23 13:10, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.5.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.5.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On Sun, Sep 17, 2023 at 09:10:00PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.5.4 release. There are 285 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Tested-by: Conor Dooley conor.dooley@microchip.com
Thanks, Conor.
linux-stable-mirror@lists.linaro.org