This is the start of the stable review cycle for the 6.6.26 release. There are 252 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 10 Apr 2024 12:52:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
    https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.26-rc1....
or in the git tree and branch at:
    git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.6.26-rc1
Ard Biesheuvel ardb@kernel.org x86/efistub: Remap kernel text read-only before dropping NX attribute
Ard Biesheuvel ardb@kernel.org x86/sev: Move early startup code into .head.text section
Ard Biesheuvel ardb@kernel.org x86/sme: Move early SME kernel encryption handling into .head.text
Ard Biesheuvel ardb@kernel.org efi/libstub: Add generic support for parsing mem_encrypt=
Hou Wenlong houwenlong.hwl@antgroup.com x86/head/64: Move the __head definition to <asm/init.h>
Andrii Nakryiko andrii@kernel.org bpf: support deferring bpf_link dealloc to after RCU grace period
Andrii Nakryiko andrii@kernel.org bpf: put uprobe link's path and task in release callback
Davide Caratti dcaratti@redhat.com mptcp: don't account accept() of non-MPC client as fallback to TCP
Davide Caratti dcaratti@redhat.com mptcp: don't overwrite sock_ops in mptcp_is_tcpsk()
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: connect: fix shellcheck warnings
Sergey Shtylyov s.shtylyov@omp.ru of: module: prevent NULL pointer dereference in vsnprintf()
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "x86/mpparse: Register APIC address only once"
Andi Shyti andi.shyti@linux.intel.com drm/i915/gt: Enable only one CCS for compute workload
Andi Shyti andi.shyti@linux.intel.com drm/i915/gt: Do not generate the command streamer for all the CCS
Andi Shyti andi.shyti@linux.intel.com drm/i915/gt: Disable HW load balancing for CCS
Paulo Alcantara pc@manguebit.com smb: client: fix potential UAF in cifs_signal_cifsd_for_reconnect()
Paulo Alcantara pc@manguebit.com smb: client: fix potential UAF in smb2_is_network_name_deleted()
Paulo Alcantara pc@manguebit.com smb: client: fix potential UAF in is_valid_oplock_break()
Paulo Alcantara pc@manguebit.com smb: client: fix potential UAF in smb2_is_valid_lease_break()
Paulo Alcantara pc@manguebit.com smb: client: fix potential UAF in smb2_is_valid_oplock_break()
Paulo Alcantara pc@manguebit.com smb: client: fix potential UAF in cifs_dump_full_key()
Paulo Alcantara pc@manguebit.com smb: client: fix potential UAF in cifs_stats_proc_show()
Paulo Alcantara pc@manguebit.com smb: client: fix potential UAF in cifs_stats_proc_write()
Paulo Alcantara pc@manguebit.com smb: client: fix potential UAF in cifs_debug_files_proc_show()
Ritvik Budhiraja rbudhiraja@microsoft.com smb3: retrying on failed server close
Paulo Alcantara pc@manguebit.com smb: client: serialise cifs_construct_tcon() with cifs_mount_mutex
Paulo Alcantara pc@manguebit.com smb: client: handle DFS tcons in cifs_construct_tcon()
Stefan O'Rear sorear@fastmail.com riscv: process: Fix kernel gp leakage
Samuel Holland samuel.holland@sifive.com riscv: Fix spurious errors from __get/put_kernel_nofault
Sumanth Korikkar sumanthk@linux.ibm.com s390/entry: align system call table on 8 bytes
Edward Liaw edliaw@google.com selftests/mm: include strings.h for ffsl
David Hildenbrand david@redhat.com mm/secretmem: fix GUP-fast succeeding on secretmem folios
Mark Brown broonie@kernel.org arm64/ptrace: Use saved floating point state type to determine SVE layout
Kan Liang kan.liang@linux.intel.com perf/x86/intel/ds: Don't clear ->pebs_data_cfg for the last PEBS event
Jason A. Donenfeld Jason@zx2c4.com x86/coco: Require seeding RNG with RDRAND on CoCo systems
Borislav Petkov (AMD) bp@alien8.de x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()
David Hildenbrand david@redhat.com x86/mm/pat: fix VM_PAT handling in COW mappings
Herve Codina herve.codina@bootlin.com of: dynamic: Synchronize of_changeset_destroy() with the devlink removals
Herve Codina herve.codina@bootlin.com driver core: Introduce device_link_wait_removal()
Jens Axboe axboe@kernel.dk io_uring/kbuf: hold io_buffer_list reference over mmap
Jens Axboe axboe@kernel.dk io_uring: use private workqueue for exit work
Jens Axboe axboe@kernel.dk io_uring/kbuf: protect io_buffer_list teardown with a reference
Jens Axboe axboe@kernel.dk io_uring/kbuf: get rid of bl->is_ready
Jens Axboe axboe@kernel.dk io_uring/kbuf: get rid of lower BGID lists
I Gede Agastya Darma Laksana gedeagas22@gmail.com ALSA: hda/realtek: Update Panasonic CF-SZ6 quirk to support headset with microphone
Christoffer Sandberg cs@tuxedo.de ALSA: hda/realtek - Fix inactive headset mic jack
Namjae Jeon linkinjeon@kernel.org ksmbd: do not set SMB2_GLOBAL_CAP_ENCRYPTION for SMB 3.1.1
Namjae Jeon linkinjeon@kernel.org ksmbd: validate payload size in ipc response
Namjae Jeon linkinjeon@kernel.org ksmbd: don't send oplock break if rename fails
Kent Gibson warthog618@gmail.com gpio: cdev: fix missed label sanitizing in debounce_setup()
Bartosz Golaszewski bartosz.golaszewski@linaro.org gpio: cdev: check for NULL labels when sanitizing them for irqs
Borislav Petkov (AMD) bp@alien8.de x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk
Jesse Brandeburg jesse.brandeburg@intel.com ice: fix typo in assignment
Jeff Layton jlayton@kernel.org nfsd: hold a lighter-weight client reference over CB_RECALL_ANY
Alexandre Ghiti alexghiti@rivosinc.com riscv: Disable preemption when using patch_map()
Chuck Lever chuck.lever@oracle.com SUNRPC: Fix a slow server-side memory leak with RPC-over-TCP
Vijendar Mukunda Vijendar.Mukunda@amd.com ASoC: SOF: amd: fix for false dsp interrupts
Arnd Bergmann arnd@arndb.de ata: sata_mv: Fix PCI device ID table declaration compilation warning
Thomas Richter tmricht@linux.ibm.com s390/pai: fix sampling event removal for PMU device driver
Thomas Richter tmricht@linux.ibm.com s390/pai: rework paiXXX_start and paiXXX_stop functions
Thomas Richter tmricht@linux.ibm.com s390/pai: cleanup event initialization
Thomas Richter tmricht@linux.ibm.com s390/pai_crypto: remove per-cpu variable assignement in event initialization
Thomas Richter tmricht@linux.ibm.com s390/pai: initialize event count once at initialization
Huai-Yuan Liu qq810974084@gmail.com spi: mchp-pci1xxx: Fix a possible null pointer dereference in pci1xxx_spi_probe
David Howells dhowells@redhat.com cifs: Fix caching to try to do open O_WRONLY as rdwr on server
Oswald Buddenhagen oswald.buddenhagen@gmx.de Revert "ALSA: emu10k1: fix synthesizer sample playback position and caching"
Li Nan linan122@huawei.com scsi: sd: Unregister device if device_add_disk() failed in sd_probe()
Arnd Bergmann arnd@arndb.de scsi: mylex: Fix sysfs buffer lengths
Arnd Bergmann arnd@arndb.de ata: sata_sx4: fix pdc20621_get_from_dimm() on 64-bit
Richard Fitzgerald rf@opensource.cirrus.com regmap: maple: Fix uninitialized symbol 'ret' warnings
Vijendar Mukunda Vijendar.Mukunda@amd.com ASoC: amd: acp: fix for acp_init function error handling
Jaewon Kim jaewon02.kim@samsung.com spi: s3c64xx: Use DMA mode from fifo size
Tudor Ambarus tudor.ambarus@linaro.org spi: s3c64xx: determine the fifo depth only once
Tudor Ambarus tudor.ambarus@linaro.org spi: s3c64xx: allow full FIFO masks
Tudor Ambarus tudor.ambarus@linaro.org spi: s3c64xx: define a magic value
Tudor Ambarus tudor.ambarus@linaro.org spi: s3c64xx: remove else after return
Tudor Ambarus tudor.ambarus@linaro.org spi: s3c64xx: explicitly include <linux/bits.h>
Tudor Ambarus tudor.ambarus@linaro.org spi: s3c64xx: sort headers alphabetically
Sam Protsenko semen.protsenko@linaro.org spi: s3c64xx: Extract FIFO depth calculation to a dedicated macro
Stephen Lee slee08177@gmail.com ASoC: ops: Fix wraparound for mask in snd_soc_get_volsw
Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com ASoC: rt722-sdca-sdw: fix locking sequence
Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com ASoC: rt712-sdca-sdw: fix locking sequence
Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com ASoC: rt711-sdw: fix locking sequence
Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com ASoC: rt711-sdca: fix locking sequence
Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com ASoC: rt5682-sdw: fix locking sequence
Rob Clark robdclark@chromium.org drm/prime: Unbreak virtgpu dma-buf export
Dave Airlie airlied@redhat.com nouveau/uvmm: fix addr/range calcs for remap operations
Christian Hewitt christianshewitt@gmail.com drm/panfrost: fix power transition timeout warnings
Simon Trimmer simont@opensource.cirrus.com ALSA: hda: cs35l56: Add ACPI device match tables
Richard Fitzgerald rf@opensource.cirrus.com regmap: maple: Fix cache corruption in regcache_maple_drop()
Victor Isaev victor@torrio.net RISC-V: Update AT_VECTOR_SIZE_ARCH for new AT_MINSIGSTKSZ
Pu Lehui pulehui@huawei.com drivers/perf: riscv: Disable PERF_SAMPLE_BRANCH_* while not supported
Richard Fitzgerald rf@opensource.cirrus.com ASoC: wm_adsp: Fix missing mutex_lock in wm_adsp_write_ctl()
Dominique Martinet asmadeus@codewreck.org 9p: Fix read/write debug statements to report server reply
Jann Horn jannh@google.com fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()
Ashish Kalra ashish.kalra@amd.com KVM: SVM: Add support for allowing zero SEV ASIDs
Sean Christopherson seanjc@google.com KVM: SVM: Use unsigned integers when dealing with ASIDs
Paul Barker paul.barker.ct@bp.renesas.com net: ravb: Always update error counters
Paul Barker paul.barker.ct@bp.renesas.com net: ravb: Always process TX descriptor ring
Claudiu Beznea claudiu.beznea.uj@bp.renesas.com net: ravb: Let IP-specific receive function to interrogate descriptors
Vitaly Lifshits vitaly.lifshits@intel.com e1000e: move force SMBUS from enable ulp function to avoid PHY loss issue
Vitaly Lifshits vitaly.lifshits@intel.com e1000e: Minor flow correction in e1000_shutdown function
Vitaly Lifshits vitaly.lifshits@intel.com e1000e: Workaround for sporadic MDI error on Meteor Lake systems
Jesse Brandeburg jesse.brandeburg@intel.com intel: legacy: field get conversion
Jesse Brandeburg jesse.brandeburg@intel.com intel: add bit macro includes where needed
Ivan Vecera ivecera@redhat.com i40e: Remove circular header dependencies and fix headers
Ivan Vecera ivecera@redhat.com i40e: Split i40e_osdep.h
Ivan Vecera ivecera@redhat.com i40e: Move memory allocation structures to i40e_alloc.h
Ivan Vecera ivecera@redhat.com i40e: Simplify memory allocation functions
Ivan Vecera ivecera@redhat.com virtchnl: Add header dependencies
Ivan Vecera ivecera@redhat.com i40e: Refactor I40E_MDIO_CLAUSE* macros
Ivan Vecera ivecera@redhat.com i40e: Remove back pointer from i40e_hw structure
Ivan Vecera ivecera@redhat.com i40e: Enforce software interrupt during busy-poll exit
Ivan Vecera ivecera@redhat.com i40e: Remove _t suffix from enum type names
Mario Limonciello mario.limonciello@amd.com drm/amd: Flush GFXOFF requests in prepare stage
Mario Limonciello mario.limonciello@amd.com drm/amd: Add concept of running prepare_suspend() sequence for IP blocks
Mario Limonciello mario.limonciello@amd.com drm/amd: Evict resources during PM ops prepare() callback
Chris Park chris.park@amd.com drm/amd/display: Prevent crash when disable stream
Dmytro Laktyushkin dmytro.laktyushkin@amd.com drm/amd/display: Fix DPSTREAM CLK on and off sequence
Christian A. Ehrhardt lk@c--e.de usb: typec: ucsi: Check for notifications after init
Krishna Kurapati quic_kriskura@quicinc.com usb: typec: ucsi: Fix race between typec_switch and role_switch
Alexander Wetzel Alexander@wetzel-home.de scsi: sg: Avoid sg device teardown race
Aleksandr Loktionov aleksandr.loktionov@intel.com i40e: fix vf may be used uninitialized in this function warning
Aleksandr Loktionov aleksandr.loktionov@intel.com i40e: fix i40e_count_filters() to count only active/new filters
Aleksandr Mishin amishin@t-argos.ru octeontx2-af: Add array index check
Su Hui suhui@nfschina.com octeontx2-pf: check negative error code in otx2_open()
Hariprasad Kelam hkelam@marvell.com octeontx2-af: Fix issue with loading coalesced KPU profiles
Antoine Tenart atenart@kernel.org udp: prevent local UDP tunnel packets from being GROed
Antoine Tenart atenart@kernel.org udp: do not transition UDP GRO fraglist partial checksums to unnecessary
Antoine Tenart atenart@kernel.org udp: do not accept non-tunnel GSO skbs landing in a tunnel
Atlas Yu atlas.yu@canonical.com r8169: skip DASH fw status checks when DASH is disabled
David Thompson davthompson@nvidia.com mlxbf_gige: stop interface during shutdown
Kuniyuki Iwashima kuniyu@amazon.com ipv6: Fix infinite recursion in fib6_dump_done().
Duoming Zhou duoming@zju.edu.cn ax25: fix use-after-free bugs caused by ax25_ds_del_timer
Kuniyuki Iwashima kuniyu@amazon.com tcp: Fix bind() regression for v6-only wildcard and v4(-mapped-v6) non-wildcard addresses.
Jakub Kicinski kuba@kernel.org selftests: reuseaddr_conflict: add missing new line at the end of the output
Eric Dumazet edumazet@google.com erspan: make sure erspan_base_hdr is present in skb->head
Ivan Vecera ivecera@redhat.com i40e: Fix VF MAC filter removal
Petr Oros poros@redhat.com ice: fix enabling RX VLAN filtering
Antoine Tenart atenart@kernel.org gro: fix ownership transfer
Antoine Tenart atenart@kernel.org selftests: net: gro fwd: update vxlan GRO test expectations
Michael Krummsdorf michael.krummsdorf@tq-group.com net: dsa: mv88e6xxx: fix usable ports on 88e6020
Aleksandr Mishin amishin@t-argos.ru net: phy: micrel: Fix potential null pointer dereference
Wei Fang wei.fang@nxp.com net: fec: Set mac_managed_pm during probe
Duanqiang Wen duanqiangwen@net-swift.com net: txgbe: fix i2c dev name cannot match clkdev
Horatiu Vultur horatiu.vultur@microchip.com net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping
Piotr Wejman piotrwejman90@gmail.com net: stmmac: fix rx queue priority assignment
Eric Dumazet edumazet@google.com net/sched: fix lockdep splat in qdisc_tree_reduce_backlog()
Christophe JAILLET christophe.jaillet@wanadoo.fr net: dsa: sja1105: Fix parameters order in sja1110_pcs_mdio_write_c45()
Eric Dumazet edumazet@google.com net/sched: act_skbmod: prevent kernel-infoleak
Will Deacon will@kernel.org KVM: arm64: Ensure target address is granule-aligned for range TLBI
Borislav Petkov (AMD) bp@alien8.de x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO
Jakub Sitnicki jakub@cloudflare.com bpf, sockmap: Prevent lock inversion deadlock in map delete elem
Christophe JAILLET christophe.jaillet@wanadoo.fr vboxsf: Avoid an spurious warning if load_nls_xxx() fails
Eric Dumazet edumazet@google.com netfilter: validate user input for expected length
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: discard table flag update with pending basechain deletion
Ziyang Xuan william.xuanziyang@huawei.com netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: flush pending destroy work before exit_net release
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: reject new basechain after table flag update
Borislav Petkov (AMD) bp@alien8.de x86/bugs: Fix the SRSO mitigation on Zen3/4
Josh Poimboeuf jpoimboe@kernel.org x86/nospec: Refactor UNTRAIN_RET[_*]
Josh Poimboeuf jpoimboe@kernel.org x86/srso: Disentangle rethunk-dependent options
Josh Poimboeuf jpoimboe@kernel.org x86/srso: Improve i-cache locality for alias mitigation
Marco Pinna marco.pinn95@gmail.com vsock/virtio: fix packet delivery to tap device
Haiyang Zhang haiyangz@microsoft.com net: mana: Fix Rx DMA datasize and skb_over_panic
Jose Ignacio Tornos Martinez jtornosm@redhat.com net: usb: ax88179_178a: avoid the interface always configured as random address
Mahmoud Adam mngyadam@amazon.com net/rds: fix possible cp null dereference
Jesper Dangaard Brouer hawk@kernel.org xen-netfront: Add missing skb_mark_for_recycle
Geliang Tang tanggeliang@kylinos.cn selftests: mptcp: join: fix dev in check_endpoint
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: release batch on table validation from abort path
Bastien Nocera hadess@hadess.net Bluetooth: Fix TOCTOU in HCI debugfs implementation
Hui Wang hui.wang@canonical.com Bluetooth: hci_event: set the conn encrypted before conn establishes
Johan Hovold johan+linaro@kernel.org Bluetooth: add quirk for broken address properties
Johan Hovold johan+linaro@kernel.org Bluetooth: qca: fix device-address endianness
Johan Hovold johan+linaro@kernel.org arm64: dts: qcom: sc7180-trogdor: mark bluetooth address as broken
Johan Hovold johan+linaro@kernel.org Revert "Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT"
Uros Bizjak ubizjak@gmail.com x86/bpf: Fix IP after emitting call depth accounting
Sean Christopherson seanjc@google.com x86/cpufeatures: Add CPUID_LNX_5 to track recently added Linux-defined word
Heiner Kallweit hkallweit1@gmail.com r8169: fix issue caused by buggy BIOS on certain boards with RTL8168d
Christian Göttsche cgzones@googlemail.com selinux: avoid dereference of garbage after mount failure
Oliver Upton oliver.upton@linux.dev KVM: arm64: Fix host-programmed guest events in nVHE
Anup Patel apatel@ventanamicro.com RISC-V: KVM: Fix APLIC in_clrip[x] read emulation
Anup Patel apatel@ventanamicro.com RISC-V: KVM: Fix APLIC setipnum_le/be write emulation
Bartosz Golaszewski bartosz.golaszewski@linaro.org gpio: cdev: sanitize the label before requesting the interrupt
Masahiro Yamada masahiroy@kernel.org modpost: do not make find_tosym() return NULL
Jack Brennen jbrennen@google.com modpost: Optimize symbol search from linear to binary search
Sandipan Das sandipan.das@amd.com perf/x86/amd/lbr: Use freeze based on availability
Sandipan Das sandipan.das@amd.com x86/cpufeatures: Add new word for scattered features
Sandipan Das sandipan.das@amd.com perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later
Borislav Petkov (AMD) bp@alien8.de x86/CPU/AMD: Add X86_FEATURE_ZEN1
Borislav Petkov (AMD) bp@alien8.de x86/CPU/AMD: Get rid of amd_erratum_1054[]
Borislav Petkov (AMD) bp@alien8.de x86/CPU/AMD: Move the DIV0 bug detection to the Zen1 init function
Borislav Petkov (AMD) bp@alien8.de x86/CPU/AMD: Move Zenbleed check to the Zen2 init function
Borislav Petkov (AMD) bp@alien8.de x86/CPU/AMD: Move erratum 1076 fix into the Zen1 init function
Borislav Petkov (AMD) bp@alien8.de x86/CPU/AMD: Carve out the erratum 1386 fix
Borislav Petkov (AMD) bp@alien8.de x86/CPU/AMD: Add ZenX generations flags
Filipe Manana fdmanana@suse.com btrfs: fix race when detecting delalloc ranges during fiemap
Filipe Manana fdmanana@suse.com btrfs: ensure fiemap doesn't race with writes when FIEMAP_FLAG_SYNC is given
Ingo Molnar mingo@kernel.org Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
Peter Xu peterx@redhat.com mm/treewide: replace pud_large() with pud_leaf()
Arnd Bergmann arnd@arndb.de dm integrity: fix out-of-range warning
Tejas Upadhyay tejas.upadhyay@intel.com drm/i915/mtl: Update workaround 14018575942
Matt Roper matthew.d.roper@intel.com drm/i915/xelpg: Extend some workarounds/tuning to gfx version 12.74
Tejas Upadhyay tejas.upadhyay@intel.com drm/i915/mtl: Update workaround 14016712196
Matt Roper matthew.d.roper@intel.com drm/i915: Replace several IS_METEORLAKE with proper IP version checks
Matt Roper matthew.d.roper@intel.com drm/i915: Eliminate IS_MTL_GRAPHICS_STEP
Matt Roper matthew.d.roper@intel.com drm/i915/xelpg: Call Xe_LPG workaround functions based on IP version
Matt Roper matthew.d.roper@intel.com drm/i915: Consolidate condition for Wa_22011802037
Matt Roper matthew.d.roper@intel.com drm/i915: Tidy workaround definitions
Matt Roper matthew.d.roper@intel.com drm/i915/dg2: Drop pre-production GT workarounds
Florian Westphal fw@strlen.de inet: inet_defrag: prevent sk release while still in use
Hariprasad Kelam hkelam@marvell.com Octeontx2-af: fix pause frame configuration in GMP mode
Raju Lakkaraju Raju.Lakkaraju@microchip.com net: lan743x: Add set RFE read fifo threshold for PCI1x1x chips
Justin Chen justin.chen@broadcom.com net: bcmasp: Bring up unimac after PHY link up
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: skip netdev hook unregistration if table is dormant
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: reject table flag and netdev basechain updates
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: reject destroy command to remove basechain hooks
David Howells dhowells@redhat.com cifs: Fix duplicate fscache cookie warnings
Andrei Matei andreimatei1@gmail.com bpf: Protect against int overflow for stack access size
David Thompson davthompson@nvidia.com mlxbf_gige: call request_irq() after NAPI initialized
Sabrina Dubroca sd@queasysnail.net tls: get psock ref after taking rxlock to avoid leak
Sabrina Dubroca sd@queasysnail.net tls: adjust recv return with async crypto and failed copy to userspace
Sabrina Dubroca sd@queasysnail.net tls: recv: process_rx_list shouldn't use an offset with kvec
Jian Shen shenjian15@huawei.com net: hns3: mark unexcuted loopback test result as UNEXECUTED
Yonglong Liu liuyonglong@huawei.com net: hns3: fix kernel crash when devlink reload during pf initialization
Jie Wang wangjie125@huawei.com net: hns3: fix index limit to support all queue stats
Nikita Kiryushin kiryushin@ancud.ru ACPICA: debugger: check status of acpi_evaluate_object() in acpi_db_walk_for_fields()
Ido Schimmel idosch@nvidia.com selftests: vxlan_mdb: Fix failures with old libnet
Bjørn Mork bjorn@mork.no net: wwan: t7xx: Split 64bit accesses to fix alignment issues
Eric Dumazet edumazet@google.com tcp: properly terminate timers for kernel sockets
Ravi Gunasekaran r-gunasekaran@ti.com net: hsr: hsr_slave: Fix the promiscuous mode in offload mode
Alexandra Winter wintera@linux.ibm.com s390/qeth: handle deferred cc1
Kurt Kanzenbach kurt@linutronix.de igc: Remove stale comment about Tx timestamping
Przemek Kitszel przemyslaw.kitszel@intel.com ixgbe: avoid sleeping allocation in ixgbe_ipsec_vf_add_sa()
Jesse Brandeburg jesse.brandeburg@intel.com ice: fix memory corruption bug with suspend and rebuild
Michal Swiatkowski michal.swiatkowski@linux.intel.com ice: realloc VSI stats arrays
Steven Zou steven.zou@intel.com ice: Refactor FW data type and fix bitmap casting issue
Simon Trimmer simont@opensource.cirrus.com ALSA: hda: cs35l56: Set the init_done flag before component_add()
Benjamin Berg benjamin.berg@intel.com wifi: iwlwifi: mvm: include link ID when releasing frames
Emmanuel Grumbach emmanuel.grumbach@intel.com wifi: iwlwifi: disable multi rx queue for 9000
Johannes Berg johannes.berg@intel.com wifi: iwlwifi: mvm: rfi: fix potential response leaks
David Thompson davthompson@nvidia.com mlxbf_gige: stop PHY during open() error paths
Jakub Kicinski kuba@kernel.org tools: ynl: fix setting presence bits in simple nests
Ryosuke Yasuoka ryasuoka@redhat.com nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet
Artem Savkov asavkov@redhat.com arm64: bpf: fix 32bit unconditional bswap
Pavel Sakharov p.sakharov@ispras.ru dma-buf: Fix NULL pointer dereference in sanitycheck()
Puranjay Mohan puranjay12@gmail.com bpf, arm64: fix bug in BPF_LDX_MEMSX
Ilya Leoshkevich iii@linux.ibm.com s390/bpf: Fix bpf_plt pointer arithmetic
Hangbin Liu liuhangbin@gmail.com scripts/bpf_doc: Use silent mode when exec make cmd
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915: Pre-populate the cursor physical dma address
Maarten Lankhorst maarten.lankhorst@linux.intel.com drm/i915/display: Use i915_gem_object_get_dma_address to get dma address
-------------
Diffstat:
Makefile | 4 +- arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi | 2 + arch/arm64/kernel/ptrace.c | 5 +- arch/arm64/kvm/hyp/pgtable.c | 11 +- arch/arm64/net/bpf_jit_comp.c | 4 +- arch/powerpc/mm/book3s64/pgtable.c | 2 +- arch/riscv/include/asm/uaccess.h | 4 +- arch/riscv/include/uapi/asm/auxvec.h | 2 +- arch/riscv/kernel/patch.c | 8 + arch/riscv/kernel/process.c | 3 - arch/riscv/kvm/aia_aplic.c | 37 +- arch/s390/boot/vmem.c | 2 +- arch/s390/include/asm/pgtable.h | 4 +- arch/s390/kernel/entry.S | 1 + arch/s390/kernel/perf_pai_crypto.c | 48 +-- arch/s390/kernel/perf_pai_ext.c | 43 +-- arch/s390/mm/gmap.c | 2 +- arch/s390/mm/hugetlbpage.c | 4 +- arch/s390/mm/pageattr.c | 2 +- arch/s390/mm/pgtable.c | 2 +- arch/s390/mm/vmem.c | 6 +- arch/s390/net/bpf_jit_comp.c | 46 +-- arch/sparc/mm/init_64.c | 2 +- arch/x86/boot/compressed/Makefile | 2 +- arch/x86/boot/compressed/misc.c | 1 + arch/x86/boot/compressed/sev.c | 3 + arch/x86/coco/core.c | 41 ++ arch/x86/events/amd/core.c | 24 +- arch/x86/events/amd/lbr.c | 16 +- arch/x86/events/intel/ds.c | 8 +- arch/x86/include/asm/asm-prototypes.h | 1 + arch/x86/include/asm/boot.h | 1 + arch/x86/include/asm/coco.h | 2 + arch/x86/include/asm/cpufeature.h | 8 +- arch/x86/include/asm/cpufeatures.h | 16 +- arch/x86/include/asm/disabled-features.h | 3 +- arch/x86/include/asm/init.h | 2 + arch/x86/include/asm/mem_encrypt.h | 8 +- arch/x86/include/asm/nospec-branch.h | 71 ++-- arch/x86/include/asm/required-features.h | 3 +- arch/x86/include/asm/sev.h | 10 +- arch/x86/kernel/cpu/amd.c | 129 +++++-- arch/x86/kernel/cpu/bugs.c | 5 +- arch/x86/kernel/cpu/mce/core.c | 4 +- arch/x86/kernel/cpu/scattered.c | 1 + arch/x86/kernel/head64.c | 3 +- arch/x86/kernel/mpparse.c | 10 +- arch/x86/kernel/setup.c | 2 + arch/x86/kernel/sev-shared.c | 23 +- arch/x86/kernel/sev.c | 14 +- arch/x86/kernel/vmlinux.lds.S | 7 +- arch/x86/kvm/mmu/mmu.c | 2 +- arch/x86/kvm/reverse_cpuid.h | 2 + arch/x86/kvm/svm/sev.c | 45 ++- arch/x86/kvm/trace.h | 10 +- arch/x86/lib/retpoline.S | 165 ++++---- arch/x86/mm/fault.c | 4 +- arch/x86/mm/ident_map.c | 23 +- arch/x86/mm/init_64.c | 4 +- arch/x86/mm/kasan_init_64.c | 2 +- arch/x86/mm/mem_encrypt_identity.c | 44 +-- arch/x86/mm/pat/memtype.c | 49 ++- arch/x86/mm/pat/set_memory.c | 6 +- arch/x86/mm/pgtable.c | 2 +- arch/x86/mm/pti.c | 2 +- arch/x86/net/bpf_jit_comp.c | 2 +- arch/x86/power/hibernate.c | 2 +- arch/x86/xen/mmu_pv.c | 4 +- drivers/acpi/acpica/dbnames.c | 8 +- drivers/ata/sata_mv.c | 63 ++- drivers/ata/sata_sx4.c | 6 +- drivers/base/core.c | 26 +- drivers/base/regmap/regcache-maple.c | 6 +- drivers/bluetooth/btqca.c | 8 +- drivers/bluetooth/hci_qca.c | 19 +- drivers/dma-buf/st-dma-fence-chain.c | 6 +- drivers/firmware/efi/libstub/efi-stub-helper.c | 8 + drivers/firmware/efi/libstub/efistub.h | 2 +- drivers/firmware/efi/libstub/x86-stub.c | 11 +- drivers/gpio/gpiolib-cdev.c | 58 ++- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 43 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 10 +- .../amd/display/dc/dce110/dce110_hw_sequencer.c | 3 +- drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 11 +- drivers/gpu/drm/amd/include/amd_shared.h | 1 + drivers/gpu/drm/drm_prime.c | 7 +- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/display/intel_cursor.c | 6 +- drivers/gpu/drm/i915/display/intel_display_types.h | 1 + drivers/gpu/drm/i915/display/intel_fb_pin.c | 10 + drivers/gpu/drm/i915/display/skl_universal_plane.c | 5 +- drivers/gpu/drm/i915/gem/i915_gem_create.c | 4 +- 
drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 10 +- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 21 +- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- .../gpu/drm/i915/gt/intel_execlists_submission.c | 4 +- drivers/gpu/drm/i915/gt/intel_gt.h | 31 ++ drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 39 ++ drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 13 + drivers/gpu/drm/i915/gt/intel_gt_mcr.c | 7 +- drivers/gpu/drm/i915/gt/intel_gt_regs.h | 6 + drivers/gpu/drm/i915/gt/intel_lrc.c | 38 +- drivers/gpu/drm/i915/gt/intel_mocs.c | 23 +- drivers/gpu/drm/i915/gt/intel_rc6.c | 6 +- drivers/gpu/drm/i915/gt/intel_reset.c | 20 +- drivers/gpu/drm/i915/gt/intel_reset.h | 2 + drivers/gpu/drm/i915/gt/intel_rps.c | 2 +- drivers/gpu/drm/i915/gt/intel_workarounds.c | 422 +++++++-------------- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 26 +- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 6 +- drivers/gpu/drm/i915/i915_debugfs.c | 2 +- drivers/gpu/drm/i915/i915_drv.h | 4 - drivers/gpu/drm/i915/i915_perf.c | 11 +- drivers/gpu/drm/i915/intel_clock_gating.c | 8 - drivers/gpu/drm/nouveau/nouveau_uvmm.c | 6 +- drivers/gpu/drm/panfrost/panfrost_gpu.c | 6 +- drivers/md/dm-integrity.c | 2 +- drivers/net/dsa/mv88e6xxx/chip.c | 6 +- drivers/net/dsa/sja1105/sja1105_mdio.c | 2 +- drivers/net/ethernet/broadcom/asp2/bcmasp_intf.c | 28 +- drivers/net/ethernet/freescale/fec_main.c | 11 +- .../hns3/hns3_common/hclge_comm_tqp_stats.c | 2 +- drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 19 +- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 4 + drivers/net/ethernet/intel/e1000/e1000_hw.c | 46 +-- drivers/net/ethernet/intel/e1000e/80003es2lan.c | 3 +- drivers/net/ethernet/intel/e1000e/82571.c | 3 +- drivers/net/ethernet/intel/e1000e/ethtool.c | 7 +- drivers/net/ethernet/intel/e1000e/hw.h | 2 + drivers/net/ethernet/intel/e1000e/ich8lan.c | 56 +-- drivers/net/ethernet/intel/e1000e/mac.c | 2 +- drivers/net/ethernet/intel/e1000e/netdev.c | 35 +- drivers/net/ethernet/intel/e1000e/phy.c | 191 ++++++---- drivers/net/ethernet/intel/e1000e/phy.h | 2 + drivers/net/ethernet/intel/fm10k/fm10k_pf.c | 4 +- drivers/net/ethernet/intel/fm10k/fm10k_vf.c | 10 +- drivers/net/ethernet/intel/i40e/i40e.h | 63 ++- drivers/net/ethernet/intel/i40e/i40e_adminq.c | 8 +- drivers/net/ethernet/intel/i40e/i40e_adminq.h | 3 +- drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h | 2 + drivers/net/ethernet/intel/i40e/i40e_alloc.h | 24 +- drivers/net/ethernet/intel/i40e/i40e_client.c | 1 - drivers/net/ethernet/intel/i40e/i40e_common.c | 12 +- drivers/net/ethernet/intel/i40e/i40e_dcb.c | 4 +- drivers/net/ethernet/intel/i40e/i40e_dcb_nl.c | 2 +- drivers/net/ethernet/intel/i40e/i40e_ddp.c | 2 +- drivers/net/ethernet/intel/i40e/i40e_debug.h | 47 +++ drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 3 +- drivers/net/ethernet/intel/i40e/i40e_diag.h | 5 +- drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 3 +- drivers/net/ethernet/intel/i40e/i40e_hmc.c | 16 +- drivers/net/ethernet/intel/i40e/i40e_hmc.h | 4 + drivers/net/ethernet/intel/i40e/i40e_io.h | 16 + drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c | 9 +- drivers/net/ethernet/intel/i40e/i40e_lan_hmc.h | 2 + drivers/net/ethernet/intel/i40e/i40e_main.c | 70 ++-- drivers/net/ethernet/intel/i40e/i40e_nvm.c | 3 + drivers/net/ethernet/intel/i40e/i40e_osdep.h | 59 --- drivers/net/ethernet/intel/i40e/i40e_prototype.h | 4 +- drivers/net/ethernet/intel/i40e/i40e_ptp.c | 9 +- drivers/net/ethernet/intel/i40e/i40e_register.h | 5 + drivers/net/ethernet/intel/i40e/i40e_txrx.c | 89 +++-- drivers/net/ethernet/intel/i40e/i40e_txrx.h | 
6 +- drivers/net/ethernet/intel/i40e/i40e_txrx_common.h | 2 + drivers/net/ethernet/intel/i40e/i40e_type.h | 54 +-- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 47 +-- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h | 4 +- drivers/net/ethernet/intel/i40e/i40e_xsk.c | 4 - drivers/net/ethernet/intel/i40e/i40e_xsk.h | 4 + drivers/net/ethernet/intel/iavf/iavf_common.c | 3 +- drivers/net/ethernet/intel/iavf/iavf_ethtool.c | 5 +- drivers/net/ethernet/intel/iavf/iavf_fdir.c | 1 + drivers/net/ethernet/intel/iavf/iavf_txrx.c | 1 + drivers/net/ethernet/intel/ice/ice_adminq_cmd.h | 3 +- drivers/net/ethernet/intel/ice/ice_lag.c | 4 +- drivers/net/ethernet/intel/ice/ice_lib.c | 74 ++-- drivers/net/ethernet/intel/ice/ice_switch.c | 24 +- drivers/net/ethernet/intel/ice/ice_switch.h | 4 +- .../net/ethernet/intel/ice/ice_vf_vsi_vlan_ops.c | 18 +- drivers/net/ethernet/intel/igb/e1000_82575.c | 29 +- drivers/net/ethernet/intel/igb/e1000_i210.c | 19 +- drivers/net/ethernet/intel/igb/e1000_mac.c | 2 +- drivers/net/ethernet/intel/igb/e1000_nvm.c | 18 +- drivers/net/ethernet/intel/igb/e1000_phy.c | 13 +- drivers/net/ethernet/intel/igb/igb_ethtool.c | 8 +- drivers/net/ethernet/intel/igb/igb_main.c | 4 +- drivers/net/ethernet/intel/igbvf/mbx.c | 1 + drivers/net/ethernet/intel/igbvf/netdev.c | 33 +- drivers/net/ethernet/intel/igc/igc_i225.c | 1 + drivers/net/ethernet/intel/igc/igc_main.c | 4 - drivers/net/ethernet/intel/igc/igc_phy.c | 1 + drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 30 +- drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 16 +- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +- drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c | 8 +- drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 8 +- drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c | 8 +- drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 19 +- drivers/net/ethernet/marvell/octeontx2/af/cgx.c | 5 + .../net/ethernet/marvell/octeontx2/af/rvu_cgx.c | 2 + .../net/ethernet/marvell/octeontx2/af/rvu_npc.c | 2 +- .../net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 2 +- .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c | 31 +- drivers/net/ethernet/microchip/lan743x_main.c | 18 + drivers/net/ethernet/microchip/lan743x_main.h | 4 + drivers/net/ethernet/microsoft/mana/mana_en.c | 2 +- drivers/net/ethernet/realtek/r8169_main.c | 40 +- drivers/net/ethernet/renesas/ravb_main.c | 33 +- drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 40 +- .../net/ethernet/stmicro/stmmac/dwxgmac2_core.c | 38 +- drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c | 8 +- drivers/net/phy/micrel.c | 31 +- drivers/net/usb/ax88179_178a.c | 2 + drivers/net/wireless/intel/iwlwifi/iwl-trans.h | 2 +- drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 4 +- drivers/net/wireless/intel/iwlwifi/mvm/rfi.c | 8 +- drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c | 31 +- drivers/net/wwan/t7xx/t7xx_cldma.c | 4 +- drivers/net/wwan/t7xx/t7xx_hif_cldma.c | 9 +- drivers/net/wwan/t7xx/t7xx_pcie_mac.c | 8 +- drivers/net/xen-netfront.c | 1 + drivers/of/dynamic.c | 12 + drivers/of/module.c | 8 + drivers/perf/riscv_pmu.c | 4 + drivers/s390/net/qeth_core_main.c | 38 +- drivers/scsi/myrb.c | 20 +- drivers/scsi/myrs.c | 24 +- drivers/scsi/sd.c | 2 +- drivers/scsi/sg.c | 4 +- drivers/spi/spi-pci1xxxx.c | 2 + drivers/spi/spi-s3c64xx.c | 80 ++-- drivers/usb/typec/ucsi/ucsi.c | 10 +- drivers/usb/typec/ucsi/ucsi_glink.c | 14 + fs/btrfs/extent_io.c | 208 +++++++--- fs/btrfs/inode.c | 22 +- fs/nfsd/nfs4state.c | 7 +- fs/pipe.c | 17 +- fs/smb/client/cached_dir.c | 6 +- fs/smb/client/cifs_debug.c | 6 + 
fs/smb/client/cifsfs.c | 11 + fs/smb/client/cifsglob.h | 17 +- fs/smb/client/connect.c | 45 ++- fs/smb/client/dir.c | 15 + fs/smb/client/file.c | 111 +++++- fs/smb/client/fs_context.c | 6 +- fs/smb/client/fs_context.h | 12 + fs/smb/client/fscache.c | 16 +- fs/smb/client/fscache.h | 6 + fs/smb/client/inode.c | 2 + fs/smb/client/ioctl.c | 6 +- fs/smb/client/misc.c | 2 + fs/smb/client/smb1ops.c | 4 +- fs/smb/client/smb2misc.c | 4 + fs/smb/client/smb2ops.c | 11 +- fs/smb/client/smb2pdu.c | 2 +- fs/smb/server/ksmbd_netlink.h | 3 +- fs/smb/server/mgmt/share_config.c | 7 +- fs/smb/server/smb2ops.c | 10 +- fs/smb/server/smb2pdu.c | 3 +- fs/smb/server/transport_ipc.c | 37 ++ fs/vboxsf/super.c | 3 +- include/kvm/arm_pmu.h | 2 +- include/linux/avf/virtchnl.h | 5 + include/linux/bpf.h | 16 +- include/linux/device.h | 1 + include/linux/io_uring_types.h | 1 - include/linux/secretmem.h | 4 +- include/linux/skbuff.h | 7 +- include/linux/udp.h | 28 ++ include/net/bluetooth/hci.h | 9 + include/net/inet_connection_sock.h | 1 + include/net/mana/mana.h | 1 - include/net/sock.h | 7 + io_uring/io_uring.c | 18 +- io_uring/kbuf.c | 116 ++---- io_uring/kbuf.h | 8 +- kernel/bpf/syscall.c | 35 +- kernel/bpf/verifier.c | 5 + kernel/trace/bpf_trace.c | 10 +- mm/memory.c | 4 + net/9p/client.c | 10 +- net/ax25/ax25_dev.c | 2 +- net/bluetooth/hci_debugfs.c | 64 ++-- net/bluetooth/hci_event.c | 25 ++ net/bluetooth/hci_sync.c | 5 +- net/bridge/netfilter/ebtables.c | 6 + net/core/gro.c | 3 +- net/core/sock_map.c | 6 + net/hsr/hsr_slave.c | 3 +- net/ipv4/inet_connection_sock.c | 33 +- net/ipv4/inet_fragment.c | 70 +++- net/ipv4/ip_fragment.c | 2 +- net/ipv4/ip_gre.c | 5 + net/ipv4/netfilter/arp_tables.c | 4 + net/ipv4/netfilter/ip_tables.c | 4 + net/ipv4/tcp.c | 2 + net/ipv4/udp.c | 7 + net/ipv4/udp_offload.c | 23 +- net/ipv6/ip6_fib.c | 14 +- net/ipv6/ip6_gre.c | 3 + net/ipv6/netfilter/ip6_tables.c | 4 + net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +- net/ipv6/udp.c | 2 +- net/ipv6/udp_offload.c | 8 +- net/mptcp/protocol.c | 106 ++---- net/mptcp/subflow.c | 2 + net/netfilter/nf_tables_api.c | 92 ++++- net/nfc/nci/core.c | 5 + net/rds/rdma.c | 2 +- net/sched/act_skbmod.c | 10 +- net/sched/sch_api.c | 2 +- net/sunrpc/svcsock.c | 10 +- net/tls/tls_sw.c | 7 +- net/vmw_vsock/virtio_transport.c | 3 +- scripts/bpf_doc.py | 4 +- scripts/mod/Makefile | 4 +- scripts/mod/modpost.c | 73 +--- scripts/mod/modpost.h | 25 ++ scripts/mod/symsearch.c | 199 ++++++++++ security/selinux/selinuxfs.c | 12 +- sound/pci/emu10k1/emu10k1_callback.c | 7 +- sound/pci/hda/cs35l56_hda.c | 4 +- sound/pci/hda/cs35l56_hda_i2c.c | 13 +- sound/pci/hda/cs35l56_hda_spi.c | 13 +- sound/pci/hda/patch_realtek.c | 3 +- sound/soc/amd/acp/acp-pci.c | 5 +- sound/soc/codecs/rt5682-sdw.c | 4 +- sound/soc/codecs/rt711-sdca-sdw.c | 4 +- sound/soc/codecs/rt711-sdw.c | 4 +- sound/soc/codecs/rt712-sdca-sdw.c | 5 +- sound/soc/codecs/rt722-sdca-sdw.c | 4 +- sound/soc/codecs/wm_adsp.c | 3 +- sound/soc/soc-ops.c | 2 +- sound/soc/sof/amd/acp.c | 8 +- tools/arch/x86/include/asm/cpufeatures.h | 2 +- tools/net/ynl/ynl-gen-c.py | 7 +- tools/testing/selftests/mm/vm_util.h | 2 +- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 85 +++-- tools/testing/selftests/net/mptcp/mptcp_join.sh | 4 +- tools/testing/selftests/net/reuseaddr_conflict.c | 2 +- tools/testing/selftests/net/test_vxlan_mdb.sh | 205 ++++++---- tools/testing/selftests/net/udpgro_fwd.sh | 10 +- 343 files changed, 3828 insertions(+), 2302 deletions(-)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
[ Upstream commit 7054b551de18e9875fbdf8d4f3baade428353545 ]
Works better for xe like that. obj is no longer const.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20231204134946.16219-1-maarten...
Reviewed-by: Jouni Högander jouni.hogander@intel.com
Stable-dep-of: 582dc04b0658 ("drm/i915: Pre-populate the cursor physical dma address")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/i915/display/intel_cursor.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_cursor.c b/drivers/gpu/drm/i915/display/intel_cursor.c
index b342fad180ca5..0d21c34f74990 100644
--- a/drivers/gpu/drm/i915/display/intel_cursor.c
+++ b/drivers/gpu/drm/i915/display/intel_cursor.c
@@ -23,6 +23,8 @@
 #include "intel_psr.h"
 #include "skl_watermark.h"
 
+#include "gem/i915_gem_object.h"
+
 /* Cursor formats */
 static const u32 intel_cursor_formats[] = {
         DRM_FORMAT_ARGB8888,
@@ -33,11 +35,11 @@ static u32 intel_cursor_base(const struct intel_plane_state *plane_state)
         struct drm_i915_private *dev_priv =
                 to_i915(plane_state->uapi.plane->dev);
         const struct drm_framebuffer *fb = plane_state->hw.fb;
-        const struct drm_i915_gem_object *obj = intel_fb_obj(fb);
+        struct drm_i915_gem_object *obj = intel_fb_obj(fb);
         u32 base;
 
         if (DISPLAY_INFO(dev_priv)->cursor_needs_physical)
-                base = sg_dma_address(obj->mm.pages->sgl);
+                base = i915_gem_object_get_dma_address(obj, 0);
         else
                 base = intel_plane_ggtt_offset(plane_state);
 
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ville Syrjälä ville.syrjala@linux.intel.com
[ Upstream commit 582dc04b0658ef3b90aeb49cbdd9747c2f1eccc3 ]
Calling i915_gem_object_get_dma_address() from the vblank evade critical section triggers might_sleep().
While we know that we've already pinned the framebuffer and thus i915_gem_object_get_dma_address() will in fact not sleep in this case, it seems reasonable to keep the unconditional might_sleep() for maximum coverage.
So let's instead pre-populate the dma address during fb pinning, which all happens before we enter the vblank evade critical section.
We can use u32 for the dma address as this class of hardware doesn't support >32bit addresses.
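The ordering trick here is the usual one: resolve anything that can sleep before entering the atomic section, and only read cached values inside it. A minimal user-space sketch of that pattern, with hypothetical names rather than the actual i915 structures or functions:

    #include <stdint.h>

    struct plane_state {
            uint32_t phys_dma_addr;         /* cached at pin/prepare time */
    };

    /* Allowed to sleep: do the expensive lookup here, outside the critical section. */
    static void pin_fb(struct plane_state *st, uint32_t (*get_dma_addr)(void))
    {
            st->phys_dma_addr = get_dma_addr();
    }

    /* Must not sleep (vblank evade): only read the value cached above. */
    static uint32_t cursor_base(const struct plane_state *st)
    {
            return st->phys_dma_addr;
    }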
Cc: stable@vger.kernel.org
Fixes: 0225a90981c8 ("drm/i915: Make cursor plane registers unlocked")
Reported-by: Borislav Petkov bp@alien8.de
Closes: https://lore.kernel.org/intel-gfx/20240227100342.GAZd2zfmYcPS_SndtO@fat_crat...
Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20240325175738.3440-1-ville.sy...
Tested-by: Borislav Petkov (AMD) bp@alien8.de
Reviewed-by: Chaitanya Kumar Borah chaitanya.kumar.borah@intel.com
(cherry picked from commit c1289a5c3594cf04caa94ebf0edeb50c62009f1f)
Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/i915/display/intel_cursor.c        |  4 +---
 drivers/gpu/drm/i915/display/intel_display_types.h |  1 +
 drivers/gpu/drm/i915/display/intel_fb_pin.c        | 10 ++++++++++
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_cursor.c b/drivers/gpu/drm/i915/display/intel_cursor.c
index 0d21c34f74990..61df6cd3f3778 100644
--- a/drivers/gpu/drm/i915/display/intel_cursor.c
+++ b/drivers/gpu/drm/i915/display/intel_cursor.c
@@ -34,12 +34,10 @@ static u32 intel_cursor_base(const struct intel_plane_state *plane_state)
 {
         struct drm_i915_private *dev_priv =
                 to_i915(plane_state->uapi.plane->dev);
-        const struct drm_framebuffer *fb = plane_state->hw.fb;
-        struct drm_i915_gem_object *obj = intel_fb_obj(fb);
         u32 base;
 
         if (DISPLAY_INFO(dev_priv)->cursor_needs_physical)
-                base = i915_gem_object_get_dma_address(obj, 0);
+                base = plane_state->phys_dma_addr;
         else
                 base = intel_plane_ggtt_offset(plane_state);
 
diff --git a/drivers/gpu/drm/i915/display/intel_display_types.h b/drivers/gpu/drm/i915/display/intel_display_types.h
index 7fc92b1474cc4..8b0dc2b75da4a 100644
--- a/drivers/gpu/drm/i915/display/intel_display_types.h
+++ b/drivers/gpu/drm/i915/display/intel_display_types.h
@@ -701,6 +701,7 @@ struct intel_plane_state {
 #define PLANE_HAS_FENCE BIT(0)
 
         struct intel_fb_view view;
+        u32 phys_dma_addr; /* for cursor_needs_physical */
 
         /* Plane pxp decryption state */
         bool decrypt;
diff --git a/drivers/gpu/drm/i915/display/intel_fb_pin.c b/drivers/gpu/drm/i915/display/intel_fb_pin.c
index fffd568070d41..a131656757f2b 100644
--- a/drivers/gpu/drm/i915/display/intel_fb_pin.c
+++ b/drivers/gpu/drm/i915/display/intel_fb_pin.c
@@ -254,6 +254,16 @@ int intel_plane_pin_fb(struct intel_plane_state *plane_state)
                         return PTR_ERR(vma);
 
                 plane_state->ggtt_vma = vma;
+
+                /*
+                 * Pre-populate the dma address before we enter the vblank
+                 * evade critical section as i915_gem_object_get_dma_address()
+                 * will trigger might_sleep() even if it won't actually sleep,
+                 * which is the case when the fb has already been pinned.
+                 */
+                if (phys_cursor)
+                        plane_state->phys_dma_addr =
+                                i915_gem_object_get_dma_address(intel_fb_obj(fb), 0);
         } else {
                 struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 5384cc0d1a88c27448a6a4e65b8abe6486de8012 ]
When getting the kernel version via make, the result may be polluted by other output, such as directory change info, e.g.

$ export MAKEFLAGS="-w"
$ make kernelversion
make: Entering directory '/home/net'
6.8.0
make: Leaving directory '/home/net'

This will distort the reStructuredText output and make the subsequent rst2man step fail like:

[...]
bpf-helpers.rst:20: (WARNING/2) Field list ends without a blank line; unexpected unindent.
[...]
Using silent mode would help. e.g.
$ make -s --no-print-directory kernelversion
6.8.0
Fixes: fd0a38f9c37d ("scripts/bpf: Set version attribute for bpf-helpers(7) man page")
Signed-off-by: Michael Hofmann mhofmann@redhat.com
Signed-off-by: Hangbin Liu liuhangbin@gmail.com
Signed-off-by: Daniel Borkmann daniel@iogearbox.net
Reviewed-by: Quentin Monnet qmo@kernel.org
Acked-by: Alejandro Colomar alx@kernel.org
Link: https://lore.kernel.org/bpf/20240315023443.2364442-1-liuhangbin@gmail.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 scripts/bpf_doc.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/bpf_doc.py b/scripts/bpf_doc.py
index 0669bac5e900e..3f899cc7e99a9 100755
--- a/scripts/bpf_doc.py
+++ b/scripts/bpf_doc.py
@@ -414,8 +414,8 @@ class PrinterRST(Printer):
             version = version.stdout.decode().rstrip()
         except:
             try:
-                version = subprocess.run(['make', 'kernelversion'], cwd=linuxRoot,
-                                         capture_output=True, check=True)
+                version = subprocess.run(['make', '-s', '--no-print-directory', 'kernelversion'],
+                                         cwd=linuxRoot, capture_output=True, check=True)
                 version = version.stdout.decode().rstrip()
             except:
                 return 'Linux'
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilya Leoshkevich iii@linux.ibm.com
[ Upstream commit 7ded842b356d151ece8ac4985940438e6d3998bb ]
Kui-Feng Lee reported a crash on s390x triggered by the dummy_st_ops/dummy_init_ptr_arg test [1]:
[<0000000000000002>] 0x2
[<00000000009d5cde>] bpf_struct_ops_test_run+0x156/0x250
[<000000000033145a>] __sys_bpf+0xa1a/0xd00
[<00000000003319dc>] __s390x_sys_bpf+0x44/0x50
[<0000000000c4382c>] __do_syscall+0x244/0x300
[<0000000000c59a40>] system_call+0x70/0x98
This is caused by GCC moving memcpy() after assignments in bpf_jit_plt(), resulting in NULL pointers being written instead of the return and the target addresses.
Looking at the GCC internals, the reordering is allowed because the alias analysis thinks that the memcpy() destination and the assignments' left-hand-sides are based on different objects: new_plt and bpf_plt_ret/bpf_plt_target respectively, and therefore they cannot alias.
This is in turn due to a violation of the C standard:
When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object ...
From C's perspective, bpf_plt_ret and bpf_plt are distinct objects and cannot be subtracted. In practical terms, doing so confuses GCC's alias analysis.
The code was written this way in order to let the C side know a few offsets defined in the assembly. While nice, this is by no means necessary. Fix the noncompliance by hardcoding these offsets.
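To illustrate the problematic pattern, here is a stand-alone user-space sketch with hypothetical names (not the kernel code itself): because the subtractions mix pointers into distinct objects, the compiler is allowed to assume the computed addresses cannot alias the memcpy() destination and may reorder the stores, which is the miscompilation described above.

    #include <string.h>

    /* Hypothetical objects standing in for the asm-defined PLT symbols. */
    extern const char tmpl[32];
    extern const char tmpl_ret[8];
    extern const char tmpl_target[8];

    void fill_plt(char *dst, void *ret, void *target)
    {
            memcpy(dst, tmpl, sizeof(tmpl));
            /*
             * Undefined behaviour: tmpl_ret/tmpl_target and tmpl are not
             * elements of the same array, so these subtractions are not
             * valid C and alias analysis may move the memcpy() after the
             * two stores below.
             */
            *(void **)(dst + (tmpl_ret - tmpl)) = ret;
            *(void **)(dst + (tmpl_target - tmpl)) = target;
    }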
[1] https://lore.kernel.org/bpf/c9923c1d-971d-4022-8dc8-1364e929d34c@gmail.com/
Fixes: f1d5df84cd8c ("s390/bpf: Implement bpf_arch_text_poke()")
Signed-off-by: Ilya Leoshkevich iii@linux.ibm.com
Message-ID: 20240320015515.11883-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov ast@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/s390/net/bpf_jit_comp.c | 46 ++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 26 deletions(-)

diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index e507692e51e71..8af02176f68bf 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -516,11 +516,12 @@ static void bpf_skip(struct bpf_jit *jit, int size)
  * PLT for hotpatchable calls. The calling convention is the same as for the
  * ftrace hotpatch trampolines: %r0 is return address, %r1 is clobbered.
  */
-extern const char bpf_plt[];
-extern const char bpf_plt_ret[];
-extern const char bpf_plt_target[];
-extern const char bpf_plt_end[];
-#define BPF_PLT_SIZE 32
+struct bpf_plt {
+        char code[16];
+        void *ret;
+        void *target;
+} __packed;
+extern const struct bpf_plt bpf_plt;
 asm(
         ".pushsection .rodata\n"
         "        .balign 8\n"
@@ -531,15 +532,14 @@ asm(
         "        .balign 8\n"
         "bpf_plt_ret: .quad 0\n"
         "bpf_plt_target: .quad 0\n"
-        "bpf_plt_end:\n"
         "        .popsection\n"
 );
 
-static void bpf_jit_plt(void *plt, void *ret, void *target)
+static void bpf_jit_plt(struct bpf_plt *plt, void *ret, void *target)
 {
-        memcpy(plt, bpf_plt, BPF_PLT_SIZE);
-        *(void **)((char *)plt + (bpf_plt_ret - bpf_plt)) = ret;
-        *(void **)((char *)plt + (bpf_plt_target - bpf_plt)) = target ?: ret;
+        memcpy(plt, &bpf_plt, sizeof(*plt));
+        plt->ret = ret;
+        plt->target = target;
 }
 
 /*
@@ -662,9 +662,9 @@ static void bpf_jit_epilogue(struct bpf_jit *jit, u32 stack_depth)
         jit->prg = ALIGN(jit->prg, 8);
         jit->prologue_plt = jit->prg;
         if (jit->prg_buf)
-                bpf_jit_plt(jit->prg_buf + jit->prg,
+                bpf_jit_plt((struct bpf_plt *)(jit->prg_buf + jit->prg),
                             jit->prg_buf + jit->prologue_plt_ret, NULL);
-        jit->prg += BPF_PLT_SIZE;
+        jit->prg += sizeof(struct bpf_plt);
 }
 
 static int get_probe_mem_regno(const u8 *insn)
@@ -1901,9 +1901,6 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
         struct bpf_jit jit;
         int pass;
 
-        if (WARN_ON_ONCE(bpf_plt_end - bpf_plt != BPF_PLT_SIZE))
-                return orig_fp;
-
         if (!fp->jit_requested)
                 return orig_fp;
 
@@ -2009,14 +2006,11 @@ bool bpf_jit_supports_far_kfunc_call(void)
 int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, void *old_addr,
                        void *new_addr)
 {
+        struct bpf_plt expected_plt, current_plt, new_plt, *plt;
         struct {
                 u16 opc;
                 s32 disp;
         } __packed insn;
-        char expected_plt[BPF_PLT_SIZE];
-        char current_plt[BPF_PLT_SIZE];
-        char new_plt[BPF_PLT_SIZE];
-        char *plt;
         char *ret;
         int err;
 
@@ -2035,18 +2029,18 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
                  */
         } else {
                 /* Verify the PLT. */
-                plt = (char *)ip + (insn.disp << 1);
-                err = copy_from_kernel_nofault(current_plt, plt, BPF_PLT_SIZE);
+                plt = ip + (insn.disp << 1);
+                err = copy_from_kernel_nofault(&current_plt, plt,
+                                               sizeof(current_plt));
                 if (err < 0)
                         return err;
                 ret = (char *)ip + 6;
-                bpf_jit_plt(expected_plt, ret, old_addr);
-                if (memcmp(current_plt, expected_plt, BPF_PLT_SIZE))
+                bpf_jit_plt(&expected_plt, ret, old_addr);
+                if (memcmp(&current_plt, &expected_plt, sizeof(current_plt)))
                         return -EINVAL;
                 /* Adjust the call address. */
-                bpf_jit_plt(new_plt, ret, new_addr);
-                s390_kernel_write(plt + (bpf_plt_target - bpf_plt),
-                                  new_plt + (bpf_plt_target - bpf_plt),
+                bpf_jit_plt(&new_plt, ret, new_addr);
+                s390_kernel_write(&plt->target, &new_plt.target,
                                   sizeof(void *));
         }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Puranjay Mohan puranjay12@gmail.com
[ Upstream commit 114b5b3b4bde7358624437be2f12cde1b265224e ]
A64_LDRSW() takes three registers, Xt, Xn, and Xm, as arguments: it loads the value at address Xn + Xm, sign-extends it, and places it in register Xt.

Currently, the offset is used directly in place of the tmp register, which already has the offset loaded by the last emitted instruction.

This will cause JIT failures. The easiest way to reproduce this is to test the following code through the test_bpf module:
{ "BPF_LDX_MEMSX | BPF_W", .u.insns_int = { BPF_LD_IMM64(R1, 0x00000000deadbeefULL), BPF_LD_IMM64(R2, 0xffffffffdeadbeefULL), BPF_STX_MEM(BPF_DW, R10, R1, -7), BPF_LDX_MEMSX(BPF_W, R0, R10, -7), BPF_JMP_REG(BPF_JNE, R0, R2, 1), BPF_ALU64_IMM(BPF_MOV, R0, 0), BPF_EXIT_INSN(), }, INTERNAL, { }, { { 0, 0 } }, .stack_depth = 7, },
We need to use an offset of -7 to trigger this code path; there could be other valid ways to trigger it from proper BPF programs as well.

This code is rejected by the JIT because -7 is passed to A64_LDRSW(), which expects a valid register number (0 - 31).
root@pjy:~# modprobe test_bpf test_name="BPF_LDX_MEMSX | BPF_W"
[11300.490371] test_bpf: test_bpf: set 'test_bpf' as the default test_suite.
[11300.491750] test_bpf: #345 BPF_LDX_MEMSX | BPF_W
[11300.493179] aarch64_insn_encode_register: unknown register encoding -7
[11300.494133] aarch64_insn_encode_register: unknown register encoding -7
[11300.495292] FAIL to select_runtime err=-524
[11300.496804] test_bpf: Summary: 0 PASSED, 1 FAILED, [0/0 JIT'ed]
modprobe: ERROR: could not insert 'test_bpf': Invalid argument
Applying this patch fixes the issue.
root@pjy:~# modprobe test_bpf test_name="BPF_LDX_MEMSX | BPF_W"
[ 292.837436] test_bpf: test_bpf: set 'test_bpf' as the default test_suite.
[ 292.839416] test_bpf: #345 BPF_LDX_MEMSX | BPF_W jited:1 156 PASS
[ 292.844794] test_bpf: Summary: 1 PASSED, 0 FAILED, [1/1 JIT'ed]
Fixes: cc88f540da52 ("bpf, arm64: Support sign-extension load instructions")
Signed-off-by: Puranjay Mohan puranjay12@gmail.com
Message-ID: 20240312235917.103626-1-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov ast@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/arm64/net/bpf_jit_comp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 150d1c6543f7f..5fe4d8b3fdc89 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1189,7 +1189,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
         } else {
                 emit_a64_mov_i(1, tmp, off, ctx);
                 if (sign_extend)
-                        emit(A64_LDRSW(dst, src_adj, off_adj), ctx);
+                        emit(A64_LDRSW(dst, src, tmp), ctx);
                 else
                         emit(A64_LDR32(dst, src, tmp), ctx);
         }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Sakharov p.sakharov@ispras.ru
[ Upstream commit 2295bd846765c766701e666ed2e4b35396be25e6 ]
If mock_chain() returns NULL due to a memory allocation failure, the NULL pointer is passed to dma_fence_enable_sw_signaling(), resulting in a NULL pointer dereference there.
Call dma_fence_enable_sw_signaling() only if mock_chain() succeeds.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: d62c43a953ce ("dma-buf: Enable signaling on fence for selftests")
Signed-off-by: Pavel Sakharov p.sakharov@ispras.ru
Reviewed-by: Christian König christian.koenig@amd.com
Signed-off-by: Christian König christian.koenig@amd.com
Link: https://patchwork.freedesktop.org/patch/msgid/20240319231527.1821372-1-p.sak...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/dma-buf/st-dma-fence-chain.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
index c0979c8049b5a..661de4add4c72 100644
--- a/drivers/dma-buf/st-dma-fence-chain.c
+++ b/drivers/dma-buf/st-dma-fence-chain.c
@@ -84,11 +84,11 @@ static int sanitycheck(void *arg)
                 return -ENOMEM;
 
         chain = mock_chain(NULL, f, 1);
-        if (!chain)
+        if (chain)
+                dma_fence_enable_sw_signaling(chain);
+        else
                 err = -ENOMEM;
 
-        dma_fence_enable_sw_signaling(chain);
-
         dma_fence_signal(f);
         dma_fence_put(f);
 
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Artem Savkov asavkov@redhat.com
[ Upstream commit a51cd6bf8e10793103c5870ff9e4db295a843604 ]
When is64 == 1 in emit(A64_REV32(is64, dst, dst), ctx), the generated insn reverses the byte order of both the high and low 32-bit words, resulting in an incorrect swap as indicated by the jit test:
[ 9757.262607] test_bpf: #312 BSWAP 16: 0x0123456789abcdef -> 0xefcd jited:1 8 PASS
[ 9757.264435] test_bpf: #313 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 jited:1 ret 1460850314 != -271733879 (0x5712ce8a != 0xefcdab89)FAIL (1 times)
[ 9757.266260] test_bpf: #314 BSWAP 64: 0x0123456789abcdef -> 0x67452301 jited:1 8 PASS
[ 9757.268000] test_bpf: #315 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89 jited:1 8 PASS
[ 9757.269686] test_bpf: #316 BSWAP 16: 0xfedcba9876543210 -> 0x1032 jited:1 8 PASS
[ 9757.271380] test_bpf: #317 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 jited:1 ret -1460850316 != 271733878 (0xa8ed3174 != 0x10325476)FAIL (1 times)
[ 9757.273022] test_bpf: #318 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe jited:1 7 PASS
[ 9757.274721] test_bpf: #319 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476 jited:1 9 PASS
Fix this by forcing 32bit variant of rev32.
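For reference, a tiny user-space sketch of what the 32-bit swap is expected to do (using a compiler builtin, not the JIT code): only the low 32 bits are byte-reversed and the upper 32 bits end up cleared, matching the test vectors above.

    #include <stdint.h>

    /* 0x0123456789abcdef -> 0x00000000efcdab89 */
    static uint64_t bswap32_expected(uint64_t x)
    {
            return __builtin_bswap32((uint32_t)x);  /* upper 32 bits cleared */
    }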
Fixes: 1104247f3f979 ("bpf, arm64: Support unconditional bswap")
Signed-off-by: Artem Savkov asavkov@redhat.com
Tested-by: Puranjay Mohan puranjay12@gmail.com
Acked-by: Puranjay Mohan puranjay12@gmail.com
Acked-by: Xu Kuohai xukuohai@huawei.com
Message-ID: 20240321081809.158803-1-asavkov@redhat.com
Signed-off-by: Alexei Starovoitov ast@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/arm64/net/bpf_jit_comp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 5fe4d8b3fdc89..29196dce9b91d 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -876,7 +876,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
                 emit(A64_UXTH(is64, dst, dst), ctx);
                 break;
         case 32:
-                emit(A64_REV32(is64, dst, dst), ctx);
+                emit(A64_REV32(0, dst, dst), ctx);
                 /* upper 32 bits already cleared */
                 break;
         case 64:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryosuke Yasuoka ryasuoka@redhat.com
[ Upstream commit d24b03535e5eb82e025219c2f632b485409c898f ]
syzbot reported the following uninit-value access issue [1][2]:
nci_rx_work() parses and processes the received packet. When the payload length is zero, each message type handler reads an uninitialized payload and KMSAN detects this issue. The receipt of a packet with a zero-size payload is considered unexpected, and therefore such packets should be silently discarded.
This patch resolves the issue by checking the payload size before calling the individual message type handlers.
Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation") Reported-and-tested-by: syzbot+7ea9413ea6749baf5574@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+29b5ca705d2e0f4a44d2@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7ea9413ea6749baf5574 [1] Closes: https://syzkaller.appspot.com/bug?extid=29b5ca705d2e0f4a44d2 [2] Signed-off-by: Ryosuke Yasuoka ryasuoka@redhat.com Reviewed-by: Jeremy Cline jeremy@jcline.org Reviewed-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/nfc/nci/core.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/net/nfc/nci/core.c b/net/nfc/nci/core.c index 12684d835cb53..772ddb5824d9e 100644 --- a/net/nfc/nci/core.c +++ b/net/nfc/nci/core.c @@ -1516,6 +1516,11 @@ static void nci_rx_work(struct work_struct *work) nfc_send_to_raw_sock(ndev->nfc_dev, skb, RAW_PAYLOAD_NCI, NFC_DIRECTION_RX);
+ if (!nci_plen(skb->data)) { + kfree_skb(skb); + break; + } + /* Process frame */ switch (nci_mt(skb->data)) { case NCI_MT_RSP_PKT:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit f6c8f5e8694c7a78c94e408b628afa6255cc428a ]
When we set members of simple nested structures in requests, we need to set the "presence" bits for all the nesting layers below. This has nothing to do with the presence type of the last layer.
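For illustration, a sketch of the C code the generator should emit for a setter on a member nested two levels deep (the structures and field names here are made up, not taken from a real spec):

/* Hypothetical generated types, for illustration only */
struct req_a_b { int val; };
struct req_a { struct { unsigned int b:1; } _present; struct req_a_b b; };
struct req { struct { unsigned int a:1; } _present; struct req_a a; };

static void req_set_a_b_val(struct req *req, int val)
{
	req->_present.a = 1;	/* every enclosing nest gets its presence bit */
	req->a._present.b = 1;
	req->a.b.val = val;	/* only the leaf's own handling depends on its presence type */
}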
Fixes: be5bea1cc0bf ("net: add basic C code generators for Netlink") Reviewed-by: Breno Leitao leitao@debian.org Link: https://lore.kernel.org/r/20240321020214.1250202-1-kuba@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/net/ynl/ynl-gen-c.py | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/tools/net/ynl/ynl-gen-c.py b/tools/net/ynl/ynl-gen-c.py index 897af958cee85..575b7e248e521 100755 --- a/tools/net/ynl/ynl-gen-c.py +++ b/tools/net/ynl/ynl-gen-c.py @@ -198,8 +198,11 @@ class Type(SpecAttr): presence = '' for i in range(0, len(ref)): presence = f"{var}->{'.'.join(ref[:i] + [''])}_present.{ref[i]}" - if self.presence_type() == 'bit': - code.append(presence + ' = 1;') + # Every layer below last is a nest, so we know it uses bit presence + # last layer is "self" and may be a complex type + if i == len(ref) - 1 and self.presence_type() != 'bit': + continue + code.append(presence + ' = 1;') code += self._setter_lines(ri, member, presence)
func_name = f"{op_prefix(ri, direction, deref=deref)}_set_{'_'.join(ref)}"
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Thompson davthompson@nvidia.com
[ Upstream commit d6c30c5a168f8586b8bcc0d8e42e2456eb05209b ]
The mlxbf_gige_open() routine starts the PHY as part of normal initialization, so it must also stop the PHY in its error paths.
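The shape of the fix is the usual reverse-order unwind; a minimal user-space sketch of the idea (function names are placeholders, not the driver's helpers):

#include <stdio.h>

static int request_irqs(void) { return 0; }
static void free_irqs(void)   { }
static int start_phy(void)    { return 0; }
static void stop_phy(void)    { }
static int init_tx(void)      { return -1; }	/* simulate the failure */

static int example_open(void)
{
	int err;

	err = request_irqs();
	if (err)
		return err;
	err = start_phy();
	if (err)
		goto err_free_irqs;
	err = init_tx();
	if (err)
		goto err_stop_phy;	/* before the fix this jumped straight to err_free_irqs */
	return 0;

err_stop_phy:
	stop_phy();
err_free_irqs:
	free_irqs();
	return err;
}

int main(void)
{
	printf("open() -> %d\n", example_open());
	return 0;
}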
Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Signed-off-by: David Thompson davthompson@nvidia.com Reviewed-by: Asmaa Mnebhi asmaa@nvidia.com Reviewed-by: Andrew Lunn andrew@lunn.ch Reviewed-by: Jiri Pirko jiri@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c index aaf1faed4133e..044ff5f87b5e8 100644 --- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c @@ -157,7 +157,7 @@ static int mlxbf_gige_open(struct net_device *netdev)
err = mlxbf_gige_tx_init(priv); if (err) - goto free_irqs; + goto phy_deinit; err = mlxbf_gige_rx_init(priv); if (err) goto tx_deinit; @@ -185,6 +185,9 @@ static int mlxbf_gige_open(struct net_device *netdev) tx_deinit: mlxbf_gige_tx_deinit(priv);
+phy_deinit: + phy_stop(phydev); + free_irqs: mlxbf_gige_free_irqs(priv); return err;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit 06a093807eb7b5c5b29b6cff49f8174a4e702341 ]
If the rx payload length check fails, or if kmemdup() fails, we still need to free the command response. Fix that.
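The resulting code follows the common copy-then-release idiom: duplicate what is needed and free the response buffer unconditionally, so no later exit path can leak it. A user-space sketch of the idea (names are illustrative, not the driver's API):

#include <stdlib.h>
#include <string.h>

static void *dup_then_release(void **resp_buf, size_t len)
{
	void *copy = malloc(len);

	if (copy)
		memcpy(copy, *resp_buf, len);
	free(*resp_buf);	/* released on both the success and the failure path */
	*resp_buf = NULL;
	return copy;		/* NULL maps to -ENOMEM in the caller */
}

int main(void)
{
	void *buf = calloc(1, 16);
	void *copy = buf ? dup_then_release(&buf, 16) : NULL;

	free(copy);
	return 0;
}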
Fixes: 21254908cbe9 ("iwlwifi: mvm: add RFI-M support") Co-authored-by: Anjaneyulu pagadala.yesu.anjaneyulu@intel.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://msgid.link/20240319100755.db2fa0196aa7.I116293b132502ac68a65527330fa... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/mvm/rfi.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rfi.c b/drivers/net/wireless/intel/iwlwifi/mvm/rfi.c index 2ecd32bed752f..045c862a8fc4f 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/rfi.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/rfi.c @@ -132,14 +132,18 @@ struct iwl_rfi_freq_table_resp_cmd *iwl_rfi_get_freq_table(struct iwl_mvm *mvm) if (ret) return ERR_PTR(ret);
- if (WARN_ON_ONCE(iwl_rx_packet_payload_len(cmd.resp_pkt) != resp_size)) + if (WARN_ON_ONCE(iwl_rx_packet_payload_len(cmd.resp_pkt) != + resp_size)) { + iwl_free_resp(&cmd); return ERR_PTR(-EIO); + }
resp = kmemdup(cmd.resp_pkt->data, resp_size, GFP_KERNEL); + iwl_free_resp(&cmd); + if (!resp) return ERR_PTR(-ENOMEM);
- iwl_free_resp(&cmd); return resp; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Emmanuel Grumbach emmanuel.grumbach@intel.com
[ Upstream commit 29fa9a984b6d1075020f12071a89897fd62ed27f ]
Multiple Rx queues allow spreading the load of the Rx streams across different CPUs. The 9000 series required complex synchronization mechanisms on the driver side since the hardware / firmware is not able to provide information about duplicate packets and timeouts inside the reordering buffer.
Users have complained that on newer devices all those synchronization mechanisms have caused spurious packet drops. Those packet drops disappeared when we simplified the code, but unfortunately we can't have RSS enabled on the 9000 series without this complex code.
Remove support for RSS on the 9000 series so that we can make the code much simpler for newer devices and fix the bugs for them.
The downside of this patch is that all the Rx traffic will be routed to a single CPU, but this has never been an issue: modern CPUs are fast enough to cope with all the traffic.
Signed-off-by: Emmanuel Grumbach emmanuel.grumbach@intel.com Signed-off-by: Gregory Greenman gregory.greenman@intel.com Link: https://lore.kernel.org/r/20231017115047.2917eb8b7af9.Iddd7dcf335387ba46fcbb... Signed-off-by: Johannes Berg johannes.berg@intel.com Stable-dep-of: e78d78773089 ("wifi: iwlwifi: mvm: include link ID when releasing frames") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/iwl-trans.h | 2 +- drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 4 +++- drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c | 11 ++++++++++- 3 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-trans.h b/drivers/net/wireless/intel/iwlwifi/iwl-trans.h index 168eda2132fb8..9dcc1506bd0b0 100644 --- a/drivers/net/wireless/intel/iwlwifi/iwl-trans.h +++ b/drivers/net/wireless/intel/iwlwifi/iwl-trans.h @@ -278,7 +278,7 @@ static inline void iwl_free_rxb(struct iwl_rx_cmd_buffer *r) #define IWL_MGMT_TID 15 #define IWL_FRAME_LIMIT 64 #define IWL_MAX_RX_HW_QUEUES 16 -#define IWL_9000_MAX_RX_HW_QUEUES 6 +#define IWL_9000_MAX_RX_HW_QUEUES 1
/** * enum iwl_wowlan_status - WoWLAN image/device status diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c index aaa9840d0d4c5..ee9d14250a261 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c @@ -352,7 +352,9 @@ int iwl_mvm_mac_setup_register(struct iwl_mvm *mvm) ieee80211_hw_set(hw, HAS_RATE_CONTROL); }
- if (iwl_mvm_has_new_rx_api(mvm)) + /* We want to use the mac80211's reorder buffer for 9000 */ + if (iwl_mvm_has_new_rx_api(mvm) && + mvm->trans->trans_cfg->device_family > IWL_DEVICE_FAMILY_9000) ieee80211_hw_set(hw, SUPPORTS_REORDERING_BUFFER);
if (fw_has_capa(&mvm->fw->ucode_capa, diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c index bac0228b8c866..92b3e18dbe877 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c @@ -963,6 +963,9 @@ static bool iwl_mvm_reorder(struct iwl_mvm *mvm, baid = (reorder & IWL_RX_MPDU_REORDER_BAID_MASK) >> IWL_RX_MPDU_REORDER_BAID_SHIFT;
+ if (mvm->trans->trans_cfg->device_family == IWL_DEVICE_FAMILY_9000) + return false; + /* * This also covers the case of receiving a Block Ack Request * outside a BA session; we'll pass it to mac80211 and that @@ -2621,9 +2624,15 @@ void iwl_mvm_rx_mpdu_mq(struct iwl_mvm *mvm, struct napi_struct *napi,
if (!iwl_mvm_reorder(mvm, napi, queue, sta, skb, desc) && likely(!iwl_mvm_time_sync_frame(mvm, skb, hdr->addr2)) && - likely(!iwl_mvm_mei_filter_scan(mvm, skb))) + likely(!iwl_mvm_mei_filter_scan(mvm, skb))) { + if (mvm->trans->trans_cfg->device_family == IWL_DEVICE_FAMILY_9000 && + (desc->mac_flags2 & IWL_RX_MPDU_MFLG2_AMSDU) && + !(desc->amsdu_info & IWL_RX_MPDU_AMSDU_LAST_SUBFRAME)) + rx_status->flag |= RX_FLAG_AMSDU_MORE; + iwl_mvm_pass_packet_to_mac80211(mvm, napi, skb, queue, sta, link_sta); + } out: rcu_read_unlock(); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Berg benjamin.berg@intel.com
[ Upstream commit e78d7877308989ef91b64a3c746ae31324c07caa ]
When releasing frames from the reorder buffer, the link ID was not included in the RX status information. This subsequently led mac80211 to drop the frame. Change it so that the link information is set immediately when possible, so that it no longer needs to be filled in when submitting the frame to mac80211.
Fixes: b8a85a1d42d7 ("wifi: iwlwifi: mvm: rxmq: report link ID to mac80211") Signed-off-by: Benjamin Berg benjamin.berg@intel.com Tested-by: Emmanuel Grumbach emmanuel.grumbach@intel.com Reviewed-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://msgid.link/20240320232419.bbbd5e9bfe80.Iec1bf5c884e371f7bc5ea2534ed9... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c | 20 ++++++++----------- 1 file changed, 8 insertions(+), 12 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c index 92b3e18dbe877..e9360b555ac93 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c @@ -236,21 +236,13 @@ static void iwl_mvm_add_rtap_sniffer_config(struct iwl_mvm *mvm, static void iwl_mvm_pass_packet_to_mac80211(struct iwl_mvm *mvm, struct napi_struct *napi, struct sk_buff *skb, int queue, - struct ieee80211_sta *sta, - struct ieee80211_link_sta *link_sta) + struct ieee80211_sta *sta) { if (unlikely(iwl_mvm_check_pn(mvm, skb, queue, sta))) { kfree_skb(skb); return; }
- if (sta && sta->valid_links && link_sta) { - struct ieee80211_rx_status *rx_status = IEEE80211_SKB_RXCB(skb); - - rx_status->link_valid = 1; - rx_status->link_id = link_sta->link_id; - } - ieee80211_rx_napi(mvm->hw, sta, skb, napi); }
@@ -636,7 +628,7 @@ static void iwl_mvm_release_frames(struct iwl_mvm *mvm, while ((skb = __skb_dequeue(skb_list))) { iwl_mvm_pass_packet_to_mac80211(mvm, napi, skb, reorder_buf->queue, - sta, NULL /* FIXME */); + sta); reorder_buf->num_stored--; } } @@ -2489,6 +2481,11 @@ void iwl_mvm_rx_mpdu_mq(struct iwl_mvm *mvm, struct napi_struct *napi, if (IS_ERR(sta)) sta = NULL; link_sta = rcu_dereference(mvm->fw_id_to_link_sta[id]); + + if (sta && sta->valid_links && link_sta) { + rx_status->link_valid = 1; + rx_status->link_id = link_sta->link_id; + } } } else if (!is_multicast_ether_addr(hdr->addr2)) { /* @@ -2630,8 +2627,7 @@ void iwl_mvm_rx_mpdu_mq(struct iwl_mvm *mvm, struct napi_struct *napi, !(desc->amsdu_info & IWL_RX_MPDU_AMSDU_LAST_SUBFRAME)) rx_status->flag |= RX_FLAG_AMSDU_MORE;
- iwl_mvm_pass_packet_to_mac80211(mvm, napi, skb, queue, sta, - link_sta); + iwl_mvm_pass_packet_to_mac80211(mvm, napi, skb, queue, sta); } out: rcu_read_unlock();
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Simon Trimmer simont@opensource.cirrus.com
[ Upstream commit cafe9c6a72cf1ffe96d2561d988a141cb5c093db ]
Initialization is completed before adding the component, as adding it can start the device binding process and trigger actions that check init_done.
Signed-off-by: Simon Trimmer simont@opensource.cirrus.com Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: 73cfbfa9caea ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier") Message-ID: 20240325145510.328378-1-rf@opensource.cirrus.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/hda/cs35l56_hda.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/pci/hda/cs35l56_hda.c b/sound/pci/hda/cs35l56_hda.c index 7adc1d373d65c..27848d6469636 100644 --- a/sound/pci/hda/cs35l56_hda.c +++ b/sound/pci/hda/cs35l56_hda.c @@ -978,14 +978,14 @@ int cs35l56_hda_common_probe(struct cs35l56_hda *cs35l56, int id) pm_runtime_mark_last_busy(cs35l56->base.dev); pm_runtime_enable(cs35l56->base.dev);
+ cs35l56->base.init_done = true; + ret = component_add(cs35l56->base.dev, &cs35l56_hda_comp_ops); if (ret) { dev_err(cs35l56->base.dev, "Register component failed: %d\n", ret); goto pm_err; }
- cs35l56->base.init_done = true; - return 0;
pm_err:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Zou steven.zou@intel.com
[ Upstream commit 817b18965b58a6e5fb6ce97abf01b03a205a6aea ]
According to the datasheet, the recipe association data is an 8-byte little-endian value. It is described as 'Bitmap of the recipe indexes associated with this profile' and occupies the byte 24 to 31 area in the FW. Therefore, it is defined as '__le64 recipe_assoc' in struct ice_aqc_recipe_to_profile. Then fix the bitmap casting issue, as we must never use casts on the bitmap type.
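As a user-space sketch of why the cast was wrong (purely illustrative; in the kernel the explicit conversion is what bitmap_from_arr64()/bitmap_to_arr64() and cpu_to_le64()/le64_to_cpu() provide):

#include <stdint.h>
#include <stdio.h>

#define BITS_PER_LONG	(8 * sizeof(long))

/* A Linux-style bitmap is an array of unsigned long in CPU byte order,
 * so its in-memory layout depends on word size and endianness.  The
 * firmware field is a fixed 8-byte little-endian value, so the bits must
 * be converted explicitly instead of being cast or memcpy'd. */
static uint64_t bitmap64_to_u64(const unsigned long *bitmap)
{
	uint64_t val = 0;
	unsigned int bit;

	for (bit = 0; bit < 64; bit++)
		if (bitmap[bit / BITS_PER_LONG] & (1UL << (bit % BITS_PER_LONG)))
			val |= (uint64_t)1 << bit;
	return val;	/* still needs cpu_to_le64() before going on the wire */
}

int main(void)
{
	unsigned long recipes[(64 + BITS_PER_LONG - 1) / BITS_PER_LONG] = { 0 };

	recipes[0] |= 1UL << 3;	/* recipe index 3 is associated */
	printf("%016llx\n", (unsigned long long)bitmap64_to_u64(recipes));
	return 0;
}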
Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV on bonded interface") Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Andrii Staikov andrii.staikov@intel.com Reviewed-by: Jan Sokolowski jan.sokolowski@intel.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: Steven Zou steven.zou@intel.com Tested-by: Sujai Buvaneswaran sujai.buvaneswaran@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/intel/ice/ice_adminq_cmd.h | 3 ++- drivers/net/ethernet/intel/ice/ice_lag.c | 4 ++-- drivers/net/ethernet/intel/ice/ice_switch.c | 24 +++++++++++-------- drivers/net/ethernet/intel/ice/ice_switch.h | 4 ++-- 4 files changed, 20 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h index 45f3e351653db..72ca2199c9572 100644 --- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h +++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h @@ -592,8 +592,9 @@ struct ice_aqc_recipe_data_elem { struct ice_aqc_recipe_to_profile { __le16 profile_id; u8 rsvd[6]; - DECLARE_BITMAP(recipe_assoc, ICE_MAX_NUM_RECIPES); + __le64 recipe_assoc; }; +static_assert(sizeof(struct ice_aqc_recipe_to_profile) == 16);
/* Add/Update/Remove/Get switch rules (indirect 0x02A0, 0x02A1, 0x02A2, 0x02A3) */ diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c b/drivers/net/ethernet/intel/ice/ice_lag.c index 23e197c3d02a7..4e675c7c199fa 100644 --- a/drivers/net/ethernet/intel/ice/ice_lag.c +++ b/drivers/net/ethernet/intel/ice/ice_lag.c @@ -2000,14 +2000,14 @@ int ice_init_lag(struct ice_pf *pf) /* associate recipes to profiles */ for (n = 0; n < ICE_PROFID_IPV6_GTPU_IPV6_TCP_INNER; n++) { err = ice_aq_get_recipe_to_profile(&pf->hw, n, - (u8 *)&recipe_bits, NULL); + &recipe_bits, NULL); if (err) continue;
if (recipe_bits & BIT(ICE_SW_LKUP_DFLT)) { recipe_bits |= BIT(lag->pf_recipe); ice_aq_map_recipe_to_profile(&pf->hw, n, - (u8 *)&recipe_bits, NULL); + recipe_bits, NULL); } }
diff --git a/drivers/net/ethernet/intel/ice/ice_switch.c b/drivers/net/ethernet/intel/ice/ice_switch.c index 2f77b684ff765..4c6d58bb2690d 100644 --- a/drivers/net/ethernet/intel/ice/ice_switch.c +++ b/drivers/net/ethernet/intel/ice/ice_switch.c @@ -2032,12 +2032,12 @@ ice_update_recipe_lkup_idx(struct ice_hw *hw, * ice_aq_map_recipe_to_profile - Map recipe to packet profile * @hw: pointer to the HW struct * @profile_id: package profile ID to associate the recipe with - * @r_bitmap: Recipe bitmap filled in and need to be returned as response + * @r_assoc: Recipe bitmap filled in and need to be returned as response * @cd: pointer to command details structure or NULL * Recipe to profile association (0x0291) */ int -ice_aq_map_recipe_to_profile(struct ice_hw *hw, u32 profile_id, u8 *r_bitmap, +ice_aq_map_recipe_to_profile(struct ice_hw *hw, u32 profile_id, u64 r_assoc, struct ice_sq_cd *cd) { struct ice_aqc_recipe_to_profile *cmd; @@ -2049,7 +2049,7 @@ ice_aq_map_recipe_to_profile(struct ice_hw *hw, u32 profile_id, u8 *r_bitmap, /* Set the recipe ID bit in the bitmask to let the device know which * profile we are associating the recipe to */ - memcpy(cmd->recipe_assoc, r_bitmap, sizeof(cmd->recipe_assoc)); + cmd->recipe_assoc = cpu_to_le64(r_assoc);
return ice_aq_send_cmd(hw, &desc, NULL, 0, cd); } @@ -2058,12 +2058,12 @@ ice_aq_map_recipe_to_profile(struct ice_hw *hw, u32 profile_id, u8 *r_bitmap, * ice_aq_get_recipe_to_profile - Map recipe to packet profile * @hw: pointer to the HW struct * @profile_id: package profile ID to associate the recipe with - * @r_bitmap: Recipe bitmap filled in and need to be returned as response + * @r_assoc: Recipe bitmap filled in and need to be returned as response * @cd: pointer to command details structure or NULL * Associate profile ID with given recipe (0x0293) */ int -ice_aq_get_recipe_to_profile(struct ice_hw *hw, u32 profile_id, u8 *r_bitmap, +ice_aq_get_recipe_to_profile(struct ice_hw *hw, u32 profile_id, u64 *r_assoc, struct ice_sq_cd *cd) { struct ice_aqc_recipe_to_profile *cmd; @@ -2076,7 +2076,7 @@ ice_aq_get_recipe_to_profile(struct ice_hw *hw, u32 profile_id, u8 *r_bitmap,
status = ice_aq_send_cmd(hw, &desc, NULL, 0, cd); if (!status) - memcpy(r_bitmap, cmd->recipe_assoc, sizeof(cmd->recipe_assoc)); + *r_assoc = le64_to_cpu(cmd->recipe_assoc);
return status; } @@ -2121,6 +2121,7 @@ int ice_alloc_recipe(struct ice_hw *hw, u16 *rid) static void ice_get_recp_to_prof_map(struct ice_hw *hw) { DECLARE_BITMAP(r_bitmap, ICE_MAX_NUM_RECIPES); + u64 recp_assoc; u16 i;
for (i = 0; i < hw->switch_info->max_used_prof_index + 1; i++) { @@ -2128,8 +2129,9 @@ static void ice_get_recp_to_prof_map(struct ice_hw *hw)
bitmap_zero(profile_to_recipe[i], ICE_MAX_NUM_RECIPES); bitmap_zero(r_bitmap, ICE_MAX_NUM_RECIPES); - if (ice_aq_get_recipe_to_profile(hw, i, (u8 *)r_bitmap, NULL)) + if (ice_aq_get_recipe_to_profile(hw, i, &recp_assoc, NULL)) continue; + bitmap_from_arr64(r_bitmap, &recp_assoc, ICE_MAX_NUM_RECIPES); bitmap_copy(profile_to_recipe[i], r_bitmap, ICE_MAX_NUM_RECIPES); for_each_set_bit(j, r_bitmap, ICE_MAX_NUM_RECIPES) @@ -5431,22 +5433,24 @@ ice_add_adv_recipe(struct ice_hw *hw, struct ice_adv_lkup_elem *lkups, */ list_for_each_entry(fvit, &rm->fv_list, list_entry) { DECLARE_BITMAP(r_bitmap, ICE_MAX_NUM_RECIPES); + u64 recp_assoc; u16 j;
status = ice_aq_get_recipe_to_profile(hw, fvit->profile_id, - (u8 *)r_bitmap, NULL); + &recp_assoc, NULL); if (status) goto err_unroll;
+ bitmap_from_arr64(r_bitmap, &recp_assoc, ICE_MAX_NUM_RECIPES); bitmap_or(r_bitmap, r_bitmap, rm->r_bitmap, ICE_MAX_NUM_RECIPES); status = ice_acquire_change_lock(hw, ICE_RES_WRITE); if (status) goto err_unroll;
+ bitmap_to_arr64(&recp_assoc, r_bitmap, ICE_MAX_NUM_RECIPES); status = ice_aq_map_recipe_to_profile(hw, fvit->profile_id, - (u8 *)r_bitmap, - NULL); + recp_assoc, NULL); ice_release_change_lock(hw);
if (status) diff --git a/drivers/net/ethernet/intel/ice/ice_switch.h b/drivers/net/ethernet/intel/ice/ice_switch.h index db7e501b7e0a4..89ffa1b51b5ad 100644 --- a/drivers/net/ethernet/intel/ice/ice_switch.h +++ b/drivers/net/ethernet/intel/ice/ice_switch.h @@ -424,10 +424,10 @@ int ice_aq_add_recipe(struct ice_hw *hw, struct ice_aqc_recipe_data_elem *s_recipe_list, u16 num_recipes, struct ice_sq_cd *cd); int -ice_aq_get_recipe_to_profile(struct ice_hw *hw, u32 profile_id, u8 *r_bitmap, +ice_aq_get_recipe_to_profile(struct ice_hw *hw, u32 profile_id, u64 *r_assoc, struct ice_sq_cd *cd); int -ice_aq_map_recipe_to_profile(struct ice_hw *hw, u32 profile_id, u8 *r_bitmap, +ice_aq_map_recipe_to_profile(struct ice_hw *hw, u32 profile_id, u64 r_assoc, struct ice_sq_cd *cd);
#endif /* _ICE_SWITCH_H_ */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Swiatkowski michal.swiatkowski@linux.intel.com
[ Upstream commit 5995ef88e3a8c2b014f51256a88be8e336532ce7 ]
Previously only the case when the number of queues decreases was covered. Implement reallocation for the case when the number of queues is higher than before. Use the krealloc() function and zero the newly allocated elements.
It has to be done before ice_vsi_cfg_def(), because the stats elements for the rings are set there.
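The grow path follows the usual "commit only on success" reallocation idiom; a user-space sketch of the idea (the kernel version uses krealloc_array() with __GFP_ZERO, which also zeroes the new tail):

#include <stdlib.h>
#include <string.h>

/* Grow an array of per-ring stats pointers from old_n to new_n entries.
 * Assign through a temporary so a failed reallocation leaves the original
 * array untouched, and zero only the newly added elements. */
static int grow_ring_stats(void ***stats, size_t old_n, size_t new_n)
{
	void **tmp = realloc(*stats, new_n * sizeof(*tmp));

	if (!tmp)
		return -1;	/* *stats is still valid, nothing is leaked */
	memset(tmp + old_n, 0, (new_n - old_n) * sizeof(*tmp));
	*stats = tmp;
	return 0;
}

int main(void)
{
	void **stats = NULL;
	int ret = grow_ring_stats(&stats, 0, 8);

	free(stats);
	return ret;
}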
Reviewed-by: Wojciech Drewek wojciech.drewek@intel.com Signed-off-by: Michal Swiatkowski michal.swiatkowski@linux.intel.com Tested-by: Sujai Buvaneswaran sujai.buvaneswaran@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Stable-dep-of: 1cb7fdb1dfde ("ice: fix memory corruption bug with suspend and rebuild") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_lib.c | 58 ++++++++++++++++-------- 1 file changed, 39 insertions(+), 19 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index 7f4bc110ead44..47298ab675a55 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -3084,27 +3084,26 @@ ice_vsi_rebuild_set_coalesce(struct ice_vsi *vsi, }
/** - * ice_vsi_realloc_stat_arrays - Frees unused stat structures + * ice_vsi_realloc_stat_arrays - Frees unused stat structures or alloc new ones * @vsi: VSI pointer - * @prev_txq: Number of Tx rings before ring reallocation - * @prev_rxq: Number of Rx rings before ring reallocation */ -static void -ice_vsi_realloc_stat_arrays(struct ice_vsi *vsi, int prev_txq, int prev_rxq) +static int +ice_vsi_realloc_stat_arrays(struct ice_vsi *vsi) { + u16 req_txq = vsi->req_txq ? vsi->req_txq : vsi->alloc_txq; + u16 req_rxq = vsi->req_rxq ? vsi->req_rxq : vsi->alloc_rxq; + struct ice_ring_stats **tx_ring_stats; + struct ice_ring_stats **rx_ring_stats; struct ice_vsi_stats *vsi_stat; struct ice_pf *pf = vsi->back; + u16 prev_txq = vsi->alloc_txq; + u16 prev_rxq = vsi->alloc_rxq; int i;
- if (!prev_txq || !prev_rxq) - return; - if (vsi->type == ICE_VSI_CHNL) - return; - vsi_stat = pf->vsi_stats[vsi->idx];
- if (vsi->num_txq < prev_txq) { - for (i = vsi->num_txq; i < prev_txq; i++) { + if (req_txq < prev_txq) { + for (i = req_txq; i < prev_txq; i++) { if (vsi_stat->tx_ring_stats[i]) { kfree_rcu(vsi_stat->tx_ring_stats[i], rcu); WRITE_ONCE(vsi_stat->tx_ring_stats[i], NULL); @@ -3112,14 +3111,36 @@ ice_vsi_realloc_stat_arrays(struct ice_vsi *vsi, int prev_txq, int prev_rxq) } }
- if (vsi->num_rxq < prev_rxq) { - for (i = vsi->num_rxq; i < prev_rxq; i++) { + tx_ring_stats = vsi_stat->rx_ring_stats; + vsi_stat->tx_ring_stats = + krealloc_array(vsi_stat->tx_ring_stats, req_txq, + sizeof(*vsi_stat->tx_ring_stats), + GFP_KERNEL | __GFP_ZERO); + if (!vsi_stat->tx_ring_stats) { + vsi_stat->tx_ring_stats = tx_ring_stats; + return -ENOMEM; + } + + if (req_rxq < prev_rxq) { + for (i = req_rxq; i < prev_rxq; i++) { if (vsi_stat->rx_ring_stats[i]) { kfree_rcu(vsi_stat->rx_ring_stats[i], rcu); WRITE_ONCE(vsi_stat->rx_ring_stats[i], NULL); } } } + + rx_ring_stats = vsi_stat->rx_ring_stats; + vsi_stat->rx_ring_stats = + krealloc_array(vsi_stat->rx_ring_stats, req_rxq, + sizeof(*vsi_stat->rx_ring_stats), + GFP_KERNEL | __GFP_ZERO); + if (!vsi_stat->rx_ring_stats) { + vsi_stat->rx_ring_stats = rx_ring_stats; + return -ENOMEM; + } + + return 0; }
/** @@ -3136,9 +3157,9 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags) { struct ice_vsi_cfg_params params = {}; struct ice_coalesce_stored *coalesce; - int ret, prev_txq, prev_rxq; int prev_num_q_vectors = 0; struct ice_pf *pf; + int ret;
if (!vsi) return -EINVAL; @@ -3157,8 +3178,9 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
prev_num_q_vectors = ice_vsi_rebuild_get_coalesce(vsi, coalesce);
- prev_txq = vsi->num_txq; - prev_rxq = vsi->num_rxq; + ret = ice_vsi_realloc_stat_arrays(vsi); + if (ret) + goto err_vsi_cfg;
ice_vsi_decfg(vsi); ret = ice_vsi_cfg_def(vsi, ¶ms); @@ -3176,8 +3198,6 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags) return ice_schedule_reset(pf, ICE_RESET_PFR); }
- ice_vsi_realloc_stat_arrays(vsi, prev_txq, prev_rxq); - ice_vsi_rebuild_set_coalesce(vsi, coalesce, prev_num_q_vectors); kfree(coalesce);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jesse Brandeburg jesse.brandeburg@intel.com
[ Upstream commit 1cb7fdb1dfde1aab66780b4ba44dba6402172111 ]
The ice driver would previously panic after suspend. This is caused by the driver *only* calling the ice_vsi_free_q_vectors() function by itself when it is suspending. Since commit b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload") the driver has zeroed out num_q_vectors, and only restored it in ice_vsi_cfg_def().
This further causes the ice_rebuild() function to allocate a zero-length buffer, after which num_q_vectors is updated, and then the new value of num_q_vectors is used to index into the zero-length buffer, which corrupts memory.
The fix entails making sure all the code referencing num_q_vectors only does so after it has been set again via ice_vsi_cfg_def().
I didn't perform a full bisect, but I was able to test against the 6.1.77 kernel and that ice driver works fine for suspend/resume with no panic, so the problem was introduced sometime since then.
Also clean up an unneeded init of a local variable in the function being modified.
PANIC from 6.8.0-rc1:
[1026674.915596] PM: suspend exit [1026675.664697] ice 0000:17:00.1: PTP reset successful [1026675.664707] ice 0000:17:00.1: 2755 msecs passed between update to cached PHC time [1026675.667660] ice 0000:b1:00.0: PTP reset successful [1026675.675944] ice 0000:b1:00.0: 2832 msecs passed between update to cached PHC time [1026677.137733] ixgbe 0000:31:00.0 ens787: NIC Link is Up 1 Gbps, Flow Control: None [1026677.190201] BUG: kernel NULL pointer dereference, address: 0000000000000010 [1026677.192753] ice 0000:17:00.0: PTP reset successful [1026677.192764] ice 0000:17:00.0: 4548 msecs passed between update to cached PHC time [1026677.197928] #PF: supervisor read access in kernel mode [1026677.197933] #PF: error_code(0x0000) - not-present page [1026677.197937] PGD 1557a7067 P4D 0 [1026677.212133] ice 0000:b1:00.1: PTP reset successful [1026677.212143] ice 0000:b1:00.1: 4344 msecs passed between update to cached PHC time [1026677.212575] [1026677.243142] Oops: 0000 [#1] PREEMPT SMP NOPTI [1026677.247918] CPU: 23 PID: 42790 Comm: kworker/23:0 Kdump: loaded Tainted: G W 6.8.0-rc1+ #1 [1026677.257989] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022 [1026677.269367] Workqueue: ice ice_service_task [ice] [1026677.274592] RIP: 0010:ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice] [1026677.281421] Code: 0f 84 3a ff ff ff 41 0f b7 74 ec 02 66 89 b0 22 02 00 00 81 e6 ff 1f 00 00 e8 ec fd ff ff e9 35 ff ff ff 48 8b 43 30 49 63 ed <41> 0f b7 34 24 41 83 c5 01 48 8b 3c e8 66 89 b7 aa 02 00 00 81 e6 [1026677.300877] RSP: 0018:ff3be62a6399bcc0 EFLAGS: 00010202 [1026677.306556] RAX: ff28691e28980828 RBX: ff28691e41099828 RCX: 0000000000188000 [1026677.314148] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ff28691e41099828 [1026677.321730] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [1026677.329311] R10: 0000000000000007 R11: ffffffffffffffc0 R12: 0000000000000010 [1026677.336896] R13: 0000000000000000 R14: 0000000000000000 R15: ff28691e0eaa81a0 [1026677.344472] FS: 0000000000000000(0000) GS:ff28693cbffc0000(0000) knlGS:0000000000000000 [1026677.353000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1026677.359195] CR2: 0000000000000010 CR3: 0000000128df4001 CR4: 0000000000771ef0 [1026677.366779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1026677.374369] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [1026677.381952] PKRU: 55555554 [1026677.385116] Call Trace: [1026677.388023] <TASK> [1026677.390589] ? __die+0x20/0x70 [1026677.394105] ? page_fault_oops+0x82/0x160 [1026677.398576] ? do_user_addr_fault+0x65/0x6a0 [1026677.403307] ? exc_page_fault+0x6a/0x150 [1026677.407694] ? asm_exc_page_fault+0x22/0x30 [1026677.412349] ? ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice] [1026677.418614] ice_vsi_rebuild+0x34b/0x3c0 [ice] [1026677.423583] ice_vsi_rebuild_by_type+0x76/0x180 [ice] [1026677.429147] ice_rebuild+0x18b/0x520 [ice] [1026677.433746] ? delay_tsc+0x8f/0xc0 [1026677.437630] ice_do_reset+0xa3/0x190 [ice] [1026677.442231] ice_service_task+0x26/0x440 [ice] [1026677.447180] process_one_work+0x174/0x340 [1026677.451669] worker_thread+0x27e/0x390 [1026677.455890] ? __pfx_worker_thread+0x10/0x10 [1026677.460627] kthread+0xee/0x120 [1026677.464235] ? __pfx_kthread+0x10/0x10 [1026677.468445] ret_from_fork+0x2d/0x50 [1026677.472476] ? __pfx_kthread+0x10/0x10 [1026677.476671] ret_from_fork_asm+0x1b/0x30 [1026677.481050] </TASK>
Fixes: b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload") Reported-by: Robert Elliott elliott@hpe.com Signed-off-by: Jesse Brandeburg jesse.brandeburg@intel.com Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_lib.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index 47298ab675a55..0b7132a42e359 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -3157,7 +3157,7 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags) { struct ice_vsi_cfg_params params = {}; struct ice_coalesce_stored *coalesce; - int prev_num_q_vectors = 0; + int prev_num_q_vectors; struct ice_pf *pf; int ret;
@@ -3171,13 +3171,6 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags) if (WARN_ON(vsi->type == ICE_VSI_VF && !vsi->vf)) return -EINVAL;
- coalesce = kcalloc(vsi->num_q_vectors, - sizeof(struct ice_coalesce_stored), GFP_KERNEL); - if (!coalesce) - return -ENOMEM; - - prev_num_q_vectors = ice_vsi_rebuild_get_coalesce(vsi, coalesce); - ret = ice_vsi_realloc_stat_arrays(vsi); if (ret) goto err_vsi_cfg; @@ -3187,6 +3180,13 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags) if (ret) goto err_vsi_cfg;
+ coalesce = kcalloc(vsi->num_q_vectors, + sizeof(struct ice_coalesce_stored), GFP_KERNEL); + if (!coalesce) + return -ENOMEM; + + prev_num_q_vectors = ice_vsi_rebuild_get_coalesce(vsi, coalesce); + ret = ice_vsi_cfg_tc_lan(pf, vsi); if (ret) { if (vsi_flags & ICE_VSI_FLAG_INIT) { @@ -3205,8 +3205,8 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
err_vsi_cfg_tc_lan: ice_vsi_decfg(vsi); -err_vsi_cfg: kfree(coalesce); +err_vsi_cfg: return ret; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Przemek Kitszel przemyslaw.kitszel@intel.com
[ Upstream commit aec806fb4afba5fe80b09e29351379a4292baa43 ]
Change kzalloc() flags used in ixgbe_ipsec_vf_add_sa() to GFP_ATOMIC, to avoid sleeping in IRQ context.
Dan Carpenter, with the help of Smatch, has found the following issue:
The patch eda0333ac293: "ixgbe: add VF IPsec management" from Aug 13, 2018 (linux-next), leads to the following Smatch static checker warning:
	drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c:917 ixgbe_ipsec_vf_add_sa()
	warn: sleeping in IRQ context
The call tree that Smatch is worried about is:
ixgbe_msix_other() <- IRQ handler
-> ixgbe_msg_task()
   -> ixgbe_rcv_msg_from_vf()
      -> ixgbe_ipsec_vf_add_sa()
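The rule being applied, as a kernel-style sketch (not the driver code; the helper name is made up):

#include <linux/slab.h>

/* This allocation can be reached from the MSI-X "other" interrupt
 * handler, i.e. a context that must not sleep.  GFP_KERNEL may sleep to
 * reclaim memory and is only valid in process context; GFP_ATOMIC never
 * sleeps but can fail under memory pressure, so callers must handle
 * NULL. */
static void *alloc_on_irq_path(size_t len)
{
	return kzalloc(len, GFP_ATOMIC);
}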
Fixes: eda0333ac293 ("ixgbe: add VF IPsec management") Reported-by: Dan Carpenter dan.carpenter@linaro.org Link: https://lore.kernel.org/intel-wired-lan/db31a0b0-4d9f-4e6b-aed8-88266eb5665c... Reviewed-by: Michal Kubiak michal.kubiak@intel.com Signed-off-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Shannon Nelson shannon.nelson@amd.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c index 13a6fca31004a..866024f2b9eeb 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c @@ -914,7 +914,13 @@ int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf) goto err_out; }
- xs = kzalloc(sizeof(*xs), GFP_KERNEL); + algo = xfrm_aead_get_byname(aes_gcm_name, IXGBE_IPSEC_AUTH_BITS, 1); + if (unlikely(!algo)) { + err = -ENOENT; + goto err_out; + } + + xs = kzalloc(sizeof(*xs), GFP_ATOMIC); if (unlikely(!xs)) { err = -ENOMEM; goto err_out; @@ -930,14 +936,8 @@ int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf) memcpy(&xs->id.daddr.a4, sam->addr, sizeof(xs->id.daddr.a4)); xs->xso.dev = adapter->netdev;
- algo = xfrm_aead_get_byname(aes_gcm_name, IXGBE_IPSEC_AUTH_BITS, 1); - if (unlikely(!algo)) { - err = -ENOENT; - goto err_xs; - } - aead_len = sizeof(*xs->aead) + IXGBE_IPSEC_KEY_BITS / 8; - xs->aead = kzalloc(aead_len, GFP_KERNEL); + xs->aead = kzalloc(aead_len, GFP_ATOMIC); if (unlikely(!xs->aead)) { err = -ENOMEM; goto err_xs;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kurt Kanzenbach kurt@linutronix.de
[ Upstream commit 47ce2956c7a61ff354723e28235205fa2012265b ]
The initial igc Tx timestamping implementation used only one register for retrieving Tx timestamps. Commit 3ed247e78911 ("igc: Add support for multiple in-flight TX timestamps") added support for utilizing all four of them, e.g. for multiple domain support. Remove the stale comment/FIXME.
Fixes: 3ed247e78911 ("igc: Add support for multiple in-flight TX timestamps") Signed-off-by: Kurt Kanzenbach kurt@linutronix.de Acked-by: Vinicius Costa Gomes vinicius.gomes@intel.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Tested-by: Naama Meir naamax.meir@linux.intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igc/igc_main.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index fc1de116d5548..e83700ad7e622 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -1640,10 +1640,6 @@ static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
if (unlikely(test_bit(IGC_RING_FLAG_TX_HWTSTAMP, &tx_ring->flags) && skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)) { - /* FIXME: add support for retrieving timestamps from - * the other timer registers before skipping the - * timestamping request. - */ unsigned long flags; u32 tstamp_flags;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandra Winter wintera@linux.ibm.com
[ Upstream commit afb373ff3f54c9d909efc7f810dc80a9742807b2 ]
The IO subsystem expects a driver to retry a ccw_device_start when the subsequent interrupt response block (IRB) contains a deferred condition code 1.
Symptoms before this commit: On the read channel we always trigger the next read anyhow, so no different behaviour here. On the write channel we may experience timeout errors, because the expected reply will never be received without the retry. Other callers of qeth_send_control_data() may wrongly assume that the ccw was successful, which may cause problems later.
Note that since commit 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers") and commit 5ef1dc40ffa6 ("s390/cio: fix invalid -EBUSY on ccw_device_start") deferred CC1s are much more likely to occur. See the commit message of the latter for more background information.
Fixes: 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers") Signed-off-by: Alexandra Winter wintera@linux.ibm.com Co-developed-by: Thorsten Winkler twinkler@linux.ibm.com Signed-off-by: Thorsten Winkler twinkler@linux.ibm.com Reviewed-by: Peter Oberparleiter oberpar@linux.ibm.com Link: https://lore.kernel.org/r/20240321115337.3564694-1-wintera@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/s390/net/qeth_core_main.c | 38 +++++++++++++++++++++++++++++-- 1 file changed, 36 insertions(+), 2 deletions(-)
diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index cd783290bde5e..1148b4ecabdde 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -1179,6 +1179,20 @@ static int qeth_check_irb_error(struct qeth_card *card, struct ccw_device *cdev, } }
+/** + * qeth_irq() - qeth interrupt handler + * @cdev: ccw device + * @intparm: expect pointer to iob + * @irb: Interruption Response Block + * + * In the good path: + * corresponding qeth channel is locked with last used iob as active_cmd. + * But this function is also called for error interrupts. + * + * Caller ensures that: + * Interrupts are disabled; ccw device lock is held; + * + */ static void qeth_irq(struct ccw_device *cdev, unsigned long intparm, struct irb *irb) { @@ -1220,11 +1234,10 @@ static void qeth_irq(struct ccw_device *cdev, unsigned long intparm, iob = (struct qeth_cmd_buffer *) (addr_t)intparm; }
- qeth_unlock_channel(card, channel); - rc = qeth_check_irb_error(card, cdev, irb); if (rc) { /* IO was terminated, free its resources. */ + qeth_unlock_channel(card, channel); if (iob) qeth_cancel_cmd(iob, rc); return; @@ -1268,6 +1281,7 @@ static void qeth_irq(struct ccw_device *cdev, unsigned long intparm, rc = qeth_get_problem(card, cdev, irb); if (rc) { card->read_or_write_problem = 1; + qeth_unlock_channel(card, channel); if (iob) qeth_cancel_cmd(iob, rc); qeth_clear_ipacmd_list(card); @@ -1276,6 +1290,26 @@ static void qeth_irq(struct ccw_device *cdev, unsigned long intparm, } }
+ if (scsw_cmd_is_valid_cc(&irb->scsw) && irb->scsw.cmd.cc == 1 && iob) { + /* channel command hasn't started: retry. + * active_cmd is still set to last iob + */ + QETH_CARD_TEXT(card, 2, "irqcc1"); + rc = ccw_device_start_timeout(cdev, __ccw_from_cmd(iob), + (addr_t)iob, 0, 0, iob->timeout); + if (rc) { + QETH_DBF_MESSAGE(2, + "ccw retry on %x failed, rc = %i\n", + CARD_DEVID(card), rc); + QETH_CARD_TEXT_(card, 2, " err%d", rc); + qeth_unlock_channel(card, channel); + qeth_cancel_cmd(iob, rc); + } + return; + } + + qeth_unlock_channel(card, channel); + if (iob) { /* sanity check: */ if (irb->scsw.cmd.count > iob->length) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ravi Gunasekaran r-gunasekaran@ti.com
[ Upstream commit b11c81731c810efe592e510bb0110e0db6877419 ]
commit e748d0fd66ab ("net: hsr: Disable promiscuous mode in offload mode") disables promiscuous mode on the slave devices while creating an HSR interface. But while deleting the HSR interface, it does not take this into account: it decreases the promiscuous mode count, which eventually re-enables promiscuous mode on the slave devices when the HSR interface is created again.
Fix this by not decrementing the promiscuous mode count while deleting the HSR interface when offload is enabled.
Fixes: e748d0fd66ab ("net: hsr: Disable promiscuous mode in offload mode") Signed-off-by: Ravi Gunasekaran r-gunasekaran@ti.com Reviewed-by: Jiri Pirko jiri@nvidia.com Link: https://lore.kernel.org/r/20240322100447.27615-1-r-gunasekaran@ti.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/hsr/hsr_slave.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/hsr/hsr_slave.c b/net/hsr/hsr_slave.c index e5742f2a2d522..1b6457f357bdb 100644 --- a/net/hsr/hsr_slave.c +++ b/net/hsr/hsr_slave.c @@ -220,7 +220,8 @@ void hsr_del_port(struct hsr_port *port) netdev_update_features(master->dev); dev_set_mtu(master->dev, hsr_get_max_mtu(hsr)); netdev_rx_handler_unregister(port->dev); - dev_set_promiscuity(port->dev, -1); + if (!port->hsr->fwd_offloaded) + dev_set_promiscuity(port->dev, -1); netdev_upper_dev_unlink(port->dev, master->dev); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 151c9c724d05d5b0dd8acd3e11cb69ef1f2dbada ]
We had various syzbot reports about tcp timers firing after the corresponding netns has been dismantled.
Fortunately Josef Bacik could trigger the issue more often, and could test a patch I wrote two years ago.
When TCP sockets are closed, we call inet_csk_clear_xmit_timers() to 'stop' the timers.
inet_csk_clear_xmit_timers() can be called from any context, including when the socket lock is held. This is the reason it uses sk_stop_timer(), aka del_timer(). This means that ongoing timers might finish much later.
For user sockets, this is fine because each running timer holds a reference on the socket, and the user socket holds a reference on the netns.
For kernel sockets, we risk that the netns is freed before timer can complete, because kernel sockets do not hold reference on the netns.
This patch adds an inet_csk_clear_xmit_timers_sync() function that uses sk_stop_timer_sync() to make sure all timers are terminated before the kernel socket is released. Modules using kernel sockets close them in their netns exit() handler.
Also add a sock_not_owned_by_me() helper to get LOCKDEP support: inet_csk_clear_xmit_timers_sync() must not be called while the socket lock is held.
It is very possible we can revert commit 3a58f13a881e ("net: rds: acquire refcount on TCP sockets") in the future, which attempted to solve the issue in rds only. (net/smc/af_smc.c and net/mptcp/subflow.c have similar code.)
We probably can remove the check_net() tests from tcp_out_of_resources() and __tcp_close() in the future.
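To illustrate the synchronous-stop requirement, a kernel-style sketch (not the patch itself) of the difference between the two helpers:

#include <net/sock.h>

/* sk_stop_timer() only dequeues the timer (del_timer()); a handler that
 * is already running on another CPU may still be mid-flight and can still
 * dereference the netns.  sk_stop_timer_sync() (del_timer_sync()) also
 * waits for such a handler to finish, so the kernel socket -- and with it
 * the netns -- may be released afterwards.  It must be called without the
 * socket lock held, because the timer handlers take that lock. */
static void stop_timer_before_release(struct sock *sk, struct timer_list *t)
{
	sk_stop_timer_sync(sk, t);
}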
Reported-by: Josef Bacik josef@toxicpanda.com Closes: https://lore.kernel.org/netdev/20240314210740.GA2823176@perftesting/ Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.") Fixes: 8a68173691f0 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket") Link: https://lore.kernel.org/bpf/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsa... Signed-off-by: Eric Dumazet edumazet@google.com Tested-by: Josef Bacik josef@toxicpanda.com Cc: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Link: https://lore.kernel.org/r/20240322135732.1535772-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/inet_connection_sock.h | 1 + include/net/sock.h | 7 +++++++ net/ipv4/inet_connection_sock.c | 14 ++++++++++++++ net/ipv4/tcp.c | 2 ++ 4 files changed, 24 insertions(+)
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index 01a73bf74fa19..6ecac01115d9c 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -173,6 +173,7 @@ void inet_csk_init_xmit_timers(struct sock *sk, void (*delack_handler)(struct timer_list *), void (*keepalive_handler)(struct timer_list *)); void inet_csk_clear_xmit_timers(struct sock *sk); +void inet_csk_clear_xmit_timers_sync(struct sock *sk);
static inline void inet_csk_schedule_ack(struct sock *sk) { diff --git a/include/net/sock.h b/include/net/sock.h index e70c903b04f30..25780942ec8bf 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1808,6 +1808,13 @@ static inline void sock_owned_by_me(const struct sock *sk) #endif }
+static inline void sock_not_owned_by_me(const struct sock *sk) +{ +#ifdef CONFIG_LOCKDEP + WARN_ON_ONCE(lockdep_sock_is_held(sk) && debug_locks); +#endif +} + static inline bool sock_owned_by_user(const struct sock *sk) { sock_owned_by_me(sk); diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 762817d6c8d70..a587cb6be807c 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -774,6 +774,20 @@ void inet_csk_clear_xmit_timers(struct sock *sk) } EXPORT_SYMBOL(inet_csk_clear_xmit_timers);
+void inet_csk_clear_xmit_timers_sync(struct sock *sk) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + /* ongoing timer handlers need to acquire socket lock. */ + sock_not_owned_by_me(sk); + + icsk->icsk_pending = icsk->icsk_ack.pending = 0; + + sk_stop_timer_sync(sk, &icsk->icsk_retransmit_timer); + sk_stop_timer_sync(sk, &icsk->icsk_delack_timer); + sk_stop_timer_sync(sk, &sk->sk_timer); +} + void inet_csk_delete_keepalive_timer(struct sock *sk) { sk_stop_timer(sk, &sk->sk_timer); diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 68bb8d6bcc113..f8df35f7352a5 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2931,6 +2931,8 @@ void tcp_close(struct sock *sk, long timeout) lock_sock(sk); __tcp_close(sk, timeout); release_sock(sk); + if (!sk->sk_net_refcnt) + inet_csk_clear_xmit_timers_sync(sk); sock_put(sk); } EXPORT_SYMBOL(tcp_close);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bjørn Mork bjorn@mork.no
[ Upstream commit 7d5a7dd5a35876f0ecc286f3602a88887a788217 ]
Some of the registers are aligned on a 32bit boundary, causing alignment faults on 64bit platforms.
Unable to handle kernel paging request at virtual address ffffffc084a1d004 Mem abort info: ESR = 0x0000000096000061 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x21: alignment fault Data abort info: ISV = 0, ISS = 0x00000061, ISS2 = 0x00000000 CM = 0, WnR = 1, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000046ad6000 [ffffffc084a1d004] pgd=100000013ffff003, p4d=100000013ffff003, pud=100000013ffff003, pmd=0068000020a00711 Internal error: Oops: 0000000096000061 [#1] SMP Modules linked in: mtk_t7xx(+) qcserial pppoe ppp_async option nft_fib_inet nf_flow_table_inet mt7921u(O) mt7921s(O) mt7921e(O) mt7921_common(O) iwlmvm(O) iwldvm(O) usb_wwan rndis_host qmi_wwan pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7996e(O) mt792x_usb(O) mt792x_lib(O) mt7915e(O) mt76_usb(O) mt76_sdio(O) mt76_connac_lib(O) mt76(O) mac80211(O) iwlwifi(O) huawei_cdc_ncm cfg80211(O) cdc_ncm cdc_ether wwan usbserial usbnet slhc sfp rtc_pcf8563 nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 mt6577_auxadc mdio_i2c libcrc32c compat(O) cdc_wdm cdc_acm at24 crypto_safexcel pwm_fan i2c_gpio i2c_smbus industrialio i2c_algo_bit i2c_mux_reg i2c_mux_pca954x i2c_mux_pca9541 i2c_mux_gpio i2c_mux dummy oid_registry tun sha512_arm64 sha1_ce sha1_generic seqiv md5 geniv des_generic libdes cbc authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd nvme nvme_core gpio_button_hotplug(O) dm_mirror dm_region_hash dm_log dm_crypt dm_mod dax usbcore usb_common ptp aquantia pps_core mii tpm encrypted_keys trusted CPU: 3 PID: 5266 Comm: kworker/u9:1 Tainted: G O 6.6.22 #0 Hardware name: Bananapi BPI-R4 (DT) Workqueue: md_hk_wq t7xx_fsm_uninit [mtk_t7xx] pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : t7xx_cldma_hw_set_start_addr+0x1c/0x3c [mtk_t7xx] lr : t7xx_cldma_start+0xac/0x13c [mtk_t7xx] sp : ffffffc085d63d30 x29: ffffffc085d63d30 x28: 0000000000000000 x27: 0000000000000000 x26: 0000000000000000 x25: ffffff80c804f2c0 x24: ffffff80ca196c05 x23: 0000000000000000 x22: ffffff80c814b9b8 x21: ffffff80c814b128 x20: 0000000000000001 x19: ffffff80c814b080 x18: 0000000000000014 x17: 0000000055c9806b x16: 000000007c5296d0 x15: 000000000f6bca68 x14: 00000000dbdbdce4 x13: 000000001aeaf72a x12: 0000000000000001 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : ffffff80ca1ef6b4 x7 : ffffff80c814b818 x6 : 0000000000000018 x5 : 0000000000000870 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 000000010a947000 x1 : ffffffc084a1d004 x0 : ffffffc084a1d004 Call trace: t7xx_cldma_hw_set_start_addr+0x1c/0x3c [mtk_t7xx] t7xx_fsm_uninit+0x578/0x5ec [mtk_t7xx] process_one_work+0x154/0x2a0 worker_thread+0x2ac/0x488 kthread+0xe0/0xec ret_from_fork+0x10/0x20 Code: f9400800 91001000 8b214001 d50332bf (f9000022) ---[ end trace 0000000000000000 ]---
The inclusion of io-64-nonatomic-lo-hi.h indicates that all 64bit accesses can be replaced by pairs of nonatomic 32bit accesses. Fix the alignment faults by forcing all accesses to be 32bit on 64bit platforms.
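Roughly what the _lo_hi accessors do, shown as a user-space sketch (illustrative; the real helpers come from io-64-nonatomic-lo-hi.h and operate on __iomem pointers):

#include <stdint.h>

/* Split a 64-bit MMIO access into two naturally aligned 32-bit accesses,
 * low word first, so no 64-bit-aligned access is ever issued towards a
 * register that is only 32-bit aligned. */
static void write64_lo_hi(uint64_t val, volatile uint32_t *reg)
{
	reg[0] = (uint32_t)val;		/* low 32 bits first */
	reg[1] = (uint32_t)(val >> 32);	/* then the high 32 bits */
}

static uint64_t read64_lo_hi(const volatile uint32_t *reg)
{
	uint32_t lo = reg[0];
	uint32_t hi = reg[1];

	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	uint32_t regs[2];

	write64_lo_hi(0x0123456789abcdefULL, regs);
	return read64_lo_hi(regs) == 0x0123456789abcdefULL ? 0 : 1;
}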
Link: https://forum.openwrt.org/t/fibocom-fm350-gl-support/142682/72 Fixes: 39d439047f1d ("net: wwan: t7xx: Add control DMA interface") Signed-off-by: Bjørn Mork bjorn@mork.no Reviewed-by: Sergey Ryazanov ryazanov.s.a@gmail.com Tested-by: Liviu Dudau liviu@dudau.co.uk Link: https://lore.kernel.org/r/20240322144000.1683822-1-bjorn@mork.no Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wwan/t7xx/t7xx_cldma.c | 4 ++-- drivers/net/wwan/t7xx/t7xx_hif_cldma.c | 9 +++++---- drivers/net/wwan/t7xx/t7xx_pcie_mac.c | 8 ++++---- 3 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/drivers/net/wwan/t7xx/t7xx_cldma.c b/drivers/net/wwan/t7xx/t7xx_cldma.c index 9f43f256db1d0..f0a4783baf1f3 100644 --- a/drivers/net/wwan/t7xx/t7xx_cldma.c +++ b/drivers/net/wwan/t7xx/t7xx_cldma.c @@ -106,7 +106,7 @@ bool t7xx_cldma_tx_addr_is_set(struct t7xx_cldma_hw *hw_info, unsigned int qno) { u32 offset = REG_CLDMA_UL_START_ADDRL_0 + qno * ADDR_SIZE;
- return ioread64(hw_info->ap_pdn_base + offset); + return ioread64_lo_hi(hw_info->ap_pdn_base + offset); }
void t7xx_cldma_hw_set_start_addr(struct t7xx_cldma_hw *hw_info, unsigned int qno, u64 address, @@ -117,7 +117,7 @@ void t7xx_cldma_hw_set_start_addr(struct t7xx_cldma_hw *hw_info, unsigned int qn
reg = tx_rx == MTK_RX ? hw_info->ap_ao_base + REG_CLDMA_DL_START_ADDRL_0 : hw_info->ap_pdn_base + REG_CLDMA_UL_START_ADDRL_0; - iowrite64(address, reg + offset); + iowrite64_lo_hi(address, reg + offset); }
void t7xx_cldma_hw_resume_queue(struct t7xx_cldma_hw *hw_info, unsigned int qno, diff --git a/drivers/net/wwan/t7xx/t7xx_hif_cldma.c b/drivers/net/wwan/t7xx/t7xx_hif_cldma.c index cc70360364b7d..554ba4669cc8d 100644 --- a/drivers/net/wwan/t7xx/t7xx_hif_cldma.c +++ b/drivers/net/wwan/t7xx/t7xx_hif_cldma.c @@ -139,8 +139,9 @@ static int t7xx_cldma_gpd_rx_from_q(struct cldma_queue *queue, int budget, bool return -ENODEV; }
- gpd_addr = ioread64(hw_info->ap_pdn_base + REG_CLDMA_DL_CURRENT_ADDRL_0 + - queue->index * sizeof(u64)); + gpd_addr = ioread64_lo_hi(hw_info->ap_pdn_base + + REG_CLDMA_DL_CURRENT_ADDRL_0 + + queue->index * sizeof(u64)); if (req->gpd_addr == gpd_addr || hwo_polling_count++ >= 100) return 0;
@@ -318,8 +319,8 @@ static void t7xx_cldma_txq_empty_hndl(struct cldma_queue *queue) struct t7xx_cldma_hw *hw_info = &md_ctrl->hw_info;
/* Check current processing TGPD, 64-bit address is in a table by Q index */ - ul_curr_addr = ioread64(hw_info->ap_pdn_base + REG_CLDMA_UL_CURRENT_ADDRL_0 + - queue->index * sizeof(u64)); + ul_curr_addr = ioread64_lo_hi(hw_info->ap_pdn_base + REG_CLDMA_UL_CURRENT_ADDRL_0 + + queue->index * sizeof(u64)); if (req->gpd_addr != ul_curr_addr) { spin_unlock_irqrestore(&md_ctrl->cldma_lock, flags); dev_err(md_ctrl->dev, "CLDMA%d queue %d is not empty\n", diff --git a/drivers/net/wwan/t7xx/t7xx_pcie_mac.c b/drivers/net/wwan/t7xx/t7xx_pcie_mac.c index 76da4c15e3de1..f071ec7ff23d5 100644 --- a/drivers/net/wwan/t7xx/t7xx_pcie_mac.c +++ b/drivers/net/wwan/t7xx/t7xx_pcie_mac.c @@ -75,7 +75,7 @@ static void t7xx_pcie_mac_atr_tables_dis(void __iomem *pbase, enum t7xx_atr_src_ for (i = 0; i < ATR_TABLE_NUM_PER_ATR; i++) { offset = ATR_PORT_OFFSET * port + ATR_TABLE_OFFSET * i; reg = pbase + ATR_PCIE_WIN0_T0_ATR_PARAM_SRC_ADDR + offset; - iowrite64(0, reg); + iowrite64_lo_hi(0, reg); } }
@@ -112,17 +112,17 @@ static int t7xx_pcie_mac_atr_cfg(struct t7xx_pci_dev *t7xx_dev, struct t7xx_atr_
reg = pbase + ATR_PCIE_WIN0_T0_TRSL_ADDR + offset; value = cfg->trsl_addr & ATR_PCIE_WIN0_ADDR_ALGMT; - iowrite64(value, reg); + iowrite64_lo_hi(value, reg);
reg = pbase + ATR_PCIE_WIN0_T0_TRSL_PARAM + offset; iowrite32(cfg->trsl_id, reg);
reg = pbase + ATR_PCIE_WIN0_T0_ATR_PARAM_SRC_ADDR + offset; value = (cfg->src_addr & ATR_PCIE_WIN0_ADDR_ALGMT) | (atr_size << 1) | BIT(0); - iowrite64(value, reg); + iowrite64_lo_hi(value, reg);
/* Ensure ATR is set */ - ioread64(reg); + ioread64_lo_hi(reg); return 0; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit f1425529c33def8b46faae4400dd9e2bbaf16a05 ]
Locally generated IP multicast packets (such as the ones used in the test) do not perform routing and simply egress the bound device.
However, as explained in commit 8bcfb4ae4d97 ("selftests: forwarding: Fix failing tests with old libnet"), old versions of libnet (used by mausezahn) do not use the "SO_BINDTODEVICE" socket option. Specifically, the library started using the option for IPv6 sockets in version 1.1.6 and for IPv4 sockets in version 1.2. This explains why on Ubuntu - which uses version 1.1.6 - the IPv4 overlay tests are failing whereas the IPv6 ones are passing.
Fix this by specifying the source and destination MAC of the packets, which causes mausezahn to use a packet socket instead of an IP socket.
Fixes: 62199e3f1658 ("selftests: net: Add VXLAN MDB test") Reported-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Closes: https://lore.kernel.org/netdev/5bb50349-196d-4892-8ed2-f37543aa863f@alu.uniz... Tested-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Signed-off-by: Ido Schimmel idosch@nvidia.com Link: https://lore.kernel.org/r/20240325075030.2379513-1-idosch@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/test_vxlan_mdb.sh | 205 +++++++++++------- 1 file changed, 128 insertions(+), 77 deletions(-)
diff --git a/tools/testing/selftests/net/test_vxlan_mdb.sh b/tools/testing/selftests/net/test_vxlan_mdb.sh index 31e5f0f8859d1..be8e66abc74e1 100755 --- a/tools/testing/selftests/net/test_vxlan_mdb.sh +++ b/tools/testing/selftests/net/test_vxlan_mdb.sh @@ -984,6 +984,7 @@ encap_params_common() local plen=$1; shift local enc_ethtype=$1; shift local grp=$1; shift + local grp_dmac=$1; shift local src=$1; shift local mz=$1; shift
@@ -1002,11 +1003,11 @@ encap_params_common() run_cmd "bridge -n $ns1 mdb replace dev vx0 port vx0 grp $grp permanent dst $vtep2_ip src_vni 10020"
run_cmd "tc -n $ns2 filter replace dev vx0 ingress pref 1 handle 101 proto all flower enc_dst_ip $vtep1_ip action pass" - run_cmd "ip netns exec $ns1 $mz br0.10 -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 1 log_test $? 0 "Destination IP - match"
- run_cmd "ip netns exec $ns1 $mz br0.20 -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.20 -a own -b $grp_dmac -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 1 log_test $? 0 "Destination IP - no match"
@@ -1019,20 +1020,20 @@ encap_params_common() run_cmd "bridge -n $ns1 mdb replace dev vx0 port vx0 grp $grp permanent dst $vtep1_ip dst_port 1111 src_vni 10020"
run_cmd "tc -n $ns2 filter replace dev veth0 ingress pref 1 handle 101 proto $enc_ethtype flower ip_proto udp dst_port 4789 action pass" - run_cmd "ip netns exec $ns1 $mz br0.10 -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev veth0 ingress" 101 1 log_test $? 0 "Default destination port - match"
- run_cmd "ip netns exec $ns1 $mz br0.20 -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.20 -a own -b $grp_dmac -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev veth0 ingress" 101 1 log_test $? 0 "Default destination port - no match"
run_cmd "tc -n $ns2 filter replace dev veth0 ingress pref 1 handle 101 proto $enc_ethtype flower ip_proto udp dst_port 1111 action pass" - run_cmd "ip netns exec $ns1 $mz br0.20 -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.20 -a own -b $grp_dmac -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev veth0 ingress" 101 1 log_test $? 0 "Non-default destination port - match"
- run_cmd "ip netns exec $ns1 $mz br0.10 -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev veth0 ingress" 101 1 log_test $? 0 "Non-default destination port - no match"
@@ -1045,11 +1046,11 @@ encap_params_common() run_cmd "bridge -n $ns1 mdb replace dev vx0 port vx0 grp $grp permanent dst $vtep1_ip src_vni 10020"
run_cmd "tc -n $ns2 filter replace dev vx0 ingress pref 1 handle 101 proto all flower enc_key_id 10010 action pass" - run_cmd "ip netns exec $ns1 $mz br0.10 -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 1 log_test $? 0 "Default destination VNI - match"
- run_cmd "ip netns exec $ns1 $mz br0.20 -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.20 -a own -b $grp_dmac -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 1 log_test $? 0 "Default destination VNI - no match"
@@ -1057,11 +1058,11 @@ encap_params_common() run_cmd "bridge -n $ns1 mdb replace dev vx0 port vx0 grp $grp permanent dst $vtep1_ip vni 10010 src_vni 10020"
run_cmd "tc -n $ns2 filter replace dev vx0 ingress pref 1 handle 101 proto all flower enc_key_id 10020 action pass" - run_cmd "ip netns exec $ns1 $mz br0.10 -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 1 log_test $? 0 "Non-default destination VNI - match"
- run_cmd "ip netns exec $ns1 $mz br0.20 -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.20 -a own -b $grp_dmac -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 1 log_test $? 0 "Non-default destination VNI - no match"
@@ -1079,6 +1080,7 @@ encap_params_ipv4_ipv4() local plen=32 local enc_ethtype="ip" local grp=239.1.1.1 + local grp_dmac=01:00:5e:01:01:01 local src=192.0.2.129
echo @@ -1086,7 +1088,7 @@ encap_params_ipv4_ipv4() echo "------------------------------------------------------------------"
encap_params_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $enc_ethtype \ - $grp $src "mausezahn" + $grp $grp_dmac $src "mausezahn" }
encap_params_ipv6_ipv4() @@ -1098,6 +1100,7 @@ encap_params_ipv6_ipv4() local plen=32 local enc_ethtype="ip" local grp=ff0e::1 + local grp_dmac=33:33:00:00:00:01 local src=2001:db8:100::1
echo @@ -1105,7 +1108,7 @@ encap_params_ipv6_ipv4() echo "------------------------------------------------------------------"
encap_params_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $enc_ethtype \ - $grp $src "mausezahn -6" + $grp $grp_dmac $src "mausezahn -6" }
encap_params_ipv4_ipv6() @@ -1117,6 +1120,7 @@ encap_params_ipv4_ipv6() local plen=128 local enc_ethtype="ipv6" local grp=239.1.1.1 + local grp_dmac=01:00:5e:01:01:01 local src=192.0.2.129
echo @@ -1124,7 +1128,7 @@ encap_params_ipv4_ipv6() echo "------------------------------------------------------------------"
encap_params_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $enc_ethtype \ - $grp $src "mausezahn" + $grp $grp_dmac $src "mausezahn" }
encap_params_ipv6_ipv6() @@ -1136,6 +1140,7 @@ encap_params_ipv6_ipv6() local plen=128 local enc_ethtype="ipv6" local grp=ff0e::1 + local grp_dmac=33:33:00:00:00:01 local src=2001:db8:100::1
echo @@ -1143,7 +1148,7 @@ encap_params_ipv6_ipv6() echo "------------------------------------------------------------------"
encap_params_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $enc_ethtype \ - $grp $src "mausezahn -6" + $grp $grp_dmac $src "mausezahn -6" }
starg_exclude_ir_common() @@ -1154,6 +1159,7 @@ starg_exclude_ir_common() local vtep2_ip=$1; shift local plen=$1; shift local grp=$1; shift + local grp_dmac=$1; shift local valid_src=$1; shift local invalid_src=$1; shift local mz=$1; shift @@ -1175,14 +1181,14 @@ starg_exclude_ir_common() run_cmd "bridge -n $ns1 mdb replace dev vx0 port vx0 grp $grp permanent filter_mode exclude source_list $invalid_src dst $vtep2_ip src_vni 10010"
# Check that invalid source is not forwarded to any VTEP. - run_cmd "ip netns exec $ns1 $mz br0.10 -A $invalid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $invalid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 0 log_test $? 0 "Block excluded source - first VTEP" tc_check_packets "$ns2" "dev vx0 ingress" 102 0 log_test $? 0 "Block excluded source - second VTEP"
# Check that valid source is forwarded to both VTEPs. - run_cmd "ip netns exec $ns1 $mz br0.10 -A $valid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $valid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 1 log_test $? 0 "Forward valid source - first VTEP" tc_check_packets "$ns2" "dev vx0 ingress" 102 1 @@ -1192,14 +1198,14 @@ starg_exclude_ir_common() run_cmd "bridge -n $ns1 mdb del dev vx0 port vx0 grp $grp dst $vtep2_ip src_vni 10010"
# Check that invalid source is not forwarded to any VTEP. - run_cmd "ip netns exec $ns1 $mz br0.10 -A $invalid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $invalid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 1 log_test $? 0 "Block excluded source after removal - first VTEP" tc_check_packets "$ns2" "dev vx0 ingress" 102 1 log_test $? 0 "Block excluded source after removal - second VTEP"
# Check that valid source is forwarded to the remaining VTEP. - run_cmd "ip netns exec $ns1 $mz br0.10 -A $valid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $valid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 2 log_test $? 0 "Forward valid source after removal - first VTEP" tc_check_packets "$ns2" "dev vx0 ingress" 102 1 @@ -1214,6 +1220,7 @@ starg_exclude_ir_ipv4_ipv4() local vtep2_ip=198.51.100.200 local plen=32 local grp=239.1.1.1 + local grp_dmac=01:00:5e:01:01:01 local valid_src=192.0.2.129 local invalid_src=192.0.2.145
@@ -1222,7 +1229,7 @@ starg_exclude_ir_ipv4_ipv4() echo "-------------------------------------------------------------"
starg_exclude_ir_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $grp \ - $valid_src $invalid_src "mausezahn" + $grp_dmac $valid_src $invalid_src "mausezahn" }
starg_exclude_ir_ipv6_ipv4() @@ -1233,6 +1240,7 @@ starg_exclude_ir_ipv6_ipv4() local vtep2_ip=198.51.100.200 local plen=32 local grp=ff0e::1 + local grp_dmac=33:33:00:00:00:01 local valid_src=2001:db8:100::1 local invalid_src=2001:db8:200::1
@@ -1241,7 +1249,7 @@ starg_exclude_ir_ipv6_ipv4() echo "-------------------------------------------------------------"
starg_exclude_ir_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $grp \ - $valid_src $invalid_src "mausezahn -6" + $grp_dmac $valid_src $invalid_src "mausezahn -6" }
starg_exclude_ir_ipv4_ipv6() @@ -1252,6 +1260,7 @@ starg_exclude_ir_ipv4_ipv6() local vtep2_ip=2001:db8:2000::1 local plen=128 local grp=239.1.1.1 + local grp_dmac=01:00:5e:01:01:01 local valid_src=192.0.2.129 local invalid_src=192.0.2.145
@@ -1260,7 +1269,7 @@ starg_exclude_ir_ipv4_ipv6() echo "-------------------------------------------------------------"
starg_exclude_ir_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $grp \ - $valid_src $invalid_src "mausezahn" + $grp_dmac $valid_src $invalid_src "mausezahn" }
starg_exclude_ir_ipv6_ipv6() @@ -1271,6 +1280,7 @@ starg_exclude_ir_ipv6_ipv6() local vtep2_ip=2001:db8:2000::1 local plen=128 local grp=ff0e::1 + local grp_dmac=33:33:00:00:00:01 local valid_src=2001:db8:100::1 local invalid_src=2001:db8:200::1
@@ -1279,7 +1289,7 @@ starg_exclude_ir_ipv6_ipv6() echo "-------------------------------------------------------------"
starg_exclude_ir_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $grp \ - $valid_src $invalid_src "mausezahn -6" + $grp_dmac $valid_src $invalid_src "mausezahn -6" }
starg_include_ir_common() @@ -1290,6 +1300,7 @@ starg_include_ir_common() local vtep2_ip=$1; shift local plen=$1; shift local grp=$1; shift + local grp_dmac=$1; shift local valid_src=$1; shift local invalid_src=$1; shift local mz=$1; shift @@ -1311,14 +1322,14 @@ starg_include_ir_common() run_cmd "bridge -n $ns1 mdb replace dev vx0 port vx0 grp $grp permanent filter_mode include source_list $valid_src dst $vtep2_ip src_vni 10010"
# Check that invalid source is not forwarded to any VTEP. - run_cmd "ip netns exec $ns1 $mz br0.10 -A $invalid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $invalid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 0 log_test $? 0 "Block excluded source - first VTEP" tc_check_packets "$ns2" "dev vx0 ingress" 102 0 log_test $? 0 "Block excluded source - second VTEP"
# Check that valid source is forwarded to both VTEPs. - run_cmd "ip netns exec $ns1 $mz br0.10 -A $valid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $valid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 1 log_test $? 0 "Forward valid source - first VTEP" tc_check_packets "$ns2" "dev vx0 ingress" 102 1 @@ -1328,14 +1339,14 @@ starg_include_ir_common() run_cmd "bridge -n $ns1 mdb del dev vx0 port vx0 grp $grp dst $vtep2_ip src_vni 10010"
# Check that invalid source is not forwarded to any VTEP. - run_cmd "ip netns exec $ns1 $mz br0.10 -A $invalid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $invalid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 1 log_test $? 0 "Block excluded source after removal - first VTEP" tc_check_packets "$ns2" "dev vx0 ingress" 102 1 log_test $? 0 "Block excluded source after removal - second VTEP"
# Check that valid source is forwarded to the remaining VTEP. - run_cmd "ip netns exec $ns1 $mz br0.10 -A $valid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $valid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 2 log_test $? 0 "Forward valid source after removal - first VTEP" tc_check_packets "$ns2" "dev vx0 ingress" 102 1 @@ -1350,6 +1361,7 @@ starg_include_ir_ipv4_ipv4() local vtep2_ip=198.51.100.200 local plen=32 local grp=239.1.1.1 + local grp_dmac=01:00:5e:01:01:01 local valid_src=192.0.2.129 local invalid_src=192.0.2.145
@@ -1358,7 +1370,7 @@ starg_include_ir_ipv4_ipv4() echo "-------------------------------------------------------------"
starg_include_ir_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $grp \ - $valid_src $invalid_src "mausezahn" + $grp_dmac $valid_src $invalid_src "mausezahn" }
starg_include_ir_ipv6_ipv4() @@ -1369,6 +1381,7 @@ starg_include_ir_ipv6_ipv4() local vtep2_ip=198.51.100.200 local plen=32 local grp=ff0e::1 + local grp_dmac=33:33:00:00:00:01 local valid_src=2001:db8:100::1 local invalid_src=2001:db8:200::1
@@ -1377,7 +1390,7 @@ starg_include_ir_ipv6_ipv4() echo "-------------------------------------------------------------"
starg_include_ir_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $grp \ - $valid_src $invalid_src "mausezahn -6" + $grp_dmac $valid_src $invalid_src "mausezahn -6" }
starg_include_ir_ipv4_ipv6() @@ -1388,6 +1401,7 @@ starg_include_ir_ipv4_ipv6() local vtep2_ip=2001:db8:2000::1 local plen=128 local grp=239.1.1.1 + local grp_dmac=01:00:5e:01:01:01 local valid_src=192.0.2.129 local invalid_src=192.0.2.145
@@ -1396,7 +1410,7 @@ starg_include_ir_ipv4_ipv6() echo "-------------------------------------------------------------"
starg_include_ir_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $grp \ - $valid_src $invalid_src "mausezahn" + $grp_dmac $valid_src $invalid_src "mausezahn" }
starg_include_ir_ipv6_ipv6() @@ -1407,6 +1421,7 @@ starg_include_ir_ipv6_ipv6() local vtep2_ip=2001:db8:2000::1 local plen=128 local grp=ff0e::1 + local grp_dmac=33:33:00:00:00:01 local valid_src=2001:db8:100::1 local invalid_src=2001:db8:200::1
@@ -1415,7 +1430,7 @@ starg_include_ir_ipv6_ipv6() echo "-------------------------------------------------------------"
starg_include_ir_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $grp \ - $valid_src $invalid_src "mausezahn -6" + $grp_dmac $valid_src $invalid_src "mausezahn -6" }
starg_exclude_p2mp_common() @@ -1425,6 +1440,7 @@ starg_exclude_p2mp_common() local mcast_grp=$1; shift local plen=$1; shift local grp=$1; shift + local grp_dmac=$1; shift local valid_src=$1; shift local invalid_src=$1; shift local mz=$1; shift @@ -1442,12 +1458,12 @@ starg_exclude_p2mp_common() run_cmd "bridge -n $ns1 mdb replace dev vx0 port vx0 grp $grp permanent filter_mode exclude source_list $invalid_src dst $mcast_grp src_vni 10010 via veth0"
# Check that invalid source is not forwarded. - run_cmd "ip netns exec $ns1 $mz br0.10 -A $invalid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $invalid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 0 log_test $? 0 "Block excluded source"
# Check that valid source is forwarded. - run_cmd "ip netns exec $ns1 $mz br0.10 -A $valid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $valid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 1 log_test $? 0 "Forward valid source"
@@ -1455,7 +1471,7 @@ starg_exclude_p2mp_common() run_cmd "ip -n $ns2 address del $mcast_grp/$plen dev veth0"
# Check that valid source is not received anymore. - run_cmd "ip netns exec $ns1 $mz br0.10 -A $valid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $valid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 1 log_test $? 0 "Receive of valid source after removal from group" } @@ -1467,6 +1483,7 @@ starg_exclude_p2mp_ipv4_ipv4() local mcast_grp=238.1.1.1 local plen=32 local grp=239.1.1.1 + local grp_dmac=01:00:5e:01:01:01 local valid_src=192.0.2.129 local invalid_src=192.0.2.145
@@ -1474,7 +1491,7 @@ starg_exclude_p2mp_ipv4_ipv4() echo "Data path: (*, G) EXCLUDE - P2MP - IPv4 overlay / IPv4 underlay" echo "---------------------------------------------------------------"
- starg_exclude_p2mp_common $ns1 $ns2 $mcast_grp $plen $grp \ + starg_exclude_p2mp_common $ns1 $ns2 $mcast_grp $plen $grp $grp_dmac \ $valid_src $invalid_src "mausezahn" }
@@ -1485,6 +1502,7 @@ starg_exclude_p2mp_ipv6_ipv4() local mcast_grp=238.1.1.1 local plen=32 local grp=ff0e::1 + local grp_dmac=33:33:00:00:00:01 local valid_src=2001:db8:100::1 local invalid_src=2001:db8:200::1
@@ -1492,7 +1510,7 @@ starg_exclude_p2mp_ipv6_ipv4() echo "Data path: (*, G) EXCLUDE - P2MP - IPv6 overlay / IPv4 underlay" echo "---------------------------------------------------------------"
- starg_exclude_p2mp_common $ns1 $ns2 $mcast_grp $plen $grp \ + starg_exclude_p2mp_common $ns1 $ns2 $mcast_grp $plen $grp $grp_dmac \ $valid_src $invalid_src "mausezahn -6" }
@@ -1503,6 +1521,7 @@ starg_exclude_p2mp_ipv4_ipv6() local mcast_grp=ff0e::2 local plen=128 local grp=239.1.1.1 + local grp_dmac=01:00:5e:01:01:01 local valid_src=192.0.2.129 local invalid_src=192.0.2.145
@@ -1510,7 +1529,7 @@ starg_exclude_p2mp_ipv4_ipv6() echo "Data path: (*, G) EXCLUDE - P2MP - IPv4 overlay / IPv6 underlay" echo "---------------------------------------------------------------"
- starg_exclude_p2mp_common $ns1 $ns2 $mcast_grp $plen $grp \ + starg_exclude_p2mp_common $ns1 $ns2 $mcast_grp $plen $grp $grp_dmac \ $valid_src $invalid_src "mausezahn" }
@@ -1521,6 +1540,7 @@ starg_exclude_p2mp_ipv6_ipv6() local mcast_grp=ff0e::2 local plen=128 local grp=ff0e::1 + local grp_dmac=33:33:00:00:00:01 local valid_src=2001:db8:100::1 local invalid_src=2001:db8:200::1
@@ -1528,7 +1548,7 @@ starg_exclude_p2mp_ipv6_ipv6() echo "Data path: (*, G) EXCLUDE - P2MP - IPv6 overlay / IPv6 underlay" echo "---------------------------------------------------------------"
- starg_exclude_p2mp_common $ns1 $ns2 $mcast_grp $plen $grp \ + starg_exclude_p2mp_common $ns1 $ns2 $mcast_grp $plen $grp $grp_dmac \ $valid_src $invalid_src "mausezahn -6" }
@@ -1539,6 +1559,7 @@ starg_include_p2mp_common() local mcast_grp=$1; shift local plen=$1; shift local grp=$1; shift + local grp_dmac=$1; shift local valid_src=$1; shift local invalid_src=$1; shift local mz=$1; shift @@ -1556,12 +1577,12 @@ starg_include_p2mp_common() run_cmd "bridge -n $ns1 mdb replace dev vx0 port vx0 grp $grp permanent filter_mode include source_list $valid_src dst $mcast_grp src_vni 10010 via veth0"
# Check that invalid source is not forwarded. - run_cmd "ip netns exec $ns1 $mz br0.10 -A $invalid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $invalid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 0 log_test $? 0 "Block excluded source"
# Check that valid source is forwarded. - run_cmd "ip netns exec $ns1 $mz br0.10 -A $valid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $valid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 1 log_test $? 0 "Forward valid source"
@@ -1569,7 +1590,7 @@ starg_include_p2mp_common() run_cmd "ip -n $ns2 address del $mcast_grp/$plen dev veth0"
# Check that valid source is not received anymore. - run_cmd "ip netns exec $ns1 $mz br0.10 -A $valid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $valid_src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 1 log_test $? 0 "Receive of valid source after removal from group" } @@ -1581,6 +1602,7 @@ starg_include_p2mp_ipv4_ipv4() local mcast_grp=238.1.1.1 local plen=32 local grp=239.1.1.1 + local grp_dmac=01:00:5e:01:01:01 local valid_src=192.0.2.129 local invalid_src=192.0.2.145
@@ -1588,7 +1610,7 @@ starg_include_p2mp_ipv4_ipv4() echo "Data path: (*, G) INCLUDE - P2MP - IPv4 overlay / IPv4 underlay" echo "---------------------------------------------------------------"
- starg_include_p2mp_common $ns1 $ns2 $mcast_grp $plen $grp \ + starg_include_p2mp_common $ns1 $ns2 $mcast_grp $plen $grp $grp_dmac \ $valid_src $invalid_src "mausezahn" }
@@ -1599,6 +1621,7 @@ starg_include_p2mp_ipv6_ipv4() local mcast_grp=238.1.1.1 local plen=32 local grp=ff0e::1 + local grp_dmac=33:33:00:00:00:01 local valid_src=2001:db8:100::1 local invalid_src=2001:db8:200::1
@@ -1606,7 +1629,7 @@ starg_include_p2mp_ipv6_ipv4() echo "Data path: (*, G) INCLUDE - P2MP - IPv6 overlay / IPv4 underlay" echo "---------------------------------------------------------------"
- starg_include_p2mp_common $ns1 $ns2 $mcast_grp $plen $grp \ + starg_include_p2mp_common $ns1 $ns2 $mcast_grp $plen $grp $grp_dmac \ $valid_src $invalid_src "mausezahn -6" }
@@ -1617,6 +1640,7 @@ starg_include_p2mp_ipv4_ipv6() local mcast_grp=ff0e::2 local plen=128 local grp=239.1.1.1 + local grp_dmac=01:00:5e:01:01:01 local valid_src=192.0.2.129 local invalid_src=192.0.2.145
@@ -1624,7 +1648,7 @@ starg_include_p2mp_ipv4_ipv6() echo "Data path: (*, G) INCLUDE - P2MP - IPv4 overlay / IPv6 underlay" echo "---------------------------------------------------------------"
- starg_include_p2mp_common $ns1 $ns2 $mcast_grp $plen $grp \ + starg_include_p2mp_common $ns1 $ns2 $mcast_grp $plen $grp $grp_dmac \ $valid_src $invalid_src "mausezahn" }
@@ -1635,6 +1659,7 @@ starg_include_p2mp_ipv6_ipv6() local mcast_grp=ff0e::2 local plen=128 local grp=ff0e::1 + local grp_dmac=33:33:00:00:00:01 local valid_src=2001:db8:100::1 local invalid_src=2001:db8:200::1
@@ -1642,7 +1667,7 @@ starg_include_p2mp_ipv6_ipv6() echo "Data path: (*, G) INCLUDE - P2MP - IPv6 overlay / IPv6 underlay" echo "---------------------------------------------------------------"
- starg_include_p2mp_common $ns1 $ns2 $mcast_grp $plen $grp \ + starg_include_p2mp_common $ns1 $ns2 $mcast_grp $plen $grp $grp_dmac \ $valid_src $invalid_src "mausezahn -6" }
@@ -1654,6 +1679,7 @@ egress_vni_translation_common() local plen=$1; shift local proto=$1; shift local grp=$1; shift + local grp_dmac=$1; shift local src=$1; shift local mz=$1; shift
@@ -1689,20 +1715,20 @@ egress_vni_translation_common() # Make sure that packets sent from the first VTEP over VLAN 10 are # received by the SVI corresponding to the L3VNI (14000 / VLAN 4000) on # the second VTEP, since it is configured as PVID. - run_cmd "ip netns exec $ns1 $mz br0.10 -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev br0.4000 ingress" 101 1 log_test $? 0 "Egress VNI translation - PVID configured"
# Remove PVID flag from VLAN 4000 on the second VTEP and make sure # packets are no longer received by the SVI interface. run_cmd "bridge -n $ns2 vlan add vid 4000 dev vx0" - run_cmd "ip netns exec $ns1 $mz br0.10 -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev br0.4000 ingress" 101 1 log_test $? 0 "Egress VNI translation - no PVID configured"
# Reconfigure the PVID and make sure packets are received again. run_cmd "bridge -n $ns2 vlan add vid 4000 dev vx0 pvid" - run_cmd "ip netns exec $ns1 $mz br0.10 -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev br0.4000 ingress" 101 2 log_test $? 0 "Egress VNI translation - PVID reconfigured" } @@ -1715,6 +1741,7 @@ egress_vni_translation_ipv4_ipv4() local plen=32 local proto="ipv4" local grp=239.1.1.1 + local grp_dmac=01:00:5e:01:01:01 local src=192.0.2.129
echo @@ -1722,7 +1749,7 @@ egress_vni_translation_ipv4_ipv4() echo "----------------------------------------------------------------"
egress_vni_translation_common $ns1 $ns2 $mcast_grp $plen $proto $grp \ - $src "mausezahn" + $grp_dmac $src "mausezahn" }
egress_vni_translation_ipv6_ipv4() @@ -1733,6 +1760,7 @@ egress_vni_translation_ipv6_ipv4() local plen=32 local proto="ipv6" local grp=ff0e::1 + local grp_dmac=33:33:00:00:00:01 local src=2001:db8:100::1
echo @@ -1740,7 +1768,7 @@ egress_vni_translation_ipv6_ipv4() echo "----------------------------------------------------------------"
egress_vni_translation_common $ns1 $ns2 $mcast_grp $plen $proto $grp \ - $src "mausezahn -6" + $grp_dmac $src "mausezahn -6" }
egress_vni_translation_ipv4_ipv6() @@ -1751,6 +1779,7 @@ egress_vni_translation_ipv4_ipv6() local plen=128 local proto="ipv4" local grp=239.1.1.1 + local grp_dmac=01:00:5e:01:01:01 local src=192.0.2.129
echo @@ -1758,7 +1787,7 @@ egress_vni_translation_ipv4_ipv6() echo "----------------------------------------------------------------"
egress_vni_translation_common $ns1 $ns2 $mcast_grp $plen $proto $grp \ - $src "mausezahn" + $grp_dmac $src "mausezahn" }
egress_vni_translation_ipv6_ipv6() @@ -1769,6 +1798,7 @@ egress_vni_translation_ipv6_ipv6() local plen=128 local proto="ipv6" local grp=ff0e::1 + local grp_dmac=33:33:00:00:00:01 local src=2001:db8:100::1
echo @@ -1776,7 +1806,7 @@ egress_vni_translation_ipv6_ipv6() echo "----------------------------------------------------------------"
egress_vni_translation_common $ns1 $ns2 $mcast_grp $plen $proto $grp \ - $src "mausezahn -6" + $grp_dmac $src "mausezahn -6" }
all_zeros_mdb_common() @@ -1789,12 +1819,18 @@ all_zeros_mdb_common() local vtep4_ip=$1; shift local plen=$1; shift local ipv4_grp=239.1.1.1 + local ipv4_grp_dmac=01:00:5e:01:01:01 local ipv4_unreg_grp=239.2.2.2 + local ipv4_unreg_grp_dmac=01:00:5e:02:02:02 local ipv4_ll_grp=224.0.0.100 + local ipv4_ll_grp_dmac=01:00:5e:00:00:64 local ipv4_src=192.0.2.129 local ipv6_grp=ff0e::1 + local ipv6_grp_dmac=33:33:00:00:00:01 local ipv6_unreg_grp=ff0e::2 + local ipv6_unreg_grp_dmac=33:33:00:00:00:02 local ipv6_ll_grp=ff02::1 + local ipv6_ll_grp_dmac=33:33:00:00:00:01 local ipv6_src=2001:db8:100::1
# Install all-zeros (catchall) MDB entries for IPv4 and IPv6 traffic @@ -1830,7 +1866,7 @@ all_zeros_mdb_common()
# Send registered IPv4 multicast and make sure it only arrives to the # first VTEP. - run_cmd "ip netns exec $ns1 mausezahn br0.10 -A $ipv4_src -B $ipv4_grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 mausezahn br0.10 -a own -b $ipv4_grp_dmac -A $ipv4_src -B $ipv4_grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 1 log_test $? 0 "Registered IPv4 multicast - first VTEP" tc_check_packets "$ns2" "dev vx0 ingress" 102 0 @@ -1838,7 +1874,7 @@ all_zeros_mdb_common()
# Send unregistered IPv4 multicast that is not link-local and make sure # it arrives to the first and second VTEPs. - run_cmd "ip netns exec $ns1 mausezahn br0.10 -A $ipv4_src -B $ipv4_unreg_grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 mausezahn br0.10 -a own -b $ipv4_unreg_grp_dmac -A $ipv4_src -B $ipv4_unreg_grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 2 log_test $? 0 "Unregistered IPv4 multicast - first VTEP" tc_check_packets "$ns2" "dev vx0 ingress" 102 1 @@ -1846,7 +1882,7 @@ all_zeros_mdb_common()
# Send IPv4 link-local multicast traffic and make sure it does not # arrive to any VTEP. - run_cmd "ip netns exec $ns1 mausezahn br0.10 -A $ipv4_src -B $ipv4_ll_grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 mausezahn br0.10 -a own -b $ipv4_ll_grp_dmac -A $ipv4_src -B $ipv4_ll_grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 2 log_test $? 0 "Link-local IPv4 multicast - first VTEP" tc_check_packets "$ns2" "dev vx0 ingress" 102 1 @@ -1881,7 +1917,7 @@ all_zeros_mdb_common()
# Send registered IPv6 multicast and make sure it only arrives to the # third VTEP. - run_cmd "ip netns exec $ns1 mausezahn -6 br0.10 -A $ipv6_src -B $ipv6_grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 mausezahn -6 br0.10 -a own -b $ipv6_grp_dmac -A $ipv6_src -B $ipv6_grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 103 1 log_test $? 0 "Registered IPv6 multicast - third VTEP" tc_check_packets "$ns2" "dev vx0 ingress" 104 0 @@ -1889,7 +1925,7 @@ all_zeros_mdb_common()
# Send unregistered IPv6 multicast that is not link-local and make sure # it arrives to the third and fourth VTEPs. - run_cmd "ip netns exec $ns1 mausezahn -6 br0.10 -A $ipv6_src -B $ipv6_unreg_grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 mausezahn -6 br0.10 -a own -b $ipv6_unreg_grp_dmac -A $ipv6_src -B $ipv6_unreg_grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 103 2 log_test $? 0 "Unregistered IPv6 multicast - third VTEP" tc_check_packets "$ns2" "dev vx0 ingress" 104 1 @@ -1897,7 +1933,7 @@ all_zeros_mdb_common()
# Send IPv6 link-local multicast traffic and make sure it does not # arrive to any VTEP. - run_cmd "ip netns exec $ns1 mausezahn -6 br0.10 -A $ipv6_src -B $ipv6_ll_grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 mausezahn -6 br0.10 -a own -b $ipv6_ll_grp_dmac -A $ipv6_src -B $ipv6_ll_grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 103 2 log_test $? 0 "Link-local IPv6 multicast - third VTEP" tc_check_packets "$ns2" "dev vx0 ingress" 104 1 @@ -1972,6 +2008,7 @@ mdb_fdb_common() local plen=$1; shift local proto=$1; shift local grp=$1; shift + local grp_dmac=$1; shift local src=$1; shift local mz=$1; shift
@@ -1995,7 +2032,7 @@ mdb_fdb_common()
# Send IP multicast traffic and make sure it is forwarded by the MDB # and only arrives to the first VTEP. - run_cmd "ip netns exec $ns1 $mz br0.10 -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 1 log_test $? 0 "IP multicast - first VTEP" tc_check_packets "$ns2" "dev vx0 ingress" 102 0 @@ -2012,7 +2049,7 @@ mdb_fdb_common() # Remove the MDB entry and make sure that IP multicast is now forwarded # by the FDB to the second VTEP. run_cmd "bridge -n $ns1 mdb del dev vx0 port vx0 grp $grp dst $vtep1_ip src_vni 10010" - run_cmd "ip netns exec $ns1 $mz br0.10 -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" + run_cmd "ip netns exec $ns1 $mz br0.10 -a own -b $grp_dmac -A $src -B $grp -t udp sp=12345,dp=54321 -p 100 -c 1 -q" tc_check_packets "$ns2" "dev vx0 ingress" 101 1 log_test $? 0 "IP multicast after removal - first VTEP" tc_check_packets "$ns2" "dev vx0 ingress" 102 2 @@ -2028,14 +2065,15 @@ mdb_fdb_ipv4_ipv4() local plen=32 local proto="ipv4" local grp=239.1.1.1 + local grp_dmac=01:00:5e:01:01:01 local src=192.0.2.129
echo echo "Data path: MDB with FDB - IPv4 overlay / IPv4 underlay" echo "------------------------------------------------------"
- mdb_fdb_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $proto $grp $src \ - "mausezahn" + mdb_fdb_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $proto $grp \ + $grp_dmac $src "mausezahn" }
mdb_fdb_ipv6_ipv4() @@ -2047,14 +2085,15 @@ mdb_fdb_ipv6_ipv4() local plen=32 local proto="ipv6" local grp=ff0e::1 + local grp_dmac=33:33:00:00:00:01 local src=2001:db8:100::1
echo echo "Data path: MDB with FDB - IPv6 overlay / IPv4 underlay" echo "------------------------------------------------------"
- mdb_fdb_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $proto $grp $src \ - "mausezahn -6" + mdb_fdb_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $proto $grp \ + $grp_dmac $src "mausezahn -6" }
mdb_fdb_ipv4_ipv6() @@ -2066,14 +2105,15 @@ mdb_fdb_ipv4_ipv6() local plen=128 local proto="ipv4" local grp=239.1.1.1 + local grp_dmac=01:00:5e:01:01:01 local src=192.0.2.129
echo echo "Data path: MDB with FDB - IPv4 overlay / IPv6 underlay" echo "------------------------------------------------------"
- mdb_fdb_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $proto $grp $src \ - "mausezahn" + mdb_fdb_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $proto $grp \ + $grp_dmac $src "mausezahn" }
mdb_fdb_ipv6_ipv6() @@ -2085,14 +2125,15 @@ mdb_fdb_ipv6_ipv6() local plen=128 local proto="ipv6" local grp=ff0e::1 + local grp_dmac=33:33:00:00:00:01 local src=2001:db8:100::1
echo echo "Data path: MDB with FDB - IPv6 overlay / IPv6 underlay" echo "------------------------------------------------------"
- mdb_fdb_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $proto $grp $src \ - "mausezahn -6" + mdb_fdb_common $ns1 $ns2 $vtep1_ip $vtep2_ip $plen $proto $grp \ + $grp_dmac $src "mausezahn -6" }
mdb_grp1_loop() @@ -2127,7 +2168,9 @@ mdb_torture_common() local vtep1_ip=$1; shift local vtep2_ip=$1; shift local grp1=$1; shift + local grp1_dmac=$1; shift local grp2=$1; shift + local grp2_dmac=$1; shift local src=$1; shift local mz=$1; shift local pid1 @@ -2152,9 +2195,9 @@ mdb_torture_common() pid1=$! mdb_grp2_loop $ns1 $vtep1_ip $vtep2_ip $grp2 & pid2=$! - ip netns exec $ns1 $mz br0.10 -A $src -B $grp1 -t udp sp=12345,dp=54321 -p 100 -c 0 -q & + ip netns exec $ns1 $mz br0.10 -a own -b $grp1_dmac -A $src -B $grp1 -t udp sp=12345,dp=54321 -p 100 -c 0 -q & pid3=$! - ip netns exec $ns1 $mz br0.10 -A $src -B $grp2 -t udp sp=12345,dp=54321 -p 100 -c 0 -q & + ip netns exec $ns1 $mz br0.10 -a own -b $grp2_dmac -A $src -B $grp2 -t udp sp=12345,dp=54321 -p 100 -c 0 -q & pid4=$!
sleep 30 @@ -2170,15 +2213,17 @@ mdb_torture_ipv4_ipv4() local vtep1_ip=198.51.100.100 local vtep2_ip=198.51.100.200 local grp1=239.1.1.1 + local grp1_dmac=01:00:5e:01:01:01 local grp2=239.2.2.2 + local grp2_dmac=01:00:5e:02:02:02 local src=192.0.2.129
echo echo "Data path: MDB torture test - IPv4 overlay / IPv4 underlay" echo "----------------------------------------------------------"
- mdb_torture_common $ns1 $vtep1_ip $vtep2_ip $grp1 $grp2 $src \ - "mausezahn" + mdb_torture_common $ns1 $vtep1_ip $vtep2_ip $grp1 $grp1_dmac $grp2 \ + $grp2_dmac $src "mausezahn" }
mdb_torture_ipv6_ipv4() @@ -2187,15 +2232,17 @@ mdb_torture_ipv6_ipv4() local vtep1_ip=198.51.100.100 local vtep2_ip=198.51.100.200 local grp1=ff0e::1 + local grp1_dmac=33:33:00:00:00:01 local grp2=ff0e::2 + local grp2_dmac=33:33:00:00:00:02 local src=2001:db8:100::1
echo echo "Data path: MDB torture test - IPv6 overlay / IPv4 underlay" echo "----------------------------------------------------------"
- mdb_torture_common $ns1 $vtep1_ip $vtep2_ip $grp1 $grp2 $src \ - "mausezahn -6" + mdb_torture_common $ns1 $vtep1_ip $vtep2_ip $grp1 $grp1_dmac $grp2 \ + $grp2_dmac $src "mausezahn -6" }
mdb_torture_ipv4_ipv6() @@ -2204,15 +2251,17 @@ mdb_torture_ipv4_ipv6() local vtep1_ip=2001:db8:1000::1 local vtep2_ip=2001:db8:2000::1 local grp1=239.1.1.1 + local grp1_dmac=01:00:5e:01:01:01 local grp2=239.2.2.2 + local grp2_dmac=01:00:5e:02:02:02 local src=192.0.2.129
echo echo "Data path: MDB torture test - IPv4 overlay / IPv6 underlay" echo "----------------------------------------------------------"
- mdb_torture_common $ns1 $vtep1_ip $vtep2_ip $grp1 $grp2 $src \ - "mausezahn" + mdb_torture_common $ns1 $vtep1_ip $vtep2_ip $grp1 $grp1_dmac $grp2 \ + $grp2_dmac $src "mausezahn" }
mdb_torture_ipv6_ipv6() @@ -2221,15 +2270,17 @@ mdb_torture_ipv6_ipv6() local vtep1_ip=2001:db8:1000::1 local vtep2_ip=2001:db8:2000::1 local grp1=ff0e::1 + local grp1_dmac=33:33:00:00:00:01 local grp2=ff0e::2 + local grp2_dmac=33:33:00:00:00:02 local src=2001:db8:100::1
echo echo "Data path: MDB torture test - IPv6 overlay / IPv6 underlay" echo "----------------------------------------------------------"
- mdb_torture_common $ns1 $vtep1_ip $vtep2_ip $grp1 $grp2 $src \ - "mausezahn -6" + mdb_torture_common $ns1 $vtep1_ip $vtep2_ip $grp1 $grp1_dmac $grp2 \ + $grp2_dmac $src "mausezahn -6" }
################################################################################
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikita Kiryushin kiryushin@ancud.ru
[ Upstream commit 40e2710860e57411ab57a1529c5a2748abbe8a19 ]
ACPICA commit 9061cd9aa131205657c811a52a9f8325a040c6c9
Errors in acpi_evaluate_object() can leave the buffer in an incorrect state.
This can lead to accessing data in a previously ACPI_FREEd buffer, and to a second ACPI_FREE of the same buffer later.
Handle errors in acpi_evaluate_object() the same way it is done earlier with acpi_ns_handle_to_pathname().
Found by Linux Verification Center (linuxtesting.org) with SVACE.
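As an illustration of the bug class only (plain userspace C, with a hypothetical eval() standing in for acpi_evaluate_object(); not part of the patch): an ignored error leaves a stale pointer that later code may read and free again.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* eval() may fail and leave *out untouched, like acpi_evaluate_object()
 * leaving the buffer in its previous (already freed) state.
 */
static int eval(char **out, int fail)
{
        if (fail)
                return -1;      /* failure: *out is not reassigned */
        *out = strdup("field data");
        return 0;
}

int main(void)
{
        char *buf = strdup("pathname");

        free(buf);      /* the buffer from the earlier call is released */

        /* Without checking the return value, buf would still point at the
         * freed memory here, and later code would read it and free it a
         * second time.  The patch adds exactly this kind of check.
         */
        if (eval(&buf, 1)) {
                printf("could not evaluate object\n");
                return 0;
        }

        printf("%s\n", buf);
        free(buf);
        return 0;
}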
Link: https://github.com/acpica/acpica/commit/9061cd9a Fixes: 5fd033288a86 ("ACPICA: debugger: add command to dump all fields of particular subtype") Signed-off-by: Nikita Kiryushin kiryushin@ancud.ru Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/acpica/dbnames.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/acpica/dbnames.c b/drivers/acpi/acpica/dbnames.c index b91155ea9c343..c9131259f717b 100644 --- a/drivers/acpi/acpica/dbnames.c +++ b/drivers/acpi/acpica/dbnames.c @@ -550,8 +550,12 @@ acpi_db_walk_for_fields(acpi_handle obj_handle, ACPI_FREE(buffer.pointer);
buffer.length = ACPI_ALLOCATE_LOCAL_BUFFER; - acpi_evaluate_object(obj_handle, NULL, NULL, &buffer); - + status = acpi_evaluate_object(obj_handle, NULL, NULL, &buffer); + if (ACPI_FAILURE(status)) { + acpi_os_printf("Could Not evaluate object %p\n", + obj_handle); + return (AE_OK); + } /* * Since this is a field unit, surround the output in braces */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jie Wang wangjie125@huawei.com
[ Upstream commit 47e39d213e09c6cae0d6b4d95e454ea404013312 ]
Currently, hns hardware supports more than 512 queues, so the index limit in hclge_comm_tqps_update_stats() is wrong. This patch removes it.
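To make the limit concrete (illustrative only, not from the patch): the old 0x1ff mask keeps only 9 bits of the queue index, so indices of 512 and above wrap around and their stats are attributed to the wrong queue.

#include <stdio.h>

int main(void)
{
        unsigned int index;

        /* the removed code masked the TQP index with 0x1ff (9 bits) */
        for (index = 510; index <= 514; index++)
                printf("queue %u -> reported as %u\n", index, index & 0x1ff);
        /* queue 512 is reported as 0, 513 as 1, ... */
        return 0;
}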
Fixes: 287db5c40d15 ("net: hns3: create new set of common tqp stats APIs for PF and VF reuse") Signed-off-by: Jie Wang wangjie125@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Reviewed-by: Michal Kubiak michal.kubiak@intel.com Reviewed-by: Kalesh AP kalesh-anakkur.purayil@broadcom.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/hisilicon/hns3/hns3_common/hclge_comm_tqp_stats.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_common/hclge_comm_tqp_stats.c b/drivers/net/ethernet/hisilicon/hns3/hns3_common/hclge_comm_tqp_stats.c index f3c9395d8351c..618f66d9586b3 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_common/hclge_comm_tqp_stats.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_common/hclge_comm_tqp_stats.c @@ -85,7 +85,7 @@ int hclge_comm_tqps_update_stats(struct hnae3_handle *handle, hclge_comm_cmd_setup_basic_desc(&desc, HCLGE_OPC_QUERY_TX_STATS, true);
- desc.data[0] = cpu_to_le32(tqp->index & 0x1ff); + desc.data[0] = cpu_to_le32(tqp->index); ret = hclge_comm_cmd_send(hw, &desc, 1); if (ret) { dev_err(&hw->cmq.csq.pdev->dev,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yonglong Liu liuyonglong@huawei.com
[ Upstream commit 93305b77ffcb042f1538ecc383505e87d95aa05a ]
The devlink reload process will access hardware resources, but the devlink register operation is done before the hardware is initialized. So, processing a devlink reload during initialization may lead to a kernel crash. This patch fixes this by taking devl_lock during initialization.
Fixes: b741269b2759 ("net: hns3: add support for registering devlink for PF") Signed-off-by: Yonglong Liu liuyonglong@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index f1ca2cda2961e..dfd0c5f4cb9f5 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -11614,6 +11614,8 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev) if (ret) goto err_pci_uninit;
+ devl_lock(hdev->devlink); + /* Firmware command queue initialize */ ret = hclge_comm_cmd_queue_init(hdev->pdev, &hdev->hw.hw); if (ret) @@ -11793,6 +11795,7 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
hclge_task_schedule(hdev, round_jiffies_relative(HZ));
+ devl_unlock(hdev->devlink); return 0;
err_mdiobus_unreg: @@ -11805,6 +11808,7 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev) err_cmd_uninit: hclge_comm_cmd_uninit(hdev->ae_dev, &hdev->hw.hw); err_devlink_uninit: + devl_unlock(hdev->devlink); hclge_devlink_uninit(hdev); err_pci_uninit: pcim_iounmap(pdev, hdev->hw.hw.io_base);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jian Shen shenjian15@huawei.com
[ Upstream commit 5bd088d6c21a45ee70e6116879310e54174d75eb ]
Currently, the loopback test may be skipped when the device is resetting, but the test result will still show as 'PASS' because the driver doesn't set the ETH_TEST_FL_FAILED flag. Fix it by setting the flag and initializing the result values to UNEXECUTED.
Fixes: 4c8dab1c709c ("net: hns3: reconstruct function hns3_self_test") Signed-off-by: Jian Shen shenjian15@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Reviewed-by: Michal Kubiak michal.kubiak@intel.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/hisilicon/hns3/hns3_ethtool.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c index 682239f33082b..78181eea93c1c 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c @@ -78,6 +78,9 @@ static const struct hns3_stats hns3_rxq_stats[] = { #define HNS3_NIC_LB_TEST_NO_MEM_ERR 1 #define HNS3_NIC_LB_TEST_TX_CNT_ERR 2 #define HNS3_NIC_LB_TEST_RX_CNT_ERR 3 +#define HNS3_NIC_LB_TEST_UNEXECUTED 4 + +static int hns3_get_sset_count(struct net_device *netdev, int stringset);
static int hns3_lp_setup(struct net_device *ndev, enum hnae3_loop loop, bool en) { @@ -418,18 +421,26 @@ static void hns3_do_external_lb(struct net_device *ndev, static void hns3_self_test(struct net_device *ndev, struct ethtool_test *eth_test, u64 *data) { + int cnt = hns3_get_sset_count(ndev, ETH_SS_TEST); struct hns3_nic_priv *priv = netdev_priv(ndev); struct hnae3_handle *h = priv->ae_handle; int st_param[HNAE3_LOOP_NONE][2]; bool if_running = netif_running(ndev); + int i; + + /* initialize the loopback test result, avoid marking an unexcuted + * loopback test as PASS. + */ + for (i = 0; i < cnt; i++) + data[i] = HNS3_NIC_LB_TEST_UNEXECUTED;
if (hns3_nic_resetting(ndev)) { netdev_err(ndev, "dev resetting!"); - return; + goto failure; }
if (!(eth_test->flags & ETH_TEST_FL_OFFLINE)) - return; + goto failure;
if (netif_msg_ifdown(h)) netdev_info(ndev, "self test start\n"); @@ -451,6 +462,10 @@ static void hns3_self_test(struct net_device *ndev,
if (netif_msg_ifdown(h)) netdev_info(ndev, "self test end\n"); + return; + +failure: + eth_test->flags |= ETH_TEST_FL_FAILED; }
static void hns3_update_limit_promisc_mode(struct net_device *netdev,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit 7608a971fdeb4c3eefa522d1bfe8d4bc6b2481cc ]
Only MSG_PEEK needs to copy from an offset during the final process_rx_list call, because the bytes we copied at the beginning of tls_sw_recvmsg were left on the rx_list. In the KVEC case, we removed data from the rx_list as we were copying it, so there's no need to use an offset, just like in the normal case.
Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Sabrina Dubroca sd@queasysnail.net Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/e5487514f828e0347d2b92ca40002c62b58af73d.171112096... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index acf5bb74fd386..8e753d10e694a 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -2152,7 +2152,7 @@ int tls_sw_recvmsg(struct sock *sk, }
/* Drain records from the rx_list & copy if required */ - if (is_peek || is_kvec) + if (is_peek) err = process_rx_list(ctx, msg, &control, copied + peeked, decrypted - peeked, is_peek, NULL); else
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit 85eef9a41d019b59be7bc91793f26251909c0710 ]
process_rx_list may not copy as many bytes as we want to the userspace buffer, for example in case we hit an EFAULT during the copy. If this happens, we should only count the bytes that were actually copied, which may be 0.
Subtracting async_copy_bytes is correct in both peek and !peek cases, because decrypted == async_copy_bytes + peeked for the peek case: peek is always !ZC, and we can go through either the sync or async path. In the async case, we add chunk to both decrypted and async_copy_bytes. In the sync case, we add chunk to both decrypted and peeked. I missed that in commit 6caaf104423d ("tls: fix peeking with sync+async decryption").
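A worked example of the accounting with made-up numbers (illustration only; err below stands for the return value of process_rx_list(), i.e. the bytes it actually copied or a negative errno):

#include <stdio.h>

/* models "decrypted += max(err, 0) - async_copy_bytes" from the patch */
static int adjust(int decrypted, int async_copy_bytes, int err)
{
        int copied_now = err > 0 ? err : 0;

        return decrypted + copied_now - async_copy_bytes;
}

int main(void)
{
        /* 300 bytes decrypted, 100 of them still to be copied from rx_list */
        printf("%d\n", adjust(300, 100, 100)); /* full copy: 300 */
        printf("%d\n", adjust(300, 100, 40));  /* partial copy: 240 */
        printf("%d\n", adjust(300, 100, -14)); /* EFAULT, nothing copied: 200 */
        return 0;
}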
Fixes: 4d42cd6bc2ac ("tls: rx: fix return value for async crypto") Signed-off-by: Sabrina Dubroca sd@queasysnail.net Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/1b5a1eaab3c088a9dd5d9f1059ceecd7afe888d1.171112096... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 8e753d10e694a..925de4caa894a 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -2158,6 +2158,9 @@ int tls_sw_recvmsg(struct sock *sk, else err = process_rx_list(ctx, msg, &control, 0, async_copy_bytes, is_peek, NULL); + + /* we could have copied less than we wanted, and possibly nothing */ + decrypted += max(err, 0) - async_copy_bytes; }
copied += decrypted;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit 417e91e856099e9b8a42a2520e2255e6afe024be ]
At the start of tls_sw_recvmsg, we take a reference on the psock, and then call tls_rx_reader_lock. If that fails, we return directly without releasing the reference.
Instead of adding a new label, just take the reference after locking has succeeded, since we don't need it before.
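The leaked-reference pattern is generic; a minimal userspace sketch (hypothetical helpers, not the kernel API) shows why the reference should only be taken once the lock has been acquired:

#include <stdio.h>

static int refcount;

static void get_ref(void) { refcount++; }
static void put_ref(void) { refcount--; }

/* stands in for tls_rx_reader_lock(): 0 on success, negative on failure */
static int reader_lock(int fail) { return fail ? -1 : 0; }

static int recvmsg_old(int lock_fails)
{
        get_ref();                      /* reference taken first ... */
        if (reader_lock(lock_fails))
                return -1;              /* ... and leaked on this early return */
        put_ref();
        return 0;
}

static int recvmsg_new(int lock_fails)
{
        if (reader_lock(lock_fails))
                return -1;              /* nothing to undo yet */
        get_ref();                      /* take the reference after locking */
        put_ref();
        return 0;
}

int main(void)
{
        recvmsg_old(1);
        printf("old path, lock failure: refcount=%d (leaked)\n", refcount);
        refcount = 0;
        recvmsg_new(1);
        printf("new path, lock failure: refcount=%d\n", refcount);
        return 0;
}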
Fixes: 4cbc325ed6b4 ("tls: rx: allow only one reader at a time") Signed-off-by: Sabrina Dubroca sd@queasysnail.net Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/fe2ade22d030051ce4c3638704ed58b67d0df643.171112096... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 925de4caa894a..df166f6afad82 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1976,10 +1976,10 @@ int tls_sw_recvmsg(struct sock *sk, if (unlikely(flags & MSG_ERRQUEUE)) return sock_recv_errqueue(sk, msg, len, SOL_IP, IP_RECVERR);
- psock = sk_psock_get(sk); err = tls_rx_reader_lock(sk, ctx, flags & MSG_DONTWAIT); if (err < 0) return err; + psock = sk_psock_get(sk); bpf_strp_enabled = sk_psock_strp_enabled(psock);
/* If crypto failed the connection is broken */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Thompson davthompson@nvidia.com
[ Upstream commit f7442a634ac06b953fc1f7418f307b25acd4cfbc ]
The mlxbf_gige driver encounters a NULL pointer exception in mlxbf_gige_open() when kdump is enabled. The sequence to reproduce the exception is as follows:
a) enable kdump
b) trigger kdump via "echo c > /proc/sysrq-trigger"
c) kdump kernel executes
d) kdump kernel loads mlxbf_gige module
e) the mlxbf_gige module runs its open() as the "oob_net0" interface is brought up
f) mlxbf_gige module will experience an exception during its open(), something like:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x0000000086000004 EC = 0x21: IABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000 [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000086000004 [#1] SMP CPU: 0 PID: 812 Comm: NetworkManager Tainted: G OE 5.15.0-1035-bluefield #37-Ubuntu Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024 pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : 0x0 lr : __napi_poll+0x40/0x230 sp : ffff800008003e00 x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8 x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000 x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000 x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0 x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398 x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2 x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238 Call trace: 0x0 net_rx_action+0x178/0x360 __do_softirq+0x15c/0x428 __irq_exit_rcu+0xac/0xec irq_exit+0x18/0x2c handle_domain_irq+0x6c/0xa0 gic_handle_irq+0xec/0x1b0 call_on_irq_stack+0x20/0x2c do_interrupt_handler+0x5c/0x70 el1_interrupt+0x30/0x50 el1h_64_irq_handler+0x18/0x2c el1h_64_irq+0x7c/0x80 __setup_irq+0x4c0/0x950 request_threaded_irq+0xf4/0x1bc mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige] mlxbf_gige_open+0x5c/0x170 [mlxbf_gige] __dev_open+0x100/0x220 __dev_change_flags+0x16c/0x1f0 dev_change_flags+0x2c/0x70 do_setlink+0x220/0xa40 __rtnl_newlink+0x56c/0x8a0 rtnl_newlink+0x58/0x84 rtnetlink_rcv_msg+0x138/0x3c4 netlink_rcv_skb+0x64/0x130 rtnetlink_rcv+0x20/0x30 netlink_unicast+0x2ec/0x360 netlink_sendmsg+0x278/0x490 __sock_sendmsg+0x5c/0x6c ____sys_sendmsg+0x290/0x2d4 ___sys_sendmsg+0x84/0xd0 __sys_sendmsg+0x70/0xd0 __arm64_sys_sendmsg+0x2c/0x40 invoke_syscall+0x78/0x100 el0_svc_common.constprop.0+0x54/0x184 do_el0_svc+0x30/0xac el0_svc+0x48/0x160 el0t_64_sync_handler+0xa4/0x12c el0t_64_sync+0x1a4/0x1a8 Code: bad PC value ---[ end trace 7d1c3f3bf9d81885 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt Kernel Offset: 0x2870a7a00000 from 0xffff800008000000 PHYS_OFFSET: 0x80000000 CPU features: 0x0,000005c1,a3332a5a Memory Limit: none ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
The exception happens because there is a pending RX interrupt before the call to request_irq(RX IRQ) executes. Then, the RX IRQ handler fires immediately after this request_irq() completes. The RX IRQ handler runs "napi_schedule()" before NAPI is fully initialized via "netif_napi_add()" and "napi_enable()", both of which happen later in the open() logic.
The logic in mlxbf_gige_open() must fully initialize NAPI before any calls to request_irq() execute.
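The race can be modelled in a few lines of userspace C (hypothetical helpers, not the kernel API): the poll callback plays the role of NAPI, and the pending interrupt is delivered as soon as the IRQ is requested.

#include <stdio.h>
#include <stddef.h>

static int (*napi_poll)(void);          /* NULL until NAPI is initialized */

static int rx_poll(void)
{
        puts("poll: RX packets processed");
        return 0;
}

static void napi_init(void)
{
        napi_poll = rx_poll;            /* netif_napi_add() + napi_enable() */
}

static void request_irq_with_pending_rx(void)
{
        /* the already-pending RX interrupt fires immediately */
        if (!napi_poll) {
                puts("CRASH: poll callback is NULL (the 'pc : 0x0' oops above)");
                return;
        }
        napi_poll();
}

int main(void)
{
        /* ordering before the patch: IRQ requested before NAPI is ready */
        request_irq_with_pending_rx();

        /* ordering after the patch: NAPI first, then request the IRQ */
        napi_init();
        request_irq_with_pending_rx();
        return 0;
}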
Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Signed-off-by: David Thompson davthompson@nvidia.com Reviewed-by: Asmaa Mnebhi asmaa@nvidia.com Link: https://lore.kernel.org/r/20240325183627.7641-1-davthompson@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../mellanox/mlxbf_gige/mlxbf_gige_main.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c index 044ff5f87b5e8..f1fa5f10051f2 100644 --- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c @@ -139,13 +139,10 @@ static int mlxbf_gige_open(struct net_device *netdev) control |= MLXBF_GIGE_CONTROL_PORT_EN; writeq(control, priv->base + MLXBF_GIGE_CONTROL);
- err = mlxbf_gige_request_irqs(priv); - if (err) - return err; mlxbf_gige_cache_stats(priv); err = mlxbf_gige_clean_port(priv); if (err) - goto free_irqs; + return err;
/* Clear driver's valid_polarity to match hardware, * since the above call to clean_port() resets the @@ -166,6 +163,10 @@ static int mlxbf_gige_open(struct net_device *netdev) napi_enable(&priv->napi); netif_start_queue(netdev);
+ err = mlxbf_gige_request_irqs(priv); + if (err) + goto napi_deinit; + /* Set bits in INT_EN that we care about */ int_en = MLXBF_GIGE_INT_EN_HW_ACCESS_ERROR | MLXBF_GIGE_INT_EN_TX_CHECKSUM_INPUTS | @@ -182,14 +183,17 @@ static int mlxbf_gige_open(struct net_device *netdev)
return 0;
+napi_deinit: + netif_stop_queue(netdev); + napi_disable(&priv->napi); + netif_napi_del(&priv->napi); + mlxbf_gige_rx_deinit(priv); + tx_deinit: mlxbf_gige_tx_deinit(priv);
phy_deinit: phy_stop(phydev); - -free_irqs: - mlxbf_gige_free_irqs(priv); return err; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrei Matei andreimatei1@gmail.com
[ Upstream commit ecc6a2101840177e57c925c102d2d29f260d37c8 ]
This patch re-introduces protection against the size of access to stack memory being negative; the access size can appear negative as a result of overflowing its signed int representation. This should not actually happen, as there are other protections along the way, but we should protect against it anyway. One code path was missing such protections (fixed in the previous patch in the series), causing out-of-bounds array accesses in check_stack_range_initialized(). This patch causes the verification of a program with such a nonsensical access size to fail.
This check used to exist in a more indirect way, but was inadvertently removed in a833a17aeac7.
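For illustration only (hypothetical numbers, not from the verifier): an access size computed from large bounds can wrap negative once it is stored in a signed int, which is the condition the new check rejects.

#include <stdio.h>

int main(void)
{
        /* a huge size, e.g. derived from attacker-controlled bounds */
        unsigned long long huge = 0x80000000ULL;        /* 2 GiB */
        int access_size = (int)huge;                    /* wraps on common ABIs */

        printf("access_size = %d\n", access_size);
        if (access_size < 0)
                printf("reject with -EFAULT: negative access size\n");
        return 0;
}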
Fixes: a833a17aeac7 ("bpf: Fix verification of indirect var-off stack access") Reported-by: syzbot+33f4297b5f927648741a@syzkaller.appspotmail.com Reported-by: syzbot+aafd0513053a1cbf52ef@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/CAADnVQLORV5PT0iTAhRER+iLBTkByCYNBYyvBSgjN1T31K+... Acked-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Andrei Matei andreimatei1@gmail.com Link: https://lore.kernel.org/r/20240327024245.318299-3-andreimatei1@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/verifier.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 396c4c66932f2..c9fc734989c68 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -6637,6 +6637,11 @@ static int check_stack_access_within_bounds( err = check_stack_slot_within_bounds(env, min_off, state, type); if (!err && max_off > 0) err = -EINVAL; /* out of stack access into non-negative offsets */ + if (!err && access_size < 0) + /* access_size should not be negative (or overflow an int); others checks + * along the way should have prevented such an access. + */ + err = -EFAULT; /* invalid negative access size; integer overflow? */
if (err) { if (tnum_is_const(reg->var_off)) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells dhowells@redhat.com
[ Upstream commit 8876a37277cb832e1861c35f8c661825179f73f5 ]
fscache emits a lot of duplicate cookie warnings with cifs because the index key for the fscache cookies does not include everything that the cifs_find_inode() function does. The latter is used with iget5_locked() to distinguish between inodes in the local inode cache.
Fix this by adding the creation time and file type to the fscache cookie key.
Additionally, add a couple of comments to note that if one is changed, the other must be changed as well.
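As a rough illustration of why the cookie key and the inode-cache comparison must stay in sync, here is a minimal user-space sketch (hypothetical struct and function names, not the cifs code): both sides have to cover the same three fields, otherwise two distinct inodes can collide on one cookie.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct demo_key {
	uint64_t uniqueid;   /* server inode number */
	uint64_t createtime; /* creation time on server */
	uint8_t  type;       /* S_IFMT-style file type */
};

/* Mirrors the idea that every field in the cookie key also participates
 * in the inode comparison; dropping a field on either side reintroduces
 * the duplicate-cookie problem.
 */
static bool demo_keys_match(const struct demo_key *a, const struct demo_key *b)
{
	return a->uniqueid == b->uniqueid &&
	       a->createtime == b->createtime &&
	       a->type == b->type;
}

int main(void)
{
	struct demo_key k1 = { 42, 1000, 8 };  /* regular file */
	struct demo_key k2 = { 42, 1000, 4 };  /* directory with same uniqueid */

	printf("match: %d\n", demo_keys_match(&k1, &k2)); /* 0 - kept distinct */
	return 0;
}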
Signed-off-by: David Howells dhowells@redhat.com Fixes: 70431bfd825d ("cifs: Support fscache indexing rewrite") cc: Shyam Prasad N nspmangalore@gmail.com cc: Rohith Surabattula rohiths.msft@gmail.com cc: Jeff Layton jlayton@kernel.org cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/fscache.c | 16 +++++++++++++++- fs/smb/client/inode.c | 2 ++ 2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/fs/smb/client/fscache.c b/fs/smb/client/fscache.c index e5cad149f5a2d..a4ee801b29394 100644 --- a/fs/smb/client/fscache.c +++ b/fs/smb/client/fscache.c @@ -12,6 +12,16 @@ #include "cifs_fs_sb.h" #include "cifsproto.h"
+/* + * Key for fscache inode. [!] Contents must match comparisons in cifs_find_inode(). + */ +struct cifs_fscache_inode_key { + + __le64 uniqueid; /* server inode number */ + __le64 createtime; /* creation time on server */ + u8 type; /* S_IFMT file type */ +} __packed; + static void cifs_fscache_fill_volume_coherency( struct cifs_tcon *tcon, struct cifs_fscache_volume_coherency_data *cd) @@ -97,15 +107,19 @@ void cifs_fscache_release_super_cookie(struct cifs_tcon *tcon) void cifs_fscache_get_inode_cookie(struct inode *inode) { struct cifs_fscache_inode_coherency_data cd; + struct cifs_fscache_inode_key key; struct cifsInodeInfo *cifsi = CIFS_I(inode); struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
+ key.uniqueid = cpu_to_le64(cifsi->uniqueid); + key.createtime = cpu_to_le64(cifsi->createtime); + key.type = (inode->i_mode & S_IFMT) >> 12; cifs_fscache_fill_coherency(&cifsi->netfs.inode, &cd);
cifsi->netfs.cache = fscache_acquire_cookie(tcon->fscache, 0, - &cifsi->uniqueid, sizeof(cifsi->uniqueid), + &key, sizeof(key), &cd, sizeof(cd), i_size_read(&cifsi->netfs.inode)); if (cifsi->netfs.cache) diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c index cb9e719e67ae2..fa6330d586e89 100644 --- a/fs/smb/client/inode.c +++ b/fs/smb/client/inode.c @@ -1390,6 +1390,8 @@ cifs_find_inode(struct inode *inode, void *opaque) { struct cifs_fattr *fattr = opaque;
+ /* [!] The compared values must be the same in struct cifs_fscache_inode_key. */ + /* don't match inode with different uniqueid */ if (CIFS_I(inode)->uniqueid != fattr->cf_uniqueid) return 0;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit b32ca27fa238ff83427d23bef2a5b741e2a88a1e ]
Report EOPNOTSUPP if NFT_MSG_DESTROYCHAIN is used to delete hooks in an existing netdev basechain, thus, only NFT_MSG_DELCHAIN is allowed.
Fixes: 7d937b107108f ("netfilter: nf_tables: support for deleting devices in an existing netdev chain") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index f10419ba6e0bd..0653f1e5e8929 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -2934,7 +2934,8 @@ static int nf_tables_delchain(struct sk_buff *skb, const struct nfnl_info *info, nft_ctx_init(&ctx, net, skb, info->nlh, family, table, chain, nla);
if (nla[NFTA_CHAIN_HOOK]) { - if (chain->flags & NFT_CHAIN_HW_OFFLOAD) + if (NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYCHAIN || + chain->flags & NFT_CHAIN_HW_OFFLOAD) return -EOPNOTSUPP;
if (nft_is_base_chain(chain)) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 1e1fb6f00f52812277963365d9bd835b9b0ea4e0 ]
netdev basechain updates are stored in the transaction object hook list. When setting the table's dormant flag, only the existing hooks in the basechain are iterated over, skipping the hooks that are being added/deleted in this same transaction, which leaves hook registration in an inconsistent state.
Reject table flag updates in combination with netdev basechain updates in the same batch:
- Update table flags and add/delete basechain: check from the basechain update path if there are pending flag updates for this table.
- Add/delete basechain and update table flags: iterate over the transaction list to search for basechain updates from the table update path.
In both cases, the batch is rejected. Based on suggestion from Florian Westphal.
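A compressed user-space model of the rejection rule (simplified stand-ins, not the nf_tables structures) may help picture it: within one batch, table flag updates and netdev basechain updates are mutually exclusive.

#include <stdbool.h>
#include <stdio.h>

enum demo_msg { DEMO_NEWTABLE, DEMO_NEWCHAIN, DEMO_DELCHAIN };

struct demo_trans {
	enum demo_msg msg;
	bool chain_update;      /* add/delete devices on a basechain */
	bool table_flag_update; /* dormant flag toggled on the table */
};

static bool demo_batch_allowed(const struct demo_trans *batch, int n)
{
	bool flag_update = false, basechain_update = false;
	int i;

	for (i = 0; i < n; i++) {
		if (batch[i].msg == DEMO_NEWTABLE && batch[i].table_flag_update)
			flag_update = true;
		if ((batch[i].msg == DEMO_NEWCHAIN || batch[i].msg == DEMO_DELCHAIN) &&
		    batch[i].chain_update)
			basechain_update = true;
	}
	/* Either kind of update alone is fine; mixing both is rejected. */
	return !(flag_update && basechain_update);
}

int main(void)
{
	struct demo_trans batch[] = {
		{ DEMO_NEWTABLE, false, true },  /* set/clear dormant flag  */
		{ DEMO_NEWCHAIN, true,  false }, /* add device to basechain */
	};

	printf("batch allowed: %d\n", demo_batch_allowed(batch, 2)); /* 0 */
	return 0;
}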
Fixes: b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Fixes: 7d937b107108f ("netfilter: nf_tables: support for deleting devices in an existing netdev chain") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 31 ++++++++++++++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 0653f1e5e8929..6e4e22a10a826 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -1200,6 +1200,25 @@ static void nf_tables_table_disable(struct net *net, struct nft_table *table) #define __NFT_TABLE_F_UPDATE (__NFT_TABLE_F_WAS_DORMANT | \ __NFT_TABLE_F_WAS_AWAKEN)
+static bool nft_table_pending_update(const struct nft_ctx *ctx) +{ + struct nftables_pernet *nft_net = nft_pernet(ctx->net); + struct nft_trans *trans; + + if (ctx->table->flags & __NFT_TABLE_F_UPDATE) + return true; + + list_for_each_entry(trans, &nft_net->commit_list, list) { + if ((trans->msg_type == NFT_MSG_NEWCHAIN || + trans->msg_type == NFT_MSG_DELCHAIN) && + trans->ctx.table == ctx->table && + nft_trans_chain_update(trans)) + return true; + } + + return false; +} + static int nf_tables_updtable(struct nft_ctx *ctx) { struct nft_trans *trans; @@ -1223,7 +1242,7 @@ static int nf_tables_updtable(struct nft_ctx *ctx) return -EOPNOTSUPP;
/* No dormant off/on/off/on games in single transaction */ - if (ctx->table->flags & __NFT_TABLE_F_UPDATE) + if (nft_table_pending_update(ctx)) return -EINVAL;
trans = nft_trans_alloc(ctx, NFT_MSG_NEWTABLE, @@ -2621,6 +2640,13 @@ static int nf_tables_updchain(struct nft_ctx *ctx, u8 genmask, u8 policy, } }
+ if (table->flags & __NFT_TABLE_F_UPDATE && + !list_empty(&hook.list)) { + NL_SET_BAD_ATTR(extack, attr); + err = -EOPNOTSUPP; + goto err_hooks; + } + if (!(table->flags & NFT_TABLE_F_DORMANT) && nft_is_base_chain(chain) && !list_empty(&hook.list)) { @@ -2850,6 +2876,9 @@ static int nft_delchain_hook(struct nft_ctx *ctx, struct nft_trans *trans; int err;
+ if (ctx->table->flags & __NFT_TABLE_F_UPDATE) + return -EOPNOTSUPP; + err = nft_chain_parse_hook(ctx->net, basechain, nla, &chain_hook, ctx->family, chain->flags, extack); if (err < 0)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 216e7bf7402caf73f4939a8e0248392e96d7c0da ]
Skip hook unregistration when adding or deleting devices from an existing netdev basechain. Otherwise, the commit/abort paths try to unregister hooks which are not enabled.
Fixes: b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Fixes: 7d937b107108 ("netfilter: nf_tables: support for deleting devices in an existing netdev chain") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 6e4e22a10a826..b2ef7e37f11cd 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -10083,9 +10083,11 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) if (nft_trans_chain_update(trans)) { nf_tables_chain_notify(&trans->ctx, NFT_MSG_DELCHAIN, &nft_trans_chain_hooks(trans)); - nft_netdev_unregister_hooks(net, - &nft_trans_chain_hooks(trans), - true); + if (!(trans->ctx.table->flags & NFT_TABLE_F_DORMANT)) { + nft_netdev_unregister_hooks(net, + &nft_trans_chain_hooks(trans), + true); + } } else { nft_chain_del(trans->ctx.chain); nf_tables_chain_notify(&trans->ctx, NFT_MSG_DELCHAIN, @@ -10357,9 +10359,11 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action) break; case NFT_MSG_NEWCHAIN: if (nft_trans_chain_update(trans)) { - nft_netdev_unregister_hooks(net, - &nft_trans_chain_hooks(trans), - true); + if (!(trans->ctx.table->flags & NFT_TABLE_F_DORMANT)) { + nft_netdev_unregister_hooks(net, + &nft_trans_chain_hooks(trans), + true); + } free_percpu(nft_trans_chain_stats(trans)); kfree(nft_trans_chain_name(trans)); nft_trans_destroy(trans);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Justin Chen justin.chen@broadcom.com
[ Upstream commit dfd222e2aef68818320a57b13a1c52a44c22bc80 ]
The unimac requires the PHY RX clk during reset or it may be put into a bad state. Bring up the unimac after link up to ensure the PHY RX clk exists.
Fixes: 490cb412007d ("net: bcmasp: Add support for ASP2.0 Ethernet controller") Signed-off-by: Justin Chen justin.chen@broadcom.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/broadcom/asp2/bcmasp_intf.c | 28 +++++++++++++------ 1 file changed, 19 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/asp2/bcmasp_intf.c b/drivers/net/ethernet/broadcom/asp2/bcmasp_intf.c index 9cae5a3090000..b3d04f49f77e9 100644 --- a/drivers/net/ethernet/broadcom/asp2/bcmasp_intf.c +++ b/drivers/net/ethernet/broadcom/asp2/bcmasp_intf.c @@ -391,7 +391,9 @@ static void umac_reset(struct bcmasp_intf *intf) umac_wl(intf, 0x0, UMC_CMD); umac_wl(intf, UMC_CMD_SW_RESET, UMC_CMD); usleep_range(10, 100); - umac_wl(intf, 0x0, UMC_CMD); + /* We hold the umac in reset and bring it out of + * reset when phy link is up. + */ }
static void umac_set_hw_addr(struct bcmasp_intf *intf, @@ -411,6 +413,8 @@ static void umac_enable_set(struct bcmasp_intf *intf, u32 mask, u32 reg;
reg = umac_rl(intf, UMC_CMD); + if (reg & UMC_CMD_SW_RESET) + return; if (enable) reg |= mask; else @@ -429,7 +433,6 @@ static void umac_init(struct bcmasp_intf *intf) umac_wl(intf, 0x800, UMC_FRM_LEN); umac_wl(intf, 0xffff, UMC_PAUSE_CNTRL); umac_wl(intf, 0x800, UMC_RX_MAX_PKT_SZ); - umac_enable_set(intf, UMC_CMD_PROMISC, 1); }
static int bcmasp_tx_poll(struct napi_struct *napi, int budget) @@ -656,6 +659,12 @@ static void bcmasp_adj_link(struct net_device *dev) UMC_CMD_HD_EN | UMC_CMD_RX_PAUSE_IGNORE | UMC_CMD_TX_PAUSE_IGNORE); reg |= cmd_bits; + if (reg & UMC_CMD_SW_RESET) { + reg &= ~UMC_CMD_SW_RESET; + umac_wl(intf, reg, UMC_CMD); + udelay(2); + reg |= UMC_CMD_TX_EN | UMC_CMD_RX_EN | UMC_CMD_PROMISC; + } umac_wl(intf, reg, UMC_CMD);
intf->eee.eee_active = phy_init_eee(phydev, 0) >= 0; @@ -1061,9 +1070,6 @@ static int bcmasp_netif_init(struct net_device *dev, bool phy_connect)
umac_init(intf);
- /* Disable the UniMAC RX/TX */ - umac_enable_set(intf, (UMC_CMD_RX_EN | UMC_CMD_TX_EN), 0); - umac_set_hw_addr(intf, dev->dev_addr);
intf->old_duplex = -1; @@ -1083,9 +1089,6 @@ static int bcmasp_netif_init(struct net_device *dev, bool phy_connect)
bcmasp_enable_rx(intf, 1);
- /* Turn on UniMAC TX/RX */ - umac_enable_set(intf, (UMC_CMD_RX_EN | UMC_CMD_TX_EN), 1); - intf->crc_fwd = !!(umac_rl(intf, UMC_CMD) & UMC_CMD_CRC_FWD);
bcmasp_netif_start(dev); @@ -1321,7 +1324,14 @@ static void bcmasp_suspend_to_wol(struct bcmasp_intf *intf) if (intf->wolopts & WAKE_FILTER) bcmasp_netfilt_suspend(intf);
- /* UniMAC receive needs to be turned on */ + /* Bring UniMAC out of reset if needed and enable RX */ + reg = umac_rl(intf, UMC_CMD); + if (reg & UMC_CMD_SW_RESET) + reg &= ~UMC_CMD_SW_RESET; + + reg |= UMC_CMD_RX_EN | UMC_CMD_PROMISC; + umac_wl(intf, reg, UMC_CMD); + umac_enable_set(intf, UMC_CMD_RX_EN, 1);
if (intf->parent->wol_irq > 0) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Raju Lakkaraju Raju.Lakkaraju@microchip.com
[ Upstream commit e4a58989f5c839316ac63675e8800b9eed7dbe96 ]
PCI11x1x Rev B0 devices might drop packets when receiving back-to-back frames at 2.5G link speed. Change the B0 rev device's Receive Filtering Engine (RFE) FIFO threshold parameter from its hardware default of 4 dwords to 3 dwords to prevent the problem. Rev C0 and later hardware already defaults to 3 dwords.
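The register update itself is a plain read-modify-write of a 3-bit field; the sketch below re-implements the GENMASK()/FIELD_PREP() arithmetic in user space purely for illustration (the DEMO_* helpers are local stand-ins, not kernel code).

#include <stdint.h>
#include <stdio.h>

/* Minimal user-space stand-ins for the kernel's GENMASK()/FIELD_PREP(). */
#define DEMO_GENMASK(h, l)       ((((1u << ((h) - (l) + 1)) - 1)) << (l))
#define DEMO_FIELD_PREP(mask, v) (((v) << __builtin_ctz(mask)) & (mask))

#define MISC_CTL_0_RFE_READ_FIFO_MASK_ DEMO_GENMASK(6, 4)
#define RFE_RD_FIFO_TH_3_DWORDS        0x3

int main(void)
{
	uint32_t misc_ctl = 0x40; /* pretend the B0 default of 4 sits in bits 6:4 */

	misc_ctl &= ~MISC_CTL_0_RFE_READ_FIFO_MASK_;
	misc_ctl |= DEMO_FIELD_PREP(MISC_CTL_0_RFE_READ_FIFO_MASK_,
				    RFE_RD_FIFO_TH_3_DWORDS);
	printf("MISC_CTL_0 after update: 0x%x\n", misc_ctl); /* prints 0x30 */
	return 0;
}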
Fixes: bb4f6bffe33c ("net: lan743x: Add PCI11010 / PCI11414 device IDs") Signed-off-by: Raju Lakkaraju Raju.Lakkaraju@microchip.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240326065805.686128-1-Raju.Lakkaraju@microchip.c... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/microchip/lan743x_main.c | 18 ++++++++++++++++++ drivers/net/ethernet/microchip/lan743x_main.h | 4 ++++ 2 files changed, 22 insertions(+)
diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c index c81cdeb4d4e7e..0b6174748d2b4 100644 --- a/drivers/net/ethernet/microchip/lan743x_main.c +++ b/drivers/net/ethernet/microchip/lan743x_main.c @@ -25,6 +25,8 @@ #define PCS_POWER_STATE_DOWN 0x6 #define PCS_POWER_STATE_UP 0x4
+#define RFE_RD_FIFO_TH_3_DWORDS 0x3 + static void pci11x1x_strap_get_status(struct lan743x_adapter *adapter) { u32 chip_rev; @@ -3223,6 +3225,21 @@ static void lan743x_full_cleanup(struct lan743x_adapter *adapter) lan743x_pci_cleanup(adapter); }
+static void pci11x1x_set_rfe_rd_fifo_threshold(struct lan743x_adapter *adapter) +{ + u16 rev = adapter->csr.id_rev & ID_REV_CHIP_REV_MASK_; + + if (rev == ID_REV_CHIP_REV_PCI11X1X_B0_) { + u32 misc_ctl; + + misc_ctl = lan743x_csr_read(adapter, MISC_CTL_0); + misc_ctl &= ~MISC_CTL_0_RFE_READ_FIFO_MASK_; + misc_ctl |= FIELD_PREP(MISC_CTL_0_RFE_READ_FIFO_MASK_, + RFE_RD_FIFO_TH_3_DWORDS); + lan743x_csr_write(adapter, MISC_CTL_0, misc_ctl); + } +} + static int lan743x_hardware_init(struct lan743x_adapter *adapter, struct pci_dev *pdev) { @@ -3238,6 +3255,7 @@ static int lan743x_hardware_init(struct lan743x_adapter *adapter, pci11x1x_strap_get_status(adapter); spin_lock_init(&adapter->eth_syslock_spinlock); mutex_init(&adapter->sgmii_rw_lock); + pci11x1x_set_rfe_rd_fifo_threshold(adapter); } else { adapter->max_tx_channels = LAN743X_MAX_TX_CHANNELS; adapter->used_tx_channels = LAN743X_USED_TX_CHANNELS; diff --git a/drivers/net/ethernet/microchip/lan743x_main.h b/drivers/net/ethernet/microchip/lan743x_main.h index 52609fc13ad95..f0b486f85450e 100644 --- a/drivers/net/ethernet/microchip/lan743x_main.h +++ b/drivers/net/ethernet/microchip/lan743x_main.h @@ -26,6 +26,7 @@ #define ID_REV_CHIP_REV_MASK_ (0x0000FFFF) #define ID_REV_CHIP_REV_A0_ (0x00000000) #define ID_REV_CHIP_REV_B0_ (0x00000010) +#define ID_REV_CHIP_REV_PCI11X1X_B0_ (0x000000B0)
#define FPGA_REV (0x04) #define FPGA_REV_GET_MINOR_(fpga_rev) (((fpga_rev) >> 8) & 0x000000FF) @@ -311,6 +312,9 @@ #define SGMII_CTL_LINK_STATUS_SOURCE_ BIT(8) #define SGMII_CTL_SGMII_POWER_DN_ BIT(1)
+#define MISC_CTL_0 (0x920) +#define MISC_CTL_0_RFE_READ_FIFO_MASK_ GENMASK(6, 4) + /* Vendor Specific SGMII MMD details */ #define SR_VSMMD_PCS_ID1 0x0004 #define SR_VSMMD_PCS_ID2 0x0005
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hariprasad Kelam hkelam@marvell.com
[ Upstream commit 40d4b4807cadd83fb3f46cc8cd67a945b5b25461 ]
The Octeontx2 MAC block (CGX) has separate data paths (SMU and GMP) for different speeds, allowing for efficient data transfer.
The previous patch which added pause frame configuration has a bug due to which the pause frame feature does not work in GMP mode.
This patch fixes the issue by configuring the appropriate registers.
Fixes: f7e086e754fe ("octeontx2-af: Pause frame configuration at cgx") Signed-off-by: Hariprasad Kelam hkelam@marvell.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240326052720.4441-1-hkelam@marvell.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/octeontx2/af/cgx.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c index 6c18d3d2442eb..2539c985f695a 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c @@ -808,6 +808,11 @@ static int cgx_lmac_enadis_pause_frm(void *cgxd, int lmac_id, if (!is_lmac_valid(cgx, lmac_id)) return -ENODEV;
+ cfg = cgx_read(cgx, lmac_id, CGXX_GMP_GMI_RXX_FRM_CTL); + cfg &= ~CGX_GMP_GMI_RXX_FRM_CTL_CTL_BCK; + cfg |= rx_pause ? CGX_GMP_GMI_RXX_FRM_CTL_CTL_BCK : 0x0; + cgx_write(cgx, lmac_id, CGXX_GMP_GMI_RXX_FRM_CTL, cfg); + cfg = cgx_read(cgx, lmac_id, CGXX_SMUX_RX_FRM_CTL); cfg &= ~CGX_SMUX_RX_FRM_CTL_CTL_BCK; cfg |= rx_pause ? CGX_SMUX_RX_FRM_CTL_CTL_BCK : 0x0;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 18685451fc4e546fc0e718580d32df3c0e5c8272 ]
ip_local_out() and other functions can pass skb->sk as function argument.
If the skb is a fragment and reassembly happens before such function call returns, the sk must not be released.
This affects skb fragments reassembled via netfilter or similar modules, e.g. openvswitch or ct_act.c, when run as part of tx pipeline.
Eric Dumazet made an initial analysis of this bug. Quoting Eric: Calling ip_defrag() in output path is also implying skb_orphan(), which is buggy because output path relies on sk not disappearing.
A relevant old patch about the issue was : 8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()")
[..]
net/ipv4/ip_output.c depends on skb->sk being set, and probably to an inet socket, not an arbitrary one.
If we orphan the packet in ipvlan, then downstream things like FQ packet scheduler will not work properly.
We need to change ip_defrag() to only use skb_orphan() when really needed, ie whenever frag_list is going to be used.
Eric suggested to stash sk in fragment queue and made an initial patch. However there is a problem with this:
If the skb is refragmented again right after, ip_do_fragment() will copy head->sk to the new fragments and set up the destructor to sock_wfree. IOW, we have no choice but to fix up sk_wmem accounting to reflect the fully reassembled skb, else wmem will underflow.
This change moves the orphan down into the core, to the last possible moment. As ip_defrag_offset is aliased with the sk_buff->sk member, we must move the offset into the FRAG_CB, else skb->sk gets clobbered.
This allows delaying the orphaning long enough to learn whether the skb has to be queued or whether the skb is completing the reasm queue.
In the former case, things work as before: the skb is orphaned. This is safe because the skb gets queued/stolen and won't continue past the reasm engine.
In the latter case, we will steal the skb->sk reference, reattach it to the head skb, and fix up wmem accounting when inet_frag inflates truesize.
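To make the accounting fix-up concrete, here is a condensed user-space model (hypothetical types and names, not the kernel's): when the reassembled head takes over the socket reference, the socket must also be charged for the truesize the head gained, so the eventual sock_wfree()-style release balances out.

#include <stdint.h>
#include <stdio.h>

struct demo_sock { uint32_t wmem_alloc; };
struct demo_skb  { struct demo_sock *sk; uint32_t truesize; };

static void demo_attach_reassembled(struct demo_skb *head,
				    struct demo_sock *sk,
				    uint32_t orig_truesize)
{
	/* head->truesize now covers every queued fragment, but the socket
	 * was only ever charged for the original skb; add the difference
	 * so the later release balances the accounting instead of
	 * underflowing it.
	 */
	head->sk = sk;
	sk->wmem_alloc += head->truesize - orig_truesize;
}

int main(void)
{
	struct demo_sock sk = { .wmem_alloc = 1500 };
	struct demo_skb head = { .sk = NULL, .truesize = 4500 };

	demo_attach_reassembled(&head, &sk, 1500);
	printf("wmem_alloc after reassembly: %u\n", sk.wmem_alloc); /* 4500 */
	return 0;
}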
Fixes: 7026b1ddb6b8 ("netfilter: Pass socket pointer down through okfn().") Diagnosed-by: Eric Dumazet edumazet@google.com Reported-by: xingwei lee xrivendell7@gmail.com Reported-by: yue sun samsun1006219@gmail.com Reported-by: syzbot+e5167d7144a62715044c@syzkaller.appspotmail.com Signed-off-by: Florian Westphal fw@strlen.de Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20240326101845.30836-1-fw@strlen.de Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/skbuff.h | 7 +-- net/ipv4/inet_fragment.c | 70 ++++++++++++++++++++----- net/ipv4/ip_fragment.c | 2 +- net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +- 4 files changed, 60 insertions(+), 21 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 2922059908cc5..9e61f6df6bc55 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -736,8 +736,6 @@ typedef unsigned char *sk_buff_data_t; * @list: queue head * @ll_node: anchor in an llist (eg socket defer_list) * @sk: Socket we are owned by - * @ip_defrag_offset: (aka @sk) alternate use of @sk, used in - * fragmentation management * @dev: Device we arrived on/are leaving by * @dev_scratch: (aka @dev) alternate use of @dev when @dev would be %NULL * @cb: Control buffer. Free for use by every layer. Put private vars here @@ -860,10 +858,7 @@ struct sk_buff { struct llist_node ll_node; };
- union { - struct sock *sk; - int ip_defrag_offset; - }; + struct sock *sk;
union { ktime_t tstamp; diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c index 7072fc0783ef5..c88c9034d6300 100644 --- a/net/ipv4/inet_fragment.c +++ b/net/ipv4/inet_fragment.c @@ -24,6 +24,8 @@ #include <net/ip.h> #include <net/ipv6.h>
+#include "../core/sock_destructor.h" + /* Use skb->cb to track consecutive/adjacent fragments coming at * the end of the queue. Nodes in the rb-tree queue will * contain "runs" of one or more adjacent fragments. @@ -39,6 +41,7 @@ struct ipfrag_skb_cb { }; struct sk_buff *next_frag; int frag_run_len; + int ip_defrag_offset; };
#define FRAG_CB(skb) ((struct ipfrag_skb_cb *)((skb)->cb)) @@ -396,12 +399,12 @@ int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb, */ if (!last) fragrun_create(q, skb); /* First fragment. */ - else if (last->ip_defrag_offset + last->len < end) { + else if (FRAG_CB(last)->ip_defrag_offset + last->len < end) { /* This is the common case: skb goes to the end. */ /* Detect and discard overlaps. */ - if (offset < last->ip_defrag_offset + last->len) + if (offset < FRAG_CB(last)->ip_defrag_offset + last->len) return IPFRAG_OVERLAP; - if (offset == last->ip_defrag_offset + last->len) + if (offset == FRAG_CB(last)->ip_defrag_offset + last->len) fragrun_append_to_last(q, skb); else fragrun_create(q, skb); @@ -418,13 +421,13 @@ int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb,
parent = *rbn; curr = rb_to_skb(parent); - curr_run_end = curr->ip_defrag_offset + + curr_run_end = FRAG_CB(curr)->ip_defrag_offset + FRAG_CB(curr)->frag_run_len; - if (end <= curr->ip_defrag_offset) + if (end <= FRAG_CB(curr)->ip_defrag_offset) rbn = &parent->rb_left; else if (offset >= curr_run_end) rbn = &parent->rb_right; - else if (offset >= curr->ip_defrag_offset && + else if (offset >= FRAG_CB(curr)->ip_defrag_offset && end <= curr_run_end) return IPFRAG_DUP; else @@ -438,7 +441,7 @@ int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb, rb_insert_color(&skb->rbnode, &q->rb_fragments); }
- skb->ip_defrag_offset = offset; + FRAG_CB(skb)->ip_defrag_offset = offset;
return IPFRAG_OK; } @@ -448,13 +451,28 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb, struct sk_buff *parent) { struct sk_buff *fp, *head = skb_rb_first(&q->rb_fragments); - struct sk_buff **nextp; + void (*destructor)(struct sk_buff *); + unsigned int orig_truesize = 0; + struct sk_buff **nextp = NULL; + struct sock *sk = skb->sk; int delta;
+ if (sk && is_skb_wmem(skb)) { + /* TX: skb->sk might have been passed as argument to + * dst->output and must remain valid until tx completes. + * + * Move sk to reassembled skb and fix up wmem accounting. + */ + orig_truesize = skb->truesize; + destructor = skb->destructor; + } + if (head != skb) { fp = skb_clone(skb, GFP_ATOMIC); - if (!fp) - return NULL; + if (!fp) { + head = skb; + goto out_restore_sk; + } FRAG_CB(fp)->next_frag = FRAG_CB(skb)->next_frag; if (RB_EMPTY_NODE(&skb->rbnode)) FRAG_CB(parent)->next_frag = fp; @@ -463,6 +481,12 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb, &q->rb_fragments); if (q->fragments_tail == skb) q->fragments_tail = fp; + + if (orig_truesize) { + /* prevent skb_morph from releasing sk */ + skb->sk = NULL; + skb->destructor = NULL; + } skb_morph(skb, head); FRAG_CB(skb)->next_frag = FRAG_CB(head)->next_frag; rb_replace_node(&head->rbnode, &skb->rbnode, @@ -470,13 +494,13 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb, consume_skb(head); head = skb; } - WARN_ON(head->ip_defrag_offset != 0); + WARN_ON(FRAG_CB(head)->ip_defrag_offset != 0);
delta = -head->truesize;
/* Head of list must not be cloned. */ if (skb_unclone(head, GFP_ATOMIC)) - return NULL; + goto out_restore_sk;
delta += head->truesize; if (delta) @@ -492,7 +516,7 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
clone = alloc_skb(0, GFP_ATOMIC); if (!clone) - return NULL; + goto out_restore_sk; skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list; skb_frag_list_init(head); for (i = 0; i < skb_shinfo(head)->nr_frags; i++) @@ -509,6 +533,21 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb, nextp = &skb_shinfo(head)->frag_list; }
+out_restore_sk: + if (orig_truesize) { + int ts_delta = head->truesize - orig_truesize; + + /* if this reassembled skb is fragmented later, + * fraglist skbs will get skb->sk assigned from head->sk, + * and each frag skb will be released via sock_wfree. + * + * Update sk_wmem_alloc. + */ + head->sk = sk; + head->destructor = destructor; + refcount_add(ts_delta, &sk->sk_wmem_alloc); + } + return nextp; } EXPORT_SYMBOL(inet_frag_reasm_prepare); @@ -516,6 +555,8 @@ EXPORT_SYMBOL(inet_frag_reasm_prepare); void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head, void *reasm_data, bool try_coalesce) { + struct sock *sk = is_skb_wmem(head) ? head->sk : NULL; + const unsigned int head_truesize = head->truesize; struct sk_buff **nextp = reasm_data; struct rb_node *rbn; struct sk_buff *fp; @@ -579,6 +620,9 @@ void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head, head->prev = NULL; head->tstamp = q->stamp; head->mono_delivery_time = q->mono_delivery_time; + + if (sk) + refcount_add(sum_truesize - head_truesize, &sk->sk_wmem_alloc); } EXPORT_SYMBOL(inet_frag_reasm_finish);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c index a4941f53b5237..fb947d1613fe2 100644 --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -384,6 +384,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb) }
skb_dst_drop(skb); + skb_orphan(skb); return -EINPROGRESS;
insert_error: @@ -487,7 +488,6 @@ int ip_defrag(struct net *net, struct sk_buff *skb, u32 user) struct ipq *qp;
__IP_INC_STATS(net, IPSTATS_MIB_REASMREQDS); - skb_orphan(skb);
/* Lookup (or create) queue header */ qp = ip_find(net, ip_hdr(skb), user, vif); diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c index b2dd48911c8d6..efbec7ee27d0a 100644 --- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -294,6 +294,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb, }
skb_dst_drop(skb); + skb_orphan(skb); return -EINPROGRESS;
insert_error: @@ -469,7 +470,6 @@ int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user) hdr = ipv6_hdr(skb); fhdr = (struct frag_hdr *)skb_transport_header(skb);
- skb_orphan(skb); fq = fq_find(net, fhdr->identification, user, hdr, skb->dev ? skb->dev->ifindex : 0); if (fq == NULL) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matt Roper matthew.d.roper@intel.com
[ Upstream commit eaeb4b3614529bfa8a7edfdd7ecf6977b27f18b2 ]
DG2 first production steppings were C0 (for DG2-G10), B1 (for DG2-G11), and A1 (for DG2-G12). Several workarounds that apply only to pre-production hardware can be dropped. Furthermore, several workarounds that apply to all production steppings can have their conditions simplified to no longer check the GT stepping.
v2: - Keep Wa_16011777198 in place for now; it will be removed separately in a follow-up patch to keep review easier.
Bspec: 44477 Signed-off-by: Matt Roper matthew.d.roper@intel.com Acked-by: Jani Nikula jani.nikula@intel.com Reviewed-by: Matt Atwood matthew.s.atwood@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230816214201.534095-10-matth... Stable-dep-of: 186bce682772 ("drm/i915/mtl: Update workaround 14018575942") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gt/intel_lrc.c | 34 +--- drivers/gpu/drm/i915/gt/intel_mocs.c | 21 +- drivers/gpu/drm/i915/gt/intel_rc6.c | 6 +- drivers/gpu/drm/i915/gt/intel_workarounds.c | 211 +------------------- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 20 +- drivers/gpu/drm/i915/intel_clock_gating.c | 8 - 6 files changed, 21 insertions(+), 279 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index c378cc7c953c4..f297c5808e7c8 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -1316,29 +1316,6 @@ gen12_emit_cmd_buf_wa(const struct intel_context *ce, u32 *cs) return cs; }
-/* - * On DG2 during context restore of a preempted context in GPGPU mode, - * RCS restore hang is detected. This is extremely timing dependent. - * To address this below sw wabb is implemented for DG2 A steppings. - */ -static u32 * -dg2_emit_rcs_hang_wabb(const struct intel_context *ce, u32 *cs) -{ - *cs++ = MI_LOAD_REGISTER_IMM(1); - *cs++ = i915_mmio_reg_offset(GEN12_STATE_ACK_DEBUG(ce->engine->mmio_base)); - *cs++ = 0x21; - - *cs++ = MI_LOAD_REGISTER_REG; - *cs++ = i915_mmio_reg_offset(RING_NOPID(ce->engine->mmio_base)); - *cs++ = i915_mmio_reg_offset(XEHP_CULLBIT1); - - *cs++ = MI_LOAD_REGISTER_REG; - *cs++ = i915_mmio_reg_offset(RING_NOPID(ce->engine->mmio_base)); - *cs++ = i915_mmio_reg_offset(XEHP_CULLBIT2); - - return cs; -} - /* * The bspec's tuning guide asks us to program a vertical watermark value of * 0x3FF. However this register is not saved/restored properly by the @@ -1363,14 +1340,8 @@ gen12_emit_indirect_ctx_rcs(const struct intel_context *ce, u32 *cs) cs = gen12_emit_cmd_buf_wa(ce, cs); cs = gen12_emit_restore_scratch(ce, cs);
- /* Wa_22011450934:dg2 */ - if (IS_DG2_GRAPHICS_STEP(ce->engine->i915, G10, STEP_A0, STEP_B0) || - IS_DG2_GRAPHICS_STEP(ce->engine->i915, G11, STEP_A0, STEP_B0)) - cs = dg2_emit_rcs_hang_wabb(ce, cs); - /* Wa_16013000631:dg2 */ - if (IS_DG2_GRAPHICS_STEP(ce->engine->i915, G10, STEP_B0, STEP_C0) || - IS_DG2_G11(ce->engine->i915)) + if (IS_DG2_G11(ce->engine->i915)) cs = gen8_emit_pipe_control(cs, PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE, 0);
cs = gen12_emit_aux_table_inv(ce->engine, cs); @@ -1391,8 +1362,7 @@ gen12_emit_indirect_ctx_xcs(const struct intel_context *ce, u32 *cs) cs = gen12_emit_restore_scratch(ce, cs);
/* Wa_16013000631:dg2 */ - if (IS_DG2_GRAPHICS_STEP(ce->engine->i915, G10, STEP_B0, STEP_C0) || - IS_DG2_G11(ce->engine->i915)) + if (IS_DG2_G11(ce->engine->i915)) if (ce->engine->class == COMPUTE_CLASS) cs = gen8_emit_pipe_control(cs, PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE, diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c index 2c014407225cc..bf8b42d2d3279 100644 --- a/drivers/gpu/drm/i915/gt/intel_mocs.c +++ b/drivers/gpu/drm/i915/gt/intel_mocs.c @@ -404,18 +404,6 @@ static const struct drm_i915_mocs_entry dg2_mocs_table[] = { MOCS_ENTRY(3, 0, L3_3_WB | L3_LKUP(1)), };
-static const struct drm_i915_mocs_entry dg2_mocs_table_g10_ax[] = { - /* Wa_14011441408: Set Go to Memory for MOCS#0 */ - MOCS_ENTRY(0, 0, L3_1_UC | L3_GLBGO(1) | L3_LKUP(1)), - /* UC - Coherent; GO:Memory */ - MOCS_ENTRY(1, 0, L3_1_UC | L3_GLBGO(1) | L3_LKUP(1)), - /* UC - Non-Coherent; GO:Memory */ - MOCS_ENTRY(2, 0, L3_1_UC | L3_GLBGO(1)), - - /* WB - LC */ - MOCS_ENTRY(3, 0, L3_3_WB | L3_LKUP(1)), -}; - static const struct drm_i915_mocs_entry pvc_mocs_table[] = { /* Error */ MOCS_ENTRY(0, 0, L3_3_WB), @@ -521,13 +509,8 @@ static unsigned int get_mocs_settings(const struct drm_i915_private *i915, table->wb_index = 2; table->unused_entries_index = 2; } else if (IS_DG2(i915)) { - if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0)) { - table->size = ARRAY_SIZE(dg2_mocs_table_g10_ax); - table->table = dg2_mocs_table_g10_ax; - } else { - table->size = ARRAY_SIZE(dg2_mocs_table); - table->table = dg2_mocs_table; - } + table->size = ARRAY_SIZE(dg2_mocs_table); + table->table = dg2_mocs_table; table->uc_index = 1; table->n_entries = GEN9_NUM_MOCS_ENTRIES; table->unused_entries_index = 3; diff --git a/drivers/gpu/drm/i915/gt/intel_rc6.c b/drivers/gpu/drm/i915/gt/intel_rc6.c index ccdc1afbf11b5..b8c9338176bd6 100644 --- a/drivers/gpu/drm/i915/gt/intel_rc6.c +++ b/drivers/gpu/drm/i915/gt/intel_rc6.c @@ -118,14 +118,12 @@ static void gen11_rc6_enable(struct intel_rc6 *rc6) GEN6_RC_CTL_EI_MODE(1);
/* - * Wa_16011777198 and BSpec 52698 - Render powergating must be off. + * BSpec 52698 - Render powergating must be off. * FIXME BSpec is outdated, disabling powergating for MTL is just * temporary wa and should be removed after fixing real cause * of forcewake timeouts. */ - if (IS_METEORLAKE(gt->i915) || - IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) || - IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0)) + if (IS_METEORLAKE(gt->i915)) pg_enable = GEN9_MEDIA_PG_ENABLE | GEN11_MEDIA_SAMPLER_PG_ENABLE; diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index 3ae0dbd39eaa3..7b426f3015b34 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -764,39 +764,15 @@ static void dg2_ctx_workarounds_init(struct intel_engine_cs *engine, { dg2_ctx_gt_tuning_init(engine, wal);
- /* Wa_16011186671:dg2_g11 */ - if (IS_DG2_GRAPHICS_STEP(engine->i915, G11, STEP_A0, STEP_B0)) { - wa_mcr_masked_dis(wal, VFLSKPD, DIS_MULT_MISS_RD_SQUASH); - wa_mcr_masked_en(wal, VFLSKPD, DIS_OVER_FETCH_CACHE); - } - - if (IS_DG2_GRAPHICS_STEP(engine->i915, G10, STEP_A0, STEP_B0)) { - /* Wa_14010469329:dg2_g10 */ - wa_mcr_masked_en(wal, XEHP_COMMON_SLICE_CHICKEN3, - XEHP_DUAL_SIMD8_SEQ_MERGE_DISABLE); - - /* - * Wa_22010465075:dg2_g10 - * Wa_22010613112:dg2_g10 - * Wa_14010698770:dg2_g10 - */ - wa_mcr_masked_en(wal, XEHP_COMMON_SLICE_CHICKEN3, - GEN12_DISABLE_CPS_AWARE_COLOR_PIPE); - } - /* Wa_16013271637:dg2 */ wa_mcr_masked_en(wal, XEHP_SLICE_COMMON_ECO_CHICKEN1, MSC_MSAA_REODER_BUF_BYPASS_DISABLE);
/* Wa_14014947963:dg2 */ - if (IS_DG2_GRAPHICS_STEP(engine->i915, G10, STEP_B0, STEP_FOREVER) || - IS_DG2_G11(engine->i915) || IS_DG2_G12(engine->i915)) - wa_masked_field_set(wal, VF_PREEMPTION, PREEMPTION_VERTEX_COUNT, 0x4000); + wa_masked_field_set(wal, VF_PREEMPTION, PREEMPTION_VERTEX_COUNT, 0x4000);
/* Wa_18018764978:dg2 */ - if (IS_DG2_GRAPHICS_STEP(engine->i915, G10, STEP_C0, STEP_FOREVER) || - IS_DG2_G11(engine->i915) || IS_DG2_G12(engine->i915)) - wa_mcr_masked_en(wal, XEHP_PSS_MODE2, SCOREBOARD_STALL_FLUSH_CONTROL); + wa_mcr_masked_en(wal, XEHP_PSS_MODE2, SCOREBOARD_STALL_FLUSH_CONTROL);
/* Wa_15010599737:dg2 */ wa_mcr_masked_en(wal, CHICKEN_RASTER_1, DIS_SF_ROUND_NEAREST_EVEN); @@ -1606,31 +1582,11 @@ xehpsdv_gt_workarounds_init(struct intel_gt *gt, struct i915_wa_list *wal) static void dg2_gt_workarounds_init(struct intel_gt *gt, struct i915_wa_list *wal) { - struct intel_engine_cs *engine; - int id; - xehp_init_mcr(gt, wal);
/* Wa_14011060649:dg2 */ wa_14011060649(gt, wal);
- /* - * Although there are per-engine instances of these registers, - * they technically exist outside the engine itself and are not - * impacted by engine resets. Furthermore, they're part of the - * GuC blacklist so trying to treat them as engine workarounds - * will result in GuC initialization failure and a wedged GPU. - */ - for_each_engine(engine, gt, id) { - if (engine->class != VIDEO_DECODE_CLASS) - continue; - - /* Wa_16010515920:dg2_g10 */ - if (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_B0)) - wa_write_or(wal, VDBOX_CGCTL3F18(engine->mmio_base), - ALNUNIT_CLKGATE_DIS); - } - if (IS_DG2_G10(gt->i915)) { /* Wa_22010523718:dg2 */ wa_write_or(wal, UNSLICE_UNIT_LEVEL_CLKGATE, @@ -1641,65 +1597,6 @@ dg2_gt_workarounds_init(struct intel_gt *gt, struct i915_wa_list *wal) DSS_ROUTER_CLKGATE_DIS); }
- if (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_B0) || - IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0)) { - /* Wa_14012362059:dg2 */ - wa_mcr_write_or(wal, XEHP_MERT_MOD_CTRL, FORCE_MISS_FTLB); - } - - if (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_B0)) { - /* Wa_14010948348:dg2_g10 */ - wa_write_or(wal, UNSLCGCTL9430, MSQDUNIT_CLKGATE_DIS); - - /* Wa_14011037102:dg2_g10 */ - wa_write_or(wal, UNSLCGCTL9444, LTCDD_CLKGATE_DIS); - - /* Wa_14011371254:dg2_g10 */ - wa_mcr_write_or(wal, XEHP_SLICE_UNIT_LEVEL_CLKGATE, NODEDSS_CLKGATE_DIS); - - /* Wa_14011431319:dg2_g10 */ - wa_write_or(wal, UNSLCGCTL9440, GAMTLBOACS_CLKGATE_DIS | - GAMTLBVDBOX7_CLKGATE_DIS | - GAMTLBVDBOX6_CLKGATE_DIS | - GAMTLBVDBOX5_CLKGATE_DIS | - GAMTLBVDBOX4_CLKGATE_DIS | - GAMTLBVDBOX3_CLKGATE_DIS | - GAMTLBVDBOX2_CLKGATE_DIS | - GAMTLBVDBOX1_CLKGATE_DIS | - GAMTLBVDBOX0_CLKGATE_DIS | - GAMTLBKCR_CLKGATE_DIS | - GAMTLBGUC_CLKGATE_DIS | - GAMTLBBLT_CLKGATE_DIS); - wa_write_or(wal, UNSLCGCTL9444, GAMTLBGFXA0_CLKGATE_DIS | - GAMTLBGFXA1_CLKGATE_DIS | - GAMTLBCOMPA0_CLKGATE_DIS | - GAMTLBCOMPA1_CLKGATE_DIS | - GAMTLBCOMPB0_CLKGATE_DIS | - GAMTLBCOMPB1_CLKGATE_DIS | - GAMTLBCOMPC0_CLKGATE_DIS | - GAMTLBCOMPC1_CLKGATE_DIS | - GAMTLBCOMPD0_CLKGATE_DIS | - GAMTLBCOMPD1_CLKGATE_DIS | - GAMTLBMERT_CLKGATE_DIS | - GAMTLBVEBOX3_CLKGATE_DIS | - GAMTLBVEBOX2_CLKGATE_DIS | - GAMTLBVEBOX1_CLKGATE_DIS | - GAMTLBVEBOX0_CLKGATE_DIS); - - /* Wa_14010569222:dg2_g10 */ - wa_write_or(wal, UNSLICE_UNIT_LEVEL_CLKGATE, - GAMEDIA_CLKGATE_DIS); - - /* Wa_14011028019:dg2_g10 */ - wa_mcr_write_or(wal, SSMCGCTL9530, RTFUNIT_CLKGATE_DIS); - - /* Wa_14010680813:dg2_g10 */ - wa_mcr_write_or(wal, XEHP_GAMSTLB_CTRL, - CONTROL_BLOCK_CLKGATE_DIS | - EGRESS_BLOCK_CLKGATE_DIS | - TAG_BLOCK_CLKGATE_DIS); - } - /* Wa_14014830051:dg2 */ wa_mcr_write_clr(wal, SARB_CHICKEN1, COMP_CKN_IN);
@@ -2242,29 +2139,10 @@ static void dg2_whitelist_build(struct intel_engine_cs *engine)
switch (engine->class) { case RENDER_CLASS: - /* - * Wa_1507100340:dg2_g10 - * - * This covers 4 registers which are next to one another : - * - PS_INVOCATION_COUNT - * - PS_INVOCATION_COUNT_UDW - * - PS_DEPTH_COUNT - * - PS_DEPTH_COUNT_UDW - */ - if (IS_DG2_GRAPHICS_STEP(engine->i915, G10, STEP_A0, STEP_B0)) - whitelist_reg_ext(w, PS_INVOCATION_COUNT, - RING_FORCE_TO_NONPRIV_ACCESS_RD | - RING_FORCE_TO_NONPRIV_RANGE_4); - /* Required by recommended tuning setting (not a workaround) */ whitelist_mcr_reg(w, XEHP_COMMON_SLICE_CHICKEN3);
break; - case COMPUTE_CLASS: - /* Wa_16011157294:dg2_g10 */ - if (IS_DG2_GRAPHICS_STEP(engine->i915, G10, STEP_A0, STEP_B0)) - whitelist_reg(w, GEN9_CTX_PREEMPT_REG); - break; default: break; } @@ -2415,12 +2293,6 @@ engine_fake_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) } }
-static bool needs_wa_1308578152(struct intel_engine_cs *engine) -{ - return intel_sseu_find_first_xehp_dss(&engine->gt->info.sseu, 0, 0) >= - GEN_DSS_PER_GSLICE; -} - static void rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) { @@ -2435,42 +2307,20 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0) || IS_MTL_GRAPHICS_STEP(i915, P, STEP_A0, STEP_B0) || - IS_DG2_GRAPHICS_STEP(i915, G10, STEP_B0, STEP_FOREVER) || - IS_DG2_G11(i915) || IS_DG2_G12(i915)) { + IS_DG2(i915)) { /* Wa_1509727124 */ wa_mcr_masked_en(wal, GEN10_SAMPLER_MODE, SC_DISABLE_POWER_OPTIMIZATION_EBB); }
- if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_B0, STEP_FOREVER) || - IS_DG2_G11(i915) || IS_DG2_G12(i915) || - IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0)) { + if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0) || + IS_DG2(i915)) { /* Wa_22012856258 */ wa_mcr_masked_en(wal, GEN8_ROW_CHICKEN2, GEN12_DISABLE_READ_SUPPRESSION); }
- if (IS_DG2_GRAPHICS_STEP(i915, G11, STEP_A0, STEP_B0)) { - /* Wa_14013392000:dg2_g11 */ - wa_mcr_masked_en(wal, GEN8_ROW_CHICKEN2, GEN12_ENABLE_LARGE_GRF_MODE); - } - - if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0) || - IS_DG2_GRAPHICS_STEP(i915, G11, STEP_A0, STEP_B0)) { - /* Wa_14012419201:dg2 */ - wa_mcr_masked_en(wal, GEN9_ROW_CHICKEN4, - GEN12_DISABLE_HDR_PAST_PAYLOAD_HOLD_FIX); - } - - /* Wa_1308578152:dg2_g10 when first gslice is fused off */ - if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_B0, STEP_C0) && - needs_wa_1308578152(engine)) { - wa_masked_dis(wal, GEN12_CS_DEBUG_MODE1_CCCSUNIT_BE_COMMON, - GEN12_REPLAY_MODE_GRANULARITY); - } - - if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_B0, STEP_FOREVER) || - IS_DG2_G11(i915) || IS_DG2_G12(i915)) { + if (IS_DG2(i915)) { /* * Wa_22010960976:dg2 * Wa_14013347512:dg2 @@ -2479,34 +2329,7 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) LSC_L1_FLUSH_CTL_3D_DATAPORT_FLUSH_EVENTS_MASK); }
- if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0)) { - /* - * Wa_1608949956:dg2_g10 - * Wa_14010198302:dg2_g10 - */ - wa_mcr_masked_en(wal, GEN8_ROW_CHICKEN, - MDQ_ARBITRATION_MODE | UGM_BACKUP_MODE); - } - - if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0)) - /* Wa_22010430635:dg2 */ - wa_mcr_masked_en(wal, - GEN9_ROW_CHICKEN4, - GEN12_DISABLE_GRF_CLEAR); - - /* Wa_14013202645:dg2 */ - if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_B0, STEP_C0) || - IS_DG2_GRAPHICS_STEP(i915, G11, STEP_A0, STEP_B0)) - wa_mcr_write_or(wal, RT_CTRL, DIS_NULL_QUERY); - - /* Wa_22012532006:dg2 */ - if (IS_DG2_GRAPHICS_STEP(engine->i915, G10, STEP_A0, STEP_C0) || - IS_DG2_GRAPHICS_STEP(engine->i915, G11, STEP_A0, STEP_B0)) - wa_mcr_masked_en(wal, GEN9_HALF_SLICE_CHICKEN7, - DG2_DISABLE_ROUND_ENABLE_ALLOW_FOR_SSLA); - - if (IS_DG2_GRAPHICS_STEP(i915, G11, STEP_B0, STEP_FOREVER) || - IS_DG2_G10(i915)) { + if (IS_DG2_G11(i915) || IS_DG2_G10(i915)) { /* Wa_22014600077:dg2 */ wa_mcr_add(wal, GEN10_CACHE_MODE_SS, 0, _MASKED_BIT_ENABLE(ENABLE_EU_COUNT_FOR_TDL_FLUSH), @@ -3050,8 +2873,7 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li
if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0) || IS_MTL_GRAPHICS_STEP(i915, P, STEP_A0, STEP_B0) || - IS_DG2_GRAPHICS_STEP(i915, G10, STEP_B0, STEP_FOREVER) || - IS_DG2_G11(i915) || IS_DG2_G12(i915)) { + IS_DG2(i915)) { /* Wa_22013037850 */ wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0_UDW, DISABLE_128B_EVICTION_COMMAND_UDW); @@ -3072,8 +2894,7 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li wa_masked_en(wal, VFG_PREEMPTION_CHICKEN, POLYGON_TRIFAN_LINELOOP_DISABLE); }
- if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_B0, STEP_C0) || - IS_DG2_G11(i915)) { + if (IS_DG2_G11(i915)) { /* * Wa_22012826095:dg2 * Wa_22013059131:dg2 @@ -3087,18 +2908,6 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li FORCE_1_SUB_MESSAGE_PER_FRAGMENT); }
- if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0)) { - /* - * Wa_14010918519:dg2_g10 - * - * LSC_CHICKEN_BIT_0 always reads back as 0 is this stepping, - * so ignoring verification. - */ - wa_mcr_add(wal, LSC_CHICKEN_BIT_0_UDW, 0, - FORCE_SLM_FENCE_SCOPE_TO_TILE | FORCE_UGM_FENCE_SCOPE_TO_TILE, - 0, false); - } - if (IS_XEHPSDV(i915)) { /* Wa_1409954639 */ wa_mcr_masked_en(wal, @@ -3131,7 +2940,7 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0_UDW, DIS_CHAIN_2XSIMD8); }
- if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_C0) || IS_DG2_G11(i915)) + if (IS_DG2_G11(i915)) /* * Wa_22012654132 * diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 569b5fe94c416..82a2ecc12b212 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -272,18 +272,14 @@ static u32 guc_ctl_wa_flags(struct intel_guc *guc) GRAPHICS_VER_FULL(gt->i915) < IP_VER(12, 50)) flags |= GUC_WA_POLLCS;
- /* Wa_16011759253:dg2_g10:a0 */ - if (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_B0)) - flags |= GUC_WA_GAM_CREDITS; - /* Wa_14014475959 */ if (IS_MTL_GRAPHICS_STEP(gt->i915, M, STEP_A0, STEP_B0) || IS_DG2(gt->i915)) flags |= GUC_WA_HOLD_CCS_SWITCHOUT;
/* - * Wa_14012197797:dg2_g10:a0,dg2_g11:a0 - * Wa_22011391025:dg2_g10,dg2_g11,dg2_g12 + * Wa_14012197797 + * Wa_22011391025 * * The same WA bit is used for both and 22011391025 is applicable to * all DG2. @@ -297,17 +293,11 @@ static u32 guc_ctl_wa_flags(struct intel_guc *guc) GRAPHICS_VER_FULL(gt->i915) < IP_VER(12, 70))) flags |= GUC_WA_PRE_PARSER;
- /* Wa_16011777198:dg2 */ - if (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) || - IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0)) - flags |= GUC_WA_RCS_RESET_BEFORE_RC6; - /* - * Wa_22012727170:dg2_g10[a0-c0), dg2_g11[a0..) - * Wa_22012727685:dg2_g11[a0..) + * Wa_22012727170 + * Wa_22012727685 */ - if (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) || - IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_FOREVER)) + if (IS_DG2_G11(gt->i915)) flags |= GUC_WA_CONTEXT_ISOLATION;
/* Wa_16015675438 */ diff --git a/drivers/gpu/drm/i915/intel_clock_gating.c b/drivers/gpu/drm/i915/intel_clock_gating.c index 81a4d32734e94..c66eb6abd4a2e 100644 --- a/drivers/gpu/drm/i915/intel_clock_gating.c +++ b/drivers/gpu/drm/i915/intel_clock_gating.c @@ -396,14 +396,6 @@ static void dg2_init_clock_gating(struct drm_i915_private *i915) /* Wa_22010954014:dg2 */ intel_uncore_rmw(&i915->uncore, XEHP_CLOCK_GATE_DIS, 0, SGSI_SIDECLK_DIS); - - /* - * Wa_14010733611:dg2_g10 - * Wa_22010146351:dg2_g10 - */ - if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0)) - intel_uncore_rmw(&i915->uncore, XEHP_CLOCK_GATE_DIS, 0, - SGR_DIS | SGGI_DIS); }
static void pvc_init_clock_gating(struct drm_i915_private *i915)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matt Roper matthew.d.roper@intel.com
[ Upstream commit f1c805716516f9e648e13f0108cea8096e0c7023 ]
Removal of the DG2 pre-production workarounds has left duplicate condition blocks in a couple places, as well as some inconsistent platform ordering. Reshuffle and consolidate some of the workarounds to reduce the number of condition blocks and to more consistently follow the "newest platform first" convention. Code movement only; no functional change.
Signed-off-by: Matt Roper matthew.d.roper@intel.com Acked-by: Jani Nikula jani.nikula@intel.com Reviewed-by: Matt Atwood matthew.s.atwood@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230816214201.534095-11-matth... Stable-dep-of: 186bce682772 ("drm/i915/mtl: Update workaround 14018575942") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gt/intel_workarounds.c | 100 +++++++++----------- 1 file changed, 46 insertions(+), 54 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index 7b426f3015b34..69973dc518280 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -2337,6 +2337,19 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) true); }
+ if (IS_DG2(i915) || IS_ALDERLAKE_P(i915) || IS_ALDERLAKE_S(i915) || + IS_DG1(i915) || IS_ROCKETLAKE(i915) || IS_TIGERLAKE(i915)) { + /* + * Wa_1606700617:tgl,dg1,adl-p + * Wa_22010271021:tgl,rkl,dg1,adl-s,adl-p + * Wa_14010826681:tgl,dg1,rkl,adl-p + * Wa_18019627453:dg2 + */ + wa_masked_en(wal, + GEN9_CS_DEBUG_MODE1, + FF_DOP_CLOCK_GATE_DISABLE); + } + if (IS_ALDERLAKE_P(i915) || IS_ALDERLAKE_S(i915) || IS_DG1(i915) || IS_ROCKETLAKE(i915) || IS_TIGERLAKE(i915)) { /* Wa_1606931601:tgl,rkl,dg1,adl-s,adl-p */ @@ -2350,19 +2363,11 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) */ wa_write_or(wal, GEN7_FF_THREAD_MODE, GEN12_FF_TESSELATION_DOP_GATE_DISABLE); - }
- if (IS_ALDERLAKE_P(i915) || IS_DG2(i915) || IS_ALDERLAKE_S(i915) || - IS_DG1(i915) || IS_ROCKETLAKE(i915) || IS_TIGERLAKE(i915)) { - /* - * Wa_1606700617:tgl,dg1,adl-p - * Wa_22010271021:tgl,rkl,dg1,adl-s,adl-p - * Wa_14010826681:tgl,dg1,rkl,adl-p - * Wa_18019627453:dg2 - */ - wa_masked_en(wal, - GEN9_CS_DEBUG_MODE1, - FF_DOP_CLOCK_GATE_DISABLE); + /* Wa_1406941453:tgl,rkl,dg1,adl-s,adl-p */ + wa_mcr_masked_en(wal, + GEN10_SAMPLER_MODE, + ENABLE_SMALLPL); }
if (IS_ALDERLAKE_P(i915) || IS_ALDERLAKE_S(i915) || @@ -2389,14 +2394,6 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) GEN8_RC_SEMA_IDLE_MSG_DISABLE); }
- if (IS_DG1(i915) || IS_ROCKETLAKE(i915) || IS_TIGERLAKE(i915) || - IS_ALDERLAKE_S(i915) || IS_ALDERLAKE_P(i915)) { - /* Wa_1406941453:tgl,rkl,dg1,adl-s,adl-p */ - wa_mcr_masked_en(wal, - GEN10_SAMPLER_MODE, - ENABLE_SMALLPL); - } - if (GRAPHICS_VER(i915) == 11) { /* This is not an Wa. Enable for better image quality */ wa_masked_en(wal, @@ -2877,6 +2874,9 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li /* Wa_22013037850 */ wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0_UDW, DISABLE_128B_EVICTION_COMMAND_UDW); + + /* Wa_18017747507 */ + wa_masked_en(wal, VFG_PREEMPTION_CHICKEN, POLYGON_TRIFAN_LINELOOP_DISABLE); }
if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0) || @@ -2887,11 +2887,20 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0, DISABLE_D8_D16_COASLESCE); }
- if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0) || - IS_MTL_GRAPHICS_STEP(i915, P, STEP_A0, STEP_B0) || - IS_DG2(i915)) { - /* Wa_18017747507 */ - wa_masked_en(wal, VFG_PREEMPTION_CHICKEN, POLYGON_TRIFAN_LINELOOP_DISABLE); + if (IS_PONTEVECCHIO(i915) || IS_DG2(i915)) { + /* Wa_14015227452:dg2,pvc */ + wa_mcr_masked_en(wal, GEN9_ROW_CHICKEN4, XEHP_DIS_BBL_SYSPIPE); + + /* Wa_16015675438:dg2,pvc */ + wa_masked_en(wal, FF_SLICE_CS_CHICKEN2, GEN12_PERF_FIX_BALANCING_CFE_DISABLE); + } + + if (IS_DG2(i915)) { + /* + * Wa_16011620976:dg2_g11 + * Wa_22015475538:dg2 + */ + wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0_UDW, DIS_CHAIN_2XSIMD8); }
if (IS_DG2_G11(i915)) { @@ -2906,6 +2915,18 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li /* Wa_22013059131:dg2 */ wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0, FORCE_1_SUB_MESSAGE_PER_FRAGMENT); + + /* + * Wa_22012654132 + * + * Note that register 0xE420 is write-only and cannot be read + * back for verification on DG2 (due to Wa_14012342262), so + * we need to explicitly skip the readback. + */ + wa_mcr_add(wal, GEN10_CACHE_MODE_SS, 0, + _MASKED_BIT_ENABLE(ENABLE_PREFETCH_INTO_IC), + 0 /* write-only, so skip validation */, + true); }
if (IS_XEHPSDV(i915)) { @@ -2923,35 +2944,6 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li wa_mcr_masked_en(wal, GEN8_HALF_SLICE_CHICKEN1, GEN7_PSD_SINGLE_PORT_DISPATCH_ENABLE); } - - if (IS_DG2(i915) || IS_PONTEVECCHIO(i915)) { - /* Wa_14015227452:dg2,pvc */ - wa_mcr_masked_en(wal, GEN9_ROW_CHICKEN4, XEHP_DIS_BBL_SYSPIPE); - - /* Wa_16015675438:dg2,pvc */ - wa_masked_en(wal, FF_SLICE_CS_CHICKEN2, GEN12_PERF_FIX_BALANCING_CFE_DISABLE); - } - - if (IS_DG2(i915)) { - /* - * Wa_16011620976:dg2_g11 - * Wa_22015475538:dg2 - */ - wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0_UDW, DIS_CHAIN_2XSIMD8); - } - - if (IS_DG2_G11(i915)) - /* - * Wa_22012654132 - * - * Note that register 0xE420 is write-only and cannot be read - * back for verification on DG2 (due to Wa_14012342262), so - * we need to explicitly skip the readback. - */ - wa_mcr_add(wal, GEN10_CACHE_MODE_SS, 0, - _MASKED_BIT_ENABLE(ENABLE_PREFETCH_INTO_IC), - 0 /* write-only, so skip validation */, - true); }
static void
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matt Roper matthew.d.roper@intel.com
[ Upstream commit 28c46feec7f8760683ef08f12746630a3598173e ]
The workaround bounds for Wa_22011802037 are somewhat complex and are replicated in several places throughout the code. Pull the condition out to a helper function to prevent mistakes if this condition needs to change again in the future.
Signed-off-by: Matt Roper matthew.d.roper@intel.com Reviewed-by: Gustavo Sousa gustavo.sousa@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230821180619.650007-12-matth... Stable-dep-of: 186bce682772 ("drm/i915/mtl: Update workaround 14018575942") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 4 +--- .../drm/i915/gt/intel_execlists_submission.c | 4 +--- drivers/gpu/drm/i915/gt/intel_reset.c | 18 ++++++++++++++++++ drivers/gpu/drm/i915/gt/intel_reset.h | 2 ++ drivers/gpu/drm/i915/gt/uc/intel_guc.c | 4 +--- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 +--- 6 files changed, 24 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index e85d70a62123f..84a75c95f3f7d 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1616,9 +1616,7 @@ static int __intel_engine_stop_cs(struct intel_engine_cs *engine, * Wa_22011802037: Prior to doing a reset, ensure CS is * stopped, set ring stop bit and prefetch disable bit to halt CS */ - if (IS_MTL_GRAPHICS_STEP(engine->i915, M, STEP_A0, STEP_B0) || - (GRAPHICS_VER(engine->i915) >= 11 && - GRAPHICS_VER_FULL(engine->i915) < IP_VER(12, 70))) + if (intel_engine_reset_needs_wa_22011802037(engine->gt)) intel_uncore_write_fw(uncore, RING_MODE_GEN7(engine->mmio_base), _MASKED_BIT_ENABLE(GEN12_GFX_PREFETCH_DISABLE));
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 5a720e2523126..42e09f1589205 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3001,9 +3001,7 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine) * Wa_22011802037: In addition to stopping the cs, we need * to wait for any pending mi force wakeups */ - if (IS_MTL_GRAPHICS_STEP(engine->i915, M, STEP_A0, STEP_B0) || - (GRAPHICS_VER(engine->i915) >= 11 && - GRAPHICS_VER_FULL(engine->i915) < IP_VER(12, 70))) + if (intel_engine_reset_needs_wa_22011802037(engine->gt)) intel_engine_wait_for_pending_mi_fw(engine);
engine->execlists.reset_ccid = active_ccid(engine); diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index 5fa57a34cf4bb..3a3f71ce3cb77 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -1632,6 +1632,24 @@ void __intel_fini_wedge(struct intel_wedge_me *w) w->gt = NULL; }
+/* + * Wa_22011802037 requires that we (or the GuC) ensure that no command + * streamers are executing MI_FORCE_WAKE while an engine reset is initiated. + */ +bool intel_engine_reset_needs_wa_22011802037(struct intel_gt *gt) +{ + if (GRAPHICS_VER(gt->i915) < 11) + return false; + + if (IS_MTL_GRAPHICS_STEP(gt->i915, M, STEP_A0, STEP_B0)) + return true; + + if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70)) + return false; + + return true; +} + #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftest_reset.c" #include "selftest_hangcheck.c" diff --git a/drivers/gpu/drm/i915/gt/intel_reset.h b/drivers/gpu/drm/i915/gt/intel_reset.h index 25c975b6e8fc0..f615b30b81c59 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.h +++ b/drivers/gpu/drm/i915/gt/intel_reset.h @@ -78,4 +78,6 @@ void __intel_fini_wedge(struct intel_wedge_me *w); bool intel_has_gpu_reset(const struct intel_gt *gt); bool intel_has_reset_engine(const struct intel_gt *gt);
+bool intel_engine_reset_needs_wa_22011802037(struct intel_gt *gt); + #endif /* I915_RESET_H */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 82a2ecc12b212..da967938fea58 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -288,9 +288,7 @@ static u32 guc_ctl_wa_flags(struct intel_guc *guc) flags |= GUC_WA_DUAL_QUEUE;
/* Wa_22011802037: graphics version 11/12 */ - if (IS_MTL_GRAPHICS_STEP(gt->i915, M, STEP_A0, STEP_B0) || - (GRAPHICS_VER(gt->i915) >= 11 && - GRAPHICS_VER_FULL(gt->i915) < IP_VER(12, 70))) + if (intel_engine_reset_needs_wa_22011802037(gt)) flags |= GUC_WA_PRE_PARSER;
/* diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 836e4d9d65ef6..7a3e02ea56639 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1690,9 +1690,7 @@ static void guc_engine_reset_prepare(struct intel_engine_cs *engine) * Wa_22011802037: In addition to stopping the cs, we need * to wait for any pending mi force wakeups */ - if (IS_MTL_GRAPHICS_STEP(engine->i915, M, STEP_A0, STEP_B0) || - (GRAPHICS_VER(engine->i915) >= 11 && - GRAPHICS_VER_FULL(engine->i915) < IP_VER(12, 70))) { + if (intel_engine_reset_needs_wa_22011802037(engine->gt)) { intel_engine_stop_cs(engine); intel_engine_wait_for_pending_mi_fw(engine); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matt Roper matthew.d.roper@intel.com
[ Upstream commit f7696ded7c9e358670dae1801660f442f059c7db ]
Although some of our Xe_LPG workarounds were already being applied based on IP version correctly, others were matching on MTL as a base platform, which is incorrect. Although MTL is the only platform right now that uses Xe_LPG IP, this may not always be the case. If a future platform re-uses this graphics IP, the same workarounds should be applied, even if it isn't a "MTL" platform.
We were also incorrectly applying Xe_LPG workarounds/tuning to the Xe_LPM+ media IP in one or two places; we should make sure that we don't try to apply graphics workarounds to the media GT and vice versa where they don't belong. A new helper macro IS_GT_IP_RANGE() is added to help ensure this is handled properly -- it checks that the GT matches the IP type being tested as well as the IP version falling in the proper range.
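As a rough, illustrative sketch (not part of the patch; the real hunks follow below), a workaround guard moves from a platform check, which wrongly also matches the media GT, to a graphics-GT IP-range check:

  /* before: platform-based, also true for the Xe_LPM+ media GT */
  if (IS_METEORLAKE(gt->i915))
          wa_mcr_write_or(wal, XEHP_SQCM, EN_32B_ACCESS);

  /* after: graphics GT only, keyed on the Xe_LPG IP version range */
  if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71)))
          wa_mcr_write_or(wal, XEHP_SQCM, EN_32B_ACCESS);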
Note that many of the stepping-based workarounds are still incorrectly checking for a MTL base platform; that will be remedied in a later patch.
v2: - Rework macro into a slightly more generic IS_GT_IP_RANGE() that can be used for either GFX or MEDIA checks.
v3: - Switch back to separate macros for gfx and media. (Jani)
 - Move macro to intel_gt.h. (Andi)
Cc: Gustavo Sousa gustavo.sousa@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jani Nikula jani.nikula@linux.intel.com Cc: Andi Shyti andi.shyti@linux.intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com Reviewed-by: Andi Shyti andi.shyti@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230821180619.650007-14-matth... Stable-dep-of: 186bce682772 ("drm/i915/mtl: Update workaround 14018575942") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gt/intel_gt.h | 11 ++++++ drivers/gpu/drm/i915/gt/intel_workarounds.c | 38 +++++++++++---------- 2 files changed, 31 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h index 6c34547b58b59..15c25980411db 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.h +++ b/drivers/gpu/drm/i915/gt/intel_gt.h @@ -14,6 +14,17 @@ struct drm_i915_private; struct drm_printer;
+/* + * Check that the GT is a graphics GT and has an IP version within the + * specified range (inclusive). + */ +#define IS_GFX_GT_IP_RANGE(gt, from, until) ( \ + BUILD_BUG_ON_ZERO((from) < IP_VER(2, 0)) + \ + BUILD_BUG_ON_ZERO((until) < (from)) + \ + ((gt)->type != GT_MEDIA && \ + GRAPHICS_VER_FULL((gt)->i915) >= (from) && \ + GRAPHICS_VER_FULL((gt)->i915) <= (until))) + #define GT_TRACE(gt, fmt, ...) do { \ const struct intel_gt *gt__ __maybe_unused = (gt); \ GEM_TRACE("%s " fmt, dev_name(gt__->i915->drm.dev), \ diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index 69973dc518280..4c24f3897aee1 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -781,8 +781,8 @@ static void dg2_ctx_workarounds_init(struct intel_engine_cs *engine, wa_masked_en(wal, CACHE_MODE_1, MSAA_OPTIMIZATION_REDUC_DISABLE); }
-static void mtl_ctx_gt_tuning_init(struct intel_engine_cs *engine, - struct i915_wa_list *wal) +static void xelpg_ctx_gt_tuning_init(struct intel_engine_cs *engine, + struct i915_wa_list *wal) { struct drm_i915_private *i915 = engine->i915;
@@ -793,12 +793,12 @@ static void mtl_ctx_gt_tuning_init(struct intel_engine_cs *engine, wa_add(wal, DRAW_WATERMARK, VERT_WM_VAL, 0x3FF, 0, false); }
-static void mtl_ctx_workarounds_init(struct intel_engine_cs *engine, - struct i915_wa_list *wal) +static void xelpg_ctx_workarounds_init(struct intel_engine_cs *engine, + struct i915_wa_list *wal) { struct drm_i915_private *i915 = engine->i915;
- mtl_ctx_gt_tuning_init(engine, wal); + xelpg_ctx_gt_tuning_init(engine, wal);
if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0) || IS_MTL_GRAPHICS_STEP(i915, P, STEP_A0, STEP_B0)) { @@ -907,8 +907,8 @@ __intel_engine_init_ctx_wa(struct intel_engine_cs *engine, if (engine->class != RENDER_CLASS) goto done;
- if (IS_METEORLAKE(i915)) - mtl_ctx_workarounds_init(engine, wal); + if (IS_GFX_GT_IP_RANGE(engine->gt, IP_VER(12, 70), IP_VER(12, 71))) + xelpg_ctx_workarounds_init(engine, wal); else if (IS_PONTEVECCHIO(i915)) ; /* noop; none at this time */ else if (IS_DG2(i915)) @@ -1688,10 +1688,8 @@ xelpmp_gt_workarounds_init(struct intel_gt *gt, struct i915_wa_list *wal) */ static void gt_tuning_settings(struct intel_gt *gt, struct i915_wa_list *wal) { - if (IS_METEORLAKE(gt->i915)) { - if (gt->type != GT_MEDIA) - wa_mcr_write_or(wal, XEHP_L3SCQREG7, BLEND_FILL_CACHING_OPT_DIS); - + if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71))) { + wa_mcr_write_or(wal, XEHP_L3SCQREG7, BLEND_FILL_CACHING_OPT_DIS); wa_mcr_write_or(wal, XEHP_SQCM, EN_32B_ACCESS); }
@@ -1723,7 +1721,7 @@ gt_init_workarounds(struct intel_gt *gt, struct i915_wa_list *wal) return; }
- if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70)) + if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71))) xelpg_gt_workarounds_init(gt, wal); else if (IS_PONTEVECCHIO(i915)) pvc_gt_workarounds_init(gt, wal); @@ -2172,7 +2170,7 @@ static void pvc_whitelist_build(struct intel_engine_cs *engine) blacklist_trtt(engine); }
-static void mtl_whitelist_build(struct intel_engine_cs *engine) +static void xelpg_whitelist_build(struct intel_engine_cs *engine) { struct i915_wa_list *w = &engine->whitelist;
@@ -2194,8 +2192,10 @@ void intel_engine_init_whitelist(struct intel_engine_cs *engine)
wa_init_start(w, engine->gt, "whitelist", engine->name);
- if (IS_METEORLAKE(i915)) - mtl_whitelist_build(engine); + if (engine->gt->type == GT_MEDIA) + ; /* none yet */ + else if (IS_GFX_GT_IP_RANGE(engine->gt, IP_VER(12, 70), IP_VER(12, 71))) + xelpg_whitelist_build(engine); else if (IS_PONTEVECCHIO(i915)) pvc_whitelist_build(engine); else if (IS_DG2(i915)) @@ -2795,10 +2795,12 @@ ccs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) * function invoked by __intel_engine_init_ctx_wa(). */ static void -add_render_compute_tuning_settings(struct drm_i915_private *i915, +add_render_compute_tuning_settings(struct intel_gt *gt, struct i915_wa_list *wal) { - if (IS_METEORLAKE(i915) || IS_DG2(i915)) + struct drm_i915_private *i915 = gt->i915; + + if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71)) || IS_DG2(i915)) wa_mcr_write_clr_set(wal, RT_CTRL, STACKID_CTRL, STACKID_CTRL_512);
/* @@ -2828,7 +2830,7 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li { struct drm_i915_private *i915 = engine->i915;
- add_render_compute_tuning_settings(i915, wal); + add_render_compute_tuning_settings(engine->gt, wal);
if (GRAPHICS_VER(i915) >= 11) { /* This is not a Wa (although referred to as
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matt Roper matthew.d.roper@intel.com
[ Upstream commit 5a213086a025349361b5cf75c8fd4591d96a7a99 ]
Several workarounds are guarded by IS_MTL_GRAPHICS_STEP. However none of these workarounds are actually tied to MTL as a platform; they only relate to the Xe_LPG graphics IP, regardless of what platform it appears in. At the moment MTL is the only platform that uses Xe_LPG with IP versions 12.70 and 12.71, but we can't count on this being true in the future. Switch these to use a new IS_GFX_GT_IP_STEP() macro instead that is purely based on IP version. IS_GFX_GT_IP_STEP() is also GT-based rather than device-based, which will help prevent mistakes where we accidentally try to apply Xe_LPG graphics workarounds to the Xe_LPM+ media GT and vice-versa.
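For illustration only (the real conversions are in the diff below), a stepping-bound check changes from the device/subplatform form to the GT/IP form roughly like this:

  /* before: device-based, enumerating the MTL M and P subplatforms */
  if (IS_MTL_GRAPHICS_STEP(gt->i915, M, STEP_A0, STEP_B0) ||
      IS_MTL_GRAPHICS_STEP(gt->i915, P, STEP_A0, STEP_B0))
          wa_mcr_write_clr(wal, SARB_CHICKEN1, COMP_CKN_IN);

  /* after: GT-based, keyed on graphics IP 12.70/12.71 at steppings A0..B0 */
  if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_A0, STEP_B0) ||
      IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_A0, STEP_B0))
          wa_mcr_write_clr(wal, SARB_CHICKEN1, COMP_CKN_IN);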
v2: - Switch to a more generic and shorter IS_GT_IP_STEP macro that can be used for both graphics and media IP (and any other kind of GTs that show up in the future).
v3: - Switch back to long-form IS_GFX_GT_IP_STEP macro. (Jani)
 - Move macro to intel_gt.h. (Andi)
v4: - Build IS_GFX_GT_IP_STEP on top of IS_GFX_GT_IP_RANGE and IS_GRAPHICS_STEP building blocks and name the parameters from/until rather than begin/fixed. (Jani)
 - Fix usage examples in comment.
v5: - Tweak comment on macro. (Gustavo)
Cc: Gustavo Sousa gustavo.sousa@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Andi Shyti andi.shyti@linux.intel.com Cc: Jani Nikula jani.nikula@linux.intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com Reviewed-by: Gustavo Sousa gustavo.sousa@intel.com Reviewed-by: Andi Shyti andi.shyti@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230821180619.650007-15-matth... Stable-dep-of: 186bce682772 ("drm/i915/mtl: Update workaround 14018575942") Signed-off-by: Sasha Levin sashal@kernel.org --- .../drm/i915/display/skl_universal_plane.c | 5 +- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 11 +++-- drivers/gpu/drm/i915/gt/intel_gt.h | 20 ++++++++ drivers/gpu/drm/i915/gt/intel_gt_mcr.c | 7 ++- drivers/gpu/drm/i915/gt/intel_lrc.c | 4 +- drivers/gpu/drm/i915/gt/intel_reset.c | 2 +- drivers/gpu/drm/i915/gt/intel_workarounds.c | 48 ++++++++++--------- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 2 +- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 +- drivers/gpu/drm/i915/i915_drv.h | 4 -- 10 files changed, 62 insertions(+), 43 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c index ffc15d278a39d..d557ecd4e1ebe 100644 --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c @@ -20,6 +20,7 @@ #include "skl_scaler.h" #include "skl_universal_plane.h" #include "skl_watermark.h" +#include "gt/intel_gt.h" #include "pxp/intel_pxp.h"
static const u32 skl_plane_formats[] = { @@ -2169,8 +2170,8 @@ static bool skl_plane_has_rc_ccs(struct drm_i915_private *i915, enum pipe pipe, enum plane_id plane_id) { /* Wa_14017240301 */ - if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0) || - IS_MTL_GRAPHICS_STEP(i915, P, STEP_A0, STEP_B0)) + if (IS_GFX_GT_IP_STEP(to_gt(i915), IP_VER(12, 70), STEP_A0, STEP_B0) || + IS_GFX_GT_IP_STEP(to_gt(i915), IP_VER(12, 71), STEP_A0, STEP_B0)) return false;
/* Wa_22011186057 */ diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c index 7ad36198aab2a..3ac3e12d9c524 100644 --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c @@ -4,9 +4,9 @@ */
#include "gen8_engine_cs.h" -#include "i915_drv.h" #include "intel_engine_regs.h" #include "intel_gpu_commands.h" +#include "intel_gt.h" #include "intel_lrc.h" #include "intel_ring.h"
@@ -226,8 +226,8 @@ u32 *gen12_emit_aux_table_inv(struct intel_engine_cs *engine, u32 *cs) static int mtl_dummy_pipe_control(struct i915_request *rq) { /* Wa_14016712196 */ - if (IS_MTL_GRAPHICS_STEP(rq->i915, M, STEP_A0, STEP_B0) || - IS_MTL_GRAPHICS_STEP(rq->i915, P, STEP_A0, STEP_B0)) { + if (IS_GFX_GT_IP_STEP(rq->engine->gt, IP_VER(12, 70), STEP_A0, STEP_B0) || + IS_GFX_GT_IP_STEP(rq->engine->gt, IP_VER(12, 71), STEP_A0, STEP_B0)) { u32 *cs;
/* dummy PIPE_CONTROL + depth flush */ @@ -808,6 +808,7 @@ u32 *gen12_emit_fini_breadcrumb_xcs(struct i915_request *rq, u32 *cs) u32 *gen12_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs) { struct drm_i915_private *i915 = rq->i915; + struct intel_gt *gt = rq->engine->gt; u32 flags = (PIPE_CONTROL_CS_STALL | PIPE_CONTROL_TLB_INVALIDATE | PIPE_CONTROL_TILE_CACHE_FLUSH | @@ -818,8 +819,8 @@ u32 *gen12_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs) PIPE_CONTROL_FLUSH_ENABLE);
/* Wa_14016712196 */ - if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0) || - IS_MTL_GRAPHICS_STEP(i915, P, STEP_A0, STEP_B0)) + if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_A0, STEP_B0) || + IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_A0, STEP_B0)) /* dummy PIPE_CONTROL + depth flush */ cs = gen12_emit_pipe_control(cs, 0, PIPE_CONTROL_DEPTH_CACHE_FLUSH, 0); diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h index 15c25980411db..6e63b46682f76 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.h +++ b/drivers/gpu/drm/i915/gt/intel_gt.h @@ -25,6 +25,26 @@ struct drm_printer; GRAPHICS_VER_FULL((gt)->i915) >= (from) && \ GRAPHICS_VER_FULL((gt)->i915) <= (until)))
+/* + * Check that the GT is a graphics GT with a specific IP version and has + * a stepping in the range [from, until). The lower stepping bound is + * inclusive, the upper bound is exclusive. The most common use-case of this + * macro is for checking bounds for workarounds, which usually have a stepping + * ("from") at which the hardware issue is first present and another stepping + * ("until") at which a hardware fix is present and the software workaround is + * no longer necessary. E.g., + * + * IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_A0, STEP_B0) + * IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_B1, STEP_FOREVER) + * + * "STEP_FOREVER" can be passed as "until" for workarounds that have no upper + * stepping bound for the specified IP version. + */ +#define IS_GFX_GT_IP_STEP(gt, ipver, from, until) ( \ + BUILD_BUG_ON_ZERO((until) <= (from)) + \ + (IS_GFX_GT_IP_RANGE((gt), (ipver), (ipver)) && \ + IS_GRAPHICS_STEP((gt)->i915, (from), (until)))) + #define GT_TRACE(gt, fmt, ...) do { \ const struct intel_gt *gt__ __maybe_unused = (gt); \ GEM_TRACE("%s " fmt, dev_name(gt__->i915->drm.dev), \ diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c index 2c0f1f3e28ff8..c6dec485aefbe 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c @@ -3,8 +3,7 @@ * Copyright © 2022 Intel Corporation */
-#include "i915_drv.h" - +#include "intel_gt.h" #include "intel_gt_mcr.h" #include "intel_gt_print.h" #include "intel_gt_regs.h" @@ -166,8 +165,8 @@ void intel_gt_mcr_init(struct intel_gt *gt) gt->steering_table[OADDRM] = xelpmp_oaddrm_steering_table; } else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70)) { /* Wa_14016747170 */ - if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0) || - IS_MTL_GRAPHICS_STEP(i915, P, STEP_A0, STEP_B0)) + if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_A0, STEP_B0) || + IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_A0, STEP_B0)) fuse = REG_FIELD_GET(MTL_GT_L3_EXC_MASK, intel_uncore_read(gt->uncore, MTL_GT_ACTIVITY_FACTOR)); diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index f297c5808e7c8..b99efa348ad1e 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -1347,8 +1347,8 @@ gen12_emit_indirect_ctx_rcs(const struct intel_context *ce, u32 *cs) cs = gen12_emit_aux_table_inv(ce->engine, cs);
/* Wa_16014892111 */ - if (IS_MTL_GRAPHICS_STEP(ce->engine->i915, M, STEP_A0, STEP_B0) || - IS_MTL_GRAPHICS_STEP(ce->engine->i915, P, STEP_A0, STEP_B0) || + if (IS_GFX_GT_IP_STEP(ce->engine->gt, IP_VER(12, 70), STEP_A0, STEP_B0) || + IS_GFX_GT_IP_STEP(ce->engine->gt, IP_VER(12, 71), STEP_A0, STEP_B0) || IS_DG2(ce->engine->i915)) cs = dg2_emit_draw_watermark_setting(cs);
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index 3a3f71ce3cb77..63d0892d3c45a 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -1641,7 +1641,7 @@ bool intel_engine_reset_needs_wa_22011802037(struct intel_gt *gt) if (GRAPHICS_VER(gt->i915) < 11) return false;
- if (IS_MTL_GRAPHICS_STEP(gt->i915, M, STEP_A0, STEP_B0)) + if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_A0, STEP_B0)) return true;
if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70)) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index 4c24f3897aee1..b6237e999be93 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -784,24 +784,24 @@ static void dg2_ctx_workarounds_init(struct intel_engine_cs *engine, static void xelpg_ctx_gt_tuning_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) { - struct drm_i915_private *i915 = engine->i915; + struct intel_gt *gt = engine->gt;
dg2_ctx_gt_tuning_init(engine, wal);
- if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_B0, STEP_FOREVER) || - IS_MTL_GRAPHICS_STEP(i915, P, STEP_B0, STEP_FOREVER)) + if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_B0, STEP_FOREVER) || + IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_B0, STEP_FOREVER)) wa_add(wal, DRAW_WATERMARK, VERT_WM_VAL, 0x3FF, 0, false); }
static void xelpg_ctx_workarounds_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) { - struct drm_i915_private *i915 = engine->i915; + struct intel_gt *gt = engine->gt;
xelpg_ctx_gt_tuning_init(engine, wal);
- if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0) || - IS_MTL_GRAPHICS_STEP(i915, P, STEP_A0, STEP_B0)) { + if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_A0, STEP_B0) || + IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_A0, STEP_B0)) { /* Wa_14014947963 */ wa_masked_field_set(wal, VF_PREEMPTION, PREEMPTION_VERTEX_COUNT, 0x4000); @@ -1644,8 +1644,8 @@ xelpg_gt_workarounds_init(struct intel_gt *gt, struct i915_wa_list *wal) /* Wa_22016670082 */ wa_write_or(wal, GEN12_SQCNT1, GEN12_STRICT_RAR_ENABLE);
- if (IS_MTL_GRAPHICS_STEP(gt->i915, M, STEP_A0, STEP_B0) || - IS_MTL_GRAPHICS_STEP(gt->i915, P, STEP_A0, STEP_B0)) { + if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_A0, STEP_B0) || + IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_A0, STEP_B0)) { /* Wa_14014830051 */ wa_mcr_write_clr(wal, SARB_CHICKEN1, COMP_CKN_IN);
@@ -2297,23 +2297,24 @@ static void rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) { struct drm_i915_private *i915 = engine->i915; + struct intel_gt *gt = engine->gt;
- if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0) || - IS_MTL_GRAPHICS_STEP(i915, P, STEP_A0, STEP_B0)) { + if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_A0, STEP_B0) || + IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_A0, STEP_B0)) { /* Wa_22014600077 */ wa_mcr_masked_en(wal, GEN10_CACHE_MODE_SS, ENABLE_EU_COUNT_FOR_TDL_FLUSH); }
- if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0) || - IS_MTL_GRAPHICS_STEP(i915, P, STEP_A0, STEP_B0) || + if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_A0, STEP_B0) || + IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_A0, STEP_B0) || IS_DG2(i915)) { /* Wa_1509727124 */ wa_mcr_masked_en(wal, GEN10_SAMPLER_MODE, SC_DISABLE_POWER_OPTIMIZATION_EBB); }
- if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0) || + if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_A0, STEP_B0) || IS_DG2(i915)) { /* Wa_22012856258 */ wa_mcr_masked_en(wal, GEN8_ROW_CHICKEN2, @@ -2829,8 +2830,9 @@ static void general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) { struct drm_i915_private *i915 = engine->i915; + struct intel_gt *gt = engine->gt;
- add_render_compute_tuning_settings(engine->gt, wal); + add_render_compute_tuning_settings(gt, wal);
if (GRAPHICS_VER(i915) >= 11) { /* This is not a Wa (although referred to as @@ -2851,13 +2853,13 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE); }
- if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_B0, STEP_FOREVER) || - IS_MTL_GRAPHICS_STEP(i915, P, STEP_B0, STEP_FOREVER)) + if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_B0, STEP_FOREVER) || + IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_B0, STEP_FOREVER)) /* Wa_14017856879 */ wa_mcr_masked_en(wal, GEN9_ROW_CHICKEN3, MTL_DISABLE_FIX_FOR_EOT_FLUSH);
- if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0) || - IS_MTL_GRAPHICS_STEP(i915, P, STEP_A0, STEP_B0)) + if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_A0, STEP_B0) || + IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_A0, STEP_B0)) /* * Wa_14017066071 * Wa_14017654203 @@ -2865,13 +2867,13 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li wa_mcr_masked_en(wal, GEN10_SAMPLER_MODE, MTL_DISABLE_SAMPLER_SC_OOO);
- if (IS_MTL_GRAPHICS_STEP(i915, P, STEP_A0, STEP_B0)) + if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_A0, STEP_B0)) /* Wa_22015279794 */ wa_mcr_masked_en(wal, GEN10_CACHE_MODE_SS, DISABLE_PREFETCH_INTO_IC);
- if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0) || - IS_MTL_GRAPHICS_STEP(i915, P, STEP_A0, STEP_B0) || + if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_A0, STEP_B0) || + IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_A0, STEP_B0) || IS_DG2(i915)) { /* Wa_22013037850 */ wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0_UDW, @@ -2881,8 +2883,8 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li wa_masked_en(wal, VFG_PREEMPTION_CHICKEN, POLYGON_TRIFAN_LINELOOP_DISABLE); }
- if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0) || - IS_MTL_GRAPHICS_STEP(i915, P, STEP_A0, STEP_B0) || + if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_A0, STEP_B0) || + IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_A0, STEP_B0) || IS_PONTEVECCHIO(i915) || IS_DG2(i915)) { /* Wa_22014226127 */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index da967938fea58..861d0c58388cf 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -273,7 +273,7 @@ static u32 guc_ctl_wa_flags(struct intel_guc *guc) flags |= GUC_WA_POLLCS;
/* Wa_14014475959 */ - if (IS_MTL_GRAPHICS_STEP(gt->i915, M, STEP_A0, STEP_B0) || + if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_A0, STEP_B0) || IS_DG2(gt->i915)) flags |= GUC_WA_HOLD_CCS_SWITCHOUT;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 7a3e02ea56639..b5de5a9f59671 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -4297,7 +4297,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
/* Wa_14014475959:dg2 */ if (engine->class == COMPUTE_CLASS) - if (IS_MTL_GRAPHICS_STEP(engine->i915, M, STEP_A0, STEP_B0) || + if (IS_GFX_GT_IP_STEP(engine->gt, IP_VER(12, 70), STEP_A0, STEP_B0) || IS_DG2(engine->i915)) engine->flags |= I915_ENGINE_USES_WA_HOLD_CCS_SWITCHOUT;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 7a8ce7239bc9e..e0e0493d6c1f0 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -658,10 +658,6 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, #define IS_XEHPSDV_GRAPHICS_STEP(__i915, since, until) \ (IS_XEHPSDV(__i915) && IS_GRAPHICS_STEP(__i915, since, until))
-#define IS_MTL_GRAPHICS_STEP(__i915, variant, since, until) \ - (IS_SUBPLATFORM(__i915, INTEL_METEORLAKE, INTEL_SUBPLATFORM_##variant) && \ - IS_GRAPHICS_STEP(__i915, since, until)) - #define IS_MTL_DISPLAY_STEP(__i915, since, until) \ (IS_METEORLAKE(__i915) && \ IS_DISPLAY_STEP(__i915, since, until))
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matt Roper matthew.d.roper@intel.com
[ Upstream commit 14128d64090fa88445376cb8ccf91c50c08bd410 ]
Many of the IS_METEORLAKE conditions throughout the driver are supposed to be checks for Xe_LPG and/or Xe_LPM+ IP, not for the MTL platform specifically. Update those checks to ensure that the code will still operate properly if/when these IP versions show up on future platforms.
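As a minimal sketch of the pattern (the concrete hunks follow; this one mirrors the intel_rps.c change), a platform check becomes a check on the IP version that is actually present:

  /* before: only the MTL platform takes the Xe_LPG path */
  if (IS_METEORLAKE(i915))
          return mtl_get_freq_caps(rps, caps);

  /* after: any platform with Xe_LPG graphics IP (12.70 or newer) takes it */
  if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70))
          return mtl_get_freq_caps(rps, caps);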
v2: - Update two more conditions (one for pg_enable, one for MTL HuC compatibility).
v3: - Don't change GuC/HuC compatibility check, which sounds like it truly is specific to the MTL platform. (Gustavo)
 - Drop a non-lineage workaround number for the OA timestamp frequency workaround. (Gustavo)
Cc: Gustavo Sousa gustavo.sousa@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com Reviewed-by: Gustavo Sousa gustavo.sousa@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230821180619.650007-20-matth... Stable-dep-of: 186bce682772 ("drm/i915/mtl: Update workaround 14018575942") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gem/i915_gem_create.c | 4 ++-- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- drivers/gpu/drm/i915/gt/intel_mocs.c | 2 +- drivers/gpu/drm/i915/gt/intel_rc6.c | 2 +- drivers/gpu/drm/i915/gt/intel_reset.c | 2 +- drivers/gpu/drm/i915/gt/intel_rps.c | 2 +- drivers/gpu/drm/i915/i915_debugfs.c | 2 +- drivers/gpu/drm/i915/i915_perf.c | 11 +++++------ 8 files changed, 13 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c index d24c0ce8805c7..19156ba4b9ef4 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c @@ -405,8 +405,8 @@ static int ext_set_pat(struct i915_user_extension __user *base, void *data) BUILD_BUG_ON(sizeof(struct drm_i915_gem_create_ext_set_pat) != offsetofend(struct drm_i915_gem_create_ext_set_pat, rsvd));
- /* Limiting the extension only to Meteor Lake */ - if (!IS_METEORLAKE(i915)) + /* Limiting the extension only to Xe_LPG and beyond */ + if (GRAPHICS_VER_FULL(i915) < IP_VER(12, 70)) return -ENODEV;
if (copy_from_user(&ext, base, sizeof(ext))) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index a95615b345cd7..5a3a5b29d1507 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -21,7 +21,7 @@ static void intel_gsc_idle_msg_enable(struct intel_engine_cs *engine) { struct drm_i915_private *i915 = engine->i915;
- if (IS_METEORLAKE(i915) && engine->id == GSC0) { + if (MEDIA_VER(i915) >= 13 && engine->id == GSC0) { intel_uncore_write(engine->gt->uncore, RC_PSMI_CTRL_GSCCS, _MASKED_BIT_DISABLE(IDLE_MSG_DISABLE)); diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c index bf8b42d2d3279..07269ff3be136 100644 --- a/drivers/gpu/drm/i915/gt/intel_mocs.c +++ b/drivers/gpu/drm/i915/gt/intel_mocs.c @@ -495,7 +495,7 @@ static unsigned int get_mocs_settings(const struct drm_i915_private *i915, memset(table, 0, sizeof(struct drm_i915_mocs_table));
table->unused_entries_index = I915_MOCS_PTE; - if (IS_METEORLAKE(i915)) { + if (IS_GFX_GT_IP_RANGE(&i915->gt0, IP_VER(12, 70), IP_VER(12, 71))) { table->size = ARRAY_SIZE(mtl_mocs_table); table->table = mtl_mocs_table; table->n_entries = MTL_NUM_MOCS_ENTRIES; diff --git a/drivers/gpu/drm/i915/gt/intel_rc6.c b/drivers/gpu/drm/i915/gt/intel_rc6.c index b8c9338176bd6..9e113e9473260 100644 --- a/drivers/gpu/drm/i915/gt/intel_rc6.c +++ b/drivers/gpu/drm/i915/gt/intel_rc6.c @@ -123,7 +123,7 @@ static void gen11_rc6_enable(struct intel_rc6 *rc6) * temporary wa and should be removed after fixing real cause * of forcewake timeouts. */ - if (IS_METEORLAKE(gt->i915)) + if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71))) pg_enable = GEN9_MEDIA_PG_ENABLE | GEN11_MEDIA_SAMPLER_PG_ENABLE; diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index 63d0892d3c45a..13fb8e5042c58 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -705,7 +705,7 @@ static int __reset_guc(struct intel_gt *gt)
static bool needs_wa_14015076503(struct intel_gt *gt, intel_engine_mask_t engine_mask) { - if (!IS_METEORLAKE(gt->i915) || !HAS_ENGINE(gt, GSC0)) + if (MEDIA_VER_FULL(gt->i915) != IP_VER(13, 0) || !HAS_ENGINE(gt, GSC0)) return false;
if (!__HAS_ENGINE(engine_mask, GSC0)) diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c index 092542f53aad9..4feef874e6d69 100644 --- a/drivers/gpu/drm/i915/gt/intel_rps.c +++ b/drivers/gpu/drm/i915/gt/intel_rps.c @@ -1161,7 +1161,7 @@ void gen6_rps_get_freq_caps(struct intel_rps *rps, struct intel_rps_freq_caps *c { struct drm_i915_private *i915 = rps_to_i915(rps);
- if (IS_METEORLAKE(i915)) + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70)) return mtl_get_freq_caps(rps, caps); else return __gen6_rps_get_freq_caps(rps, caps); diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 4de44cf1026dc..7a90a2e32c9f1 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -144,7 +144,7 @@ static const char *i915_cache_level_str(struct drm_i915_gem_object *obj) { struct drm_i915_private *i915 = obj_to_i915(obj);
- if (IS_METEORLAKE(i915)) { + if (IS_GFX_GT_IP_RANGE(to_gt(i915), IP_VER(12, 70), IP_VER(12, 71))) { switch (obj->pat_index) { case 0: return " WB"; case 1: return " WT"; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 8f4a25d2cfc24..48ea17b49b3a0 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -3255,11 +3255,10 @@ get_sseu_config(struct intel_sseu *out_sseu, */ u32 i915_perf_oa_timestamp_frequency(struct drm_i915_private *i915) { - /* - * Wa_18013179988:dg2 - * Wa_14015846243:mtl - */ - if (IS_DG2(i915) || IS_METEORLAKE(i915)) { + struct intel_gt *gt = to_gt(i915); + + /* Wa_18013179988 */ + if (IS_DG2(i915) || IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71))) { intel_wakeref_t wakeref; u32 reg, shift;
@@ -4564,7 +4563,7 @@ static bool xehp_is_valid_b_counter_addr(struct i915_perf *perf, u32 addr)
static bool gen12_is_valid_mux_addr(struct i915_perf *perf, u32 addr) { - if (IS_METEORLAKE(perf->i915)) + if (GRAPHICS_VER_FULL(perf->i915) >= IP_VER(12, 70)) return reg_in_range_table(addr, mtl_oa_mux_regs); else return reg_in_range_table(addr, gen12_oa_mux_regs);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tejas Upadhyay tejas.upadhyay@intel.com
[ Upstream commit 7467e1da906468bcbd311023b30708193103ecf9 ]
This workaround is now permanent on MTL and DG2; earlier we used to apply it only on the MTL A0 step. VLK-45480
Fixes: d922b80b1010 ("drm/i915/gt: Add workaround 14016712196") Signed-off-by: Tejas Upadhyay tejas.upadhyay@intel.com Acked-by: Nirmoy Das nirmoy.das@intel.com Reviewed-by: Andi Shyti andi.shyti@linux.intel.com Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230828063450.2642748-1-tejas... Stable-dep-of: 186bce682772 ("drm/i915/mtl: Update workaround 14018575942") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c index 3ac3e12d9c524..ba4c2422b3402 100644 --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c @@ -226,8 +226,8 @@ u32 *gen12_emit_aux_table_inv(struct intel_engine_cs *engine, u32 *cs) static int mtl_dummy_pipe_control(struct i915_request *rq) { /* Wa_14016712196 */ - if (IS_GFX_GT_IP_STEP(rq->engine->gt, IP_VER(12, 70), STEP_A0, STEP_B0) || - IS_GFX_GT_IP_STEP(rq->engine->gt, IP_VER(12, 71), STEP_A0, STEP_B0)) { + if (IS_GFX_GT_IP_RANGE(rq->engine->gt, IP_VER(12, 70), IP_VER(12, 71)) || + IS_DG2(rq->i915)) { u32 *cs;
/* dummy PIPE_CONTROL + depth flush */ @@ -819,8 +819,7 @@ u32 *gen12_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs) PIPE_CONTROL_FLUSH_ENABLE);
/* Wa_14016712196 */ - if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_A0, STEP_B0) || - IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_A0, STEP_B0)) + if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71)) || IS_DG2(i915)) /* dummy PIPE_CONTROL + depth flush */ cs = gen12_emit_pipe_control(cs, 0, PIPE_CONTROL_DEPTH_CACHE_FLUSH, 0);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matt Roper matthew.d.roper@intel.com
[ Upstream commit c44d4ef47fdad0a33966de89f9064e19736bb52f ]
Some of our existing Xe_LPG workarounds and tuning are also applicable to the version 12.74 variant. Extend the condition bounds accordingly. Also fix the comment on Wa_14018575942 while we're at it.
v2: Extend some more workarounds (Harish)
Signed-off-by: Matt Roper matthew.d.roper@intel.com Signed-off-by: Harish Chegondi harish.chegondi@intel.com Signed-off-by: Haridhar Kalvala haridhar.kalvala@intel.com Reviewed-by: Matt Atwood matthew.s.atwood@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240108122738.14399-4-haridha... Stable-dep-of: 186bce682772 ("drm/i915/mtl: Update workaround 14018575942") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 4 ++-- drivers/gpu/drm/i915/gt/intel_workarounds.c | 24 +++++++++++++-------- drivers/gpu/drm/i915/i915_perf.c | 2 +- 3 files changed, 18 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c index ba4c2422b3402..cddf8c16e9a72 100644 --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c @@ -226,7 +226,7 @@ u32 *gen12_emit_aux_table_inv(struct intel_engine_cs *engine, u32 *cs) static int mtl_dummy_pipe_control(struct i915_request *rq) { /* Wa_14016712196 */ - if (IS_GFX_GT_IP_RANGE(rq->engine->gt, IP_VER(12, 70), IP_VER(12, 71)) || + if (IS_GFX_GT_IP_RANGE(rq->engine->gt, IP_VER(12, 70), IP_VER(12, 74)) || IS_DG2(rq->i915)) { u32 *cs;
@@ -819,7 +819,7 @@ u32 *gen12_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs) PIPE_CONTROL_FLUSH_ENABLE);
/* Wa_14016712196 */ - if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71)) || IS_DG2(i915)) + if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 74)) || IS_DG2(i915)) /* dummy PIPE_CONTROL + depth flush */ cs = gen12_emit_pipe_control(cs, 0, PIPE_CONTROL_DEPTH_CACHE_FLUSH, 0); diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index b6237e999be93..37b2b0440923f 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -788,8 +788,13 @@ static void xelpg_ctx_gt_tuning_init(struct intel_engine_cs *engine,
dg2_ctx_gt_tuning_init(engine, wal);
- if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_B0, STEP_FOREVER) || - IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_B0, STEP_FOREVER)) + /* + * Due to Wa_16014892111, the DRAW_WATERMARK tuning must be done in + * gen12_emit_indirect_ctx_rcs() rather than here on some early + * steppings. + */ + if (!(IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_A0, STEP_B0) || + IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_A0, STEP_B0))) wa_add(wal, DRAW_WATERMARK, VERT_WM_VAL, 0x3FF, 0, false); }
@@ -907,7 +912,7 @@ __intel_engine_init_ctx_wa(struct intel_engine_cs *engine, if (engine->class != RENDER_CLASS) goto done;
- if (IS_GFX_GT_IP_RANGE(engine->gt, IP_VER(12, 70), IP_VER(12, 71))) + if (IS_GFX_GT_IP_RANGE(engine->gt, IP_VER(12, 70), IP_VER(12, 74))) xelpg_ctx_workarounds_init(engine, wal); else if (IS_PONTEVECCHIO(i915)) ; /* noop; none at this time */ @@ -1638,7 +1643,7 @@ pvc_gt_workarounds_init(struct intel_gt *gt, struct i915_wa_list *wal) static void xelpg_gt_workarounds_init(struct intel_gt *gt, struct i915_wa_list *wal) { - /* Wa_14018778641 / Wa_18018781329 */ + /* Wa_14018575942 / Wa_18018781329 */ wa_mcr_write_or(wal, COMP_MOD_CTRL, FORCE_MISS_FTLB);
/* Wa_22016670082 */ @@ -1688,7 +1693,7 @@ xelpmp_gt_workarounds_init(struct intel_gt *gt, struct i915_wa_list *wal) */ static void gt_tuning_settings(struct intel_gt *gt, struct i915_wa_list *wal) { - if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71))) { + if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 74))) { wa_mcr_write_or(wal, XEHP_L3SCQREG7, BLEND_FILL_CACHING_OPT_DIS); wa_mcr_write_or(wal, XEHP_SQCM, EN_32B_ACCESS); } @@ -1721,7 +1726,7 @@ gt_init_workarounds(struct intel_gt *gt, struct i915_wa_list *wal) return; }
- if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71))) + if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 74))) xelpg_gt_workarounds_init(gt, wal); else if (IS_PONTEVECCHIO(i915)) pvc_gt_workarounds_init(gt, wal); @@ -2194,7 +2199,7 @@ void intel_engine_init_whitelist(struct intel_engine_cs *engine)
if (engine->gt->type == GT_MEDIA) ; /* none yet */ - else if (IS_GFX_GT_IP_RANGE(engine->gt, IP_VER(12, 70), IP_VER(12, 71))) + else if (IS_GFX_GT_IP_RANGE(engine->gt, IP_VER(12, 70), IP_VER(12, 74))) xelpg_whitelist_build(engine); else if (IS_PONTEVECCHIO(i915)) pvc_whitelist_build(engine); @@ -2801,7 +2806,7 @@ add_render_compute_tuning_settings(struct intel_gt *gt, { struct drm_i915_private *i915 = gt->i915;
- if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71)) || IS_DG2(i915)) + if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 74)) || IS_DG2(i915)) wa_mcr_write_clr_set(wal, RT_CTRL, STACKID_CTRL, STACKID_CTRL_512);
/* @@ -2854,7 +2859,8 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li }
if (IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_B0, STEP_FOREVER) || - IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_B0, STEP_FOREVER)) + IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_B0, STEP_FOREVER) || + IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 74), IP_VER(12, 74))) /* Wa_14017856879 */ wa_mcr_masked_en(wal, GEN9_ROW_CHICKEN3, MTL_DISABLE_FIX_FOR_EOT_FLUSH);
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 48ea17b49b3a0..3f90403d86cb4 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -3258,7 +3258,7 @@ u32 i915_perf_oa_timestamp_frequency(struct drm_i915_private *i915) struct intel_gt *gt = to_gt(i915);
/* Wa_18013179988 */ - if (IS_DG2(i915) || IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71))) { + if (IS_DG2(i915) || IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 74))) { intel_wakeref_t wakeref; u32 reg, shift;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tejas Upadhyay tejas.upadhyay@intel.com
[ Upstream commit 186bce682772e7346bf7ced5325b5f4ff050ccfb ]
Applying WA 14018575942 only on the Compute engine has an impact on some apps like Chrome. Update this WA to apply on the Render engine as well, as it helps with performance on Chrome.
Note: there is no concern from the media team, so the WA is not applied on the media engines. We will revisit this if any issues are reported by the media team.
V2(Matt): - Use correct WA number
Fixes: 668f37e1ee11 ("drm/i915/mtl: Update workaround 14018778641") Signed-off-by: Tejas Upadhyay tejas.upadhyay@intel.com Reviewed-by: Matt Roper matthew.d.roper@intel.com Reviewed-by: Andi Shyti andi.shyti@linux.intel.com Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240228103738.2018458-1-tejas... (cherry picked from commit 71271280175aa0ed6673e40cce7c01296bcd05f6) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gt/intel_workarounds.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index 37b2b0440923f..0ea52f77b4c72 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -1644,6 +1644,7 @@ static void xelpg_gt_workarounds_init(struct intel_gt *gt, struct i915_wa_list *wal) { /* Wa_14018575942 / Wa_18018781329 */ + wa_mcr_write_or(wal, RENDER_MOD_CTRL, FORCE_MISS_FTLB); wa_mcr_write_or(wal, COMP_MOD_CTRL, FORCE_MISS_FTLB);
/* Wa_22016670082 */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit 8e91c2342351e0f5ef6c0a704384a7f6fc70c3b2 ]
Depending on the value of CONFIG_HZ, clang complains about a pointless comparison:
drivers/md/dm-integrity.c:4085:12: error: result of comparison of constant 42949672950 with expression of type 'unsigned int' is always false [-Werror,-Wtautological-constant-out-of-range-compare] if (val >= (uint64_t)UINT_MAX * 1000 / HZ) {
As the check remains useful for other configurations, shut up the warning by adding a second type cast to uint64_t.
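To make the warning concrete (a worked illustration, assuming CONFIG_HZ=100): (uint64_t)UINT_MAX * 1000 / HZ evaluates to 4294967295 * 1000 / 100 = 42949672950 (the constant in the report above), which is larger than any value a 32-bit 'val' can hold, so the comparison really is always false there. The extra cast promotes 'val' before the comparison, silencing clang while keeping the bound check meaningful for configurations with a larger HZ where the limit still fits in 32 bits:

  unsigned int val;   /* parsed bitmap_flush_interval, in milliseconds */

  /* compare in 64 bits so the check is not tautological for small HZ */
  if ((uint64_t)val >= (uint64_t)UINT_MAX * 1000 / HZ)
          return -EINVAL;   /* sketch; the driver sets ti->error and jumps to bad */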
Fixes: 468dfca38b1a ("dm integrity: add a bitmap mode") Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Mikulas Patocka mpatocka@redhat.com Reviewed-by: Justin Stitt justinstitt@google.com Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/dm-integrity.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c index e7cd27e387df1..470add73f7bda 100644 --- a/drivers/md/dm-integrity.c +++ b/drivers/md/dm-integrity.c @@ -4231,7 +4231,7 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned int argc, char **argv } else if (sscanf(opt_string, "sectors_per_bit:%llu%c", &llval, &dummy) == 1) { log2_sectors_per_bitmap_bit = !llval ? 0 : __ilog2_u64(llval); } else if (sscanf(opt_string, "bitmap_flush_interval:%u%c", &val, &dummy) == 1) { - if (val >= (uint64_t)UINT_MAX * 1000 / HZ) { + if ((uint64_t)val >= (uint64_t)UINT_MAX * 1000 / HZ) { r = -EINVAL; ti->error = "Invalid bitmap_flush_interval argument"; goto bad;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Xu peterx@redhat.com
[ Upstream commit 0a845e0f6348ccfa2dcc8c450ffd1c9ffe8c4add ]
pud_large() is always defined as pud_leaf(). Merge their usages. We chose pud_leaf() because it is a global API, while pud_large() is not.
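A minimal sketch of what the mechanical conversion looks like (illustrative; the exact per-arch definitions differ slightly): the two helpers are equivalent at this point, so call sites simply switch to the generic name:

  /* pud_large() is just the old name for pud_leaf() */
  #define pud_large(pud)  pud_leaf(pud)

  /* callers move to the generic helper, e.g. in the x86 fault path: */
  if (pud_leaf(*pud))     /* was: pud_large(*pud) */
          return spurious_kernel_fault_check(error_code, (pte_t *)pud);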
Link: https://lkml.kernel.org/r/20240305043750.93762-9-peterx@redhat.com Signed-off-by: Peter Xu peterx@redhat.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Cc: Alexander Potapenko glider@google.com Cc: Andrey Konovalov andreyknvl@gmail.com Cc: Andrey Ryabinin ryabinin.a.a@gmail.com Cc: "Aneesh Kumar K.V" aneesh.kumar@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Christophe Leroy christophe.leroy@csgroup.eu Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Ingo Molnar mingo@redhat.com Cc: Kirill A. Shutemov kirill@shutemov.name Cc: Michael Ellerman mpe@ellerman.id.au Cc: Muchun Song muchun.song@linux.dev Cc: "Naveen N. Rao" naveen.n.rao@linux.ibm.com Cc: Nicholas Piggin npiggin@gmail.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vincenzo Frascino vincenzo.frascino@arm.com Cc: Yang Shi shy828301@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Stable-dep-of: c567f2948f57 ("Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/mm/book3s64/pgtable.c | 2 +- arch/s390/boot/vmem.c | 2 +- arch/s390/include/asm/pgtable.h | 4 ++-- arch/s390/mm/gmap.c | 2 +- arch/s390/mm/hugetlbpage.c | 4 ++-- arch/s390/mm/pageattr.c | 2 +- arch/s390/mm/pgtable.c | 2 +- arch/s390/mm/vmem.c | 6 +++--- arch/sparc/mm/init_64.c | 2 +- arch/x86/kvm/mmu/mmu.c | 2 +- arch/x86/mm/fault.c | 4 ++-- arch/x86/mm/ident_map.c | 2 +- arch/x86/mm/init_64.c | 4 ++-- arch/x86/mm/kasan_init_64.c | 2 +- arch/x86/mm/mem_encrypt_identity.c | 2 +- arch/x86/mm/pat/set_memory.c | 6 +++--- arch/x86/mm/pgtable.c | 2 +- arch/x86/mm/pti.c | 2 +- arch/x86/power/hibernate.c | 2 +- arch/x86/xen/mmu_pv.c | 4 ++-- 20 files changed, 29 insertions(+), 29 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c index 926bec775f41c..9822366dc186e 100644 --- a/arch/powerpc/mm/book3s64/pgtable.c +++ b/arch/powerpc/mm/book3s64/pgtable.c @@ -130,7 +130,7 @@ void set_pud_at(struct mm_struct *mm, unsigned long addr,
WARN_ON(pte_hw_valid(pud_pte(*pudp))); assert_spin_locked(pud_lockptr(mm, pudp)); - WARN_ON(!(pud_large(pud))); + WARN_ON(!(pud_leaf(pud))); #endif trace_hugepage_set_pud(addr, pud_val(pud)); return set_pte_at(mm, addr, pudp_ptep(pudp), pud_pte(pud)); diff --git a/arch/s390/boot/vmem.c b/arch/s390/boot/vmem.c index 442a74f113cbf..14e1a73ffcfe6 100644 --- a/arch/s390/boot/vmem.c +++ b/arch/s390/boot/vmem.c @@ -360,7 +360,7 @@ static void pgtable_pud_populate(p4d_t *p4d, unsigned long addr, unsigned long e } pmd = boot_crst_alloc(_SEGMENT_ENTRY_EMPTY); pud_populate(&init_mm, pud, pmd); - } else if (pud_large(*pud)) { + } else if (pud_leaf(*pud)) { continue; } pgtable_pmd_populate(pud, addr, next, mode); diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h index fb3ee7758b765..38290b0078c56 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -729,7 +729,7 @@ static inline int pud_bad(pud_t pud) { unsigned long type = pud_val(pud) & _REGION_ENTRY_TYPE_MASK;
- if (type > _REGION_ENTRY_TYPE_R3 || pud_large(pud)) + if (type > _REGION_ENTRY_TYPE_R3 || pud_leaf(pud)) return 1; if (type < _REGION_ENTRY_TYPE_R3) return 0; @@ -1396,7 +1396,7 @@ static inline unsigned long pud_deref(pud_t pud) unsigned long origin_mask;
origin_mask = _REGION_ENTRY_ORIGIN; - if (pud_large(pud)) + if (pud_leaf(pud)) origin_mask = _REGION3_ENTRY_ORIGIN_LARGE; return (unsigned long)__va(pud_val(pud) & origin_mask); } diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c index 157e0a8d5157d..d17bb1ef63f41 100644 --- a/arch/s390/mm/gmap.c +++ b/arch/s390/mm/gmap.c @@ -596,7 +596,7 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr) pud = pud_offset(p4d, vmaddr); VM_BUG_ON(pud_none(*pud)); /* large puds cannot yet be handled */ - if (pud_large(*pud)) + if (pud_leaf(*pud)) return -EFAULT; pmd = pmd_offset(pud, vmaddr); VM_BUG_ON(pmd_none(*pmd)); diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c index 297a6d897d5a0..5f64f3d0fafbb 100644 --- a/arch/s390/mm/hugetlbpage.c +++ b/arch/s390/mm/hugetlbpage.c @@ -224,7 +224,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm, if (p4d_present(*p4dp)) { pudp = pud_offset(p4dp, addr); if (pud_present(*pudp)) { - if (pud_large(*pudp)) + if (pud_leaf(*pudp)) return (pte_t *) pudp; pmdp = pmd_offset(pudp, addr); } @@ -240,7 +240,7 @@ int pmd_huge(pmd_t pmd)
int pud_huge(pud_t pud) { - return pud_large(pud); + return pud_leaf(pud); }
bool __init arch_hugetlb_valid_size(unsigned long size) diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c index b87e96c64b61d..441f654d048d2 100644 --- a/arch/s390/mm/pageattr.c +++ b/arch/s390/mm/pageattr.c @@ -274,7 +274,7 @@ static int walk_pud_level(p4d_t *p4d, unsigned long addr, unsigned long end, if (pud_none(*pudp)) return -EINVAL; next = pud_addr_end(addr, end); - if (pud_large(*pudp)) { + if (pud_leaf(*pudp)) { need_split = !!(flags & SET_MEMORY_4K); need_split |= !!(addr & ~PUD_MASK); need_split |= !!(addr + PUD_SIZE > next); diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index 5cb92941540b3..5e349869590a8 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -479,7 +479,7 @@ static int pmd_lookup(struct mm_struct *mm, unsigned long addr, pmd_t **pmdp) return -ENOENT;
/* Large PUDs are not supported yet. */ - if (pud_large(*pud)) + if (pud_leaf(*pud)) return -EFAULT;
*pmdp = pmd_offset(pud, addr); diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c index 6d276103c6d58..2d3f65da56eea 100644 --- a/arch/s390/mm/vmem.c +++ b/arch/s390/mm/vmem.c @@ -322,7 +322,7 @@ static int modify_pud_table(p4d_t *p4d, unsigned long addr, unsigned long end, if (!add) { if (pud_none(*pud)) continue; - if (pud_large(*pud)) { + if (pud_leaf(*pud)) { if (IS_ALIGNED(addr, PUD_SIZE) && IS_ALIGNED(next, PUD_SIZE)) { pud_clear(pud); @@ -343,7 +343,7 @@ static int modify_pud_table(p4d_t *p4d, unsigned long addr, unsigned long end, if (!pmd) goto out; pud_populate(&init_mm, pud, pmd); - } else if (pud_large(*pud)) { + } else if (pud_leaf(*pud)) { continue; } ret = modify_pmd_table(pud, addr, next, add, direct); @@ -586,7 +586,7 @@ pte_t *vmem_get_alloc_pte(unsigned long addr, bool alloc) if (!pmd) goto out; pud_populate(&init_mm, pud, pmd); - } else if (WARN_ON_ONCE(pud_large(*pud))) { + } else if (WARN_ON_ONCE(pud_leaf(*pud))) { goto out; } pmd = pmd_offset(pud, addr); diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c index f83017992eaae..d7db4e737218c 100644 --- a/arch/sparc/mm/init_64.c +++ b/arch/sparc/mm/init_64.c @@ -1665,7 +1665,7 @@ bool kern_addr_valid(unsigned long addr) if (pud_none(*pud)) return false;
- if (pud_large(*pud)) + if (pud_leaf(*pud)) return pfn_valid(pud_pfn(*pud));
pmd = pmd_offset(pud, addr); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index f7901cb4d2fa4..11c484d72eab2 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3120,7 +3120,7 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, if (pud_none(pud) || !pud_present(pud)) goto out;
- if (pud_large(pud)) { + if (pud_leaf(pud)) { level = PG_LEVEL_1G; goto out; } diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index a9d69ec994b75..e238517968836 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -376,7 +376,7 @@ static void dump_pagetable(unsigned long address) goto bad;
pr_cont("PUD %lx ", pud_val(*pud)); - if (!pud_present(*pud) || pud_large(*pud)) + if (!pud_present(*pud) || pud_leaf(*pud)) goto out;
pmd = pmd_offset(pud, address); @@ -1037,7 +1037,7 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address) if (!pud_present(*pud)) return 0;
- if (pud_large(*pud)) + if (pud_leaf(*pud)) return spurious_kernel_fault_check(error_code, (pte_t *) pud);
pmd = pmd_offset(pud, address); diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c index f50cc210a9818..a204a332c71fc 100644 --- a/arch/x86/mm/ident_map.c +++ b/arch/x86/mm/ident_map.c @@ -33,7 +33,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page, next = end;
/* if this is already a gbpage, this portion is already mapped */ - if (pud_large(*pud)) + if (pud_leaf(*pud)) continue;
/* Is using a gbpage allowed? */ diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index a190aae8ceaf7..19d209b412d7a 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -617,7 +617,7 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end, }
if (!pud_none(*pud)) { - if (!pud_large(*pud)) { + if (!pud_leaf(*pud)) { pmd = pmd_offset(pud, 0); paddr_last = phys_pmd_init(pmd, paddr, paddr_end, @@ -1163,7 +1163,7 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end, if (!pud_present(*pud)) continue;
- if (pud_large(*pud) && + if (pud_leaf(*pud) && IS_ALIGNED(addr, PUD_SIZE) && IS_ALIGNED(next, PUD_SIZE)) { spin_lock(&init_mm.page_table_lock); diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c index 0302491d799d1..fcf508c52bdc5 100644 --- a/arch/x86/mm/kasan_init_64.c +++ b/arch/x86/mm/kasan_init_64.c @@ -115,7 +115,7 @@ static void __init kasan_populate_p4d(p4d_t *p4d, unsigned long addr, pud = pud_offset(p4d, addr); do { next = pud_addr_end(addr, end); - if (!pud_large(*pud)) + if (!pud_leaf(*pud)) kasan_populate_pud(pud, addr, next, nid); } while (pud++, addr = next, addr != end); } diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c index 0166ab1780ccb..ead3561359242 100644 --- a/arch/x86/mm/mem_encrypt_identity.c +++ b/arch/x86/mm/mem_encrypt_identity.c @@ -144,7 +144,7 @@ static pud_t __init *sme_prepare_pgd(struct sme_populate_pgd_data *ppd) set_pud(pud, __pud(PUD_FLAGS | __pa(pmd))); }
- if (pud_large(*pud)) + if (pud_leaf(*pud)) return NULL;
return pud; diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index bda9f129835e9..f3c4c756fe1ee 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -684,7 +684,7 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address, return NULL;
*level = PG_LEVEL_1G; - if (pud_large(*pud) || !pud_present(*pud)) + if (pud_leaf(*pud) || !pud_present(*pud)) return (pte_t *)pud;
pmd = pmd_offset(pud, address); @@ -743,7 +743,7 @@ pmd_t *lookup_pmd_address(unsigned long address) return NULL;
pud = pud_offset(p4d, address); - if (pud_none(*pud) || pud_large(*pud) || !pud_present(*pud)) + if (pud_none(*pud) || pud_leaf(*pud) || !pud_present(*pud)) return NULL;
return pmd_offset(pud, address); @@ -1274,7 +1274,7 @@ static void unmap_pud_range(p4d_t *p4d, unsigned long start, unsigned long end) */ while (end - start >= PUD_SIZE) {
- if (pud_large(*pud)) + if (pud_leaf(*pud)) pud_clear(pud); else unmap_pmd_range(pud, start, start + PUD_SIZE); diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 9deadf517f14a..8e1ef5345b7a8 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -774,7 +774,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot) */ int pud_clear_huge(pud_t *pud) { - if (pud_large(*pud)) { + if (pud_leaf(*pud)) { pud_clear(pud); return 1; } diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index 78414c6d1b5ed..51b6b78e6b175 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -217,7 +217,7 @@ static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
pud = pud_offset(p4d, address); /* The user page tables do not use large mappings: */ - if (pud_large(*pud)) { + if (pud_leaf(*pud)) { WARN_ON(1); return NULL; } diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c index 6f955eb1e1631..d8af46e677503 100644 --- a/arch/x86/power/hibernate.c +++ b/arch/x86/power/hibernate.c @@ -170,7 +170,7 @@ int relocate_restore_code(void) goto out; } pud = pud_offset(p4d, relocated_restore_code); - if (pud_large(*pud)) { + if (pud_leaf(*pud)) { set_pud(pud, __pud(pud_val(*pud) & ~_PAGE_NX)); goto out; } diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c index b6830554ff690..9d4a9311e819b 100644 --- a/arch/x86/xen/mmu_pv.c +++ b/arch/x86/xen/mmu_pv.c @@ -1082,7 +1082,7 @@ static void __init xen_cleanmfnmap_pud(pud_t *pud, bool unpin) pmd_t *pmd_tbl; int i;
- if (pud_large(*pud)) { + if (pud_leaf(*pud)) { pa = pud_val(*pud) & PHYSICAL_PAGE_MASK; xen_free_ro_pages(pa, PUD_SIZE); return; @@ -1863,7 +1863,7 @@ static phys_addr_t __init xen_early_virt_to_phys(unsigned long vaddr) if (!pud_present(pud)) return 0; pa = pud_val(pud) & PTE_PFN_MASK; - if (pud_large(pud)) + if (pud_leaf(pud)) return pa + (vaddr & ~PUD_MASK);
pmd = native_make_pmd(xen_read_phys_ulong(pa + pmd_index(vaddr) *
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ingo Molnar mingo@kernel.org
[ Upstream commit c567f2948f57bdc03ed03403ae0234085f376b7d ]
This reverts commit d794734c9bbfe22f86686dc2909c25f5ffe1a572.
While the original change tries to fix a bug, it also unintentionally broke existing systems, see the regressions reported at:
https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@pavinjoseph...
Since d794734c9bbf was also marked for -stable, let's back it out before causing more damage.
Note that due to another upstream change the revert was not 100% automatic:
0a845e0f6348 mm/treewide: replace pud_large() with pud_leaf()
Signed-off-by: Ingo Molnar mingo@kernel.org Cc: stable@vger.kernel.org Cc: Russ Anderson rja@hpe.com Cc: Steve Wahl steve.wahl@hpe.com Cc: Dave Hansen dave.hansen@linux.intel.com Link: https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@pavinjoseph... Fixes: d794734c9bbf ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/mm/ident_map.c | 23 +++++------------------ 1 file changed, 5 insertions(+), 18 deletions(-)
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c index a204a332c71fc..968d7005f4a72 100644 --- a/arch/x86/mm/ident_map.c +++ b/arch/x86/mm/ident_map.c @@ -26,31 +26,18 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page, for (; addr < end; addr = next) { pud_t *pud = pud_page + pud_index(addr); pmd_t *pmd; - bool use_gbpage;
next = (addr & PUD_MASK) + PUD_SIZE; if (next > end) next = end;
- /* if this is already a gbpage, this portion is already mapped */ - if (pud_leaf(*pud)) - continue; - - /* Is using a gbpage allowed? */ - use_gbpage = info->direct_gbpages; - - /* Don't use gbpage if it maps more than the requested region. */ - /* at the begining: */ - use_gbpage &= ((addr & ~PUD_MASK) == 0); - /* ... or at the end: */ - use_gbpage &= ((next & ~PUD_MASK) == 0); - - /* Never overwrite existing mappings */ - use_gbpage &= !pud_present(*pud); - - if (use_gbpage) { + if (info->direct_gbpages) { pud_t pudval;
+ if (pud_present(*pud)) + continue; + + addr &= PUD_MASK; pudval = __pud((addr - info->offset) | info->page_flag); set_pud(pud, pudval); continue;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 418b09027743d9a9fb39116bed46a192f868a3c3 ]
When FIEMAP_FLAG_SYNC is given to fiemap the expectation is that there are no concurrent writes and we get a stable view of the inode's extent layout.
When the flag is given we flush all IO (and wait for ordered extents to complete) and then lock the inode in shared mode. However, that leaves open the possibility that a write happens right after the flushing and before locking the inode. So fix this by flushing again after locking the inode - we leave the initial flushing before locking the inode to avoid holding the lock and blocking other RO operations while waiting for IO and ordered extents to complete. The second flushing while holding the inode's lock will most of the time do nothing or very little, since the time window for new writes to have happened is small.
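In code terms, the resulting order in btrfs_fiemap() is roughly the following (a condensed sketch of the hunk below, with the initial pre-lock flush omitted and error paths trimmed; not the verbatim kernel code):

	btrfs_inode_lock(btrfs_inode, BTRFS_ILOCK_SHARED);

	/* Flush again under the lock to close the window for writes that
	 * happened between the first flush and taking the lock. */
	if (fieinfo->fi_flags & FIEMAP_FLAG_SYNC) {
		ret = btrfs_wait_ordered_range(inode, 0, LLONG_MAX);
		if (ret) {
			btrfs_inode_unlock(btrfs_inode, BTRFS_ILOCK_SHARED);
			return ret;
		}
	}

	ret = extent_fiemap(btrfs_inode, fieinfo, start, len);
	btrfs_inode_unlock(btrfs_inode, BTRFS_ILOCK_SHARED);
	return ret;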
Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Stable-dep-of: 978b63f7464a ("btrfs: fix race when detecting delalloc ranges during fiemap") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/extent_io.c | 21 ++++++++------------- fs/btrfs/inode.c | 22 +++++++++++++++++++++- 2 files changed, 29 insertions(+), 14 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index fc8eb8d86ca25..45d427c3033d7 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2953,17 +2953,15 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, range_end = round_up(start + len, sectorsize); prev_extent_end = range_start;
- btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED); - ret = fiemap_find_last_extent_offset(inode, path, &last_extent_end); if (ret < 0) - goto out_unlock; + goto out; btrfs_release_path(path);
path->reada = READA_FORWARD; ret = fiemap_search_slot(inode, path, range_start); if (ret < 0) { - goto out_unlock; + goto out; } else if (ret > 0) { /* * No file extent item found, but we may have delalloc between @@ -3010,7 +3008,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, backref_ctx, 0, 0, 0, prev_extent_end, hole_end); if (ret < 0) { - goto out_unlock; + goto out; } else if (ret > 0) { /* fiemap_fill_next_extent() told us to stop. */ stopped = true; @@ -3066,7 +3064,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, extent_gen, backref_ctx); if (ret < 0) - goto out_unlock; + goto out; else if (ret > 0) flags |= FIEMAP_EXTENT_SHARED; } @@ -3077,7 +3075,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, }
if (ret < 0) { - goto out_unlock; + goto out; } else if (ret > 0) { /* fiemap_fill_next_extent() told us to stop. */ stopped = true; @@ -3088,12 +3086,12 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, next_item: if (fatal_signal_pending(current)) { ret = -EINTR; - goto out_unlock; + goto out; }
ret = fiemap_next_leaf_item(inode, path); if (ret < 0) { - goto out_unlock; + goto out; } else if (ret > 0) { /* No more file extent items for this inode. */ break; @@ -3117,7 +3115,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, &delalloc_cached_state, backref_ctx, 0, 0, 0, prev_extent_end, range_end - 1); if (ret < 0) - goto out_unlock; + goto out; prev_extent_end = range_end; }
@@ -3155,9 +3153,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, }
ret = emit_last_fiemap_cache(fieinfo, &cache); - -out_unlock: - btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED); out: free_extent_state(delalloc_cached_state); btrfs_free_backref_share_ctx(backref_ctx); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index ca79c2b8adc46..1ac14223ffb50 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7813,6 +7813,7 @@ struct iomap_dio *btrfs_dio_write(struct kiocb *iocb, struct iov_iter *iter, static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, u64 start, u64 len) { + struct btrfs_inode *btrfs_inode = BTRFS_I(inode); int ret;
ret = fiemap_prep(inode, fieinfo, start, &len, 0); @@ -7838,7 +7839,26 @@ static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, return ret; }
- return extent_fiemap(BTRFS_I(inode), fieinfo, start, len); + btrfs_inode_lock(btrfs_inode, BTRFS_ILOCK_SHARED); + + /* + * We did an initial flush to avoid holding the inode's lock while + * triggering writeback and waiting for the completion of IO and ordered + * extents. Now after we locked the inode we do it again, because it's + * possible a new write may have happened in between those two steps. + */ + if (fieinfo->fi_flags & FIEMAP_FLAG_SYNC) { + ret = btrfs_wait_ordered_range(inode, 0, LLONG_MAX); + if (ret) { + btrfs_inode_unlock(btrfs_inode, BTRFS_ILOCK_SHARED); + return ret; + } + } + + ret = extent_fiemap(btrfs_inode, fieinfo, start, len); + btrfs_inode_unlock(btrfs_inode, BTRFS_ILOCK_SHARED); + + return ret; }
static int btrfs_writepages(struct address_space *mapping,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 978b63f7464abcfd364a6c95f734282c50f3decf ]
For fiemap we recently stopped locking the target extent range for the whole duration of the fiemap call, in order to avoid a deadlock in a scenario where the fiemap buffer happens to be a memory mapped range of the same file. This use case is very unlikely to be useful in practice but it may be triggered by fuzz testing (syzbot, etc).
This however introduced a race that makes us miss delalloc ranges for file regions that are currently holes, so the caller of fiemap will not be aware that there's data for some file regions. This can be quite serious for some use cases - for example in coreutils versions before 9.0, the cp program used fiemap to detect holes and data in the source file, copying only regions with data (extents or delalloc) from the source file to the destination file in order to preserve holes (see the documentation for its --sparse command line option). This means that if cp was used with a source file that had delalloc in a hole, the destination file could end up without that data, which is effectively a data loss issue, if it happened to hit the race described below.
The race happens like this:
1) Fiemap is called, without the FIEMAP_FLAG_SYNC flag, for a file that has delalloc in the file range [64M, 65M[, which is currently a hole;
2) Fiemap locks the inode in shared mode, then starts iterating the inode's subvolume tree searching for file extent items, without having the whole fiemap target range locked in the inode's io tree - the change introduced recently by commit b0ad381fa769 ("btrfs: fix deadlock with fiemap and extent locking"). It only locks ranges in the io tree when it finds a hole or prealloc extent since that commit;
3) Note that fiemap clones each leaf before using it, and this is to avoid deadlocks when locking a file range in the inode's io tree and the fiemap buffer is memory mapped to some file, because writing to the page with btrfs_page_mkwrite() will wait on any ordered extent for the page's range and the ordered extent needs to lock the range and may need to modify the same leaf, therefore leading to a deadlock on the leaf;
4) While iterating the file extent items in the cloned leaf before finding the hole in the range [64M, 65M[, the delalloc in that range is flushed and its ordered extent completes - meaning the corresponding file extent item is in the inode's subvolume tree, but not present in the cloned leaf that fiemap is iterating over;
5) When fiemap finds the hole in the [64M, 65M[ range by seeing the gap in the cloned leaf (or a file extent item with disk_bytenr == 0 in case the NO_HOLES feature is not enabled), it will lock that file range in the inode's io tree and then search for delalloc by checking for the EXTENT_DELALLOC bit in the io tree for that range and ordered extents (with btrfs_find_delalloc_in_range()). But it finds nothing since the delalloc in that range was already flushed and the ordered extent completed and is gone - as a result fiemap will not report that there's delalloc or an extent for the range [64M, 65M[, so user space will be misled into thinking that there's a hole in that range.
This could actually be sporadically triggered with test case generic/094 from fstests, which reports a missing extent/delalloc range like this:
# generic/094 2s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/094.out.bad)
# --- tests/generic/094.out 2020-06-10 19:29:03.830519425 +0100
# +++ /home/fdmanana/git/hub/xfstests/results//generic/094.out.bad 2024-02-28 11:00:00.381071525 +0000
# @@ -1,3 +1,9 @@
# QA output created by 094
# fiemap run with sync
# fiemap run without sync
# +ERROR: couldn't find extent at 7
# +map is 'HHDDHPPDPHPH'
# +logical: [ 5.. 6] phys: 301517.. 301518 flags: 0x800 tot: 2
# +logical: [ 8.. 8] phys: 301520.. 301520 flags: 0x800 tot: 1
# ...
# (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/generic/094.out /home/fdmanana/git/hub/xfstests/results//generic/094.out.bad' to see the entire diff)
So in order to fix this, while still avoiding deadlocks in the case where the fiemap buffer is memory mapped to the same file, change fiemap to work like the following:
1) Always lock the whole range in the inode's io tree before starting to iterate the inode's subvolume tree searching for file extent items, just like we did before commit b0ad381fa769 ("btrfs: fix deadlock with fiemap and extent locking");
2) Now, instead of writing to the fiemap buffer every time we have an extent to report, write to a temporary buffer (1 page), and when that buffer becomes full, stop iterating the file extent items, unlock the range in the io tree, release the search path, submit all the entries kept in that buffer to the fiemap buffer, and then resume the search for file extent items after locking the remainder of the range in the io tree again.
The buffer, having the size of a page, allows for 146 entries on a system with 4K pages. This is a large enough value to give good performance by avoiding too many restarts of the search for file extent items. In other words, this preserves the huge performance gains made in the last two years to fiemap, while avoiding the deadlocks in case the fiemap buffer is memory mapped to the same file (useless in practice, but possible and exercised by fuzz testing and syzbot).
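Schematically, the main loop of extent_fiemap() then looks like this (a condensed sketch of the change below; the iteration step is elided and error handling is trimmed):

	restart:
		lock_extent(&inode->io_tree, range_start, range_end, &cached_state);
		/* Iterate file extent items, buffering ready entries in cache.entries[].
		 * When the buffer becomes full this returns BTRFS_FIEMAP_FLUSH_CACHE. */
		ret = ...;
	out_unlock:
		unlock_extent(&inode->io_tree, range_start, range_end, &cached_state);
		if (ret == BTRFS_FIEMAP_FLUSH_CACHE) {
			btrfs_release_path(path);
			/* Nothing is locked now, so it is safe to copy to the fiemap
			 * buffer even if it is memory mapped to this same file. */
			ret = flush_fiemap_cache(fieinfo, &cache);
			if (ret)
				goto out;
			len -= cache.next_search_offset - start;
			start = cache.next_search_offset;
			goto restart;
		}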
Fixes: b0ad381fa769 ("btrfs: fix deadlock with fiemap and extent locking") Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/extent_io.c | 221 +++++++++++++++++++++++++++++++------------ 1 file changed, 160 insertions(+), 61 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 45d427c3033d7..5acb2cb79d4bf 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2410,12 +2410,65 @@ int try_release_extent_mapping(struct page *page, gfp_t mask) return try_release_extent_state(tree, page, mask); }
+struct btrfs_fiemap_entry { + u64 offset; + u64 phys; + u64 len; + u32 flags; +}; + /* - * To cache previous fiemap extent + * Indicate the caller of emit_fiemap_extent() that it needs to unlock the file + * range from the inode's io tree, unlock the subvolume tree search path, flush + * the fiemap cache and relock the file range and research the subvolume tree. + * The value here is something negative that can't be confused with a valid + * errno value and different from 1 because that's also a return value from + * fiemap_fill_next_extent() and also it's often used to mean some btree search + * did not find a key, so make it some distinct negative value. + */ +#define BTRFS_FIEMAP_FLUSH_CACHE (-(MAX_ERRNO + 1)) + +/* + * Used to: + * + * - Cache the next entry to be emitted to the fiemap buffer, so that we can + * merge extents that are contiguous and can be grouped as a single one; * - * Will be used for merging fiemap extent + * - Store extents ready to be written to the fiemap buffer in an intermediary + * buffer. This intermediary buffer is to ensure that in case the fiemap + * buffer is memory mapped to the fiemap target file, we don't deadlock + * during btrfs_page_mkwrite(). This is because during fiemap we are locking + * an extent range in order to prevent races with delalloc flushing and + * ordered extent completion, which is needed in order to reliably detect + * delalloc in holes and prealloc extents. And this can lead to a deadlock + * if the fiemap buffer is memory mapped to the file we are running fiemap + * against (a silly, useless in practice scenario, but possible) because + * btrfs_page_mkwrite() will try to lock the same extent range. */ struct fiemap_cache { + /* An array of ready fiemap entries. */ + struct btrfs_fiemap_entry *entries; + /* Number of entries in the entries array. */ + int entries_size; + /* Index of the next entry in the entries array to write to. */ + int entries_pos; + /* + * Once the entries array is full, this indicates what's the offset for + * the next file extent item we must search for in the inode's subvolume + * tree after unlocking the extent range in the inode's io tree and + * releasing the search path. + */ + u64 next_search_offset; + /* + * This matches struct fiemap_extent_info::fi_mapped_extents, we use it + * to count ourselves emitted extents and stop instead of relying on + * fiemap_fill_next_extent() because we buffer ready fiemap entries at + * the @entries array, and we want to stop as soon as we hit the max + * amount of extents to map, not just to save time but also to make the + * logic at extent_fiemap() simpler. + */ + unsigned int extents_mapped; + /* Fields for the cached extent (unsubmitted, not ready, extent). */ u64 offset; u64 phys; u64 len; @@ -2423,6 +2476,28 @@ struct fiemap_cache { bool cached; };
+static int flush_fiemap_cache(struct fiemap_extent_info *fieinfo, + struct fiemap_cache *cache) +{ + for (int i = 0; i < cache->entries_pos; i++) { + struct btrfs_fiemap_entry *entry = &cache->entries[i]; + int ret; + + ret = fiemap_fill_next_extent(fieinfo, entry->offset, + entry->phys, entry->len, + entry->flags); + /* + * Ignore 1 (reached max entries) because we keep track of that + * ourselves in emit_fiemap_extent(). + */ + if (ret < 0) + return ret; + } + cache->entries_pos = 0; + + return 0; +} + /* * Helper to submit fiemap extent. * @@ -2437,8 +2512,8 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo, struct fiemap_cache *cache, u64 offset, u64 phys, u64 len, u32 flags) { + struct btrfs_fiemap_entry *entry; u64 cache_end; - int ret = 0;
/* Set at the end of extent_fiemap(). */ ASSERT((flags & FIEMAP_EXTENT_LAST) == 0); @@ -2451,7 +2526,9 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo, * find an extent that starts at an offset behind the end offset of the * previous extent we processed. This happens if fiemap is called * without FIEMAP_FLAG_SYNC and there are ordered extents completing - * while we call btrfs_next_leaf() (through fiemap_next_leaf_item()). + * after we had to unlock the file range, release the search path, emit + * the fiemap extents stored in the buffer (cache->entries array) and + * the lock the remainder of the range and re-search the btree. * * For example we are in leaf X processing its last item, which is the * file extent item for file range [512K, 1M[, and after @@ -2564,11 +2641,35 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
emit: /* Not mergeable, need to submit cached one */ - ret = fiemap_fill_next_extent(fieinfo, cache->offset, cache->phys, - cache->len, cache->flags); - cache->cached = false; - if (ret) - return ret; + + if (cache->entries_pos == cache->entries_size) { + /* + * We will need to research for the end offset of the last + * stored extent and not from the current offset, because after + * unlocking the range and releasing the path, if there's a hole + * between that end offset and this current offset, a new extent + * may have been inserted due to a new write, so we don't want + * to miss it. + */ + entry = &cache->entries[cache->entries_size - 1]; + cache->next_search_offset = entry->offset + entry->len; + cache->cached = false; + + return BTRFS_FIEMAP_FLUSH_CACHE; + } + + entry = &cache->entries[cache->entries_pos]; + entry->offset = cache->offset; + entry->phys = cache->phys; + entry->len = cache->len; + entry->flags = cache->flags; + cache->entries_pos++; + cache->extents_mapped++; + + if (cache->extents_mapped == fieinfo->fi_extents_max) { + cache->cached = false; + return 1; + } assign: cache->cached = true; cache->offset = offset; @@ -2694,8 +2795,8 @@ static int fiemap_search_slot(struct btrfs_inode *inode, struct btrfs_path *path * neighbour leaf). * We also need the private clone because holding a read lock on an * extent buffer of the subvolume's b+tree will make lockdep unhappy - * when we call fiemap_fill_next_extent(), because that may cause a page - * fault when filling the user space buffer with fiemap data. + * when we check if extents are shared, as backref walking may need to + * lock the same leaf we are processing. */ clone = btrfs_clone_extent_buffer(path->nodes[0]); if (!clone) @@ -2735,34 +2836,16 @@ static int fiemap_process_hole(struct btrfs_inode *inode, * it beyond i_size. */ while (cur_offset < end && cur_offset < i_size) { - struct extent_state *cached_state = NULL; u64 delalloc_start; u64 delalloc_end; u64 prealloc_start; - u64 lockstart; - u64 lockend; u64 prealloc_len = 0; bool delalloc;
- lockstart = round_down(cur_offset, inode->root->fs_info->sectorsize); - lockend = round_up(end, inode->root->fs_info->sectorsize); - - /* - * We are only locking for the delalloc range because that's the - * only thing that can change here. With fiemap we have a lock - * on the inode, so no buffered or direct writes can happen. - * - * However mmaps and normal page writeback will cause this to - * change arbitrarily. We have to lock the extent lock here to - * make sure that nobody messes with the tree while we're doing - * btrfs_find_delalloc_in_range. - */ - lock_extent(&inode->io_tree, lockstart, lockend, &cached_state); delalloc = btrfs_find_delalloc_in_range(inode, cur_offset, end, delalloc_cached_state, &delalloc_start, &delalloc_end); - unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state); if (!delalloc) break;
@@ -2930,6 +3013,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, u64 start, u64 len) { const u64 ino = btrfs_ino(inode); + struct extent_state *cached_state = NULL; struct extent_state *delalloc_cached_state = NULL; struct btrfs_path *path; struct fiemap_cache cache = { 0 }; @@ -2942,26 +3026,33 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, bool stopped = false; int ret;
+ cache.entries_size = PAGE_SIZE / sizeof(struct btrfs_fiemap_entry); + cache.entries = kmalloc_array(cache.entries_size, + sizeof(struct btrfs_fiemap_entry), + GFP_KERNEL); backref_ctx = btrfs_alloc_backref_share_check_ctx(); path = btrfs_alloc_path(); - if (!backref_ctx || !path) { + if (!cache.entries || !backref_ctx || !path) { ret = -ENOMEM; goto out; }
+restart: range_start = round_down(start, sectorsize); range_end = round_up(start + len, sectorsize); prev_extent_end = range_start;
+ lock_extent(&inode->io_tree, range_start, range_end, &cached_state); + ret = fiemap_find_last_extent_offset(inode, path, &last_extent_end); if (ret < 0) - goto out; + goto out_unlock; btrfs_release_path(path);
path->reada = READA_FORWARD; ret = fiemap_search_slot(inode, path, range_start); if (ret < 0) { - goto out; + goto out_unlock; } else if (ret > 0) { /* * No file extent item found, but we may have delalloc between @@ -3008,7 +3099,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, backref_ctx, 0, 0, 0, prev_extent_end, hole_end); if (ret < 0) { - goto out; + goto out_unlock; } else if (ret > 0) { /* fiemap_fill_next_extent() told us to stop. */ stopped = true; @@ -3064,7 +3155,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, extent_gen, backref_ctx); if (ret < 0) - goto out; + goto out_unlock; else if (ret > 0) flags |= FIEMAP_EXTENT_SHARED; } @@ -3075,9 +3166,9 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, }
if (ret < 0) { - goto out; + goto out_unlock; } else if (ret > 0) { - /* fiemap_fill_next_extent() told us to stop. */ + /* emit_fiemap_extent() told us to stop. */ stopped = true; break; } @@ -3086,12 +3177,12 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, next_item: if (fatal_signal_pending(current)) { ret = -EINTR; - goto out; + goto out_unlock; }
ret = fiemap_next_leaf_item(inode, path); if (ret < 0) { - goto out; + goto out_unlock; } else if (ret > 0) { /* No more file extent items for this inode. */ break; @@ -3100,22 +3191,12 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, }
check_eof_delalloc: - /* - * Release (and free) the path before emitting any final entries to - * fiemap_fill_next_extent() to keep lockdep happy. This is because - * once we find no more file extent items exist, we may have a - * non-cloned leaf, and fiemap_fill_next_extent() can trigger page - * faults when copying data to the user space buffer. - */ - btrfs_free_path(path); - path = NULL; - if (!stopped && prev_extent_end < range_end) { ret = fiemap_process_hole(inode, fieinfo, &cache, &delalloc_cached_state, backref_ctx, 0, 0, 0, prev_extent_end, range_end - 1); if (ret < 0) - goto out; + goto out_unlock; prev_extent_end = range_end; }
@@ -3123,28 +3204,16 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, const u64 i_size = i_size_read(&inode->vfs_inode);
if (prev_extent_end < i_size) { - struct extent_state *cached_state = NULL; u64 delalloc_start; u64 delalloc_end; - u64 lockstart; - u64 lockend; bool delalloc;
- lockstart = round_down(prev_extent_end, sectorsize); - lockend = round_up(i_size, sectorsize); - - /* - * See the comment in fiemap_process_hole as to why - * we're doing the locking here. - */ - lock_extent(&inode->io_tree, lockstart, lockend, &cached_state); delalloc = btrfs_find_delalloc_in_range(inode, prev_extent_end, i_size - 1, &delalloc_cached_state, &delalloc_start, &delalloc_end); - unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state); if (!delalloc) cache.flags |= FIEMAP_EXTENT_LAST; } else { @@ -3152,9 +3221,39 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, } }
+out_unlock: + unlock_extent(&inode->io_tree, range_start, range_end, &cached_state); + + if (ret == BTRFS_FIEMAP_FLUSH_CACHE) { + btrfs_release_path(path); + ret = flush_fiemap_cache(fieinfo, &cache); + if (ret) + goto out; + len -= cache.next_search_offset - start; + start = cache.next_search_offset; + goto restart; + } else if (ret < 0) { + goto out; + } + + /* + * Must free the path before emitting to the fiemap buffer because we + * may have a non-cloned leaf and if the fiemap buffer is memory mapped + * to a file, a write into it (through btrfs_page_mkwrite()) may trigger + * waiting for an ordered extent that in order to complete needs to + * modify that leaf, therefore leading to a deadlock. + */ + btrfs_free_path(path); + path = NULL; + + ret = flush_fiemap_cache(fieinfo, &cache); + if (ret) + goto out; + ret = emit_last_fiemap_cache(fieinfo, &cache); out: free_extent_state(delalloc_cached_state); + kfree(cache.entries); btrfs_free_backref_share_ctx(backref_ctx); btrfs_free_path(path); return ret;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov (AMD) bp@alien8.de
[ Upstream commit 30fa92832f405d5ac9f263e99f62445fa3084008 ]
Add X86_FEATURE flags for each Zen generation. They should be used from now on instead of checking f/m/s.
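For example, a check that previously matched family/model ranges directly can, once these flags are set, be written against the synthetic bits instead (a rough before/after sketch; do_zen2_quirk() is a made-up placeholder, and only one of the Zen 2 model ranges is shown):

	/* before: f/m/s check */
	if (c->x86 == 0x17 && c->x86_model >= 0x30 && c->x86_model <= 0x4f)
		do_zen2_quirk(c);

	/* after: generation flag */
	if (boot_cpu_has(X86_FEATURE_ZEN2))
		do_zen2_quirk(c);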
Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Reviewed-by: Nikolay Borisov nik.borisov@suse.com Acked-by: Thomas Gleixner tglx@linutronix.de Link: http://lore.kernel.org/r/20231120104152.13740-2-bp@alien8.de Stable-dep-of: c7b2edd8377b ("perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/cpufeatures.h | 5 ++- arch/x86/kernel/cpu/amd.c | 70 +++++++++++++++++++++++++++++- 2 files changed, 72 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index bd33f6366c80d..1f9db287165ac 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -218,7 +218,7 @@ #define X86_FEATURE_IBRS ( 7*32+25) /* Indirect Branch Restricted Speculation */ #define X86_FEATURE_IBPB ( 7*32+26) /* Indirect Branch Prediction Barrier */ #define X86_FEATURE_STIBP ( 7*32+27) /* Single Thread Indirect Branch Predictors */ -#define X86_FEATURE_ZEN (7*32+28) /* "" CPU based on Zen microarchitecture */ +#define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU based on Zen microarchitecture */ #define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */ #define X86_FEATURE_IBRS_ENHANCED ( 7*32+30) /* Enhanced IBRS */ #define X86_FEATURE_MSR_IA32_FEAT_CTL ( 7*32+31) /* "" MSR IA32_FEAT_CTL configured */ @@ -312,6 +312,9 @@ #define X86_FEATURE_SRSO_ALIAS (11*32+25) /* "" AMD BTB untrain RETs through aliasing */ #define X86_FEATURE_IBPB_ON_VMEXIT (11*32+26) /* "" Issue an IBPB only on VMEXIT */ #define X86_FEATURE_APIC_MSRS_FENCE (11*32+27) /* "" IA32_TSC_DEADLINE and X2APIC MSRs need fencing */ +#define X86_FEATURE_ZEN2 (11*32+28) /* "" CPU based on Zen2 microarchitecture */ +#define X86_FEATURE_ZEN3 (11*32+29) /* "" CPU based on Zen3 microarchitecture */ +#define X86_FEATURE_ZEN4 (11*32+30) /* "" CPU based on Zen4 microarchitecture */
/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */ #define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */ diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index 031bca974fbf3..5391385707b3f 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -620,6 +620,49 @@ static void bsp_init_amd(struct cpuinfo_x86 *c) }
resctrl_cpu_detect(c); + + /* Figure out Zen generations: */ + switch (c->x86) { + case 0x17: { + switch (c->x86_model) { + case 0x00 ... 0x2f: + case 0x50 ... 0x5f: + setup_force_cpu_cap(X86_FEATURE_ZEN); + break; + case 0x30 ... 0x4f: + case 0x60 ... 0x7f: + case 0x90 ... 0x91: + case 0xa0 ... 0xaf: + setup_force_cpu_cap(X86_FEATURE_ZEN2); + break; + default: + goto warn; + } + break; + } + case 0x19: { + switch (c->x86_model) { + case 0x00 ... 0x0f: + case 0x20 ... 0x5f: + setup_force_cpu_cap(X86_FEATURE_ZEN3); + break; + case 0x10 ... 0x1f: + case 0x60 ... 0xaf: + setup_force_cpu_cap(X86_FEATURE_ZEN4); + break; + default: + goto warn; + } + break; + } + default: + break; + } + + return; + +warn: + WARN_ONCE(1, "Family 0x%x, model: 0x%x??\n", c->x86, c->x86_model); }
static void early_detect_mem_encrypt(struct cpuinfo_x86 *c) @@ -978,8 +1021,6 @@ void init_spectral_chicken(struct cpuinfo_x86 *c)
static void init_amd_zn(struct cpuinfo_x86 *c) { - set_cpu_cap(c, X86_FEATURE_ZEN); - #ifdef CONFIG_NUMA node_reclaim_distance = 32; #endif @@ -1042,6 +1083,22 @@ static void zenbleed_check(struct cpuinfo_x86 *c) } }
+static void init_amd_zen(struct cpuinfo_x86 *c) +{ +} + +static void init_amd_zen2(struct cpuinfo_x86 *c) +{ +} + +static void init_amd_zen3(struct cpuinfo_x86 *c) +{ +} + +static void init_amd_zen4(struct cpuinfo_x86 *c) +{ +} + static void init_amd(struct cpuinfo_x86 *c) { early_init_amd(c); @@ -1080,6 +1137,15 @@ static void init_amd(struct cpuinfo_x86 *c) case 0x19: init_amd_zn(c); break; }
+ if (boot_cpu_has(X86_FEATURE_ZEN)) + init_amd_zen(c); + else if (boot_cpu_has(X86_FEATURE_ZEN2)) + init_amd_zen2(c); + else if (boot_cpu_has(X86_FEATURE_ZEN3)) + init_amd_zen3(c); + else if (boot_cpu_has(X86_FEATURE_ZEN4)) + init_amd_zen4(c); + /* * Enable workaround for FXSAVE leak on CPUs * without a XSaveErPtr feature
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov (AMD) bp@alien8.de
[ Upstream commit a7c32a1ae9ee43abfe884f5af376877c4301d166 ]
Call it on the affected CPU generations.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Reviewed-by: Nikolay Borisov nik.borisov@suse.com Link: http://lore.kernel.org/r/20231120104152.13740-3-bp@alien8.de Stable-dep-of: c7b2edd8377b ("perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/cpu/amd.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index 5391385707b3f..28c3a1045b060 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -988,6 +988,19 @@ static void init_amd_bd(struct cpuinfo_x86 *c) clear_rdrand_cpuid_bit(c); }
+static void fix_erratum_1386(struct cpuinfo_x86 *c) +{ + /* + * Work around Erratum 1386. The XSAVES instruction malfunctions in + * certain circumstances on Zen1/2 uarch, and not all parts have had + * updated microcode at the time of writing (March 2023). + * + * Affected parts all have no supervisor XSAVE states, meaning that + * the XSAVEC instruction (which works fine) is equivalent. + */ + clear_cpu_cap(c, X86_FEATURE_XSAVES); +} + void init_spectral_chicken(struct cpuinfo_x86 *c) { #ifdef CONFIG_CPU_UNRET_ENTRY @@ -1008,15 +1021,6 @@ void init_spectral_chicken(struct cpuinfo_x86 *c) } } #endif - /* - * Work around Erratum 1386. The XSAVES instruction malfunctions in - * certain circumstances on Zen1/2 uarch, and not all parts have had - * updated microcode at the time of writing (March 2023). - * - * Affected parts all have no supervisor XSAVE states, meaning that - * the XSAVEC instruction (which works fine) is equivalent. - */ - clear_cpu_cap(c, X86_FEATURE_XSAVES); }
static void init_amd_zn(struct cpuinfo_x86 *c) @@ -1085,10 +1089,12 @@ static void zenbleed_check(struct cpuinfo_x86 *c)
static void init_amd_zen(struct cpuinfo_x86 *c) { + fix_erratum_1386(c); }
static void init_amd_zen2(struct cpuinfo_x86 *c) { + fix_erratum_1386(c); }
static void init_amd_zen3(struct cpuinfo_x86 *c)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov (AMD) bp@alien8.de
[ Upstream commit 0da91912fc150d6d321b15e648bead202ced1a27 ]
No functional changes.
Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Reviewed-by: Nikolay Borisov nik.borisov@suse.com Link: http://lore.kernel.org/r/20231120104152.13740-5-bp@alien8.de Stable-dep-of: c7b2edd8377b ("perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/cpu/amd.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index 28c3a1045b060..71503181bffd0 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -1028,6 +1028,11 @@ static void init_amd_zn(struct cpuinfo_x86 *c) #ifdef CONFIG_NUMA node_reclaim_distance = 32; #endif +} + +static void init_amd_zen(struct cpuinfo_x86 *c) +{ + fix_erratum_1386(c);
/* Fix up CPUID bits, but only if not virtualised. */ if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) { @@ -1087,11 +1092,6 @@ static void zenbleed_check(struct cpuinfo_x86 *c) } }
-static void init_amd_zen(struct cpuinfo_x86 *c) -{ - fix_erratum_1386(c); -} - static void init_amd_zen2(struct cpuinfo_x86 *c) { fix_erratum_1386(c);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov (AMD) bp@alien8.de
[ Upstream commit f69759be251dce722942594fbc62e53a40822a82 ]
Prefix it properly so that it is clear which generation it is dealing with.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Link: http://lore.kernel.org/r/20231120104152.13740-8-bp@alien8.de Stable-dep-of: c7b2edd8377b ("perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/cpu/amd.c | 16 +++------------- 1 file changed, 3 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index 71503181bffd0..d8a0dc01a7db2 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -70,12 +70,6 @@ static const int amd_erratum_383[] = static const int amd_erratum_1054[] = AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x17, 0, 0, 0x2f, 0xf));
-static const int amd_zenbleed[] = - AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x17, 0x30, 0x0, 0x4f, 0xf), - AMD_MODEL_RANGE(0x17, 0x60, 0x0, 0x7f, 0xf), - AMD_MODEL_RANGE(0x17, 0x90, 0x0, 0x91, 0xf), - AMD_MODEL_RANGE(0x17, 0xa0, 0x0, 0xaf, 0xf)); - static const int amd_div0[] = AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x17, 0x00, 0x0, 0x2f, 0xf), AMD_MODEL_RANGE(0x17, 0x50, 0x0, 0x5f, 0xf)); @@ -1073,11 +1067,8 @@ static bool cpu_has_zenbleed_microcode(void) return true; }
-static void zenbleed_check(struct cpuinfo_x86 *c) +static void zen2_zenbleed_check(struct cpuinfo_x86 *c) { - if (!cpu_has_amd_erratum(c, amd_zenbleed)) - return; - if (cpu_has(c, X86_FEATURE_HYPERVISOR)) return;
@@ -1095,6 +1086,7 @@ static void zenbleed_check(struct cpuinfo_x86 *c) static void init_amd_zen2(struct cpuinfo_x86 *c) { fix_erratum_1386(c); + zen2_zenbleed_check(c); }
static void init_amd_zen3(struct cpuinfo_x86 *c) @@ -1219,8 +1211,6 @@ static void init_amd(struct cpuinfo_x86 *c) cpu_has(c, X86_FEATURE_AUTOIBRS)) WARN_ON_ONCE(msr_set_bit(MSR_EFER, _EFER_AUTOIBRS));
- zenbleed_check(c); - if (cpu_has_amd_erratum(c, amd_div0)) { pr_notice_once("AMD Zen1 DIV0 bug detected. Disable SMT for full protection.\n"); setup_force_cpu_bug(X86_BUG_DIV0); @@ -1385,7 +1375,7 @@ static void zenbleed_check_cpu(void *unused) { struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
- zenbleed_check(c); + zen2_zenbleed_check(c); }
void amd_check_microcode(void)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov (AMD) bp@alien8.de
[ Upstream commit bfff3c6692ce64fa9d86eb829d18229c307a0855 ]
No functional changes.
Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Reviewed-by: Nikolay Borisov nik.borisov@suse.com Link: http://lore.kernel.org/r/20231120104152.13740-9-bp@alien8.de Stable-dep-of: c7b2edd8377b ("perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/cpu/amd.c | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index d8a0dc01a7db2..f4373530c7de0 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -70,10 +70,6 @@ static const int amd_erratum_383[] = static const int amd_erratum_1054[] = AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x17, 0, 0, 0x2f, 0xf));
-static const int amd_div0[] = - AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x17, 0x00, 0x0, 0x2f, 0xf), - AMD_MODEL_RANGE(0x17, 0x50, 0x0, 0x5f, 0xf)); - static const int amd_erratum_1485[] = AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x19, 0x10, 0x0, 0x1f, 0xf), AMD_MODEL_RANGE(0x19, 0x60, 0x0, 0xaf, 0xf)); @@ -1043,6 +1039,9 @@ static void init_amd_zen(struct cpuinfo_x86 *c) if (c->x86 == 0x19 && !cpu_has(c, X86_FEATURE_BTC_NO)) set_cpu_cap(c, X86_FEATURE_BTC_NO); } + + pr_notice_once("AMD Zen1 DIV0 bug detected. Disable SMT for full protection.\n"); + setup_force_cpu_bug(X86_BUG_DIV0); }
static bool cpu_has_zenbleed_microcode(void) @@ -1211,11 +1210,6 @@ static void init_amd(struct cpuinfo_x86 *c) cpu_has(c, X86_FEATURE_AUTOIBRS)) WARN_ON_ONCE(msr_set_bit(MSR_EFER, _EFER_AUTOIBRS));
- if (cpu_has_amd_erratum(c, amd_div0)) { - pr_notice_once("AMD Zen1 DIV0 bug detected. Disable SMT for full protection.\n"); - setup_force_cpu_bug(X86_BUG_DIV0); - } - if (!cpu_has(c, X86_FEATURE_HYPERVISOR) && cpu_has_amd_erratum(c, amd_erratum_1485)) msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov (AMD) bp@alien8.de
[ Upstream commit 54c33e23f75d5c9925495231c57d3319336722ef ]
No functional changes.
Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Reviewed-by: Nikolay Borisov nik.borisov@suse.com Link: http://lore.kernel.org/r/20231120104152.13740-10-bp@alien8.de Stable-dep-of: c7b2edd8377b ("perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/cpu/amd.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index f4373530c7de0..98fa23ef97df2 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -66,10 +66,6 @@ static const int amd_erratum_400[] = static const int amd_erratum_383[] = AMD_OSVW_ERRATUM(3, AMD_MODEL_RANGE(0x10, 0, 0, 0xff, 0xf));
-/* #1054: Instructions Retired Performance Counter May Be Inaccurate */ -static const int amd_erratum_1054[] = - AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x17, 0, 0, 0x2f, 0xf)); - static const int amd_erratum_1485[] = AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x19, 0x10, 0x0, 0x1f, 0xf), AMD_MODEL_RANGE(0x19, 0x60, 0x0, 0xaf, 0xf)); @@ -1194,7 +1190,7 @@ static void init_amd(struct cpuinfo_x86 *c) * Counter May Be Inaccurate". */ if (cpu_has(c, X86_FEATURE_IRPERF) && - !cpu_has_amd_erratum(c, amd_erratum_1054)) + (boot_cpu_has(X86_FEATURE_ZEN) && c->x86_model > 0x2f)) msr_set_bit(MSR_K7_HWCR, MSR_K7_HWCR_IRPERF_EN_BIT);
check_null_seg_clears_base(c);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov (AMD) bp@alien8.de
[ Upstream commit 232afb557835d6f6859c73bf610bad308c96b131 ]
Add a synthetic feature flag specifically for first-generation Zen machines. There is a need for a generic flag covering all Zen generations, so make X86_FEATURE_ZEN be that flag.
Fixes: 30fa92832f40 ("x86/CPU/AMD: Add ZenX generations flags") Suggested-by: Brian Gerst brgerst@gmail.com Suggested-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Link: https://lore.kernel.org/r/dc3835e3-0731-4230-bbb9-336bbe3d042b@amd.com Stable-dep-of: c7b2edd8377b ("perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/cpufeatures.h | 3 ++- arch/x86/kernel/cpu/amd.c | 11 ++++++----- tools/arch/x86/include/asm/cpufeatures.h | 2 +- 3 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 1f9db287165ac..bc66aec9139ea 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -218,7 +218,7 @@ #define X86_FEATURE_IBRS ( 7*32+25) /* Indirect Branch Restricted Speculation */ #define X86_FEATURE_IBPB ( 7*32+26) /* Indirect Branch Prediction Barrier */ #define X86_FEATURE_STIBP ( 7*32+27) /* Single Thread Indirect Branch Predictors */ -#define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU based on Zen microarchitecture */ +#define X86_FEATURE_ZEN ( 7*32+28) /* "" Generic flag for all Zen and newer */ #define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */ #define X86_FEATURE_IBRS_ENHANCED ( 7*32+30) /* Enhanced IBRS */ #define X86_FEATURE_MSR_IA32_FEAT_CTL ( 7*32+31) /* "" MSR IA32_FEAT_CTL configured */ @@ -315,6 +315,7 @@ #define X86_FEATURE_ZEN2 (11*32+28) /* "" CPU based on Zen2 microarchitecture */ #define X86_FEATURE_ZEN3 (11*32+29) /* "" CPU based on Zen3 microarchitecture */ #define X86_FEATURE_ZEN4 (11*32+30) /* "" CPU based on Zen4 microarchitecture */ +#define X86_FEATURE_ZEN1 (11*32+31) /* "" CPU based on Zen1 microarchitecture */
/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */ #define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */ diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index 98fa23ef97df2..9fd91022d92d0 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -613,7 +613,7 @@ static void bsp_init_amd(struct cpuinfo_x86 *c) switch (c->x86_model) { case 0x00 ... 0x2f: case 0x50 ... 0x5f: - setup_force_cpu_cap(X86_FEATURE_ZEN); + setup_force_cpu_cap(X86_FEATURE_ZEN1); break; case 0x30 ... 0x4f: case 0x60 ... 0x7f: @@ -1011,12 +1011,13 @@ void init_spectral_chicken(struct cpuinfo_x86 *c)
static void init_amd_zn(struct cpuinfo_x86 *c) { + setup_force_cpu_cap(X86_FEATURE_ZEN); #ifdef CONFIG_NUMA node_reclaim_distance = 32; #endif }
-static void init_amd_zen(struct cpuinfo_x86 *c) +static void init_amd_zen1(struct cpuinfo_x86 *c) { fix_erratum_1386(c);
@@ -1130,8 +1131,8 @@ static void init_amd(struct cpuinfo_x86 *c) case 0x19: init_amd_zn(c); break; }
- if (boot_cpu_has(X86_FEATURE_ZEN)) - init_amd_zen(c); + if (boot_cpu_has(X86_FEATURE_ZEN1)) + init_amd_zen1(c); else if (boot_cpu_has(X86_FEATURE_ZEN2)) init_amd_zen2(c); else if (boot_cpu_has(X86_FEATURE_ZEN3)) @@ -1190,7 +1191,7 @@ static void init_amd(struct cpuinfo_x86 *c) * Counter May Be Inaccurate". */ if (cpu_has(c, X86_FEATURE_IRPERF) && - (boot_cpu_has(X86_FEATURE_ZEN) && c->x86_model > 0x2f)) + (boot_cpu_has(X86_FEATURE_ZEN1) && c->x86_model > 0x2f)) msr_set_bit(MSR_K7_HWCR, MSR_K7_HWCR_IRPERF_EN_BIT);
check_null_seg_clears_base(c); diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h index 798e60b5454b7..845a4023ba44e 100644 --- a/tools/arch/x86/include/asm/cpufeatures.h +++ b/tools/arch/x86/include/asm/cpufeatures.h @@ -219,7 +219,7 @@ #define X86_FEATURE_IBRS ( 7*32+25) /* Indirect Branch Restricted Speculation */ #define X86_FEATURE_IBPB ( 7*32+26) /* Indirect Branch Prediction Barrier */ #define X86_FEATURE_STIBP ( 7*32+27) /* Single Thread Indirect Branch Predictors */ -#define X86_FEATURE_ZEN (7*32+28) /* "" CPU based on Zen microarchitecture */ +#define X86_FEATURE_ZEN ( 7*32+28) /* "" Generic flag for all Zen and newer */ #define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */ #define X86_FEATURE_IBRS_ENHANCED ( 7*32+30) /* Enhanced IBRS */ #define X86_FEATURE_MSR_IA32_FEAT_CTL ( 7*32+31) /* "" MSR IA32_FEAT_CTL configured */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sandipan Das sandipan.das@amd.com
[ Upstream commit c7b2edd8377be983442c1344cb940cd2ac21b601 ]
AMD processors based on Zen 2 and later microarchitectures do not support PMCx087 (instruction pipe stalls), which is used as the backing event for "stalled-cycles-frontend" and "stalled-cycles-backend".
Use PMCx0A9 (cycles where micro-op queue is empty) instead to count frontend stalls and remove the entry for backend stalls since there is no direct replacement.
Signed-off-by: Sandipan Das sandipan.das@amd.com Signed-off-by: Ingo Molnar mingo@kernel.org Reviewed-by: Ian Rogers irogers@google.com Fixes: 3fe3331bb285 ("perf/x86/amd: Add event map for AMD Family 17h") Link: https://lore.kernel.org/r/03d7fc8fa2a28f9be732116009025bdec1b3ec97.171135218... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/events/amd/core.c | 20 +++++++++++++++++--- 1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c index 5365d6acbf090..b30349eeb7678 100644 --- a/arch/x86/events/amd/core.c +++ b/arch/x86/events/amd/core.c @@ -250,7 +250,7 @@ static const u64 amd_perfmon_event_map[PERF_COUNT_HW_MAX] = /* * AMD Performance Monitor Family 17h and later: */ -static const u64 amd_f17h_perfmon_event_map[PERF_COUNT_HW_MAX] = +static const u64 amd_zen1_perfmon_event_map[PERF_COUNT_HW_MAX] = { [PERF_COUNT_HW_CPU_CYCLES] = 0x0076, [PERF_COUNT_HW_INSTRUCTIONS] = 0x00c0, @@ -262,10 +262,24 @@ static const u64 amd_f17h_perfmon_event_map[PERF_COUNT_HW_MAX] = [PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = 0x0187, };
+static const u64 amd_zen2_perfmon_event_map[PERF_COUNT_HW_MAX] = +{ + [PERF_COUNT_HW_CPU_CYCLES] = 0x0076, + [PERF_COUNT_HW_INSTRUCTIONS] = 0x00c0, + [PERF_COUNT_HW_CACHE_REFERENCES] = 0xff60, + [PERF_COUNT_HW_CACHE_MISSES] = 0x0964, + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x00c2, + [PERF_COUNT_HW_BRANCH_MISSES] = 0x00c3, + [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x00a9, +}; + static u64 amd_pmu_event_map(int hw_event) { - if (boot_cpu_data.x86 >= 0x17) - return amd_f17h_perfmon_event_map[hw_event]; + if (cpu_feature_enabled(X86_FEATURE_ZEN2) || boot_cpu_data.x86 >= 0x19) + return amd_zen2_perfmon_event_map[hw_event]; + + if (cpu_feature_enabled(X86_FEATURE_ZEN1)) + return amd_zen1_perfmon_event_map[hw_event];
return amd_perfmon_event_map[hw_event]; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sandipan Das sandipan.das@amd.com
[ Upstream commit 7f274e609f3d5f45c22b1dd59053f6764458b492 ]
Add a new word for scattered features because all free bits among the existing Linux-defined auxiliary flags have been exhausted.
Signed-off-by: Sandipan Das sandipan.das@amd.com Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lore.kernel.org/r/8380d2a0da469a1f0ad75b8954a79fb689599ff6.171109158... Stable-dep-of: 598c2fafc06f ("perf/x86/amd/lbr: Use freeze based on availability") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/cpufeature.h | 6 ++++-- arch/x86/include/asm/cpufeatures.h | 2 +- arch/x86/include/asm/disabled-features.h | 3 ++- arch/x86/include/asm/required-features.h | 3 ++- 4 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index a1273698fc430..42157ddcc09d4 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -91,8 +91,9 @@ extern const char * const x86_bug_flags[NBUGINTS*32]; CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 18, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 19, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 20, feature_bit) || \ + CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 21, feature_bit) || \ REQUIRED_MASK_CHECK || \ - BUILD_BUG_ON_ZERO(NCAPINTS != 21)) + BUILD_BUG_ON_ZERO(NCAPINTS != 22))
#define DISABLED_MASK_BIT_SET(feature_bit) \ ( CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 0, feature_bit) || \ @@ -116,8 +117,9 @@ extern const char * const x86_bug_flags[NBUGINTS*32]; CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 18, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 19, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 20, feature_bit) || \ + CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 21, feature_bit) || \ DISABLED_MASK_CHECK || \ - BUILD_BUG_ON_ZERO(NCAPINTS != 21)) + BUILD_BUG_ON_ZERO(NCAPINTS != 22))
#define cpu_has(c, bit) \ (__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 : \ diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index bc66aec9139ea..a42db7bbe5933 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -13,7 +13,7 @@ /* * Defines x86 CPU feature bits */ -#define NCAPINTS 21 /* N 32-bit words worth of info */ +#define NCAPINTS 22 /* N 32-bit words worth of info */ #define NBUGINTS 2 /* N 32-bit bug flags */
/* diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index 702d93fdd10e8..88fcf08458d9c 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -143,6 +143,7 @@ #define DISABLED_MASK18 (DISABLE_IBT) #define DISABLED_MASK19 0 #define DISABLED_MASK20 0 -#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 21) +#define DISABLED_MASK21 0 +#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 22)
#endif /* _ASM_X86_DISABLED_FEATURES_H */ diff --git a/arch/x86/include/asm/required-features.h b/arch/x86/include/asm/required-features.h index 7ba1726b71c7b..e9187ddd3d1fd 100644 --- a/arch/x86/include/asm/required-features.h +++ b/arch/x86/include/asm/required-features.h @@ -99,6 +99,7 @@ #define REQUIRED_MASK18 0 #define REQUIRED_MASK19 0 #define REQUIRED_MASK20 0 -#define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 21) +#define REQUIRED_MASK21 0 +#define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 22)
#endif /* _ASM_X86_REQUIRED_FEATURES_H */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sandipan Das sandipan.das@amd.com
[ Upstream commit 598c2fafc06fe5c56a1a415fb7b544b31453d637 ]
Currently, the LBR code assumes that LBR Freeze is supported on all processors when X86_FEATURE_AMD_LBR_V2 is available i.e. CPUID leaf 0x80000022[EAX] bit 1 is set. This is incorrect as the availability of the feature is additionally dependent on CPUID leaf 0x80000022[EAX] bit 2 being set, which may not be set for all Zen 4 processors.
Define a new feature bit for LBR and PMC freeze and set the freeze enable bit (FLBRI) in DebugCtl (MSR 0x1d9) conditionally.
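Concretely, the DebugCtl freeze bit is then only touched when the new flag is present, e.g. on the enable path (condensed from the lbr.c hunk below):

	if (cpu_feature_enabled(X86_FEATURE_AMD_LBR_PMC_FREEZE)) {
		rdmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl);
		wrmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI);
	}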
It should still be possible to use LBR without freeze for profile-guided optimization of user programs by using a user-only branch filter during profiling. When the user-only filter is enabled, branches are no longer recorded after the transition to CPL 0 upon PMI arrival. When branch entries are read in the PMI handler, the branch stack does not change.
E.g.
$ perf record -j any,u -e ex_ret_brn_tkn ./workload
Since the feature bit is visible under flags in /proc/cpuinfo, it can be used to determine the feasibility of use cases which require LBR Freeze to be supported by the hardware, such as profile-guided optimization of kernels.
Fixes: ca5b7c0d9621 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support") Signed-off-by: Sandipan Das sandipan.das@amd.com Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lore.kernel.org/r/69a453c97cfd11c6f2584b19f937fe6df741510f.171109158... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/events/amd/core.c | 4 ++-- arch/x86/events/amd/lbr.c | 16 ++++++++++------ arch/x86/include/asm/cpufeatures.h | 8 ++++++++ arch/x86/kernel/cpu/scattered.c | 1 + 4 files changed, 21 insertions(+), 8 deletions(-)
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c index b30349eeb7678..8ed10366c4a27 100644 --- a/arch/x86/events/amd/core.c +++ b/arch/x86/events/amd/core.c @@ -918,8 +918,8 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs) if (!status) goto done;
- /* Read branch records before unfreezing */ - if (status & GLOBAL_STATUS_LBRS_FROZEN) { + /* Read branch records */ + if (x86_pmu.lbr_nr) { amd_pmu_lbr_read(); status &= ~GLOBAL_STATUS_LBRS_FROZEN; } diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c index eb31f850841a8..110e34c59643a 100644 --- a/arch/x86/events/amd/lbr.c +++ b/arch/x86/events/amd/lbr.c @@ -400,10 +400,12 @@ void amd_pmu_lbr_enable_all(void) wrmsrl(MSR_AMD64_LBR_SELECT, lbr_select); }
- rdmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl); - rdmsrl(MSR_AMD_DBG_EXTN_CFG, dbg_extn_cfg); + if (cpu_feature_enabled(X86_FEATURE_AMD_LBR_PMC_FREEZE)) { + rdmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl); + wrmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI); + }
- wrmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI); + rdmsrl(MSR_AMD_DBG_EXTN_CFG, dbg_extn_cfg); wrmsrl(MSR_AMD_DBG_EXTN_CFG, dbg_extn_cfg | DBG_EXTN_CFG_LBRV2EN); }
@@ -416,10 +418,12 @@ void amd_pmu_lbr_disable_all(void) return;
rdmsrl(MSR_AMD_DBG_EXTN_CFG, dbg_extn_cfg); - rdmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl); - wrmsrl(MSR_AMD_DBG_EXTN_CFG, dbg_extn_cfg & ~DBG_EXTN_CFG_LBRV2EN); - wrmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl & ~DEBUGCTLMSR_FREEZE_LBRS_ON_PMI); + + if (cpu_feature_enabled(X86_FEATURE_AMD_LBR_PMC_FREEZE)) { + rdmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl); + wrmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl & ~DEBUGCTLMSR_FREEZE_LBRS_ON_PMI); + } }
__init int amd_pmu_lbr_init(void) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index a42db7bbe5933..9b2b99670d364 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -456,6 +456,14 @@ #define X86_FEATURE_IBPB_BRTYPE (20*32+28) /* "" MSR_PRED_CMD[IBPB] flushes all branch type predictions */ #define X86_FEATURE_SRSO_NO (20*32+29) /* "" CPU is not affected by SRSO */
+/* + * Extended auxiliary flags: Linux defined - for features scattered in various + * CPUID levels like 0x80000022, etc. + * + * Reuse free bits when adding new feature flags! + */ +#define X86_FEATURE_AMD_LBR_PMC_FREEZE (21*32+ 0) /* AMD LBR and PMC Freeze */ + /* * BUG word(s) */ diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c index 0dad49a09b7a9..a515328d9d7d8 100644 --- a/arch/x86/kernel/cpu/scattered.c +++ b/arch/x86/kernel/cpu/scattered.c @@ -49,6 +49,7 @@ static const struct cpuid_bit cpuid_bits[] = { { X86_FEATURE_BMEC, CPUID_EBX, 3, 0x80000020, 0 }, { X86_FEATURE_PERFMON_V2, CPUID_EAX, 0, 0x80000022, 0 }, { X86_FEATURE_AMD_LBR_V2, CPUID_EAX, 1, 0x80000022, 0 }, + { X86_FEATURE_AMD_LBR_PMC_FREEZE, CPUID_EAX, 2, 0x80000022, 0 }, { 0, 0, 0, 0, 0 } };
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jack Brennen jbrennen@google.com
[ Upstream commit 4074532758c5c367d3fcb8d124150824a254659d ]
Modify modpost to use binary search for converting addresses back into symbol references. Previously it used linear search.
This change saves a few seconds of wall time for defconfig builds, but can save several minutes on allyesconfigs.
Before:
$ make LLVM=1 -j128 allyesconfig vmlinux -s KCFLAGS="-Wno-error"
$ time scripts/mod/modpost -M -m -a -N -o vmlinux.symvers vmlinux.o
198.38user 1.27system 3:19.71elapsed

After:
$ make LLVM=1 -j128 allyesconfig vmlinux -s KCFLAGS="-Wno-error"
$ time scripts/mod/modpost -M -m -a -N -o vmlinux.symvers vmlinux.o
11.91user 0.85system 0:12.78elapsed
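The approach is to build, once per ELF object, a table of the valid symbols sorted by (section index, address) and then answer each address-to-symbol query with a binary search instead of a full scan of the symbol table. As an illustration only (find_le() is a made-up helper, not part of the patch; the real symsearch_find_nearest() added below also handles distance limits and negative offsets), the core lookup is essentially:

	/* Return the last entry in section 'secndx' with addr <= target, or NULL. */
	static const struct syminfo *find_le(const struct syminfo *table,
					     unsigned int n, unsigned int secndx,
					     Elf_Addr target)
	{
		unsigned int lo = 0, hi = n;
		const struct syminfo *best = NULL;

		while (lo < hi) {
			unsigned int mid = lo + (hi - lo) / 2;

			if (table[mid].section_index < secndx ||
			    (table[mid].section_index == secndx &&
			     table[mid].addr <= target)) {
				if (table[mid].section_index == secndx)
					best = &table[mid];
				lo = mid + 1;	/* candidate found, look further right */
			} else {
				hi = mid;
			}
		}
		return best;
	}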
Signed-off-by: Jack Brennen jbrennen@google.com Tested-by: Nick Desaulniers ndesaulniers@google.com Signed-off-by: Masahiro Yamada masahiroy@kernel.org Stable-dep-of: 1102f9f85bf6 ("modpost: do not make find_tosym() return NULL") Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/mod/Makefile | 4 +- scripts/mod/modpost.c | 70 ++------------ scripts/mod/modpost.h | 25 +++++ scripts/mod/symsearch.c | 199 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 232 insertions(+), 66 deletions(-) create mode 100644 scripts/mod/symsearch.c
diff --git a/scripts/mod/Makefile b/scripts/mod/Makefile index c9e38ad937fd4..3c54125eb3733 100644 --- a/scripts/mod/Makefile +++ b/scripts/mod/Makefile @@ -5,7 +5,7 @@ CFLAGS_REMOVE_empty.o += $(CC_FLAGS_LTO) hostprogs-always-y += modpost mk_elfconfig always-y += empty.o
-modpost-objs := modpost.o file2alias.o sumversion.o +modpost-objs := modpost.o file2alias.o sumversion.o symsearch.o
devicetable-offsets-file := devicetable-offsets.h
@@ -16,7 +16,7 @@ targets += $(devicetable-offsets-file) devicetable-offsets.s
# dependencies on generated files need to be listed explicitly
-$(obj)/modpost.o $(obj)/file2alias.o $(obj)/sumversion.o: $(obj)/elfconfig.h +$(obj)/modpost.o $(obj)/file2alias.o $(obj)/sumversion.o $(obj)/symsearch.o: $(obj)/elfconfig.h $(obj)/file2alias.o: $(obj)/$(devicetable-offsets-file)
quiet_cmd_elfconfig = MKELF $@ diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c index 5191fdbd3fa23..66589fb4e9aef 100644 --- a/scripts/mod/modpost.c +++ b/scripts/mod/modpost.c @@ -22,7 +22,6 @@ #include <errno.h> #include "modpost.h" #include "../../include/linux/license.h" -#include "../../include/linux/module_symbol.h"
static bool module_enabled; /* Are we using CONFIG_MODVERSIONS? */ @@ -577,11 +576,14 @@ static int parse_elf(struct elf_info *info, const char *filename) *p = TO_NATIVE(*p); }
+ symsearch_init(info); + return 1; }
static void parse_elf_finish(struct elf_info *info) { + symsearch_finish(info); release_file(info->hdr, info->size); }
@@ -1042,71 +1044,10 @@ static int secref_whitelist(const char *fromsec, const char *fromsym, return 1; }
-/* - * If there's no name there, ignore it; likewise, ignore it if it's - * one of the magic symbols emitted used by current tools. - * - * Otherwise if find_symbols_between() returns those symbols, they'll - * fail the whitelist tests and cause lots of false alarms ... fixable - * only by merging __exit and __init sections into __text, bloating - * the kernel (which is especially evil on embedded platforms). - */ -static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym) -{ - const char *name = elf->strtab + sym->st_name; - - if (!name || !strlen(name)) - return 0; - return !is_mapping_symbol(name); -} - -/* Look up the nearest symbol based on the section and the address */ -static Elf_Sym *find_nearest_sym(struct elf_info *elf, Elf_Addr addr, - unsigned int secndx, bool allow_negative, - Elf_Addr min_distance) -{ - Elf_Sym *sym; - Elf_Sym *near = NULL; - Elf_Addr sym_addr, distance; - bool is_arm = (elf->hdr->e_machine == EM_ARM); - - for (sym = elf->symtab_start; sym < elf->symtab_stop; sym++) { - if (get_secindex(elf, sym) != secndx) - continue; - if (!is_valid_name(elf, sym)) - continue; - - sym_addr = sym->st_value; - - /* - * For ARM Thumb instruction, the bit 0 of st_value is set - * if the symbol is STT_FUNC type. Mask it to get the address. - */ - if (is_arm && ELF_ST_TYPE(sym->st_info) == STT_FUNC) - sym_addr &= ~1; - - if (addr >= sym_addr) - distance = addr - sym_addr; - else if (allow_negative) - distance = sym_addr - addr; - else - continue; - - if (distance <= min_distance) { - min_distance = distance; - near = sym; - } - - if (min_distance == 0) - break; - } - return near; -} - static Elf_Sym *find_fromsym(struct elf_info *elf, Elf_Addr addr, unsigned int secndx) { - return find_nearest_sym(elf, addr, secndx, false, ~0); + return symsearch_find_nearest(elf, addr, secndx, false, ~0); }
static Elf_Sym *find_tosym(struct elf_info *elf, Elf_Addr addr, Elf_Sym *sym) @@ -1119,7 +1060,8 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf_Addr addr, Elf_Sym *sym) * Strive to find a better symbol name, but the resulting name may not * match the symbol referenced in the original code. */ - return find_nearest_sym(elf, addr, get_secindex(elf, sym), true, 20); + return symsearch_find_nearest(elf, addr, get_secindex(elf, sym), + true, 20); }
static bool is_executable_section(struct elf_info *elf, unsigned int secndx) diff --git a/scripts/mod/modpost.h b/scripts/mod/modpost.h index 5f94c2c9f2d95..6413f26fcb6b4 100644 --- a/scripts/mod/modpost.h +++ b/scripts/mod/modpost.h @@ -10,6 +10,7 @@ #include <fcntl.h> #include <unistd.h> #include <elf.h> +#include "../../include/linux/module_symbol.h"
#include "list.h" #include "elfconfig.h" @@ -128,6 +129,8 @@ struct elf_info { * take shndx from symtab_shndx_start[N] instead */ Elf32_Word *symtab_shndx_start; Elf32_Word *symtab_shndx_stop; + + struct symsearch *symsearch; };
/* Accessor for sym->st_shndx, hides ugliness of "64k sections" */ @@ -154,6 +157,28 @@ static inline unsigned int get_secindex(const struct elf_info *info, return index; }
+/* + * If there's no name there, ignore it; likewise, ignore it if it's + * one of the magic symbols emitted used by current tools. + * + * Internal symbols created by tools should be ignored by modpost. + */ +static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym) +{ + const char *name = elf->strtab + sym->st_name; + + if (!name || !strlen(name)) + return 0; + return !is_mapping_symbol(name); +} + +/* symsearch.c */ +void symsearch_init(struct elf_info *elf); +void symsearch_finish(struct elf_info *elf); +Elf_Sym *symsearch_find_nearest(struct elf_info *elf, Elf_Addr addr, + unsigned int secndx, bool allow_negative, + Elf_Addr min_distance); + /* file2alias.c */ void handle_moddevtable(struct module *mod, struct elf_info *info, Elf_Sym *sym, const char *symname); diff --git a/scripts/mod/symsearch.c b/scripts/mod/symsearch.c new file mode 100644 index 0000000000000..aa4ed51f9960c --- /dev/null +++ b/scripts/mod/symsearch.c @@ -0,0 +1,199 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Helper functions for finding the symbol in an ELF which is "nearest" + * to a given address. + */ + +#include "modpost.h" + +struct syminfo { + unsigned int symbol_index; + unsigned int section_index; + Elf_Addr addr; +}; + +/* + * Container used to hold an entire binary search table. + * Entries in table are ascending, sorted first by section_index, + * then by addr, and last by symbol_index. The sorting by + * symbol_index is used to ensure predictable behavior when + * multiple symbols are present with the same address; all + * symbols past the first are effectively ignored, by eliding + * them in symsearch_fixup(). + */ +struct symsearch { + unsigned int table_size; + struct syminfo table[]; +}; + +static int syminfo_compare(const void *s1, const void *s2) +{ + const struct syminfo *sym1 = s1; + const struct syminfo *sym2 = s2; + + if (sym1->section_index > sym2->section_index) + return 1; + if (sym1->section_index < sym2->section_index) + return -1; + if (sym1->addr > sym2->addr) + return 1; + if (sym1->addr < sym2->addr) + return -1; + if (sym1->symbol_index > sym2->symbol_index) + return 1; + if (sym1->symbol_index < sym2->symbol_index) + return -1; + return 0; +} + +static unsigned int symbol_count(struct elf_info *elf) +{ + unsigned int result = 0; + + for (Elf_Sym *sym = elf->symtab_start; sym < elf->symtab_stop; sym++) { + if (is_valid_name(elf, sym)) + result++; + } + return result; +} + +/* + * Populate the search array that we just allocated. + * Be slightly paranoid here. The ELF file is mmap'd and could + * conceivably change between symbol_count() and symsearch_populate(). + * If we notice any difference, bail out rather than potentially + * propagating errors or crashing. + */ +static void symsearch_populate(struct elf_info *elf, + struct syminfo *table, + unsigned int table_size) +{ + bool is_arm = (elf->hdr->e_machine == EM_ARM); + + for (Elf_Sym *sym = elf->symtab_start; sym < elf->symtab_stop; sym++) { + if (is_valid_name(elf, sym)) { + if (table_size-- == 0) + fatal("%s: size mismatch\n", __func__); + table->symbol_index = sym - elf->symtab_start; + table->section_index = get_secindex(elf, sym); + table->addr = sym->st_value; + + /* + * For ARM Thumb instruction, the bit 0 of st_value is + * set if the symbol is STT_FUNC type. Mask it to get + * the address. 
+ */ + if (is_arm && ELF_ST_TYPE(sym->st_info) == STT_FUNC) + table->addr &= ~1; + + table++; + } + } + + if (table_size != 0) + fatal("%s: size mismatch\n", __func__); +} + +/* + * Do any fixups on the table after sorting. + * For now, this just finds adjacent entries which have + * the same section_index and addr, and it propagates + * the first symbol_index over the subsequent entries, + * so that only one symbol_index is seen for any given + * section_index and addr. This ensures that whether + * we're looking at an address from "above" or "below" + * that we see the same symbol_index. + * This does leave some duplicate entries in the table; + * in practice, these are a small fraction of the + * total number of entries, and they are harmless to + * the binary search algorithm other than a few occasional + * unnecessary comparisons. + */ +static void symsearch_fixup(struct syminfo *table, unsigned int table_size) +{ + /* Don't look at index 0, it will never change. */ + for (unsigned int i = 1; i < table_size; i++) { + if (table[i].addr == table[i - 1].addr && + table[i].section_index == table[i - 1].section_index) { + table[i].symbol_index = table[i - 1].symbol_index; + } + } +} + +void symsearch_init(struct elf_info *elf) +{ + unsigned int table_size = symbol_count(elf); + + elf->symsearch = NOFAIL(malloc(sizeof(struct symsearch) + + sizeof(struct syminfo) * table_size)); + elf->symsearch->table_size = table_size; + + symsearch_populate(elf, elf->symsearch->table, table_size); + qsort(elf->symsearch->table, table_size, + sizeof(struct syminfo), syminfo_compare); + + symsearch_fixup(elf->symsearch->table, table_size); +} + +void symsearch_finish(struct elf_info *elf) +{ + free(elf->symsearch); + elf->symsearch = NULL; +} + +/* + * Find the syminfo which is in secndx and "nearest" to addr. + * allow_negative: allow returning a symbol whose address is > addr. + * min_distance: ignore symbols which are further away than this. + * + * Returns a pointer into the symbol table for success. + * Returns NULL if no legal symbol is found within the requested range. + */ +Elf_Sym *symsearch_find_nearest(struct elf_info *elf, Elf_Addr addr, + unsigned int secndx, bool allow_negative, + Elf_Addr min_distance) +{ + unsigned int hi = elf->symsearch->table_size; + unsigned int lo = 0; + struct syminfo *table = elf->symsearch->table; + struct syminfo target; + + target.addr = addr; + target.section_index = secndx; + target.symbol_index = ~0; /* compares greater than any actual index */ + while (hi > lo) { + unsigned int mid = lo + (hi - lo) / 2; /* Avoids overflow */ + + if (syminfo_compare(&table[mid], &target) > 0) + hi = mid; + else + lo = mid + 1; + } + + /* + * table[hi], if it exists, is the first entry in the array which + * lies beyond target. table[hi - 1], if it exists, is the last + * entry in the array which comes before target, including the + * case where it perfectly matches the section and the address. + * + * Note -- if the address we're looking up falls perfectly + * in the middle of two symbols, this is written to always + * prefer the symbol with the lower address. 
+ */ + Elf_Sym *result = NULL; + + if (allow_negative && + hi < elf->symsearch->table_size && + table[hi].section_index == secndx && + table[hi].addr - addr <= min_distance) { + min_distance = table[hi].addr - addr; + result = &elf->symtab_start[table[hi].symbol_index]; + } + if (hi > 0 && + table[hi - 1].section_index == secndx && + addr - table[hi - 1].addr <= min_distance) { + result = &elf->symtab_start[table[hi - 1].symbol_index]; + } + return result; +}
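For readers following the symsearch conversion above, here is a minimal, self-contained C sketch of the lookup it performs: an upper-bound binary search over a table sorted by (section, address), followed by a choice between the entry just above and the entry at or below the target. The struct layout, names and the sample table are simplified assumptions for illustration only, not the modpost code itself.

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	struct entry { unsigned int sec; uint64_t addr; };

	/* Return the index of the table entry nearest to (sec, addr),
	 * or -1 if none lies within min_distance in that section.
	 */
	static int find_nearest(const struct entry *t, unsigned int n,
				unsigned int sec, uint64_t addr,
				bool allow_negative, uint64_t min_distance)
	{
		unsigned int lo = 0, hi = n;

		/* upper bound: first entry strictly greater than the target */
		while (hi > lo) {
			unsigned int mid = lo + (hi - lo) / 2;

			if (t[mid].sec > sec ||
			    (t[mid].sec == sec && t[mid].addr > addr))
				hi = mid;
			else
				lo = mid + 1;
		}

		int result = -1;

		/* entry just above the target address */
		if (allow_negative && hi < n && t[hi].sec == sec &&
		    t[hi].addr - addr <= min_distance) {
			min_distance = t[hi].addr - addr;
			result = (int)hi;
		}
		/* entry at or below the target address wins ties */
		if (hi > 0 && t[hi - 1].sec == sec &&
		    addr - t[hi - 1].addr <= min_distance)
			result = (int)(hi - 1);

		return result;
	}

	int main(void)
	{
		const struct entry table[] = {
			{ 1, 0x100 }, { 1, 0x180 }, { 2, 0x040 },
		};

		/* 0x150 lies between 0x100 and 0x180 in section 1 */
		printf("nearest index: %d\n",
		       find_nearest(table, 3, 1, 0x150, true, ~0ULL));
		return 0;
	}

The per-symbol sort key and the duplicate-address fixup in the real symsearch.c are omitted here; the sketch only shows why a single upper-bound search is enough to pick the nearest neighbour on either side.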
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masahiro Yamada masahiroy@kernel.org
[ Upstream commit 1102f9f85bf66b1a7bd6a40afb40efbbe05dfc05 ]
As mentioned in commit 397586506c3d ("modpost: Add '.ltext' and '.ltext.*' to TEXT_SECTIONS"), modpost can result in a segmentation fault due to a NULL pointer dereference in default_mismatch_handler().
find_tosym() can return the original symbol pointer instead of NULL if a better one is not found.
This fixes the reported segmentation fault.
Fixes: a23e7584ecf3 ("modpost: unify 'sym' and 'to' in default_mismatch_handler()") Reported-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/mod/modpost.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c index 66589fb4e9aef..7d53942445d75 100644 --- a/scripts/mod/modpost.c +++ b/scripts/mod/modpost.c @@ -1052,6 +1052,8 @@ static Elf_Sym *find_fromsym(struct elf_info *elf, Elf_Addr addr,
static Elf_Sym *find_tosym(struct elf_info *elf, Elf_Addr addr, Elf_Sym *sym) { + Elf_Sym *new_sym; + /* If the supplied symbol has a valid name, return it */ if (is_valid_name(elf, sym)) return sym; @@ -1060,8 +1062,9 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf_Addr addr, Elf_Sym *sym) * Strive to find a better symbol name, but the resulting name may not * match the symbol referenced in the original code. */ - return symsearch_find_nearest(elf, addr, get_secindex(elf, sym), - true, 20); + new_sym = symsearch_find_nearest(elf, addr, get_secindex(elf, sym), + true, 20); + return new_sym ? new_sym : sym; }
static bool is_executable_section(struct elf_info *elf, unsigned int secndx)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bartosz Golaszewski bartosz.golaszewski@linaro.org
commit b34490879baa847d16fc529c8ea6e6d34f004b38 upstream.
When an interrupt is requested, a procfs directory is created under "/proc/irq/<irqnum>/<label>" where <label> is the string passed to one of the request_irq() variants.
What follows is that the string must not contain the "/" character or the procfs mkdir operation will fail. We don't have such constraints for GPIO consumer labels which are used verbatim as interrupt labels for GPIO irqs. We must therefore sanitize the consumer string before requesting the interrupt.
Let's replace all "/" with ":".
Cc: stable@vger.kernel.org Reported-by: Stefan Wahren wahrenst@gmx.net Closes: https://lore.kernel.org/linux-gpio/39fe95cb-aa83-4b8b-8cab-63947a726754@gmx.... Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Reviewed-by: Kent Gibson warthog618@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpiolib-cdev.c | 38 ++++++++++++++++++++++++++++++++------ 1 file changed, 32 insertions(+), 6 deletions(-)
--- a/drivers/gpio/gpiolib-cdev.c +++ b/drivers/gpio/gpiolib-cdev.c @@ -1010,10 +1010,20 @@ static u32 gpio_v2_line_config_debounce_ return 0; }
+static inline char *make_irq_label(const char *orig) +{ + return kstrdup_and_replace(orig, '/', ':', GFP_KERNEL); +} + +static inline void free_irq_label(const char *label) +{ + kfree(label); +} + static void edge_detector_stop(struct line *line) { if (line->irq) { - free_irq(line->irq, line); + free_irq_label(free_irq(line->irq, line)); line->irq = 0; }
@@ -1038,6 +1048,7 @@ static int edge_detector_setup(struct li unsigned long irqflags = 0; u64 eflags; int irq, ret; + char *label;
eflags = edflags & GPIO_V2_LINE_EDGE_FLAGS; if (eflags && !kfifo_initialized(&line->req->events)) { @@ -1074,11 +1085,17 @@ static int edge_detector_setup(struct li IRQF_TRIGGER_RISING : IRQF_TRIGGER_FALLING; irqflags |= IRQF_ONESHOT;
+ label = make_irq_label(line->req->label); + if (!label) + return -ENOMEM; + /* Request a thread to read the events */ ret = request_threaded_irq(irq, edge_irq_handler, edge_irq_thread, - irqflags, line->req->label, line); - if (ret) + irqflags, label, line); + if (ret) { + free_irq_label(label); return ret; + }
line->irq = irq; return 0; @@ -1943,7 +1960,7 @@ static void lineevent_free(struct lineev blocking_notifier_chain_unregister(&le->gdev->device_notifier, &le->device_unregistered_nb); if (le->irq) - free_irq(le->irq, le); + free_irq_label(free_irq(le->irq, le)); if (le->desc) gpiod_free(le->desc); kfree(le->label); @@ -2091,6 +2108,7 @@ static int lineevent_create(struct gpio_ int fd; int ret; int irq, irqflags = 0; + char *label;
if (copy_from_user(&eventreq, ip, sizeof(eventreq))) return -EFAULT; @@ -2175,15 +2193,23 @@ static int lineevent_create(struct gpio_ if (ret) goto out_free_le;
+ label = make_irq_label(le->label); + if (!label) { + ret = -ENOMEM; + goto out_free_le; + } + /* Request a thread to read the events */ ret = request_threaded_irq(irq, lineevent_irq_handler, lineevent_irq_thread, irqflags, - le->label, + label, le); - if (ret) + if (ret) { + free_irq_label(label); goto out_free_le; + }
le->irq = irq;
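To make the sanitisation concrete, here is a hedged userspace sketch of the idea (the label is hypothetical and make_irq_label() below is a stand-in for the kernel helper built on kstrdup_and_replace(), not the gpiolib code itself):

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	/* Copy the label and replace every '/' with ':' so that it can be
	 * used as a single procfs path component.
	 */
	static char *make_irq_label(const char *orig)
	{
		char *copy = strdup(orig);

		if (!copy)
			return NULL;
		for (char *p = copy; *p; p++)
			if (*p == '/')
				*p = ':';
		return copy;
	}

	int main(void)
	{
		char *label = make_irq_label("gpio-keys/power"); /* hypothetical label */

		if (!label)
			return 1;
		printf("/proc/irq/<irqnum>/%s\n", label); /* gpio-keys:power */
		free(label);
		return 0;
	}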
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anup Patel apatel@ventanamicro.com
commit d8dd9f113e16bef3b29c9dcceb584a6f144f55e4 upstream.
The writes to setipnum_le/be register for APLIC in MSI-mode have special consideration for level-triggered interrupts as-per the section "4.9.2 Special consideration for level-sensitive interrupt sources" of the RISC-V AIA specification.
Particularly, the below text from the RISC-V AIA specification defines the behaviour of writes to setipnum_le/be register for level-triggered interrupts:
"A second option is for the interrupt service routine to write the APLIC’s source identity number for the interrupt to the domain’s setipnum register just before exiting. This will cause the interrupt’s pending bit to be set to one again if the source is still asserting an interrupt, but not if the source is not asserting an interrupt."
Fix setipnum_le/be write emulation for in-kernel APLIC by implementing the above behaviour in aplic_write_pending() function.
Cc: stable@vger.kernel.org Fixes: 74967aa208e2 ("RISC-V: KVM: Add in-kernel emulation of AIA APLIC") Signed-off-by: Anup Patel apatel@ventanamicro.com Signed-off-by: Anup Patel anup@brainfault.org Link: https://lore.kernel.org/r/20240321085041.1955293-2-apatel@ventanamicro.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/kvm/aia_aplic.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/kvm/aia_aplic.c b/arch/riscv/kvm/aia_aplic.c index 39e72aa016a4..5e842b92dc46 100644 --- a/arch/riscv/kvm/aia_aplic.c +++ b/arch/riscv/kvm/aia_aplic.c @@ -137,11 +137,21 @@ static void aplic_write_pending(struct aplic *aplic, u32 irq, bool pending) raw_spin_lock_irqsave(&irqd->lock, flags);
sm = irqd->sourcecfg & APLIC_SOURCECFG_SM_MASK; - if (!pending && - ((sm == APLIC_SOURCECFG_SM_LEVEL_HIGH) || - (sm == APLIC_SOURCECFG_SM_LEVEL_LOW))) + if (sm == APLIC_SOURCECFG_SM_INACTIVE) goto skip_write_pending;
+ if (sm == APLIC_SOURCECFG_SM_LEVEL_HIGH || + sm == APLIC_SOURCECFG_SM_LEVEL_LOW) { + if (!pending) + goto skip_write_pending; + if ((irqd->state & APLIC_IRQ_STATE_INPUT) && + sm == APLIC_SOURCECFG_SM_LEVEL_LOW) + goto skip_write_pending; + if (!(irqd->state & APLIC_IRQ_STATE_INPUT) && + sm == APLIC_SOURCECFG_SM_LEVEL_HIGH) + goto skip_write_pending; + } + if (pending) irqd->state |= APLIC_IRQ_STATE_PENDING; else
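A hedged, standalone sketch of the rule implemented above (simplified names, not the KVM code): a setipnum write to a level-sensitive source only latches the pending bit while the source is still asserting, where 'input' is the raw wire value tracked by the emulation.

	#include <stdbool.h>
	#include <stdio.h>

	enum sm { LEVEL_HIGH, LEVEL_LOW, EDGE_RISE, EDGE_FALL };

	static bool setip_latches(enum sm mode, bool input)
	{
		if (mode == LEVEL_HIGH)
			return input;	/* wire must still be high */
		if (mode == LEVEL_LOW)
			return !input;	/* wire must still be low */
		return true;		/* edge-triggered: always latch */
	}

	int main(void)
	{
		printf("LEVEL_HIGH, wire high -> %d\n", setip_latches(LEVEL_HIGH, true));
		printf("LEVEL_HIGH, wire low  -> %d\n", setip_latches(LEVEL_HIGH, false));
		printf("LEVEL_LOW,  wire high -> %d\n", setip_latches(LEVEL_LOW, true));
		return 0;
	}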
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anup Patel apatel@ventanamicro.com
commit 8e936e98718f005c986be0bfa1ee6b355acf96be upstream.
Reads of the APLIC in_clrip[x] registers return the rectified input values of the interrupt sources.
A rectified input value of an interrupt source is defined by the section "4.5.2 Source configurations (sourcecfg[1]–sourcecfg[1023])" of the RISC-V AIA specification as:
rectified input value = (incoming wire value) XOR (source is inverted)
Update the riscv_aplic_input() implementation to match the above.
Cc: stable@vger.kernel.org Fixes: 74967aa208e2 ("RISC-V: KVM: Add in-kernel emulation of AIA APLIC") Signed-off-by: Anup Patel apatel@ventanamicro.com Signed-off-by: Anup Patel anup@brainfault.org Link: https://lore.kernel.org/r/20240321085041.1955293-3-apatel@ventanamicro.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/kvm/aia_aplic.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/kvm/aia_aplic.c b/arch/riscv/kvm/aia_aplic.c index 5e842b92dc46..b467ba5ed910 100644 --- a/arch/riscv/kvm/aia_aplic.c +++ b/arch/riscv/kvm/aia_aplic.c @@ -197,16 +197,31 @@ static void aplic_write_enabled(struct aplic *aplic, u32 irq, bool enabled)
static bool aplic_read_input(struct aplic *aplic, u32 irq) { - bool ret; - unsigned long flags; + u32 sourcecfg, sm, raw_input, irq_inverted; struct aplic_irq *irqd; + unsigned long flags; + bool ret = false;
if (!irq || aplic->nr_irqs <= irq) return false; irqd = &aplic->irqs[irq];
raw_spin_lock_irqsave(&irqd->lock, flags); - ret = (irqd->state & APLIC_IRQ_STATE_INPUT) ? true : false; + + sourcecfg = irqd->sourcecfg; + if (sourcecfg & APLIC_SOURCECFG_D) + goto skip; + + sm = sourcecfg & APLIC_SOURCECFG_SM_MASK; + if (sm == APLIC_SOURCECFG_SM_INACTIVE) + goto skip; + + raw_input = (irqd->state & APLIC_IRQ_STATE_INPUT) ? 1 : 0; + irq_inverted = (sm == APLIC_SOURCECFG_SM_LEVEL_LOW || + sm == APLIC_SOURCECFG_SM_EDGE_FALL) ? 1 : 0; + ret = !!(raw_input ^ irq_inverted); + +skip: raw_spin_unlock_irqrestore(&irqd->lock, flags);
return ret;
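The rectified-input rule quoted above can be illustrated with a small standalone sketch (simplified source-mode names, not the KVM code):

	#include <stdbool.h>
	#include <stdio.h>

	enum sm { LEVEL_HIGH, LEVEL_LOW, EDGE_RISE, EDGE_FALL };

	/* rectified input = (incoming wire value) XOR (source is inverted) */
	static bool rectified_input(bool raw_wire, enum sm mode)
	{
		bool inverted = (mode == LEVEL_LOW || mode == EDGE_FALL);

		return raw_wire ^ inverted;
	}

	int main(void)
	{
		/* an active-low source with the wire driven low reads as asserted */
		printf("LEVEL_LOW,  wire=0 -> %d\n", rectified_input(false, LEVEL_LOW));
		/* an active-high source with the wire driven low reads as de-asserted */
		printf("LEVEL_HIGH, wire=0 -> %d\n", rectified_input(false, LEVEL_HIGH));
		return 0;
	}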
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oliver Upton oliver.upton@linux.dev
commit e89c928bedd77d181edc2df01cb6672184775140 upstream.
Programming PMU events in the host that count during guest execution is a feature supported by perf, e.g.
perf stat -e cpu_cycles:G ./lkvm run
While this works for VHE, the guest/host event bitmaps are not carried through to the hypervisor in the nVHE configuration. Make kvm_pmu_update_vcpu_events() conditional on whether or not _hardware_ supports PMUv3 rather than on whether the vCPU has a vPMU enabled.
Cc: stable@vger.kernel.org Fixes: 84d751a019a9 ("KVM: arm64: Pass pmu events to hyp via vcpu") Reviewed-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20240305184840.636212-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton oliver.upton@linux.dev Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/kvm/arm_pmu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/kvm/arm_pmu.h +++ b/include/kvm/arm_pmu.h @@ -86,7 +86,7 @@ void kvm_vcpu_pmu_resync_el0(void); */ #define kvm_pmu_update_vcpu_events(vcpu) \ do { \ - if (!has_vhe() && kvm_vcpu_has_pmu(vcpu)) \ + if (!has_vhe() && kvm_arm_support_pmu_v3()) \ vcpu->arch.pmu.events = *kvm_get_pmu_events(); \ } while (0)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Göttsche cgzones@googlemail.com
commit 37801a36b4d68892ce807264f784d818f8d0d39b upstream.
In case kern_mount() fails and returns an error pointer, return in the error branch instead of continuing and dereferencing the error pointer.
While at it, drop the never-read static variable selinuxfs_mount.
Cc: stable@vger.kernel.org Fixes: 0619f0f5e36f ("selinux: wrap selinuxfs state") Signed-off-by: Christian Göttsche cgzones@googlemail.com Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/selinux/selinuxfs.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
--- a/security/selinux/selinuxfs.c +++ b/security/selinux/selinuxfs.c @@ -2135,7 +2135,6 @@ static struct file_system_type sel_fs_ty .kill_sb = sel_kill_sb, };
-static struct vfsmount *selinuxfs_mount __ro_after_init; struct path selinux_null __ro_after_init;
static int __init init_sel_fs(void) @@ -2157,18 +2156,21 @@ static int __init init_sel_fs(void) return err; }
- selinux_null.mnt = selinuxfs_mount = kern_mount(&sel_fs_type); - if (IS_ERR(selinuxfs_mount)) { + selinux_null.mnt = kern_mount(&sel_fs_type); + if (IS_ERR(selinux_null.mnt)) { pr_err("selinuxfs: could not mount!\n"); - err = PTR_ERR(selinuxfs_mount); - selinuxfs_mount = NULL; + err = PTR_ERR(selinux_null.mnt); + selinux_null.mnt = NULL; + return err; } + selinux_null.dentry = d_hash_and_lookup(selinux_null.mnt->mnt_root, &null_name); if (IS_ERR(selinux_null.dentry)) { pr_err("selinuxfs: could not lookup null!\n"); err = PTR_ERR(selinux_null.dentry); selinux_null.dentry = NULL; + return err; }
return err;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heiner Kallweit hkallweit1@gmail.com
commit 5d872c9f46bd2ea3524af3c2420a364a13667135 upstream.
On some boards with this chip version the BIOS is buggy and misses to reset the PHY page selector. This results in the PHY ID read accessing registers on a different page, returning a more or less random value. Fix this by resetting the page selector first.
Fixes: f1e911d5d0df ("r8169: add basic phylib support") Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/64f2055e-98b8-45ec-8568-665e3d54d4e6@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/realtek/r8169_main.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -5055,6 +5055,15 @@ static int r8169_mdio_register(struct rt struct mii_bus *new_bus; int ret;
+ /* On some boards with this chip version the BIOS is buggy and misses + * to reset the PHY page selector. This results in the PHY ID read + * accessing registers on a different page, returning a more or + * less random value. Fix this by resetting the page selector first. + */ + if (tp->mac_version == RTL_GIGA_MAC_VER_25 || + tp->mac_version == RTL_GIGA_MAC_VER_26) + r8169_mdio_write(tp, 0x1f, 0); + new_bus = devm_mdiobus_alloc(&pdev->dev); if (!new_bus) return -ENOMEM;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
commit 8cb4a9a82b21623dbb4b3051dd30d98356cf95bc upstream.
Add CPUID_LNX_5 to track cpufeatures' word 21, and add the appropriate compile-time assert in KVM to prevent direct lookups on the features in CPUID_LNX_5. KVM uses X86_FEATURE_* flags to manage guest CPUID, and so must translate features that are scattered by Linux from the Linux-defined bit to the hardware-defined bit, i.e. should never try to directly access scattered features in guest CPUID.
Opportunistically add NR_CPUID_WORDS to enum cpuid_leafs, along with a compile-time assert in KVM's CPUID infrastructure to ensure that future additions update cpuid_leafs along with NCAPINTS.
No functional change intended.
Fixes: 7f274e609f3d ("x86/cpufeatures: Add new word for scattered features") Cc: Sandipan Das sandipan.das@amd.com Signed-off-by: Sean Christopherson seanjc@google.com Acked-by: Dave Hansen dave.hansen@linux.intel.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/cpufeature.h | 2 ++ arch/x86/kvm/reverse_cpuid.h | 2 ++ 2 files changed, 4 insertions(+)
--- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -33,6 +33,8 @@ enum cpuid_leafs CPUID_7_EDX, CPUID_8000_001F_EAX, CPUID_8000_0021_EAX, + CPUID_LNX_5, + NR_CPUID_WORDS, };
#define X86_CAP_FMT_NUM "%d:%d" --- a/arch/x86/kvm/reverse_cpuid.h +++ b/arch/x86/kvm/reverse_cpuid.h @@ -102,10 +102,12 @@ static const struct cpuid_reg reverse_cp */ static __always_inline void reverse_cpuid_check(unsigned int x86_leaf) { + BUILD_BUG_ON(NR_CPUID_WORDS != NCAPINTS); BUILD_BUG_ON(x86_leaf == CPUID_LNX_1); BUILD_BUG_ON(x86_leaf == CPUID_LNX_2); BUILD_BUG_ON(x86_leaf == CPUID_LNX_3); BUILD_BUG_ON(x86_leaf == CPUID_LNX_4); + BUILD_BUG_ON(x86_leaf == CPUID_LNX_5); BUILD_BUG_ON(x86_leaf >= ARRAY_SIZE(reverse_cpuid)); BUILD_BUG_ON(reverse_cpuid[x86_leaf].function == 0); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 4790a73ace86f3d165bbedba898e0758e6e1b82d upstream.
This reverts commit 7dcd3e014aa7faeeaf4047190b22d8a19a0db696.
Qualcomm Bluetooth controllers like WCN6855 do not have persistent storage for the Bluetooth address and must therefore start as unconfigured to allow the user to set a valid address unless one has been provided by the boot firmware in the devicetree.
A recent change snuck into v6.8-rc7 and incorrectly started marking the default (non-unique) address as valid. This specifically also breaks the Bluetooth setup for some users of the Lenovo ThinkPad X13s.
Note that this is the second time Qualcomm breaks the driver this way and that this was fixed last year by commit 6945795bc81a ("Bluetooth: fix use-bdaddr-property quirk"), which also has some further details.
Fixes: 7dcd3e014aa7 ("Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT") Cc: stable@vger.kernel.org # 6.8 Cc: Janaki Ramaiah Thota quic_janathot@quicinc.com Signed-off-by: Johan Hovold johan+linaro@kernel.org Reported-by: Clayton Craft clayton@craftyguy.net Tested-by: Clayton Craft clayton@craftyguy.net Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/bluetooth/hci_qca.c | 13 +------------ 1 file changed, 1 insertion(+), 12 deletions(-)
--- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -7,7 +7,6 @@ * * Copyright (C) 2007 Texas Instruments, Inc. * Copyright (c) 2010, 2012, 2018 The Linux Foundation. All rights reserved. - * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved. * * Acknowledgements: * This file is based on hci_ll.c, which was... @@ -1882,17 +1881,7 @@ retry: case QCA_WCN6750: case QCA_WCN6855: case QCA_WCN7850: - - /* Set BDA quirk bit for reading BDA value from fwnode property - * only if that property exist in DT. - */ - if (fwnode_property_present(dev_fwnode(hdev->dev.parent), "local-bd-address")) { - set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks); - bt_dev_info(hdev, "setting quirk bit to read BDA from fwnode later"); - } else { - bt_dev_dbg(hdev, "local-bd-address` is not present in the devicetree so not setting quirk bit for BDA"); - } - + set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks); hci_set_aosp_capable(hdev);
ret = qca_read_soc_version(hdev, &ver, soc_type);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit e12e28009e584c8f8363439f6a928ec86278a106 upstream.
Several Qualcomm Bluetooth controllers lack persistent storage for the device address and instead one can be provided by the boot firmware using the 'local-bd-address' devicetree property.
The Bluetooth bindings clearly state that the address should be specified in little-endian order, but due to a long-standing bug in the Qualcomm driver which reversed the address, some boot firmware has been providing the address in big-endian order instead.
The boot firmware in SC7180 Trogdor Chromebooks is known to be affected so mark the 'local-bd-address' property as broken to maintain backwards compatibility with older firmware when fixing the underlying driver bug.
Note that ChromeOS always updates the kernel and devicetree in lockstep so that there is no need to handle backwards compatibility with older devicetrees.
Fixes: 7ec3e67307f8 ("arm64: dts: qcom: sc7180-trogdor: add initial trogdor and lazor dt") Cc: stable@vger.kernel.org # 5.10 Cc: Rob Clark robdclark@chromium.org Reviewed-by: Douglas Anderson dianders@chromium.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Acked-by: Bjorn Andersson andersson@kernel.org Reviewed-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi | 2 ++ 1 file changed, 2 insertions(+)
--- a/arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi +++ b/arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi @@ -970,6 +970,8 @@ ap_spi_fp: &spi10 { vddrf-supply = <&pp1300_l2c>; vddch0-supply = <&pp3300_l10c>; max-speed = <3200000>; + + qcom,local-bd-address-broken; }; };
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 77f45cca8bc55d00520a192f5a7715133591c83e upstream.
The WCN6855 firmware on the Lenovo ThinkPad X13s expects the Bluetooth device address in big-endian order when setting it using the EDL_WRITE_BD_ADDR_OPCODE command.
Presumably, this is the case for all non-ROME devices which all use the EDL_WRITE_BD_ADDR_OPCODE command for this (unlike the ROME devices which use a different command and expect the address in little-endian order).
Reverse the little-endian address before setting it to make sure that the address can be configured using tools like btmgmt or using the 'local-bd-address' devicetree property.
Note that this can potentially break systems with boot firmware which has started relying on the broken behaviour and is incorrectly passing the address via devicetree in big-endian order.
The only device affected by this should be the WCN3991 used in some Chromebooks. As ChromeOS updates the kernel and devicetree in lockstep, the new 'qcom,local-bd-address-broken' property can be used to determine if the firmware is buggy so that the underlying driver bug can be fixed without breaking backwards compatibility.
Set the HCI_QUIRK_BDADDR_PROPERTY_BROKEN quirk for such platforms so that the address is reversed when parsing the address property.
Fixes: 5c0a1001c8be ("Bluetooth: hci_qca: Add helper to set device address") Cc: stable@vger.kernel.org # 5.1 Cc: Balakrishna Godavarthi quic_bgodavar@quicinc.com Cc: Matthias Kaehlcke mka@chromium.org Tested-by: Nikita Travkin nikita@trvn.ru # sc7180 Reviewed-by: Douglas Anderson dianders@chromium.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/bluetooth/btqca.c | 8 ++++++-- drivers/bluetooth/hci_qca.c | 10 ++++++++++ 2 files changed, 16 insertions(+), 2 deletions(-)
--- a/drivers/bluetooth/btqca.c +++ b/drivers/bluetooth/btqca.c @@ -758,11 +758,15 @@ EXPORT_SYMBOL_GPL(qca_uart_setup);
int qca_set_bdaddr(struct hci_dev *hdev, const bdaddr_t *bdaddr) { + bdaddr_t bdaddr_swapped; struct sk_buff *skb; int err;
- skb = __hci_cmd_sync_ev(hdev, EDL_WRITE_BD_ADDR_OPCODE, 6, bdaddr, - HCI_EV_VENDOR, HCI_INIT_TIMEOUT); + baswap(&bdaddr_swapped, bdaddr); + + skb = __hci_cmd_sync_ev(hdev, EDL_WRITE_BD_ADDR_OPCODE, 6, + &bdaddr_swapped, HCI_EV_VENDOR, + HCI_INIT_TIMEOUT); if (IS_ERR(skb)) { err = PTR_ERR(skb); bt_dev_err(hdev, "QCA Change address cmd failed (%d)", err); --- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -225,6 +225,7 @@ struct qca_serdev { struct qca_power *bt_power; u32 init_speed; u32 oper_speed; + bool bdaddr_property_broken; const char *firmware_name; };
@@ -1824,6 +1825,7 @@ static int qca_setup(struct hci_uart *hu const char *firmware_name = qca_get_firmware_name(hu); int ret; struct qca_btsoc_version ver; + struct qca_serdev *qcadev; const char *soc_name;
ret = qca_check_speeds(hu); @@ -1882,6 +1884,11 @@ retry: case QCA_WCN6855: case QCA_WCN7850: set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks); + + qcadev = serdev_device_get_drvdata(hu->serdev); + if (qcadev->bdaddr_property_broken) + set_bit(HCI_QUIRK_BDADDR_PROPERTY_BROKEN, &hdev->quirks); + hci_set_aosp_capable(hdev);
ret = qca_read_soc_version(hdev, &ver, soc_type); @@ -2253,6 +2260,9 @@ static int qca_serdev_probe(struct serde if (!qcadev->oper_speed) BT_DBG("UART will pick default operating speed");
+ qcadev->bdaddr_property_broken = device_property_read_bool(&serdev->dev, + "qcom,local-bd-address-broken"); + if (data) qcadev->btsoc_type = data->soc_type; else
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 39646f29b100566451d37abc4cc8cdd583756dfe upstream.
Some Bluetooth controllers lack persistent storage for the device address and instead one can be provided by the boot firmware using the 'local-bd-address' devicetree property.
The Bluetooth devicetree bindings clearly state that the address should be specified in little-endian order, but due to a long-standing bug in the Qualcomm driver which reversed the address, some boot firmware has been providing the address in big-endian order instead.
Add a new quirk that can be set on platforms with broken firmware and use it to reverse the address when parsing the property so that the underlying driver bug can be fixed.
Fixes: 5c0a1001c8be ("Bluetooth: hci_qca: Add helper to set device address") Cc: stable@vger.kernel.org # 5.1 Reviewed-by: Douglas Anderson dianders@chromium.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/bluetooth/hci.h | 9 +++++++++ net/bluetooth/hci_sync.c | 5 ++++- 2 files changed, 13 insertions(+), 1 deletion(-)
--- a/include/net/bluetooth/hci.h +++ b/include/net/bluetooth/hci.h @@ -176,6 +176,15 @@ enum { */ HCI_QUIRK_USE_BDADDR_PROPERTY,
+ /* When this quirk is set, the Bluetooth Device Address provided by + * the 'local-bd-address' fwnode property is incorrectly specified in + * big-endian order. + * + * This quirk can be set before hci_register_dev is called or + * during the hdev->setup vendor callback. + */ + HCI_QUIRK_BDADDR_PROPERTY_BROKEN, + /* When this quirk is set, the duplicate filtering during * scanning is based on Bluetooth devices addresses. To allow * RSSI based updates, restart scanning if needed. --- a/net/bluetooth/hci_sync.c +++ b/net/bluetooth/hci_sync.c @@ -3292,7 +3292,10 @@ static void hci_dev_get_bd_addr_from_pro if (ret < 0 || !bacmp(&ba, BDADDR_ANY)) return;
- bacpy(&hdev->public_addr, &ba); + if (test_bit(HCI_QUIRK_BDADDR_PROPERTY_BROKEN, &hdev->quirks)) + baswap(&hdev->public_addr, &ba); + else + bacpy(&hdev->public_addr, &ba); }
struct hci_init_stage {
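To make the endianness issue concrete, here is a small userspace sketch (the address is hypothetical and baswap() below is a stand-in mirroring the kernel helper of the same name):

	#include <stdio.h>

	typedef struct { unsigned char b[6]; } bdaddr_t;

	/* Reverse the byte order of a Bluetooth device address. */
	static void baswap(bdaddr_t *dst, const bdaddr_t *src)
	{
		for (int i = 0; i < 6; i++)
			dst->b[i] = src->b[5 - i];
	}

	int main(void)
	{
		/* 'local-bd-address' is specified in little-endian order, so the
		 * address 00:11:22:33:44:55 is stored as the bytes 55 44 33 22 11 00.
		 */
		bdaddr_t le = { { 0x55, 0x44, 0x33, 0x22, 0x11, 0x00 } };
		bdaddr_t be;

		baswap(&be, &le);
		for (int i = 0; i < 6; i++)
			printf("%02x ", be.b[i]);
		printf("\n");	/* 00 11 22 33 44 55 */
		return 0;
	}

Firmware that already passes the property byte-swapped effectively hands the kernel the second form, which is why the quirk is needed to keep such platforms working.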
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hui Wang hui.wang@canonical.com
commit c569242cd49287d53b73a94233db40097d838535 upstream.
We have a BT headset (Lenovo Thinkplus XT99); pairing and connecting work without problems. Once this headset is paired, bluez remembers the device and auto re-connects it whenever it is powered on. The auto re-connecting works well with Windows and Android, but with Linux it always fails. Through debugging, we found that at the rfcomm connection stage, the bluetooth stack reports "Connection refused - security block (0x0003)".
For this device, the re-connecting negotiation process is different from other BT headsets: it sends the Link_KEY_REQUEST command before the CONNECT_REQUEST completes, and it doesn't send an ENCRYPT_CHANGE command during the negotiation. When the device sends the "connect complete" to hci, the ev->encr_mode is 1.
So here in the conn_complete_evt(), if ev->encr_mode is 1, link type is ACL and HCI_CONN_ENCRYPT is not set, we set HCI_CONN_ENCRYPT to this conn, and update conn->enc_key_size accordingly.
After this change, this BT headset can re-connect with Linux successfully. This is the btmon log after applying the patch: after receiving the "Connect Complete" with "Encryption: Enabled", the stack sends the command to read the encryption key size:
> HCI Event: Connect Request (0x04) plen 10
        Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA)
        Class: 0x240404
          Major class: Audio/Video (headset, speaker, stereo, video, vcr)
          Minor class: Wearable Headset Device
          Rendering (Printing, Speaker)
          Audio (Speaker, Microphone, Headset)
        Link type: ACL (0x01)
...
> HCI Event: Link Key Request (0x17) plen 6
        Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA)
< HCI Command: Link Key Request Reply (0x01|0x000b) plen 22
        Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA)
        Link key: ${32-hex-digits-key}
...
> HCI Event: Connect Complete (0x03) plen 11
        Status: Success (0x00)
        Handle: 256
        Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA)
        Link type: ACL (0x01)
        Encryption: Enabled (0x01)
< HCI Command: Read Encryption Key... (0x05|0x0008) plen 2
        Handle: 256
< ACL Data TX: Handle 256 flags 0x00 dlen 10
      L2CAP: Information Request (0x0a) ident 1 len 2
        Type: Extended features supported (0x0002)
> HCI Event: Command Complete (0x0e) plen 7
      Read Encryption Key Size (0x05|0x0008) ncmd 1
        Status: Success (0x00)
        Handle: 256
        Key size: 16
Cc: stable@vger.kernel.org Link: https://github.com/bluez/bluez/issues/704 Reviewed-by: Paul Menzel pmenzel@molgen.mpg.de Reviewed-by: Luiz Augusto von Dentz luiz.dentz@gmail.com Signed-off-by: Hui Wang hui.wang@canonical.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/hci_event.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)
--- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -3219,6 +3219,31 @@ static void hci_conn_complete_evt(struct if (test_bit(HCI_ENCRYPT, &hdev->flags)) set_bit(HCI_CONN_ENCRYPT, &conn->flags);
+ /* "Link key request" completed ahead of "connect request" completes */ + if (ev->encr_mode == 1 && !test_bit(HCI_CONN_ENCRYPT, &conn->flags) && + ev->link_type == ACL_LINK) { + struct link_key *key; + struct hci_cp_read_enc_key_size cp; + + key = hci_find_link_key(hdev, &ev->bdaddr); + if (key) { + set_bit(HCI_CONN_ENCRYPT, &conn->flags); + + if (!(hdev->commands[20] & 0x10)) { + conn->enc_key_size = HCI_LINK_KEY_SIZE; + } else { + cp.handle = cpu_to_le16(conn->handle); + if (hci_send_cmd(hdev, HCI_OP_READ_ENC_KEY_SIZE, + sizeof(cp), &cp)) { + bt_dev_err(hdev, "sending read key size failed"); + conn->enc_key_size = HCI_LINK_KEY_SIZE; + } + } + + hci_encrypt_cfm(conn, ev->status); + } + } + /* Get remote features */ if (conn->type == ACL_LINK) { struct hci_cp_read_remote_features cp;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bastien Nocera hadess@hadess.net
commit 7835fcfd132eb88b87e8eb901f88436f63ab60f7 upstream.
struct hci_dev members conn_info_max_age, conn_info_min_age, le_conn_max_interval, le_conn_min_interval, le_adv_max_interval, and le_adv_min_interval can be modified from the HCI core code, as well through debugfs.
The debugfs implementation, that's only available to privileged users, will check for boundaries, making sure that the minimum value being set is strictly above the maximum value that already exists, and vice-versa.
However, as both minimum and maximum values can be changed concurrently to us modifying them, we need to make sure that the value we check is the value we end up using.
For example, with ->conn_info_max_age set to 10, conn_info_min_age_set() gets called from vfs handlers to set conn_info_min_age to 8.
In conn_info_min_age_set(), this goes through:

	if (val == 0 || val > hdev->conn_info_max_age)
		return -EINVAL;
Concurrently, conn_info_max_age_set() gets called to set conn_info_max_age to 7:

	if (val == 0 || val > hdev->conn_info_max_age)
		return -EINVAL;

That check will also pass because we used the old value (10) for conn_info_max_age.
After those checks both passed, the struct hci_dev access is mutex-locked, disabling concurrent access, but that does not matter because the invalid value checks both passed, and we end up with conn_info_min_age = 8 and conn_info_max_age = 7.
To fix this problem, we need to lock the structure access before so the check and assignment are not interrupted.
This fix was originally devised by the BassCheck[1] team, which considered the problem to be an atomicity one. This isn't the case, as there aren't any concerns about the variable changing while we check it, but rather about it changing after we check it, in parallel with another change.
This patch fixes CVE-2024-24858 and CVE-2024-24857.
[1] https://sites.google.com/view/basscheck/
Co-developed-by: Gui-Dong Han 2045gemini@gmail.com Signed-off-by: Gui-Dong Han 2045gemini@gmail.com Link: https://lore.kernel.org/linux-bluetooth/20231222161317.6255-1-2045gemini@gma... Link: https://nvd.nist.gov/vuln/detail/CVE-2024-24858 Link: https://lore.kernel.org/linux-bluetooth/20231222162931.6553-1-2045gemini@gma... Link: https://lore.kernel.org/linux-bluetooth/20231222162310.6461-1-2045gemini@gma... Link: https://nvd.nist.gov/vuln/detail/CVE-2024-24857 Fixes: 31ad169148df ("Bluetooth: Add conn info lifetime parameters to debugfs") Fixes: 729a1051da6f ("Bluetooth: Expose default LE advertising interval via debugfs") Fixes: 71c3b60ec6d2 ("Bluetooth: Move BR/EDR debugfs file creation into hci_debugfs.c") Signed-off-by: Bastien Nocera hadess@hadess.net Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/hci_debugfs.c | 48 +++++++++++++++++++++++++++++--------------- 1 file changed, 32 insertions(+), 16 deletions(-)
--- a/net/bluetooth/hci_debugfs.c +++ b/net/bluetooth/hci_debugfs.c @@ -218,10 +218,12 @@ static int conn_info_min_age_set(void *d { struct hci_dev *hdev = data;
- if (val == 0 || val > hdev->conn_info_max_age) + hci_dev_lock(hdev); + if (val == 0 || val > hdev->conn_info_max_age) { + hci_dev_unlock(hdev); return -EINVAL; + }
- hci_dev_lock(hdev); hdev->conn_info_min_age = val; hci_dev_unlock(hdev);
@@ -246,10 +248,12 @@ static int conn_info_max_age_set(void *d { struct hci_dev *hdev = data;
- if (val == 0 || val < hdev->conn_info_min_age) + hci_dev_lock(hdev); + if (val == 0 || val < hdev->conn_info_min_age) { + hci_dev_unlock(hdev); return -EINVAL; + }
- hci_dev_lock(hdev); hdev->conn_info_max_age = val; hci_dev_unlock(hdev);
@@ -567,10 +571,12 @@ static int sniff_min_interval_set(void * { struct hci_dev *hdev = data;
- if (val == 0 || val % 2 || val > hdev->sniff_max_interval) + hci_dev_lock(hdev); + if (val == 0 || val % 2 || val > hdev->sniff_max_interval) { + hci_dev_unlock(hdev); return -EINVAL; + }
- hci_dev_lock(hdev); hdev->sniff_min_interval = val; hci_dev_unlock(hdev);
@@ -595,10 +601,12 @@ static int sniff_max_interval_set(void * { struct hci_dev *hdev = data;
- if (val == 0 || val % 2 || val < hdev->sniff_min_interval) + hci_dev_lock(hdev); + if (val == 0 || val % 2 || val < hdev->sniff_min_interval) { + hci_dev_unlock(hdev); return -EINVAL; + }
- hci_dev_lock(hdev); hdev->sniff_max_interval = val; hci_dev_unlock(hdev);
@@ -850,10 +858,12 @@ static int conn_min_interval_set(void *d { struct hci_dev *hdev = data;
- if (val < 0x0006 || val > 0x0c80 || val > hdev->le_conn_max_interval) + hci_dev_lock(hdev); + if (val < 0x0006 || val > 0x0c80 || val > hdev->le_conn_max_interval) { + hci_dev_unlock(hdev); return -EINVAL; + }
- hci_dev_lock(hdev); hdev->le_conn_min_interval = val; hci_dev_unlock(hdev);
@@ -878,10 +888,12 @@ static int conn_max_interval_set(void *d { struct hci_dev *hdev = data;
- if (val < 0x0006 || val > 0x0c80 || val < hdev->le_conn_min_interval) + hci_dev_lock(hdev); + if (val < 0x0006 || val > 0x0c80 || val < hdev->le_conn_min_interval) { + hci_dev_unlock(hdev); return -EINVAL; + }
- hci_dev_lock(hdev); hdev->le_conn_max_interval = val; hci_dev_unlock(hdev);
@@ -990,10 +1002,12 @@ static int adv_min_interval_set(void *da { struct hci_dev *hdev = data;
- if (val < 0x0020 || val > 0x4000 || val > hdev->le_adv_max_interval) + hci_dev_lock(hdev); + if (val < 0x0020 || val > 0x4000 || val > hdev->le_adv_max_interval) { + hci_dev_unlock(hdev); return -EINVAL; + }
- hci_dev_lock(hdev); hdev->le_adv_min_interval = val; hci_dev_unlock(hdev);
@@ -1018,10 +1032,12 @@ static int adv_max_interval_set(void *da { struct hci_dev *hdev = data;
- if (val < 0x0020 || val > 0x4000 || val < hdev->le_adv_min_interval) + hci_dev_lock(hdev); + if (val < 0x0020 || val > 0x4000 || val < hdev->le_adv_min_interval) { + hci_dev_unlock(hdev); return -EINVAL; + }
- hci_dev_lock(hdev); hdev->le_adv_max_interval = val; hci_dev_unlock(hdev);
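For context, a minimal pthread-based sketch of the pattern the patch converts to (the fields and default values are hypothetical, not the hci_debugfs code): take the lock before validating against the paired bound, so the check and the store are atomic with respect to the other setter.

	#include <pthread.h>
	#include <stdio.h>

	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	static unsigned int min_age = 3, max_age = 10;

	static int set_min_age(unsigned int val)
	{
		pthread_mutex_lock(&lock);
		if (val == 0 || val > max_age) {	/* checked under the lock */
			pthread_mutex_unlock(&lock);
			return -1;
		}
		min_age = val;
		pthread_mutex_unlock(&lock);
		return 0;
	}

	static int set_max_age(unsigned int val)
	{
		pthread_mutex_lock(&lock);
		if (val == 0 || val < min_age) {
			pthread_mutex_unlock(&lock);
			return -1;
		}
		max_age = val;
		pthread_mutex_unlock(&lock);
		return 0;
	}

	int main(void)
	{
		int a = set_min_age(8);
		int b = set_max_age(7);

		/* With the check done under the lock, min 8 / max 7 can no longer
		 * both succeed, whichever order the two writers run in.
		 */
		printf("set min 8: %d, set max 7: %d\n", a, b);
		return 0;
	}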
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit a45e6889575c2067d3c0212b6bc1022891e65b91 upstream.
Unlike the early commit path stage, which triggers a call to abort, an explicit release of the batch is required on abort; otherwise the mutex is released and commit_list remains in place.
Add WARN_ON_ONCE to ensure commit_list is empty from the abort path before releasing the mutex.
After this patch, commit_list is always assumed to be empty before grabbing the mutex, therefore
03c1f1ef1584 ("netfilter: Cleanup nft_net->module_list from nf_tables_exit_net()")
only needs to release the pending modules for registration.
Cc: stable@vger.kernel.org Fixes: c0391b6ab810 ("netfilter: nf_tables: missing validation from the abort path") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -10326,10 +10326,11 @@ static int __nf_tables_abort(struct net struct nft_trans *trans, *next; LIST_HEAD(set_update_list); struct nft_trans_elem *te; + int err = 0;
if (action == NFNL_ABORT_VALIDATE && nf_tables_validate(net) < 0) - return -EAGAIN; + err = -EAGAIN;
list_for_each_entry_safe_reverse(trans, next, &nft_net->commit_list, list) { @@ -10522,7 +10523,7 @@ static int __nf_tables_abort(struct net else nf_tables_module_autoload_cleanup(net);
- return 0; + return err; }
static int nf_tables_abort(struct net *net, struct sk_buff *skb, @@ -10535,6 +10536,9 @@ static int nf_tables_abort(struct net *n gc_seq = nft_gc_seq_begin(nft_net); ret = __nf_tables_abort(net, action); nft_gc_seq_end(nft_net, gc_seq); + + WARN_ON_ONCE(!list_empty(&nft_net->commit_list)); + mutex_unlock(&nft_net->commit_mutex);
return ret; @@ -11335,9 +11339,10 @@ static void __net_exit nf_tables_exit_ne
gc_seq = nft_gc_seq_begin(nft_net);
- if (!list_empty(&nft_net->commit_list) || - !list_empty(&nft_net->module_list)) - __nf_tables_abort(net, NFNL_ABORT_NONE); + WARN_ON_ONCE(!list_empty(&nft_net->commit_list)); + + if (!list_empty(&nft_net->module_list)) + nf_tables_module_autoload_cleanup(net);
__nft_release_tables(net);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit 0d459e2ffb541841714839e8228b845458ed3b27 upstream.
The commit mutex should not be released during the critical section between nft_gc_seq_begin() and nft_gc_seq_end(), otherwise, async GC worker could collect expired objects and get the released commit lock within the same GC sequence.
nf_tables_module_autoload() temporarily releases the mutex to load module dependencies, then it goes back to replay the transaction again. Move it to the end of the abort phase, after nft_gc_seq_end() is called.
Cc: stable@vger.kernel.org Fixes: 720344340fb9 ("netfilter: nf_tables: GC transaction race with abort path") Reported-by: Kuan-Ting Chen hexrabbit@devco.re Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -10518,11 +10518,6 @@ static int __nf_tables_abort(struct net nf_tables_abort_release(trans); }
- if (action == NFNL_ABORT_AUTOLOAD) - nf_tables_module_autoload(net); - else - nf_tables_module_autoload_cleanup(net); - return err; }
@@ -10539,6 +10534,14 @@ static int nf_tables_abort(struct net *n
WARN_ON_ONCE(!list_empty(&nft_net->commit_list));
+ /* module autoload needs to happen after GC sequence update because it + * temporarily releases and grabs mutex again. + */ + if (action == NFNL_ABORT_AUTOLOAD) + nf_tables_module_autoload(net); + else + nf_tables_module_autoload_cleanup(net); + mutex_unlock(&nft_net->commit_mutex);
return ret;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geliang Tang tanggeliang@kylinos.cn
commit 40061817d95bce6dd5634a61a65cd5922e6ccc92 upstream.
There's a bug in pm_nl_check_endpoint(): 'dev' was not parsed correctly. If it is also called in the 2nd test of endpoint_tests(), it fails with an error like this:
creation [FAIL] expected '10.0.2.2 id 2 subflow dev dev' \ found '10.0.2.2 id 2 subflow dev ns2eth2'
The reason is that 'dev' should be set from '$2', not '$1'. This patch fixes it.
Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-2-3... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -796,7 +796,7 @@ pm_nl_check_endpoint() [ -n "$_flags" ]; flags="flags $_flags" shift elif [ $1 = "dev" ]; then - [ -n "$2" ]; dev="dev $1" + [ -n "$2" ]; dev="dev $2" shift elif [ $1 = "id" ]; then _id=$2 @@ -3507,6 +3507,8 @@ endpoint_tests() local tests_pid=$!
wait_mpj $ns2 + pm_nl_check_endpoint "creation" \ + $ns2 10.0.2.2 id 2 flags subflow dev ns2eth2 chk_subflow_nr "before delete" 2 chk_mptcp_info subflows 1 subflows 1
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jesper Dangaard Brouer hawk@kernel.org
commit 037965402a010898d34f4e35327d22c0a95cd51f upstream.
Notice that skb_mark_for_recycle() was introduced, in commit 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling"), later than the commit in the Fixes tag.
It is believed that the commit in the Fixes tag was missing a call to page_pool_release_page() between v5.9 and v5.14, after which it should have used skb_mark_for_recycle(). Since v6.6 the call page_pool_release_page() has been removed (in commit 535b9c61bdef ("net: page_pool: hide page_pool_release_page()")) and the remaining callers converted (in commit 6bfef2ec0172 ("Merge branch 'net-page_pool-remove-page_pool_release_page'")).
This leak became visible in v6.8 via commit dba1b8a7ab68 ("mm/page_pool: catch page_pool memory leaks").
Cc: stable@vger.kernel.org Fixes: 6c5aa6fc4def ("xen networking: add basic XDP support for xen-netfront") Reported-by: Leonidas Spyropoulos artafinde@archlinux.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=218654 Reported-by: Arthur Borsboom arthurborsboom@gmail.com Signed-off-by: Jesper Dangaard Brouer hawk@kernel.org Link: https://lore.kernel.org/r/171154167446.2671062.9127105384591237363.stgit@fir... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/xen-netfront.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -285,6 +285,7 @@ static struct sk_buff *xennet_alloc_one_ return NULL; } skb_add_rx_frag(skb, 0, page, 0, 0, PAGE_SIZE); + skb_mark_for_recycle(skb);
/* Align ip header to a 16 bytes boundary */ skb_reserve(skb, NET_IP_ALIGN);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mahmoud Adam mngyadam@amazon.com
commit 62fc3357e079a07a22465b9b6ef71bb6ea75ee4b upstream.
cp might be NULL; calling cp->cp_conn would then produce a NULL dereference.
[Simon Horman adds:]
Analysis:
* cp is a parameter of __rds_rdma_map and is not reassigned.
* The following call-sites pass a NULL cp argument to __rds_rdma_map()
- rds_get_mr()
- rds_get_mr_for_dest
* Prior to the code above, the following assumes that cp may be NULL (which is indicative, but could itself be unnecessary)
trans_private = rs->rs_transport->get_mr(
		sg, nents, rs, &mr->r_key, cp ? cp->cp_conn : NULL,
		args->vec.addr, args->vec.bytes,
		need_odp ? ODP_ZEROBASED : ODP_NOT_NEEDED);
* The code modified by this patch is guarded by IS_ERR(trans_private), where trans_private is assigned as per the previous point in this analysis.
The only implementation of get_mr that I could locate is rds_ib_get_mr() which can return an ERR_PTR if the conn (4th) argument is NULL.
* ret is set to PTR_ERR(trans_private). rds_ib_get_mr can return ERR_PTR(-ENODEV) if the conn (4th) argument is NULL. Thus ret may be -ENODEV in which case the code in question will execute.
Conclusion:
* cp may be NULL at the point where this patch adds a check; this patch does seem to address a possible bug
Fixes: c055fc00c07b ("net/rds: fix WARNING in rds_conn_connect_if_down") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Mahmoud Adam mngyadam@amazon.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240326153132.55580-1-mngyadam@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/rds/rdma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/rds/rdma.c +++ b/net/rds/rdma.c @@ -302,7 +302,7 @@ static int __rds_rdma_map(struct rds_soc } ret = PTR_ERR(trans_private); /* Trigger connection so that its ready for the next retry */ - if (ret == -ENODEV) + if (ret == -ENODEV && cp) rds_conn_connect_if_down(cp->cp_conn); goto out; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jose Ignacio Tornos Martinez jtornosm@redhat.com
commit 2e91bb99b9d4f756e92e83c4453f894dda220f09 upstream.
After commit d2689b6a86b9 ("net: usb: ax88179_178a: avoid two consecutive device resets"), the reset is no longer executed from the bind operation and the mac address is not read from the device registers or the devicetree at that moment. Since the check that decides whether the interface's assigned mac address is random or not happens after the bind operation in usbnet_probe, the interface stays configured with a random address, although the address is correctly read and set during the open operation (the only reset now).
In order to keep only one reset for the device and to avoid the interface always being configured with a random address, configure the suitable field from the driver after reset if the mac address is read successfully from the device registers or the devicetree. Take into account whether a locally administered (random) address was previously stored.
cc: stable@vger.kernel.org # 6.6+ Fixes: d2689b6a86b9 ("net: usb: ax88179_178a: avoid two consecutive device resets") Reported-by: Dave Stevenson dave.stevenson@raspberrypi.com Signed-off-by: Jose Ignacio Tornos Martinez jtornosm@redhat.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240403132158.344838-1-jtornosm@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/ax88179_178a.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/usb/ax88179_178a.c +++ b/drivers/net/usb/ax88179_178a.c @@ -1273,6 +1273,8 @@ static void ax88179_get_mac_addr(struct
if (is_valid_ether_addr(mac)) { eth_hw_addr_set(dev->net, mac); + if (!is_local_ether_addr(mac)) + dev->net->addr_assign_type = NET_ADDR_PERM; } else { netdev_info(dev->net, "invalid MAC address, using random\n"); eth_hw_addr_random(dev->net);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Haiyang Zhang haiyangz@microsoft.com
commit c0de6ab920aafb56feab56058e46b688e694a246 upstream.
mana_get_rxbuf_cfg() aligns the RX buffer's DMA datasize to be a multiple of 64. As a result, a packet slightly bigger than mtu+14, say 1536 bytes, can be received and cause skb_over_panic.
Sample dmesg:
[ 5325.237162] skbuff: skb_over_panic: text:ffffffffc043277a len:1536 put:1536 head:ff1100018b517000 data:ff1100018b517100 tail:0x700 end:0x6ea dev:<NULL>
[ 5325.243689] ------------[ cut here ]------------
[ 5325.245748] kernel BUG at net/core/skbuff.c:192!
[ 5325.247838] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 5325.258374] RIP: 0010:skb_panic+0x4f/0x60
[ 5325.302941] Call Trace:
[ 5325.304389]  <IRQ>
[ 5325.315794]  ? skb_panic+0x4f/0x60
[ 5325.317457]  ? asm_exc_invalid_op+0x1f/0x30
[ 5325.319490]  ? skb_panic+0x4f/0x60
[ 5325.321161]  skb_put+0x4e/0x50
[ 5325.322670]  mana_poll+0x6fa/0xb50 [mana]
[ 5325.324578]  __napi_poll+0x33/0x1e0
[ 5325.326328]  net_rx_action+0x12e/0x280
As discussed internally, this alignment is not necessary. To fix this bug, remove it from the code, so oversized packets will be marked as CQE_RX_TRUNCATED by the NIC and dropped.
Cc: stable@vger.kernel.org Fixes: 2fbbd712baf1 ("net: mana: Enable RX path to handle various MTU sizes") Signed-off-by: Haiyang Zhang haiyangz@microsoft.com Reviewed-by: Dexuan Cui decui@microsoft.com Link: https://lore.kernel.org/r/1712087316-20886-1-git-send-email-haiyangz@microso... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/microsoft/mana/mana_en.c | 2 +- include/net/mana/mana.h | 1 - 2 files changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -601,7 +601,7 @@ static void mana_get_rxbuf_cfg(int mtu,
*alloc_size = mtu + MANA_RXBUF_PAD + *headroom;
- *datasize = ALIGN(mtu + ETH_HLEN, MANA_RX_DATA_ALIGN); + *datasize = mtu + ETH_HLEN; }
static int mana_pre_alloc_rxbufs(struct mana_port_context *mpc, int new_mtu) --- a/include/net/mana/mana.h +++ b/include/net/mana/mana.h @@ -39,7 +39,6 @@ enum TRI_STATE { #define COMP_ENTRY_SIZE 64
#define RX_BUFFERS_PER_QUEUE 512 -#define MANA_RX_DATA_ALIGN 64
#define MAX_SEND_BUFFERS_PER_QUEUE 256
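To make the failure mode concrete: with the default MTU of 1500, mtu + ETH_HLEN is 1514, but ALIGN(1514, 64) rounds up to 1536, so the hardware may deposit a 1536-byte frame into an skb sized for 1514 bytes, which matches the len:1536 put:1536 seen in the dmesg above. A standalone sketch of the arithmetic (not driver code):

  #include <stdio.h>

  #define ETH_HLEN 14
  /* Round x up to the next multiple of a (a must be a power of two). */
  #define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

  int main(void)
  {
          int mtu = 1500;

          printf("mtu + ETH_HLEN   = %d\n", mtu + ETH_HLEN);               /* 1514 */
          printf("aligned datasize = %d\n", ALIGN_UP(mtu + ETH_HLEN, 64)); /* 1536: room for an oversized frame */
          printf("fixed datasize   = %d\n", mtu + ETH_HLEN);               /* oversized frames now truncated by the NIC */
          return 0;
  }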
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marco Pinna marco.pinn95@gmail.com
commit b32a09ea7c38849ff925489a6bf5bd8914bc45df upstream.
Commit 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks") added virtio_transport_deliver_tap_pkt() for handing packets to the vsockmon device. However, in virtio_transport_send_pkt_work(), the function is called before actually sending the packet (i.e. before placing it in the virtqueue with virtqueue_add_sgs() and checking whether it returned successfully). Queuing the packet in the virtqueue can fail even multiple times. However, in virtio_transport_deliver_tap_pkt() we deliver the packet to the monitoring tap interface only the first time we call it. This certainly avoids seeing the same packet replicated multiple times in the monitoring interface, but it can show the packet as sent with the wrong timestamp, or even before we succeed in queuing it in the virtqueue.
Move virtio_transport_deliver_tap_pkt() after calling virtqueue_add_sgs() and making sure it returned successfully.
Fixes: 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks") Cc: stable@vger.kernel.org Signed-off-by: Marco Pinna marco.pinn95@gmail.com Reviewed-by: Stefano Garzarella sgarzare@redhat.com Link: https://lore.kernel.org/r/20240329161259.411751-1-marco.pinn95@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/vmw_vsock/virtio_transport.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -109,7 +109,6 @@ virtio_transport_send_pkt_work(struct wo if (!skb) break;
- virtio_transport_deliver_tap_pkt(skb); reply = virtio_vsock_skb_reply(skb);
sg_init_one(&hdr, virtio_vsock_hdr(skb), sizeof(*virtio_vsock_hdr(skb))); @@ -128,6 +127,8 @@ virtio_transport_send_pkt_work(struct wo break; }
+ virtio_transport_deliver_tap_pkt(skb); + if (reply) { struct virtqueue *rx_vq = vsock->vqs[VSOCK_VQ_RX]; int val;
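The ordering problem generalises to any mirror-to-monitor path: if the copy to the monitor happens before the enqueue, and the enqueue can fail and be retried later, the monitor records the packet earlier (and with an earlier timestamp) than it was actually queued. A toy userspace sketch of the corrected ordering, with an enqueue that fails a couple of times (all names invented for the illustration):

  #include <stdio.h>
  #include <stdbool.h>

  static int attempts;

  /* Pretend the virtqueue is full for the first two attempts. */
  static bool enqueue(int pkt)
  {
          (void)pkt;      /* a real implementation would add pkt to the queue */
          return ++attempts > 2;
  }

  static void deliver_to_monitor(int pkt)
  {
          printf("monitor sees packet %d on attempt %d\n", pkt, attempts);
  }

  int main(void)
  {
          int pkt = 42;

          while (!enqueue(pkt))
                  ;       /* queue full: retry later, the monitor has seen nothing yet */

          /* Only after the enqueue succeeded does the monitor see the packet. */
          deliver_to_monitor(pkt);
          return 0;
  }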
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@kernel.org
Commit aa730cff0c26244e88066b5b461a9f5fbac13823 upstream.
Move srso_alias_return_thunk() to the same section as srso_alias_safe_ret() so they can share a cache line.
Signed-off-by: Josh Poimboeuf jpoimboe@kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Acked-by: Borislav Petkov (AMD) bp@alien8.de Link: https://lore.kernel.org/r/eadaf5530b46a7ae8b936522da45ae555d2b3393.169388998... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/lib/retpoline.S | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/arch/x86/lib/retpoline.S +++ b/arch/x86/lib/retpoline.S @@ -177,15 +177,14 @@ SYM_START(srso_alias_safe_ret, SYM_L_GLO int3 SYM_FUNC_END(srso_alias_safe_ret)
- .section .text..__x86.return_thunk - -SYM_CODE_START(srso_alias_return_thunk) +SYM_CODE_START_NOALIGN(srso_alias_return_thunk) UNWIND_HINT_FUNC ANNOTATE_NOENDBR call srso_alias_safe_ret ud2 SYM_CODE_END(srso_alias_return_thunk)
+ .section .text..__x86.return_thunk /* * Some generic notes on the untraining sequences: *
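Whether the two thunks actually end up sharing a cache line is just a question of their link-time addresses falling into the same 64-byte line. A toy check of that property (the addresses below are made-up values, not real kernel symbols):

  #include <stdio.h>
  #include <stdint.h>

  #define CACHE_LINE 64

  static int same_cache_line(uint64_t a, uint64_t b)
  {
          return (a / CACHE_LINE) == (b / CACHE_LINE);
  }

  int main(void)
  {
          /* Two small thunks placed back to back in the same section. */
          uint64_t safe_ret_addr = 0xffffffff82000040ULL;
          uint64_t return_thunk_addr = 0xffffffff82000055ULL;

          printf("share a cache line: %d\n",
                 same_cache_line(safe_ret_addr, return_thunk_addr));
          return 0;
  }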
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@kernel.org
Commit 34a3cae7474c6e6f4a85aad4a7b8191b8b35cdcd upstream.
CONFIG_RETHUNK, CONFIG_CPU_UNRET_ENTRY and CONFIG_CPU_SRSO are all tangled up. De-spaghettify the code a bit.
Some of the rethunk-related code has been shuffled around within the '.text..__x86.return_thunk' section, but otherwise there are no functional changes. srso_alias_untrain_ret() and srso_alias_safe_ret() (which are very address-sensitive) haven't moved.
Signed-off-by: Josh Poimboeuf jpoimboe@kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Acked-by: Borislav Petkov (AMD) bp@alien8.de Link: https://lore.kernel.org/r/2845084ed303d8384905db3b87b77693945302b4.169388998... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/nospec-branch.h | 25 +++-- arch/x86/kernel/cpu/bugs.c | 5 - arch/x86/kernel/vmlinux.lds.S | 7 - arch/x86/lib/retpoline.S | 158 +++++++++++++++++++---------------- 4 files changed, 109 insertions(+), 86 deletions(-)
--- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -289,19 +289,17 @@ * where we have a stack but before any RET instruction. */ .macro UNTRAIN_RET -#if defined(CONFIG_CPU_UNRET_ENTRY) || defined(CONFIG_CPU_IBPB_ENTRY) || \ - defined(CONFIG_CALL_DEPTH_TRACKING) || defined(CONFIG_CPU_SRSO) +#if defined(CONFIG_RETHUNK) || defined(CONFIG_CPU_IBPB_ENTRY) VALIDATE_UNRET_END ALTERNATIVE_3 "", \ CALL_UNTRAIN_RET, X86_FEATURE_UNRET, \ "call entry_ibpb", X86_FEATURE_ENTRY_IBPB, \ - __stringify(RESET_CALL_DEPTH), X86_FEATURE_CALL_DEPTH + __stringify(RESET_CALL_DEPTH), X86_FEATURE_CALL_DEPTH #endif .endm
.macro UNTRAIN_RET_VM -#if defined(CONFIG_CPU_UNRET_ENTRY) || defined(CONFIG_CPU_IBPB_ENTRY) || \ - defined(CONFIG_CALL_DEPTH_TRACKING) || defined(CONFIG_CPU_SRSO) +#if defined(CONFIG_RETHUNK) || defined(CONFIG_CPU_IBPB_ENTRY) VALIDATE_UNRET_END ALTERNATIVE_3 "", \ CALL_UNTRAIN_RET, X86_FEATURE_UNRET, \ @@ -311,8 +309,7 @@ .endm
.macro UNTRAIN_RET_FROM_CALL -#if defined(CONFIG_CPU_UNRET_ENTRY) || defined(CONFIG_CPU_IBPB_ENTRY) || \ - defined(CONFIG_CALL_DEPTH_TRACKING) || defined(CONFIG_CPU_SRSO) +#if defined(CONFIG_RETHUNK) || defined(CONFIG_CPU_IBPB_ENTRY) VALIDATE_UNRET_END ALTERNATIVE_3 "", \ CALL_UNTRAIN_RET, X86_FEATURE_UNRET, \ @@ -359,6 +356,20 @@ extern void __x86_return_thunk(void); static inline void __x86_return_thunk(void) {} #endif
+#ifdef CONFIG_CPU_UNRET_ENTRY +extern void retbleed_return_thunk(void); +#else +static inline void retbleed_return_thunk(void) {} +#endif + +#ifdef CONFIG_CPU_SRSO +extern void srso_return_thunk(void); +extern void srso_alias_return_thunk(void); +#else +static inline void srso_return_thunk(void) {} +static inline void srso_alias_return_thunk(void) {} +#endif + extern void retbleed_return_thunk(void); extern void srso_return_thunk(void); extern void srso_alias_return_thunk(void); --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -63,7 +63,7 @@ EXPORT_SYMBOL_GPL(x86_pred_cmd);
static DEFINE_MUTEX(spec_ctrl_mutex);
-void (*x86_return_thunk)(void) __ro_after_init = &__x86_return_thunk; +void (*x86_return_thunk)(void) __ro_after_init = __x86_return_thunk;
/* Update SPEC_CTRL MSR and its cached copy unconditionally */ static void update_spec_ctrl(u64 val) @@ -1108,8 +1108,7 @@ do_cmd_auto: setup_force_cpu_cap(X86_FEATURE_RETHUNK); setup_force_cpu_cap(X86_FEATURE_UNRET);
- if (IS_ENABLED(CONFIG_RETHUNK)) - x86_return_thunk = retbleed_return_thunk; + x86_return_thunk = retbleed_return_thunk;
if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD && boot_cpu_data.x86_vendor != X86_VENDOR_HYGON) --- a/arch/x86/kernel/vmlinux.lds.S +++ b/arch/x86/kernel/vmlinux.lds.S @@ -139,10 +139,7 @@ SECTIONS STATIC_CALL_TEXT
ALIGN_ENTRY_TEXT_BEGIN -#ifdef CONFIG_CPU_SRSO *(.text..__x86.rethunk_untrain) -#endif - ENTRY_TEXT
#ifdef CONFIG_CPU_SRSO @@ -520,12 +517,12 @@ INIT_PER_CPU(irq_stack_backing_store); "fixed_percpu_data is not at start of per-cpu area"); #endif
-#ifdef CONFIG_RETHUNK +#ifdef CONFIG_CPU_UNRET_ENTRY . = ASSERT((retbleed_return_thunk & 0x3f) == 0, "retbleed_return_thunk not cacheline-aligned"); -. = ASSERT((srso_safe_ret & 0x3f) == 0, "srso_safe_ret not cacheline-aligned"); #endif
#ifdef CONFIG_CPU_SRSO +. = ASSERT((srso_safe_ret & 0x3f) == 0, "srso_safe_ret not cacheline-aligned"); /* * GNU ld cannot do XOR until 2.41. * https://sourceware.org/git/?p=binutils-gdb.git%3Ba=commit%3Bh=f6f78318fca803... --- a/arch/x86/lib/retpoline.S +++ b/arch/x86/lib/retpoline.S @@ -126,12 +126,13 @@ SYM_CODE_END(__x86_indirect_jump_thunk_a #include <asm/GEN-for-each-reg.h> #undef GEN #endif -/* - * This function name is magical and is used by -mfunction-return=thunk-extern - * for the compiler to generate JMPs to it. - */ + #ifdef CONFIG_RETHUNK
+ .section .text..__x86.return_thunk + +#ifdef CONFIG_CPU_SRSO + /* * srso_alias_untrain_ret() and srso_alias_safe_ret() are placed at * special addresses: @@ -147,9 +148,7 @@ SYM_CODE_END(__x86_indirect_jump_thunk_a * * As a result, srso_alias_safe_ret() becomes a safe return. */ -#ifdef CONFIG_CPU_SRSO - .section .text..__x86.rethunk_untrain - + .pushsection .text..__x86.rethunk_untrain SYM_START(srso_alias_untrain_ret, SYM_L_GLOBAL, SYM_A_NONE) UNWIND_HINT_FUNC ANNOTATE_NOENDBR @@ -158,17 +157,9 @@ SYM_START(srso_alias_untrain_ret, SYM_L_ jmp srso_alias_return_thunk SYM_FUNC_END(srso_alias_untrain_ret) __EXPORT_THUNK(srso_alias_untrain_ret) + .popsection
- .section .text..__x86.rethunk_safe -#else -/* dummy definition for alternatives */ -SYM_START(srso_alias_untrain_ret, SYM_L_GLOBAL, SYM_A_NONE) - ANNOTATE_UNRET_SAFE - ret - int3 -SYM_FUNC_END(srso_alias_untrain_ret) -#endif - + .pushsection .text..__x86.rethunk_safe SYM_START(srso_alias_safe_ret, SYM_L_GLOBAL, SYM_A_NONE) lea 8(%_ASM_SP), %_ASM_SP UNWIND_HINT_FUNC @@ -183,8 +174,58 @@ SYM_CODE_START_NOALIGN(srso_alias_return call srso_alias_safe_ret ud2 SYM_CODE_END(srso_alias_return_thunk) + .popsection + +/* + * SRSO untraining sequence for Zen1/2, similar to retbleed_untrain_ret() + * above. On kernel entry, srso_untrain_ret() is executed which is a + * + * movabs $0xccccc30824648d48,%rax + * + * and when the return thunk executes the inner label srso_safe_ret() + * later, it is a stack manipulation and a RET which is mispredicted and + * thus a "safe" one to use. + */ + .align 64 + .skip 64 - (srso_safe_ret - srso_untrain_ret), 0xcc +SYM_START(srso_untrain_ret, SYM_L_LOCAL, SYM_A_NONE) + ANNOTATE_NOENDBR + .byte 0x48, 0xb8 + +/* + * This forces the function return instruction to speculate into a trap + * (UD2 in srso_return_thunk() below). This RET will then mispredict + * and execution will continue at the return site read from the top of + * the stack. + */ +SYM_INNER_LABEL(srso_safe_ret, SYM_L_GLOBAL) + lea 8(%_ASM_SP), %_ASM_SP + ret + int3 + int3 + /* end of movabs */ + lfence + call srso_safe_ret + ud2 +SYM_CODE_END(srso_safe_ret) +SYM_FUNC_END(srso_untrain_ret) + +SYM_CODE_START(srso_return_thunk) + UNWIND_HINT_FUNC + ANNOTATE_NOENDBR + call srso_safe_ret + ud2 +SYM_CODE_END(srso_return_thunk) + +#define JMP_SRSO_UNTRAIN_RET "jmp srso_untrain_ret" +#define JMP_SRSO_ALIAS_UNTRAIN_RET "jmp srso_alias_untrain_ret" +#else /* !CONFIG_CPU_SRSO */ +#define JMP_SRSO_UNTRAIN_RET "ud2" +#define JMP_SRSO_ALIAS_UNTRAIN_RET "ud2" +#endif /* CONFIG_CPU_SRSO */ + +#ifdef CONFIG_CPU_UNRET_ENTRY
- .section .text..__x86.return_thunk /* * Some generic notes on the untraining sequences: * @@ -265,65 +306,21 @@ SYM_CODE_END(retbleed_return_thunk) SYM_FUNC_END(retbleed_untrain_ret) __EXPORT_THUNK(retbleed_untrain_ret)
-/* - * SRSO untraining sequence for Zen1/2, similar to retbleed_untrain_ret() - * above. On kernel entry, srso_untrain_ret() is executed which is a - * - * movabs $0xccccc30824648d48,%rax - * - * and when the return thunk executes the inner label srso_safe_ret() - * later, it is a stack manipulation and a RET which is mispredicted and - * thus a "safe" one to use. - */ - .align 64 - .skip 64 - (srso_safe_ret - srso_untrain_ret), 0xcc -SYM_START(srso_untrain_ret, SYM_L_GLOBAL, SYM_A_NONE) - ANNOTATE_NOENDBR - .byte 0x48, 0xb8 +#define JMP_RETBLEED_UNTRAIN_RET "jmp retbleed_untrain_ret" +#else /* !CONFIG_CPU_UNRET_ENTRY */ +#define JMP_RETBLEED_UNTRAIN_RET "ud2" +#endif /* CONFIG_CPU_UNRET_ENTRY */
-/* - * This forces the function return instruction to speculate into a trap - * (UD2 in srso_return_thunk() below). This RET will then mispredict - * and execution will continue at the return site read from the top of - * the stack. - */ -SYM_INNER_LABEL(srso_safe_ret, SYM_L_GLOBAL) - lea 8(%_ASM_SP), %_ASM_SP - ret - int3 - int3 - /* end of movabs */ - lfence - call srso_safe_ret - ud2 -SYM_CODE_END(srso_safe_ret) -SYM_FUNC_END(srso_untrain_ret) -__EXPORT_THUNK(srso_untrain_ret) - -SYM_CODE_START(srso_return_thunk) - UNWIND_HINT_FUNC - ANNOTATE_NOENDBR - call srso_safe_ret - ud2 -SYM_CODE_END(srso_return_thunk) +#if defined(CONFIG_CPU_UNRET_ENTRY) || defined(CONFIG_CPU_SRSO)
SYM_FUNC_START(entry_untrain_ret) - ALTERNATIVE_2 "jmp retbleed_untrain_ret", \ - "jmp srso_untrain_ret", X86_FEATURE_SRSO, \ - "jmp srso_alias_untrain_ret", X86_FEATURE_SRSO_ALIAS + ALTERNATIVE_2 JMP_RETBLEED_UNTRAIN_RET, \ + JMP_SRSO_UNTRAIN_RET, X86_FEATURE_SRSO, \ + JMP_SRSO_ALIAS_UNTRAIN_RET, X86_FEATURE_SRSO_ALIAS SYM_FUNC_END(entry_untrain_ret) __EXPORT_THUNK(entry_untrain_ret)
-SYM_CODE_START(__x86_return_thunk) - UNWIND_HINT_FUNC - ANNOTATE_NOENDBR - ANNOTATE_UNRET_SAFE - ret - int3 -SYM_CODE_END(__x86_return_thunk) -EXPORT_SYMBOL(__x86_return_thunk) - -#endif /* CONFIG_RETHUNK */ +#endif /* CONFIG_CPU_UNRET_ENTRY || CONFIG_CPU_SRSO */
#ifdef CONFIG_CALL_DEPTH_TRACKING
@@ -358,3 +355,22 @@ SYM_FUNC_START(__x86_return_skl) SYM_FUNC_END(__x86_return_skl)
#endif /* CONFIG_CALL_DEPTH_TRACKING */ + +/* + * This function name is magical and is used by -mfunction-return=thunk-extern + * for the compiler to generate JMPs to it. + * + * This code is only used during kernel boot or module init. All + * 'JMP __x86_return_thunk' sites are changed to something else by + * apply_returns(). + */ +SYM_CODE_START(__x86_return_thunk) + UNWIND_HINT_FUNC + ANNOTATE_NOENDBR + ANNOTATE_UNRET_SAFE + ret + int3 +SYM_CODE_END(__x86_return_thunk) +EXPORT_SYMBOL(__x86_return_thunk) + +#endif /* CONFIG_RETHUNK */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@kernel.org
Commit e8efc0800b8b5045ba8c0d1256bfbb47e92e192a upstream.
Factor out the UNTRAIN_RET[_*] common bits into a helper macro.
Signed-off-by: Josh Poimboeuf jpoimboe@kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Acked-by: Borislav Petkov (AMD) bp@alien8.de Link: https://lore.kernel.org/r/f06d45489778bd49623297af2a983eea09067a74.169388998... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/nospec-branch.h | 31 ++++++++++--------------------- 1 file changed, 10 insertions(+), 21 deletions(-)
--- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -288,35 +288,24 @@ * As such, this must be placed after every *SWITCH_TO_KERNEL_CR3 at a point * where we have a stack but before any RET instruction. */ -.macro UNTRAIN_RET +.macro __UNTRAIN_RET ibpb_feature, call_depth_insns #if defined(CONFIG_RETHUNK) || defined(CONFIG_CPU_IBPB_ENTRY) VALIDATE_UNRET_END ALTERNATIVE_3 "", \ CALL_UNTRAIN_RET, X86_FEATURE_UNRET, \ - "call entry_ibpb", X86_FEATURE_ENTRY_IBPB, \ - __stringify(RESET_CALL_DEPTH), X86_FEATURE_CALL_DEPTH + "call entry_ibpb", \ibpb_feature, \ + __stringify(\call_depth_insns), X86_FEATURE_CALL_DEPTH #endif .endm
-.macro UNTRAIN_RET_VM -#if defined(CONFIG_RETHUNK) || defined(CONFIG_CPU_IBPB_ENTRY) - VALIDATE_UNRET_END - ALTERNATIVE_3 "", \ - CALL_UNTRAIN_RET, X86_FEATURE_UNRET, \ - "call entry_ibpb", X86_FEATURE_IBPB_ON_VMEXIT, \ - __stringify(RESET_CALL_DEPTH), X86_FEATURE_CALL_DEPTH -#endif -.endm +#define UNTRAIN_RET \ + __UNTRAIN_RET X86_FEATURE_ENTRY_IBPB, __stringify(RESET_CALL_DEPTH)
-.macro UNTRAIN_RET_FROM_CALL -#if defined(CONFIG_RETHUNK) || defined(CONFIG_CPU_IBPB_ENTRY) - VALIDATE_UNRET_END - ALTERNATIVE_3 "", \ - CALL_UNTRAIN_RET, X86_FEATURE_UNRET, \ - "call entry_ibpb", X86_FEATURE_ENTRY_IBPB, \ - __stringify(RESET_CALL_DEPTH_FROM_CALL), X86_FEATURE_CALL_DEPTH -#endif -.endm +#define UNTRAIN_RET_VM \ + __UNTRAIN_RET X86_FEATURE_IBPB_ON_VMEXIT, __stringify(RESET_CALL_DEPTH) + +#define UNTRAIN_RET_FROM_CALL \ + __UNTRAIN_RET X86_FEATURE_ENTRY_IBPB, __stringify(RESET_CALL_DEPTH_FROM_CALL)
.macro CALL_DEPTH_ACCOUNT
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Borislav Petkov (AMD)" bp@alien8.de
Commit 4535e1a4174c4111d92c5a9a21e542d232e0fcaa upstream.
The original version of the mitigation would patch in the calls to the untraining routines directly. That is, the alternative() in UNTRAIN_RET will patch in the CALL to srso_alias_untrain_ret() directly.
However, even if commit e7c25c441e9e ("x86/cpu: Cleanup the untrain mess") meant well in trying to clean up the situation, due to microarchitectural reasons, the untraining routine srso_alias_untrain_ret() must be the target of a CALL instruction and not of a JMP instruction as it is done now.
Reshuffle the alternative macros to accomplish that.
Fixes: e7c25c441e9e ("x86/cpu: Cleanup the untrain mess") Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: stable@kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/asm-prototypes.h | 1 + arch/x86/include/asm/nospec-branch.h | 21 ++++++++++++++++----- arch/x86/lib/retpoline.S | 10 +++++----- 3 files changed, 22 insertions(+), 10 deletions(-)
--- a/arch/x86/include/asm/asm-prototypes.h +++ b/arch/x86/include/asm/asm-prototypes.h @@ -13,6 +13,7 @@ #include <asm/preempt.h> #include <asm/asm.h> #include <asm/gsseg.h> +#include <asm/nospec-branch.h>
#ifndef CONFIG_X86_CMPXCHG64 extern void cmpxchg8b_emu(void); --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -271,11 +271,20 @@ .Lskip_rsb_@: .endm
+/* + * The CALL to srso_alias_untrain_ret() must be patched in directly at + * the spot where untraining must be done, ie., srso_alias_untrain_ret() + * must be the target of a CALL instruction instead of indirectly + * jumping to a wrapper which then calls it. Therefore, this macro is + * called outside of __UNTRAIN_RET below, for the time being, before the + * kernel can support nested alternatives with arbitrary nesting. + */ +.macro CALL_UNTRAIN_RET #if defined(CONFIG_CPU_UNRET_ENTRY) || defined(CONFIG_CPU_SRSO) -#define CALL_UNTRAIN_RET "call entry_untrain_ret" -#else -#define CALL_UNTRAIN_RET "" + ALTERNATIVE_2 "", "call entry_untrain_ret", X86_FEATURE_UNRET, \ + "call srso_alias_untrain_ret", X86_FEATURE_SRSO_ALIAS #endif +.endm
/* * Mitigate RETBleed for AMD/Hygon Zen uarch. Requires KERNEL CR3 because the @@ -291,8 +300,8 @@ .macro __UNTRAIN_RET ibpb_feature, call_depth_insns #if defined(CONFIG_RETHUNK) || defined(CONFIG_CPU_IBPB_ENTRY) VALIDATE_UNRET_END - ALTERNATIVE_3 "", \ - CALL_UNTRAIN_RET, X86_FEATURE_UNRET, \ + CALL_UNTRAIN_RET + ALTERNATIVE_2 "", \ "call entry_ibpb", \ibpb_feature, \ __stringify(\call_depth_insns), X86_FEATURE_CALL_DEPTH #endif @@ -351,6 +360,8 @@ extern void retbleed_return_thunk(void); static inline void retbleed_return_thunk(void) {} #endif
+extern void srso_alias_untrain_ret(void); + #ifdef CONFIG_CPU_SRSO extern void srso_return_thunk(void); extern void srso_alias_return_thunk(void); --- a/arch/x86/lib/retpoline.S +++ b/arch/x86/lib/retpoline.S @@ -218,10 +218,12 @@ SYM_CODE_START(srso_return_thunk) SYM_CODE_END(srso_return_thunk)
#define JMP_SRSO_UNTRAIN_RET "jmp srso_untrain_ret" -#define JMP_SRSO_ALIAS_UNTRAIN_RET "jmp srso_alias_untrain_ret" #else /* !CONFIG_CPU_SRSO */ #define JMP_SRSO_UNTRAIN_RET "ud2" -#define JMP_SRSO_ALIAS_UNTRAIN_RET "ud2" +/* Dummy for the alternative in CALL_UNTRAIN_RET. */ +SYM_CODE_START(srso_alias_untrain_ret) + RET +SYM_FUNC_END(srso_alias_untrain_ret) #endif /* CONFIG_CPU_SRSO */
#ifdef CONFIG_CPU_UNRET_ENTRY @@ -314,9 +316,7 @@ __EXPORT_THUNK(retbleed_untrain_ret) #if defined(CONFIG_CPU_UNRET_ENTRY) || defined(CONFIG_CPU_SRSO)
SYM_FUNC_START(entry_untrain_ret) - ALTERNATIVE_2 JMP_RETBLEED_UNTRAIN_RET, \ - JMP_SRSO_UNTRAIN_RET, X86_FEATURE_SRSO, \ - JMP_SRSO_ALIAS_UNTRAIN_RET, X86_FEATURE_SRSO_ALIAS + ALTERNATIVE JMP_RETBLEED_UNTRAIN_RET, JMP_SRSO_UNTRAIN_RET, X86_FEATURE_SRSO SYM_FUNC_END(entry_untrain_ret) __EXPORT_THUNK(entry_untrain_ret)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit 994209ddf4f430946f6247616b2e33d179243769 upstream.
When dormant flag is toggled, hooks are disabled in the commit phase by iterating over current chains in table (existing and new).
The following configuration allows for an inconsistent state:
add table x
add chain x y { type filter hook input priority 0; }
add table x { flags dormant; }
add chain x w { type filter hook input priority 1; }
which triggers the following warning when trying to unregister chain w which is already unregistered.
[ 127.322252] WARNING: CPU: 7 PID: 1211 at net/netfilter/core.c:50 1 __nf_unregister_net_hook+0x21a/0x260 [...] [ 127.322519] Call Trace: [ 127.322521] <TASK> [ 127.322524] ? __warn+0x9f/0x1a0 [ 127.322531] ? __nf_unregister_net_hook+0x21a/0x260 [ 127.322537] ? report_bug+0x1b1/0x1e0 [ 127.322545] ? handle_bug+0x3c/0x70 [ 127.322552] ? exc_invalid_op+0x17/0x40 [ 127.322556] ? asm_exc_invalid_op+0x1a/0x20 [ 127.322563] ? kasan_save_free_info+0x3b/0x60 [ 127.322570] ? __nf_unregister_net_hook+0x6a/0x260 [ 127.322577] ? __nf_unregister_net_hook+0x21a/0x260 [ 127.322583] ? __nf_unregister_net_hook+0x6a/0x260 [ 127.322590] ? __nf_tables_unregister_hook+0x8a/0xe0 [nf_tables] [ 127.322655] nft_table_disable+0x75/0xf0 [nf_tables] [ 127.322717] nf_tables_commit+0x2571/0x2620 [nf_tables]
Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -2439,6 +2439,9 @@ static int nf_tables_addchain(struct nft struct nft_stats __percpu *stats = NULL; struct nft_chain_hook hook = {};
+ if (table->flags & __NFT_TABLE_F_UPDATE) + return -EINVAL; + if (flags & NFT_CHAIN_BINDING) return -EOPNOTSUPP;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit 24cea9677025e0de419989ecb692acd4bb34cac2 upstream.
Similar to 2c9f0293280e ("netfilter: nf_tables: flush pending destroy work before netlink notifier") to address a race between exit_net and the destroy workqueue.
The trace below shows an element to be released via destroy workqueue while exit_net path (triggered via module removal) has already released the set that is used in such transaction.
[ 1360.547789] BUG: KASAN: slab-use-after-free in nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables] [ 1360.547861] Read of size 8 at addr ffff888140500cc0 by task kworker/4:1/152465 [ 1360.547870] CPU: 4 PID: 152465 Comm: kworker/4:1 Not tainted 6.8.0+ #359 [ 1360.547882] Workqueue: events nf_tables_trans_destroy_work [nf_tables] [ 1360.547984] Call Trace: [ 1360.547991] <TASK> [ 1360.547998] dump_stack_lvl+0x53/0x70 [ 1360.548014] print_report+0xc4/0x610 [ 1360.548026] ? __virt_addr_valid+0xba/0x160 [ 1360.548040] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 1360.548054] ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables] [ 1360.548176] kasan_report+0xae/0xe0 [ 1360.548189] ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables] [ 1360.548312] nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables] [ 1360.548447] ? __pfx_nf_tables_trans_destroy_work+0x10/0x10 [nf_tables] [ 1360.548577] ? _raw_spin_unlock_irq+0x18/0x30 [ 1360.548591] process_one_work+0x2f1/0x670 [ 1360.548610] worker_thread+0x4d3/0x760 [ 1360.548627] ? __pfx_worker_thread+0x10/0x10 [ 1360.548640] kthread+0x16b/0x1b0 [ 1360.548653] ? __pfx_kthread+0x10/0x10 [ 1360.548665] ret_from_fork+0x2f/0x50 [ 1360.548679] ? __pfx_kthread+0x10/0x10 [ 1360.548690] ret_from_fork_asm+0x1a/0x30 [ 1360.548707] </TASK>
[ 1360.548719] Allocated by task 192061: [ 1360.548726] kasan_save_stack+0x20/0x40 [ 1360.548739] kasan_save_track+0x14/0x30 [ 1360.548750] __kasan_kmalloc+0x8f/0xa0 [ 1360.548760] __kmalloc_node+0x1f1/0x450 [ 1360.548771] nf_tables_newset+0x10c7/0x1b50 [nf_tables] [ 1360.548883] nfnetlink_rcv_batch+0xbc4/0xdc0 [nfnetlink] [ 1360.548909] nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink] [ 1360.548927] netlink_unicast+0x367/0x4f0 [ 1360.548935] netlink_sendmsg+0x34b/0x610 [ 1360.548944] ____sys_sendmsg+0x4d4/0x510 [ 1360.548953] ___sys_sendmsg+0xc9/0x120 [ 1360.548961] __sys_sendmsg+0xbe/0x140 [ 1360.548971] do_syscall_64+0x55/0x120 [ 1360.548982] entry_SYSCALL_64_after_hwframe+0x55/0x5d
[ 1360.548994] Freed by task 192222: [ 1360.548999] kasan_save_stack+0x20/0x40 [ 1360.549009] kasan_save_track+0x14/0x30 [ 1360.549019] kasan_save_free_info+0x3b/0x60 [ 1360.549028] poison_slab_object+0x100/0x180 [ 1360.549036] __kasan_slab_free+0x14/0x30 [ 1360.549042] kfree+0xb6/0x260 [ 1360.549049] __nft_release_table+0x473/0x6a0 [nf_tables] [ 1360.549131] nf_tables_exit_net+0x170/0x240 [nf_tables] [ 1360.549221] ops_exit_list+0x50/0xa0 [ 1360.549229] free_exit_list+0x101/0x140 [ 1360.549236] unregister_pernet_operations+0x107/0x160 [ 1360.549245] unregister_pernet_subsys+0x1c/0x30 [ 1360.549254] nf_tables_module_exit+0x43/0x80 [nf_tables] [ 1360.549345] __do_sys_delete_module+0x253/0x370 [ 1360.549352] do_syscall_64+0x55/0x120 [ 1360.549360] entry_SYSCALL_64_after_hwframe+0x55/0x5d
(gdb) list *__nft_release_table+0x473 0x1e033 is in __nft_release_table (net/netfilter/nf_tables_api.c:11354). 11349 list_for_each_entry_safe(flowtable, nf, &table->flowtables, list) { 11350 list_del(&flowtable->list); 11351 nft_use_dec(&table->use); 11352 nf_tables_flowtable_destroy(flowtable); 11353 } 11354 list_for_each_entry_safe(set, ns, &table->sets, list) { 11355 list_del(&set->list); 11356 nft_use_dec(&table->use); 11357 if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT)) 11358 nft_map_deactivate(&ctx, set); (gdb)
[ 1360.549372] Last potentially related work creation: [ 1360.549376] kasan_save_stack+0x20/0x40 [ 1360.549384] __kasan_record_aux_stack+0x9b/0xb0 [ 1360.549392] __queue_work+0x3fb/0x780 [ 1360.549399] queue_work_on+0x4f/0x60 [ 1360.549407] nft_rhash_remove+0x33b/0x340 [nf_tables] [ 1360.549516] nf_tables_commit+0x1c6a/0x2620 [nf_tables] [ 1360.549625] nfnetlink_rcv_batch+0x728/0xdc0 [nfnetlink] [ 1360.549647] nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink] [ 1360.549671] netlink_unicast+0x367/0x4f0 [ 1360.549680] netlink_sendmsg+0x34b/0x610 [ 1360.549690] ____sys_sendmsg+0x4d4/0x510 [ 1360.549697] ___sys_sendmsg+0xc9/0x120 [ 1360.549706] __sys_sendmsg+0xbe/0x140 [ 1360.549715] do_syscall_64+0x55/0x120 [ 1360.549725] entry_SYSCALL_64_after_hwframe+0x55/0x5d
Fixes: 0935d5588400 ("netfilter: nf_tables: asynchronous release") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -11440,6 +11440,7 @@ static void __exit nf_tables_module_exit unregister_netdevice_notifier(&nf_tables_flowtable_notifier); nft_chain_filter_fini(); nft_chain_route_fini(); + nf_tables_trans_destroy_flush_work(); unregister_pernet_subsys(&nf_tables_net_ops); cancel_work_sync(&trans_gc_work); cancel_work_sync(&trans_destroy_work);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ziyang Xuan william.xuanziyang@huawei.com
commit 24225011d81b471acc0e1e315b7d9905459a6304 upstream.
nft_unregister_flowtable_type() within nf_flow_inet_module_exit() can run concurrently with __nft_flowtable_type_get() within nf_tables_newflowtable(), and there is no protection when iterating over the nf_tables_flowtables list in __nft_flowtable_type_get(). Therefore, there is a potential data race on nf_tables_flowtables list entries.
Use list_for_each_entry_rcu() to iterate over nf_tables_flowtables list in __nft_flowtable_type_get(), and use rcu_read_lock() in the caller nft_flowtable_type_get() to protect the entire type query process.
Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend") Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -8167,11 +8167,12 @@ static int nft_flowtable_parse_hook(cons return err; }
+/* call under rcu_read_lock */ static const struct nf_flowtable_type *__nft_flowtable_type_get(u8 family) { const struct nf_flowtable_type *type;
- list_for_each_entry(type, &nf_tables_flowtables, list) { + list_for_each_entry_rcu(type, &nf_tables_flowtables, list) { if (family == type->family) return type; } @@ -8183,9 +8184,13 @@ nft_flowtable_type_get(struct net *net, { const struct nf_flowtable_type *type;
+ rcu_read_lock(); type = __nft_flowtable_type_get(family); - if (type != NULL && try_module_get(type->owner)) + if (type != NULL && try_module_get(type->owner)) { + rcu_read_unlock(); return type; + } + rcu_read_unlock();
lockdep_nfnl_nft_mutex_not_held(); #ifdef CONFIG_MODULES
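The shape of the fix is a common RCU pattern: look the object up inside the read-side critical section and take a reference on it (here via try_module_get()) before calling rcu_read_unlock(), so the object cannot go away between lookup and use. A userspace analogue of that pattern using a rwlock and a refcount (a sketch only, not the nf_tables code):

  #include <stdio.h>
  #include <pthread.h>

  struct flowtable_type {
          int family;
          int refcnt;
  };

  static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;
  static struct flowtable_type registered = { .family = 2, .refcnt = 0 };

  static struct flowtable_type *type_get(int family)
  {
          struct flowtable_type *found = NULL;

          pthread_rwlock_rdlock(&lock);
          if (registered.family == family) {
                  registered.refcnt++;    /* take the reference before dropping the lock */
                  found = &registered;
          }
          pthread_rwlock_unlock(&lock);   /* only now may a writer tear the entry down */
          return found;
  }

  int main(void)
  {
          struct flowtable_type *t = type_get(2);

          printf("found: %s, refcnt: %d\n", t ? "yes" : "no", t ? t->refcnt : 0);
          return 0;
  }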
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit 1bc83a019bbe268be3526406245ec28c2458a518 upstream.
Hook unregistration is deferred to the commit phase, same occurs with hook updates triggered by the table dormant flag. When both commands are combined, this results in deleting a basechain while leaving its hook still registered in the core.
Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -1209,10 +1209,11 @@ static bool nft_table_pending_update(con return true;
list_for_each_entry(trans, &nft_net->commit_list, list) { - if ((trans->msg_type == NFT_MSG_NEWCHAIN || - trans->msg_type == NFT_MSG_DELCHAIN) && - trans->ctx.table == ctx->table && - nft_trans_chain_update(trans)) + if (trans->ctx.table == ctx->table && + ((trans->msg_type == NFT_MSG_NEWCHAIN && + nft_trans_chain_update(trans)) || + (trans->msg_type == NFT_MSG_DELCHAIN && + nft_is_base_chain(trans->ctx.chain)))) return true; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit 0c83842df40f86e529db6842231154772c20edcc upstream.
I got multiple syzbot reports showing old bugs exposed by BPF after commit 20f2505fb436 ("bpf: Try to avoid kzalloc in cgroup/{s,g}etsockopt")
setsockopt() @optlen argument should be taken into account before copying data.
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline] BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline] BUG: KASAN: slab-out-of-bounds in do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline] BUG: KASAN: slab-out-of-bounds in do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627 Read of size 96 at addr ffff88802cd73da0 by task syz-executor.4/7238
CPU: 1 PID: 7238 Comm: syz-executor.4 Not tainted 6.9.0-rc2-next-20240403-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 kasan_check_range+0x282/0x290 mm/kasan/generic.c:189 __asan_memcpy+0x29/0x70 mm/kasan/shadow.c:105 copy_from_sockptr_offset include/linux/sockptr.h:49 [inline] copy_from_sockptr include/linux/sockptr.h:55 [inline] do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline] do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627 nf_setsockopt+0x295/0x2c0 net/netfilter/nf_sockopt.c:101 do_sock_setsockopt+0x3af/0x720 net/socket.c:2311 __sys_setsockopt+0x1ae/0x250 net/socket.c:2334 __do_sys_setsockopt net/socket.c:2343 [inline] __se_sys_setsockopt net/socket.c:2340 [inline] __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340 do_syscall_64+0xfb/0x240 entry_SYSCALL_64_after_hwframe+0x72/0x7a RIP: 0033:0x7fd22067dde9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fd21f9ff0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 00007fd2207abf80 RCX: 00007fd22067dde9 RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00007fd2206ca47a R08: 0000000000000001 R09: 0000000000000000 R10: 0000000020000880 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007fd2207abf80 R15: 00007ffd2d0170d8 </TASK>
Allocated by task 7238: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:370 [inline] __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387 kasan_kmalloc include/linux/kasan.h:211 [inline] __do_kmalloc_node mm/slub.c:4069 [inline] __kmalloc_noprof+0x200/0x410 mm/slub.c:4082 kmalloc_noprof include/linux/slab.h:664 [inline] __cgroup_bpf_run_filter_setsockopt+0xd47/0x1050 kernel/bpf/cgroup.c:1869 do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293 __sys_setsockopt+0x1ae/0x250 net/socket.c:2334 __do_sys_setsockopt net/socket.c:2343 [inline] __se_sys_setsockopt net/socket.c:2340 [inline] __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340 do_syscall_64+0xfb/0x240 entry_SYSCALL_64_after_hwframe+0x72/0x7a
The buggy address belongs to the object at ffff88802cd73da0 which belongs to the cache kmalloc-8 of size 8 The buggy address is located 0 bytes inside of allocated 1-byte region [ffff88802cd73da0, ffff88802cd73da1)
The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802cd73020 pfn:0x2cd73 flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff) page_type: 0xffffefff(slab) raw: 00fff80000000000 ffff888015041280 dead000000000100 dead000000000122 raw: ffff88802cd73020 000000008080007f 00000001ffffefff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 5103, tgid 2119833701 (syz-executor.4), ts 5103, free_ts 70804600828 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1490 prep_new_page mm/page_alloc.c:1498 [inline] get_page_from_freelist+0x2e7e/0x2f40 mm/page_alloc.c:3454 __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4712 __alloc_pages_node_noprof include/linux/gfp.h:244 [inline] alloc_pages_node_noprof include/linux/gfp.h:271 [inline] alloc_slab_page+0x5f/0x120 mm/slub.c:2249 allocate_slab+0x5a/0x2e0 mm/slub.c:2412 new_slab mm/slub.c:2465 [inline] ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3615 __slab_alloc+0x58/0xa0 mm/slub.c:3705 __slab_alloc_node mm/slub.c:3758 [inline] slab_alloc_node mm/slub.c:3936 [inline] __do_kmalloc_node mm/slub.c:4068 [inline] kmalloc_node_track_caller_noprof+0x286/0x450 mm/slub.c:4089 kstrdup+0x3a/0x80 mm/util.c:62 device_rename+0xb5/0x1b0 drivers/base/core.c:4558 dev_change_name+0x275/0x860 net/core/dev.c:1232 do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2864 __rtnl_newlink net/core/rtnetlink.c:3680 [inline] rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3727 rtnetlink_rcv_msg+0x89b/0x10d0 net/core/rtnetlink.c:6594 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361 page last free pid 5146 tgid 5146 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1110 [inline] free_unref_page+0xd3c/0xec0 mm/page_alloc.c:2617 discard_slab mm/slub.c:2511 [inline] __put_partials+0xeb/0x130 mm/slub.c:2980 put_cpu_partial+0x17c/0x250 mm/slub.c:3055 __slab_free+0x2ea/0x3d0 mm/slub.c:4254 qlink_free mm/kasan/quarantine.c:163 [inline] qlist_free_all+0x9e/0x140 mm/kasan/quarantine.c:179 kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286 __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:322 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slub.c:3888 [inline] slab_alloc_node mm/slub.c:3948 [inline] __do_kmalloc_node mm/slub.c:4068 [inline] __kmalloc_node_noprof+0x1d7/0x450 mm/slub.c:4076 kmalloc_node_noprof include/linux/slab.h:681 [inline] kvmalloc_node_noprof+0x72/0x190 mm/util.c:634 bucket_table_alloc lib/rhashtable.c:186 [inline] rhashtable_rehash_alloc+0x9e/0x290 lib/rhashtable.c:367 rht_deferred_worker+0x4e1/0x2440 lib/rhashtable.c:427 process_one_work kernel/workqueue.c:3218 [inline] process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299 worker_thread+0x86d/0xd70 kernel/workqueue.c:3380 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
Memory state around the buggy address: ffff88802cd73c80: 07 fc fc fc 05 fc fc fc 05 fc fc fc fa fc fc fc ffff88802cd73d00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
ffff88802cd73d80: fa fc fc fc 01 fc fc fc fa fc fc fc fa fc fc fc
^ ffff88802cd73e00: fa fc fc fc fa fc fc fc 05 fc fc fc 07 fc fc fc ffff88802cd73e80: 07 fc fc fc 07 fc fc fc 07 fc fc fc 07 fc fc fc
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Pablo Neira Ayuso pablo@netfilter.org Link: https://lore.kernel.org/r/20240404122051.2303764-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bridge/netfilter/ebtables.c | 6 ++++++ net/ipv4/netfilter/arp_tables.c | 4 ++++ net/ipv4/netfilter/ip_tables.c | 4 ++++ net/ipv6/netfilter/ip6_tables.c | 4 ++++ 4 files changed, 18 insertions(+)
--- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -1111,6 +1111,8 @@ static int do_replace(struct net *net, s struct ebt_table_info *newinfo; struct ebt_replace tmp;
+ if (len < sizeof(tmp)) + return -EINVAL; if (copy_from_sockptr(&tmp, arg, sizeof(tmp)) != 0) return -EFAULT;
@@ -1423,6 +1425,8 @@ static int update_counters(struct net *n { struct ebt_replace hlp;
+ if (len < sizeof(hlp)) + return -EINVAL; if (copy_from_sockptr(&hlp, arg, sizeof(hlp))) return -EFAULT;
@@ -2352,6 +2356,8 @@ static int compat_update_counters(struct { struct compat_ebt_replace hlp;
+ if (len < sizeof(hlp)) + return -EINVAL; if (copy_from_sockptr(&hlp, arg, sizeof(hlp))) return -EFAULT;
--- a/net/ipv4/netfilter/arp_tables.c +++ b/net/ipv4/netfilter/arp_tables.c @@ -956,6 +956,8 @@ static int do_replace(struct net *net, s void *loc_cpu_entry; struct arpt_entry *iter;
+ if (len < sizeof(tmp)) + return -EINVAL; if (copy_from_sockptr(&tmp, arg, sizeof(tmp)) != 0) return -EFAULT;
@@ -1254,6 +1256,8 @@ static int compat_do_replace(struct net void *loc_cpu_entry; struct arpt_entry *iter;
+ if (len < sizeof(tmp)) + return -EINVAL; if (copy_from_sockptr(&tmp, arg, sizeof(tmp)) != 0) return -EFAULT;
--- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -1108,6 +1108,8 @@ do_replace(struct net *net, sockptr_t ar void *loc_cpu_entry; struct ipt_entry *iter;
+ if (len < sizeof(tmp)) + return -EINVAL; if (copy_from_sockptr(&tmp, arg, sizeof(tmp)) != 0) return -EFAULT;
@@ -1492,6 +1494,8 @@ compat_do_replace(struct net *net, sockp void *loc_cpu_entry; struct ipt_entry *iter;
+ if (len < sizeof(tmp)) + return -EINVAL; if (copy_from_sockptr(&tmp, arg, sizeof(tmp)) != 0) return -EFAULT;
--- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -1125,6 +1125,8 @@ do_replace(struct net *net, sockptr_t ar void *loc_cpu_entry; struct ip6t_entry *iter;
+ if (len < sizeof(tmp)) + return -EINVAL; if (copy_from_sockptr(&tmp, arg, sizeof(tmp)) != 0) return -EFAULT;
@@ -1501,6 +1503,8 @@ compat_do_replace(struct net *net, sockp void *loc_cpu_entry; struct ip6t_entry *iter;
+ if (len < sizeof(tmp)) + return -EINVAL; if (copy_from_sockptr(&tmp, arg, sizeof(tmp)) != 0) return -EFAULT;
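All four hunks close the same hole: the code copies sizeof(struct ...) bytes out of a user-supplied option buffer whose length may be smaller, reading past its end. A userspace sketch of the guard being added (struct and names invented; memcpy() stands in for copy_from_sockptr()):

  #include <stdio.h>
  #include <string.h>
  #include <errno.h>

  struct replace_hdr {
          char name[32];
          unsigned int num_entries;
          unsigned int size;
  };

  static int do_replace(const void *arg, size_t len)
  {
          struct replace_hdr tmp;

          if (len < sizeof(tmp))
                  return -EINVAL;         /* would otherwise read past the end of 'arg' */
          memcpy(&tmp, arg, sizeof(tmp)); /* stands in for copy_from_sockptr() */
          return 0;
  }

  int main(void)
  {
          char short_buf[8] = "short";
          struct replace_hdr full = { .name = "filter", .num_entries = 1, .size = 64 };

          printf("short buffer -> %d\n", do_replace(short_buf, sizeof(short_buf)));
          printf("full header  -> %d\n", do_replace(&full, sizeof(full)));
          return 0;
  }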
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
commit de3f64b738af57e2732b91a0774facc675b75b54 upstream.
If a load_nls_xxx() function fails a few lines above, 'sbi->bdi_id' is still 0. So, in the error handling path, we will call ida_simple_remove(..., 0) for an id that has not been allocated yet.
In order to prevent a spurious "ida_free called for id=0 which is not allocated." message, tweak the error handling path and add a new label.
Fixes: 0fd169576648 ("fs: Add VirtualBox guest shared folder (vboxsf) support") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Link: https://lore.kernel.org/r/d09eaaa4e2e08206c58a1a27ca9b3e81dc168773.169883573... Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/vboxsf/super.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/vboxsf/super.c +++ b/fs/vboxsf/super.c @@ -151,7 +151,7 @@ static int vboxsf_fill_super(struct supe if (!sbi->nls) { vbg_err("vboxsf: Count not load '%s' nls\n", nls_name); err = -EINVAL; - goto fail_free; + goto fail_destroy_idr; } }
@@ -224,6 +224,7 @@ fail_free: ida_simple_remove(&vboxsf_bdi_ida, sbi->bdi_id); if (sbi->nls) unload_nls(sbi->nls); +fail_destroy_idr: idr_destroy(&sbi->ino_idr); kfree(sbi); return err;
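The fix follows the usual rule for goto-based unwinding: each error label must undo only what has already been set up, so a failure before the id allocation must skip the label that frees it. A generic sketch of the ordering (not vboxsf code; the two resources are stand-ins):

  #include <stdio.h>
  #include <stdlib.h>

  struct ctx {
          int *idr;       /* set up first */
          int *nls;       /* set up second, may fail to load */
  };

  /* Error labels are ordered so each one undoes only what already succeeded. */
  static int fill_super(struct ctx *c, int fail_nls)
  {
          c->idr = malloc(sizeof(*c->idr));
          if (!c->idr)
                  return -1;

          c->nls = fail_nls ? NULL : malloc(sizeof(*c->nls));
          if (!c->nls)
                  goto fail_destroy_idr;  /* not a later label: nls was never allocated */

          return 0;                       /* success: caller owns both resources */

  fail_destroy_idr:
          free(c->idr);
          return -1;
  }

  int main(void)
  {
          struct ctx c;

          /* Simulate the nls load failing: only the idr is torn down. */
          printf("nls load fails -> %d\n", fill_super(&c, 1));
          return 0;
  }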
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Sitnicki jakub@cloudflare.com
commit ff91059932401894e6c86341915615c5eb0eca48 upstream.
syzkaller started using corpuses where a BPF tracing program deletes elements from a sockmap/sockhash map. Because BPF tracing programs can be invoked from any interrupt context, locks taken during a map_delete_elem operation must be hardirq-safe. Otherwise a deadlock due to lock inversion is possible, as reported by lockdep:
         CPU0                           CPU1
         ----                           ----
  lock(&htab->buckets[i].lock);
                                 local_irq_disable();
                                 lock(&host->lock);
                                 lock(&htab->buckets[i].lock);
  <Interrupt>
  lock(&host->lock);
Locks in sockmap are hardirq-unsafe by design. We expect elements to be deleted from sockmap/sockhash only in task (normal) context with interrupts enabled, or in softirq context.
Detect when map_delete_elem operation is invoked from a context which is _not_ hardirq-unsafe, that is interrupts are disabled, and bail out with an error.
Note that map updates are not affected by this issue. BPF verifier does not allow updating sockmap/sockhash from a BPF tracing program today.
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Reported-by: xingwei lee xrivendell7@gmail.com Reported-by: yue sun samsun1006219@gmail.com Reported-by: syzbot+bc922f476bd65abbd466@syzkaller.appspotmail.com Reported-by: syzbot+d4066896495db380182e@syzkaller.appspotmail.com Signed-off-by: Jakub Sitnicki jakub@cloudflare.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Tested-by: syzbot+d4066896495db380182e@syzkaller.appspotmail.com Acked-by: John Fastabend john.fastabend@gmail.com Closes: https://syzkaller.appspot.com/bug?extid=d4066896495db380182e Closes: https://syzkaller.appspot.com/bug?extid=bc922f476bd65abbd466 Link: https://lore.kernel.org/bpf/20240402104621.1050319-1-jakub@cloudflare.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/sock_map.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -411,6 +411,9 @@ static int __sock_map_delete(struct bpf_ struct sock *sk; int err = 0;
+ if (irqs_disabled()) + return -EOPNOTSUPP; /* locks here are hardirq-unsafe */ + spin_lock_bh(&stab->lock); sk = *psk; if (!sk_test || sk_test == sk) @@ -933,6 +936,9 @@ static long sock_hash_delete_elem(struct struct bpf_shtab_elem *elem; int ret = -ENOENT;
+ if (irqs_disabled()) + return -EOPNOTSUPP; /* locks here are hardirq-unsafe */ + hash = sock_hash_bucket_hash(key, key_size); bucket = sock_hash_select_bucket(htab, hash);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov (AMD) bp@alien8.de
commit 0e110732473e14d6520e49d75d2c88ef7d46fe67 upstream.
The srso_alias_untrain_ret() dummy thunk in the !CONFIG_MITIGATION_SRSO case is there only for the alternative in CALL_UNTRAIN_RET to have a symbol to resolve.
However, testing with kernels which don't have CONFIG_MITIGATION_SRSO enabled, leads to the warning in patch_return() to fire:
missing return thunk: srso_alias_untrain_ret+0x0/0x10-0x0: eb 0e 66 66 2e WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:826 apply_returns (arch/x86/kernel/alternative.c:826
Put in a plain "ret" there so that gcc doesn't put a return thunk in its place, which is special and gets checked.
In addition:
ERROR: modpost: "srso_alias_untrain_ret" [arch/x86/kvm/kvm-amd.ko] undefined! make[2]: *** [scripts/Makefile.modpost:145: Module.symvers] Chyba 1 make[1]: *** [/usr/src/linux-6.8.3/Makefile:1873: modpost] Chyba 2 make: *** [Makefile:240: __sub-make] Chyba 2
since !SRSO builds would use the dummy return thunk as reported by petr.pisar@atlas.cz, https://bugzilla.kernel.org/show_bug.cgi?id=218679.
Reported-by: kernel test robot oliver.sang@intel.com Closes: https://lore.kernel.org/oe-lkp/202404020901.da75a60f-oliver.sang@intel.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Link: https://lore.kernel.org/all/202404020901.da75a60f-oliver.sang@intel.com/ Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/lib/retpoline.S | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/arch/x86/lib/retpoline.S +++ b/arch/x86/lib/retpoline.S @@ -222,8 +222,11 @@ SYM_CODE_END(srso_return_thunk) #define JMP_SRSO_UNTRAIN_RET "ud2" /* Dummy for the alternative in CALL_UNTRAIN_RET. */ SYM_CODE_START(srso_alias_untrain_ret) - RET + ANNOTATE_UNRET_SAFE + ret + int3 SYM_FUNC_END(srso_alias_untrain_ret) +__EXPORT_THUNK(srso_alias_untrain_ret) #endif /* CONFIG_CPU_SRSO */
#ifdef CONFIG_CPU_UNRET_ENTRY
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Will Deacon will@kernel.org
commit 4c36a156738887c1edd78589fe192d757989bcde upstream.
When zapping a table entry in stage2_try_break_pte(), we issue range TLB invalidation for the region that was mapped by the table. However, we neglect to align the base address down to the granule size and so if we ended up reaching the table entry via a misaligned address then we will accidentally skip invalidation for some prefix of the affected address range.
Align 'ctx->addr' down to the granule size when performing TLB invalidation for an unmapped table in stage2_try_break_pte().
Cc: Raghavendra Rao Ananta rananta@google.com Cc: Gavin Shan gshan@redhat.com Cc: Shaoqin Huang shahuang@redhat.com Cc: Quentin Perret qperret@google.com Fixes: defc8cc7abf0 ("KVM: arm64: Invalidate the table entries upon a range") Signed-off-by: Will Deacon will@kernel.org Reviewed-by: Shaoqin Huang shahuang@redhat.com Reviewed-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20240327124853.11206-5-will@kernel.org Signed-off-by: Oliver Upton oliver.upton@linux.dev Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/hyp/pgtable.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -805,12 +805,15 @@ static bool stage2_try_break_pte(const s * Perform the appropriate TLB invalidation based on the * evicted pte value (if any). */ - if (kvm_pte_table(ctx->old, ctx->level)) - kvm_tlb_flush_vmid_range(mmu, ctx->addr, - kvm_granule_size(ctx->level)); - else if (kvm_pte_valid(ctx->old)) + if (kvm_pte_table(ctx->old, ctx->level)) { + u64 size = kvm_granule_size(ctx->level); + u64 addr = ALIGN_DOWN(ctx->addr, size); + + kvm_tlb_flush_vmid_range(mmu, addr, size); + } else if (kvm_pte_valid(ctx->old)) { kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level); + } }
if (stage2_pte_is_counted(ctx->old))
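Concretely, if the table entry is reached through an address in the middle of the block it maps, invalidating from that address leaves the pages below it stale; rounding down to the granule first covers the whole range. A standalone sketch of the arithmetic (granule size and address are example values):

  #include <stdio.h>
  #include <stdint.h>

  #define ALIGN_DOWN(x, a) ((x) & ~((uint64_t)(a) - 1))

  int main(void)
  {
          uint64_t granule = 2ULL << 20;          /* e.g. a 2 MiB table granule */
          uint64_t addr = 0x40123000ULL;          /* misaligned address that reached the entry */
          uint64_t base = ALIGN_DOWN(addr, granule);

          printf("invalidate from 0x%llx for 0x%llx bytes\n",
                 (unsigned long long)base, (unsigned long long)granule);
          printf("prefix skipped without the fix: 0x%llx bytes\n",
                 (unsigned long long)(addr - base));
          return 0;
  }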
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit d313eb8b77557a6d5855f42d2234bd592c7b50dd upstream.
syzbot found that tcf_skbmod_dump() was copying four bytes from kernel stack to user space [1].
The issue here is that 'struct tc_skbmod' has a four-byte hole.
We need to clear the structure before filling fields.
[1] BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline] BUG: KMSAN: kernel-infoleak in copy_to_user_iter lib/iov_iter.c:24 [inline] BUG: KMSAN: kernel-infoleak in iterate_ubuf include/linux/iov_iter.h:29 [inline] BUG: KMSAN: kernel-infoleak in iterate_and_advance2 include/linux/iov_iter.h:245 [inline] BUG: KMSAN: kernel-infoleak in iterate_and_advance include/linux/iov_iter.h:271 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x366/0x2520 lib/iov_iter.c:185 instrument_copy_to_user include/linux/instrumented.h:114 [inline] copy_to_user_iter lib/iov_iter.c:24 [inline] iterate_ubuf include/linux/iov_iter.h:29 [inline] iterate_and_advance2 include/linux/iov_iter.h:245 [inline] iterate_and_advance include/linux/iov_iter.h:271 [inline] _copy_to_iter+0x366/0x2520 lib/iov_iter.c:185 copy_to_iter include/linux/uio.h:196 [inline] simple_copy_to_iter net/core/datagram.c:532 [inline] __skb_datagram_iter+0x185/0x1000 net/core/datagram.c:420 skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:546 skb_copy_datagram_msg include/linux/skbuff.h:4050 [inline] netlink_recvmsg+0x432/0x1610 net/netlink/af_netlink.c:1962 sock_recvmsg_nosec net/socket.c:1046 [inline] sock_recvmsg+0x2c4/0x340 net/socket.c:1068 __sys_recvfrom+0x35a/0x5f0 net/socket.c:2242 __do_sys_recvfrom net/socket.c:2260 [inline] __se_sys_recvfrom net/socket.c:2256 [inline] __x64_sys_recvfrom+0x126/0x1d0 net/socket.c:2256 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75
Uninit was stored to memory at: pskb_expand_head+0x30f/0x19d0 net/core/skbuff.c:2253 netlink_trim+0x2c2/0x330 net/netlink/af_netlink.c:1317 netlink_unicast+0x9f/0x1260 net/netlink/af_netlink.c:1351 nlmsg_unicast include/net/netlink.h:1144 [inline] nlmsg_notify+0x21d/0x2f0 net/netlink/af_netlink.c:2610 rtnetlink_send+0x73/0x90 net/core/rtnetlink.c:741 rtnetlink_maybe_send include/linux/rtnetlink.h:17 [inline] tcf_add_notify net/sched/act_api.c:2048 [inline] tcf_action_add net/sched/act_api.c:2071 [inline] tc_ctl_action+0x146e/0x19d0 net/sched/act_api.c:2119 rtnetlink_rcv_msg+0x1737/0x1900 net/core/rtnetlink.c:6595 netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2559 rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6613 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline] netlink_unicast+0xf4c/0x1260 net/netlink/af_netlink.c:1361 netlink_sendmsg+0x10df/0x11f0 net/netlink/af_netlink.c:1905 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 ____sys_sendmsg+0x877/0xb60 net/socket.c:2584 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638 __sys_sendmsg net/socket.c:2667 [inline] __do_sys_sendmsg net/socket.c:2676 [inline] __se_sys_sendmsg net/socket.c:2674 [inline] __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2674 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75
Uninit was stored to memory at: __nla_put lib/nlattr.c:1041 [inline] nla_put+0x1c6/0x230 lib/nlattr.c:1099 tcf_skbmod_dump+0x23f/0xc20 net/sched/act_skbmod.c:256 tcf_action_dump_old net/sched/act_api.c:1191 [inline] tcf_action_dump_1+0x85e/0x970 net/sched/act_api.c:1227 tcf_action_dump+0x1fd/0x460 net/sched/act_api.c:1251 tca_get_fill+0x519/0x7a0 net/sched/act_api.c:1628 tcf_add_notify_msg net/sched/act_api.c:2023 [inline] tcf_add_notify net/sched/act_api.c:2042 [inline] tcf_action_add net/sched/act_api.c:2071 [inline] tc_ctl_action+0x1365/0x19d0 net/sched/act_api.c:2119 rtnetlink_rcv_msg+0x1737/0x1900 net/core/rtnetlink.c:6595 netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2559 rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6613 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline] netlink_unicast+0xf4c/0x1260 net/netlink/af_netlink.c:1361 netlink_sendmsg+0x10df/0x11f0 net/netlink/af_netlink.c:1905 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 ____sys_sendmsg+0x877/0xb60 net/socket.c:2584 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638 __sys_sendmsg net/socket.c:2667 [inline] __do_sys_sendmsg net/socket.c:2676 [inline] __se_sys_sendmsg net/socket.c:2674 [inline] __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2674 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75
Local variable opt created at: tcf_skbmod_dump+0x9d/0xc20 net/sched/act_skbmod.c:244 tcf_action_dump_old net/sched/act_api.c:1191 [inline] tcf_action_dump_1+0x85e/0x970 net/sched/act_api.c:1227
Bytes 188-191 of 248 are uninitialized Memory access of size 248 starts at ffff888117697680 Data copied to user address 00007ffe56d855f0
Fixes: 86da71b57383 ("net_sched: Introduce skbmod action") Signed-off-by: Eric Dumazet edumazet@google.com Acked-by: Jamal Hadi Salim jhs@mojatatu.com Link: https://lore.kernel.org/r/20240403130908.93421-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/act_skbmod.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/net/sched/act_skbmod.c +++ b/net/sched/act_skbmod.c @@ -241,13 +241,13 @@ static int tcf_skbmod_dump(struct sk_buf struct tcf_skbmod *d = to_skbmod(a); unsigned char *b = skb_tail_pointer(skb); struct tcf_skbmod_params *p; - struct tc_skbmod opt = { - .index = d->tcf_index, - .refcnt = refcount_read(&d->tcf_refcnt) - ref, - .bindcnt = atomic_read(&d->tcf_bindcnt) - bind, - }; + struct tc_skbmod opt; struct tcf_t t;
+ memset(&opt, 0, sizeof(opt)); + opt.index = d->tcf_index; + opt.refcnt = refcount_read(&d->tcf_refcnt) - ref, + opt.bindcnt = atomic_read(&d->tcf_bindcnt) - bind; spin_lock_bh(&d->tcf_lock); opt.action = d->tcf_action; p = rcu_dereference_protected(d->skbmod_p,
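The uninitialized bytes are structure padding: struct tc_skbmod carries the common tc_gen fields (five 32-bit values) followed by a 64-bit flags member, so on the usual ABIs there is a 4-byte hole before flags that a designated initializer leaves untouched. A userspace sketch with an analogous layout (a stand-in struct, not the uapi definition):

  #include <stdio.h>
  #include <stddef.h>
  #include <string.h>
  #include <stdint.h>

  struct opt {
          uint32_t index;
          uint32_t capab;
          int32_t  action;
          int32_t  refcnt;
          int32_t  bindcnt;
          uint64_t flags;         /* 8-byte alignment forces a hole before it */
  };

  int main(void)
  {
          printf("last 32-bit field ends at %zu, flags starts at %zu, sizeof is %zu\n",
                 offsetof(struct opt, bindcnt) + sizeof(uint32_t),
                 offsetof(struct opt, flags), sizeof(struct opt));
          /* Typically 20, 24 and 32: bytes 20..23 are padding and stay
           * uninitialized unless the whole struct is cleared first. */

          struct opt o;
          memset(&o, 0, sizeof(o));       /* what the fix does before assigning fields */
          o.index = 1;
          printf("index %u, flags %llu\n", o.index, (unsigned long long)o.flags);
          return 0;
  }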
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
commit c120209bce34c49dcaba32f15679574327d09f63 upstream.
The definition and declaration of sja1110_pcs_mdio_write_c45() don't have parameters in the same order.
Knowing that sja1110_pcs_mdio_write_c45() is used as a function pointer in the 'sja1105_info' structure via .pcs_mdio_write_c45, and that we have:
int (*pcs_mdio_write_c45)(struct mii_bus *bus, int phy, int mmd, int reg, u16 val);
it is likely that the definition is the one to change.
Found with cppcheck, funcArgOrderDifferent.
Fixes: ae271547bba6 ("net: dsa: sja1105: C45 only transactions for PCS") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Reviewed-by: Michael Walle mwalle@kernel.org Reviewed-by: Vladimir Oltean vladimir.oltean@nxp.com Link: https://lore.kernel.org/r/ff2a5af67361988b3581831f7bd1eddebfb4c48f.171208276... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/dsa/sja1105/sja1105_mdio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/dsa/sja1105/sja1105_mdio.c +++ b/drivers/net/dsa/sja1105/sja1105_mdio.c @@ -94,7 +94,7 @@ int sja1110_pcs_mdio_read_c45(struct mii return tmp & 0xffff; }
-int sja1110_pcs_mdio_write_c45(struct mii_bus *bus, int phy, int reg, int mmd, +int sja1110_pcs_mdio_write_c45(struct mii_bus *bus, int phy, int mmd, int reg, u16 val) { struct sja1105_mdio_private *mdio_priv = bus->priv;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit 7eb322360b0266481e560d1807ee79e0cef5742b upstream.
qdisc_tree_reduce_backlog() is called with the qdisc lock held, not RTNL.
We must use qdisc_lookup_rcu() instead of qdisc_lookup()
syzbot reported:
WARNING: suspicious RCU usage
6.1.74-syzkaller #0 Not tainted
-----------------------------
net/sched/sch_api.c:305 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
3 locks held by udevd/1142:
 #0: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:306 [inline]
 #0: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:747 [inline]
 #0: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: net_tx_action+0x64a/0x970 net/core/dev.c:5282
 #1: ffff888171861108 (&sch->q.lock){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:350 [inline]
 #1: ffff888171861108 (&sch->q.lock){+.-.}-{2:2}, at: net_tx_action+0x754/0x970 net/core/dev.c:5297
 #2: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:306 [inline]
 #2: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:747 [inline]
 #2: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: qdisc_tree_reduce_backlog+0x84/0x580 net/sched/sch_api.c:792
stack backtrace: CPU: 1 PID: 1142 Comm: udevd Not tainted 6.1.74-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 Call Trace: <TASK> [<ffffffff85b85f14>] __dump_stack lib/dump_stack.c:88 [inline] [<ffffffff85b85f14>] dump_stack_lvl+0x1b1/0x28f lib/dump_stack.c:106 [<ffffffff85b86007>] dump_stack+0x15/0x1e lib/dump_stack.c:113 [<ffffffff81802299>] lockdep_rcu_suspicious+0x1b9/0x260 kernel/locking/lockdep.c:6592 [<ffffffff84f0054c>] qdisc_lookup+0xac/0x6f0 net/sched/sch_api.c:305 [<ffffffff84f037c3>] qdisc_tree_reduce_backlog+0x243/0x580 net/sched/sch_api.c:811 [<ffffffff84f5b78c>] pfifo_tail_enqueue+0x32c/0x4b0 net/sched/sch_fifo.c:51 [<ffffffff84fbcf63>] qdisc_enqueue include/net/sch_generic.h:833 [inline] [<ffffffff84fbcf63>] netem_dequeue+0xeb3/0x15d0 net/sched/sch_netem.c:723 [<ffffffff84eecab9>] dequeue_skb net/sched/sch_generic.c:292 [inline] [<ffffffff84eecab9>] qdisc_restart net/sched/sch_generic.c:397 [inline] [<ffffffff84eecab9>] __qdisc_run+0x249/0x1e60 net/sched/sch_generic.c:415 [<ffffffff84d7aa96>] qdisc_run+0xd6/0x260 include/net/pkt_sched.h:125 [<ffffffff84d85d29>] net_tx_action+0x7c9/0x970 net/core/dev.c:5313 [<ffffffff85e002bd>] __do_softirq+0x2bd/0x9bd kernel/softirq.c:616 [<ffffffff81568bca>] invoke_softirq kernel/softirq.c:447 [inline] [<ffffffff81568bca>] __irq_exit_rcu+0xca/0x230 kernel/softirq.c:700 [<ffffffff81568ae9>] irq_exit_rcu+0x9/0x20 kernel/softirq.c:712 [<ffffffff85b89f52>] sysvec_apic_timer_interrupt+0x42/0x90 arch/x86/kernel/apic/apic.c:1107 [<ffffffff85c00ccb>] asm_sysvec_apic_timer_interrupt+0x1b/0x20 arch/x86/include/asm/idtentry.h:656
Fixes: d636fc5dd692 ("net: sched: add rcu annotations around qdisc->qdisc_sleeping") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Jiri Pirko jiri@nvidia.com Acked-by: Jamal Hadi Salim jhs@mojatatu.com Link: https://lore.kernel.org/r/20240402134133.2352776-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/sch_api.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -809,7 +809,7 @@ void qdisc_tree_reduce_backlog(struct Qd notify = !sch->q.qlen && !WARN_ON_ONCE(!n && !qdisc_is_offloaded); /* TODO: perform the search on a per txq basis */ - sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid)); + sch = qdisc_lookup_rcu(qdisc_dev(sch), TC_H_MAJ(parentid)); if (sch == NULL) { WARN_ON_ONCE(parentid != TC_H_ROOT); break;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Piotr Wejman piotrwejman90@gmail.com
commit b3da86d432b7cd65b025a11f68613e333d2483db upstream.
The driver should ensure that same priority is not mapped to multiple rx queues. From DesignWare Cores Ethernet Quality-of-Service Databook, section 17.1.29 MAC_RxQ_Ctrl2: "[...]The software must ensure that the content of this field is mutually exclusive to the PSRQ fields for other queues, that is, the same priority is not mapped to multiple Rx queues[...]"
Previously rx_queue_priority() function was:
- clearing all priorities from a queue
- adding new priorities to that queue

After this patch it will:
- first assign new priorities to a queue
- then remove those priorities from all other queues
- keep other priorities previously assigned to that queue
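As a rough illustration, the userspace sketch below only simulates the mask arithmetic (which priority bits get cleared and set), not the register write ordering the patch also cares about. It assumes one 8-bit PSRQ field per queue in the two control registers; the macros are stand-ins loosely mirroring GMAC_RXQCTRL_PSRQX_SHIFT/MASK, not the driver's definitions.

#include <stdint.h>
#include <stdio.h>

#define PSRQ_SHIFT(q)	((q) * 8)
#define PSRQ_MASK(q)	(0xffu << PSRQ_SHIFT(q))

int main(void)
{
	uint32_t ctrl2 = 0, ctrl3 = 0;
	uint32_t prio = 1u << 2;	/* bitmap for priority 2 */
	uint32_t clear_mask = 0;
	int i, queue;

	/* priority 2 was previously mapped to queue 0 */
	ctrl2 |= (prio << PSRQ_SHIFT(0)) & PSRQ_MASK(0);

	/* now map priority 2 to queue 1: build a mask covering this
	 * priority in every PSRQ field so it can be removed elsewhere */
	queue = 1;
	for (i = 0; i < 4; i++)
		clear_mask |= (prio << PSRQ_SHIFT(i)) & PSRQ_MASK(i);

	ctrl2 &= ~clear_mask;
	ctrl3 &= ~clear_mask;
	ctrl2 |= (prio << PSRQ_SHIFT(queue)) & PSRQ_MASK(queue);

	/* prints ctrl2=00000400 ctrl3=00000000: the priority now lives
	 * only in queue 1's field, satisfying the mutual exclusion rule */
	printf("ctrl2=%08x ctrl3=%08x\n", ctrl2, ctrl3);
	return 0;
}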
Fixes: a8f5102af2a7 ("net: stmmac: TX and RX queue priority configuration") Fixes: 2142754f8b9c ("net: stmmac: Add MAC related callbacks for XGMAC2") Signed-off-by: Piotr Wejman piotrwejman90@gmail.com Link: https://lore.kernel.org/r/20240401192239.33942-1-piotrwejman90@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 40 +++++++++++++++----- drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c | 38 +++++++++++++++---- 2 files changed, 62 insertions(+), 16 deletions(-)
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c @@ -92,19 +92,41 @@ static void dwmac4_rx_queue_priority(str u32 prio, u32 queue) { void __iomem *ioaddr = hw->pcsr; - u32 base_register; - u32 value; + u32 clear_mask = 0; + u32 ctrl2, ctrl3; + int i;
- base_register = (queue < 4) ? GMAC_RXQ_CTRL2 : GMAC_RXQ_CTRL3; - if (queue >= 4) - queue -= 4; + ctrl2 = readl(ioaddr + GMAC_RXQ_CTRL2); + ctrl3 = readl(ioaddr + GMAC_RXQ_CTRL3); + + /* The software must ensure that the same priority + * is not mapped to multiple Rx queues + */ + for (i = 0; i < 4; i++) + clear_mask |= ((prio << GMAC_RXQCTRL_PSRQX_SHIFT(i)) & + GMAC_RXQCTRL_PSRQX_MASK(i));
- value = readl(ioaddr + base_register); + ctrl2 &= ~clear_mask; + ctrl3 &= ~clear_mask;
- value &= ~GMAC_RXQCTRL_PSRQX_MASK(queue); - value |= (prio << GMAC_RXQCTRL_PSRQX_SHIFT(queue)) & + /* First assign new priorities to a queue, then + * clear them from others queues + */ + if (queue < 4) { + ctrl2 |= (prio << GMAC_RXQCTRL_PSRQX_SHIFT(queue)) & GMAC_RXQCTRL_PSRQX_MASK(queue); - writel(value, ioaddr + base_register); + + writel(ctrl2, ioaddr + GMAC_RXQ_CTRL2); + writel(ctrl3, ioaddr + GMAC_RXQ_CTRL3); + } else { + queue -= 4; + + ctrl3 |= (prio << GMAC_RXQCTRL_PSRQX_SHIFT(queue)) & + GMAC_RXQCTRL_PSRQX_MASK(queue); + + writel(ctrl3, ioaddr + GMAC_RXQ_CTRL3); + writel(ctrl2, ioaddr + GMAC_RXQ_CTRL2); + } }
static void dwmac4_tx_queue_priority(struct mac_device_info *hw, --- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c @@ -105,17 +105,41 @@ static void dwxgmac2_rx_queue_prio(struc u32 queue) { void __iomem *ioaddr = hw->pcsr; - u32 value, reg; + u32 clear_mask = 0; + u32 ctrl2, ctrl3; + int i;
- reg = (queue < 4) ? XGMAC_RXQ_CTRL2 : XGMAC_RXQ_CTRL3; - if (queue >= 4) + ctrl2 = readl(ioaddr + XGMAC_RXQ_CTRL2); + ctrl3 = readl(ioaddr + XGMAC_RXQ_CTRL3); + + /* The software must ensure that the same priority + * is not mapped to multiple Rx queues + */ + for (i = 0; i < 4; i++) + clear_mask |= ((prio << XGMAC_PSRQ_SHIFT(i)) & + XGMAC_PSRQ(i)); + + ctrl2 &= ~clear_mask; + ctrl3 &= ~clear_mask; + + /* First assign new priorities to a queue, then + * clear them from others queues + */ + if (queue < 4) { + ctrl2 |= (prio << XGMAC_PSRQ_SHIFT(queue)) & + XGMAC_PSRQ(queue); + + writel(ctrl2, ioaddr + XGMAC_RXQ_CTRL2); + writel(ctrl3, ioaddr + XGMAC_RXQ_CTRL3); + } else { queue -= 4;
- value = readl(ioaddr + reg); - value &= ~XGMAC_PSRQ(queue); - value |= (prio << XGMAC_PSRQ_SHIFT(queue)) & XGMAC_PSRQ(queue); + ctrl3 |= (prio << XGMAC_PSRQ_SHIFT(queue)) & + XGMAC_PSRQ(queue);
- writel(value, ioaddr + reg); + writel(ctrl3, ioaddr + XGMAC_RXQ_CTRL3); + writel(ctrl2, ioaddr + XGMAC_RXQ_CTRL2); + } }
static void dwxgmac2_tx_queue_prio(struct mac_device_info *hw, u32 prio,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Horatiu Vultur horatiu.vultur@microchip.com
commit de99e1ea3a35f23ff83a31d6b08f43d27b2c6345 upstream.
There are 2 issues with the blamed commit.

1. When the phy is initialized, it would enable the disabled of UDPv4 checksums. The UDPv6 checksum is already enabled by default. So when 1-step is configured then it would clear these flags.

2. After 1-step is configured, if 2-step is then configured, 1-step would still be configured because the flag is not cleared. So the sync frames will still have origin timestamps set.
Fix this by first reading the value of the register and then changing only bit 12, which determines whether the timestamp needs to be inserted in the frame, leaving all other bits untouched.
Fixes: ece19502834d ("net: phy: micrel: 1588 support for LAN8814 phy") Signed-off-by: Horatiu Vultur horatiu.vultur@microchip.com Reviewed-by: Divya Koppera divya.koppera@microchip.com Link: https://lore.kernel.org/r/20240402071634.2483524-1-horatiu.vultur@microchip.... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/phy/micrel.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -2388,6 +2388,7 @@ static int lan8814_hwtstamp(struct mii_t struct hwtstamp_config config; int txcfg = 0, rxcfg = 0; int pkt_ts_enable; + int tx_mod;
if (copy_from_user(&config, ifr->ifr_data, sizeof(config))) return -EFAULT; @@ -2437,9 +2438,14 @@ static int lan8814_hwtstamp(struct mii_t lanphy_write_page_reg(ptp_priv->phydev, 5, PTP_RX_TIMESTAMP_EN, pkt_ts_enable); lanphy_write_page_reg(ptp_priv->phydev, 5, PTP_TX_TIMESTAMP_EN, pkt_ts_enable);
- if (ptp_priv->hwts_tx_type == HWTSTAMP_TX_ONESTEP_SYNC) + tx_mod = lanphy_read_page_reg(ptp_priv->phydev, 5, PTP_TX_MOD); + if (ptp_priv->hwts_tx_type == HWTSTAMP_TX_ONESTEP_SYNC) { lanphy_write_page_reg(ptp_priv->phydev, 5, PTP_TX_MOD, - PTP_TX_MOD_TX_PTP_SYNC_TS_INSERT_); + tx_mod | PTP_TX_MOD_TX_PTP_SYNC_TS_INSERT_); + } else if (ptp_priv->hwts_tx_type == HWTSTAMP_TX_ON) { + lanphy_write_page_reg(ptp_priv->phydev, 5, PTP_TX_MOD, + tx_mod & ~PTP_TX_MOD_TX_PTP_SYNC_TS_INSERT_); + }
if (config.rx_filter != HWTSTAMP_FILTER_NONE) lan8814_config_ts_intr(ptp_priv->phydev, true);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Duanqiang Wen duanqiangwen@net-swift.com
commit c644920ce9220d83e070f575a4df711741c07f07 upstream.
txgbe clkdev shortened the clk_name, so the i2c_dev info_name also needs to be shortened. Otherwise, the i2c_dev cannot initialize its clock.
Fixes: e30cef001da2 ("net: txgbe: fix clk_name exceed MAX_DEV_ID limits") Signed-off-by: Duanqiang Wen duanqiangwen@net-swift.com Link: https://lore.kernel.org/r/20240402021843.126192-1-duanqiangwen@net-swift.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c +++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c @@ -20,6 +20,8 @@ #include "txgbe_phy.h" #include "txgbe_hw.h"
+#define TXGBE_I2C_CLK_DEV_NAME "i2c_dw" + static int txgbe_swnodes_register(struct txgbe *txgbe) { struct txgbe_nodes *nodes = &txgbe->nodes; @@ -551,8 +553,8 @@ static int txgbe_clock_register(struct t char clk_name[32]; struct clk *clk;
- snprintf(clk_name, sizeof(clk_name), "i2c_dw.%d", - pci_dev_id(pdev)); + snprintf(clk_name, sizeof(clk_name), "%s.%d", + TXGBE_I2C_CLK_DEV_NAME, pci_dev_id(pdev));
clk = clk_register_fixed_rate(NULL, clk_name, NULL, 0, 156250000); if (IS_ERR(clk)) @@ -614,7 +616,7 @@ static int txgbe_i2c_register(struct txg
info.parent = &pdev->dev; info.fwnode = software_node_fwnode(txgbe->nodes.group[SWNODE_I2C]); - info.name = "i2c_designware"; + info.name = TXGBE_I2C_CLK_DEV_NAME; info.id = pci_dev_id(pdev);
info.res = &DEFINE_RES_IRQ(pdev->irq);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wei Fang wei.fang@nxp.com
commit cbc17e7802f5de37c7c262204baadfad3f7f99e5 upstream.
Setting mac_managed_pm during interface up is too late.
In situations where the link is not brought up yet and the system suspends, the regular PHY power management will run. Since the FEC ETHEREN control bit is cleared (automatically) on suspend, the controller is off in resume. When the regular PHY power management resume path runs in this context, it will write to the MII_DATA register but nothing will be transmitted on the MDIO bus.
This can be observed by the following log:
fec 5b040000.ethernet eth0: MDIO read timeout
Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: dpm_run_callback(): mdio_bus_phy_resume+0x0/0xc8 returns -110
Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: failed to resume: error -110
The data written will however remain in the MII_DATA register.
When the link later is set to administrative up it will trigger a call to fec_restart() which will restore the MII_SPEED register. This triggers the quirk explained in f166f890c8f0 ("net: ethernet: fec: Replace interrupt driven MDIO with polled IO") causing an extra MII_EVENT.
This extra event desynchronizes all the MDIO register reads, causing them to complete too early, leading all reads to return 0 because fec_enet_mdio_wait() returns too early.
When a Microchip LAN8700R PHY is connected to the FEC, the 0 reads causes the PHY to be initialized incorrectly and the PHY will not transmit any ethernet signal in this state. It cannot be brought out of this state without a power cycle of the PHY.
Fixes: 557d5dc83f68 ("net: fec: use mac-managed PHY PM") Closes: https://lore.kernel.org/netdev/1f45bdbe-eab1-4e59-8f24-add177590d27@actia.se... Signed-off-by: Wei Fang wei.fang@nxp.com [jernberg: commit message] Signed-off-by: John Ernberg john.ernberg@actia.se Link: https://lore.kernel.org/r/20240328155909.59613-2-john.ernberg@actia.se Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/freescale/fec_main.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -2381,8 +2381,6 @@ static int fec_enet_mii_probe(struct net fep->link = 0; fep->full_duplex = 0;
- phy_dev->mac_managed_pm = true; - phy_attached_info(phy_dev);
return 0; @@ -2394,10 +2392,12 @@ static int fec_enet_mii_init(struct plat struct net_device *ndev = platform_get_drvdata(pdev); struct fec_enet_private *fep = netdev_priv(ndev); bool suppress_preamble = false; + struct phy_device *phydev; struct device_node *node; int err = -ENXIO; u32 mii_speed, holdtime; u32 bus_freq; + int addr;
/* * The i.MX28 dual fec interfaces are not equal. @@ -2511,6 +2511,13 @@ static int fec_enet_mii_init(struct plat goto err_out_free_mdiobus; of_node_put(node);
+ /* find all the PHY devices on the bus and set mac_managed_pm to true */ + for (addr = 0; addr < PHY_MAX_ADDR; addr++) { + phydev = mdiobus_get_phy(fep->mii_bus, addr); + if (phydev) + phydev->mac_managed_pm = true; + } + mii_cnt++;
/* save fec0 mii_bus */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aleksandr Mishin amishin@t-argos.ru
commit 96c155943a703f0655c0c4cab540f67055960e91 upstream.
In lan8814_get_sig_rx() and lan8814_get_sig_tx(), ptp_parse_header() may return NULL as ptp_header due to an abnormal packet type or a corrupted packet. Fix this bug by adding a check for a NULL ptp_header.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: ece19502834d ("net: phy: micrel: 1588 support for LAN8814 phy") Signed-off-by: Aleksandr Mishin amishin@t-argos.ru Reviewed-by: Andrew Lunn andrew@lunn.ch Link: https://lore.kernel.org/r/20240329061631.33199-1-amishin@t-argos.ru Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/phy/micrel.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-)
--- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -2503,7 +2503,7 @@ static void lan8814_txtstamp(struct mii_ } }
-static void lan8814_get_sig_rx(struct sk_buff *skb, u16 *sig) +static bool lan8814_get_sig_rx(struct sk_buff *skb, u16 *sig) { struct ptp_header *ptp_header; u32 type; @@ -2513,7 +2513,11 @@ static void lan8814_get_sig_rx(struct sk ptp_header = ptp_parse_header(skb, type); skb_pull_inline(skb, ETH_HLEN);
+ if (!ptp_header) + return false; + *sig = (__force u16)(ntohs(ptp_header->sequence_id)); + return true; }
static bool lan8814_match_rx_skb(struct kszphy_ptp_priv *ptp_priv, @@ -2525,7 +2529,8 @@ static bool lan8814_match_rx_skb(struct bool ret = false; u16 skb_sig;
- lan8814_get_sig_rx(skb, &skb_sig); + if (!lan8814_get_sig_rx(skb, &skb_sig)) + return ret;
/* Iterate over all RX timestamps and match it with the received skbs */ spin_lock_irqsave(&ptp_priv->rx_ts_lock, flags); @@ -2805,7 +2810,7 @@ static int lan8814_ptpci_adjfine(struct return 0; }
-static void lan8814_get_sig_tx(struct sk_buff *skb, u16 *sig) +static bool lan8814_get_sig_tx(struct sk_buff *skb, u16 *sig) { struct ptp_header *ptp_header; u32 type; @@ -2813,7 +2818,11 @@ static void lan8814_get_sig_tx(struct sk type = ptp_classify_raw(skb); ptp_header = ptp_parse_header(skb, type);
+ if (!ptp_header) + return false; + *sig = (__force u16)(ntohs(ptp_header->sequence_id)); + return true; }
static void lan8814_match_tx_skb(struct kszphy_ptp_priv *ptp_priv, @@ -2827,7 +2836,8 @@ static void lan8814_match_tx_skb(struct
spin_lock_irqsave(&ptp_priv->tx_queue.lock, flags); skb_queue_walk_safe(&ptp_priv->tx_queue, skb, skb_tmp) { - lan8814_get_sig_tx(skb, &skb_sig); + if (!lan8814_get_sig_tx(skb, &skb_sig)) + continue;
if (memcmp(&skb_sig, &seq_id, sizeof(seq_id))) continue; @@ -2881,7 +2891,8 @@ static bool lan8814_match_skb(struct ksz
spin_lock_irqsave(&ptp_priv->rx_queue.lock, flags); skb_queue_walk_safe(&ptp_priv->rx_queue, skb, skb_tmp) { - lan8814_get_sig_rx(skb, &skb_sig); + if (!lan8814_get_sig_rx(skb, &skb_sig)) + continue;
if (memcmp(&skb_sig, &rx_ts->seq_id, sizeof(rx_ts->seq_id))) continue;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Krummsdorf michael.krummsdorf@tq-group.com
commit 625aefac340f45a4fc60908da763f437599a0d6f upstream.
The switch has 4 ports with 2 internal PHYs, but ports are numbered up to 6, with ports 0, 1, 5 and 6 being usable.
Fixes: 71d94a432a15 ("net: dsa: mv88e6xxx: add support for MV88E6020 switch") Signed-off-by: Michael Krummsdorf michael.krummsdorf@tq-group.com Signed-off-by: Matthias Schiffer matthias.schiffer@ew.tq-group.com Reviewed-by: Andrew Lunn andrew@lunn.ch Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240326123655.40666-1-matthias.schiffer@ew.tq-gro... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/dsa/mv88e6xxx/chip.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -5386,8 +5386,12 @@ static const struct mv88e6xxx_info mv88e .family = MV88E6XXX_FAMILY_6250, .name = "Marvell 88E6020", .num_databases = 64, - .num_ports = 4, + /* Ports 2-4 are not routed to pins + * => usable ports 0, 1, 5, 6 + */ + .num_ports = 7, .num_internal_phys = 2, + .invalid_port_mask = BIT(2) | BIT(3) | BIT(4), .max_vid = 4095, .port_base_addr = 0x8, .phy_base_addr = 0x0,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Antoine Tenart atenart@kernel.org
commit 0fb101be97ca27850c5ecdbd1269423ce4d1f607 upstream.
UDP tunnel packets can't be GROed in between their endpoints as this causes different issues. The UDP GRO fwd vxlan tests were relying on this and their expectations have to be fixed.
We keep both vxlan tests and now expect no GRO to happen. The vxlan UDP GRO bench test was removed as it no longer provides any valuable information.
Fixes: a062260a9d5f ("selftests: net: add UDP GRO forwarding self-tests") Signed-off-by: Antoine Tenart atenart@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/udpgro_fwd.sh | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-)
--- a/tools/testing/selftests/net/udpgro_fwd.sh +++ b/tools/testing/selftests/net/udpgro_fwd.sh @@ -241,7 +241,7 @@ for family in 4 6; do
create_vxlan_pair ip netns exec $NS_DST ethtool -K veth$DST rx-gro-list on - run_test "GRO frag list over UDP tunnel" $OL_NET$DST 1 1 + run_test "GRO frag list over UDP tunnel" $OL_NET$DST 10 10 cleanup
# use NAT to circumvent GRO FWD check @@ -254,13 +254,7 @@ for family in 4 6; do # load arp cache before running the test to reduce the amount of # stray traffic on top of the UDP tunnel ip netns exec $NS_SRC $PING -q -c 1 $OL_NET$DST_NAT >/dev/null - run_test "GRO fwd over UDP tunnel" $OL_NET$DST_NAT 1 1 $OL_NET$DST - cleanup - - create_vxlan_pair - run_bench "UDP tunnel fwd perf" $OL_NET$DST - ip netns exec $NS_DST ethtool -K veth$DST rx-udp-gro-forwarding on - run_bench "UDP tunnel GRO fwd perf" $OL_NET$DST + run_test "GRO fwd over UDP tunnel" $OL_NET$DST_NAT 10 10 $OL_NET$DST cleanup done
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Antoine Tenart atenart@kernel.org
commit ed4cccef64c1d0d5b91e69f7a8a6697c3a865486 upstream.
If packets are GROed with fraglist they might be segmented later on and continue their journey in the stack. In skb_segment_list those skbs can be reused as-is. This is an issue as their destructor was removed in skb_gro_receive_list but not the reference to their socket, and then they can't be orphaned. Fix this by also removing the reference to the socket.
For example this could be observed,
kernel BUG at include/linux/skbuff.h:3131! (skb_orphan)
RIP: 0010:ip6_rcv_core+0x11bc/0x19a0
Call Trace:
 ipv6_list_rcv+0x250/0x3f0
 __netif_receive_skb_list_core+0x49d/0x8f0
 netif_receive_skb_list_internal+0x634/0xd40
 napi_complete_done+0x1d2/0x7d0
 gro_cell_poll+0x118/0x1f0
A similar construction is found in skb_gro_receive; apply the same change there.
Fixes: 5e10da5385d2 ("skbuff: allow 'slow_gro' for skb carring sock reference") Signed-off-by: Antoine Tenart atenart@kernel.org Reviewed-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/gro.c | 3 ++- net/ipv4/udp_offload.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-)
--- a/net/core/gro.c +++ b/net/core/gro.c @@ -195,8 +195,9 @@ int skb_gro_receive(struct sk_buff *p, s }
merge: - /* sk owenrship - if any - completely transferred to the aggregated packet */ + /* sk ownership - if any - completely transferred to the aggregated packet */ skb->destructor = NULL; + skb->sk = NULL; delta_truesize = skb->truesize; if (offset > headlen) { unsigned int eat = offset - headlen; --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -449,8 +449,9 @@ static int skb_gro_receive_list(struct s NAPI_GRO_CB(p)->count++; p->data_len += skb->len;
- /* sk owenrship - if any - completely transferred to the aggregated packet */ + /* sk ownership - if any - completely transferred to the aggregated packet */ skb->destructor = NULL; + skb->sk = NULL; p->truesize += skb->truesize; p->len += skb->len;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Petr Oros poros@redhat.com
commit 8edfc7a40e3300fc6c5fa7a3228a24d5bcd86ba5 upstream.
ice_port_vlan_on/off() was introduced in commit 2946204b3fa8 ("ice: implement bridge port vlan"). But ice_port_vlan_on() incorrectly assigns ena_rx_filtering to inner_vlan_ops in DVM mode. This causes an error when rx_filtering cannot be enabled in legacy mode.
Reproducer:
  echo 1 > /sys/class/net/$PF/device/sriov_numvfs
  ip link set $PF vf 0 spoofchk off trust on vlan 3

dmesg:
  ice 0000:41:00.0: failed to enable Rx VLAN filtering for VF 0 VSI 9 during VF rebuild, error -95
Fixes: 2946204b3fa8 ("ice: implement bridge port vlan") Signed-off-by: Petr Oros poros@redhat.com Reviewed-by: Michal Swiatkowski michal.swiatkowski@linux.intel.com Tested-by: Rafal Romanowski rafal.romanowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- .../ethernet/intel/ice/ice_vf_vsi_vlan_ops.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_vsi_vlan_ops.c b/drivers/net/ethernet/intel/ice/ice_vf_vsi_vlan_ops.c index 80dc4bcdd3a4..b3e1bdcb80f8 100644 --- a/drivers/net/ethernet/intel/ice/ice_vf_vsi_vlan_ops.c +++ b/drivers/net/ethernet/intel/ice/ice_vf_vsi_vlan_ops.c @@ -26,24 +26,22 @@ static void ice_port_vlan_on(struct ice_vsi *vsi) struct ice_vsi_vlan_ops *vlan_ops; struct ice_pf *pf = vsi->back;
+ /* setup inner VLAN ops */ + vlan_ops = &vsi->inner_vlan_ops; + if (ice_is_dvm_ena(&pf->hw)) { - vlan_ops = &vsi->outer_vlan_ops; - - /* setup outer VLAN ops */ - vlan_ops->set_port_vlan = ice_vsi_set_outer_port_vlan; - vlan_ops->clear_port_vlan = ice_vsi_clear_outer_port_vlan; - - /* setup inner VLAN ops */ - vlan_ops = &vsi->inner_vlan_ops; vlan_ops->add_vlan = noop_vlan_arg; vlan_ops->del_vlan = noop_vlan_arg; vlan_ops->ena_stripping = ice_vsi_ena_inner_stripping; vlan_ops->dis_stripping = ice_vsi_dis_inner_stripping; vlan_ops->ena_insertion = ice_vsi_ena_inner_insertion; vlan_ops->dis_insertion = ice_vsi_dis_inner_insertion; - } else { - vlan_ops = &vsi->inner_vlan_ops;
+ /* setup outer VLAN ops */ + vlan_ops = &vsi->outer_vlan_ops; + vlan_ops->set_port_vlan = ice_vsi_set_outer_port_vlan; + vlan_ops->clear_port_vlan = ice_vsi_clear_outer_port_vlan; + } else { vlan_ops->set_port_vlan = ice_vsi_set_inner_port_vlan; vlan_ops->clear_port_vlan = ice_vsi_clear_inner_port_vlan; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Vecera ivecera@redhat.com
commit ea2a1cfc3b2019bdea6324acd3c03606b60d71ad upstream.
Commit 73d9629e1c8c ("i40e: Do not allow untrusted VF to remove administratively set MAC") fixed an issue where an untrusted VF was allowed to remove its own MAC address although this was assigned administratively from the PF. Unfortunately the introduced check is wrong, because it causes MAC filters for other MAC addresses, including multicast ones, to not be removed.
<snip>
		if (ether_addr_equal(addr, vf->default_lan_addr.addr) &&
		    i40e_can_vf_change_mac(vf))
			was_unimac_deleted = true;
		else
			continue;

		if (i40e_del_mac_filter(vsi, al->list[i].addr)) {
...
</snip>
The else path with `continue` effectively skips any MAC filter removal except the one for the primary MAC address, and only when the VF is allowed to remove it. Fix the check condition so the `continue` is only done for the primary MAC address.
Fixes: 73d9629e1c8c ("i40e: Do not allow untrusted VF to remove administratively set MAC") Signed-off-by: Ivan Vecera ivecera@redhat.com Reviewed-by: Michal Schmidt mschmidt@redhat.com Reviewed-by: Brett Creeley brett.creeley@amd.com Tested-by: Rafal Romanowski rafal.romanowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Link: https://lore.kernel.org/r/20240329180638.211412-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -3143,11 +3143,12 @@ static int i40e_vc_del_mac_addr_msg(stru /* Allow to delete VF primary MAC only if it was not set * administratively by PF or if VF is trusted. */ - if (ether_addr_equal(addr, vf->default_lan_addr.addr) && - i40e_can_vf_change_mac(vf)) - was_unimac_deleted = true; - else - continue; + if (ether_addr_equal(addr, vf->default_lan_addr.addr)) { + if (i40e_can_vf_change_mac(vf)) + was_unimac_deleted = true; + else + continue; + }
if (i40e_del_mac_filter(vsi, al->list[i].addr)) { ret = -EINVAL;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit 17af420545a750f763025149fa7b833a4fc8b8f0 upstream.
syzbot reported a problem in ip6erspan_rcv() [1]
The issue is that ip6erspan_rcv() (and erspan_rcv()) no longer make sure erspan_base_hdr is present in the skb linear part (skb->head) before reading the @ver field from it.
Add the missing pskb_may_pull() calls.
v2: Reload iph pointer in erspan_rcv() after pskb_may_pull() because skb->head might have changed.
[1]
BUG: KMSAN: uninit-value in pskb_may_pull_reason include/linux/skbuff.h:2742 [inline] BUG: KMSAN: uninit-value in pskb_may_pull include/linux/skbuff.h:2756 [inline] BUG: KMSAN: uninit-value in ip6erspan_rcv net/ipv6/ip6_gre.c:541 [inline] BUG: KMSAN: uninit-value in gre_rcv+0x11f8/0x1930 net/ipv6/ip6_gre.c:610 pskb_may_pull_reason include/linux/skbuff.h:2742 [inline] pskb_may_pull include/linux/skbuff.h:2756 [inline] ip6erspan_rcv net/ipv6/ip6_gre.c:541 [inline] gre_rcv+0x11f8/0x1930 net/ipv6/ip6_gre.c:610 ip6_protocol_deliver_rcu+0x1d4c/0x2ca0 net/ipv6/ip6_input.c:438 ip6_input_finish net/ipv6/ip6_input.c:483 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] ip6_input+0x15d/0x430 net/ipv6/ip6_input.c:492 ip6_mc_input+0xa7e/0xc80 net/ipv6/ip6_input.c:586 dst_input include/net/dst.h:460 [inline] ip6_rcv_finish+0x955/0x970 net/ipv6/ip6_input.c:79 NF_HOOK include/linux/netfilter.h:314 [inline] ipv6_rcv+0xde/0x390 net/ipv6/ip6_input.c:310 __netif_receive_skb_one_core net/core/dev.c:5538 [inline] __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5652 netif_receive_skb_internal net/core/dev.c:5738 [inline] netif_receive_skb+0x58/0x660 net/core/dev.c:5798 tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1549 tun_get_user+0x5566/0x69e0 drivers/net/tun.c:2002 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:2108 [inline] new_sync_write fs/read_write.c:497 [inline] vfs_write+0xb63/0x1520 fs/read_write.c:590 ksys_write+0x20f/0x4c0 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __x64_sys_write+0x93/0xe0 fs/read_write.c:652 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75
Uninit was created at: slab_post_alloc_hook mm/slub.c:3804 [inline] slab_alloc_node mm/slub.c:3845 [inline] kmem_cache_alloc_node+0x613/0xc50 mm/slub.c:3888 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:577 __alloc_skb+0x35b/0x7a0 net/core/skbuff.c:668 alloc_skb include/linux/skbuff.h:1318 [inline] alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6504 sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2795 tun_alloc_skb drivers/net/tun.c:1525 [inline] tun_get_user+0x209a/0x69e0 drivers/net/tun.c:1846 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:2108 [inline] new_sync_write fs/read_write.c:497 [inline] vfs_write+0xb63/0x1520 fs/read_write.c:590 ksys_write+0x20f/0x4c0 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __x64_sys_write+0x93/0xe0 fs/read_write.c:652 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75
CPU: 1 PID: 5045 Comm: syz-executor114 Not tainted 6.9.0-rc1-syzkaller-00021-g962490525cff #0
Fixes: cb73ee40b1b3 ("net: ip_gre: use erspan key field for tunnel lookup") Reported-by: syzbot+1c1cf138518bf0c53d68@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/000000000000772f2c0614b66ef7@google.com/ Signed-off-by: Eric Dumazet edumazet@google.com Cc: Lorenzo Bianconi lorenzo@kernel.org Link: https://lore.kernel.org/r/20240328112248.1101491-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/ip_gre.c | 5 +++++ net/ipv6/ip6_gre.c | 3 +++ 2 files changed, 8 insertions(+)
--- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -280,8 +280,13 @@ static int erspan_rcv(struct sk_buff *sk tpi->flags | TUNNEL_NO_KEY, iph->saddr, iph->daddr, 0); } else { + if (unlikely(!pskb_may_pull(skb, + gre_hdr_len + sizeof(*ershdr)))) + return PACKET_REJECT; + ershdr = (struct erspan_base_hdr *)(skb->data + gre_hdr_len); ver = ershdr->ver; + iph = ip_hdr(skb); tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, tpi->flags | TUNNEL_KEY, iph->saddr, iph->daddr, tpi->key); --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -528,6 +528,9 @@ static int ip6erspan_rcv(struct sk_buff struct ip6_tnl *tunnel; u8 ver;
+ if (unlikely(!pskb_may_pull(skb, sizeof(*ershdr)))) + return PACKET_REJECT; + ipv6h = ipv6_hdr(skb); ershdr = (struct erspan_base_hdr *)skb->data; ver = ershdr->ver;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
commit 31974122cfdeaf56abc18d8ab740d580d9833e90 upstream.
The netdev CI runs in a VM and captures serial, so stdout and stderr get combined. Because there's a missing newline in the stderr output, the test ends up corrupting KTAP:
# Successok 1 selftests: net: reuseaddr_conflict
which should have been:
# Success ok 1 selftests: net: reuseaddr_conflict
Fixes: 422d8dc6fd3a ("selftest: add a reuseaddr test") Reviewed-by: Muhammad Usama Anjum usama.anjum@collabora.com Link: https://lore.kernel.org/r/20240329160559.249476-1-kuba@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/reuseaddr_conflict.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/net/reuseaddr_conflict.c +++ b/tools/testing/selftests/net/reuseaddr_conflict.c @@ -109,6 +109,6 @@ int main(void) fd1 = open_port(0, 1); if (fd1 >= 0) error(1, 0, "Was allowed to create an ipv4 reuseport on an already bound non-reuseport socket with no ipv6"); - fprintf(stderr, "Success"); + fprintf(stderr, "Success\n"); return 0; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
commit d91ef1e1b55f730bee8ce286b02b7bdccbc42973 upstream.
Jianguo Wu reported another bind() regression introduced by bhash2.
Calling bind() for the following 3 addresses on the same port, the 3rd one should fail but now succeeds.
1. 0.0.0.0 or ::ffff:0.0.0.0
2. [::] w/ IPV6_V6ONLY
3. IPv4 non-wildcard address or v4-mapped-v6 non-wildcard address
The first two bind() create tb2 like this:
bhash2 -> tb2(:: w/ IPV6_V6ONLY) -> tb2(0.0.0.0)
The 3rd bind() will match the IPv6-only wildcard address bucket in inet_bind2_bucket_match_addr_any(); however, no conflicting socket exists in that bucket. So, inet_bhash2_conflict() will return false, and thus inet_bhash2_addr_any_conflict() returns false as well.
As a result, the 3rd bind() bypasses conflict check, which should be done against the IPv4 wildcard address bucket.
So, in inet_bhash2_addr_any_conflict(), we must iterate over all buckets.
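A minimal userspace sketch of the reported sequence is below (error handling omitted, no SO_REUSEADDR, port number arbitrary); on a fixed kernel the third bind() fails with EADDRINUSE because it conflicts with the 0.0.0.0 bind, while the regression allowed it to succeed.

#include <stdio.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
	const unsigned short port = 50000;
	int on = 1, ret;

	/* 1. 0.0.0.0:port */
	int s1 = socket(AF_INET, SOCK_STREAM, 0);
	struct sockaddr_in a1 = { .sin_family = AF_INET,
				  .sin_port = htons(port),
				  .sin_addr.s_addr = htonl(INADDR_ANY) };
	printf("bind 0.0.0.0     -> %d\n",
	       bind(s1, (struct sockaddr *)&a1, sizeof(a1)));

	/* 2. [::]:port with IPV6_V6ONLY */
	int s2 = socket(AF_INET6, SOCK_STREAM, 0);
	struct sockaddr_in6 a2 = { .sin6_family = AF_INET6,
				   .sin6_port = htons(port),
				   .sin6_addr = in6addr_any };
	setsockopt(s2, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on));
	printf("bind [::] v6only -> %d\n",
	       bind(s2, (struct sockaddr *)&a2, sizeof(a2)));

	/* 3. an IPv4 non-wildcard address: must conflict with bind #1 */
	int s3 = socket(AF_INET, SOCK_STREAM, 0);
	struct sockaddr_in a3 = { .sin_family = AF_INET,
				  .sin_port = htons(port) };
	inet_pton(AF_INET, "127.0.0.1", &a3.sin_addr);
	ret = bind(s3, (struct sockaddr *)&a3, sizeof(a3));
	printf("bind 127.0.0.1   -> %d (%s)\n", ret,
	       ret ? "EADDRINUSE as expected" : "bug: bind succeeded");
	return 0;
}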
Note that we cannot add an ipv6_only flag to inet_bind2_bucket as it would confuse the following pattern.
1. [::] w/ SO_REUSE{ADDR,PORT} and IPV6_V6ONLY
2. [::] w/ SO_REUSE{ADDR,PORT}
3. IPv4 non-wildcard address or v4-mapped-v6 non-wildcard address
The first bind() would create a bucket with ipv6_only flag true, the second bind() would add the [::] socket into the same bucket, and the third bind() could succeed based on the wrong assumption that ipv6_only bucket would not conflict with v4(-mapped-v6) address.
Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") Diagnosed-by: Jianguo Wu wujianguo106@163.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20240326204251.51301-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/inet_connection_sock.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-)
--- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -289,6 +289,7 @@ static bool inet_bhash2_addr_any_conflic struct sock_reuseport *reuseport_cb; struct inet_bind_hashbucket *head2; struct inet_bind2_bucket *tb2; + bool conflict = false; bool reuseport_cb_ok;
rcu_read_lock(); @@ -301,18 +302,20 @@ static bool inet_bhash2_addr_any_conflic
spin_lock(&head2->lock);
- inet_bind_bucket_for_each(tb2, &head2->chain) - if (inet_bind2_bucket_match_addr_any(tb2, net, port, l3mdev, sk)) - break; - - if (tb2 && inet_bhash2_conflict(sk, tb2, uid, relax, reuseport_cb_ok, - reuseport_ok)) { - spin_unlock(&head2->lock); - return true; + inet_bind_bucket_for_each(tb2, &head2->chain) { + if (!inet_bind2_bucket_match_addr_any(tb2, net, port, l3mdev, sk)) + continue; + + if (!inet_bhash2_conflict(sk, tb2, uid, relax, reuseport_cb_ok, reuseport_ok)) + continue; + + conflict = true; + break; }
spin_unlock(&head2->lock); - return false; + + return conflict; }
/*
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Duoming Zhou duoming@zju.edu.cn
commit fd819ad3ecf6f3c232a06b27423ce9ed8c20da89 upstream.
When the ax25 device is detaching, ax25_dev_device_down() calls ax25_ds_del_timer() to clean up the slave_timer. If the timer handler is running at that point, the del_timer() called from ax25_ds_del_timer() will return directly without waiting for the handler to finish. As a result, use-after-free bugs could happen; one of the scenarios is shown below:
      (Thread 1)            |        (Thread 2)
                            |  ax25_ds_timeout()
 ax25_dev_device_down()     |
   ax25_ds_del_timer()      |
     del_timer()            |
   ax25_dev_put()  //FREE   |
                            |    ax25_dev->  //USE
In order to mitigate bugs, when the device is detaching, use timer_shutdown_sync() to stop the timer.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Duoming Zhou duoming@zju.edu.cn Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240329015023.9223-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ax25/ax25_dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ax25/ax25_dev.c +++ b/net/ax25/ax25_dev.c @@ -105,7 +105,7 @@ void ax25_dev_device_down(struct net_dev spin_lock_bh(&ax25_dev_lock);
#ifdef CONFIG_AX25_DAMA_SLAVE - ax25_ds_del_timer(ax25_dev); + timer_shutdown_sync(&ax25_dev->dama.slave_timer); #endif
/*
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
commit d21d40605bca7bd5fc23ef03d4c1ca1f48bc2cae upstream.
syzkaller reported infinite recursive calls of fib6_dump_done() during netlink socket destruction. [1]
From the log, syzkaller sent an AF_UNSPEC RTM_GETROUTE message, and then
the response was generated. The following recvmmsg() resumed the dump for IPv6, but the first call of inet6_dump_fib() failed at kzalloc() due to the fault injection. [0]
12:01:34 executing program 3:
r0 = socket$nl_route(0x10, 0x3, 0x0)
sendmsg$nl_route(r0, ... snip ...)
recvmmsg(r0, ... snip ...) (fail_nth: 8)
Here, fib6_dump_done() was set to nlk_sk(sk)->cb.done, and the next call of inet6_dump_fib() set it to nlk_sk(sk)->cb.args[3]. syzkaller stopped receiving the response halfway through, and finally netlink_sock_destruct() called nlk_sk(sk)->cb.done().
fib6_dump_done() calls fib6_dump_end() and then nlk_sk(sk)->cb.done() if it is still not NULL. fib6_dump_end() restores nlk_sk(sk)->cb.done() from nlk_sk(sk)->cb.args[3], but that now holds the same function, not NULL, so it keeps calling itself recursively until it hits the stack guard page.
To avoid the issue, let's set the destructor after kzalloc().
[0]: FAULT_INJECTION: forcing a failure. name failslab, interval 1, probability 0, space 0, times 0 CPU: 1 PID: 432110 Comm: syz-executor.3 Not tainted 6.8.0-12821-g537c2e91d354-dirty #11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:117) should_fail_ex (lib/fault-inject.c:52 lib/fault-inject.c:153) should_failslab (mm/slub.c:3733) kmalloc_trace (mm/slub.c:3748 mm/slub.c:3827 mm/slub.c:3992) inet6_dump_fib (./include/linux/slab.h:628 ./include/linux/slab.h:749 net/ipv6/ip6_fib.c:662) rtnl_dump_all (net/core/rtnetlink.c:4029) netlink_dump (net/netlink/af_netlink.c:2269) netlink_recvmsg (net/netlink/af_netlink.c:1988) ____sys_recvmsg (net/socket.c:1046 net/socket.c:2801) ___sys_recvmsg (net/socket.c:2846) do_recvmmsg (net/socket.c:2943) __x64_sys_recvmmsg (net/socket.c:3041 net/socket.c:3034 net/socket.c:3034)
[1]: BUG: TASK stack guard page was hit at 00000000f2fa9af1 (stack is 00000000b7912430..000000009a436beb) stack guard page: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 223719 Comm: kworker/1:3 Not tainted 6.8.0-12821-g537c2e91d354-dirty #11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: events netlink_sock_destruct_work RIP: 0010:fib6_dump_done (net/ipv6/ip6_fib.c:570) Code: 3c 24 e8 f3 e9 51 fd e9 28 fd ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 48 89 fd <53> 48 8d 5d 60 e8 b6 4d 07 fd 48 89 da 48 b8 00 00 00 00 00 fc ff RSP: 0018:ffffc9000d980000 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffffffff84405990 RCX: ffffffff844059d3 RDX: ffff8881028e0000 RSI: ffffffff84405ac2 RDI: ffff88810c02f358 RBP: ffff88810c02f358 R08: 0000000000000007 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000224 R12: 0000000000000000 R13: ffff888007c82c78 R14: ffff888007c82c68 R15: ffff888007c82c68 FS: 0000000000000000(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc9000d97fff8 CR3: 0000000102309002 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: <#DF> </#DF> <TASK> fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1)) fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1)) ... fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1)) fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1)) netlink_sock_destruct (net/netlink/af_netlink.c:401) __sk_destruct (net/core/sock.c:2177 (discriminator 2)) sk_destruct (net/core/sock.c:2224) __sk_free (net/core/sock.c:2235) sk_free (net/core/sock.c:2246) process_one_work (kernel/workqueue.c:3259) worker_thread (kernel/workqueue.c:3329 kernel/workqueue.c:3416) kthread (kernel/kthread.c:388) ret_from_fork (arch/x86/kernel/process.c:153) ret_from_fork_asm (arch/x86/entry/entry_64.S:256) Modules linked in:
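To make the recursion easier to see, here is a small userspace model (hypothetical types, loosely mirroring the netlink callback's cb->done and cb->args[3], and deliberately not invoking the callback a second time): hooking the destructor twice without a teardown in between leaves args[3] pointing at the destructor itself, so restoring and calling it would never terminate.

#include <stdio.h>

struct cb {
	void (*done)(struct cb *cb);
	long args[4];
};

static void dump_done(struct cb *cb);

static void hook_destructor(struct cb *cb)
{
	/* the buggy ordering: save the old destructor and hook ours
	 * before the allocation that may fail on a resumed dump */
	cb->args[3] = (long)cb->done;
	cb->done = dump_done;
}

static void dump_done(struct cb *cb)
{
	/* mirrors the dump-done path: restore the saved destructor ... */
	cb->done = (void (*)(struct cb *))cb->args[3];
	cb->args[3] = 0;

	/* ... and call it if set; if it points back at dump_done we
	 * would recurse forever, so only report that here */
	if (cb->done == dump_done)
		printf("args[3] pointed back at dump_done: infinite recursion\n");
	else if (cb->done)
		cb->done(cb);
	else
		printf("clean teardown\n");
}

int main(void)
{
	struct cb cb = { 0 };

	hook_destructor(&cb);	/* first dump call */
	hook_destructor(&cb);	/* resumed dump hooks it again */
	dump_done(&cb);		/* socket destruction runs the destructor */
	return 0;
}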
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzkaller syzkaller@googlegroups.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: David Ahern dsahern@kernel.org Link: https://lore.kernel.org/r/20240401211003.25274-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_fib.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
--- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -645,19 +645,19 @@ static int inet6_dump_fib(struct sk_buff if (!w) { /* New dump: * - * 1. hook callback destructor. - */ - cb->args[3] = (long)cb->done; - cb->done = fib6_dump_done; - - /* - * 2. allocate and initialize walker. + * 1. allocate and initialize walker. */ w = kzalloc(sizeof(*w), GFP_ATOMIC); if (!w) return -ENOMEM; w->func = fib6_dump_node; cb->args[2] = (long)w; + + /* 2. hook callback destructor. + */ + cb->args[3] = (long)cb->done; + cb->done = fib6_dump_done; + }
arg.skb = skb;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Thompson davthompson@nvidia.com
commit 09ba28e1cd3cf715daab1fca6e1623e22fd754a6 upstream.
The mlxbf_gige driver intermittently encounters a NULL pointer exception while the system is shutting down via the "reboot" command. The driver will experience the exception right after executing its shutdown() method. One example of this exception is:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=000000011d373000 [0000000000000070] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 96000004 [#1] SMP CPU: 0 PID: 13 Comm: ksoftirqd/0 Tainted: G S OE 5.15.0-bf.6.gef6992a #1 Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.0.2.12669 Apr 21 2023 pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : mlxbf_gige_handle_tx_complete+0xc8/0x170 [mlxbf_gige] lr : mlxbf_gige_poll+0x54/0x160 [mlxbf_gige] sp : ffff8000080d3c10 x29: ffff8000080d3c10 x28: ffffcce72cbb7000 x27: ffff8000080d3d58 x26: ffff0000814e7340 x25: ffff331cd1a05000 x24: ffffcce72c4ea008 x23: ffff0000814e4b40 x22: ffff0000814e4d10 x21: ffff0000814e4128 x20: 0000000000000000 x19: ffff0000814e4a80 x18: ffffffffffffffff x17: 000000000000001c x16: ffffcce72b4553f4 x15: ffff80008805b8a7 x14: 0000000000000000 x13: 0000000000000030 x12: 0101010101010101 x11: 7f7f7f7f7f7f7f7f x10: c2ac898b17576267 x9 : ffffcce720fa5404 x8 : ffff000080812138 x7 : 0000000000002e9a x6 : 0000000000000080 x5 : ffff00008de3b000 x4 : 0000000000000000 x3 : 0000000000000001 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: mlxbf_gige_handle_tx_complete+0xc8/0x170 [mlxbf_gige] mlxbf_gige_poll+0x54/0x160 [mlxbf_gige] __napi_poll+0x40/0x1c8 net_rx_action+0x314/0x3a0 __do_softirq+0x128/0x334 run_ksoftirqd+0x54/0x6c smpboot_thread_fn+0x14c/0x190 kthread+0x10c/0x110 ret_from_fork+0x10/0x20 Code: 8b070000 f9000ea0 f95056c0 f86178a1 (b9407002) ---[ end trace 7cc3941aa0d8e6a4 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt Kernel Offset: 0x4ce722520000 from 0xffff800008000000 PHYS_OFFSET: 0x80000000 CPU features: 0x000005c1,a3330e5a Memory Limit: none ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
During system shutdown, the mlxbf_gige driver's shutdown() is always executed. However, the driver's stop() method will only execute if networking interface configuration logic within the Linux distribution has been setup to do so.
If shutdown() executes but stop() does not execute, NAPI remains enabled and this can lead to an exception if NAPI is scheduled while the hardware interface has only been partially deinitialized.
The networking interface managed by the mlxbf_gige driver must be properly stopped during system shutdown so that IFF_UP is cleared, the hardware interface is put into a clean state, and NAPI is fully deinitialized.
Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Signed-off-by: David Thompson davthompson@nvidia.com Link: https://lore.kernel.org/r/20240325210929.25362-1-davthompson@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c @@ -14,6 +14,7 @@ #include <linux/module.h> #include <linux/phy.h> #include <linux/platform_device.h> +#include <linux/rtnetlink.h> #include <linux/skbuff.h>
#include "mlxbf_gige.h" @@ -494,8 +495,13 @@ static void mlxbf_gige_shutdown(struct p { struct mlxbf_gige *priv = platform_get_drvdata(pdev);
- writeq(0, priv->base + MLXBF_GIGE_INT_EN); - mlxbf_gige_clean_port(priv); + rtnl_lock(); + netif_device_detach(priv->netdev); + + if (netif_running(priv->netdev)) + dev_close(priv->netdev); + + rtnl_unlock(); }
static const struct acpi_device_id __maybe_unused mlxbf_gige_acpi_match[] = {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Atlas Yu atlas.yu@canonical.com
commit 5e864d90b20803edf6bd44a99fb9afa7171785f2 upstream.
On devices that support DASH, the current code in the "rtl_loop_wait" function raises false alarms when DASH is disabled. This occurs because the function attempts to wait for the DASH firmware to be ready, even though it's not relevant in this case.
r8169 0000:0c:00.0 eth0: RTL8168ep/8111ep, 38:7c:76:49:08:d9, XID 502, IRQ 86
r8169 0000:0c:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]
r8169 0000:0c:00.0 eth0: DASH disabled
...
r8169 0000:0c:00.0 eth0: rtl_ep_ocp_read_cond == 0 (loop: 30, delay: 10000).
This patch modifies the driver start/stop functions to skip checking the DASH firmware status when DASH is explicitly disabled. This prevents unnecessary delays and false alarms.
The patch has been tested on several ThinkStation P8/PX workstations.
Fixes: 0ab0c45d8aae ("r8169: add handling DASH when DASH is disabled") Signed-off-by: Atlas Yu atlas.yu@canonical.com Reviewed-by: Heiner Kallweit hkallweit1@gmail.com Link: https://lore.kernel.org/r/20240328055152.18443-1-atlas.yu@canonical.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/realtek/r8169_main.c | 31 ++++++++++++++++++++++++++---- 1 file changed, 27 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -1201,17 +1201,40 @@ static void rtl8168ep_stop_cmac(struct r RTL_W8(tp, IBCR0, RTL_R8(tp, IBCR0) & ~0x01); }
+static void rtl_dash_loop_wait(struct rtl8169_private *tp, + const struct rtl_cond *c, + unsigned long usecs, int n, bool high) +{ + if (!tp->dash_enabled) + return; + rtl_loop_wait(tp, c, usecs, n, high); +} + +static void rtl_dash_loop_wait_high(struct rtl8169_private *tp, + const struct rtl_cond *c, + unsigned long d, int n) +{ + rtl_dash_loop_wait(tp, c, d, n, true); +} + +static void rtl_dash_loop_wait_low(struct rtl8169_private *tp, + const struct rtl_cond *c, + unsigned long d, int n) +{ + rtl_dash_loop_wait(tp, c, d, n, false); +} + static void rtl8168dp_driver_start(struct rtl8169_private *tp) { r8168dp_oob_notify(tp, OOB_CMD_DRIVER_START); - rtl_loop_wait_high(tp, &rtl_dp_ocp_read_cond, 10000, 10); + rtl_dash_loop_wait_high(tp, &rtl_dp_ocp_read_cond, 10000, 10); }
static void rtl8168ep_driver_start(struct rtl8169_private *tp) { r8168ep_ocp_write(tp, 0x01, 0x180, OOB_CMD_DRIVER_START); r8168ep_ocp_write(tp, 0x01, 0x30, r8168ep_ocp_read(tp, 0x30) | 0x01); - rtl_loop_wait_high(tp, &rtl_ep_ocp_read_cond, 10000, 30); + rtl_dash_loop_wait_high(tp, &rtl_ep_ocp_read_cond, 10000, 30); }
static void rtl8168_driver_start(struct rtl8169_private *tp) @@ -1225,7 +1248,7 @@ static void rtl8168_driver_start(struct static void rtl8168dp_driver_stop(struct rtl8169_private *tp) { r8168dp_oob_notify(tp, OOB_CMD_DRIVER_STOP); - rtl_loop_wait_low(tp, &rtl_dp_ocp_read_cond, 10000, 10); + rtl_dash_loop_wait_low(tp, &rtl_dp_ocp_read_cond, 10000, 10); }
static void rtl8168ep_driver_stop(struct rtl8169_private *tp) @@ -1233,7 +1256,7 @@ static void rtl8168ep_driver_stop(struct rtl8168ep_stop_cmac(tp); r8168ep_ocp_write(tp, 0x01, 0x180, OOB_CMD_DRIVER_STOP); r8168ep_ocp_write(tp, 0x01, 0x30, r8168ep_ocp_read(tp, 0x30) | 0x01); - rtl_loop_wait_low(tp, &rtl_ep_ocp_read_cond, 10000, 10); + rtl_dash_loop_wait_low(tp, &rtl_ep_ocp_read_cond, 10000, 10); }
static void rtl8168_driver_stop(struct rtl8169_private *tp)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Antoine Tenart atenart@kernel.org
commit 3d010c8031e39f5fa1e8b13ada77e0321091011f upstream.
When rx-udp-gro-forwarding is enabled UDP packets might be GROed when being forwarded. If such packets might land in a tunnel this can cause various issues and udp_gro_receive makes sure this isn't the case by looking for a matching socket. This is performed in udp4/6_gro_lookup_skb but only in the current netns. This is an issue with tunneled packets when the endpoint is in another netns. In such cases the packets will be GROed at the UDP level, which leads to various issues later on. The same thing can happen with rx-gro-list.
We saw this with geneve packets being GROed at the UDP level. In such a case gso_size is set; later the packet goes through the geneve rx path, the geneve header is pulled, the offsets are adjusted and the frag_list skbs are not adjusted with regard to geneve. When those skbs hit skb_fragment, it will misbehave. Different outcomes are possible depending on what the GROed skbs look like; from corrupted packets to kernel crashes.
One example is a BUG_ON[1] triggered in skb_segment while processing the frag_list. Because gso_size is wrong (geneve header was pulled) skb_segment thinks there is "geneve header size" of data in frag_list, although it's in fact the next packet. The BUG_ON itself has nothing to do with the issue. This is only one of the potential issues.
Looking up for a matching socket in udp_gro_receive is fragile: the lookup could be extended to all netns (not speaking about performances) but nothing prevents those packets from being modified in between and we could still not find a matching socket. It's OK to keep the current logic there as it should cover most cases but we also need to make sure we handle tunnel packets being GROed too early.
This is done by extending the checks in udp_unexpected_gso: GSO packets lacking the SKB_GSO_UDP_TUNNEL/_CSUM bits and landing in a tunnel must be segmented.
[1] kernel BUG at net/core/skbuff.c:4408!
    RIP: 0010:skb_segment+0xd2a/0xf70
    __udp_gso_segment+0xaa/0x560
Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.") Fixes: 36707061d6ba ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets") Signed-off-by: Antoine Tenart atenart@kernel.org Reviewed-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/udp.h | 28 ++++++++++++++++++++++++++++ net/ipv4/udp.c | 7 +++++++ net/ipv4/udp_offload.c | 6 ++++-- net/ipv6/udp.c | 2 +- 4 files changed, 40 insertions(+), 3 deletions(-)
--- a/include/linux/udp.h +++ b/include/linux/udp.h @@ -140,6 +140,24 @@ static inline void udp_cmsg_recv(struct } }
+DECLARE_STATIC_KEY_FALSE(udp_encap_needed_key); +#if IS_ENABLED(CONFIG_IPV6) +DECLARE_STATIC_KEY_FALSE(udpv6_encap_needed_key); +#endif + +static inline bool udp_encap_needed(void) +{ + if (static_branch_unlikely(&udp_encap_needed_key)) + return true; + +#if IS_ENABLED(CONFIG_IPV6) + if (static_branch_unlikely(&udpv6_encap_needed_key)) + return true; +#endif + + return false; +} + static inline bool udp_unexpected_gso(struct sock *sk, struct sk_buff *skb) { if (!skb_is_gso(skb)) @@ -153,6 +171,16 @@ static inline bool udp_unexpected_gso(st !udp_test_bit(ACCEPT_FRAGLIST, sk)) return true;
+ /* GSO packets lacking the SKB_GSO_UDP_TUNNEL/_CSUM bits might still + * land in a tunnel as the socket check in udp_gro_receive cannot be + * foolproof. + */ + if (udp_encap_needed() && + READ_ONCE(udp_sk(sk)->encap_rcv) && + !(skb_shinfo(skb)->gso_type & + (SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM))) + return true; + return false; }
--- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -584,6 +584,13 @@ static inline bool __udp_is_mcast_sock(s }
DEFINE_STATIC_KEY_FALSE(udp_encap_needed_key); +EXPORT_SYMBOL(udp_encap_needed_key); + +#if IS_ENABLED(CONFIG_IPV6) +DEFINE_STATIC_KEY_FALSE(udpv6_encap_needed_key); +EXPORT_SYMBOL(udpv6_encap_needed_key); +#endif + void udp_encap_enable(void) { static_branch_inc(&udp_encap_needed_key); --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -552,8 +552,10 @@ struct sk_buff *udp_gro_receive(struct l unsigned int off = skb_gro_offset(skb); int flush = 1;
- /* we can do L4 aggregation only if the packet can't land in a tunnel - * otherwise we could corrupt the inner stream + /* We can do L4 aggregation only if the packet can't land in a tunnel + * otherwise we could corrupt the inner stream. Detecting such packets + * cannot be foolproof and the aggregation might still happen in some + * cases. Such packets should be caught in udp_unexpected_gso later. */ NAPI_GRO_CB(skb)->is_flist = 0; if (!sk || !udp_sk(sk)->gro_receive) { --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -450,7 +450,7 @@ csum_copy_err: goto try_again; }
-DEFINE_STATIC_KEY_FALSE(udpv6_encap_needed_key); +DECLARE_STATIC_KEY_FALSE(udpv6_encap_needed_key); void udpv6_encap_enable(void) { static_branch_inc(&udpv6_encap_needed_key);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Antoine Tenart atenart@kernel.org
commit f0b8c30345565344df2e33a8417a27503589247d upstream.
UDP GRO validates checksums and in udp4/6_gro_complete fraglist packets are converted to CHECKSUM_UNNECESSARY to avoid later checks. However this is an issue for CHECKSUM_PARTIAL packets as they can be looped in an egress path and then their partial checksums are not fixed.
Different issues can be observed, from invalid checksum on packets to traces like:
gen01: hw csum failure skb len=3008 headroom=160 headlen=1376 tailroom=0 mac=(106,14) net=(120,40) trans=160 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) csum(0xffff232e ip_summed=2 complete_sw=0 valid=0 level=0) hash(0x77e3d716 sw=1 l4=1) proto=0x86dd pkttype=0 iif=12 ...
Fix this by converting only CHECKSUM_NONE packets to CHECKSUM_UNNECESSARY, reusing __skb_incr_checksum_unnecessary. All other checksum types are kept as-is, including CHECKSUM_COMPLETE, as fraglist packets being segmented back would then have a valid skb->csum.
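For reference, the helper being reused here lives in include/linux/skbuff.h and looks roughly like the sketch below (paraphrased from memory; the exact body may differ between kernel versions). It only promotes CHECKSUM_NONE, which is exactly the behaviour the fix relies on:

static inline void __skb_incr_checksum_unnecessary(struct sk_buff *skb)
{
	if (skb->ip_summed == CHECKSUM_NONE) {
		/* Only CHECKSUM_NONE is promoted to CHECKSUM_UNNECESSARY */
		skb->ip_summed = CHECKSUM_UNNECESSARY;
		skb->csum_level = 0;
	} else if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
		/* Already validated once; bump the nesting level if possible */
		if (skb->csum_level < SKB_MAX_CSUM_LEVEL)
			skb->csum_level++;
	}
	/* CHECKSUM_PARTIAL and CHECKSUM_COMPLETE are left untouched */
}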
Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.") Signed-off-by: Antoine Tenart atenart@kernel.org Reviewed-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/udp_offload.c | 8 +------- net/ipv6/udp_offload.c | 8 +------- 2 files changed, 2 insertions(+), 14 deletions(-)
--- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -722,13 +722,7 @@ INDIRECT_CALLABLE_SCOPE int udp4_gro_com skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4); skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;
- if (skb->ip_summed == CHECKSUM_UNNECESSARY) { - if (skb->csum_level < SKB_MAX_CSUM_LEVEL) - skb->csum_level++; - } else { - skb->ip_summed = CHECKSUM_UNNECESSARY; - skb->csum_level = 0; - } + __skb_incr_checksum_unnecessary(skb);
return 0; } --- a/net/ipv6/udp_offload.c +++ b/net/ipv6/udp_offload.c @@ -174,13 +174,7 @@ INDIRECT_CALLABLE_SCOPE int udp6_gro_com skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4); skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;
- if (skb->ip_summed == CHECKSUM_UNNECESSARY) { - if (skb->csum_level < SKB_MAX_CSUM_LEVEL) - skb->csum_level++; - } else { - skb->ip_summed = CHECKSUM_UNNECESSARY; - skb->csum_level = 0; - } + __skb_incr_checksum_unnecessary(skb);
return 0; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Antoine Tenart atenart@kernel.org
commit 64235eabc4b5b18c507c08a1f16cdac6c5661220 upstream.
GRO has a fundamental issue with UDP tunnel packets as it can't detect those in a foolproof way and GRO could happen before they reach the tunnel endpoint. Previous commits have fixed issues when UDP tunnel packets come from a remote host, but if those packets are issued locally they could run into checksum issues.
If the inner packet has a partial checksum the information will be lost in the GRO logic, either in udp4/6_gro_complete or in udp_gro_complete_segment and packets will have an invalid checksum when leaving the host.
Prevent local UDP tunnel packets from ever being GROed at the outer UDP level.
Due to skb->encapsulation being wrongly used in some drivers, this actually only prevents UDP tunnel packets with a partial checksum from being GROed (see iptunnel_handle_offloads), but those were also the packets triggering the issues, so in practice this should be sufficient.
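For context, a minimal sketch of the relevant logic in iptunnel_handle_offloads() (net/ipv4/ip_tunnel_core.c), paraphrased from memory rather than quoted verbatim, shows why only CHECKSUM_PARTIAL tunnel packets still carry skb->encapsulation by the time they reach the outer UDP GRO path:

/* Paraphrased sketch, not the verbatim upstream function */
static int iptunnel_handle_offloads_sketch(struct sk_buff *skb, int gso_type_mask)
{
	if (likely(!skb->encapsulation)) {
		skb_reset_inner_headers(skb);
		skb->encapsulation = 1;		/* mark as locally encapsulated */
	}

	if (skb_is_gso(skb)) {
		skb_shinfo(skb)->gso_type |= gso_type_mask;
		return 0;
	}

	if (skb->ip_summed != CHECKSUM_PARTIAL) {
		skb->ip_summed = CHECKSUM_NONE;
		/* Cleared as a workaround for drivers misusing ->encapsulation,
		 * so only CHECKSUM_PARTIAL packets keep the flag set.
		 */
		skb->encapsulation = 0;
	}

	return 0;
}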
Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.") Fixes: 36707061d6ba ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets") Suggested-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Antoine Tenart atenart@kernel.org Reviewed-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/udp_offload.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -559,6 +559,12 @@ struct sk_buff *udp_gro_receive(struct l */ NAPI_GRO_CB(skb)->is_flist = 0; if (!sk || !udp_sk(sk)->gro_receive) { + /* If the packet was locally encapsulated in a UDP tunnel that + * wasn't detected above, do not GRO. + */ + if (skb->encapsulation) + goto out; + if (skb->dev->features & NETIF_F_GRO_FRAGLIST) NAPI_GRO_CB(skb)->is_flist = sk ? !udp_test_bit(GRO_ENABLED, sk) : 1;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hariprasad Kelam hkelam@marvell.com
commit 0ba80d96585662299d4ea4624043759ce9015421 upstream.
The current implementation for loading coalesced KPU profiles has a limitation: the "offset" field, which is used to locate individual profiles within the coalesced profile image, is restricted to a u16.
This restricts the number of profiles that can be loaded. This patch addresses the limitation by increasing the size of the "offset" field.
Fixes: 11c730bfbf5b ("octeontx2-af: support for coalescing KPU profiles") Signed-off-by: Hariprasad Kelam hkelam@marvell.com Reviewed-by: Kalesh AP kalesh-anakkur.purayil@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c @@ -1669,7 +1669,7 @@ static int npc_fwdb_detect_load_prfl_img struct npc_coalesced_kpu_prfl *img_data = NULL; int i = 0, rc = -EINVAL; void __iomem *kpu_prfl_addr; - u16 offset; + u32 offset;
img_data = (struct npc_coalesced_kpu_prfl __force *)rvu->kpu_prfl_addr; if (le64_to_cpu(img_data->signature) == KPU_SIGN &&
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Su Hui suhui@nfschina.com
commit e709acbd84fb6ef32736331b0147f027a3ef4c20 upstream.
otx2_rxtx_enable() returns a negative error code such as -EIO, so check for -EIO rather than EIO to fix this problem.
Fixes: c926252205c4 ("octeontx2-pf: Disable packet I/O for graceful exit") Signed-off-by: Su Hui suhui@nfschina.com Reviewed-by: Subbaraya Sundeep sbhatta@marvell.com Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Kalesh AP kalesh-anakkur.purayil@broadcom.com Link: https://lore.kernel.org/r/20240328020620.4054692-1-suhui@nfschina.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c @@ -1933,7 +1933,7 @@ int otx2_open(struct net_device *netdev) * mcam entries are enabled to receive the packets. Hence disable the * packet I/O. */ - if (err == EIO) + if (err == -EIO) goto err_disable_rxtx; else if (err) goto err_tx_stop_queues;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aleksandr Mishin amishin@t-argos.ru
commit ef15ddeeb6bee87c044bf7754fac524545bf71e8 upstream.
In rvu_map_cgx_lmac_pf() the 'iter' variable, which is used as an array index, can reach values (up to 14) that exceed the size of the array (MAX_LMAC_COUNT = 8). Fix this bug by adding a check on the 'iter' value.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 91c6945ea1f9 ("octeontx2-af: cn10k: Add RPM MAC support") Signed-off-by: Aleksandr Mishin amishin@t-argos.ru Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c @@ -160,6 +160,8 @@ static int rvu_map_cgx_lmac_pf(struct rv continue; lmac_bmap = cgx_get_lmac_bmap(rvu_cgx_pdata(cgx, rvu)); for_each_set_bit(iter, &lmac_bmap, rvu->hw->lmac_per_cgx) { + if (iter >= MAX_LMAC_COUNT) + continue; lmac = cgx_get_lmacid(rvu_cgx_pdata(cgx, rvu), iter); rvu->pf2cgxlmac_map[pf] = cgxlmac_id_to_bmap(cgx, lmac);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aleksandr Loktionov aleksandr.loktionov@intel.com
commit eb58c598ce45b7e787568fe27016260417c3d807 upstream.
The bug usually affects untrusted VFs: because they are limited to 18 MACs, it hits them badly, preventing them from creating all of their MAC filters. It is not stable to reproduce; it happens when the VF user creates MAC filters while other MACVLAN operations happen in parallel. The consequence is that the VF can't receive the desired traffic.
Fix the counter so it is bumped only for new or active filters.
Fixes: 621650cabee5 ("i40e: Refactoring VF MAC filters counting to make more reliable") Signed-off-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Reviewed-by: Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com Reviewed-by: Paul Menzel pmenzel@molgen.mpg.de Tested-by: Rafal Romanowski rafal.romanowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/intel/i40e/i40e_main.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -1249,8 +1249,11 @@ int i40e_count_filters(struct i40e_vsi * int bkt; int cnt = 0;
- hash_for_each_safe(vsi->mac_filter_hash, bkt, h, f, hlist) - ++cnt; + hash_for_each_safe(vsi->mac_filter_hash, bkt, h, f, hlist) { + if (f->state == I40E_FILTER_NEW || + f->state == I40E_FILTER_ACTIVE) + ++cnt; + }
return cnt; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aleksandr Loktionov aleksandr.loktionov@intel.com
commit f37c4eac99c258111d414d31b740437e1925b8e8 upstream.
Fix the regression introduced by commit 52424f974bc5, which causes servers to hang under very hard to reproduce conditions involving reset races. The root cause is using two sources for the same information: in this function, before the fix, bumping 'v' did not mean bumping the 'vf' pointer, yet the code used these variables interchangeably, so a stale 'vf' could point to a different VF than intended.
Remove the redundant "v" variable and instead iterate via a single VF pointer across the whole function to guarantee the VF pointer's validity.
Fixes: 52424f974bc5 ("i40e: Fix VF hang when reset is triggered on another VF") Signed-off-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Reviewed-by: Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Paul Menzel pmenzel@molgen.mpg.de Tested-by: Rafal Romanowski rafal.romanowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 34 +++++++++------------ 1 file changed, 16 insertions(+), 18 deletions(-)
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -1628,8 +1628,8 @@ bool i40e_reset_all_vfs(struct i40e_pf * { struct i40e_hw *hw = &pf->hw; struct i40e_vf *vf; - int i, v; u32 reg; + int i;
/* If we don't have any VFs, then there is nothing to reset */ if (!pf->num_alloc_vfs) @@ -1640,11 +1640,10 @@ bool i40e_reset_all_vfs(struct i40e_pf * return false;
/* Begin reset on all VFs at once */ - for (v = 0; v < pf->num_alloc_vfs; v++) { - vf = &pf->vf[v]; + for (vf = &pf->vf[0]; vf < &pf->vf[pf->num_alloc_vfs]; ++vf) { /* If VF is being reset no need to trigger reset again */ if (!test_bit(I40E_VF_STATE_RESETTING, &vf->vf_states)) - i40e_trigger_vf_reset(&pf->vf[v], flr); + i40e_trigger_vf_reset(vf, flr); }
/* HW requires some time to make sure it can flush the FIFO for a VF @@ -1653,14 +1652,13 @@ bool i40e_reset_all_vfs(struct i40e_pf * * the VFs using a simple iterator that increments once that VF has * finished resetting. */ - for (i = 0, v = 0; i < 10 && v < pf->num_alloc_vfs; i++) { + for (i = 0, vf = &pf->vf[0]; i < 10 && vf < &pf->vf[pf->num_alloc_vfs]; ++i) { usleep_range(10000, 20000);
/* Check each VF in sequence, beginning with the VF to fail * the previous check. */ - while (v < pf->num_alloc_vfs) { - vf = &pf->vf[v]; + while (vf < &pf->vf[pf->num_alloc_vfs]) { if (!test_bit(I40E_VF_STATE_RESETTING, &vf->vf_states)) { reg = rd32(hw, I40E_VPGEN_VFRSTAT(vf->vf_id)); if (!(reg & I40E_VPGEN_VFRSTAT_VFRD_MASK)) @@ -1670,7 +1668,7 @@ bool i40e_reset_all_vfs(struct i40e_pf * /* If the current VF has finished resetting, move on * to the next VF in sequence. */ - v++; + ++vf; } }
@@ -1680,39 +1678,39 @@ bool i40e_reset_all_vfs(struct i40e_pf * /* Display a warning if at least one VF didn't manage to reset in * time, but continue on with the operation. */ - if (v < pf->num_alloc_vfs) + if (vf < &pf->vf[pf->num_alloc_vfs]) dev_err(&pf->pdev->dev, "VF reset check timeout on VF %d\n", - pf->vf[v].vf_id); + vf->vf_id); usleep_range(10000, 20000);
/* Begin disabling all the rings associated with VFs, but do not wait * between each VF. */ - for (v = 0; v < pf->num_alloc_vfs; v++) { + for (vf = &pf->vf[0]; vf < &pf->vf[pf->num_alloc_vfs]; ++vf) { /* On initial reset, we don't have any queues to disable */ - if (pf->vf[v].lan_vsi_idx == 0) + if (vf->lan_vsi_idx == 0) continue;
/* If VF is reset in another thread just continue */ if (test_bit(I40E_VF_STATE_RESETTING, &vf->vf_states)) continue;
- i40e_vsi_stop_rings_no_wait(pf->vsi[pf->vf[v].lan_vsi_idx]); + i40e_vsi_stop_rings_no_wait(pf->vsi[vf->lan_vsi_idx]); }
/* Now that we've notified HW to disable all of the VF rings, wait * until they finish. */ - for (v = 0; v < pf->num_alloc_vfs; v++) { + for (vf = &pf->vf[0]; vf < &pf->vf[pf->num_alloc_vfs]; ++vf) { /* On initial reset, we don't have any queues to disable */ - if (pf->vf[v].lan_vsi_idx == 0) + if (vf->lan_vsi_idx == 0) continue;
/* If VF is reset in another thread just continue */ if (test_bit(I40E_VF_STATE_RESETTING, &vf->vf_states)) continue;
- i40e_vsi_wait_queues_disabled(pf->vsi[pf->vf[v].lan_vsi_idx]); + i40e_vsi_wait_queues_disabled(pf->vsi[vf->lan_vsi_idx]); }
/* Hw may need up to 50ms to finish disabling the RX queues. We @@ -1721,12 +1719,12 @@ bool i40e_reset_all_vfs(struct i40e_pf * mdelay(50);
/* Finish the reset on each VF */ - for (v = 0; v < pf->num_alloc_vfs; v++) { + for (vf = &pf->vf[0]; vf < &pf->vf[pf->num_alloc_vfs]; ++vf) { /* If VF is reset in another thread just continue */ if (test_bit(I40E_VF_STATE_RESETTING, &vf->vf_states)) continue;
- i40e_cleanup_reset_vf(&pf->vf[v]); + i40e_cleanup_reset_vf(vf); }
i40e_flush(hw);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Wetzel Alexander@wetzel-home.de
[ Upstream commit 27f58c04a8f438078583041468ec60597841284d ]
sg_remove_sfp_usercontext() must not use sg_device_destroy() after calling scsi_device_put().
sg_device_destroy() accesses the parent scsi_device's request_queue, which will already have been set to NULL when the preceding call to scsi_device_put() removed the last reference to the parent scsi_device.
The resulting NULL pointer exception will then crash the kernel.
Link: https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de Fixes: db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage") Cc: stable@vger.kernel.org Signed-off-by: Alexander Wetzel Alexander@wetzel-home.de Link: https://lore.kernel.org/r/20240320213032.18221-1-Alexander@wetzel-home.de Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/sg.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 0d8afffd1683b..8bd95ee1825a6 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -2208,6 +2208,7 @@ sg_remove_sfp_usercontext(struct work_struct *work) { struct sg_fd *sfp = container_of(work, struct sg_fd, ew.work); struct sg_device *sdp = sfp->parentdp; + struct scsi_device *device = sdp->device; Sg_request *srp; unsigned long iflags;
@@ -2233,8 +2234,9 @@ sg_remove_sfp_usercontext(struct work_struct *work) "sg_remove_sfp: sfp=0x%p\n", sfp)); kfree(sfp);
- scsi_device_put(sdp->device); + WARN_ON_ONCE(kref_read(&sdp->d_ref) != 1); kref_put(&sdp->d_ref, sg_device_destroy); + scsi_device_put(device); module_put(THIS_MODULE); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krishna Kurapati quic_kriskura@quicinc.com
[ Upstream commit f5e9bda03aa50ffad36eccafe893d004ef213c43 ]
When orientation switch is enabled in ucsi glink, an xhci probe failure is seen when booting up in host mode in reverse orientation.
During bootup the following things happen in multiple drivers:
a) The DWC3 controller driver initializes the core in device mode when the dr_mode is set to DRD. It relies on a role_switch call to change the role to host.
b) The QMP driver initializes the lanes to TYPEC_ORIENTATION_NORMAL as its normal routine. It relies on the typec_switch_set call to get notified of orientation changes.
c) The UCSI core reads UCSI_GET_CONNECTOR_STATUS via the glink and provides the initial role switch to the dwc3 controller.
When booting up in host mode with orientation TYPEC_ORIENTATION_REVERSE, we see the following things happening in order:
a) UCSI gives the initial role as host to the dwc3 controller in ucsi_register_port. Upon receiving this notification, the dwc3 core needs to program GCTL from PRTCAP_DEVICE to PRTCAP_HOST and, as part of this change, it asserts GCTL core soft reset and waits for it to complete before shifting to host. Only after the reset is done will dwc3_host_init be invoked and xhci be probed. The DWC3 controller expects the USB PHYs to be stable during this process, i.e., the PHY init is already done.
b) During the 100ms wait for GCTL core soft reset, the actual notification from the PPM is received by ucsi_glink via pmic glink for changing the role to host. The pmic_glink_ucsi_notify routine first sends the orientation change to QMP and then sends the role to dwc3 via the ucsi framework. This happens exactly at the time the GCTL core soft reset is being processed.
c) When the QMP driver receives the typec switch to TYPEC_ORIENTATION_REVERSE, it re-programs the PHY at the instant the GCTL core soft reset has been asserted by the dwc3 controller, due to which the QMP PLL lock fails in qmp_combo_usb_power_on.
d) After the 100ms of GCTL core soft reset is completed, the dwc3 core goes on to initialize host mode and invokes the xhci probe. But at this point QMP is non-responsive and, as a result, the xhci plat probe fails during xhci_reset.
Fix this by passing the orientation switch to the available ucsi ports, if their GPIO configuration is available, before ucsi_register is invoked, so that by the time pmic_glink_ucsi_notify provides the typec_switch to QMP, the lane is already configured and the call is a NOP, thus not racing with the role switch.
Cc: stable@vger.kernel.org Fixes: c6165ed2f425 ("usb: ucsi: glink: use the connector orientation GPIO to provide switch events") Suggested-by: Wesley Cheng quic_wcheng@quicinc.com Signed-off-by: Krishna Kurapati quic_kriskura@quicinc.com Acked-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/20240301040914.458492-1-quic_kriskura@quicinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/typec/ucsi/ucsi_glink.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c index 4853141cd10c8..894622b6556a6 100644 --- a/drivers/usb/typec/ucsi/ucsi_glink.c +++ b/drivers/usb/typec/ucsi/ucsi_glink.c @@ -254,6 +254,20 @@ static void pmic_glink_ucsi_notify(struct work_struct *work) static void pmic_glink_ucsi_register(struct work_struct *work) { struct pmic_glink_ucsi *ucsi = container_of(work, struct pmic_glink_ucsi, register_work); + int orientation; + int i; + + for (i = 0; i < PMIC_GLINK_MAX_PORTS; i++) { + if (!ucsi->port_orientation[i]) + continue; + orientation = gpiod_get_value(ucsi->port_orientation[i]); + + if (orientation >= 0) { + typec_switch_set(ucsi->port_switch[i], + orientation ? TYPEC_ORIENTATION_REVERSE + : TYPEC_ORIENTATION_NORMAL); + } + }
ucsi_register(ucsi->ucsi); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian A. Ehrhardt lk@c--e.de
[ Upstream commit 808a8b9e0b87bbc72bcc1f7ddfe5d04746e7ce56 ]
The completion notification for the final SET_NOTIFICATION_ENABLE command during initialization can include a connector change notification. However, at the time this completion notification is processed, the ucsi struct is not ready to handle this notification. As a result the notification is ignored and the controller never sends an interrupt again.
Re-check CCI for a pending connector state change after initialization is complete. Adjust the corresponding debug message accordingly.
Fixes: 71a1fa0df2a3 ("usb: typec: ucsi: Store the notification mask") Cc: stable@vger.kernel.org Signed-off-by: Christian A. Ehrhardt lk@c--e.de Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Tested-by: Neil Armstrong neil.armstrong@linaro.org # on SM8550-QRD Link: https://lore.kernel.org/r/20240320073927.1641788-3-lk@c--e.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/typec/ucsi/ucsi.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c index 70d9f4eebf1a7..15fbadaca55b1 100644 --- a/drivers/usb/typec/ucsi/ucsi.c +++ b/drivers/usb/typec/ucsi/ucsi.c @@ -959,7 +959,7 @@ void ucsi_connector_change(struct ucsi *ucsi, u8 num) struct ucsi_connector *con = &ucsi->connector[num - 1];
if (!(ucsi->ntfy & UCSI_ENABLE_NTFY_CONNECTOR_CHANGE)) { - dev_dbg(ucsi->dev, "Bogus connector change event\n"); + dev_dbg(ucsi->dev, "Early connector change event\n"); return; }
@@ -1390,6 +1390,7 @@ static int ucsi_init(struct ucsi *ucsi) { struct ucsi_connector *con, *connector; u64 command, ntfy; + u32 cci; int ret; int i;
@@ -1442,6 +1443,13 @@ static int ucsi_init(struct ucsi *ucsi)
ucsi->connector = connector; ucsi->ntfy = ntfy; + + ret = ucsi->ops->read(ucsi, UCSI_CCI, &cci, sizeof(cci)); + if (ret) + return ret; + if (UCSI_CCI_CONNECTOR(READ_ONCE(cci))) + ucsi_connector_change(ucsi, cci); + return 0;
err_unregister:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmytro Laktyushkin dmytro.laktyushkin@amd.com
[ Upstream commit e8d131285c98927554cd007f47cedc4694bfedde ]
[Why] Secondary DP2 display fails to light up in some instances
[How] The clock needs to be on when DPSTREAMCLK*_EN = 1. This change moves the dtbclk_p enable/disable point to make sure this is the case.
Reviewed-by: Charlene Liu charlene.liu@amd.com Reviewed-by: Dmytro Laktyushkin dmytro.laktyushkin@amd.com Acked-by: Tom Chung chiahsuan.chung@amd.com Signed-off-by: Daniel Miess daniel.miess@amd.com Signed-off-by: Dmytro Laktyushkin dmytro.laktyushkin@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Stable-dep-of: 72d72e8fddbc ("drm/amd/display: Prevent crash when disable stream") Signed-off-by: Sasha Levin sashal@kernel.org --- .../drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 2 +- drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 11 +++++------ 2 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c index 251dd800a2a66..2ac41c2a7238c 100644 --- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c +++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c @@ -1179,9 +1179,9 @@ void dce110_disable_stream(struct pipe_ctx *pipe_ctx) dto_params.timing = &pipe_ctx->stream->timing; dp_hpo_inst = pipe_ctx->stream_res.hpo_dp_stream_enc->inst; if (dccg) { - dccg->funcs->set_dtbclk_dto(dccg, &dto_params); dccg->funcs->disable_symclk32_se(dccg, dp_hpo_inst); dccg->funcs->set_dpstreamclk(dccg, REFCLK, tg->inst, dp_hpo_inst); + dccg->funcs->set_dtbclk_dto(dccg, &dto_params); } } else if (dccg && dccg->funcs->disable_symclk_se) { dccg->funcs->disable_symclk_se(dccg, stream_enc->stream_enc_inst, diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c index 1e3803739ae61..12af2859002f7 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c @@ -2728,18 +2728,17 @@ void dcn20_enable_stream(struct pipe_ctx *pipe_ctx) }
if (dc->link_srv->dp_is_128b_132b_signal(pipe_ctx)) { - dp_hpo_inst = pipe_ctx->stream_res.hpo_dp_stream_enc->inst; - dccg->funcs->set_dpstreamclk(dccg, DTBCLK0, tg->inst, dp_hpo_inst); - - phyd32clk = get_phyd32clk_src(link); - dccg->funcs->enable_symclk32_se(dccg, dp_hpo_inst, phyd32clk); - dto_params.otg_inst = tg->inst; dto_params.pixclk_khz = pipe_ctx->stream->timing.pix_clk_100hz / 10; dto_params.num_odm_segments = get_odm_segment_count(pipe_ctx); dto_params.timing = &pipe_ctx->stream->timing; dto_params.ref_dtbclk_khz = dc->clk_mgr->funcs->get_dtb_ref_clk_frequency(dc->clk_mgr); dccg->funcs->set_dtbclk_dto(dccg, &dto_params); + dp_hpo_inst = pipe_ctx->stream_res.hpo_dp_stream_enc->inst; + dccg->funcs->set_dpstreamclk(dccg, DTBCLK0, tg->inst, dp_hpo_inst); + + phyd32clk = get_phyd32clk_src(link); + dccg->funcs->enable_symclk32_se(dccg, dp_hpo_inst, phyd32clk); } else { } if (hws->funcs.calculate_dccg_k1_k2_values && dc->res_pool->dccg->funcs->set_pixel_rate_div) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chris Park chris.park@amd.com
[ Upstream commit 72d72e8fddbcd6c98e1b02d32cf6f2b04e10bd1c ]
[Why] Disabling stream encoder invokes a function that no longer exists.
[How] Check whether the function pointer is NULL in the disable stream encoder path before calling it.
Cc: Mario Limonciello mario.limonciello@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Reviewed-by: Charlene Liu charlene.liu@amd.com Acked-by: Wayne Lin wayne.lin@amd.com Signed-off-by: Chris Park chris.park@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c index 2ac41c2a7238c..7b5c1498941dd 100644 --- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c +++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c @@ -1181,7 +1181,8 @@ void dce110_disable_stream(struct pipe_ctx *pipe_ctx) if (dccg) { dccg->funcs->disable_symclk32_se(dccg, dp_hpo_inst); dccg->funcs->set_dpstreamclk(dccg, REFCLK, tg->inst, dp_hpo_inst); - dccg->funcs->set_dtbclk_dto(dccg, &dto_params); + if (dccg && dccg->funcs->set_dtbclk_dto) + dccg->funcs->set_dtbclk_dto(dccg, &dto_params); } } else if (dccg && dccg->funcs->disable_symclk_se) { dccg->funcs->disable_symclk_se(dccg, stream_enc->stream_enc_inst,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
[ Upstream commit 5095d5418193eb2748c7d8553c7150b8f1c44696 ]
Linux PM core has a prepare() callback run before suspend.
If the system is under high memory pressure, the resources may need to be evicted into swap instead. If the storage backing for swap is offlined during the suspend() step then such a call may fail.
So move this step into prepare() to evict the majority of resources there, and update all non-pmops callers to call the same callback.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362 Reviewed-by: Christian König christian.koenig@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Stable-dep-of: ca299b4512d4 ("drm/amd: Flush GFXOFF requests in prepare stage") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 ++++++++++++++++++---- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 10 ++++--- 3 files changed, 34 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 85efd686e538d..d59e8536192ca 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1369,6 +1369,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev, void amdgpu_driver_release_kms(struct drm_device *dev);
int amdgpu_device_ip_suspend(struct amdgpu_device *adev); +int amdgpu_device_prepare(struct drm_device *dev); int amdgpu_device_suspend(struct drm_device *dev, bool fbcon); int amdgpu_device_resume(struct drm_device *dev, bool fbcon); u32 amdgpu_get_vblank_counter_kms(struct drm_crtc *crtc); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 79261bec26542..707c17641c757 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -1549,6 +1549,7 @@ static void amdgpu_switcheroo_set_state(struct pci_dev *pdev, } else { pr_info("switched off\n"); dev->switch_power_state = DRM_SWITCH_POWER_CHANGING; + amdgpu_device_prepare(dev); amdgpu_device_suspend(dev, true); amdgpu_device_cache_pci_state(pdev); /* Shut down the device */ @@ -4094,6 +4095,31 @@ static int amdgpu_device_evict_resources(struct amdgpu_device *adev) /* * Suspend & resume. */ +/** + * amdgpu_device_prepare - prepare for device suspend + * + * @dev: drm dev pointer + * + * Prepare to put the hw in the suspend state (all asics). + * Returns 0 for success or an error on failure. + * Called at driver suspend. + */ +int amdgpu_device_prepare(struct drm_device *dev) +{ + struct amdgpu_device *adev = drm_to_adev(dev); + int r; + + if (dev->switch_power_state == DRM_SWITCH_POWER_OFF) + return 0; + + /* Evict the majority of BOs before starting suspend sequence */ + r = amdgpu_device_evict_resources(adev); + if (r) + return r; + + return 0; +} + /** * amdgpu_device_suspend - initiate device suspend * @@ -4114,11 +4140,6 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
adev->in_suspend = true;
- /* Evict the majority of BOs before grabbing the full access */ - r = amdgpu_device_evict_resources(adev); - if (r) - return r; - if (amdgpu_sriov_vf(adev)) { amdgpu_virt_fini_data_exchange(adev); r = amdgpu_virt_request_full_gpu(adev, false); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 3204c3a42f2a3..f9bc38d20ce3e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -2386,8 +2386,9 @@ static int amdgpu_pmops_prepare(struct device *dev) /* Return a positive number here so * DPM_FLAG_SMART_SUSPEND works properly */ - if (amdgpu_device_supports_boco(drm_dev)) - return pm_runtime_suspended(dev); + if (amdgpu_device_supports_boco(drm_dev) && + pm_runtime_suspended(dev)) + return 1;
/* if we will not support s3 or s2i for the device * then skip suspend @@ -2396,7 +2397,7 @@ static int amdgpu_pmops_prepare(struct device *dev) !amdgpu_acpi_is_s3_active(adev)) return 1;
- return 0; + return amdgpu_device_prepare(drm_dev); }
static void amdgpu_pmops_complete(struct device *dev) @@ -2598,6 +2599,9 @@ static int amdgpu_pmops_runtime_suspend(struct device *dev) if (amdgpu_device_supports_boco(drm_dev)) adev->mp1_state = PP_MP1_STATE_UNLOAD;
+ ret = amdgpu_device_prepare(drm_dev); + if (ret) + return ret; ret = amdgpu_device_suspend(drm_dev, false); if (ret) { adev->in_runpm = false;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
[ Upstream commit cb11ca3233aa3303dc11dca25977d2e7f24be00f ]
If any IP blocks allocate memory during their hw_fini() sequence this can cause the suspend to fail under memory pressure. Introduce a new phase that IP blocks can use to allocate memory before suspend starts so that it can potentially be evicted into swap instead.
Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Stable-dep-of: ca299b4512d4 ("drm/amd: Flush GFXOFF requests in prepare stage") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++++++++++- drivers/gpu/drm/amd/include/amd_shared.h | 1 + 2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 707c17641c757..4ebe42395708f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4107,7 +4107,7 @@ static int amdgpu_device_evict_resources(struct amdgpu_device *adev) int amdgpu_device_prepare(struct drm_device *dev) { struct amdgpu_device *adev = drm_to_adev(dev); - int r; + int i, r;
if (dev->switch_power_state == DRM_SWITCH_POWER_OFF) return 0; @@ -4117,6 +4117,16 @@ int amdgpu_device_prepare(struct drm_device *dev) if (r) return r;
+ for (i = 0; i < adev->num_ip_blocks; i++) { + if (!adev->ip_blocks[i].status.valid) + continue; + if (!adev->ip_blocks[i].version->funcs->prepare_suspend) + continue; + r = adev->ip_blocks[i].version->funcs->prepare_suspend((void *)adev); + if (r) + return r; + } + return 0; }
diff --git a/drivers/gpu/drm/amd/include/amd_shared.h b/drivers/gpu/drm/amd/include/amd_shared.h index abe829bbd54af..a9880fc531955 100644 --- a/drivers/gpu/drm/amd/include/amd_shared.h +++ b/drivers/gpu/drm/amd/include/amd_shared.h @@ -295,6 +295,7 @@ struct amd_ip_funcs { int (*hw_init)(void *handle); int (*hw_fini)(void *handle); void (*late_fini)(void *handle); + int (*prepare_suspend)(void *handle); int (*suspend)(void *handle); int (*resume)(void *handle); bool (*is_idle)(void *handle);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
[ Upstream commit ca299b4512d4b4f516732a48ce9aa19d91f4473e ]
If the system hasn't entered GFXOFF when suspend starts it can cause hangs accessing GC and RLC during the suspend stage.
Cc: stable@vger.kernel.org # 6.1.y: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback") Cc: stable@vger.kernel.org # 6.1.y: cb11ca3233aa ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks") Cc: stable@vger.kernel.org # 6.1.y: 2ceec37b0e3d ("drm/amd: Add missing kernel doc for prepare_suspend()") Cc: stable@vger.kernel.org # 6.1.y: 3a9626c816db ("drm/amd: Stop evicting resources on APUs in suspend") Cc: stable@vger.kernel.org # 6.6.y: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback") Cc: stable@vger.kernel.org # 6.6.y: cb11ca3233aa ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks") Cc: stable@vger.kernel.org # 6.6.y: 2ceec37b0e3d ("drm/amd: Add missing kernel doc for prepare_suspend()") Cc: stable@vger.kernel.org # 6.6.y: 3a9626c816db ("drm/amd: Stop evicting resources on APUs in suspend") Cc: stable@vger.kernel.org # 6.1+ Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132 Fixes: ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") Reviewed-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 4ebe42395708f..062d78818da16 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4117,6 +4117,8 @@ int amdgpu_device_prepare(struct drm_device *dev) if (r) return r;
+ flush_delayed_work(&adev->gfx.gfx_off_delay_work); + for (i = 0; i < adev->num_ip_blocks; i++) { if (!adev->ip_blocks[i].status.valid) continue;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Vecera ivecera@redhat.com
[ Upstream commit addca9175e5f74cf29e8ad918c38c09b8663b5b8 ]
Enum type names should not be suffixed by '_t'. Either use 'typedef enum name name_t' so that plain 'name_t var' can be used, or use 'enum name var' instead of 'enum name_t var'.
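As a minimal, hypothetical illustration of the naming rule (the identifiers below are made up, not taken from the driver):

enum foo_state {			/* no '_t' suffix on the enum tag itself */
	FOO_STATE_DOWN,
	FOO_STATE_UP,
};
typedef enum foo_state foo_state_t;	/* a '_t' suffix is fine on a typedef name */

static enum foo_state a;		/* ok: plain enum tag */
static foo_state_t b;			/* ok: typedef'd name */
/* not ok: 'enum foo_state_t c;' -- the tag itself should not carry '_t' */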
Signed-off-by: Ivan Vecera ivecera@redhat.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Link: https://lore.kernel.org/r/20231113231047.548659-6-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: ea558de7238b ("i40e: Enforce software interrupt during busy-poll exit") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e.h | 4 ++-- drivers/net/ethernet/intel/i40e/i40e_ptp.c | 6 +++--- drivers/net/ethernet/intel/i40e/i40e_txrx.h | 4 ++-- 3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h index 55bb0b5310d5b..bc353da3ed41d 100644 --- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -108,7 +108,7 @@ #define I40E_MAX_BW_INACTIVE_ACCUM 4 /* accumulate 4 credits max */
/* driver state flags */ -enum i40e_state_t { +enum i40e_state { __I40E_TESTING, __I40E_CONFIG_BUSY, __I40E_CONFIG_DONE, @@ -156,7 +156,7 @@ enum i40e_state_t { BIT_ULL(__I40E_PF_RESET_AND_REBUILD_REQUESTED)
/* VSI state flags */ -enum i40e_vsi_state_t { +enum i40e_vsi_state { __I40E_VSI_DOWN, __I40E_VSI_NEEDS_RESTART, __I40E_VSI_SYNCING_FILTERS, diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c index 8a26811140b47..cac9584debb1d 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c +++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c @@ -34,7 +34,7 @@ enum i40e_ptp_pin { GPIO_4 };
-enum i40e_can_set_pins_t { +enum i40e_can_set_pins { CANT_DO_PINS = -1, CAN_SET_PINS, CAN_DO_PINS @@ -192,7 +192,7 @@ static bool i40e_is_ptp_pin_dev(struct i40e_hw *hw) * return CAN_DO_PINS if pins can be manipulated within a NIC or * return CANT_DO_PINS otherwise. **/ -static enum i40e_can_set_pins_t i40e_can_set_pins(struct i40e_pf *pf) +static enum i40e_can_set_pins i40e_can_set_pins(struct i40e_pf *pf) { if (!i40e_is_ptp_pin_dev(&pf->hw)) { dev_warn(&pf->pdev->dev, @@ -1070,7 +1070,7 @@ static void i40e_ptp_set_pins_hw(struct i40e_pf *pf) static int i40e_ptp_set_pins(struct i40e_pf *pf, struct i40e_ptp_pins_settings *pins) { - enum i40e_can_set_pins_t pin_caps = i40e_can_set_pins(pf); + enum i40e_can_set_pins pin_caps = i40e_can_set_pins(pf); int i = 0;
if (pin_caps == CANT_DO_PINS) diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h index 900b0d9ede9f5..84e4dacde6f58 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h @@ -57,7 +57,7 @@ static inline u16 i40e_intrl_usec_to_reg(int intrl) * mentioning ITR_INDX, ITR_NONE cannot be used as an index 'n' into any * register but instead is a special value meaning "don't update" ITR0/1/2. */ -enum i40e_dyn_idx_t { +enum i40e_dyn_idx { I40E_IDX_ITR0 = 0, I40E_IDX_ITR1 = 1, I40E_IDX_ITR2 = 2, @@ -305,7 +305,7 @@ struct i40e_rx_queue_stats { u64 page_busy_count; };
-enum i40e_ring_state_t { +enum i40e_ring_state { __I40E_TX_FDIR_INIT_DONE, __I40E_TX_XPS_INIT_DONE, __I40E_RING_STATE_NBITS /* must be last */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Vecera ivecera@redhat.com
[ Upstream commit ea558de7238bb12c3435c47f0631e9d17bf4a09f ]
As with the ice bug fixed by commit b7306b42beaf ("ice: manage interrupts during poll exit"), followed by commit 23be7075b318 ("ice: fix software generating extra interrupts"), I'm seeing a similar issue with the i40e driver.
In certain situations, when busy polling is enabled together with adaptive coalescing, the driver occasionally misses that there are outstanding descriptors to clean when exiting busy poll.
Try to catch the remaining work by triggering a software interrupt when exiting busy poll. No extra interrupts will be generated when busy polling is not used.
The issue was found when running a sockperf ping-pong tcp test with adaptive coalescing and busy poll enabled (50 as the value of the busy_poll and busy_read sysctl knobs) and results in huge latency spikes of more than 100000us.
The fix is inspired by the ice driver and does the following:
1) During napi poll exit in case of busy poll (napi_complete_done() returns false), records in the q_vector that we were in a busy loop.
2) Extends i40e_buildreg_itr() to be able to add an enforced software interrupt into the built value.
3) In i40e_update_enable_itr(), enforces a software interrupt trigger if we are exiting busy poll, to catch any pending clean-ups.
4) Reuses the unused 3rd ITR (interrupt throttle) index and sets it to 20K interrupts per second to limit the number of these sw interrupts.
Test results ============ Prior: [root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120 sockperf: == version #3.10-no.git == sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.9.9.1 PORT = 11111 # TCP sockperf: Warmup stage (sending a few dummy messages)... sockperf: Starting test... sockperf: Test end (interrupted by timer) sockperf: Test ended sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2438563; ReceivedMessages=2438562 sockperf: ========= Printing statistics for Server No: 0 sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2429473; ReceivedMessages=2429473 sockperf: ====> avg-latency=24.571 (std-dev=93.297, mean-ad=4.904, median-ad=1.510, siqr=1.063, cv=3.797, std-error=0.060, 99.0% ci=[24.417, 24.725]) sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0 sockperf: Summary: Latency is 24.571 usec sockperf: Total 2429473 observations; each percentile contains 24294.73 observations sockperf: ---> <MAX> observation = 103294.331 sockperf: ---> percentile 99.999 = 45.633 sockperf: ---> percentile 99.990 = 37.013 sockperf: ---> percentile 99.900 = 35.910 sockperf: ---> percentile 99.000 = 33.390 sockperf: ---> percentile 90.000 = 28.626 sockperf: ---> percentile 75.000 = 27.741 sockperf: ---> percentile 50.000 = 26.743 sockperf: ---> percentile 25.000 = 25.614 sockperf: ---> <MIN> observation = 12.220
After: [root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120 sockperf: == version #3.10-no.git == sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.9.9.1 PORT = 11111 # TCP sockperf: Warmup stage (sending a few dummy messages)... sockperf: Starting test... sockperf: Test end (interrupted by timer) sockperf: Test ended sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2400055; ReceivedMessages=2400054 sockperf: ========= Printing statistics for Server No: 0 sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2391186; ReceivedMessages=2391186 sockperf: ====> avg-latency=24.965 (std-dev=5.934, mean-ad=4.642, median-ad=1.485, siqr=1.067, cv=0.238, std-error=0.004, 99.0% ci=[24.955, 24.975]) sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0 sockperf: Summary: Latency is 24.965 usec sockperf: Total 2391186 observations; each percentile contains 23911.86 observations sockperf: ---> <MAX> observation = 195.841 sockperf: ---> percentile 99.999 = 45.026 sockperf: ---> percentile 99.990 = 39.009 sockperf: ---> percentile 99.900 = 35.922 sockperf: ---> percentile 99.000 = 33.482 sockperf: ---> percentile 90.000 = 28.902 sockperf: ---> percentile 75.000 = 27.821 sockperf: ---> percentile 50.000 = 26.860 sockperf: ---> percentile 25.000 = 25.685 sockperf: ---> <MIN> observation = 12.277
Fixes: 0bcd952feec7 ("ethernet/intel: consolidate NAPI and NAPI exit") Reported-by: Hugo Ferreira hferreir@redhat.com Reviewed-by: Michal Schmidt mschmidt@redhat.com Signed-off-by: Ivan Vecera ivecera@redhat.com Reviewed-by: Jesse Brandeburg jesse.brandeburg@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e.h | 1 + drivers/net/ethernet/intel/i40e/i40e_main.c | 6 ++ .../net/ethernet/intel/i40e/i40e_register.h | 3 + drivers/net/ethernet/intel/i40e/i40e_txrx.c | 82 ++++++++++++++----- drivers/net/ethernet/intel/i40e/i40e_txrx.h | 1 + 5 files changed, 72 insertions(+), 21 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h index bc353da3ed41d..3cc0b87def3fa 100644 --- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -992,6 +992,7 @@ struct i40e_q_vector { struct rcu_head rcu; /* to avoid race with update stats on free */ char name[I40E_INT_NAME_STR_LEN]; bool arm_wb_state; + bool in_busy_poll; int irq_num; /* IRQ assigned to this q_vector */ } ____cacheline_internodealigned_in_smp;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index fd4e86b6b4c1f..8bfecf81d26f6 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -3908,6 +3908,12 @@ static void i40e_vsi_configure_msix(struct i40e_vsi *vsi) q_vector->tx.target_itr >> 1); q_vector->tx.current_itr = q_vector->tx.target_itr;
+ /* Set ITR for software interrupts triggered after exiting + * busy-loop polling. + */ + wr32(hw, I40E_PFINT_ITRN(I40E_SW_ITR, vector - 1), + I40E_ITR_20K); + wr32(hw, I40E_PFINT_RATEN(vector - 1), i40e_intrl_usec_to_reg(vsi->int_rate_limit));
diff --git a/drivers/net/ethernet/intel/i40e/i40e_register.h b/drivers/net/ethernet/intel/i40e/i40e_register.h index 7339003aa17cd..694cb3e45c1ec 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_register.h +++ b/drivers/net/ethernet/intel/i40e/i40e_register.h @@ -328,8 +328,11 @@ #define I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT 3 #define I40E_PFINT_DYN_CTLN_ITR_INDX_MASK I40E_MASK(0x3, I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT) #define I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT 5 +#define I40E_PFINT_DYN_CTLN_INTERVAL_MASK I40E_MASK(0xFFF, I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT) #define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_SHIFT 24 #define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_MASK I40E_MASK(0x1, I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_SHIFT) +#define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_SHIFT 25 +#define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_MASK I40E_MASK(0x3, I40E_PFINT_DYN_CTLN_SW_ITR_INDX_SHIFT) #define I40E_PFINT_ICR0 0x00038780 /* Reset: CORER */ #define I40E_PFINT_ICR0_INTEVENT_SHIFT 0 #define I40E_PFINT_ICR0_INTEVENT_MASK I40E_MASK(0x1, I40E_PFINT_ICR0_INTEVENT_SHIFT) diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index 1df2f93388128..f703646622d9a 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -2644,7 +2644,22 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget, return failure ? budget : (int)total_rx_packets; }
-static inline u32 i40e_buildreg_itr(const int type, u16 itr) +/** + * i40e_buildreg_itr - build a value for writing to I40E_PFINT_DYN_CTLN register + * @itr_idx: interrupt throttling index + * @interval: interrupt throttling interval value in usecs + * @force_swint: force software interrupt + * + * The function builds a value for I40E_PFINT_DYN_CTLN register that + * is used to update interrupt throttling interval for specified ITR index + * and optionally enforces a software interrupt. If the @itr_idx is equal + * to I40E_ITR_NONE then no interval change is applied and only @force_swint + * parameter is taken into account. If the interval change and enforced + * software interrupt are not requested then the built value just enables + * appropriate vector interrupt. + **/ +static u32 i40e_buildreg_itr(enum i40e_dyn_idx itr_idx, u16 interval, + bool force_swint) { u32 val;
@@ -2658,23 +2673,33 @@ static inline u32 i40e_buildreg_itr(const int type, u16 itr) * an event in the PBA anyway so we need to rely on the automask * to hold pending events for us until the interrupt is re-enabled * - * The itr value is reported in microseconds, and the register - * value is recorded in 2 microsecond units. For this reason we - * only need to shift by the interval shift - 1 instead of the - * full value. + * We have to shift the given value as it is reported in microseconds + * and the register value is recorded in 2 microsecond units. */ - itr &= I40E_ITR_MASK; + interval >>= 1;
+ /* 1. Enable vector interrupt + * 2. Update the interval for the specified ITR index + * (I40E_ITR_NONE in the register is used to indicate that + * no interval update is requested) + */ val = I40E_PFINT_DYN_CTLN_INTENA_MASK | - (type << I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT) | - (itr << (I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT - 1)); + FIELD_PREP(I40E_PFINT_DYN_CTLN_ITR_INDX_MASK, itr_idx) | + FIELD_PREP(I40E_PFINT_DYN_CTLN_INTERVAL_MASK, interval); + + /* 3. Enforce software interrupt trigger if requested + * (These software interrupts rate is limited by ITR2 that is + * set to 20K interrupts per second) + */ + if (force_swint) + val |= I40E_PFINT_DYN_CTLN_SWINT_TRIG_MASK | + I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_MASK | + FIELD_PREP(I40E_PFINT_DYN_CTLN_SW_ITR_INDX_MASK, + I40E_SW_ITR);
return val; }
-/* a small macro to shorten up some long lines */ -#define INTREG I40E_PFINT_DYN_CTLN - /* The act of updating the ITR will cause it to immediately trigger. In order * to prevent this from throwing off adaptive update statistics we defer the * update so that it can only happen so often. So after either Tx or Rx are @@ -2693,8 +2718,10 @@ static inline u32 i40e_buildreg_itr(const int type, u16 itr) static inline void i40e_update_enable_itr(struct i40e_vsi *vsi, struct i40e_q_vector *q_vector) { + enum i40e_dyn_idx itr_idx = I40E_ITR_NONE; struct i40e_hw *hw = &vsi->back->hw; - u32 intval; + u16 interval = 0; + u32 itr_val;
/* If we don't have MSIX, then we only need to re-enable icr0 */ if (!(vsi->back->flags & I40E_FLAG_MSIX_ENABLED)) { @@ -2716,8 +2743,8 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi, */ if (q_vector->rx.target_itr < q_vector->rx.current_itr) { /* Rx ITR needs to be reduced, this is highest priority */ - intval = i40e_buildreg_itr(I40E_RX_ITR, - q_vector->rx.target_itr); + itr_idx = I40E_RX_ITR; + interval = q_vector->rx.target_itr; q_vector->rx.current_itr = q_vector->rx.target_itr; q_vector->itr_countdown = ITR_COUNTDOWN_START; } else if ((q_vector->tx.target_itr < q_vector->tx.current_itr) || @@ -2726,25 +2753,36 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi, /* Tx ITR needs to be reduced, this is second priority * Tx ITR needs to be increased more than Rx, fourth priority */ - intval = i40e_buildreg_itr(I40E_TX_ITR, - q_vector->tx.target_itr); + itr_idx = I40E_TX_ITR; + interval = q_vector->tx.target_itr; q_vector->tx.current_itr = q_vector->tx.target_itr; q_vector->itr_countdown = ITR_COUNTDOWN_START; } else if (q_vector->rx.current_itr != q_vector->rx.target_itr) { /* Rx ITR needs to be increased, third priority */ - intval = i40e_buildreg_itr(I40E_RX_ITR, - q_vector->rx.target_itr); + itr_idx = I40E_RX_ITR; + interval = q_vector->rx.target_itr; q_vector->rx.current_itr = q_vector->rx.target_itr; q_vector->itr_countdown = ITR_COUNTDOWN_START; } else { /* No ITR update, lowest priority */ - intval = i40e_buildreg_itr(I40E_ITR_NONE, 0); if (q_vector->itr_countdown) q_vector->itr_countdown--; }
- if (!test_bit(__I40E_VSI_DOWN, vsi->state)) - wr32(hw, INTREG(q_vector->reg_idx), intval); + /* Do not update interrupt control register if VSI is down */ + if (test_bit(__I40E_VSI_DOWN, vsi->state)) + return; + + /* Update ITR interval if necessary and enforce software interrupt + * if we are exiting busy poll. + */ + if (q_vector->in_busy_poll) { + itr_val = i40e_buildreg_itr(itr_idx, interval, true); + q_vector->in_busy_poll = false; + } else { + itr_val = i40e_buildreg_itr(itr_idx, interval, false); + } + wr32(hw, I40E_PFINT_DYN_CTLN(q_vector->reg_idx), itr_val); }
/** @@ -2859,6 +2897,8 @@ int i40e_napi_poll(struct napi_struct *napi, int budget) */ if (likely(napi_complete_done(napi, work_done))) i40e_update_enable_itr(vsi, q_vector); + else + q_vector->in_busy_poll = true;
return min(work_done, budget - 1); } diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h index 84e4dacde6f58..81f6a991bfb73 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h @@ -67,6 +67,7 @@ enum i40e_dyn_idx { /* these are indexes into ITRN registers */ #define I40E_RX_ITR I40E_IDX_ITR0 #define I40E_TX_ITR I40E_IDX_ITR1 +#define I40E_SW_ITR I40E_IDX_ITR2
/* Supported RSS offloads */ #define I40E_DEFAULT_RSS_HENA ( \
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Vecera ivecera@redhat.com
[ Upstream commit 39ec612acf6d075809c38a7262d7ad09314762f3 ]
The .back field placed in i40e_hw is used to get a pointer to the i40e_pf instance, but it is not necessary as i40e_hw is a part of i40e_pf and the container_of() macro can be used to obtain the pointer to i40e_pf. Remove the .back field from the i40e_hw structure, introduce the i40e_hw_to_pf() and i40e_hw_to_dev() helpers, and use them.
Signed-off-by: Ivan Vecera ivecera@redhat.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Jesse Brandeburg jesse.brandeburg@intel.com Reviewed-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Stable-dep-of: 6dbdd4de0362 ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e.h | 11 ++++++++++ drivers/net/ethernet/intel/i40e/i40e_main.c | 22 ++++++++++++++------ drivers/net/ethernet/intel/i40e/i40e_osdep.h | 8 +++---- drivers/net/ethernet/intel/i40e/i40e_type.h | 1 - 4 files changed, 31 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h index 3cc0b87def3fa..6f08c8fe653bd 100644 --- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -1322,4 +1322,15 @@ static inline u32 i40e_is_tc_mqprio_enabled(struct i40e_pf *pf) return pf->flags & I40E_FLAG_TC_MQPRIO; }
+/** + * i40e_hw_to_pf - get pf pointer from the hardware structure + * @hw: pointer to the device HW structure + **/ +static inline struct i40e_pf *i40e_hw_to_pf(struct i40e_hw *hw) +{ + return container_of(hw, struct i40e_pf, hw); +} + +struct device *i40e_hw_to_dev(struct i40e_hw *hw); + #endif /* _I40E_H_ */ diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 8bfecf81d26f6..17ab6a1c53971 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -125,6 +125,17 @@ static void netdev_hw_addr_refcnt(struct i40e_mac_filter *f, } }
+/** + * i40e_hw_to_dev - get device pointer from the hardware structure + * @hw: pointer to the device HW structure + **/ +struct device *i40e_hw_to_dev(struct i40e_hw *hw) +{ + struct i40e_pf *pf = i40e_hw_to_pf(hw); + + return &pf->pdev->dev; +} + /** * i40e_allocate_dma_mem_d - OS specific memory alloc for shared code * @hw: pointer to the HW structure @@ -135,7 +146,7 @@ static void netdev_hw_addr_refcnt(struct i40e_mac_filter *f, int i40e_allocate_dma_mem_d(struct i40e_hw *hw, struct i40e_dma_mem *mem, u64 size, u32 alignment) { - struct i40e_pf *pf = (struct i40e_pf *)hw->back; + struct i40e_pf *pf = i40e_hw_to_pf(hw);
mem->size = ALIGN(size, alignment); mem->va = dma_alloc_coherent(&pf->pdev->dev, mem->size, &mem->pa, @@ -153,7 +164,7 @@ int i40e_allocate_dma_mem_d(struct i40e_hw *hw, struct i40e_dma_mem *mem, **/ int i40e_free_dma_mem_d(struct i40e_hw *hw, struct i40e_dma_mem *mem) { - struct i40e_pf *pf = (struct i40e_pf *)hw->back; + struct i40e_pf *pf = i40e_hw_to_pf(hw);
dma_free_coherent(&pf->pdev->dev, mem->size, mem->va, mem->pa); mem->va = NULL; @@ -15653,10 +15664,10 @@ static int i40e_init_recovery_mode(struct i40e_pf *pf, struct i40e_hw *hw) **/ static inline void i40e_set_subsystem_device_id(struct i40e_hw *hw) { - struct pci_dev *pdev = ((struct i40e_pf *)hw->back)->pdev; + struct i40e_pf *pf = i40e_hw_to_pf(hw);
- hw->subsystem_device_id = pdev->subsystem_device ? - pdev->subsystem_device : + hw->subsystem_device_id = pf->pdev->subsystem_device ? + pf->pdev->subsystem_device : (ushort)(rd32(hw, I40E_PFPCI_SUBSYSID) & USHRT_MAX); }
@@ -15726,7 +15737,6 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent) set_bit(__I40E_DOWN, pf->state);
hw = &pf->hw; - hw->back = pf;
pf->ioremap_len = min_t(int, pci_resource_len(pdev, 0), I40E_MAX_CSR_SPACE); diff --git a/drivers/net/ethernet/intel/i40e/i40e_osdep.h b/drivers/net/ethernet/intel/i40e/i40e_osdep.h index 2bd4de03dafa2..997569a4ad57b 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_osdep.h +++ b/drivers/net/ethernet/intel/i40e/i40e_osdep.h @@ -18,10 +18,10 @@ * actual OS primitives */
-#define hw_dbg(hw, S, A...) \ -do { \ - dev_dbg(&((struct i40e_pf *)hw->back)->pdev->dev, S, ##A); \ -} while (0) +struct i40e_hw; +struct device *i40e_hw_to_dev(struct i40e_hw *hw); + +#define hw_dbg(hw, S, A...) dev_dbg(i40e_hw_to_dev(hw), S, ##A)
#define wr32(a, reg, value) writel((value), ((a)->hw_addr + (reg))) #define rd32(a, reg) readl((a)->hw_addr + (reg)) diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h index 232131bedc3e7..658bc89132783 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_type.h +++ b/drivers/net/ethernet/intel/i40e/i40e_type.h @@ -525,7 +525,6 @@ struct i40e_dcbx_config { /* Port hardware description */ struct i40e_hw { u8 __iomem *hw_addr; - void *back;
/* subsystem structs */ struct i40e_phy_info phy;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Vecera ivecera@redhat.com
[ Upstream commit 8196b5fd6c7312d31775f77c7fff0253eb0ecdaa ]
The I40E_MDIO_CLAUSE22* and I40E_MDIO_CLAUSE45* macros use I40E_MASK together with the same I40E_GLGEN_MSCA_STCODE_SHIFT and I40E_GLGEN_MSCA_OPCODE_SHIFT shift values to define masks. Introduce I40E_GLGEN_MSCA_OPCODE_MASK and I40E_GLGEN_MSCA_STCODE_MASK for both shifts in i40e_register.h and use them to refactor the macros mentioned above.
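As a quick reference, the macro composition from the hunks below works out as follows (I40E_MASK is ((u32)(mask) << (shift)) as defined in i40e_type.h); the expansion shown for the read opcode is plain arithmetic, not new driver code.

#define I40E_MASK(mask, shift)		((u32)(mask) << (shift))

#define I40E_GLGEN_MSCA_OPCODE_SHIFT	26
#define I40E_GLGEN_MSCA_OPCODE_MASK(_i)	I40E_MASK(_i, I40E_GLGEN_MSCA_OPCODE_SHIFT)

/* e.g. the clause 22 read opcode:
 *   I40E_MDIO_CLAUSE22_OPCODE_READ_MASK
 *     = I40E_GLGEN_MSCA_OPCODE_MASK(2)
 *     = (u32)2 << 26
 */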
Signed-off-by: Ivan Vecera ivecera@redhat.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Jesse Brandeburg jesse.brandeburg@intel.com Reviewed-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Stable-dep-of: 6dbdd4de0362 ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/intel/i40e/i40e_register.h | 2 ++ drivers/net/ethernet/intel/i40e/i40e_type.h | 23 +++++++------------ 2 files changed, 10 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_register.h b/drivers/net/ethernet/intel/i40e/i40e_register.h index 694cb3e45c1ec..989c186824733 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_register.h +++ b/drivers/net/ethernet/intel/i40e/i40e_register.h @@ -202,7 +202,9 @@ #define I40E_GLGEN_MSCA_DEVADD_SHIFT 16 #define I40E_GLGEN_MSCA_PHYADD_SHIFT 21 #define I40E_GLGEN_MSCA_OPCODE_SHIFT 26 +#define I40E_GLGEN_MSCA_OPCODE_MASK(_i) I40E_MASK(_i, I40E_GLGEN_MSCA_OPCODE_SHIFT) #define I40E_GLGEN_MSCA_STCODE_SHIFT 28 +#define I40E_GLGEN_MSCA_STCODE_MASK I40E_MASK(0x1, I40E_GLGEN_MSCA_STCODE_SHIFT) #define I40E_GLGEN_MSCA_MDICMD_SHIFT 30 #define I40E_GLGEN_MSCA_MDICMD_MASK I40E_MASK(0x1, I40E_GLGEN_MSCA_MDICMD_SHIFT) #define I40E_GLGEN_MSCA_MDIINPROGEN_SHIFT 31 diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h index 658bc89132783..d4c6afe84fdd2 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_type.h +++ b/drivers/net/ethernet/intel/i40e/i40e_type.h @@ -70,21 +70,14 @@ enum i40e_debug_mask { I40E_DEBUG_ALL = 0xFFFFFFFF };
-#define I40E_MDIO_CLAUSE22_STCODE_MASK I40E_MASK(1, \ - I40E_GLGEN_MSCA_STCODE_SHIFT) -#define I40E_MDIO_CLAUSE22_OPCODE_WRITE_MASK I40E_MASK(1, \ - I40E_GLGEN_MSCA_OPCODE_SHIFT) -#define I40E_MDIO_CLAUSE22_OPCODE_READ_MASK I40E_MASK(2, \ - I40E_GLGEN_MSCA_OPCODE_SHIFT) - -#define I40E_MDIO_CLAUSE45_STCODE_MASK I40E_MASK(0, \ - I40E_GLGEN_MSCA_STCODE_SHIFT) -#define I40E_MDIO_CLAUSE45_OPCODE_ADDRESS_MASK I40E_MASK(0, \ - I40E_GLGEN_MSCA_OPCODE_SHIFT) -#define I40E_MDIO_CLAUSE45_OPCODE_WRITE_MASK I40E_MASK(1, \ - I40E_GLGEN_MSCA_OPCODE_SHIFT) -#define I40E_MDIO_CLAUSE45_OPCODE_READ_MASK I40E_MASK(3, \ - I40E_GLGEN_MSCA_OPCODE_SHIFT) +#define I40E_MDIO_CLAUSE22_STCODE_MASK I40E_GLGEN_MSCA_STCODE_MASK +#define I40E_MDIO_CLAUSE22_OPCODE_WRITE_MASK I40E_GLGEN_MSCA_OPCODE_MASK(1) +#define I40E_MDIO_CLAUSE22_OPCODE_READ_MASK I40E_GLGEN_MSCA_OPCODE_MASK(2) + +#define I40E_MDIO_CLAUSE45_STCODE_MASK I40E_GLGEN_MSCA_STCODE_MASK +#define I40E_MDIO_CLAUSE45_OPCODE_ADDRESS_MASK I40E_GLGEN_MSCA_OPCODE_MASK(0) +#define I40E_MDIO_CLAUSE45_OPCODE_WRITE_MASK I40E_GLGEN_MSCA_OPCODE_MASK(1) +#define I40E_MDIO_CLAUSE45_OPCODE_READ_MASK I40E_GLGEN_MSCA_OPCODE_MASK(3)
#define I40E_PHY_COM_REG_PAGE 0x1E #define I40E_PHY_LED_LINK_MODE_MASK 0xF0
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Vecera ivecera@redhat.com
[ Upstream commit 7151d87a175c6618fe81705755eb3dc4199cad4e ]
The <linux/avf/virtchnl.h> header uses the BIT, struct_size and ETH_ALEN macros but does not include the header files that define them. Add these dependencies so this header file can be included anywhere.
Signed-off-by: Ivan Vecera ivecera@redhat.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Jesse Brandeburg jesse.brandeburg@intel.com Reviewed-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Stable-dep-of: 6dbdd4de0362 ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/avf/virtchnl.h | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h index d0807ad43f933..6424aa06fb08d 100644 --- a/include/linux/avf/virtchnl.h +++ b/include/linux/avf/virtchnl.h @@ -4,6 +4,10 @@ #ifndef _VIRTCHNL_H_ #define _VIRTCHNL_H_
+#include <linux/bitops.h> +#include <linux/overflow.h> +#include <uapi/linux/if_ether.h> + /* Description: * This header file describes the Virtual Function (VF) - Physical Function * (PF) communication protocol used by the drivers for all devices starting
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Vecera ivecera@redhat.com
[ Upstream commit d3276f928a1d2dfebc41a82e967cd0dffeb540f8 ]
The i40e_memory_type enum is unused in i40e_allocate_dma_mem() and thus can be safely removed. The useless macros in i40e_alloc.h can be removed as well.
Signed-off-by: Ivan Vecera ivecera@redhat.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Jesse Brandeburg jesse.brandeburg@intel.com Reviewed-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Stable-dep-of: 6dbdd4de0362 ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_adminq.c | 4 ---- drivers/net/ethernet/intel/i40e/i40e_alloc.h | 14 ------------- drivers/net/ethernet/intel/i40e/i40e_hmc.c | 12 ++++------- drivers/net/ethernet/intel/i40e/i40e_main.c | 20 +++++++++---------- drivers/net/ethernet/intel/i40e/i40e_osdep.h | 7 ------- 5 files changed, 14 insertions(+), 43 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.c b/drivers/net/ethernet/intel/i40e/i40e_adminq.c index 100eb77b8dfe6..e72cfe587c89e 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_adminq.c +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.c @@ -51,7 +51,6 @@ static int i40e_alloc_adminq_asq_ring(struct i40e_hw *hw) int ret_code;
ret_code = i40e_allocate_dma_mem(hw, &hw->aq.asq.desc_buf, - i40e_mem_atq_ring, (hw->aq.num_asq_entries * sizeof(struct i40e_aq_desc)), I40E_ADMINQ_DESC_ALIGNMENT); @@ -78,7 +77,6 @@ static int i40e_alloc_adminq_arq_ring(struct i40e_hw *hw) int ret_code;
ret_code = i40e_allocate_dma_mem(hw, &hw->aq.arq.desc_buf, - i40e_mem_arq_ring, (hw->aq.num_arq_entries * sizeof(struct i40e_aq_desc)), I40E_ADMINQ_DESC_ALIGNMENT); @@ -136,7 +134,6 @@ static int i40e_alloc_arq_bufs(struct i40e_hw *hw) for (i = 0; i < hw->aq.num_arq_entries; i++) { bi = &hw->aq.arq.r.arq_bi[i]; ret_code = i40e_allocate_dma_mem(hw, bi, - i40e_mem_arq_buf, hw->aq.arq_buf_size, I40E_ADMINQ_DESC_ALIGNMENT); if (ret_code) @@ -198,7 +195,6 @@ static int i40e_alloc_asq_bufs(struct i40e_hw *hw) for (i = 0; i < hw->aq.num_asq_entries; i++) { bi = &hw->aq.asq.r.asq_bi[i]; ret_code = i40e_allocate_dma_mem(hw, bi, - i40e_mem_asq_buf, hw->aq.asq_buf_size, I40E_ADMINQ_DESC_ALIGNMENT); if (ret_code) diff --git a/drivers/net/ethernet/intel/i40e/i40e_alloc.h b/drivers/net/ethernet/intel/i40e/i40e_alloc.h index a6c9a9e343d11..4b2d8da048c64 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_alloc.h +++ b/drivers/net/ethernet/intel/i40e/i40e_alloc.h @@ -6,23 +6,9 @@
struct i40e_hw;
-/* Memory allocation types */ -enum i40e_memory_type { - i40e_mem_arq_buf = 0, /* ARQ indirect command buffer */ - i40e_mem_asq_buf = 1, - i40e_mem_atq_buf = 2, /* ATQ indirect command buffer */ - i40e_mem_arq_ring = 3, /* ARQ descriptor ring */ - i40e_mem_atq_ring = 4, /* ATQ descriptor ring */ - i40e_mem_pd = 5, /* Page Descriptor */ - i40e_mem_bp = 6, /* Backing Page - 4KB */ - i40e_mem_bp_jumbo = 7, /* Backing Page - > 4KB */ - i40e_mem_reserved -}; - /* prototype for functions used for dynamic memory allocation */ int i40e_allocate_dma_mem(struct i40e_hw *hw, struct i40e_dma_mem *mem, - enum i40e_memory_type type, u64 size, u32 alignment); int i40e_free_dma_mem(struct i40e_hw *hw, struct i40e_dma_mem *mem); diff --git a/drivers/net/ethernet/intel/i40e/i40e_hmc.c b/drivers/net/ethernet/intel/i40e/i40e_hmc.c index 96ee63aca7a10..7451d346ae83f 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_hmc.c +++ b/drivers/net/ethernet/intel/i40e/i40e_hmc.c @@ -22,7 +22,6 @@ int i40e_add_sd_table_entry(struct i40e_hw *hw, enum i40e_sd_entry_type type, u64 direct_mode_sz) { - enum i40e_memory_type mem_type __attribute__((unused)); struct i40e_hmc_sd_entry *sd_entry; bool dma_mem_alloc_done = false; struct i40e_dma_mem mem; @@ -43,16 +42,13 @@ int i40e_add_sd_table_entry(struct i40e_hw *hw,
sd_entry = &hmc_info->sd_table.sd_entry[sd_index]; if (!sd_entry->valid) { - if (I40E_SD_TYPE_PAGED == type) { - mem_type = i40e_mem_pd; + if (type == I40E_SD_TYPE_PAGED) alloc_len = I40E_HMC_PAGED_BP_SIZE; - } else { - mem_type = i40e_mem_bp_jumbo; + else alloc_len = direct_mode_sz; - }
/* allocate a 4K pd page or 2M backing page */ - ret_code = i40e_allocate_dma_mem(hw, &mem, mem_type, alloc_len, + ret_code = i40e_allocate_dma_mem(hw, &mem, alloc_len, I40E_HMC_PD_BP_BUF_ALIGNMENT); if (ret_code) goto exit; @@ -140,7 +136,7 @@ int i40e_add_pd_table_entry(struct i40e_hw *hw, page = rsrc_pg; } else { /* allocate a 4K backing page */ - ret_code = i40e_allocate_dma_mem(hw, page, i40e_mem_bp, + ret_code = i40e_allocate_dma_mem(hw, page, I40E_HMC_PAGED_BP_SIZE, I40E_HMC_PD_BP_BUF_ALIGNMENT); if (ret_code) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 17ab6a1c53971..46b7a428808a8 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -137,14 +137,14 @@ struct device *i40e_hw_to_dev(struct i40e_hw *hw) }
/** - * i40e_allocate_dma_mem_d - OS specific memory alloc for shared code + * i40e_allocate_dma_mem - OS specific memory alloc for shared code * @hw: pointer to the HW structure * @mem: ptr to mem struct to fill out * @size: size of memory requested * @alignment: what to align the allocation to **/ -int i40e_allocate_dma_mem_d(struct i40e_hw *hw, struct i40e_dma_mem *mem, - u64 size, u32 alignment) +int i40e_allocate_dma_mem(struct i40e_hw *hw, struct i40e_dma_mem *mem, + u64 size, u32 alignment) { struct i40e_pf *pf = i40e_hw_to_pf(hw);
@@ -158,11 +158,11 @@ int i40e_allocate_dma_mem_d(struct i40e_hw *hw, struct i40e_dma_mem *mem, }
/** - * i40e_free_dma_mem_d - OS specific memory free for shared code + * i40e_free_dma_mem - OS specific memory free for shared code * @hw: pointer to the HW structure * @mem: ptr to mem struct to free **/ -int i40e_free_dma_mem_d(struct i40e_hw *hw, struct i40e_dma_mem *mem) +int i40e_free_dma_mem(struct i40e_hw *hw, struct i40e_dma_mem *mem) { struct i40e_pf *pf = i40e_hw_to_pf(hw);
@@ -175,13 +175,13 @@ int i40e_free_dma_mem_d(struct i40e_hw *hw, struct i40e_dma_mem *mem) }
/** - * i40e_allocate_virt_mem_d - OS specific memory alloc for shared code + * i40e_allocate_virt_mem - OS specific memory alloc for shared code * @hw: pointer to the HW structure * @mem: ptr to mem struct to fill out * @size: size of memory requested **/ -int i40e_allocate_virt_mem_d(struct i40e_hw *hw, struct i40e_virt_mem *mem, - u32 size) +int i40e_allocate_virt_mem(struct i40e_hw *hw, struct i40e_virt_mem *mem, + u32 size) { mem->size = size; mem->va = kzalloc(size, GFP_KERNEL); @@ -193,11 +193,11 @@ int i40e_allocate_virt_mem_d(struct i40e_hw *hw, struct i40e_virt_mem *mem, }
/** - * i40e_free_virt_mem_d - OS specific memory free for shared code + * i40e_free_virt_mem - OS specific memory free for shared code * @hw: pointer to the HW structure * @mem: ptr to mem struct to free **/ -int i40e_free_virt_mem_d(struct i40e_hw *hw, struct i40e_virt_mem *mem) +int i40e_free_virt_mem(struct i40e_hw *hw, struct i40e_virt_mem *mem) { /* it's ok to kfree a NULL pointer */ kfree(mem->va); diff --git a/drivers/net/ethernet/intel/i40e/i40e_osdep.h b/drivers/net/ethernet/intel/i40e/i40e_osdep.h index 997569a4ad57b..70cac3bb31ec3 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_osdep.h +++ b/drivers/net/ethernet/intel/i40e/i40e_osdep.h @@ -36,18 +36,11 @@ struct i40e_dma_mem { u32 size; };
-#define i40e_allocate_dma_mem(h, m, unused, s, a) \ - i40e_allocate_dma_mem_d(h, m, s, a) -#define i40e_free_dma_mem(h, m) i40e_free_dma_mem_d(h, m) - struct i40e_virt_mem { void *va; u32 size; };
-#define i40e_allocate_virt_mem(h, m, s) i40e_allocate_virt_mem_d(h, m, s) -#define i40e_free_virt_mem(h, m) i40e_free_virt_mem_d(h, m) - #define i40e_debug(h, m, s, ...) \ do { \ if (((m) & (h)->debug_mask)) \
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Vecera ivecera@redhat.com
[ Upstream commit ef5d54078d451973f90e123fafa23fc95c2a08ae ]
The i40e_dma_mem and i40e_virt_mem structures are defined in i40e_osdep.h, while the memory allocation functions that use them are declared in i40e_alloc.h. Move the structures there.
Signed-off-by: Ivan Vecera ivecera@redhat.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Jesse Brandeburg jesse.brandeburg@intel.com Reviewed-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Stable-dep-of: 6dbdd4de0362 ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_adminq.h | 1 + drivers/net/ethernet/intel/i40e/i40e_alloc.h | 14 ++++++++++++++ drivers/net/ethernet/intel/i40e/i40e_osdep.h | 12 ------------ 3 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h b/drivers/net/ethernet/intel/i40e/i40e_adminq.h index 267f2e0a21ce8..1c3d2bc5c3f79 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h @@ -4,6 +4,7 @@ #ifndef _I40E_ADMINQ_H_ #define _I40E_ADMINQ_H_
+#include "i40e_alloc.h" #include "i40e_osdep.h" #include "i40e_adminq_cmd.h"
diff --git a/drivers/net/ethernet/intel/i40e/i40e_alloc.h b/drivers/net/ethernet/intel/i40e/i40e_alloc.h index 4b2d8da048c64..e0dde326255d6 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_alloc.h +++ b/drivers/net/ethernet/intel/i40e/i40e_alloc.h @@ -4,8 +4,22 @@ #ifndef _I40E_ALLOC_H_ #define _I40E_ALLOC_H_
+#include <linux/types.h> + struct i40e_hw;
+/* memory allocation tracking */ +struct i40e_dma_mem { + void *va; + dma_addr_t pa; + u32 size; +}; + +struct i40e_virt_mem { + void *va; + u32 size; +}; + /* prototype for functions used for dynamic memory allocation */ int i40e_allocate_dma_mem(struct i40e_hw *hw, struct i40e_dma_mem *mem, diff --git a/drivers/net/ethernet/intel/i40e/i40e_osdep.h b/drivers/net/ethernet/intel/i40e/i40e_osdep.h index 70cac3bb31ec3..fd18895cfb56b 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_osdep.h +++ b/drivers/net/ethernet/intel/i40e/i40e_osdep.h @@ -29,18 +29,6 @@ struct device *i40e_hw_to_dev(struct i40e_hw *hw); #define rd64(a, reg) readq((a)->hw_addr + (reg)) #define i40e_flush(a) readl((a)->hw_addr + I40E_GLGEN_STAT)
-/* memory allocation tracking */ -struct i40e_dma_mem { - void *va; - dma_addr_t pa; - u32 size; -}; - -struct i40e_virt_mem { - void *va; - u32 size; -}; - #define i40e_debug(h, m, s, ...) \ do { \ if (((m) & (h)->debug_mask)) \
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Vecera ivecera@redhat.com
[ Upstream commit 5dfd37c37a44ba47c35ff8e6eaff14c226141111 ]
The i40e_osdep.h header contains only I/O primitives and a couple of debug printing macros. Split this header file into i40e_io.h and i40e_debug.h, and move the i40e_debug_mask enum to i40e_debug.h.
Signed-off-by: Ivan Vecera ivecera@redhat.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Jesse Brandeburg jesse.brandeburg@intel.com Reviewed-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Stable-dep-of: 6dbdd4de0362 ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_adminq.h | 2 +- drivers/net/ethernet/intel/i40e/i40e_debug.h | 47 +++++++++++++++++++ drivers/net/ethernet/intel/i40e/i40e_hmc.c | 1 - drivers/net/ethernet/intel/i40e/i40e_io.h | 16 +++++++ .../net/ethernet/intel/i40e/i40e_lan_hmc.c | 1 - drivers/net/ethernet/intel/i40e/i40e_osdep.h | 40 ---------------- .../net/ethernet/intel/i40e/i40e_prototype.h | 1 + drivers/net/ethernet/intel/i40e/i40e_type.h | 31 ++---------- 8 files changed, 68 insertions(+), 71 deletions(-) create mode 100644 drivers/net/ethernet/intel/i40e/i40e_debug.h create mode 100644 drivers/net/ethernet/intel/i40e/i40e_io.h delete mode 100644 drivers/net/ethernet/intel/i40e/i40e_osdep.h
diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h b/drivers/net/ethernet/intel/i40e/i40e_adminq.h index 1c3d2bc5c3f79..80125bea80a2a 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h @@ -4,8 +4,8 @@ #ifndef _I40E_ADMINQ_H_ #define _I40E_ADMINQ_H_
+#include <linux/mutex.h> #include "i40e_alloc.h" -#include "i40e_osdep.h" #include "i40e_adminq_cmd.h"
#define I40E_ADMINQ_DESC(R, i) \ diff --git a/drivers/net/ethernet/intel/i40e/i40e_debug.h b/drivers/net/ethernet/intel/i40e/i40e_debug.h new file mode 100644 index 0000000000000..27ebc72d8bfe5 --- /dev/null +++ b/drivers/net/ethernet/intel/i40e/i40e_debug.h @@ -0,0 +1,47 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2023 Intel Corporation. */ + +#ifndef _I40E_DEBUG_H_ +#define _I40E_DEBUG_H_ + +#include <linux/dev_printk.h> + +/* debug masks - set these bits in hw->debug_mask to control output */ +enum i40e_debug_mask { + I40E_DEBUG_INIT = 0x00000001, + I40E_DEBUG_RELEASE = 0x00000002, + + I40E_DEBUG_LINK = 0x00000010, + I40E_DEBUG_PHY = 0x00000020, + I40E_DEBUG_HMC = 0x00000040, + I40E_DEBUG_NVM = 0x00000080, + I40E_DEBUG_LAN = 0x00000100, + I40E_DEBUG_FLOW = 0x00000200, + I40E_DEBUG_DCB = 0x00000400, + I40E_DEBUG_DIAG = 0x00000800, + I40E_DEBUG_FD = 0x00001000, + I40E_DEBUG_PACKAGE = 0x00002000, + I40E_DEBUG_IWARP = 0x00F00000, + I40E_DEBUG_AQ_MESSAGE = 0x01000000, + I40E_DEBUG_AQ_DESCRIPTOR = 0x02000000, + I40E_DEBUG_AQ_DESC_BUFFER = 0x04000000, + I40E_DEBUG_AQ_COMMAND = 0x06000000, + I40E_DEBUG_AQ = 0x0F000000, + + I40E_DEBUG_USER = 0xF0000000, + + I40E_DEBUG_ALL = 0xFFFFFFFF +}; + +struct i40e_hw; +struct device *i40e_hw_to_dev(struct i40e_hw *hw); + +#define hw_dbg(hw, S, A...) dev_dbg(i40e_hw_to_dev(hw), S, ##A) + +#define i40e_debug(h, m, s, ...) \ +do { \ + if (((m) & (h)->debug_mask)) \ + dev_info(i40e_hw_to_dev(hw), s, ##__VA_ARGS__); \ +} while (0) + +#endif /* _I40E_DEBUG_H_ */ diff --git a/drivers/net/ethernet/intel/i40e/i40e_hmc.c b/drivers/net/ethernet/intel/i40e/i40e_hmc.c index 7451d346ae83f..b383aea652f3e 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_hmc.c +++ b/drivers/net/ethernet/intel/i40e/i40e_hmc.c @@ -2,7 +2,6 @@ /* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "i40e.h" -#include "i40e_osdep.h" #include "i40e_register.h" #include "i40e_alloc.h" #include "i40e_hmc.h" diff --git a/drivers/net/ethernet/intel/i40e/i40e_io.h b/drivers/net/ethernet/intel/i40e/i40e_io.h new file mode 100644 index 0000000000000..2a2ed9a1d476b --- /dev/null +++ b/drivers/net/ethernet/intel/i40e/i40e_io.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2023 Intel Corporation. */ + +#ifndef _I40E_IO_H_ +#define _I40E_IO_H_ + +/* get readq/writeq support for 32 bit kernels, use the low-first version */ +#include <linux/io-64-nonatomic-lo-hi.h> + +#define wr32(a, reg, value) writel((value), ((a)->hw_addr + (reg))) +#define rd32(a, reg) readl((a)->hw_addr + (reg)) + +#define rd64(a, reg) readq((a)->hw_addr + (reg)) +#define i40e_flush(a) readl((a)->hw_addr + I40E_GLGEN_STAT) + +#endif /* _I40E_IO_H_ */ diff --git a/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c b/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c index 474365bf06480..830f1de254ef4 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c +++ b/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c @@ -2,7 +2,6 @@ /* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "i40e.h" -#include "i40e_osdep.h" #include "i40e_register.h" #include "i40e_type.h" #include "i40e_hmc.h" diff --git a/drivers/net/ethernet/intel/i40e/i40e_osdep.h b/drivers/net/ethernet/intel/i40e/i40e_osdep.h deleted file mode 100644 index fd18895cfb56b..0000000000000 --- a/drivers/net/ethernet/intel/i40e/i40e_osdep.h +++ /dev/null @@ -1,40 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* Copyright(c) 2013 - 2018 Intel Corporation. */ - -#ifndef _I40E_OSDEP_H_ -#define _I40E_OSDEP_H_ - -#include <linux/types.h> -#include <linux/if_ether.h> -#include <linux/if_vlan.h> -#include <linux/tcp.h> -#include <linux/pci.h> -#include <linux/highuid.h> - -/* get readq/writeq support for 32 bit kernels, use the low-first version */ -#include <linux/io-64-nonatomic-lo-hi.h> - -/* File to be the magic between shared code and - * actual OS primitives - */ - -struct i40e_hw; -struct device *i40e_hw_to_dev(struct i40e_hw *hw); - -#define hw_dbg(hw, S, A...) dev_dbg(i40e_hw_to_dev(hw), S, ##A) - -#define wr32(a, reg, value) writel((value), ((a)->hw_addr + (reg))) -#define rd32(a, reg) readl((a)->hw_addr + (reg)) - -#define rd64(a, reg) readq((a)->hw_addr + (reg)) -#define i40e_flush(a) readl((a)->hw_addr + I40E_GLGEN_STAT) - -#define i40e_debug(h, m, s, ...) \ -do { \ - if (((m) & (h)->debug_mask)) \ - pr_info("i40e %02x:%02x.%x " s, \ - (h)->bus.bus_id, (h)->bus.device, \ - (h)->bus.func, ##__VA_ARGS__); \ -} while (0) - -#endif /* _I40E_OSDEP_H_ */ diff --git a/drivers/net/ethernet/intel/i40e/i40e_prototype.h b/drivers/net/ethernet/intel/i40e/i40e_prototype.h index 3eeee224f1fb2..9c9234c0706f0 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_prototype.h +++ b/drivers/net/ethernet/intel/i40e/i40e_prototype.h @@ -6,6 +6,7 @@
#include "i40e_type.h" #include "i40e_alloc.h" +#include "i40e_debug.h" #include <linux/avf/virtchnl.h>
/* Prototypes for shared code functions that are not in diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h index d4c6afe84fdd2..44cea0f4f908d 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_type.h +++ b/drivers/net/ethernet/intel/i40e/i40e_type.h @@ -4,7 +4,9 @@ #ifndef _I40E_TYPE_H_ #define _I40E_TYPE_H_
-#include "i40e_osdep.h" +#include <linux/delay.h> +#include <linux/if_ether.h> +#include "i40e_io.h" #include "i40e_register.h" #include "i40e_adminq.h" #include "i40e_hmc.h" @@ -43,33 +45,6 @@ typedef void (*I40E_ADMINQ_CALLBACK)(struct i40e_hw *, struct i40e_aq_desc *); #define I40E_QTX_CTL_VM_QUEUE 0x1 #define I40E_QTX_CTL_PF_QUEUE 0x2
-/* debug masks - set these bits in hw->debug_mask to control output */ -enum i40e_debug_mask { - I40E_DEBUG_INIT = 0x00000001, - I40E_DEBUG_RELEASE = 0x00000002, - - I40E_DEBUG_LINK = 0x00000010, - I40E_DEBUG_PHY = 0x00000020, - I40E_DEBUG_HMC = 0x00000040, - I40E_DEBUG_NVM = 0x00000080, - I40E_DEBUG_LAN = 0x00000100, - I40E_DEBUG_FLOW = 0x00000200, - I40E_DEBUG_DCB = 0x00000400, - I40E_DEBUG_DIAG = 0x00000800, - I40E_DEBUG_FD = 0x00001000, - I40E_DEBUG_PACKAGE = 0x00002000, - I40E_DEBUG_IWARP = 0x00F00000, - I40E_DEBUG_AQ_MESSAGE = 0x01000000, - I40E_DEBUG_AQ_DESCRIPTOR = 0x02000000, - I40E_DEBUG_AQ_DESC_BUFFER = 0x04000000, - I40E_DEBUG_AQ_COMMAND = 0x06000000, - I40E_DEBUG_AQ = 0x0F000000, - - I40E_DEBUG_USER = 0xF0000000, - - I40E_DEBUG_ALL = 0xFFFFFFFF -}; - #define I40E_MDIO_CLAUSE22_STCODE_MASK I40E_GLGEN_MSCA_STCODE_MASK #define I40E_MDIO_CLAUSE22_OPCODE_WRITE_MASK I40E_GLGEN_MSCA_OPCODE_MASK(1) #define I40E_MDIO_CLAUSE22_OPCODE_READ_MASK I40E_GLGEN_MSCA_OPCODE_MASK(2)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Vecera ivecera@redhat.com
[ Upstream commit 56df345917c09ffc00b7834f88990a7a7c338b5c ]
Similarly to the ice driver [1], there are also circular header dependencies in the i40e driver: i40e.h -> i40e_virtchnl_pf.h -> i40e.h
Another issue is that the i40e header files do not declare their own dependencies on other header files (both private and standard), so including them in a .c file requires adding those dependencies to that .c file in a particular order to make it compilable.
Fix both issues by removing the mentioned circular dependency and by filling the i40e headers with their dependencies so they can be included anywhere in the source code. Additionally, remove a bunch of unnecessary includes from the i40e.h super header file and include i40e.h only in the .c files that really require it.
[1] 649c87c6ff52 ("ice: remove circular header dependencies on ice.h")
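The pattern used throughout the series to break such cycles is to include only what a header itself needs and to forward-declare structs that are used purely through pointers (the diff below does this with "forward-declare the HW struct for the compiler"). A minimal sketch with hypothetical names:

#ifndef _FOO_FEATURE_H_
#define _FOO_FEATURE_H_

#include <linux/types.h>

struct foo_pf;			/* forward declaration is enough */

struct foo_feature {
	struct foo_pf *pf;	/* only a pointer, full type not needed */
	u32 flags;
};

#endif /* _FOO_FEATURE_H_ */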
Signed-off-by: Ivan Vecera ivecera@redhat.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Jesse Brandeburg jesse.brandeburg@intel.com Reviewed-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Stable-dep-of: 6dbdd4de0362 ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e.h | 43 ++++--------------- drivers/net/ethernet/intel/i40e/i40e_adminq.c | 4 +- .../net/ethernet/intel/i40e/i40e_adminq_cmd.h | 2 + drivers/net/ethernet/intel/i40e/i40e_client.c | 1 - drivers/net/ethernet/intel/i40e/i40e_common.c | 11 +++-- drivers/net/ethernet/intel/i40e/i40e_dcb.c | 4 +- drivers/net/ethernet/intel/i40e/i40e_dcb_nl.c | 2 +- drivers/net/ethernet/intel/i40e/i40e_ddp.c | 2 +- .../net/ethernet/intel/i40e/i40e_debugfs.c | 3 +- drivers/net/ethernet/intel/i40e/i40e_diag.h | 5 ++- .../net/ethernet/intel/i40e/i40e_ethtool.c | 3 +- drivers/net/ethernet/intel/i40e/i40e_hmc.c | 3 +- drivers/net/ethernet/intel/i40e/i40e_hmc.h | 4 ++ .../net/ethernet/intel/i40e/i40e_lan_hmc.c | 8 ++-- .../net/ethernet/intel/i40e/i40e_lan_hmc.h | 2 + drivers/net/ethernet/intel/i40e/i40e_main.c | 15 ++++--- drivers/net/ethernet/intel/i40e/i40e_nvm.c | 2 + .../net/ethernet/intel/i40e/i40e_prototype.h | 5 +-- drivers/net/ethernet/intel/i40e/i40e_ptp.c | 3 +- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 7 ++- drivers/net/ethernet/intel/i40e/i40e_txrx.h | 1 + .../ethernet/intel/i40e/i40e_txrx_common.h | 2 + drivers/net/ethernet/intel/i40e/i40e_type.h | 7 +-- .../ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 + .../ethernet/intel/i40e/i40e_virtchnl_pf.h | 4 +- drivers/net/ethernet/intel/i40e/i40e_xsk.c | 4 -- drivers/net/ethernet/intel/i40e/i40e_xsk.h | 4 ++ 27 files changed, 72 insertions(+), 81 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h index 6f08c8fe653bd..3e6839ac1f0f1 100644 --- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -4,47 +4,20 @@ #ifndef _I40E_H_ #define _I40E_H_
-#include <net/tcp.h> -#include <net/udp.h> -#include <linux/types.h> -#include <linux/errno.h> -#include <linux/module.h> -#include <linux/pci.h> -#include <linux/netdevice.h> -#include <linux/ioport.h> -#include <linux/iommu.h> -#include <linux/slab.h> -#include <linux/list.h> -#include <linux/hashtable.h> -#include <linux/string.h> -#include <linux/in.h> -#include <linux/ip.h> -#include <linux/sctp.h> -#include <linux/pkt_sched.h> -#include <linux/ipv6.h> -#include <net/checksum.h> -#include <net/ip6_checksum.h> #include <linux/ethtool.h> -#include <linux/if_vlan.h> -#include <linux/if_macvlan.h> -#include <linux/if_bridge.h> -#include <linux/clocksource.h> -#include <linux/net_tstamp.h> +#include <linux/pci.h> #include <linux/ptp_clock_kernel.h> +#include <linux/types.h> +#include <linux/avf/virtchnl.h> +#include <linux/net/intel/i40e_client.h> #include <net/pkt_cls.h> -#include <net/pkt_sched.h> -#include <net/tc_act/tc_gact.h> -#include <net/tc_act/tc_mirred.h> #include <net/udp_tunnel.h> -#include <net/xdp_sock.h> -#include <linux/bitfield.h> -#include "i40e_type.h" +#include "i40e_dcb.h" +#include "i40e_debug.h" +#include "i40e_io.h" #include "i40e_prototype.h" -#include <linux/net/intel/i40e_client.h> -#include <linux/avf/virtchnl.h> -#include "i40e_virtchnl_pf.h" +#include "i40e_register.h" #include "i40e_txrx.h" -#include "i40e_dcb.h"
/* Useful i40e defaults */ #define I40E_MAX_VEB 16 diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.c b/drivers/net/ethernet/intel/i40e/i40e_adminq.c index e72cfe587c89e..9ce6e633cc2f0 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_adminq.c +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.c @@ -1,9 +1,9 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2013 - 2018 Intel Corporation. */
-#include "i40e_type.h" +#include <linux/delay.h> +#include "i40e_alloc.h" #include "i40e_register.h" -#include "i40e_adminq.h" #include "i40e_prototype.h"
static void i40e_resume_aq(struct i40e_hw *hw); diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h index 3357d65a906bf..18a1c3b6d72c5 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h @@ -4,6 +4,8 @@ #ifndef _I40E_ADMINQ_CMD_H_ #define _I40E_ADMINQ_CMD_H_
+#include <linux/bits.h> + /* This header file defines the i40e Admin Queue commands and is shared between * i40e Firmware and Software. * diff --git a/drivers/net/ethernet/intel/i40e/i40e_client.c b/drivers/net/ethernet/intel/i40e/i40e_client.c index 639c5a1ca853b..306758428aefd 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_client.c +++ b/drivers/net/ethernet/intel/i40e/i40e_client.c @@ -6,7 +6,6 @@ #include <linux/net/intel/i40e_client.h>
#include "i40e.h" -#include "i40e_prototype.h"
static LIST_HEAD(i40e_devices); static DEFINE_MUTEX(i40e_device_mutex); diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c index 1b493854f5229..e0685219dbde9 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_common.c +++ b/drivers/net/ethernet/intel/i40e/i40e_common.c @@ -1,11 +1,14 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2013 - 2021 Intel Corporation. */
-#include "i40e.h" -#include "i40e_type.h" -#include "i40e_adminq.h" -#include "i40e_prototype.h" #include <linux/avf/virtchnl.h> +#include <linux/delay.h> +#include <linux/etherdevice.h> +#include <linux/pci.h> +#include "i40e_adminq_cmd.h" +#include "i40e_devids.h" +#include "i40e_prototype.h" +#include "i40e_register.h"
/** * i40e_set_mac_type - Sets MAC type diff --git a/drivers/net/ethernet/intel/i40e/i40e_dcb.c b/drivers/net/ethernet/intel/i40e/i40e_dcb.c index f81e744c0fb36..68602fc375f62 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_dcb.c +++ b/drivers/net/ethernet/intel/i40e/i40e_dcb.c @@ -1,9 +1,9 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2013 - 2021 Intel Corporation. */
-#include "i40e_adminq.h" -#include "i40e_prototype.h" +#include "i40e_alloc.h" #include "i40e_dcb.h" +#include "i40e_prototype.h"
/** * i40e_get_dcbx_status diff --git a/drivers/net/ethernet/intel/i40e/i40e_dcb_nl.c b/drivers/net/ethernet/intel/i40e/i40e_dcb_nl.c index 195421d863ab1..077a95dad32cf 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_dcb_nl.c +++ b/drivers/net/ethernet/intel/i40e/i40e_dcb_nl.c @@ -2,8 +2,8 @@ /* Copyright(c) 2013 - 2021 Intel Corporation. */
#ifdef CONFIG_I40E_DCB -#include "i40e.h" #include <net/dcbnl.h> +#include "i40e.h"
#define I40E_DCBNL_STATUS_SUCCESS 0 #define I40E_DCBNL_STATUS_ERROR 1 diff --git a/drivers/net/ethernet/intel/i40e/i40e_ddp.c b/drivers/net/ethernet/intel/i40e/i40e_ddp.c index 0e72abd178ae3..21b3518c40968 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_ddp.c +++ b/drivers/net/ethernet/intel/i40e/i40e_ddp.c @@ -1,9 +1,9 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2013 - 2018 Intel Corporation. */
+#include <linux/firmware.h> #include "i40e.h"
-#include <linux/firmware.h>
/** * i40e_ddp_profiles_eq - checks if DDP profiles are the equivalent diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c index 1a497cb077100..999c9708def53 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c +++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c @@ -5,8 +5,9 @@
#include <linux/fs.h> #include <linux/debugfs.h> - +#include <linux/if_bridge.h> #include "i40e.h" +#include "i40e_virtchnl_pf.h"
static struct dentry *i40e_dbg_root;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_diag.h b/drivers/net/ethernet/intel/i40e/i40e_diag.h index c3ce5f35211f0..ece3a6b9a5c61 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_diag.h +++ b/drivers/net/ethernet/intel/i40e/i40e_diag.h @@ -4,7 +4,10 @@ #ifndef _I40E_DIAG_H_ #define _I40E_DIAG_H_
-#include "i40e_type.h" +#include "i40e_adminq_cmd.h" + +/* forward-declare the HW struct for the compiler */ +struct i40e_hw;
enum i40e_lb_mode { I40E_LB_MODE_NONE = 0x0, diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c index bd1321bf7e268..4e90570ba7803 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c @@ -3,9 +3,10 @@
/* ethtool support for i40e */
-#include "i40e.h" +#include "i40e_devids.h" #include "i40e_diag.h" #include "i40e_txrx_common.h" +#include "i40e_virtchnl_pf.h"
/* ethtool statistics helpers */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_hmc.c b/drivers/net/ethernet/intel/i40e/i40e_hmc.c index b383aea652f3e..1742624ca62ed 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_hmc.c +++ b/drivers/net/ethernet/intel/i40e/i40e_hmc.c @@ -1,9 +1,8 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2013 - 2018 Intel Corporation. */
-#include "i40e.h" -#include "i40e_register.h" #include "i40e_alloc.h" +#include "i40e_debug.h" #include "i40e_hmc.h" #include "i40e_type.h"
diff --git a/drivers/net/ethernet/intel/i40e/i40e_hmc.h b/drivers/net/ethernet/intel/i40e/i40e_hmc.h index 9960da07a5732..480e3a883cc7a 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_hmc.h +++ b/drivers/net/ethernet/intel/i40e/i40e_hmc.h @@ -4,6 +4,10 @@ #ifndef _I40E_HMC_H_ #define _I40E_HMC_H_
+#include "i40e_alloc.h" +#include "i40e_io.h" +#include "i40e_register.h" + #define I40E_HMC_MAX_BP_COUNT 512
/* forward-declare the HW struct for the compiler */ diff --git a/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c b/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c index 830f1de254ef4..beaaf5c309d51 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c +++ b/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c @@ -1,12 +1,10 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2013 - 2018 Intel Corporation. */
-#include "i40e.h" -#include "i40e_register.h" -#include "i40e_type.h" -#include "i40e_hmc.h" +#include "i40e_alloc.h" +#include "i40e_debug.h" #include "i40e_lan_hmc.h" -#include "i40e_prototype.h" +#include "i40e_type.h"
/* lan specific interface functions */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.h b/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.h index 9f960404c2b37..305a276953b01 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.h +++ b/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.h @@ -4,6 +4,8 @@ #ifndef _I40E_LAN_HMC_H_ #define _I40E_LAN_HMC_H_
+#include "i40e_hmc.h" + /* forward-declare the HW struct for the compiler */ struct i40e_hw;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 46b7a428808a8..a21fc92aa2725 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -1,19 +1,22 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2013 - 2021 Intel Corporation. */
-#include <linux/etherdevice.h> -#include <linux/of_net.h> -#include <linux/pci.h> -#include <linux/bpf.h> #include <generated/utsrelease.h> #include <linux/crash_dump.h> +#include <linux/if_bridge.h> +#include <linux/if_macvlan.h> +#include <linux/module.h> +#include <net/pkt_cls.h> +#include <net/xdp_sock_drv.h>
/* Local includes */ #include "i40e.h" +#include "i40e_devids.h" #include "i40e_diag.h" +#include "i40e_lan_hmc.h" +#include "i40e_virtchnl_pf.h" #include "i40e_xsk.h" -#include <net/udp_tunnel.h> -#include <net/xdp_sock_drv.h> + /* All i40e tracepoints are defined by the include below, which * must be included exactly once across the whole kernel with * CREATE_TRACE_POINTS defined diff --git a/drivers/net/ethernet/intel/i40e/i40e_nvm.c b/drivers/net/ethernet/intel/i40e/i40e_nvm.c index 07a46adeab38e..77cdbfc19d477 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_nvm.c +++ b/drivers/net/ethernet/intel/i40e/i40e_nvm.c @@ -1,6 +1,8 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2013 - 2018 Intel Corporation. */
+#include <linux/delay.h> +#include "i40e_alloc.h" #include "i40e_prototype.h"
/** diff --git a/drivers/net/ethernet/intel/i40e/i40e_prototype.h b/drivers/net/ethernet/intel/i40e/i40e_prototype.h index 9c9234c0706f0..2001fefa0c52d 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_prototype.h +++ b/drivers/net/ethernet/intel/i40e/i40e_prototype.h @@ -4,10 +4,9 @@ #ifndef _I40E_PROTOTYPE_H_ #define _I40E_PROTOTYPE_H_
-#include "i40e_type.h" -#include "i40e_alloc.h" -#include "i40e_debug.h" #include <linux/avf/virtchnl.h> +#include "i40e_debug.h" +#include "i40e_type.h"
/* Prototypes for shared code functions that are not in * the standard function pointer structures. These are diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c index cac9584debb1d..65c714d0bfffd 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c +++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c @@ -1,9 +1,10 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2013 - 2018 Intel Corporation. */
-#include "i40e.h" #include <linux/ptp_classify.h> #include <linux/posix-clock.h> +#include "i40e.h" +#include "i40e_devids.h"
/* The XL710 timesync is very much like Intel's 82599 design when it comes to * the fundamental clock design. However, the clock operations are much simpler diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index f703646622d9a..c962987d8b51b 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -1,14 +1,13 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2013 - 2018 Intel Corporation. */
-#include <linux/prefetch.h> #include <linux/bpf_trace.h> +#include <linux/prefetch.h> +#include <linux/sctp.h> #include <net/mpls.h> #include <net/xdp.h> -#include "i40e.h" -#include "i40e_trace.h" -#include "i40e_prototype.h" #include "i40e_txrx_common.h" +#include "i40e_trace.h" #include "i40e_xsk.h"
#define I40E_TXD_CMD (I40E_TX_DESC_CMD_EOP | I40E_TX_DESC_CMD_RS) diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h index 81f6a991bfb73..2b1d50873a4d1 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h @@ -5,6 +5,7 @@ #define _I40E_TXRX_H_
#include <net/xdp.h> +#include "i40e_type.h"
/* Interrupt Throttling and Rate Limiting Goodies */ #define I40E_DEFAULT_IRQ_WORK 256 diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h index 8c5118c8baafb..e26807fd21232 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h @@ -4,6 +4,8 @@ #ifndef I40E_TXRX_COMMON_ #define I40E_TXRX_COMMON_
+#include "i40e.h" + int i40e_xmit_xdp_tx_ring(struct xdp_buff *xdp, struct i40e_ring *xdp_ring); void i40e_clean_programming_status(struct i40e_ring *rx_ring, u64 qword0_raw, u64 qword1); diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h index 44cea0f4f908d..4092f82bcfb12 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_type.h +++ b/drivers/net/ethernet/intel/i40e/i40e_type.h @@ -4,14 +4,9 @@ #ifndef _I40E_TYPE_H_ #define _I40E_TYPE_H_
-#include <linux/delay.h> -#include <linux/if_ether.h> -#include "i40e_io.h" -#include "i40e_register.h" +#include <uapi/linux/if_ether.h> #include "i40e_adminq.h" #include "i40e_hmc.h" -#include "i40e_lan_hmc.h" -#include "i40e_devids.h"
/* I40E_MASK is a macro used on 32 bit registers */ #define I40E_MASK(mask, shift) ((u32)(mask) << (shift)) diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c index 6b90453205b23..7d47a05274548 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -2,6 +2,8 @@ /* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "i40e.h" +#include "i40e_lan_hmc.h" +#include "i40e_virtchnl_pf.h"
/*********************notification routines***********************/
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h index cf190762421cc..66f95e2f3146a 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h @@ -4,7 +4,9 @@ #ifndef _I40E_VIRTCHNL_PF_H_ #define _I40E_VIRTCHNL_PF_H_
-#include "i40e.h" +#include <linux/avf/virtchnl.h> +#include <linux/netdevice.h> +#include "i40e_type.h"
#define I40E_MAX_VLANID 4095
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index 1f8ae6f5d9807..65f38a57b3dfe 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -2,11 +2,7 @@ /* Copyright(c) 2018 Intel Corporation. */
#include <linux/bpf_trace.h> -#include <linux/stringify.h> #include <net/xdp_sock_drv.h> -#include <net/xdp.h> - -#include "i40e.h" #include "i40e_txrx_common.h" #include "i40e_xsk.h"
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h index 821df248f8bee..ef156fad52f26 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h @@ -4,6 +4,8 @@ #ifndef _I40E_XSK_H_ #define _I40E_XSK_H_
+#include <linux/types.h> + /* This value should match the pragma in the loop_unrolled_for * macro. Why 4? It is strictly empirical. It seems to be a good * compromise between the advantage of having simultaneous outstanding @@ -20,7 +22,9 @@ #define loop_unrolled_for for #endif
+struct i40e_ring; struct i40e_vsi; +struct net_device; struct xsk_buff_pool;
int i40e_queue_pair_disable(struct i40e_vsi *vsi, int queue_pair);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jesse Brandeburg jesse.brandeburg@intel.com
[ Upstream commit 3314f2097dee43defc20554f961a8b17f4787e2d ]
This series introduces the use of FIELD_GET() and FIELD_PREP(), which requires bitfield.h to be included. Fix all the includes in this one change, and rearrange the includes into alphabetical order to ease readability and future maintenance.
virtchnl.h and its usage were modified so that it carries its own includes, as it should. This required including bits.h for virtchnl.h.
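For context, FIELD_GET()/FIELD_PREP() from <linux/bitfield.h> take a constant mask and perform the shift implicitly. A small illustrative fragment (the FOO_SPEED_MASK field here is hypothetical, not a real driver register):

#include <linux/bitfield.h>
#include <linux/bits.h>
#include <linux/types.h>

#define FOO_SPEED_MASK	GENMASK(7, 4)	/* bits 7..4 of the register */

static inline u32 foo_get_speed(u32 reg)
{
	/* equivalent to (reg & FOO_SPEED_MASK) >> 4 */
	return FIELD_GET(FOO_SPEED_MASK, reg);
}

static inline u32 foo_set_speed(u32 reg, u32 speed)
{
	/* clear the field, then insert the new value shifted into place */
	return (reg & ~FOO_SPEED_MASK) | FIELD_PREP(FOO_SPEED_MASK, speed);
}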
Reviewed-by: Marcin Szycik marcin.szycik@linux.intel.com Signed-off-by: Jesse Brandeburg jesse.brandeburg@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Stable-dep-of: 6dbdd4de0362 ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/e1000/e1000_hw.c | 1 + drivers/net/ethernet/intel/fm10k/fm10k_pf.c | 1 + drivers/net/ethernet/intel/fm10k/fm10k_vf.c | 1 + drivers/net/ethernet/intel/i40e/i40e_common.c | 1 + drivers/net/ethernet/intel/i40e/i40e_dcb.c | 2 ++ drivers/net/ethernet/intel/i40e/i40e_nvm.c | 1 + drivers/net/ethernet/intel/iavf/iavf_common.c | 3 +- .../net/ethernet/intel/iavf/iavf_ethtool.c | 5 ++-- drivers/net/ethernet/intel/iavf/iavf_fdir.c | 1 + drivers/net/ethernet/intel/iavf/iavf_txrx.c | 1 + drivers/net/ethernet/intel/igb/e1000_i210.c | 4 +-- drivers/net/ethernet/intel/igb/e1000_nvm.c | 4 +-- drivers/net/ethernet/intel/igb/e1000_phy.c | 4 +-- drivers/net/ethernet/intel/igbvf/netdev.c | 28 +++++++++---------- drivers/net/ethernet/intel/igc/igc_i225.c | 1 + drivers/net/ethernet/intel/igc/igc_phy.c | 1 + include/linux/avf/virtchnl.h | 1 + 17 files changed, 37 insertions(+), 23 deletions(-)
diff --git a/drivers/net/ethernet/intel/e1000/e1000_hw.c b/drivers/net/ethernet/intel/e1000/e1000_hw.c index 4542e2bc28e8d..4576511c99f56 100644 --- a/drivers/net/ethernet/intel/e1000/e1000_hw.c +++ b/drivers/net/ethernet/intel/e1000/e1000_hw.c @@ -5,6 +5,7 @@ * Shared functions for accessing and configuring the MAC */
+#include <linux/bitfield.h> #include "e1000.h"
static s32 e1000_check_downshift(struct e1000_hw *hw); diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c index af1b0cde36703..ae700a1807c65 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c +++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2013 - 2019 Intel Corporation. */
+#include <linux/bitfield.h> #include "fm10k_pf.h" #include "fm10k_vf.h"
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_vf.c b/drivers/net/ethernet/intel/fm10k/fm10k_vf.c index dc8ccd378ec92..c50928ec14fff 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k_vf.c +++ b/drivers/net/ethernet/intel/fm10k/fm10k_vf.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2013 - 2019 Intel Corporation. */
+#include <linux/bitfield.h> #include "fm10k_vf.h"
/** diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c index e0685219dbde9..4d7caa1199719 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_common.c +++ b/drivers/net/ethernet/intel/i40e/i40e_common.c @@ -2,6 +2,7 @@ /* Copyright(c) 2013 - 2021 Intel Corporation. */
#include <linux/avf/virtchnl.h> +#include <linux/bitfield.h> #include <linux/delay.h> #include <linux/etherdevice.h> #include <linux/pci.h> diff --git a/drivers/net/ethernet/intel/i40e/i40e_dcb.c b/drivers/net/ethernet/intel/i40e/i40e_dcb.c index 68602fc375f62..d57dd30b024fa 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_dcb.c +++ b/drivers/net/ethernet/intel/i40e/i40e_dcb.c @@ -1,6 +1,8 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2013 - 2021 Intel Corporation. */
+#include <linux/bitfield.h> +#include "i40e_adminq.h" #include "i40e_alloc.h" #include "i40e_dcb.h" #include "i40e_prototype.h" diff --git a/drivers/net/ethernet/intel/i40e/i40e_nvm.c b/drivers/net/ethernet/intel/i40e/i40e_nvm.c index 77cdbfc19d477..e5aec09d58e27 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_nvm.c +++ b/drivers/net/ethernet/intel/i40e/i40e_nvm.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2013 - 2018 Intel Corporation. */
+#include <linux/bitfield.h> #include <linux/delay.h> #include "i40e_alloc.h" #include "i40e_prototype.h" diff --git a/drivers/net/ethernet/intel/iavf/iavf_common.c b/drivers/net/ethernet/intel/iavf/iavf_common.c index 1afd761d80520..f7988cf5efa58 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_common.c +++ b/drivers/net/ethernet/intel/iavf/iavf_common.c @@ -1,10 +1,11 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2013 - 2018 Intel Corporation. */
+#include <linux/avf/virtchnl.h> +#include <linux/bitfield.h> #include "iavf_type.h" #include "iavf_adminq.h" #include "iavf_prototype.h" -#include <linux/avf/virtchnl.h>
/** * iavf_set_mac_type - Sets MAC type diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c index 892c6a4f03bb8..1ac97bd606e38 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c +++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c @@ -1,11 +1,12 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2013 - 2018 Intel Corporation. */
+#include <linux/bitfield.h> +#include <linux/uaccess.h> + /* ethtool support for iavf */ #include "iavf.h"
-#include <linux/uaccess.h> - /* ethtool statistics helpers */
/** diff --git a/drivers/net/ethernet/intel/iavf/iavf_fdir.c b/drivers/net/ethernet/intel/iavf/iavf_fdir.c index 03e774bd2a5b4..65ddcd81c993e 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_fdir.c +++ b/drivers/net/ethernet/intel/iavf/iavf_fdir.c @@ -3,6 +3,7 @@
/* flow director ethtool support for iavf */
+#include <linux/bitfield.h> #include "iavf.h"
#define GTPU_PORT 2152 diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c index 8c5f6096b0022..f998ecf743c46 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c +++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2013 - 2018 Intel Corporation. */
+#include <linux/bitfield.h> #include <linux/prefetch.h>
#include "iavf.h" diff --git a/drivers/net/ethernet/intel/igb/e1000_i210.c b/drivers/net/ethernet/intel/igb/e1000_i210.c index b9b9d35494d27..53b396fd194a3 100644 --- a/drivers/net/ethernet/intel/igb/e1000_i210.c +++ b/drivers/net/ethernet/intel/igb/e1000_i210.c @@ -5,9 +5,9 @@ * e1000_i211 */
-#include <linux/types.h> +#include <linux/bitfield.h> #include <linux/if_ether.h> - +#include <linux/types.h> #include "e1000_hw.h" #include "e1000_i210.h"
diff --git a/drivers/net/ethernet/intel/igb/e1000_nvm.c b/drivers/net/ethernet/intel/igb/e1000_nvm.c index fa136e6e93285..0da57e89593a0 100644 --- a/drivers/net/ethernet/intel/igb/e1000_nvm.c +++ b/drivers/net/ethernet/intel/igb/e1000_nvm.c @@ -1,9 +1,9 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2007 - 2018 Intel Corporation. */
-#include <linux/if_ether.h> +#include <linux/bitfield.h> #include <linux/delay.h> - +#include <linux/if_ether.h> #include "e1000_mac.h" #include "e1000_nvm.h"
diff --git a/drivers/net/ethernet/intel/igb/e1000_phy.c b/drivers/net/ethernet/intel/igb/e1000_phy.c index a018000f7db92..3c1b562a3271c 100644 --- a/drivers/net/ethernet/intel/igb/e1000_phy.c +++ b/drivers/net/ethernet/intel/igb/e1000_phy.c @@ -1,9 +1,9 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2007 - 2018 Intel Corporation. */
-#include <linux/if_ether.h> +#include <linux/bitfield.h> #include <linux/delay.h> - +#include <linux/if_ether.h> #include "e1000_mac.h" #include "e1000_phy.h"
diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c index 7ff2752dd763a..c748668bf2fce 100644 --- a/drivers/net/ethernet/intel/igbvf/netdev.c +++ b/drivers/net/ethernet/intel/igbvf/netdev.c @@ -3,25 +3,25 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-#include <linux/module.h> -#include <linux/types.h> -#include <linux/init.h> -#include <linux/pci.h> -#include <linux/vmalloc.h> -#include <linux/pagemap.h> +#include <linux/bitfield.h> #include <linux/delay.h> -#include <linux/netdevice.h> -#include <linux/tcp.h> -#include <linux/ipv6.h> -#include <linux/slab.h> -#include <net/checksum.h> -#include <net/ip6_checksum.h> -#include <linux/mii.h> #include <linux/ethtool.h> #include <linux/if_vlan.h> +#include <linux/init.h> +#include <linux/ipv6.h> +#include <linux/mii.h> +#include <linux/module.h> +#include <linux/netdevice.h> +#include <linux/pagemap.h> +#include <linux/pci.h> #include <linux/prefetch.h> #include <linux/sctp.h> - +#include <linux/slab.h> +#include <linux/tcp.h> +#include <linux/types.h> +#include <linux/vmalloc.h> +#include <net/checksum.h> +#include <net/ip6_checksum.h> #include "igbvf.h"
char igbvf_driver_name[] = "igbvf"; diff --git a/drivers/net/ethernet/intel/igc/igc_i225.c b/drivers/net/ethernet/intel/igc/igc_i225.c index 17546a035ab19..d2562c8e8015e 100644 --- a/drivers/net/ethernet/intel/igc/igc_i225.c +++ b/drivers/net/ethernet/intel/igc/igc_i225.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright (c) 2018 Intel Corporation */
+#include <linux/bitfield.h> #include <linux/delay.h>
#include "igc_hw.h" diff --git a/drivers/net/ethernet/intel/igc/igc_phy.c b/drivers/net/ethernet/intel/igc/igc_phy.c index 53b77c969c857..d0d9e7170154c 100644 --- a/drivers/net/ethernet/intel/igc/igc_phy.c +++ b/drivers/net/ethernet/intel/igc/igc_phy.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright (c) 2018 Intel Corporation */
+#include <linux/bitfield.h> #include "igc_phy.h"
/** diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h index 6424aa06fb08d..6e950594215a0 100644 --- a/include/linux/avf/virtchnl.h +++ b/include/linux/avf/virtchnl.h @@ -5,6 +5,7 @@ #define _VIRTCHNL_H_
#include <linux/bitops.h> +#include <linux/bits.h> #include <linux/overflow.h> #include <uapi/linux/if_ether.h>
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jesse Brandeburg jesse.brandeburg@intel.com
[ Upstream commit b9a4525450758dd75edbdaee97425ba7546c2b5c ]
Refactor several older Intel drivers to use FIELD_GET(), which reduces lines of code and adds clarity of intent.
This code was generated by the following coccinelle/spatch script and then manually repaired.
@get@
constant shift,mask;
type T;
expression a;
@@
(
-((T)((a) & mask) >> shift)
+FIELD_GET(mask, a)
and applied via: spatch --sp-file field_prep.cocci --in-place --dir \ drivers/net/ethernet/intel/
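Taking one hunk from the e1000 part of the diff below as a concrete before/after of what the script rewrites:

/* before */
cable_length = (phy_data & M88E1000_PSSR_CABLE_LENGTH) >>
	       M88E1000_PSSR_CABLE_LENGTH_SHIFT;

/* after */
cable_length = FIELD_GET(M88E1000_PSSR_CABLE_LENGTH, phy_data);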
Cc: Julia Lawall Julia.Lawall@inria.fr CC: Alexander Lobakin aleksander.lobakin@intel.com Reviewed-by: Marcin Szycik marcin.szycik@linux.intel.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: Jesse Brandeburg jesse.brandeburg@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Stable-dep-of: 6dbdd4de0362 ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/e1000/e1000_hw.c | 45 ++++++++----------- .../net/ethernet/intel/e1000e/80003es2lan.c | 3 +- drivers/net/ethernet/intel/e1000e/82571.c | 3 +- drivers/net/ethernet/intel/e1000e/ethtool.c | 7 ++- drivers/net/ethernet/intel/e1000e/ich8lan.c | 18 +++----- drivers/net/ethernet/intel/e1000e/mac.c | 2 +- drivers/net/ethernet/intel/e1000e/netdev.c | 11 ++--- drivers/net/ethernet/intel/e1000e/phy.c | 17 +++---- drivers/net/ethernet/intel/fm10k/fm10k_pf.c | 3 +- drivers/net/ethernet/intel/fm10k/fm10k_vf.c | 9 ++-- drivers/net/ethernet/intel/igb/e1000_82575.c | 29 +++++------- drivers/net/ethernet/intel/igb/e1000_i210.c | 15 ++++--- drivers/net/ethernet/intel/igb/e1000_mac.c | 2 +- drivers/net/ethernet/intel/igb/e1000_nvm.c | 14 +++--- drivers/net/ethernet/intel/igb/e1000_phy.c | 9 ++-- drivers/net/ethernet/intel/igb/igb_ethtool.c | 8 ++-- drivers/net/ethernet/intel/igb/igb_main.c | 4 +- drivers/net/ethernet/intel/igbvf/mbx.c | 1 + drivers/net/ethernet/intel/igbvf/netdev.c | 5 +-- .../net/ethernet/intel/ixgbe/ixgbe_common.c | 30 ++++++------- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +- drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c | 8 ++-- .../net/ethernet/intel/ixgbe/ixgbe_sriov.c | 8 ++-- drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c | 8 ++-- drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 19 ++++---- 25 files changed, 118 insertions(+), 162 deletions(-)
diff --git a/drivers/net/ethernet/intel/e1000/e1000_hw.c b/drivers/net/ethernet/intel/e1000/e1000_hw.c index 4576511c99f56..f9328f2e669f8 100644 --- a/drivers/net/ethernet/intel/e1000/e1000_hw.c +++ b/drivers/net/ethernet/intel/e1000/e1000_hw.c @@ -3261,8 +3261,7 @@ static s32 e1000_phy_igp_get_info(struct e1000_hw *hw, return ret_val;
phy_info->mdix_mode = - (e1000_auto_x_mode) ((phy_data & IGP01E1000_PSSR_MDIX) >> - IGP01E1000_PSSR_MDIX_SHIFT); + (e1000_auto_x_mode)FIELD_GET(IGP01E1000_PSSR_MDIX, phy_data);
if ((phy_data & IGP01E1000_PSSR_SPEED_MASK) == IGP01E1000_PSSR_SPEED_1000MBPS) { @@ -3273,11 +3272,11 @@ static s32 e1000_phy_igp_get_info(struct e1000_hw *hw, if (ret_val) return ret_val;
- phy_info->local_rx = ((phy_data & SR_1000T_LOCAL_RX_STATUS) >> - SR_1000T_LOCAL_RX_STATUS_SHIFT) ? + phy_info->local_rx = FIELD_GET(SR_1000T_LOCAL_RX_STATUS, + phy_data) ? e1000_1000t_rx_status_ok : e1000_1000t_rx_status_not_ok; - phy_info->remote_rx = ((phy_data & SR_1000T_REMOTE_RX_STATUS) >> - SR_1000T_REMOTE_RX_STATUS_SHIFT) ? + phy_info->remote_rx = FIELD_GET(SR_1000T_REMOTE_RX_STATUS, + phy_data) ? e1000_1000t_rx_status_ok : e1000_1000t_rx_status_not_ok;
/* Get cable length */ @@ -3327,14 +3326,12 @@ static s32 e1000_phy_m88_get_info(struct e1000_hw *hw, return ret_val;
phy_info->extended_10bt_distance = - ((phy_data & M88E1000_PSCR_10BT_EXT_DIST_ENABLE) >> - M88E1000_PSCR_10BT_EXT_DIST_ENABLE_SHIFT) ? + FIELD_GET(M88E1000_PSCR_10BT_EXT_DIST_ENABLE, phy_data) ? e1000_10bt_ext_dist_enable_lower : e1000_10bt_ext_dist_enable_normal;
phy_info->polarity_correction = - ((phy_data & M88E1000_PSCR_POLARITY_REVERSAL) >> - M88E1000_PSCR_POLARITY_REVERSAL_SHIFT) ? + FIELD_GET(M88E1000_PSCR_POLARITY_REVERSAL, phy_data) ? e1000_polarity_reversal_disabled : e1000_polarity_reversal_enabled;
/* Check polarity status */ @@ -3348,27 +3345,25 @@ static s32 e1000_phy_m88_get_info(struct e1000_hw *hw, return ret_val;
phy_info->mdix_mode = - (e1000_auto_x_mode) ((phy_data & M88E1000_PSSR_MDIX) >> - M88E1000_PSSR_MDIX_SHIFT); + (e1000_auto_x_mode)FIELD_GET(M88E1000_PSSR_MDIX, phy_data);
if ((phy_data & M88E1000_PSSR_SPEED) == M88E1000_PSSR_1000MBS) { /* Cable Length Estimation and Local/Remote Receiver Information * are only valid at 1000 Mbps. */ phy_info->cable_length = - (e1000_cable_length) ((phy_data & - M88E1000_PSSR_CABLE_LENGTH) >> - M88E1000_PSSR_CABLE_LENGTH_SHIFT); + (e1000_cable_length)FIELD_GET(M88E1000_PSSR_CABLE_LENGTH, + phy_data);
ret_val = e1000_read_phy_reg(hw, PHY_1000T_STATUS, &phy_data); if (ret_val) return ret_val;
- phy_info->local_rx = ((phy_data & SR_1000T_LOCAL_RX_STATUS) >> - SR_1000T_LOCAL_RX_STATUS_SHIFT) ? + phy_info->local_rx = FIELD_GET(SR_1000T_LOCAL_RX_STATUS, + phy_data) ? e1000_1000t_rx_status_ok : e1000_1000t_rx_status_not_ok; - phy_info->remote_rx = ((phy_data & SR_1000T_REMOTE_RX_STATUS) >> - SR_1000T_REMOTE_RX_STATUS_SHIFT) ? + phy_info->remote_rx = FIELD_GET(SR_1000T_REMOTE_RX_STATUS, + phy_data) ? e1000_1000t_rx_status_ok : e1000_1000t_rx_status_not_ok; }
@@ -3516,7 +3511,7 @@ s32 e1000_init_eeprom_params(struct e1000_hw *hw) if (ret_val) return ret_val; eeprom_size = - (eeprom_size & EEPROM_SIZE_MASK) >> EEPROM_SIZE_SHIFT; + FIELD_GET(EEPROM_SIZE_MASK, eeprom_size); /* 256B eeprom size was not supported in earlier hardware, so we * bump eeprom_size up one to ensure that "1" (which maps to * 256B) is never the result used in the shifting logic below. @@ -4892,8 +4887,7 @@ static s32 e1000_get_cable_length(struct e1000_hw *hw, u16 *min_length, &phy_data); if (ret_val) return ret_val; - cable_length = (phy_data & M88E1000_PSSR_CABLE_LENGTH) >> - M88E1000_PSSR_CABLE_LENGTH_SHIFT; + cable_length = FIELD_GET(M88E1000_PSSR_CABLE_LENGTH, phy_data);
/* Convert the enum value to ranged values */ switch (cable_length) { @@ -5002,8 +4996,7 @@ static s32 e1000_check_polarity(struct e1000_hw *hw, &phy_data); if (ret_val) return ret_val; - *polarity = ((phy_data & M88E1000_PSSR_REV_POLARITY) >> - M88E1000_PSSR_REV_POLARITY_SHIFT) ? + *polarity = FIELD_GET(M88E1000_PSSR_REV_POLARITY, phy_data) ? e1000_rev_polarity_reversed : e1000_rev_polarity_normal;
} else if (hw->phy_type == e1000_phy_igp) { @@ -5073,8 +5066,8 @@ static s32 e1000_check_downshift(struct e1000_hw *hw) if (ret_val) return ret_val;
- hw->speed_downgraded = (phy_data & M88E1000_PSSR_DOWNSHIFT) >> - M88E1000_PSSR_DOWNSHIFT_SHIFT; + hw->speed_downgraded = FIELD_GET(M88E1000_PSSR_DOWNSHIFT, + phy_data); }
return E1000_SUCCESS; diff --git a/drivers/net/ethernet/intel/e1000e/80003es2lan.c b/drivers/net/ethernet/intel/e1000e/80003es2lan.c index be9c695dde127..c51fb6bf9c4e0 100644 --- a/drivers/net/ethernet/intel/e1000e/80003es2lan.c +++ b/drivers/net/ethernet/intel/e1000e/80003es2lan.c @@ -92,8 +92,7 @@ static s32 e1000_init_nvm_params_80003es2lan(struct e1000_hw *hw)
nvm->type = e1000_nvm_eeprom_spi;
- size = (u16)((eecd & E1000_EECD_SIZE_EX_MASK) >> - E1000_EECD_SIZE_EX_SHIFT); + size = (u16)FIELD_GET(E1000_EECD_SIZE_EX_MASK, eecd);
/* Added to a constant, "size" becomes the left-shift value * for setting word_size. diff --git a/drivers/net/ethernet/intel/e1000e/82571.c b/drivers/net/ethernet/intel/e1000e/82571.c index 0b1e890dd583b..969f855a79ee6 100644 --- a/drivers/net/ethernet/intel/e1000e/82571.c +++ b/drivers/net/ethernet/intel/e1000e/82571.c @@ -157,8 +157,7 @@ static s32 e1000_init_nvm_params_82571(struct e1000_hw *hw) fallthrough; default: nvm->type = e1000_nvm_eeprom_spi; - size = (u16)((eecd & E1000_EECD_SIZE_EX_MASK) >> - E1000_EECD_SIZE_EX_SHIFT); + size = (u16)FIELD_GET(E1000_EECD_SIZE_EX_MASK, eecd); /* Added to a constant, "size" becomes the left-shift value * for setting word_size. */ diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c index 9835e6a90d56c..fc0f98ea61332 100644 --- a/drivers/net/ethernet/intel/e1000e/ethtool.c +++ b/drivers/net/ethernet/intel/e1000e/ethtool.c @@ -654,8 +654,8 @@ static void e1000_get_drvinfo(struct net_device *netdev, */ snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version), "%d.%d-%d", - (adapter->eeprom_vers & 0xF000) >> 12, - (adapter->eeprom_vers & 0x0FF0) >> 4, + FIELD_GET(0xF000, adapter->eeprom_vers), + FIELD_GET(0x0FF0, adapter->eeprom_vers), (adapter->eeprom_vers & 0x000F));
strscpy(drvinfo->bus_info, pci_name(adapter->pdev), @@ -925,8 +925,7 @@ static int e1000_reg_test(struct e1000_adapter *adapter, u64 *data) }
if (mac->type >= e1000_pch_lpt) - wlock_mac = (er32(FWSM) & E1000_FWSM_WLOCK_MAC_MASK) >> - E1000_FWSM_WLOCK_MAC_SHIFT; + wlock_mac = FIELD_GET(E1000_FWSM_WLOCK_MAC_MASK, er32(FWSM));
for (i = 0; i < mac->rar_entry_count; i++) { if (mac->type >= e1000_pch_lpt) { diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c index 39e9fc601bf5a..a2788fd5f8bb8 100644 --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c @@ -1072,13 +1072,11 @@ static s32 e1000_platform_pm_pch_lpt(struct e1000_hw *hw, bool link)
lat_enc_d = (lat_enc & E1000_LTRV_VALUE_MASK) * (1U << (E1000_LTRV_SCALE_FACTOR * - ((lat_enc & E1000_LTRV_SCALE_MASK) - >> E1000_LTRV_SCALE_SHIFT))); + FIELD_GET(E1000_LTRV_SCALE_MASK, lat_enc)));
max_ltr_enc_d = (max_ltr_enc & E1000_LTRV_VALUE_MASK) * - (1U << (E1000_LTRV_SCALE_FACTOR * - ((max_ltr_enc & E1000_LTRV_SCALE_MASK) - >> E1000_LTRV_SCALE_SHIFT))); + (1U << (E1000_LTRV_SCALE_FACTOR * + FIELD_GET(E1000_LTRV_SCALE_MASK, max_ltr_enc)));
if (lat_enc_d > max_ltr_enc_d) lat_enc = max_ltr_enc; @@ -2075,8 +2073,7 @@ static s32 e1000_write_smbus_addr(struct e1000_hw *hw) { u16 phy_data; u32 strap = er32(STRAP); - u32 freq = (strap & E1000_STRAP_SMT_FREQ_MASK) >> - E1000_STRAP_SMT_FREQ_SHIFT; + u32 freq = FIELD_GET(E1000_STRAP_SMT_FREQ_MASK, strap); s32 ret_val;
strap &= E1000_STRAP_SMBUS_ADDRESS_MASK; @@ -2562,8 +2559,7 @@ void e1000_copy_rx_addrs_to_phy_ich8lan(struct e1000_hw *hw) hw->phy.ops.write_reg_page(hw, BM_RAR_H(i), (u16)(mac_reg & 0xFFFF)); hw->phy.ops.write_reg_page(hw, BM_RAR_CTRL(i), - (u16)((mac_reg & E1000_RAH_AV) - >> 16)); + FIELD_GET(E1000_RAH_AV, mac_reg)); }
e1000_disable_phy_wakeup_reg_access_bm(hw, &phy_reg); @@ -3205,7 +3201,7 @@ static s32 e1000_valid_nvm_bank_detect_ich8lan(struct e1000_hw *hw, u32 *bank) &nvm_dword); if (ret_val) return ret_val; - sig_byte = (u8)((nvm_dword & 0xFF00) >> 8); + sig_byte = FIELD_GET(0xFF00, nvm_dword); if ((sig_byte & E1000_ICH_NVM_VALID_SIG_MASK) == E1000_ICH_NVM_SIG_VALUE) { *bank = 0; @@ -3218,7 +3214,7 @@ static s32 e1000_valid_nvm_bank_detect_ich8lan(struct e1000_hw *hw, u32 *bank) &nvm_dword); if (ret_val) return ret_val; - sig_byte = (u8)((nvm_dword & 0xFF00) >> 8); + sig_byte = FIELD_GET(0xFF00, nvm_dword); if ((sig_byte & E1000_ICH_NVM_VALID_SIG_MASK) == E1000_ICH_NVM_SIG_VALUE) { *bank = 1; diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c index 5df7ad93f3d77..30515bfb259ea 100644 --- a/drivers/net/ethernet/intel/e1000e/mac.c +++ b/drivers/net/ethernet/intel/e1000e/mac.c @@ -52,7 +52,7 @@ void e1000_set_lan_id_multi_port_pcie(struct e1000_hw *hw) * for the device regardless of function swap state. */ reg = er32(STATUS); - bus->func = (reg & E1000_STATUS_FUNC_MASK) >> E1000_STATUS_FUNC_SHIFT; + bus->func = FIELD_GET(E1000_STATUS_FUNC_MASK, reg); }
/** diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index f536c856727cb..af5d9d97a0d6c 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -1788,8 +1788,7 @@ static irqreturn_t e1000_intr_msi(int __always_unused irq, void *data) adapter->corr_errors += pbeccsts & E1000_PBECCSTS_CORR_ERR_CNT_MASK; adapter->uncorr_errors += - (pbeccsts & E1000_PBECCSTS_UNCORR_ERR_CNT_MASK) >> - E1000_PBECCSTS_UNCORR_ERR_CNT_SHIFT; + FIELD_GET(E1000_PBECCSTS_UNCORR_ERR_CNT_MASK, pbeccsts);
/* Do the reset outside of interrupt context */ schedule_work(&adapter->reset_task); @@ -1868,8 +1867,7 @@ static irqreturn_t e1000_intr(int __always_unused irq, void *data) adapter->corr_errors += pbeccsts & E1000_PBECCSTS_CORR_ERR_CNT_MASK; adapter->uncorr_errors += - (pbeccsts & E1000_PBECCSTS_UNCORR_ERR_CNT_MASK) >> - E1000_PBECCSTS_UNCORR_ERR_CNT_SHIFT; + FIELD_GET(E1000_PBECCSTS_UNCORR_ERR_CNT_MASK, pbeccsts);
/* Do the reset outside of interrupt context */ schedule_work(&adapter->reset_task); @@ -5031,8 +5029,7 @@ static void e1000e_update_stats(struct e1000_adapter *adapter) adapter->corr_errors += pbeccsts & E1000_PBECCSTS_CORR_ERR_CNT_MASK; adapter->uncorr_errors += - (pbeccsts & E1000_PBECCSTS_UNCORR_ERR_CNT_MASK) >> - E1000_PBECCSTS_UNCORR_ERR_CNT_SHIFT; + FIELD_GET(E1000_PBECCSTS_UNCORR_ERR_CNT_MASK, pbeccsts); } }
@@ -6249,7 +6246,7 @@ static int e1000_init_phy_wakeup(struct e1000_adapter *adapter, u32 wufc) phy_reg |= BM_RCTL_MPE; phy_reg &= ~(BM_RCTL_MO_MASK); if (mac_reg & E1000_RCTL_MO_3) - phy_reg |= (((mac_reg & E1000_RCTL_MO_3) >> E1000_RCTL_MO_SHIFT) + phy_reg |= (FIELD_GET(E1000_RCTL_MO_3, mac_reg) << BM_RCTL_MO_SHIFT); if (mac_reg & E1000_RCTL_BAM) phy_reg |= BM_RCTL_BAM; diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c index 08c3d477dd6f7..96ff0ca561b6c 100644 --- a/drivers/net/ethernet/intel/e1000e/phy.c +++ b/drivers/net/ethernet/intel/e1000e/phy.c @@ -154,10 +154,9 @@ s32 e1000e_read_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 *data) e_dbg("MDI Read PHY Reg Address %d Error\n", offset); return -E1000_ERR_PHY; } - if (((mdic & E1000_MDIC_REG_MASK) >> E1000_MDIC_REG_SHIFT) != offset) { + if (FIELD_GET(E1000_MDIC_REG_MASK, mdic) != offset) { e_dbg("MDI Read offset error - requested %d, returned %d\n", - offset, - (mdic & E1000_MDIC_REG_MASK) >> E1000_MDIC_REG_SHIFT); + offset, FIELD_GET(E1000_MDIC_REG_MASK, mdic)); return -E1000_ERR_PHY; } *data = (u16)mdic; @@ -167,7 +166,6 @@ s32 e1000e_read_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 *data) */ if (hw->mac.type == e1000_pch2lan) udelay(100); - return 0; }
@@ -218,10 +216,9 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 data) e_dbg("MDI Write PHY Red Address %d Error\n", offset); return -E1000_ERR_PHY; } - if (((mdic & E1000_MDIC_REG_MASK) >> E1000_MDIC_REG_SHIFT) != offset) { + if (FIELD_GET(E1000_MDIC_REG_MASK, mdic) != offset) { e_dbg("MDI Write offset error - requested %d, returned %d\n", - offset, - (mdic & E1000_MDIC_REG_MASK) >> E1000_MDIC_REG_SHIFT); + offset, FIELD_GET(E1000_MDIC_REG_MASK, mdic)); return -E1000_ERR_PHY; }
@@ -1793,8 +1790,7 @@ s32 e1000e_get_cable_length_m88(struct e1000_hw *hw) if (ret_val) return ret_val;
- index = ((phy_data & M88E1000_PSSR_CABLE_LENGTH) >> - M88E1000_PSSR_CABLE_LENGTH_SHIFT); + index = FIELD_GET(M88E1000_PSSR_CABLE_LENGTH, phy_data);
if (index >= M88E1000_CABLE_LENGTH_TABLE_SIZE - 1) return -E1000_ERR_PHY; @@ -3234,8 +3230,7 @@ s32 e1000_get_cable_length_82577(struct e1000_hw *hw) if (ret_val) return ret_val;
- length = ((phy_data & I82577_DSTATUS_CABLE_LENGTH) >> - I82577_DSTATUS_CABLE_LENGTH_SHIFT); + length = FIELD_GET(I82577_DSTATUS_CABLE_LENGTH, phy_data);
if (length == E1000_CABLE_LENGTH_UNDEFINED) return -E1000_ERR_PHY; diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c index ae700a1807c65..aed5e0bf6313e 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c +++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c @@ -1576,8 +1576,7 @@ static s32 fm10k_get_fault_pf(struct fm10k_hw *hw, int type, if (func & FM10K_FAULT_FUNC_PF) fault->func = 0; else - fault->func = 1 + ((func & FM10K_FAULT_FUNC_VF_MASK) >> - FM10K_FAULT_FUNC_VF_SHIFT); + fault->func = 1 + FIELD_GET(FM10K_FAULT_FUNC_VF_MASK, func);
/* record fault type */ fault->type = func & FM10K_FAULT_FUNC_TYPE_MASK; diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_vf.c b/drivers/net/ethernet/intel/fm10k/fm10k_vf.c index c50928ec14fff..7fb1961f29210 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k_vf.c +++ b/drivers/net/ethernet/intel/fm10k/fm10k_vf.c @@ -127,15 +127,14 @@ static s32 fm10k_init_hw_vf(struct fm10k_hw *hw) hw->mac.max_queues = i;
/* fetch default VLAN and ITR scale */ - hw->mac.default_vid = (fm10k_read_reg(hw, FM10K_TXQCTL(0)) & - FM10K_TXQCTL_VID_MASK) >> FM10K_TXQCTL_VID_SHIFT; + hw->mac.default_vid = FIELD_GET(FM10K_TXQCTL_VID_MASK, + fm10k_read_reg(hw, FM10K_TXQCTL(0))); /* Read the ITR scale from TDLEN. See the definition of * FM10K_TDLEN_ITR_SCALE_SHIFT for more information about how TDLEN is * used here. */ - hw->mac.itr_scale = (fm10k_read_reg(hw, FM10K_TDLEN(0)) & - FM10K_TDLEN_ITR_SCALE_MASK) >> - FM10K_TDLEN_ITR_SCALE_SHIFT; + hw->mac.itr_scale = FIELD_GET(FM10K_TDLEN_ITR_SCALE_MASK, + fm10k_read_reg(hw, FM10K_TDLEN(0)));
return 0;
diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.c b/drivers/net/ethernet/intel/igb/e1000_82575.c index 8d6e44ee1895a..64dfc362d1dc4 100644 --- a/drivers/net/ethernet/intel/igb/e1000_82575.c +++ b/drivers/net/ethernet/intel/igb/e1000_82575.c @@ -222,8 +222,7 @@ static s32 igb_init_phy_params_82575(struct e1000_hw *hw) }
/* set lan id */ - hw->bus.func = (rd32(E1000_STATUS) & E1000_STATUS_FUNC_MASK) >> - E1000_STATUS_FUNC_SHIFT; + hw->bus.func = FIELD_GET(E1000_STATUS_FUNC_MASK, rd32(E1000_STATUS));
/* Set phy->phy_addr and phy->id. */ ret_val = igb_get_phy_id_82575(hw); @@ -262,8 +261,8 @@ static s32 igb_init_phy_params_82575(struct e1000_hw *hw) if (ret_val) goto out;
- data = (data & E1000_M88E1112_MAC_CTRL_1_MODE_MASK) >> - E1000_M88E1112_MAC_CTRL_1_MODE_SHIFT; + data = FIELD_GET(E1000_M88E1112_MAC_CTRL_1_MODE_MASK, + data); if (data == E1000_M88E1112_AUTO_COPPER_SGMII || data == E1000_M88E1112_AUTO_COPPER_BASEX) hw->mac.ops.check_for_link = @@ -330,8 +329,7 @@ static s32 igb_init_nvm_params_82575(struct e1000_hw *hw) u32 eecd = rd32(E1000_EECD); u16 size;
- size = (u16)((eecd & E1000_EECD_SIZE_EX_MASK) >> - E1000_EECD_SIZE_EX_SHIFT); + size = FIELD_GET(E1000_EECD_SIZE_EX_MASK, eecd);
/* Added to a constant, "size" becomes the left-shift value * for setting word_size. @@ -2798,7 +2796,7 @@ static s32 igb_get_thermal_sensor_data_generic(struct e1000_hw *hw) return 0;
hw->nvm.ops.read(hw, ets_offset, 1, &ets_cfg); - if (((ets_cfg & NVM_ETS_TYPE_MASK) >> NVM_ETS_TYPE_SHIFT) + if (FIELD_GET(NVM_ETS_TYPE_MASK, ets_cfg) != NVM_ETS_TYPE_EMC) return E1000_NOT_IMPLEMENTED;
@@ -2808,10 +2806,8 @@ static s32 igb_get_thermal_sensor_data_generic(struct e1000_hw *hw)
for (i = 1; i < num_sensors; i++) { hw->nvm.ops.read(hw, (ets_offset + i), 1, &ets_sensor); - sensor_index = ((ets_sensor & NVM_ETS_DATA_INDEX_MASK) >> - NVM_ETS_DATA_INDEX_SHIFT); - sensor_location = ((ets_sensor & NVM_ETS_DATA_LOC_MASK) >> - NVM_ETS_DATA_LOC_SHIFT); + sensor_index = FIELD_GET(NVM_ETS_DATA_INDEX_MASK, ets_sensor); + sensor_location = FIELD_GET(NVM_ETS_DATA_LOC_MASK, ets_sensor);
if (sensor_location != 0) hw->phy.ops.read_i2c_byte(hw, @@ -2859,20 +2855,17 @@ static s32 igb_init_thermal_sensor_thresh_generic(struct e1000_hw *hw) return 0;
hw->nvm.ops.read(hw, ets_offset, 1, &ets_cfg); - if (((ets_cfg & NVM_ETS_TYPE_MASK) >> NVM_ETS_TYPE_SHIFT) + if (FIELD_GET(NVM_ETS_TYPE_MASK, ets_cfg) != NVM_ETS_TYPE_EMC) return E1000_NOT_IMPLEMENTED;
- low_thresh_delta = ((ets_cfg & NVM_ETS_LTHRES_DELTA_MASK) >> - NVM_ETS_LTHRES_DELTA_SHIFT); + low_thresh_delta = FIELD_GET(NVM_ETS_LTHRES_DELTA_MASK, ets_cfg); num_sensors = (ets_cfg & NVM_ETS_NUM_SENSORS_MASK);
for (i = 1; i <= num_sensors; i++) { hw->nvm.ops.read(hw, (ets_offset + i), 1, &ets_sensor); - sensor_index = ((ets_sensor & NVM_ETS_DATA_INDEX_MASK) >> - NVM_ETS_DATA_INDEX_SHIFT); - sensor_location = ((ets_sensor & NVM_ETS_DATA_LOC_MASK) >> - NVM_ETS_DATA_LOC_SHIFT); + sensor_index = FIELD_GET(NVM_ETS_DATA_INDEX_MASK, ets_sensor); + sensor_location = FIELD_GET(NVM_ETS_DATA_LOC_MASK, ets_sensor); therm_limit = ets_sensor & NVM_ETS_DATA_HTHRESH_MASK;
hw->phy.ops.write_i2c_byte(hw, diff --git a/drivers/net/ethernet/intel/igb/e1000_i210.c b/drivers/net/ethernet/intel/igb/e1000_i210.c index 53b396fd194a3..503b239868e8e 100644 --- a/drivers/net/ethernet/intel/igb/e1000_i210.c +++ b/drivers/net/ethernet/intel/igb/e1000_i210.c @@ -473,7 +473,7 @@ s32 igb_read_invm_version(struct e1000_hw *hw, /* Check if we have second version location used */ else if ((i == 1) && ((*record & E1000_INVM_VER_FIELD_TWO) == 0)) { - version = (*record & E1000_INVM_VER_FIELD_ONE) >> 3; + version = FIELD_GET(E1000_INVM_VER_FIELD_ONE, *record); status = 0; break; } @@ -483,8 +483,8 @@ s32 igb_read_invm_version(struct e1000_hw *hw, else if ((((*record & E1000_INVM_VER_FIELD_ONE) == 0) && ((*record & 0x3) == 0)) || (((*record & 0x3) != 0) && (i != 1))) { - version = (*next_record & E1000_INVM_VER_FIELD_TWO) - >> 13; + version = FIELD_GET(E1000_INVM_VER_FIELD_TWO, + *next_record); status = 0; break; } @@ -493,15 +493,15 @@ s32 igb_read_invm_version(struct e1000_hw *hw, */ else if (((*record & E1000_INVM_VER_FIELD_TWO) == 0) && ((*record & 0x3) == 0)) { - version = (*record & E1000_INVM_VER_FIELD_ONE) >> 3; + version = FIELD_GET(E1000_INVM_VER_FIELD_ONE, *record); status = 0; break; } }
if (!status) { - invm_ver->invm_major = (version & E1000_INVM_MAJOR_MASK) - >> E1000_INVM_MAJOR_SHIFT; + invm_ver->invm_major = FIELD_GET(E1000_INVM_MAJOR_MASK, + version); invm_ver->invm_minor = version & E1000_INVM_MINOR_MASK; } /* Read Image Type */ @@ -520,7 +520,8 @@ s32 igb_read_invm_version(struct e1000_hw *hw, ((*record & E1000_INVM_IMGTYPE_FIELD) == 0)) || ((((*record & 0x3) != 0) && (i != 1)))) { invm_ver->invm_img_type = - (*next_record & E1000_INVM_IMGTYPE_FIELD) >> 23; + FIELD_GET(E1000_INVM_IMGTYPE_FIELD, + *next_record); status = 0; break; } diff --git a/drivers/net/ethernet/intel/igb/e1000_mac.c b/drivers/net/ethernet/intel/igb/e1000_mac.c index caf91c6f52b4d..ceaec2cf08a43 100644 --- a/drivers/net/ethernet/intel/igb/e1000_mac.c +++ b/drivers/net/ethernet/intel/igb/e1000_mac.c @@ -56,7 +56,7 @@ s32 igb_get_bus_info_pcie(struct e1000_hw *hw) }
reg = rd32(E1000_STATUS); - bus->func = (reg & E1000_STATUS_FUNC_MASK) >> E1000_STATUS_FUNC_SHIFT; + bus->func = FIELD_GET(E1000_STATUS_FUNC_MASK, reg);
return 0; } diff --git a/drivers/net/ethernet/intel/igb/e1000_nvm.c b/drivers/net/ethernet/intel/igb/e1000_nvm.c index 0da57e89593a0..2dcd64d6dec31 100644 --- a/drivers/net/ethernet/intel/igb/e1000_nvm.c +++ b/drivers/net/ethernet/intel/igb/e1000_nvm.c @@ -708,10 +708,10 @@ void igb_get_fw_version(struct e1000_hw *hw, struct e1000_fw_version *fw_vers) */ if ((etrack_test & NVM_MAJOR_MASK) != NVM_ETRACK_VALID) { hw->nvm.ops.read(hw, NVM_VERSION, 1, &fw_version); - fw_vers->eep_major = (fw_version & NVM_MAJOR_MASK) - >> NVM_MAJOR_SHIFT; - fw_vers->eep_minor = (fw_version & NVM_MINOR_MASK) - >> NVM_MINOR_SHIFT; + fw_vers->eep_major = FIELD_GET(NVM_MAJOR_MASK, + fw_version); + fw_vers->eep_minor = FIELD_GET(NVM_MINOR_MASK, + fw_version); fw_vers->eep_build = (fw_version & NVM_IMAGE_ID_MASK); goto etrack_id; } @@ -753,15 +753,13 @@ void igb_get_fw_version(struct e1000_hw *hw, struct e1000_fw_version *fw_vers) return; } hw->nvm.ops.read(hw, NVM_VERSION, 1, &fw_version); - fw_vers->eep_major = (fw_version & NVM_MAJOR_MASK) - >> NVM_MAJOR_SHIFT; + fw_vers->eep_major = FIELD_GET(NVM_MAJOR_MASK, fw_version);
/* check for old style version format in newer images*/ if ((fw_version & NVM_NEW_DEC_MASK) == 0x0) { eeprom_verl = (fw_version & NVM_COMB_VER_MASK); } else { - eeprom_verl = (fw_version & NVM_MINOR_MASK) - >> NVM_MINOR_SHIFT; + eeprom_verl = FIELD_GET(NVM_MINOR_MASK, fw_version); } /* Convert minor value to hex before assigning to output struct * Val to be converted will not be higher than 99, per tool output diff --git a/drivers/net/ethernet/intel/igb/e1000_phy.c b/drivers/net/ethernet/intel/igb/e1000_phy.c index 3c1b562a3271c..bed94e50a6693 100644 --- a/drivers/net/ethernet/intel/igb/e1000_phy.c +++ b/drivers/net/ethernet/intel/igb/e1000_phy.c @@ -1682,8 +1682,7 @@ s32 igb_get_cable_length_m88(struct e1000_hw *hw) if (ret_val) goto out;
- index = (phy_data & M88E1000_PSSR_CABLE_LENGTH) >> - M88E1000_PSSR_CABLE_LENGTH_SHIFT; + index = FIELD_GET(M88E1000_PSSR_CABLE_LENGTH, phy_data); if (index >= ARRAY_SIZE(e1000_m88_cable_length_table) - 1) { ret_val = -E1000_ERR_PHY; goto out; @@ -1796,8 +1795,7 @@ s32 igb_get_cable_length_m88_gen2(struct e1000_hw *hw) if (ret_val) goto out;
- index = (phy_data & M88E1000_PSSR_CABLE_LENGTH) >> - M88E1000_PSSR_CABLE_LENGTH_SHIFT; + index = FIELD_GET(M88E1000_PSSR_CABLE_LENGTH, phy_data); if (index >= ARRAY_SIZE(e1000_m88_cable_length_table) - 1) { ret_val = -E1000_ERR_PHY; goto out; @@ -2578,8 +2576,7 @@ s32 igb_get_cable_length_82580(struct e1000_hw *hw) if (ret_val) goto out;
- length = (phy_data & I82580_DSTATUS_CABLE_LENGTH) >> - I82580_DSTATUS_CABLE_LENGTH_SHIFT; + length = FIELD_GET(I82580_DSTATUS_CABLE_LENGTH, phy_data);
if (length == E1000_CABLE_LENGTH_UNDEFINED) ret_val = -E1000_ERR_PHY; diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c index 4ee849985e2b8..92b2be06a6e93 100644 --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c @@ -2434,7 +2434,7 @@ static int igb_get_ts_info(struct net_device *dev, } }
-#define ETHER_TYPE_FULL_MASK ((__force __be16)~0) +#define ETHER_TYPE_FULL_MASK cpu_to_be16(FIELD_MAX(U16_MAX)) static int igb_get_ethtool_nfc_entry(struct igb_adapter *adapter, struct ethtool_rxnfc *cmd) { @@ -2733,8 +2733,8 @@ static int igb_rxnfc_write_vlan_prio_filter(struct igb_adapter *adapter, u32 vlapqf;
vlapqf = rd32(E1000_VLAPQF); - vlan_priority = (ntohs(input->filter.vlan_tci) & VLAN_PRIO_MASK) - >> VLAN_PRIO_SHIFT; + vlan_priority = FIELD_GET(VLAN_PRIO_MASK, + ntohs(input->filter.vlan_tci)); queue_index = (vlapqf >> (vlan_priority * 4)) & E1000_VLAPQF_QUEUE_MASK;
/* check whether this vlan prio is already set */ @@ -2817,7 +2817,7 @@ static void igb_clear_vlan_prio_filter(struct igb_adapter *adapter, u8 vlan_priority; u32 vlapqf;
- vlan_priority = (vlan_tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT; + vlan_priority = FIELD_GET(VLAN_PRIO_MASK, vlan_tci);
vlapqf = rd32(E1000_VLAPQF); vlapqf &= ~E1000_VLAPQF_P_VALID(vlan_priority); diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 11921141b6079..4431e7693d45f 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -7283,7 +7283,7 @@ static int igb_set_vf_promisc(struct igb_adapter *adapter, u32 *msgbuf, u32 vf) static int igb_set_vf_multicasts(struct igb_adapter *adapter, u32 *msgbuf, u32 vf) { - int n = (msgbuf[0] & E1000_VT_MSGINFO_MASK) >> E1000_VT_MSGINFO_SHIFT; + int n = FIELD_GET(E1000_VT_MSGINFO_MASK, msgbuf[0]); u16 *hash_list = (u16 *)&msgbuf[1]; struct vf_data_storage *vf_data = &adapter->vf_data[vf]; int i; @@ -7543,7 +7543,7 @@ static int igb_ndo_set_vf_vlan(struct net_device *netdev, int vf,
static int igb_set_vf_vlan_msg(struct igb_adapter *adapter, u32 *msgbuf, u32 vf) { - int add = (msgbuf[0] & E1000_VT_MSGINFO_MASK) >> E1000_VT_MSGINFO_SHIFT; + int add = FIELD_GET(E1000_VT_MSGINFO_MASK, msgbuf[0]); int vid = (msgbuf[1] & E1000_VLVF_VLANID_MASK); int ret;
diff --git a/drivers/net/ethernet/intel/igbvf/mbx.c b/drivers/net/ethernet/intel/igbvf/mbx.c index a3cd7ac48d4b6..d15282ee5ea8f 100644 --- a/drivers/net/ethernet/intel/igbvf/mbx.c +++ b/drivers/net/ethernet/intel/igbvf/mbx.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2009 - 2018 Intel Corporation. */
+#include <linux/bitfield.h> #include "mbx.h"
/** diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c index c748668bf2fce..c5012fa36af2f 100644 --- a/drivers/net/ethernet/intel/igbvf/netdev.c +++ b/drivers/net/ethernet/intel/igbvf/netdev.c @@ -273,9 +273,8 @@ static bool igbvf_clean_rx_irq(struct igbvf_adapter *adapter, * that case, it fills the header buffer and spills the rest * into the page. */ - hlen = (le16_to_cpu(rx_desc->wb.lower.lo_dword.hs_rss.hdr_info) - & E1000_RXDADV_HDRBUFLEN_MASK) >> - E1000_RXDADV_HDRBUFLEN_SHIFT; + hlen = le16_get_bits(rx_desc->wb.lower.lo_dword.hs_rss.hdr_info, + E1000_RXDADV_HDRBUFLEN_MASK); if (hlen > adapter->rx_ps_hdr_size) hlen = adapter->rx_ps_hdr_size;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c index b2a0f2aaa05be..2e6e0365154a1 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c @@ -684,7 +684,7 @@ void ixgbe_set_lan_id_multi_port_pcie(struct ixgbe_hw *hw) u32 reg;
reg = IXGBE_READ_REG(hw, IXGBE_STATUS); - bus->func = (reg & IXGBE_STATUS_LAN_ID) >> IXGBE_STATUS_LAN_ID_SHIFT; + bus->func = FIELD_GET(IXGBE_STATUS_LAN_ID, reg); bus->lan_id = bus->func;
/* check for a port swap */ @@ -695,8 +695,8 @@ void ixgbe_set_lan_id_multi_port_pcie(struct ixgbe_hw *hw) /* Get MAC instance from EEPROM for configuring CS4227 */ if (hw->device_id == IXGBE_DEV_ID_X550EM_A_SFP) { hw->eeprom.ops.read(hw, IXGBE_EEPROM_CTRL_4, &ee_ctrl_4); - bus->instance_id = (ee_ctrl_4 & IXGBE_EE_CTRL_4_INST_ID) >> - IXGBE_EE_CTRL_4_INST_ID_SHIFT; + bus->instance_id = FIELD_GET(IXGBE_EE_CTRL_4_INST_ID, + ee_ctrl_4); } }
@@ -870,10 +870,9 @@ s32 ixgbe_init_eeprom_params_generic(struct ixgbe_hw *hw) * SPI EEPROM is assumed here. This code would need to * change if a future EEPROM is not SPI. */ - eeprom_size = (u16)((eec & IXGBE_EEC_SIZE) >> - IXGBE_EEC_SIZE_SHIFT); + eeprom_size = FIELD_GET(IXGBE_EEC_SIZE, eec); eeprom->word_size = BIT(eeprom_size + - IXGBE_EEPROM_WORD_SIZE_SHIFT); + IXGBE_EEPROM_WORD_SIZE_SHIFT); }
if (eec & IXGBE_EEC_ADDR_SIZE) @@ -3935,10 +3934,10 @@ s32 ixgbe_get_thermal_sensor_data_generic(struct ixgbe_hw *hw) if (status) return status;
- sensor_index = ((ets_sensor & IXGBE_ETS_DATA_INDEX_MASK) >> - IXGBE_ETS_DATA_INDEX_SHIFT); - sensor_location = ((ets_sensor & IXGBE_ETS_DATA_LOC_MASK) >> - IXGBE_ETS_DATA_LOC_SHIFT); + sensor_index = FIELD_GET(IXGBE_ETS_DATA_INDEX_MASK, + ets_sensor); + sensor_location = FIELD_GET(IXGBE_ETS_DATA_LOC_MASK, + ets_sensor);
if (sensor_location != 0) { status = hw->phy.ops.read_i2c_byte(hw, @@ -3982,8 +3981,7 @@ s32 ixgbe_init_thermal_sensor_thresh_generic(struct ixgbe_hw *hw) if (status) return status;
- low_thresh_delta = ((ets_cfg & IXGBE_ETS_LTHRES_DELTA_MASK) >> - IXGBE_ETS_LTHRES_DELTA_SHIFT); + low_thresh_delta = FIELD_GET(IXGBE_ETS_LTHRES_DELTA_MASK, ets_cfg); num_sensors = (ets_cfg & IXGBE_ETS_NUM_SENSORS_MASK); if (num_sensors > IXGBE_MAX_SENSORS) num_sensors = IXGBE_MAX_SENSORS; @@ -3997,10 +3995,10 @@ s32 ixgbe_init_thermal_sensor_thresh_generic(struct ixgbe_hw *hw) ets_offset + 1 + i); continue; } - sensor_index = ((ets_sensor & IXGBE_ETS_DATA_INDEX_MASK) >> - IXGBE_ETS_DATA_INDEX_SHIFT); - sensor_location = ((ets_sensor & IXGBE_ETS_DATA_LOC_MASK) >> - IXGBE_ETS_DATA_LOC_SHIFT); + sensor_index = FIELD_GET(IXGBE_ETS_DATA_INDEX_MASK, + ets_sensor); + sensor_location = FIELD_GET(IXGBE_ETS_DATA_LOC_MASK, + ets_sensor); therm_limit = ets_sensor & IXGBE_ETS_DATA_HTHRESH_MASK;
hw->phy.ops.write_i2c_byte(hw, diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index cb23aad5953b0..f245f3df40fca 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -11409,7 +11409,7 @@ static pci_ers_result_t ixgbe_io_error_detected(struct pci_dev *pdev, if ((pf_func & 1) == (pdev->devfn & 1)) { unsigned int device_id;
- vf = (req_id & 0x7F) >> 1; + vf = FIELD_GET(0x7F, req_id); e_dev_err("VF %d has caused a PCIe error\n", vf); e_dev_err("TLP: dw0: %8.8x\tdw1: %8.8x\tdw2: " "%8.8x\tdw3: %8.8x\n", diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c index 930dc50719364..f28140a05f091 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c @@ -276,9 +276,8 @@ s32 ixgbe_identify_phy_generic(struct ixgbe_hw *hw) return 0;
if (hw->phy.nw_mng_if_sel) { - phy_addr = (hw->phy.nw_mng_if_sel & - IXGBE_NW_MNG_IF_SEL_MDIO_PHY_ADD) >> - IXGBE_NW_MNG_IF_SEL_MDIO_PHY_ADD_SHIFT; + phy_addr = FIELD_GET(IXGBE_NW_MNG_IF_SEL_MDIO_PHY_ADD, + hw->phy.nw_mng_if_sel); if (ixgbe_probe_phy(hw, phy_addr)) return 0; else @@ -1447,8 +1446,7 @@ s32 ixgbe_reset_phy_nl(struct ixgbe_hw *hw) ret_val = hw->eeprom.ops.read(hw, data_offset, &eword); if (ret_val) goto err_eeprom; - control = (eword & IXGBE_CONTROL_MASK_NL) >> - IXGBE_CONTROL_SHIFT_NL; + control = FIELD_GET(IXGBE_CONTROL_MASK_NL, eword); edata = eword & IXGBE_DATA_MASK_NL; switch (control) { case IXGBE_DELAY_NL: diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c index 198ab9d97618c..d0a6c220a12ac 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c @@ -363,8 +363,7 @@ int ixgbe_pci_sriov_configure(struct pci_dev *dev, int num_vfs) static int ixgbe_set_vf_multicasts(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf) { - int entries = (msgbuf[0] & IXGBE_VT_MSGINFO_MASK) - >> IXGBE_VT_MSGINFO_SHIFT; + int entries = FIELD_GET(IXGBE_VT_MSGINFO_MASK, msgbuf[0]); u16 *hash_list = (u16 *)&msgbuf[1]; struct vf_data_storage *vfinfo = &adapter->vfinfo[vf]; struct ixgbe_hw *hw = &adapter->hw; @@ -971,7 +970,7 @@ static int ixgbe_set_vf_mac_addr(struct ixgbe_adapter *adapter, static int ixgbe_set_vf_vlan_msg(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf) { - u32 add = (msgbuf[0] & IXGBE_VT_MSGINFO_MASK) >> IXGBE_VT_MSGINFO_SHIFT; + u32 add = FIELD_GET(IXGBE_VT_MSGINFO_MASK, msgbuf[0]); u32 vid = (msgbuf[1] & IXGBE_VLVF_VLANID_MASK); u8 tcs = adapter->hw_tcs;
@@ -994,8 +993,7 @@ static int ixgbe_set_vf_macvlan_msg(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf) { u8 *new_mac = ((u8 *)(&msgbuf[1])); - int index = (msgbuf[0] & IXGBE_VT_MSGINFO_MASK) >> - IXGBE_VT_MSGINFO_SHIFT; + int index = FIELD_GET(IXGBE_VT_MSGINFO_MASK, msgbuf[0]); int err;
if (adapter->vfinfo[vf].pf_set_mac && !adapter->vfinfo[vf].trusted && diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c index 15325c549d9b5..57a912e4653fc 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c @@ -187,16 +187,16 @@ s32 ixgbe_start_hw_X540(struct ixgbe_hw *hw) s32 ixgbe_init_eeprom_params_X540(struct ixgbe_hw *hw) { struct ixgbe_eeprom_info *eeprom = &hw->eeprom; - u32 eec; - u16 eeprom_size;
if (eeprom->type == ixgbe_eeprom_uninitialized) { + u16 eeprom_size; + u32 eec; + eeprom->semaphore_delay = 10; eeprom->type = ixgbe_flash;
eec = IXGBE_READ_REG(hw, IXGBE_EEC(hw)); - eeprom_size = (u16)((eec & IXGBE_EEC_SIZE) >> - IXGBE_EEC_SIZE_SHIFT); + eeprom_size = FIELD_GET(IXGBE_EEC_SIZE, eec); eeprom->word_size = BIT(eeprom_size + IXGBE_EEPROM_WORD_SIZE_SHIFT);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c index cdc912bba8089..c1adc94a5a657 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c @@ -630,16 +630,16 @@ static s32 ixgbe_fc_autoneg_fw(struct ixgbe_hw *hw) static s32 ixgbe_init_eeprom_params_X550(struct ixgbe_hw *hw) { struct ixgbe_eeprom_info *eeprom = &hw->eeprom; - u32 eec; - u16 eeprom_size;
if (eeprom->type == ixgbe_eeprom_uninitialized) { + u16 eeprom_size; + u32 eec; + eeprom->semaphore_delay = 10; eeprom->type = ixgbe_flash;
eec = IXGBE_READ_REG(hw, IXGBE_EEC(hw)); - eeprom_size = (u16)((eec & IXGBE_EEC_SIZE) >> - IXGBE_EEC_SIZE_SHIFT); + eeprom_size = FIELD_GET(IXGBE_EEC_SIZE, eec); eeprom->word_size = BIT(eeprom_size + IXGBE_EEPROM_WORD_SIZE_SHIFT);
@@ -714,8 +714,7 @@ static s32 ixgbe_read_iosf_sb_reg_x550(struct ixgbe_hw *hw, u32 reg_addr, ret = ixgbe_iosf_wait(hw, &command);
if ((command & IXGBE_SB_IOSF_CTRL_RESP_STAT_MASK) != 0) { - error = (command & IXGBE_SB_IOSF_CTRL_CMPL_ERR_MASK) >> - IXGBE_SB_IOSF_CTRL_CMPL_ERR_SHIFT; + error = FIELD_GET(IXGBE_SB_IOSF_CTRL_CMPL_ERR_MASK, command); hw_dbg(hw, "Failed to read, error %x\n", error); ret = -EIO; goto out; @@ -1415,8 +1414,7 @@ static s32 ixgbe_write_iosf_sb_reg_x550(struct ixgbe_hw *hw, u32 reg_addr, ret = ixgbe_iosf_wait(hw, &command);
if ((command & IXGBE_SB_IOSF_CTRL_RESP_STAT_MASK) != 0) { - error = (command & IXGBE_SB_IOSF_CTRL_CMPL_ERR_MASK) >> - IXGBE_SB_IOSF_CTRL_CMPL_ERR_SHIFT; + error = FIELD_GET(IXGBE_SB_IOSF_CTRL_CMPL_ERR_MASK, command); hw_dbg(hw, "Failed to write, error %x\n", error); return -EIO; } @@ -3229,9 +3227,8 @@ static void ixgbe_read_mng_if_sel_x550em(struct ixgbe_hw *hw) */ if (hw->mac.type == ixgbe_mac_x550em_a && hw->phy.nw_mng_if_sel & IXGBE_NW_MNG_IF_SEL_MDIO_ACT) { - hw->phy.mdio.prtad = (hw->phy.nw_mng_if_sel & - IXGBE_NW_MNG_IF_SEL_MDIO_PHY_ADD) >> - IXGBE_NW_MNG_IF_SEL_MDIO_PHY_ADD_SHIFT; + hw->phy.mdio.prtad = FIELD_GET(IXGBE_NW_MNG_IF_SEL_MDIO_PHY_ADD, + hw->phy.nw_mng_if_sel); } }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vitaly Lifshits vitaly.lifshits@intel.com
[ Upstream commit 6dbdd4de0362c37e54e8b049781402e5a409e7d0 ]
On some Meteor Lake systems, accessing the PHY via the MDIO interface may result in an MDI error. This issue happens sporadically, and in most cases a second access to the PHY succeeds.
As a workaround, introduce a retry counter so that Meteor Lake systems make up to 3 consecutive PHY access attempts; the driver only returns an error if all of them fail. The retry mechanism is disabled in specific flows where MDI errors are expected.
Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake") Suggested-by: Nikolay Mushayev nikolay.mushayev@intel.com Co-developed-by: Nir Efrati nir.efrati@intel.com Signed-off-by: Nir Efrati nir.efrati@intel.com Signed-off-by: Vitaly Lifshits vitaly.lifshits@intel.com Tested-by: Naama Meir naamax.meir@linux.intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/e1000e/hw.h | 2 + drivers/net/ethernet/intel/e1000e/ich8lan.c | 33 ++++ drivers/net/ethernet/intel/e1000e/phy.c | 182 ++++++++++++-------- drivers/net/ethernet/intel/e1000e/phy.h | 2 + 4 files changed, 150 insertions(+), 69 deletions(-)
diff --git a/drivers/net/ethernet/intel/e1000e/hw.h b/drivers/net/ethernet/intel/e1000e/hw.h index 1fef6bb5a5fbc..4b6e7536170ab 100644 --- a/drivers/net/ethernet/intel/e1000e/hw.h +++ b/drivers/net/ethernet/intel/e1000e/hw.h @@ -628,6 +628,7 @@ struct e1000_phy_info { u32 id; u32 reset_delay_us; /* in usec */ u32 revision; + u32 retry_count;
enum e1000_media_type media_type;
@@ -644,6 +645,7 @@ struct e1000_phy_info { bool polarity_correction; bool speed_downgraded; bool autoneg_wait_to_complete; + bool retry_enabled; };
struct e1000_nvm_info { diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c index a2788fd5f8bb8..717c52913e84b 100644 --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c @@ -222,11 +222,18 @@ static bool e1000_phy_is_accessible_pchlan(struct e1000_hw *hw) if (hw->mac.type >= e1000_pch_lpt) { /* Only unforce SMBus if ME is not active */ if (!(er32(FWSM) & E1000_ICH_FWSM_FW_VALID)) { + /* Switching PHY interface always returns MDI error + * so disable retry mechanism to avoid wasting time + */ + e1000e_disable_phy_retry(hw); + /* Unforce SMBus mode in PHY */ e1e_rphy_locked(hw, CV_SMB_CTRL, &phy_reg); phy_reg &= ~CV_SMB_CTRL_FORCE_SMBUS; e1e_wphy_locked(hw, CV_SMB_CTRL, phy_reg);
+ e1000e_enable_phy_retry(hw); + /* Unforce SMBus mode in MAC */ mac_reg = er32(CTRL_EXT); mac_reg &= ~E1000_CTRL_EXT_FORCE_SMBUS; @@ -310,6 +317,11 @@ static s32 e1000_init_phy_workarounds_pchlan(struct e1000_hw *hw) goto out; }
+ /* There is no guarantee that the PHY is accessible at this time + * so disable retry mechanism to avoid wasting time + */ + e1000e_disable_phy_retry(hw); + /* The MAC-PHY interconnect may be in SMBus mode. If the PHY is * inaccessible and resetting the PHY is not blocked, toggle the * LANPHYPC Value bit to force the interconnect to PCIe mode. @@ -380,6 +392,8 @@ static s32 e1000_init_phy_workarounds_pchlan(struct e1000_hw *hw) break; }
+ e1000e_enable_phy_retry(hw); + hw->phy.ops.release(hw); if (!ret_val) {
@@ -449,6 +463,11 @@ static s32 e1000_init_phy_params_pchlan(struct e1000_hw *hw)
phy->id = e1000_phy_unknown;
+ if (hw->mac.type == e1000_pch_mtp) { + phy->retry_count = 2; + e1000e_enable_phy_retry(hw); + } + ret_val = e1000_init_phy_workarounds_pchlan(hw); if (ret_val) return ret_val; @@ -1146,6 +1165,11 @@ s32 e1000_enable_ulp_lpt_lp(struct e1000_hw *hw, bool to_sx) if (ret_val) goto out;
+ /* Switching PHY interface always returns MDI error + * so disable retry mechanism to avoid wasting time + */ + e1000e_disable_phy_retry(hw); + /* Force SMBus mode in PHY */ ret_val = e1000_read_phy_reg_hv_locked(hw, CV_SMB_CTRL, &phy_reg); if (ret_val) @@ -1153,6 +1177,8 @@ s32 e1000_enable_ulp_lpt_lp(struct e1000_hw *hw, bool to_sx) phy_reg |= CV_SMB_CTRL_FORCE_SMBUS; e1000_write_phy_reg_hv_locked(hw, CV_SMB_CTRL, phy_reg);
+ e1000e_enable_phy_retry(hw); + /* Force SMBus mode in MAC */ mac_reg = er32(CTRL_EXT); mac_reg |= E1000_CTRL_EXT_FORCE_SMBUS; @@ -1313,6 +1339,11 @@ static s32 e1000_disable_ulp_lpt_lp(struct e1000_hw *hw, bool force) /* Toggle LANPHYPC Value bit */ e1000_toggle_lanphypc_pch_lpt(hw);
+ /* Switching PHY interface always returns MDI error + * so disable retry mechanism to avoid wasting time + */ + e1000e_disable_phy_retry(hw); + /* Unforce SMBus mode in PHY */ ret_val = e1000_read_phy_reg_hv_locked(hw, CV_SMB_CTRL, &phy_reg); if (ret_val) { @@ -1333,6 +1364,8 @@ static s32 e1000_disable_ulp_lpt_lp(struct e1000_hw *hw, bool force) phy_reg &= ~CV_SMB_CTRL_FORCE_SMBUS; e1000_write_phy_reg_hv_locked(hw, CV_SMB_CTRL, phy_reg);
+ e1000e_enable_phy_retry(hw); + /* Unforce SMBus mode in MAC */ mac_reg = er32(CTRL_EXT); mac_reg &= ~E1000_CTRL_EXT_FORCE_SMBUS; diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c index 96ff0ca561b6c..395746bcf8f7c 100644 --- a/drivers/net/ethernet/intel/e1000e/phy.c +++ b/drivers/net/ethernet/intel/e1000e/phy.c @@ -107,6 +107,16 @@ s32 e1000e_phy_reset_dsp(struct e1000_hw *hw) return e1e_wphy(hw, M88E1000_PHY_GEN_CONTROL, 0); }
+void e1000e_disable_phy_retry(struct e1000_hw *hw) +{ + hw->phy.retry_enabled = false; +} + +void e1000e_enable_phy_retry(struct e1000_hw *hw) +{ + hw->phy.retry_enabled = true; +} + /** * e1000e_read_phy_reg_mdic - Read MDI control register * @hw: pointer to the HW structure @@ -118,55 +128,73 @@ s32 e1000e_phy_reset_dsp(struct e1000_hw *hw) **/ s32 e1000e_read_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 *data) { + u32 i, mdic = 0, retry_counter, retry_max; struct e1000_phy_info *phy = &hw->phy; - u32 i, mdic = 0; + bool success;
if (offset > MAX_PHY_REG_ADDRESS) { e_dbg("PHY Address %d is out of range\n", offset); return -E1000_ERR_PARAM; }
+ retry_max = phy->retry_enabled ? phy->retry_count : 0; + /* Set up Op-code, Phy Address, and register offset in the MDI * Control register. The MAC will take care of interfacing with the * PHY to retrieve the desired data. */ - mdic = ((offset << E1000_MDIC_REG_SHIFT) | - (phy->addr << E1000_MDIC_PHY_SHIFT) | - (E1000_MDIC_OP_READ)); + for (retry_counter = 0; retry_counter <= retry_max; retry_counter++) { + success = true;
- ew32(MDIC, mdic); + mdic = ((offset << E1000_MDIC_REG_SHIFT) | + (phy->addr << E1000_MDIC_PHY_SHIFT) | + (E1000_MDIC_OP_READ));
- /* Poll the ready bit to see if the MDI read completed - * Increasing the time out as testing showed failures with - * the lower time out - */ - for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) { - udelay(50); - mdic = er32(MDIC); - if (mdic & E1000_MDIC_READY) - break; - } - if (!(mdic & E1000_MDIC_READY)) { - e_dbg("MDI Read PHY Reg Address %d did not complete\n", offset); - return -E1000_ERR_PHY; - } - if (mdic & E1000_MDIC_ERROR) { - e_dbg("MDI Read PHY Reg Address %d Error\n", offset); - return -E1000_ERR_PHY; - } - if (FIELD_GET(E1000_MDIC_REG_MASK, mdic) != offset) { - e_dbg("MDI Read offset error - requested %d, returned %d\n", - offset, FIELD_GET(E1000_MDIC_REG_MASK, mdic)); - return -E1000_ERR_PHY; + ew32(MDIC, mdic); + + /* Poll the ready bit to see if the MDI read completed + * Increasing the time out as testing showed failures with + * the lower time out + */ + for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) { + usleep_range(50, 60); + mdic = er32(MDIC); + if (mdic & E1000_MDIC_READY) + break; + } + if (!(mdic & E1000_MDIC_READY)) { + e_dbg("MDI Read PHY Reg Address %d did not complete\n", + offset); + success = false; + } + if (mdic & E1000_MDIC_ERROR) { + e_dbg("MDI Read PHY Reg Address %d Error\n", offset); + success = false; + } + if (FIELD_GET(E1000_MDIC_REG_MASK, mdic) != offset) { + e_dbg("MDI Read offset error - requested %d, returned %d\n", + offset, FIELD_GET(E1000_MDIC_REG_MASK, mdic)); + success = false; + } + + /* Allow some time after each MDIC transaction to avoid + * reading duplicate data in the next MDIC transaction. + */ + if (hw->mac.type == e1000_pch2lan) + usleep_range(100, 150); + + if (success) { + *data = (u16)mdic; + return 0; + } + + if (retry_counter != retry_max) { + e_dbg("Perform retry on PHY transaction...\n"); + mdelay(10); + } } - *data = (u16)mdic;
- /* Allow some time after each MDIC transaction to avoid - * reading duplicate data in the next MDIC transaction. - */ - if (hw->mac.type == e1000_pch2lan) - udelay(100); - return 0; + return -E1000_ERR_PHY; }
/** @@ -179,56 +207,72 @@ s32 e1000e_read_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 *data) **/ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 data) { + u32 i, mdic = 0, retry_counter, retry_max; struct e1000_phy_info *phy = &hw->phy; - u32 i, mdic = 0; + bool success;
if (offset > MAX_PHY_REG_ADDRESS) { e_dbg("PHY Address %d is out of range\n", offset); return -E1000_ERR_PARAM; }
+ retry_max = phy->retry_enabled ? phy->retry_count : 0; + /* Set up Op-code, Phy Address, and register offset in the MDI * Control register. The MAC will take care of interfacing with the * PHY to retrieve the desired data. */ - mdic = (((u32)data) | - (offset << E1000_MDIC_REG_SHIFT) | - (phy->addr << E1000_MDIC_PHY_SHIFT) | - (E1000_MDIC_OP_WRITE)); + for (retry_counter = 0; retry_counter <= retry_max; retry_counter++) { + success = true;
- ew32(MDIC, mdic); + mdic = (((u32)data) | + (offset << E1000_MDIC_REG_SHIFT) | + (phy->addr << E1000_MDIC_PHY_SHIFT) | + (E1000_MDIC_OP_WRITE));
- /* Poll the ready bit to see if the MDI read completed - * Increasing the time out as testing showed failures with - * the lower time out - */ - for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) { - udelay(50); - mdic = er32(MDIC); - if (mdic & E1000_MDIC_READY) - break; - } - if (!(mdic & E1000_MDIC_READY)) { - e_dbg("MDI Write PHY Reg Address %d did not complete\n", offset); - return -E1000_ERR_PHY; - } - if (mdic & E1000_MDIC_ERROR) { - e_dbg("MDI Write PHY Red Address %d Error\n", offset); - return -E1000_ERR_PHY; - } - if (FIELD_GET(E1000_MDIC_REG_MASK, mdic) != offset) { - e_dbg("MDI Write offset error - requested %d, returned %d\n", - offset, FIELD_GET(E1000_MDIC_REG_MASK, mdic)); - return -E1000_ERR_PHY; - } + ew32(MDIC, mdic);
- /* Allow some time after each MDIC transaction to avoid - * reading duplicate data in the next MDIC transaction. - */ - if (hw->mac.type == e1000_pch2lan) - udelay(100); + /* Poll the ready bit to see if the MDI read completed + * Increasing the time out as testing showed failures with + * the lower time out + */ + for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) { + usleep_range(50, 60); + mdic = er32(MDIC); + if (mdic & E1000_MDIC_READY) + break; + } + if (!(mdic & E1000_MDIC_READY)) { + e_dbg("MDI Write PHY Reg Address %d did not complete\n", + offset); + success = false; + } + if (mdic & E1000_MDIC_ERROR) { + e_dbg("MDI Write PHY Reg Address %d Error\n", offset); + success = false; + } + if (FIELD_GET(E1000_MDIC_REG_MASK, mdic) != offset) { + e_dbg("MDI Write offset error - requested %d, returned %d\n", + offset, FIELD_GET(E1000_MDIC_REG_MASK, mdic)); + success = false; + }
- return 0; + /* Allow some time after each MDIC transaction to avoid + * reading duplicate data in the next MDIC transaction. + */ + if (hw->mac.type == e1000_pch2lan) + usleep_range(100, 150); + + if (success) + return 0; + + if (retry_counter != retry_max) { + e_dbg("Perform retry on PHY transaction...\n"); + mdelay(10); + } + } + + return -E1000_ERR_PHY; }
/** diff --git a/drivers/net/ethernet/intel/e1000e/phy.h b/drivers/net/ethernet/intel/e1000e/phy.h index c48777d095235..049bb325b4b14 100644 --- a/drivers/net/ethernet/intel/e1000e/phy.h +++ b/drivers/net/ethernet/intel/e1000e/phy.h @@ -51,6 +51,8 @@ s32 e1000e_read_phy_reg_bm2(struct e1000_hw *hw, u32 offset, u16 *data); s32 e1000e_write_phy_reg_bm2(struct e1000_hw *hw, u32 offset, u16 data); void e1000_power_up_phy_copper(struct e1000_hw *hw); void e1000_power_down_phy_copper(struct e1000_hw *hw); +void e1000e_disable_phy_retry(struct e1000_hw *hw); +void e1000e_enable_phy_retry(struct e1000_hw *hw); s32 e1000e_read_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 *data); s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 data); s32 e1000_read_phy_reg_hv(struct e1000_hw *hw, u32 offset, u16 *data);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vitaly Lifshits vitaly.lifshits@intel.com
[ Upstream commit 662200e324daebe6859c1f0f3ea1538b0561425a ]
Add curly braces to avoid entering an if statement where it is not always required in the e1000_shutdown() function. This improves code readability and might prevent non-deterministic behaviour in the future.
Signed-off-by: Vitaly Lifshits vitaly.lifshits@intel.com Tested-by: Naama Meir naamax.meir@linux.intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Link: https://lore.kernel.org/r/20240301184806.2634508-5-anthony.l.nguyen@intel.co... Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 861e8086029e ("e1000e: move force SMBUS from enable ulp function to avoid PHY loss issue") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/e1000e/netdev.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index af5d9d97a0d6c..cc8c531ec3dff 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -6688,14 +6688,14 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool runtime) if (adapter->hw.phy.type == e1000_phy_igp_3) { e1000e_igp3_phy_powerdown_workaround_ich8lan(&adapter->hw); } else if (hw->mac.type >= e1000_pch_lpt) { - if (wufc && !(wufc & (E1000_WUFC_EX | E1000_WUFC_MC | E1000_WUFC_BC))) + if (wufc && !(wufc & (E1000_WUFC_EX | E1000_WUFC_MC | E1000_WUFC_BC))) { /* ULP does not support wake from unicast, multicast * or broadcast. */ retval = e1000_enable_ulp_lpt_lp(hw, !runtime); - - if (retval) - return retval; + if (retval) + return retval; + } }
/* Ensure that the appropriate bits are set in LPI_CTRL
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vitaly Lifshits vitaly.lifshits@intel.com
[ Upstream commit 861e8086029e003305750b4126ecd6617465f5c7 ]
Forcing SMBUS inside the ULP enabling flow leads to sporadic PHY loss on some systems. It is suspected to be caused by initiating PHY transactions before the interface settles.
Separating this configuration from the ULP enabling flow and moving it to the shutdown function allows enough time for the interface to settle and avoids adding a delay.
Fixes: 6607c99e7034 ("e1000e: i219 - fix to enable both ULP and EEE in Sx state") Co-developed-by: Dima Ruinskiy dima.ruinskiy@intel.com Signed-off-by: Dima Ruinskiy dima.ruinskiy@intel.com Signed-off-by: Vitaly Lifshits vitaly.lifshits@intel.com Tested-by: Naama Meir naamax.meir@linux.intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/e1000e/ich8lan.c | 19 ------------------- drivers/net/ethernet/intel/e1000e/netdev.c | 18 ++++++++++++++++++ 2 files changed, 18 insertions(+), 19 deletions(-)
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c index 717c52913e84b..4d83c9a0c023a 100644 --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c @@ -1165,25 +1165,6 @@ s32 e1000_enable_ulp_lpt_lp(struct e1000_hw *hw, bool to_sx) if (ret_val) goto out;
- /* Switching PHY interface always returns MDI error - * so disable retry mechanism to avoid wasting time - */ - e1000e_disable_phy_retry(hw); - - /* Force SMBus mode in PHY */ - ret_val = e1000_read_phy_reg_hv_locked(hw, CV_SMB_CTRL, &phy_reg); - if (ret_val) - goto release; - phy_reg |= CV_SMB_CTRL_FORCE_SMBUS; - e1000_write_phy_reg_hv_locked(hw, CV_SMB_CTRL, phy_reg); - - e1000e_enable_phy_retry(hw); - - /* Force SMBus mode in MAC */ - mac_reg = er32(CTRL_EXT); - mac_reg |= E1000_CTRL_EXT_FORCE_SMBUS; - ew32(CTRL_EXT, mac_reg); - /* Si workaround for ULP entry flow on i127/rev6 h/w. Enable * LPLU and disable Gig speed when entering ULP */ diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index cc8c531ec3dff..3692fce201959 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -6623,6 +6623,7 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool runtime) struct e1000_hw *hw = &adapter->hw; u32 ctrl, ctrl_ext, rctl, status, wufc; int retval = 0; + u16 smb_ctrl;
/* Runtime suspend should only enable wakeup for link changes */ if (runtime) @@ -6696,6 +6697,23 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool runtime) if (retval) return retval; } + + /* Force SMBUS to allow WOL */ + /* Switching PHY interface always returns MDI error + * so disable retry mechanism to avoid wasting time + */ + e1000e_disable_phy_retry(hw); + + e1e_rphy(hw, CV_SMB_CTRL, &smb_ctrl); + smb_ctrl |= CV_SMB_CTRL_FORCE_SMBUS; + e1e_wphy(hw, CV_SMB_CTRL, smb_ctrl); + + e1000e_enable_phy_retry(hw); + + /* Force SMBus mode in MAC */ + ctrl_ext = er32(CTRL_EXT); + ctrl_ext |= E1000_CTRL_EXT_FORCE_SMBUS; + ew32(CTRL_EXT, ctrl_ext); }
/* Ensure that the appropriate bits are set in LPI_CTRL
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Claudiu Beznea claudiu.beznea.uj@bp.renesas.com
[ Upstream commit 2b993bfdb47b3aaafd8fe9cd5038b5e297b18ee1 ]
The initial ravb_poll() code interrogated the first descriptor of the RX queue when gPTP is false to determine whether ravb_rx() should be called; this is done for non-gPTP IPs. For gPTP IPs, the driver's PTP-specific information was used instead to decide whether the receive function should be called. As every IP has its own receive function that interrogates the RX descriptor list in the same way ravb_poll() did, there is no need to double-check this in ravb_poll(). Removing the code from ravb_poll() leads to cleaner code.
Signed-off-by: Claudiu Beznea claudiu.beznea.uj@bp.renesas.com Reviewed-by: Sergey Shtylyov s.shtylyov@omp.ru Signed-off-by: Paolo Abeni pabeni@redhat.com Stable-dep-of: 596a4254915f ("net: ravb: Always process TX descriptor ring") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/renesas/ravb_main.c | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index 8fec0dbbbe7bb..b87e9252ea176 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -1288,25 +1288,16 @@ static int ravb_poll(struct napi_struct *napi, int budget) struct net_device *ndev = napi->dev; struct ravb_private *priv = netdev_priv(ndev); const struct ravb_hw_info *info = priv->info; - bool gptp = info->gptp || info->ccc_gac; - struct ravb_rx_desc *desc; unsigned long flags; int q = napi - priv->napi; int mask = BIT(q); int quota = budget; - unsigned int entry;
- if (!gptp) { - entry = priv->cur_rx[q] % priv->num_rx_ring[q]; - desc = &priv->gbeth_rx_ring[entry]; - } /* Processing RX Descriptor Ring */ /* Clear RX interrupt */ ravb_write(ndev, ~(mask | RIS0_RESERVED), RIS0); - if (gptp || desc->die_dt != DT_FEMPTY) { - if (ravb_rx(ndev, "a, q)) - goto out; - } + if (ravb_rx(ndev, "a, q)) + goto out;
/* Processing TX Descriptor Ring */ spin_lock_irqsave(&priv->lock, flags);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul Barker paul.barker.ct@bp.renesas.com
[ Upstream commit 596a4254915f94c927217fe09c33a6828f33fb25 ]
The TX queue should be serviced each time the poll function is called, even if the full RX work budget has been consumed. This prevents starvation of the TX queue when RX bandwidth usage is high.
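A minimal sketch of the poll ordering this fix establishes follows; the fake_* names are illustrative and this is not the ravb code itself, just the shape: RX is polled against the budget, TX completion always runs, and completion/interrupt re-enable only happens when RX did not exhaust the budget.

/* illustrative userspace sketch, not driver code */
#include <stdbool.h>
#include <stdio.h>

struct fake_dev {
	int rx_backlog;		/* packets waiting in the RX ring */
	int tx_pending;		/* completed TX descriptors to reclaim */
};

/* returns true when the budget ran out before the RX ring drained */
static bool fake_rx(struct fake_dev *d, int *quota)
{
	while (d->rx_backlog && *quota) {
		d->rx_backlog--;
		(*quota)--;
	}
	return d->rx_backlog && !*quota;
}

static void fake_tx_clean(struct fake_dev *d)
{
	d->tx_pending = 0;
}

static int fake_poll(struct fake_dev *d, int budget)
{
	int quota = budget;
	bool unmask = !fake_rx(d, &quota);

	fake_tx_clean(d);	/* always serviced, even when RX used the budget */

	if (unmask) {
		/* napi_complete() and interrupt re-enable would go here */
	}
	return budget - quota;
}

int main(void)
{
	struct fake_dev d = { .rx_backlog = 100, .tx_pending = 5 };

	printf("processed %d of budget 64\n", fake_poll(&d, 64));
	printf("tx_pending now %d\n", d.tx_pending);
	return 0;
}

With this shape, a saturated RX ring can no longer keep TX reclaim from running, which is the starvation the fix removes.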
Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Paul Barker paul.barker.ct@bp.renesas.com Reviewed-by: Sergey Shtylyov s.shtylyov@omp.ru Link: https://lore.kernel.org/r/20240402145305.82148-1-paul.barker.ct@bp.renesas.c... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/renesas/ravb_main.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index b87e9252ea176..14595d23b1903 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -1292,12 +1292,12 @@ static int ravb_poll(struct napi_struct *napi, int budget) int q = napi - priv->napi; int mask = BIT(q); int quota = budget; + bool unmask;
/* Processing RX Descriptor Ring */ /* Clear RX interrupt */ ravb_write(ndev, ~(mask | RIS0_RESERVED), RIS0); - if (ravb_rx(ndev, "a, q)) - goto out; + unmask = !ravb_rx(ndev, "a, q);
/* Processing TX Descriptor Ring */ spin_lock_irqsave(&priv->lock, flags); @@ -1307,6 +1307,9 @@ static int ravb_poll(struct napi_struct *napi, int budget) netif_wake_subqueue(ndev, q); spin_unlock_irqrestore(&priv->lock, flags);
+ if (!unmask) + goto out; + napi_complete(napi);
/* Re-enable RX/TX interrupts */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul Barker paul.barker.ct@bp.renesas.com
[ Upstream commit 101b76418d7163240bc74a7e06867dca0e51183e ]
The error statistics should be updated each time the poll function is called, even if the full RX work budget has been consumed. This prevents the counts from becoming stuck when RX bandwidth usage is high.
This also ensures that error counters are not updated after we've re-enabled interrupts as that could result in a race condition.
Also drop an unnecessary space.
Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Paul Barker paul.barker.ct@bp.renesas.com Reviewed-by: Sergey Shtylyov s.shtylyov@omp.ru Link: https://lore.kernel.org/r/20240402145305.82148-2-paul.barker.ct@bp.renesas.c... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/renesas/ravb_main.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index 14595d23b1903..c6897e6ea362d 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -1307,6 +1307,15 @@ static int ravb_poll(struct napi_struct *napi, int budget) netif_wake_subqueue(ndev, q); spin_unlock_irqrestore(&priv->lock, flags);
+ /* Receive error message handling */ + priv->rx_over_errors = priv->stats[RAVB_BE].rx_over_errors; + if (info->nc_queues) + priv->rx_over_errors += priv->stats[RAVB_NC].rx_over_errors; + if (priv->rx_over_errors != ndev->stats.rx_over_errors) + ndev->stats.rx_over_errors = priv->rx_over_errors; + if (priv->rx_fifo_errors != ndev->stats.rx_fifo_errors) + ndev->stats.rx_fifo_errors = priv->rx_fifo_errors; + if (!unmask) goto out;
@@ -1323,14 +1332,6 @@ static int ravb_poll(struct napi_struct *napi, int budget) } spin_unlock_irqrestore(&priv->lock, flags);
- /* Receive error message handling */ - priv->rx_over_errors = priv->stats[RAVB_BE].rx_over_errors; - if (info->nc_queues) - priv->rx_over_errors += priv->stats[RAVB_NC].rx_over_errors; - if (priv->rx_over_errors != ndev->stats.rx_over_errors) - ndev->stats.rx_over_errors = priv->rx_over_errors; - if (priv->rx_fifo_errors != ndev->stats.rx_fifo_errors) - ndev->stats.rx_fifo_errors = priv->rx_fifo_errors; out: return budget - quota; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
[ Upstream commit 466eec4a22a76c462781bf6d45cb02cbedf21a61 ]
Convert all local ASID variables and parameters throughout the SEV code from signed integers to unsigned integers, as ASIDs are fundamentally unsigned values, and the global min/max variables are already unsigned integers.
Functionally, this is a glorified nop as KVM guarantees min_sev_asid is non-zero, and no CPU supports -1u as the _only_ asid, i.e. the signed vs. unsigned goof won't cause problems in practice.
Opportunistically use sev_get_asid() in sev_flush_encrypted_page() instead of open coding an equivalent.
Reviewed-by: Tom Lendacky thomas.lendacky@amd.com Link: https://lore.kernel.org/r/20240131235609.4161407-3-seanjc@google.com Signed-off-by: Sean Christopherson seanjc@google.com Stable-dep-of: 0aa6b90ef9d7 ("KVM: SVM: Add support for allowing zero SEV ASIDs") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/svm/sev.c | 18 ++++++++++-------- arch/x86/kvm/trace.h | 10 +++++----- 2 files changed, 15 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index e86231c3b8a54..ea68a08cc89c2 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -84,9 +84,10 @@ struct enc_region { };
/* Called with the sev_bitmap_lock held, or on shutdown */ -static int sev_flush_asids(int min_asid, int max_asid) +static int sev_flush_asids(unsigned int min_asid, unsigned int max_asid) { - int ret, asid, error = 0; + int ret, error = 0; + unsigned int asid;
/* Check if there are any ASIDs to reclaim before performing a flush */ asid = find_next_bit(sev_reclaim_asid_bitmap, nr_asids, min_asid); @@ -116,7 +117,7 @@ static inline bool is_mirroring_enc_context(struct kvm *kvm) }
/* Must be called with the sev_bitmap_lock held */ -static bool __sev_recycle_asids(int min_asid, int max_asid) +static bool __sev_recycle_asids(unsigned int min_asid, unsigned int max_asid) { if (sev_flush_asids(min_asid, max_asid)) return false; @@ -143,8 +144,9 @@ static void sev_misc_cg_uncharge(struct kvm_sev_info *sev)
static int sev_asid_new(struct kvm_sev_info *sev) { - int asid, min_asid, max_asid, ret; + unsigned int asid, min_asid, max_asid; bool retry = true; + int ret;
WARN_ON(sev->misc_cg); sev->misc_cg = get_current_misc_cg(); @@ -187,7 +189,7 @@ static int sev_asid_new(struct kvm_sev_info *sev) return ret; }
-static int sev_get_asid(struct kvm *kvm) +static unsigned int sev_get_asid(struct kvm *kvm) { struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -284,8 +286,8 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
static int sev_bind_asid(struct kvm *kvm, unsigned int handle, int *error) { + unsigned int asid = sev_get_asid(kvm); struct sev_data_activate activate; - int asid = sev_get_asid(kvm); int ret;
/* activate ASID on the given handle */ @@ -2314,7 +2316,7 @@ int sev_cpu_init(struct svm_cpu_data *sd) */ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va) { - int asid = to_kvm_svm(vcpu->kvm)->sev_info.asid; + unsigned int asid = sev_get_asid(vcpu->kvm);
/* * Note! The address must be a kernel address, as regular page walk @@ -2632,7 +2634,7 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm) void pre_sev_run(struct vcpu_svm *svm, int cpu) { struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, cpu); - int asid = sev_get_asid(svm->vcpu.kvm); + unsigned int asid = sev_get_asid(svm->vcpu.kvm);
/* Assign the asid allocated with this SEV guest */ svm->asid = asid; diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h index 83843379813ee..b82e6ed4f0241 100644 --- a/arch/x86/kvm/trace.h +++ b/arch/x86/kvm/trace.h @@ -732,13 +732,13 @@ TRACE_EVENT(kvm_nested_intr_vmexit, * Tracepoint for nested #vmexit because of interrupt pending */ TRACE_EVENT(kvm_invlpga, - TP_PROTO(__u64 rip, int asid, u64 address), + TP_PROTO(__u64 rip, unsigned int asid, u64 address), TP_ARGS(rip, asid, address),
TP_STRUCT__entry( - __field( __u64, rip ) - __field( int, asid ) - __field( __u64, address ) + __field( __u64, rip ) + __field( unsigned int, asid ) + __field( __u64, address ) ),
TP_fast_assign( @@ -747,7 +747,7 @@ TRACE_EVENT(kvm_invlpga, __entry->address = address; ),
- TP_printk("rip: 0x%016llx asid: %d address: 0x%016llx", + TP_printk("rip: 0x%016llx asid: %u address: 0x%016llx", __entry->rip, __entry->asid, __entry->address) );
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ashish Kalra ashish.kalra@amd.com
[ Upstream commit 0aa6b90ef9d75b4bd7b6d106d85f2a3437697f91 ]
Some BIOSes allow the end user to set the minimum SEV ASID value (CPUID 0x8000001F_EDX) to be greater than the maximum number of encrypted guests, or maximum SEV ASID value (CPUID 0x8000001F_ECX) in order to dedicate all the SEV ASIDs to SEV-ES or SEV-SNP.
The SEV support, as coded, does not handle the case where the minimum SEV ASID value can be greater than the maximum SEV ASID value. As a result, the following confusing message is issued:
[ 30.715724] kvm_amd: SEV enabled (ASIDs 1007 - 1006)
Fix the support to properly handle this case.
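As a rough stand-alone illustration of the arithmetic (plain C, not KVM code; the numbers are the ones from the log above): when the minimum ends up above the maximum, the plain-SEV range is empty while SEV-ES still gets ASIDs 1 to min - 1, so the setup code must skip the SEV capacity registration and report the range as unusable.

#include <stdio.h>

int main(void)
{
    /* Values taken from the log line above. */
    unsigned int min_sev_asid = 1007, max_sev_asid = 1006;

    /* SEV-ES guests use ASIDs 1 .. min_sev_asid - 1,
     * plain SEV guests use min_sev_asid .. max_sev_asid. */
    unsigned int sev_es_count = min_sev_asid - 1;
    unsigned int sev_count = min_sev_asid <= max_sev_asid ?
                             max_sev_asid - min_sev_asid + 1 : 0;

    printf("SEV-ES ASIDs: %u, SEV ASIDs: %u\n", sev_es_count, sev_count);
    return 0;
}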
Fixes: 916391a2d1dc ("KVM: SVM: Add support for SEV-ES capability in KVM") Suggested-by: Sean Christopherson seanjc@google.com Signed-off-by: Ashish Kalra ashish.kalra@amd.com Cc: stable@vger.kernel.org Acked-by: Tom Lendacky thomas.lendacky@amd.com Link: https://lore.kernel.org/r/20240104190520.62510-1-Ashish.Kalra@amd.com Link: https://lore.kernel.org/r/20240131235609.4161407-4-seanjc@google.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/svm/sev.c | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index ea68a08cc89c2..c5845f31c34dc 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -144,10 +144,21 @@ static void sev_misc_cg_uncharge(struct kvm_sev_info *sev)
static int sev_asid_new(struct kvm_sev_info *sev) { - unsigned int asid, min_asid, max_asid; + /* + * SEV-enabled guests must use asid from min_sev_asid to max_sev_asid. + * SEV-ES-enabled guest can use from 1 to min_sev_asid - 1. + * Note: min ASID can end up larger than the max if basic SEV support is + * effectively disabled by disallowing use of ASIDs for SEV guests. + */ + unsigned int min_asid = sev->es_active ? 1 : min_sev_asid; + unsigned int max_asid = sev->es_active ? min_sev_asid - 1 : max_sev_asid; + unsigned int asid; bool retry = true; int ret;
+ if (min_asid > max_asid) + return -ENOTTY; + WARN_ON(sev->misc_cg); sev->misc_cg = get_current_misc_cg(); ret = sev_misc_cg_try_charge(sev); @@ -159,12 +170,6 @@ static int sev_asid_new(struct kvm_sev_info *sev)
mutex_lock(&sev_bitmap_lock);
- /* - * SEV-enabled guests must use asid from min_sev_asid to max_sev_asid. - * SEV-ES-enabled guest can use from 1 to min_sev_asid - 1. - */ - min_asid = sev->es_active ? 1 : min_sev_asid; - max_asid = sev->es_active ? min_sev_asid - 1 : max_sev_asid; again: asid = find_next_zero_bit(sev_asid_bitmap, max_asid + 1, min_asid); if (asid > max_asid) { @@ -2236,8 +2241,10 @@ void __init sev_hardware_setup(void) goto out; }
- sev_asid_count = max_sev_asid - min_sev_asid + 1; - WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV, sev_asid_count)); + if (min_sev_asid <= max_sev_asid) { + sev_asid_count = max_sev_asid - min_sev_asid + 1; + WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV, sev_asid_count)); + } sev_supported = true;
/* SEV-ES support requested? */ @@ -2268,7 +2275,9 @@ void __init sev_hardware_setup(void) out: if (boot_cpu_has(X86_FEATURE_SEV)) pr_info("SEV %s (ASIDs %u - %u)\n", - sev_supported ? "enabled" : "disabled", + sev_supported ? min_sev_asid <= max_sev_asid ? "enabled" : + "unusable" : + "disabled", min_sev_asid, max_sev_asid); if (boot_cpu_has(X86_FEATURE_SEV_ES)) pr_info("SEV-ES %s (ASIDs %u - %u)\n",
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jann Horn jannh@google.com
[ Upstream commit 055ca83559912f2cfd91c9441427bac4caf3c74e ]
When you try to splice between a normal pipe and a notification pipe, get_pipe_info(..., true) fails, so splice() falls back to treating the notification pipe like a normal pipe - so we end up in iter_file_splice_write(), which first locks the input pipe, then calls vfs_iter_write(), which locks the output pipe.
Lockdep complains about that, because we're taking a pipe lock while already holding another pipe lock.
I think this probably (?) can't actually lead to deadlocks, since you'd need another way to nest locking a normal pipe into locking a watch_queue pipe, but the lockdep annotations don't make that clear.
Bail out earlier in pipe_write() for notification pipes, before taking the pipe lock.
Reported-and-tested-by: syzbot+011e4ea1da6692cf881c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=011e4ea1da6692cf881c Fixes: c73be61cede5 ("pipe: Add general notification queue support") Signed-off-by: Jann Horn jannh@google.com Link: https://lore.kernel.org/r/20231124150822.2121798-1-jannh@google.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/pipe.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/fs/pipe.c b/fs/pipe.c index a234035cc375d..ba4376341ddd2 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -425,6 +425,18 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) bool was_empty = false; bool wake_next_writer = false;
+ /* + * Reject writing to watch queue pipes before the point where we lock + * the pipe. + * Otherwise, lockdep would be unhappy if the caller already has another + * pipe locked. + * If we had to support locking a normal pipe and a notification pipe at + * the same time, we could set up lockdep annotations for that, but + * since we don't actually need that, it's simpler to just bail here. + */ + if (pipe_has_watch_queue(pipe)) + return -EXDEV; + /* Null write succeeds. */ if (unlikely(total_len == 0)) return 0; @@ -437,11 +449,6 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) goto out; }
- if (pipe_has_watch_queue(pipe)) { - ret = -EXDEV; - goto out; - } - /* * If it wasn't empty we try to merge new data into * the last buffer.
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dominique Martinet asmadeus@codewreck.org
[ Upstream commit be3193e58ec210b2a72fb1134c2a0695088a911d ]
The previous conversion to iov missed these debug statements, which would now always print the requested size instead of the actual server reply.
The write path also gained a loop in a much older commit, but we did not report each iteration there, while the read path does -- it is more coherent to report every request sent to the server, so move that debug statement at the same time.
Fixes: 7f02464739da ("9p: convert to advancing variant of iov_iter_get_pages_alloc()") Signed-off-by: Dominique Martinet asmadeus@codewreck.org Message-ID: 20240109-9p-rw-trace-v1-1-327178114257@codewreck.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/9p/client.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/net/9p/client.c b/net/9p/client.c index e265a0ca6bddd..f7e90b4769bba 100644 --- a/net/9p/client.c +++ b/net/9p/client.c @@ -1583,7 +1583,7 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to, received = rsize; }
- p9_debug(P9_DEBUG_9P, "<<< RREAD count %d\n", count); + p9_debug(P9_DEBUG_9P, "<<< RREAD count %d\n", received);
if (non_zc) { int n = copy_to_iter(dataptr, received, to); @@ -1609,9 +1609,6 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err) int total = 0; *err = 0;
- p9_debug(P9_DEBUG_9P, ">>> TWRITE fid %d offset %llu count %zd\n", - fid->fid, offset, iov_iter_count(from)); - while (iov_iter_count(from)) { int count = iov_iter_count(from); int rsize = fid->iounit; @@ -1623,6 +1620,9 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err) if (count < rsize) rsize = count;
+ p9_debug(P9_DEBUG_9P, ">>> TWRITE fid %d offset %llu count %d (/%d)\n", + fid->fid, offset, rsize, count); + /* Don't bother zerocopy for small IO (< 1024) */ if (clnt->trans_mod->zc_request && rsize > 1024) { req = p9_client_zc_rpc(clnt, P9_TWRITE, NULL, from, 0, @@ -1650,7 +1650,7 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err) written = rsize; }
- p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", count); + p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", written);
p9_req_put(clnt, req); iov_iter_revert(from, count - written - iov_iter_count(from));
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit f193957b0fbbba397c8bddedf158b3bf7e4850fc ]
wm_adsp_write_ctl() must hold the pwr_lock mutex when calling cs_dsp_get_ctl().
This was previously partially fixed by commit 781118bc2fc1 ("ASoC: wm_adsp: Fix missing locking in wm_adsp_[read|write]_ctl()") but this only put locking around the call to cs_dsp_coeff_write_ctrl(), missing the call to cs_dsp_get_ctl().
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: 781118bc2fc1 ("ASoC: wm_adsp: Fix missing locking in wm_adsp_[read|write]_ctl()") Link: https://msgid.link/r/20240307110227.41421-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/wm_adsp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/sound/soc/codecs/wm_adsp.c b/sound/soc/codecs/wm_adsp.c index 72b90a7ee4b68..b9c20e29fe63e 100644 --- a/sound/soc/codecs/wm_adsp.c +++ b/sound/soc/codecs/wm_adsp.c @@ -683,11 +683,12 @@ static void wm_adsp_control_remove(struct cs_dsp_coeff_ctl *cs_ctl) int wm_adsp_write_ctl(struct wm_adsp *dsp, const char *name, int type, unsigned int alg, void *buf, size_t len) { - struct cs_dsp_coeff_ctl *cs_ctl = cs_dsp_get_ctl(&dsp->cs_dsp, name, type, alg); + struct cs_dsp_coeff_ctl *cs_ctl; struct wm_coeff_ctl *ctl; int ret;
mutex_lock(&dsp->cs_dsp.pwr_lock); + cs_ctl = cs_dsp_get_ctl(&dsp->cs_dsp, name, type, alg); ret = cs_dsp_coeff_write_ctrl(cs_ctl, 0, buf, len); mutex_unlock(&dsp->cs_dsp.pwr_lock);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pu Lehui pulehui@huawei.com
[ Upstream commit ea6873118493019474abbf57d5a800da365734df ]
The RISC-V perf driver does not yet support branch sampling. Although the specification is in the works [0], it is best to disable such events until support is available; otherwise we will get unexpected results. Because of this, two riscv bpf testcases, get_branch_snapshot and perf_branches/perf_branches_hw, fail.
Link: https://github.com/riscv/riscv-control-transfer-records [0] Fixes: f5bfa23f576f ("RISC-V: Add a perf core library for pmu drivers") Signed-off-by: Pu Lehui pulehui@huawei.com Reviewed-by: Atish Patra atishp@rivosinc.com Reviewed-by: Conor Dooley conor.dooley@microchip.com Link: https://lore.kernel.org/r/20240312012053.1178140-1-pulehui@huaweicloud.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/perf/riscv_pmu.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c index c78a6fd6c57f6..b4efdddb2ad91 100644 --- a/drivers/perf/riscv_pmu.c +++ b/drivers/perf/riscv_pmu.c @@ -313,6 +313,10 @@ static int riscv_pmu_event_init(struct perf_event *event) u64 event_config = 0; uint64_t cmask;
+ /* driver does not support branch stack sampling */ + if (has_branch_stack(event)) + return -EOPNOTSUPP; + hwc->flags = 0; mapped_event = rvpmu->event_map(event, &event_config); if (mapped_event < 0) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Victor Isaev victor@torrio.net
[ Upstream commit 13dddf9319808badd2c1f5d7007b4e82838a648e ]
"riscv: signal: Report signal frame size to userspace via auxv" (e92f469) has added new constant AT_MINSIGSTKSZ but failed to increment the size of auxv, keeping AT_VECTOR_SIZE_ARCH at 9. This fix correctly increments AT_VECTOR_SIZE_ARCH to 10, following the approach in the commit 94b07c1 ("arm64: signal: Report signal frame size to userspace via auxv").
Link: https://lore.kernel.org/r/73883406.20231215232720@torrio.net Link: https://lore.kernel.org/all/20240102133617.3649-1-victor@torrio.net/ Reported-by: Ivan Komarov ivan.komarov@dfyz.info Closes: https://lore.kernel.org/linux-riscv/CY3Z02NYV1C4.11BLB9PLVW9G1@fedora/ Fixes: e92f469b0771 ("riscv: signal: Report signal frame size to userspace via auxv") Signed-off-by: Victor Isaev isv@google.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/include/uapi/asm/auxvec.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/include/uapi/asm/auxvec.h b/arch/riscv/include/uapi/asm/auxvec.h index 10aaa83db89ef..95050ebe9ad00 100644 --- a/arch/riscv/include/uapi/asm/auxvec.h +++ b/arch/riscv/include/uapi/asm/auxvec.h @@ -34,7 +34,7 @@ #define AT_L3_CACHEGEOMETRY 47
/* entries in ARCH_DLINFO */ -#define AT_VECTOR_SIZE_ARCH 9 +#define AT_VECTOR_SIZE_ARCH 10 #define AT_MINSIGSTKSZ 51
#endif /* _UAPI_ASM_RISCV_AUXVEC_H */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit 00bb549d7d63a21532e76e4a334d7807a54d9f31 ]
When keeping the upper end of a cache block entry, the entry[] array must be indexed by the offset from the base register of the block, i.e. max - mas.index.
The code was indexing entry[] by only the register address, leading to an out-of-bounds access that copied some part of the kernel memory over the cache contents.
This bug was not detected by the regmap KUnit test because it only tests with a block of registers starting at 0, so mas.index == 0.
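A stand-alone sketch of the indexing mistake (the register numbers and values below are made up; this is not the regmap code): for a cache block that starts at a non-zero base register, the slice kept above `max` begins at offset max - base + 1 within the block's value array, not at register number max + 1.

#include <stdio.h>

int main(void)
{
    /* A cached block of 8 registers starting at register 0x10. */
    unsigned long base = 0x10;
    unsigned long entry[8] = { 100, 101, 102, 103, 104, 105, 106, 107 };

    /* Drop registers [0x12, 0x14]; keep the upper slice starting at 0x15. */
    unsigned long max = 0x14;

    unsigned long wrong_idx = max + 1;          /* 0x15: far past the array */
    unsigned long right_idx = max - base + 1;   /* 5: offset within the block */

    printf("upper slice should start with value %lu (index %lu)\n",
           entry[right_idx], right_idx);
    printf("buggy index would be %lu, past the %zu-element array\n",
           wrong_idx, sizeof(entry) / sizeof(entry[0]));
    return 0;
}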
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: f033c26de5a5 ("regmap: Add maple tree based register cache") Link: https://msgid.link/r/20240327114406.976986-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/base/regmap/regcache-maple.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/base/regmap/regcache-maple.c b/drivers/base/regmap/regcache-maple.c index 41edd6a430eb4..c1776127a5724 100644 --- a/drivers/base/regmap/regcache-maple.c +++ b/drivers/base/regmap/regcache-maple.c @@ -145,7 +145,7 @@ static int regcache_maple_drop(struct regmap *map, unsigned int min, upper_index = max + 1; upper_last = mas.last;
- upper = kmemdup(&entry[max + 1], + upper = kmemdup(&entry[max - mas.index + 1], ((mas.last - max) * sizeof(unsigned long)), map->alloc_flags);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Simon Trimmer simont@opensource.cirrus.com
[ Upstream commit 2d0401ee38d43ab0e4cdd02dfc9d402befb2b5c8 ]
Adding the ACPI HIDs to the match table triggers the cs35l56-hda modules to be loaded on boot so that Serial Multi Instantiate can add the devices to the bus and begin the driver init sequence.
Signed-off-by: Simon Trimmer simont@opensource.cirrus.com Fixes: 73cfbfa9caea ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier") Message-ID: 20240328121355.18972-1-simont@opensource.cirrus.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/hda/cs35l56_hda_i2c.c | 13 +++++++++++-- sound/pci/hda/cs35l56_hda_spi.c | 13 +++++++++++-- 2 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/sound/pci/hda/cs35l56_hda_i2c.c b/sound/pci/hda/cs35l56_hda_i2c.c index 757a4d193e0fb..c31f60b0421e5 100644 --- a/sound/pci/hda/cs35l56_hda_i2c.c +++ b/sound/pci/hda/cs35l56_hda_i2c.c @@ -49,10 +49,19 @@ static const struct i2c_device_id cs35l56_hda_i2c_id[] = { {} };
+static const struct acpi_device_id cs35l56_acpi_hda_match[] = { + { "CSC3554", 0 }, + { "CSC3556", 0 }, + { "CSC3557", 0 }, + {} +}; +MODULE_DEVICE_TABLE(acpi, cs35l56_acpi_hda_match); + static struct i2c_driver cs35l56_hda_i2c_driver = { .driver = { - .name = "cs35l56-hda", - .pm = &cs35l56_hda_pm_ops, + .name = "cs35l56-hda", + .acpi_match_table = cs35l56_acpi_hda_match, + .pm = &cs35l56_hda_pm_ops, }, .id_table = cs35l56_hda_i2c_id, .probe = cs35l56_hda_i2c_probe, diff --git a/sound/pci/hda/cs35l56_hda_spi.c b/sound/pci/hda/cs35l56_hda_spi.c index 756aec342eab7..52c9e04b3c55f 100644 --- a/sound/pci/hda/cs35l56_hda_spi.c +++ b/sound/pci/hda/cs35l56_hda_spi.c @@ -49,10 +49,19 @@ static const struct spi_device_id cs35l56_hda_spi_id[] = { {} };
+static const struct acpi_device_id cs35l56_acpi_hda_match[] = { + { "CSC3554", 0 }, + { "CSC3556", 0 }, + { "CSC3557", 0 }, + {} +}; +MODULE_DEVICE_TABLE(acpi, cs35l56_acpi_hda_match); + static struct spi_driver cs35l56_hda_spi_driver = { .driver = { - .name = "cs35l56-hda", - .pm = &cs35l56_hda_pm_ops, + .name = "cs35l56-hda", + .acpi_match_table = cs35l56_acpi_hda_match, + .pm = &cs35l56_hda_pm_ops, }, .id_table = cs35l56_hda_spi_id, .probe = cs35l56_hda_spi_probe,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Hewitt christianshewitt@gmail.com
[ Upstream commit 2bd02f5a0bac4bb13e0da18652dc75ba0e4958ec ]
Increase the timeout value to prevent the system logs on Amlogic boards from being flooded with power transition warnings:
[ 13.047638] panfrost ffe40000.gpu: shader power transition timeout
[ 13.048674] panfrost ffe40000.gpu: l2 power transition timeout
[ 13.937324] panfrost ffe40000.gpu: shader power transition timeout
[ 13.938351] panfrost ffe40000.gpu: l2 power transition timeout
...
[39829.506904] panfrost ffe40000.gpu: shader power transition timeout
[39829.507938] panfrost ffe40000.gpu: l2 power transition timeout
[39949.508369] panfrost ffe40000.gpu: shader power transition timeout
[39949.509405] panfrost ffe40000.gpu: l2 power transition timeout
The 2000 value has been found through trial and error testing with devices using G52 and G31 GPUs.
Fixes: 22aa1a209018 ("drm/panfrost: Really power off GPU cores in panfrost_gpu_power_off()") Signed-off-by: Christian Hewitt christianshewitt@gmail.com Reviewed-by: Steven Price steven.price@arm.com Reviewed-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Signed-off-by: Steven Price steven.price@arm.com Link: https://patchwork.freedesktop.org/patch/msgid/20240322164525.2617508-1-chris... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/panfrost/panfrost_gpu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c index eca45b83e4e67..c067ff550692a 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c @@ -387,19 +387,19 @@ void panfrost_gpu_power_off(struct panfrost_device *pfdev)
gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present); ret = readl_relaxed_poll_timeout(pfdev->iomem + SHADER_PWRTRANS_LO, - val, !val, 1, 1000); + val, !val, 1, 2000); if (ret) dev_err(pfdev->dev, "shader power transition timeout");
gpu_write(pfdev, TILER_PWROFF_LO, pfdev->features.tiler_present); ret = readl_relaxed_poll_timeout(pfdev->iomem + TILER_PWRTRANS_LO, - val, !val, 1, 1000); + val, !val, 1, 2000); if (ret) dev_err(pfdev->dev, "tiler power transition timeout");
gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present); ret = readl_poll_timeout(pfdev->iomem + L2_PWRTRANS_LO, - val, !val, 0, 1000); + val, !val, 0, 2000); if (ret) dev_err(pfdev->dev, "l2 power transition timeout"); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Airlie airlied@redhat.com
[ Upstream commit be141849ec00ef39935bf169c0f194ac70bf85ce ]
dEQP-VK.sparse_resources.image_rebind.2d_array.r64i.128_128_8 was causing a remap operation like the below.
op_remap: prev: 0000003fffed0000 00000000000f0000 00000000a5abd18a 0000000000000000
op_remap: next:
op_remap: unmap: 0000003fffed0000 0000000000100000 0
op_map: map: 0000003ffffc0000 0000000000010000 000000005b1ba33c 00000000000e0000
This resulted in an unmap operation from 0x3fffed0000+0xf0000 with a range of 0x100000, which corrupted the page tables and oopsed the kernel.
Fix the prev + unmap range calculations to work in terms of start/end and then map back to addr/range.
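Plugging the numbers from the log above into the corrected start/end arithmetic, as a stand-alone sketch rather than driver code:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Original mapping being split (numbers from the log above). */
    uint64_t va_addr = 0x3fffed0000ULL, va_range = 0x100000ULL;
    /* The "prev" piece that is kept; there is no "next" piece here. */
    uint64_t prev_addr = 0x3fffed0000ULL, prev_range = 0xf0000ULL;

    uint64_t addr = prev_addr + prev_range;   /* start of the hole: 0x3ffffc0000 */
    uint64_t end = va_addr + va_range;        /* no next piece: keep original end */

    printf("unmap addr=0x%" PRIx64 " range=0x%" PRIx64 "\n", addr, end - addr);
    /* Prints range 0x10000; the old calculation kept range = 0x100000 and
     * unmapped far past the hole, corrupting the page tables. */
    return 0;
}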
Signed-off-by: Dave Airlie airlied@redhat.com Fixes: b88baab82871 ("drm/nouveau: implement new VM_BIND uAPI") Cc: Danilo Krummrich dakr@redhat.com Signed-off-by: Danilo Krummrich dakr@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20240328024317.2041851-1-airli... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/nouveau/nouveau_uvmm.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c index aae780e4a4aa3..2bbcdc649e862 100644 --- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c +++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c @@ -804,15 +804,15 @@ op_remap(struct drm_gpuva_op_remap *r, struct drm_gpuva_op_unmap *u = r->unmap; struct nouveau_uvma *uvma = uvma_from_va(u->va); u64 addr = uvma->va.va.addr; - u64 range = uvma->va.va.range; + u64 end = uvma->va.va.addr + uvma->va.va.range;
if (r->prev) addr = r->prev->va.addr + r->prev->va.range;
if (r->next) - range = r->next->va.addr - addr; + end = r->next->va.addr;
- op_unmap_range(u, addr, range); + op_unmap_range(u, addr, end - addr); }
static int
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rob Clark robdclark@chromium.org
[ Upstream commit a4ec240f6b7c21cf846d10017c3ce423a0eae92c ]
virtgpu "vram" GEM objects do not implement obj->get_sg_table(). But they also don't use drm_gem_map_dma_buf(). In fact they may not even have guest visible pages. But it is perfectly fine to export and share with other virtual devices.
Reported-by: Dominik Behr dbehr@chromium.org Fixes: 207395da5a97 ("drm/prime: reject DMA-BUF attach when get_sg_table is missing") Signed-off-by: Rob Clark robdclark@chromium.org Reviewed-by: Simon Ser contact@emersion.fr Signed-off-by: Simon Ser contact@emersion.fr Link: https://patchwork.freedesktop.org/patch/msgid/20240322214801.319975-1-robdcl... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/drm_prime.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c index 7352bde299d54..03bd3c7bd0dc2 100644 --- a/drivers/gpu/drm/drm_prime.c +++ b/drivers/gpu/drm/drm_prime.c @@ -582,7 +582,12 @@ int drm_gem_map_attach(struct dma_buf *dma_buf, { struct drm_gem_object *obj = dma_buf->priv;
- if (!obj->funcs->get_sg_table) + /* + * drm_gem_map_dma_buf() requires obj->get_sg_table(), but drivers + * that implement their own ->map_dma_buf() do not. + */ + if (dma_buf->ops->map_dma_buf == drm_gem_map_dma_buf && + !obj->funcs->get_sg_table) return -ENOSYS;
return drm_gem_pin(obj);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
[ Upstream commit 310a5caa4e861616a27a83c3e8bda17d65026fa8 ]
The disable_irq_lock protects the 'disable_irq' value; we need to take the lock before testing it.
Fixes: 02fb23d72720 ("ASoC: rt5682-sdw: fix for JD event handling in ClockStop Mode0") Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Reviewed-by: Bard Liao yung-chuan.liao@linux.intel.com Reviewed-by: Chao Song chao.song@linux.intel.com Link: https://msgid.link/r/20240325221817.206465-2-pierre-louis.bossart@linux.inte... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/rt5682-sdw.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/soc/codecs/rt5682-sdw.c b/sound/soc/codecs/rt5682-sdw.c index e67c2e19cb1a7..1fdbef5fd6cba 100644 --- a/sound/soc/codecs/rt5682-sdw.c +++ b/sound/soc/codecs/rt5682-sdw.c @@ -763,12 +763,12 @@ static int __maybe_unused rt5682_dev_resume(struct device *dev) return 0;
if (!slave->unattach_request) { + mutex_lock(&rt5682->disable_irq_lock); if (rt5682->disable_irq == true) { - mutex_lock(&rt5682->disable_irq_lock); sdw_write_no_pm(slave, SDW_SCP_INTMASK1, SDW_SCP_INT1_IMPL_DEF); rt5682->disable_irq = false; - mutex_unlock(&rt5682->disable_irq_lock); } + mutex_unlock(&rt5682->disable_irq_lock); goto regmap_sync; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
[ Upstream commit ee287771644394d071e6a331951ee8079b64f9a7 ]
The disable_irq_lock protects the 'disable_irq' value; we need to take the lock before testing it.
Fixes: 23adeb7056ac ("ASoC: rt711-sdca: fix for JD event handling in ClockStop Mode0") Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Reviewed-by: Bard Liao yung-chuan.liao@linux.intel.com Reviewed-by: Chao Song chao.song@linux.intel.com Link: https://msgid.link/r/20240325221817.206465-3-pierre-louis.bossart@linux.inte... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/rt711-sdca-sdw.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/soc/codecs/rt711-sdca-sdw.c b/sound/soc/codecs/rt711-sdca-sdw.c index 935e597022d32..b8471b2d8f4f1 100644 --- a/sound/soc/codecs/rt711-sdca-sdw.c +++ b/sound/soc/codecs/rt711-sdca-sdw.c @@ -438,13 +438,13 @@ static int __maybe_unused rt711_sdca_dev_resume(struct device *dev) return 0;
if (!slave->unattach_request) { + mutex_lock(&rt711->disable_irq_lock); if (rt711->disable_irq == true) { - mutex_lock(&rt711->disable_irq_lock); sdw_write_no_pm(slave, SDW_SCP_SDCA_INTMASK1, SDW_SCP_SDCA_INTMASK_SDCA_0); sdw_write_no_pm(slave, SDW_SCP_SDCA_INTMASK2, SDW_SCP_SDCA_INTMASK_SDCA_8); rt711->disable_irq = false; - mutex_unlock(&rt711->disable_irq_lock); } + mutex_unlock(&rt711->disable_irq_lock); goto regmap_sync; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
[ Upstream commit aae86cfd8790bcc7693a5a0894df58de5cb5128c ]
The disable_irq_lock protects the 'disable_irq' value; we need to take the lock before testing it.
Fixes: b69de265bd0e ("ASoC: rt711: fix for JD event handling in ClockStop Mode0") Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Reviewed-by: Bard Liao yung-chuan.liao@linux.intel.com Reviewed-by: Chao Song chao.song@linux.intel.com Link: https://msgid.link/r/20240325221817.206465-4-pierre-louis.bossart@linux.inte... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/rt711-sdw.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/soc/codecs/rt711-sdw.c b/sound/soc/codecs/rt711-sdw.c index 3f5773310ae8c..988451f24a756 100644 --- a/sound/soc/codecs/rt711-sdw.c +++ b/sound/soc/codecs/rt711-sdw.c @@ -536,12 +536,12 @@ static int __maybe_unused rt711_dev_resume(struct device *dev) return 0;
if (!slave->unattach_request) { + mutex_lock(&rt711->disable_irq_lock); if (rt711->disable_irq == true) { - mutex_lock(&rt711->disable_irq_lock); sdw_write_no_pm(slave, SDW_SCP_INTMASK1, SDW_SCP_INT1_IMPL_DEF); rt711->disable_irq = false; - mutex_unlock(&rt711->disable_irq_lock); } + mutex_unlock(&rt711->disable_irq_lock); goto regmap_sync; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
[ Upstream commit c8b2e5c1b959d100990e4f0cbad38e7d047bb97c ]
The disable_irq_lock protects the 'disable_irq' value; we need to take the lock before testing it.
Fixes: 7a8735c1551e ("ASoC: rt712-sdca: fix for JD event handling in ClockStop Mode0") Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Reviewed-by: Bard Liao yung-chuan.liao@linux.intel.com Reviewed-by: Chao Song chao.song@linux.intel.com Link: https://msgid.link/r/20240325221817.206465-5-pierre-louis.bossart@linux.inte... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/rt712-sdca-sdw.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/sound/soc/codecs/rt712-sdca-sdw.c b/sound/soc/codecs/rt712-sdca-sdw.c index 6b644a89c5890..ba877432cea61 100644 --- a/sound/soc/codecs/rt712-sdca-sdw.c +++ b/sound/soc/codecs/rt712-sdca-sdw.c @@ -438,13 +438,14 @@ static int __maybe_unused rt712_sdca_dev_resume(struct device *dev) return 0;
if (!slave->unattach_request) { + mutex_lock(&rt712->disable_irq_lock); if (rt712->disable_irq == true) { - mutex_lock(&rt712->disable_irq_lock); + sdw_write_no_pm(slave, SDW_SCP_SDCA_INTMASK1, SDW_SCP_SDCA_INTMASK_SDCA_0); sdw_write_no_pm(slave, SDW_SCP_SDCA_INTMASK2, SDW_SCP_SDCA_INTMASK_SDCA_8); rt712->disable_irq = false; - mutex_unlock(&rt712->disable_irq_lock); } + mutex_unlock(&rt712->disable_irq_lock); goto regmap_sync; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
[ Upstream commit adb354bbc231b23d3a05163ce35c1d598512ff64 ]
The disable_irq_lock protects the 'disable_irq' value; we need to take the lock before testing it.
Fixes: a0b7c59ac1a9 ("ASoC: rt722-sdca: fix for JD event handling in ClockStop Mode0") Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Reviewed-by: Bard Liao yung-chuan.liao@linux.intel.com Reviewed-by: Chao Song chao.song@linux.intel.com Link: https://msgid.link/r/20240325221817.206465-6-pierre-louis.bossart@linux.inte... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/rt722-sdca-sdw.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/soc/codecs/rt722-sdca-sdw.c b/sound/soc/codecs/rt722-sdca-sdw.c index a38ec58622145..43a4e79e56966 100644 --- a/sound/soc/codecs/rt722-sdca-sdw.c +++ b/sound/soc/codecs/rt722-sdca-sdw.c @@ -464,13 +464,13 @@ static int __maybe_unused rt722_sdca_dev_resume(struct device *dev) return 0;
if (!slave->unattach_request) { + mutex_lock(&rt722->disable_irq_lock); if (rt722->disable_irq == true) { - mutex_lock(&rt722->disable_irq_lock); sdw_write_no_pm(slave, SDW_SCP_SDCA_INTMASK1, SDW_SCP_SDCA_INTMASK_SDCA_6); sdw_write_no_pm(slave, SDW_SCP_SDCA_INTMASK2, SDW_SCP_SDCA_INTMASK_SDCA_8); rt722->disable_irq = false; - mutex_unlock(&rt722->disable_irq_lock); } + mutex_unlock(&rt722->disable_irq_lock); goto regmap_sync; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stephen Lee slee08177@gmail.com
[ Upstream commit fc563aa900659a850e2ada4af26b9d7a3de6c591 ]
In snd_soc_info_volsw(), mask is generated by finding the index of the most significant bit set in max and shifting 1 left by that index to build a bitmask. An unintended wraparound occurs when max is an integer value with its MSB set: since the constant 1 is treated as an int, the left shift wraps around and sets mask to 0 instead of all 1's. Fix this by casting 1 to `1ULL` to prevent the wraparound.
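A stand-alone demonstration of the overflow (userspace C with a minimal fls() stand-in, not the ASoC code): when max has its most significant bit set, fls(max) is 32, `1 << 32` on a 32-bit int is undefined and in practice yields a zero mask, while `1ULL << 32` is well defined and produces the intended all-ones mask.

#include <stdio.h>

/* Minimal fls(): 1-based index of the most significant set bit, 0 for 0. */
static int fls(unsigned int x)
{
    int i = 0;

    while (x) {
        x >>= 1;
        i++;
    }
    return i;
}

int main(void)
{
    unsigned int max = 0x80000000u;              /* MSB set, so fls(max) == 32 */
    unsigned long long good = (1ULL << fls(max)) - 1;

    printf("fls(max) = %d\n", fls(max));
    printf("mask with 1ULL: 0x%llx\n", good);    /* 0xffffffff as intended */
    /* (1 << 32) with a 32-bit int is undefined behaviour; on common
     * compilers it effectively produces a mask of 0. */
    return 0;
}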
Fixes: 7077148fb50a ("ASoC: core: Split ops out of soc-core.c") Signed-off-by: Stephen Lee slee08177@gmail.com Link: https://msgid.link/r/20240326010131.6211-1-slee08177@gmail.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/soc-ops.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/soc-ops.c b/sound/soc/soc-ops.c index 2d25748ca7066..b27e89ff6a167 100644 --- a/sound/soc/soc-ops.c +++ b/sound/soc/soc-ops.c @@ -263,7 +263,7 @@ int snd_soc_get_volsw(struct snd_kcontrol *kcontrol, int max = mc->max; int min = mc->min; int sign_bit = mc->sign_bit; - unsigned int mask = (1 << fls(max)) - 1; + unsigned int mask = (1ULL << fls(max)) - 1; unsigned int invert = mc->invert; int val; int ret;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sam Protsenko semen.protsenko@linaro.org
[ Upstream commit 460efee706c2b6a4daba62ec143fea29c2e7b358 ]
Simplify the code by extracting all cases of FIFO depth calculation into a dedicated macro. No functional change.
Signed-off-by: Sam Protsenko semen.protsenko@linaro.org Reviewed-by: Andi Shyti andi.shyti@kernel.org Link: https://msgid.link/r/20240120170001.3356-1-semen.protsenko@linaro.org Signed-off-by: Mark Brown broonie@kernel.org Stable-dep-of: a3d3eab627bb ("spi: s3c64xx: Use DMA mode from fifo size") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-s3c64xx.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/spi/spi-s3c64xx.c b/drivers/spi/spi-s3c64xx.c index 0e48ffd499b9f..432ec60d35684 100644 --- a/drivers/spi/spi-s3c64xx.c +++ b/drivers/spi/spi-s3c64xx.c @@ -109,6 +109,7 @@ #define TX_FIFO_LVL(v, i) (((v) >> 6) & FIFO_LVL_MASK(i)) #define RX_FIFO_LVL(v, i) (((v) >> (i)->port_conf->rx_lvl_offset) & \ FIFO_LVL_MASK(i)) +#define FIFO_DEPTH(i) ((FIFO_LVL_MASK(i) >> 1) + 1)
#define S3C64XX_SPI_MAX_TRAILCNT 0x3ff #define S3C64XX_SPI_TRAILCNT_OFF 19 @@ -406,7 +407,7 @@ static bool s3c64xx_spi_can_dma(struct spi_controller *host, struct s3c64xx_spi_driver_data *sdd = spi_controller_get_devdata(host);
if (sdd->rx_dma.ch && sdd->tx_dma.ch) { - return xfer->len > (FIFO_LVL_MASK(sdd) >> 1) + 1; + return xfer->len > FIFO_DEPTH(sdd); } else { return false; } @@ -495,9 +496,7 @@ static u32 s3c64xx_spi_wait_for_timeout(struct s3c64xx_spi_driver_data *sdd, void __iomem *regs = sdd->regs; unsigned long val = 1; u32 status; - - /* max fifo depth available */ - u32 max_fifo = (FIFO_LVL_MASK(sdd) >> 1) + 1; + u32 max_fifo = FIFO_DEPTH(sdd);
if (timeout_ms) val = msecs_to_loops(timeout_ms); @@ -604,7 +603,7 @@ static int s3c64xx_wait_for_pio(struct s3c64xx_spi_driver_data *sdd, * For any size less than the fifo size the below code is * executed atleast once. */ - loops = xfer->len / ((FIFO_LVL_MASK(sdd) >> 1) + 1); + loops = xfer->len / FIFO_DEPTH(sdd); buf = xfer->rx_buf; do { /* wait for data to be received in the fifo */ @@ -741,7 +740,7 @@ static int s3c64xx_spi_transfer_one(struct spi_controller *host, struct spi_transfer *xfer) { struct s3c64xx_spi_driver_data *sdd = spi_controller_get_devdata(host); - const unsigned int fifo_len = (FIFO_LVL_MASK(sdd) >> 1) + 1; + const unsigned int fifo_len = FIFO_DEPTH(sdd); const void *tx_buf = NULL; void *rx_buf = NULL; int target_len = 0, origin_len = 0; @@ -1280,7 +1279,7 @@ static int s3c64xx_spi_probe(struct platform_device *pdev) dev_dbg(&pdev->dev, "Samsung SoC SPI Driver loaded for Bus SPI-%d with %d Targets attached\n", sdd->port_id, host->num_chipselect); dev_dbg(&pdev->dev, "\tIOmem=[%pR]\tFIFO %dbytes\n", - mem_res, (FIFO_LVL_MASK(sdd) >> 1) + 1); + mem_res, FIFO_DEPTH(sdd));
pm_runtime_mark_last_busy(&pdev->dev); pm_runtime_put_autosuspend(&pdev->dev);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tudor Ambarus tudor.ambarus@linaro.org
[ Upstream commit a77ce80f63f06d7ae933c332ed77c79136fa69b0 ]
Sorting headers alphabetically helps locate duplicates and makes it easier to figure out where to insert new headers.
Reviewed-by: Andi Shyti andi.shyti@kernel.org Reviewed-by: Peter Griffin peter.griffin@linaro.org Signed-off-by: Tudor Ambarus tudor.ambarus@linaro.org Link: https://lore.kernel.org/r/20240207120431.2766269-2-tudor.ambarus@linaro.org Signed-off-by: Mark Brown broonie@kernel.org Stable-dep-of: a3d3eab627bb ("spi: s3c64xx: Use DMA mode from fifo size") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-s3c64xx.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/spi/spi-s3c64xx.c b/drivers/spi/spi-s3c64xx.c index 432ec60d35684..26d389d95af92 100644 --- a/drivers/spi/spi-s3c64xx.c +++ b/drivers/spi/spi-s3c64xx.c @@ -3,19 +3,18 @@ // Copyright (c) 2009 Samsung Electronics Co., Ltd. // Jaswinder Singh jassi.brar@samsung.com
-#include <linux/init.h> -#include <linux/module.h> -#include <linux/interrupt.h> -#include <linux/delay.h> #include <linux/clk.h> +#include <linux/delay.h> #include <linux/dma-mapping.h> #include <linux/dmaengine.h> +#include <linux/init.h> +#include <linux/interrupt.h> +#include <linux/module.h> +#include <linux/of.h> +#include <linux/platform_data/spi-s3c64xx.h> #include <linux/platform_device.h> #include <linux/pm_runtime.h> #include <linux/spi/spi.h> -#include <linux/of.h> - -#include <linux/platform_data/spi-s3c64xx.h>
#define MAX_SPI_PORTS 12 #define S3C64XX_SPI_QUIRK_CS_AUTO (1 << 1)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tudor Ambarus tudor.ambarus@linaro.org
[ Upstream commit 4568fa574fcef3811a8140702979f076ef0f5bc0 ]
The driver uses GENMASK() but does not include <linux/bits.h>.
It is good practice to directly include all headers used; this avoids implicit dependencies and spurious breakage if someone rearranges headers and causes the implicit include to vanish.
Include the missing header.
Reviewed-by: Peter Griffin peter.griffin@linaro.org Signed-off-by: Tudor Ambarus tudor.ambarus@linaro.org Link: https://lore.kernel.org/r/20240207120431.2766269-4-tudor.ambarus@linaro.org Signed-off-by: Mark Brown broonie@kernel.org Stable-dep-of: a3d3eab627bb ("spi: s3c64xx: Use DMA mode from fifo size") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-s3c64xx.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/spi/spi-s3c64xx.c b/drivers/spi/spi-s3c64xx.c index 26d389d95af92..1e519b1537e71 100644 --- a/drivers/spi/spi-s3c64xx.c +++ b/drivers/spi/spi-s3c64xx.c @@ -3,6 +3,7 @@ // Copyright (c) 2009 Samsung Electronics Co., Ltd. // Jaswinder Singh jassi.brar@samsung.com
+#include <linux/bits.h> #include <linux/clk.h> #include <linux/delay.h> #include <linux/dma-mapping.h>
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tudor Ambarus tudor.ambarus@linaro.org
[ Upstream commit 9d47e411f4d636519a8d26587928d34cf52c0c1f ]
The else case is not needed after a return; remove it.
Reviewed-by: Andi Shyti andi.shyti@kernel.org Reviewed-by: Sam Protsenko semen.protsenko@linaro.org Signed-off-by: Tudor Ambarus tudor.ambarus@linaro.org Link: https://lore.kernel.org/r/20240207120431.2766269-9-tudor.ambarus@linaro.org Signed-off-by: Mark Brown broonie@kernel.org Stable-dep-of: a3d3eab627bb ("spi: s3c64xx: Use DMA mode from fifo size") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-s3c64xx.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/spi/spi-s3c64xx.c b/drivers/spi/spi-s3c64xx.c index 1e519b1537e71..29e99410c9716 100644 --- a/drivers/spi/spi-s3c64xx.c +++ b/drivers/spi/spi-s3c64xx.c @@ -406,12 +406,10 @@ static bool s3c64xx_spi_can_dma(struct spi_controller *host, { struct s3c64xx_spi_driver_data *sdd = spi_controller_get_devdata(host);
- if (sdd->rx_dma.ch && sdd->tx_dma.ch) { + if (sdd->rx_dma.ch && sdd->tx_dma.ch) return xfer->len > FIFO_DEPTH(sdd); - } else { - return false; - }
+ return false; }
static int s3c64xx_enable_datapath(struct s3c64xx_spi_driver_data *sdd,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tudor Ambarus tudor.ambarus@linaro.org
[ Upstream commit ff8faa8a5c0f4c2da797cd22a163ee3cc8823b13 ]
Define a magic value; it will be used in the next patch as well.
Signed-off-by: Tudor Ambarus tudor.ambarus@linaro.org Link: https://msgid.link/r/20240216070555.2483977-3-tudor.ambarus@linaro.org Signed-off-by: Mark Brown broonie@kernel.org Stable-dep-of: a3d3eab627bb ("spi: s3c64xx: Use DMA mode from fifo size") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-s3c64xx.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/spi/spi-s3c64xx.c b/drivers/spi/spi-s3c64xx.c index 29e99410c9716..3da940e6299f0 100644 --- a/drivers/spi/spi-s3c64xx.c +++ b/drivers/spi/spi-s3c64xx.c @@ -76,6 +76,7 @@ #define S3C64XX_SPI_INT_RX_FIFORDY_EN (1<<1) #define S3C64XX_SPI_INT_TX_FIFORDY_EN (1<<0)
+#define S3C64XX_SPI_ST_TX_FIFO_LVL_SHIFT 6 #define S3C64XX_SPI_ST_RX_OVERRUN_ERR (1<<5) #define S3C64XX_SPI_ST_RX_UNDERRUN_ERR (1<<4) #define S3C64XX_SPI_ST_TX_OVERRUN_ERR (1<<3) @@ -106,7 +107,8 @@ #define FIFO_LVL_MASK(i) ((i)->port_conf->fifo_lvl_mask[i->port_id]) #define S3C64XX_SPI_ST_TX_DONE(v, i) (((v) & \ (1 << (i)->port_conf->tx_st_done)) ? 1 : 0) -#define TX_FIFO_LVL(v, i) (((v) >> 6) & FIFO_LVL_MASK(i)) +#define TX_FIFO_LVL(v, i) (((v) >> S3C64XX_SPI_ST_TX_FIFO_LVL_SHIFT) & \ + FIFO_LVL_MASK(i)) #define RX_FIFO_LVL(v, i) (((v) >> (i)->port_conf->rx_lvl_offset) & \ FIFO_LVL_MASK(i)) #define FIFO_DEPTH(i) ((FIFO_LVL_MASK(i) >> 1) + 1)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tudor Ambarus tudor.ambarus@linaro.org
[ Upstream commit d6911cf27e5c8491cbfedd4ae2d1ee74a3e685b4 ]
The driver is wrong because it is using partial register field masks for the SPI_STATUS.{RX, TX}_FIFO_LVL register fields.
We see s3c64xx_spi_port_config.fifo_lvl_mask with different values for different instances of the same IP. Take s5pv210_spi_port_config for example; it defines: .fifo_lvl_mask = { 0x1ff, 0x7F },
fifo_lvl_mask is used to determine the FIFO depth of the instance of the IP. In this case, the integrator uses a 256 bytes FIFO for the first SPI instance of the IP, and a 64 bytes FIFO for the second instance. While the first mask reflects the SPI_STATUS.{RX, TX}_FIFO_LVL register fields, the second one is two bits short. Using partial field masks is misleading and can hide problems of the driver's logic.
Allow platforms to specify the full FIFO mask, regardless of the FIFO depth.
Introduce {rx, tx}_fifomask to represent the SPI_STATUS.{RX, TX}_FIFO_LVL register fields. It's a shifted mask defining the field's length and position. We'll be able to deprecate the use of @rx_lvl_offset, as the shift value can be determined from the mask. The existing compatibles shall start using {rx, tx}_fifomask so that they use the full field mask and to avoid shifting the mask to position, and then shifting it back to zero in the {TX, RX}_FIFO_LVL macros.
@rx_lvl_offset will be deprecated in a further patch, after we have the infrastructure to deprecate @fifo_lvl_mask as well.
No functional change intended.
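A small stand-alone example of extracting a level field through such a full shifted mask (the register layout below is illustrative, and __builtin_ctzl() stands in for the kernel's __ffs()): the shift amount is recovered from the mask itself, so no separate offset or partial mask is needed.

#include <stdio.h>

/* Userspace stand-in for the kernel's __ffs(): position of the lowest set bit. */
static unsigned int my_ffs(unsigned long mask)
{
    return __builtin_ctzl(mask);
}

int main(void)
{
    /* Pretend RX_FIFO_LVL occupies bits [23:15] of the status register. */
    unsigned long rx_fifomask = 0x1ffUL << 15;
    unsigned long status = (0x42UL << 15) | 0x3;  /* level 0x42 plus unrelated bits */

    unsigned long rx_lvl = (status & rx_fifomask) >> my_ffs(rx_fifomask);

    printf("RX FIFO level: 0x%lx\n", rx_lvl);     /* prints 0x42 */
    return 0;
}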
Signed-off-by: Tudor Ambarus tudor.ambarus@linaro.org Link: https://msgid.link/r/20240216070555.2483977-4-tudor.ambarus@linaro.org Signed-off-by: Mark Brown broonie@kernel.org Stable-dep-of: a3d3eab627bb ("spi: s3c64xx: Use DMA mode from fifo size") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-s3c64xx.c | 40 +++++++++++++++++++++++++++++++++++---- 1 file changed, 36 insertions(+), 4 deletions(-)
diff --git a/drivers/spi/spi-s3c64xx.c b/drivers/spi/spi-s3c64xx.c index 3da940e6299f0..688b8fad9e2fd 100644 --- a/drivers/spi/spi-s3c64xx.c +++ b/drivers/spi/spi-s3c64xx.c @@ -3,6 +3,7 @@ // Copyright (c) 2009 Samsung Electronics Co., Ltd. // Jaswinder Singh jassi.brar@samsung.com
+#include <linux/bitops.h> #include <linux/bits.h> #include <linux/clk.h> #include <linux/delay.h> @@ -107,10 +108,10 @@ #define FIFO_LVL_MASK(i) ((i)->port_conf->fifo_lvl_mask[i->port_id]) #define S3C64XX_SPI_ST_TX_DONE(v, i) (((v) & \ (1 << (i)->port_conf->tx_st_done)) ? 1 : 0) -#define TX_FIFO_LVL(v, i) (((v) >> S3C64XX_SPI_ST_TX_FIFO_LVL_SHIFT) & \ - FIFO_LVL_MASK(i)) -#define RX_FIFO_LVL(v, i) (((v) >> (i)->port_conf->rx_lvl_offset) & \ - FIFO_LVL_MASK(i)) +#define TX_FIFO_LVL(v, sdd) (((v) & (sdd)->tx_fifomask) >> \ + __ffs((sdd)->tx_fifomask)) +#define RX_FIFO_LVL(v, sdd) (((v) & (sdd)->rx_fifomask) >> \ + __ffs((sdd)->rx_fifomask)) #define FIFO_DEPTH(i) ((FIFO_LVL_MASK(i) >> 1) + 1)
#define S3C64XX_SPI_MAX_TRAILCNT 0x3ff @@ -136,6 +137,10 @@ struct s3c64xx_spi_dma_data { * struct s3c64xx_spi_port_config - SPI Controller hardware info * @fifo_lvl_mask: Bit-mask for {TX|RX}_FIFO_LVL bits in SPI_STATUS register. * @rx_lvl_offset: Bit offset of RX_FIFO_LVL bits in SPI_STATUS regiter. + * @rx_fifomask: SPI_STATUS.RX_FIFO_LVL mask. Shifted mask defining the field's + * length and position. + * @tx_fifomask: SPI_STATUS.TX_FIFO_LVL mask. Shifted mask defining the field's + * length and position. * @tx_st_done: Bit offset of TX_DONE bit in SPI_STATUS regiter. * @clk_div: Internal clock divider * @quirks: Bitmask of known quirks @@ -153,6 +158,8 @@ struct s3c64xx_spi_dma_data { struct s3c64xx_spi_port_config { int fifo_lvl_mask[MAX_SPI_PORTS]; int rx_lvl_offset; + u32 rx_fifomask; + u32 tx_fifomask; int tx_st_done; int quirks; int clk_div; @@ -182,6 +189,10 @@ struct s3c64xx_spi_port_config { * @tx_dma: Local transmit DMA data (e.g. chan and direction) * @port_conf: Local SPI port configuartion data * @port_id: Port identification number + * @rx_fifomask: SPI_STATUS.RX_FIFO_LVL mask. Shifted mask defining the field's + * length and position. + * @tx_fifomask: SPI_STATUS.TX_FIFO_LVL mask. Shifted mask defining the field's + * length and position. */ struct s3c64xx_spi_driver_data { void __iomem *regs; @@ -201,6 +212,8 @@ struct s3c64xx_spi_driver_data { struct s3c64xx_spi_dma_data tx_dma; const struct s3c64xx_spi_port_config *port_conf; unsigned int port_id; + u32 rx_fifomask; + u32 tx_fifomask; };
static void s3c64xx_flush_fifo(struct s3c64xx_spi_driver_data *sdd) @@ -1145,6 +1158,23 @@ static inline const struct s3c64xx_spi_port_config *s3c64xx_spi_get_port_config( return (const struct s3c64xx_spi_port_config *)platform_get_device_id(pdev)->driver_data; }
+static void s3c64xx_spi_set_fifomask(struct s3c64xx_spi_driver_data *sdd) +{ + const struct s3c64xx_spi_port_config *port_conf = sdd->port_conf; + + if (port_conf->rx_fifomask) + sdd->rx_fifomask = port_conf->rx_fifomask; + else + sdd->rx_fifomask = FIFO_LVL_MASK(sdd) << + port_conf->rx_lvl_offset; + + if (port_conf->tx_fifomask) + sdd->tx_fifomask = port_conf->tx_fifomask; + else + sdd->tx_fifomask = FIFO_LVL_MASK(sdd) << + S3C64XX_SPI_ST_TX_FIFO_LVL_SHIFT; +} + static int s3c64xx_spi_probe(struct platform_device *pdev) { struct resource *mem_res; @@ -1190,6 +1220,8 @@ static int s3c64xx_spi_probe(struct platform_device *pdev) sdd->port_id = pdev->id; }
+ s3c64xx_spi_set_fifomask(sdd); + sdd->cur_bpw = 8;
sdd->tx_dma.direction = DMA_MEM_TO_DEV;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tudor Ambarus tudor.ambarus@linaro.org
[ Upstream commit c6e776ab6abdfce5a1edcde7a22c639e76499939 ]
Determine the FIFO depth only once, at probe time. ``sdd->fifo_depth`` can be set later on with the FIFO depth specified in the device tree.
Signed-off-by: Tudor Ambarus tudor.ambarus@linaro.org Link: https://msgid.link/r/20240216070555.2483977-5-tudor.ambarus@linaro.org Signed-off-by: Mark Brown broonie@kernel.org Stable-dep-of: a3d3eab627bb ("spi: s3c64xx: Use DMA mode from fifo size") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-s3c64xx.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/spi/spi-s3c64xx.c b/drivers/spi/spi-s3c64xx.c index 688b8fad9e2fd..e059fb9db1da1 100644 --- a/drivers/spi/spi-s3c64xx.c +++ b/drivers/spi/spi-s3c64xx.c @@ -189,6 +189,7 @@ struct s3c64xx_spi_port_config { * @tx_dma: Local transmit DMA data (e.g. chan and direction) * @port_conf: Local SPI port configuartion data * @port_id: Port identification number + * @fifo_depth: depth of the FIFO. * @rx_fifomask: SPI_STATUS.RX_FIFO_LVL mask. Shifted mask defining the field's * length and position. * @tx_fifomask: SPI_STATUS.TX_FIFO_LVL mask. Shifted mask defining the field's @@ -212,6 +213,7 @@ struct s3c64xx_spi_driver_data { struct s3c64xx_spi_dma_data tx_dma; const struct s3c64xx_spi_port_config *port_conf; unsigned int port_id; + unsigned int fifo_depth; u32 rx_fifomask; u32 tx_fifomask; }; @@ -422,7 +424,7 @@ static bool s3c64xx_spi_can_dma(struct spi_controller *host, struct s3c64xx_spi_driver_data *sdd = spi_controller_get_devdata(host);
if (sdd->rx_dma.ch && sdd->tx_dma.ch) - return xfer->len > FIFO_DEPTH(sdd); + return xfer->len > sdd->fifo_depth;
return false; } @@ -509,7 +511,7 @@ static u32 s3c64xx_spi_wait_for_timeout(struct s3c64xx_spi_driver_data *sdd, void __iomem *regs = sdd->regs; unsigned long val = 1; u32 status; - u32 max_fifo = FIFO_DEPTH(sdd); + u32 max_fifo = sdd->fifo_depth;
if (timeout_ms) val = msecs_to_loops(timeout_ms); @@ -616,7 +618,7 @@ static int s3c64xx_wait_for_pio(struct s3c64xx_spi_driver_data *sdd, * For any size less than the fifo size the below code is * executed atleast once. */ - loops = xfer->len / FIFO_DEPTH(sdd); + loops = xfer->len / sdd->fifo_depth; buf = xfer->rx_buf; do { /* wait for data to be received in the fifo */ @@ -753,7 +755,7 @@ static int s3c64xx_spi_transfer_one(struct spi_controller *host, struct spi_transfer *xfer) { struct s3c64xx_spi_driver_data *sdd = spi_controller_get_devdata(host); - const unsigned int fifo_len = FIFO_DEPTH(sdd); + const unsigned int fifo_len = sdd->fifo_depth; const void *tx_buf = NULL; void *rx_buf = NULL; int target_len = 0, origin_len = 0; @@ -1220,6 +1222,8 @@ static int s3c64xx_spi_probe(struct platform_device *pdev) sdd->port_id = pdev->id; }
+ sdd->fifo_depth = FIFO_DEPTH(sdd); + s3c64xx_spi_set_fifomask(sdd);
sdd->cur_bpw = 8; @@ -1311,7 +1315,7 @@ static int s3c64xx_spi_probe(struct platform_device *pdev) dev_dbg(&pdev->dev, "Samsung SoC SPI Driver loaded for Bus SPI-%d with %d Targets attached\n", sdd->port_id, host->num_chipselect); dev_dbg(&pdev->dev, "\tIOmem=[%pR]\tFIFO %dbytes\n", - mem_res, FIFO_DEPTH(sdd)); + mem_res, sdd->fifo_depth);
pm_runtime_mark_last_busy(&pdev->dev); pm_runtime_put_autosuspend(&pdev->dev);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jaewon Kim jaewon02.kim@samsung.com
[ Upstream commit a3d3eab627bbbb0cb175910cf8d0f7022628a642 ]
If the SPI data size is smaller than the FIFO size, the driver operates in PIO mode; if it is larger than the FIFO size, it operates in DMA mode.
If the SPI data size is equal to the FIFO size, it operates in PIO mode and the transfer is split into 2 transfers. To prevent this, the driver must operate in DMA mode starting from the case where the data size and the FIFO size are the same.
Fixes: 1ee806718d5e ("spi: s3c64xx: support interrupt based pio mode") Signed-off-by: Jaewon Kim jaewon02.kim@samsung.com Reviewed-by: Sam Protsenko semen.protsenko@linaro.org Link: https://lore.kernel.org/r/20240329085840.65856-1-jaewon02.kim@samsung.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-s3c64xx.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/spi/spi-s3c64xx.c b/drivers/spi/spi-s3c64xx.c index e059fb9db1da1..652eadbefe24c 100644 --- a/drivers/spi/spi-s3c64xx.c +++ b/drivers/spi/spi-s3c64xx.c @@ -424,7 +424,7 @@ static bool s3c64xx_spi_can_dma(struct spi_controller *host, struct s3c64xx_spi_driver_data *sdd = spi_controller_get_devdata(host);
if (sdd->rx_dma.ch && sdd->tx_dma.ch) - return xfer->len > sdd->fifo_depth; + return xfer->len >= sdd->fifo_depth;
return false; } @@ -783,10 +783,9 @@ static int s3c64xx_spi_transfer_one(struct spi_controller *host, return status; }
- if (!is_polling(sdd) && (xfer->len > fifo_len) && + if (!is_polling(sdd) && xfer->len >= fifo_len && sdd->rx_dma.ch && sdd->tx_dma.ch) { use_dma = 1; - } else if (xfer->len >= fifo_len) { tx_buf = xfer->tx_buf; rx_buf = xfer->rx_buf;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vijendar Mukunda Vijendar.Mukunda@amd.com
[ Upstream commit 2c603a4947a1247102ccb008d5eb6f37a4043c98 ]
If acp_init() fails, the ACP PCI driver probe should return an error. Add logic to check the acp_init() return value.
Fixes: e61b415515d3 ("ASoC: amd: acp: refactor the acp init and de-init sequence") Signed-off-by: Vijendar Mukunda Vijendar.Mukunda@amd.com Link: https://lore.kernel.org/r/20240329053815.2373979-1-Vijendar.Mukunda@amd.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/amd/acp/acp-pci.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/sound/soc/amd/acp/acp-pci.c b/sound/soc/amd/acp/acp-pci.c index a32c14a109b77..223238f662f83 100644 --- a/sound/soc/amd/acp/acp-pci.c +++ b/sound/soc/amd/acp/acp-pci.c @@ -107,7 +107,10 @@ static int acp_pci_probe(struct pci_dev *pci, const struct pci_device_id *pci_id goto unregister_dmic_dev; }
- acp_init(chip); + ret = acp_init(chip); + if (ret) + goto unregister_dmic_dev; + res = devm_kcalloc(&pci->dev, num_res, sizeof(struct resource), GFP_KERNEL); if (!res) { ret = -ENOMEM;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit eaa03486d932572dfd1c5f64f9dfebe572ad88c0 ]
Fix warnings reported by smatch by initializing the local 'ret' variable to 0.
drivers/base/regmap/regcache-maple.c:186 regcache_maple_drop() error: uninitialized symbol 'ret'. drivers/base/regmap/regcache-maple.c:290 regcache_maple_sync() error: uninitialized symbol 'ret'.
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: f033c26de5a5 ("regmap: Add maple tree based register cache") Link: https://lore.kernel.org/r/20240329144630.1965159-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/base/regmap/regcache-maple.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/base/regmap/regcache-maple.c b/drivers/base/regmap/regcache-maple.c index c1776127a5724..55999a50ccc0b 100644 --- a/drivers/base/regmap/regcache-maple.c +++ b/drivers/base/regmap/regcache-maple.c @@ -112,7 +112,7 @@ static int regcache_maple_drop(struct regmap *map, unsigned int min, unsigned long *entry, *lower, *upper; unsigned long lower_index, lower_last; unsigned long upper_index, upper_last; - int ret; + int ret = 0;
lower = NULL; upper = NULL; @@ -244,7 +244,7 @@ static int regcache_maple_sync(struct regmap *map, unsigned int min, unsigned long lmin = min; unsigned long lmax = max; unsigned int r, v, sync_start; - int ret; + int ret = 0; bool sync_needed = false;
map->cache_bypass = true;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit 52f80bb181a9a1530ade30bc18991900bbb9697f ]
gcc warns about a memcpy() with overlapping pointers because of an incorrect size calculation:
In file included from include/linux/string.h:369, from drivers/ata/sata_sx4.c:66: In function 'memcpy_fromio', inlined from 'pdc20621_get_from_dimm.constprop' at drivers/ata/sata_sx4.c:962:2: include/linux/fortify-string.h:97:33: error: '__builtin_memcpy' accessing 4294934464 bytes at offsets 0 and [16, 16400] overlaps 6442385281 bytes at offset -2147450817 [-Werror=restrict] 97 | #define __underlying_memcpy __builtin_memcpy | ^ include/linux/fortify-string.h:620:9: note: in expansion of macro '__underlying_memcpy' 620 | __underlying_##op(p, q, __fortify_size); \ | ^~~~~~~~~~~~~ include/linux/fortify-string.h:665:26: note: in expansion of macro '__fortify_memcpy_chk' 665 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ | ^~~~~~~~~~~~~~~~~~~~ include/asm-generic/io.h:1184:9: note: in expansion of macro 'memcpy' 1184 | memcpy(buffer, __io_virt(addr), size); | ^~~~~~
The problem here is that the unsigned 32-bit subtraction wraps around to a large value instead of going negative, and that value then gets converted into a signed 'long', keeping a large positive number.
Replace the complex calculation with a more readable min() variant that avoids the warning.
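For reference, a standalone userspace sketch of the arithmetic (hypothetical values, assuming a 64-bit 'long' as in the warning above; not part of the patch):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint32_t window_size = 0x20000;			/* hypothetical window size */
	uint32_t offset = 0x1f000, size = 0x2000;	/* offset + size > window_size */

	/* old check: the u32 subtraction wraps, then zero-extends to a positive long */
	long old = (long)(window_size - (offset + size));
	/* new calculation: clamp the copy length to what still fits in the window */
	uint32_t dist = size < window_size - offset ? size : window_size - offset;

	printf("old=%ld (wrongly >= 0)\n", old);	/* 4294963200 */
	printf("dist=%#x\n", dist);			/* 0x1000 */
	return 0;
}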
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Damien Le Moal dlemoal@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ata/sata_sx4.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/ata/sata_sx4.c b/drivers/ata/sata_sx4.c index b51d7a9d0d90c..a482741eb181f 100644 --- a/drivers/ata/sata_sx4.c +++ b/drivers/ata/sata_sx4.c @@ -957,8 +957,7 @@ static void pdc20621_get_from_dimm(struct ata_host *host, void *psource,
offset -= (idx * window_size); idx++; - dist = ((long) (window_size - (offset + size))) >= 0 ? size : - (long) (window_size - offset); + dist = min(size, window_size - offset); memcpy_fromio(psource, dimm_mmio + offset / 4, dist);
psource += dist; @@ -1005,8 +1004,7 @@ static void pdc20621_put_to_dimm(struct ata_host *host, void *psource, readl(mmio + PDC_DIMM_WINDOW_CTLR); offset -= (idx * window_size); idx++; - dist = ((long)(s32)(window_size - (offset + size))) >= 0 ? size : - (long) (window_size - offset); + dist = min(size, window_size - offset); memcpy_toio(dimm_mmio + offset / 4, psource, dist); writel(0x01, mmio + PDC_GENERAL_CTLR); readl(mmio + PDC_GENERAL_CTLR);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit 1197c5b2099f716b3de327437fb50900a0b936c9 ]
The myrb and myrs drivers use an odd way of implementing their sysfs files, calling snprintf() with a fixed length of 32 bytes to print into a page sized buffer. One of the strings is actually longer than 32 bytes, which clang can warn about:
drivers/scsi/myrb.c:1906:10: error: 'snprintf' will always be truncated; specified size is 32, but format string expands to at least 34 [-Werror,-Wformat-truncation] drivers/scsi/myrs.c:1089:10: error: 'snprintf' will always be truncated; specified size is 32, but format string expands to at least 34 [-Werror,-Wformat-truncation]
These could all be plain sprintf() without a length as the buffer is always long enough. On the other hand, sysfs files should not be overly long either, so just double the length to make sure the longest strings don't get truncated here.
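For reference, a small userspace sketch of the truncation the compilers warn about (illustrative only, using one of the strings from rebuild_show()):

#include <stdio.h>
#include <string.h>

int main(void)
{
	char buf[64];
	/* this string needs 34 bytes including the terminating NUL */
	int ret = snprintf(buf, 32, "physical device - not rebuilding\n");

	/* snprintf() returns the untruncated length; only 31 chars + NUL were stored */
	printf("ret=%d strlen=%zu\n", ret, strlen(buf));	/* ret=33 strlen=31 */
	return 0;
}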
Fixes: 77266186397c ("scsi: myrs: Add Mylex RAID controller (SCSI interface)") Fixes: 081ff398c56c ("scsi: myrb: Add Mylex RAID controller (block interface)") Signed-off-by: Arnd Bergmann arnd@arndb.de Link: https://lore.kernel.org/r/20240326223825.4084412-8-arnd@kernel.org Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/myrb.c | 20 ++++++++++---------- drivers/scsi/myrs.c | 24 ++++++++++++------------ 2 files changed, 22 insertions(+), 22 deletions(-)
diff --git a/drivers/scsi/myrb.c b/drivers/scsi/myrb.c index ca2e932dd9b70..f684eb5e04898 100644 --- a/drivers/scsi/myrb.c +++ b/drivers/scsi/myrb.c @@ -1775,9 +1775,9 @@ static ssize_t raid_state_show(struct device *dev,
name = myrb_devstate_name(ldev_info->state); if (name) - ret = snprintf(buf, 32, "%s\n", name); + ret = snprintf(buf, 64, "%s\n", name); else - ret = snprintf(buf, 32, "Invalid (%02X)\n", + ret = snprintf(buf, 64, "Invalid (%02X)\n", ldev_info->state); } else { struct myrb_pdev_state *pdev_info = sdev->hostdata; @@ -1796,9 +1796,9 @@ static ssize_t raid_state_show(struct device *dev, else name = myrb_devstate_name(pdev_info->state); if (name) - ret = snprintf(buf, 32, "%s\n", name); + ret = snprintf(buf, 64, "%s\n", name); else - ret = snprintf(buf, 32, "Invalid (%02X)\n", + ret = snprintf(buf, 64, "Invalid (%02X)\n", pdev_info->state); } return ret; @@ -1886,11 +1886,11 @@ static ssize_t raid_level_show(struct device *dev,
name = myrb_raidlevel_name(ldev_info->raid_level); if (!name) - return snprintf(buf, 32, "Invalid (%02X)\n", + return snprintf(buf, 64, "Invalid (%02X)\n", ldev_info->state); - return snprintf(buf, 32, "%s\n", name); + return snprintf(buf, 64, "%s\n", name); } - return snprintf(buf, 32, "Physical Drive\n"); + return snprintf(buf, 64, "Physical Drive\n"); } static DEVICE_ATTR_RO(raid_level);
@@ -1903,15 +1903,15 @@ static ssize_t rebuild_show(struct device *dev, unsigned char status;
if (sdev->channel < myrb_logical_channel(sdev->host)) - return snprintf(buf, 32, "physical device - not rebuilding\n"); + return snprintf(buf, 64, "physical device - not rebuilding\n");
status = myrb_get_rbld_progress(cb, &rbld_buf);
if (rbld_buf.ldev_num != sdev->id || status != MYRB_STATUS_SUCCESS) - return snprintf(buf, 32, "not rebuilding\n"); + return snprintf(buf, 64, "not rebuilding\n");
- return snprintf(buf, 32, "rebuilding block %u of %u\n", + return snprintf(buf, 64, "rebuilding block %u of %u\n", rbld_buf.ldev_size - rbld_buf.blocks_left, rbld_buf.ldev_size); } diff --git a/drivers/scsi/myrs.c b/drivers/scsi/myrs.c index a1eec65a9713f..e824be9d9bbb9 100644 --- a/drivers/scsi/myrs.c +++ b/drivers/scsi/myrs.c @@ -947,9 +947,9 @@ static ssize_t raid_state_show(struct device *dev,
name = myrs_devstate_name(ldev_info->dev_state); if (name) - ret = snprintf(buf, 32, "%s\n", name); + ret = snprintf(buf, 64, "%s\n", name); else - ret = snprintf(buf, 32, "Invalid (%02X)\n", + ret = snprintf(buf, 64, "Invalid (%02X)\n", ldev_info->dev_state); } else { struct myrs_pdev_info *pdev_info; @@ -958,9 +958,9 @@ static ssize_t raid_state_show(struct device *dev, pdev_info = sdev->hostdata; name = myrs_devstate_name(pdev_info->dev_state); if (name) - ret = snprintf(buf, 32, "%s\n", name); + ret = snprintf(buf, 64, "%s\n", name); else - ret = snprintf(buf, 32, "Invalid (%02X)\n", + ret = snprintf(buf, 64, "Invalid (%02X)\n", pdev_info->dev_state); } return ret; @@ -1066,13 +1066,13 @@ static ssize_t raid_level_show(struct device *dev, ldev_info = sdev->hostdata; name = myrs_raid_level_name(ldev_info->raid_level); if (!name) - return snprintf(buf, 32, "Invalid (%02X)\n", + return snprintf(buf, 64, "Invalid (%02X)\n", ldev_info->dev_state);
} else name = myrs_raid_level_name(MYRS_RAID_PHYSICAL);
- return snprintf(buf, 32, "%s\n", name); + return snprintf(buf, 64, "%s\n", name); } static DEVICE_ATTR_RO(raid_level);
@@ -1086,7 +1086,7 @@ static ssize_t rebuild_show(struct device *dev, unsigned char status;
if (sdev->channel < cs->ctlr_info->physchan_present) - return snprintf(buf, 32, "physical device - not rebuilding\n"); + return snprintf(buf, 64, "physical device - not rebuilding\n");
ldev_info = sdev->hostdata; ldev_num = ldev_info->ldev_num; @@ -1098,11 +1098,11 @@ static ssize_t rebuild_show(struct device *dev, return -EIO; } if (ldev_info->rbld_active) { - return snprintf(buf, 32, "rebuilding block %zu of %zu\n", + return snprintf(buf, 64, "rebuilding block %zu of %zu\n", (size_t)ldev_info->rbld_lba, (size_t)ldev_info->cfg_devsize); } else - return snprintf(buf, 32, "not rebuilding\n"); + return snprintf(buf, 64, "not rebuilding\n"); }
static ssize_t rebuild_store(struct device *dev, @@ -1190,7 +1190,7 @@ static ssize_t consistency_check_show(struct device *dev, unsigned short ldev_num;
if (sdev->channel < cs->ctlr_info->physchan_present) - return snprintf(buf, 32, "physical device - not checking\n"); + return snprintf(buf, 64, "physical device - not checking\n");
ldev_info = sdev->hostdata; if (!ldev_info) @@ -1198,11 +1198,11 @@ static ssize_t consistency_check_show(struct device *dev, ldev_num = ldev_info->ldev_num; myrs_get_ldev_info(cs, ldev_num, ldev_info); if (ldev_info->cc_active) - return snprintf(buf, 32, "checking block %zu of %zu\n", + return snprintf(buf, 64, "checking block %zu of %zu\n", (size_t)ldev_info->cc_lba, (size_t)ldev_info->cfg_devsize); else - return snprintf(buf, 32, "not checking\n"); + return snprintf(buf, 64, "not checking\n"); }
static ssize_t consistency_check_store(struct device *dev,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Li Nan linan122@huawei.com
[ Upstream commit 0296bea01cfa6526be6bd2d16dc83b4e7f1af91f ]
"if device_add() succeeds, you should call device_del() when you want to get rid of it."
In sd_probe(), when device_add_disk() fails, device_add() has already succeeded, so change put_device() to device_unregister() to ensure device resources are released.
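For reference, a minimal sketch of the general rule (hypothetical driver code, not sd.c itself): device_unregister() is device_del() followed by put_device(), which is what is required once device_add() has succeeded.

static int example_probe(struct my_dev *mdev)		/* my_dev is hypothetical */
{
	int ret;

	device_initialize(&mdev->dev);
	ret = device_add(&mdev->dev);
	if (ret) {
		put_device(&mdev->dev);			/* add failed: drop the ref only */
		return ret;
	}

	ret = example_later_step(mdev);			/* hypothetical follow-up step */
	if (ret) {
		device_unregister(&mdev->dev);		/* device_del() + put_device() */
		return ret;
	}
	return 0;
}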
Fixes: 2a7a891f4c40 ("scsi: sd: Add error handling support for add_disk()") Signed-off-by: Li Nan linan122@huawei.com Link: https://lore.kernel.org/r/20231208082335.1754205-1-linan666@huaweicloud.com Reviewed-by: Bart Van Assche bvanassche@acm.org Reviewed-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/sd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index e80c33cdad2b9..c62f677084b4c 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3754,7 +3754,7 @@ static int sd_probe(struct device *dev)
error = device_add_disk(dev, gd, NULL); if (error) { - put_device(&sdkp->disk_dev); + device_unregister(&sdkp->disk_dev); put_disk(gd); goto out; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oswald Buddenhagen oswald.buddenhagen@gmx.de
[ Upstream commit 03f56ed4ead162551ac596c9e3076ff01f1c5836 ]
As already anticipated in the original commit, playback was broken for very short samples. I just didn't expect it to be an actual problem, because we're talking about less than 1.5 milliseconds here. But clearly such wavetable samples do actually exist.
The problem was that for such short samples we'd set the current position beyond the end of the loop, so we'd run off the end of the sample and play garbage. This is a bigger (more audible) problem than the original one, which was that we'd start playback with garbage (whatever was still in the cache), which would be mostly masked by the note's attack phase.
So revert to the old behavior for now. We'll subsequently fix it properly with a bigger patch series. Note that this isn't a full revert - the dead code is not re-introduced, because that would be silly.
Fixes: df335e9a8bcb ("ALSA: emu10k1: fix synthesizer sample playback position and caching") Link: https://bugzilla.kernel.org/show_bug.cgi?id=218625 Signed-off-by: Oswald Buddenhagen oswald.buddenhagen@gmx.de Message-ID: 20240401145805.528794-1-oswald.buddenhagen@gmx.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/emu10k1/emu10k1_callback.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/sound/pci/emu10k1/emu10k1_callback.c b/sound/pci/emu10k1/emu10k1_callback.c index d36234b88fb42..941bfbf812ed3 100644 --- a/sound/pci/emu10k1/emu10k1_callback.c +++ b/sound/pci/emu10k1/emu10k1_callback.c @@ -255,7 +255,7 @@ lookup_voices(struct snd_emux *emu, struct snd_emu10k1 *hw, /* check if sample is finished playing (non-looping only) */ if (bp != best + V_OFF && bp != best + V_FREE && (vp->reg.sample_mode & SNDRV_SFNT_SAMPLE_SINGLESHOT)) { - val = snd_emu10k1_ptr_read(hw, CCCA_CURRADDR, vp->ch) - 64; + val = snd_emu10k1_ptr_read(hw, CCCA_CURRADDR, vp->ch); if (val >= vp->reg.loopstart) bp = best + V_OFF; } @@ -362,7 +362,7 @@ start_voice(struct snd_emux_voice *vp)
map = (hw->silent_page.addr << hw->address_mode) | (hw->address_mode ? MAP_PTI_MASK1 : MAP_PTI_MASK0);
- addr = vp->reg.start + 64; + addr = vp->reg.start; temp = vp->reg.parm.filterQ; ccca = (temp << 28) | addr; if (vp->apitch < 0xe400) @@ -430,9 +430,6 @@ start_voice(struct snd_emux_voice *vp) /* Q & current address (Q 4bit value, MSB) */ CCCA, ccca,
- /* cache */ - CCR, REG_VAL_PUT(CCR_CACHEINVALIDSIZE, 64), - /* reset volume */ VTFT, vtarget | vp->ftarget, CVCF, vtarget | CVCF_CURRENTFILTER_MASK,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells dhowells@redhat.com
[ Upstream commit e9e62243a3e2322cf639f653a0b0a88a76446ce7 ]
When we're engaged in local caching of a cifs filesystem, we cannot perform caching of a partially written cache granule unless we can read the rest of the granule. This can result in unexpected access errors being reported to the user.
Fix this by the following: if a file is opened O_WRONLY locally, but the mount was given the "-o fsc" flag, try first opening the remote file with GENERIC_READ|GENERIC_WRITE and if that returns -EACCES, try dropping the GENERIC_READ and doing the open again. If that second open succeeds, invalidate the cache for that file as for O_DIRECT.
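For reference, a standalone sketch of the retry logic described above (hypothetical helper names and placeholder access bits; in the patch the flow lives in cifs_do_create(), cifs_nt_open() and cifs_reopen_file(), tracked in the rdwr_for_fscache variable):

#include <errno.h>
#include <stdio.h>

/* placeholder access bits for the sketch, not the real SMB values */
#define GENERIC_READ	0x1
#define GENERIC_WRITE	0x2

/* stand-in for the server-side open: pretend read access is denied */
static int server_open(unsigned int access)
{
	return (access & GENERIC_READ) ? -EACCES : 0;
}

int main(void)
{
	int rdwr_for_fscache = 1;	/* "-o fsc" mount and O_WRONLY open */
	unsigned int access = GENERIC_READ | GENERIC_WRITE;
	int rc = server_open(access);

	if (rc == -EACCES && rdwr_for_fscache == 1) {
		access &= ~GENERIC_READ;	/* retry as a plain write-only open */
		rdwr_for_fscache = 2;
		rc = server_open(access);
	}
	if (!rc && rdwr_for_fscache == 2)
		printf("write-only open: invalidate the cache as for O_DIRECT\n");
	return rc ? 1 : 0;
}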
Fixes: 70431bfd825d ("cifs: Support fscache indexing rewrite") Signed-off-by: David Howells dhowells@redhat.com cc: Steve French sfrench@samba.org cc: Shyam Prasad N nspmangalore@gmail.com cc: Rohith Surabattula rohiths.msft@gmail.com cc: Jeff Layton jlayton@kernel.org cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/dir.c | 15 +++++++++++++ fs/smb/client/file.c | 48 ++++++++++++++++++++++++++++++++--------- fs/smb/client/fscache.h | 6 ++++++ 3 files changed, 59 insertions(+), 10 deletions(-)
diff --git a/fs/smb/client/dir.c b/fs/smb/client/dir.c index 580a27a3a7e62..855468a32904e 100644 --- a/fs/smb/client/dir.c +++ b/fs/smb/client/dir.c @@ -189,6 +189,7 @@ static int cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned int disposition; struct TCP_Server_Info *server = tcon->ses->server; struct cifs_open_parms oparms; + int rdwr_for_fscache = 0;
*oplock = 0; if (tcon->ses->server->oplocks) @@ -200,6 +201,10 @@ static int cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned return PTR_ERR(full_path); }
+ /* If we're caching, we need to be able to fill in around partial writes. */ + if (cifs_fscache_enabled(inode) && (oflags & O_ACCMODE) == O_WRONLY) + rdwr_for_fscache = 1; + #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY if (tcon->unix_ext && cap_unix(tcon->ses) && !tcon->broken_posix_open && (CIFS_UNIX_POSIX_PATH_OPS_CAP & @@ -276,6 +281,8 @@ static int cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned desired_access |= GENERIC_READ; /* is this too little? */ if (OPEN_FMODE(oflags) & FMODE_WRITE) desired_access |= GENERIC_WRITE; + if (rdwr_for_fscache == 1) + desired_access |= GENERIC_READ;
disposition = FILE_OVERWRITE_IF; if ((oflags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL)) @@ -304,6 +311,7 @@ static int cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned if (!tcon->unix_ext && (mode & S_IWUGO) == 0) create_options |= CREATE_OPTION_READONLY;
+retry_open: oparms = (struct cifs_open_parms) { .tcon = tcon, .cifs_sb = cifs_sb, @@ -317,8 +325,15 @@ static int cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned rc = server->ops->open(xid, &oparms, oplock, buf); if (rc) { cifs_dbg(FYI, "cifs_create returned 0x%x\n", rc); + if (rc == -EACCES && rdwr_for_fscache == 1) { + desired_access &= ~GENERIC_READ; + rdwr_for_fscache = 2; + goto retry_open; + } goto out; } + if (rdwr_for_fscache == 2) + cifs_invalidate_cache(inode, FSCACHE_INVAL_DIO_WRITE);
#ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY /* diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c index c711d5eb2987e..606972a95465b 100644 --- a/fs/smb/client/file.c +++ b/fs/smb/client/file.c @@ -206,12 +206,12 @@ cifs_mark_open_files_invalid(struct cifs_tcon *tcon) */ }
-static inline int cifs_convert_flags(unsigned int flags) +static inline int cifs_convert_flags(unsigned int flags, int rdwr_for_fscache) { if ((flags & O_ACCMODE) == O_RDONLY) return GENERIC_READ; else if ((flags & O_ACCMODE) == O_WRONLY) - return GENERIC_WRITE; + return rdwr_for_fscache == 1 ? (GENERIC_READ | GENERIC_WRITE) : GENERIC_WRITE; else if ((flags & O_ACCMODE) == O_RDWR) { /* GENERIC_ALL is too much permission to request can cause unnecessary access denied on create */ @@ -348,11 +348,16 @@ static int cifs_nt_open(const char *full_path, struct inode *inode, struct cifs_ int create_options = CREATE_NOT_DIR; struct TCP_Server_Info *server = tcon->ses->server; struct cifs_open_parms oparms; + int rdwr_for_fscache = 0;
if (!server->ops->open) return -ENOSYS;
- desired_access = cifs_convert_flags(f_flags); + /* If we're caching, we need to be able to fill in around partial writes. */ + if (cifs_fscache_enabled(inode) && (f_flags & O_ACCMODE) == O_WRONLY) + rdwr_for_fscache = 1; + + desired_access = cifs_convert_flags(f_flags, rdwr_for_fscache);
/********************************************************************* * open flag mapping table: @@ -389,6 +394,7 @@ static int cifs_nt_open(const char *full_path, struct inode *inode, struct cifs_ if (f_flags & O_DIRECT) create_options |= CREATE_NO_BUFFER;
+retry_open: oparms = (struct cifs_open_parms) { .tcon = tcon, .cifs_sb = cifs_sb, @@ -400,8 +406,16 @@ static int cifs_nt_open(const char *full_path, struct inode *inode, struct cifs_ };
rc = server->ops->open(xid, &oparms, oplock, buf); - if (rc) + if (rc) { + if (rc == -EACCES && rdwr_for_fscache == 1) { + desired_access = cifs_convert_flags(f_flags, 0); + rdwr_for_fscache = 2; + goto retry_open; + } return rc; + } + if (rdwr_for_fscache == 2) + cifs_invalidate_cache(inode, FSCACHE_INVAL_DIO_WRITE);
/* TODO: Add support for calling posix query info but with passing in fid */ if (tcon->unix_ext) @@ -834,11 +848,11 @@ int cifs_open(struct inode *inode, struct file *file) use_cache: fscache_use_cookie(cifs_inode_cookie(file_inode(file)), file->f_mode & FMODE_WRITE); - if (file->f_flags & O_DIRECT && - (!((file->f_flags & O_ACCMODE) != O_RDONLY) || - file->f_flags & O_APPEND)) - cifs_invalidate_cache(file_inode(file), - FSCACHE_INVAL_DIO_WRITE); + if (!(file->f_flags & O_DIRECT)) + goto out; + if ((file->f_flags & (O_ACCMODE | O_APPEND)) == O_RDONLY) + goto out; + cifs_invalidate_cache(file_inode(file), FSCACHE_INVAL_DIO_WRITE);
out: free_dentry_path(page); @@ -903,6 +917,7 @@ cifs_reopen_file(struct cifsFileInfo *cfile, bool can_flush) int disposition = FILE_OPEN; int create_options = CREATE_NOT_DIR; struct cifs_open_parms oparms; + int rdwr_for_fscache = 0;
xid = get_xid(); mutex_lock(&cfile->fh_mutex); @@ -966,7 +981,11 @@ cifs_reopen_file(struct cifsFileInfo *cfile, bool can_flush) } #endif /* CONFIG_CIFS_ALLOW_INSECURE_LEGACY */
- desired_access = cifs_convert_flags(cfile->f_flags); + /* If we're caching, we need to be able to fill in around partial writes. */ + if (cifs_fscache_enabled(inode) && (cfile->f_flags & O_ACCMODE) == O_WRONLY) + rdwr_for_fscache = 1; + + desired_access = cifs_convert_flags(cfile->f_flags, rdwr_for_fscache);
/* O_SYNC also has bit for O_DSYNC so following check picks up either */ if (cfile->f_flags & O_SYNC) @@ -978,6 +997,7 @@ cifs_reopen_file(struct cifsFileInfo *cfile, bool can_flush) if (server->ops->get_lease_key) server->ops->get_lease_key(inode, &cfile->fid);
+retry_open: oparms = (struct cifs_open_parms) { .tcon = tcon, .cifs_sb = cifs_sb, @@ -1003,6 +1023,11 @@ cifs_reopen_file(struct cifsFileInfo *cfile, bool can_flush) /* indicate that we need to relock the file */ oparms.reconnect = true; } + if (rc == -EACCES && rdwr_for_fscache == 1) { + desired_access = cifs_convert_flags(cfile->f_flags, 0); + rdwr_for_fscache = 2; + goto retry_open; + }
if (rc) { mutex_unlock(&cfile->fh_mutex); @@ -1011,6 +1036,9 @@ cifs_reopen_file(struct cifsFileInfo *cfile, bool can_flush) goto reopen_error_exit; }
+ if (rdwr_for_fscache == 2) + cifs_invalidate_cache(inode, FSCACHE_INVAL_DIO_WRITE); + #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY reopen_success: #endif /* CONFIG_CIFS_ALLOW_INSECURE_LEGACY */ diff --git a/fs/smb/client/fscache.h b/fs/smb/client/fscache.h index a3d73720914f8..1f2ea9f5cc9a8 100644 --- a/fs/smb/client/fscache.h +++ b/fs/smb/client/fscache.h @@ -109,6 +109,11 @@ static inline void cifs_readahead_to_fscache(struct inode *inode, __cifs_readahead_to_fscache(inode, pos, len); }
+static inline bool cifs_fscache_enabled(struct inode *inode) +{ + return fscache_cookie_enabled(cifs_inode_cookie(inode)); +} + #else /* CONFIG_CIFS_FSCACHE */ static inline void cifs_fscache_fill_coherency(struct inode *inode, @@ -124,6 +129,7 @@ static inline void cifs_fscache_release_inode_cookie(struct inode *inode) {} static inline void cifs_fscache_unuse_inode_cookie(struct inode *inode, bool update) {} static inline struct fscache_cookie *cifs_inode_cookie(struct inode *inode) { return NULL; } static inline void cifs_invalidate_cache(struct inode *inode, unsigned int flags) {} +static inline bool cifs_fscache_enabled(struct inode *inode) { return false; }
static inline int cifs_fscache_query_occupancy(struct inode *inode, pgoff_t first, unsigned int nr_pages,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Huai-Yuan Liu qq810974084@gmail.com
[ Upstream commit 1f886a7bfb3faf4c1021e73f045538008ce7634e ]
In function pci1xxxx_spi_probe, there is a potential null pointer that may be caused by a failed memory allocation by the function devm_kzalloc. Hence, a null pointer check needs to be added to prevent null pointer dereferencing later in the code.
To fix this issue, spi_bus->spi_int[iter] should be checked. The memory allocated by devm_kzalloc will be automatically released, so just directly return -ENOMEM without worrying about memory leaks.
Fixes: 1cc0cbea7167 ("spi: microchip: pci1xxxx: Add driver for SPI controller of PCI1XXXX PCIe switch") Signed-off-by: Huai-Yuan Liu qq810974084@gmail.com Link: https://msgid.link/r/20240403014221.969801-1-qq810974084@gmail.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-pci1xxxx.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/spi/spi-pci1xxxx.c b/drivers/spi/spi-pci1xxxx.c index 3638e974f5d49..06bf58b7e5d72 100644 --- a/drivers/spi/spi-pci1xxxx.c +++ b/drivers/spi/spi-pci1xxxx.c @@ -275,6 +275,8 @@ static int pci1xxxx_spi_probe(struct pci_dev *pdev, const struct pci_device_id * spi_bus->spi_int[iter] = devm_kzalloc(&pdev->dev, sizeof(struct pci1xxxx_spi_internal), GFP_KERNEL); + if (!spi_bus->spi_int[iter]) + return -ENOMEM; spi_sub_ptr = spi_bus->spi_int[iter]; spi_sub_ptr->spi_host = devm_spi_alloc_host(dev, sizeof(struct spi_controller)); if (!spi_sub_ptr->spi_host)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Richter tmricht@linux.ibm.com
[ Upstream commit b286997e83dcf7b498329a66a8a22fc8a5bf50f0 ]
The event count value is initialized and set to zero in function paicrypt_start(). This function is called once per CPU when an event is started on that CPU. This leads to the event count value being set to zero as many times as there are online CPUs. This is not necessary. The event count value is bound to the event and it is sufficient to initialize the event counter once at event creation time. This is done when the event structure is dynamically allocated with the __GFP_ZERO flag. This sets member count to zero.
Acked-by: Sumanth Korikkar sumanthk@linux.ibm.com Signed-off-by: Thomas Richter tmricht@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Stable-dep-of: e9f3af02f639 ("s390/pai: fix sampling event removal for PMU device driver") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/kernel/perf_pai_crypto.c | 1 - arch/s390/kernel/perf_pai_ext.c | 1 - 2 files changed, 2 deletions(-)
diff --git a/arch/s390/kernel/perf_pai_crypto.c b/arch/s390/kernel/perf_pai_crypto.c index 4a4e914c283c8..0921cea849125 100644 --- a/arch/s390/kernel/perf_pai_crypto.c +++ b/arch/s390/kernel/perf_pai_crypto.c @@ -253,7 +253,6 @@ static void paicrypt_start(struct perf_event *event, int flags) if (!event->hw.last_tag) { event->hw.last_tag = 1; sum = paicrypt_getall(event); /* Get current value */ - local64_set(&event->count, 0); local64_set(&event->hw.prev_count, sum); } } diff --git a/arch/s390/kernel/perf_pai_ext.c b/arch/s390/kernel/perf_pai_ext.c index b5febe22d0546..ac32107167eac 100644 --- a/arch/s390/kernel/perf_pai_ext.c +++ b/arch/s390/kernel/perf_pai_ext.c @@ -327,7 +327,6 @@ static void paiext_start(struct perf_event *event, int flags) event->hw.last_tag = 1; sum = paiext_getall(event); /* Get current value */ local64_set(&event->hw.prev_count, sum); - local64_set(&event->count, 0); }
static int paiext_add(struct perf_event *event, int flags)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Richter tmricht@linux.ibm.com
[ Upstream commit aecd5a37b5ef4de4f6402dc079672e4243cc4c13 ]
Function paicrypt_event_init() initializes the PMU device driver specific details for an event. It is called once per event creation. The function paicrypt_event_init() is not necessarily executed on the CPU the event will be used on. When an event is activated, function paicrypt_start() is used to start the event on that CPU. The per-CPU data structure struct paicrypt_map has a pointer to the event which is active for a particular CPU. This pointer is set in function paicrypt_start() to point to the currently installed event. There is no need to also set this pointer in function paicrypt_event_init(), where it might be assigned to the wrong CPU. Therefore remove this assignment in paicrypt_event_init().
Acked-by: Sumanth Korikkar sumanthk@linux.ibm.com Signed-off-by: Thomas Richter tmricht@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Stable-dep-of: e9f3af02f639 ("s390/pai: fix sampling event removal for PMU device driver") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/kernel/perf_pai_crypto.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/arch/s390/kernel/perf_pai_crypto.c b/arch/s390/kernel/perf_pai_crypto.c index 0921cea849125..1ac74333a78dc 100644 --- a/arch/s390/kernel/perf_pai_crypto.c +++ b/arch/s390/kernel/perf_pai_crypto.c @@ -216,7 +216,6 @@ static int paicrypt_event_init(struct perf_event *event) * are active at the same time. */ event->hw.last_tag = 0; - cpump->event = event; event->destroy = paicrypt_event_destroy;
if (a->sample_period) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Richter tmricht@linux.ibm.com
[ Upstream commit 4711b7b8f99583f6105a33e91f106125134beacb ]
Setting event::hw.last_tag to zero is not necessary. The memory for each event is dynamically allocated by the kernel common code and initialized to zero already. Remove this unnecessary assignment. Move the comment to function paicrypt_start() for clarification.
Suggested-by: Sumanth Korikkar sumanthk@linux.ibm.com Acked-by: Sumanth Korikkar sumanthk@linux.ibm.com Signed-off-by: Thomas Richter tmricht@linux.ibm.com Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Stable-dep-of: e9f3af02f639 ("s390/pai: fix sampling event removal for PMU device driver") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/kernel/perf_pai_crypto.c | 11 +++++------ arch/s390/kernel/perf_pai_ext.c | 1 - 2 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/arch/s390/kernel/perf_pai_crypto.c b/arch/s390/kernel/perf_pai_crypto.c index 1ac74333a78dc..270255acacb02 100644 --- a/arch/s390/kernel/perf_pai_crypto.c +++ b/arch/s390/kernel/perf_pai_crypto.c @@ -210,12 +210,6 @@ static int paicrypt_event_init(struct perf_event *event) if (rc) return rc;
- /* Event initialization sets last_tag to 0. When later on the events - * are deleted and re-added, do not reset the event count value to zero. - * Events are added, deleted and re-added when 2 or more events - * are active at the same time. - */ - event->hw.last_tag = 0; event->destroy = paicrypt_event_destroy;
if (a->sample_period) { @@ -249,6 +243,11 @@ static void paicrypt_start(struct perf_event *event, int flags) { u64 sum;
+ /* Event initialization sets last_tag to 0. When later on the events + * are deleted and re-added, do not reset the event count value to zero. + * Events are added, deleted and re-added when 2 or more events + * are active at the same time. + */ if (!event->hw.last_tag) { event->hw.last_tag = 1; sum = paicrypt_getall(event); /* Get current value */ diff --git a/arch/s390/kernel/perf_pai_ext.c b/arch/s390/kernel/perf_pai_ext.c index ac32107167eac..8fddde11cfb1f 100644 --- a/arch/s390/kernel/perf_pai_ext.c +++ b/arch/s390/kernel/perf_pai_ext.c @@ -261,7 +261,6 @@ static int paiext_event_init(struct perf_event *event) rc = paiext_alloc(a, event); if (rc) return rc; - event->hw.last_tag = 0; event->destroy = paiext_event_destroy;
if (a->sample_period) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Richter tmricht@linux.ibm.com
[ Upstream commit cb1259b7b574bd90ef22dac2c6282327cdae31c6 ]
The start and stop functions of the PAI crypto counter and the PAI NNPA counters are streamlined. Move the conditions for invoking the start and stop functions into their respective function bodies and call them unconditionally. The start and stop functions now determine how to proceed. No functional change.
Signed-off-by: Thomas Richter tmricht@linux.ibm.com Acked-by: Mete Durlu meted@linux.ibm.com Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Stable-dep-of: e9f3af02f639 ("s390/pai: fix sampling event removal for PMU device driver") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/kernel/perf_pai_crypto.c | 29 ++++++++++++------------- arch/s390/kernel/perf_pai_ext.c | 35 ++++++++++++++---------------- 2 files changed, 30 insertions(+), 34 deletions(-)
diff --git a/arch/s390/kernel/perf_pai_crypto.c b/arch/s390/kernel/perf_pai_crypto.c index 270255acacb02..65cc74ab4cdd8 100644 --- a/arch/s390/kernel/perf_pai_crypto.c +++ b/arch/s390/kernel/perf_pai_crypto.c @@ -248,10 +248,14 @@ static void paicrypt_start(struct perf_event *event, int flags) * Events are added, deleted and re-added when 2 or more events * are active at the same time. */ - if (!event->hw.last_tag) { - event->hw.last_tag = 1; - sum = paicrypt_getall(event); /* Get current value */ - local64_set(&event->hw.prev_count, sum); + if (!event->attr.sample_period) { /* Counting */ + if (!event->hw.last_tag) { + event->hw.last_tag = 1; + sum = paicrypt_getall(event); /* Get current value */ + local64_set(&event->hw.prev_count, sum); + } + } else { /* Sampling */ + perf_sched_cb_inc(event->pmu); } }
@@ -266,19 +270,18 @@ static int paicrypt_add(struct perf_event *event, int flags) __ctl_set_bit(0, 50); } cpump->event = event; - if (flags & PERF_EF_START && !event->attr.sample_period) { - /* Only counting needs initial counter value */ + if (flags & PERF_EF_START) paicrypt_start(event, PERF_EF_RELOAD); - } event->hw.state = 0; - if (event->attr.sample_period) - perf_sched_cb_inc(event->pmu); return 0; }
static void paicrypt_stop(struct perf_event *event, int flags) { - paicrypt_read(event); + if (!event->attr.sample_period) /* Counting */ + paicrypt_read(event); + else /* Sampling */ + perf_sched_cb_dec(event->pmu); event->hw.state = PERF_HES_STOPPED; }
@@ -286,11 +289,7 @@ static void paicrypt_del(struct perf_event *event, int flags) { struct paicrypt_map *cpump = this_cpu_ptr(&paicrypt_map);
- if (event->attr.sample_period) - perf_sched_cb_dec(event->pmu); - if (!event->attr.sample_period) - /* Only counting needs to read counter */ - paicrypt_stop(event, PERF_EF_UPDATE); + paicrypt_stop(event, PERF_EF_UPDATE); if (--cpump->active_events == 0) { __ctl_clear_bit(0, 50); WRITE_ONCE(S390_lowcore.ccd, 0); diff --git a/arch/s390/kernel/perf_pai_ext.c b/arch/s390/kernel/perf_pai_ext.c index 8fddde11cfb1f..bac95261ec46d 100644 --- a/arch/s390/kernel/perf_pai_ext.c +++ b/arch/s390/kernel/perf_pai_ext.c @@ -321,11 +321,15 @@ static void paiext_start(struct perf_event *event, int flags) { u64 sum;
- if (event->hw.last_tag) - return; - event->hw.last_tag = 1; - sum = paiext_getall(event); /* Get current value */ - local64_set(&event->hw.prev_count, sum); + if (!event->attr.sample_period) { /* Counting */ + if (!event->hw.last_tag) { + event->hw.last_tag = 1; + sum = paiext_getall(event); /* Get current value */ + local64_set(&event->hw.prev_count, sum); + } + } else { /* Sampling */ + perf_sched_cb_inc(event->pmu); + } }
static int paiext_add(struct perf_event *event, int flags) @@ -342,21 +346,19 @@ static int paiext_add(struct perf_event *event, int flags) debug_sprintf_event(paiext_dbg, 4, "%s 1508 %llx acc %llx\n", __func__, S390_lowcore.aicd, pcb->acc); } - if (flags & PERF_EF_START && !event->attr.sample_period) { - /* Only counting needs initial counter value */ + cpump->event = event; + if (flags & PERF_EF_START) paiext_start(event, PERF_EF_RELOAD); - } event->hw.state = 0; - if (event->attr.sample_period) { - cpump->event = event; - perf_sched_cb_inc(event->pmu); - } return 0; }
static void paiext_stop(struct perf_event *event, int flags) { - paiext_read(event); + if (!event->attr.sample_period) /* Counting */ + paiext_read(event); + else /* Sampling */ + perf_sched_cb_dec(event->pmu); event->hw.state = PERF_HES_STOPPED; }
@@ -366,12 +368,7 @@ static void paiext_del(struct perf_event *event, int flags) struct paiext_map *cpump = mp->mapptr; struct paiext_cb *pcb = cpump->paiext_cb;
- if (event->attr.sample_period) - perf_sched_cb_dec(event->pmu); - if (!event->attr.sample_period) { - /* Only counting needs to read counter */ - paiext_stop(event, PERF_EF_UPDATE); - } + paiext_stop(event, PERF_EF_UPDATE); if (--cpump->active_events == 0) { /* Disable CPU instruction lookup for PAIE1 control block */ __ctl_clear_bit(0, 49);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Richter tmricht@linux.ibm.com
[ Upstream commit e9f3af02f63909f41b43c28330434cc437639c5c ]
In case of a sampling event, the PAI PMU device drivers need a reference to this event. Currently the PMU device driver reference is removed when a sampling event is destroyed. This may lead to situations where the reference held by the PMU device driver is removed while it is being used by a different sampling event. Reset the event reference pointer of the PMU device driver when a sampling event is deleted and before the next one might be added.
Fixes: 39d62336f5c1 ("s390/pai: add support for cryptography counters") Signed-off-by: Thomas Richter tmricht@linux.ibm.com Acked-by: Sumanth Korikkar sumanthk@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/kernel/perf_pai_crypto.c | 10 +++++++--- arch/s390/kernel/perf_pai_ext.c | 10 +++++++--- 2 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/arch/s390/kernel/perf_pai_crypto.c b/arch/s390/kernel/perf_pai_crypto.c index 65cc74ab4cdd8..1eefbe2ff4189 100644 --- a/arch/s390/kernel/perf_pai_crypto.c +++ b/arch/s390/kernel/perf_pai_crypto.c @@ -53,7 +53,6 @@ static void paicrypt_event_destroy(struct perf_event *event) { struct paicrypt_map *cpump = per_cpu_ptr(&paicrypt_map, event->cpu);
- cpump->event = NULL; static_branch_dec(&pai_key); mutex_lock(&pai_reserve_mutex); debug_sprintf_event(cfm_dbg, 5, "%s event %#llx cpu %d users %d" @@ -278,10 +277,15 @@ static int paicrypt_add(struct perf_event *event, int flags)
static void paicrypt_stop(struct perf_event *event, int flags) { - if (!event->attr.sample_period) /* Counting */ + struct paicrypt_mapptr *mp = this_cpu_ptr(paicrypt_root.mapptr); + struct paicrypt_map *cpump = mp->mapptr; + + if (!event->attr.sample_period) { /* Counting */ paicrypt_read(event); - else /* Sampling */ + } else { /* Sampling */ perf_sched_cb_dec(event->pmu); + cpump->event = NULL; + } event->hw.state = PERF_HES_STOPPED; }
diff --git a/arch/s390/kernel/perf_pai_ext.c b/arch/s390/kernel/perf_pai_ext.c index bac95261ec46d..a9235071ca70b 100644 --- a/arch/s390/kernel/perf_pai_ext.c +++ b/arch/s390/kernel/perf_pai_ext.c @@ -122,7 +122,6 @@ static void paiext_event_destroy(struct perf_event *event) struct paiext_map *cpump = mp->mapptr;
mutex_lock(&paiext_reserve_mutex); - cpump->event = NULL; if (refcount_dec_and_test(&cpump->refcnt)) /* Last reference gone */ paiext_free(mp); paiext_root_free(); @@ -355,10 +354,15 @@ static int paiext_add(struct perf_event *event, int flags)
static void paiext_stop(struct perf_event *event, int flags) { - if (!event->attr.sample_period) /* Counting */ + struct paiext_mapptr *mp = this_cpu_ptr(paiext_root.mapptr); + struct paiext_map *cpump = mp->mapptr; + + if (!event->attr.sample_period) { /* Counting */ paiext_read(event); - else /* Sampling */ + } else { /* Sampling */ perf_sched_cb_dec(event->pmu); + cpump->event = NULL; + } event->hw.state = PERF_HES_STOPPED; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit 3137b83a90646917c90951d66489db466b4ae106 ]
Building with W=1 shows a warning for an unused variable when CONFIG_PCI is disabled:
drivers/ata/sata_mv.c:790:35: error: unused variable 'mv_pci_tbl' [-Werror,-Wunused-const-variable] static const struct pci_device_id mv_pci_tbl[] = {
Move the table into the same block that contains the pci_driver definition.
Fixes: 7bb3c5290ca0 ("sata_mv: Remove PCI dependency") Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Damien Le Moal dlemoal@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ata/sata_mv.c | 63 +++++++++++++++++++++---------------------- 1 file changed, 31 insertions(+), 32 deletions(-)
diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c index 45e48d653c60b..80a45e11fb5b6 100644 --- a/drivers/ata/sata_mv.c +++ b/drivers/ata/sata_mv.c @@ -787,37 +787,6 @@ static const struct ata_port_info mv_port_info[] = { }, };
-static const struct pci_device_id mv_pci_tbl[] = { - { PCI_VDEVICE(MARVELL, 0x5040), chip_504x }, - { PCI_VDEVICE(MARVELL, 0x5041), chip_504x }, - { PCI_VDEVICE(MARVELL, 0x5080), chip_5080 }, - { PCI_VDEVICE(MARVELL, 0x5081), chip_508x }, - /* RocketRAID 1720/174x have different identifiers */ - { PCI_VDEVICE(TTI, 0x1720), chip_6042 }, - { PCI_VDEVICE(TTI, 0x1740), chip_6042 }, - { PCI_VDEVICE(TTI, 0x1742), chip_6042 }, - - { PCI_VDEVICE(MARVELL, 0x6040), chip_604x }, - { PCI_VDEVICE(MARVELL, 0x6041), chip_604x }, - { PCI_VDEVICE(MARVELL, 0x6042), chip_6042 }, - { PCI_VDEVICE(MARVELL, 0x6080), chip_608x }, - { PCI_VDEVICE(MARVELL, 0x6081), chip_608x }, - - { PCI_VDEVICE(ADAPTEC2, 0x0241), chip_604x }, - - /* Adaptec 1430SA */ - { PCI_VDEVICE(ADAPTEC2, 0x0243), chip_7042 }, - - /* Marvell 7042 support */ - { PCI_VDEVICE(MARVELL, 0x7042), chip_7042 }, - - /* Highpoint RocketRAID PCIe series */ - { PCI_VDEVICE(TTI, 0x2300), chip_7042 }, - { PCI_VDEVICE(TTI, 0x2310), chip_7042 }, - - { } /* terminate list */ -}; - static const struct mv_hw_ops mv5xxx_ops = { .phy_errata = mv5_phy_errata, .enable_leds = mv5_enable_leds, @@ -4300,6 +4269,36 @@ static int mv_pci_init_one(struct pci_dev *pdev, static int mv_pci_device_resume(struct pci_dev *pdev); #endif
+static const struct pci_device_id mv_pci_tbl[] = { + { PCI_VDEVICE(MARVELL, 0x5040), chip_504x }, + { PCI_VDEVICE(MARVELL, 0x5041), chip_504x }, + { PCI_VDEVICE(MARVELL, 0x5080), chip_5080 }, + { PCI_VDEVICE(MARVELL, 0x5081), chip_508x }, + /* RocketRAID 1720/174x have different identifiers */ + { PCI_VDEVICE(TTI, 0x1720), chip_6042 }, + { PCI_VDEVICE(TTI, 0x1740), chip_6042 }, + { PCI_VDEVICE(TTI, 0x1742), chip_6042 }, + + { PCI_VDEVICE(MARVELL, 0x6040), chip_604x }, + { PCI_VDEVICE(MARVELL, 0x6041), chip_604x }, + { PCI_VDEVICE(MARVELL, 0x6042), chip_6042 }, + { PCI_VDEVICE(MARVELL, 0x6080), chip_608x }, + { PCI_VDEVICE(MARVELL, 0x6081), chip_608x }, + + { PCI_VDEVICE(ADAPTEC2, 0x0241), chip_604x }, + + /* Adaptec 1430SA */ + { PCI_VDEVICE(ADAPTEC2, 0x0243), chip_7042 }, + + /* Marvell 7042 support */ + { PCI_VDEVICE(MARVELL, 0x7042), chip_7042 }, + + /* Highpoint RocketRAID PCIe series */ + { PCI_VDEVICE(TTI, 0x2300), chip_7042 }, + { PCI_VDEVICE(TTI, 0x2310), chip_7042 }, + + { } /* terminate list */ +};
static struct pci_driver mv_pci_driver = { .name = DRV_NAME, @@ -4312,6 +4311,7 @@ static struct pci_driver mv_pci_driver = { #endif
}; +MODULE_DEVICE_TABLE(pci, mv_pci_tbl);
/** * mv_print_info - Dump key info to kernel log for perusal. @@ -4484,7 +4484,6 @@ static void __exit mv_exit(void) MODULE_AUTHOR("Brett Russ"); MODULE_DESCRIPTION("SCSI low-level driver for Marvell SATA controllers"); MODULE_LICENSE("GPL v2"); -MODULE_DEVICE_TABLE(pci, mv_pci_tbl); MODULE_VERSION(DRV_VERSION); MODULE_ALIAS("platform:" DRV_NAME);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vijendar Mukunda Vijendar.Mukunda@amd.com
[ Upstream commit b9846a386734e73a1414950ebfd50f04919f5e24 ]
Before the ACP firmware is loaded, DSP interrupts are not expected. Sometimes after a reboot, a false DSP interrupt is observed before the ACP firmware has been loaded. Registering the interrupt handler before ACP initialization causes these false interrupts on reboot because ACP reset has not been applied. Correct the sequence by invoking the ACP initialization sequence prior to registering the interrupt handler.
Fixes: 738a2b5e2cc9 ("ASoC: SOF: amd: Add IPC support for ACP IP block") Signed-off-by: Vijendar Mukunda Vijendar.Mukunda@amd.com Link: https://msgid.link/r/20240404041717.430545-1-Vijendar.Mukunda@amd.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/sof/amd/acp.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/sound/soc/sof/amd/acp.c b/sound/soc/sof/amd/acp.c index 4c54ce212de6a..cc006d7038d97 100644 --- a/sound/soc/sof/amd/acp.c +++ b/sound/soc/sof/amd/acp.c @@ -522,6 +522,10 @@ int amd_sof_acp_probe(struct snd_sof_dev *sdev) goto unregister_dev; }
+ ret = acp_init(sdev); + if (ret < 0) + goto free_smn_dev; + sdev->ipc_irq = pci->irq; ret = request_threaded_irq(sdev->ipc_irq, acp_irq_handler, acp_irq_thread, IRQF_SHARED, "AudioDSP", sdev); @@ -531,10 +535,6 @@ int amd_sof_acp_probe(struct snd_sof_dev *sdev) goto free_smn_dev; }
- ret = acp_init(sdev); - if (ret < 0) - goto free_ipc_irq; - sdev->dsp_box.offset = 0; sdev->dsp_box.size = BOX_SIZE_512;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 05258a0a69b3c5d2c003f818702c0a52b6fea861 ]
Jan Schunk reports that his small NFS servers suffer from memory exhaustion after just a few days. A bisect shows that commit e18e157bb5c8 ("SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call") is the first bad commit.
That commit assumed that sock_sendmsg() releases all the pages in the underlying bio_vec array, but the reality is that it doesn't. svc_xprt_release() releases the rqst's response pages, but the record marker page fragment isn't one of those, so it is never released.
This is a narrow fix that can be applied to stable kernels. A more extensive fix is in the works.
Reported-by: Jan Schunk scpcom@gmx.de Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218671 Fixes: e18e157bb5c8 ("SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call") Cc: Alexander Duyck alexander.duyck@gmail.com Cc: Jakub Kacinski kuba@kernel.org Cc: David Howells dhowells@redhat.com Reviewed-by: David Howells dhowells@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/svcsock.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index e0ce4276274be..933e12e3a55c7 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -1216,15 +1216,6 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp) * MSG_SPLICE_PAGES is used exclusively to reduce the number of * copy operations in this path. Therefore the caller must ensure * that the pages backing @xdr are unchanging. - * - * Note that the send is non-blocking. The caller has incremented - * the reference count on each page backing the RPC message, and - * the network layer will "put" these pages when transmission is - * complete. - * - * This is safe for our RPC services because the memory backing - * the head and tail components is never kmalloc'd. These always - * come from pages in the svc_rqst::rq_pages array. */ static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp, rpc_fraghdr marker, unsigned int *sentp) @@ -1254,6 +1245,7 @@ static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp, iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec, 1 + count, sizeof(marker) + rqstp->rq_res.len); ret = sock_sendmsg(svsk->sk_sock, &msg); + page_frag_free(buf); if (ret < 0) return ret; *sentp += ret;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Ghiti alexghiti@rivosinc.com
[ Upstream commit a370c2419e4680a27382d9231edcf739d5d74efc ]
patch_map() uses fixmap mappings to circumvent the non-writability of the kernel text mapping.
The __set_fixmap() function only flushes the current CPU's TLB; it does not emit an IPI, so we must make sure that while we use a fixmap mapping, the current task is not migrated to another CPU, which could miss the newly introduced fixmap mapping.
So, in order to avoid any task migration, disable preemption.
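For reference, a minimal sketch of the resulting pattern, eliding the across-page handling (hypothetical wrapper function; the real code lives in __patch_insn_write() and __patch_insn_set()):

/* Sketch only, assuming the existing patch_map()/patch_unmap() helpers. */
static int example_text_poke(void *addr, const void *insn, size_t len)
{
	void *waddr;
	int ret;

	preempt_disable();	/* no migration while the fixmap slot is in use */
	waddr = patch_map(addr, FIX_TEXT_POKE0);
	ret = copy_to_kernel_nofault(waddr, insn, len);
	patch_unmap(FIX_TEXT_POKE0);
	preempt_enable();

	return ret;
}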
Reported-by: Andrea Parri andrea@rivosinc.com Closes: https://lore.kernel.org/all/ZcS+GAaM25LXsBOl@andrea/ Reported-by: Andy Chiu andy.chiu@sifive.com Closes: https://lore.kernel.org/linux-riscv/CABgGipUMz3Sffu-CkmeUB1dKVwVQ73+7=sgC45-... Fixes: cad539baa48f ("riscv: implement a memset like function for text") Fixes: 0ff7c3b33127 ("riscv: Use text_mutex instead of patch_lock") Co-developed-by: Andy Chiu andy.chiu@sifive.com Signed-off-by: Andy Chiu andy.chiu@sifive.com Signed-off-by: Alexandre Ghiti alexghiti@rivosinc.com Acked-by: Puranjay Mohan puranjay12@gmail.com Link: https://lore.kernel.org/r/20240326203017.310422-3-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/kernel/patch.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c index 37e87fdcf6a00..30e12b310cab7 100644 --- a/arch/riscv/kernel/patch.c +++ b/arch/riscv/kernel/patch.c @@ -80,6 +80,8 @@ static int __patch_insn_set(void *addr, u8 c, size_t len) */ lockdep_assert_held(&text_mutex);
+ preempt_disable(); + if (across_pages) patch_map(addr + PAGE_SIZE, FIX_TEXT_POKE1);
@@ -92,6 +94,8 @@ static int __patch_insn_set(void *addr, u8 c, size_t len) if (across_pages) patch_unmap(FIX_TEXT_POKE1);
+ preempt_enable(); + return 0; } NOKPROBE_SYMBOL(__patch_insn_set); @@ -122,6 +126,8 @@ static int __patch_insn_write(void *addr, const void *insn, size_t len) if (!riscv_patch_in_stop_machine) lockdep_assert_held(&text_mutex);
+ preempt_disable(); + if (across_pages) patch_map(addr + PAGE_SIZE, FIX_TEXT_POKE1);
@@ -134,6 +140,8 @@ static int __patch_insn_write(void *addr, const void *insn, size_t len) if (across_pages) patch_unmap(FIX_TEXT_POKE1);
+ preempt_enable(); + return ret; } NOKPROBE_SYMBOL(__patch_insn_write);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 10396f4df8b75ff6ab0aa2cd74296565466f2c8d ]
Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the client. While a callback job is technically an RPC, that counter is really more for client-driven RPCs, and this has the effect of preventing the client from being unhashed until the callback completes.
If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we can end up in a situation where the callback can't complete on the (now dead) callback channel, but the new client can't connect because the old client can't be unhashed. This usually manifests as a NFS4ERR_DELAY return on the CREATE_SESSION operation.
The job is only holding a reference to the client so it can clear a flag after the RPC completes. Fix this by having CB_RECALL_ANY instead hold a reference to the cl_nfsdfs.cl_ref. Typically we only take that sort of reference when dealing with the nfsdfs info files, but it should work appropriately here to ensure that the nfs4_client doesn't disappear.
Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition") Reported-by: Vladimir Benes vbenes@redhat.com Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 522596060252f..c7e52d980cd75 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2886,12 +2886,9 @@ static void nfsd4_cb_recall_any_release(struct nfsd4_callback *cb) { struct nfs4_client *clp = cb->cb_clp; - struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
- spin_lock(&nn->client_lock); clear_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags); - put_client_renew_locked(clp); - spin_unlock(&nn->client_lock); + drop_client(clp); }
static const struct nfsd4_callback_ops nfsd4_cb_recall_any_ops = { @@ -6273,7 +6270,7 @@ deleg_reaper(struct nfsd_net *nn) list_add(&clp->cl_ra_cblist, &cblist);
/* release in nfsd4_cb_recall_any_release */ - atomic_inc(&clp->cl_rpc_users); + kref_get(&clp->cl_nfsdfs.cl_ref); set_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags); clp->cl_ra_time = ktime_get_boottime_seconds(); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jesse Brandeburg jesse.brandeburg@intel.com
commit 6c5b6ca7642f2992502a22dbd8b80927de174b67 upstream.
Fix an obviously incorrect assignment, created with a typo or cut-n-paste error.
Fixes: 5995ef88e3a8 ("ice: realloc VSI stats arrays") Signed-off-by: Jesse Brandeburg jesse.brandeburg@intel.com Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Paul Menzel pmenzel@molgen.mpg.de Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/intel/ice/ice_lib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -3111,7 +3111,7 @@ ice_vsi_realloc_stat_arrays(struct ice_v } }
- tx_ring_stats = vsi_stat->rx_ring_stats; + tx_ring_stats = vsi_stat->tx_ring_stats; vsi_stat->tx_ring_stats = krealloc_array(vsi_stat->tx_ring_stats, req_txq, sizeof(*vsi_stat->tx_ring_stats),
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov (AMD) bp@alien8.de
commit b377c66ae3509ccea596512d6afb4777711c4870 upstream.
srso_alias_untrain_ret() is special code, even if it is a dummy which is called in the !SRSO case, so annotate it like its real counterpart, to address the following objtool splat:
vmlinux.o: warning: objtool: .export_symbol+0x2b290: data relocation to !ENDBR: srso_alias_untrain_ret+0x0
Fixes: 4535e1a4174c ("x86/bugs: Fix the SRSO mitigation on Zen3/4") Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Signed-off-by: Ingo Molnar mingo@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Link: https://lore.kernel.org/r/20240405144637.17908-1-bp@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/lib/retpoline.S | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/lib/retpoline.S +++ b/arch/x86/lib/retpoline.S @@ -223,6 +223,7 @@ SYM_CODE_END(srso_return_thunk) /* Dummy for the alternative in CALL_UNTRAIN_RET. */ SYM_CODE_START(srso_alias_untrain_ret) ANNOTATE_UNRET_SAFE + ANNOTATE_NOENDBR ret int3 SYM_FUNC_END(srso_alias_untrain_ret)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bartosz Golaszewski bartosz.golaszewski@linaro.org
commit b3b95964590a3d756d69ea8604c856de805479ad upstream.
We need to take into account that a line's consumer label may be NULL. In that case, don't try to kstrdup() it but rather pass the NULL pointer up the stack to the interrupt request function.
To that end, let make_irq_label() return NULL as a valid value and use ERR_PTR() to signal an allocation failure to callers.
Cc: stable@vger.kernel.org Fixes: b34490879baa ("gpio: cdev: sanitize the label before requesting the interrupt") Reported-by: Linux Kernel Functional Testing lkft@linaro.org Closes: https://lore.kernel.org/lkml/20240402093534.212283-1-naresh.kamboju@linaro.o... Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Tested-by: Anders Roxell anders.roxell@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpiolib-cdev.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-)
--- a/drivers/gpio/gpiolib-cdev.c +++ b/drivers/gpio/gpiolib-cdev.c @@ -1012,7 +1012,16 @@ static u32 gpio_v2_line_config_debounce_
static inline char *make_irq_label(const char *orig) { - return kstrdup_and_replace(orig, '/', ':', GFP_KERNEL); + char *new; + + if (!orig) + return NULL; + + new = kstrdup_and_replace(orig, '/', ':', GFP_KERNEL); + if (!new) + return ERR_PTR(-ENOMEM); + + return new; }
static inline void free_irq_label(const char *label) @@ -1086,8 +1095,8 @@ static int edge_detector_setup(struct li irqflags |= IRQF_ONESHOT;
label = make_irq_label(line->req->label); - if (!label) - return -ENOMEM; + if (IS_ERR(label)) + return PTR_ERR(label);
/* Request a thread to read the events */ ret = request_threaded_irq(irq, edge_irq_handler, edge_irq_thread, @@ -2194,8 +2203,8 @@ static int lineevent_create(struct gpio_ goto out_free_le;
label = make_irq_label(le->label); - if (!label) { - ret = -ENOMEM; + if (IS_ERR(label)) { + ret = PTR_ERR(label); goto out_free_le; }
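For readers skimming the diff: the net result is a three-way return convention for make_irq_label(). A minimal caller-side sketch (identifiers taken from the patch; "handler" stands in for the per-path IRQ handler; this is not an additional change):

	label = make_irq_label(line->req->label);
	if (IS_ERR(label))			/* only a failed allocation maps to an error */
		return PTR_ERR(label);
	/* label may legitimately be NULL here, meaning "no consumer label" */
	ret = request_irq(irq, handler, irqflags, label, line);
	if (ret)
		free_irq_label(label);		/* kfree(NULL) is a no-op */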
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kent Gibson warthog618@gmail.com
commit 83092341e15d0dfee1caa8dc502f66c815ccd78a upstream.
When adding sanitization of the label, the path through edge_detector_setup() that leads to debounce_setup() was overlooked. A request taking this path does not allocate a new label and the request label is freed twice when the request is released, resulting in memory corruption.
Add label sanitization to debounce_setup().
Cc: stable@vger.kernel.org Fixes: b34490879baa ("gpio: cdev: sanitize the label before requesting the interrupt") Signed-off-by: Kent Gibson warthog618@gmail.com [Bartosz: rebased on top of the fix for empty GPIO labels] Co-developed-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpiolib-cdev.c | 49 +++++++++++++++++++++++++------------------- 1 file changed, 28 insertions(+), 21 deletions(-)
--- a/drivers/gpio/gpiolib-cdev.c +++ b/drivers/gpio/gpiolib-cdev.c @@ -655,6 +655,25 @@ static u32 line_event_id(int level) GPIO_V2_LINE_EVENT_FALLING_EDGE; }
+static inline char *make_irq_label(const char *orig) +{ + char *new; + + if (!orig) + return NULL; + + new = kstrdup_and_replace(orig, '/', ':', GFP_KERNEL); + if (!new) + return ERR_PTR(-ENOMEM); + + return new; +} + +static inline void free_irq_label(const char *label) +{ + kfree(label); +} + #ifdef CONFIG_HTE
static enum hte_return process_hw_ts_thread(void *p) @@ -942,6 +961,7 @@ static int debounce_setup(struct line *l { unsigned long irqflags; int ret, level, irq; + char *label;
/* try hardware */ ret = gpiod_set_debounce(line->desc, debounce_period_us); @@ -964,11 +984,17 @@ static int debounce_setup(struct line *l if (irq < 0) return -ENXIO;
+ label = make_irq_label(line->req->label); + if (IS_ERR(label)) + return -ENOMEM; + irqflags = IRQF_TRIGGER_FALLING | IRQF_TRIGGER_RISING; ret = request_irq(irq, debounce_irq_handler, irqflags, - line->req->label, line); - if (ret) + label, line); + if (ret) { + free_irq_label(label); return ret; + } line->irq = irq; } else { ret = hte_edge_setup(line, GPIO_V2_LINE_FLAG_EDGE_BOTH); @@ -1010,25 +1036,6 @@ static u32 gpio_v2_line_config_debounce_ return 0; }
-static inline char *make_irq_label(const char *orig) -{ - char *new; - - if (!orig) - return NULL; - - new = kstrdup_and_replace(orig, '/', ':', GFP_KERNEL); - if (!new) - return ERR_PTR(-ENOMEM); - - return new; -} - -static inline void free_irq_label(const char *label) -{ - kfree(label); -} - static void edge_detector_stop(struct line *line) { if (line->irq) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namjae Jeon linkinjeon@kernel.org
commit c1832f67035dc04fb89e6b591b64e4d515843cda upstream.
Don't send an oplock break if the rename fails. This patch fixes the smb2.oplock.batch20 test.
Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/smb2pdu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/smb/server/smb2pdu.c +++ b/fs/smb/server/smb2pdu.c @@ -5631,8 +5631,9 @@ static int smb2_rename(struct ksmbd_work if (!file_info->ReplaceIfExists) flags = RENAME_NOREPLACE;
- smb_break_all_levII_oplock(work, fp, 0); rc = ksmbd_vfs_rename(work, &fp->filp->f_path, new_name, flags); + if (!rc) + smb_break_all_levII_oplock(work, fp, 0); out: kfree(new_name); return rc;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namjae Jeon linkinjeon@kernel.org
commit a677ebd8ca2f2632ccdecbad7b87641274e15aac upstream.
If malicious ksmbd-tools is installed, ksmbd.mountd can return an invalid IPC response to the ksmbd kernel server. ksmbd should validate the payload size of IPC responses from ksmbd.mountd to avoid a memory overrun or slab-out-of-bounds access. This patch validates the three IPC responses that carry a payload.
Cc: stable@vger.kernel.org Reported-by: Chao Ma machao2019@gmail.com Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/ksmbd_netlink.h | 3 ++- fs/smb/server/mgmt/share_config.c | 7 ++++++- fs/smb/server/transport_ipc.c | 37 +++++++++++++++++++++++++++++++++++++ 3 files changed, 45 insertions(+), 2 deletions(-)
--- a/fs/smb/server/ksmbd_netlink.h +++ b/fs/smb/server/ksmbd_netlink.h @@ -166,7 +166,8 @@ struct ksmbd_share_config_response { __u16 force_uid; __u16 force_gid; __s8 share_name[KSMBD_REQ_MAX_SHARE_NAME]; - __u32 reserved[112]; /* Reserved room */ + __u32 reserved[111]; /* Reserved room */ + __u32 payload_sz; __u32 veto_list_sz; __s8 ____payload[]; }; --- a/fs/smb/server/mgmt/share_config.c +++ b/fs/smb/server/mgmt/share_config.c @@ -158,7 +158,12 @@ static struct ksmbd_share_config *share_ share->name = kstrdup(name, GFP_KERNEL);
if (!test_share_config_flag(share, KSMBD_SHARE_FLAG_PIPE)) { - share->path = kstrdup(ksmbd_share_config_path(resp), + int path_len = PATH_MAX; + + if (resp->payload_sz) + path_len = resp->payload_sz - resp->veto_list_sz; + + share->path = kstrndup(ksmbd_share_config_path(resp), path_len, GFP_KERNEL); if (share->path) share->path_sz = strlen(share->path); --- a/fs/smb/server/transport_ipc.c +++ b/fs/smb/server/transport_ipc.c @@ -65,6 +65,7 @@ struct ipc_msg_table_entry { struct hlist_node ipc_table_hlist;
void *response; + unsigned int msg_sz; };
static struct delayed_work ipc_timer_work; @@ -275,6 +276,7 @@ static int handle_response(int type, voi }
memcpy(entry->response, payload, sz); + entry->msg_sz = sz; wake_up_interruptible(&entry->wait); ret = 0; break; @@ -453,6 +455,34 @@ out: return ret; }
+static int ipc_validate_msg(struct ipc_msg_table_entry *entry) +{ + unsigned int msg_sz = entry->msg_sz; + + if (entry->type == KSMBD_EVENT_RPC_REQUEST) { + struct ksmbd_rpc_command *resp = entry->response; + + msg_sz = sizeof(struct ksmbd_rpc_command) + resp->payload_sz; + } else if (entry->type == KSMBD_EVENT_SPNEGO_AUTHEN_REQUEST) { + struct ksmbd_spnego_authen_response *resp = entry->response; + + msg_sz = sizeof(struct ksmbd_spnego_authen_response) + + resp->session_key_len + resp->spnego_blob_len; + } else if (entry->type == KSMBD_EVENT_SHARE_CONFIG_REQUEST) { + struct ksmbd_share_config_response *resp = entry->response; + + if (resp->payload_sz) { + if (resp->payload_sz < resp->veto_list_sz) + return -EINVAL; + + msg_sz = sizeof(struct ksmbd_share_config_response) + + resp->payload_sz; + } + } + + return entry->msg_sz != msg_sz ? -EINVAL : 0; +} + static void *ipc_msg_send_request(struct ksmbd_ipc_msg *msg, unsigned int handle) { struct ipc_msg_table_entry entry; @@ -477,6 +507,13 @@ static void *ipc_msg_send_request(struct ret = wait_event_interruptible_timeout(entry.wait, entry.response != NULL, IPC_WAIT_TIMEOUT); + if (entry.response) { + ret = ipc_validate_msg(&entry); + if (ret) { + kvfree(entry.response); + entry.response = NULL; + } + } out: down_write(&ipc_msg_table_lock); hash_del(&entry.ipc_table_hlist);
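As a reading aid for the hunks above: the daemon's share-config reply now carries an explicit payload_sz, and the kernel treats the payload as a path string followed by the veto list. A rough sketch of the invariant ipc_validate_msg() enforces for that case (field names from the patch; shown only to summarise, not as additional code):

	/* total size the daemon claims for the reply */
	msg_sz = sizeof(struct ksmbd_share_config_response) + resp->payload_sz;

	/* the veto list is a suffix of the payload, so it cannot be larger ... */
	if (resp->payload_sz < resp->veto_list_sz)
		return -EINVAL;

	/* ... and the claimed size must match what was actually received */
	if (msg_sz != entry->msg_sz)
		return -EINVAL;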
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namjae Jeon linkinjeon@kernel.org
commit 5ed11af19e56f0434ce0959376d136005745a936 upstream.
The SMB2_GLOBAL_CAP_ENCRYPTION flag should be used only for the 3.0 and 3.0.2 dialects. Setting it for other dialects causes compatibility problems with other SMB clients.
Reported-by: James Christopher Adduono jc@adduono.com Tested-by: James Christopher Adduono jc@adduono.com Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/smb2ops.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/fs/smb/server/smb2ops.c +++ b/fs/smb/server/smb2ops.c @@ -228,6 +228,11 @@ void init_smb3_0_server(struct ksmbd_con conn->cli_cap & SMB2_GLOBAL_CAP_ENCRYPTION) conn->vals->capabilities |= SMB2_GLOBAL_CAP_ENCRYPTION;
+ if (server_conf.flags & KSMBD_GLOBAL_FLAG_SMB2_ENCRYPTION || + (!(server_conf.flags & KSMBD_GLOBAL_FLAG_SMB2_ENCRYPTION_OFF) && + conn->cli_cap & SMB2_GLOBAL_CAP_ENCRYPTION)) + conn->vals->capabilities |= SMB2_GLOBAL_CAP_ENCRYPTION; + if (server_conf.flags & KSMBD_GLOBAL_FLAG_SMB3_MULTICHANNEL) conn->vals->capabilities |= SMB2_GLOBAL_CAP_MULTI_CHANNEL; } @@ -275,11 +280,6 @@ int init_smb3_11_server(struct ksmbd_con conn->vals->capabilities |= SMB2_GLOBAL_CAP_LEASING | SMB2_GLOBAL_CAP_DIRECTORY_LEASING;
- if (server_conf.flags & KSMBD_GLOBAL_FLAG_SMB2_ENCRYPTION || - (!(server_conf.flags & KSMBD_GLOBAL_FLAG_SMB2_ENCRYPTION_OFF) && - conn->cli_cap & SMB2_GLOBAL_CAP_ENCRYPTION)) - conn->vals->capabilities |= SMB2_GLOBAL_CAP_ENCRYPTION; - if (server_conf.flags & KSMBD_GLOBAL_FLAG_SMB3_MULTICHANNEL) conn->vals->capabilities |= SMB2_GLOBAL_CAP_MULTI_CHANNEL;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoffer Sandberg cs@tuxedo.de
commit daf6c4681a74034d5723e2fb761e0d7f3a1ca18f upstream.
This patch applies the existing fixup to certain TongFang (TF) platforms implementing the ALC274 codec with a headset jack. It fixes (activates) the otherwise inactive headset microphone.
Signed-off-by: Christoffer Sandberg cs@tuxedo.de Signed-off-by: Werner Sembach wse@tuxedocomputers.com Cc: stable@vger.kernel.org Message-ID: 20240328102757.50310-1-wse@tuxedocomputers.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -10302,6 +10302,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x1d05, 0x1147, "TongFang GMxTGxx", ALC269_FIXUP_NO_SHUTUP), SND_PCI_QUIRK(0x1d05, 0x115c, "TongFang GMxTGxx", ALC269_FIXUP_NO_SHUTUP), SND_PCI_QUIRK(0x1d05, 0x121b, "TongFang GMxAGxx", ALC269_FIXUP_NO_SHUTUP), + SND_PCI_QUIRK(0x1d05, 0x1387, "TongFang GMxIXxx", ALC2XX_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x1d72, 0x1602, "RedmiBook", ALC255_FIXUP_XIAOMI_HEADSET_MIC), SND_PCI_QUIRK(0x1d72, 0x1701, "XiaomiNotebook Pro", ALC298_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1d72, 0x1901, "RedmiBook 14", ALC256_FIXUP_ASUS_HEADSET_MIC),
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: I Gede Agastya Darma Laksana gedeagas22@gmail.com
commit 1576f263ee2147dc395531476881058609ad3d38 upstream.
This patch addresses an issue with the Panasonic CF-SZ6's existing quirk, specifically its headset microphone functionality. Previously, the quirk used ALC269_FIXUP_HEADSET_MODE, which does not support the CF-SZ6's design of a single 3.5mm jack for both mic and audio output effectively. The device uses pin 0x19 for the headset mic without jack detection.
Following verification on the CF-SZ6 and discussions with the original patch author, I determined that updating the quirk to ALC269_FIXUP_ASPIRE_HEADSET_MIC is the appropriate solution. This change is custom-designed for the CF-SZ6's unique hardware setup, which includes a single 3.5mm jack for both mic and audio output, connecting the headset microphone to pin 0x19 without the use of jack detection.
Fixes: 0fca97a29b83 ("ALSA: hda/realtek - Add Panasonic CF-SZ6 headset jack quirk") Signed-off-by: I Gede Agastya Darma Laksana gedeagas22@gmail.com Cc: stable@vger.kernel.org Message-ID: 20240401174602.14133-1-gedeagas22@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -10072,7 +10072,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x10ec, 0x1252, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK), SND_PCI_QUIRK(0x10ec, 0x1254, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK), SND_PCI_QUIRK(0x10ec, 0x12cc, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK), - SND_PCI_QUIRK(0x10f7, 0x8338, "Panasonic CF-SZ6", ALC269_FIXUP_HEADSET_MODE), + SND_PCI_QUIRK(0x10f7, 0x8338, "Panasonic CF-SZ6", ALC269_FIXUP_ASPIRE_HEADSET_MIC), SND_PCI_QUIRK(0x144d, 0xc109, "Samsung Ativ book 9 (NP900X3G)", ALC269_FIXUP_INV_DMIC), SND_PCI_QUIRK(0x144d, 0xc169, "Samsung Notebook 9 Pen (NP930SBE-K01US)", ALC298_FIXUP_SAMSUNG_AMP), SND_PCI_QUIRK(0x144d, 0xc176, "Samsung Notebook 9 Pro (NP930MBE-K04US)", ALC298_FIXUP_SAMSUNG_AMP),
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe axboe@kernel.dk
commit 09ab7eff38202159271534d2f5ad45526168f2a5 upstream.
Just rely on the xarray for any kind of bgid. This simplifies things, and the special-cased lower-bgid array really doesn't bring us much, if anything.
Cc: stable@vger.kernel.org # v6.4+ Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/io_uring_types.h | 1 io_uring/io_uring.c | 2 - io_uring/kbuf.c | 70 ++++------------------------------------- 3 files changed, 8 insertions(+), 65 deletions(-)
--- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -250,7 +250,6 @@ struct io_ring_ctx {
struct io_submit_state submit_state;
- struct io_buffer_list *io_bl; struct xarray io_bl_xa;
struct io_hash_table cancel_table_locked; --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -343,7 +343,6 @@ static __cold struct io_ring_ctx *io_rin err: kfree(ctx->cancel_table.hbs); kfree(ctx->cancel_table_locked.hbs); - kfree(ctx->io_bl); xa_destroy(&ctx->io_bl_xa); kfree(ctx); return NULL; @@ -2934,7 +2933,6 @@ static __cold void io_ring_ctx_free(stru io_wq_put_hash(ctx->hash_map); kfree(ctx->cancel_table.hbs); kfree(ctx->cancel_table_locked.hbs); - kfree(ctx->io_bl); xa_destroy(&ctx->io_bl_xa); kfree(ctx); } --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -17,8 +17,6 @@
#define IO_BUFFER_LIST_BUF_PER_PAGE (PAGE_SIZE / sizeof(struct io_uring_buf))
-#define BGID_ARRAY 64 - /* BIDs are addressed by a 16-bit field in a CQE */ #define MAX_BIDS_PER_BGID (1 << 16)
@@ -31,13 +29,9 @@ struct io_provide_buf { __u16 bid; };
-static struct io_buffer_list *__io_buffer_get_list(struct io_ring_ctx *ctx, - struct io_buffer_list *bl, - unsigned int bgid) +static inline struct io_buffer_list *__io_buffer_get_list(struct io_ring_ctx *ctx, + unsigned int bgid) { - if (bl && bgid < BGID_ARRAY) - return &bl[bgid]; - return xa_load(&ctx->io_bl_xa, bgid); }
@@ -53,7 +47,7 @@ static inline struct io_buffer_list *io_ { lockdep_assert_held(&ctx->uring_lock);
- return __io_buffer_get_list(ctx, ctx->io_bl, bgid); + return __io_buffer_get_list(ctx, bgid); }
static int io_buffer_add_list(struct io_ring_ctx *ctx, @@ -66,10 +60,6 @@ static int io_buffer_add_list(struct io_ */ bl->bgid = bgid; smp_store_release(&bl->is_ready, 1); - - if (bgid < BGID_ARRAY) - return 0; - return xa_err(xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL)); }
@@ -215,24 +205,6 @@ void __user *io_buffer_select(struct io_ return ret; }
-static __cold int io_init_bl_list(struct io_ring_ctx *ctx) -{ - struct io_buffer_list *bl; - int i; - - bl = kcalloc(BGID_ARRAY, sizeof(struct io_buffer_list), GFP_KERNEL); - if (!bl) - return -ENOMEM; - - for (i = 0; i < BGID_ARRAY; i++) { - INIT_LIST_HEAD(&bl[i].buf_list); - bl[i].bgid = i; - } - - smp_store_release(&ctx->io_bl, bl); - return 0; -} - /* * Mark the given mapped range as free for reuse */ @@ -305,13 +277,6 @@ void io_destroy_buffers(struct io_ring_c { struct io_buffer_list *bl; unsigned long index; - int i; - - for (i = 0; i < BGID_ARRAY; i++) { - if (!ctx->io_bl) - break; - __io_remove_buffers(ctx, &ctx->io_bl[i], -1U); - }
xa_for_each(&ctx->io_bl_xa, index, bl) { xa_erase(&ctx->io_bl_xa, bl->bgid); @@ -485,12 +450,6 @@ int io_provide_buffers(struct io_kiocb *
io_ring_submit_lock(ctx, issue_flags);
- if (unlikely(p->bgid < BGID_ARRAY && !ctx->io_bl)) { - ret = io_init_bl_list(ctx); - if (ret) - goto err; - } - bl = io_buffer_get_list(ctx, p->bgid); if (unlikely(!bl)) { bl = kzalloc(sizeof(*bl), GFP_KERNEL_ACCOUNT); @@ -503,14 +462,9 @@ int io_provide_buffers(struct io_kiocb * if (ret) { /* * Doesn't need rcu free as it was never visible, but - * let's keep it consistent throughout. Also can't - * be a lower indexed array group, as adding one - * where lookup failed cannot happen. + * let's keep it consistent throughout. */ - if (p->bgid >= BGID_ARRAY) - kfree_rcu(bl, rcu); - else - WARN_ON_ONCE(1); + kfree_rcu(bl, rcu); goto err; } } @@ -675,12 +629,6 @@ int io_register_pbuf_ring(struct io_ring if (reg.ring_entries >= 65536) return -EINVAL;
- if (unlikely(reg.bgid < BGID_ARRAY && !ctx->io_bl)) { - int ret = io_init_bl_list(ctx); - if (ret) - return ret; - } - bl = io_buffer_get_list(ctx, reg.bgid); if (bl) { /* if mapped buffer ring OR classic exists, don't allow */ @@ -730,10 +678,8 @@ int io_unregister_pbuf_ring(struct io_ri return -EINVAL;
__io_remove_buffers(ctx, bl, -1U); - if (bl->bgid >= BGID_ARRAY) { - xa_erase(&ctx->io_bl_xa, bl->bgid); - kfree_rcu(bl, rcu); - } + xa_erase(&ctx->io_bl_xa, bl->bgid); + kfree_rcu(bl, rcu); return 0; }
@@ -741,7 +687,7 @@ void *io_pbuf_get_address(struct io_ring { struct io_buffer_list *bl;
- bl = __io_buffer_get_list(ctx, smp_load_acquire(&ctx->io_bl), bgid); + bl = __io_buffer_get_list(ctx, bgid);
if (!bl || !bl->is_mmap) return NULL;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe axboe@kernel.dk
commit 3b80cff5a4d117c53d38ce805823084eaeffbde6 upstream.
Now that xarray is being exclusively used for the buffer_list lookup, this check is no longer needed. Get rid of it and the is_ready member.
Cc: stable@vger.kernel.org # v6.4+ Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/kbuf.c | 8 -------- io_uring/kbuf.h | 2 -- 2 files changed, 10 deletions(-)
--- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -59,7 +59,6 @@ static int io_buffer_add_list(struct io_ * always under the ->uring_lock, but the RCU lookup from mmap does. */ bl->bgid = bgid; - smp_store_release(&bl->is_ready, 1); return xa_err(xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL)); }
@@ -691,13 +690,6 @@ void *io_pbuf_get_address(struct io_ring
if (!bl || !bl->is_mmap) return NULL; - /* - * Ensure the list is fully setup. Only strictly needed for RCU lookup - * via mmap, and in that case only for the array indexed groups. For - * the xarray lookups, it's either visible and ready, or not at all. - */ - if (!smp_load_acquire(&bl->is_ready)) - return NULL;
return bl->buf_ring; } --- a/io_uring/kbuf.h +++ b/io_uring/kbuf.h @@ -29,8 +29,6 @@ struct io_buffer_list { __u8 is_mapped; /* ring mapped provided buffers, but mmap'ed by application */ __u8 is_mmap; - /* bl is visible from an RCU point of view for lookup */ - __u8 is_ready; };
struct io_buffer {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe axboe@kernel.dk
commit 6b69c4ab4f685327d9e10caf0d84217ba23a8c4b upstream.
No functional changes in this patch, just in preparation for being able to keep the buffer list alive outside of the ctx->uring_lock.
Cc: stable@vger.kernel.org # v6.4+ Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/kbuf.c | 15 +++++++++++---- io_uring/kbuf.h | 2 ++ 2 files changed, 13 insertions(+), 4 deletions(-)
--- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -59,6 +59,7 @@ static int io_buffer_add_list(struct io_ * always under the ->uring_lock, but the RCU lookup from mmap does. */ bl->bgid = bgid; + atomic_set(&bl->refs, 1); return xa_err(xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL)); }
@@ -272,6 +273,14 @@ static int __io_remove_buffers(struct io return i; }
+static void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl) +{ + if (atomic_dec_and_test(&bl->refs)) { + __io_remove_buffers(ctx, bl, -1U); + kfree_rcu(bl, rcu); + } +} + void io_destroy_buffers(struct io_ring_ctx *ctx) { struct io_buffer_list *bl; @@ -279,8 +288,7 @@ void io_destroy_buffers(struct io_ring_c
xa_for_each(&ctx->io_bl_xa, index, bl) { xa_erase(&ctx->io_bl_xa, bl->bgid); - __io_remove_buffers(ctx, bl, -1U); - kfree_rcu(bl, rcu); + io_put_bl(ctx, bl); }
while (!list_empty(&ctx->io_buffers_pages)) { @@ -676,9 +684,8 @@ int io_unregister_pbuf_ring(struct io_ri if (!bl->is_mapped) return -EINVAL;
- __io_remove_buffers(ctx, bl, -1U); xa_erase(&ctx->io_bl_xa, bl->bgid); - kfree_rcu(bl, rcu); + io_put_bl(ctx, bl); return 0; }
--- a/io_uring/kbuf.h +++ b/io_uring/kbuf.h @@ -25,6 +25,8 @@ struct io_buffer_list { __u16 head; __u16 mask;
+ atomic_t refs; + /* ring mapped provided buffers */ __u8 is_mapped; /* ring mapped provided buffers, but mmap'ed by application */
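Taken together, the hunks above give the buffer list a conventional get/put lifetime; a condensed sketch (identifiers from io_uring/kbuf.c, shown only to summarise the pairing, not as additional code):

	/* creation: the xarray owns the initial reference */
	atomic_set(&bl->refs, 1);
	xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL);

	/* teardown (unregister or ctx free): remove from the xarray, drop its reference */
	xa_erase(&ctx->io_bl_xa, bl->bgid);
	io_put_bl(ctx, bl);	/* removes the buffers and frees the list via RCU once the count hits zero */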
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe axboe@kernel.dk
commit 73eaa2b583493b680c6f426531d6736c39643bfb upstream.
Rather than use the system unbound event workqueue, use an io_uring specific one. This avoids dependencies with the tty, which also uses the system_unbound_wq, and issues flushes of said workqueue from inside its poll handling.
Cc: stable@vger.kernel.org Reported-by: Rasmus Karlsson rasmus.karlsson@pajlada.com Tested-by: Rasmus Karlsson rasmus.karlsson@pajlada.com Tested-by: Iskren Chernev me@iskren.info Link: https://github.com/axboe/liburing/issues/1113 Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/io_uring.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -148,6 +148,7 @@ static bool io_uring_try_cancel_requests static void io_queue_sqe(struct io_kiocb *req);
struct kmem_cache *req_cachep; +static struct workqueue_struct *iou_wq __ro_after_init;
static int __read_mostly sysctl_io_uring_disabled; static int __read_mostly sysctl_io_uring_group = -1; @@ -3180,7 +3181,7 @@ static __cold void io_ring_ctx_wait_and_ * noise and overhead, there's no discernable change in runtime * over using system_wq. */ - queue_work(system_unbound_wq, &ctx->exit_work); + queue_work(iou_wq, &ctx->exit_work); }
static int io_uring_release(struct inode *inode, struct file *file) @@ -4664,6 +4665,8 @@ static int __init io_uring_init(void) offsetof(struct io_kiocb, cmd.data), sizeof_field(struct io_kiocb, cmd.data), NULL);
+ iou_wq = alloc_workqueue("iou_exit", WQ_UNBOUND, 64); + #ifdef CONFIG_SYSCTL register_sysctl_init("kernel", kernel_io_uring_disabled_table); #endif
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe axboe@kernel.dk
commit 561e4f9451d65fc2f7eef564e0064373e3019793 upstream.
If we look up the kbuf, ensure that it doesn't get unregistered until after we're done with it. Since we're inside mmap, we cannot safely use the io_uring lock. Rely on the fact that we can lookup the buffer list under RCU now and grab a reference to it, preventing it from being unregistered until we're done with it. The lookup returns the io_buffer_list directly with it referenced.
Cc: stable@vger.kernel.org # v6.4+ Fixes: 5cf4f52e6d8a ("io_uring: free io_buffer_list entries via RCU") Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/io_uring.c | 11 ++++++----- io_uring/kbuf.c | 31 +++++++++++++++++++++++++------ io_uring/kbuf.h | 4 +++- 3 files changed, 34 insertions(+), 12 deletions(-)
--- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -3429,14 +3429,15 @@ static void *io_uring_validate_mmap_requ ptr = ctx->sq_sqes; break; case IORING_OFF_PBUF_RING: { + struct io_buffer_list *bl; unsigned int bgid;
bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT; - rcu_read_lock(); - ptr = io_pbuf_get_address(ctx, bgid); - rcu_read_unlock(); - if (!ptr) - return ERR_PTR(-EINVAL); + bl = io_pbuf_get_bl(ctx, bgid); + if (IS_ERR(bl)) + return bl; + ptr = bl->buf_ring; + io_put_bl(ctx, bl); break; } default: --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -273,7 +273,7 @@ static int __io_remove_buffers(struct io return i; }
-static void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl) +void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl) { if (atomic_dec_and_test(&bl->refs)) { __io_remove_buffers(ctx, bl, -1U); @@ -689,16 +689,35 @@ int io_unregister_pbuf_ring(struct io_ri return 0; }
-void *io_pbuf_get_address(struct io_ring_ctx *ctx, unsigned long bgid) +struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx, + unsigned long bgid) { struct io_buffer_list *bl; + bool ret;
- bl = __io_buffer_get_list(ctx, bgid); + /* + * We have to be a bit careful here - we're inside mmap and cannot grab + * the uring_lock. This means the buffer_list could be simultaneously + * going away, if someone is trying to be sneaky. Look it up under rcu + * so we know it's not going away, and attempt to grab a reference to + * it. If the ref is already zero, then fail the mapping. If successful, + * the caller will call io_put_bl() to drop the the reference at at the + * end. This may then safely free the buffer_list (and drop the pages) + * at that point, vm_insert_pages() would've already grabbed the + * necessary vma references. + */ + rcu_read_lock(); + bl = xa_load(&ctx->io_bl_xa, bgid); + /* must be a mmap'able buffer ring and have pages */ + ret = false; + if (bl && bl->is_mmap) + ret = atomic_inc_not_zero(&bl->refs); + rcu_read_unlock();
- if (!bl || !bl->is_mmap) - return NULL; + if (ret) + return bl;
- return bl->buf_ring; + return ERR_PTR(-EINVAL); }
/* --- a/io_uring/kbuf.h +++ b/io_uring/kbuf.h @@ -60,7 +60,9 @@ unsigned int __io_put_kbuf(struct io_kio
void io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags);
-void *io_pbuf_get_address(struct io_ring_ctx *ctx, unsigned long bgid); +void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl); +struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx, + unsigned long bgid);
static inline void io_kbuf_recycle_ring(struct io_kiocb *req) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Herve Codina herve.codina@bootlin.com
commit 0462c56c290a99a7f03e817ae5b843116dfb575c upstream.
The commit 80dd33cf72d1 ("drivers: base: Fix device link removal") introduced a workqueue to release the consumer and supplier devices used in the devlink. In the queued job, the devices are released and, in turn, once all references to these devices are dropped, the release function of the device itself is called.
Nothing synchronises with this workqueue to ensure that all ongoing release operations are done, so other operations cannot be started safely after them.
For instance, in the following sequence: 1) of_platform_depopulate() 2) of_overlay_remove()
During step 1, devices are released and the related devlinks are removed (jobs pushed onto the workqueue). During step 2, OF nodes are destroyed but, without any synchronisation with the devlink removal jobs, of_overlay_remove() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2
Indeed, the supposedly missing of_node_put() call does get made, but too late, from the workqueue job.
Introduce device_link_wait_removal() to offer a way to wait for the end of devlink removals (i.e. for the workqueue jobs to finish). Also, since a flush is now performed on the workqueue, move the removal work from a system-wide workqueue to a dedicated local one.
Cc: stable@vger.kernel.org Signed-off-by: Herve Codina herve.codina@bootlin.com Tested-by: Luca Ceresoli luca.ceresoli@bootlin.com Reviewed-by: Nuno Sa nuno.sa@analog.com Reviewed-by: Saravana Kannan saravanak@google.com Acked-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Link: https://lore.kernel.org/r/20240325152140.198219-2-herve.codina@bootlin.com Signed-off-by: Rob Herring robh@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/base/core.c | 26 +++++++++++++++++++++++--- include/linux/device.h | 1 + 2 files changed, 24 insertions(+), 3 deletions(-)
--- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -44,6 +44,7 @@ static bool fw_devlink_is_permissive(voi static void __fw_devlink_link_to_consumers(struct device *dev); static bool fw_devlink_drv_reg_done; static bool fw_devlink_best_effort; +static struct workqueue_struct *device_link_wq;
/** * __fwnode_link_add - Create a link between two fwnode_handles. @@ -531,12 +532,26 @@ static void devlink_dev_release(struct d /* * It may take a while to complete this work because of the SRCU * synchronization in device_link_release_fn() and if the consumer or - * supplier devices get deleted when it runs, so put it into the "long" - * workqueue. + * supplier devices get deleted when it runs, so put it into the + * dedicated workqueue. */ - queue_work(system_long_wq, &link->rm_work); + queue_work(device_link_wq, &link->rm_work); }
+/** + * device_link_wait_removal - Wait for ongoing devlink removal jobs to terminate + */ +void device_link_wait_removal(void) +{ + /* + * devlink removal jobs are queued in the dedicated work queue. + * To be sure that all removal jobs are terminated, ensure that any + * scheduled work has run to completion. + */ + flush_workqueue(device_link_wq); +} +EXPORT_SYMBOL_GPL(device_link_wait_removal); + static struct class devlink_class = { .name = "devlink", .dev_groups = devlink_groups, @@ -4090,9 +4105,14 @@ int __init devices_init(void) sysfs_dev_char_kobj = kobject_create_and_add("char", dev_kobj); if (!sysfs_dev_char_kobj) goto char_kobj_err; + device_link_wq = alloc_workqueue("device_link_wq", 0, 0); + if (!device_link_wq) + goto wq_err;
return 0;
+ wq_err: + kobject_put(sysfs_dev_char_kobj); char_kobj_err: kobject_put(sysfs_dev_block_kobj); block_kobj_err: --- a/include/linux/device.h +++ b/include/linux/device.h @@ -1250,6 +1250,7 @@ void device_link_del(struct device_link void device_link_remove(void *consumer, struct device *supplier); void device_links_supplier_sync_state_pause(void); void device_links_supplier_sync_state_resume(void); +void device_link_wait_removal(void);
/* Create alias, so I can be autoloaded. */ #define MODULE_ALIAS_CHARDEV(major,minor) \
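A hedged sketch of how the new helper is meant to be used (the concrete caller is added by the next patch, of_changeset_destroy() in drivers/of/dynamic.c; "dev" here is just an illustrative parent device):

	/* tear the devices down; devlink release jobs are queued on device_link_wq */
	of_platform_depopulate(dev);

	/* wait until every queued release job has run ... */
	device_link_wait_removal();

	/* ... only then free objects (e.g. of_nodes) those links may still reference */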
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Herve Codina herve.codina@bootlin.com
commit 8917e7385346bd6584890ed362985c219fe6ae84 upstream.
In the following sequence: 1) of_platform_depopulate() 2) of_overlay_remove()
During step 1, devices are destroyed and devlinks are removed. During step 2, OF nodes are destroyed but __of_changeset_entry_destroy() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2 ...
Indeed, during the devlink removals performed in step 1, the removal itself, which releases the device (and the attached of_node), is done by a job queued on a workqueue and is therefore asynchronous with respect to the function calls. When the warning appears, of_node_put() does get called, but wrongly late, from the workqueue job.
In order to be sure that any ongoing devlink removals are done before the of_node destruction, synchronize of_changeset_destroy() with the devlink removals.
Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") Cc: stable@vger.kernel.org Signed-off-by: Herve Codina herve.codina@bootlin.com Reviewed-by: Saravana Kannan saravanak@google.com Tested-by: Luca Ceresoli luca.ceresoli@bootlin.com Reviewed-by: Nuno Sa nuno.sa@analog.com Link: https://lore.kernel.org/r/20240325152140.198219-3-herve.codina@bootlin.com Signed-off-by: Rob Herring robh@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/of/dynamic.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
--- a/drivers/of/dynamic.c +++ b/drivers/of/dynamic.c @@ -9,6 +9,7 @@
#define pr_fmt(fmt) "OF: " fmt
+#include <linux/device.h> #include <linux/of.h> #include <linux/spinlock.h> #include <linux/slab.h> @@ -667,6 +668,17 @@ void of_changeset_destroy(struct of_chan { struct of_changeset_entry *ce, *cen;
+ /* + * When a device is deleted, the device links to/from it are also queued + * for deletion. Until these device links are freed, the devices + * themselves aren't freed. If the device being deleted is due to an + * overlay change, this device might be holding a reference to a device + * node that will be freed. So, wait until all already pending device + * links are deleted before freeing a device node. This ensures we don't + * free any device node that has a non-zero reference count. + */ + device_link_wait_removal(); + list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node) __of_changeset_entry_destroy(ce); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Hildenbrand david@redhat.com
commit 04c35ab3bdae7fefbd7c7a7355f29fa03a035221 upstream.
PAT handling won't do the right thing in COW mappings: the first PTE (or, in fact, all PTEs) can be replaced during write faults to point at anon folios. Reliably recovering the correct PFN and cachemode using follow_phys() from PTEs will not work in COW mappings.
Using follow_phys(), we might just get the address+protection of the anon folio (which is very wrong), or fail on swap/nonswap entries, failing follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and track_pfn_copy(), not properly calling free_pfn_range().
In free_pfn_range(), we either wouldn't call memtype_free() or would call it with the wrong range, possibly leaking memory.
To fix that, let's update follow_phys() to refuse returning anon folios, and fall back to using the PFN stored in vma->vm_pgoff for COW mappings if we run into that.
We will now properly handle untrack_pfn() with COW mappings, where we don't need the cachemode. We'll have to fail fork()->track_pfn_copy() if the first page was replaced by an anon folio, though: we'd have to store the cachemode in the VMA to make this work, likely growing the VMA size.
For now, let's keep it simple and let track_pfn_copy() just fail in that case: it would have failed in the past with swap/nonswap entries already, and it would have done the wrong thing with anon folios.
Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():
<--- C reproducer --->
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <liburing.h>

int main(void)
{
        struct io_uring_params p = {};
        int ring_fd;
        size_t size;
        char *map;

        ring_fd = io_uring_setup(1, &p);
        if (ring_fd < 0) {
                perror("io_uring_setup");
                return 1;
        }
        size = p.sq_off.array + p.sq_entries * sizeof(unsigned);

        /* Map the submission queue ring MAP_PRIVATE */
        map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
                   ring_fd, IORING_OFF_SQ_RING);
        if (map == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        /* We have at least one page. Let's COW it. */
        *map = 0;
        pause();
        return 0;
}
<--- C reproducer --->
On a system with 16 GiB RAM and swap configured:

 # ./iouring &
 # memhog 16G
 # killall iouring

[ 301.552930] ------------[ cut here ]------------
[ 301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
[ 301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
[ 301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
[ 301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4
[ 301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
[ 301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
[ 301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282
[ 301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047
[ 301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200
[ 301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000
[ 301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000
[ 301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000
[ 301.564186] FS: 0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000
[ 301.564773] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0
[ 301.565725] PKRU: 55555554
[ 301.565944] Call Trace:
[ 301.566148] <TASK>
[ 301.566325] ? untrack_pfn+0xf4/0x100
[ 301.566618] ? __warn+0x81/0x130
[ 301.566876] ? untrack_pfn+0xf4/0x100
[ 301.567163] ? report_bug+0x171/0x1a0
[ 301.567466] ? handle_bug+0x3c/0x80
[ 301.567743] ? exc_invalid_op+0x17/0x70
[ 301.568038] ? asm_exc_invalid_op+0x1a/0x20
[ 301.568363] ? untrack_pfn+0xf4/0x100
[ 301.568660] ? untrack_pfn+0x65/0x100
[ 301.568947] unmap_single_vma+0xa6/0xe0
[ 301.569247] unmap_vmas+0xb5/0x190
[ 301.569532] exit_mmap+0xec/0x340
[ 301.569801] __mmput+0x3e/0x130
[ 301.570051] do_exit+0x305/0xaf0
...
Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Reported-by: Wupeng Ma mawupeng1@huawei.com Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines") Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3") Acked-by: Ingo Molnar mingo@kernel.org Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Andy Lutomirski luto@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Borislav Petkov bp@alien8.de Cc: "H. Peter Anvin" hpa@zytor.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/mm/pat/memtype.c | 49 ++++++++++++++++++++++++++++++++-------------- mm/memory.c | 4 +++ 2 files changed, 39 insertions(+), 14 deletions(-)
--- a/arch/x86/mm/pat/memtype.c +++ b/arch/x86/mm/pat/memtype.c @@ -950,6 +950,38 @@ static void free_pfn_range(u64 paddr, un memtype_free(paddr, paddr + size); }
+static int get_pat_info(struct vm_area_struct *vma, resource_size_t *paddr, + pgprot_t *pgprot) +{ + unsigned long prot; + + VM_WARN_ON_ONCE(!(vma->vm_flags & VM_PAT)); + + /* + * We need the starting PFN and cachemode used for track_pfn_remap() + * that covered the whole VMA. For most mappings, we can obtain that + * information from the page tables. For COW mappings, we might now + * suddenly have anon folios mapped and follow_phys() will fail. + * + * Fallback to using vma->vm_pgoff, see remap_pfn_range_notrack(), to + * detect the PFN. If we need the cachemode as well, we're out of luck + * for now and have to fail fork(). + */ + if (!follow_phys(vma, vma->vm_start, 0, &prot, paddr)) { + if (pgprot) + *pgprot = __pgprot(prot); + return 0; + } + if (is_cow_mapping(vma->vm_flags)) { + if (pgprot) + return -EINVAL; + *paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT; + return 0; + } + WARN_ON_ONCE(1); + return -EINVAL; +} + /* * track_pfn_copy is called when vma that is covering the pfnmap gets * copied through copy_page_range(). @@ -960,20 +992,13 @@ static void free_pfn_range(u64 paddr, un int track_pfn_copy(struct vm_area_struct *vma) { resource_size_t paddr; - unsigned long prot; unsigned long vma_size = vma->vm_end - vma->vm_start; pgprot_t pgprot;
if (vma->vm_flags & VM_PAT) { - /* - * reserve the whole chunk covered by vma. We need the - * starting address and protection from pte. - */ - if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) { - WARN_ON_ONCE(1); + if (get_pat_info(vma, &paddr, &pgprot)) return -EINVAL; - } - pgprot = __pgprot(prot); + /* reserve the whole chunk covered by vma. */ return reserve_pfn_range(paddr, vma_size, &pgprot, 1); }
@@ -1048,7 +1073,6 @@ void untrack_pfn(struct vm_area_struct * unsigned long size, bool mm_wr_locked) { resource_size_t paddr; - unsigned long prot;
if (vma && !(vma->vm_flags & VM_PAT)) return; @@ -1056,11 +1080,8 @@ void untrack_pfn(struct vm_area_struct * /* free the chunk starting from pfn or the whole chunk */ paddr = (resource_size_t)pfn << PAGE_SHIFT; if (!paddr && !size) { - if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) { - WARN_ON_ONCE(1); + if (get_pat_info(vma, &paddr, NULL)) return; - } - size = vma->vm_end - vma->vm_start; } free_pfn_range(paddr, size); --- a/mm/memory.c +++ b/mm/memory.c @@ -5674,6 +5674,10 @@ int follow_phys(struct vm_area_struct *v goto out; pte = ptep_get(ptep);
+ /* Never return PFNs of anon folios in COW mappings. */ + if (vm_normal_folio(vma, address, pte)) + goto unlock; + if ((flags & FOLL_WRITE) && !pte_write(pte)) goto unlock;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov (AMD) bp@alien8.de
commit 3ddf944b32f88741c303f0b21459dbb3872b8bc5 upstream.
Modifying an MCA bank's MCA_CTL bits, which control which error types get reported, is done over

/sys/devices/system/machinecheck/
├── machinecheck0
│   ├── bank0
│   ├── bank1
│   ├── bank10
│   ├── bank11
...
sysfs nodes by writing the new bit mask of events to enable.
When the write is accepted, the kernel deletes all current timers and reinits all banks.
Doing that in parallel can lead to initializing a timer which is already armed and in the timer wheel, i.e., in use already:
ODEBUG: init active (active state 0) object: ffff888063a28000 object type: timer_list hint: mce_timer_fn+0x0/0x240 arch/x86/kernel/cpu/mce/core.c:2642 WARNING: CPU: 0 PID: 8120 at lib/debugobjects.c:514 debug_print_object+0x1a0/0x2a0 lib/debugobjects.c:514
Fix that by grabbing the sysfs mutex as the rest of the MCA sysfs code does.
Reported by: Yue Sun samsun1006219@gmail.com Reported by: xingwei lee xrivendell7@gmail.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Cc: stable@kernel.org Link: https://lore.kernel.org/r/CAEkJfYNiENwQY8yV1LYJ9LjJs%2Bx_-PqMv98gKig55=2vbzf... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/mce/core.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -2468,12 +2468,14 @@ static ssize_t set_bank(struct device *s return -EINVAL;
b = &per_cpu(mce_banks_array, s->id)[bank]; - if (!b->init) return -ENODEV;
b->ctl = new; + + mutex_lock(&mce_sysfs_mutex); mce_restart(); + mutex_unlock(&mce_sysfs_mutex);
return size; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason A. Donenfeld Jason@zx2c4.com
commit 99485c4c026f024e7cb82da84c7951dbe3deb584 upstream.
There are few uses of CoCo that don't rely on working cryptography and hence a working RNG. Unfortunately, the CoCo threat model means that the VM host cannot be trusted and may actively work against guests to extract secrets or manipulate computation. Since a malicious host can modify or observe nearly all inputs to guests, the only remaining source of entropy for CoCo guests is RDRAND.
If RDRAND is broken -- due to CPU hardware fault -- the RNG as a whole is meant to gracefully continue on gathering entropy from other sources, but since there aren't other sources on CoCo, this is catastrophic. This is mostly a concern at boot time when initially seeding the RNG, as after that the consequences of a broken RDRAND are much more theoretical.
So, try at boot to seed the RNG using 256 bits of RDRAND output. If this fails, panic(). This will also trigger if the system is booted without RDRAND, as RDRAND is essential for a safe CoCo boot.
Add this deliberately to be "just a CoCo x86 driver feature" and not part of the RNG itself. Many device drivers and platforms have some desire to contribute something to the RNG, and add_device_randomness() is specifically meant for this purpose.
Any driver can call it with seed data of any quality, or even garbage quality, and it can only possibly make the quality of the RNG better or have no effect, but can never make it worse.
Rather than trying to build something into the core of the RNG, consider the particular CoCo issue just a CoCo issue, and therefore separate it all out into driver (well, arch/platform) code.
[ bp: Massage commit message. ]
Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Reviewed-by: Elena Reshetova elena.reshetova@intel.com Reviewed-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Reviewed-by: Theodore Ts'o tytso@mit.edu Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240326160735.73531-1-Jason@zx2c4.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/coco/core.c | 41 +++++++++++++++++++++++++++++++++++++++++ arch/x86/include/asm/coco.h | 2 ++ arch/x86/kernel/setup.c | 2 ++ 3 files changed, 45 insertions(+)
--- a/arch/x86/coco/core.c +++ b/arch/x86/coco/core.c @@ -3,13 +3,17 @@ * Confidential Computing Platform Capability checks * * Copyright (C) 2021 Advanced Micro Devices, Inc. + * Copyright (C) 2024 Jason A. Donenfeld Jason@zx2c4.com. All Rights Reserved. * * Author: Tom Lendacky thomas.lendacky@amd.com */
#include <linux/export.h> #include <linux/cc_platform.h> +#include <linux/string.h> +#include <linux/random.h>
+#include <asm/archrandom.h> #include <asm/coco.h> #include <asm/processor.h>
@@ -148,3 +152,40 @@ u64 cc_mkdec(u64 val) } } EXPORT_SYMBOL_GPL(cc_mkdec); + +__init void cc_random_init(void) +{ + /* + * The seed is 32 bytes (in units of longs), which is 256 bits, which + * is the security level that the RNG is targeting. + */ + unsigned long rng_seed[32 / sizeof(long)]; + size_t i, longs; + + if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) + return; + + /* + * Since the CoCo threat model includes the host, the only reliable + * source of entropy that can be neither observed nor manipulated is + * RDRAND. Usually, RDRAND failure is considered tolerable, but since + * CoCo guests have no other unobservable source of entropy, it's + * important to at least ensure the RNG gets some initial random seeds. + */ + for (i = 0; i < ARRAY_SIZE(rng_seed); i += longs) { + longs = arch_get_random_longs(&rng_seed[i], ARRAY_SIZE(rng_seed) - i); + + /* + * A zero return value means that the guest doesn't have RDRAND + * or the CPU is physically broken, and in both cases that + * means most crypto inside of the CoCo instance will be + * broken, defeating the purpose of CoCo in the first place. So + * just panic here because it's absolutely unsafe to continue + * executing. + */ + if (longs == 0) + panic("RDRAND is defective."); + } + add_device_randomness(rng_seed, sizeof(rng_seed)); + memzero_explicit(rng_seed, sizeof(rng_seed)); +} --- a/arch/x86/include/asm/coco.h +++ b/arch/x86/include/asm/coco.h @@ -22,6 +22,7 @@ static inline void cc_set_mask(u64 mask)
u64 cc_mkenc(u64 val); u64 cc_mkdec(u64 val); +void cc_random_init(void); #else static inline u64 cc_mkenc(u64 val) { @@ -32,6 +33,7 @@ static inline u64 cc_mkdec(u64 val) { return val; } +static inline void cc_random_init(void) { } #endif
#endif /* _ASM_X86_COCO_H */ --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -35,6 +35,7 @@ #include <asm/bios_ebda.h> #include <asm/bugs.h> #include <asm/cacheinfo.h> +#include <asm/coco.h> #include <asm/cpu.h> #include <asm/efi.h> #include <asm/gart.h> @@ -1120,6 +1121,7 @@ void __init setup_arch(char **cmdline_p) * memory size. */ sev_setup_arch(); + cc_random_init();
efi_fake_memmap(); efi_find_mirror();
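The boot-time check added above can be mimicked from userspace to exercise a machine's RDRAND. This is a purely illustrative analogue of the kernel logic, not part of the patch (build with gcc -mrdrnd on an x86 CPU that advertises RDRAND):

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
	unsigned long long seed[4];	/* 4 x 64 bits = the same 256 bits the patch gathers */
	int i, retry, ok;

	for (i = 0; i < 4; i++) {
		for (ok = 0, retry = 0; retry < 10 && !ok; retry++)
			ok = _rdrand64_step(&seed[i]);	/* returns 0 on RDRAND failure */
		if (!ok) {
			fprintf(stderr, "RDRAND is defective.\n");
			return 1;
		}
	}
	printf("collected 256 bits of RDRAND output\n");
	return 0;
}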
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kan Liang kan.liang@linux.intel.com
commit 312be9fc2234c8acfb8148a9f4c358b70d358dee upstream.
The MSR_PEBS_DATA_CFG register is used to configure which data groups should be generated into a PEBS record, and it is shared among all counters.
If there are different configurations among counters, perf combines all the configurations.
The first perf command below requires a complete PEBS record (including memory info, GPRs, XMMs, and LBRs). The second perf command only requires a basic group. However, after the second perf command starts running, the MSR_PEBS_DATA_CFG register is cleared, so only a basic group is generated in each PEBS record, which is wrong: the information required by the first perf command is missing.
$ perf record --intr-regs=AX,SP,XMM0 -a -C 8 -b -W -d -c 100000003 -o /dev/null -e cpu/event=0xd0,umask=0x81/upp &
$ sleep 5
$ perf record --per-thread -c 1 -e cycles:pp --no-timestamp --no-tid taskset -c 8 ./noploop 1000
The first PEBS event is a system-wide PEBS event. The second PEBS event is a per-thread event. When the thread is scheduled out, the intel_pmu_pebs_del() function is invoked to update the PEBS state. Since the system-wide event is still available, the cpuc->n_pebs is 1. The cpuc->pebs_data_cfg is cleared. The data configuration for the system-wide PEBS event is lost.
The (cpuc->n_pebs == 1) check was introduced in commit:
b6a32f023fcc ("perf/x86: Fix PEBS threshold initialization")
At that time, it indeed didn't hurt whether the state was updated during the removal, because only the threshold is updated.
The calculation of the threshold takes the last PEBS event into account.
However, since commit:
b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG")
we delay the threshold update, and clear the PEBS data config, which triggers the bug.
The PEBS data config update scope should not be shrunk during removal.
[ mingo: Improved the changelog & comments. ]
Fixes: b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG") Reported-by: Stephane Eranian eranian@google.com Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Ingo Molnar mingo@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240401133320.703971-1-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/events/intel/ds.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1236,11 +1236,11 @@ pebs_update_state(bool needed_cb, struct struct pmu *pmu = event->pmu;
/* - * Make sure we get updated with the first PEBS - * event. It will trigger also during removal, but - * that does not hurt: + * Make sure we get updated with the first PEBS event. + * During removal, ->pebs_data_cfg is still valid for + * the last PEBS event. Don't clear it. */ - if (cpuc->n_pebs == 1) + if ((cpuc->n_pebs == 1) && add) cpuc->pebs_data_cfg = PEBS_UPDATE_DS_SW;
if (needed_cb != pebs_needs_sched_cb(cpuc)) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Brown broonie@kernel.org
commit b017a0cea627fcbe158fc2c214fe893e18c4d0c4 upstream.
The SVE register sets have two different formats: one is a wrapped version of the standard FPSIMD register set, and the other carries actual SVE register data. At present we check TIF_SVE to see whether full SVE register state should be provided when reading the SVE regset, but if we were in a syscall we may have saved only the floating point registers even though that flag is set.
Fix this and simplify the logic by checking and using the format which we recorded when deciding if we should use FPSIMD or SVE format.
Fixes: 8c845e273104 ("arm64/sve: Leave SVE enabled on syscall if we don't context switch") Cc: stable@vger.kernel.org # 6.2.x Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20240325-arm64-ptrace-fp-type-v1-1-8dc846caf11f@ke... Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/ptrace.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
--- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c @@ -728,7 +728,6 @@ static void sve_init_header_from_task(st { unsigned int vq; bool active; - bool fpsimd_only; enum vec_type task_type;
memset(header, 0, sizeof(*header)); @@ -744,12 +743,10 @@ static void sve_init_header_from_task(st case ARM64_VEC_SVE: if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT)) header->flags |= SVE_PT_VL_INHERIT; - fpsimd_only = !test_tsk_thread_flag(target, TIF_SVE); break; case ARM64_VEC_SME: if (test_tsk_thread_flag(target, TIF_SME_VL_INHERIT)) header->flags |= SVE_PT_VL_INHERIT; - fpsimd_only = false; break; default: WARN_ON_ONCE(1); @@ -757,7 +754,7 @@ static void sve_init_header_from_task(st }
if (active) { - if (fpsimd_only) { + if (target->thread.fp_type == FP_STATE_FPSIMD) { header->flags |= SVE_PT_REGS_FPSIMD; } else { header->flags |= SVE_PT_REGS_SVE;
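From a debugger's point of view the visible effect is in the NT_ARM_SVE regset header. A minimal sketch of inspecting the reported format (assumes an arm64 tracee already stopped under ptrace; illustrative only, not part of the patch):

#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <linux/elf.h>		/* NT_ARM_SVE */
#include <asm/ptrace.h>		/* struct user_sve_header, SVE_PT_REGS_* */

static void print_sve_format(pid_t pid)
{
	struct user_sve_header hdr;
	struct iovec iov = { .iov_base = &hdr, .iov_len = sizeof(hdr) };

	if (ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov) != 0) {
		perror("PTRACE_GETREGSET");
		return;
	}
	/* After this fix the flag reflects the register state actually saved
	 * for the tracee, not merely whether TIF_SVE was set. */
	printf("SVE regset payload format: %s\n",
	       (hdr.flags & SVE_PT_REGS_SVE) ? "SVE" : "FPSIMD");
}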
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Hildenbrand david@redhat.com
commit 65291dcfcf8936e1b23cfd7718fdfde7cfaf7706 upstream.
folio_is_secretmem() currently relies on secretmem folios being LRU folios, to save some cycles.
However, folios might reside in a folio batch without the LRU flag set, or temporarily have their LRU flag cleared. Consequently, the LRU flag is unreliable for this purpose.
In particular, this is the case when secretmem_fault() allocates a fresh page and calls filemap_add_folio()->folio_add_lru(). The folio might be added to the per-cpu folio batch and won't get the LRU flag set until the batch was drained using e.g., lru_add_drain().
Consequently, folio_is_secretmem() might not detect secretmem folios and GUP-fast can succeed in grabbing a secretmem folio, crashing the kernel when we would later try reading/writing to the folio, because the folio has been unmapped from the directmap.
Fix it by removing that unreliable check.
Link: https://lkml.kernel.org/r/20240326143210.291116-2-david@redhat.com Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas") Signed-off-by: David Hildenbrand david@redhat.com Reported-by: xingwei lee xrivendell7@gmail.com Reported-by: yue sun samsun1006219@gmail.com Closes: https://lore.kernel.org/lkml/CABOYnLyevJeravW=QrH0JUPYEcDN160aZFb7kwndm-J2rm... Debugged-by: Miklos Szeredi miklos@szeredi.hu Tested-by: Miklos Szeredi mszeredi@redhat.com Reviewed-by: Mike Rapoport (IBM) rppt@kernel.org Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/secretmem.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/include/linux/secretmem.h +++ b/include/linux/secretmem.h @@ -13,10 +13,10 @@ static inline bool folio_is_secretmem(st /* * Using folio_mapping() is quite slow because of the actual call * instruction. - * We know that secretmem pages are not compound and LRU so we can + * We know that secretmem pages are not compound, so we can * save a couple of cycles here. */ - if (folio_test_large(folio) || !folio_test_lru(folio)) + if (folio_test_large(folio)) return false;
mapping = (struct address_space *)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Edward Liaw edliaw@google.com
commit 176517c9310281d00dd3210ab4cc4d3cdc26b17e upstream.
A compilation error for ffsl() was hit on Android after commit 91b80cc5b39f ("selftests: mm: fix map_hugetlb failure on 64K page size systems") started including vm_util.h.
Link: https://lkml.kernel.org/r/20240329185814.16304-1-edliaw@google.com Fixes: af605d26a8f2 ("selftests/mm: merge util.h into vm_util.h") Signed-off-by: Edward Liaw edliaw@google.com Reviewed-by: Muhammad Usama Anjum usama.anjum@collabora.com Cc: Axel Rasmussen axelrasmussen@google.com Cc: David Hildenbrand david@redhat.com Cc: "Mike Rapoport (IBM)" rppt@kernel.org Cc: Peter Xu peterx@redhat.com Cc: Shuah Khan shuah@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/mm/vm_util.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests/mm/vm_util.h index c02990bbd56f..9007c420d52c 100644 --- a/tools/testing/selftests/mm/vm_util.h +++ b/tools/testing/selftests/mm/vm_util.h @@ -3,7 +3,7 @@ #include <stdbool.h> #include <sys/mman.h> #include <err.h> -#include <string.h> /* ffsl() */ +#include <strings.h> /* ffsl() */ #include <unistd.h> /* _SC_PAGESIZE */
#define BIT_ULL(nr) (1ULL << (nr))
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sumanth Korikkar sumanthk@linux.ibm.com
commit 378ca2d2ad410a1cd5690d06b46c5e2297f4c8c0 upstream.
Align the system call table on 8 bytes. With a sys_call_table entry size of 8 bytes, this eliminates the possibility of a system call pointer crossing a cache-line boundary.
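As a rough illustration of that reasoning (plain userspace C, not kernel code, and assuming the 256-byte cache lines used on s390): once the table start is aligned to the 8-byte entry size, every entry begins and ends within the same cache line.

#include <assert.h>

int main(void)
{
        const unsigned long line = 256; /* s390 L1 cache-line size */
        const unsigned long entry = 8;  /* one sys_call_table slot */
        unsigned long off;

        /* with an 8-byte aligned base, entries start at offsets 0, 8, 16, ... */
        for (off = 0; off < 64 * 1024; off += entry)
                assert(off / line == (off + entry - 1) / line);

        return 0;
}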
Cc: stable@kernel.org Suggested-by: Ulrich Weigand ulrich.weigand@de.ibm.com Reviewed-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Sumanth Korikkar sumanthk@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/entry.S | 1 + 1 file changed, 1 insertion(+)
--- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -653,6 +653,7 @@ SYM_DATA_START_LOCAL(daton_psw) SYM_DATA_END(daton_psw)
.section .rodata, "a" + .balign 8 #define SYSCALL(esame,emu) .quad __s390x_ ## esame SYM_DATA_START(sys_call_table) #include "asm/syscall_table.h"
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Samuel Holland samuel.holland@sifive.com
commit d080a08b06b6266cc3e0e86c5acfd80db937cb6b upstream.
These macros did not initialize __kr_err, so they could fail even if the access did not fault.
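A tiny userspace sketch of the failure mode (illustrative only; simulate_access() is a made-up stand-in for the exception-fixup accessor): the fixup path only writes the error flag when a fault actually happens, so on success the flag keeps whatever value it started with, and it therefore has to start at 0.

#include <assert.h>

/* stand-in for __get_user_nocheck(): only the fault path writes *err */
static void simulate_access(long *err, int faulted)
{
        if (faulted)
                *err = -14;     /* -EFAULT */
}

int main(void)
{
        long __kr_err = 0;      /* the fix: start from a known-good value */

        simulate_access(&__kr_err, 0);  /* access succeeds, flag untouched */
        assert(__kr_err == 0);          /* no spurious error reported */
        return 0;
}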
Cc: stable@vger.kernel.org Fixes: d464118cdc41 ("riscv: implement __get_kernel_nofault and __put_user_nofault") Signed-off-by: Samuel Holland samuel.holland@sifive.com Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Reviewed-by: Charlie Jenkins charlie@rivosinc.com Link: https://lore.kernel.org/r/20240312022030.320789-1-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/include/asm/uaccess.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/riscv/include/asm/uaccess.h +++ b/arch/riscv/include/asm/uaccess.h @@ -319,7 +319,7 @@ unsigned long __must_check clear_user(vo
#define __get_kernel_nofault(dst, src, type, err_label) \ do { \ - long __kr_err; \ + long __kr_err = 0; \ \ __get_user_nocheck(*((type *)(dst)), (type *)(src), __kr_err); \ if (unlikely(__kr_err)) \ @@ -328,7 +328,7 @@ do { \
#define __put_kernel_nofault(dst, src, type, err_label) \ do { \ - long __kr_err; \ + long __kr_err = 0; \ \ __put_user_nocheck(*((type *)(src)), (type *)(dst), __kr_err); \ if (unlikely(__kr_err)) \
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan O'Rear sorear@fastmail.com
commit d14fa1fcf69db9d070e75f1c4425211fa619dfc8 upstream.
childregs represents the registers which are active for the new thread in user context. For a kernel thread, childregs->gp is never used since the kernel gp is not touched by switch_to. For a user mode helper, the gp value can be observed in user space after execve or possibly by other means.
[From the email thread]
The /* Kernel thread */ comment is somewhat inaccurate in that it is also used for user_mode_helper threads, which exec a user process, e.g. /sbin/init or when /proc/sys/kernel/core_pattern is a pipe. Such threads do not have PF_KTHREAD set and are valid targets for ptrace etc. even before they exec.
childregs is the *user* context during syscall execution and it is observable from userspace in at least five ways:
1. kernel_execve does not currently clear integer registers, so the starting register state for PID 1 and other user processes started by the kernel has sp = user stack, gp = kernel __global_pointer$, all other integer registers zeroed by the memset in the patch comment.
This is a bug in its own right, but I'm unwilling to bet that it is the only way to exploit the issue addressed by this patch.
2. ptrace(PTRACE_GETREGSET): you can PTRACE_ATTACH to a user_mode_helper thread before it execs, but ptrace requires SIGSTOP to be delivered which can only happen at user/kernel boundaries.
3. /proc/*/task/*/syscall: this is perfectly happy to read pt_regs for user_mode_helpers before the exec completes, but gp is not one of the registers it returns.
4. PERF_SAMPLE_REGS_USER: LOCKDOWN_PERF normally prevents access to kernel addresses via PERF_SAMPLE_REGS_INTR, but due to this bug kernel addresses are also exposed via PERF_SAMPLE_REGS_USER which is permitted under LOCKDOWN_PERF. I have not attempted to write exploit code.
5. Much of the tracing infrastructure allows access to user registers. I have not attempted to determine which forms of tracing allow access to user registers without already allowing access to kernel registers.
Fixes: 7db91e57a0ac ("RISC-V: Task implementation") Cc: stable@vger.kernel.org Signed-off-by: Stefan O'Rear sorear@fastmail.com Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Link: https://lore.kernel.org/r/20240327061258.2370291-1-sorear@fastmail.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/kernel/process.c | 3 --- 1 file changed, 3 deletions(-)
--- a/arch/riscv/kernel/process.c +++ b/arch/riscv/kernel/process.c @@ -26,8 +26,6 @@ #include <asm/cpuidle.h> #include <asm/vector.h>
-register unsigned long gp_in_global __asm__("gp"); - #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK) #include <linux/stackprotector.h> unsigned long __stack_chk_guard __read_mostly; @@ -186,7 +184,6 @@ int copy_thread(struct task_struct *p, c if (unlikely(args->fn)) { /* Kernel thread */ memset(childregs, 0, sizeof(struct pt_regs)); - childregs->gp = gp_in_global; /* Supervisor/Machine, irqs on: */ childregs->status = SR_PP | SR_PIE;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
commit 4a5ba0e0bfe552ac7451f57e304f6343c3d87f89 upstream.
The tcons created by cifs_construct_tcon() on multiuser mounts must also be able to failover and refresh DFS referrals, so set the appropriate fields in order to get a full DFS tcon. They could be shared among different superblocks later, too.
Cc: stable@vger.kernel.org # 6.4+ Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202404021518.3Xu2VU4s-lkp@intel.com/ Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/connect.c | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+)
--- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -3988,6 +3988,7 @@ cifs_construct_tcon(struct cifs_sb_info struct cifs_ses *ses; struct cifs_tcon *tcon = NULL; struct smb3_fs_context *ctx; + char *origin_fullpath = NULL;
ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); if (ctx == NULL) @@ -4011,6 +4012,7 @@ cifs_construct_tcon(struct cifs_sb_info ctx->sign = master_tcon->ses->sign; ctx->seal = master_tcon->seal; ctx->witness = master_tcon->use_witness; + ctx->dfs_root_ses = master_tcon->ses->dfs_root_ses;
rc = cifs_set_vol_auth(ctx, master_tcon->ses); if (rc) { @@ -4030,12 +4032,39 @@ cifs_construct_tcon(struct cifs_sb_info goto out; }
+#ifdef CONFIG_CIFS_DFS_UPCALL + spin_lock(&master_tcon->tc_lock); + if (master_tcon->origin_fullpath) { + spin_unlock(&master_tcon->tc_lock); + origin_fullpath = dfs_get_path(cifs_sb, cifs_sb->ctx->source); + if (IS_ERR(origin_fullpath)) { + tcon = ERR_CAST(origin_fullpath); + origin_fullpath = NULL; + cifs_put_smb_ses(ses); + goto out; + } + } else { + spin_unlock(&master_tcon->tc_lock); + } +#endif + tcon = cifs_get_tcon(ses, ctx); if (IS_ERR(tcon)) { cifs_put_smb_ses(ses); goto out; }
+#ifdef CONFIG_CIFS_DFS_UPCALL + if (origin_fullpath) { + spin_lock(&tcon->tc_lock); + tcon->origin_fullpath = origin_fullpath; + spin_unlock(&tcon->tc_lock); + origin_fullpath = NULL; + queue_delayed_work(dfscache_wq, &tcon->dfs_cache_work, + dfs_cache_get_ttl() * HZ); + } +#endif + #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY if (cap_unix(ses)) reset_cifs_unix_caps(0, tcon, NULL, ctx); @@ -4044,6 +4073,7 @@ cifs_construct_tcon(struct cifs_sb_info out: kfree(ctx->username); kfree_sensitive(ctx->password); + kfree(origin_fullpath); kfree(ctx);
return tcon;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
commit 93cee45ccfebc62a3bb4cd622b89e00c8c7d8493 upstream.
Serialise cifs_construct_tcon() with cifs_mount_mutex to handle parallel mounts that may end up reusing the session and tcon created by it.
Cc: stable@vger.kernel.org # 6.4+ Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/connect.c | 13 ++++++++++++- fs/smb/client/fs_context.c | 6 +++--- fs/smb/client/fs_context.h | 12 ++++++++++++ 3 files changed, 27 insertions(+), 4 deletions(-)
--- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -3981,7 +3981,7 @@ cifs_set_vol_auth(struct smb3_fs_context }
static struct cifs_tcon * -cifs_construct_tcon(struct cifs_sb_info *cifs_sb, kuid_t fsuid) +__cifs_construct_tcon(struct cifs_sb_info *cifs_sb, kuid_t fsuid) { int rc; struct cifs_tcon *master_tcon = cifs_sb_master_tcon(cifs_sb); @@ -4079,6 +4079,17 @@ out: return tcon; }
+static struct cifs_tcon * +cifs_construct_tcon(struct cifs_sb_info *cifs_sb, kuid_t fsuid) +{ + struct cifs_tcon *ret; + + cifs_mount_lock(); + ret = __cifs_construct_tcon(cifs_sb, fsuid); + cifs_mount_unlock(); + return ret; +} + struct cifs_tcon * cifs_sb_master_tcon(struct cifs_sb_info *cifs_sb) { --- a/fs/smb/client/fs_context.c +++ b/fs/smb/client/fs_context.c @@ -37,7 +37,7 @@ #include "rfc1002pdu.h" #include "fs_context.h"
-static DEFINE_MUTEX(cifs_mount_mutex); +DEFINE_MUTEX(cifs_mount_mutex);
static const match_table_t cifs_smb_version_tokens = { { Smb_1, SMB1_VERSION_STRING }, @@ -752,9 +752,9 @@ static int smb3_get_tree(struct fs_conte
if (err) return err; - mutex_lock(&cifs_mount_mutex); + cifs_mount_lock(); ret = smb3_get_tree_common(fc); - mutex_unlock(&cifs_mount_mutex); + cifs_mount_unlock(); return ret; }
--- a/fs/smb/client/fs_context.h +++ b/fs/smb/client/fs_context.h @@ -293,4 +293,16 @@ extern void smb3_update_mnt_flags(struct #define MAX_CACHED_FIDS 16 extern char *cifs_sanitize_prepath(char *prepath, gfp_t gfp);
+extern struct mutex cifs_mount_mutex; + +static inline void cifs_mount_lock(void) +{ + mutex_lock(&cifs_mount_mutex); +} + +static inline void cifs_mount_unlock(void) +{ + mutex_unlock(&cifs_mount_mutex); +} + #endif
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ritvik Budhiraja rbudhiraja@microsoft.com
commit 173217bd73365867378b5e75a86f0049e1069ee8 upstream.
In the current implementation, CIFS close sends a close to the server and does not check whether the server close succeeded. This patch checks the server close return status and retries when the server returns EBUSY or EAGAIN.
This can help avoid handle leaks.
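The retry policy added by the hunks below amounts to a bounded loop of roughly this shape (a sketch only; close_on_server() is a placeholder for the ->close_getattr()/->close() call done in serverclose_work()):

        int rc, retries = 0;
        const int MAX_RETRIES = 4;

        do {
                rc = close_on_server();         /* placeholder for the real call */
                if (rc == -EBUSY || rc == -EAGAIN) {
                        retries++;
                        msleep(250);            /* give the server time to recover */
                }
        } while ((rc == -EBUSY || rc == -EAGAIN) && retries < MAX_RETRIES);

        if (retries == MAX_RETRIES)
                pr_warn("Serverclose failed %d times, giving up\n", MAX_RETRIES);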
Cc: stable@vger.kernel.org Signed-off-by: Ritvik Budhiraja rbudhiraja@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/cached_dir.c | 6 ++-- fs/smb/client/cifsfs.c | 11 +++++++ fs/smb/client/cifsglob.h | 7 +++-- fs/smb/client/file.c | 63 ++++++++++++++++++++++++++++++++++++++++----- fs/smb/client/smb1ops.c | 4 +- fs/smb/client/smb2ops.c | 9 +++--- fs/smb/client/smb2pdu.c | 2 - 7 files changed, 85 insertions(+), 17 deletions(-)
--- a/fs/smb/client/cached_dir.c +++ b/fs/smb/client/cached_dir.c @@ -401,6 +401,7 @@ smb2_close_cached_fid(struct kref *ref) { struct cached_fid *cfid = container_of(ref, struct cached_fid, refcount); + int rc;
spin_lock(&cfid->cfids->cfid_list_lock); if (cfid->on_list) { @@ -414,9 +415,10 @@ smb2_close_cached_fid(struct kref *ref) cfid->dentry = NULL;
if (cfid->is_open) { - SMB2_close(0, cfid->tcon, cfid->fid.persistent_fid, + rc = SMB2_close(0, cfid->tcon, cfid->fid.persistent_fid, cfid->fid.volatile_fid); - atomic_dec(&cfid->tcon->num_remote_opens); + if (rc != -EBUSY && rc != -EAGAIN) + atomic_dec(&cfid->tcon->num_remote_opens); }
free_cached_dir(cfid); --- a/fs/smb/client/cifsfs.c +++ b/fs/smb/client/cifsfs.c @@ -159,6 +159,7 @@ struct workqueue_struct *decrypt_wq; struct workqueue_struct *fileinfo_put_wq; struct workqueue_struct *cifsoplockd_wq; struct workqueue_struct *deferredclose_wq; +struct workqueue_struct *serverclose_wq; __u32 cifs_lock_secret;
/* @@ -1877,6 +1878,13 @@ init_cifs(void) goto out_destroy_cifsoplockd_wq; }
+ serverclose_wq = alloc_workqueue("serverclose", + WQ_FREEZABLE|WQ_MEM_RECLAIM, 0); + if (!serverclose_wq) { + rc = -ENOMEM; + goto out_destroy_serverclose_wq; + } + rc = cifs_init_inodecache(); if (rc) goto out_destroy_deferredclose_wq; @@ -1951,6 +1959,8 @@ out_destroy_decrypt_wq: destroy_workqueue(decrypt_wq); out_destroy_cifsiod_wq: destroy_workqueue(cifsiod_wq); +out_destroy_serverclose_wq: + destroy_workqueue(serverclose_wq); out_clean_proc: cifs_proc_clean(); return rc; @@ -1980,6 +1990,7 @@ exit_cifs(void) destroy_workqueue(cifsoplockd_wq); destroy_workqueue(decrypt_wq); destroy_workqueue(fileinfo_put_wq); + destroy_workqueue(serverclose_wq); destroy_workqueue(cifsiod_wq); cifs_proc_clean(); } --- a/fs/smb/client/cifsglob.h +++ b/fs/smb/client/cifsglob.h @@ -425,10 +425,10 @@ struct smb_version_operations { /* set fid protocol-specific info */ void (*set_fid)(struct cifsFileInfo *, struct cifs_fid *, __u32); /* close a file */ - void (*close)(const unsigned int, struct cifs_tcon *, + int (*close)(const unsigned int, struct cifs_tcon *, struct cifs_fid *); /* close a file, returning file attributes and timestamps */ - void (*close_getattr)(const unsigned int xid, struct cifs_tcon *tcon, + int (*close_getattr)(const unsigned int xid, struct cifs_tcon *tcon, struct cifsFileInfo *pfile_info); /* send a flush request to the server */ int (*flush)(const unsigned int, struct cifs_tcon *, struct cifs_fid *); @@ -1408,6 +1408,7 @@ struct cifsFileInfo { bool invalidHandle:1; /* file closed via session abend */ bool swapfile:1; bool oplock_break_cancelled:1; + bool offload:1; /* offload final part of _put to a wq */ unsigned int oplock_epoch; /* epoch from the lease break */ __u32 oplock_level; /* oplock/lease level from the lease break */ int count; @@ -1416,6 +1417,7 @@ struct cifsFileInfo { struct cifs_search_info srch_inf; struct work_struct oplock_break; /* work for oplock breaks */ struct work_struct put; /* work for the final part of _put */ + struct work_struct serverclose; /* work for serverclose */ struct delayed_work deferred; bool deferred_close_scheduled; /* Flag to indicate close is scheduled */ char *symlink_target; @@ -2073,6 +2075,7 @@ extern struct workqueue_struct *decrypt_ extern struct workqueue_struct *fileinfo_put_wq; extern struct workqueue_struct *cifsoplockd_wq; extern struct workqueue_struct *deferredclose_wq; +extern struct workqueue_struct *serverclose_wq; extern __u32 cifs_lock_secret;
extern mempool_t *cifs_mid_poolp; --- a/fs/smb/client/file.c +++ b/fs/smb/client/file.c @@ -459,6 +459,7 @@ cifs_down_write(struct rw_semaphore *sem }
static void cifsFileInfo_put_work(struct work_struct *work); +void serverclose_work(struct work_struct *work);
struct cifsFileInfo *cifs_new_fileinfo(struct cifs_fid *fid, struct file *file, struct tcon_link *tlink, __u32 oplock, @@ -505,6 +506,7 @@ struct cifsFileInfo *cifs_new_fileinfo(s cfile->tlink = cifs_get_tlink(tlink); INIT_WORK(&cfile->oplock_break, cifs_oplock_break); INIT_WORK(&cfile->put, cifsFileInfo_put_work); + INIT_WORK(&cfile->serverclose, serverclose_work); INIT_DELAYED_WORK(&cfile->deferred, smb2_deferred_work_close); mutex_init(&cfile->fh_mutex); spin_lock_init(&cfile->file_info_lock); @@ -596,6 +598,40 @@ static void cifsFileInfo_put_work(struct cifsFileInfo_put_final(cifs_file); }
+void serverclose_work(struct work_struct *work) +{ + struct cifsFileInfo *cifs_file = container_of(work, + struct cifsFileInfo, serverclose); + + struct cifs_tcon *tcon = tlink_tcon(cifs_file->tlink); + + struct TCP_Server_Info *server = tcon->ses->server; + int rc = 0; + int retries = 0; + int MAX_RETRIES = 4; + + do { + if (server->ops->close_getattr) + rc = server->ops->close_getattr(0, tcon, cifs_file); + else if (server->ops->close) + rc = server->ops->close(0, tcon, &cifs_file->fid); + + if (rc == -EBUSY || rc == -EAGAIN) { + retries++; + msleep(250); + } + } while ((rc == -EBUSY || rc == -EAGAIN) && (retries < MAX_RETRIES) + ); + + if (retries == MAX_RETRIES) + pr_warn("Serverclose failed %d times, giving up\n", MAX_RETRIES); + + if (cifs_file->offload) + queue_work(fileinfo_put_wq, &cifs_file->put); + else + cifsFileInfo_put_final(cifs_file); +} + /** * cifsFileInfo_put - release a reference of file priv data * @@ -636,10 +672,13 @@ void _cifsFileInfo_put(struct cifsFileIn struct cifs_fid fid = {}; struct cifs_pending_open open; bool oplock_break_cancelled; + bool serverclose_offloaded = false;
spin_lock(&tcon->open_file_lock); spin_lock(&cifsi->open_file_lock); spin_lock(&cifs_file->file_info_lock); + + cifs_file->offload = offload; if (--cifs_file->count > 0) { spin_unlock(&cifs_file->file_info_lock); spin_unlock(&cifsi->open_file_lock); @@ -681,13 +720,20 @@ void _cifsFileInfo_put(struct cifsFileIn if (!tcon->need_reconnect && !cifs_file->invalidHandle) { struct TCP_Server_Info *server = tcon->ses->server; unsigned int xid; + int rc = 0;
xid = get_xid(); if (server->ops->close_getattr) - server->ops->close_getattr(xid, tcon, cifs_file); + rc = server->ops->close_getattr(xid, tcon, cifs_file); else if (server->ops->close) - server->ops->close(xid, tcon, &cifs_file->fid); + rc = server->ops->close(xid, tcon, &cifs_file->fid); _free_xid(xid); + + if (rc == -EBUSY || rc == -EAGAIN) { + // Server close failed, hence offloading it as an async op + queue_work(serverclose_wq, &cifs_file->serverclose); + serverclose_offloaded = true; + } }
if (oplock_break_cancelled) @@ -695,10 +741,15 @@ void _cifsFileInfo_put(struct cifsFileIn
cifs_del_pending_open(&open);
- if (offload) - queue_work(fileinfo_put_wq, &cifs_file->put); - else - cifsFileInfo_put_final(cifs_file); + // if serverclose has been offloaded to wq (on failure), it will + // handle offloading put as well. If serverclose not offloaded, + // we need to handle offloading put here. + if (!serverclose_offloaded) { + if (offload) + queue_work(fileinfo_put_wq, &cifs_file->put); + else + cifsFileInfo_put_final(cifs_file); + } }
int cifs_open(struct inode *inode, struct file *file) --- a/fs/smb/client/smb1ops.c +++ b/fs/smb/client/smb1ops.c @@ -753,11 +753,11 @@ cifs_set_fid(struct cifsFileInfo *cfile, cinode->can_cache_brlcks = CIFS_CACHE_WRITE(cinode); }
-static void +static int cifs_close_file(const unsigned int xid, struct cifs_tcon *tcon, struct cifs_fid *fid) { - CIFSSMBClose(xid, tcon, fid->netfid); + return CIFSSMBClose(xid, tcon, fid->netfid); }
static int --- a/fs/smb/client/smb2ops.c +++ b/fs/smb/client/smb2ops.c @@ -1392,14 +1392,14 @@ smb2_set_fid(struct cifsFileInfo *cfile, memcpy(cfile->fid.create_guid, fid->create_guid, 16); }
-static void +static int smb2_close_file(const unsigned int xid, struct cifs_tcon *tcon, struct cifs_fid *fid) { - SMB2_close(xid, tcon, fid->persistent_fid, fid->volatile_fid); + return SMB2_close(xid, tcon, fid->persistent_fid, fid->volatile_fid); }
-static void +static int smb2_close_getattr(const unsigned int xid, struct cifs_tcon *tcon, struct cifsFileInfo *cfile) { @@ -1410,7 +1410,7 @@ smb2_close_getattr(const unsigned int xi rc = __SMB2_close(xid, tcon, cfile->fid.persistent_fid, cfile->fid.volatile_fid, &file_inf); if (rc) - return; + return rc;
inode = d_inode(cfile->dentry);
@@ -1439,6 +1439,7 @@ smb2_close_getattr(const unsigned int xi
/* End of file and Attributes should not have to be updated on close */ spin_unlock(&inode->i_lock); + return rc; }
static int --- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -3549,9 +3549,9 @@ __SMB2_close(const unsigned int xid, str memcpy(&pbuf->network_open_info, &rsp->network_open_info, sizeof(pbuf->network_open_info)); + atomic_dec(&tcon->num_remote_opens); }
- atomic_dec(&tcon->num_remote_opens); close_exit: SMB2_close_free(&rqst); free_rsp_buf(resp_buftype, rsp);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
commit ca545b7f0823f19db0f1148d59bc5e1a56634502 upstream.
Skip sessions that are being torn down (status == SES_EXITING) to avoid UAF.
Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/cifs_debug.c | 2 ++ fs/smb/client/cifsglob.h | 10 ++++++++++ 2 files changed, 12 insertions(+)
--- a/fs/smb/client/cifs_debug.c +++ b/fs/smb/client/cifs_debug.c @@ -250,6 +250,8 @@ static int cifs_debug_files_proc_show(st spin_lock(&cifs_tcp_ses_lock); list_for_each_entry(server, &cifs_tcp_ses_list, tcp_ses_list) { list_for_each_entry(ses, &server->smb_ses_list, smb_ses_list) { + if (cifs_ses_exiting(ses)) + continue; list_for_each_entry(tcon, &ses->tcon_list, tcon_list) { spin_lock(&tcon->open_file_lock); list_for_each_entry(cfile, &tcon->openFileList, tlist) { --- a/fs/smb/client/cifsglob.h +++ b/fs/smb/client/cifsglob.h @@ -2281,4 +2281,14 @@ struct smb2_compound_vars { struct smb2_file_link_info link_info; };
+static inline bool cifs_ses_exiting(struct cifs_ses *ses) +{ + bool ret; + + spin_lock(&ses->ses_lock); + ret = ses->ses_status == SES_EXITING; + spin_unlock(&ses->ses_lock); + return ret; +} + #endif /* _CIFS_GLOB_H */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
commit d3da25c5ac84430f89875ca7485a3828150a7e0a upstream.
Skip sessions that are being torn down (status == SES_EXITING) to avoid UAF.
Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/cifs_debug.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/smb/client/cifs_debug.c +++ b/fs/smb/client/cifs_debug.c @@ -656,6 +656,8 @@ static ssize_t cifs_stats_proc_write(str } #endif /* CONFIG_CIFS_STATS2 */ list_for_each_entry(ses, &server->smb_ses_list, smb_ses_list) { + if (cifs_ses_exiting(ses)) + continue; list_for_each_entry(tcon, &ses->tcon_list, tcon_list) { atomic_set(&tcon->num_smbs_sent, 0); spin_lock(&tcon->stat_lock);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
commit 0865ffefea197b437ba78b5dd8d8e256253efd65 upstream.
Skip sessions that are being torn down (status == SES_EXITING) to avoid UAF.
Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/cifs_debug.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/smb/client/cifs_debug.c +++ b/fs/smb/client/cifs_debug.c @@ -736,6 +736,8 @@ static int cifs_stats_proc_show(struct s } #endif /* STATS2 */ list_for_each_entry(ses, &server->smb_ses_list, smb_ses_list) { + if (cifs_ses_exiting(ses)) + continue; list_for_each_entry(tcon, &ses->tcon_list, tcon_list) { i++; seq_printf(m, "\n%d) %s", i, tcon->tree_name);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
commit 58acd1f497162e7d282077f816faa519487be045 upstream.
Skip sessions that are being torn down (status == SES_EXITING) to avoid UAF.
Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/ioctl.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/fs/smb/client/ioctl.c +++ b/fs/smb/client/ioctl.c @@ -246,7 +246,9 @@ static int cifs_dump_full_key(struct cif spin_lock(&cifs_tcp_ses_lock); list_for_each_entry(server_it, &cifs_tcp_ses_list, tcp_ses_list) { list_for_each_entry(ses_it, &server_it->smb_ses_list, smb_ses_list) { - if (ses_it->Suid == out.session_id) { + spin_lock(&ses_it->ses_lock); + if (ses_it->ses_status != SES_EXITING && + ses_it->Suid == out.session_id) { ses = ses_it; /* * since we are using the session outside the crit @@ -254,9 +256,11 @@ static int cifs_dump_full_key(struct cif * so increment its refcount */ cifs_smb_ses_inc_refcount(ses); + spin_unlock(&ses_it->ses_lock); found = true; goto search_end; } + spin_unlock(&ses_it->ses_lock); } } search_end:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
commit 22863485a4626ec6ecf297f4cc0aef709bc862e4 upstream.
Skip sessions that are being torn down (status == SES_EXITING) to avoid UAF.
Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/smb2misc.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/smb/client/smb2misc.c +++ b/fs/smb/client/smb2misc.c @@ -697,6 +697,8 @@ smb2_is_valid_oplock_break(char *buffer, /* look up tcon based on tid & uid */ spin_lock(&cifs_tcp_ses_lock); list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) { + if (cifs_ses_exiting(ses)) + continue; list_for_each_entry(tcon, &ses->tcon_list, tcon_list) {
spin_lock(&tcon->open_file_lock);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
commit 705c76fbf726c7a2f6ff9143d4013b18daaaebf1 upstream.
Skip sessions that are being torn down (status == SES_EXITING) to avoid UAF.
Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/smb2misc.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/smb/client/smb2misc.c +++ b/fs/smb/client/smb2misc.c @@ -622,6 +622,8 @@ smb2_is_valid_lease_break(char *buffer, /* look up tcon based on tid & uid */ spin_lock(&cifs_tcp_ses_lock); list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) { + if (cifs_ses_exiting(ses)) + continue; list_for_each_entry(tcon, &ses->tcon_list, tcon_list) { spin_lock(&tcon->open_file_lock); cifs_stats_inc(
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
commit 69ccf040acddf33a3a85ec0f6b45ef84b0f7ec29 upstream.
Skip sessions that are being torn down (status == SES_EXITING) to avoid UAF.
Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/misc.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/smb/client/misc.c +++ b/fs/smb/client/misc.c @@ -489,6 +489,8 @@ is_valid_oplock_break(char *buffer, stru /* look up tcon based on tid & uid */ spin_lock(&cifs_tcp_ses_lock); list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) { + if (cifs_ses_exiting(ses)) + continue; list_for_each_entry(tcon, &ses->tcon_list, tcon_list) { if (tcon->tid != buf->Tid) continue;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
commit 63981561ffd2d4987807df4126f96a11e18b0c1d upstream.
Skip sessions that are being torn down (status == SES_EXITING) to avoid UAF.
Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/smb2ops.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/smb/client/smb2ops.c +++ b/fs/smb/client/smb2ops.c @@ -2430,6 +2430,8 @@ smb2_is_network_name_deleted(char *buf,
spin_lock(&cifs_tcp_ses_lock); list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) { + if (cifs_ses_exiting(ses)) + continue; list_for_each_entry(tcon, &ses->tcon_list, tcon_list) { if (tcon->tid == le32_to_cpu(shdr->Id.SyncId.TreeId)) { spin_lock(&tcon->tc_lock);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
commit e0e50401cc3921c9eaf1b0e667db174519ea939f upstream.
Skip sessions that are being torn down (status == SES_EXITING) to avoid UAF.
Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/connect.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -178,6 +178,8 @@ cifs_signal_cifsd_for_reconnect(struct T
spin_lock(&cifs_tcp_ses_lock); list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) { + if (cifs_ses_exiting(ses)) + continue; spin_lock(&ses->chan_lock); for (i = 0; i < ses->chan_count; i++) { if (!ses->chans[i].server)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andi Shyti andi.shyti@linux.intel.com
commit bc9a1ec01289e6e7259dc5030b413a9c6654a99a upstream.
The hardware should not dynamically balance the load between CCS engines. Wa_14019159160 recommends disabling it across all platforms.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+ Reviewed-by: Matt Roper matthew.d.roper@intel.com Acked-by: Michal Mrozek michal.mrozek@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240328073409.674098-2-andi.s... (cherry picked from commit f5d2904cf814f20b79e3e4c1b24a4ccc2411b7e0) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 + drivers/gpu/drm/i915/gt/intel_workarounds.c | 23 +++++++++++++++++++++-- 2 files changed, 22 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1468,6 +1468,7 @@ #define ECOBITS_PPGTT_CACHE4B (0 << 8)
#define GEN12_RCU_MODE _MMIO(0x14800) +#define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1) #define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0)
#define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168) --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -50,7 +50,8 @@ * registers belonging to BCS, VCS or VECS should be implemented in * xcs_engine_wa_init(). Workarounds for registers not belonging to a specific * engine's MMIO range but that are part of of the common RCS/CCS reset domain - * should be implemented in general_render_compute_wa_init(). + * should be implemented in general_render_compute_wa_init(). The settings + * about the CCS load balancing should be added in ccs_engine_wa_mode(). * * - GT workarounds: the list of these WAs is applied whenever these registers * revert to their default values: on GPU reset, suspend/resume [1]_, etc. @@ -2823,6 +2824,22 @@ add_render_compute_tuning_settings(struc wa_write_clr(wal, GEN8_GARBCNTL, GEN12_BUS_HASH_CTL_BIT_EXC); }
+static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct i915_wa_list *wal) +{ + struct intel_gt *gt = engine->gt; + + if (!IS_DG2(gt->i915)) + return; + + /* + * Wa_14019159160: This workaround, along with others, leads to + * significant challenges in utilizing load balancing among the + * CCS slices. Consequently, an architectural decision has been + * made to completely disable automatic CCS load balancing. + */ + wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE); +} + /* * The workarounds in this function apply to shared registers in * the general render reset domain that aren't tied to a @@ -2970,8 +2987,10 @@ engine_init_workarounds(struct intel_eng * to a single RCS/CCS engine's workaround list since * they're reset as part of the general render domain reset. */ - if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE) + if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE) { general_render_compute_wa_init(engine, wal); + ccs_engine_wa_mode(engine, wal); + }
if (engine->class == COMPUTE_CLASS) ccs_engine_wa_init(engine, wal);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andi Shyti andi.shyti@linux.intel.com
commit ea315f98e5d6d3191b74beb0c3e5fc16081d517c upstream.
We want fixed CCS load balancing, with all slices sharing one single user engine. For this reason, do not create the intel_engine_cs structure, with its dedicated command streamer, for CCS slices beyond the first.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+ Acked-by: Michal Mrozek michal.mrozek@intel.com Reviewed-by: Matt Roper matthew.d.roper@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240328073409.674098-3-andi.s... (cherry picked from commit c7a5aa4e57f88470313a8277eb299b221b86e3b1) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -912,6 +912,23 @@ static intel_engine_mask_t init_engine_m info->engine_mask &= ~BIT(GSC0); }
+ /* + * Do not create the command streamer for CCS slices beyond the first. + * All the workload submitted to the first engine will be shared among + * all the slices. + * + * Once the user will be allowed to customize the CCS mode, then this + * check needs to be removed. + */ + if (IS_DG2(gt->i915)) { + u8 first_ccs = __ffs(CCS_MASK(gt)); + + /* Mask off all the CCS engine */ + info->engine_mask &= ~GENMASK(CCS3, CCS0); + /* Put back in the first CCS engine */ + info->engine_mask |= BIT(_CCS(first_ccs)); + } + return info->engine_mask; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andi Shyti andi.shyti@linux.intel.com
commit 6db31251bb265813994bfb104eb4b4d0f44d64fb upstream.
Enable only one CCS engine by default, with all the compute slices allocated to it.
While generating the list of UABI engines to be exposed to the user, exclude any additional CCS engines beyond the first instance.
This change can be tested with igt i915_query.
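To make the register programming below concrete, here is a hedged worked example in plain userspace C that mirrors the loop in intel_gt_apply_ccs_mode(); the fuse mask value is hypothetical. With compute slices 0 and 2 present and CCS0 the only exposed engine, slices 0 and 2 are routed to engine 0 and the remaining slice fields get the reserved "unavailable" value:

#include <stdio.h>

#define CSLICE_WIDTH    3       /* mirrors XEHP_CCS_MODE_CSLICE_WIDTH */
#define CSLICE_MASK     0x7     /* mirrors XEHP_CCS_MODE_CSLICE_MASK */
#define CSLICE(cslice, ccs)     ((ccs) << ((cslice) * CSLICE_WIDTH))

int main(void)
{
        unsigned int ccs_mask = 0x5;    /* hypothetical fuse: cslices 0 and 2 present */
        unsigned int first_ccs = 0;     /* __ffs(ccs_mask) */
        unsigned int mode = 0;
        int cslice;

        for (cslice = 0; cslice < 4; cslice++) {
                if (ccs_mask & (1u << cslice))
                        mode |= CSLICE(cslice, first_ccs);      /* route to CCS0 */
                else
                        mode |= CSLICE(cslice, CSLICE_MASK);    /* mark unavailable */
        }

        printf("XEHP_CCS_MODE = 0x%x\n", mode); /* prints 0xe38 */
        return 0;
}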
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+ Reviewed-by: Matt Roper matthew.d.roper@intel.com Acked-by: Michal Mrozek michal.mrozek@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240328073409.674098-4-andi.s... (cherry picked from commit 2bebae0112b117de7e8a7289277a4bd2403b9e17) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/Makefile | 1 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 39 ++++++++++++++++++++++++++++ drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 13 +++++++++ drivers/gpu/drm/i915/gt/intel_gt_regs.h | 5 +++ drivers/gpu/drm/i915/gt/intel_workarounds.c | 7 +++++ 5 files changed, 65 insertions(+) create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
--- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -104,6 +104,7 @@ gt-y += \ gt/intel_ggtt_fencing.o \ gt/intel_gt.o \ gt/intel_gt_buffer_pool.o \ + gt/intel_gt_ccs_mode.o \ gt/intel_gt_clock_utils.o \ gt/intel_gt_debugfs.o \ gt/intel_gt_engines_debugfs.o \ --- /dev/null +++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c @@ -0,0 +1,39 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2024 Intel Corporation + */ + +#include "i915_drv.h" +#include "intel_gt.h" +#include "intel_gt_ccs_mode.h" +#include "intel_gt_regs.h" + +void intel_gt_apply_ccs_mode(struct intel_gt *gt) +{ + int cslice; + u32 mode = 0; + int first_ccs = __ffs(CCS_MASK(gt)); + + if (!IS_DG2(gt->i915)) + return; + + /* Build the value for the fixed CCS load balancing */ + for (cslice = 0; cslice < I915_MAX_CCS; cslice++) { + if (CCS_MASK(gt) & BIT(cslice)) + /* + * If available, assign the cslice + * to the first available engine... + */ + mode |= XEHP_CCS_MODE_CSLICE(cslice, first_ccs); + + else + /* + * ... otherwise, mark the cslice as + * unavailable if no CCS dispatches here + */ + mode |= XEHP_CCS_MODE_CSLICE(cslice, + XEHP_CCS_MODE_CSLICE_MASK); + } + + intel_uncore_write(gt->uncore, XEHP_CCS_MODE, mode); +} --- /dev/null +++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2024 Intel Corporation + */ + +#ifndef __INTEL_GT_CCS_MODE_H__ +#define __INTEL_GT_CCS_MODE_H__ + +struct intel_gt; + +void intel_gt_apply_ccs_mode(struct intel_gt *gt); + +#endif /* __INTEL_GT_CCS_MODE_H__ */ --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1471,6 +1471,11 @@ #define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1) #define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0)
+#define XEHP_CCS_MODE _MMIO(0x14804) +#define XEHP_CCS_MODE_CSLICE_MASK REG_GENMASK(2, 0) /* CCS0-3 + rsvd */ +#define XEHP_CCS_MODE_CSLICE_WIDTH ilog2(XEHP_CCS_MODE_CSLICE_MASK + 1) +#define XEHP_CCS_MODE_CSLICE(cslice, ccs) (ccs << (cslice * XEHP_CCS_MODE_CSLICE_WIDTH)) + #define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168) #define CHV_FGT_DISABLE_SS0 (1 << 10) #define CHV_FGT_DISABLE_SS1 (1 << 11) --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -10,6 +10,7 @@ #include "intel_engine_regs.h" #include "intel_gpu_commands.h" #include "intel_gt.h" +#include "intel_gt_ccs_mode.h" #include "intel_gt_mcr.h" #include "intel_gt_regs.h" #include "intel_ring.h" @@ -2838,6 +2839,12 @@ static void ccs_engine_wa_mode(struct in * made to completely disable automatic CCS load balancing. */ wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE); + + /* + * After having disabled automatic load balancing we need to + * assign all slices to a single CCS. We will call it CCS mode 1 + */ + intel_gt_apply_ccs_mode(gt); }
/*
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit bebb5af001dc6cb4f505bb21c4d5e2efbdc112e2 which is commit f2208aa12c27bfada3c15c550c03ca81d42dcac2 upstream.
It is reported to cause problems in the stable branches, so revert it.
Link: https://lore.kernel.org/r/899b7c1419a064a2b721b78eade06659@stwm.de Reported-by: Wolfgang Walter linux@stwm.de Cc: Thomas Gleixner tglx@linutronix.de Cc: Borislav Petkov (AMD) bp@alien8.de Cc: Guenter Roeck linux@roeck-us.net Cc: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/mpparse.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/arch/x86/kernel/mpparse.c +++ b/arch/x86/kernel/mpparse.c @@ -196,12 +196,12 @@ static int __init smp_read_mpc(struct mp if (!smp_check_mpc(mpc, oem, str)) return 0;
- if (early) { - /* Initialize the lapic mapping */ - if (!acpi_lapic) - register_lapic_address(mpc->lapic); + /* Initialize the lapic mapping */ + if (!acpi_lapic) + register_lapic_address(mpc->lapic); + + if (early) return 1; - }
/* Now process the configuration blocks. */ while (count < mpc->length) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sergey Shtylyov s.shtylyov@omp.ru
commit a1aa5390cc912934fee76ce80af5f940452fa987 upstream.
In of_modalias(), we can be passed str and len values that would cause a kernel oops in vsnprintf(), since it only allows passing a NULL ptr when the length is also 0. We also need to filter out negative values of the len parameter, as these would result in a really huge buffer size: snprintf() takes a size_t parameter while ours is ssize_t...
Found by Linux Verification Center (linuxtesting.org) with the Svace static analysis tool.
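A small userspace illustration of the second point (hedged, plain C, not the driver code): a negative ssize_t silently becomes an enormous size_t by the time it reaches snprintf(), which is why negative lengths have to be rejected up front.

#include <stdio.h>
#include <stddef.h>

int main(void)
{
        long len = -1;          /* a negative ssize_t-style length */

        /* prints 18446744073709551615 on a 64-bit system */
        printf("as size_t: %zu\n", (size_t)len);
        return 0;
}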
Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1d211023-3923-685b-20f0-f3f90ea56e1f@omp.ru Signed-off-by: Rob Herring robh@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/of/module.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/of/module.c +++ b/drivers/of/module.c @@ -16,6 +16,14 @@ ssize_t of_modalias(const struct device_ ssize_t csize; ssize_t tsize;
+ /* + * Prevent a kernel oops in vsnprintf() -- it only allows passing a + * NULL ptr when the length is also 0. Also filter out the negative + * lengths... + */ + if ((len > 0 && !str) || len < 0) + return -EINVAL; + /* Name & Type */ /* %p eats all alphanum characters, so %c must be used here */ csize = snprintf(str, len, "of:N%pOFn%c%s", np, 'T',
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit e3aae1098f109f0bd33c971deff1926f4e4441d0 upstream.
shellcheck recently helped to prevent issues. It is therefore good to fix the other, harmless issues in order to spot "real" ones later.
Here, two categories of warnings are now ignored:
- SC2317: Command appears to be unreachable. The cleanup() function is invoked indirectly via the EXIT trap.
- SC2086: Double quote to prevent globbing and word splitting. This is recommended, but the current usage is correct and there is no need to do all these modifications to be compliant with this rule.
For the modifications:
- SC2034: ksft_skip appears unused.
- SC2181: Check exit code directly with e.g. 'if mycmd;', not indirectly with $?.
- SC2004: $/${} is unnecessary on arithmetic variables.
- SC2155: Declare and assign separately to avoid masking return values.
- SC2166: Prefer [ p ] && [ q ] as [ p -a q ] is not well defined.
- SC2059: Don't use variables in the printf format string. Use printf '..%s..' "$foo".
Now this script is shellcheck (0.9.0) compliant. We can easily spot new issues.
Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240306-upstream-net-next-20240304-selftests-mptc... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 76 ++++++++++++--------- 1 file changed, 47 insertions(+), 29 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh @@ -1,6 +1,11 @@ #!/bin/bash # SPDX-License-Identifier: GPL-2.0
+# Double quotes to prevent globbing and word splitting is recommended in new +# code but we accept it, especially because there were too many before having +# address all other issues detected by shellcheck. +#shellcheck disable=SC2086 + . "$(dirname "${0}")/mptcp_lib.sh"
time_start=$(date +%s) @@ -13,7 +18,6 @@ sout="" cin_disconnect="" cin="" cout="" -ksft_skip=4 capture=false timeout_poll=30 timeout_test=$((timeout_poll * 2 + 1)) @@ -131,6 +135,8 @@ ns4="ns4-$rndh" TEST_COUNT=0 TEST_GROUP=""
+# This function is used in the cleanup trap +#shellcheck disable=SC2317 cleanup() { rm -f "$cin_disconnect" "$cout_disconnect" @@ -225,8 +231,9 @@ set_ethtool_flags() { local dev="$2" local flags="$3"
- ip netns exec $ns ethtool -K $dev $flags 2>/dev/null - [ $? -eq 0 ] && echo "INFO: set $ns dev $dev: ethtool -K $flags" + if ip netns exec $ns ethtool -K $dev $flags 2>/dev/null; then + echo "INFO: set $ns dev $dev: ethtool -K $flags" + fi }
set_random_ethtool_flags() { @@ -363,7 +370,7 @@ do_transfer() local extra_args="$7"
local port - port=$((10000+$TEST_COUNT)) + port=$((10000+TEST_COUNT)) TEST_COUNT=$((TEST_COUNT+1))
if [ "$rcvbuf" -gt 0 ]; then @@ -420,12 +427,18 @@ do_transfer() nstat -n fi
- local stat_synrx_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableSYNRX") - local stat_ackrx_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableACKRX") - local stat_cookietx_last=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesSent") - local stat_cookierx_last=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesRecv") - local stat_csum_err_s=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtDataCsumErr") - local stat_csum_err_c=$(mptcp_lib_get_counter "${connector_ns}" "MPTcpExtDataCsumErr") + local stat_synrx_last_l + local stat_ackrx_last_l + local stat_cookietx_last + local stat_cookierx_last + local stat_csum_err_s + local stat_csum_err_c + stat_synrx_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableSYNRX") + stat_ackrx_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableACKRX") + stat_cookietx_last=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesSent") + stat_cookierx_last=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesRecv") + stat_csum_err_s=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtDataCsumErr") + stat_csum_err_c=$(mptcp_lib_get_counter "${connector_ns}" "MPTcpExtDataCsumErr")
timeout ${timeout_test} \ ip netns exec ${listener_ns} \ @@ -488,11 +501,16 @@ do_transfer() check_transfer $cin $sout "file received by server" rets=$?
- local stat_synrx_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableSYNRX") - local stat_ackrx_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableACKRX") - local stat_cookietx_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesSent") - local stat_cookierx_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesRecv") - local stat_ooo_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtTCPOFOQueue") + local stat_synrx_now_l + local stat_ackrx_now_l + local stat_cookietx_now + local stat_cookierx_now + local stat_ooo_now + stat_synrx_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableSYNRX") + stat_ackrx_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableACKRX") + stat_cookietx_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesSent") + stat_cookierx_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesRecv") + stat_ooo_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtTCPOFOQueue")
expect_synrx=$((stat_synrx_last_l)) expect_ackrx=$((stat_ackrx_last_l)) @@ -501,8 +519,8 @@ do_transfer() cookies=${cookies##*=}
if [ ${cl_proto} = "MPTCP" ] && [ ${srv_proto} = "MPTCP" ]; then - expect_synrx=$((stat_synrx_last_l+$connect_per_transfer)) - expect_ackrx=$((stat_ackrx_last_l+$connect_per_transfer)) + expect_synrx=$((stat_synrx_last_l+connect_per_transfer)) + expect_ackrx=$((stat_ackrx_last_l+connect_per_transfer)) fi
if [ ${stat_synrx_now_l} -lt ${expect_synrx} ]; then @@ -510,7 +528,7 @@ do_transfer() "${stat_synrx_now_l}" "${expect_synrx}" 1>&2 retc=1 fi - if [ ${stat_ackrx_now_l} -lt ${expect_ackrx} -a ${stat_ooo_now} -eq 0 ]; then + if [ ${stat_ackrx_now_l} -lt ${expect_ackrx} ] && [ ${stat_ooo_now} -eq 0 ]; then if [ ${stat_ooo_now} -eq 0 ]; then printf "[ FAIL ] lower MPC ACK rx (%d) than expected (%d)\n" \ "${stat_ackrx_now_l}" "${expect_ackrx}" 1>&2 @@ -521,18 +539,20 @@ do_transfer() fi
if $checksum; then - local csum_err_s=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtDataCsumErr") - local csum_err_c=$(mptcp_lib_get_counter "${connector_ns}" "MPTcpExtDataCsumErr") + local csum_err_s + local csum_err_c + csum_err_s=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtDataCsumErr") + csum_err_c=$(mptcp_lib_get_counter "${connector_ns}" "MPTcpExtDataCsumErr")
local csum_err_s_nr=$((csum_err_s - stat_csum_err_s)) if [ $csum_err_s_nr -gt 0 ]; then - printf "[ FAIL ]\nserver got $csum_err_s_nr data checksum error[s]" + printf "[ FAIL ]\nserver got %d data checksum error[s]" ${csum_err_s_nr} rets=1 fi
local csum_err_c_nr=$((csum_err_c - stat_csum_err_c)) if [ $csum_err_c_nr -gt 0 ]; then - printf "[ FAIL ]\nclient got $csum_err_c_nr data checksum error[s]" + printf "[ FAIL ]\nclient got %d data checksum error[s]" ${csum_err_c_nr} retc=1 fi fi @@ -701,7 +721,7 @@ run_test_transparent() return fi
-ip netns exec "$listener_ns" nft -f /dev/stdin <<"EOF" + if ! ip netns exec "$listener_ns" nft -f /dev/stdin <<"EOF" flush ruleset table inet mangle { chain divert { @@ -712,7 +732,7 @@ table inet mangle { } } EOF - if [ $? -ne 0 ]; then + then echo "SKIP: $msg, could not load nft ruleset" mptcp_lib_fail_if_expected_feature "nft rules" mptcp_lib_result_skip "${TEST_GROUP}" @@ -727,8 +747,7 @@ EOF local_addr="0.0.0.0" fi
- ip -net "$listener_ns" $r6flag rule add fwmark 1 lookup 100 - if [ $? -ne 0 ]; then + if ! ip -net "$listener_ns" $r6flag rule add fwmark 1 lookup 100; then ip netns exec "$listener_ns" nft flush ruleset echo "SKIP: $msg, ip $r6flag rule failed" mptcp_lib_fail_if_expected_feature "ip rule" @@ -736,8 +755,7 @@ EOF return fi
- ip -net "$listener_ns" route add local $local_addr/0 dev lo table 100 - if [ $? -ne 0 ]; then + if ! ip -net "$listener_ns" route add local $local_addr/0 dev lo table 100; then ip netns exec "$listener_ns" nft flush ruleset ip -net "$listener_ns" $r6flag rule del fwmark 1 lookup 100 echo "SKIP: $msg, ip route add local $local_addr failed" @@ -900,7 +918,7 @@ stop_if_error "Could not even run ping t echo -n "INFO: Using loss of $tc_loss " test "$tc_delay" -gt 0 && echo -n "delay $tc_delay ms "
-reorder_delay=$(($tc_delay / 4)) +reorder_delay=$((tc_delay / 4))
if [ -z "${tc_reorder}" ]; then reorder1=$((RANDOM%10))
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Davide Caratti dcaratti@redhat.com
commit 8e2b8a9fa512709e6fee744dcd4e2a20ee7f5c56 upstream.
Eric Dumazet suggests:
The fact that mptcp_is_tcpsk() was able to write over sock->ops was a bit strange to me. mptcp_is_tcpsk() should answer a question, with a read-only argument.
Refactor the code to avoid overwriting sock_ops inside that function. Also, change the helper name to reflect the semantics and to disambiguate it from its dual, sk_is_mptcp(). While at it, collapse mptcp_stream_accept() and mptcp_accept() into a single function, where the fallback / non-fallback paths are separated by a single sk_is_mptcp() conditional.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/432 Suggested-by: Eric Dumazet edumazet@google.com Signed-off-by: Davide Caratti dcaratti@redhat.com Acked-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Matthieu Baerts matttbe@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 108 ++++++++++++++++++++------------------------------- 1 file changed, 44 insertions(+), 64 deletions(-)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -55,28 +55,14 @@ static u64 mptcp_wnd_end(const struct mp return READ_ONCE(msk->wnd_end); }
-static bool mptcp_is_tcpsk(struct sock *sk) +static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk) { - struct socket *sock = sk->sk_socket; - - if (unlikely(sk->sk_prot == &tcp_prot)) { - /* we are being invoked after mptcp_accept() has - * accepted a non-mp-capable flow: sk is a tcp_sk, - * not an mptcp one. - * - * Hand the socket over to tcp so all further socket ops - * bypass mptcp. - */ - WRITE_ONCE(sock->ops, &inet_stream_ops); - return true; #if IS_ENABLED(CONFIG_MPTCP_IPV6) - } else if (unlikely(sk->sk_prot == &tcpv6_prot)) { - WRITE_ONCE(sock->ops, &inet6_stream_ops); - return true; + if (sk->sk_prot == &tcpv6_prot) + return &inet6_stream_ops; #endif - } - - return false; + WARN_ON_ONCE(sk->sk_prot != &tcp_prot); + return &inet_stream_ops; }
static int __mptcp_socket_create(struct mptcp_sock *msk) @@ -3328,44 +3314,6 @@ void mptcp_rcv_space_init(struct mptcp_s msk->rcvq_space.space = TCP_INIT_CWND * TCP_MSS_DEFAULT; }
-static struct sock *mptcp_accept(struct sock *ssk, int flags, int *err, - bool kern) -{ - struct sock *newsk; - - pr_debug("ssk=%p, listener=%p", ssk, mptcp_subflow_ctx(ssk)); - newsk = inet_csk_accept(ssk, flags, err, kern); - if (!newsk) - return NULL; - - pr_debug("newsk=%p, subflow is mptcp=%d", newsk, sk_is_mptcp(newsk)); - if (sk_is_mptcp(newsk)) { - struct mptcp_subflow_context *subflow; - struct sock *new_mptcp_sock; - - subflow = mptcp_subflow_ctx(newsk); - new_mptcp_sock = subflow->conn; - - /* is_mptcp should be false if subflow->conn is missing, see - * subflow_syn_recv_sock() - */ - if (WARN_ON_ONCE(!new_mptcp_sock)) { - tcp_sk(newsk)->is_mptcp = 0; - goto out; - } - - newsk = new_mptcp_sock; - MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_MPCAPABLEPASSIVEACK); - } else { - MPTCP_INC_STATS(sock_net(ssk), - MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK); - } - -out: - newsk->sk_kern_sock = kern; - return newsk; -} - void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags) { struct mptcp_subflow_context *subflow, *tmp; @@ -3802,7 +3750,6 @@ static struct proto mptcp_prot = { .connect = mptcp_connect, .disconnect = mptcp_disconnect, .close = mptcp_close, - .accept = mptcp_accept, .setsockopt = mptcp_setsockopt, .getsockopt = mptcp_getsockopt, .shutdown = mptcp_shutdown, @@ -3912,18 +3859,36 @@ static int mptcp_stream_accept(struct so if (!ssk) return -EINVAL;
- newsk = mptcp_accept(ssk, flags, &err, kern); + pr_debug("ssk=%p, listener=%p", ssk, mptcp_subflow_ctx(ssk)); + newsk = inet_csk_accept(ssk, flags, &err, kern); if (!newsk) return err;
- lock_sock(newsk); - - __inet_accept(sock, newsock, newsk); - if (!mptcp_is_tcpsk(newsock->sk)) { - struct mptcp_sock *msk = mptcp_sk(newsk); + pr_debug("newsk=%p, subflow is mptcp=%d", newsk, sk_is_mptcp(newsk)); + if (sk_is_mptcp(newsk)) { struct mptcp_subflow_context *subflow; + struct sock *new_mptcp_sock; + + subflow = mptcp_subflow_ctx(newsk); + new_mptcp_sock = subflow->conn; + + /* is_mptcp should be false if subflow->conn is missing, see + * subflow_syn_recv_sock() + */ + if (WARN_ON_ONCE(!new_mptcp_sock)) { + tcp_sk(newsk)->is_mptcp = 0; + goto tcpfallback; + } + + newsk = new_mptcp_sock; + MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_MPCAPABLEPASSIVEACK); + + newsk->sk_kern_sock = kern; + lock_sock(newsk); + __inet_accept(sock, newsock, newsk);
set_bit(SOCK_CUSTOM_SOCKOPT, &newsock->flags); + msk = mptcp_sk(newsk); msk->in_accept_queue = 0;
/* set ssk->sk_socket of accept()ed flows to mptcp socket. @@ -3945,6 +3910,21 @@ static int mptcp_stream_accept(struct so if (unlikely(list_is_singular(&msk->conn_list))) mptcp_set_state(newsk, TCP_CLOSE); } + } else { + MPTCP_INC_STATS(sock_net(ssk), + MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK); +tcpfallback: + newsk->sk_kern_sock = kern; + lock_sock(newsk); + __inet_accept(sock, newsock, newsk); + /* we are being invoked after accepting a non-mp-capable + * flow: sk is a tcp_sk, not an mptcp one. + * + * Hand the socket over to tcp so all further socket ops + * bypass mptcp. + */ + WRITE_ONCE(newsock->sk->sk_socket->ops, + mptcp_fallback_tcp_ops(newsock->sk)); } release_sock(newsk);
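In short, the rework above folds the old mptcp_is_tcpsk() pass into the accept() path itself: when the accepted socket turns out to be plain TCP, its proto_ops are swapped so that all later socket calls bypass MPTCP. A minimal sketch of the resulting helper and its use, consolidated from the hunks above (surrounding code omitted):

static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk)
{
#if IS_ENABLED(CONFIG_MPTCP_IPV6)
	/* IPv6 TCP socket: hand over to the plain IPv6 stream ops */
	if (sk->sk_prot == &tcpv6_prot)
		return &inet6_stream_ops;
#endif
	/* anything else is expected to be a plain IPv4 TCP socket */
	WARN_ON_ONCE(sk->sk_prot != &tcp_prot);
	return &inet_stream_ops;
}

	/* in mptcp_stream_accept(), on the TCP fallback path: */
	WRITE_ONCE(newsock->sk->sk_socket->ops,
		   mptcp_fallback_tcp_ops(newsock->sk));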
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Davide Caratti dcaratti@redhat.com
commit 7a1b3490f47e88ec4cbde65f1a77a0f4bc972282 upstream.
Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they accept non-MPC connections. As reported by Christoph, this is "surprising" because the counter might become greater than MPTcpExtMPCapableSYNRX.
The MPTcpExtMPCapableFallbackACK counter's name suggests it should only be incremented when a connection was seen using MPTCP options and a fallback to TCP has then been done. Let's do that by incrementing it when the subflow context of an inbound MPC connection attempt is dropped. Also, update the mptcp_connect.sh kselftest to ensure that the above MIB does not increment in case a pure TCP client connects to an MPTCP server.
Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch cpaasch@apple.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/449 Signed-off-by: Davide Caratti dcaratti@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-1-3... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 2 -- net/mptcp/subflow.c | 2 ++ tools/testing/selftests/net/mptcp/mptcp_connect.sh | 9 +++++++++ 3 files changed, 11 insertions(+), 2 deletions(-)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -3911,8 +3911,6 @@ static int mptcp_stream_accept(struct so mptcp_set_state(newsk, TCP_CLOSE); } } else { - MPTCP_INC_STATS(sock_net(ssk), - MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK); tcpfallback: newsk->sk_kern_sock = kern; lock_sock(newsk); --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -905,6 +905,8 @@ dispose_child: return child;
fallback: + if (fallback) + SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK); mptcp_subflow_drop_ctx(child); return child; } --- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh @@ -433,12 +433,14 @@ do_transfer() local stat_cookierx_last local stat_csum_err_s local stat_csum_err_c + local stat_tcpfb_last_l stat_synrx_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableSYNRX") stat_ackrx_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableACKRX") stat_cookietx_last=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesSent") stat_cookierx_last=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesRecv") stat_csum_err_s=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtDataCsumErr") stat_csum_err_c=$(mptcp_lib_get_counter "${connector_ns}" "MPTcpExtDataCsumErr") + stat_tcpfb_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableFallbackACK")
timeout ${timeout_test} \ ip netns exec ${listener_ns} \ @@ -506,11 +508,13 @@ do_transfer() local stat_cookietx_now local stat_cookierx_now local stat_ooo_now + local stat_tcpfb_now_l stat_synrx_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableSYNRX") stat_ackrx_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableACKRX") stat_cookietx_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesSent") stat_cookierx_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesRecv") stat_ooo_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtTCPOFOQueue") + stat_tcpfb_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableFallbackACK")
expect_synrx=$((stat_synrx_last_l)) expect_ackrx=$((stat_ackrx_last_l)) @@ -564,6 +568,11 @@ do_transfer() mptcp_lib_result_fail "${TEST_GROUP}: ${result_msg}" fi
+ if [ ${stat_ooo_now} -eq 0 ] && [ ${stat_tcpfb_last_l} -ne ${stat_tcpfb_now_l} ]; then + mptcp_lib_pr_fail "unexpected fallback to TCP" + rets=1 + fi + if [ $cookies -eq 2 ];then if [ $stat_cookietx_last -ge $stat_cookietx_now ] ;then printf " WARN: CookieSent: did not advance"
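The kernel side of the change is small; a sketch of where the listener-side MIB is now bumped, pieced together from the subflow.c hunk above (the rest of subflow_syn_recv_sock() and the handling of the 'fallback' flag are omitted):

	/* tail of subflow_syn_recv_sock(), abridged */
fallback:
	/* count the fallback before dropping the subflow context */
	if (fallback)
		SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK);
	mptcp_subflow_drop_ctx(child);
	return child;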
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrii Nakryiko andrii@kernel.org
commit e9c856cabefb71d47b2eeb197f72c9c88e9b45b0 upstream.
There is no need to delay putting either path or task to deallocation step. It can be done right after bpf_uprobe_unregister. Between release and dealloc, there could be still some running BPF programs, but they don't access either task or path, only data in link->uprobes, so it is safe to do.
On the other hand, doing path_put() in the dealloc callback makes that callback sleepable, because path_put() itself might sleep. That is problematic given the need to call the uprobe link's dealloc through call_rcu(), which is what the next bug-fix patch does. So solve the problem by releasing these resources early.
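A condensed view of the resulting split between the two callbacks (names taken from the diff below; the rest of the uprobe-multi link implementation is omitted):

static void bpf_uprobe_multi_link_release(struct bpf_link *link)
{
	struct bpf_uprobe_multi_link *umulti_link;

	umulti_link = container_of(link, struct bpf_uprobe_multi_link, link);
	bpf_uprobe_unregister(&umulti_link->path, umulti_link->uprobes,
			      umulti_link->cnt);
	/* safe right after unregister: running programs only touch
	 * link->uprobes, not the task or the path
	 */
	if (umulti_link->task)
		put_task_struct(umulti_link->task);
	path_put(&umulti_link->path);
}

static void bpf_uprobe_multi_link_dealloc(struct bpf_link *link)
{
	struct bpf_uprobe_multi_link *umulti_link;

	umulti_link = container_of(link, struct bpf_uprobe_multi_link, link);
	/* no sleeping calls left here, so dealloc can later be deferred
	 * behind call_rcu()
	 */
	kvfree(umulti_link->uprobes);
	kfree(umulti_link);
}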
Signed-off-by: Andrii Nakryiko andrii@kernel.org Link: https://lore.kernel.org/r/20240328052426.3042617-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/bpf_trace.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -3065,6 +3065,9 @@ static void bpf_uprobe_multi_link_releas
umulti_link = container_of(link, struct bpf_uprobe_multi_link, link); bpf_uprobe_unregister(&umulti_link->path, umulti_link->uprobes, umulti_link->cnt); + if (umulti_link->task) + put_task_struct(umulti_link->task); + path_put(&umulti_link->path); }
static void bpf_uprobe_multi_link_dealloc(struct bpf_link *link) @@ -3072,9 +3075,6 @@ static void bpf_uprobe_multi_link_deallo struct bpf_uprobe_multi_link *umulti_link;
umulti_link = container_of(link, struct bpf_uprobe_multi_link, link); - if (umulti_link->task) - put_task_struct(umulti_link->task); - path_put(&umulti_link->path); kvfree(umulti_link->uprobes); kfree(umulti_link); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrii Nakryiko andrii@kernel.org
commit 1a80dbcb2dbaf6e4c216e62e30fa7d3daa8001ce upstream.
BPF link for some program types is passed as a "context" which can be used by those BPF programs to look up additional information. E.g., for multi-kprobes and multi-uprobes, link is used to fetch BPF cookie values.
Because of this runtime dependency, when bpf_link refcnt drops to zero there could still be active BPF programs running accessing link data.
This patch adds generic support to defer bpf_link dealloc callback to after RCU GP, if requested. This is done by exposing two different deallocation callbacks, one synchronous and one deferred. If deferred one is provided, bpf_link_free() will schedule dealloc_deferred() callback to happen after RCU GP.
BPF is using two flavors of RCU: "classic" non-sleepable one and RCU tasks trace one. The latter is used when sleepable BPF programs are used. bpf_link_free() accommodates that by checking underlying BPF program's sleepable flag, and goes either through normal RCU GP only for non-sleepable, or through RCU tasks trace GP *and* then normal RCU GP (taking into account rcu_trace_implies_rcu_gp() optimization), if BPF program is sleepable.
We use this for multi-kprobe and multi-uprobe links, which dereference link during program run. We also preventively switch raw_tp link to use deferred dealloc callback, as upcoming changes in bpf-next tree expose raw_tp link data (specifically, cookie value) to BPF program at runtime as well.
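A condensed sketch of the freeing path this adds (matching the syscall.c hunk below; bpf_link_free() records 'sleepable' from the underlying program before releasing it, and the rest of the function is omitted):

static void bpf_link_defer_dealloc_rcu_gp(struct rcu_head *rcu)
{
	struct bpf_link *link = container_of(rcu, struct bpf_link, rcu);

	/* free bpf_link and its containing memory */
	link->ops->dealloc_deferred(link);
}

static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu)
{
	/* tasks trace GP has elapsed; chain a classic RCU GP unless it
	 * is already implied
	 */
	if (rcu_trace_implies_rcu_gp())
		bpf_link_defer_dealloc_rcu_gp(rcu);
	else
		call_rcu(rcu, bpf_link_defer_dealloc_rcu_gp);
}

	/* in bpf_link_free(), after ops->release() and bpf_prog_put(): */
	if (link->ops->dealloc_deferred) {
		if (sleepable)
			call_rcu_tasks_trace(&link->rcu,
					     bpf_link_defer_dealloc_mult_rcu_gp);
		else
			call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
	}
	if (link->ops->dealloc)
		link->ops->dealloc(link);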
Fixes: 0dcac2725406 ("bpf: Add multi kprobe link") Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link") Reported-by: syzbot+981935d9485a560bfbcb@syzkaller.appspotmail.com Reported-by: syzbot+2cb5a6c573e98db598cc@syzkaller.appspotmail.com Reported-by: syzbot+62d8b26793e8a2bd0516@syzkaller.appspotmail.com Signed-off-by: Andrii Nakryiko andrii@kernel.org Acked-by: Jiri Olsa jolsa@kernel.org Link: https://lore.kernel.org/r/20240328052426.3042617-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/bpf.h | 16 +++++++++++++++- kernel/bpf/syscall.c | 35 ++++++++++++++++++++++++++++++++--- kernel/trace/bpf_trace.c | 4 ++-- 3 files changed, 49 insertions(+), 6 deletions(-)
--- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1524,12 +1524,26 @@ struct bpf_link { enum bpf_link_type type; const struct bpf_link_ops *ops; struct bpf_prog *prog; - struct work_struct work; + /* rcu is used before freeing, work can be used to schedule that + * RCU-based freeing before that, so they never overlap + */ + union { + struct rcu_head rcu; + struct work_struct work; + }; };
struct bpf_link_ops { void (*release)(struct bpf_link *link); + /* deallocate link resources callback, called without RCU grace period + * waiting + */ void (*dealloc)(struct bpf_link *link); + /* deallocate link resources callback, called after RCU grace period; + * if underlying BPF program is sleepable we go through tasks trace + * RCU GP and then "classic" RCU GP + */ + void (*dealloc_deferred)(struct bpf_link *link); int (*detach)(struct bpf_link *link); int (*update_prog)(struct bpf_link *link, struct bpf_prog *new_prog, struct bpf_prog *old_prog); --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2866,17 +2866,46 @@ void bpf_link_inc(struct bpf_link *link) atomic64_inc(&link->refcnt); }
+static void bpf_link_defer_dealloc_rcu_gp(struct rcu_head *rcu) +{ + struct bpf_link *link = container_of(rcu, struct bpf_link, rcu); + + /* free bpf_link and its containing memory */ + link->ops->dealloc_deferred(link); +} + +static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu) +{ + if (rcu_trace_implies_rcu_gp()) + bpf_link_defer_dealloc_rcu_gp(rcu); + else + call_rcu(rcu, bpf_link_defer_dealloc_rcu_gp); +} + /* bpf_link_free is guaranteed to be called from process context */ static void bpf_link_free(struct bpf_link *link) { + bool sleepable = false; + bpf_link_free_id(link->id); if (link->prog) { + sleepable = link->prog->aux->sleepable; /* detach BPF program, clean up used resources */ link->ops->release(link); bpf_prog_put(link->prog); } - /* free bpf_link and its containing memory */ - link->ops->dealloc(link); + if (link->ops->dealloc_deferred) { + /* schedule BPF link deallocation; if underlying BPF program + * is sleepable, we need to first wait for RCU tasks trace + * sync, then go through "classic" RCU grace period + */ + if (sleepable) + call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp); + else + call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp); + } + if (link->ops->dealloc) + link->ops->dealloc(link); }
static void bpf_link_put_deferred(struct work_struct *work) @@ -3381,7 +3410,7 @@ static int bpf_raw_tp_link_fill_link_inf
static const struct bpf_link_ops bpf_raw_tp_link_lops = { .release = bpf_raw_tp_link_release, - .dealloc = bpf_raw_tp_link_dealloc, + .dealloc_deferred = bpf_raw_tp_link_dealloc, .show_fdinfo = bpf_raw_tp_link_show_fdinfo, .fill_link_info = bpf_raw_tp_link_fill_link_info, }; --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -2639,7 +2639,7 @@ static int bpf_kprobe_multi_link_fill_li
static const struct bpf_link_ops bpf_kprobe_multi_link_lops = { .release = bpf_kprobe_multi_link_release, - .dealloc = bpf_kprobe_multi_link_dealloc, + .dealloc_deferred = bpf_kprobe_multi_link_dealloc, .fill_link_info = bpf_kprobe_multi_link_fill_link_info, };
@@ -3081,7 +3081,7 @@ static void bpf_uprobe_multi_link_deallo
static const struct bpf_link_ops bpf_uprobe_multi_link_lops = { .release = bpf_uprobe_multi_link_release, - .dealloc = bpf_uprobe_multi_link_dealloc, + .dealloc_deferred = bpf_uprobe_multi_link_dealloc, };
static int uprobe_prog_run(struct bpf_uprobe *uprobe,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hou Wenlong houwenlong.hwl@antgroup.com
commit d2a285d65bfde3218fd0c3b88794d0135ced680b upstream.
Move the __head section definition to a header to widen its use.
An upcoming patch will mark the code as __head in mem_encrypt_identity.c too.
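The mechanics are simple: __head is just a section attribute, so any early-boot helper can be placed in .head.text by including <asm/init.h> and tagging the function. A minimal illustration (example_early_helper() is a made-up name used only to show the annotation):

/* arch/x86/include/asm/init.h, after this patch */
#define __head	__section(".head.text")

/* any file that includes <asm/init.h> can now do: */
static void __head example_early_helper(void)
{
	/* code here is linked into .head.text */
}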
Signed-off-by: Hou Wenlong houwenlong.hwl@antgroup.com Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lore.kernel.org/r/0583f57977be184689c373fe540cbd7d85ca2047.169752540... Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/init.h | 2 ++ arch/x86/kernel/head64.c | 3 +-- 2 files changed, 3 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/init.h +++ b/arch/x86/include/asm/init.h @@ -2,6 +2,8 @@ #ifndef _ASM_X86_INIT_H #define _ASM_X86_INIT_H
+#define __head __section(".head.text") + struct x86_mapping_info { void *(*alloc_pgt_page)(void *); /* allocate buf for page table */ void *context; /* context for alloc_pgt_page */ --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -41,6 +41,7 @@ #include <asm/trapnr.h> #include <asm/sev.h> #include <asm/tdx.h> +#include <asm/init.h>
/* * Manage page tables very early on. @@ -84,8 +85,6 @@ static struct desc_ptr startup_gdt_descr .address = 0, };
-#define __head __section(".head.text") - static void __head *fixup_pointer(void *ptr, unsigned long physaddr) { return ptr - (void *)_text + (void *)physaddr;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ard Biesheuvel ardb@kernel.org
commit 7205f06e847422b66c1506eee01b9998ffc75d76 upstream.
Parse the mem_encrypt= command line parameter from the EFI stub if CONFIG_ARCH_HAS_MEM_ENCRYPT=y, so that it can be passed to the early boot code by the arch code in the stub.
This avoids the need for the core kernel to do any string parsing very early in the boot.
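The parsing itself is a small tri-state stored in the stub; a sketch of the option handling added to efi_parse_options(), matching the hunk below (in the real function this sits in the existing param-matching else-if chain):

int efi_mem_encrypt;	/* > 0: forced on, < 0: forced off, 0: not set */

	if (IS_ENABLED(CONFIG_ARCH_HAS_MEM_ENCRYPT) &&
	    !strcmp(param, "mem_encrypt") && val) {
		if (parse_option_str(val, "on"))
			efi_mem_encrypt = 1;
		else if (parse_option_str(val, "off"))
			efi_mem_encrypt = -1;
	}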
Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Tested-by: Tom Lendacky thomas.lendacky@amd.com Link: https://lore.kernel.org/r/20240227151907.387873-16-ardb+git@google.com Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firmware/efi/libstub/efi-stub-helper.c | 8 ++++++++ drivers/firmware/efi/libstub/efistub.h | 2 +- 2 files changed, 9 insertions(+), 1 deletion(-)
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c @@ -24,6 +24,8 @@ static bool efi_noinitrd; static bool efi_nosoftreserve; static bool efi_disable_pci_dma = IS_ENABLED(CONFIG_EFI_DISABLE_PCI_DMA);
+int efi_mem_encrypt; + bool __pure __efi_soft_reserve_enabled(void) { return !efi_nosoftreserve; @@ -75,6 +77,12 @@ efi_status_t efi_parse_options(char cons efi_noinitrd = true; } else if (IS_ENABLED(CONFIG_X86_64) && !strcmp(param, "no5lvl")) { efi_no5lvl = true; + } else if (IS_ENABLED(CONFIG_ARCH_HAS_MEM_ENCRYPT) && + !strcmp(param, "mem_encrypt") && val) { + if (parse_option_str(val, "on")) + efi_mem_encrypt = 1; + else if (parse_option_str(val, "off")) + efi_mem_encrypt = -1; } else if (!strcmp(param, "efi") && val) { efi_nochunk = parse_option_str(val, "nochunk"); efi_novamap |= parse_option_str(val, "novamap"); --- a/drivers/firmware/efi/libstub/efistub.h +++ b/drivers/firmware/efi/libstub/efistub.h @@ -37,8 +37,8 @@ extern bool efi_no5lvl; extern bool efi_nochunk; extern bool efi_nokaslr; extern int efi_loglevel; +extern int efi_mem_encrypt; extern bool efi_novamap; - extern const efi_system_table_t *efi_system_table;
typedef union efi_dxe_services_table efi_dxe_services_table_t;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ard Biesheuvel ardb@kernel.org
commit 48204aba801f1b512b3abed10b8e1a63e03f3dd1 upstream.
The .head.text section is the initial primary entrypoint of the core kernel, and is entered with the CPU executing from a 1:1 mapping of memory. Such code must never access global variables using absolute references, as these are based on the kernel virtual mapping which is not active yet at this point.
Given that the SME startup code is also called from this early execution context, move it into .head.text as well. This will allow more thorough build time checks in the future to ensure that early startup code only uses RIP-relative references to global variables.
Also replace some occurrences of __pa_symbol() [which relies on the compiler generating an absolute reference, which is not guaranteed] and an open coded RIP-relative access with RIP_REL_REF().
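For context, the open-coded construct being replaced looks like the sketch below, mirroring the asm removed from sme_encrypt_kernel() in the hunk further down; RIP_REL_REF() wraps the same idea behind a macro. my_early_buf is a made-up symbol used only for illustration:

static char my_early_buf[64];

static unsigned long my_early_buf_addr(void)
{
	unsigned long addr;

	/* force a RIP-relative lea so the address is valid even when
	 * running from the early 1:1 mapping, where kernel virtual
	 * (absolute) references are not usable yet
	 */
	asm ("lea my_early_buf(%%rip), %0"
	     : "=r" (addr)
	     : "p" (my_early_buf));

	return addr;
}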
Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Tested-by: Tom Lendacky thomas.lendacky@amd.com Link: https://lore.kernel.org/r/20240227151907.387873-18-ardb+git@google.com Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/mem_encrypt.h | 8 +++---- arch/x86/mm/mem_encrypt_identity.c | 42 ++++++++++++++----------------------- 2 files changed, 21 insertions(+), 29 deletions(-)
--- a/arch/x86/include/asm/mem_encrypt.h +++ b/arch/x86/include/asm/mem_encrypt.h @@ -46,8 +46,8 @@ void __init sme_unmap_bootdata(char *rea void __init sme_early_init(void); void __init sev_setup_arch(void);
-void __init sme_encrypt_kernel(struct boot_params *bp); -void __init sme_enable(struct boot_params *bp); +void sme_encrypt_kernel(struct boot_params *bp); +void sme_enable(struct boot_params *bp);
int __init early_set_memory_decrypted(unsigned long vaddr, unsigned long size); int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size); @@ -81,8 +81,8 @@ static inline void __init sme_unmap_boot static inline void __init sme_early_init(void) { } static inline void __init sev_setup_arch(void) { }
-static inline void __init sme_encrypt_kernel(struct boot_params *bp) { } -static inline void __init sme_enable(struct boot_params *bp) { } +static inline void sme_encrypt_kernel(struct boot_params *bp) { } +static inline void sme_enable(struct boot_params *bp) { }
static inline void sev_es_init_vc_handling(void) { }
--- a/arch/x86/mm/mem_encrypt_identity.c +++ b/arch/x86/mm/mem_encrypt_identity.c @@ -41,6 +41,7 @@ #include <linux/mem_encrypt.h> #include <linux/cc_platform.h>
+#include <asm/init.h> #include <asm/setup.h> #include <asm/sections.h> #include <asm/cmdline.h> @@ -98,7 +99,7 @@ static char sme_workarea[2 * PMD_SIZE] _ static char sme_cmdline_arg[] __initdata = "mem_encrypt"; static char sme_cmdline_on[] __initdata = "on";
-static void __init sme_clear_pgd(struct sme_populate_pgd_data *ppd) +static void __head sme_clear_pgd(struct sme_populate_pgd_data *ppd) { unsigned long pgd_start, pgd_end, pgd_size; pgd_t *pgd_p; @@ -113,7 +114,7 @@ static void __init sme_clear_pgd(struct memset(pgd_p, 0, pgd_size); }
-static pud_t __init *sme_prepare_pgd(struct sme_populate_pgd_data *ppd) +static pud_t __head *sme_prepare_pgd(struct sme_populate_pgd_data *ppd) { pgd_t *pgd; p4d_t *p4d; @@ -150,7 +151,7 @@ static pud_t __init *sme_prepare_pgd(str return pud; }
-static void __init sme_populate_pgd_large(struct sme_populate_pgd_data *ppd) +static void __head sme_populate_pgd_large(struct sme_populate_pgd_data *ppd) { pud_t *pud; pmd_t *pmd; @@ -166,7 +167,7 @@ static void __init sme_populate_pgd_larg set_pmd(pmd, __pmd(ppd->paddr | ppd->pmd_flags)); }
-static void __init sme_populate_pgd(struct sme_populate_pgd_data *ppd) +static void __head sme_populate_pgd(struct sme_populate_pgd_data *ppd) { pud_t *pud; pmd_t *pmd; @@ -192,7 +193,7 @@ static void __init sme_populate_pgd(stru set_pte(pte, __pte(ppd->paddr | ppd->pte_flags)); }
-static void __init __sme_map_range_pmd(struct sme_populate_pgd_data *ppd) +static void __head __sme_map_range_pmd(struct sme_populate_pgd_data *ppd) { while (ppd->vaddr < ppd->vaddr_end) { sme_populate_pgd_large(ppd); @@ -202,7 +203,7 @@ static void __init __sme_map_range_pmd(s } }
-static void __init __sme_map_range_pte(struct sme_populate_pgd_data *ppd) +static void __head __sme_map_range_pte(struct sme_populate_pgd_data *ppd) { while (ppd->vaddr < ppd->vaddr_end) { sme_populate_pgd(ppd); @@ -212,7 +213,7 @@ static void __init __sme_map_range_pte(s } }
-static void __init __sme_map_range(struct sme_populate_pgd_data *ppd, +static void __head __sme_map_range(struct sme_populate_pgd_data *ppd, pmdval_t pmd_flags, pteval_t pte_flags) { unsigned long vaddr_end; @@ -236,22 +237,22 @@ static void __init __sme_map_range(struc __sme_map_range_pte(ppd); }
-static void __init sme_map_range_encrypted(struct sme_populate_pgd_data *ppd) +static void __head sme_map_range_encrypted(struct sme_populate_pgd_data *ppd) { __sme_map_range(ppd, PMD_FLAGS_ENC, PTE_FLAGS_ENC); }
-static void __init sme_map_range_decrypted(struct sme_populate_pgd_data *ppd) +static void __head sme_map_range_decrypted(struct sme_populate_pgd_data *ppd) { __sme_map_range(ppd, PMD_FLAGS_DEC, PTE_FLAGS_DEC); }
-static void __init sme_map_range_decrypted_wp(struct sme_populate_pgd_data *ppd) +static void __head sme_map_range_decrypted_wp(struct sme_populate_pgd_data *ppd) { __sme_map_range(ppd, PMD_FLAGS_DEC_WP, PTE_FLAGS_DEC_WP); }
-static unsigned long __init sme_pgtable_calc(unsigned long len) +static unsigned long __head sme_pgtable_calc(unsigned long len) { unsigned long entries = 0, tables = 0;
@@ -288,7 +289,7 @@ static unsigned long __init sme_pgtable_ return entries + tables; }
-void __init sme_encrypt_kernel(struct boot_params *bp) +void __head sme_encrypt_kernel(struct boot_params *bp) { unsigned long workarea_start, workarea_end, workarea_len; unsigned long execute_start, execute_end, execute_len; @@ -323,9 +324,8 @@ void __init sme_encrypt_kernel(struct bo * memory from being cached. */
- /* Physical addresses gives us the identity mapped virtual addresses */ - kernel_start = __pa_symbol(_text); - kernel_end = ALIGN(__pa_symbol(_end), PMD_SIZE); + kernel_start = (unsigned long)RIP_REL_REF(_text); + kernel_end = ALIGN((unsigned long)RIP_REL_REF(_end), PMD_SIZE); kernel_len = kernel_end - kernel_start;
initrd_start = 0; @@ -343,14 +343,6 @@ void __init sme_encrypt_kernel(struct bo #endif
/* - * We're running identity mapped, so we must obtain the address to the - * SME encryption workarea using rip-relative addressing. - */ - asm ("lea sme_workarea(%%rip), %0" - : "=r" (workarea_start) - : "p" (sme_workarea)); - - /* * Calculate required number of workarea bytes needed: * executable encryption area size: * stack page (PAGE_SIZE) @@ -359,7 +351,7 @@ void __init sme_encrypt_kernel(struct bo * pagetable structures for the encryption of the kernel * pagetable structures for workarea (in case not currently mapped) */ - execute_start = workarea_start; + execute_start = workarea_start = (unsigned long)RIP_REL_REF(sme_workarea); execute_end = execute_start + (PAGE_SIZE * 2) + PMD_SIZE; execute_len = execute_end - execute_start;
@@ -502,7 +494,7 @@ void __init sme_encrypt_kernel(struct bo native_write_cr3(__native_read_cr3()); }
-void __init sme_enable(struct boot_params *bp) +void __head sme_enable(struct boot_params *bp) { const char *cmdline_ptr, *cmdline_arg, *cmdline_on; unsigned int eax, ebx, ecx, edx;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ard Biesheuvel ardb@kernel.org
commit 428080c9b19bfda37c478cd626dbd3851db1aff9 upstream.
In preparation for implementing rigorous build time checks to enforce that only code that can support it will be called from the early 1:1 mapping of memory, move SEV init code that is called in this manner to the .head.text section.
Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Tested-by: Tom Lendacky thomas.lendacky@amd.com Link: https://lore.kernel.org/r/20240227151907.387873-19-ardb+git@google.com Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/boot/compressed/sev.c | 3 +++ arch/x86/include/asm/sev.h | 10 +++++----- arch/x86/kernel/sev-shared.c | 23 ++++++++++------------- arch/x86/kernel/sev.c | 14 ++++++++------ 4 files changed, 26 insertions(+), 24 deletions(-)
--- a/arch/x86/boot/compressed/sev.c +++ b/arch/x86/boot/compressed/sev.c @@ -116,6 +116,9 @@ static bool fault_in_kernel_space(unsign #undef __init #define __init
+#undef __head +#define __head + #define __BOOT_COMPRESSED
/* Basic instruction decoding support needed */ --- a/arch/x86/include/asm/sev.h +++ b/arch/x86/include/asm/sev.h @@ -199,15 +199,15 @@ static inline int pvalidate(unsigned lon struct snp_guest_request_ioctl;
void setup_ghcb(void); -void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, - unsigned long npages); -void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, - unsigned long npages); +void early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, + unsigned long npages); +void early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, + unsigned long npages); void snp_set_memory_shared(unsigned long vaddr, unsigned long npages); void snp_set_memory_private(unsigned long vaddr, unsigned long npages); void snp_set_wakeup_secondary_cpu(void); bool snp_init(struct boot_params *bp); -void __init __noreturn snp_abort(void); +void __noreturn snp_abort(void); void snp_dmi_setup(void); int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct snp_guest_request_ioctl *rio); void snp_accept_memory(phys_addr_t start, phys_addr_t end); --- a/arch/x86/kernel/sev-shared.c +++ b/arch/x86/kernel/sev-shared.c @@ -89,7 +89,8 @@ static bool __init sev_es_check_cpu_feat return true; }
-static void __noreturn sev_es_terminate(unsigned int set, unsigned int reason) +static void __head __noreturn +sev_es_terminate(unsigned int set, unsigned int reason) { u64 val = GHCB_MSR_TERM_REQ;
@@ -326,13 +327,7 @@ static int sev_cpuid_hv(struct ghcb *ghc */ static const struct snp_cpuid_table *snp_cpuid_get_table(void) { - void *ptr; - - asm ("lea cpuid_table_copy(%%rip), %0" - : "=r" (ptr) - : "p" (&cpuid_table_copy)); - - return ptr; + return &RIP_REL_REF(cpuid_table_copy); }
/* @@ -391,7 +386,7 @@ static u32 snp_cpuid_calc_xsave_size(u64 return xsave_size; }
-static bool +static bool __head snp_cpuid_get_validated_func(struct cpuid_leaf *leaf) { const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table(); @@ -528,7 +523,8 @@ static int snp_cpuid_postprocess(struct * Returns -EOPNOTSUPP if feature not enabled. Any other non-zero return value * should be treated as fatal by caller. */ -static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf) +static int __head +snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf) { const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
@@ -570,7 +566,7 @@ static int snp_cpuid(struct ghcb *ghcb, * page yet, so it only supports the MSR based communication with the * hypervisor and only the CPUID exit-code. */ -void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code) +void __head do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code) { unsigned int subfn = lower_bits(regs->cx, 32); unsigned int fn = lower_bits(regs->ax, 32); @@ -1016,7 +1012,8 @@ struct cc_setup_data { * Search for a Confidential Computing blob passed in as a setup_data entry * via the Linux Boot Protocol. */ -static struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp) +static __head +struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp) { struct cc_setup_data *sd = NULL; struct setup_data *hdr; @@ -1043,7 +1040,7 @@ static struct cc_blob_sev_info *find_cc_ * mapping needs to be updated in sync with all the changes to virtual memory * layout and related mapping facilities throughout the boot process. */ -static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info) +static void __head setup_cpuid_table(const struct cc_blob_sev_info *cc_info) { const struct snp_cpuid_table *cpuid_table_fw, *cpuid_table; int i; --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -26,6 +26,7 @@ #include <linux/dmi.h> #include <uapi/linux/sev-guest.h>
+#include <asm/init.h> #include <asm/cpu_entry_area.h> #include <asm/stacktrace.h> #include <asm/sev.h> @@ -683,8 +684,9 @@ static u64 __init get_jump_table_addr(vo return ret; }
-static void early_set_pages_state(unsigned long vaddr, unsigned long paddr, - unsigned long npages, enum psc_op op) +static void __head +early_set_pages_state(unsigned long vaddr, unsigned long paddr, + unsigned long npages, enum psc_op op) { unsigned long paddr_end; u64 val; @@ -740,7 +742,7 @@ e_term: sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC); }
-void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, +void __head early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned long npages) { /* @@ -2045,7 +2047,7 @@ fail: * * Scan for the blob in that order. */ -static __init struct cc_blob_sev_info *find_cc_blob(struct boot_params *bp) +static __head struct cc_blob_sev_info *find_cc_blob(struct boot_params *bp) { struct cc_blob_sev_info *cc_info;
@@ -2071,7 +2073,7 @@ found_cc_info: return cc_info; }
-bool __init snp_init(struct boot_params *bp) +bool __head snp_init(struct boot_params *bp) { struct cc_blob_sev_info *cc_info;
@@ -2093,7 +2095,7 @@ bool __init snp_init(struct boot_params return true; }
-void __init __noreturn snp_abort(void) +void __head __noreturn snp_abort(void) { sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ard Biesheuvel ardb@kernel.org
commit 9c55461040a9264b7e44444c53d26480b438eda6 upstream.
Currently, the EFI stub invokes the EFI memory attributes protocol to strip any NX restrictions from the entire loaded kernel, resulting in all code and data being mapped read-write-execute.
The point of the EFI memory attributes protocol is to remove the need for all memory allocations to be mapped with both write and execute permissions by default, and make it the OS loader's responsibility to transition data mappings to code mappings where appropriate.
Even though the UEFI specification does not appear to leave room for denying memory attribute changes based on security policy, let's be cautious and avoid relying on the ability to create read-write-execute mappings. This is trivially achievable, given that the amount of kernel code executing via the firmware's 1:1 mapping is rather small and limited to the .head.text region. So let's drop the NX restrictions only on that subregion, but not before remapping it as read-only first.
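A condensed sketch of the resulting sequence (from the x86-stub.c hunk below; the attribute cleared by the pre-existing call is assumed to be EFI_MEMORY_XP, as in the upstream code, and error handling after the second call is omitted):

	/* in efi_adjust_memory_range_protection(), when the memory
	 * attributes protocol is available:
	 */
	if (memattr != NULL) {
		/* first remap the range read-only ... */
		status = efi_call_proto(memattr, set_memory_attributes,
					rounded_start,
					rounded_end - rounded_start,
					EFI_MEMORY_RO);
		if (status != EFI_SUCCESS) {
			efi_warn("Failed to set EFI_MEMORY_RO attribute\n");
			return status;
		}

		/* ... then drop the no-execute restriction */
		status = efi_call_proto(memattr, clear_memory_attributes,
					rounded_start,
					rounded_end - rounded_start,
					EFI_MEMORY_XP);
	}

	/* and the caller now restricts this to the text region only: */
	return efi_adjust_memory_range_protection(addr, kernel_text_size);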
Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/boot/compressed/Makefile | 2 +- arch/x86/boot/compressed/misc.c | 1 + arch/x86/include/asm/boot.h | 1 + drivers/firmware/efi/libstub/x86-stub.c | 11 ++++++++++- 4 files changed, 13 insertions(+), 2 deletions(-)
--- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -84,7 +84,7 @@ LDFLAGS_vmlinux += -T hostprogs := mkpiggy HOST_EXTRACFLAGS += -I$(srctree)/tools/include
-sed-voffset := -e 's/^([0-9a-fA-F]*) [ABCDGRSTVW] (_text|__bss_start|_end)$$/#define VO_\2 _AC(0x\1,UL)/p' +sed-voffset := -e 's/^([0-9a-fA-F]*) [ABCDGRSTVW] (_text|__start_rodata|__bss_start|_end)$$/#define VO_\2 _AC(0x\1,UL)/p'
quiet_cmd_voffset = VOFFSET $@ cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@ --- a/arch/x86/boot/compressed/misc.c +++ b/arch/x86/boot/compressed/misc.c @@ -330,6 +330,7 @@ static size_t parse_elf(void *output) return ehdr.e_entry - LOAD_PHYSICAL_ADDR; }
+const unsigned long kernel_text_size = VO___start_rodata - VO__text; const unsigned long kernel_total_size = VO__end - VO__text;
static u8 boot_heap[BOOT_HEAP_SIZE] __aligned(4); --- a/arch/x86/include/asm/boot.h +++ b/arch/x86/include/asm/boot.h @@ -81,6 +81,7 @@
#ifndef __ASSEMBLY__ extern unsigned int output_len; +extern const unsigned long kernel_text_size; extern const unsigned long kernel_total_size;
unsigned long decompress_kernel(unsigned char *outbuf, unsigned long virt_addr, --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -238,6 +238,15 @@ efi_status_t efi_adjust_memory_range_pro rounded_end = roundup(start + size, EFI_PAGE_SIZE);
if (memattr != NULL) { + status = efi_call_proto(memattr, set_memory_attributes, + rounded_start, + rounded_end - rounded_start, + EFI_MEMORY_RO); + if (status != EFI_SUCCESS) { + efi_warn("Failed to set EFI_MEMORY_RO attribute\n"); + return status; + } + status = efi_call_proto(memattr, clear_memory_attributes, rounded_start, rounded_end - rounded_start, @@ -816,7 +825,7 @@ static efi_status_t efi_decompress_kerne
*kernel_entry = addr + entry;
- return efi_adjust_memory_range_protection(addr, kernel_total_size); + return efi_adjust_memory_range_protection(addr, kernel_text_size); }
static void __noreturn enter_kernel(unsigned long kernel_addr,
Hello,
On Mon, 8 Apr 2024 14:54:59 +0200 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.6.26 release. There are 252 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 10 Apr 2024 12:52:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.26-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
This rc kernel passes DAMON functionality test[1] on my test machine. Attaching the test results summary below. Please note that I retrieved the kernel from linux-stable-rc tree[2].
Tested-by: SeongJae Park sj@kernel.org
[1] https://github.com/awslabs/damon-tests/tree/next/corr [2] ec59b99017e9 ("Linux 6.6.26-rc1")
Thanks, SJ
[...]
---
ok 1 selftests: damon: debugfs_attrs.sh
ok 2 selftests: damon: debugfs_schemes.sh
ok 3 selftests: damon: debugfs_target_ids.sh
ok 4 selftests: damon: debugfs_empty_targets.sh
ok 5 selftests: damon: debugfs_huge_count_read_write.sh
ok 6 selftests: damon: debugfs_duplicate_context_creation.sh
ok 7 selftests: damon: debugfs_rm_non_contexts.sh
ok 8 selftests: damon: sysfs.sh
ok 9 selftests: damon: sysfs_update_removed_scheme_dir.sh
ok 10 selftests: damon: reclaim.sh
ok 11 selftests: damon: lru_sort.sh
ok 1 selftests: damon-tests: kunit.sh
ok 2 selftests: damon-tests: huge_count_read_write.sh
ok 3 selftests: damon-tests: buffer_overflow.sh
ok 4 selftests: damon-tests: rm_contexts.sh
ok 5 selftests: damon-tests: record_null_deref.sh
ok 6 selftests: damon-tests: dbgfs_target_ids_read_before_terminate_race.sh
ok 7 selftests: damon-tests: dbgfs_target_ids_pid_leak.sh
ok 8 selftests: damon-tests: damo_tests.sh
ok 9 selftests: damon-tests: masim-record.sh
ok 10 selftests: damon-tests: build_i386.sh
ok 11 selftests: damon-tests: build_arm64.sh
ok 12 selftests: damon-tests: build_m68k.sh
ok 13 selftests: damon-tests: build_i386_idle_flag.sh
ok 14 selftests: damon-tests: build_i386_highpte.sh
ok 15 selftests: damon-tests: build_nomemcg.sh
PASS
On Mon, 8 Apr 2024 at 18:30, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.6.26 release. There are 252 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 10 Apr 2024 12:52:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.26-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
The s390 defconfig build failed with gcc-13 and clang-17 due to the following build warnings/errors on Linux stable-rc linux-6.6.y.
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Build error:
--------
arch/s390/kernel/perf_pai_crypto.c: In function 'paicrypt_stop':
arch/s390/kernel/perf_pai_crypto.c:280:51: error: 'paicrypt_root' undeclared (first use in this function); did you mean 'paicrypt_stop'?
  280 |         struct paicrypt_mapptr *mp = this_cpu_ptr(paicrypt_root.mapptr);
      |                                                   ^~~~~~~~~~~~~
Commit detail, s390/pai: fix sampling event removal for PMU device driver [ Upstream commit e9f3af02f63909f41b43c28330434cc437639c5c ]
Steps to reproduce: # tuxmake --runtime podman --target-arch s390 --toolchain gcc-13 --kconfig defconfig
Links: - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.6.y/build/v6.6.25... - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.6.y/build/v6.6.25... - https://storage.tuxsuite.com/public/linaro/lkft/builds/2eozoS8GQGxb94EUWNTPu...
-- Linaro LKFT https://lkft.linaro.org
On Mon, Apr 08, 2024 at 10:45:44PM +0530, Naresh Kamboju wrote:
On Mon, 8 Apr 2024 at 18:30, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.6.26 release. There are 252 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 10 Apr 2024 12:52:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.26-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
The s390 defconfig build failed with gcc-13 and clang-17 due to the following build warnings/errors on Linux stable-rc linux-6.6.y.
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Build error:
arch/s390/kernel/perf_pai_crypto.c: In function 'paicrypt_stop': arch/s390/kernel/perf_pai_crypto.c:280:51: error: 'paicrypt_root' undeclared (first use in this function); did you mean 'paicrypt_stop'? 280 | struct paicrypt_mapptr *mp = this_cpu_ptr(paicrypt_root.mapptr); | ^~~~~~~~~~~~~
Commit detail, s390/pai: fix sampling event removal for PMU device driver [ Upstream commit e9f3af02f63909f41b43c28330434cc437639c5c ]
Steps to reproduce: # tuxmake --runtime podman --target-arch s390 --toolchain gcc-13 --kconfig defconfig
Links:
Thanks, I'll go drop a bunch of s390 patches from this tree and push out a -rc2 later tonight.
greg k-h
On Mon, Apr 08, 2024 at 10:45:44PM +0530, Naresh Kamboju wrote:
On Mon, 8 Apr 2024 at 18:30, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.6.26 release. There are 252 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 10 Apr 2024 12:52:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.26-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
The s390 defconfig build failed with gcc-13 and clang-17 due to the following build warnings/errors on Linux stable-rc linux-6.6.y.
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Build error:
arch/s390/kernel/perf_pai_crypto.c: In function 'paicrypt_stop': arch/s390/kernel/perf_pai_crypto.c:280:51: error: 'paicrypt_root' undeclared (first use in this function); did you mean 'paicrypt_stop'? 280 | struct paicrypt_mapptr *mp = this_cpu_ptr(paicrypt_root.mapptr); | ^~~~~~~~~~~~~
Commit detail, s390/pai: fix sampling event removal for PMU device driver [ Upstream commit e9f3af02f63909f41b43c28330434cc437639c5c ]
I'll drop the pai patches from this queue as well, thanks!
greg k-h
On Mon, Apr 08, 2024 at 02:54:59PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.26 release. There are 252 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 10 Apr 2024 12:52:23 +0000. Anything received after that time might be too late.
No regressions found on WSL (x86 and arm64).
Built, booted, and reviewed dmesg.
Thank you. :)
Tested-by: Kelsey Steele kelseysteele@linux.microsoft.com
On 4/8/24 5:54 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.26 release. There are 252 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 10 Apr 2024 12:52:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.26-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
On Mon, 08 Apr 2024 14:54:59 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.26 release. There are 252 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 10 Apr 2024 12:52:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.26-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v6.6:
  10 builds: 10 pass, 0 fail
  26 boots: 26 pass, 0 fail
  116 tests: 116 pass, 0 fail
Linux version: 6.6.26-rc1-gec59b99017e9 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
Hi Greg
On Mon, Apr 8, 2024 at 10:00 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.6.26 release. There are 252 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 10 Apr 2024 12:52:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.26-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
6.6.26-rc1 tested.
Build successfully completed. Boot successfully completed. No dmesg regressions. Video output normal. Sound output normal.
Lenovo ThinkPad X1 Carbon Gen10(Intel i7-1260P(x86_64) arch linux)
[ 0.000000] Linux version 6.6.26-rc1rv (takeshi@ThinkPadX1Gen10J0764) (gcc (GCC) 13.2.1 20230801, GNU ld (GNU Binutils) 2.42.0) #1 SMP PREEMPT_DYNAMIC Tue Apr 9 18:54:11 JST 2024
Thanks
Tested-by: Takeshi Ogasawara takeshi.ogasawara@futuring-girl.com
On Mon, Apr 08, 2024 at 02:54:59PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.26 release. There are 252 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Tested-by: Conor Dooley conor.dooley@microchip.com
Thanks, Conor.
On Mon, Apr 08, 2024 at 02:54:59PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.26 release. There are 252 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Tested-by: Mark Brown broonie@kernel.org
On Mon, 8 Apr 2024 at 15:00, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.6.26 release. There are 252 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 10 Apr 2024 12:52:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.26-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro's test farm. Regressions on x86_64, and i386.
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
The following kernel warnings have been noticed on x86_64, qemu-x86_64 and qemu-i386 while running the LTP CVE ioctl_sg01 tests on the stable-rc 6.6.26-rc1 and 6.8.5-rc1 kernels.
After reverting the following patch, the reported warning is no longer seen: scsi: sg: Avoid sg device teardown race [ Upstream commit 27f58c04a8f438078583041468ec60597841284d ]
This has been reported on stable-rc 6.6.24-rc1 [1].
------------[ cut here ]------------
[ 839.268407] WARNING: CPU: 0 PID: 92507 at drivers/scsi/sg.c:2237 sg_remove_sfp_usercontext+0x145/0x150
[ 839.277715] Modules linked in: algif_hash x86_pkg_temp_thermal
[ 839.284952] CPU: 0 PID: 92507 Comm: kworker/0:0 Not tainted 6.6.26-rc1 #1
[ 839.293108] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.7 12/07/2021
[ 839.300514] Workqueue: events sg_remove_sfp_usercontext
[ 839.307122] RIP: 0010:sg_remove_sfp_usercontext+0x145/0x150
<trim>
[ 839.415941] Call Trace:
[ 839.419788] <TASK>
[ 839.421924] ? show_regs+0x69/0x80
[ 839.425337] ? __warn+0x8d/0x150
[ 839.429950] ? sg_remove_sfp_usercontext+0x145/0x150
[ 839.434923] ? report_bug+0x171/0x1a0
[ 839.439968] ? handle_bug+0x42/0x80
[ 839.443466] ? exc_invalid_op+0x1c/0x70
[ 839.448688] ? asm_exc_invalid_op+0x1f/0x30
[ 839.452878] ? call_rcu+0x12/0x20
[ 839.457579] ? sg_remove_sfp_usercontext+0x145/0x150
[ 839.462551] process_one_work+0x13e/0x300
[ 839.467946] worker_thread+0x2f6/0x430
[ 839.471704] ? _raw_spin_unlock_irqrestore+0x22/0x50
[ 839.478049] ? __pfx_worker_thread+0x10/0x10
[ 839.482326] kthread+0x102/0x140
[ 839.486942] ? __pfx_kthread+0x10/0x10
[ 839.490699] ret_from_fork+0x3e/0x60
[ 839.495658] ? __pfx_kthread+0x10/0x10
[ 839.499419] ret_from_fork_asm+0x1b/0x30
[ 839.504727] </TASK>
[ 839.506923] ---[ end trace 0000000000000000 ]---
[1] - https://lore.kernel.org/stable/CA+G9fYs5MZaPV+tTukfUbJtdztQMExfixo=ZwbBr1A6O...
[2] - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.6.y/build/v6.6.25...
## Build
* kernel: 6.6.26-rc1
* git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
* git branch: linux-6.6.y
* git commit: ec59b99017e96d3bc8a6d5b87c01f47ec5a9bcb9
* git describe: v6.6.25-253-gec59b99017e9
* test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.6.y/build/v6.6.25...

## Test result summary
total: 165396, pass: 142577, fail: 2173, skip: 20493, xfail: 153

## Build Summary
* arc: 5 total, 5 passed, 0 failed
* arm: 131 total, 131 passed, 0 failed
* arm64: 39 total, 39 passed, 0 failed
* i386: 31 total, 31 passed, 0 failed
* mips: 24 total, 24 passed, 0 failed
* parisc: 3 total, 3 passed, 0 failed
* powerpc: 34 total, 34 passed, 0 failed
* riscv: 16 total, 16 passed, 0 failed
* s390: 12 total, 8 passed, 4 failed
* sh: 10 total, 10 passed, 0 failed
* sparc: 6 total, 6 passed, 0 failed
* x86_64: 35 total, 35 passed, 0 failed
## Test suites summary * boot * kselftest-android * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers-dma-buf * kselftest-efivarfs * kselftest-exec * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-filesystems-epoll * kselftest-firmware * kselftest-fpu * kselftest-ftrace * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mm * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-net-forwarding * kselftest-net-mptcp * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-user_events * kselftest-vDSO * kselftest-watchdog * kselftest-x86 * kselftest-zram * kunit * libgpiod * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-hugetlb * ltp-io * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-pty * ltp-sched * ltp-securebits * ltp-smoke * ltp-smoketest * ltp-syscalls * ltp-tracing * perf * rcutorture
-- Linaro LKFT https://lkft.linaro.org
Hi Greg,
On 08/04/24 18:24, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.26 release. There are 252 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 10 Apr 2024 12:52:23 +0000. Anything received after that time might be too late.
No problems seen on x86_64 and aarch64 with our testing.
Tested-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com
Thanks, Harshit
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.26-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
On 4/8/24 06:54, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.26 release. There are 252 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 10 Apr 2024 12:52:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.26-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
linux-stable-mirror@lists.linaro.org