This is the start of the stable review cycle for the 5.15.151 release. There are 84 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 06 Mar 2024 21:15:26 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.151-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.15.151-rc1
Davide Caratti dcaratti@redhat.com mptcp: fix double-free on socket dismantle
Gal Pressman gal@nvidia.com Revert "tls: rx: move counting TlsDecryptErrors for sync"
Jakub Kicinski kuba@kernel.org net: tls: fix async vs NIC crypto offload
Martynas Pumputis m@lambda.lt bpf: Derive source IP addr via bpf_*_fib_lookup()
Louis DeLosSantos louis.delos.devel@gmail.com bpf: Add table ID to bpf_fib_lookup BPF helper
Martin KaFai Lau martin.lau@kernel.org bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "interconnect: Teach lockdep about icc_bw_lock order"
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "interconnect: Fix locking for runpm vs reclaim"
Bartosz Golaszewski bartosz.golaszewski@linaro.org gpio: fix resource unwinding order in error path
Andy Shevchenko andriy.shevchenko@linux.intel.com gpiolib: Fix the error path order in gpiochip_add_data_with_key()
Arturas Moskvinas arturas.moskvinas@gmail.com gpio: 74x164: Enable output pins after registers are reset
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Drop oob_skb ref before purging queue in GC.
Max Krummenacher max.krummenacher@toradex.com Revert "drm/bridge: lt8912b: Register and attach our DSI device at probe"
Oscar Salvador osalvador@suse.de fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super
Baokun Li libaokun1@huawei.com cachefiles: fix memory leak in cachefiles_add_cache()
Paolo Abeni pabeni@redhat.com mptcp: fix possible deadlock in subflow diag
Paolo Abeni pabeni@redhat.com mptcp: push at DSS boundaries
Geliang Tang tanggeliang@kylinos.cn mptcp: add needs_id for netlink appending addr
Jean Sacren sakiwit@gmail.com mptcp: clean up harmless false expressions
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: add missing kconfig for NF Filter in v6
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: add missing kconfig for NF Filter
Paolo Abeni pabeni@redhat.com mptcp: rename timer related helper to less confusing names
Paolo Abeni pabeni@redhat.com mptcp: process pending subflow error on close
Paolo Abeni pabeni@redhat.com mptcp: move __mptcp_error_report in protocol.c
Paolo Bonzini pbonzini@redhat.com x86/cpu/intel: Detect TME keyid bits before setting MTRR mask registers
Bjorn Andersson quic_bjorande@quicinc.com pmdomain: qcom: rpmhpd: Fix enabled_corner aggregation
Zong Li zong.li@sifive.com riscv: add CALLER_ADDRx support
Elad Nachman enachman@marvell.com mmc: sdhci-xenon: fix PHY init clock stability
Elad Nachman enachman@marvell.com mmc: sdhci-xenon: add timeout for PHY init complete
Ivan Semenov ivan@semenov.dev mmc: core: Fix eMMC initialization with 1-bit bus connection
Curtis Klein curtis.klein@hpe.com dmaengine: fsl-qdma: init irq after reg initialization
Tadeusz Struk tstruk@gigaio.com dmaengine: ptdma: use consistent DMA masks
Peng Ma peng.ma@nxp.com dmaengine: fsl-qdma: fix SoC may hang on 16 byte unaligned read
David Sterba dsterba@suse.com btrfs: dev-replace: properly validate device names
Johannes Berg johannes.berg@intel.com wifi: nl80211: reject iftype change with mesh ID change
Alexander Ofitserov oficerovas@altlinux.org gtp: fix use-after-free and null-ptr-deref in gtp_newlink()
Takashi Sakamoto o-takashi@sakamocchi.jp ALSA: firewire-lib: fix to check cycle continuity
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp tomoyo: fix UAF write bug in tomoyo_write_control()
Dimitris Vlachos dvlachos@ics.forth.gr riscv: Sparse-Memory/vmemmap out-of-bounds fix
David Howells dhowells@redhat.com afs: Fix endless loop in directory parsing
Jiri Slaby (SUSE) jirislaby@kernel.org fbcon: always restore the old font data in fbcon_do_set_font()
Takashi Iwai tiwai@suse.de ALSA: Drop leftover snd-rtctimer stuff from Makefile
Hans de Goede hdegoede@redhat.com power: supply: bq27xxx-i2c: Do not free non existing IRQ
Arnd Bergmann arnd@arndb.de efi/capsule-loader: fix incorrect allocation size
Sabrina Dubroca sd@queasysnail.net tls: decrement decrypt_pending if no async completion will be called
Jakub Kicinski kuba@kernel.org tls: rx: use async as an in-out argument
Jakub Kicinski kuba@kernel.org tls: rx: assume crypto always calls our callback
Jakub Kicinski kuba@kernel.org tls: rx: move counting TlsDecryptErrors for sync
Jakub Kicinski kuba@kernel.org tls: rx: don't track the async count
Jakub Kicinski kuba@kernel.org tls: rx: factor out writing ContentType to cmsg
Jakub Kicinski kuba@kernel.org tls: rx: wrap decryption arguments in a structure
Jakub Kicinski kuba@kernel.org tls: rx: don't report text length from the bowels of decrypt
Jakub Kicinski kuba@kernel.org tls: rx: drop unnecessary arguments from tls_setup_from_iter()
Jakub Kicinski kuba@kernel.org tls: hw: rx: use return value of tls_device_decrypted() to carry status
Jakub Kicinski kuba@kernel.org tls: rx: refactor decrypt_skb_update()
Jakub Kicinski kuba@kernel.org tls: rx: don't issue wake ups when data is decrypted
Jakub Kicinski kuba@kernel.org tls: rx: don't store the decryption status in socket context
Jakub Kicinski kuba@kernel.org tls: rx: don't store the record type in socket context
Oleksij Rempel linux@rempel-privat.de igb: extend PTP timestamp adjustments to i211
Lin Ma linma@zju.edu.cn rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
Florian Westphal fw@strlen.de netfilter: bridge: confirm multicast packets before passing them up the stack
Florian Westphal fw@strlen.de netfilter: let reset rules clean out conntrack entries
Florian Westphal fw@strlen.de netfilter: make function op structures const
Florian Westphal fw@strlen.de netfilter: core: move ip_ct_attach indirection to struct nf_ct_hook
Florian Westphal fw@strlen.de netfilter: nfnetlink_queue: silence bogus compiler warning
Ignat Korchagin ignat@cloudflare.com netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
Kai-Heng Feng kai.heng.feng@canonical.com Bluetooth: Enforce validation on max value of connection interval
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
Zijun Hu quic_zijuhu@quicinc.com Bluetooth: hci_event: Fix wrongly recorded wakeup BD_ADDR
Ying Hsu yinghsu@chromium.org Bluetooth: Avoid potential use-after-free in hci_error_reset
Jakub Raczynski j.raczynski@samsung.com stmmac: Clear variable when destroying workqueue
Justin Iurman justin.iurman@uliege.be uapi: in6: replace temporary label with rfc9486
Javier Carrasco javier.carrasco.cruz@gmail.com net: usb: dm9601: fix wrong return value in dm9601_mdio_read
Jakub Kicinski kuba@kernel.org veth: try harder when allocating queue memory
Vasily Averin vvs@openvz.org net: enable memcg accounting for veth queues
Oleksij Rempel linux@rempel-privat.de lan78xx: enable auto speed configuration for LAN7850 if no EEPROM is detected
Eric Dumazet edumazet@google.com ipv6: fix potential "struct net" leak in inet6_rtm_getaddr()
Jakub Kicinski kuba@kernel.org net: veth: clear GRO when clearing XDP even when down
Doug Smythies dsmythies@telus.net cpufreq: intel_pstate: fix pstate limits enforcement for adjust_perf call back
Yunjian Wang wangyunjian@huawei.com tun: Fix xdp_rxq_info's queue_index when detaching
Florian Westphal fw@strlen.de net: ip_tunnel: prevent perpetual headroom growth
Ryosuke Yasuoka ryasuoka@redhat.com netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter
Han Xu han.xu@nxp.com mtd: spinand: gigadevice: Fix the get ecc status issue
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: disallow timeout for anonymous sets
-------------
Diffstat:
Makefile | 4 +- arch/riscv/include/asm/ftrace.h | 5 + arch/riscv/include/asm/pgtable.h | 2 +- arch/riscv/kernel/Makefile | 2 + arch/riscv/kernel/return_address.c | 48 ++++ arch/x86/kernel/cpu/intel.c | 178 ++++++------ drivers/cpufreq/intel_pstate.c | 3 + drivers/dma/fsl-qdma.c | 25 +- drivers/dma/ptdma/ptdma-dmaengine.c | 2 - drivers/firmware/efi/capsule-loader.c | 2 +- drivers/gpio/gpio-74x164.c | 4 +- drivers/gpio/gpiolib.c | 12 +- drivers/gpu/drm/bridge/lontium-lt8912b.c | 11 +- drivers/interconnect/core.c | 18 +- drivers/mmc/core/mmc.c | 2 + drivers/mmc/host/sdhci-xenon-phy.c | 48 +++- drivers/mtd/nand/spi/gigadevice.c | 6 +- drivers/net/ethernet/intel/igb/igb_ptp.c | 5 +- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 +- drivers/net/gtp.c | 12 +- drivers/net/tun.c | 1 + drivers/net/usb/dm9601.c | 2 +- drivers/net/usb/lan78xx.c | 3 +- drivers/net/veth.c | 40 +-- drivers/power/supply/bq27xxx_battery_i2c.c | 4 +- drivers/soc/qcom/rpmhpd.c | 7 +- drivers/video/fbdev/core/fbcon.c | 8 +- fs/afs/dir.c | 4 +- fs/btrfs/dev-replace.c | 24 +- fs/cachefiles/bind.c | 3 + fs/hugetlbfs/inode.c | 6 +- include/linux/netfilter.h | 14 +- include/net/ipv6_stubs.h | 5 + include/net/netfilter/nf_conntrack.h | 8 + include/net/strparser.h | 4 + include/net/tls.h | 11 +- include/uapi/linux/bpf.h | 37 ++- include/uapi/linux/in6.h | 2 +- net/bluetooth/hci_core.c | 7 +- net/bluetooth/hci_event.c | 13 +- net/bluetooth/l2cap_core.c | 8 +- net/bridge/br_netfilter_hooks.c | 96 +++++++ net/bridge/netfilter/nf_conntrack_bridge.c | 30 ++ net/core/filter.c | 67 ++++- net/core/rtnetlink.c | 11 +- net/ipv4/ip_tunnel.c | 28 +- net/ipv4/netfilter/nf_reject_ipv4.c | 1 + net/ipv6/addrconf.c | 7 +- net/ipv6/af_inet6.c | 1 + net/ipv6/netfilter/nf_reject_ipv6.c | 1 + net/mptcp/diag.c | 3 + net/mptcp/pm_netlink.c | 30 +- net/mptcp/protocol.c | 123 +++++++-- net/mptcp/subflow.c | 36 --- net/netfilter/core.c | 45 +-- net/netfilter/nf_conntrack_core.c | 21 +- net/netfilter/nf_conntrack_netlink.c | 4 +- net/netfilter/nf_conntrack_proto_tcp.c | 35 +++ net/netfilter/nf_nat_core.c | 2 +- net/netfilter/nf_tables_api.c | 7 + net/netfilter/nfnetlink_queue.c | 10 +- net/netfilter/nft_compat.c | 20 ++ net/netlink/af_netlink.c | 2 +- net/tls/tls_device.c | 6 +- net/tls/tls_sw.c | 316 ++++++++++------------ net/unix/garbage.c | 22 +- net/wireless/nl80211.c | 2 + security/tomoyo/common.c | 3 +- sound/core/Makefile | 1 - sound/firewire/amdtp-stream.c | 2 +- tools/include/uapi/linux/bpf.h | 37 ++- tools/testing/selftests/net/mptcp/config | 2 + 72 files changed, 1046 insertions(+), 529 deletions(-)
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit e26d3009efda338f19016df4175f354a9bd0a4ab upstream.
Never used from userspace, disallow these parameters.
Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -4682,6 +4682,9 @@ static int nf_tables_newset(struct sk_bu if (!(flags & NFT_SET_TIMEOUT)) return -EINVAL;
+ if (flags & NFT_SET_ANONYMOUS) + return -EOPNOTSUPP; + err = nf_msecs_to_jiffies64(nla[NFTA_SET_TIMEOUT], &desc.timeout); if (err) return err; @@ -4690,6 +4693,10 @@ static int nf_tables_newset(struct sk_bu if (nla[NFTA_SET_GC_INTERVAL] != NULL) { if (!(flags & NFT_SET_TIMEOUT)) return -EINVAL; + + if (flags & NFT_SET_ANONYMOUS) + return -EOPNOTSUPP; + desc.gc_int = ntohl(nla_get_be32(nla[NFTA_SET_GC_INTERVAL])); }
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Han Xu han.xu@nxp.com
[ Upstream commit 59950610c0c00c7a06d8a75d2ee5d73dba4274cf ]
Some GigaDevice ecc_get_status functions use on-stack buffer for spi_mem_op causes spi_mem_check_op failing, fix the issue by using spinand scratchbuf.
Fixes: c40c7a990a46 ("mtd: spinand: Add support for GigaDevice GD5F1GQ4UExxG") Signed-off-by: Han Xu han.xu@nxp.com Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/20231108150701.593912-1-han.xu@nxp.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mtd/nand/spi/gigadevice.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/nand/spi/gigadevice.c b/drivers/mtd/nand/spi/gigadevice.c index da77ab20296ea..56d1b56615f97 100644 --- a/drivers/mtd/nand/spi/gigadevice.c +++ b/drivers/mtd/nand/spi/gigadevice.c @@ -178,7 +178,7 @@ static int gd5fxgq4uexxg_ecc_get_status(struct spinand_device *spinand, { u8 status2; struct spi_mem_op op = SPINAND_GET_FEATURE_OP(GD5FXGQXXEXXG_REG_STATUS2, - &status2); + spinand->scratchbuf); int ret;
switch (status & STATUS_ECC_MASK) { @@ -199,6 +199,7 @@ static int gd5fxgq4uexxg_ecc_get_status(struct spinand_device *spinand, * report the maximum of 4 in this case */ /* bits sorted this way (3...0): ECCS1,ECCS0,ECCSE1,ECCSE0 */ + status2 = *(spinand->scratchbuf); return ((status & STATUS_ECC_MASK) >> 2) | ((status2 & STATUS_ECC_MASK) >> 4);
@@ -220,7 +221,7 @@ static int gd5fxgq5xexxg_ecc_get_status(struct spinand_device *spinand, { u8 status2; struct spi_mem_op op = SPINAND_GET_FEATURE_OP(GD5FXGQXXEXXG_REG_STATUS2, - &status2); + spinand->scratchbuf); int ret;
switch (status & STATUS_ECC_MASK) { @@ -240,6 +241,7 @@ static int gd5fxgq5xexxg_ecc_get_status(struct spinand_device *spinand, * 1 ... 4 bits are flipped (and corrected) */ /* bits sorted this way (1...0): ECCSE1, ECCSE0 */ + status2 = *(spinand->scratchbuf); return ((status2 & STATUS_ECC_MASK) >> 4) + 1;
case STATUS_ECC_UNCOR_ERROR:
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryosuke Yasuoka ryasuoka@redhat.com
[ Upstream commit 661779e1fcafe1b74b3f3fe8e980c1e207fea1fd ]
syzbot reported the following uninit-value access issue [1]:
netlink_to_full_skb() creates a new `skb` and puts the `skb->data` passed as a 1st arg of netlink_to_full_skb() onto new `skb`. The data size is specified as `len` and passed to skb_put_data(). This `len` is based on `skb->end` that is not data offset but buffer offset. The `skb->end` contains data and tailroom. Since the tailroom is not initialized when the new `skb` created, KMSAN detects uninitialized memory area when copying the data.
This patch resolved this issue by correct the len from `skb->end` to `skb->len`, which is the actual data offset.
BUG: KMSAN: kernel-infoleak-after-free in instrument_copy_to_user include/linux/instrumented.h:114 [inline] BUG: KMSAN: kernel-infoleak-after-free in copy_to_user_iter lib/iov_iter.c:24 [inline] BUG: KMSAN: kernel-infoleak-after-free in iterate_ubuf include/linux/iov_iter.h:29 [inline] BUG: KMSAN: kernel-infoleak-after-free in iterate_and_advance2 include/linux/iov_iter.h:245 [inline] BUG: KMSAN: kernel-infoleak-after-free in iterate_and_advance include/linux/iov_iter.h:271 [inline] BUG: KMSAN: kernel-infoleak-after-free in _copy_to_iter+0x364/0x2520 lib/iov_iter.c:186 instrument_copy_to_user include/linux/instrumented.h:114 [inline] copy_to_user_iter lib/iov_iter.c:24 [inline] iterate_ubuf include/linux/iov_iter.h:29 [inline] iterate_and_advance2 include/linux/iov_iter.h:245 [inline] iterate_and_advance include/linux/iov_iter.h:271 [inline] _copy_to_iter+0x364/0x2520 lib/iov_iter.c:186 copy_to_iter include/linux/uio.h:197 [inline] simple_copy_to_iter+0x68/0xa0 net/core/datagram.c:532 __skb_datagram_iter+0x123/0xdc0 net/core/datagram.c:420 skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:546 skb_copy_datagram_msg include/linux/skbuff.h:3960 [inline] packet_recvmsg+0xd9c/0x2000 net/packet/af_packet.c:3482 sock_recvmsg_nosec net/socket.c:1044 [inline] sock_recvmsg net/socket.c:1066 [inline] sock_read_iter+0x467/0x580 net/socket.c:1136 call_read_iter include/linux/fs.h:2014 [inline] new_sync_read fs/read_write.c:389 [inline] vfs_read+0x8f6/0xe00 fs/read_write.c:470 ksys_read+0x20f/0x4c0 fs/read_write.c:613 __do_sys_read fs/read_write.c:623 [inline] __se_sys_read fs/read_write.c:621 [inline] __x64_sys_read+0x93/0xd0 fs/read_write.c:621 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b
Uninit was stored to memory at: skb_put_data include/linux/skbuff.h:2622 [inline] netlink_to_full_skb net/netlink/af_netlink.c:181 [inline] __netlink_deliver_tap_skb net/netlink/af_netlink.c:298 [inline] __netlink_deliver_tap+0x5be/0xc90 net/netlink/af_netlink.c:325 netlink_deliver_tap net/netlink/af_netlink.c:338 [inline] netlink_deliver_tap_kernel net/netlink/af_netlink.c:347 [inline] netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x10f1/0x1250 net/netlink/af_netlink.c:1368 netlink_sendmsg+0x1238/0x13d0 net/netlink/af_netlink.c:1910 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2584 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638 __sys_sendmsg net/socket.c:2667 [inline] __do_sys_sendmsg net/socket.c:2676 [inline] __se_sys_sendmsg net/socket.c:2674 [inline] __x64_sys_sendmsg+0x307/0x490 net/socket.c:2674 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b
Uninit was created at: free_pages_prepare mm/page_alloc.c:1087 [inline] free_unref_page_prepare+0xb0/0xa40 mm/page_alloc.c:2347 free_unref_page_list+0xeb/0x1100 mm/page_alloc.c:2533 release_pages+0x23d3/0x2410 mm/swap.c:1042 free_pages_and_swap_cache+0xd9/0xf0 mm/swap_state.c:316 tlb_batch_pages_flush mm/mmu_gather.c:98 [inline] tlb_flush_mmu_free mm/mmu_gather.c:293 [inline] tlb_flush_mmu+0x6f5/0x980 mm/mmu_gather.c:300 tlb_finish_mmu+0x101/0x260 mm/mmu_gather.c:392 exit_mmap+0x49e/0xd30 mm/mmap.c:3321 __mmput+0x13f/0x530 kernel/fork.c:1349 mmput+0x8a/0xa0 kernel/fork.c:1371 exit_mm+0x1b8/0x360 kernel/exit.c:567 do_exit+0xd57/0x4080 kernel/exit.c:858 do_group_exit+0x2fd/0x390 kernel/exit.c:1021 __do_sys_exit_group kernel/exit.c:1032 [inline] __se_sys_exit_group kernel/exit.c:1030 [inline] __x64_sys_exit_group+0x3c/0x50 kernel/exit.c:1030 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b
Bytes 3852-3903 of 3904 are uninitialized Memory access of size 3904 starts at ffff88812ea1e000 Data copied to user address 0000000020003280
CPU: 1 PID: 5043 Comm: syz-executor297 Not tainted 6.7.0-rc5-syzkaller-00047-g5bd7ef53ffe5 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
Fixes: 1853c9496460 ("netlink, mmap: transform mmap skb into full skb on taps") Reported-and-tested-by: syzbot+34ad5fab48f7bf510349@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=34ad5fab48f7bf510349 [1] Signed-off-by: Ryosuke Yasuoka ryasuoka@redhat.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20240221074053.1794118-1-ryasuoka@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netlink/af_netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 2169a9c3da1c3..82df02695bbdd 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -165,7 +165,7 @@ static inline u32 netlink_group_mask(u32 group) static struct sk_buff *netlink_to_full_skb(const struct sk_buff *skb, gfp_t gfp_mask) { - unsigned int len = skb_end_offset(skb); + unsigned int len = skb->len; struct sk_buff *new;
new = alloc_skb(len, gfp_mask);
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 5ae1e9922bbdbaeb9cfbe91085ab75927488ac0f ]
syzkaller triggered following kasan splat: BUG: KASAN: use-after-free in __skb_flow_dissect+0x19d1/0x7a50 net/core/flow_dissector.c:1170 Read of size 1 at addr ffff88812fb4000e by task syz-executor183/5191 [..] kasan_report+0xda/0x110 mm/kasan/report.c:588 __skb_flow_dissect+0x19d1/0x7a50 net/core/flow_dissector.c:1170 skb_flow_dissect_flow_keys include/linux/skbuff.h:1514 [inline] ___skb_get_hash net/core/flow_dissector.c:1791 [inline] __skb_get_hash+0xc7/0x540 net/core/flow_dissector.c:1856 skb_get_hash include/linux/skbuff.h:1556 [inline] ip_tunnel_xmit+0x1855/0x33c0 net/ipv4/ip_tunnel.c:748 ipip_tunnel_xmit+0x3cc/0x4e0 net/ipv4/ipip.c:308 __netdev_start_xmit include/linux/netdevice.h:4940 [inline] netdev_start_xmit include/linux/netdevice.h:4954 [inline] xmit_one net/core/dev.c:3548 [inline] dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3564 __dev_queue_xmit+0x7c1/0x3d60 net/core/dev.c:4349 dev_queue_xmit include/linux/netdevice.h:3134 [inline] neigh_connected_output+0x42c/0x5d0 net/core/neighbour.c:1592 ... ip_finish_output2+0x833/0x2550 net/ipv4/ip_output.c:235 ip_finish_output+0x31/0x310 net/ipv4/ip_output.c:323 .. iptunnel_xmit+0x5b4/0x9b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1dbc/0x33c0 net/ipv4/ip_tunnel.c:831 ipgre_xmit+0x4a1/0x980 net/ipv4/ip_gre.c:665 __netdev_start_xmit include/linux/netdevice.h:4940 [inline] netdev_start_xmit include/linux/netdevice.h:4954 [inline] xmit_one net/core/dev.c:3548 [inline] dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3564 ...
The splat occurs because skb->data points past skb->head allocated area. This is because neigh layer does: __skb_pull(skb, skb_network_offset(skb));
... but skb_network_offset() returns a negative offset and __skb_pull() arg is unsigned. IOW, we skb->data gets "adjusted" by a huge value.
The negative value is returned because skb->head and skb->data distance is more than 64k and skb->network_header (u16) has wrapped around.
The bug is in the ip_tunnel infrastructure, which can cause dev->needed_headroom to increment ad infinitum.
The syzkaller reproducer consists of packets getting routed via a gre tunnel, and route of gre encapsulated packets pointing at another (ipip) tunnel. The ipip encapsulation finds gre0 as next output device.
This results in the following pattern:
1). First packet is to be sent out via gre0. Route lookup found an output device, ipip0.
2). ip_tunnel_xmit for gre0 bumps gre0->needed_headroom based on the future output device, rt.dev->needed_headroom (ipip0).
3). ip output / start_xmit moves skb on to ipip0. which runs the same code path again (xmit recursion).
4). Routing step for the post-gre0-encap packet finds gre0 as output device to use for ipip0 encapsulated packet.
tunl0->needed_headroom is then incremented based on the (already bumped) gre0 device headroom.
This repeats for every future packet:
gre0->needed_headroom gets inflated because previous packets' ipip0 step incremented rt->dev (gre0) headroom, and ipip0 incremented because gre0 needed_headroom was increased.
For each subsequent packet, gre/ipip0->needed_headroom grows until post-expand-head reallocations result in a skb->head/data distance of more than 64k.
Once that happens, skb->network_header (u16) wraps around when pskb_expand_head tries to make sure that skb_network_offset() is unchanged after the headroom expansion/reallocation.
After this skb_network_offset(skb) returns a different (and negative) result post headroom expansion.
The next trip to neigh layer (or anything else that would __skb_pull the network header) makes skb->data point to a memory location outside skb->head area.
v2: Cap the needed_headroom update to an arbitarily chosen upperlimit to prevent perpetual increase instead of dropping the headroom increment completely.
Reported-and-tested-by: syzbot+bfde3bef047a81b8fde6@syzkaller.appspotmail.com Closes: https://groups.google.com/g/syzkaller-bugs/c/fL9G6GtWskY/m/VKk_PR5FBAAJ Fixes: 243aad830e8a ("ip_gre: include route header_len in max_headroom calculation") Signed-off-by: Florian Westphal fw@strlen.de Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240220135606.4939-1-fw@strlen.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/ip_tunnel.c | 28 +++++++++++++++++++++------- 1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index 426dc910aaf87..96b7cd3049a33 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -540,6 +540,20 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb, return 0; }
+static void ip_tunnel_adj_headroom(struct net_device *dev, unsigned int headroom) +{ + /* we must cap headroom to some upperlimit, else pskb_expand_head + * will overflow header offsets in skb_headers_offset_update(). + */ + static const unsigned int max_allowed = 512; + + if (headroom > max_allowed) + headroom = max_allowed; + + if (headroom > READ_ONCE(dev->needed_headroom)) + WRITE_ONCE(dev->needed_headroom, headroom); +} + void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, u8 proto, int tunnel_hlen) { @@ -613,13 +627,13 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, }
headroom += LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len; - if (headroom > READ_ONCE(dev->needed_headroom)) - WRITE_ONCE(dev->needed_headroom, headroom); - - if (skb_cow_head(skb, READ_ONCE(dev->needed_headroom))) { + if (skb_cow_head(skb, headroom)) { ip_rt_put(rt); goto tx_dropped; } + + ip_tunnel_adj_headroom(dev, headroom); + iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, proto, tos, ttl, df, !net_eq(tunnel->net, dev_net(dev))); return; @@ -797,16 +811,16 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
max_headroom = LL_RESERVED_SPACE(rt->dst.dev) + sizeof(struct iphdr) + rt->dst.header_len + ip_encap_hlen(&tunnel->encap); - if (max_headroom > READ_ONCE(dev->needed_headroom)) - WRITE_ONCE(dev->needed_headroom, max_headroom);
- if (skb_cow_head(skb, READ_ONCE(dev->needed_headroom))) { + if (skb_cow_head(skb, max_headroom)) { ip_rt_put(rt); dev->stats.tx_dropped++; kfree_skb(skb); return; }
+ ip_tunnel_adj_headroom(dev, max_headroom); + iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, protocol, tos, ttl, df, !net_eq(tunnel->net, dev_net(dev))); return;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yunjian Wang wangyunjian@huawei.com
[ Upstream commit 2a770cdc4382b457ca3d43d03f0f0064f905a0d0 ]
When a queue(tfile) is detached, we only update tfile's queue_index, but do not update xdp_rxq_info's queue_index. This patch fixes it.
Fixes: 8bf5c4ee1889 ("tun: setup xdp_rxq_info") Signed-off-by: Yunjian Wang wangyunjian@huawei.com Link: https://lore.kernel.org/r/1708398727-46308-1-git-send-email-wangyunjian@huaw... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/tun.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 603530e6cd7b9..42bf0a3ec632e 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -654,6 +654,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean) tun->tfiles[tun->numqueues - 1]); ntfile = rtnl_dereference(tun->tfiles[index]); ntfile->queue_index = index; + ntfile->xdp_rxq.queue_index = index; rcu_assign_pointer(tun->tfiles[tun->numqueues - 1], NULL);
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Doug Smythies dsmythies@telus.net
[ Upstream commit f0a0fc10abb062d122db5ac4ed42f6d1ca342649 ]
There is a loophole in pstate limit clamping for the intel_cpufreq CPU frequency scaling driver (intel_pstate in passive mode), schedutil CPU frequency scaling governor, HWP (HardWare Pstate) control enabled, when the adjust_perf call back path is used.
Fix it.
Fixes: a365ab6b9dfb cpufreq: intel_pstate: Implement the ->adjust_perf() callback Signed-off-by: Doug Smythies dsmythies@telus.net Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cpufreq/intel_pstate.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index dd5f4eee9ffb6..4de71e772f514 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -2787,6 +2787,9 @@ static void intel_cpufreq_adjust_perf(unsigned int cpunum, if (min_pstate < cpu->min_perf_ratio) min_pstate = cpu->min_perf_ratio;
+ if (min_pstate > cpu->max_perf_ratio) + min_pstate = cpu->max_perf_ratio; + max_pstate = min(cap_pstate, cpu->max_perf_ratio); if (max_pstate < min_pstate) max_pstate = min_pstate;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit fe9f801355f0b47668419f30f1fac1cf4539e736 ]
veth sets NETIF_F_GRO automatically when XDP is enabled, because both features use the same NAPI machinery.
The logic to clear NETIF_F_GRO sits in veth_disable_xdp() which is called both on ndo_stop and when XDP is turned off. To avoid the flag from being cleared when the device is brought down, the clearing is skipped when IFF_UP is not set. Bringing the device down should indeed not modify its features.
Unfortunately, this means that clearing is also skipped when XDP is disabled _while_ the device is down. And there's nothing on the open path to bring the device features back into sync. IOW if user enables XDP, disables it and then brings the device up we'll end up with a stray GRO flag set but no NAPI instances.
We don't depend on the GRO flag on the datapath, so the datapath won't crash. We will crash (or hang), however, next time features are sync'ed (either by user via ethtool or peer changing its config). The GRO flag will go away, and veth will try to disable the NAPIs. But the open path never created them since XDP was off, the GRO flag was a stray. If NAPI was initialized before we'll hang in napi_disable(). If it never was we'll crash trying to stop uninitialized hrtimer.
Move the GRO flag updates to the XDP enable / disable paths, instead of mixing them with the ndo_open / ndo_close paths.
Fixes: d3256efd8e8b ("veth: allow enabling NAPI even without XDP") Reported-by: Thomas Gleixner tglx@linutronix.de Reported-by: syzbot+039399a9b96297ddedca@syzkaller.appspotmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Toke Høiland-Jørgensen toke@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/veth.c | 35 +++++++++++++++++------------------ 1 file changed, 17 insertions(+), 18 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 984a153804096..85c3e12f83627 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -1079,14 +1079,6 @@ static int veth_enable_xdp(struct net_device *dev) veth_disable_xdp_range(dev, 0, dev->real_num_rx_queues, true); return err; } - - if (!veth_gro_requested(dev)) { - /* user-space did not require GRO, but adding XDP - * is supposed to get GRO working - */ - dev->features |= NETIF_F_GRO; - netdev_features_change(dev); - } } }
@@ -1106,18 +1098,9 @@ static void veth_disable_xdp(struct net_device *dev) for (i = 0; i < dev->real_num_rx_queues; i++) rcu_assign_pointer(priv->rq[i].xdp_prog, NULL);
- if (!netif_running(dev) || !veth_gro_requested(dev)) { + if (!netif_running(dev) || !veth_gro_requested(dev)) veth_napi_del(dev);
- /* if user-space did not require GRO, since adding XDP - * enabled it, clear it now - */ - if (!veth_gro_requested(dev) && netif_running(dev)) { - dev->features &= ~NETIF_F_GRO; - netdev_features_change(dev); - } - } - veth_disable_xdp_range(dev, 0, dev->real_num_rx_queues, false); }
@@ -1497,6 +1480,14 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog, }
if (!old_prog) { + if (!veth_gro_requested(dev)) { + /* user-space did not require GRO, but adding + * XDP is supposed to get GRO working + */ + dev->features |= NETIF_F_GRO; + netdev_features_change(dev); + } + peer->hw_features &= ~NETIF_F_GSO_SOFTWARE; peer->max_mtu = max_mtu; } @@ -1507,6 +1498,14 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog, if (dev->flags & IFF_UP) veth_disable_xdp(dev);
+ /* if user-space did not require GRO, since adding XDP + * enabled it, clear it now + */ + if (!veth_gro_requested(dev)) { + dev->features &= ~NETIF_F_GRO; + netdev_features_change(dev); + } + if (peer) { peer->hw_features |= NETIF_F_GSO_SOFTWARE; peer->max_mtu = ETH_MAX_MTU;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 10bfd453da64a057bcfd1a49fb6b271c48653cdb ]
It seems that if userspace provides a correct IFA_TARGET_NETNSID value but no IFA_ADDRESS and IFA_LOCAL attributes, inet6_rtm_getaddr() returns -EINVAL with an elevated "struct net" refcount.
Fixes: 6ecf4c37eb3e ("ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Christian Brauner brauner@kernel.org Cc: David Ahern dsahern@kernel.org Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/addrconf.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index c52317184e3e2..968ca078191cd 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -5463,9 +5463,10 @@ static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh, }
addr = extract_addr(tb[IFA_ADDRESS], tb[IFA_LOCAL], &peer); - if (!addr) - return -EINVAL; - + if (!addr) { + err = -EINVAL; + goto errout; + } ifm = nlmsg_data(nlh); if (ifm->ifa_index) dev = dev_get_by_index(tgt_net, ifm->ifa_index);
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oleksij Rempel o.rempel@pengutronix.de
[ Upstream commit 0e67899abfbfdea0c3c0ed3fd263ffc601c5c157 ]
Same as LAN7800, LAN7850 can be used without EEPROM. If EEPROM is not present or not flashed, LAN7850 will fail to sync the speed detected by the PHY with the MAC. In case link speed is 100Mbit, it will accidentally work, otherwise no data can be transferred.
Better way would be to implement link_up callback, or set auto speed configuration unconditionally. But this changes would be more intrusive. So, for now, set it only if no EEPROM is found.
Fixes: e69647a19c87 ("lan78xx: Set ASD in MAC_CR when EEE is enabled.") Signed-off-by: Oleksij Rempel o.rempel@pengutronix.de Link: https://lore.kernel.org/r/20240222123839.2816561-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/lan78xx.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c index 5700c9d20a3e2..c8b42892655a1 100644 --- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -2862,7 +2862,8 @@ static int lan78xx_reset(struct lan78xx_net *dev) if (dev->chipid == ID_REV_CHIP_ID_7801_) buf &= ~MAC_CR_GMII_EN_;
- if (dev->chipid == ID_REV_CHIP_ID_7800_) { + if (dev->chipid == ID_REV_CHIP_ID_7800_ || + dev->chipid == ID_REV_CHIP_ID_7850_) { ret = lan78xx_read_raw_eeprom(dev, 0, 1, &sig); if (!ret && sig != EEPROM_INDICATOR) { /* Implies there is no external eeprom. Set mac speed */
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasily Averin vvs@openvz.org
[ Upstream commit 961c6136359eef38a8c023d02028fdcd123f02a6 ]
veth netdevice defines own rx queues and allocates array containing up to 4095 ~750-bytes-long 'struct veth_rq' elements. Such allocation is quite huge and should be accounted to memcg.
Signed-off-by: Vasily Averin vvs@openvz.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: 1ce7d306ea63 ("veth: try harder when allocating queue memory") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/veth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 85c3e12f83627..87cee614618ca 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -1303,7 +1303,7 @@ static int veth_alloc_queues(struct net_device *dev) struct veth_priv *priv = netdev_priv(dev); int i;
- priv->rq = kcalloc(dev->num_rx_queues, sizeof(*priv->rq), GFP_KERNEL); + priv->rq = kcalloc(dev->num_rx_queues, sizeof(*priv->rq), GFP_KERNEL_ACCOUNT); if (!priv->rq) return -ENOMEM;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 1ce7d306ea63f3e379557c79abd88052e0483813 ]
struct veth_rq is pretty large, 832B total without debug options enabled. Since commit under Fixes we try to pre-allocate enough queues for every possible CPU. Miao Wang reports that this may lead to order-5 allocations which will fail in production.
Let the allocation fallback to vmalloc() and try harder. These are the same flags we pass to netdev queue allocation.
Reported-and-tested-by: Miao Wang shankerwangmiao@gmail.com Fixes: 9d3684c24a52 ("veth: create by default nr_possible_cpus queues") Link: https://lore.kernel.org/all/5F52CAE2-2FB7-4712-95F1-3312FBBFA8DD@gmail.com/ Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20240223235908.693010-1-kuba@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/veth.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 87cee614618ca..0102f86d48676 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -1303,7 +1303,8 @@ static int veth_alloc_queues(struct net_device *dev) struct veth_priv *priv = netdev_priv(dev); int i;
- priv->rq = kcalloc(dev->num_rx_queues, sizeof(*priv->rq), GFP_KERNEL_ACCOUNT); + priv->rq = kvcalloc(dev->num_rx_queues, sizeof(*priv->rq), + GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL); if (!priv->rq) return -ENOMEM;
@@ -1319,7 +1320,7 @@ static void veth_free_queues(struct net_device *dev) { struct veth_priv *priv = netdev_priv(dev);
- kfree(priv->rq); + kvfree(priv->rq); }
static int veth_dev_init(struct net_device *dev)
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Javier Carrasco javier.carrasco.cruz@gmail.com
[ Upstream commit c68b2c9eba38ec3f60f4894b189090febf4d8d22 ]
The MII code does not check the return value of mdio_read (among others), and therefore no error code should be sent. A previous fix to the use of an uninitialized variable propagates negative error codes, that might lead to wrong operations by the MII library.
An example of such issues is the use of mii_nway_restart by the dm9601 driver. The mii_nway_restart function does not check the value returned by mdio_read, which in this case might be a negative number which could contain the exact bit the function checks (BMCR_ANENABLE = 0x1000).
Return zero in case of error, as it is common practice in users of mdio_read to avoid wrong uses of the return value.
Fixes: 8f8abb863fa5 ("net: usb: dm9601: fix uninitialized variable use in dm9601_mdio_read") Signed-off-by: Javier Carrasco javier.carrasco.cruz@gmail.com Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Peter Korsgaard peter@korsgaard.com Link: https://lore.kernel.org/r/20240225-dm9601_ret_err-v1-1-02c1d959ea59@gmail.co... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/dm9601.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/usb/dm9601.c b/drivers/net/usb/dm9601.c index 1959e12a3ff8a..f7357d884d6aa 100644 --- a/drivers/net/usb/dm9601.c +++ b/drivers/net/usb/dm9601.c @@ -232,7 +232,7 @@ static int dm9601_mdio_read(struct net_device *netdev, int phy_id, int loc) err = dm_read_shared_word(dev, 1, loc, &res); if (err < 0) { netdev_err(dev->net, "MDIO read error: %d\n", err); - return err; + return 0; }
netdev_dbg(dev->net,
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Justin Iurman justin.iurman@uliege.be
[ Upstream commit 6a2008641920a9c6fe1abbeb9acbec463215d505 ]
Not really a fix per se, but IPV6_TLV_IOAM is still tagged as "TEMPORARY IANA allocation for IOAM", while RFC 9486 is available for some time now. Just update the reference.
Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace") Signed-off-by: Justin Iurman justin.iurman@uliege.be Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240226124921.9097-1-justin.iurman@uliege.be Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/uapi/linux/in6.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h index c4c53a9ab9595..ff8d21f9e95b7 100644 --- a/include/uapi/linux/in6.h +++ b/include/uapi/linux/in6.h @@ -145,7 +145,7 @@ struct in6_flowlabel_req { #define IPV6_TLV_PADN 1 #define IPV6_TLV_ROUTERALERT 5 #define IPV6_TLV_CALIPSO 7 /* RFC 5570 */ -#define IPV6_TLV_IOAM 49 /* TEMPORARY IANA allocation for IOAM */ +#define IPV6_TLV_IOAM 49 /* RFC 9486 */ #define IPV6_TLV_JUMBO 194 #define IPV6_TLV_HAO 201 /* home address option */
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Raczynski j.raczynski@samsung.com
[ Upstream commit 8af411bbba1f457c33734795f024d0ef26d0963f ]
Currently when suspending driver and stopping workqueue it is checked whether workqueue is not NULL and if so, it is destroyed. Function destroy_workqueue() does drain queue and does clear variable, but it does not set workqueue variable to NULL. This can cause kernel/module panic if code attempts to clear workqueue that was not initialized.
This scenario is possible when resuming suspended driver in stmmac_resume(), because there is no handling for failed stmmac_hw_setup(), which can fail and return if DMA engine has failed to initialize, and workqueue is initialized after DMA engine. Should DMA engine fail to initialize, resume will proceed normally, but interface won't work and TX queue will eventually timeout, causing 'Reset adapter' error. This then does destroy workqueue during reset process. And since workqueue is initialized after DMA engine and can be skipped, it will cause kernel/module panic.
To secure against this possible crash, set workqueue variable to NULL when destroying workqueue.
Log/backtrace from crash goes as follows: [88.031977]------------[ cut here ]------------ [88.031985]NETDEV WATCHDOG: eth0 (sxgmac): transmit queue 1 timed out [88.032017]WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x390/0x398 <Skipping backtrace for watchdog timeout> [88.032251]---[ end trace e70de432e4d5c2c0 ]--- [88.032282]sxgmac 16d88000.ethernet eth0: Reset adapter. [88.036359]------------[ cut here ]------------ [88.036519]Call trace: [88.036523] flush_workqueue+0x3e4/0x430 [88.036528] drain_workqueue+0xc4/0x160 [88.036533] destroy_workqueue+0x40/0x270 [88.036537] stmmac_fpe_stop_wq+0x4c/0x70 [88.036541] stmmac_release+0x278/0x280 [88.036546] __dev_close_many+0xcc/0x158 [88.036551] dev_close_many+0xbc/0x190 [88.036555] dev_close.part.0+0x70/0xc0 [88.036560] dev_close+0x24/0x30 [88.036564] stmmac_service_task+0x110/0x140 [88.036569] process_one_work+0x1d8/0x4a0 [88.036573] worker_thread+0x54/0x408 [88.036578] kthread+0x164/0x170 [88.036583] ret_from_fork+0x10/0x20 [88.036588]---[ end trace e70de432e4d5c2c1 ]--- [88.036597]Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004
Fixes: 5a5586112b929 ("net: stmmac: support FPE link partner hand-shaking procedure") Signed-off-by: Jakub Raczynski j.raczynski@samsung.com Reviewed-by: Jiri Pirko jiri@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index a1c1e353ca072..b0ab8f6986f8b 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -3825,8 +3825,10 @@ static void stmmac_fpe_stop_wq(struct stmmac_priv *priv) { set_bit(__FPE_REMOVING, &priv->fpe_task_state);
- if (priv->fpe_wq) + if (priv->fpe_wq) { destroy_workqueue(priv->fpe_wq); + priv->fpe_wq = NULL; + }
netdev_info(priv->dev, "FPE workqueue stop"); }
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ying Hsu yinghsu@chromium.org
[ Upstream commit 2449007d3f73b2842c9734f45f0aadb522daf592 ]
While handling the HCI_EV_HARDWARE_ERROR event, if the underlying BT controller is not responding, the GPIO reset mechanism would free the hci_dev and lead to a use-after-free in hci_error_reset.
Here's the call trace observed on a ChromeOS device with Intel AX201: queue_work_on+0x3e/0x6c __hci_cmd_sync_sk+0x2ee/0x4c0 [bluetooth HASH:3b4a6] ? init_wait_entry+0x31/0x31 __hci_cmd_sync+0x16/0x20 [bluetooth <HASH:3b4a 6>] hci_error_reset+0x4f/0xa4 [bluetooth <HASH:3b4a 6>] process_one_work+0x1d8/0x33f worker_thread+0x21b/0x373 kthread+0x13a/0x152 ? pr_cont_work+0x54/0x54 ? kthread_blkcg+0x31/0x31 ret_from_fork+0x1f/0x30
This patch holds the reference count on the hci_dev while processing a HCI_EV_HARDWARE_ERROR event to avoid potential crash.
Fixes: c7741d16a57c ("Bluetooth: Perform a power cycle when receiving hardware error event") Signed-off-by: Ying Hsu yinghsu@chromium.org Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_core.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index b3b597960c562..a8854b24f4cfb 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -2330,6 +2330,7 @@ static void hci_error_reset(struct work_struct *work) { struct hci_dev *hdev = container_of(work, struct hci_dev, error_reset);
+ hci_dev_hold(hdev); BT_DBG("%s", hdev->name);
if (hdev->hw_error) @@ -2337,10 +2338,10 @@ static void hci_error_reset(struct work_struct *work) else bt_dev_err(hdev, "hardware error 0x%2.2x", hdev->hw_error_code);
- if (hci_dev_do_close(hdev)) - return; + if (!hci_dev_do_close(hdev)) + hci_dev_do_open(hdev);
- hci_dev_do_open(hdev); + hci_dev_put(hdev); }
void hci_uuids_clear(struct hci_dev *hdev)
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zijun Hu quic_zijuhu@quicinc.com
[ Upstream commit 61a5ab72edea7ebc3ad2c6beea29d966f528ebfb ]
hci_store_wake_reason() wrongly parses event HCI_Connection_Request as HCI_Connection_Complete and HCI_Connection_Complete as HCI_Connection_Request, so causes recording wakeup BD_ADDR error and potential stability issue, fix it by using the correct field.
Fixes: 2f20216c1d6f ("Bluetooth: Emit controller suspend and resume events") Signed-off-by: Zijun Hu quic_zijuhu@quicinc.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_event.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index 2ad2f4647847c..c4a35d4612b05 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -6272,10 +6272,10 @@ static void hci_store_wake_reason(struct hci_dev *hdev, u8 event, * keep track of the bdaddr of the connection event that woke us up. */ if (event == HCI_EV_CONN_REQUEST) { - bacpy(&hdev->wake_addr, &conn_complete->bdaddr); + bacpy(&hdev->wake_addr, &conn_request->bdaddr); hdev->wake_addr_type = BDADDR_BREDR; } else if (event == HCI_EV_CONN_COMPLETE) { - bacpy(&hdev->wake_addr, &conn_request->bdaddr); + bacpy(&hdev->wake_addr, &conn_complete->bdaddr); hdev->wake_addr_type = BDADDR_BREDR; } else if (event == HCI_EV_LE_META) { struct hci_ev_le_meta *le_ev = (void *)skb->data;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 7e74aa53a68bf60f6019bd5d9a9a1406ec4d4865 ]
If we received HCI_EV_IO_CAPA_REQUEST while HCI_OP_READ_REMOTE_EXT_FEATURES is yet to be responded assume the remote does support SSP since otherwise this event shouldn't be generated.
Link: https://lore.kernel.org/linux-bluetooth/CABBYNZ+9UdG1cMZVmdtN3U2aS16AKMCyTAR... Fixes: c7f59461f5a7 ("Bluetooth: Fix a refcnt underflow problem for hci_conn") Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_event.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index c4a35d4612b05..0bfd856d079d5 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -4720,9 +4720,12 @@ static void hci_io_capa_request_evt(struct hci_dev *hdev, struct sk_buff *skb) hci_dev_lock(hdev);
conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr); - if (!conn || !hci_conn_ssp_enabled(conn)) + if (!conn || !hci_dev_test_flag(hdev, HCI_SSP_ENABLED)) goto unlock;
+ /* Assume remote supports SSP since it has triggered this event */ + set_bit(HCI_CONN_SSP_ENABLED, &conn->flags); + hci_conn_hold(conn);
if (!hci_dev_test_flag(hdev, HCI_MGMT))
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kai-Heng Feng kai.heng.feng@canonical.com
[ Upstream commit e4b019515f950b4e6e5b74b2e1bb03a90cb33039 ]
Right now Linux BT stack cannot pass test case "GAP/CONN/CPUP/BV-05-C 'Connection Parameter Update Procedure Invalid Parameters Central Responder'" in Bluetooth Test Suite revision GAP.TS.p44. [0]
That was revoled by commit c49a8682fc5d ("Bluetooth: validate BLE connection interval updates"), but later got reverted due to devices like keyboards and mice may require low connection interval.
So only validate the max value connection interval to pass the Test Suite, and let devices to request low connection interval if needed.
[0] https://www.bluetooth.org/docman/handlers/DownloadDoc.ashx?doc_id=229869
Fixes: 68d19d7d9957 ("Revert "Bluetooth: validate BLE connection interval updates"") Signed-off-by: Kai-Heng Feng kai.heng.feng@canonical.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_event.c | 4 ++++ net/bluetooth/l2cap_core.c | 8 +++++++- 2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index 0bfd856d079d5..ba7242729a8fb 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -6058,6 +6058,10 @@ static void hci_le_remote_conn_param_req_evt(struct hci_dev *hdev, return send_conn_param_neg_reply(hdev, handle, HCI_ERROR_UNKNOWN_CONN_ID);
+ if (max > hcon->le_conn_max_interval) + return send_conn_param_neg_reply(hdev, handle, + HCI_ERROR_INVALID_LL_PARAMS); + if (hci_check_conn_params(min, max, latency, timeout)) return send_conn_param_neg_reply(hdev, handle, HCI_ERROR_INVALID_LL_PARAMS); diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 850b6aab73779..11bfc8737e6ce 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -5614,7 +5614,13 @@ static inline int l2cap_conn_param_update_req(struct l2cap_conn *conn,
memset(&rsp, 0, sizeof(rsp));
- err = hci_check_conn_params(min, max, latency, to_multiplier); + if (max > hcon->le_conn_max_interval) { + BT_DBG("requested connection interval exceeds current bounds."); + err = -EINVAL; + } else { + err = hci_check_conn_params(min, max, latency, to_multiplier); + } + if (err) rsp.result = cpu_to_le16(L2CAP_CONN_PARAM_REJECTED); else
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ignat Korchagin ignat@cloudflare.com
[ Upstream commit 7e0f122c65912740327e4c54472acaa5f85868cb ]
Commit d0009effa886 ("netfilter: nf_tables: validate NFPROTO_* family") added some validation of NFPROTO_* families in the nft_compat module, but it broke the ability to use legacy iptables modules in dual-stack nftables.
While with legacy iptables one had to independently manage IPv4 and IPv6 tables, with nftables it is possible to have dual-stack tables sharing the rules. Moreover, it was possible to use rules based on legacy iptables match/target modules in dual-stack nftables.
As an example, the program from [2] creates an INET dual-stack family table using an xt_bpf based rule, which looks like the following (the actual output was generated with a patched nft tool as the current nft tool does not parse dual stack tables with legacy match rules, so consider it for illustrative purposes only):
table inet testfw { chain input { type filter hook prerouting priority filter; policy accept; bytecode counter packets 0 bytes 0 accept } }
After d0009effa886 ("netfilter: nf_tables: validate NFPROTO_* family") we get EOPNOTSUPP for the above program.
Fix this by allowing NFPROTO_INET for nft_(match/target)_validate(), but also restrict the functions to classic iptables hooks.
Changes in v3: * clarify that upstream nft will not display such configuration properly and that the output was generated with a patched nft tool * remove example program from commit description and link to it instead * no code changes otherwise
Changes in v2: * restrict nft_(match/target)_validate() to classic iptables hooks * rewrite example program to use unmodified libnftnl
Fixes: d0009effa886 ("netfilter: nf_tables: validate NFPROTO_* family") Link: https://lore.kernel.org/all/Zc1PfoWN38UuFJRI@calendula/T/#mc947262582c90fec0... [1] Link: https://lore.kernel.org/all/20240220145509.53357-1-ignat@cloudflare.com/ [2] Reported-by: Jordan Griege jgriege@cloudflare.com Signed-off-by: Ignat Korchagin ignat@cloudflare.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_compat.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c index 64a2a5f195896..aee046e00bfaf 100644 --- a/net/netfilter/nft_compat.c +++ b/net/netfilter/nft_compat.c @@ -358,10 +358,20 @@ static int nft_target_validate(const struct nft_ctx *ctx,
if (ctx->family != NFPROTO_IPV4 && ctx->family != NFPROTO_IPV6 && + ctx->family != NFPROTO_INET && ctx->family != NFPROTO_BRIDGE && ctx->family != NFPROTO_ARP) return -EOPNOTSUPP;
+ ret = nft_chain_validate_hooks(ctx->chain, + (1 << NF_INET_PRE_ROUTING) | + (1 << NF_INET_LOCAL_IN) | + (1 << NF_INET_FORWARD) | + (1 << NF_INET_LOCAL_OUT) | + (1 << NF_INET_POST_ROUTING)); + if (ret) + return ret; + if (nft_is_base_chain(ctx->chain)) { const struct nft_base_chain *basechain = nft_base_chain(ctx->chain); @@ -607,10 +617,20 @@ static int nft_match_validate(const struct nft_ctx *ctx,
if (ctx->family != NFPROTO_IPV4 && ctx->family != NFPROTO_IPV6 && + ctx->family != NFPROTO_INET && ctx->family != NFPROTO_BRIDGE && ctx->family != NFPROTO_ARP) return -EOPNOTSUPP;
+ ret = nft_chain_validate_hooks(ctx->chain, + (1 << NF_INET_PRE_ROUTING) | + (1 << NF_INET_LOCAL_IN) | + (1 << NF_INET_FORWARD) | + (1 << NF_INET_LOCAL_OUT) | + (1 << NF_INET_POST_ROUTING)); + if (ret) + return ret; + if (nft_is_base_chain(ctx->chain)) { const struct nft_base_chain *basechain = nft_base_chain(ctx->chain);
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit b43c2793f5e9910862e8fe07846b74e45b104501 ]
net/netfilter/nfnetlink_queue.c:601:36: warning: variable 'ctinfo' is uninitialized when used here [-Wuninitialized] if (ct && nfnl_ct->build(skb, ct, ctinfo, NFQA_CT, NFQA_CT_INFO) < 0)
ctinfo is only uninitialized if ct == NULL. Init it to 0 to silence this.
Reported-by: kernel test robot lkp@intel.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: 62e7151ae3eb ("netfilter: bridge: confirm multicast packets before passing them up the stack") Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nfnetlink_queue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c index 5329ebf19a18b..f4468ef3d0a94 100644 --- a/net/netfilter/nfnetlink_queue.c +++ b/net/netfilter/nfnetlink_queue.c @@ -387,7 +387,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue, struct net_device *indev; struct net_device *outdev; struct nf_conn *ct = NULL; - enum ip_conntrack_info ctinfo; + enum ip_conntrack_info ctinfo = 0; struct nfnl_ct_hook *nfnl_ct; bool csum_verify; char *secdata = NULL;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 3fce16493dc1aa2c9af3d7e7bd360dfe203a3e6a ]
ip_ct_attach predates struct nf_ct_hook, we can place it there and remove the exported symbol.
Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: 62e7151ae3eb ("netfilter: bridge: confirm multicast packets before passing them up the stack") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/netfilter.h | 2 +- net/netfilter/core.c | 19 ++++++++----------- net/netfilter/nf_conntrack_core.c | 4 +--- 3 files changed, 10 insertions(+), 15 deletions(-)
diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h index e20c2db0f2c16..64acdf22eb4fa 100644 --- a/include/linux/netfilter.h +++ b/include/linux/netfilter.h @@ -435,7 +435,6 @@ nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, u_int8_t family) #if IS_ENABLED(CONFIG_NF_CONNTRACK) #include <linux/netfilter/nf_conntrack_zones_common.h>
-extern void (*ip_ct_attach)(struct sk_buff *, const struct sk_buff *) __rcu; void nf_ct_attach(struct sk_buff *, const struct sk_buff *); struct nf_conntrack_tuple; bool nf_ct_get_tuple_skb(struct nf_conntrack_tuple *dst_tuple, @@ -458,6 +457,7 @@ struct nf_ct_hook { void (*destroy)(struct nf_conntrack *); bool (*get_tuple_skb)(struct nf_conntrack_tuple *, const struct sk_buff *); + void (*attach)(struct sk_buff *nskb, const struct sk_buff *skb); }; extern struct nf_ct_hook __rcu *nf_ct_hook;
diff --git a/net/netfilter/core.c b/net/netfilter/core.c index ffa84cafb746b..5396d27ba6a71 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -639,25 +639,22 @@ struct nf_ct_hook __rcu *nf_ct_hook __read_mostly; EXPORT_SYMBOL_GPL(nf_ct_hook);
#if IS_ENABLED(CONFIG_NF_CONNTRACK) -/* This does not belong here, but locally generated errors need it if connection - tracking in use: without this, connection may not be in hash table, and hence - manufactured ICMP or RST packets will not be associated with it. */ -void (*ip_ct_attach)(struct sk_buff *, const struct sk_buff *) - __rcu __read_mostly; -EXPORT_SYMBOL(ip_ct_attach); - struct nf_nat_hook __rcu *nf_nat_hook __read_mostly; EXPORT_SYMBOL_GPL(nf_nat_hook);
+/* This does not belong here, but locally generated errors need it if connection + * tracking in use: without this, connection may not be in hash table, and hence + * manufactured ICMP or RST packets will not be associated with it. + */ void nf_ct_attach(struct sk_buff *new, const struct sk_buff *skb) { - void (*attach)(struct sk_buff *, const struct sk_buff *); + const struct nf_ct_hook *ct_hook;
if (skb->_nfct) { rcu_read_lock(); - attach = rcu_dereference(ip_ct_attach); - if (attach) - attach(new, skb); + ct_hook = rcu_dereference(nf_ct_hook); + if (ct_hook) + ct_hook->attach(new, skb); rcu_read_unlock(); } } diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 10622760f894a..779e41d1afdce 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -2518,7 +2518,6 @@ static int kill_all(struct nf_conn *i, void *data) void nf_conntrack_cleanup_start(void) { conntrack_gc_work.exiting = true; - RCU_INIT_POINTER(ip_ct_attach, NULL); }
void nf_conntrack_cleanup_end(void) @@ -2838,12 +2837,11 @@ static struct nf_ct_hook nf_conntrack_hook = { .update = nf_conntrack_update, .destroy = nf_ct_destroy, .get_tuple_skb = nf_conntrack_get_tuple_skb, + .attach = nf_conntrack_attach, };
void nf_conntrack_init_end(void) { - /* For use by REJECT target */ - RCU_INIT_POINTER(ip_ct_attach, nf_conntrack_attach); RCU_INIT_POINTER(nf_ct_hook, &nf_conntrack_hook); }
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 285c8a7a58158cb1805c97ff03875df2ba2ea1fe ]
No functional changes, these structures should be const.
Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: 62e7151ae3eb ("netfilter: bridge: confirm multicast packets before passing them up the stack") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/netfilter.h | 8 ++++---- net/netfilter/core.c | 10 +++++----- net/netfilter/nf_conntrack_core.c | 4 ++-- net/netfilter/nf_conntrack_netlink.c | 4 ++-- net/netfilter/nf_nat_core.c | 2 +- net/netfilter/nfnetlink_queue.c | 8 ++++---- 6 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h index 64acdf22eb4fa..5a665034c30be 100644 --- a/include/linux/netfilter.h +++ b/include/linux/netfilter.h @@ -376,13 +376,13 @@ struct nf_nat_hook { enum ip_conntrack_dir dir); };
-extern struct nf_nat_hook __rcu *nf_nat_hook; +extern const struct nf_nat_hook __rcu *nf_nat_hook;
static inline void nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, u_int8_t family) { #if IS_ENABLED(CONFIG_NF_NAT) - struct nf_nat_hook *nat_hook; + const struct nf_nat_hook *nat_hook;
rcu_read_lock(); nat_hook = rcu_dereference(nf_nat_hook); @@ -459,7 +459,7 @@ struct nf_ct_hook { const struct sk_buff *); void (*attach)(struct sk_buff *nskb, const struct sk_buff *skb); }; -extern struct nf_ct_hook __rcu *nf_ct_hook; +extern const struct nf_ct_hook __rcu *nf_ct_hook;
struct nlattr;
@@ -474,7 +474,7 @@ struct nfnl_ct_hook { void (*seq_adjust)(struct sk_buff *skb, struct nf_conn *ct, enum ip_conntrack_info ctinfo, s32 off); }; -extern struct nfnl_ct_hook __rcu *nfnl_ct_hook; +extern const struct nfnl_ct_hook __rcu *nfnl_ct_hook;
/** * nf_skb_duplicated - TEE target has sent a packet diff --git a/net/netfilter/core.c b/net/netfilter/core.c index 5396d27ba6a71..aa3f7d3228fda 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -632,14 +632,14 @@ EXPORT_SYMBOL(nf_hook_slow_list); /* This needs to be compiled in any case to avoid dependencies between the * nfnetlink_queue code and nf_conntrack. */ -struct nfnl_ct_hook __rcu *nfnl_ct_hook __read_mostly; +const struct nfnl_ct_hook __rcu *nfnl_ct_hook __read_mostly; EXPORT_SYMBOL_GPL(nfnl_ct_hook);
-struct nf_ct_hook __rcu *nf_ct_hook __read_mostly; +const struct nf_ct_hook __rcu *nf_ct_hook __read_mostly; EXPORT_SYMBOL_GPL(nf_ct_hook);
#if IS_ENABLED(CONFIG_NF_CONNTRACK) -struct nf_nat_hook __rcu *nf_nat_hook __read_mostly; +const struct nf_nat_hook __rcu *nf_nat_hook __read_mostly; EXPORT_SYMBOL_GPL(nf_nat_hook);
/* This does not belong here, but locally generated errors need it if connection @@ -662,7 +662,7 @@ EXPORT_SYMBOL(nf_ct_attach);
void nf_conntrack_destroy(struct nf_conntrack *nfct) { - struct nf_ct_hook *ct_hook; + const struct nf_ct_hook *ct_hook;
rcu_read_lock(); ct_hook = rcu_dereference(nf_ct_hook); @@ -677,7 +677,7 @@ EXPORT_SYMBOL(nf_conntrack_destroy); bool nf_ct_get_tuple_skb(struct nf_conntrack_tuple *dst_tuple, const struct sk_buff *skb) { - struct nf_ct_hook *ct_hook; + const struct nf_ct_hook *ct_hook; bool ret = false;
rcu_read_lock(); diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 779e41d1afdce..2a4222eefc894 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -2145,9 +2145,9 @@ static int __nf_conntrack_update(struct net *net, struct sk_buff *skb, struct nf_conn *ct, enum ip_conntrack_info ctinfo) { + const struct nf_nat_hook *nat_hook; struct nf_conntrack_tuple_hash *h; struct nf_conntrack_tuple tuple; - struct nf_nat_hook *nat_hook; unsigned int status; int dataoff; u16 l3num; @@ -2833,7 +2833,7 @@ int nf_conntrack_init_start(void) return ret; }
-static struct nf_ct_hook nf_conntrack_hook = { +static const struct nf_ct_hook nf_conntrack_hook = { .update = nf_conntrack_update, .destroy = nf_ct_destroy, .get_tuple_skb = nf_conntrack_get_tuple_skb, diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index c427f7625a3b5..1466015bc56dc 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -1816,7 +1816,7 @@ ctnetlink_parse_nat_setup(struct nf_conn *ct, const struct nlattr *attr) __must_hold(RCU) { - struct nf_nat_hook *nat_hook; + const struct nf_nat_hook *nat_hook; int err;
nat_hook = rcu_dereference(nf_nat_hook); @@ -2922,7 +2922,7 @@ static void ctnetlink_glue_seqadj(struct sk_buff *skb, struct nf_conn *ct, nf_ct_tcp_seqadj_set(skb, ct, ctinfo, diff); }
-static struct nfnl_ct_hook ctnetlink_glue_hook = { +static const struct nfnl_ct_hook ctnetlink_glue_hook = { .build_size = ctnetlink_glue_build_size, .build = ctnetlink_glue_build, .parse = ctnetlink_glue_parse, diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c index 2731176839228..b776b3af78ca2 100644 --- a/net/netfilter/nf_nat_core.c +++ b/net/netfilter/nf_nat_core.c @@ -1120,7 +1120,7 @@ static struct pernet_operations nat_net_ops = { .size = sizeof(struct nat_net), };
-static struct nf_nat_hook nat_hook = { +static const struct nf_nat_hook nat_hook = { .parse_nat_setup = nfnetlink_parse_nat_setup, #ifdef CONFIG_XFRM .decode_session = __nf_nat_decode_session, diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c index f4468ef3d0a94..8c96e01f6a023 100644 --- a/net/netfilter/nfnetlink_queue.c +++ b/net/netfilter/nfnetlink_queue.c @@ -225,7 +225,7 @@ find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id)
static void nfqnl_reinject(struct nf_queue_entry *entry, unsigned int verdict) { - struct nf_ct_hook *ct_hook; + const struct nf_ct_hook *ct_hook; int err;
if (verdict == NF_ACCEPT || @@ -388,7 +388,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue, struct net_device *outdev; struct nf_conn *ct = NULL; enum ip_conntrack_info ctinfo = 0; - struct nfnl_ct_hook *nfnl_ct; + const struct nfnl_ct_hook *nfnl_ct; bool csum_verify; char *secdata = NULL; u32 seclen = 0; @@ -1115,7 +1115,7 @@ static int nfqnl_recv_verdict_batch(struct sk_buff *skb, return 0; }
-static struct nf_conn *nfqnl_ct_parse(struct nfnl_ct_hook *nfnl_ct, +static struct nf_conn *nfqnl_ct_parse(const struct nfnl_ct_hook *nfnl_ct, const struct nlmsghdr *nlh, const struct nlattr * const nfqa[], struct nf_queue_entry *entry, @@ -1182,11 +1182,11 @@ static int nfqnl_recv_verdict(struct sk_buff *skb, const struct nfnl_info *info, { struct nfnl_queue_net *q = nfnl_queue_pernet(info->net); u_int16_t queue_num = ntohs(info->nfmsg->res_id); + const struct nfnl_ct_hook *nfnl_ct; struct nfqnl_msg_verdict_hdr *vhdr; enum ip_conntrack_info ctinfo; struct nfqnl_instance *queue; struct nf_queue_entry *entry; - struct nfnl_ct_hook *nfnl_ct; struct nf_conn *ct = NULL; unsigned int verdict; int err;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 2954fe60e33da0f4de4d81a4c95c7dddb517d00c ]
iptables/nftables support responding to tcp packets with tcp resets.
The generated tcp reset packet passes through both output and postrouting netfilter hooks, but conntrack will never see them because the generated skb has its ->nfct pointer copied over from the packet that triggered the reset rule.
If the reset rule is used for established connections, this may result in the conntrack entry to be around for a very long time (default timeout is 5 days).
One way to avoid this would be to not copy the nf_conn pointer so that the rest packet passes through conntrack too.
Problem is that output rules might not have the same conntrack zone setup as the prerouting ones, so its possible that the reset skb won't find the correct entry. Generating a template entry for the skb seems error prone as well.
Add an explicit "closing" function that switches a confirmed conntrack entry to closed state and wire this up for tcp.
If the entry isn't confirmed, no action is needed because the conntrack entry will never be committed to the table.
Reported-by: Russel King linux@armlinux.org.uk Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: 62e7151ae3eb ("netfilter: bridge: confirm multicast packets before passing them up the stack") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/netfilter.h | 3 +++ include/net/netfilter/nf_conntrack.h | 8 ++++++ net/ipv4/netfilter/nf_reject_ipv4.c | 1 + net/ipv6/netfilter/nf_reject_ipv6.c | 1 + net/netfilter/core.c | 16 ++++++++++++ net/netfilter/nf_conntrack_core.c | 12 +++++++++ net/netfilter/nf_conntrack_proto_tcp.c | 35 ++++++++++++++++++++++++++ 7 files changed, 76 insertions(+)
diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h index 5a665034c30be..c92bb1580f419 100644 --- a/include/linux/netfilter.h +++ b/include/linux/netfilter.h @@ -436,11 +436,13 @@ nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, u_int8_t family) #include <linux/netfilter/nf_conntrack_zones_common.h>
void nf_ct_attach(struct sk_buff *, const struct sk_buff *); +void nf_ct_set_closing(struct nf_conntrack *nfct); struct nf_conntrack_tuple; bool nf_ct_get_tuple_skb(struct nf_conntrack_tuple *dst_tuple, const struct sk_buff *skb); #else static inline void nf_ct_attach(struct sk_buff *new, struct sk_buff *skb) {} +static inline void nf_ct_set_closing(struct nf_conntrack *nfct) {} struct nf_conntrack_tuple; static inline bool nf_ct_get_tuple_skb(struct nf_conntrack_tuple *dst_tuple, const struct sk_buff *skb) @@ -458,6 +460,7 @@ struct nf_ct_hook { bool (*get_tuple_skb)(struct nf_conntrack_tuple *, const struct sk_buff *); void (*attach)(struct sk_buff *nskb, const struct sk_buff *skb); + void (*set_closing)(struct nf_conntrack *nfct); }; extern const struct nf_ct_hook __rcu *nf_ct_hook;
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index 34c266502a50e..39541ab912a16 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -123,6 +123,12 @@ struct nf_conn { union nf_conntrack_proto proto; };
+static inline struct nf_conn * +nf_ct_to_nf_conn(const struct nf_conntrack *nfct) +{ + return container_of(nfct, struct nf_conn, ct_general); +} + static inline struct nf_conn * nf_ct_tuplehash_to_ctrack(const struct nf_conntrack_tuple_hash *hash) { @@ -173,6 +179,8 @@ nf_ct_get(const struct sk_buff *skb, enum ip_conntrack_info *ctinfo)
void nf_ct_destroy(struct nf_conntrack *nfct);
+void nf_conntrack_tcp_set_closing(struct nf_conn *ct); + /* decrement reference count on a conntrack */ static inline void nf_ct_put(struct nf_conn *ct) { diff --git a/net/ipv4/netfilter/nf_reject_ipv4.c b/net/ipv4/netfilter/nf_reject_ipv4.c index f2edb40c0db00..350aaca126181 100644 --- a/net/ipv4/netfilter/nf_reject_ipv4.c +++ b/net/ipv4/netfilter/nf_reject_ipv4.c @@ -278,6 +278,7 @@ void nf_send_reset(struct net *net, struct sock *sk, struct sk_buff *oldskb, goto free_nskb;
nf_ct_attach(nskb, oldskb); + nf_ct_set_closing(skb_nfct(oldskb));
#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER) /* If we use ip_local_out for bridged traffic, the MAC source on diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c b/net/ipv6/netfilter/nf_reject_ipv6.c index dffeaaaadcded..c0057edd84cfc 100644 --- a/net/ipv6/netfilter/nf_reject_ipv6.c +++ b/net/ipv6/netfilter/nf_reject_ipv6.c @@ -345,6 +345,7 @@ void nf_send_reset6(struct net *net, struct sock *sk, struct sk_buff *oldskb, nf_reject_ip6_tcphdr_put(nskb, oldskb, otcph, otcplen);
nf_ct_attach(nskb, oldskb); + nf_ct_set_closing(skb_nfct(oldskb));
#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER) /* If we use ip6_local_out for bridged traffic, the MAC source on diff --git a/net/netfilter/core.c b/net/netfilter/core.c index aa3f7d3228fda..fe81824799d95 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -674,6 +674,22 @@ void nf_conntrack_destroy(struct nf_conntrack *nfct) } EXPORT_SYMBOL(nf_conntrack_destroy);
+void nf_ct_set_closing(struct nf_conntrack *nfct) +{ + const struct nf_ct_hook *ct_hook; + + if (!nfct) + return; + + rcu_read_lock(); + ct_hook = rcu_dereference(nf_ct_hook); + if (ct_hook) + ct_hook->set_closing(nfct); + + rcu_read_unlock(); +} +EXPORT_SYMBOL_GPL(nf_ct_set_closing); + bool nf_ct_get_tuple_skb(struct nf_conntrack_tuple *dst_tuple, const struct sk_buff *skb) { diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 2a4222eefc894..e0f4f76439d3d 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -2833,11 +2833,23 @@ int nf_conntrack_init_start(void) return ret; }
+static void nf_conntrack_set_closing(struct nf_conntrack *nfct) +{ + struct nf_conn *ct = nf_ct_to_nf_conn(nfct); + + switch (nf_ct_protonum(ct)) { + case IPPROTO_TCP: + nf_conntrack_tcp_set_closing(ct); + break; + } +} + static const struct nf_ct_hook nf_conntrack_hook = { .update = nf_conntrack_update, .destroy = nf_ct_destroy, .get_tuple_skb = nf_conntrack_get_tuple_skb, .attach = nf_conntrack_attach, + .set_closing = nf_conntrack_set_closing, };
void nf_conntrack_init_end(void) diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c index 1ecfdc4f23be8..f33e6aea7f4da 100644 --- a/net/netfilter/nf_conntrack_proto_tcp.c +++ b/net/netfilter/nf_conntrack_proto_tcp.c @@ -870,6 +870,41 @@ static bool tcp_can_early_drop(const struct nf_conn *ct) return false; }
+void nf_conntrack_tcp_set_closing(struct nf_conn *ct) +{ + enum tcp_conntrack old_state; + const unsigned int *timeouts; + u32 timeout; + + if (!nf_ct_is_confirmed(ct)) + return; + + spin_lock_bh(&ct->lock); + old_state = ct->proto.tcp.state; + ct->proto.tcp.state = TCP_CONNTRACK_CLOSE; + + if (old_state == TCP_CONNTRACK_CLOSE || + test_bit(IPS_FIXED_TIMEOUT_BIT, &ct->status)) { + spin_unlock_bh(&ct->lock); + return; + } + + timeouts = nf_ct_timeout_lookup(ct); + if (!timeouts) { + const struct nf_tcp_net *tn; + + tn = nf_tcp_pernet(nf_ct_net(ct)); + timeouts = tn->timeouts; + } + + timeout = timeouts[TCP_CONNTRACK_CLOSE]; + WRITE_ONCE(ct->timeout, timeout + nfct_time_stamp); + + spin_unlock_bh(&ct->lock); + + nf_conntrack_event_cache(IPCT_PROTOINFO, ct); +} + static void nf_ct_tcp_state_reset(struct ip_ct_tcp_state *state) { state->td_end = 0;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 62e7151ae3eb465e0ab52a20c941ff33bb6332e9 ]
conntrack nf_confirm logic cannot handle cloned skbs referencing the same nf_conn entry, which will happen for multicast (broadcast) frames on bridges.
Example: macvlan0 | br0 / \ ethX ethY
ethX (or Y) receives a L2 multicast or broadcast packet containing an IP packet, flow is not yet in conntrack table.
1. skb passes through bridge and fake-ip (br_netfilter)Prerouting. -> skb->_nfct now references a unconfirmed entry 2. skb is broad/mcast packet. bridge now passes clones out on each bridge interface. 3. skb gets passed up the stack. 4. In macvlan case, macvlan driver retains clone(s) of the mcast skb and schedules a work queue to send them out on the lower devices.
The clone skb->_nfct is not a copy, it is the same entry as the original skb. The macvlan rx handler then returns RX_HANDLER_PASS. 5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb.
The Macvlan broadcast worker and normal confirm path will race.
This race will not happen if step 2 already confirmed a clone. In that case later steps perform skb_clone() with skb->_nfct already confirmed (in hash table). This works fine.
But such confirmation won't happen when eb/ip/nftables rules dropped the packets before they reached the nf_confirm step in postrouting.
Pablo points out that nf_conntrack_bridge doesn't allow use of stateful nat, so we can safely discard the nf_conn entry and let inet call conntrack again.
This doesn't work for bridge netfilter: skb could have a nat transformation. Also bridge nf prevents re-invocation of inet prerouting via 'sabotage_in' hook.
Work around this problem by explicit confirmation of the entry at LOCAL_IN time, before upper layer has a chance to clone the unconfirmed entry.
The downside is that this disables NAT and conntrack helpers.
Alternative fix would be to add locking to all code parts that deal with unconfirmed packets, but even if that could be done in a sane way this opens up other problems, for example:
-m physdev --physdev-out eth0 -j SNAT --snat-to 1.2.3.4 -m physdev --physdev-out eth1 -j SNAT --snat-to 1.2.3.5
For multicast case, only one of such conflicting mappings will be created, conntrack only handles 1:1 NAT mappings.
Users should set create a setup that explicitly marks such traffic NOTRACK (conntrack bypass) to avoid this, but we cannot auto-bypass them, ruleset might have accept rules for untracked traffic already, so user-visible behaviour would change.
Suggested-by: Pablo Neira Ayuso pablo@netfilter.org Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217777 Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/netfilter.h | 1 + net/bridge/br_netfilter_hooks.c | 96 ++++++++++++++++++++++ net/bridge/netfilter/nf_conntrack_bridge.c | 30 +++++++ net/netfilter/nf_conntrack_core.c | 1 + 4 files changed, 128 insertions(+)
diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h index c92bb1580f419..c69cbd64b5b46 100644 --- a/include/linux/netfilter.h +++ b/include/linux/netfilter.h @@ -461,6 +461,7 @@ struct nf_ct_hook { const struct sk_buff *); void (*attach)(struct sk_buff *nskb, const struct sk_buff *skb); void (*set_closing)(struct nf_conntrack *nfct); + int (*confirm)(struct sk_buff *skb); }; extern const struct nf_ct_hook __rcu *nf_ct_hook;
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c index f14beb9a62edb..8a114a5000466 100644 --- a/net/bridge/br_netfilter_hooks.c +++ b/net/bridge/br_netfilter_hooks.c @@ -43,6 +43,10 @@ #include <linux/sysctl.h> #endif
+#if IS_ENABLED(CONFIG_NF_CONNTRACK) +#include <net/netfilter/nf_conntrack_core.h> +#endif + static unsigned int brnf_net_id __read_mostly;
struct brnf_net { @@ -537,6 +541,90 @@ static unsigned int br_nf_pre_routing(void *priv, return NF_STOLEN; }
+#if IS_ENABLED(CONFIG_NF_CONNTRACK) +/* conntracks' nf_confirm logic cannot handle cloned skbs referencing + * the same nf_conn entry, which will happen for multicast (broadcast) + * Frames on bridges. + * + * Example: + * macvlan0 + * br0 + * ethX ethY + * + * ethX (or Y) receives multicast or broadcast packet containing + * an IP packet, not yet in conntrack table. + * + * 1. skb passes through bridge and fake-ip (br_netfilter)Prerouting. + * -> skb->_nfct now references a unconfirmed entry + * 2. skb is broad/mcast packet. bridge now passes clones out on each bridge + * interface. + * 3. skb gets passed up the stack. + * 4. In macvlan case, macvlan driver retains clone(s) of the mcast skb + * and schedules a work queue to send them out on the lower devices. + * + * The clone skb->_nfct is not a copy, it is the same entry as the + * original skb. The macvlan rx handler then returns RX_HANDLER_PASS. + * 5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb. + * + * The Macvlan broadcast worker and normal confirm path will race. + * + * This race will not happen if step 2 already confirmed a clone. In that + * case later steps perform skb_clone() with skb->_nfct already confirmed (in + * hash table). This works fine. + * + * But such confirmation won't happen when eb/ip/nftables rules dropped the + * packets before they reached the nf_confirm step in postrouting. + * + * Work around this problem by explicit confirmation of the entry at + * LOCAL_IN time, before upper layer has a chance to clone the unconfirmed + * entry. + * + */ +static unsigned int br_nf_local_in(void *priv, + struct sk_buff *skb, + const struct nf_hook_state *state) +{ + struct nf_conntrack *nfct = skb_nfct(skb); + const struct nf_ct_hook *ct_hook; + struct nf_conn *ct; + int ret; + + if (!nfct || skb->pkt_type == PACKET_HOST) + return NF_ACCEPT; + + ct = container_of(nfct, struct nf_conn, ct_general); + if (likely(nf_ct_is_confirmed(ct))) + return NF_ACCEPT; + + WARN_ON_ONCE(skb_shared(skb)); + WARN_ON_ONCE(refcount_read(&nfct->use) != 1); + + /* We can't call nf_confirm here, it would create a dependency + * on nf_conntrack module. + */ + ct_hook = rcu_dereference(nf_ct_hook); + if (!ct_hook) { + skb->_nfct = 0ul; + nf_conntrack_put(nfct); + return NF_ACCEPT; + } + + nf_bridge_pull_encap_header(skb); + ret = ct_hook->confirm(skb); + switch (ret & NF_VERDICT_MASK) { + case NF_STOLEN: + return NF_STOLEN; + default: + nf_bridge_push_encap_header(skb); + break; + } + + ct = container_of(nfct, struct nf_conn, ct_general); + WARN_ON_ONCE(!nf_ct_is_confirmed(ct)); + + return ret; +} +#endif
/* PF_BRIDGE/FORWARD *************************************************/ static int br_nf_forward_finish(struct net *net, struct sock *sk, struct sk_buff *skb) @@ -935,6 +1023,14 @@ static const struct nf_hook_ops br_nf_ops[] = { .hooknum = NF_BR_PRE_ROUTING, .priority = NF_BR_PRI_BRNF, }, +#if IS_ENABLED(CONFIG_NF_CONNTRACK) + { + .hook = br_nf_local_in, + .pf = NFPROTO_BRIDGE, + .hooknum = NF_BR_LOCAL_IN, + .priority = NF_BR_PRI_LAST, + }, +#endif { .hook = br_nf_forward_ip, .pf = NFPROTO_BRIDGE, diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c index d14b2dbbd1dfb..83743e95939b1 100644 --- a/net/bridge/netfilter/nf_conntrack_bridge.c +++ b/net/bridge/netfilter/nf_conntrack_bridge.c @@ -290,6 +290,30 @@ static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb, return nf_conntrack_in(skb, &bridge_state); }
+static unsigned int nf_ct_bridge_in(void *priv, struct sk_buff *skb, + const struct nf_hook_state *state) +{ + enum ip_conntrack_info ctinfo; + struct nf_conn *ct; + + if (skb->pkt_type == PACKET_HOST) + return NF_ACCEPT; + + /* nf_conntrack_confirm() cannot handle concurrent clones, + * this happens for broad/multicast frames with e.g. macvlan on top + * of the bridge device. + */ + ct = nf_ct_get(skb, &ctinfo); + if (!ct || nf_ct_is_confirmed(ct) || nf_ct_is_template(ct)) + return NF_ACCEPT; + + /* let inet prerouting call conntrack again */ + skb->_nfct = 0; + nf_ct_put(ct); + + return NF_ACCEPT; +} + static void nf_ct_bridge_frag_save(struct sk_buff *skb, struct nf_bridge_frag_data *data) { @@ -414,6 +438,12 @@ static struct nf_hook_ops nf_ct_bridge_hook_ops[] __read_mostly = { .hooknum = NF_BR_PRE_ROUTING, .priority = NF_IP_PRI_CONNTRACK, }, + { + .hook = nf_ct_bridge_in, + .pf = NFPROTO_BRIDGE, + .hooknum = NF_BR_LOCAL_IN, + .priority = NF_IP_PRI_CONNTRACK_CONFIRM, + }, { .hook = nf_ct_bridge_post, .pf = NFPROTO_BRIDGE, diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index e0f4f76439d3d..be6031886f942 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -2850,6 +2850,7 @@ static const struct nf_ct_hook nf_conntrack_hook = { .get_tuple_skb = nf_conntrack_get_tuple_skb, .attach = nf_conntrack_attach, .set_closing = nf_conntrack_set_closing, + .confirm = __nf_conntrack_confirm, };
void nf_conntrack_init_end(void)
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lin Ma linma@zju.edu.cn
[ Upstream commit 743ad091fb46e622f1b690385bb15e3cd3daf874 ]
In the commit d73ef2d69c0d ("rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length"), an adjustment was made to the old loop logic in the function `rtnl_bridge_setlink` to enable the loop to also check the length of the IFLA_BRIDGE_MODE attribute. However, this adjustment removed the `break` statement and led to an error logic of the flags writing back at the end of this function.
if (have_flags) memcpy(nla_data(attr), &flags, sizeof(flags)); // attr should point to IFLA_BRIDGE_FLAGS NLA !!!
Before the mentioned commit, the `attr` is granted to be IFLA_BRIDGE_FLAGS. However, this is not necessarily true fow now as the updated loop will let the attr point to the last NLA, even an invalid NLA which could cause overflow writes.
This patch introduces a new variable `br_flag` to save the NLA pointer that points to IFLA_BRIDGE_FLAGS and uses it to resolve the mentioned error logic.
Fixes: d73ef2d69c0d ("rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length") Signed-off-by: Lin Ma linma@zju.edu.cn Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20240227121128.608110-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/rtnetlink.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 1b71e5c582bbc..ef218e290dfba 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -4925,10 +4925,9 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh, struct net *net = sock_net(skb->sk); struct ifinfomsg *ifm; struct net_device *dev; - struct nlattr *br_spec, *attr = NULL; + struct nlattr *br_spec, *attr, *br_flags_attr = NULL; int rem, err = -EOPNOTSUPP; u16 flags = 0; - bool have_flags = false;
if (nlmsg_len(nlh) < sizeof(*ifm)) return -EINVAL; @@ -4946,11 +4945,11 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh, br_spec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_AF_SPEC); if (br_spec) { nla_for_each_nested(attr, br_spec, rem) { - if (nla_type(attr) == IFLA_BRIDGE_FLAGS && !have_flags) { + if (nla_type(attr) == IFLA_BRIDGE_FLAGS && !br_flags_attr) { if (nla_len(attr) < sizeof(flags)) return -EINVAL;
- have_flags = true; + br_flags_attr = attr; flags = nla_get_u16(attr); }
@@ -4994,8 +4993,8 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh, } }
- if (have_flags) - memcpy(nla_data(attr), &flags, sizeof(flags)); + if (br_flags_attr) + memcpy(nla_data(br_flags_attr), &flags, sizeof(flags)); out: return err; }
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oleksij Rempel o.rempel@pengutronix.de
[ Upstream commit 0bb7b09392eb74b152719ae87b1ba5e4bf910ef0 ]
The i211 requires the same PTP timestamp adjustments as the i210, according to its datasheet. To ensure consistent timestamping across different platforms, this change extends the existing adjustments to include the i211.
The adjustment result are tested and comparable for i210 and i211 based systems.
Fixes: 3f544d2a4d5c ("igb: adjust PTP timestamps for Tx/Rx latency") Signed-off-by: Oleksij Rempel o.rempel@pengutronix.de Reviewed-by: Jacob Keller jacob.e.keller@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Link: https://lore.kernel.org/r/20240227184942.362710-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igb/igb_ptp.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c index 9cdb7a856ab6c..1a1575e8577af 100644 --- a/drivers/net/ethernet/intel/igb/igb_ptp.c +++ b/drivers/net/ethernet/intel/igb/igb_ptp.c @@ -826,7 +826,7 @@ static void igb_ptp_tx_hwtstamp(struct igb_adapter *adapter)
igb_ptp_systim_to_hwtstamp(adapter, &shhwtstamps, regval); /* adjust timestamp for the TX latency based on link speed */ - if (adapter->hw.mac.type == e1000_i210) { + if (hw->mac.type == e1000_i210 || hw->mac.type == e1000_i211) { switch (adapter->link_speed) { case SPEED_10: adjust = IGB_I210_TX_LATENCY_10; @@ -872,6 +872,7 @@ int igb_ptp_rx_pktstamp(struct igb_q_vector *q_vector, void *va, ktime_t *timestamp) { struct igb_adapter *adapter = q_vector->adapter; + struct e1000_hw *hw = &adapter->hw; struct skb_shared_hwtstamps ts; __le64 *regval = (__le64 *)va; int adjust = 0; @@ -891,7 +892,7 @@ int igb_ptp_rx_pktstamp(struct igb_q_vector *q_vector, void *va, igb_ptp_systim_to_hwtstamp(adapter, &ts, le64_to_cpu(regval[1]));
/* adjust timestamp for the RX latency based on link speed */ - if (adapter->hw.mac.type == e1000_i210) { + if (hw->mac.type == e1000_i210 || hw->mac.type == e1000_i211) { switch (adapter->link_speed) { case SPEED_10: adjust = IGB_I210_RX_LATENCY_10;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit c3f6bb74137c68b515b7e2ff123a80611e801013 ]
Original TLS implementation was handling one record at a time. It stashed the type of the record inside tls context (per socket structure) for convenience. When async crypto support was added [1] the author had to use skb->cb to store the type per-message.
The use of skb->cb overlaps with strparser, however, so a hybrid approach was taken where type is stored in context while parsing (since we parse a message at a time) but once parsed its copied to skb->cb.
Recently a workaround for sockmaps [2] exposed the previously private struct _strp_msg and started a trend of adding user fields directly in strparser's header. This is cleaner than storing information about an skb in the context.
This change is not strictly necessary, but IMHO the ownership of the context field is confusing. Information naturally belongs to the skb.
[1] commit 94524d8fc965 ("net/tls: Add support for async decryption of tls records") [2] commit b2c4618162ec ("bpf, sockmap: sk_skb data_end access incorrect when src_reg = dst_reg")
Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/strparser.h | 3 +++ include/net/tls.h | 10 +++------- net/tls/tls_sw.c | 38 +++++++++++++++++--------------------- 3 files changed, 23 insertions(+), 28 deletions(-)
diff --git a/include/net/strparser.h b/include/net/strparser.h index 732b7097d78e4..c271543076cf8 100644 --- a/include/net/strparser.h +++ b/include/net/strparser.h @@ -70,6 +70,9 @@ struct sk_skb_cb { * when dst_reg == src_reg. */ u64 temp_reg; + struct tls_msg { + u8 control; + } tls; };
static inline struct strp_msg *strp_msg(struct sk_buff *skb) diff --git a/include/net/tls.h b/include/net/tls.h index eda0015c5c592..24c1b718ceacc 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -116,11 +116,6 @@ struct tls_rec { u8 aead_req_ctx[]; };
-struct tls_msg { - struct strp_msg rxm; - u8 control; -}; - struct tx_work { struct delayed_work work; struct sock *sk; @@ -151,7 +146,6 @@ struct tls_sw_context_rx { void (*saved_data_ready)(struct sock *sk);
struct sk_buff *recv_pkt; - u8 control; u8 async_capable:1; u8 decrypted:1; atomic_t decrypt_pending; @@ -410,7 +404,9 @@ void tls_free_partial_record(struct sock *sk, struct tls_context *ctx);
static inline struct tls_msg *tls_msg(struct sk_buff *skb) { - return (struct tls_msg *)strp_msg(skb); + struct sk_skb_cb *scb = (struct sk_skb_cb *)skb->cb; + + return &scb->tls; }
static inline bool tls_is_partially_sent_record(struct tls_context *ctx) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index e6f700f67c010..82d7c9b036bc7 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -128,10 +128,10 @@ static int skb_nsg(struct sk_buff *skb, int offset, int len) return __skb_nsg(skb, offset, len, 0); }
-static int padding_length(struct tls_sw_context_rx *ctx, - struct tls_prot_info *prot, struct sk_buff *skb) +static int padding_length(struct tls_prot_info *prot, struct sk_buff *skb) { struct strp_msg *rxm = strp_msg(skb); + struct tls_msg *tlm = tls_msg(skb); int sub = 0;
/* Determine zero-padding length */ @@ -153,7 +153,7 @@ static int padding_length(struct tls_sw_context_rx *ctx, sub++; back++; } - ctx->control = content_type; + tlm->control = content_type; } return sub; } @@ -187,7 +187,7 @@ static void tls_decrypt_done(struct crypto_async_request *req, int err) struct strp_msg *rxm = strp_msg(skb); int pad;
- pad = padding_length(ctx, prot, skb); + pad = padding_length(prot, skb); if (pad < 0) { ctx->async_wait.err = pad; tls_err_abort(skb->sk, pad); @@ -1423,6 +1423,7 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb, struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx); struct tls_prot_info *prot = &tls_ctx->prot_info; struct strp_msg *rxm = strp_msg(skb); + struct tls_msg *tlm = tls_msg(skb); int n_sgin, n_sgout, nsg, mem_size, aead_size, err, pages = 0; struct aead_request *aead_req; struct sk_buff *unused; @@ -1500,7 +1501,7 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb, /* Prepare AAD */ tls_make_aad(aad, rxm->full_len - prot->overhead_size + prot->tail_size, - tls_ctx->rx.rec_seq, ctx->control, prot); + tls_ctx->rx.rec_seq, tlm->control, prot);
/* Prepare sgin */ sg_init_table(sgin, n_sgin); @@ -1585,7 +1586,7 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb, *zc = false; }
- pad = padding_length(ctx, prot, skb); + pad = padding_length(prot, skb); if (pad < 0) return pad;
@@ -1817,26 +1818,21 @@ int tls_sw_recvmsg(struct sock *sk, } } goto recv_end; - } else { - tlm = tls_msg(skb); - if (prot->version == TLS_1_3_VERSION) - tlm->control = 0; - else - tlm->control = ctx->control; }
rxm = strp_msg(skb); + tlm = tls_msg(skb);
to_decrypt = rxm->full_len - prot->overhead_size;
if (to_decrypt <= len && !is_kvec && !is_peek && - ctx->control == TLS_RECORD_TYPE_DATA && + tlm->control == TLS_RECORD_TYPE_DATA && prot->version != TLS_1_3_VERSION && !bpf_strp_enabled) zc = true;
/* Do not use async mode if record is non-data */ - if (ctx->control == TLS_RECORD_TYPE_DATA && !bpf_strp_enabled) + if (tlm->control == TLS_RECORD_TYPE_DATA && !bpf_strp_enabled) async_capable = ctx->async_capable; else async_capable = false; @@ -1851,8 +1847,6 @@ int tls_sw_recvmsg(struct sock *sk, if (err == -EINPROGRESS) { async = true; num_async++; - } else if (prot->version == TLS_1_3_VERSION) { - tlm->control = ctx->control; }
/* If the type of records being processed is not known yet, @@ -1999,6 +1993,7 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos, struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx); struct strp_msg *rxm = NULL; struct sock *sk = sock->sk; + struct tls_msg *tlm; struct sk_buff *skb; ssize_t copied = 0; bool from_queue; @@ -2027,14 +2022,15 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos, } }
+ rxm = strp_msg(skb); + tlm = tls_msg(skb); + /* splice does not support reading control messages */ - if (ctx->control != TLS_RECORD_TYPE_DATA) { + if (tlm->control != TLS_RECORD_TYPE_DATA) { err = -EINVAL; goto splice_read_end; }
- rxm = strp_msg(skb); - chunk = min_t(unsigned int, rxm->full_len, len); copied = skb_splice_bits(skb, sk, rxm->offset, pipe, chunk, flags); if (copied < 0) @@ -2077,10 +2073,10 @@ bool tls_sw_sock_is_readable(struct sock *sk) static int tls_read_size(struct strparser *strp, struct sk_buff *skb) { struct tls_context *tls_ctx = tls_get_ctx(strp->sk); - struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx); struct tls_prot_info *prot = &tls_ctx->prot_info; char header[TLS_HEADER_SIZE + MAX_IV_SIZE]; struct strp_msg *rxm = strp_msg(skb); + struct tls_msg *tlm = tls_msg(skb); size_t cipher_overhead; size_t data_len = 0; int ret; @@ -2101,7 +2097,7 @@ static int tls_read_size(struct strparser *strp, struct sk_buff *skb) if (ret < 0) goto read_failure;
- ctx->control = header[0]; + tlm->control = header[0];
data_len = ((header[4] & 0xFF) | (header[3] << 8));
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 7dc59c33d62c4520a119051d4486c214ef5caa23 ]
Similar justification to previous change, the information about decryption status belongs in the skb.
Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/strparser.h | 1 + include/net/tls.h | 1 - net/tls/tls_device.c | 3 ++- net/tls/tls_sw.c | 10 ++++++---- 4 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/include/net/strparser.h b/include/net/strparser.h index c271543076cf8..a191486eb1e4c 100644 --- a/include/net/strparser.h +++ b/include/net/strparser.h @@ -72,6 +72,7 @@ struct sk_skb_cb { u64 temp_reg; struct tls_msg { u8 control; + u8 decrypted; } tls; };
diff --git a/include/net/tls.h b/include/net/tls.h index 24c1b718ceacc..ea0aeae26cf76 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -147,7 +147,6 @@ struct tls_sw_context_rx {
struct sk_buff *recv_pkt; u8 async_capable:1; - u8 decrypted:1; atomic_t decrypt_pending; /* protect crypto_wait with decrypt_pending*/ spinlock_t decrypt_compl_lock; diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c index 88785196a8966..f23d18e666284 100644 --- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -936,6 +936,7 @@ int tls_device_decrypted(struct sock *sk, struct tls_context *tls_ctx, struct sk_buff *skb, struct strp_msg *rxm) { struct tls_offload_context_rx *ctx = tls_offload_ctx_rx(tls_ctx); + struct tls_msg *tlm = tls_msg(skb); int is_decrypted = skb->decrypted; int is_encrypted = !is_decrypted; struct sk_buff *skb_iter; @@ -950,7 +951,7 @@ int tls_device_decrypted(struct sock *sk, struct tls_context *tls_ctx, tls_ctx->rx.rec_seq, rxm->full_len, is_encrypted, is_decrypted);
- ctx->sw.decrypted |= is_decrypted; + tlm->decrypted |= is_decrypted;
if (unlikely(test_bit(TLS_RX_DEV_DEGRADED, &tls_ctx->flags))) { if (likely(is_encrypted || is_decrypted)) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 82d7c9b036bc7..0a6630bbef53e 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1560,9 +1560,10 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb, struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx); struct tls_prot_info *prot = &tls_ctx->prot_info; struct strp_msg *rxm = strp_msg(skb); + struct tls_msg *tlm = tls_msg(skb); int pad, err = 0;
- if (!ctx->decrypted) { + if (!tlm->decrypted) { if (tls_ctx->rx_conf == TLS_HW) { err = tls_device_decrypted(sk, tls_ctx, skb, rxm); if (err < 0) @@ -1570,7 +1571,7 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb, }
/* Still not decrypted after tls_device */ - if (!ctx->decrypted) { + if (!tlm->decrypted) { err = decrypt_internal(sk, skb, dest, NULL, chunk, zc, async); if (err < 0) { @@ -1594,7 +1595,7 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb, rxm->offset += prot->prepend_size; rxm->full_len -= prot->overhead_size; tls_advance_record_sn(sk, prot, &tls_ctx->rx); - ctx->decrypted = 1; + tlm->decrypted = 1; ctx->saved_data_ready(sk); } else { *zc = false; @@ -2137,8 +2138,9 @@ static void tls_queue(struct strparser *strp, struct sk_buff *skb) { struct tls_context *tls_ctx = tls_get_ctx(strp->sk); struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx); + struct tls_msg *tlm = tls_msg(skb);
- ctx->decrypted = 0; + tlm->decrypted = 0;
ctx->recv_pkt = skb; strp_pause(strp);
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 5dbda02d322db7762f1a0348117cde913fb46c13 ]
We inform the applications that data is available when the record is received. Decryption happens inline inside recvmsg or splice call. Generating another wakeup inside the decryption handler seems pointless as someone must be actively reading the socket if we are executing this code.
Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called") Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 0a6630bbef53e..5fdc4f5193ee5 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1557,7 +1557,6 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb, bool async) { struct tls_context *tls_ctx = tls_get_ctx(sk); - struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx); struct tls_prot_info *prot = &tls_ctx->prot_info; struct strp_msg *rxm = strp_msg(skb); struct tls_msg *tlm = tls_msg(skb); @@ -1596,7 +1595,6 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb, rxm->full_len -= prot->overhead_size; tls_advance_record_sn(sk, prot, &tls_ctx->rx); tlm->decrypted = 1; - ctx->saved_data_ready(sk); } else { *zc = false; }
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 3764ae5ba6615095de86698a00e814513b9ad0d5 ]
Use early return and a jump label to remove two indentation levels. No functional changes.
Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called") Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 66 ++++++++++++++++++++++++------------------------ 1 file changed, 33 insertions(+), 33 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 5fdc4f5193ee5..7da17dd7c38b9 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1560,46 +1560,46 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb, struct tls_prot_info *prot = &tls_ctx->prot_info; struct strp_msg *rxm = strp_msg(skb); struct tls_msg *tlm = tls_msg(skb); - int pad, err = 0; + int pad, err;
- if (!tlm->decrypted) { - if (tls_ctx->rx_conf == TLS_HW) { - err = tls_device_decrypted(sk, tls_ctx, skb, rxm); - if (err < 0) - return err; - } + if (tlm->decrypted) { + *zc = false; + return 0; + }
- /* Still not decrypted after tls_device */ - if (!tlm->decrypted) { - err = decrypt_internal(sk, skb, dest, NULL, chunk, zc, - async); - if (err < 0) { - if (err == -EINPROGRESS) - tls_advance_record_sn(sk, prot, - &tls_ctx->rx); - else if (err == -EBADMSG) - TLS_INC_STATS(sock_net(sk), - LINUX_MIB_TLSDECRYPTERROR); - return err; - } - } else { + if (tls_ctx->rx_conf == TLS_HW) { + err = tls_device_decrypted(sk, tls_ctx, skb, rxm); + if (err < 0) + return err; + + /* skip SW decryption if NIC handled it already */ + if (tlm->decrypted) { *zc = false; + goto decrypt_done; } + }
- pad = padding_length(prot, skb); - if (pad < 0) - return pad; - - rxm->full_len -= pad; - rxm->offset += prot->prepend_size; - rxm->full_len -= prot->overhead_size; - tls_advance_record_sn(sk, prot, &tls_ctx->rx); - tlm->decrypted = 1; - } else { - *zc = false; + err = decrypt_internal(sk, skb, dest, NULL, chunk, zc, async); + if (err < 0) { + if (err == -EINPROGRESS) + tls_advance_record_sn(sk, prot, &tls_ctx->rx); + else if (err == -EBADMSG) + TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSDECRYPTERROR); + return err; }
- return err; +decrypt_done: + pad = padding_length(prot, skb); + if (pad < 0) + return pad; + + rxm->full_len -= pad; + rxm->offset += prot->prepend_size; + rxm->full_len -= prot->overhead_size; + tls_advance_record_sn(sk, prot, &tls_ctx->rx); + tlm->decrypted = 1; + + return 0; }
int decrypt_skb(struct sock *sk, struct sk_buff *skb,
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 71471ca32505afa7c3f7f6a8268716e1ddb81cd4 ]
Instead of tls_device poking into internals of the message return 1 from tls_device_decrypted() if the device handled the decryption.
Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called") Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_device.c | 7 ++----- net/tls/tls_sw.c | 5 ++--- 2 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c index f23d18e666284..e7c361807590d 100644 --- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -936,7 +936,6 @@ int tls_device_decrypted(struct sock *sk, struct tls_context *tls_ctx, struct sk_buff *skb, struct strp_msg *rxm) { struct tls_offload_context_rx *ctx = tls_offload_ctx_rx(tls_ctx); - struct tls_msg *tlm = tls_msg(skb); int is_decrypted = skb->decrypted; int is_encrypted = !is_decrypted; struct sk_buff *skb_iter; @@ -951,11 +950,9 @@ int tls_device_decrypted(struct sock *sk, struct tls_context *tls_ctx, tls_ctx->rx.rec_seq, rxm->full_len, is_encrypted, is_decrypted);
- tlm->decrypted |= is_decrypted; - if (unlikely(test_bit(TLS_RX_DEV_DEGRADED, &tls_ctx->flags))) { if (likely(is_encrypted || is_decrypted)) - return 0; + return is_decrypted;
/* After tls_device_down disables the offload, the next SKB will * likely have initial fragments decrypted, and final ones not @@ -970,7 +967,7 @@ int tls_device_decrypted(struct sock *sk, struct tls_context *tls_ctx, */ if (is_decrypted) { ctx->resync_nh_reset = 1; - return 0; + return is_decrypted; } if (is_encrypted) { tls_device_core_ctrl_rx_resync(tls_ctx, ctx, sk, skb); diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 7da17dd7c38b9..eed32ef3ca4a0 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1571,9 +1571,8 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb, err = tls_device_decrypted(sk, tls_ctx, skb, rxm); if (err < 0) return err; - - /* skip SW decryption if NIC handled it already */ - if (tlm->decrypted) { + if (err > 0) { + tlm->decrypted = 1; *zc = false; goto decrypt_done; }
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit d4bd88e67666c73cfa9d75c282e708890d4f10a7 ]
sk is unused, remove it to make it clear the function doesn't poke at the socket.
size_used is always 0 on input and @length on success.
Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called") Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index eed32ef3ca4a0..cf09f147f5a09 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1348,15 +1348,14 @@ static struct sk_buff *tls_wait_data(struct sock *sk, struct sk_psock *psock, return skb; }
-static int tls_setup_from_iter(struct sock *sk, struct iov_iter *from, +static int tls_setup_from_iter(struct iov_iter *from, int length, int *pages_used, - unsigned int *size_used, struct scatterlist *to, int to_max_pages) { int rc = 0, i = 0, num_elem = *pages_used, maxpages; struct page *pages[MAX_SKB_FRAGS]; - unsigned int size = *size_used; + unsigned int size = 0; ssize_t copied, use; size_t offset;
@@ -1399,8 +1398,7 @@ static int tls_setup_from_iter(struct sock *sk, struct iov_iter *from, sg_mark_end(&to[num_elem - 1]); out: if (rc) - iov_iter_revert(from, size - *size_used); - *size_used = size; + iov_iter_revert(from, size); *pages_used = num_elem;
return rc; @@ -1519,12 +1517,12 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb, sg_init_table(sgout, n_sgout); sg_set_buf(&sgout[0], aad, prot->aad_size);
- *chunk = 0; - err = tls_setup_from_iter(sk, out_iov, data_len, - &pages, chunk, &sgout[1], + err = tls_setup_from_iter(out_iov, data_len, + &pages, &sgout[1], (n_sgout - 1)); if (err < 0) goto fallback_to_reg_recv; + *chunk = data_len; } else if (out_sg) { memcpy(sgout, out_sg, n_sgout * sizeof(*sgout)); } else {
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 9bdf75ccffa690237cd0b472cd598cf6d22873dc ]
We plumb pointer to chunk all the way to the decryption method. It's set to the length of the text when decrypt_skb_update() returns.
I think the code is written this way because original TLS implementation passed &chunk to zerocopy_from_iter() and this was carried forward as the code gotten more complex, without any refactoring.
The fix for peek() introduced a new variable - to_decrypt which for all practical purposes is what chunk is going to get set to. Spare ourselves the pointer passing, use to_decrypt.
Use this opportunity to clean things up a little further.
Note that chunk / to_decrypt was mostly needed for the async path, since the sync path would access rxm->full_len (decryption transforms full_len from record size to text size). Use the right source of truth more explicitly.
We have three cases: - async - it's TLS 1.2 only, so chunk == to_decrypt, but we need the min() because to_decrypt is a whole record and we don't want to underflow len. Note that we can't handle partial record by falling back to sync as it would introduce reordering against records in flight. - zc - again, TLS 1.2 only for now, so chunk == to_decrypt, we don't do zc if len < to_decrypt, no need to check again. - normal - it already handles chunk > len, we can factor out the assignment to rxm->full_len and share it with zc.
Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called") Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 33 ++++++++++++++------------------- 1 file changed, 14 insertions(+), 19 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index cf09f147f5a09..fc1fa98d21937 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1415,7 +1415,7 @@ static int tls_setup_from_iter(struct iov_iter *from, static int decrypt_internal(struct sock *sk, struct sk_buff *skb, struct iov_iter *out_iov, struct scatterlist *out_sg, - int *chunk, bool *zc, bool async) + bool *zc, bool async) { struct tls_context *tls_ctx = tls_get_ctx(sk); struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx); @@ -1522,7 +1522,6 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb, (n_sgout - 1)); if (err < 0) goto fallback_to_reg_recv; - *chunk = data_len; } else if (out_sg) { memcpy(sgout, out_sg, n_sgout * sizeof(*sgout)); } else { @@ -1532,7 +1531,6 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb, fallback_to_reg_recv: sgout = sgin; pages = 0; - *chunk = data_len; *zc = false; }
@@ -1551,8 +1549,7 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb, }
static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb, - struct iov_iter *dest, int *chunk, bool *zc, - bool async) + struct iov_iter *dest, bool *zc, bool async) { struct tls_context *tls_ctx = tls_get_ctx(sk); struct tls_prot_info *prot = &tls_ctx->prot_info; @@ -1576,7 +1573,7 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb, } }
- err = decrypt_internal(sk, skb, dest, NULL, chunk, zc, async); + err = decrypt_internal(sk, skb, dest, NULL, zc, async); if (err < 0) { if (err == -EINPROGRESS) tls_advance_record_sn(sk, prot, &tls_ctx->rx); @@ -1603,9 +1600,8 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb, struct scatterlist *sgout) { bool zc = true; - int chunk;
- return decrypt_internal(sk, skb, NULL, sgout, &chunk, &zc, false); + return decrypt_internal(sk, skb, NULL, sgout, &zc, false); }
static bool tls_sw_advance_skb(struct sock *sk, struct sk_buff *skb, @@ -1795,9 +1791,8 @@ int tls_sw_recvmsg(struct sock *sk, num_async = 0; while (len && (decrypted + copied < target || ctx->recv_pkt)) { bool retain_skb = false; + int to_decrypt, chunk; bool zc = false; - int to_decrypt; - int chunk = 0; bool async_capable; bool async = false;
@@ -1834,7 +1829,7 @@ int tls_sw_recvmsg(struct sock *sk, async_capable = false;
err = decrypt_skb_update(sk, skb, &msg->msg_iter, - &chunk, &zc, async_capable); + &zc, async_capable); if (err < 0 && err != -EINPROGRESS) { tls_err_abort(sk, -EBADMSG); goto recv_end; @@ -1872,8 +1867,13 @@ int tls_sw_recvmsg(struct sock *sk, } }
- if (async) + if (async) { + /* TLS 1.2-only, to_decrypt must be text length */ + chunk = min_t(int, to_decrypt, len); goto pick_next_record; + } + /* TLS 1.3 may have updated the length by more than overhead */ + chunk = rxm->full_len;
if (!zc) { if (bpf_strp_enabled) { @@ -1889,11 +1889,9 @@ int tls_sw_recvmsg(struct sock *sk, } }
- if (rxm->full_len > len) { + if (chunk > len) { retain_skb = true; chunk = len; - } else { - chunk = rxm->full_len; }
err = skb_copy_datagram_msg(skb, rxm->offset, @@ -1908,9 +1906,6 @@ int tls_sw_recvmsg(struct sock *sk, }
pick_next_record: - if (chunk > len) - chunk = len; - decrypted += chunk; len -= chunk;
@@ -2011,7 +2006,7 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos, if (!skb) goto splice_read_end;
- err = decrypt_skb_update(sk, skb, NULL, &chunk, &zc, false); + err = decrypt_skb_update(sk, skb, NULL, &zc, false); if (err < 0) { tls_err_abort(sk, -EBADMSG); goto splice_read_end;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 4175eac37123a68ebee71f288826339fb89bfec7 ]
We pass zc as a pointer to bool a few functions down as an in/out argument. This is error prone since C will happily evalue a pointer as a boolean (IOW forgetting *zc and writing zc leads to loss of developer time..). Wrap the arguments into a structure.
Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called") Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 49 ++++++++++++++++++++++++++---------------------- 1 file changed, 27 insertions(+), 22 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index fc1fa98d21937..c491cde30504e 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -44,6 +44,11 @@ #include <net/strparser.h> #include <net/tls.h>
+struct tls_decrypt_arg { + bool zc; + bool async; +}; + noinline void tls_err_abort(struct sock *sk, int err) { WARN_ON_ONCE(err >= 0); @@ -1415,7 +1420,7 @@ static int tls_setup_from_iter(struct iov_iter *from, static int decrypt_internal(struct sock *sk, struct sk_buff *skb, struct iov_iter *out_iov, struct scatterlist *out_sg, - bool *zc, bool async) + struct tls_decrypt_arg *darg) { struct tls_context *tls_ctx = tls_get_ctx(sk); struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx); @@ -1432,7 +1437,7 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb, prot->tail_size; int iv_offset = 0;
- if (*zc && (out_iov || out_sg)) { + if (darg->zc && (out_iov || out_sg)) { if (out_iov) n_sgout = iov_iter_npages(out_iov, INT_MAX) + 1; else @@ -1441,7 +1446,7 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb, rxm->full_len - prot->prepend_size); } else { n_sgout = 0; - *zc = false; + darg->zc = false; n_sgin = skb_cow_data(skb, 0, &unused); }
@@ -1531,12 +1536,12 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb, fallback_to_reg_recv: sgout = sgin; pages = 0; - *zc = false; + darg->zc = false; }
/* Prepare and submit AEAD request */ err = tls_do_decryption(sk, skb, sgin, sgout, iv, - data_len, aead_req, async); + data_len, aead_req, darg->async); if (err == -EINPROGRESS) return err;
@@ -1549,7 +1554,8 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb, }
static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb, - struct iov_iter *dest, bool *zc, bool async) + struct iov_iter *dest, + struct tls_decrypt_arg *darg) { struct tls_context *tls_ctx = tls_get_ctx(sk); struct tls_prot_info *prot = &tls_ctx->prot_info; @@ -1558,7 +1564,7 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb, int pad, err;
if (tlm->decrypted) { - *zc = false; + darg->zc = false; return 0; }
@@ -1568,12 +1574,12 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb, return err; if (err > 0) { tlm->decrypted = 1; - *zc = false; + darg->zc = false; goto decrypt_done; } }
- err = decrypt_internal(sk, skb, dest, NULL, zc, async); + err = decrypt_internal(sk, skb, dest, NULL, darg); if (err < 0) { if (err == -EINPROGRESS) tls_advance_record_sn(sk, prot, &tls_ctx->rx); @@ -1599,9 +1605,9 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb, int decrypt_skb(struct sock *sk, struct sk_buff *skb, struct scatterlist *sgout) { - bool zc = true; + struct tls_decrypt_arg darg = { .zc = true, };
- return decrypt_internal(sk, skb, NULL, sgout, &zc, false); + return decrypt_internal(sk, skb, NULL, sgout, &darg); }
static bool tls_sw_advance_skb(struct sock *sk, struct sk_buff *skb, @@ -1790,11 +1796,10 @@ int tls_sw_recvmsg(struct sock *sk, decrypted = 0; num_async = 0; while (len && (decrypted + copied < target || ctx->recv_pkt)) { + struct tls_decrypt_arg darg = {}; bool retain_skb = false; int to_decrypt, chunk; - bool zc = false; - bool async_capable; - bool async = false; + bool async;
skb = tls_wait_data(sk, psock, flags & MSG_DONTWAIT, timeo, &err); if (!skb) { @@ -1820,16 +1825,15 @@ int tls_sw_recvmsg(struct sock *sk, tlm->control == TLS_RECORD_TYPE_DATA && prot->version != TLS_1_3_VERSION && !bpf_strp_enabled) - zc = true; + darg.zc = true;
/* Do not use async mode if record is non-data */ if (tlm->control == TLS_RECORD_TYPE_DATA && !bpf_strp_enabled) - async_capable = ctx->async_capable; + darg.async = ctx->async_capable; else - async_capable = false; + darg.async = false;
- err = decrypt_skb_update(sk, skb, &msg->msg_iter, - &zc, async_capable); + err = decrypt_skb_update(sk, skb, &msg->msg_iter, &darg); if (err < 0 && err != -EINPROGRESS) { tls_err_abort(sk, -EBADMSG); goto recv_end; @@ -1875,7 +1879,7 @@ int tls_sw_recvmsg(struct sock *sk, /* TLS 1.3 may have updated the length by more than overhead */ chunk = rxm->full_len;
- if (!zc) { + if (!darg.zc) { if (bpf_strp_enabled) { err = sk_psock_tls_strp_read(psock, skb); if (err != __SK_PASS) { @@ -1991,7 +1995,6 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos, int err = 0; long timeo; int chunk; - bool zc = false;
lock_sock(sk);
@@ -2001,12 +2004,14 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos, if (from_queue) { skb = __skb_dequeue(&ctx->rx_list); } else { + struct tls_decrypt_arg darg = {}; + skb = tls_wait_data(sk, NULL, flags & SPLICE_F_NONBLOCK, timeo, &err); if (!skb) goto splice_read_end;
- err = decrypt_skb_update(sk, skb, NULL, &zc, false); + err = decrypt_skb_update(sk, skb, NULL, &darg); if (err < 0) { tls_err_abort(sk, -EBADMSG); goto splice_read_end;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 06554f4ffc2595ae52ee80aec4a13bd77d22bed7 ]
cmsg can be filled in during rx_list processing or normal receive. Consolidate the code.
We don't need to keep the boolean to track if the cmsg was created. 0 is an invalid content type.
Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called") Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 91 +++++++++++++++++++----------------------------- 1 file changed, 36 insertions(+), 55 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index c491cde30504e..ca71a9f559b37 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1634,6 +1634,29 @@ static bool tls_sw_advance_skb(struct sock *sk, struct sk_buff *skb, return true; }
+static int tls_record_content_type(struct msghdr *msg, struct tls_msg *tlm, + u8 *control) +{ + int err; + + if (!*control) { + *control = tlm->control; + if (!*control) + return -EBADMSG; + + err = put_cmsg(msg, SOL_TLS, TLS_GET_RECORD_TYPE, + sizeof(*control), control); + if (*control != TLS_RECORD_TYPE_DATA) { + if (err || msg->msg_flags & MSG_CTRUNC) + return -EIO; + } + } else if (*control != tlm->control) { + return 0; + } + + return 1; +} + /* This function traverses the rx_list in tls receive context to copies the * decrypted records into the buffer provided by caller zero copy is not * true. Further, the records are removed from the rx_list if it is not a peek @@ -1642,31 +1665,23 @@ static bool tls_sw_advance_skb(struct sock *sk, struct sk_buff *skb, static int process_rx_list(struct tls_sw_context_rx *ctx, struct msghdr *msg, u8 *control, - bool *cmsg, size_t skip, size_t len, bool zc, bool is_peek) { struct sk_buff *skb = skb_peek(&ctx->rx_list); - u8 ctrl = *control; - u8 msgc = *cmsg; struct tls_msg *tlm; ssize_t copied = 0; - - /* Set the record type in 'control' if caller didn't pass it */ - if (!ctrl && skb) { - tlm = tls_msg(skb); - ctrl = tlm->control; - } + int err;
while (skip && skb) { struct strp_msg *rxm = strp_msg(skb); tlm = tls_msg(skb);
- /* Cannot process a record of different type */ - if (ctrl != tlm->control) - return 0; + err = tls_record_content_type(msg, tlm, control); + if (err <= 0) + return err;
if (skip < rxm->full_len) break; @@ -1682,27 +1697,12 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
tlm = tls_msg(skb);
- /* Cannot process a record of different type */ - if (ctrl != tlm->control) - return 0; - - /* Set record type if not already done. For a non-data record, - * do not proceed if record type could not be copied. - */ - if (!msgc) { - int cerr = put_cmsg(msg, SOL_TLS, TLS_GET_RECORD_TYPE, - sizeof(ctrl), &ctrl); - msgc = true; - if (ctrl != TLS_RECORD_TYPE_DATA) { - if (cerr || msg->msg_flags & MSG_CTRUNC) - return -EIO; - - *cmsg = msgc; - } - } + err = tls_record_content_type(msg, tlm, control); + if (err <= 0) + return err;
if (!zc || (rxm->full_len - skip) > len) { - int err = skb_copy_datagram_msg(skb, rxm->offset + skip, + err = skb_copy_datagram_msg(skb, rxm->offset + skip, msg, chunk); if (err < 0) return err; @@ -1739,7 +1739,6 @@ static int process_rx_list(struct tls_sw_context_rx *ctx, skb = next_skb; }
- *control = ctrl; return copied; }
@@ -1761,7 +1760,6 @@ int tls_sw_recvmsg(struct sock *sk, struct tls_msg *tlm; struct sk_buff *skb; ssize_t copied = 0; - bool cmsg = false; int target, err = 0; long timeo; bool is_kvec = iov_iter_is_kvec(&msg->msg_iter); @@ -1778,8 +1776,7 @@ int tls_sw_recvmsg(struct sock *sk, bpf_strp_enabled = sk_psock_strp_enabled(psock);
/* Process pending decrypted records. It must be non-zero-copy */ - err = process_rx_list(ctx, msg, &control, &cmsg, 0, len, false, - is_peek); + err = process_rx_list(ctx, msg, &control, 0, len, false, is_peek); if (err < 0) { tls_err_abort(sk, err); goto end; @@ -1851,26 +1848,10 @@ int tls_sw_recvmsg(struct sock *sk, * is known just after record is dequeued from stream parser. * For tls1.3, we disable async. */ - - if (!control) - control = tlm->control; - else if (control != tlm->control) + err = tls_record_content_type(msg, tlm, &control); + if (err <= 0) goto recv_end;
- if (!cmsg) { - int cerr; - - cerr = put_cmsg(msg, SOL_TLS, TLS_GET_RECORD_TYPE, - sizeof(control), &control); - cmsg = true; - if (control != TLS_RECORD_TYPE_DATA) { - if (cerr || msg->msg_flags & MSG_CTRUNC) { - err = -EIO; - goto recv_end; - } - } - } - if (async) { /* TLS 1.2-only, to_decrypt must be text length */ chunk = min_t(int, to_decrypt, len); @@ -1959,10 +1940,10 @@ int tls_sw_recvmsg(struct sock *sk,
/* Drain records from the rx_list & copy if required */ if (is_peek || is_kvec) - err = process_rx_list(ctx, msg, &control, &cmsg, copied, + err = process_rx_list(ctx, msg, &control, copied, decrypted, false, is_peek); else - err = process_rx_list(ctx, msg, &control, &cmsg, 0, + err = process_rx_list(ctx, msg, &control, 0, decrypted, true, is_peek); if (err < 0) { tls_err_abort(sk, err);
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 7da18bcc5e4cfd14ea520367546c5697e64ae592 ]
We track both if the last record was handled by async crypto and how many records were async. This is not necessary. We implicitly assume once crypto goes async it will stay that way, otherwise we'd reorder records. So just track if we're in async mode, the exact number of records is not necessary.
This change also forces us into "async" mode more consistently in case crypto ever decided to interleave async and sync.
Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called") Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index ca71a9f559b37..d3bbae9af9f41 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1753,13 +1753,13 @@ int tls_sw_recvmsg(struct sock *sk, struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx); struct tls_prot_info *prot = &tls_ctx->prot_info; struct sk_psock *psock; - int num_async, pending; unsigned char control = 0; ssize_t decrypted = 0; struct strp_msg *rxm; struct tls_msg *tlm; struct sk_buff *skb; ssize_t copied = 0; + bool async = false; int target, err = 0; long timeo; bool is_kvec = iov_iter_is_kvec(&msg->msg_iter); @@ -1791,12 +1791,10 @@ int tls_sw_recvmsg(struct sock *sk, timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
decrypted = 0; - num_async = 0; while (len && (decrypted + copied < target || ctx->recv_pkt)) { struct tls_decrypt_arg darg = {}; bool retain_skb = false; int to_decrypt, chunk; - bool async;
skb = tls_wait_data(sk, psock, flags & MSG_DONTWAIT, timeo, &err); if (!skb) { @@ -1836,10 +1834,8 @@ int tls_sw_recvmsg(struct sock *sk, goto recv_end; }
- if (err == -EINPROGRESS) { + if (err == -EINPROGRESS) async = true; - num_async++; - }
/* If the type of records being processed is not known yet, * set it to record type just dequeued. If it is already known, @@ -1914,7 +1910,9 @@ int tls_sw_recvmsg(struct sock *sk, }
recv_end: - if (num_async) { + if (async) { + int pending; + /* Wait for all previously submitted records to be decrypted */ spin_lock_bh(&ctx->decrypt_compl_lock); ctx->async_notify = true;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 284b4d93daee56dff3e10029ddf2e03227f50dbf ]
Move counting TlsDecryptErrors to tls_do_decryption() where differences between sync and async crypto are reconciled.
No functional changes, this code just always gave me a pause.
Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called") Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index d3bbae9af9f41..85fa49170b4e5 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -274,6 +274,8 @@ static int tls_do_decryption(struct sock *sk,
ret = crypto_wait_req(ret, &ctx->async_wait); } + if (ret == -EBADMSG) + TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSDECRYPTERROR);
if (async) atomic_dec(&ctx->decrypt_pending); @@ -1583,8 +1585,6 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb, if (err < 0) { if (err == -EINPROGRESS) tls_advance_record_sn(sk, prot, &tls_ctx->rx); - else if (err == -EBADMSG) - TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSDECRYPTERROR); return err; }
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 1c699ffa48a15710746989c36a82cbfb07e8d17f ]
If crypto didn't always invoke our callback for async we'd not be clearing skb->sk and would crash in the skb core when freeing it. This if must be dead code.
Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called") Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 85fa49170b4e5..27ac27daec868 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -277,9 +277,6 @@ static int tls_do_decryption(struct sock *sk, if (ret == -EBADMSG) TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSDECRYPTERROR);
- if (async) - atomic_dec(&ctx->decrypt_pending); - return ret; }
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 3547a1f9d988d88ecff4fc365d2773037c849f49 ]
Propagating EINPROGRESS thru multiple layers of functions is error prone. Use darg->async as an in/out argument, like we use darg->zc today. On input it tells the code if async is allowed, on output if it took place.
Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called") Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 31 ++++++++++++++++--------------- 1 file changed, 16 insertions(+), 15 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 27ac27daec868..a1a99f9f093b1 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -236,7 +236,7 @@ static int tls_do_decryption(struct sock *sk, char *iv_recv, size_t data_len, struct aead_request *aead_req, - bool async) + struct tls_decrypt_arg *darg) { struct tls_context *tls_ctx = tls_get_ctx(sk); struct tls_prot_info *prot = &tls_ctx->prot_info; @@ -249,7 +249,7 @@ static int tls_do_decryption(struct sock *sk, data_len + prot->tag_size, (u8 *)iv_recv);
- if (async) { + if (darg->async) { /* Using skb->sk to push sk through to crypto async callback * handler. This allows propagating errors up to the socket * if needed. It _must_ be cleared in the async handler @@ -269,11 +269,13 @@ static int tls_do_decryption(struct sock *sk,
ret = crypto_aead_decrypt(aead_req); if (ret == -EINPROGRESS) { - if (async) - return ret; + if (darg->async) + return 0;
ret = crypto_wait_req(ret, &ctx->async_wait); } + darg->async = false; + if (ret == -EBADMSG) TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSDECRYPTERROR);
@@ -1540,9 +1542,9 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
/* Prepare and submit AEAD request */ err = tls_do_decryption(sk, skb, sgin, sgout, iv, - data_len, aead_req, darg->async); - if (err == -EINPROGRESS) - return err; + data_len, aead_req, darg); + if (darg->async) + return 0;
/* Release the pages in case iov was mapped to pages */ for (; pages > 0; pages--) @@ -1579,11 +1581,10 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb, }
err = decrypt_internal(sk, skb, dest, NULL, darg); - if (err < 0) { - if (err == -EINPROGRESS) - tls_advance_record_sn(sk, prot, &tls_ctx->rx); + if (err < 0) return err; - } + if (darg->async) + goto decrypt_next;
decrypt_done: pad = padding_length(prot, skb); @@ -1593,8 +1594,9 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb, rxm->full_len -= pad; rxm->offset += prot->prepend_size; rxm->full_len -= prot->overhead_size; - tls_advance_record_sn(sk, prot, &tls_ctx->rx); tlm->decrypted = 1; +decrypt_next: + tls_advance_record_sn(sk, prot, &tls_ctx->rx);
return 0; } @@ -1826,13 +1828,12 @@ int tls_sw_recvmsg(struct sock *sk, darg.async = false;
err = decrypt_skb_update(sk, skb, &msg->msg_iter, &darg); - if (err < 0 && err != -EINPROGRESS) { + if (err < 0) { tls_err_abort(sk, -EBADMSG); goto recv_end; }
- if (err == -EINPROGRESS) - async = true; + async |= darg.async;
/* If the type of records being processed is not known yet, * set it to record type just dequeued. If it is already known,
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit f7fa16d49837f947ee59492958f9e6f0e51d9a78 ]
With mixed sync/async decryption, or failures of crypto_aead_decrypt, we increment decrypt_pending but we never do the corresponding decrement since tls_decrypt_done will not be called. In this case, we should decrement decrypt_pending immediately to avoid getting stuck.
For example, the prequeue prequeue test gets stuck with mixed modes (one async decrypt + one sync decrypt).
Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records") Signed-off-by: Sabrina Dubroca sd@queasysnail.net Link: https://lore.kernel.org/r/c56d5fc35543891d5319f834f25622360e1bfbec.170913264... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index a1a99f9f093b1..83319a3b8bdd1 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -273,6 +273,8 @@ static int tls_do_decryption(struct sock *sk, return 0;
ret = crypto_wait_req(ret, &ctx->async_wait); + } else if (darg->async) { + atomic_dec(&ctx->decrypt_pending); } darg->async = false;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit fccfa646ef3628097d59f7d9c1a3e84d4b6bb45e ]
gcc-14 notices that the allocation with sizeof(void) on 32-bit architectures is not enough for a 64-bit phys_addr_t:
drivers/firmware/efi/capsule-loader.c: In function 'efi_capsule_open': drivers/firmware/efi/capsule-loader.c:295:24: error: allocation of insufficient size '4' for type 'phys_addr_t' {aka 'long long unsigned int'} with size '8' [-Werror=alloc-size] 295 | cap_info->phys = kzalloc(sizeof(void *), GFP_KERNEL); | ^
Use the correct type instead here.
Fixes: f24c4d478013 ("efi/capsule-loader: Reinstate virtual capsule mapping") Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/capsule-loader.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/capsule-loader.c b/drivers/firmware/efi/capsule-loader.c index 3e8d4b51a8140..97bafb5f70389 100644 --- a/drivers/firmware/efi/capsule-loader.c +++ b/drivers/firmware/efi/capsule-loader.c @@ -292,7 +292,7 @@ static int efi_capsule_open(struct inode *inode, struct file *file) return -ENOMEM; }
- cap_info->phys = kzalloc(sizeof(void *), GFP_KERNEL); + cap_info->phys = kzalloc(sizeof(phys_addr_t), GFP_KERNEL); if (!cap_info->phys) { kfree(cap_info->pages); kfree(cap_info);
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit 2df70149e73e79783bcbc7db4fa51ecef0e2022c ]
The bq27xxx i2c-client may not have an IRQ, in which case client->irq will be 0. bq27xxx_battery_i2c_probe() already has an if (client->irq) check wrapping the request_threaded_irq().
But bq27xxx_battery_i2c_remove() unconditionally calls free_irq(client->irq) leading to:
[ 190.310742] ------------[ cut here ]------------ [ 190.310843] Trying to free already-free IRQ 0 [ 190.310861] WARNING: CPU: 2 PID: 1304 at kernel/irq/manage.c:1893 free_irq+0x1b8/0x310
Followed by a backtrace when unbinding the driver. Add an if (client->irq) to bq27xxx_battery_i2c_remove() mirroring probe() to fix this.
Fixes: 444ff00734f3 ("power: supply: bq27xxx: Fix I2C IRQ race on remove") Signed-off-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20240215155133.70537-1-hdegoede@redhat.com Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/power/supply/bq27xxx_battery_i2c.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/power/supply/bq27xxx_battery_i2c.c b/drivers/power/supply/bq27xxx_battery_i2c.c index b722ee2d7e142..4e5d773b3bf8d 100644 --- a/drivers/power/supply/bq27xxx_battery_i2c.c +++ b/drivers/power/supply/bq27xxx_battery_i2c.c @@ -209,7 +209,9 @@ static int bq27xxx_battery_i2c_remove(struct i2c_client *client) { struct bq27xxx_device_info *di = i2c_get_clientdata(client);
- free_irq(client->irq, di); + if (client->irq) + free_irq(client->irq, di); + bq27xxx_battery_teardown(di);
mutex_lock(&battery_mutex);
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
[ Upstream commit 4df49712eb54141be00a9312547436d55677f092 ]
We forgot to remove the line for snd-rtctimer from Makefile while dropping the functionality. Get rid of the stale line.
Fixes: 34ce71a96dcb ("ALSA: timer: remove legacy rtctimer") Link: https://lore.kernel.org/r/20240221092156.28695-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/core/Makefile | 1 - 1 file changed, 1 deletion(-)
diff --git a/sound/core/Makefile b/sound/core/Makefile index 79e1407cd0de7..7da92e0383e1c 100644 --- a/sound/core/Makefile +++ b/sound/core/Makefile @@ -33,7 +33,6 @@ snd-ctl-led-objs := control_led.o snd-rawmidi-objs := rawmidi.o snd-timer-objs := timer.o snd-hrtimer-objs := hrtimer.o -snd-rtctimer-objs := rtctimer.o snd-hwdep-objs := hwdep.o snd-seq-device-objs := seq_device.o
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Slaby (SUSE) jirislaby@kernel.org
[ Upstream commit 00d6a284fcf3fad1b7e1b5bc3cd87cbfb60ce03f ]
Commit a5a923038d70 (fbdev: fbcon: Properly revert changes when vc_resize() failed) started restoring old font data upon failure (of vc_resize()). But it performs so only for user fonts. It means that the "system"/internal fonts are not restored at all. So in result, the very first call to fbcon_do_set_font() performs no restore at all upon failing vc_resize().
This can be reproduced by Syzkaller to crash the system on the next invocation of font_get(). It's rather hard to hit the allocation failure in vc_resize() on the first font_set(), but not impossible. Esp. if fault injection is used to aid the execution/failure. It was demonstrated by Sirius: BUG: unable to handle page fault for address: fffffffffffffff8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD cb7b067 P4D cb7b067 PUD cb7d067 PMD 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 8007 Comm: poc Not tainted 6.7.0-g9d1694dc91ce #20 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:fbcon_get_font+0x229/0x800 drivers/video/fbdev/core/fbcon.c:2286 Call Trace: <TASK> con_font_get drivers/tty/vt/vt.c:4558 [inline] con_font_op+0x1fc/0xf20 drivers/tty/vt/vt.c:4673 vt_k_ioctl drivers/tty/vt/vt_ioctl.c:474 [inline] vt_ioctl+0x632/0x2ec0 drivers/tty/vt/vt_ioctl.c:752 tty_ioctl+0x6f8/0x1570 drivers/tty/tty_io.c:2803 vfs_ioctl fs/ioctl.c:51 [inline] ...
So restore the font data in any case, not only for user fonts. Note the later 'if' is now protected by 'old_userfont' and not 'old_data' as the latter is always set now. (And it is supposed to be non-NULL. Otherwise we would see the bug above again.)
Signed-off-by: Jiri Slaby (SUSE) jirislaby@kernel.org Fixes: a5a923038d70 ("fbdev: fbcon: Properly revert changes when vc_resize() failed") Reported-and-tested-by: Ubisectech Sirius bugreport@ubisectech.com Cc: Ubisectech Sirius bugreport@ubisectech.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Helge Deller deller@gmx.de Cc: linux-fbdev@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch Link: https://patchwork.freedesktop.org/patch/msgid/20240208114411.14604-1-jirisla... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/video/fbdev/core/fbcon.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c index b6712655ec1f0..b163b54b868e6 100644 --- a/drivers/video/fbdev/core/fbcon.c +++ b/drivers/video/fbdev/core/fbcon.c @@ -2409,11 +2409,9 @@ static int fbcon_do_set_font(struct vc_data *vc, int w, int h, int charcount, struct fbcon_ops *ops = info->fbcon_par; struct fbcon_display *p = &fb_display[vc->vc_num]; int resize, ret, old_userfont, old_width, old_height, old_charcount; - char *old_data = NULL; + u8 *old_data = vc->vc_font.data;
resize = (w != vc->vc_font.width) || (h != vc->vc_font.height); - if (p->userfont) - old_data = vc->vc_font.data; vc->vc_font.data = (void *)(p->fontdata = data); old_userfont = p->userfont; if ((p->userfont = userfont)) @@ -2447,13 +2445,13 @@ static int fbcon_do_set_font(struct vc_data *vc, int w, int h, int charcount, update_screen(vc); }
- if (old_data && (--REFCOUNT(old_data) == 0)) + if (old_userfont && (--REFCOUNT(old_data) == 0)) kfree(old_data - FONT_EXTRA_WORDS * sizeof(int)); return 0;
err_out: p->fontdata = old_data; - vc->vc_font.data = (void *)old_data; + vc->vc_font.data = old_data;
if (userfont) { p->userfont = old_userfont;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells dhowells@redhat.com
[ Upstream commit 5f7a07646655fb4108da527565dcdc80124b14c4 ]
If a directory has a block with only ".__afsXXXX" files in it (from uncompleted silly-rename), these .__afsXXXX files are skipped but without advancing the file position in the dir_context. This leads to afs_dir_iterate() repeating the block again and again.
Fix this by making the code that skips the .__afsXXXX file also manually advance the file position.
The symptoms are a soft lookup:
watchdog: BUG: soft lockup - CPU#3 stuck for 52s! [check:5737] ... RIP: 0010:afs_dir_iterate_block+0x39/0x1fd ... ? watchdog_timer_fn+0x1a6/0x213 ... ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? afs_dir_iterate_block+0x39/0x1fd afs_dir_iterate+0x10a/0x148 afs_readdir+0x30/0x4a iterate_dir+0x93/0xd3 __do_sys_getdents64+0x6b/0xd4
This is almost certainly the actual fix for:
https://bugzilla.kernel.org/show_bug.cgi?id=218496
Fixes: 57e9d49c5452 ("afs: Hide silly-rename files from userspace") Signed-off-by: David Howells dhowells@redhat.com Link: https://lore.kernel.org/r/786185.1708694102@warthog.procyon.org.uk Reviewed-by: Marc Dionne marc.dionne@auristor.com cc: Marc Dionne marc.dionne@auristor.com cc: Markus Suvanto markus.suvanto@gmail.com cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/afs/dir.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/afs/dir.c b/fs/afs/dir.c index 106426de50279..c4e22e9f7a666 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -497,8 +497,10 @@ static int afs_dir_iterate_block(struct afs_vnode *dvnode, dire->u.name[0] == '.' && ctx->actor != afs_lookup_filldir && ctx->actor != afs_lookup_one_filldir && - memcmp(dire->u.name, ".__afs", 6) == 0) + memcmp(dire->u.name, ".__afs", 6) == 0) { + ctx->pos = blkoff + next * sizeof(union afs_xdr_dirent); continue; + }
/* found the next entry */ if (!dir_emit(ctx, dire->u.name, nlen,
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dimitris Vlachos dvlachos@ics.forth.gr
[ Upstream commit a11dd49dcb9376776193e15641f84fcc1e5980c9 ]
Offset vmemmap so that the first page of vmemmap will be mapped to the first page of physical memory in order to ensure that vmemmap’s bounds will be respected during pfn_to_page()/page_to_pfn() operations. The conversion macros will produce correct SV39/48/57 addresses for every possible/valid DRAM_BASE inside the physical memory limits.
v2:Address Alex's comments
Suggested-by: Alexandre Ghiti alexghiti@rivosinc.com Signed-off-by: Dimitris Vlachos dvlachos@ics.forth.gr Reported-by: Dimitris Vlachos dvlachos@ics.forth.gr Closes: https://lore.kernel.org/linux-riscv/20240202135030.42265-1-csd4492@csd.uoc.g... Fixes: d95f1a542c3d ("RISC-V: Implement sparsemem") Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Link: https://lore.kernel.org/r/20240229191723.32779-1-dvlachos@ics.forth.gr Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/include/asm/pgtable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index 397cb945b16eb..9a3d9b68f2ff4 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -58,7 +58,7 @@ * Define vmemmap for pfn_to_page & page_to_pfn calls. Needed if kernel * is configured with CONFIG_SPARSEMEM_VMEMMAP enabled. */ -#define vmemmap ((struct page *)VMEMMAP_START) +#define vmemmap ((struct page *)VMEMMAP_START - (phys_ram_base >> PAGE_SHIFT))
#define PCI_IO_SIZE SZ_16M #define PCI_IO_END VMEMMAP_START
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit 2f03fc340cac9ea1dc63cbf8c93dd2eb0f227815 upstream.
Since tomoyo_write_control() updates head->write_buf when write() of long lines is requested, we need to fetch head->write_buf after head->io_sem is held. Otherwise, concurrent write() requests can cause use-after-free-write and double-free problems.
Reported-by: Sam Sun samsun1006219@gmail.com Closes: https://lkml.kernel.org/r/CAEkJfYNDspuGxYx5kym8Lvp--D36CMDUErg4rxfWFJuPbbji8... Fixes: bd03a3e4c9a9 ("TOMOYO: Add policy namespace support.") Cc: stable@vger.kernel.org # Linux 3.1+ Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/tomoyo/common.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/security/tomoyo/common.c +++ b/security/tomoyo/common.c @@ -2657,13 +2657,14 @@ ssize_t tomoyo_write_control(struct tomo { int error = buffer_len; size_t avail_len = buffer_len; - char *cp0 = head->write_buf; + char *cp0; int idx;
if (!head->write) return -EINVAL; if (mutex_lock_interruptible(&head->io_sem)) return -EINTR; + cp0 = head->write_buf; head->read_user_buf_avail = 0; idx = tomoyo_read_lock(); /* Read a line and dispatch it to the policy handler. */
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Sakamoto o-takashi@sakamocchi.jp
commit 77ce96543b03f437c6b45f286d8110db2b6622a3 upstream.
The local helper function to compare the given pair of cycle count evaluates them. If the left value is less than the right value, the function returns negative value.
If the safe cycle is less than the current cycle, it is the case of cycle lost. However, it is not currently handled properly.
This commit fixes the bug.
Cc: stable@vger.kernel.org Fixes: 705794c53b00 ("ALSA: firewire-lib: check cycle continuity") Signed-off-by: Takashi Sakamoto o-takashi@sakamocchi.jp Link: https://lore.kernel.org/r/20240218033026.72577-1-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/firewire/amdtp-stream.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/firewire/amdtp-stream.c +++ b/sound/firewire/amdtp-stream.c @@ -934,7 +934,7 @@ static int generate_device_pkt_descs(str // to the reason. unsigned int safe_cycle = increment_ohci_cycle_count(next_cycle, IR_JUMBO_PAYLOAD_MAX_SKIP_CYCLES); - lost = (compare_ohci_cycle_count(safe_cycle, cycle) > 0); + lost = (compare_ohci_cycle_count(safe_cycle, cycle) < 0); } if (lost) { dev_err(&s->unit->device, "Detect discontinuity of cycle: %d %d\n",
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Ofitserov oficerovas@altlinux.org
commit 616d82c3cfa2a2146dd7e3ae47bda7e877ee549e upstream.
The gtp_link_ops operations structure for the subsystem must be registered after registering the gtp_net_ops pernet operations structure.
Syzkaller hit 'general protection fault in gtp_genl_dump_pdp' bug:
[ 1010.702740] gtp: GTP module unloaded [ 1010.715877] general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] SMP KASAN NOPTI [ 1010.715888] KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] [ 1010.715895] CPU: 1 PID: 128616 Comm: a.out Not tainted 6.8.0-rc6-std-def-alt1 #1 [ 1010.715899] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-alt1 04/01/2014 [ 1010.715908] RIP: 0010:gtp_newlink+0x4d7/0x9c0 [gtp] [ 1010.715915] Code: 80 3c 02 00 0f 85 41 04 00 00 48 8b bb d8 05 00 00 e8 ed f6 ff ff 48 89 c2 48 89 c5 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 4f 04 00 00 4c 89 e2 4c 8b 6d 00 48 b8 00 00 00 [ 1010.715920] RSP: 0018:ffff888020fbf180 EFLAGS: 00010203 [ 1010.715929] RAX: dffffc0000000000 RBX: ffff88800399c000 RCX: 0000000000000000 [ 1010.715933] RDX: 0000000000000001 RSI: ffffffff84805280 RDI: 0000000000000282 [ 1010.715938] RBP: 000000000000000d R08: 0000000000000001 R09: 0000000000000000 [ 1010.715942] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88800399cc80 [ 1010.715947] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000400 [ 1010.715953] FS: 00007fd1509ab5c0(0000) GS:ffff88805b300000(0000) knlGS:0000000000000000 [ 1010.715958] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1010.715962] CR2: 0000000000000000 CR3: 000000001c07a000 CR4: 0000000000750ee0 [ 1010.715968] PKRU: 55555554 [ 1010.715972] Call Trace: [ 1010.715985] ? __die_body.cold+0x1a/0x1f [ 1010.715995] ? die_addr+0x43/0x70 [ 1010.716002] ? exc_general_protection+0x199/0x2f0 [ 1010.716016] ? asm_exc_general_protection+0x1e/0x30 [ 1010.716026] ? gtp_newlink+0x4d7/0x9c0 [gtp] [ 1010.716034] ? gtp_net_exit+0x150/0x150 [gtp] [ 1010.716042] __rtnl_newlink+0x1063/0x1700 [ 1010.716051] ? rtnl_setlink+0x3c0/0x3c0 [ 1010.716063] ? is_bpf_text_address+0xc0/0x1f0 [ 1010.716070] ? kernel_text_address.part.0+0xbb/0xd0 [ 1010.716076] ? __kernel_text_address+0x56/0xa0 [ 1010.716084] ? unwind_get_return_address+0x5a/0xa0 [ 1010.716091] ? create_prof_cpu_mask+0x30/0x30 [ 1010.716098] ? arch_stack_walk+0x9e/0xf0 [ 1010.716106] ? stack_trace_save+0x91/0xd0 [ 1010.716113] ? stack_trace_consume_entry+0x170/0x170 [ 1010.716121] ? __lock_acquire+0x15c5/0x5380 [ 1010.716139] ? mark_held_locks+0x9e/0xe0 [ 1010.716148] ? kmem_cache_alloc_trace+0x35f/0x3c0 [ 1010.716155] ? __rtnl_newlink+0x1700/0x1700 [ 1010.716160] rtnl_newlink+0x69/0xa0 [ 1010.716166] rtnetlink_rcv_msg+0x43b/0xc50 [ 1010.716172] ? rtnl_fdb_dump+0x9f0/0x9f0 [ 1010.716179] ? lock_acquire+0x1fe/0x560 [ 1010.716188] ? netlink_deliver_tap+0x12f/0xd50 [ 1010.716196] netlink_rcv_skb+0x14d/0x440 [ 1010.716202] ? rtnl_fdb_dump+0x9f0/0x9f0 [ 1010.716208] ? netlink_ack+0xab0/0xab0 [ 1010.716213] ? netlink_deliver_tap+0x202/0xd50 [ 1010.716220] ? netlink_deliver_tap+0x218/0xd50 [ 1010.716226] ? __virt_addr_valid+0x30b/0x590 [ 1010.716233] netlink_unicast+0x54b/0x800 [ 1010.716240] ? netlink_attachskb+0x870/0x870 [ 1010.716248] ? __check_object_size+0x2de/0x3b0 [ 1010.716254] netlink_sendmsg+0x938/0xe40 [ 1010.716261] ? netlink_unicast+0x800/0x800 [ 1010.716269] ? __import_iovec+0x292/0x510 [ 1010.716276] ? netlink_unicast+0x800/0x800 [ 1010.716284] __sock_sendmsg+0x159/0x190 [ 1010.716290] ____sys_sendmsg+0x712/0x880 [ 1010.716297] ? sock_write_iter+0x3d0/0x3d0 [ 1010.716304] ? __ia32_sys_recvmmsg+0x270/0x270 [ 1010.716309] ? lock_acquire+0x1fe/0x560 [ 1010.716315] ? drain_array_locked+0x90/0x90 [ 1010.716324] ___sys_sendmsg+0xf8/0x170 [ 1010.716331] ? sendmsg_copy_msghdr+0x170/0x170 [ 1010.716337] ? lockdep_init_map_type+0x2c7/0x860 [ 1010.716343] ? lockdep_hardirqs_on_prepare+0x430/0x430 [ 1010.716350] ? debug_mutex_init+0x33/0x70 [ 1010.716360] ? percpu_counter_add_batch+0x8b/0x140 [ 1010.716367] ? lock_acquire+0x1fe/0x560 [ 1010.716373] ? find_held_lock+0x2c/0x110 [ 1010.716384] ? __fd_install+0x1b6/0x6f0 [ 1010.716389] ? lock_downgrade+0x810/0x810 [ 1010.716396] ? __fget_light+0x222/0x290 [ 1010.716403] __sys_sendmsg+0xea/0x1b0 [ 1010.716409] ? __sys_sendmsg_sock+0x40/0x40 [ 1010.716419] ? lockdep_hardirqs_on_prepare+0x2b3/0x430 [ 1010.716425] ? syscall_enter_from_user_mode+0x1d/0x60 [ 1010.716432] do_syscall_64+0x30/0x40 [ 1010.716438] entry_SYSCALL_64_after_hwframe+0x62/0xc7 [ 1010.716444] RIP: 0033:0x7fd1508cbd49 [ 1010.716452] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ef 70 0d 00 f7 d8 64 89 01 48 [ 1010.716456] RSP: 002b:00007fff18872348 EFLAGS: 00000202 ORIG_RAX: 000000000000002e [ 1010.716463] RAX: ffffffffffffffda RBX: 000055f72bf0eac0 RCX: 00007fd1508cbd49 [ 1010.716468] RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000006 [ 1010.716473] RBP: 00007fff18872360 R08: 00007fff18872360 R09: 00007fff18872360 [ 1010.716478] R10: 00007fff18872360 R11: 0000000000000202 R12: 000055f72bf0e1b0 [ 1010.716482] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 1010.716491] Modules linked in: gtp(+) udp_tunnel ib_core uinput af_packet rfkill qrtr joydev hid_generic usbhid hid kvm_intel iTCO_wdt intel_pmc_bxt iTCO_vendor_support kvm snd_hda_codec_generic ledtrig_audio irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_intel nls_utf8 snd_intel_dspcfg nls_cp866 psmouse aesni_intel vfat crypto_simd fat cryptd glue_helper snd_hda_codec pcspkr snd_hda_core i2c_i801 snd_hwdep i2c_smbus xhci_pci snd_pcm lpc_ich xhci_pci_renesas xhci_hcd qemu_fw_cfg tiny_power_button button sch_fq_codel vboxvideo drm_vram_helper drm_ttm_helper ttm vboxsf vboxguest snd_seq_midi snd_seq_midi_event snd_seq snd_rawmidi snd_seq_device snd_timer snd soundcore msr fuse efi_pstore dm_mod ip_tables x_tables autofs4 virtio_gpu virtio_dma_buf drm_kms_helper cec rc_core drm virtio_rng virtio_scsi rng_core virtio_balloon virtio_blk virtio_net virtio_console net_failover failover ahci libahci libata evdev scsi_mod input_leds serio_raw virtio_pci intel_agp [ 1010.716674] virtio_ring intel_gtt virtio [last unloaded: gtp] [ 1010.716693] ---[ end trace 04990a4ce61e174b ]---
Cc: stable@vger.kernel.org Signed-off-by: Alexander Ofitserov oficerovas@altlinux.org Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Reviewed-by: Jiri Pirko jiri@nvidia.com Link: https://lore.kernel.org/r/20240228114703.465107-1-oficerovas@altlinux.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/gtp.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
--- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -1422,26 +1422,26 @@ static int __init gtp_init(void)
get_random_bytes(>p_h_initval, sizeof(gtp_h_initval));
- err = rtnl_link_register(>p_link_ops); + err = register_pernet_subsys(>p_net_ops); if (err < 0) goto error_out;
- err = register_pernet_subsys(>p_net_ops); + err = rtnl_link_register(>p_link_ops); if (err < 0) - goto unreg_rtnl_link; + goto unreg_pernet_subsys;
err = genl_register_family(>p_genl_family); if (err < 0) - goto unreg_pernet_subsys; + goto unreg_rtnl_link;
pr_info("GTP module loaded (pdp ctx size %zd bytes)\n", sizeof(struct pdp_ctx)); return 0;
-unreg_pernet_subsys: - unregister_pernet_subsys(>p_net_ops); unreg_rtnl_link: rtnl_link_unregister(>p_link_ops); +unreg_pernet_subsys: + unregister_pernet_subsys(>p_net_ops); error_out: pr_err("error loading GTP module loaded\n"); return err;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
commit f78c1375339a291cba492a70eaf12ec501d28a8e upstream.
It's currently possible to change the mesh ID when the interface isn't yet in mesh mode, at the same time as changing it into mesh mode. This leads to an overwrite of data in the wdev->u union for the interface type it currently has, causing cfg80211_change_iface() to do wrong things when switching.
We could probably allow setting an interface to mesh while setting the mesh ID at the same time by doing a different order of operations here, but realistically there's no userspace that's going to do this, so just disallow changes in iftype when setting mesh ID.
Cc: stable@vger.kernel.org Fixes: 29cbe68c516a ("cfg80211/mac80211: add mesh join/leave commands") Reported-by: syzbot+dd4779978217b1973180@syzkaller.appspotmail.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/wireless/nl80211.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -3914,6 +3914,8 @@ static int nl80211_set_interface(struct
if (ntype != NL80211_IFTYPE_MESH_POINT) return -EINVAL; + if (otype != NL80211_IFTYPE_MESH_POINT) + return -EINVAL; if (netif_running(dev)) return -EBUSY;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Sterba dsterba@suse.com
commit 9845664b9ee47ce7ee7ea93caf47d39a9d4552c4 upstream.
There's a syzbot report that device name buffers passed to device replace are not properly checked for string termination which could lead to a read out of bounds in getname_kernel().
Add a helper that validates both source and target device name buffers. For devid as the source initialize the buffer to empty string in case something tries to read it later.
This was originally analyzed and fixed in a different way by Edward Adam Davis (see links).
Link: https://lore.kernel.org/linux-btrfs/000000000000d1a1d1060cc9c5e7@google.com/ Link: https://lore.kernel.org/linux-btrfs/tencent_44CA0665C9836EF9EEC80CB9E7E206DF... CC: stable@vger.kernel.org # 4.19+ CC: Edward Adam Davis eadavis@qq.com Reported-and-tested-by: syzbot+33f23b49ac24f986c9e8@syzkaller.appspotmail.com Reviewed-by: Boris Burkov boris@bur.io Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/dev-replace.c | 24 ++++++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-)
--- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -763,6 +763,23 @@ leave: return ret; }
+static int btrfs_check_replace_dev_names(struct btrfs_ioctl_dev_replace_args *args) +{ + if (args->start.srcdevid == 0) { + if (memchr(args->start.srcdev_name, 0, + sizeof(args->start.srcdev_name)) == NULL) + return -ENAMETOOLONG; + } else { + args->start.srcdev_name[0] = 0; + } + + if (memchr(args->start.tgtdev_name, 0, + sizeof(args->start.tgtdev_name)) == NULL) + return -ENAMETOOLONG; + + return 0; +} + int btrfs_dev_replace_by_ioctl(struct btrfs_fs_info *fs_info, struct btrfs_ioctl_dev_replace_args *args) { @@ -775,10 +792,9 @@ int btrfs_dev_replace_by_ioctl(struct bt default: return -EINVAL; } - - if ((args->start.srcdevid == 0 && args->start.srcdev_name[0] == '\0') || - args->start.tgtdev_name[0] == '\0') - return -EINVAL; + ret = btrfs_check_replace_dev_names(args); + if (ret < 0) + return ret;
ret = btrfs_dev_replace_start(fs_info, args->start.tgtdev_name, args->start.srcdevid,
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peng Ma peng.ma@nxp.com
commit 9d739bccf261dd93ec1babf82f5c5d71dd4caa3e upstream.
There is chip (ls1028a) errata:
The SoC may hang on 16 byte unaligned read transactions by QDMA.
Unaligned read transactions initiated by QDMA may stall in the NOC (Network On-Chip), causing a deadlock condition. Stalled transactions will trigger completion timeouts in PCIe controller.
Workaround: Enable prefetch by setting the source descriptor prefetchable bit ( SD[PF] = 1 ).
Implement this workaround.
Cc: stable@vger.kernel.org Fixes: b092529e0aa0 ("dmaengine: fsl-qdma: Add qDMA controller driver for Layerscape SoCs") Signed-off-by: Peng Ma peng.ma@nxp.com Signed-off-by: Frank Li Frank.Li@nxp.com Link: https://lore.kernel.org/r/20240201215007.439503-1-Frank.Li@nxp.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/fsl-qdma.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/dma/fsl-qdma.c +++ b/drivers/dma/fsl-qdma.c @@ -109,6 +109,7 @@ #define FSL_QDMA_CMD_WTHROTL_OFFSET 20 #define FSL_QDMA_CMD_DSEN_OFFSET 19 #define FSL_QDMA_CMD_LWC_OFFSET 16 +#define FSL_QDMA_CMD_PF BIT(17)
/* Field definition for Descriptor status */ #define QDMA_CCDF_STATUS_RTE BIT(5) @@ -384,7 +385,8 @@ static void fsl_qdma_comp_fill_memcpy(st qdma_csgf_set_f(csgf_dest, len); /* Descriptor Buffer */ cmd = cpu_to_le32(FSL_QDMA_CMD_RWTTYPE << - FSL_QDMA_CMD_RWTTYPE_OFFSET); + FSL_QDMA_CMD_RWTTYPE_OFFSET) | + FSL_QDMA_CMD_PF; sdf->data = QDMA_SDDF_CMD(cmd);
cmd = cpu_to_le32(FSL_QDMA_CMD_RWTTYPE <<
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tadeusz Struk tstruk@gigaio.com
commit df2515a17914ecfc2a0594509deaf7fcb8d191ac upstream.
The PTDMA driver sets DMA masks in two different places for the same device inconsistently. First call is in pt_pci_probe(), where it uses 48bit mask. The second call is in pt_dmaengine_register(), where it uses a 64bit mask. Using 64bit dma mask causes IO_PAGE_FAULT errors on DMA transfers between main memory and other devices. Without the extra call it works fine. Additionally the second call doesn't check the return value so it can silently fail. Remove the superfluous dma_set_mask() call and only use 48bit mask.
Cc: stable@vger.kernel.org Fixes: b0b4a6b10577 ("dmaengine: ptdma: register PTDMA controller as a DMA resource") Reviewed-by: Basavaraj Natikar Basavaraj.Natikar@amd.com Signed-off-by: Tadeusz Struk tstruk@gigaio.com Link: https://lore.kernel.org/r/20240222163053.13842-1-tstruk@gigaio.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/ptdma/ptdma-dmaengine.c | 2 -- 1 file changed, 2 deletions(-)
--- a/drivers/dma/ptdma/ptdma-dmaengine.c +++ b/drivers/dma/ptdma/ptdma-dmaengine.c @@ -361,8 +361,6 @@ int pt_dmaengine_register(struct pt_devi chan->vc.desc_free = pt_do_cleanup; vchan_init(&chan->vc, dma_dev);
- dma_set_mask_and_coherent(pt->dev, DMA_BIT_MASK(64)); - ret = dma_async_device_register(dma_dev); if (ret) goto err_reg;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Curtis Klein curtis.klein@hpe.com
commit 87a39071e0b639f45e05d296cc0538eef44ec0bd upstream.
Initialize the qDMA irqs after the registers are configured so that interrupts that may have been pending from a primary kernel don't get processed by the irq handler before it is ready to and cause panic with the following trace:
Call trace: fsl_qdma_queue_handler+0xf8/0x3e8 __handle_irq_event_percpu+0x78/0x2b0 handle_irq_event_percpu+0x1c/0x68 handle_irq_event+0x44/0x78 handle_fasteoi_irq+0xc8/0x178 generic_handle_irq+0x24/0x38 __handle_domain_irq+0x90/0x100 gic_handle_irq+0x5c/0xb8 el1_irq+0xb8/0x180 _raw_spin_unlock_irqrestore+0x14/0x40 __setup_irq+0x4bc/0x798 request_threaded_irq+0xd8/0x190 devm_request_threaded_irq+0x74/0xe8 fsl_qdma_probe+0x4d4/0xca8 platform_drv_probe+0x50/0xa0 really_probe+0xe0/0x3f8 driver_probe_device+0x64/0x130 device_driver_attach+0x6c/0x78 __driver_attach+0xbc/0x158 bus_for_each_dev+0x5c/0x98 driver_attach+0x20/0x28 bus_add_driver+0x158/0x220 driver_register+0x60/0x110 __platform_driver_register+0x44/0x50 fsl_qdma_driver_init+0x18/0x20 do_one_initcall+0x48/0x258 kernel_init_freeable+0x1a4/0x23c kernel_init+0x10/0xf8 ret_from_fork+0x10/0x18
Cc: stable@vger.kernel.org Fixes: b092529e0aa0 ("dmaengine: fsl-qdma: Add qDMA controller driver for Layerscape SoCs") Signed-off-by: Curtis Klein curtis.klein@hpe.com Signed-off-by: Yi Zhao yi.zhao@nxp.com Signed-off-by: Frank Li Frank.Li@nxp.com Link: https://lore.kernel.org/r/20240201220406.440145-1-Frank.Li@nxp.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/fsl-qdma.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-)
--- a/drivers/dma/fsl-qdma.c +++ b/drivers/dma/fsl-qdma.c @@ -1203,10 +1203,6 @@ static int fsl_qdma_probe(struct platfor if (!fsl_qdma->queue) return -ENOMEM;
- ret = fsl_qdma_irq_init(pdev, fsl_qdma); - if (ret) - return ret; - fsl_qdma->irq_base = platform_get_irq_byname(pdev, "qdma-queue0"); if (fsl_qdma->irq_base < 0) return fsl_qdma->irq_base; @@ -1245,16 +1241,19 @@ static int fsl_qdma_probe(struct platfor
platform_set_drvdata(pdev, fsl_qdma);
- ret = dma_async_device_register(&fsl_qdma->dma_dev); + ret = fsl_qdma_reg_init(fsl_qdma); if (ret) { - dev_err(&pdev->dev, - "Can't register NXP Layerscape qDMA engine.\n"); + dev_err(&pdev->dev, "Can't Initialize the qDMA engine.\n"); return ret; }
- ret = fsl_qdma_reg_init(fsl_qdma); + ret = fsl_qdma_irq_init(pdev, fsl_qdma); + if (ret) + return ret; + + ret = dma_async_device_register(&fsl_qdma->dma_dev); if (ret) { - dev_err(&pdev->dev, "Can't Initialize the qDMA engine.\n"); + dev_err(&pdev->dev, "Can't register NXP Layerscape qDMA engine.\n"); return ret; }
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Semenov ivan@semenov.dev
commit ff3206d2186d84e4f77e1378ba1d225633f17b9b upstream.
Initializing an eMMC that's connected via a 1-bit bus is current failing, if the HW (DT) informs that 4-bit bus is supported. In fact this is a regression, as we were earlier capable of falling back to 1-bit mode, when switching to 4/8-bit bus failed. Therefore, let's restore the behaviour.
Log for Samsung eMMC 5.1 chip connected via 1bit bus (only D0 pin) Before patch: [134509.044225] mmc0: switch to bus width 4 failed [134509.044509] mmc0: new high speed MMC card at address 0001 [134509.054594] mmcblk0: mmc0:0001 BGUF4R 29.1 GiB [134509.281602] mmc0: switch to bus width 4 failed [134509.282638] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [134509.282657] Buffer I/O error on dev mmcblk0, logical block 0, async page read [134509.284598] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [134509.284602] Buffer I/O error on dev mmcblk0, logical block 0, async page read [134509.284609] ldm_validate_partition_table(): Disk read failed. [134509.286495] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [134509.286500] Buffer I/O error on dev mmcblk0, logical block 0, async page read [134509.288303] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [134509.288308] Buffer I/O error on dev mmcblk0, logical block 0, async page read [134509.289540] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [134509.289544] Buffer I/O error on dev mmcblk0, logical block 0, async page read [134509.289553] mmcblk0: unable to read partition table [134509.289728] mmcblk0boot0: mmc0:0001 BGUF4R 31.9 MiB [134509.290283] mmcblk0boot1: mmc0:0001 BGUF4R 31.9 MiB [134509.294577] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2 [134509.295835] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [134509.295841] Buffer I/O error on dev mmcblk0, logical block 0, async page read
After patch:
[134551.089613] mmc0: switch to bus width 4 failed [134551.090377] mmc0: new high speed MMC card at address 0001 [134551.102271] mmcblk0: mmc0:0001 BGUF4R 29.1 GiB [134551.113365] mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20 p21 [134551.114262] mmcblk0boot0: mmc0:0001 BGUF4R 31.9 MiB [134551.114925] mmcblk0boot1: mmc0:0001 BGUF4R 31.9 MiB
Fixes: 577fb13199b1 ("mmc: rework selection of bus speed mode") Cc: stable@vger.kernel.org Signed-off-by: Ivan Semenov ivan@semenov.dev Link: https://lore.kernel.org/r/20240206172845.34316-1-ivan@semenov.dev Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/core/mmc.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/mmc/core/mmc.c +++ b/drivers/mmc/core/mmc.c @@ -1000,10 +1000,12 @@ static int mmc_select_bus_width(struct m static unsigned ext_csd_bits[] = { EXT_CSD_BUS_WIDTH_8, EXT_CSD_BUS_WIDTH_4, + EXT_CSD_BUS_WIDTH_1, }; static unsigned bus_widths[] = { MMC_BUS_WIDTH_8, MMC_BUS_WIDTH_4, + MMC_BUS_WIDTH_1, }; struct mmc_host *host = card->host; unsigned idx, bus_width = 0;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Elad Nachman enachman@marvell.com
commit 09e23823ae9a3e2d5d20f2e1efe0d6e48cef9129 upstream.
AC5X spec says PHY init complete bit must be polled until zero. We see cases in which timeout can take longer than the standard calculation on AC5X, which is expected following the spec comment above. According to the spec, we must wait as long as it takes for that bit to toggle on AC5X. Cap that with 100 delay loops so we won't get stuck forever.
Fixes: 06c8b667ff5b ("mmc: sdhci-xenon: Add support to PHYs of Marvell Xenon SDHC") Acked-by: Adrian Hunter adrian.hunter@intel.com Cc: stable@vger.kernel.org Signed-off-by: Elad Nachman enachman@marvell.com Link: https://lore.kernel.org/r/20240222191714.1216470-3-enachman@marvell.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/sdhci-xenon-phy.c | 29 ++++++++++++++++++++--------- 1 file changed, 20 insertions(+), 9 deletions(-)
--- a/drivers/mmc/host/sdhci-xenon-phy.c +++ b/drivers/mmc/host/sdhci-xenon-phy.c @@ -109,6 +109,8 @@ #define XENON_EMMC_PHY_LOGIC_TIMING_ADJUST (XENON_EMMC_PHY_REG_BASE + 0x18) #define XENON_LOGIC_TIMING_VALUE 0x00AA8977
+#define XENON_MAX_PHY_TIMEOUT_LOOPS 100 + /* * List offset of PHY registers and some special register values * in eMMC PHY 5.0 or eMMC PHY 5.1 @@ -259,18 +261,27 @@ static int xenon_emmc_phy_init(struct sd /* get the wait time */ wait /= clock; wait++; - /* wait for host eMMC PHY init completes */ - udelay(wait);
- reg = sdhci_readl(host, phy_regs->timing_adj); - reg &= XENON_PHY_INITIALIZAION; - if (reg) { + /* + * AC5X spec says bit must be polled until zero. + * We see cases in which timeout can take longer + * than the standard calculation on AC5X, which is + * expected following the spec comment above. + * According to the spec, we must wait as long as + * it takes for that bit to toggle on AC5X. + * Cap that with 100 delay loops so we won't get + * stuck here forever: + */ + + ret = read_poll_timeout(sdhci_readl, reg, + !(reg & XENON_PHY_INITIALIZAION), + wait, XENON_MAX_PHY_TIMEOUT_LOOPS * wait, + false, host, phy_regs->timing_adj); + if (ret) dev_err(mmc_dev(host->mmc), "eMMC PHY init cannot complete after %d us\n", - wait); - return -ETIMEDOUT; - } + wait * XENON_MAX_PHY_TIMEOUT_LOOPS);
- return 0; + return ret; }
#define ARMADA_3700_SOC_PAD_1_8V 0x1
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Elad Nachman enachman@marvell.com
commit 8e9f25a290ae0016353c9ea13314c95fb3207812 upstream.
Each time SD/mmc phy is initialized, at times, in some of the attempts, phy fails to completes its initialization which results into timeout error. Per the HW spec, it is a pre-requisite to ensure a stable SD clock before a phy initialization is attempted.
Fixes: 06c8b667ff5b ("mmc: sdhci-xenon: Add support to PHYs of Marvell Xenon SDHC") Acked-by: Adrian Hunter adrian.hunter@intel.com Cc: stable@vger.kernel.org Signed-off-by: Elad Nachman enachman@marvell.com Link: https://lore.kernel.org/r/20240222200930.1277665-1-enachman@marvell.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/sdhci-xenon-phy.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
--- a/drivers/mmc/host/sdhci-xenon-phy.c +++ b/drivers/mmc/host/sdhci-xenon-phy.c @@ -11,6 +11,7 @@ #include <linux/slab.h> #include <linux/delay.h> #include <linux/ktime.h> +#include <linux/iopoll.h> #include <linux/of_address.h>
#include "sdhci-pltfm.h" @@ -218,6 +219,19 @@ static int xenon_alloc_emmc_phy(struct s return 0; }
+static int xenon_check_stability_internal_clk(struct sdhci_host *host) +{ + u32 reg; + int err; + + err = read_poll_timeout(sdhci_readw, reg, reg & SDHCI_CLOCK_INT_STABLE, + 1100, 20000, false, host, SDHCI_CLOCK_CONTROL); + if (err) + dev_err(mmc_dev(host->mmc), "phy_init: Internal clock never stabilized.\n"); + + return err; +} + /* * eMMC 5.0/5.1 PHY init/re-init. * eMMC PHY init should be executed after: @@ -234,6 +248,11 @@ static int xenon_emmc_phy_init(struct sd struct xenon_priv *priv = sdhci_pltfm_priv(pltfm_host); struct xenon_emmc_phy_regs *phy_regs = priv->emmc_phy_regs;
+ int ret = xenon_check_stability_internal_clk(host); + + if (ret) + return ret; + reg = sdhci_readl(host, phy_regs->timing_adj); reg |= XENON_PHY_INITIALIZAION; sdhci_writel(host, reg, phy_regs->timing_adj);
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zong Li zong.li@sifive.com
commit 680341382da56bd192ebfa4e58eaf4fec2e5bca7 upstream.
CALLER_ADDRx returns caller's address at specified level, they are used for several tracers. These macros eventually use __builtin_return_address(n) to get the caller's address if arch doesn't define their own implementation.
In RISC-V, __builtin_return_address(n) only works when n == 0, we need to walk the stack frame to get the caller's address at specified level.
data.level started from 'level + 3' due to the call flow of getting caller's address in RISC-V implementation. If we don't have additional three iteration, the level is corresponding to follows:
callsite -> return_address -> arch_stack_walk -> walk_stackframe | | | | level 3 level 2 level 1 level 0
Fixes: 10626c32e382 ("riscv/ftrace: Add basic support") Cc: stable@vger.kernel.org Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Signed-off-by: Zong Li zong.li@sifive.com Link: https://lore.kernel.org/r/20240202015102.26251-1-zong.li@sifive.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/include/asm/ftrace.h | 5 +++ arch/riscv/kernel/Makefile | 2 + arch/riscv/kernel/return_address.c | 48 +++++++++++++++++++++++++++++++++++++ 3 files changed, 55 insertions(+) create mode 100644 arch/riscv/kernel/return_address.c
--- a/arch/riscv/include/asm/ftrace.h +++ b/arch/riscv/include/asm/ftrace.h @@ -25,6 +25,11 @@
#define ARCH_SUPPORTS_FTRACE_OPS 1 #ifndef __ASSEMBLY__ + +extern void *return_address(unsigned int level); + +#define ftrace_return_address(n) return_address(n) + void MCOUNT_NAME(void); static inline unsigned long ftrace_call_adjust(unsigned long addr) { --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -7,6 +7,7 @@ ifdef CONFIG_FTRACE CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_patch.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_sbi.o = $(CC_FLAGS_FTRACE) +CFLAGS_REMOVE_return_address.o = $(CC_FLAGS_FTRACE) endif CFLAGS_syscall_table.o += $(call cc-option,-Wno-override-init,)
@@ -25,6 +26,7 @@ obj-y += irq.o obj-y += process.o obj-y += ptrace.o obj-y += reset.o +obj-y += return_address.o obj-y += setup.o obj-y += signal.o obj-y += syscall_table.o --- /dev/null +++ b/arch/riscv/kernel/return_address.c @@ -0,0 +1,48 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * This code come from arch/arm64/kernel/return_address.c + * + * Copyright (C) 2023 SiFive. + */ + +#include <linux/export.h> +#include <linux/kprobes.h> +#include <linux/stacktrace.h> + +struct return_address_data { + unsigned int level; + void *addr; +}; + +static bool save_return_addr(void *d, unsigned long pc) +{ + struct return_address_data *data = d; + + if (!data->level) { + data->addr = (void *)pc; + return false; + } + + --data->level; + + return true; +} +NOKPROBE_SYMBOL(save_return_addr); + +noinline void *return_address(unsigned int level) +{ + struct return_address_data data; + + data.level = level + 3; + data.addr = NULL; + + arch_stack_walk(save_return_addr, &data, current, NULL); + + if (!data.level) + return data.addr; + else + return NULL; + +} +EXPORT_SYMBOL_GPL(return_address); +NOKPROBE_SYMBOL(return_address);
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bjorn Andersson quic_bjorande@quicinc.com
commit 2a93c6cbd5a703d44c414a3c3945a87ce11430ba upstream.
Commit 'e3e56c050ab6 ("soc: qcom: rpmhpd: Make power_on actually enable the domain")' aimed to make sure that a power-domain that is being enabled without any particular performance-state requested will at least turn the rail on, to avoid filling DeviceTree with otherwise unnecessary required-opps properties.
But in the event that aggregation happens on a disabled power-domain, with an enabled peer without performance-state, both the local and peer corner are 0. The peer's enabled_corner is not considered, with the result that the underlying (shared) resource is disabled.
One case where this can be observed is when the display stack keeps mmcx enabled (but without a particular performance-state vote) in order to access registers and sync_state happens in the rpmhpd driver. As mmcx_ao is flushed the state of the peer (mmcx) is not considered and mmcx_ao ends up turning off "mmcx.lvl" underneath mmcx. This has been observed several times, but has been painted over in DeviceTree by adding an explicit vote for the lowest non-disabled performance-state.
Fixes: e3e56c050ab6 ("soc: qcom: rpmhpd: Make power_on actually enable the domain") Reported-by: Johan Hovold johan@kernel.org Closes: https://lore.kernel.org/linux-arm-msm/ZdMwZa98L23mu3u6@hovoldconsulting.com/ Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Tested-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Reviewed-by: Stephen Boyd swboyd@chromium.org Tested-by: Johan Hovold johan+linaro@kernel.org Link: https://lore.kernel.org/r/20240226-rpmhpd-enable-corner-fix-v1-1-68c004cec48... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/soc/qcom/rpmhpd.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/drivers/soc/qcom/rpmhpd.c +++ b/drivers/soc/qcom/rpmhpd.c @@ -351,12 +351,15 @@ static int rpmhpd_aggregate_corner(struc unsigned int active_corner, sleep_corner; unsigned int this_active_corner = 0, this_sleep_corner = 0; unsigned int peer_active_corner = 0, peer_sleep_corner = 0; + unsigned int peer_enabled_corner;
to_active_sleep(pd, corner, &this_active_corner, &this_sleep_corner);
- if (peer && peer->enabled) - to_active_sleep(peer, peer->corner, &peer_active_corner, + if (peer && peer->enabled) { + peer_enabled_corner = max(peer->corner, peer->enable_corner); + to_active_sleep(peer, peer_enabled_corner, &peer_active_corner, &peer_sleep_corner); + }
active_corner = max(this_active_corner, peer_active_corner);
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Bonzini pbonzini@redhat.com
commit 6890cb1ace350b4386c8aee1343dc3b3ddd214da upstream.
MKTME repurposes the high bit of physical address to key id for encryption key and, even though MAXPHYADDR in CPUID[0x80000008] remains the same, the valid bits in the MTRR mask register are based on the reduced number of physical address bits.
detect_tme() in arch/x86/kernel/cpu/intel.c detects TME and subtracts it from the total usable physical bits, but it is called too late. Move the call to early_init_intel() so that it is called in setup_arch(), before MTRRs are setup.
This fixes boot on TDX-enabled systems, which until now only worked with "disable_mtrr_cleanup". Without the patch, the values written to the MTRRs mask registers were 52-bit wide (e.g. 0x000fffff_80000800) and the writes failed; with the patch, the values are 46-bit wide, which matches the reduced MAXPHYADDR that is shown in /proc/cpuinfo.
Reported-by: Zixi Chen zixchen@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20240131230902.1867092-3-pbonzini%40redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/intel.c | 178 ++++++++++++++++++++++---------------------- 1 file changed, 91 insertions(+), 87 deletions(-)
--- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -181,6 +181,90 @@ static bool bad_spectre_microcode(struct return false; }
+#define MSR_IA32_TME_ACTIVATE 0x982 + +/* Helpers to access TME_ACTIVATE MSR */ +#define TME_ACTIVATE_LOCKED(x) (x & 0x1) +#define TME_ACTIVATE_ENABLED(x) (x & 0x2) + +#define TME_ACTIVATE_POLICY(x) ((x >> 4) & 0xf) /* Bits 7:4 */ +#define TME_ACTIVATE_POLICY_AES_XTS_128 0 + +#define TME_ACTIVATE_KEYID_BITS(x) ((x >> 32) & 0xf) /* Bits 35:32 */ + +#define TME_ACTIVATE_CRYPTO_ALGS(x) ((x >> 48) & 0xffff) /* Bits 63:48 */ +#define TME_ACTIVATE_CRYPTO_AES_XTS_128 1 + +/* Values for mktme_status (SW only construct) */ +#define MKTME_ENABLED 0 +#define MKTME_DISABLED 1 +#define MKTME_UNINITIALIZED 2 +static int mktme_status = MKTME_UNINITIALIZED; + +static void detect_tme_early(struct cpuinfo_x86 *c) +{ + u64 tme_activate, tme_policy, tme_crypto_algs; + int keyid_bits = 0, nr_keyids = 0; + static u64 tme_activate_cpu0 = 0; + + rdmsrl(MSR_IA32_TME_ACTIVATE, tme_activate); + + if (mktme_status != MKTME_UNINITIALIZED) { + if (tme_activate != tme_activate_cpu0) { + /* Broken BIOS? */ + pr_err_once("x86/tme: configuration is inconsistent between CPUs\n"); + pr_err_once("x86/tme: MKTME is not usable\n"); + mktme_status = MKTME_DISABLED; + + /* Proceed. We may need to exclude bits from x86_phys_bits. */ + } + } else { + tme_activate_cpu0 = tme_activate; + } + + if (!TME_ACTIVATE_LOCKED(tme_activate) || !TME_ACTIVATE_ENABLED(tme_activate)) { + pr_info_once("x86/tme: not enabled by BIOS\n"); + mktme_status = MKTME_DISABLED; + return; + } + + if (mktme_status != MKTME_UNINITIALIZED) + goto detect_keyid_bits; + + pr_info("x86/tme: enabled by BIOS\n"); + + tme_policy = TME_ACTIVATE_POLICY(tme_activate); + if (tme_policy != TME_ACTIVATE_POLICY_AES_XTS_128) + pr_warn("x86/tme: Unknown policy is active: %#llx\n", tme_policy); + + tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate); + if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_AES_XTS_128)) { + pr_err("x86/mktme: No known encryption algorithm is supported: %#llx\n", + tme_crypto_algs); + mktme_status = MKTME_DISABLED; + } +detect_keyid_bits: + keyid_bits = TME_ACTIVATE_KEYID_BITS(tme_activate); + nr_keyids = (1UL << keyid_bits) - 1; + if (nr_keyids) { + pr_info_once("x86/mktme: enabled by BIOS\n"); + pr_info_once("x86/mktme: %d KeyIDs available\n", nr_keyids); + } else { + pr_info_once("x86/mktme: disabled by BIOS\n"); + } + + if (mktme_status == MKTME_UNINITIALIZED) { + /* MKTME is usable */ + mktme_status = MKTME_ENABLED; + } + + /* + * KeyID bits effectively lower the number of physical address + * bits. Update cpuinfo_x86::x86_phys_bits accordingly. + */ + c->x86_phys_bits -= keyid_bits; +} + static void early_init_intel(struct cpuinfo_x86 *c) { u64 misc_enable; @@ -332,6 +416,13 @@ static void early_init_intel(struct cpui */ if (detect_extended_topology_early(c) < 0) detect_ht_early(c); + + /* + * Adjust the number of physical bits early because it affects the + * valid bits of the MTRR mask registers. + */ + if (cpu_has(c, X86_FEATURE_TME)) + detect_tme_early(c); }
static void bsp_init_intel(struct cpuinfo_x86 *c) @@ -492,90 +583,6 @@ static void srat_detect_node(struct cpui #endif }
-#define MSR_IA32_TME_ACTIVATE 0x982 - -/* Helpers to access TME_ACTIVATE MSR */ -#define TME_ACTIVATE_LOCKED(x) (x & 0x1) -#define TME_ACTIVATE_ENABLED(x) (x & 0x2) - -#define TME_ACTIVATE_POLICY(x) ((x >> 4) & 0xf) /* Bits 7:4 */ -#define TME_ACTIVATE_POLICY_AES_XTS_128 0 - -#define TME_ACTIVATE_KEYID_BITS(x) ((x >> 32) & 0xf) /* Bits 35:32 */ - -#define TME_ACTIVATE_CRYPTO_ALGS(x) ((x >> 48) & 0xffff) /* Bits 63:48 */ -#define TME_ACTIVATE_CRYPTO_AES_XTS_128 1 - -/* Values for mktme_status (SW only construct) */ -#define MKTME_ENABLED 0 -#define MKTME_DISABLED 1 -#define MKTME_UNINITIALIZED 2 -static int mktme_status = MKTME_UNINITIALIZED; - -static void detect_tme(struct cpuinfo_x86 *c) -{ - u64 tme_activate, tme_policy, tme_crypto_algs; - int keyid_bits = 0, nr_keyids = 0; - static u64 tme_activate_cpu0 = 0; - - rdmsrl(MSR_IA32_TME_ACTIVATE, tme_activate); - - if (mktme_status != MKTME_UNINITIALIZED) { - if (tme_activate != tme_activate_cpu0) { - /* Broken BIOS? */ - pr_err_once("x86/tme: configuration is inconsistent between CPUs\n"); - pr_err_once("x86/tme: MKTME is not usable\n"); - mktme_status = MKTME_DISABLED; - - /* Proceed. We may need to exclude bits from x86_phys_bits. */ - } - } else { - tme_activate_cpu0 = tme_activate; - } - - if (!TME_ACTIVATE_LOCKED(tme_activate) || !TME_ACTIVATE_ENABLED(tme_activate)) { - pr_info_once("x86/tme: not enabled by BIOS\n"); - mktme_status = MKTME_DISABLED; - return; - } - - if (mktme_status != MKTME_UNINITIALIZED) - goto detect_keyid_bits; - - pr_info("x86/tme: enabled by BIOS\n"); - - tme_policy = TME_ACTIVATE_POLICY(tme_activate); - if (tme_policy != TME_ACTIVATE_POLICY_AES_XTS_128) - pr_warn("x86/tme: Unknown policy is active: %#llx\n", tme_policy); - - tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate); - if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_AES_XTS_128)) { - pr_err("x86/mktme: No known encryption algorithm is supported: %#llx\n", - tme_crypto_algs); - mktme_status = MKTME_DISABLED; - } -detect_keyid_bits: - keyid_bits = TME_ACTIVATE_KEYID_BITS(tme_activate); - nr_keyids = (1UL << keyid_bits) - 1; - if (nr_keyids) { - pr_info_once("x86/mktme: enabled by BIOS\n"); - pr_info_once("x86/mktme: %d KeyIDs available\n", nr_keyids); - } else { - pr_info_once("x86/mktme: disabled by BIOS\n"); - } - - if (mktme_status == MKTME_UNINITIALIZED) { - /* MKTME is usable */ - mktme_status = MKTME_ENABLED; - } - - /* - * KeyID bits effectively lower the number of physical address - * bits. Update cpuinfo_x86::x86_phys_bits accordingly. - */ - c->x86_phys_bits -= keyid_bits; -} - static void init_cpuid_fault(struct cpuinfo_x86 *c) { u64 msr; @@ -712,9 +719,6 @@ static void init_intel(struct cpuinfo_x8
init_ia32_feat_ctl(c);
- if (cpu_has(c, X86_FEATURE_TME)) - detect_tme(c); - init_intel_misc_features(c);
split_lock_init();
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit d5fbeff1ab812b6c473b6924bee8748469462e2c upstream.
This will simplify the next patch ("mptcp: process pending subflow error on close").
No functional change intended.
Cc: stable@vger.kernel.org # v5.12+ Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 36 ++++++++++++++++++++++++++++++++++++ net/mptcp/subflow.c | 36 ------------------------------------ 2 files changed, 36 insertions(+), 36 deletions(-)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -688,6 +688,42 @@ static bool __mptcp_ofo_queue(struct mpt return moved; }
+void __mptcp_error_report(struct sock *sk) +{ + struct mptcp_subflow_context *subflow; + struct mptcp_sock *msk = mptcp_sk(sk); + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + int err = sock_error(ssk); + int ssk_state; + + if (!err) + continue; + + /* only propagate errors on fallen-back sockets or + * on MPC connect + */ + if (sk->sk_state != TCP_SYN_SENT && !__mptcp_check_fallback(msk)) + continue; + + /* We need to propagate only transition to CLOSE state. + * Orphaned socket will see such state change via + * subflow_sched_work_if_closed() and that path will properly + * destroy the msk as needed. + */ + ssk_state = inet_sk_state_load(ssk); + if (ssk_state == TCP_CLOSE && !sock_flag(sk, SOCK_DEAD)) + inet_sk_state_store(sk, ssk_state); + WRITE_ONCE(sk->sk_err, -err); + + /* This barrier is coupled with smp_rmb() in mptcp_poll() */ + smp_wmb(); + sk_error_report(sk); + break; + } +} + /* In most cases we will be able to lock the mptcp socket. If its already * owned, we need to defer to the work queue to avoid ABBA deadlock. */ --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -1269,42 +1269,6 @@ void mptcp_space(const struct sock *ssk, *full_space = tcp_full_space(sk); }
-void __mptcp_error_report(struct sock *sk) -{ - struct mptcp_subflow_context *subflow; - struct mptcp_sock *msk = mptcp_sk(sk); - - mptcp_for_each_subflow(msk, subflow) { - struct sock *ssk = mptcp_subflow_tcp_sock(subflow); - int err = sock_error(ssk); - int ssk_state; - - if (!err) - continue; - - /* only propagate errors on fallen-back sockets or - * on MPC connect - */ - if (sk->sk_state != TCP_SYN_SENT && !__mptcp_check_fallback(msk)) - continue; - - /* We need to propagate only transition to CLOSE state. - * Orphaned socket will see such state change via - * subflow_sched_work_if_closed() and that path will properly - * destroy the msk as needed. - */ - ssk_state = inet_sk_state_load(ssk); - if (ssk_state == TCP_CLOSE && !sock_flag(sk, SOCK_DEAD)) - inet_sk_state_store(sk, ssk_state); - sk->sk_err = -err; - - /* This barrier is coupled with smp_rmb() in mptcp_poll() */ - smp_wmb(); - sk_error_report(sk); - break; - } -} - static void subflow_error_report(struct sock *ssk) { struct sock *sk = mptcp_subflow_ctx(ssk)->conn;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit 9f1a98813b4b686482e5ef3c9d998581cace0ba6 upstream.
On incoming TCP reset, subflow closing could happen before error propagation. That in turn could cause the socket error being ignored, and a missing socket state transition, as reported by Daire-Byrne.
Address the issues explicitly checking for subflow socket error at close time. To avoid code duplication, factor-out of __mptcp_error_report() a new helper implementing the relevant bits.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/429 Fixes: 15cc10453398 ("mptcp: deliver ssk errors to msk") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 63 +++++++++++++++++++++++++++------------------------ 1 file changed, 34 insertions(+), 29 deletions(-)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -688,40 +688,44 @@ static bool __mptcp_ofo_queue(struct mpt return moved; }
-void __mptcp_error_report(struct sock *sk) +static bool __mptcp_subflow_error_report(struct sock *sk, struct sock *ssk) { - struct mptcp_subflow_context *subflow; - struct mptcp_sock *msk = mptcp_sk(sk); + int err = sock_error(ssk); + int ssk_state;
- mptcp_for_each_subflow(msk, subflow) { - struct sock *ssk = mptcp_subflow_tcp_sock(subflow); - int err = sock_error(ssk); - int ssk_state; + if (!err) + return false;
- if (!err) - continue; + /* only propagate errors on fallen-back sockets or + * on MPC connect + */ + if (sk->sk_state != TCP_SYN_SENT && !__mptcp_check_fallback(mptcp_sk(sk))) + return false;
- /* only propagate errors on fallen-back sockets or - * on MPC connect - */ - if (sk->sk_state != TCP_SYN_SENT && !__mptcp_check_fallback(msk)) - continue; + /* We need to propagate only transition to CLOSE state. + * Orphaned socket will see such state change via + * subflow_sched_work_if_closed() and that path will properly + * destroy the msk as needed. + */ + ssk_state = inet_sk_state_load(ssk); + if (ssk_state == TCP_CLOSE && !sock_flag(sk, SOCK_DEAD)) + inet_sk_state_store(sk, ssk_state); + WRITE_ONCE(sk->sk_err, -err); + + /* This barrier is coupled with smp_rmb() in mptcp_poll() */ + smp_wmb(); + sk_error_report(sk); + return true; +}
- /* We need to propagate only transition to CLOSE state. - * Orphaned socket will see such state change via - * subflow_sched_work_if_closed() and that path will properly - * destroy the msk as needed. - */ - ssk_state = inet_sk_state_load(ssk); - if (ssk_state == TCP_CLOSE && !sock_flag(sk, SOCK_DEAD)) - inet_sk_state_store(sk, ssk_state); - WRITE_ONCE(sk->sk_err, -err); - - /* This barrier is coupled with smp_rmb() in mptcp_poll() */ - smp_wmb(); - sk_error_report(sk); - break; - } +void __mptcp_error_report(struct sock *sk) +{ + struct mptcp_subflow_context *subflow; + struct mptcp_sock *msk = mptcp_sk(sk); + + mptcp_for_each_subflow(msk, subflow) + if (__mptcp_subflow_error_report(sk, mptcp_subflow_tcp_sock(subflow))) + break; }
/* In most cases we will be able to lock the mptcp socket. If its already @@ -2309,6 +2313,7 @@ static void __mptcp_close_ssk(struct soc /* close acquired an extra ref */ __sock_put(ssk); } + __mptcp_subflow_error_report(sk, ssk); release_sock(ssk);
sock_put(ssk);
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit f6909dc1c1f4452879278128012da6c76bc186a5 upstream.
The msk socket uses to different timeout to track close related events and retransmissions. The existing helpers do not indicate clearly which timer they actually touch, making the related code quite confusing.
Change the existing helpers name to avoid such confusion. No functional change intended.
This patch is linked to the next one ("mptcp: fix dangling connection hang-up"). The two patches are supposed to be backported together.
Cc: stable@vger.kernel.org # v5.11+ Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -330,7 +330,7 @@ drop: return false; }
-static void mptcp_stop_timer(struct sock *sk) +static void mptcp_stop_rtx_timer(struct sock *sk) { struct inet_connection_sock *icsk = inet_csk(sk);
@@ -830,12 +830,12 @@ static void mptcp_flush_join_list(struct mptcp_sockopt_sync_all(msk); }
-static bool mptcp_timer_pending(struct sock *sk) +static bool mptcp_rtx_timer_pending(struct sock *sk) { return timer_pending(&inet_csk(sk)->icsk_retransmit_timer); }
-static void mptcp_reset_timer(struct sock *sk) +static void mptcp_reset_rtx_timer(struct sock *sk) { struct inet_connection_sock *icsk = inet_csk(sk); unsigned long tout; @@ -1145,10 +1145,10 @@ out: __mptcp_mem_reclaim_partial(sk);
if (snd_una == READ_ONCE(msk->snd_nxt) && !msk->recovery) { - if (mptcp_timer_pending(sk) && !mptcp_data_fin_enabled(msk)) - mptcp_stop_timer(sk); + if (mptcp_rtx_timer_pending(sk) && !mptcp_data_fin_enabled(msk)) + mptcp_stop_rtx_timer(sk); } else { - mptcp_reset_timer(sk); + mptcp_reset_rtx_timer(sk); } }
@@ -1640,8 +1640,8 @@ void __mptcp_push_pending(struct sock *s
out: /* ensure the rtx timer is running */ - if (!mptcp_timer_pending(sk)) - mptcp_reset_timer(sk); + if (!mptcp_rtx_timer_pending(sk)) + mptcp_reset_rtx_timer(sk); if (copied) mptcp_check_send_data_fin(sk); } @@ -1700,8 +1700,8 @@ out: if (copied) { tcp_push(ssk, 0, info.mss_now, tcp_sk(ssk)->nonagle, info.size_goal); - if (!mptcp_timer_pending(sk)) - mptcp_reset_timer(sk); + if (!mptcp_rtx_timer_pending(sk)) + mptcp_reset_rtx_timer(sk);
if (msk->snd_data_fin_enable && msk->snd_nxt + 1 == msk->write_seq) @@ -2173,7 +2173,7 @@ static void mptcp_retransmit_timer(struc sock_put(sk); }
-static void mptcp_timeout_timer(struct timer_list *t) +static void mptcp_tout_timer(struct timer_list *t) { struct sock *sk = from_timer(sk, t, sk_timer);
@@ -2465,8 +2465,8 @@ static void __mptcp_retrans(struct sock release_sock(ssk);
reset_timer: - if (!mptcp_timer_pending(sk)) - mptcp_reset_timer(sk); + if (!mptcp_rtx_timer_pending(sk)) + mptcp_reset_rtx_timer(sk); }
static void mptcp_worker(struct work_struct *work) @@ -2543,7 +2543,7 @@ static int __mptcp_init_sock(struct sock
/* re-use the csk retrans timer for MPTCP-level retrans */ timer_setup(&msk->sk.icsk_retransmit_timer, mptcp_retransmit_timer, 0); - timer_setup(&sk->sk_timer, mptcp_timeout_timer, 0); + timer_setup(&sk->sk_timer, mptcp_tout_timer, 0);
return 0; } @@ -2629,8 +2629,8 @@ void mptcp_subflow_shutdown(struct sock } else { pr_debug("Sending DATA_FIN on subflow %p", ssk); tcp_send_ack(ssk); - if (!mptcp_timer_pending(sk)) - mptcp_reset_timer(sk); + if (!mptcp_rtx_timer_pending(sk)) + mptcp_reset_rtx_timer(sk); } break; }
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Matthieu Baerts (NGI0)" matttbe@kernel.org
commit 3645c844902bd4e173d6704fc2a37e8746904d67 upstream.
Since the commit mentioned below, 'mptcp_join' selftests is using IPTables to add rules to the Filter table.
It is then required to have IP_NF_FILTER KConfig.
This KConfig is usually enabled by default in many defconfig, but we recently noticed that some CI were running our selftests without them enabled.
Fixes: 8d014eaa9254 ("selftests: mptcp: add ADD_ADDR timeout test case") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/config | 1 + 1 file changed, 1 insertion(+)
--- a/tools/testing/selftests/net/mptcp/config +++ b/tools/testing/selftests/net/mptcp/config @@ -17,3 +17,4 @@ CONFIG_NETFILTER_XTABLES=m CONFIG_NETFILTER_XT_MATCH_BPF=m CONFIG_NF_TABLES_IPV4=y CONFIG_NF_TABLES_IPV6=y +CONFIG_IP_NF_FILTER=m
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Matthieu Baerts (NGI0)" matttbe@kernel.org
commit 8c86fad2cecdc6bf7283ecd298b4d0555bd8b8aa upstream.
Since the commit mentioned below, 'mptcp_join' selftests is using IPTables to add rules to the Filter table for IPv6.
It is then required to have IP6_NF_FILTER KConfig.
This KConfig is usually enabled by default in many defconfig, but we recently noticed that some CI were running our selftests without them enabled.
Fixes: 523514ed0a99 ("selftests: mptcp: add ADD_ADDR IPv6 test cases") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/config | 1 + 1 file changed, 1 insertion(+)
--- a/tools/testing/selftests/net/mptcp/config +++ b/tools/testing/selftests/net/mptcp/config @@ -18,3 +18,4 @@ CONFIG_NETFILTER_XT_MATCH_BPF=m CONFIG_NF_TABLES_IPV4=y CONFIG_NF_TABLES_IPV6=y CONFIG_IP_NF_FILTER=m +CONFIG_IP6_NF_FILTER=m
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jean Sacren sakiwit@gmail.com
commit 59060a47ca50bbdb1d863b73667a1065873ecc06 upstream.
entry->addr.id is u8 with a range from 0 to 255 and MAX_ADDR_ID is 255. We should drop both false expressions of (entry->addr.id > MAX_ADDR_ID).
We should also remove the obsolete parentheses in the first if branch.
Use U8_MAX for MAX_ADDR_ID and add a comment to show the link to mptcp_addr_info.id as suggested by Mr. Matthieu Baerts.
Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jean Sacren sakiwit@gmail.com Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -38,7 +38,8 @@ struct mptcp_pm_add_entry { u8 retrans_times; };
-#define MAX_ADDR_ID 255 +/* max value of mptcp_addr_info.id */ +#define MAX_ADDR_ID U8_MAX #define BITMAP_SZ DIV_ROUND_UP(MAX_ADDR_ID + 1, BITS_PER_LONG)
struct pm_nl_pernet { @@ -854,14 +855,13 @@ find_next: entry->addr.id = find_next_zero_bit(pernet->id_bitmap, MAX_ADDR_ID + 1, pernet->next_id); - if ((!entry->addr.id || entry->addr.id > MAX_ADDR_ID) && - pernet->next_id != 1) { + if (!entry->addr.id && pernet->next_id != 1) { pernet->next_id = 1; goto find_next; } }
- if (!entry->addr.id || entry->addr.id > MAX_ADDR_ID) + if (!entry->addr.id) goto out;
__set_bit(entry->addr.id, pernet->id_bitmap);
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geliang Tang tanggeliang@kylinos.cn
commit 584f3894262634596532cf43a5e782e34a0ce374 upstream.
Just the same as userspace PM, a new parameter needs_id is added for in-kernel PM mptcp_pm_nl_append_new_local_addr() too.
Add a new helper mptcp_pm_has_addr_attr_id() to check whether an address ID is set from PM or not.
In mptcp_pm_nl_get_local_id(), needs_id is always true, but in mptcp_pm_nl_add_addr_doit(), pass mptcp_pm_has_addr_attr_id() to needs_it.
Fixes: efd5a4c04e18 ("mptcp: add the address ID assignment bitmap") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 24 +++++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -823,7 +823,8 @@ static bool address_use_port(struct mptc }
static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet, - struct mptcp_pm_addr_entry *entry) + struct mptcp_pm_addr_entry *entry, + bool needs_id) { struct mptcp_pm_addr_entry *cur; unsigned int addr_max; @@ -850,7 +851,7 @@ static int mptcp_pm_nl_append_new_local_ goto out; }
- if (!entry->addr.id) { + if (!entry->addr.id && needs_id) { find_next: entry->addr.id = find_next_zero_bit(pernet->id_bitmap, MAX_ADDR_ID + 1, @@ -861,7 +862,7 @@ find_next: } }
- if (!entry->addr.id) + if (!entry->addr.id && needs_id) goto out;
__set_bit(entry->addr.id, pernet->id_bitmap); @@ -1001,7 +1002,7 @@ int mptcp_pm_nl_get_local_id(struct mptc entry->ifindex = 0; entry->flags = 0; entry->lsk = NULL; - ret = mptcp_pm_nl_append_new_local_addr(pernet, entry); + ret = mptcp_pm_nl_append_new_local_addr(pernet, entry, true); if (ret < 0) kfree(entry);
@@ -1202,6 +1203,18 @@ next: return 0; }
+static bool mptcp_pm_has_addr_attr_id(const struct nlattr *attr, + struct genl_info *info) +{ + struct nlattr *tb[MPTCP_PM_ADDR_ATTR_MAX + 1]; + + if (!nla_parse_nested_deprecated(tb, MPTCP_PM_ADDR_ATTR_MAX, attr, + mptcp_pm_addr_policy, info->extack) && + tb[MPTCP_PM_ADDR_ATTR_ID]) + return true; + return false; +} + static int mptcp_nl_cmd_add_addr(struct sk_buff *skb, struct genl_info *info) { struct nlattr *attr = info->attrs[MPTCP_PM_ATTR_ADDR]; @@ -1228,7 +1241,8 @@ static int mptcp_nl_cmd_add_addr(struct return ret; } } - ret = mptcp_pm_nl_append_new_local_addr(pernet, entry); + ret = mptcp_pm_nl_append_new_local_addr(pernet, entry, + !mptcp_pm_has_addr_attr_id(attr, info)); if (ret < 0) { GENL_SET_ERR_MSG(info, "too many addresses or duplicate one"); if (entry->lsk)
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit b9cd26f640a308ea314ad23532de9a8592cd09d2 upstream.
when inserting not contiguous data in the subflow write queue, the protocol creates a new skb and prevent the TCP stack from merging it later with already queued skbs by setting the EOR marker.
Still no push flag is explicitly set at the end of previous GSO packet, making the aggregation on the receiver side sub-optimal - and packetdrill self-tests less predictable.
Explicitly mark the end of not contiguous DSS with the push flag.
Fixes: 6d0060f600ad ("mptcp: Write MPTCP DSS headers to outgoing data packets") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-4-162... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1350,6 +1350,7 @@ static int mptcp_sendmsg_frag(struct soc mpext = skb_ext_find(skb, SKB_EXT_MPTCP); if (!mptcp_skb_can_collapse_to(data_seq, skb, mpext)) { TCP_SKB_CB(skb)->eor = 1; + tcp_mark_push(tcp_sk(ssk), skb); goto alloc_skb; }
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit d6a9608af9a75d13243d217f6ce1e30e57d56ffe upstream.
Syzbot and Eric reported a lockdep splat in the subflow diag:
WARNING: possible circular locking dependency detected 6.8.0-rc4-syzkaller-00212-g40b9385dd8e6 #0 Not tainted
syz-executor.2/24141 is trying to acquire lock: ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline] ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137
but task is already holding lock: ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_diag_dump_icsk+0x39f/0x1f80 net/ipv4/inet_diag.c:1038
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&h->lhash2[i].lock){+.+.}-{2:2}: lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __inet_hash+0x335/0xbe0 net/ipv4/inet_hashtables.c:743 inet_csk_listen_start+0x23a/0x320 net/ipv4/inet_connection_sock.c:1261 __inet_listen_sk+0x2a2/0x770 net/ipv4/af_inet.c:217 inet_listen+0xa3/0x110 net/ipv4/af_inet.c:239 rds_tcp_listen_init+0x3fd/0x5a0 net/rds/tcp_listen.c:316 rds_tcp_init_net+0x141/0x320 net/rds/tcp.c:577 ops_init+0x352/0x610 net/core/net_namespace.c:136 __register_pernet_operations net/core/net_namespace.c:1214 [inline] register_pernet_operations+0x2cb/0x660 net/core/net_namespace.c:1283 register_pernet_device+0x33/0x80 net/core/net_namespace.c:1370 rds_tcp_init+0x62/0xd0 net/rds/tcp.c:735 do_one_initcall+0x238/0x830 init/main.c:1236 do_initcall_level+0x157/0x210 init/main.c:1298 do_initcalls+0x3f/0x80 init/main.c:1314 kernel_init_freeable+0x42f/0x5d0 init/main.c:1551 kernel_init+0x1d/0x2a0 init/main.c:1441 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242
-> #0 (k-sk_lock-AF_INET6){+.+.}-{0:0}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 lock_sock_fast include/net/sock.h:1723 [inline] subflow_get_info+0x166/0xd20 net/mptcp/diag.c:28 tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline] tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137 inet_sk_diag_fill+0x10ed/0x1e00 net/ipv4/inet_diag.c:345 inet_diag_dump_icsk+0x55b/0x1f80 net/ipv4/inet_diag.c:1061 __inet_diag_dump+0x211/0x3a0 net/ipv4/inet_diag.c:1263 inet_diag_dump_compat+0x1c1/0x2d0 net/ipv4/inet_diag.c:1371 netlink_dump+0x59b/0xc80 net/netlink/af_netlink.c:2264 __netlink_dump_start+0x5df/0x790 net/netlink/af_netlink.c:2370 netlink_dump_start include/linux/netlink.h:338 [inline] inet_diag_rcv_msg_compat+0x209/0x4c0 net/ipv4/inet_diag.c:1405 sock_diag_rcv_msg+0xe7/0x410 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367 netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667 do_syscall_64+0xf9/0x240 entry_SYSCALL_64_after_hwframe+0x6f/0x77
As noted by Eric we can break the lock dependency chain avoid dumping any extended info for the mptcp subflow listener: nothing actually useful is presented there.
Fixes: b8adb69a7d29 ("mptcp: fix lockless access in subflow ULP diag") Cc: stable@vger.kernel.org Reported-by: Eric Dumazet edumazet@google.com Closes: https://lore.kernel.org/netdev/CANn89iJ=Oecw6OZDwmSYc9HJKQ_G32uN11L+oUcMu+TO... Suggested-by: Eric Dumazet edumazet@google.com Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-9-162... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/diag.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/mptcp/diag.c +++ b/net/mptcp/diag.c @@ -21,6 +21,9 @@ static int subflow_get_info(struct sock bool slow; int err;
+ if (inet_sk_state_load(sk) == TCP_LISTEN) + return 0; + start = nla_nest_start_noflag(skb, INET_ULP_INFO_MPTCP); if (!start) return -EMSGSIZE;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
commit e21a2f17566cbd64926fb8f16323972f7a064444 upstream.
The following memory leak was reported after unbinding /dev/cachefiles:
================================================================== unreferenced object 0xffff9b674176e3c0 (size 192): comm "cachefilesd2", pid 680, jiffies 4294881224 hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc ea38a44b): [<ffffffff8eb8a1a5>] kmem_cache_alloc+0x2d5/0x370 [<ffffffff8e917f86>] prepare_creds+0x26/0x2e0 [<ffffffffc002eeef>] cachefiles_determine_cache_security+0x1f/0x120 [<ffffffffc00243ec>] cachefiles_add_cache+0x13c/0x3a0 [<ffffffffc0025216>] cachefiles_daemon_write+0x146/0x1c0 [<ffffffff8ebc4a3b>] vfs_write+0xcb/0x520 [<ffffffff8ebc5069>] ksys_write+0x69/0xf0 [<ffffffff8f6d4662>] do_syscall_64+0x72/0x140 [<ffffffff8f8000aa>] entry_SYSCALL_64_after_hwframe+0x6e/0x76 ==================================================================
Put the reference count of cache_cred in cachefiles_daemon_unbind() to fix the problem. And also put cache_cred in cachefiles_add_cache() error branch to avoid memory leaks.
Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem") CC: stable@vger.kernel.org Signed-off-by: Baokun Li libaokun1@huawei.com Link: https://lore.kernel.org/r/20240217081431.796809-1-libaokun1@huawei.com Acked-by: David Howells dhowells@redhat.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Baokun Li libaokun1@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cachefiles/bind.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/cachefiles/bind.c +++ b/fs/cachefiles/bind.c @@ -249,6 +249,8 @@ error_open_root: kmem_cache_free(cachefiles_object_jar, fsdef); error_root_object: cachefiles_end_secure(cache, saved_cred); + put_cred(cache->cache_cred); + cache->cache_cred = NULL; pr_err("Failed to register: %d\n", ret); return ret; } @@ -269,6 +271,7 @@ void cachefiles_daemon_unbind(struct cac
dput(cache->graveyard); mntput(cache->mnt); + put_cred(cache->cache_cred);
kfree(cache->rootdirname); kfree(cache->secctx);
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oscar Salvador osalvador@suse.de
commit 79d72c68c58784a3e1cd2378669d51bfd0cb7498 upstream.
When configuring a hugetlb filesystem via the fsconfig() syscall, there is a possible NULL dereference in hugetlbfs_fill_super() caused by assigning NULL to ctx->hstate in hugetlbfs_parse_param() when the requested pagesize is non valid.
E.g: Taking the following steps:
fd = fsopen("hugetlbfs", FSOPEN_CLOEXEC); fsconfig(fd, FSCONFIG_SET_STRING, "pagesize", "1024", 0); fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
Given that the requested "pagesize" is invalid, ctxt->hstate will be replaced with NULL, losing its previous value, and we will print an error:
... ... case Opt_pagesize: ps = memparse(param->string, &rest); ctx->hstate = h; if (!ctx->hstate) { pr_err("Unsupported page size %lu MB\n", ps / SZ_1M); return -EINVAL; } return 0; ... ...
This is a problem because later on, we will dereference ctxt->hstate in hugetlbfs_fill_super()
... ... sb->s_blocksize = huge_page_size(ctx->hstate); ... ...
Causing below Oops.
Fix this by replacing cxt->hstate value only when then pagesize is known to be valid.
kernel: hugetlbfs: Unsupported page size 0 MB kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028 kernel: #PF: supervisor read access in kernel mode kernel: #PF: error_code(0x0000) - not-present page kernel: PGD 800000010f66c067 P4D 800000010f66c067 PUD 1b22f8067 PMD 0 kernel: Oops: 0000 [#1] PREEMPT SMP PTI kernel: CPU: 4 PID: 5659 Comm: syscall Tainted: G E 6.8.0-rc2-default+ #22 5a47c3fef76212addcc6eb71344aabc35190ae8f kernel: Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017 kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400 kernel: FS: 00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0 kernel: Call Trace: kernel: <TASK> kernel: ? __die_body+0x1a/0x60 kernel: ? page_fault_oops+0x16f/0x4a0 kernel: ? search_bpf_extables+0x65/0x70 kernel: ? fixup_exception+0x22/0x310 kernel: ? exc_page_fault+0x69/0x150 kernel: ? asm_exc_page_fault+0x22/0x30 kernel: ? __pfx_hugetlbfs_fill_super+0x10/0x10 kernel: ? hugetlbfs_fill_super+0xb4/0x1a0 kernel: ? hugetlbfs_fill_super+0x28/0x1a0 kernel: ? __pfx_hugetlbfs_fill_super+0x10/0x10 kernel: vfs_get_super+0x40/0xa0 kernel: ? __pfx_bpf_lsm_capable+0x10/0x10 kernel: vfs_get_tree+0x25/0xd0 kernel: vfs_cmd_create+0x64/0xe0 kernel: __x64_sys_fsconfig+0x395/0x410 kernel: do_syscall_64+0x80/0x160 kernel: ? syscall_exit_to_user_mode+0x82/0x240 kernel: ? do_syscall_64+0x8d/0x160 kernel: ? syscall_exit_to_user_mode+0x82/0x240 kernel: ? do_syscall_64+0x8d/0x160 kernel: ? exc_page_fault+0x69/0x150 kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76 kernel: RIP: 0033:0x7ffbc0cb87c9 kernel: Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 96 0d 00 f7 d8 64 89 01 48 kernel: RSP: 002b:00007ffc29d2f388 EFLAGS: 00000206 ORIG_RAX: 00000000000001af kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffbc0cb87c9 kernel: RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003 kernel: RBP: 00007ffc29d2f3b0 R08: 0000000000000000 R09: 0000000000000000 kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000 kernel: R13: 00007ffc29d2f4c0 R14: 0000000000000000 R15: 0000000000000000 kernel: </TASK> kernel: Modules linked in: rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) sunrpc(E) netfs(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) intel_rapl_msr(E) intel_rapl_common(E) iTCO_wdt(E) intel_pmc_bxt(E) sb_edac(E) iTCO_vendor_support(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) rfkill(E) ipmi_ssif(E) kvm(E) acpi_ipmi(E) irqbypass(E) pcspkr(E) igb(E) ipmi_si(E) mei_me(E) i2c_i801(E) joydev(E) intel_pch_thermal(E) i2c_smbus(E) dca(E) lpc_ich(E) mei(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_pad(E) tiny_power_button(E) button(E) fuse(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) ext4(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sd_mod(E) t10_pi(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) polyval_clmulni(E) ahci(E) xhci_pci(E) polyval_generic(E) gf128mul(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) xhci_pci_renesas(E) libahci(E) ehci_pci(E) sha1_ssse3(E) xhci_hcd(E) ehci_hcd(E) libata(E) kernel: mgag200(E) i2c_algo_bit(E) usbcore(E) wmi(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E) aesni_intel(E) crypto_simd(E) cryptd(E) kernel: Unloaded tainted modules: acpi_cpufreq(E):1 fjes(E):1 kernel: CR2: 0000000000000028 kernel: ---[ end trace 0000000000000000 ]--- kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400 kernel: FS: 00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0
Link: https://lkml.kernel.org/r/20240130210418.3771-1-osalvador@suse.de Fixes: 32021982a324 ("hugetlbfs: Convert to fs_context") Signed-off-by: Michal Hocko mhocko@suse.com Signed-off-by: Oscar Salvador osalvador@suse.de Acked-by: Muchun Song muchun.song@linux.dev Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Vamsi Krishna Brahmajosyula vamsi-krishna.brahmajosyula@broadcom.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/hugetlbfs/inode.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -1234,6 +1234,7 @@ static int hugetlbfs_parse_param(struct { struct hugetlbfs_fs_context *ctx = fc->fs_private; struct fs_parse_result result; + struct hstate *h; char *rest; unsigned long ps; int opt; @@ -1278,11 +1279,12 @@ static int hugetlbfs_parse_param(struct
case Opt_pagesize: ps = memparse(param->string, &rest); - ctx->hstate = size_to_hstate(ps); - if (!ctx->hstate) { + h = size_to_hstate(ps); + if (!h) { pr_err("Unsupported page size %lu MB\n", ps >> 20); return -EINVAL; } + ctx->hstate = h; return 0;
case Opt_min_size:
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Max Krummenacher max.krummenacher@toradex.com
This reverts commit ef4a40953c8076626875ff91c41e210fcee7a6fd which is commit d89078c37b10f05fa4f4791b71db2572db361b68 upstream.
The commit was applied to make further commits apply cleanly, but the commit depends on other commits in the same patchset. I.e. the controlling DSI host would need a change too. Thus one would need to backport the full patchset changing the DSI hosts and all downstream DSI device drivers.
Revert the commit and fix up the conflicts with the backported fixes to the lt8912b driver.
Signed-off-by: Max Krummenacher max.krummenacher@toradex.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/bridge/lontium-lt8912b.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)
--- a/drivers/gpu/drm/bridge/lontium-lt8912b.c +++ b/drivers/gpu/drm/bridge/lontium-lt8912b.c @@ -571,6 +571,10 @@ static int lt8912_bridge_attach(struct d if (ret) goto error;
+ ret = lt8912_attach_dsi(lt); + if (ret) + goto error; + return 0;
error: @@ -726,15 +730,8 @@ static int lt8912_probe(struct i2c_clien
drm_bridge_add(<->bridge);
- ret = lt8912_attach_dsi(lt); - if (ret) - goto err_attach; - return 0;
-err_attach: - drm_bridge_remove(<->bridge); - lt8912_free_i2c(lt); err_i2c: lt8912_put_dt(lt); err_dt_parse:
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
commit aa82ac51d63328714645c827775d64dbfd9941f3 upstream.
syzbot reported another task hung in __unix_gc(). [0]
The current while loop assumes that all of the left candidates have oob_skb and calling kfree_skb(oob_skb) releases the remaining candidates.
However, I missed a case that oob_skb has self-referencing fd and another fd and the latter sk is placed before the former in the candidate list. Then, the while loop never proceeds, resulting the task hung.
__unix_gc() has the same loop just before purging the collected skb, so we can call kfree_skb(oob_skb) there and let __skb_queue_purge() release all inflight sockets.
[0]: Sending NMI from CPU 0 to CPUs 1: NMI backtrace for cpu 1 CPU: 1 PID: 2784 Comm: kworker/u4:8 Not tainted 6.8.0-rc4-syzkaller-01028-g71b605d32017 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 Workqueue: events_unbound __unix_gc RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x70 kernel/kcov.c:200 Code: 89 fb e8 23 00 00 00 48 8b 3d 84 f5 1a 0c 48 89 de 5b e9 43 26 57 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <f3> 0f 1e fa 48 8b 04 24 65 48 8b 0d 90 52 70 7e 65 8b 15 91 52 70 RSP: 0018:ffffc9000a17fa78 EFLAGS: 00000287 RAX: ffffffff8a0a6108 RBX: ffff88802b6c2640 RCX: ffff88802c0b3b80 RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000 RBP: ffffc9000a17fbf0 R08: ffffffff89383f1d R09: 1ffff1100ee5ff84 R10: dffffc0000000000 R11: ffffed100ee5ff85 R12: 1ffff110056d84ee R13: ffffc9000a17fae0 R14: 0000000000000000 R15: ffffffff8f47b840 FS: 0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffef5687ff8 CR3: 0000000029b34000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <NMI> </NMI> <TASK> __unix_gc+0xe69/0xf40 net/unix/garbage.c:343 process_one_work kernel/workqueue.c:2633 [inline] process_scheduled_works+0x913/0x1420 kernel/workqueue.c:2706 worker_thread+0xa5f/0x1000 kernel/workqueue.c:2787 kthread+0x2ef/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242 </TASK>
Reported-and-tested-by: syzbot+ecab4d36f920c3574bf9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ecab4d36f920c3574bf9 Fixes: 25236c91b5ab ("af_unix: Fix task hung while purging oob_skb in GC.") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/unix/garbage.c | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-)
--- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -284,9 +284,17 @@ void unix_gc(void) * which are creating the cycle(s). */ skb_queue_head_init(&hitlist); - list_for_each_entry(u, &gc_candidates, link) + list_for_each_entry(u, &gc_candidates, link) { scan_children(&u->sk, inc_inflight, &hitlist);
+#if IS_ENABLED(CONFIG_AF_UNIX_OOB) + if (u->oob_skb) { + kfree_skb(u->oob_skb); + u->oob_skb = NULL; + } +#endif + } + /* not_cycle_list contains those sockets which do not make up a * cycle. Restore these to the inflight list. */ @@ -314,18 +322,6 @@ void unix_gc(void) /* Here we are. Hitlist is filled. Die. */ __skb_queue_purge(&hitlist);
-#if IS_ENABLED(CONFIG_AF_UNIX_OOB) - while (!list_empty(&gc_candidates)) { - u = list_entry(gc_candidates.next, struct unix_sock, link); - if (u->oob_skb) { - struct sk_buff *skb = u->oob_skb; - - u->oob_skb = NULL; - kfree_skb(skb); - } - } -#endif - spin_lock(&unix_gc_lock);
/* There could be io_uring registered files, just push them back to
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arturas Moskvinas arturas.moskvinas@gmail.com
[ Upstream commit 530b1dbd97846b110ea8a94c7cc903eca21786e5 ]
Chip outputs are enabled[1] before actual reset is performed[2] which might cause pin output value to flip flop if previous pin value was set to 1. Fix that behavior by making sure chip is fully reset before all outputs are enabled.
Flip-flop can be noticed when module is removed and inserted again and one of the pins was changed to 1 before removal. 100 microsecond flipping is noticeable on oscilloscope (100khz SPI bus).
For a properly reset chip - output is enabled around 100 microseconds (on 100khz SPI bus) later during probing process hence should be irrelevant behavioral change.
Fixes: 7ebc194d0fd4 (gpio: 74x164: Introduce 'enable-gpios' property) Link: https://elixir.bootlin.com/linux/v6.7.4/source/drivers/gpio/gpio-74x164.c#L1... [1] Link: https://elixir.bootlin.com/linux/v6.7.4/source/drivers/gpio/gpio-74x164.c#L1... [2] Signed-off-by: Arturas Moskvinas arturas.moskvinas@gmail.com Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-74x164.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpio/gpio-74x164.c b/drivers/gpio/gpio-74x164.c index 4a55cdf089d62..ebfa0a8e57dec 100644 --- a/drivers/gpio/gpio-74x164.c +++ b/drivers/gpio/gpio-74x164.c @@ -127,8 +127,6 @@ static int gen_74x164_probe(struct spi_device *spi) if (IS_ERR(chip->gpiod_oe)) return PTR_ERR(chip->gpiod_oe);
- gpiod_set_value_cansleep(chip->gpiod_oe, 1); - spi_set_drvdata(spi, chip);
chip->gpio_chip.label = spi->modalias; @@ -153,6 +151,8 @@ static int gen_74x164_probe(struct spi_device *spi) goto exit_destroy; }
+ gpiod_set_value_cansleep(chip->gpiod_oe, 1); + ret = gpiochip_add_data(&chip->gpio_chip, chip); if (!ret) return 0;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
[ Upstream commit e4aec4daa8c009057b5e063db1b7322252c92dc8 ]
After shuffling the code, error path wasn't updated correctly. Fix it here.
Fixes: 2f4133bb5f14 ("gpiolib: No need to call gpiochip_remove_pin_ranges() twice") Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpiolib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index f9fdd117c654c..4245ece1acb60 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -805,11 +805,11 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data, gpiochip_irqchip_free_valid_mask(gc); err_remove_acpi_chip: acpi_gpiochip_remove(gc); + gpiochip_remove_pin_ranges(gc); err_remove_of_chip: gpiochip_free_hogs(gc); of_gpiochip_remove(gc); err_free_gpiochip_mask: - gpiochip_remove_pin_ranges(gc); gpiochip_free_valid_mask(gc); if (gdev->dev.release) { /* release() has been registered by gpiochip_setup_dev() */
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bartosz Golaszewski bartosz.golaszewski@linaro.org
[ Upstream commit ec5c54a9d3c4f9c15e647b049fea401ee5258696 ]
Hogs are added *after* ACPI so should be removed *before* in error path.
Fixes: a411e81e61df ("gpiolib: add hogs support for machine code") Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpiolib.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 4245ece1acb60..34a061b4becdb 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -774,11 +774,11 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
ret = gpiochip_irqchip_init_valid_mask(gc); if (ret) - goto err_remove_acpi_chip; + goto err_free_hogs;
ret = gpiochip_irqchip_init_hw(gc); if (ret) - goto err_remove_acpi_chip; + goto err_remove_irqchip_mask;
ret = gpiochip_add_irqchip(gc, lock_key, request_key); if (ret) @@ -803,11 +803,11 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data, gpiochip_irqchip_remove(gc); err_remove_irqchip_mask: gpiochip_irqchip_free_valid_mask(gc); -err_remove_acpi_chip: +err_free_hogs: + gpiochip_free_hogs(gc); acpi_gpiochip_remove(gc); gpiochip_remove_pin_ranges(gc); err_remove_of_chip: - gpiochip_free_hogs(gc); of_gpiochip_remove(gc); err_free_gpiochip_mask: gpiochip_free_valid_mask(gc);
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit 9be2957f014d91088db1eb5dd09d9a03d7184dce which is commit af42269c3523492d71ebbe11fefae2653e9cdc78 upstream.
It is reported to cause boot crashes in Android systems, so revert it from the stable trees for now.
Cc: Rob Clark robdclark@chromium.org Cc: Georgi Djakov djakov@kernel.org Cc: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/interconnect/core.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
--- a/drivers/interconnect/core.c +++ b/drivers/interconnect/core.c @@ -30,7 +30,6 @@ static LIST_HEAD(icc_providers); static int providers_count; static bool synced_state; static DEFINE_MUTEX(icc_lock); -static DEFINE_MUTEX(icc_bw_lock); static struct dentry *icc_debugfs_dir;
static void icc_summary_show_one(struct seq_file *s, struct icc_node *n) @@ -637,7 +636,7 @@ int icc_set_bw(struct icc_path *path, u3 if (WARN_ON(IS_ERR(path) || !path->num_nodes)) return -EINVAL;
- mutex_lock(&icc_bw_lock); + mutex_lock(&icc_lock);
old_avg = path->reqs[0].avg_bw; old_peak = path->reqs[0].peak_bw; @@ -669,7 +668,7 @@ int icc_set_bw(struct icc_path *path, u3 apply_constraints(path); }
- mutex_unlock(&icc_bw_lock); + mutex_unlock(&icc_lock);
trace_icc_set_bw_end(path, ret);
@@ -972,7 +971,6 @@ void icc_node_add(struct icc_node *node, return;
mutex_lock(&icc_lock); - mutex_lock(&icc_bw_lock);
node->provider = provider; list_add_tail(&node->node_list, &provider->nodes); @@ -998,7 +996,6 @@ void icc_node_add(struct icc_node *node, node->avg_bw = 0; node->peak_bw = 0;
- mutex_unlock(&icc_bw_lock); mutex_unlock(&icc_lock); } EXPORT_SYMBOL_GPL(icc_node_add); @@ -1126,7 +1123,6 @@ void icc_sync_state(struct device *dev) return;
mutex_lock(&icc_lock); - mutex_lock(&icc_bw_lock); synced_state = true; list_for_each_entry(p, &icc_providers, provider_list) { dev_dbg(p->dev, "interconnect provider is in synced state\n");
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit e3a29b80e9e6df217dd61c670ac42864fa4a0e67 which is commit 13619170303878e1dae86d9a58b039475c957fcf upstream.
It is reported to cause boot crashes in Android systems, so revert it from the stable trees for now.
Cc: Rob Clark robdclark@chromium.org Cc: Georgi Djakov djakov@kernel.org Cc: Guenter Roeck linux@roeck-us.net Cc: Jon Hunter jonathanh@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/interconnect/core.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-)
--- a/drivers/interconnect/core.c +++ b/drivers/interconnect/core.c @@ -1135,21 +1135,13 @@ void icc_sync_state(struct device *dev) } } } - mutex_unlock(&icc_bw_lock); mutex_unlock(&icc_lock); } EXPORT_SYMBOL_GPL(icc_sync_state);
static int __init icc_init(void) { - struct device_node *root; - - /* Teach lockdep about lock ordering wrt. shrinker: */ - fs_reclaim_acquire(GFP_KERNEL); - might_lock(&icc_bw_lock); - fs_reclaim_release(GFP_KERNEL); - - root = of_find_node_by_path("/"); + struct device_node *root = of_find_node_by_path("/");
providers_count = of_count_icc_providers(root); of_node_put(root);
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin KaFai Lau martin.lau@kernel.org
commit 31de4105f00d64570139bc5494a201b0bd57349f upstream.
The bpf_fib_lookup() also looks up the neigh table. This was done before bpf_redirect_neigh() was added.
In the use case that does not manage the neigh table and requires bpf_fib_lookup() to lookup a fib to decide if it needs to redirect or not, the bpf prog can depend only on using bpf_redirect_neigh() to lookup the neigh. It also keeps the neigh entries fresh and connected.
This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid the double neigh lookup when the bpf prog always call bpf_redirect_neigh() to do the neigh lookup. The params->smac output is skipped together when SKIP_NEIGH is set because bpf_redirect_neigh() will figure out the smac also.
Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20230217205515.3583372-1-martin.lau@linux.dev Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/uapi/linux/bpf.h | 6 ++++++ net/core/filter.c | 39 ++++++++++++++++++++++++++------------- tools/include/uapi/linux/bpf.h | 6 ++++++ 3 files changed, 38 insertions(+), 13 deletions(-)
--- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -3014,6 +3014,11 @@ union bpf_attr { * **BPF_FIB_LOOKUP_OUTPUT** * Perform lookup from an egress perspective (default is * ingress). + * **BPF_FIB_LOOKUP_SKIP_NEIGH** + * Skip the neighbour table lookup. *params*->dmac + * and *params*->smac will not be set as output. A common + * use case is to call **bpf_redirect_neigh**\ () after + * doing **bpf_fib_lookup**\ (). * * *ctx* is either **struct xdp_md** for XDP programs or * **struct sk_buff** tc cls_act programs. @@ -6040,6 +6045,7 @@ struct bpf_raw_tracepoint_args { enum { BPF_FIB_LOOKUP_DIRECT = (1U << 0), BPF_FIB_LOOKUP_OUTPUT = (1U << 1), + BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2), };
enum { --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5392,12 +5392,8 @@ static const struct bpf_func_proto bpf_s #endif
#if IS_ENABLED(CONFIG_INET) || IS_ENABLED(CONFIG_IPV6) -static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params, - const struct neighbour *neigh, - const struct net_device *dev, u32 mtu) +static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params, u32 mtu) { - memcpy(params->dmac, neigh->ha, ETH_ALEN); - memcpy(params->smac, dev->dev_addr, ETH_ALEN); params->h_vlan_TCI = 0; params->h_vlan_proto = 0; if (mtu) @@ -5508,21 +5504,29 @@ static int bpf_ipv4_fib_lookup(struct ne if (likely(nhc->nhc_gw_family != AF_INET6)) { if (nhc->nhc_gw_family) params->ipv4_dst = nhc->nhc_gw.ipv4; - - neigh = __ipv4_neigh_lookup_noref(dev, - (__force u32)params->ipv4_dst); } else { struct in6_addr *dst = (struct in6_addr *)params->ipv6_dst;
params->family = AF_INET6; *dst = nhc->nhc_gw.ipv6; - neigh = __ipv6_neigh_lookup_noref_stub(dev, dst); }
+ if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH) + goto set_fwd_params; + + if (likely(nhc->nhc_gw_family != AF_INET6)) + neigh = __ipv4_neigh_lookup_noref(dev, + (__force u32)params->ipv4_dst); + else + neigh = __ipv6_neigh_lookup_noref_stub(dev, params->ipv6_dst); + if (!neigh || !(neigh->nud_state & NUD_VALID)) return BPF_FIB_LKUP_RET_NO_NEIGH; + memcpy(params->dmac, neigh->ha, ETH_ALEN); + memcpy(params->smac, dev->dev_addr, ETH_ALEN);
- return bpf_fib_set_fwd_params(params, neigh, dev, mtu); +set_fwd_params: + return bpf_fib_set_fwd_params(params, mtu); } #endif
@@ -5630,24 +5634,33 @@ static int bpf_ipv6_fib_lookup(struct ne params->rt_metric = res.f6i->fib6_metric; params->ifindex = dev->ifindex;
+ if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH) + goto set_fwd_params; + /* xdp and cls_bpf programs are run in RCU-bh so rcu_read_lock_bh is * not needed here. */ neigh = __ipv6_neigh_lookup_noref_stub(dev, dst); if (!neigh || !(neigh->nud_state & NUD_VALID)) return BPF_FIB_LKUP_RET_NO_NEIGH; + memcpy(params->dmac, neigh->ha, ETH_ALEN); + memcpy(params->smac, dev->dev_addr, ETH_ALEN);
- return bpf_fib_set_fwd_params(params, neigh, dev, mtu); +set_fwd_params: + return bpf_fib_set_fwd_params(params, mtu); } #endif
+#define BPF_FIB_LOOKUP_MASK (BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | \ + BPF_FIB_LOOKUP_SKIP_NEIGH) + BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx, struct bpf_fib_lookup *, params, int, plen, u32, flags) { if (plen < sizeof(*params)) return -EINVAL;
- if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT)) + if (flags & ~BPF_FIB_LOOKUP_MASK) return -EINVAL;
switch (params->family) { @@ -5685,7 +5698,7 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk if (plen < sizeof(*params)) return -EINVAL;
- if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT)) + if (flags & ~BPF_FIB_LOOKUP_MASK) return -EINVAL;
if (params->tot_len) --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -3014,6 +3014,11 @@ union bpf_attr { * **BPF_FIB_LOOKUP_OUTPUT** * Perform lookup from an egress perspective (default is * ingress). + * **BPF_FIB_LOOKUP_SKIP_NEIGH** + * Skip the neighbour table lookup. *params*->dmac + * and *params*->smac will not be set as output. A common + * use case is to call **bpf_redirect_neigh**\ () after + * doing **bpf_fib_lookup**\ (). * * *ctx* is either **struct xdp_md** for XDP programs or * **struct sk_buff** tc cls_act programs. @@ -6040,6 +6045,7 @@ struct bpf_raw_tracepoint_args { enum { BPF_FIB_LOOKUP_DIRECT = (1U << 0), BPF_FIB_LOOKUP_OUTPUT = (1U << 1), + BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2), };
enum {
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Louis DeLosSantos louis.delos.devel@gmail.com
commit 8ad77e72caae22a1ddcfd0c03f2884929e93b7a4 upstream.
Add ability to specify routing table ID to the `bpf_fib_lookup` BPF helper.
A new field `tbid` is added to `struct bpf_fib_lookup` used as parameters to the `bpf_fib_lookup` BPF helper.
When the helper is called with the `BPF_FIB_LOOKUP_DIRECT` and `BPF_FIB_LOOKUP_TBID` flags the `tbid` field in `struct bpf_fib_lookup` will be used as the table ID for the fib lookup.
If the `tbid` does not exist the fib lookup will fail with `BPF_FIB_LKUP_RET_NOT_FWDED`.
The `tbid` field becomes a union over the vlan related output fields in `struct bpf_fib_lookup` and will be zeroed immediately after usage.
This functionality is useful in containerized environments.
For instance, if a CNI wants to dictate the next-hop for traffic leaving a container it can create a container-specific routing table and perform a fib lookup against this table in a "host-net-namespace-side" TC program.
This functionality also allows `ip rule` like functionality at the TC layer, allowing an eBPF program to pick a routing table based on some aspect of the sk_buff.
As a concrete use case, this feature will be used in Cilium's SRv6 L3VPN datapath.
When egress traffic leaves a Pod an eBPF program attached by Cilium will determine which VRF the egress traffic should target, and then perform a FIB lookup in a specific table representing this VRF's FIB.
Signed-off-by: Louis DeLosSantos louis.delos.devel@gmail.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20230505-bpf-add-tbid-fib-lookup-v2-1-0a31c22c74... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/uapi/linux/bpf.h | 21 ++++++++++++++++++--- net/core/filter.c | 14 +++++++++++++- tools/include/uapi/linux/bpf.h | 21 ++++++++++++++++++--- 3 files changed, 49 insertions(+), 7 deletions(-)
--- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -3011,6 +3011,10 @@ union bpf_attr { * **BPF_FIB_LOOKUP_DIRECT** * Do a direct table lookup vs full lookup using FIB * rules. + * **BPF_FIB_LOOKUP_TBID** + * Used with BPF_FIB_LOOKUP_DIRECT. + * Use the routing table ID present in *params*->tbid + * for the fib lookup. * **BPF_FIB_LOOKUP_OUTPUT** * Perform lookup from an egress perspective (default is * ingress). @@ -6046,6 +6050,7 @@ enum { BPF_FIB_LOOKUP_DIRECT = (1U << 0), BPF_FIB_LOOKUP_OUTPUT = (1U << 1), BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2), + BPF_FIB_LOOKUP_TBID = (1U << 3), };
enum { @@ -6106,9 +6111,19 @@ struct bpf_fib_lookup { __u32 ipv6_dst[4]; /* in6_addr; network order */ };
- /* output */ - __be16 h_vlan_proto; - __be16 h_vlan_TCI; + union { + struct { + /* output */ + __be16 h_vlan_proto; + __be16 h_vlan_TCI; + }; + /* input: when accompanied with the + * 'BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_TBID` flags, a + * specific routing table to use for the fib lookup. + */ + __u32 tbid; + }; + __u8 smac[6]; /* ETH_ALEN */ __u8 dmac[6]; /* ETH_ALEN */ }; --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5447,6 +5447,12 @@ static int bpf_ipv4_fib_lookup(struct ne u32 tbid = l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN; struct fib_table *tb;
+ if (flags & BPF_FIB_LOOKUP_TBID) { + tbid = params->tbid; + /* zero out for vlan output */ + params->tbid = 0; + } + tb = fib_get_table(net, tbid); if (unlikely(!tb)) return BPF_FIB_LKUP_RET_NOT_FWDED; @@ -5580,6 +5586,12 @@ static int bpf_ipv6_fib_lookup(struct ne u32 tbid = l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN; struct fib6_table *tb;
+ if (flags & BPF_FIB_LOOKUP_TBID) { + tbid = params->tbid; + /* zero out for vlan output */ + params->tbid = 0; + } + tb = ipv6_stub->fib6_get_table(net, tbid); if (unlikely(!tb)) return BPF_FIB_LKUP_RET_NOT_FWDED; @@ -5652,7 +5664,7 @@ set_fwd_params: #endif
#define BPF_FIB_LOOKUP_MASK (BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | \ - BPF_FIB_LOOKUP_SKIP_NEIGH) + BPF_FIB_LOOKUP_SKIP_NEIGH | BPF_FIB_LOOKUP_TBID)
BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx, struct bpf_fib_lookup *, params, int, plen, u32, flags) --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -3011,6 +3011,10 @@ union bpf_attr { * **BPF_FIB_LOOKUP_DIRECT** * Do a direct table lookup vs full lookup using FIB * rules. + * **BPF_FIB_LOOKUP_TBID** + * Used with BPF_FIB_LOOKUP_DIRECT. + * Use the routing table ID present in *params*->tbid + * for the fib lookup. * **BPF_FIB_LOOKUP_OUTPUT** * Perform lookup from an egress perspective (default is * ingress). @@ -6046,6 +6050,7 @@ enum { BPF_FIB_LOOKUP_DIRECT = (1U << 0), BPF_FIB_LOOKUP_OUTPUT = (1U << 1), BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2), + BPF_FIB_LOOKUP_TBID = (1U << 3), };
enum { @@ -6106,9 +6111,19 @@ struct bpf_fib_lookup { __u32 ipv6_dst[4]; /* in6_addr; network order */ };
- /* output */ - __be16 h_vlan_proto; - __be16 h_vlan_TCI; + union { + struct { + /* output */ + __be16 h_vlan_proto; + __be16 h_vlan_TCI; + }; + /* input: when accompanied with the + * 'BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_TBID` flags, a + * specific routing table to use for the fib lookup. + */ + __u32 tbid; + }; + __u8 smac[6]; /* ETH_ALEN */ __u8 dmac[6]; /* ETH_ALEN */ };
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martynas Pumputis m@lambda.lt
commit dab4e1f06cabb6834de14264394ccab197007302 upstream.
Extend the bpf_fib_lookup() helper by making it to return the source IPv4/IPv6 address if the BPF_FIB_LOOKUP_SRC flag is set.
For example, the following snippet can be used to derive the desired source IP address:
struct bpf_fib_lookup p = { .ipv4_dst = ip4->daddr };
ret = bpf_skb_fib_lookup(skb, p, sizeof(p), BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_SKIP_NEIGH); if (ret != BPF_FIB_LKUP_RET_SUCCESS) return TC_ACT_SHOT;
/* the p.ipv4_src now contains the source address */
The inability to derive the proper source address may cause malfunctions in BPF-based dataplanes for hosts containing netdevs with more than one routable IP address or for multi-homed hosts.
For example, Cilium implements packet masquerading in BPF. If an egressing netdev to which the Cilium's BPF prog is attached has multiple IP addresses, then only one [hardcoded] IP address can be used for masquerading. This breaks connectivity if any other IP address should have been selected instead, for example, when a public and private addresses are attached to the same egress interface.
The change was tested with Cilium [1].
Nikolay Aleksandrov helped to figure out the IPv6 addr selection.
[1]: https://github.com/cilium/cilium/pull/28283
Signed-off-by: Martynas Pumputis m@lambda.lt Link: https://lore.kernel.org/r/20231007081415.33502-2-m@lambda.lt Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/ipv6_stubs.h | 5 +++++ include/uapi/linux/bpf.h | 10 ++++++++++ net/core/filter.c | 18 +++++++++++++++++- net/ipv6/af_inet6.c | 1 + tools/include/uapi/linux/bpf.h | 10 ++++++++++ 5 files changed, 43 insertions(+), 1 deletion(-)
--- a/include/net/ipv6_stubs.h +++ b/include/net/ipv6_stubs.h @@ -81,6 +81,11 @@ struct ipv6_bpf_stub { const struct in6_addr *daddr, __be16 dport, int dif, int sdif, struct udp_table *tbl, struct sk_buff *skb); + int (*ipv6_dev_get_saddr)(struct net *net, + const struct net_device *dst_dev, + const struct in6_addr *daddr, + unsigned int prefs, + struct in6_addr *saddr); }; extern const struct ipv6_bpf_stub *ipv6_bpf_stub __read_mostly;
--- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -3023,6 +3023,11 @@ union bpf_attr { * and *params*->smac will not be set as output. A common * use case is to call **bpf_redirect_neigh**\ () after * doing **bpf_fib_lookup**\ (). + * **BPF_FIB_LOOKUP_SRC** + * Derive and set source IP addr in *params*->ipv{4,6}_src + * for the nexthop. If the src addr cannot be derived, + * **BPF_FIB_LKUP_RET_NO_SRC_ADDR** is returned. In this + * case, *params*->dmac and *params*->smac are not set either. * * *ctx* is either **struct xdp_md** for XDP programs or * **struct sk_buff** tc cls_act programs. @@ -6051,6 +6056,7 @@ enum { BPF_FIB_LOOKUP_OUTPUT = (1U << 1), BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2), BPF_FIB_LOOKUP_TBID = (1U << 3), + BPF_FIB_LOOKUP_SRC = (1U << 4), };
enum { @@ -6063,6 +6069,7 @@ enum { BPF_FIB_LKUP_RET_UNSUPP_LWT, /* fwd requires encapsulation */ BPF_FIB_LKUP_RET_NO_NEIGH, /* no neighbor entry for nh */ BPF_FIB_LKUP_RET_FRAG_NEEDED, /* fragmentation required to fwd */ + BPF_FIB_LKUP_RET_NO_SRC_ADDR, /* failed to derive IP src addr */ };
struct bpf_fib_lookup { @@ -6097,6 +6104,9 @@ struct bpf_fib_lookup { __u32 rt_metric; };
+ /* input: source address to consider for lookup + * output: source address result from lookup + */ union { __be32 ipv4_src; __u32 ipv6_src[4]; /* in6_addr; network order */ --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5504,6 +5504,9 @@ static int bpf_ipv4_fib_lookup(struct ne params->rt_metric = res.fi->fib_priority; params->ifindex = dev->ifindex;
+ if (flags & BPF_FIB_LOOKUP_SRC) + params->ipv4_src = fib_result_prefsrc(net, &res); + /* xdp and cls_bpf programs are run in RCU-bh so * rcu_read_lock_bh is not needed here */ @@ -5646,6 +5649,18 @@ static int bpf_ipv6_fib_lookup(struct ne params->rt_metric = res.f6i->fib6_metric; params->ifindex = dev->ifindex;
+ if (flags & BPF_FIB_LOOKUP_SRC) { + if (res.f6i->fib6_prefsrc.plen) { + *src = res.f6i->fib6_prefsrc.addr; + } else { + err = ipv6_bpf_stub->ipv6_dev_get_saddr(net, dev, + &fl6.daddr, 0, + src); + if (err) + return BPF_FIB_LKUP_RET_NO_SRC_ADDR; + } + } + if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH) goto set_fwd_params;
@@ -5664,7 +5679,8 @@ set_fwd_params: #endif
#define BPF_FIB_LOOKUP_MASK (BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | \ - BPF_FIB_LOOKUP_SKIP_NEIGH | BPF_FIB_LOOKUP_TBID) + BPF_FIB_LOOKUP_SKIP_NEIGH | BPF_FIB_LOOKUP_TBID | \ + BPF_FIB_LOOKUP_SRC)
BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx, struct bpf_fib_lookup *, params, int, plen, u32, flags) --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -1061,6 +1061,7 @@ static const struct ipv6_stub ipv6_stub_ static const struct ipv6_bpf_stub ipv6_bpf_stub_impl = { .inet6_bind = __inet6_bind, .udp6_lib_lookup = __udp6_lib_lookup, + .ipv6_dev_get_saddr = ipv6_dev_get_saddr, };
static int __init inet6_init(void) --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -3023,6 +3023,11 @@ union bpf_attr { * and *params*->smac will not be set as output. A common * use case is to call **bpf_redirect_neigh**\ () after * doing **bpf_fib_lookup**\ (). + * **BPF_FIB_LOOKUP_SRC** + * Derive and set source IP addr in *params*->ipv{4,6}_src + * for the nexthop. If the src addr cannot be derived, + * **BPF_FIB_LKUP_RET_NO_SRC_ADDR** is returned. In this + * case, *params*->dmac and *params*->smac are not set either. * * *ctx* is either **struct xdp_md** for XDP programs or * **struct sk_buff** tc cls_act programs. @@ -6051,6 +6056,7 @@ enum { BPF_FIB_LOOKUP_OUTPUT = (1U << 1), BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2), BPF_FIB_LOOKUP_TBID = (1U << 3), + BPF_FIB_LOOKUP_SRC = (1U << 4), };
enum { @@ -6063,6 +6069,7 @@ enum { BPF_FIB_LKUP_RET_UNSUPP_LWT, /* fwd requires encapsulation */ BPF_FIB_LKUP_RET_NO_NEIGH, /* no neighbor entry for nh */ BPF_FIB_LKUP_RET_FRAG_NEEDED, /* fragmentation required to fwd */ + BPF_FIB_LKUP_RET_NO_SRC_ADDR, /* failed to derive IP src addr */ };
struct bpf_fib_lookup { @@ -6097,6 +6104,9 @@ struct bpf_fib_lookup { __u32 rt_metric; };
+ /* input: source address to consider for lookup + * output: source address result from lookup + */ union { __be32 ipv4_src; __u32 ipv6_src[4]; /* in6_addr; network order */
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
commit c706b2b5ed74d30436b85cbd8e63e969f6b5873a upstream.
When NIC takes care of crypto (or the record has already been decrypted) we forget to update darg->async. ->async is supposed to mean whether record is async capable on input and whether record has been queued for async crypto on output.
Reported-by: Gal Pressman gal@nvidia.com Fixes: 3547a1f9d988 ("tls: rx: use async as an in-out argument") Tested-by: Gal Pressman gal@nvidia.com Link: https://lore.kernel.org/r/20220425233309.344858-1-kuba@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/tls/tls_sw.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1568,6 +1568,7 @@ static int decrypt_skb_update(struct soc
if (tlm->decrypted) { darg->zc = false; + darg->async = false; return 0; }
@@ -1578,6 +1579,7 @@ static int decrypt_skb_update(struct soc if (err > 0) { tlm->decrypted = 1; darg->zc = false; + darg->async = false; goto decrypt_done; } }
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gal Pressman gal@nvidia.com
commit a069a90554168ac4cc81af65f000557d2a8a0745 upstream.
This reverts commit 284b4d93daee56dff3e10029ddf2e03227f50dbf. When using TLS device offload and coming from tls_device_reencrypt() flow, -EBADMSG error in tls_do_decryption() should not be counted towards the TLSTlsDecryptError counter.
Move the counter increase back to the decrypt_internal() call site in decrypt_skb_update(). This also fixes an issue where: if (n_sgin < 1) return -EBADMSG;
Errors in decrypt_internal() were not counted after the cited patch.
Fixes: 284b4d93daee ("tls: rx: move counting TlsDecryptErrors for sync") Cc: Jakub Kicinski kuba@kernel.org Reviewed-by: Maxim Mikityanskiy maximmi@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Gal Pressman gal@nvidia.com Reviewed-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/tls/tls_sw.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -278,9 +278,6 @@ static int tls_do_decryption(struct sock } darg->async = false;
- if (ret == -EBADMSG) - TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSDECRYPTERROR); - return ret; }
@@ -1585,8 +1582,11 @@ static int decrypt_skb_update(struct soc }
err = decrypt_internal(sk, skb, dest, NULL, darg); - if (err < 0) + if (err < 0) { + if (err == -EBADMSG) + TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSDECRYPTERROR); return err; + } if (darg->async) goto decrypt_next;
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Davide Caratti dcaratti@redhat.com
commit 10048689def7e40a4405acda16fdc6477d4ecc5c upstream.
when MPTCP server accepts an incoming connection, it clones its listener socket. However, the pointer to 'inet_opt' for the new socket has the same value as the original one: as a consequence, on program exit it's possible to observe the following splat:
BUG: KASAN: double-free in inet_sock_destruct+0x54f/0x8b0 Free of addr ffff888485950880 by task swapper/25/0
CPU: 25 PID: 0 Comm: swapper/25 Kdump: loaded Not tainted 6.8.0-rc1+ #609 Hardware name: Supermicro SYS-6027R-72RF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0 07/26/2013 Call Trace: <IRQ> dump_stack_lvl+0x32/0x50 print_report+0xca/0x620 kasan_report_invalid_free+0x64/0x90 __kasan_slab_free+0x1aa/0x1f0 kfree+0xed/0x2e0 inet_sock_destruct+0x54f/0x8b0 __sk_destruct+0x48/0x5b0 rcu_do_batch+0x34e/0xd90 rcu_core+0x559/0xac0 __do_softirq+0x183/0x5a4 irq_exit_rcu+0x12d/0x170 sysvec_apic_timer_interrupt+0x6b/0x80 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x16/0x20 RIP: 0010:cpuidle_enter_state+0x175/0x300 Code: 30 00 0f 84 1f 01 00 00 83 e8 01 83 f8 ff 75 e5 48 83 c4 18 44 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc fb 45 85 ed <0f> 89 60 ff ff ff 48 c1 e5 06 48 c7 43 18 00 00 00 00 48 83 44 2b RSP: 0018:ffff888481cf7d90 EFLAGS: 00000202 RAX: 0000000000000000 RBX: ffff88887facddc8 RCX: 0000000000000000 RDX: 1ffff1110ff588b1 RSI: 0000000000000019 RDI: ffff88887fac4588 RBP: 0000000000000004 R08: 0000000000000002 R09: 0000000000043080 R10: 0009b02ea273363f R11: ffff88887fabf42b R12: ffffffff932592e0 R13: 0000000000000004 R14: 0000000000000000 R15: 00000022c880ec80 cpuidle_enter+0x4a/0xa0 do_idle+0x310/0x410 cpu_startup_entry+0x51/0x60 start_secondary+0x211/0x270 secondary_startup_64_no_verify+0x184/0x18b </TASK>
Allocated by task 6853: kasan_save_stack+0x1c/0x40 kasan_save_track+0x10/0x30 __kasan_kmalloc+0xa6/0xb0 __kmalloc+0x1eb/0x450 cipso_v4_sock_setattr+0x96/0x360 netlbl_sock_setattr+0x132/0x1f0 selinux_netlbl_socket_post_create+0x6c/0x110 selinux_socket_post_create+0x37b/0x7f0 security_socket_post_create+0x63/0xb0 __sock_create+0x305/0x450 __sys_socket_create.part.23+0xbd/0x130 __sys_socket+0x37/0xb0 __x64_sys_socket+0x6f/0xb0 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x6e/0x76
Freed by task 6858: kasan_save_stack+0x1c/0x40 kasan_save_track+0x10/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x12c/0x1f0 kfree+0xed/0x2e0 inet_sock_destruct+0x54f/0x8b0 __sk_destruct+0x48/0x5b0 subflow_ulp_release+0x1f0/0x250 tcp_cleanup_ulp+0x6e/0x110 tcp_v4_destroy_sock+0x5a/0x3a0 inet_csk_destroy_sock+0x135/0x390 tcp_fin+0x416/0x5c0 tcp_data_queue+0x1bc8/0x4310 tcp_rcv_state_process+0x15a3/0x47b0 tcp_v4_do_rcv+0x2c1/0x990 tcp_v4_rcv+0x41fb/0x5ed0 ip_protocol_deliver_rcu+0x6d/0x9f0 ip_local_deliver_finish+0x278/0x360 ip_local_deliver+0x182/0x2c0 ip_rcv+0xb5/0x1c0 __netif_receive_skb_one_core+0x16e/0x1b0 process_backlog+0x1e3/0x650 __napi_poll+0xa6/0x500 net_rx_action+0x740/0xbb0 __do_softirq+0x183/0x5a4
The buggy address belongs to the object at ffff888485950880 which belongs to the cache kmalloc-64 of size 64 The buggy address is located 0 bytes inside of 64-byte region [ffff888485950880, ffff8884859508c0)
The buggy address belongs to the physical page: page:0000000056d1e95e refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888485950700 pfn:0x485950 flags: 0x57ffffc0000800(slab|node=1|zone=2|lastcpupid=0x1fffff) page_type: 0xffffffff() raw: 0057ffffc0000800 ffff88810004c640 ffffea00121b8ac0 dead000000000006 raw: ffff888485950700 0000000000200019 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff888485950780: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff888485950800: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff888485950880: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
^ ffff888485950900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff888485950980: 00 00 00 00 00 01 fc fc fc fc fc fc fc fc fc fc
Something similar (a refcount underflow) happens with CALIPSO/IPv6. Fix this by duplicating IP / IPv6 options after clone, so that ip{,6}_sock_destruct() doesn't end up freeing the same memory area twice.
Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming connections") Cc: stable@vger.kernel.org Signed-off-by: Davide Caratti dcaratti@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-8-162... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2856,8 +2856,50 @@ static struct ipv6_pinfo *mptcp_inet6_sk
return (struct ipv6_pinfo *)(((u8 *)sk) + offset); } + +static void mptcp_copy_ip6_options(struct sock *newsk, const struct sock *sk) +{ + const struct ipv6_pinfo *np = inet6_sk(sk); + struct ipv6_txoptions *opt; + struct ipv6_pinfo *newnp; + + newnp = inet6_sk(newsk); + + rcu_read_lock(); + opt = rcu_dereference(np->opt); + if (opt) { + opt = ipv6_dup_options(newsk, opt); + if (!opt) + net_warn_ratelimited("%s: Failed to copy ip6 options\n", __func__); + } + RCU_INIT_POINTER(newnp->opt, opt); + rcu_read_unlock(); +} #endif
+static void mptcp_copy_ip_options(struct sock *newsk, const struct sock *sk) +{ + struct ip_options_rcu *inet_opt, *newopt = NULL; + const struct inet_sock *inet = inet_sk(sk); + struct inet_sock *newinet; + + newinet = inet_sk(newsk); + + rcu_read_lock(); + inet_opt = rcu_dereference(inet->inet_opt); + if (inet_opt) { + newopt = sock_kmalloc(newsk, sizeof(*inet_opt) + + inet_opt->opt.optlen, GFP_ATOMIC); + if (newopt) + memcpy(newopt, inet_opt, sizeof(*inet_opt) + + inet_opt->opt.optlen); + else + net_warn_ratelimited("%s: Failed to copy ip options\n", __func__); + } + RCU_INIT_POINTER(newinet->inet_opt, newopt); + rcu_read_unlock(); +} + struct sock *mptcp_sk_clone(const struct sock *sk, const struct mptcp_options_received *mp_opt, struct request_sock *req) @@ -2878,6 +2920,13 @@ struct sock *mptcp_sk_clone(const struct nsk->sk_wait_pending = 0; __mptcp_init_sock(nsk);
+#if IS_ENABLED(CONFIG_MPTCP_IPV6) + if (nsk->sk_family == AF_INET6) + mptcp_copy_ip6_options(nsk, sk); + else +#endif + mptcp_copy_ip_options(nsk, sk); + msk = mptcp_sk(nsk); msk->local_key = subflow_req->local_key; msk->token = subflow_req->token;
Hello,
On Mon, 4 Mar 2024 21:23:33 +0000 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.15.151 release. There are 84 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 06 Mar 2024 21:15:26 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.151-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
This rc kernel passes DAMON functionality test[1] on my test machine. Attaching the test results summary below. Please note that I retrieved the kernel from linux-stable-rc tree[2].
Tested-by: SeongJae Park sj@kernel.org
[1] https://github.com/awslabs/damon-tests/tree/next/corr [2] e7cbbec10c6e ("Linux 5.15.151-rc1")
Thanks, SJ
[...]
---
ok 1 selftests: damon: debugfs_attrs.sh ok 1 selftests: damon-tests: kunit.sh ok 2 selftests: damon-tests: huge_count_read_write.sh ok 3 selftests: damon-tests: buffer_overflow.sh ok 4 selftests: damon-tests: rm_contexts.sh ok 5 selftests: damon-tests: record_null_deref.sh ok 6 selftests: damon-tests: dbgfs_target_ids_read_before_terminate_race.sh ok 7 selftests: damon-tests: dbgfs_target_ids_pid_leak.sh ok 8 selftests: damon-tests: damo_tests.sh ok 9 selftests: damon-tests: masim-record.sh ok 10 selftests: damon-tests: build_i386.sh ok 11 selftests: damon-tests: build_arm64.sh ok 12 selftests: damon-tests: build_m68k.sh ok 13 selftests: damon-tests: build_i386_idle_flag.sh ok 14 selftests: damon-tests: build_i386_highpte.sh ok 15 selftests: damon-tests: build_nomemcg.sh [33m [92mPASS [39m
On 3/4/24 1:23 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.151 release. There are 84 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 06 Mar 2024 21:15:26 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.151-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
On Tue, 5 Mar 2024 at 03:23, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.15.151 release. There are 84 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 06 Mar 2024 21:15:26 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.151-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
Following build failures noticed on riscv.
The riscv tinyconfig and allnoconfig builds on 5.15. The riscv defconfig, tinyconfig and allnoconfig builds on 5.10.
linux.5.15.y build failures on riscv. riscv: build: * gcc-12-tinyconfig * gcc-8-allnoconfig * clang-17-tinyconfig * gcc-8-tinyconfig * gcc-12-allnoconfig
linux.5.10.y build failures on riscv. riscv:
* gcc-8-defconfig * clang-17-allnoconfig * gcc-12-tinyconfig * gcc-8-allnoconfig * gcc-8-allmodconfig * clang-17-defconfig * gcc-12-defconfig * clang-17-tinyconfig * gcc-12-allmodconfig * gcc-8-tinyconfig * gcc-12-allnoconfig
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
arch/riscv/kernel/return_address.c:39:9: error: implicit declaration of function 'arch_stack_walk' [-Werror=implicit-function-declaration] 39 | arch_stack_walk(save_return_addr, &data, current, NULL); | ^~~~~~~~~~~~~~~ cc1: some warnings being treated as errors
Suspecting patch,
riscv: add CALLER_ADDRx support commit 680341382da56bd192ebfa4e58eaf4fec2e5bca7 upstream.
steps to reproduce: --- # tuxmake --runtime podman --target-arch riscv --toolchain gcc-12 --kconfig defconfig
Links: - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.15.y/v5.15.149-33... - https://storage.tuxsuite.com/public/linaro/lkft/builds/2dF68wn0dXbU2xGLRyzsx...
- https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/v5.10.210-16... - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/v5.10.210-16... - https://storage.tuxsuite.com/public/linaro/lkft/builds/2dF5ke88GqtfanduGie1J...
-- Linaro LKFT https://lkft.linaro.org
On Tue, Mar 05, 2024 at 03:38:20PM +0530, Naresh Kamboju wrote:
On Tue, 5 Mar 2024 at 03:23, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.15.151 release. There are 84 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 06 Mar 2024 21:15:26 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.151-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
Following build failures noticed on riscv.
The riscv tinyconfig and allnoconfig builds on 5.15. The riscv defconfig, tinyconfig and allnoconfig builds on 5.10.
linux.5.15.y build failures on riscv. riscv: build: * gcc-12-tinyconfig * gcc-8-allnoconfig * clang-17-tinyconfig * gcc-8-tinyconfig * gcc-12-allnoconfig
linux.5.10.y build failures on riscv. riscv:
- gcc-8-defconfig
- clang-17-allnoconfig
- gcc-12-tinyconfig
- gcc-8-allnoconfig
- gcc-8-allmodconfig
- clang-17-defconfig
- gcc-12-defconfig
- clang-17-tinyconfig
- gcc-12-allmodconfig
- gcc-8-tinyconfig
- gcc-12-allnoconfig
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
arch/riscv/kernel/return_address.c:39:9: error: implicit declaration of function 'arch_stack_walk' [-Werror=implicit-function-declaration] 39 | arch_stack_walk(save_return_addr, &data, current, NULL); | ^~~~~~~~~~~~~~~ cc1: some warnings being treated as errors
Suspecting patch,
riscv: add CALLER_ADDRx support commit 680341382da56bd192ebfa4e58eaf4fec2e5bca7 upstream.
Thanks, will go drop this and push out -rc2 kernels for 5.15 and 5.10
greg k-h
On Mon, 04 Mar 2024 21:23:33 +0000, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.151 release. There are 84 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 06 Mar 2024 21:15:26 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.151-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v5.15: 10 builds: 10 pass, 0 fail 26 boots: 26 pass, 0 fail 102 tests: 102 pass, 0 fail
Linux version: 5.15.151-rc1-ge7cbbec10c6e Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
Hi Greg,
On 05/03/24 02:53, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.151 release. There are 84 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 06 Mar 2024 21:15:26 +0000. Anything received after that time might be too late.
No problems seen on x86_64 and aarch64 with our testing.
Tested-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com
Thanks, Harshit
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.151-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
On 3/4/24 14:23, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.151 release. There are 84 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 06 Mar 2024 21:15:26 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.151-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
linux-stable-mirror@lists.linaro.org