This is the start of the stable review cycle for the 5.4.227 release. There are 67 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Dec 2022 13:08:57 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.227-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.4.227-rc1
Frank Jungclaus frank.jungclaus@esd.eu can: esd_usb: Allow REC and TEC to return to zero
Dan Carpenter error27@gmail.com net: mvneta: Fix an out of bounds check
Eric Dumazet edumazet@google.com ipv6: avoid use-after-free in ip6_fragment()
Yang Yingliang yangyingliang@huawei.com net: plip: don't call kfree_skb/dev_kfree_skb() under spin_lock_irq()
Juergen Gross jgross@suse.com xen/netback: fix build warning
Zhang Changzhong zhangchangzhong@huawei.com ethernet: aeroflex: fix potential skb leak in greth_init_rings()
Ido Schimmel idosch@nvidia.com ipv4: Fix incorrect route flushing when table ID 0 is used
Ido Schimmel idosch@nvidia.com ipv4: Fix incorrect route flushing when source address is deleted
YueHaibing yuehaibing@huawei.com tipc: Fix potential OOB in tipc_link_proto_rcv()
Liu Jian liujian56@huawei.com net: hisilicon: Fix potential use-after-free in hix5hd2_rx()
Liu Jian liujian56@huawei.com net: hisilicon: Fix potential use-after-free in hisi_femac_rx()
Yongqiang Liu liuyongqiang13@huawei.com net: thunderx: Fix missing destroy_workqueue of nicvf_rx_mode_wq
Jisheng Zhang jszhang@kernel.org net: stmmac: fix "snps,axi-config" node property parsing
Pankaj Raghav p.raghav@samsung.com nvme initialize core quirks before calling nvme_init_subsystem
Kees Cook keescook@chromium.org NFC: nci: Bounds check struct nfc_target arrays
Przemyslaw Patynowski przemyslawx.patynowski@intel.com i40e: Disallow ip4 and ip6 l4_4_bytes
Sylwester Dziedziuch sylwesterx.dziedziuch@intel.com i40e: Fix for VF MAC address 0
Michal Jaron michalx.jaron@intel.com i40e: Fix not setting default xps_cpus after reset
Dan Carpenter error27@gmail.com net: mvneta: Prevent out of bounds read in mvneta_config_rss()
Lin Liu lin.liu@citrix.com xen-netfront: Fix NULL sring after live migration
Valentina Goncharenko goncharenko.vp@ispras.ru net: encx24j600: Fix invalid logic in reading of MISTAT register
Valentina Goncharenko goncharenko.vp@ispras.ru net: encx24j600: Add parentheses to fix precedence
Wei Yongjun weiyongjun1@huawei.com mac802154: fix missing INIT_LIST_HEAD in ieee802154_if_add()
Zhengchao Shao shaozhengchao@huawei.com selftests: rtnetlink: correct xfrm policy rule in kci_test_ipsec_offload
Artem Chernyshev artem.chernyshev@red-soft.ru net: dsa: ksz: Check return value
Chen Zhongjin chenzhongjin@huawei.com Bluetooth: Fix not cleanup led when bt_init fails
Wang ShaoBo bobo.shaobowang@huawei.com Bluetooth: 6LoWPAN: add missing hci_dev_put() in get_l2cap_conn()
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Get user_ns from in_skb in unix_diag_get_exact().
Akihiko Odaki akihiko.odaki@daynix.com igb: Allocate MSI-X vector when testing
Akihiko Odaki akihiko.odaki@daynix.com e1000e: Fix TX dispatch condition
Xiongfeng Wang wangxiongfeng2@huawei.com gpio: amd8111: Fix PCI device reference count leak
Qiqi Zhang eddy.zhang@rock-chips.com drm/bridge: ti-sn65dsi86: Fix output polarity setting bug
Hauke Mehrtens hauke@hauke-m.de ca8210: Fix crash by zero initializing data
Ziyang Xuan william.xuanziyang@huawei.com ieee802154: cc2520: Fix error return code in cc2520_hw_init()
Baolin Wang baolin.wang@linux.alibaba.com mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page
Oliver Hartkopp socketcan@hartkopp.net can: af_can: fix NULL pointer dereference in can_rcv_filter
ZhangPeng zhangpeng362@huawei.com HID: core: fix shift-out-of-bounds in hid_report_raw_event
Anastasia Belova abelova@astralinux.ru HID: hid-lg4ff: Add check for empty lbuf
Ankit Patel anpatel@nvidia.com HID: usbhid: Add ALWAYS_POLL quirk for some mice
Rob Clark robdclark@chromium.org drm/shmem-helper: Remove errant put in error path
Thomas Huth thuth@redhat.com KVM: s390: vsie: Fix the initialization of the epoch extension (epdx) field
John Starks jostarks@microsoft.com mm/gup: fix gup_pud_range() for dax
Tejun Heo tj@kernel.org memcg: fix possible use-after-free in memcg_write_event_control()
Hans Verkuil hverkuil-cisco@xs4all.nl media: v4l2-dv-timings.c: fix too strict blanking sanity checks
Rafał Miłecki rafal@milecki.pl Revert "net: dsa: b53: Fix valid setting for MDB entries"
Juergen Gross jgross@suse.com xen/netback: don't call kfree_skb() with interrupts disabled
Juergen Gross jgross@suse.com xen/netback: do some code cleanup
Ross Lagerwall ross.lagerwall@citrix.com xen/netback: Ensure protocol headers don't fall in the non-linear area
Jann Horn jannh@google.com mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
Jann Horn jannh@google.com mm/khugepaged: fix GUP-fast interaction by sending IPI
Jann Horn jannh@google.com mm/khugepaged: take the right locks for page table retraction
Davide Tronchin davide.tronchin.94@gmail.com net: usb: qmi_wwan: add u-blox 0x1342 composition
Dominique Martinet asmadeus@codewreck.org 9p/xen: check logical size for buffer size
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp fbcon: Use kzalloc() in fbcon_prepare_logo()
Andreas Kemnade andreas@kemnade.info regulator: twl6030: fix get status of twl6032 regulators
Srinivasa Rao Mandadapu quic_srivasam@quicinc.com ASoC: soc-pcm: Add NULL check in BE reparenting
Filipe Manana fdmanana@suse.com btrfs: send: avoid unaligned encoded writes when attempting to clone range
Kees Cook keescook@chromium.org ALSA: seq: Fix function prototype mismatch in snd_seq_expand_var_event
Konrad Dybcio konrad.dybcio@linaro.org regulator: slg51000: Wait after asserting CS pin
GUO Zihua guozihua@huawei.com 9p/fd: Use P9_HDRSZ for header size
Johan Jonker jbx6244@gmail.com ARM: dts: rockchip: disable arm_global_timer on rk3066 and rk3188
Giulio Benetti giulio.benetti@benettiengineering.com ARM: 9266/1: mm: fix no-MMU ZERO_PAGE() implementation
Tomislav Novak tnovak@fb.com ARM: 9251/1: perf: Fix stacktraces for tracepoint events in THUMB2 kernels
Johan Jonker jbx6244@gmail.com ARM: dts: rockchip: rk3188: fix lcdc1-rgb24 node name
Johan Jonker jbx6244@gmail.com ARM: dts: rockchip: fix ir-receiver node names
Sebastian Reichel sebastian.reichel@collabora.com arm: dts: rockchip: fix node name for hym8563 rtc
FUKAUMI Naoki naoki@radxa.com arm64: dts: rockchip: keep I2S1 disabled for GPIO function on ROCK Pi 4 series
-------------
Diffstat:
Makefile | 4 +- arch/arm/boot/dts/rk3036-evb.dts | 2 +- arch/arm/boot/dts/rk3188-radxarock.dts | 2 +- arch/arm/boot/dts/rk3188.dtsi | 3 +- arch/arm/boot/dts/rk3288-evb-act8846.dts | 2 +- arch/arm/boot/dts/rk3288-firefly.dtsi | 2 +- arch/arm/boot/dts/rk3288-miqi.dts | 2 +- arch/arm/boot/dts/rk3288-rock2-square.dts | 2 +- arch/arm/boot/dts/rk3xxx.dtsi | 7 + arch/arm/include/asm/perf_event.h | 2 +- arch/arm/include/asm/pgtable-nommu.h | 6 - arch/arm/include/asm/pgtable.h | 16 +- arch/arm/mm/nommu.c | 19 + arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dts | 1 - arch/s390/kvm/vsie.c | 4 +- drivers/gpio/gpio-amd8111.c | 4 + drivers/gpu/drm/bridge/ti-sn65dsi86.c | 4 +- drivers/gpu/drm/drm_gem_shmem_helper.c | 4 +- drivers/hid/hid-core.c | 3 + drivers/hid/hid-ids.h | 3 + drivers/hid/hid-lg4ff.c | 6 + drivers/hid/hid-quirks.c | 3 + drivers/media/v4l2-core/v4l2-dv-timings.c | 20 +- drivers/net/can/usb/esd_usb2.c | 6 + drivers/net/dsa/b53/b53_common.c | 1 + drivers/net/ethernet/aeroflex/greth.c | 1 + drivers/net/ethernet/cavium/thunder/nicvf_main.c | 4 +- drivers/net/ethernet/hisilicon/hisi_femac.c | 2 +- drivers/net/ethernet/hisilicon/hix5hd2_gmac.c | 2 +- drivers/net/ethernet/intel/e1000e/netdev.c | 4 +- drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 6 +- drivers/net/ethernet/intel/i40e/i40e_main.c | 19 +- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 + drivers/net/ethernet/intel/igb/igb_ethtool.c | 2 + drivers/net/ethernet/marvell/mvneta.c | 2 +- drivers/net/ethernet/microchip/encx24j600-regmap.c | 4 +- .../net/ethernet/stmicro/stmmac/stmmac_platform.c | 8 +- drivers/net/ieee802154/ca8210.c | 2 +- drivers/net/ieee802154/cc2520.c | 2 +- drivers/net/plip/plip.c | 4 +- drivers/net/usb/qmi_wwan.c | 1 + drivers/net/xen-netback/common.h | 14 +- drivers/net/xen-netback/interface.c | 22 +- drivers/net/xen-netback/netback.c | 229 +-- drivers/net/xen-netback/rx.c | 10 +- drivers/net/xen-netfront.c | 6 + drivers/nvme/host/core.c | 8 +- drivers/regulator/slg51000-regulator.c | 2 + drivers/regulator/twl6030-regulator.c | 15 +- drivers/video/fbdev/core/fbcon.c | 2 +- fs/btrfs/send.c | 24 +- include/asm-generic/tlb.h | 4 + include/linux/cgroup.h | 1 + include/linux/hugetlb.h | 6 +- kernel/cgroup/cgroup-internal.h | 1 - mm/gup.c | 15 +- mm/hugetlb.c | 28 +- mm/khugepaged.c | 47 +- mm/memcontrol.c | 15 +- mm/mmu_gather.c | 5 + net/9p/trans_fd.c | 6 +- net/9p/trans_xen.c | 9 + net/bluetooth/6lowpan.c | 1 + net/bluetooth/af_bluetooth.c | 4 +- net/can/af_can.c | 4 +- net/dsa/tag_ksz.c | 3 +- net/ipv4/fib_frontend.c | 3 + net/ipv4/fib_semantics.c | 1 + net/ipv6/ip6_output.c | 5 + net/mac802154/iface.c | 1 + net/nfc/nci/ntf.c | 6 + net/tipc/link.c | 4 +- net/unix/diag.c | 20 +- sound/core/seq/seq_memory.c | 11 +- sound/soc/soc-pcm.c | 2 + tools/testing/selftests/net/fib_tests.sh | 1727 -------------------- tools/testing/selftests/net/rtnetlink.sh | 2 +- 77 files changed, 476 insertions(+), 1980 deletions(-)
From: FUKAUMI Naoki naoki@radxa.com
[ Upstream commit 849c19d14940b87332d5d59c7fc581d73f2099fd ]
I2S1 pins are exposed on 40-pin header on Radxa ROCK Pi 4 series. their default function is GPIO, so I2S1 need to be disabled.
Signed-off-by: FUKAUMI Naoki naoki@radxa.com Link: https://lore.kernel.org/r/20220924112812.1219-1-naoki@radxa.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dts | 1 - 1 file changed, 1 deletion(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dts b/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dts index da3b031d4bef..79d04a664b82 100644 --- a/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dts +++ b/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dts @@ -441,7 +441,6 @@ &i2s1 { rockchip,playback-channels = <2>; rockchip,capture-channels = <2>; - status = "okay"; };
&i2s2 {
From: Sebastian Reichel sebastian.reichel@collabora.com
[ Upstream commit 17b57beafccb4569accbfc8c11390744cf59c021 ]
Fix the node name for hym8563 in all arm rockchip devicetrees.
Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Link: https://lore.kernel.org/r/20221024165549.74574-4-sebastian.reichel@collabora... Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/rk3036-evb.dts | 2 +- arch/arm/boot/dts/rk3288-evb-act8846.dts | 2 +- arch/arm/boot/dts/rk3288-firefly.dtsi | 2 +- arch/arm/boot/dts/rk3288-miqi.dts | 2 +- arch/arm/boot/dts/rk3288-rock2-square.dts | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/arm/boot/dts/rk3036-evb.dts b/arch/arm/boot/dts/rk3036-evb.dts index 2a7e6624efb9..ea23ba98625e 100644 --- a/arch/arm/boot/dts/rk3036-evb.dts +++ b/arch/arm/boot/dts/rk3036-evb.dts @@ -31,7 +31,7 @@ &i2c1 { status = "okay";
- hym8563: hym8563@51 { + hym8563: rtc@51 { compatible = "haoyu,hym8563"; reg = <0x51>; #clock-cells = <0>; diff --git a/arch/arm/boot/dts/rk3288-evb-act8846.dts b/arch/arm/boot/dts/rk3288-evb-act8846.dts index 80080767c365..9ac40c100e3f 100644 --- a/arch/arm/boot/dts/rk3288-evb-act8846.dts +++ b/arch/arm/boot/dts/rk3288-evb-act8846.dts @@ -53,7 +53,7 @@ vin-supply = <&vcc_sys>; };
- hym8563@51 { + rtc@51 { compatible = "haoyu,hym8563"; reg = <0x51>;
diff --git a/arch/arm/boot/dts/rk3288-firefly.dtsi b/arch/arm/boot/dts/rk3288-firefly.dtsi index 5e0a19004e46..8d418881166f 100644 --- a/arch/arm/boot/dts/rk3288-firefly.dtsi +++ b/arch/arm/boot/dts/rk3288-firefly.dtsi @@ -233,7 +233,7 @@ vin-supply = <&vcc_sys>; };
- hym8563: hym8563@51 { + hym8563: rtc@51 { compatible = "haoyu,hym8563"; reg = <0x51>; #clock-cells = <0>; diff --git a/arch/arm/boot/dts/rk3288-miqi.dts b/arch/arm/boot/dts/rk3288-miqi.dts index c41d012c8850..bf8f3f9bb4d5 100644 --- a/arch/arm/boot/dts/rk3288-miqi.dts +++ b/arch/arm/boot/dts/rk3288-miqi.dts @@ -145,7 +145,7 @@ vin-supply = <&vcc_sys>; };
- hym8563: hym8563@51 { + hym8563: rtc@51 { compatible = "haoyu,hym8563"; reg = <0x51>; #clock-cells = <0>; diff --git a/arch/arm/boot/dts/rk3288-rock2-square.dts b/arch/arm/boot/dts/rk3288-rock2-square.dts index cdcdc921ee09..7220de126635 100644 --- a/arch/arm/boot/dts/rk3288-rock2-square.dts +++ b/arch/arm/boot/dts/rk3288-rock2-square.dts @@ -165,7 +165,7 @@ };
&i2c0 { - hym8563: hym8563@51 { + hym8563: rtc@51 { compatible = "haoyu,hym8563"; reg = <0x51>; #clock-cells = <0>;
From: Johan Jonker jbx6244@gmail.com
[ Upstream commit dd847fe34cdf1e89afed1af24986359f13082bfb ]
Fix ir-receiver node names on Rockchip boards, so that they match with regex: '^ir(-receiver)?(@[a-f0-9]+)?$'
Signed-off-by: Johan Jonker jbx6244@gmail.com Link: https://lore.kernel.org/r/ea5af279-f44c-afea-023d-bb37f5a0d58d@gmail.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/rk3188-radxarock.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/rk3188-radxarock.dts b/arch/arm/boot/dts/rk3188-radxarock.dts index c9a7f5409960..f8e51ca3ee00 100644 --- a/arch/arm/boot/dts/rk3188-radxarock.dts +++ b/arch/arm/boot/dts/rk3188-radxarock.dts @@ -67,7 +67,7 @@ #sound-dai-cells = <0>; };
- ir_recv: gpio-ir-receiver { + ir_recv: ir-receiver { compatible = "gpio-ir-receiver"; gpios = <&gpio0 RK_PB2 GPIO_ACTIVE_LOW>; pinctrl-names = "default";
From: Johan Jonker jbx6244@gmail.com
[ Upstream commit 11871e20bcb23c00966e785a124fb72bc8340af4 ]
The lcdc1-rgb24 node name is out of line with the rest of the rk3188 lcdc1 node, so fix it.
Signed-off-by: Johan Jonker jbx6244@gmail.com Link: https://lore.kernel.org/r/7b9c0a6f-626b-07e8-ae74-7e0f08b8d241@gmail.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/rk3188.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/rk3188.dtsi b/arch/arm/boot/dts/rk3188.dtsi index ee8a24a0e3cb..95d558dc163c 100644 --- a/arch/arm/boot/dts/rk3188.dtsi +++ b/arch/arm/boot/dts/rk3188.dtsi @@ -404,7 +404,7 @@ rockchip,pins = <2 RK_PD3 1 &pcfg_pull_none>; };
- lcdc1_rgb24: ldcd1-rgb24 { + lcdc1_rgb24: lcdc1-rgb24 { rockchip,pins = <2 RK_PA0 1 &pcfg_pull_none>, <2 RK_PA1 1 &pcfg_pull_none>, <2 RK_PA2 1 &pcfg_pull_none>,
From: Tomislav Novak tnovak@fb.com
[ Upstream commit 612695bccfdbd52004551308a55bae410e7cd22f ]
Store the frame address where arm_get_current_stackframe() looks for it (ARM_r7 instead of ARM_fp if CONFIG_THUMB2_KERNEL=y). Otherwise frame->fp gets set to 0, causing unwind_frame() to fail.
# bpftrace -e 't:sched:sched_switch { @[kstack] = count(); exit(); }' Attaching 1 probe... @[ __schedule+1059 ]: 1
A typical first unwind instruction is 0x97 (SP = R7), so after executing it SP ends up being 0 and -URC_FAILURE is returned.
unwind_frame(pc = ac9da7d7 lr = 00000000 sp = c69bdda0 fp = 00000000) unwind_find_idx(ac9da7d7) unwind_exec_insn: insn = 00000097 unwind_exec_insn: fp = 00000000 sp = 00000000 lr = 00000000 pc = 00000000
With this patch:
# bpftrace -e 't:sched:sched_switch { @[kstack] = count(); exit(); }' Attaching 1 probe... @[ __schedule+1059 __schedule+1059 schedule+79 schedule_hrtimeout_range_clock+163 schedule_hrtimeout_range+17 ep_poll+471 SyS_epoll_wait+111 sys_epoll_pwait+231 __ret_fast_syscall+1 ]: 1
Link: https://lore.kernel.org/r/20220920230728.2617421-1-tnovak@fb.com/
Reviewed-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Tomislav Novak tnovak@fb.com Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/include/asm/perf_event.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/perf_event.h b/arch/arm/include/asm/perf_event.h index fe87397c3d8c..bdbc1e590891 100644 --- a/arch/arm/include/asm/perf_event.h +++ b/arch/arm/include/asm/perf_event.h @@ -17,7 +17,7 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
#define perf_arch_fetch_caller_regs(regs, __ip) { \ (regs)->ARM_pc = (__ip); \ - (regs)->ARM_fp = (unsigned long) __builtin_frame_address(0); \ + frame_pointer((regs)) = (unsigned long) __builtin_frame_address(0); \ (regs)->ARM_sp = current_stack_pointer; \ (regs)->ARM_cpsr = SVC_MODE; \ }
From: Giulio Benetti giulio.benetti@benettiengineering.com
[ Upstream commit 340a982825f76f1cff0daa605970fe47321b5ee7 ]
Actually in no-MMU SoCs(i.e. i.MXRT) ZERO_PAGE(vaddr) expands to ``` virt_to_page(0) ``` that in order expands to: ``` pfn_to_page(virt_to_pfn(0)) ``` and then virt_to_pfn(0) to: ``` ((((unsigned long)(0) - PAGE_OFFSET) >> PAGE_SHIFT) + PHYS_PFN_OFFSET) ``` where PAGE_OFFSET and PHYS_PFN_OFFSET are the DRAM offset(0x80000000) and PAGE_SHIFT is 12. This way we obtain 16MB(0x01000000) summed to the base of DRAM(0x80000000). When ZERO_PAGE(0) is then used, for example in bio_add_page(), the page gets an address that is out of DRAM bounds. So instead of using fake virtual page 0 let's allocate a dedicated zero_page during paging_init() and assign it to a global 'struct page * empty_zero_page' the same way mmu.c does and it's the same approach used in m68k with commit dc068f462179 as discussed here[0]. Then let's move ZERO_PAGE() definition to the top of pgtable.h to be in common between mmu.c and nommu.c.
[0]: https://lore.kernel.org/linux-m68k/2a462b23-5b8e-bbf4-ec7d-778434a3b9d7@goog... ad140743174d6b3070364d3c9a5179b
Signed-off-by: Giulio Benetti giulio.benetti@benettiengineering.com Reviewed-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/include/asm/pgtable-nommu.h | 6 ------ arch/arm/include/asm/pgtable.h | 16 +++++++++------- arch/arm/mm/nommu.c | 19 +++++++++++++++++++ 3 files changed, 28 insertions(+), 13 deletions(-)
diff --git a/arch/arm/include/asm/pgtable-nommu.h b/arch/arm/include/asm/pgtable-nommu.h index 010fa1a35a68..e8ac2f95fb37 100644 --- a/arch/arm/include/asm/pgtable-nommu.h +++ b/arch/arm/include/asm/pgtable-nommu.h @@ -51,12 +51,6 @@
typedef pte_t *pte_addr_t;
-/* - * ZERO_PAGE is a global shared page that is always zero: used - * for zero-mapped memory areas etc.. - */ -#define ZERO_PAGE(vaddr) (virt_to_page(0)) - /* * Mark the prot value as uncacheable and unbufferable. */ diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h index 3ae120cd1715..ecfd6e7e128f 100644 --- a/arch/arm/include/asm/pgtable.h +++ b/arch/arm/include/asm/pgtable.h @@ -10,6 +10,15 @@ #include <linux/const.h> #include <asm/proc-fns.h>
+#ifndef __ASSEMBLY__ +/* + * ZERO_PAGE is a global shared page that is always zero: used + * for zero-mapped memory areas etc.. + */ +extern struct page *empty_zero_page; +#define ZERO_PAGE(vaddr) (empty_zero_page) +#endif + #ifndef CONFIG_MMU
#include <asm-generic/4level-fixup.h> @@ -166,13 +175,6 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn, #define __S111 __PAGE_SHARED_EXEC
#ifndef __ASSEMBLY__ -/* - * ZERO_PAGE is a global shared page that is always zero: used - * for zero-mapped memory areas etc.. - */ -extern struct page *empty_zero_page; -#define ZERO_PAGE(vaddr) (empty_zero_page) -
extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
diff --git a/arch/arm/mm/nommu.c b/arch/arm/mm/nommu.c index 24ecf8d30a1e..a3ad8a1b0e07 100644 --- a/arch/arm/mm/nommu.c +++ b/arch/arm/mm/nommu.c @@ -26,6 +26,13 @@
unsigned long vectors_base;
+/* + * empty_zero_page is a special page that is used for + * zero-initialized data and COW. + */ +struct page *empty_zero_page; +EXPORT_SYMBOL(empty_zero_page); + #ifdef CONFIG_ARM_MPU struct mpu_rgn_info mpu_rgn_info; #endif @@ -148,9 +155,21 @@ void __init adjust_lowmem_bounds(void) */ void __init paging_init(const struct machine_desc *mdesc) { + void *zero_page; + early_trap_init((void *)vectors_base); mpu_setup(); + + /* allocate the zero page. */ + zero_page = memblock_alloc(PAGE_SIZE, PAGE_SIZE); + if (!zero_page) + panic("%s: Failed to allocate %lu bytes align=0x%lx\n", + __func__, PAGE_SIZE, PAGE_SIZE); + bootmem_init(); + + empty_zero_page = virt_to_page(zero_page); + flush_dcache_page(empty_zero_page); }
/*
From: Johan Jonker jbx6244@gmail.com
[ Upstream commit da74858a475782a3f16470907814c8cc5950ad68 ]
The clock source and the sched_clock provided by the arm_global_timer on Rockchip rk3066a/rk3188 are quite unstable because their rates depend on the CPU frequency.
Recent changes to the arm_global_timer driver makes it impossible to use.
On the other side, the arm_global_timer has a higher rating than the ROCKCHIP_TIMER, it will be selected by default by the time framework while we want to use the stable Rockchip clock source.
Keep the arm_global_timer disabled in order to have the DW_APB_TIMER (rk3066a) or ROCKCHIP_TIMER (rk3188) selected by default.
Signed-off-by: Johan Jonker jbx6244@gmail.com Link: https://lore.kernel.org/r/f275ca8d-fd0a-26e5-b978-b7f3df815e0a@gmail.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/rk3188.dtsi | 1 - arch/arm/boot/dts/rk3xxx.dtsi | 7 +++++++ 2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/rk3188.dtsi b/arch/arm/boot/dts/rk3188.dtsi index 95d558dc163c..5e8ba80d7b4f 100644 --- a/arch/arm/boot/dts/rk3188.dtsi +++ b/arch/arm/boot/dts/rk3188.dtsi @@ -632,7 +632,6 @@
&global_timer { interrupts = <GIC_PPI 11 (GIC_CPU_MASK_SIMPLE(4) | IRQ_TYPE_EDGE_RISING)>; - status = "disabled"; };
&local_timer { diff --git a/arch/arm/boot/dts/rk3xxx.dtsi b/arch/arm/boot/dts/rk3xxx.dtsi index bce0b05ef7bf..0580eea90fbc 100644 --- a/arch/arm/boot/dts/rk3xxx.dtsi +++ b/arch/arm/boot/dts/rk3xxx.dtsi @@ -108,6 +108,13 @@ reg = <0x1013c200 0x20>; interrupts = <GIC_PPI 11 (GIC_CPU_MASK_SIMPLE(2) | IRQ_TYPE_EDGE_RISING)>; clocks = <&cru CORE_PERI>; + status = "disabled"; + /* The clock source and the sched_clock provided by the arm_global_timer + * on Rockchip rk3066a/rk3188 are quite unstable because their rates + * depend on the CPU frequency. + * Keep the arm_global_timer disabled in order to have the + * DW_APB_TIMER (rk3066a) or ROCKCHIP_TIMER (rk3188) selected by default. + */ };
local_timer: local-timer@1013c600 {
From: GUO Zihua guozihua@huawei.com
[ Upstream commit 6854fadbeee10891ed74246bdc05031906b6c8cf ]
Cleanup hardcoded header sizes to use P9_HDRSZ instead of '7'
Link: https://lkml.kernel.org/r/20221117091159.31533-4-guozihua@huawei.com Signed-off-by: GUO Zihua guozihua@huawei.com Reviewed-by: Christian Schoenebeck linux_oss@crudebyte.com [Dominique: commit message adjusted to make sense after offset size adjustment got removed] Signed-off-by: Dominique Martinet asmadeus@codewreck.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/9p/trans_fd.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c index 23c1d78ab1e4..872568cca926 100644 --- a/net/9p/trans_fd.c +++ b/net/9p/trans_fd.c @@ -118,7 +118,7 @@ struct p9_conn { struct list_head unsent_req_list; struct p9_req_t *rreq; struct p9_req_t *wreq; - char tmp_buf[7]; + char tmp_buf[P9_HDRSZ]; struct p9_fcall rc; int wpos; int wsize; @@ -291,7 +291,7 @@ static void p9_read_work(struct work_struct *work) if (!m->rc.sdata) { m->rc.sdata = m->tmp_buf; m->rc.offset = 0; - m->rc.capacity = 7; /* start by reading header */ + m->rc.capacity = P9_HDRSZ; /* start by reading header */ }
clear_bit(Rpending, &m->wsched); @@ -314,7 +314,7 @@ static void p9_read_work(struct work_struct *work) p9_debug(P9_DEBUG_TRANS, "got new header\n");
/* Header size */ - m->rc.size = 7; + m->rc.size = P9_HDRSZ; err = p9_parse_header(&m->rc, &m->rc.size, NULL, NULL, 0); if (err) { p9_debug(P9_DEBUG_ERROR,
From: Konrad Dybcio konrad.dybcio@linaro.org
[ Upstream commit 0b24dfa587c6cc7484cfb170da5c7dd73451f670 ]
Sony's downstream driver [1], among some other changes, adds a seemingly random 10ms usleep_range, which turned out to be necessary for the hardware to function properly on at least Sony Xperia 1 IV. Without this, I2C transactions with the SLG51000 straight up fail.
Relax (10-10ms -> 10-11ms) and add the aforementioned sleep to make sure the hardware has some time to wake up.
(nagara-2.0.0-mlc/vendor/semc/hardware/camera-kernel-module/) [1] https://developer.sony.com/file/download/open-source-archive-for-64-0-m-4-29...
Signed-off-by: Konrad Dybcio konrad.dybcio@linaro.org Link: https://lore.kernel.org/r/20221118131035.54874-1-konrad.dybcio@linaro.org Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/slg51000-regulator.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/regulator/slg51000-regulator.c b/drivers/regulator/slg51000-regulator.c index a0565daecace..5a18d7e620a5 100644 --- a/drivers/regulator/slg51000-regulator.c +++ b/drivers/regulator/slg51000-regulator.c @@ -465,6 +465,8 @@ static int slg51000_i2c_probe(struct i2c_client *client, chip->cs_gpiod = cs_gpiod; }
+ usleep_range(10000, 11000); + i2c_set_clientdata(client, chip); chip->chip_irq = client->irq; chip->dev = dev;
From: Kees Cook keescook@chromium.org
[ Upstream commit 05530ef7cf7c7d700f6753f058999b1b5099a026 ]
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG), indirect call targets are validated against the expected function pointer prototype to make sure the call target is valid to help mitigate ROP attacks. If they are not identical, there is a failure at run time, which manifests as either a kernel panic or thread getting killed.
seq_copy_in_user() and seq_copy_in_kernel() did not have prototypes matching snd_seq_dump_func_t. Adjust this and remove the casts. There are not resulting binary output differences.
This was found as a result of Clang's new -Wcast-function-type-strict flag, which is more sensitive than the simpler -Wcast-function-type, which only checks for type width mismatches.
Reported-by: kernel test robot lkp@intel.com Link: https://lore.kernel.org/lkml/202211041527.HD8TLSE1-lkp@intel.com Cc: Jaroslav Kysela perex@perex.cz Cc: Takashi Iwai tiwai@suse.com Cc: "Gustavo A. R. Silva" gustavoars@kernel.org Cc: alsa-devel@alsa-project.org Signed-off-by: Kees Cook keescook@chromium.org Link: https://lore.kernel.org/r/20221118232346.never.380-kees@kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/core/seq/seq_memory.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/sound/core/seq/seq_memory.c b/sound/core/seq/seq_memory.c index 65db1a7c77b7..bb76a2dd0a2f 100644 --- a/sound/core/seq/seq_memory.c +++ b/sound/core/seq/seq_memory.c @@ -112,15 +112,19 @@ EXPORT_SYMBOL(snd_seq_dump_var_event); * expand the variable length event to linear buffer space. */
-static int seq_copy_in_kernel(char **bufptr, const void *src, int size) +static int seq_copy_in_kernel(void *ptr, void *src, int size) { + char **bufptr = ptr; + memcpy(*bufptr, src, size); *bufptr += size; return 0; }
-static int seq_copy_in_user(char __user **bufptr, const void *src, int size) +static int seq_copy_in_user(void *ptr, void *src, int size) { + char __user **bufptr = ptr; + if (copy_to_user(*bufptr, src, size)) return -EFAULT; *bufptr += size; @@ -149,8 +153,7 @@ int snd_seq_expand_var_event(const struct snd_seq_event *event, int count, char return newlen; } err = snd_seq_dump_var_event(event, - in_kernel ? (snd_seq_dump_func_t)seq_copy_in_kernel : - (snd_seq_dump_func_t)seq_copy_in_user, + in_kernel ? seq_copy_in_kernel : seq_copy_in_user, &buf); return err < 0 ? err : newlen; }
From: Filipe Manana fdmanana@suse.com
[ Upstream commit a11452a3709e217492798cf3686ac2cc8eb3fb51 ]
When trying to see if we can clone a file range, there are cases where we end up sending two write operations in case the inode from the source root has an i_size that is not sector size aligned and the length from the current offset to its i_size is less than the remaining length we are trying to clone.
Issuing two write operations when we could instead issue a single write operation is not incorrect. However it is not optimal, specially if the extents are compressed and the flag BTRFS_SEND_FLAG_COMPRESSED was passed to the send ioctl. In that case we can end up sending an encoded write with an offset that is not sector size aligned, which makes the receiver fallback to decompressing the data and writing it using regular buffered IO (so re-compressing the data in case the fs is mounted with compression enabled), because encoded writes fail with -EINVAL when an offset is not sector size aligned.
The following example, which triggered a bug in the receiver code for the fallback logic of decompressing + regular buffer IO and is fixed by the patchset referred in a Link at the bottom of this changelog, is an example where we have the non-optimal behaviour due to an unaligned encoded write:
$ cat test.sh #!/bin/bash
DEV=/dev/sdj MNT=/mnt/sdj
mkfs.btrfs -f $DEV > /dev/null mount -o compress $DEV $MNT
# File foo has a size of 33K, not aligned to the sector size. xfs_io -f -c "pwrite -S 0xab 0 33K" $MNT/foo
xfs_io -f -c "pwrite -S 0xcd 0 64K" $MNT/bar
# Now clone the first 32K of file bar into foo at offset 0. xfs_io -c "reflink $MNT/bar 0 0 32K" $MNT/foo
# Snapshot the default subvolume and create a full send stream (v2). btrfs subvolume snapshot -r $MNT $MNT/snap
btrfs send --compressed-data -f /tmp/test.send $MNT/snap
echo -e "\nFile bar in the original filesystem:" od -A d -t x1 $MNT/snap/bar
umount $MNT mkfs.btrfs -f $DEV > /dev/null mount $DEV $MNT
echo -e "\nReceiving stream in a new filesystem..." btrfs receive -f /tmp/test.send $MNT
echo -e "\nFile bar in the new filesystem:" od -A d -t x1 $MNT/snap/bar
umount $MNT
Before this patch, the send stream included one regular write and one encoded write for file 'bar', with the later being not sector size aligned and causing the receiver to fallback to decompression + buffered writes. The output of the btrfs receive command in verbose mode (-vvv):
(...) mkfile o258-7-0 rename o258-7-0 -> bar utimes clone bar - source=foo source offset=0 offset=0 length=32768 write bar - offset=32768 length=1024 encoded_write bar - offset=33792, len=4096, unencoded_offset=33792, unencoded_file_len=31744, unencoded_len=65536, compression=1, encryption=0 encoded_write bar - falling back to decompress and write due to errno 22 ("Invalid argument") (...)
This patch avoids the regular write followed by an unaligned encoded write so that we end up sending a single encoded write that is aligned. So after this patch the stream content is (output of btrfs receive -vvv):
(...) mkfile o258-7-0 rename o258-7-0 -> bar utimes clone bar - source=foo source offset=0 offset=0 length=32768 encoded_write bar - offset=32768, len=4096, unencoded_offset=32768, unencoded_file_len=32768, unencoded_len=65536, compression=1, encryption=0 (...)
So we get more optimal behaviour and avoid the silent data loss bug in versions of btrfs-progs affected by the bug referred by the Link tag below (btrfs-progs v5.19, v5.19.1, v6.0 and v6.0.1).
Link: https://lore.kernel.org/linux-btrfs/cover.1668529099.git.fdmanana@suse.com/ Reviewed-by: Boris Burkov boris@bur.io Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/send.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index e258fc484cea..fb1996980d26 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -5405,6 +5405,7 @@ static int clone_range(struct send_ctx *sctx, u64 ext_len; u64 clone_len; u64 clone_data_offset; + bool crossed_src_i_size = false;
if (slot >= btrfs_header_nritems(leaf)) { ret = btrfs_next_leaf(clone_root->root, path); @@ -5461,8 +5462,10 @@ static int clone_range(struct send_ctx *sctx, if (key.offset >= clone_src_i_size) break;
- if (key.offset + ext_len > clone_src_i_size) + if (key.offset + ext_len > clone_src_i_size) { ext_len = clone_src_i_size - key.offset; + crossed_src_i_size = true; + }
clone_data_offset = btrfs_file_extent_offset(leaf, ei); if (btrfs_file_extent_disk_bytenr(leaf, ei) == disk_byte) { @@ -5522,6 +5525,25 @@ static int clone_range(struct send_ctx *sctx, ret = send_clone(sctx, offset, clone_len, clone_root); } + } else if (crossed_src_i_size && clone_len < len) { + /* + * If we are at i_size of the clone source inode and we + * can not clone from it, terminate the loop. This is + * to avoid sending two write operations, one with a + * length matching clone_len and the final one after + * this loop with a length of len - clone_len. + * + * When using encoded writes (BTRFS_SEND_FLAG_COMPRESSED + * was passed to the send ioctl), this helps avoid + * sending an encoded write for an offset that is not + * sector size aligned, in case the i_size of the source + * inode is not sector size aligned. That will make the + * receiver fallback to decompression of the data and + * writing it using regular buffered IO, therefore while + * not incorrect, it's not optimal due decompression and + * possible re-compression at the receiver. + */ + break; } else { ret = send_extent_data(sctx, offset, clone_len); }
From: Srinivasa Rao Mandadapu quic_srivasam@quicinc.com
[ Upstream commit db8f91d424fe0ea6db337aca8bc05908bbce1498 ]
Add NULL check in dpcm_be_reparent API, to handle kernel NULL pointer dereference error. The issue occurred in fuzzing test.
Signed-off-by: Srinivasa Rao Mandadapu quic_srivasam@quicinc.com Link: https://lore.kernel.org/r/1669098673-29703-1-git-send-email-quic_srivasam@qu... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/soc-pcm.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c index 1196167364d4..2f1ab70a68fc 100644 --- a/sound/soc/soc-pcm.c +++ b/sound/soc/soc-pcm.c @@ -1201,6 +1201,8 @@ static void dpcm_be_reparent(struct snd_soc_pcm_runtime *fe, return;
be_substream = snd_soc_dpcm_get_substream(be, stream); + if (!be_substream) + return;
for_each_dpcm_fe(be, stream, dpcm) { if (dpcm->fe == fe)
From: Andreas Kemnade andreas@kemnade.info
[ Upstream commit 31a6297b89aabc81b274c093a308a7f5b55081a7 ]
Status is reported as always off in the 6032 case. Status reporting now matches the logic in the setters. Once of the differences to the 6030 is that there are no groups, therefore the state needs to be read out in the lower bits.
Signed-off-by: Andreas Kemnade andreas@kemnade.info Link: https://lore.kernel.org/r/20221120221208.3093727-3-andreas@kemnade.info Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/twl6030-regulator.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/regulator/twl6030-regulator.c b/drivers/regulator/twl6030-regulator.c index 4ffb32ffec35..71625db3a6f1 100644 --- a/drivers/regulator/twl6030-regulator.c +++ b/drivers/regulator/twl6030-regulator.c @@ -67,6 +67,7 @@ struct twlreg_info { #define TWL6030_CFG_STATE_SLEEP 0x03 #define TWL6030_CFG_STATE_GRP_SHIFT 5 #define TWL6030_CFG_STATE_APP_SHIFT 2 +#define TWL6030_CFG_STATE_MASK 0x03 #define TWL6030_CFG_STATE_APP_MASK (0x03 << TWL6030_CFG_STATE_APP_SHIFT) #define TWL6030_CFG_STATE_APP(v) (((v) & TWL6030_CFG_STATE_APP_MASK) >>\ TWL6030_CFG_STATE_APP_SHIFT) @@ -128,13 +129,14 @@ static int twl6030reg_is_enabled(struct regulator_dev *rdev) if (grp < 0) return grp; grp &= P1_GRP_6030; + val = twlreg_read(info, TWL_MODULE_PM_RECEIVER, VREG_STATE); + val = TWL6030_CFG_STATE_APP(val); } else { + val = twlreg_read(info, TWL_MODULE_PM_RECEIVER, VREG_STATE); + val &= TWL6030_CFG_STATE_MASK; grp = 1; }
- val = twlreg_read(info, TWL_MODULE_PM_RECEIVER, VREG_STATE); - val = TWL6030_CFG_STATE_APP(val); - return grp && (val == TWL6030_CFG_STATE_ON); }
@@ -187,7 +189,12 @@ static int twl6030reg_get_status(struct regulator_dev *rdev)
val = twlreg_read(info, TWL_MODULE_PM_RECEIVER, VREG_STATE);
- switch (TWL6030_CFG_STATE_APP(val)) { + if (info->features & TWL6032_SUBCLASS) + val &= TWL6030_CFG_STATE_MASK; + else + val = TWL6030_CFG_STATE_APP(val); + + switch (val) { case TWL6030_CFG_STATE_ON: return REGULATOR_STATUS_NORMAL;
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
[ Upstream commit a6a00d7e8ffd78d1cdb7a43f1278f081038c638f ]
A kernel built with syzbot's config file reported that
scr_memcpyw(q, save, array3_size(logo_lines, new_cols, 2))
causes uninitialized "save" to be copied.
---------- [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0 [drm] Initialized vkms 1.0.0 20180514 for vkms on minor 1 Console: switching to colour frame buffer device 128x48 ===================================================== BUG: KMSAN: uninit-value in do_update_region+0x4b8/0xba0 do_update_region+0x4b8/0xba0 update_region+0x40d/0x840 fbcon_switch+0x3364/0x35e0 redraw_screen+0xae3/0x18a0 do_bind_con_driver+0x1cb3/0x1df0 do_take_over_console+0x11cb/0x13f0 fbcon_fb_registered+0xacc/0xfd0 register_framebuffer+0x1179/0x1320 __drm_fb_helper_initial_config_and_unlock+0x23ad/0x2b40 drm_fbdev_client_hotplug+0xbea/0xda0 drm_fbdev_generic_setup+0x65e/0x9d0 vkms_init+0x9f3/0xc76 (...snipped...)
Uninit was stored to memory at: fbcon_prepare_logo+0x143b/0x1940 fbcon_init+0x2c1b/0x31c0 visual_init+0x3e7/0x820 do_bind_con_driver+0x14a4/0x1df0 do_take_over_console+0x11cb/0x13f0 fbcon_fb_registered+0xacc/0xfd0 register_framebuffer+0x1179/0x1320 __drm_fb_helper_initial_config_and_unlock+0x23ad/0x2b40 drm_fbdev_client_hotplug+0xbea/0xda0 drm_fbdev_generic_setup+0x65e/0x9d0 vkms_init+0x9f3/0xc76 (...snipped...)
Uninit was created at: __kmem_cache_alloc_node+0xb69/0x1020 __kmalloc+0x379/0x680 fbcon_prepare_logo+0x704/0x1940 fbcon_init+0x2c1b/0x31c0 visual_init+0x3e7/0x820 do_bind_con_driver+0x14a4/0x1df0 do_take_over_console+0x11cb/0x13f0 fbcon_fb_registered+0xacc/0xfd0 register_framebuffer+0x1179/0x1320 __drm_fb_helper_initial_config_and_unlock+0x23ad/0x2b40 drm_fbdev_client_hotplug+0xbea/0xda0 drm_fbdev_generic_setup+0x65e/0x9d0 vkms_init+0x9f3/0xc76 (...snipped...)
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc4-00356-g8f2975c2bb4c #924 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 ----------
Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch Link: https://patchwork.freedesktop.org/patch/msgid/cad03d25-0ea0-32c4-8173-fd1895... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/video/fbdev/core/fbcon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c index 4a544e1e2038..673af5937489 100644 --- a/drivers/video/fbdev/core/fbcon.c +++ b/drivers/video/fbdev/core/fbcon.c @@ -604,7 +604,7 @@ static void fbcon_prepare_logo(struct vc_data *vc, struct fb_info *info, if (scr_readw(r) != vc->vc_video_erase_char) break; if (r != q && new_rows >= rows + logo_lines) { - save = kmalloc(array3_size(logo_lines, new_cols, 2), + save = kzalloc(array3_size(logo_lines, new_cols, 2), GFP_KERNEL); if (save) { int i = cols < new_cols ? cols : new_cols;
From: Dominique Martinet asmadeus@codewreck.org
[ Upstream commit 391c18cf776eb4569ecda1f7794f360fe0a45a26 ]
trans_xen did not check the data fits into the buffer before copying from the xen ring, but we probably should. Add a check that just skips the request and return an error to userspace if it did not fit
Tested-by: Stefano Stabellini sstabellini@kernel.org Reviewed-by: Christian Schoenebeck linux_oss@crudebyte.com Link: https://lkml.kernel.org/r/20221118135542.63400-1-asmadeus@codewreck.org Signed-off-by: Dominique Martinet asmadeus@codewreck.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/9p/trans_xen.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c index 2779ec1053a0..f043938ae782 100644 --- a/net/9p/trans_xen.c +++ b/net/9p/trans_xen.c @@ -230,6 +230,14 @@ static void p9_xen_response(struct work_struct *work) continue; }
+ if (h.size > req->rc.capacity) { + dev_warn(&priv->dev->dev, + "requested packet size too big: %d for tag %d with capacity %zd\n", + h.size, h.tag, req->rc.capacity); + req->status = REQ_STATUS_ERROR; + goto recv_error; + } + memcpy(&req->rc, &h, sizeof(h)); req->rc.offset = 0;
@@ -239,6 +247,7 @@ static void p9_xen_response(struct work_struct *work) masked_prod, &masked_cons, XEN_9PFS_RING_SIZE);
+recv_error: virt_mb(); cons += h.size; ring->intf->in_cons = cons;
From: Davide Tronchin davide.tronchin.94@gmail.com
[ Upstream commit a487069e11b6527373f7c6f435d8998051d0b5d9 ]
Add RmNet support for LARA-L6.
LARA-L6 module can be configured (by AT interface) in three different USB modes: * Default mode (Vendor ID: 0x1546 Product ID: 0x1341) with 4 serial interfaces * RmNet mode (Vendor ID: 0x1546 Product ID: 0x1342) with 4 serial interfaces and 1 RmNet virtual network interface * CDC-ECM mode (Vendor ID: 0x1546 Product ID: 0x1343) with 4 serial interface and 1 CDC-ECM virtual network interface
In RmNet mode LARA-L6 exposes the following interfaces: If 0: Diagnostic If 1: AT parser If 2: AT parser If 3: AT parset/alternative functions If 4: RMNET interface
Signed-off-by: Davide Tronchin davide.tronchin.94@gmail.com Acked-by: Bjørn Mork bjorn@mork.no Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 7f0e3b09f776..c310cdbfd583 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1374,6 +1374,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x0489, 0xe0b4, 0)}, /* Foxconn T77W968 LTE */ {QMI_FIXED_INTF(0x0489, 0xe0b5, 0)}, /* Foxconn T77W968 LTE with eSIM support*/ {QMI_FIXED_INTF(0x2692, 0x9025, 4)}, /* Cellient MPL200 (rebranded Qualcomm 05c6:9025) */ + {QMI_QUIRK_SET_DTR(0x1546, 0x1342, 4)}, /* u-blox LARA-L6 */
/* 4. Gobi 1000 devices */ {QMI_GOBI1K_DEVICE(0x05c6, 0x9212)}, /* Acer Gobi Modem Device */
From: Jann Horn jannh@google.com
commit 8d3c106e19e8d251da31ff4cc7462e4565d65084 upstream.
pagetable walks on address ranges mapped by VMAs can be done under the mmap lock, the lock of an anon_vma attached to the VMA, or the lock of the VMA's address_space. Only one of these needs to be held, and it does not need to be held in exclusive mode.
Under those circumstances, the rules for concurrent access to page table entries are:
- Terminal page table entries (entries that don't point to another page table) can be arbitrarily changed under the page table lock, with the exception that they always need to be consistent for hardware page table walks and lockless_pages_from_mm(). This includes that they can be changed into non-terminal entries. - Non-terminal page table entries (which point to another page table) can not be modified; readers are allowed to READ_ONCE() an entry, verify that it is non-terminal, and then assume that its value will stay as-is.
Retracting a page table involves modifying a non-terminal entry, so page-table-level locks are insufficient to protect against concurrent page table traversal; it requires taking all the higher-level locks under which it is possible to start a page walk in the relevant range in exclusive mode.
The collapse_huge_page() path for anonymous THP already follows this rule, but the shmem/file THP path was getting it wrong, making it possible for concurrent rmap-based operations to cause corruption.
Link: https://lkml.kernel.org/r/20221129154730.2274278-1-jannh@google.com Link: https://lkml.kernel.org/r/20221128180252.1684965-1-jannh@google.com Link: https://lkml.kernel.org/r/20221125213714.4115729-1-jannh@google.com Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP") Signed-off-by: Jann Horn jannh@google.com Reviewed-by: Yang Shi shy828301@gmail.com Acked-by: David Hildenbrand david@redhat.com Cc: John Hubbard jhubbard@nvidia.com Cc: Peter Xu peterx@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org [manual backport: this code was refactored from two copies into a common helper between 5.15 and 6.0] Signed-off-by: Jann Horn jannh@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- mm/khugepaged.c | 31 ++++++++++++++++++++++++++----- 1 file changed, 26 insertions(+), 5 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 3c2326568193..55631cd73939 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1326,6 +1326,14 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE)) return;
+ /* + * Symmetry with retract_page_tables(): Exclude MAP_PRIVATE mappings + * that got written to. Without this, we'd have to also lock the + * anon_vma if one exists. + */ + if (vma->anon_vma) + return; + hpage = find_lock_page(vma->vm_file->f_mapping, linear_page_index(vma, haddr)); if (!hpage) @@ -1338,6 +1346,19 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) if (!pmd) goto drop_hpage;
+ /* + * We need to lock the mapping so that from here on, only GUP-fast and + * hardware page walks can access the parts of the page tables that + * we're operating on. + */ + i_mmap_lock_write(vma->vm_file->f_mapping); + + /* + * This spinlock should be unnecessary: Nobody else should be accessing + * the page tables under spinlock protection here, only + * lockless_pages_from_mm() and the hardware page walker can access page + * tables while all the high-level locks are held in write mode. + */ start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
/* step 1: check all mapped PTEs are to the right huge page */ @@ -1384,12 +1405,12 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) }
/* step 4: collapse pmd */ - ptl = pmd_lock(vma->vm_mm, pmd); _pmd = pmdp_collapse_flush(vma, haddr, pmd); - spin_unlock(ptl); mm_dec_nr_ptes(mm); pte_free(mm, pmd_pgtable(_pmd));
+ i_mmap_unlock_write(vma->vm_file->f_mapping); + drop_hpage: unlock_page(hpage); put_page(hpage); @@ -1397,6 +1418,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
abort: pte_unmap_unlock(start_pte, ptl); + i_mmap_unlock_write(vma->vm_file->f_mapping); goto drop_hpage; }
@@ -1446,7 +1468,8 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * An alternative would be drop the check, but check that page * table is clear before calling pmdp_collapse_flush() under * ptl. It has higher chance to recover THP for the VMA, but - * has higher cost too. + * has higher cost too. It would also probably require locking + * the anon_vma. */ if (vma->anon_vma) continue; @@ -1468,10 +1491,8 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) */ if (down_write_trylock(&mm->mmap_sem)) { if (!khugepaged_test_exit(mm)) { - spinlock_t *ptl = pmd_lock(mm, pmd); /* assume page table is clear */ _pmd = pmdp_collapse_flush(vma, addr, pmd); - spin_unlock(ptl); mm_dec_nr_ptes(mm); pte_free(mm, pmd_pgtable(_pmd)); }
From: Jann Horn jannh@google.com
commit 2ba99c5e08812494bc57f319fb562f527d9bacd8 upstream.
Since commit 70cbc3cc78a99 ("mm: gup: fix the fast GUP race against THP collapse"), the lockless_pages_from_mm() fastpath rechecks the pmd_t to ensure that the page table was not removed by khugepaged in between.
However, lockless_pages_from_mm() still requires that the page table is not concurrently freed. Fix it by sending IPIs (if the architecture uses semi-RCU-style page table freeing) before freeing/reusing page tables.
Link: https://lkml.kernel.org/r/20221129154730.2274278-2-jannh@google.com Link: https://lkml.kernel.org/r/20221128180252.1684965-2-jannh@google.com Link: https://lkml.kernel.org/r/20221125213714.4115729-2-jannh@google.com Fixes: ba76149f47d8 ("thp: khugepaged") Signed-off-by: Jann Horn jannh@google.com Reviewed-by: Yang Shi shy828301@gmail.com Acked-by: David Hildenbrand david@redhat.com Cc: John Hubbard jhubbard@nvidia.com Cc: Peter Xu peterx@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org [manual backport: two of the three places in khugepaged that can free ptes were refactored into a common helper between 5.15 and 6.0; TLB flushing was refactored between 5.4 and 5.10] Signed-off-by: Jann Horn jannh@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/asm-generic/tlb.h | 4 ++++ mm/khugepaged.c | 3 +++ mm/mmu_gather.c | 5 +++++ 3 files changed, 12 insertions(+)
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index 268674c1d568..b06240b67199 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -190,12 +190,16 @@ extern void tlb_remove_table(struct mmu_gather *tlb, void *table); #define tlb_needs_table_invalidate() (true) #endif
+void tlb_remove_table_sync_one(void); + #else
#ifdef tlb_needs_table_invalidate #error tlb_needs_table_invalidate() requires HAVE_RCU_TABLE_FREE #endif
+static inline void tlb_remove_table_sync_one(void) { } + #endif /* CONFIG_HAVE_RCU_TABLE_FREE */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 55631cd73939..a8f2605cbd0d 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1060,6 +1060,7 @@ static void collapse_huge_page(struct mm_struct *mm, _pmd = pmdp_collapse_flush(vma, address, pmd); spin_unlock(pmd_ptl); mmu_notifier_invalidate_range_end(&range); + tlb_remove_table_sync_one();
spin_lock(pte_ptl); isolated = __collapse_huge_page_isolate(vma, address, pte); @@ -1407,6 +1408,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) /* step 4: collapse pmd */ _pmd = pmdp_collapse_flush(vma, haddr, pmd); mm_dec_nr_ptes(mm); + tlb_remove_table_sync_one(); pte_free(mm, pmd_pgtable(_pmd));
i_mmap_unlock_write(vma->vm_file->f_mapping); @@ -1494,6 +1496,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) /* assume page table is clear */ _pmd = pmdp_collapse_flush(vma, addr, pmd); mm_dec_nr_ptes(mm); + tlb_remove_table_sync_one(); pte_free(mm, pmd_pgtable(_pmd)); } up_write(&mm->mmap_sem); diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c index 7c1b8f67af7b..341aa036b03c 100644 --- a/mm/mmu_gather.c +++ b/mm/mmu_gather.c @@ -117,6 +117,11 @@ static void tlb_remove_table_smp_sync(void *arg) /* Simply deliver the interrupt */ }
+void tlb_remove_table_sync_one(void) +{ + smp_call_function(tlb_remove_table_smp_sync, NULL, 1); +} + static void tlb_remove_table_one(void *table) { /*
From: Jann Horn jannh@google.com
commit f268f6cf875f3220afc77bdd0bf1bb136eb54db9 upstream.
Any codepath that zaps page table entries must invoke MMU notifiers to ensure that secondary MMUs (like KVM) don't keep accessing pages which aren't mapped anymore. Secondary MMUs don't hold their own references to pages that are mirrored over, so failing to notify them can lead to page use-after-free.
I'm marking this as addressing an issue introduced in commit f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages"), but most of the security impact of this only came in commit 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP"), which actually omitted flushes for the removal of present PTEs, not just for the removal of empty page tables.
Link: https://lkml.kernel.org/r/20221129154730.2274278-3-jannh@google.com Link: https://lkml.kernel.org/r/20221128180252.1684965-3-jannh@google.com Link: https://lkml.kernel.org/r/20221125213714.4115729-3-jannh@google.com Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages") Signed-off-by: Jann Horn jannh@google.com Acked-by: David Hildenbrand david@redhat.com Reviewed-by: Yang Shi shy828301@gmail.com Cc: John Hubbard jhubbard@nvidia.com Cc: Peter Xu peterx@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org [manual backport: this code was refactored from two copies into a common helper between 5.15 and 6.0] Signed-off-by: Jann Horn jannh@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- mm/khugepaged.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index a8f2605cbd0d..8e67d2e5ff39 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1313,6 +1313,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) spinlock_t *ptl; int count = 0; int i; + struct mmu_notifier_range range;
if (!vma || !vma->vm_file || vma->vm_start > haddr || vma->vm_end < haddr + HPAGE_PMD_SIZE) @@ -1406,9 +1407,13 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) }
/* step 4: collapse pmd */ + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm, haddr, + haddr + HPAGE_PMD_SIZE); + mmu_notifier_invalidate_range_start(&range); _pmd = pmdp_collapse_flush(vma, haddr, pmd); mm_dec_nr_ptes(mm); tlb_remove_table_sync_one(); + mmu_notifier_invalidate_range_end(&range); pte_free(mm, pmd_pgtable(_pmd));
i_mmap_unlock_write(vma->vm_file->f_mapping); @@ -1493,11 +1498,19 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) */ if (down_write_trylock(&mm->mmap_sem)) { if (!khugepaged_test_exit(mm)) { + struct mmu_notifier_range range; + + mmu_notifier_range_init(&range, + MMU_NOTIFY_CLEAR, 0, + NULL, mm, addr, + addr + HPAGE_PMD_SIZE); + mmu_notifier_invalidate_range_start(&range); /* assume page table is clear */ _pmd = pmdp_collapse_flush(vma, addr, pmd); mm_dec_nr_ptes(mm); tlb_remove_table_sync_one(); pte_free(mm, pmd_pgtable(_pmd)); + mmu_notifier_invalidate_range_end(&range); } up_write(&mm->mmap_sem); } else {
From: Ross Lagerwall ross.lagerwall@citrix.com
[ Upstream commit ad7f402ae4f466647c3a669b8a6f3e5d4271c84a ]
In some cases, the frontend may send a packet where the protocol headers are spread across multiple slots. This would result in netback creating an skb where the protocol headers spill over into the non-linear area. Some drivers and NICs don't handle this properly resulting in an interface reset or worse.
This issue was introduced by the removal of an unconditional skb pull in the tx path to improve performance. Fix this without reintroducing the pull by setting up grant copy ops for as many slots as needed to reach the XEN_NETBACK_TX_COPY_LEN size. Adjust the rest of the code to handle multiple copy operations per skb.
This is XSA-423 / CVE-2022-3643.
Fixes: 7e5d7753956b ("xen-netback: remove unconditional __pskb_pull_tail() in guest Tx path") Signed-off-by: Ross Lagerwall ross.lagerwall@citrix.com Reviewed-by: Paul Durrant paul@xen.org Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/xen-netback/netback.c | 223 ++++++++++++++++-------------- 1 file changed, 123 insertions(+), 100 deletions(-)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 995566a2785f..5706a1694c9c 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -323,10 +323,13 @@ static int xenvif_count_requests(struct xenvif_queue *queue,
struct xenvif_tx_cb { - u16 pending_idx; + u16 copy_pending_idx[XEN_NETBK_LEGACY_SLOTS_MAX + 1]; + u8 copy_count; };
#define XENVIF_TX_CB(skb) ((struct xenvif_tx_cb *)(skb)->cb) +#define copy_pending_idx(skb, i) (XENVIF_TX_CB(skb)->copy_pending_idx[i]) +#define copy_count(skb) (XENVIF_TX_CB(skb)->copy_count)
static inline void xenvif_tx_create_map_op(struct xenvif_queue *queue, u16 pending_idx, @@ -361,31 +364,93 @@ static inline struct sk_buff *xenvif_alloc_skb(unsigned int size) return skb; }
-static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *queue, - struct sk_buff *skb, - struct xen_netif_tx_request *txp, - struct gnttab_map_grant_ref *gop, - unsigned int frag_overflow, - struct sk_buff *nskb) +static void xenvif_get_requests(struct xenvif_queue *queue, + struct sk_buff *skb, + struct xen_netif_tx_request *first, + struct xen_netif_tx_request *txfrags, + unsigned *copy_ops, + unsigned *map_ops, + unsigned int frag_overflow, + struct sk_buff *nskb, + unsigned int extra_count, + unsigned int data_len) { struct skb_shared_info *shinfo = skb_shinfo(skb); skb_frag_t *frags = shinfo->frags; - u16 pending_idx = XENVIF_TX_CB(skb)->pending_idx; - int start; + u16 pending_idx; pending_ring_idx_t index; unsigned int nr_slots; + struct gnttab_copy *cop = queue->tx_copy_ops + *copy_ops; + struct gnttab_map_grant_ref *gop = queue->tx_map_ops + *map_ops; + struct xen_netif_tx_request *txp = first; + + nr_slots = shinfo->nr_frags + 1; + + copy_count(skb) = 0; + + /* Create copy ops for exactly data_len bytes into the skb head. */ + __skb_put(skb, data_len); + while (data_len > 0) { + int amount = data_len > txp->size ? txp->size : data_len; + + cop->source.u.ref = txp->gref; + cop->source.domid = queue->vif->domid; + cop->source.offset = txp->offset; + + cop->dest.domid = DOMID_SELF; + cop->dest.offset = (offset_in_page(skb->data + + skb_headlen(skb) - + data_len)) & ~XEN_PAGE_MASK; + cop->dest.u.gmfn = virt_to_gfn(skb->data + skb_headlen(skb) + - data_len); + + cop->len = amount; + cop->flags = GNTCOPY_source_gref;
- nr_slots = shinfo->nr_frags; + index = pending_index(queue->pending_cons); + pending_idx = queue->pending_ring[index]; + callback_param(queue, pending_idx).ctx = NULL; + copy_pending_idx(skb, copy_count(skb)) = pending_idx; + copy_count(skb)++; + + cop++; + data_len -= amount;
- /* Skip first skb fragment if it is on same page as header fragment. */ - start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx); + if (amount == txp->size) { + /* The copy op covered the full tx_request */ + + memcpy(&queue->pending_tx_info[pending_idx].req, + txp, sizeof(*txp)); + queue->pending_tx_info[pending_idx].extra_count = + (txp == first) ? extra_count : 0; + + if (txp == first) + txp = txfrags; + else + txp++; + queue->pending_cons++; + nr_slots--; + } else { + /* The copy op partially covered the tx_request. + * The remainder will be mapped. + */ + txp->offset += amount; + txp->size -= amount; + } + }
- for (shinfo->nr_frags = start; shinfo->nr_frags < nr_slots; - shinfo->nr_frags++, txp++, gop++) { + for (shinfo->nr_frags = 0; shinfo->nr_frags < nr_slots; + shinfo->nr_frags++, gop++) { index = pending_index(queue->pending_cons++); pending_idx = queue->pending_ring[index]; - xenvif_tx_create_map_op(queue, pending_idx, txp, 0, gop); + xenvif_tx_create_map_op(queue, pending_idx, txp, + txp == first ? extra_count : 0, gop); frag_set_pending_idx(&frags[shinfo->nr_frags], pending_idx); + + if (txp == first) + txp = txfrags; + else + txp++; }
if (frag_overflow) { @@ -406,7 +471,8 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *que skb_shinfo(skb)->frag_list = nskb; }
- return gop; + (*copy_ops) = cop - queue->tx_copy_ops; + (*map_ops) = gop - queue->tx_map_ops; }
static inline void xenvif_grant_handle_set(struct xenvif_queue *queue, @@ -442,7 +508,7 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue, struct gnttab_copy **gopp_copy) { struct gnttab_map_grant_ref *gop_map = *gopp_map; - u16 pending_idx = XENVIF_TX_CB(skb)->pending_idx; + u16 pending_idx; /* This always points to the shinfo of the skb being checked, which * could be either the first or the one on the frag_list */ @@ -453,24 +519,37 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue, struct skb_shared_info *first_shinfo = NULL; int nr_frags = shinfo->nr_frags; const bool sharedslot = nr_frags && - frag_get_pending_idx(&shinfo->frags[0]) == pending_idx; + frag_get_pending_idx(&shinfo->frags[0]) == + copy_pending_idx(skb, copy_count(skb) - 1); int i, err;
- /* Check status of header. */ - err = (*gopp_copy)->status; - if (unlikely(err)) { - if (net_ratelimit()) - netdev_dbg(queue->vif->dev, - "Grant copy of header failed! status: %d pending_idx: %u ref: %u\n", - (*gopp_copy)->status, - pending_idx, - (*gopp_copy)->source.u.ref); - /* The first frag might still have this slot mapped */ - if (!sharedslot) - xenvif_idx_release(queue, pending_idx, - XEN_NETIF_RSP_ERROR); + for (i = 0; i < copy_count(skb); i++) { + int newerr; + + /* Check status of header. */ + pending_idx = copy_pending_idx(skb, i); + + newerr = (*gopp_copy)->status; + if (likely(!newerr)) { + /* The first frag might still have this slot mapped */ + if (i < copy_count(skb) - 1 || !sharedslot) + xenvif_idx_release(queue, pending_idx, + XEN_NETIF_RSP_OKAY); + } else { + err = newerr; + if (net_ratelimit()) + netdev_dbg(queue->vif->dev, + "Grant copy of header failed! status: %d pending_idx: %u ref: %u\n", + (*gopp_copy)->status, + pending_idx, + (*gopp_copy)->source.u.ref); + /* The first frag might still have this slot mapped */ + if (i < copy_count(skb) - 1 || !sharedslot) + xenvif_idx_release(queue, pending_idx, + XEN_NETIF_RSP_ERROR); + } + (*gopp_copy)++; } - (*gopp_copy)++;
check_frags: for (i = 0; i < nr_frags; i++, gop_map++) { @@ -517,14 +596,6 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue, if (err) continue;
- /* First error: if the header haven't shared a slot with the - * first frag, release it as well. - */ - if (!sharedslot) - xenvif_idx_release(queue, - XENVIF_TX_CB(skb)->pending_idx, - XEN_NETIF_RSP_OKAY); - /* Invalidate preceding fragments of this skb. */ for (j = 0; j < i; j++) { pending_idx = frag_get_pending_idx(&shinfo->frags[j]); @@ -794,7 +865,6 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue, unsigned *copy_ops, unsigned *map_ops) { - struct gnttab_map_grant_ref *gop = queue->tx_map_ops; struct sk_buff *skb, *nskb; int ret; unsigned int frag_overflow; @@ -876,8 +946,12 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue, continue; }
+ data_len = (txreq.size > XEN_NETBACK_TX_COPY_LEN) ? + XEN_NETBACK_TX_COPY_LEN : txreq.size; + ret = xenvif_count_requests(queue, &txreq, extra_count, txfrags, work_to_do); + if (unlikely(ret < 0)) break;
@@ -903,9 +977,8 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue, index = pending_index(queue->pending_cons); pending_idx = queue->pending_ring[index];
- data_len = (txreq.size > XEN_NETBACK_TX_COPY_LEN && - ret < XEN_NETBK_LEGACY_SLOTS_MAX) ? - XEN_NETBACK_TX_COPY_LEN : txreq.size; + if (ret >= XEN_NETBK_LEGACY_SLOTS_MAX - 1 && data_len < txreq.size) + data_len = txreq.size;
skb = xenvif_alloc_skb(data_len); if (unlikely(skb == NULL)) { @@ -916,8 +989,6 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue, }
skb_shinfo(skb)->nr_frags = ret; - if (data_len < txreq.size) - skb_shinfo(skb)->nr_frags++; /* At this point shinfo->nr_frags is in fact the number of * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX. */ @@ -979,54 +1050,19 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue, type); }
- XENVIF_TX_CB(skb)->pending_idx = pending_idx; - - __skb_put(skb, data_len); - queue->tx_copy_ops[*copy_ops].source.u.ref = txreq.gref; - queue->tx_copy_ops[*copy_ops].source.domid = queue->vif->domid; - queue->tx_copy_ops[*copy_ops].source.offset = txreq.offset; - - queue->tx_copy_ops[*copy_ops].dest.u.gmfn = - virt_to_gfn(skb->data); - queue->tx_copy_ops[*copy_ops].dest.domid = DOMID_SELF; - queue->tx_copy_ops[*copy_ops].dest.offset = - offset_in_page(skb->data) & ~XEN_PAGE_MASK; - - queue->tx_copy_ops[*copy_ops].len = data_len; - queue->tx_copy_ops[*copy_ops].flags = GNTCOPY_source_gref; - - (*copy_ops)++; - - if (data_len < txreq.size) { - frag_set_pending_idx(&skb_shinfo(skb)->frags[0], - pending_idx); - xenvif_tx_create_map_op(queue, pending_idx, &txreq, - extra_count, gop); - gop++; - } else { - frag_set_pending_idx(&skb_shinfo(skb)->frags[0], - INVALID_PENDING_IDX); - memcpy(&queue->pending_tx_info[pending_idx].req, - &txreq, sizeof(txreq)); - queue->pending_tx_info[pending_idx].extra_count = - extra_count; - } - - queue->pending_cons++; - - gop = xenvif_get_requests(queue, skb, txfrags, gop, - frag_overflow, nskb); + xenvif_get_requests(queue, skb, &txreq, txfrags, copy_ops, + map_ops, frag_overflow, nskb, extra_count, + data_len);
__skb_queue_tail(&queue->tx_queue, skb);
queue->tx.req_cons = idx;
- if (((gop-queue->tx_map_ops) >= ARRAY_SIZE(queue->tx_map_ops)) || + if ((*map_ops >= ARRAY_SIZE(queue->tx_map_ops)) || (*copy_ops >= ARRAY_SIZE(queue->tx_copy_ops))) break; }
- (*map_ops) = gop - queue->tx_map_ops; return; }
@@ -1105,9 +1141,8 @@ static int xenvif_tx_submit(struct xenvif_queue *queue) while ((skb = __skb_dequeue(&queue->tx_queue)) != NULL) { struct xen_netif_tx_request *txp; u16 pending_idx; - unsigned data_len;
- pending_idx = XENVIF_TX_CB(skb)->pending_idx; + pending_idx = copy_pending_idx(skb, 0); txp = &queue->pending_tx_info[pending_idx].req;
/* Check the remap error code. */ @@ -1126,18 +1161,6 @@ static int xenvif_tx_submit(struct xenvif_queue *queue) continue; }
- data_len = skb->len; - callback_param(queue, pending_idx).ctx = NULL; - if (data_len < txp->size) { - /* Append the packet payload as a fragment. */ - txp->offset += data_len; - txp->size -= data_len; - } else { - /* Schedule a response immediately. */ - xenvif_idx_release(queue, pending_idx, - XEN_NETIF_RSP_OKAY); - } - if (txp->flags & XEN_NETTXF_csum_blank) skb->ip_summed = CHECKSUM_PARTIAL; else if (txp->flags & XEN_NETTXF_data_validated) @@ -1323,7 +1346,7 @@ static inline void xenvif_tx_dealloc_action(struct xenvif_queue *queue) /* Called after netfront has transmitted */ int xenvif_tx_action(struct xenvif_queue *queue, int budget) { - unsigned nr_mops, nr_cops = 0; + unsigned nr_mops = 0, nr_cops = 0; int work_done, ret;
if (unlikely(!tx_work_todo(queue)))
From: Juergen Gross jgross@suse.com
[ Upstream commit 5834e72eda0b7e5767eb107259d98eef19ebd11f ]
Remove some unused macros and functions, make local functions static.
Signed-off-by: Juergen Gross jgross@suse.com Acked-by: Wei Liu wei.liu@kernel.org Link: https://lore.kernel.org/r/20220608043726.9380-1-jgross@suse.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 74e7e1efdad4 ("xen/netback: don't call kfree_skb() with interrupts disabled") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/xen-netback/common.h | 12 ------------ drivers/net/xen-netback/interface.c | 16 +--------------- drivers/net/xen-netback/netback.c | 4 +++- drivers/net/xen-netback/rx.c | 2 +- 4 files changed, 5 insertions(+), 29 deletions(-)
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h index f7e746f1c9fb..fa52d5ffca72 100644 --- a/drivers/net/xen-netback/common.h +++ b/drivers/net/xen-netback/common.h @@ -48,7 +48,6 @@ #include <linux/debugfs.h>
typedef unsigned int pending_ring_idx_t; -#define INVALID_PENDING_RING_IDX (~0U)
struct pending_tx_info { struct xen_netif_tx_request req; /* tx request */ @@ -82,8 +81,6 @@ struct xenvif_rx_meta { /* Discriminate from any valid pending_idx value. */ #define INVALID_PENDING_IDX 0xFFFF
-#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE - #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
/* The maximum number of frags is derived from the size of a grant (same @@ -364,11 +361,6 @@ void xenvif_free(struct xenvif *vif); int xenvif_xenbus_init(void); void xenvif_xenbus_fini(void);
-int xenvif_schedulable(struct xenvif *vif); - -int xenvif_queue_stopped(struct xenvif_queue *queue); -void xenvif_wake_queue(struct xenvif_queue *queue); - /* (Un)Map communication rings. */ void xenvif_unmap_frontend_data_rings(struct xenvif_queue *queue); int xenvif_map_frontend_data_rings(struct xenvif_queue *queue, @@ -391,7 +383,6 @@ int xenvif_dealloc_kthread(void *data); irqreturn_t xenvif_ctrl_irq_fn(int irq, void *data);
bool xenvif_have_rx_work(struct xenvif_queue *queue, bool test_kthread); -void xenvif_rx_action(struct xenvif_queue *queue); void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb);
void xenvif_carrier_on(struct xenvif *vif); @@ -399,9 +390,6 @@ void xenvif_carrier_on(struct xenvif *vif); /* Callback from stack when TX packet can be released */ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);
-/* Unmap a pending page and release it back to the guest */ -void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx); - static inline pending_ring_idx_t nr_pending_reqs(struct xenvif_queue *queue) { return MAX_PENDING_REQS - diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c index 809089587301..5efe86b3ba06 100644 --- a/drivers/net/xen-netback/interface.c +++ b/drivers/net/xen-netback/interface.c @@ -70,7 +70,7 @@ void xenvif_skb_zerocopy_complete(struct xenvif_queue *queue) wake_up(&queue->dealloc_wq); }
-int xenvif_schedulable(struct xenvif *vif) +static int xenvif_schedulable(struct xenvif *vif) { return netif_running(vif->dev) && test_bit(VIF_STATUS_CONNECTED, &vif->status) && @@ -178,20 +178,6 @@ irqreturn_t xenvif_interrupt(int irq, void *dev_id) return IRQ_HANDLED; }
-int xenvif_queue_stopped(struct xenvif_queue *queue) -{ - struct net_device *dev = queue->vif->dev; - unsigned int id = queue->id; - return netif_tx_queue_stopped(netdev_get_tx_queue(dev, id)); -} - -void xenvif_wake_queue(struct xenvif_queue *queue) -{ - struct net_device *dev = queue->vif->dev; - unsigned int id = queue->id; - netif_tx_wake_queue(netdev_get_tx_queue(dev, id)); -} - static u16 xenvif_select_queue(struct net_device *dev, struct sk_buff *skb, struct net_device *sb_dev) { diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 5706a1694c9c..982e501173f1 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -105,6 +105,8 @@ static void make_tx_response(struct xenvif_queue *queue, s8 st); static void push_tx_responses(struct xenvif_queue *queue);
+static void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx); + static inline int tx_work_todo(struct xenvif_queue *queue);
static inline unsigned long idx_to_pfn(struct xenvif_queue *queue, @@ -1433,7 +1435,7 @@ static void push_tx_responses(struct xenvif_queue *queue) notify_remote_via_irq(queue->tx_irq); }
-void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx) +static void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx) { int ret; struct gnttab_unmap_grant_ref tx_unmap_op; diff --git a/drivers/net/xen-netback/rx.c b/drivers/net/xen-netback/rx.c index 85a5a622ec18..6f940a32dcb8 100644 --- a/drivers/net/xen-netback/rx.c +++ b/drivers/net/xen-netback/rx.c @@ -473,7 +473,7 @@ static void xenvif_rx_skb(struct xenvif_queue *queue)
#define RX_BATCH_SIZE 64
-void xenvif_rx_action(struct xenvif_queue *queue) +static void xenvif_rx_action(struct xenvif_queue *queue) { struct sk_buff_head completed_skbs; unsigned int work_done = 0;
From: Juergen Gross jgross@suse.com
[ Upstream commit 74e7e1efdad45580cc3839f2a155174cf158f9b5 ]
It is not allowed to call kfree_skb() from hardware interrupt context or with interrupts being disabled. So remove kfree_skb() from the spin_lock_irqsave() section and use the already existing "drop" label in xenvif_start_xmit() for dropping the SKB. At the same time replace the dev_kfree_skb() call there with a call of dev_kfree_skb_any(), as xenvif_start_xmit() can be called with disabled interrupts.
This is XSA-424 / CVE-2022-42328 / CVE-2022-42329.
Fixes: be81992f9086 ("xen/netback: don't queue unlimited number of packages") Reported-by: Yang Yingliang yangyingliang@huawei.com Signed-off-by: Juergen Gross jgross@suse.com Reviewed-by: Jan Beulich jbeulich@suse.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/xen-netback/common.h | 2 +- drivers/net/xen-netback/interface.c | 6 ++++-- drivers/net/xen-netback/rx.c | 8 +++++--- 3 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h index fa52d5ffca72..ced413d394cd 100644 --- a/drivers/net/xen-netback/common.h +++ b/drivers/net/xen-netback/common.h @@ -383,7 +383,7 @@ int xenvif_dealloc_kthread(void *data); irqreturn_t xenvif_ctrl_irq_fn(int irq, void *data);
bool xenvif_have_rx_work(struct xenvif_queue *queue, bool test_kthread); -void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb); +bool xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb);
void xenvif_carrier_on(struct xenvif *vif);
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c index 5efe86b3ba06..6432f6e7fd54 100644 --- a/drivers/net/xen-netback/interface.c +++ b/drivers/net/xen-netback/interface.c @@ -255,14 +255,16 @@ xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev) if (vif->hash.alg == XEN_NETIF_CTRL_HASH_ALGORITHM_NONE) skb_clear_hash(skb);
- xenvif_rx_queue_tail(queue, skb); + if (!xenvif_rx_queue_tail(queue, skb)) + goto drop; + xenvif_kick_thread(queue);
return NETDEV_TX_OK;
drop: vif->dev->stats.tx_dropped++; - dev_kfree_skb(skb); + dev_kfree_skb_any(skb); return NETDEV_TX_OK; }
diff --git a/drivers/net/xen-netback/rx.c b/drivers/net/xen-netback/rx.c index 6f940a32dcb8..ab216970137c 100644 --- a/drivers/net/xen-netback/rx.c +++ b/drivers/net/xen-netback/rx.c @@ -82,9 +82,10 @@ static bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue) return false; }
-void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb) +bool xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb) { unsigned long flags; + bool ret = true;
spin_lock_irqsave(&queue->rx_queue.lock, flags);
@@ -92,8 +93,7 @@ void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb) struct net_device *dev = queue->vif->dev;
netif_tx_stop_queue(netdev_get_tx_queue(dev, queue->id)); - kfree_skb(skb); - queue->vif->dev->stats.rx_dropped++; + ret = false; } else { if (skb_queue_empty(&queue->rx_queue)) xenvif_update_needed_slots(queue, skb); @@ -104,6 +104,8 @@ void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb) }
spin_unlock_irqrestore(&queue->rx_queue.lock, flags); + + return ret; }
static struct sk_buff *xenvif_rx_dequeue(struct xenvif_queue *queue)
From: Rafał Miłecki rafal@milecki.pl
This reverts commit 1fae6eb0fc91d3ecb539e03f9e4dcd1c53ada553.
Upstream commit was a fix for an overlook of setting "ent.is_valid" twice after 5d65b64a3d97 ("net: dsa: b53: Add support for MDB").
Since MDB support was not backported to stable kernels (it's not a bug fix) there is nothing to fix there. Backporting this commit resulted in "env.is_valid" not being set at all.
Signed-off-by: Rafał Miłecki rafal@milecki.pl Acked-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/dsa/b53/b53_common.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/dsa/b53/b53_common.c +++ b/drivers/net/dsa/b53/b53_common.c @@ -1551,6 +1551,7 @@ static int b53_arl_op(struct b53_device
memset(&ent, 0, sizeof(ent)); ent.port = port; + ent.is_valid = is_valid; ent.vid = vid; ent.is_static = true; memcpy(ent.mac, addr, ETH_ALEN);
From: Hans Verkuil hverkuil-cisco@xs4all.nl
commit 5eef2141776da02772c44ec406d6871a790761ee upstream.
Sanity checks were added to verify the v4l2_bt_timings blanking fields in order to avoid integer overflows when userspace passes weird values.
But that assumed that userspace would correctly fill in the front porch, backporch and sync values, but sometimes all you know is the total blanking, which is then assigned to just one of these fields.
And that can fail with these checks.
So instead set a maximum for the total horizontal and vertical blanking and check that each field remains below that.
That is still sufficient to avoid integer overflows, but it also allows for more flexibility in how userspace fills in these fields.
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Fixes: 4b6d66a45ed3 ("media: v4l2-dv-timings: add sanity checks for blanking values") Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/v4l2-core/v4l2-dv-timings.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-)
--- a/drivers/media/v4l2-core/v4l2-dv-timings.c +++ b/drivers/media/v4l2-core/v4l2-dv-timings.c @@ -145,6 +145,8 @@ bool v4l2_valid_dv_timings(const struct const struct v4l2_bt_timings *bt = &t->bt; const struct v4l2_bt_timings_cap *cap = &dvcap->bt; u32 caps = cap->capabilities; + const u32 max_vert = 10240; + u32 max_hor = 3 * bt->width;
if (t->type != V4L2_DV_BT_656_1120) return false; @@ -166,14 +168,20 @@ bool v4l2_valid_dv_timings(const struct if (!bt->interlaced && (bt->il_vbackporch || bt->il_vsync || bt->il_vfrontporch)) return false; - if (bt->hfrontporch > 2 * bt->width || - bt->hsync > 1024 || bt->hbackporch > 1024) + /* + * Some video receivers cannot properly separate the frontporch, + * backporch and sync values, and instead they only have the total + * blanking. That can be assigned to any of these three fields. + * So just check that none of these are way out of range. + */ + if (bt->hfrontporch > max_hor || + bt->hsync > max_hor || bt->hbackporch > max_hor) return false; - if (bt->vfrontporch > 4096 || - bt->vsync > 128 || bt->vbackporch > 4096) + if (bt->vfrontporch > max_vert || + bt->vsync > max_vert || bt->vbackporch > max_vert) return false; - if (bt->interlaced && (bt->il_vfrontporch > 4096 || - bt->il_vsync > 128 || bt->il_vbackporch > 4096)) + if (bt->interlaced && (bt->il_vfrontporch > max_vert || + bt->il_vsync > max_vert || bt->il_vbackporch > max_vert)) return false; return fnc == NULL || fnc(t, fnc_handle); }
From: Tejun Heo tj@kernel.org
commit 4a7ba45b1a435e7097ca0f79a847d0949d0eb088 upstream.
memcg_write_event_control() accesses the dentry->d_name of the specified control fd to route the write call. As a cgroup interface file can't be renamed, it's safe to access d_name as long as the specified file is a regular cgroup file. Also, as these cgroup interface files can't be removed before the directory, it's safe to access the parent too.
Prior to 347c4a874710 ("memcg: remove cgroup_event->cft"), there was a call to __file_cft() which verified that the specified file is a regular cgroupfs file before further accesses. The cftype pointer returned from __file_cft() was no longer necessary and the commit inadvertently dropped the file type check with it allowing any file to slip through. With the invarients broken, the d_name and parent accesses can now race against renames and removals of arbitrary files and cause use-after-free's.
Fix the bug by resurrecting the file type check in __file_cft(). Now that cgroupfs is implemented through kernfs, checking the file operations needs to go through a layer of indirection. Instead, let's check the superblock and dentry type.
Link: https://lkml.kernel.org/r/Y5FRm/cfcKPGzWwl@slm.duckdns.org Fixes: 347c4a874710 ("memcg: remove cgroup_event->cft") Signed-off-by: Tejun Heo tj@kernel.org Reported-by: Jann Horn jannh@google.com Acked-by: Roman Gushchin roman.gushchin@linux.dev Acked-by: Johannes Weiner hannes@cmpxchg.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Michal Hocko mhocko@kernel.org Cc: Muchun Song songmuchun@bytedance.com Cc: Shakeel Butt shakeelb@google.com Cc: stable@vger.kernel.org [3.14+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/cgroup.h | 1 + kernel/cgroup/cgroup-internal.h | 1 - mm/memcontrol.c | 15 +++++++++++++-- 3 files changed, 14 insertions(+), 3 deletions(-)
--- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -69,6 +69,7 @@ struct css_task_iter { struct list_head iters_node; /* css_set->task_iters */ };
+extern struct file_system_type cgroup_fs_type; extern struct cgroup_root cgrp_dfl_root; extern struct css_set init_css_set;
--- a/kernel/cgroup/cgroup-internal.h +++ b/kernel/cgroup/cgroup-internal.h @@ -169,7 +169,6 @@ extern struct mutex cgroup_mutex; extern spinlock_t css_set_lock; extern struct cgroup_subsys *cgroup_subsys[]; extern struct list_head cgroup_roots; -extern struct file_system_type cgroup_fs_type;
/* iterate across the hierarchies */ #define for_each_root(root) \ --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4709,6 +4709,7 @@ static ssize_t memcg_write_event_control unsigned int efd, cfd; struct fd efile; struct fd cfile; + struct dentry *cdentry; const char *name; char *endp; int ret; @@ -4760,6 +4761,16 @@ static ssize_t memcg_write_event_control goto out_put_cfile;
/* + * The control file must be a regular cgroup1 file. As a regular cgroup + * file can't be renamed, it's safe to access its name afterwards. + */ + cdentry = cfile.file->f_path.dentry; + if (cdentry->d_sb->s_type != &cgroup_fs_type || !d_is_reg(cdentry)) { + ret = -EINVAL; + goto out_put_cfile; + } + + /* * Determine the event callbacks and set them in @event. This used * to be done via struct cftype but cgroup core no longer knows * about these events. The following is crude but the whole thing @@ -4767,7 +4778,7 @@ static ssize_t memcg_write_event_control * * DO NOT ADD NEW FILES. */ - name = cfile.file->f_path.dentry->d_name.name; + name = cdentry->d_name.name;
if (!strcmp(name, "memory.usage_in_bytes")) { event->register_event = mem_cgroup_usage_register_event; @@ -4791,7 +4802,7 @@ static ssize_t memcg_write_event_control * automatically removed on cgroup destruction but the removal is * asynchronous, so take an extra ref on @css. */ - cfile_css = css_tryget_online_from_dir(cfile.file->f_path.dentry->d_parent, + cfile_css = css_tryget_online_from_dir(cdentry->d_parent, &memory_cgrp_subsys); ret = -EINVAL; if (IS_ERR(cfile_css))
From: John Starks jostarks@microsoft.com
commit fcd0ccd836ffad73d98a66f6fea7b16f735ea920 upstream.
For dax pud, pud_huge() returns true on x86. So the function works as long as hugetlb is configured. However, dax doesn't depend on hugetlb. Commit 414fd080d125 ("mm/gup: fix gup_pmd_range() for dax") fixed devmap-backed huge PMDs, but missed devmap-backed huge PUDs. Fix this as well.
This fixes the below kernel panic:
general protection fault, probably for non-canonical address 0x69e7c000cc478: 0000 [#1] SMP < snip > Call Trace: <TASK> get_user_pages_fast+0x1f/0x40 iov_iter_get_pages+0xc6/0x3b0 ? mempool_alloc+0x5d/0x170 bio_iov_iter_get_pages+0x82/0x4e0 ? bvec_alloc+0x91/0xc0 ? bio_alloc_bioset+0x19a/0x2a0 blkdev_direct_IO+0x282/0x480 ? __io_complete_rw_common+0xc0/0xc0 ? filemap_range_has_page+0x82/0xc0 generic_file_direct_write+0x9d/0x1a0 ? inode_update_time+0x24/0x30 __generic_file_write_iter+0xbd/0x1e0 blkdev_write_iter+0xb4/0x150 ? io_import_iovec+0x8d/0x340 io_write+0xf9/0x300 io_issue_sqe+0x3c3/0x1d30 ? sysvec_reschedule_ipi+0x6c/0x80 __io_queue_sqe+0x33/0x240 ? fget+0x76/0xa0 io_submit_sqes+0xe6a/0x18d0 ? __fget_light+0xd1/0x100 __x64_sys_io_uring_enter+0x199/0x880 ? __context_tracking_enter+0x1f/0x70 ? irqentry_exit_to_user_mode+0x24/0x30 ? irqentry_exit+0x1d/0x30 ? __context_tracking_exit+0xe/0x70 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x61/0xcb RIP: 0033:0x7fc97c11a7be < snip > </TASK> ---[ end trace 48b2e0e67debcaeb ]--- RIP: 0010:internal_get_user_pages_fast+0x340/0x990 < snip > Kernel panic - not syncing: Fatal exception Kernel Offset: disabled
Link: https://lkml.kernel.org/r/1670392853-28252-1-git-send-email-ssengar@linux.mi... Fixes: 414fd080d125 ("mm/gup: fix gup_pmd_range() for dax") Signed-off-by: John Starks jostarks@microsoft.com Signed-off-by: Saurabh Sengar ssengar@linux.microsoft.com Cc: Jan Kara jack@suse.cz Cc: Yu Zhao yuzhao@google.com Cc: Jason Gunthorpe jgg@nvidia.com Cc: John Hubbard jhubbard@nvidia.com Cc: David Hildenbrand david@redhat.com Cc: Dan Williams dan.j.williams@intel.com Cc: Alistair Popple apopple@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/gup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/gup.c +++ b/mm/gup.c @@ -2240,7 +2240,7 @@ static int gup_pud_range(p4d_t *p4dp, p4 next = pud_addr_end(addr, end); if (pud_none(pud)) return 0; - if (unlikely(pud_huge(pud))) { + if (unlikely(pud_huge(pud) || pud_devmap(pud))) { if (!gup_huge_pud(pud, pudp, addr, next, flags, pages, nr)) return 0;
From: Thomas Huth thuth@redhat.com
commit 0dd4cdccdab3d74bd86b868768a7dca216bcce7e upstream.
We recently experienced some weird huge time jumps in nested guests when rebooting them in certain cases. After adding some debug code to the epoch handling in vsie.c (thanks to David Hildenbrand for the idea!), it was obvious that the "epdx" field (the multi-epoch extension) did not get set to 0xff in case the "epoch" field was negative. Seems like the code misses to copy the value from the epdx field from the guest to the shadow control block. By doing so, the weird time jumps are gone in our scenarios.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2140899 Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support") Signed-off-by: Thomas Huth thuth@redhat.com Reviewed-by: Christian Borntraeger borntraeger@linux.ibm.com Acked-by: David Hildenbrand david@redhat.com Reviewed-by: Claudio Imbrenda imbrenda@linux.ibm.com Reviewed-by: Janosch Frank frankja@linux.ibm.com Cc: stable@vger.kernel.org # 4.19+ Link: https://lore.kernel.org/r/20221123090833.292938-1-thuth@redhat.com Message-Id: 20221123090833.292938-1-thuth@redhat.com Signed-off-by: Janosch Frank frankja@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kvm/vsie.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/s390/kvm/vsie.c +++ b/arch/s390/kvm/vsie.c @@ -540,8 +540,10 @@ static int shadow_scb(struct kvm_vcpu *v if (test_kvm_cpu_feat(vcpu->kvm, KVM_S390_VM_CPU_FEAT_CEI)) scb_s->eca |= scb_o->eca & ECA_CEI; /* Epoch Extension */ - if (test_kvm_facility(vcpu->kvm, 139)) + if (test_kvm_facility(vcpu->kvm, 139)) { scb_s->ecd |= scb_o->ecd & ECD_MEF; + scb_s->epdx = scb_o->epdx; + }
/* etoken */ if (test_kvm_facility(vcpu->kvm, 156))
From: Rob Clark robdclark@chromium.org
commit 24013314be6ee4ee456114a671e9fa3461323de8 upstream.
drm_gem_shmem_mmap() doesn't own this reference, resulting in the GEM object getting prematurely freed leading to a later use-after-free.
Link: https://syzkaller.appspot.com/bug?extid=c8ae65286134dd1b800d Reported-by: syzbot+c8ae65286134dd1b800d@syzkaller.appspotmail.com Fixes: 2194a63a818d ("drm: Add library for shmem backed GEM objects") Cc: stable@vger.kernel.org Signed-off-by: Rob Clark robdclark@chromium.org Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch Signed-off-by: Javier Martinez Canillas javierm@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20221130185748.357410-2-robdcl... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/drm_gem_shmem_helper.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -554,10 +554,8 @@ int drm_gem_shmem_mmap(struct file *filp shmem = to_drm_gem_shmem_obj(vma->vm_private_data);
ret = drm_gem_shmem_get_pages(shmem); - if (ret) { - drm_gem_vm_close(vma); + if (ret) return ret; - }
/* VM_PFNMAP was set by drm_gem_mmap() */ vma->vm_flags &= ~VM_PFNMAP;
From: Ankit Patel anpatel@nvidia.com
commit f6d910a89a2391e5ce1f275d205023880a33d3f8 upstream.
Some additional USB mouse devices are needing ALWAYS_POLL quirk without which they disconnect and reconnect every 60s.
Add below devices to the known quirk list. CHERRY VID 0x046a, PID 0x000c MICROSOFT VID 0x045e, PID 0x0783 PRIMAX VID 0x0461, PID 0x4e2a
Signed-off-by: Ankit Patel anpatel@nvidia.com Signed-off-by: Haotien Hsu haotienh@nvidia.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/hid-ids.h | 3 +++ drivers/hid/hid-quirks.c | 3 +++ 2 files changed, 6 insertions(+)
--- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -259,6 +259,7 @@ #define USB_DEVICE_ID_CH_AXIS_295 0x001c
#define USB_VENDOR_ID_CHERRY 0x046a +#define USB_DEVICE_ID_CHERRY_MOUSE_000C 0x000c #define USB_DEVICE_ID_CHERRY_CYMOTION 0x0023 #define USB_DEVICE_ID_CHERRY_CYMOTION_SOLAR 0x0027
@@ -864,6 +865,7 @@ #define USB_DEVICE_ID_MS_XBOX_ONE_S_CONTROLLER 0x02fd #define USB_DEVICE_ID_MS_PIXART_MOUSE 0x00cb #define USB_DEVICE_ID_8BITDO_SN30_PRO_PLUS 0x02e0 +#define USB_DEVICE_ID_MS_MOUSE_0783 0x0783
#define USB_VENDOR_ID_MOJO 0x8282 #define USB_DEVICE_ID_RETRO_ADAPTER 0x3201 @@ -1292,6 +1294,7 @@
#define USB_VENDOR_ID_PRIMAX 0x0461 #define USB_DEVICE_ID_PRIMAX_MOUSE_4D22 0x4d22 +#define USB_DEVICE_ID_PRIMAX_MOUSE_4E2A 0x4e2a #define USB_DEVICE_ID_PRIMAX_KEYBOARD 0x4e05 #define USB_DEVICE_ID_PRIMAX_REZEL 0x4e72 #define USB_DEVICE_ID_PRIMAX_PIXART_MOUSE_4D0F 0x4d0f --- a/drivers/hid/hid-quirks.c +++ b/drivers/hid/hid-quirks.c @@ -54,6 +54,7 @@ static const struct hid_device_id hid_qu { HID_USB_DEVICE(USB_VENDOR_ID_CH, USB_DEVICE_ID_CH_FLIGHT_SIM_YOKE), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_CH, USB_DEVICE_ID_CH_PRO_PEDALS), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_CH, USB_DEVICE_ID_CH_PRO_THROTTLE), HID_QUIRK_NOGET }, + { HID_USB_DEVICE(USB_VENDOR_ID_CHERRY, USB_DEVICE_ID_CHERRY_MOUSE_000C), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_K65RGB), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_K65RGB_RAPIDFIRE), HID_QUIRK_NO_INIT_REPORTS | HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_K70RGB), HID_QUIRK_NO_INIT_REPORTS }, @@ -122,6 +123,7 @@ static const struct hid_device_id hid_qu { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_MOUSE_C05A), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_MOUSE_C06A), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_MCS, USB_DEVICE_ID_MCS_GAMEPADBLOCK), HID_QUIRK_MULTI_INPUT }, + { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_MOUSE_0783), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_PIXART_MOUSE), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_POWER_COVER), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_SURFACE3_COVER), HID_QUIRK_NO_INIT_REPORTS }, @@ -146,6 +148,7 @@ static const struct hid_device_id hid_qu { HID_USB_DEVICE(USB_VENDOR_ID_PIXART, USB_DEVICE_ID_PIXART_OPTICAL_TOUCH_SCREEN), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_PIXART, USB_DEVICE_ID_PIXART_USB_OPTICAL_MOUSE), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_PRIMAX, USB_DEVICE_ID_PRIMAX_MOUSE_4D22), HID_QUIRK_ALWAYS_POLL }, + { HID_USB_DEVICE(USB_VENDOR_ID_PRIMAX, USB_DEVICE_ID_PRIMAX_MOUSE_4E2A), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_PRIMAX, USB_DEVICE_ID_PRIMAX_PIXART_MOUSE_4D0F), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_PRIMAX, USB_DEVICE_ID_PRIMAX_PIXART_MOUSE_4D65), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_PRIMAX, USB_DEVICE_ID_PRIMAX_PIXART_MOUSE_4E22), HID_QUIRK_ALWAYS_POLL },
From: Anastasia Belova abelova@astralinux.ru
commit d180b6496143cd360c5d5f58ae4b9a8229c1f344 upstream.
If an empty buf is received, lbuf is also empty. So lbuf is accessed by index -1.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: f31a2de3fe36 ("HID: hid-lg4ff: Allow switching of Logitech gaming wheels between compatibility modes") Signed-off-by: Anastasia Belova abelova@astralinux.ru Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/hid-lg4ff.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/hid/hid-lg4ff.c +++ b/drivers/hid/hid-lg4ff.c @@ -872,6 +872,12 @@ static ssize_t lg4ff_alternate_modes_sto return -ENOMEM;
i = strlen(lbuf); + + if (i == 0) { + kfree(lbuf); + return -EINVAL; + } + if (lbuf[i-1] == '\n') { if (i == 1) { kfree(lbuf);
From: ZhangPeng zhangpeng362@huawei.com
commit ec61b41918587be530398b0d1c9a0d16619397e5 upstream.
Syzbot reported shift-out-of-bounds in hid_report_raw_event.
microsoft 0003:045E:07DA.0001: hid_field_extract() called with n (128) > 32! (swapper/0) ====================================================================== UBSAN: shift-out-of-bounds in drivers/hid/hid-core.c:1323:20 shift exponent 127 is too large for 32-bit type 'int' CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc4-syzkaller-00159-g4bbf3422df78 #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e3/0x2cb lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:151 [inline] __ubsan_handle_shift_out_of_bounds+0x3a6/0x420 lib/ubsan.c:322 snto32 drivers/hid/hid-core.c:1323 [inline] hid_input_fetch_field drivers/hid/hid-core.c:1572 [inline] hid_process_report drivers/hid/hid-core.c:1665 [inline] hid_report_raw_event+0xd56/0x18b0 drivers/hid/hid-core.c:1998 hid_input_report+0x408/0x4f0 drivers/hid/hid-core.c:2066 hid_irq_in+0x459/0x690 drivers/hid/usbhid/hid-core.c:284 __usb_hcd_giveback_urb+0x369/0x530 drivers/usb/core/hcd.c:1671 dummy_timer+0x86b/0x3110 drivers/usb/gadget/udc/dummy_hcd.c:1988 call_timer_fn+0xf5/0x210 kernel/time/timer.c:1474 expire_timers kernel/time/timer.c:1519 [inline] __run_timers+0x76a/0x980 kernel/time/timer.c:1790 run_timer_softirq+0x63/0xf0 kernel/time/timer.c:1803 __do_softirq+0x277/0x75b kernel/softirq.c:571 __irq_exit_rcu+0xec/0x170 kernel/softirq.c:650 irq_exit_rcu+0x5/0x20 kernel/softirq.c:662 sysvec_apic_timer_interrupt+0x91/0xb0 arch/x86/kernel/apic/apic.c:1107 ======================================================================
If the size of the integer (unsigned n) is bigger than 32 in snto32(), shift exponent will be too large for 32-bit type 'int', resulting in a shift-out-of-bounds bug. Fix this by adding a check on the size of the integer (unsigned n) in snto32(). To add support for n greater than 32 bits, set n to 32, if n is greater than 32.
Reported-by: syzbot+8b1641d2f14732407e23@syzkaller.appspotmail.com Fixes: dde5845a529f ("[PATCH] Generic HID layer - code split") Signed-off-by: ZhangPeng zhangpeng362@huawei.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/hid-core.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/hid/hid-core.c +++ b/drivers/hid/hid-core.c @@ -1303,6 +1303,9 @@ static s32 snto32(__u32 value, unsigned if (!value || !n) return 0;
+ if (n > 32) + n = 32; + switch (n) { case 8: return ((__s8)value); case 16: return ((__s16)value);
From: Oliver Hartkopp socketcan@hartkopp.net
commit 0acc442309a0a1b01bcdaa135e56e6398a49439c upstream.
Analogue to commit 8aa59e355949 ("can: af_can: fix NULL pointer dereference in can_rx_register()") we need to check for a missing initialization of ml_priv in the receive path of CAN frames.
Since commit 4e096a18867a ("net: introduce CAN specific pointer in the struct net_device") the check for dev->type to be ARPHRD_CAN is not sufficient anymore since bonding or tun netdevices claim to be CAN devices but do not initialize ml_priv accordingly.
Fixes: 4e096a18867a ("net: introduce CAN specific pointer in the struct net_device") Reported-by: syzbot+2d7f58292cb5b29eb5ad@syzkaller.appspotmail.com Reported-by: Wei Chen harperchen1110@gmail.com Signed-off-by: Oliver Hartkopp socketcan@hartkopp.net Link: https://lore.kernel.org/all/20221206201259.3028-1-socketcan@hartkopp.net Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/can/af_can.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/can/af_can.c +++ b/net/can/af_can.c @@ -678,7 +678,7 @@ static int can_rcv(struct sk_buff *skb, { struct canfd_frame *cfd = (struct canfd_frame *)skb->data;
- if (unlikely(dev->type != ARPHRD_CAN || skb->len != CAN_MTU)) { + if (unlikely(dev->type != ARPHRD_CAN || !can_get_ml_priv(dev) || skb->len != CAN_MTU)) { pr_warn_once("PF_CAN: dropped non conform CAN skbuff: dev type %d, len %d\n", dev->type, skb->len); goto free_skb; @@ -704,7 +704,7 @@ static int canfd_rcv(struct sk_buff *skb { struct canfd_frame *cfd = (struct canfd_frame *)skb->data;
- if (unlikely(dev->type != ARPHRD_CAN || skb->len != CANFD_MTU)) { + if (unlikely(dev->type != ARPHRD_CAN || !can_get_ml_priv(dev) || skb->len != CANFD_MTU)) { pr_warn_once("PF_CAN: dropped non conform CAN FD skbuff: dev type %d, len %d\n", dev->type, skb->len); goto free_skb;
From: Baolin Wang baolin.wang@linux.alibaba.com
commit fac35ba763ed07ba93154c95ffc0c4a55023707f upstream.
On some architectures (like ARM64), it can support CONT-PTE/PMD size hugetlb, which means it can support not only PMD/PUD size hugetlb (2M and 1G), but also CONT-PTE/PMD size(64K and 32M) if a 4K page size specified.
So when looking up a CONT-PTE size hugetlb page by follow_page(), it will use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size hugetlb in follow_page_pte(). However this pte entry lock is incorrect for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get the correct lock, which is mm->page_table_lock.
That means the pte entry of the CONT-PTE size hugetlb under current pte lock is unstable in follow_page_pte(), we can continue to migrate or poison the pte entry of the CONT-PTE size hugetlb, which can cause some potential race issues, even though they are under the 'pte lock'.
For example, suppose thread A is trying to look up a CONT-PTE size hugetlb page by move_pages() syscall under the lock, however antoher thread B can migrate the CONT-PTE hugetlb page at the same time, which will cause thread A to get an incorrect page, if thread A also wants to do page migration, then data inconsistency error occurs.
Moreover we have the same issue for CONT-PMD size hugetlb in follow_huge_pmd().
To fix above issues, rename the follow_huge_pmd() as follow_huge_pmd_pte() to handle PMD and PTE level size hugetlb, which uses huge_pte_lock() to get the correct pte entry lock to make the pte entry stable.
Mike said:
Support for CONT_PMD/_PTE was added with bb9dd3df8ee9 ("arm64: hugetlb: refactor find_num_contig()"). Patch series "Support for contiguous pte hugepages", v4. However, I do not believe these code paths were executed until migration support was added with 5480280d3f2d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages") I would go with 5480280d3f2d for the Fixes: targe.
Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.166201756... Fixes: 5480280d3f2d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages") Signed-off-by: Baolin Wang baolin.wang@linux.alibaba.com Suggested-by: Mike Kravetz mike.kravetz@oracle.com Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Cc: David Hildenbrand david@redhat.com Cc: Muchun Song songmuchun@bytedance.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org [5.4: Fixup contextual diffs before pin_user_pages()] Signed-off-by: Samuel Mendoza-Jonas samjonas@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/hugetlb.h | 6 +++--- mm/gup.c | 13 ++++++++++++- mm/hugetlb.c | 28 ++++++++++++++-------------- 3 files changed, 29 insertions(+), 18 deletions(-)
--- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -127,8 +127,8 @@ struct page *follow_huge_addr(struct mm_ struct page *follow_huge_pd(struct vm_area_struct *vma, unsigned long address, hugepd_t hpd, int flags, int pdshift); -struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address, - pmd_t *pmd, int flags); +struct page *follow_huge_pmd_pte(struct vm_area_struct *vma, unsigned long address, + int flags); struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address, pud_t *pud, int flags); struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address, @@ -175,7 +175,7 @@ static inline void hugetlb_show_meminfo( { } #define follow_huge_pd(vma, addr, hpd, flags, pdshift) NULL -#define follow_huge_pmd(mm, addr, pmd, flags) NULL +#define follow_huge_pmd_pte(vma, addr, flags) NULL #define follow_huge_pud(mm, addr, pud, flags) NULL #define follow_huge_pgd(mm, addr, pgd, flags) NULL #define prepare_hugepage_range(file, addr, len) (-EINVAL) --- a/mm/gup.c +++ b/mm/gup.c @@ -188,6 +188,17 @@ static struct page *follow_page_pte(stru spinlock_t *ptl; pte_t *ptep, pte;
+ /* + * Considering PTE level hugetlb, like continuous-PTE hugetlb on + * ARM64 architecture. + */ + if (is_vm_hugetlb_page(vma)) { + page = follow_huge_pmd_pte(vma, address, flags); + if (page) + return page; + return no_page_table(vma, flags); + } + retry: if (unlikely(pmd_bad(*pmd))) return no_page_table(vma, flags); @@ -333,7 +344,7 @@ static struct page *follow_pmd_mask(stru if (pmd_none(pmdval)) return no_page_table(vma, flags); if (pmd_huge(pmdval) && vma->vm_flags & VM_HUGETLB) { - page = follow_huge_pmd(mm, address, pmd, flags); + page = follow_huge_pmd_pte(vma, address, flags); if (page) return page; return no_page_table(vma, flags); --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5157,30 +5157,30 @@ follow_huge_pd(struct vm_area_struct *vm }
struct page * __weak -follow_huge_pmd(struct mm_struct *mm, unsigned long address, - pmd_t *pmd, int flags) +follow_huge_pmd_pte(struct vm_area_struct *vma, unsigned long address, int flags) { + struct hstate *h = hstate_vma(vma); + struct mm_struct *mm = vma->vm_mm; struct page *page = NULL; spinlock_t *ptl; - pte_t pte; + pte_t *ptep, pte; + retry: - ptl = pmd_lockptr(mm, pmd); - spin_lock(ptl); - /* - * make sure that the address range covered by this pmd is not - * unmapped from other threads. - */ - if (!pmd_huge(*pmd)) - goto out; - pte = huge_ptep_get((pte_t *)pmd); + ptep = huge_pte_offset(mm, address, huge_page_size(h)); + if (!ptep) + return NULL; + + ptl = huge_pte_lock(h, mm, ptep); + pte = huge_ptep_get(ptep); if (pte_present(pte)) { - page = pmd_page(*pmd) + ((address & ~PMD_MASK) >> PAGE_SHIFT); + page = pte_page(pte) + + ((address & ~huge_page_mask(h)) >> PAGE_SHIFT); if (flags & FOLL_GET) get_page(page); } else { if (is_hugetlb_entry_migration(pte)) { spin_unlock(ptl); - __migration_entry_wait(mm, (pte_t *)pmd, ptl); + __migration_entry_wait(mm, ptep, ptl); goto retry; } /*
From: Ziyang Xuan william.xuanziyang@huawei.com
[ Upstream commit 4d002d6a2a00ac1c433899bd7625c6400a74cfba ]
In cc2520_hw_init(), if oscillator start failed, the error code should be returned.
Fixes: 0da6bc8cc341 ("ieee802154: cc2520: adds driver for TI CC2520 radio") Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Link: https://lore.kernel.org/r/20221120075046.2213633-1-william.xuanziyang@huawei... Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ieee802154/cc2520.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ieee802154/cc2520.c b/drivers/net/ieee802154/cc2520.c index 0432a4f829a9..9739a6ed91ad 100644 --- a/drivers/net/ieee802154/cc2520.c +++ b/drivers/net/ieee802154/cc2520.c @@ -973,7 +973,7 @@ static int cc2520_hw_init(struct cc2520_private *priv)
if (timeout-- <= 0) { dev_err(&priv->spi->dev, "oscillator start failed!\n"); - return ret; + return -ETIMEDOUT; } udelay(1); } while (!(status & CC2520_STATUS_XOSC32M_STABLE));
From: Hauke Mehrtens hauke@hauke-m.de
[ Upstream commit 1e24c54da257ab93cff5826be8a793b014a5dc9c ]
The struct cas_control embeds multiple generic SPI structures and we have to make sure these structures are initialized to default values. This driver does not set all attributes. When using kmalloc before some attributes were not initialized and contained random data which caused random crashes at bootup.
Fixes: ded845a781a5 ("ieee802154: Add CA8210 IEEE 802.15.4 device driver") Signed-off-by: Hauke Mehrtens hauke@hauke-m.de Link: https://lore.kernel.org/r/20221121002201.1339636-1-hauke@hauke-m.de Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ieee802154/ca8210.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ieee802154/ca8210.c b/drivers/net/ieee802154/ca8210.c index 47959aadbc50..66cf09e637e4 100644 --- a/drivers/net/ieee802154/ca8210.c +++ b/drivers/net/ieee802154/ca8210.c @@ -926,7 +926,7 @@ static int ca8210_spi_transfer(
dev_dbg(&spi->dev, "%s called\n", __func__);
- cas_ctl = kmalloc(sizeof(*cas_ctl), GFP_ATOMIC); + cas_ctl = kzalloc(sizeof(*cas_ctl), GFP_ATOMIC); if (!cas_ctl) return -ENOMEM;
From: Qiqi Zhang eddy.zhang@rock-chips.com
[ Upstream commit 8c115864501fc09932cdfec53d9ec1cde82b4a28 ]
According to the description in ti-sn65dsi86's datasheet:
CHA_HSYNC_POLARITY: 0 = Active High Pulse. Synchronization signal is high for the sync pulse width. (default) 1 = Active Low Pulse. Synchronization signal is low for the sync pulse width.
CHA_VSYNC_POLARITY: 0 = Active High Pulse. Synchronization signal is high for the sync pulse width. (Default) 1 = Active Low Pulse. Synchronization signal is low for the sync pulse width.
We should only set these bits when the polarity is negative.
Fixes: a095f15c00e2 ("drm/bridge: add support for sn65dsi86 bridge driver") Signed-off-by: Qiqi Zhang eddy.zhang@rock-chips.com Reviewed-by: Douglas Anderson dianders@chromium.org Tested-by: Douglas Anderson dianders@chromium.org Reviewed-by: Tomi Valkeinen tomi.valkeinen@ideasonboard.com Signed-off-by: Douglas Anderson dianders@chromium.org Link: https://patchwork.freedesktop.org/patch/msgid/20221125104558.84616-1-eddy.zh... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/bridge/ti-sn65dsi86.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi86.c b/drivers/gpu/drm/bridge/ti-sn65dsi86.c index dbb4a374cb64..20ea1c6bc8bb 100644 --- a/drivers/gpu/drm/bridge/ti-sn65dsi86.c +++ b/drivers/gpu/drm/bridge/ti-sn65dsi86.c @@ -460,9 +460,9 @@ static void ti_sn_bridge_set_video_timings(struct ti_sn_bridge *pdata) &pdata->bridge.encoder->crtc->state->adjusted_mode; u8 hsync_polarity = 0, vsync_polarity = 0;
- if (mode->flags & DRM_MODE_FLAG_PHSYNC) + if (mode->flags & DRM_MODE_FLAG_NHSYNC) hsync_polarity = CHA_HSYNC_POLARITY; - if (mode->flags & DRM_MODE_FLAG_PVSYNC) + if (mode->flags & DRM_MODE_FLAG_NVSYNC) vsync_polarity = CHA_VSYNC_POLARITY;
ti_sn_bridge_write_u16(pdata, SN_CHA_ACTIVE_LINE_LENGTH_LOW_REG,
From: Xiongfeng Wang wangxiongfeng2@huawei.com
[ Upstream commit 45fecdb9f658d9c82960c98240bc0770ade19aca ]
for_each_pci_dev() is implemented by pci_get_device(). The comment of pci_get_device() says that it will increase the reference count for the returned pci_dev and also decrease the reference count for the input pci_dev @from if it is not NULL.
If we break for_each_pci_dev() loop with pdev not NULL, we need to call pci_dev_put() to decrease the reference count. Add the missing pci_dev_put() after the 'out' label. Since pci_dev_put() can handle NULL input parameter, there is no problem for the 'Device not found' branch. For the normal path, add pci_dev_put() in amd_gpio_exit().
Fixes: f942a7de047d ("gpio: add a driver for GPIO pins found on AMD-8111 south bridge chips") Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-amd8111.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpio/gpio-amd8111.c b/drivers/gpio/gpio-amd8111.c index fdcebe59510d..68d95051dd0e 100644 --- a/drivers/gpio/gpio-amd8111.c +++ b/drivers/gpio/gpio-amd8111.c @@ -231,7 +231,10 @@ static int __init amd_gpio_init(void) ioport_unmap(gp.pm); goto out; } + return 0; + out: + pci_dev_put(pdev); return err; }
@@ -239,6 +242,7 @@ static void __exit amd_gpio_exit(void) { gpiochip_remove(&gp.chip); ioport_unmap(gp.pm); + pci_dev_put(gp.pdev); }
module_init(amd_gpio_init);
From: Akihiko Odaki akihiko.odaki@daynix.com
[ Upstream commit eed913f6919e253f35d454b2f115f2a4db2b741a ]
e1000_xmit_frame is expected to stop the queue and dispatch frames to hardware if there is not sufficient space for the next frame in the buffer, but sometimes it failed to do so because the estimated maximum size of frame was wrong. As the consequence, the later invocation of e1000_xmit_frame failed with NETDEV_TX_BUSY, and the frame in the buffer remained forever, resulting in a watchdog failure.
This change fixes the estimated size by making it match with the condition for NETDEV_TX_BUSY. Apparently, the old estimation failed to account for the following lines which determines the space requirement for not causing NETDEV_TX_BUSY: ``` /* reserve a descriptor for the offload context */ if ((mss) || (skb->ip_summed == CHECKSUM_PARTIAL)) count++; count++;
count += DIV_ROUND_UP(len, adapter->tx_fifo_limit); ```
This issue was found when running http-stress02 test included in Linux Test Project 20220930 on QEMU with the following commandline: ``` qemu-system-x86_64 -M q35,accel=kvm -m 8G -smp 8 -drive if=virtio,format=raw,file=root.img,file.locking=on -device e1000e,netdev=netdev -netdev tap,script=ifup,downscript=no,id=netdev ```
Fixes: bc7f75fa9788 ("[E1000E]: New pci-express e1000 driver (currently for ICH9 devices only)") Signed-off-by: Akihiko Odaki akihiko.odaki@daynix.com Tested-by: Gurucharan G gurucharanx.g@intel.com (A Contingent worker at Intel) Tested-by: Naama Meir naamax.meir@linux.intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/e1000e/netdev.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index cbd83bb5c1ac..b0d43985724d 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -5916,9 +5916,9 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb, e1000_tx_queue(tx_ring, tx_flags, count); /* Make sure there is space in the ring for the next send. */ e1000_maybe_stop_tx(tx_ring, - (MAX_SKB_FRAGS * + ((MAX_SKB_FRAGS + 1) * DIV_ROUND_UP(PAGE_SIZE, - adapter->tx_fifo_limit) + 2)); + adapter->tx_fifo_limit) + 4));
if (!netdev_xmit_more() || netif_xmit_stopped(netdev_get_tx_queue(netdev, 0))) {
From: Akihiko Odaki akihiko.odaki@daynix.com
[ Upstream commit 28e96556baca7056d11d9fb3cdd0aba4483e00d8 ]
Without this change, the interrupt test fail with MSI-X environment:
$ sudo ethtool -t enp0s2 offline [ 43.921783] igb 0000:00:02.0: offline testing starting [ 44.855824] igb 0000:00:02.0 enp0s2: igb: enp0s2 NIC Link is Down [ 44.961249] igb 0000:00:02.0 enp0s2: igb: enp0s2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX [ 51.272202] igb 0000:00:02.0: testing shared interrupt [ 56.996975] igb 0000:00:02.0 enp0s2: igb: enp0s2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX The test result is FAIL The test extra info: Register test (offline) 0 Eeprom test (offline) 0 Interrupt test (offline) 4 Loopback test (offline) 0 Link test (on/offline) 0
Here, "4" means an expected interrupt was not delivered.
To fix this, route IRQs correctly to the first MSI-X vector by setting IVAR_MISC. Also, set bit 0 of EIMS so that the vector will not be masked. The interrupt test now runs properly with this change:
$ sudo ethtool -t enp0s2 offline [ 42.762985] igb 0000:00:02.0: offline testing starting [ 50.141967] igb 0000:00:02.0: testing shared interrupt [ 56.163957] igb 0000:00:02.0 enp0s2: igb: enp0s2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX The test result is PASS The test extra info: Register test (offline) 0 Eeprom test (offline) 0 Interrupt test (offline) 0 Loopback test (offline) 0 Link test (on/offline) 0
Fixes: 4eefa8f01314 ("igb: add single vector msi-x testing to interrupt test") Signed-off-by: Akihiko Odaki akihiko.odaki@daynix.com Reviewed-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Tested-by: Gurucharan G gurucharanx.g@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igb/igb_ethtool.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c index f80933320fd3..6196f9bbd67d 100644 --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c @@ -1402,6 +1402,8 @@ static int igb_intr_test(struct igb_adapter *adapter, u64 *data) *data = 1; return -1; } + wr32(E1000_IVAR_MISC, E1000_IVAR_VALID << 8); + wr32(E1000_EIMS, BIT(0)); } else if (adapter->flags & IGB_FLAG_HAS_MSI) { shared_int = false; if (request_irq(irq,
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit b3abe42e94900bdd045c472f9c9be620ba5ce553 ]
Wei Chen reported a NULL deref in sk_user_ns() [0][1], and Paolo diagnosed the root cause: in unix_diag_get_exact(), the newly allocated skb does not have sk. [2]
We must get the user_ns from the NETLINK_CB(in_skb).sk and pass it to sk_diag_fill().
[0]: BUG: kernel NULL pointer dereference, address: 0000000000000270 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 12bbce067 P4D 12bbce067 PUD 12bc40067 PMD 0 Oops: 0000 [#1] PREEMPT SMP CPU: 0 PID: 27942 Comm: syz-executor.0 Not tainted 6.1.0-rc5-next-20221118 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014 RIP: 0010:sk_user_ns include/net/sock.h:920 [inline] RIP: 0010:sk_diag_dump_uid net/unix/diag.c:119 [inline] RIP: 0010:sk_diag_fill+0x77d/0x890 net/unix/diag.c:170 Code: 89 ef e8 66 d4 2d fd c7 44 24 40 00 00 00 00 49 8d 7c 24 18 e8 54 d7 2d fd 49 8b 5c 24 18 48 8d bb 70 02 00 00 e8 43 d7 2d fd <48> 8b 9b 70 02 00 00 48 8d 7b 10 e8 33 d7 2d fd 48 8b 5b 10 48 8d RSP: 0018:ffffc90000d67968 EFLAGS: 00010246 RAX: ffff88812badaa48 RBX: 0000000000000000 RCX: ffffffff840d481d RDX: 0000000000000465 RSI: 0000000000000000 RDI: 0000000000000270 RBP: ffffc90000d679a8 R08: 0000000000000277 R09: 0000000000000000 R10: 0001ffffffffffff R11: 0001c90000d679a8 R12: ffff88812ac03800 R13: ffff88812c87c400 R14: ffff88812ae42210 R15: ffff888103026940 FS: 00007f08b4e6f700(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000270 CR3: 000000012c58b000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> unix_diag_get_exact net/unix/diag.c:285 [inline] unix_diag_handler_dump+0x3f9/0x500 net/unix/diag.c:317 __sock_diag_cmd net/core/sock_diag.c:235 [inline] sock_diag_rcv_msg+0x237/0x250 net/core/sock_diag.c:266 netlink_rcv_skb+0x13e/0x250 net/netlink/af_netlink.c:2564 sock_diag_rcv+0x24/0x40 net/core/sock_diag.c:277 netlink_unicast_kernel net/netlink/af_netlink.c:1330 [inline] netlink_unicast+0x5e9/0x6b0 net/netlink/af_netlink.c:1356 netlink_sendmsg+0x739/0x860 net/netlink/af_netlink.c:1932 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg net/socket.c:734 [inline] ____sys_sendmsg+0x38f/0x500 net/socket.c:2476 ___sys_sendmsg net/socket.c:2530 [inline] __sys_sendmsg+0x197/0x230 net/socket.c:2559 __do_sys_sendmsg net/socket.c:2568 [inline] __se_sys_sendmsg net/socket.c:2566 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2566 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x4697f9 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f08b4e6ec48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000000077bf80 RCX: 00000000004697f9 RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003 RBP: 00000000004d29e9 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000077bf80 R13: 0000000000000000 R14: 000000000077bf80 R15: 00007ffdb36bc6c0 </TASK> Modules linked in: CR2: 0000000000000270
[1]: https://lore.kernel.org/netdev/CAO4mrfdvyjFpokhNsiwZiP-wpdSD0AStcJwfKcKQdAAL... [2]: https://lore.kernel.org/netdev/e04315e7c90d9a75613f3993c2baf2d344eef7eb.came...
Fixes: cae9910e7344 ("net: Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.") Reported-by: syzbot syzkaller@googlegroups.com Reported-by: Wei Chen harperchen1110@gmail.com Diagnosed-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/diag.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/net/unix/diag.c b/net/unix/diag.c index 9ff64f9df1f3..951b33fa8f5c 100644 --- a/net/unix/diag.c +++ b/net/unix/diag.c @@ -113,14 +113,16 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb) return nla_put(nlskb, UNIX_DIAG_RQLEN, sizeof(rql), &rql); }
-static int sk_diag_dump_uid(struct sock *sk, struct sk_buff *nlskb) +static int sk_diag_dump_uid(struct sock *sk, struct sk_buff *nlskb, + struct user_namespace *user_ns) { - uid_t uid = from_kuid_munged(sk_user_ns(nlskb->sk), sock_i_uid(sk)); + uid_t uid = from_kuid_munged(user_ns, sock_i_uid(sk)); return nla_put(nlskb, UNIX_DIAG_UID, sizeof(uid_t), &uid); }
static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_req *req, - u32 portid, u32 seq, u32 flags, int sk_ino) + struct user_namespace *user_ns, + u32 portid, u32 seq, u32 flags, int sk_ino) { struct nlmsghdr *nlh; struct unix_diag_msg *rep; @@ -166,7 +168,7 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_r goto out_nlmsg_trim;
if ((req->udiag_show & UDIAG_SHOW_UID) && - sk_diag_dump_uid(sk, skb)) + sk_diag_dump_uid(sk, skb, user_ns)) goto out_nlmsg_trim;
nlmsg_end(skb, nlh); @@ -178,7 +180,8 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_r }
static int sk_diag_dump(struct sock *sk, struct sk_buff *skb, struct unix_diag_req *req, - u32 portid, u32 seq, u32 flags) + struct user_namespace *user_ns, + u32 portid, u32 seq, u32 flags) { int sk_ino;
@@ -189,7 +192,7 @@ static int sk_diag_dump(struct sock *sk, struct sk_buff *skb, struct unix_diag_r if (!sk_ino) return 0;
- return sk_diag_fill(sk, skb, req, portid, seq, flags, sk_ino); + return sk_diag_fill(sk, skb, req, user_ns, portid, seq, flags, sk_ino); }
static int unix_diag_dump(struct sk_buff *skb, struct netlink_callback *cb) @@ -217,7 +220,7 @@ static int unix_diag_dump(struct sk_buff *skb, struct netlink_callback *cb) goto next; if (!(req->udiag_states & (1 << sk->sk_state))) goto next; - if (sk_diag_dump(sk, skb, req, + if (sk_diag_dump(sk, skb, req, sk_user_ns(skb->sk), NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI) < 0) @@ -285,7 +288,8 @@ static int unix_diag_get_exact(struct sk_buff *in_skb, if (!rep) goto out;
- err = sk_diag_fill(sk, rep, req, NETLINK_CB(in_skb).portid, + err = sk_diag_fill(sk, rep, req, sk_user_ns(NETLINK_CB(in_skb).sk), + NETLINK_CB(in_skb).portid, nlh->nlmsg_seq, 0, req->udiag_ino); if (err < 0) { nlmsg_free(rep);
From: Wang ShaoBo bobo.shaobowang@huawei.com
[ Upstream commit 747da1308bdd5021409974f9180f0d8ece53d142 ]
hci_get_route() takes reference, we should use hci_dev_put() to release it when not need anymore.
Fixes: 6b8d4a6a0314 ("Bluetooth: 6LoWPAN: Use connected oriented channel instead of fixed one") Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/6lowpan.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/bluetooth/6lowpan.c b/net/bluetooth/6lowpan.c index 52fb6d6d6d58..bccad8c048da 100644 --- a/net/bluetooth/6lowpan.c +++ b/net/bluetooth/6lowpan.c @@ -1002,6 +1002,7 @@ static int get_l2cap_conn(char *buf, bdaddr_t *addr, u8 *addr_type, hci_dev_lock(hdev); hcon = hci_conn_hash_lookup_le(hdev, addr, *addr_type); hci_dev_unlock(hdev); + hci_dev_put(hdev);
if (!hcon) return -ENOENT;
From: Chen Zhongjin chenzhongjin@huawei.com
[ Upstream commit 2f3957c7eb4e07df944169a3e50a4d6790e1c744 ]
bt_init() calls bt_leds_init() to register led, but if it fails later, bt_leds_cleanup() is not called to unregister it.
This can cause panic if the argument "bluetooth-power" in text is freed and then another led_trigger_register() tries to access it:
BUG: unable to handle page fault for address: ffffffffc06d3bc0 RIP: 0010:strcmp+0xc/0x30 Call Trace: <TASK> led_trigger_register+0x10d/0x4f0 led_trigger_register_simple+0x7d/0x100 bt_init+0x39/0xf7 [bluetooth] do_one_initcall+0xd0/0x4e0
Fixes: e64c97b53bc6 ("Bluetooth: Add combined LED trigger for controller power") Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/af_bluetooth.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c index 5f508c50649d..8031526eeeee 100644 --- a/net/bluetooth/af_bluetooth.c +++ b/net/bluetooth/af_bluetooth.c @@ -735,7 +735,7 @@ static int __init bt_init(void)
err = bt_sysfs_init(); if (err < 0) - return err; + goto cleanup_led;
err = sock_register(&bt_sock_family_ops); if (err) @@ -771,6 +771,8 @@ static int __init bt_init(void) sock_unregister(PF_BLUETOOTH); cleanup_sysfs: bt_sysfs_cleanup(); +cleanup_led: + bt_leds_cleanup(); return err; }
From: Artem Chernyshev artem.chernyshev@red-soft.ru
[ Upstream commit 3d8fdcbf1f42e2bb9ae8b8c0b6f202278c788a22 ]
Return NULL if we got unexpected value from skb_trim_rcsum() in ksz_common_rcv()
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: bafe9ba7d908 ("net: dsa: ksz: Factor out common tag code") Signed-off-by: Artem Chernyshev artem.chernyshev@red-soft.ru Reviewed-by: Vladimir Oltean olteanv@gmail.com Reviewed-by: Florian Fainelli f.fainelli@gmail.com Link: https://lore.kernel.org/r/20221201140032.26746-1-artem.chernyshev@red-soft.r... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/dsa/tag_ksz.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c index 73605bcbb385..7354c5db3a14 100644 --- a/net/dsa/tag_ksz.c +++ b/net/dsa/tag_ksz.c @@ -62,7 +62,8 @@ static struct sk_buff *ksz_common_rcv(struct sk_buff *skb, if (!skb->dev) return NULL;
- pskb_trim_rcsum(skb, skb->len - len); + if (pskb_trim_rcsum(skb, skb->len - len)) + return NULL;
skb->offload_fwd_mark = true;
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit 85a0506c073332a3057f5a9635fa0d4db5a8e03b ]
When testing in kci_test_ipsec_offload, srcip is configured as $dstip, it should add xfrm policy rule in instead of out. The test result of this patch is as follows: PASS: ipsec_offload
Fixes: 2766a11161cc ("selftests: rtnetlink: add ipsec offload API test") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Acked-by: Hangbin Liu liuhangbin@gmail.com Link: https://lore.kernel.org/r/20221201082246.14131-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/rtnetlink.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh index 28ea3753da20..911c549f186f 100755 --- a/tools/testing/selftests/net/rtnetlink.sh +++ b/tools/testing/selftests/net/rtnetlink.sh @@ -780,7 +780,7 @@ kci_test_ipsec_offload() tmpl proto esp src $srcip dst $dstip spi 9 \ mode transport reqid 42 check_err $? - ip x p add dir out src $dstip/24 dst $srcip/24 \ + ip x p add dir in src $dstip/24 dst $srcip/24 \ tmpl proto esp src $dstip dst $srcip spi 9 \ mode transport reqid 42 check_err $?
From: Wei Yongjun weiyongjun1@huawei.com
[ Upstream commit b3d72d3135d2ef68296c1ee174436efd65386f04 ]
Kernel fault injection test reports null-ptr-deref as follows:
BUG: kernel NULL pointer dereference, address: 0000000000000008 RIP: 0010:cfg802154_netdev_notifier_call+0x120/0x310 include/linux/list.h:114 Call Trace: <TASK> raw_notifier_call_chain+0x6d/0xa0 kernel/notifier.c:87 call_netdevice_notifiers_info+0x6e/0xc0 net/core/dev.c:1944 unregister_netdevice_many_notify+0x60d/0xcb0 net/core/dev.c:1982 unregister_netdevice_queue+0x154/0x1a0 net/core/dev.c:10879 register_netdevice+0x9a8/0xb90 net/core/dev.c:10083 ieee802154_if_add+0x6ed/0x7e0 net/mac802154/iface.c:659 ieee802154_register_hw+0x29c/0x330 net/mac802154/main.c:229 mcr20a_probe+0xaaa/0xcb1 drivers/net/ieee802154/mcr20a.c:1316
ieee802154_if_add() allocates wpan_dev as netdev's private data, but not init the list in struct wpan_dev. cfg802154_netdev_notifier_call() manage the list when device register/unregister, and may lead to null-ptr-deref.
Use INIT_LIST_HEAD() on it to initialize it correctly.
Fixes: fcf39e6e88e9 ("ieee802154: add wpan_dev_list") Signed-off-by: Wei Yongjun weiyongjun1@huawei.com Acked-by: Alexander Aring aahringo@redhat.com
Link: https://lore.kernel.org/r/20221130091705.1831140-1-weiyongjun@huaweicloud.co... Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac802154/iface.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/mac802154/iface.c b/net/mac802154/iface.c index 1cf5ac09edcb..a08240fe68a7 100644 --- a/net/mac802154/iface.c +++ b/net/mac802154/iface.c @@ -661,6 +661,7 @@ ieee802154_if_add(struct ieee802154_local *local, const char *name, sdata->dev = ndev; sdata->wpan_dev.wpan_phy = local->hw.phy; sdata->local = local; + INIT_LIST_HEAD(&sdata->wpan_dev.list);
/* setup type-dependent data */ ret = ieee802154_setup_sdata(sdata, type);
From: Valentina Goncharenko goncharenko.vp@ispras.ru
[ Upstream commit 167b3f2dcc62c271f3555b33df17e361bb1fa0ee ]
In functions regmap_encx24j600_phy_reg_read() and regmap_encx24j600_phy_reg_write() in the conditions of the waiting cycles for filling the variable 'ret' it is necessary to add parentheses to prevent wrong assignment due to logical operations precedence.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: d70e53262f5c ("net: Microchip encx24j600 driver") Signed-off-by: Valentina Goncharenko goncharenko.vp@ispras.ru Reviewed-by: Pavan Chebbi pavan.chebbi@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/microchip/encx24j600-regmap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/microchip/encx24j600-regmap.c b/drivers/net/ethernet/microchip/encx24j600-regmap.c index e2528633c09a..8c07c3c3c00c 100644 --- a/drivers/net/ethernet/microchip/encx24j600-regmap.c +++ b/drivers/net/ethernet/microchip/encx24j600-regmap.c @@ -364,7 +364,7 @@ static int regmap_encx24j600_phy_reg_read(void *context, unsigned int reg, goto err_out;
usleep_range(26, 100); - while ((ret = regmap_read(ctx->regmap, MISTAT, &mistat) != 0) && + while (((ret = regmap_read(ctx->regmap, MISTAT, &mistat)) != 0) && (mistat & BUSY)) cpu_relax();
@@ -402,7 +402,7 @@ static int regmap_encx24j600_phy_reg_write(void *context, unsigned int reg, goto err_out;
usleep_range(26, 100); - while ((ret = regmap_read(ctx->regmap, MISTAT, &mistat) != 0) && + while (((ret = regmap_read(ctx->regmap, MISTAT, &mistat)) != 0) && (mistat & BUSY)) cpu_relax();
From: Valentina Goncharenko goncharenko.vp@ispras.ru
[ Upstream commit 25f427ac7b8d89b0259f86c0c6407b329df742b2 ]
A loop for reading MISTAT register continues while regmap_read() fails and (mistat & BUSY), but if regmap_read() fails a value of mistat is undefined.
The patch proposes to check for BUSY flag only when regmap_read() succeed. Compile test only.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: d70e53262f5c ("net: Microchip encx24j600 driver") Signed-off-by: Valentina Goncharenko goncharenko.vp@ispras.ru Reviewed-by: Pavan Chebbi pavan.chebbi@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/microchip/encx24j600-regmap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/microchip/encx24j600-regmap.c b/drivers/net/ethernet/microchip/encx24j600-regmap.c index 8c07c3c3c00c..8c6c5c706992 100644 --- a/drivers/net/ethernet/microchip/encx24j600-regmap.c +++ b/drivers/net/ethernet/microchip/encx24j600-regmap.c @@ -364,7 +364,7 @@ static int regmap_encx24j600_phy_reg_read(void *context, unsigned int reg, goto err_out;
usleep_range(26, 100); - while (((ret = regmap_read(ctx->regmap, MISTAT, &mistat)) != 0) && + while (((ret = regmap_read(ctx->regmap, MISTAT, &mistat)) == 0) && (mistat & BUSY)) cpu_relax();
@@ -402,7 +402,7 @@ static int regmap_encx24j600_phy_reg_write(void *context, unsigned int reg, goto err_out;
usleep_range(26, 100); - while (((ret = regmap_read(ctx->regmap, MISTAT, &mistat)) != 0) && + while (((ret = regmap_read(ctx->regmap, MISTAT, &mistat)) == 0) && (mistat & BUSY)) cpu_relax();
From: Lin Liu lin.liu@citrix.com
[ Upstream commit d50b7914fae04d840ce36491d22133070b18cca9 ]
A NAPI is setup for each network sring to poll data to kernel The sring with source host is destroyed before live migration and new sring with target host is setup after live migration. The NAPI for the old sring is not deleted until setup new sring with target host after migration. With busy_poll/busy_read enabled, the NAPI can be polled before got deleted when resume VM.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: xennet_poll+0xae/0xd20 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI Call Trace: finish_task_switch+0x71/0x230 timerqueue_del+0x1d/0x40 hrtimer_try_to_cancel+0xb5/0x110 xennet_alloc_rx_buffers+0x2a0/0x2a0 napi_busy_loop+0xdb/0x270 sock_poll+0x87/0x90 do_sys_poll+0x26f/0x580 tracing_map_insert+0x1d4/0x2f0 event_hist_trigger+0x14a/0x260
finish_task_switch+0x71/0x230 __schedule+0x256/0x890 recalc_sigpending+0x1b/0x50 xen_sched_clock+0x15/0x20 __rb_reserve_next+0x12d/0x140 ring_buffer_lock_reserve+0x123/0x3d0 event_triggers_call+0x87/0xb0 trace_event_buffer_commit+0x1c4/0x210 xen_clocksource_get_cycles+0x15/0x20 ktime_get_ts64+0x51/0xf0 SyS_ppoll+0x160/0x1a0 SyS_ppoll+0x160/0x1a0 do_syscall_64+0x73/0x130 entry_SYSCALL_64_after_hwframe+0x41/0xa6 ... RIP: xennet_poll+0xae/0xd20 RSP: ffffb4f041933900 CR2: 0000000000000008 ---[ end trace f8601785b354351c ]---
xen frontend should remove the NAPIs for the old srings before live migration as the bond srings are destroyed
There is a tiny window between the srings are set to NULL and the NAPIs are disabled, It is safe as the NAPI threads are still frozen at that time
Signed-off-by: Lin Liu lin.liu@citrix.com Fixes: 4ec2411980d0 ([NET]: Do not check netif_running() and carrier state in ->poll()) Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/xen-netfront.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 810fa9968be7..9ae0903bc225 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -1621,6 +1621,12 @@ static int netfront_resume(struct xenbus_device *dev) netif_tx_unlock_bh(info->netdev);
xennet_disconnect_backend(info); + + rtnl_lock(); + if (info->queues) + xennet_destroy_queues(info); + rtnl_unlock(); + return 0; }
From: Dan Carpenter error27@gmail.com
[ Upstream commit e8b4fc13900b8e8be48debffd0dfd391772501f7 ]
The pp->indir[0] value comes from the user. It is passed to:
if (cpu_online(pp->rxq_def))
inside the mvneta_percpu_elect() function. It needs bounds checkeding to ensure that it is not beyond the end of the cpu bitmap.
Fixes: cad5d847a093 ("net: mvneta: Fix the CPU choice in mvneta_percpu_elect") Signed-off-by: Dan Carpenter error27@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/mvneta.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index 64aa5510e61a..67ba697518d6 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -4235,6 +4235,9 @@ static int mvneta_config_rss(struct mvneta_port *pp) napi_disable(&pp->napi); }
+ if (pp->indir[0] >= nr_cpu_ids) + return -EINVAL; + pp->rxq_def = pp->indir[0];
/* Update unicast mapping */
From: Michal Jaron michalx.jaron@intel.com
[ Upstream commit 82e0572b23029b380464fa9fdc125db9c1506d0a ]
During tx rings configuration default XPS queue config is set and __I40E_TX_XPS_INIT_DONE is locked. __I40E_TX_XPS_INIT_DONE state is cleared and set again with default mapping only during queues build, it means after first setup or reset with queues rebuild. (i.e. ethtool -L <interface> combined <number>) After other resets (i.e. ethtool -t <interface>) XPS_INIT_DONE is not cleared and those default maps cannot be set again. It results in cleared xps_cpus mapping until queues are not rebuild or mapping is not set by user.
Add clearing __I40E_TX_XPS_INIT_DONE state during reset to let the driver set xps_cpus to defaults again after it was cleared.
Fixes: 6f853d4f8e93 ("i40e: allow XPS with QoS enabled") Signed-off-by: Michal Jaron michalx.jaron@intel.com Signed-off-by: Kamil Maziarz kamil.maziarz@intel.com Tested-by: Gurucharan gurucharanx.g@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_main.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 15f177185d71..d612b23c2e3f 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -9953,6 +9953,21 @@ static int i40e_rebuild_channels(struct i40e_vsi *vsi) return 0; }
+/** + * i40e_clean_xps_state - clean xps state for every tx_ring + * @vsi: ptr to the VSI + **/ +static void i40e_clean_xps_state(struct i40e_vsi *vsi) +{ + int i; + + if (vsi->tx_rings) + for (i = 0; i < vsi->num_queue_pairs; i++) + if (vsi->tx_rings[i]) + clear_bit(__I40E_TX_XPS_INIT_DONE, + vsi->tx_rings[i]->state); +} + /** * i40e_prep_for_reset - prep for the core to reset * @pf: board private structure @@ -9984,8 +9999,10 @@ static void i40e_prep_for_reset(struct i40e_pf *pf, bool lock_acquired) rtnl_unlock();
for (v = 0; v < pf->num_alloc_vsi; v++) { - if (pf->vsi[v]) + if (pf->vsi[v]) { + i40e_clean_xps_state(pf->vsi[v]); pf->vsi[v]->seid = 0; + } }
i40e_shutdown_adminq(&pf->hw);
From: Sylwester Dziedziuch sylwesterx.dziedziuch@intel.com
[ Upstream commit 08501970472077ed5de346ad89943a37d1692e9b ]
After spawning max VFs on a PF, some VFs were not getting resources and their MAC addresses were 0. This was caused by PF sleeping before flushing HW registers which caused VIRTCHNL_VFR_VFACTIVE to not be set in time for VF.
Fix by adding a sleep after hw flush.
Fixes: e4b433f4a741 ("i40e: reset all VFs in parallel when rebuilding PF") Signed-off-by: Sylwester Dziedziuch sylwesterx.dziedziuch@intel.com Signed-off-by: Jan Sokolowski jan.sokolowski@intel.com Tested-by: Konrad Jankowski konrad0.jankowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c index fb060e3253d9..be07148a7b29 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -1394,6 +1394,7 @@ bool i40e_reset_vf(struct i40e_vf *vf, bool flr) i40e_cleanup_reset_vf(vf);
i40e_flush(hw); + usleep_range(20000, 40000); clear_bit(I40E_VF_STATE_RESETTING, &vf->vf_states);
return true; @@ -1517,6 +1518,7 @@ bool i40e_reset_all_vfs(struct i40e_pf *pf, bool flr) }
i40e_flush(hw); + usleep_range(20000, 40000); clear_bit(__I40E_VF_DISABLE, pf->state);
return true;
From: Przemyslaw Patynowski przemyslawx.patynowski@intel.com
[ Upstream commit d64aaf3f7869f915fd120763d75f11d6b116424d ]
Return -EOPNOTSUPP, when user requests l4_4_bytes for raw IP4 or IP6 flow director filters. Flow director does not support filtering on l4 bytes for PCTYPEs used by IP4 and IP6 filters. Without this patch, user could create filters with l4_4_bytes fields, which did not do any filtering on L4, but only on L3 fields.
Fixes: 36777d9fa24c ("i40e: check current configured input set when adding ntuple filters") Signed-off-by: Przemyslaw Patynowski przemyslawx.patynowski@intel.com Signed-off-by: Kamil Maziarz kamil.maziarz@intel.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Tested-by: Gurucharan G gurucharanx.g@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c index 7059ced24739..95a6f8689b6f 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c @@ -4233,11 +4233,7 @@ static int i40e_check_fdir_input_set(struct i40e_vsi *vsi, return -EOPNOTSUPP;
/* First 4 bytes of L4 header */ - if (usr_ip4_spec->l4_4_bytes == htonl(0xFFFFFFFF)) - new_mask |= I40E_L4_SRC_MASK | I40E_L4_DST_MASK; - else if (!usr_ip4_spec->l4_4_bytes) - new_mask &= ~(I40E_L4_SRC_MASK | I40E_L4_DST_MASK); - else + if (usr_ip4_spec->l4_4_bytes) return -EOPNOTSUPP;
/* Filtering on Type of Service is not supported. */
From: Kees Cook keescook@chromium.org
[ Upstream commit e329e71013c9b5a4535b099208493c7826ee4a64 ]
While running under CONFIG_FORTIFY_SOURCE=y, syzkaller reported:
memcpy: detected field-spanning write (size 129) of single field "target->sensf_res" at net/nfc/nci/ntf.c:260 (size 18)
This appears to be a legitimate lack of bounds checking in nci_add_new_protocol(). Add the missing checks.
Reported-by: syzbot+210e196cef4711b65139@syzkaller.appspotmail.com Link: https://lore.kernel.org/lkml/0000000000001c590f05ee7b3ff4@google.com Fixes: 019c4fbaa790 ("NFC: Add NCI multiple targets support") Signed-off-by: Kees Cook keescook@chromium.org Reviewed-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Link: https://lore.kernel.org/r/20221202214410.never.693-kees@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/nfc/nci/ntf.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/net/nfc/nci/ntf.c b/net/nfc/nci/ntf.c index 33e1170817f0..f8b20cddd5c9 100644 --- a/net/nfc/nci/ntf.c +++ b/net/nfc/nci/ntf.c @@ -218,6 +218,8 @@ static int nci_add_new_protocol(struct nci_dev *ndev, target->sens_res = nfca_poll->sens_res; target->sel_res = nfca_poll->sel_res; target->nfcid1_len = nfca_poll->nfcid1_len; + if (target->nfcid1_len > ARRAY_SIZE(target->nfcid1)) + return -EPROTO; if (target->nfcid1_len > 0) { memcpy(target->nfcid1, nfca_poll->nfcid1, target->nfcid1_len); @@ -226,6 +228,8 @@ static int nci_add_new_protocol(struct nci_dev *ndev, nfcb_poll = (struct rf_tech_specific_params_nfcb_poll *)params;
target->sensb_res_len = nfcb_poll->sensb_res_len; + if (target->sensb_res_len > ARRAY_SIZE(target->sensb_res)) + return -EPROTO; if (target->sensb_res_len > 0) { memcpy(target->sensb_res, nfcb_poll->sensb_res, target->sensb_res_len); @@ -234,6 +238,8 @@ static int nci_add_new_protocol(struct nci_dev *ndev, nfcf_poll = (struct rf_tech_specific_params_nfcf_poll *)params;
target->sensf_res_len = nfcf_poll->sensf_res_len; + if (target->sensf_res_len > ARRAY_SIZE(target->sensf_res)) + return -EPROTO; if (target->sensf_res_len > 0) { memcpy(target->sensf_res, nfcf_poll->sensf_res, target->sensf_res_len);
From: Pankaj Raghav p.raghav@samsung.com
[ Upstream commit 6f2d71524bcfdeb1fcbd22a4a92a5b7b161ab224 ]
A device might have a core quirk for NVME_QUIRK_IGNORE_DEV_SUBNQN (such as Samsung X5) but it would still give a:
"missing or invalid SUBNQN field"
warning as core quirks are filled after calling nvme_init_subnqn. Fill ctrl->quirks from struct core_quirks before calling nvme_init_subsystem to fix this.
Tested on a Samsung X5.
Fixes: ab9e00cc72fa ("nvme: track subsystems") Signed-off-by: Pankaj Raghav p.raghav@samsung.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/core.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 3b5e5fb158be..029a89aead53 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -2806,10 +2806,6 @@ int nvme_init_identify(struct nvme_ctrl *ctrl) if (!ctrl->identified) { int i;
- ret = nvme_init_subsystem(ctrl, id); - if (ret) - goto out_free; - /* * Check for quirks. Quirk can depend on firmware version, * so, in principle, the set of quirks present can change @@ -2822,6 +2818,10 @@ int nvme_init_identify(struct nvme_ctrl *ctrl) if (quirk_matches(id, &core_quirks[i])) ctrl->quirks |= core_quirks[i].quirks; } + + ret = nvme_init_subsystem(ctrl, id); + if (ret) + goto out_free; } memcpy(ctrl->subsys->firmware_rev, id->fr, sizeof(ctrl->subsys->firmware_rev));
From: Jisheng Zhang jszhang@kernel.org
[ Upstream commit 61d4f140943c47c1386ed89f7260e00418dfad9d ]
In dt-binding snps,dwmac.yaml, some properties under "snps,axi-config" node are named without "axi_" prefix, but the driver expects the prefix. Since the dt-binding has been there for a long time, we'd better make driver match the binding for compatibility.
Fixes: afea03656add ("stmmac: rework DMA bus setting and introduce new platform AXI structure") Signed-off-by: Jisheng Zhang jszhang@kernel.org Link: https://lore.kernel.org/r/20221202161739.2203-1-jszhang@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c index 70cbf48c2c03..a2ff9b4727ec 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c @@ -107,10 +107,10 @@ static struct stmmac_axi *stmmac_axi_setup(struct platform_device *pdev)
axi->axi_lpi_en = of_property_read_bool(np, "snps,lpi_en"); axi->axi_xit_frm = of_property_read_bool(np, "snps,xit_frm"); - axi->axi_kbbe = of_property_read_bool(np, "snps,axi_kbbe"); - axi->axi_fb = of_property_read_bool(np, "snps,axi_fb"); - axi->axi_mb = of_property_read_bool(np, "snps,axi_mb"); - axi->axi_rb = of_property_read_bool(np, "snps,axi_rb"); + axi->axi_kbbe = of_property_read_bool(np, "snps,kbbe"); + axi->axi_fb = of_property_read_bool(np, "snps,fb"); + axi->axi_mb = of_property_read_bool(np, "snps,mb"); + axi->axi_rb = of_property_read_bool(np, "snps,rb");
if (of_property_read_u32(np, "snps,wr_osr_lmt", &axi->axi_wr_osr_lmt)) axi->axi_wr_osr_lmt = 1;
From: Yongqiang Liu liuyongqiang13@huawei.com
[ Upstream commit 42330a32933fb42180c52022804dcf09f47a2f99 ]
The nicvf_probe() won't destroy workqueue when register_netdev() failed. Add destroy_workqueue err handle case to fix this issue.
Fixes: 2ecbe4f4a027 ("net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them.") Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Reviewed-by: Pavan Chebbi pavan.chebbi@broadcom.com Link: https://lore.kernel.org/r/20221203094125.602812-1-liuyongqiang13@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/cavium/thunder/nicvf_main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c index 27ea528ef448..e861567b2c49 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -2268,7 +2268,7 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent) err = register_netdev(netdev); if (err) { dev_err(dev, "Failed to register netdevice\n"); - goto err_unregister_interrupts; + goto err_destroy_workqueue; }
nic->msg_enable = debug; @@ -2277,6 +2277,8 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
return 0;
+err_destroy_workqueue: + destroy_workqueue(nic->nicvf_rx_mode_wq); err_unregister_interrupts: nicvf_unregister_interrupts(nic); err_free_netdev:
From: Liu Jian liujian56@huawei.com
[ Upstream commit 4640177049549de1a43e9bc49265f0cdfce08cfd ]
The skb is delivered to napi_gro_receive() which may free it, after calling this, dereferencing skb may trigger use-after-free.
Fixes: 542ae60af24f ("net: hisilicon: Add Fast Ethernet MAC driver") Signed-off-by: Liu Jian liujian56@huawei.com Link: https://lore.kernel.org/r/20221203094240.1240211-1-liujian56@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hisi_femac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hisi_femac.c b/drivers/net/ethernet/hisilicon/hisi_femac.c index 90ab7ade44c4..2ee6265228d1 100644 --- a/drivers/net/ethernet/hisilicon/hisi_femac.c +++ b/drivers/net/ethernet/hisilicon/hisi_femac.c @@ -283,7 +283,7 @@ static int hisi_femac_rx(struct net_device *dev, int limit) skb->protocol = eth_type_trans(skb, dev); napi_gro_receive(&priv->napi, skb); dev->stats.rx_packets++; - dev->stats.rx_bytes += skb->len; + dev->stats.rx_bytes += len; next: pos = (pos + 1) % rxq->num; if (rx_pkts_num >= limit)
From: Liu Jian liujian56@huawei.com
[ Upstream commit 433c07a13f59856e4585e89e86b7d4cc59348fab ]
The skb is delivered to napi_gro_receive() which may free it, after calling this, dereferencing skb may trigger use-after-free.
Fixes: 57c5bc9ad7d7 ("net: hisilicon: add hix5hd2 mac driver") Signed-off-by: Liu Jian liujian56@huawei.com Link: https://lore.kernel.org/r/20221203094240.1240211-2-liujian56@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hix5hd2_gmac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c b/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c index c41b19c760f8..645cae590dc4 100644 --- a/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c +++ b/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c @@ -550,7 +550,7 @@ static int hix5hd2_rx(struct net_device *dev, int limit) skb->protocol = eth_type_trans(skb, dev); napi_gro_receive(&priv->napi, skb); dev->stats.rx_packets++; - dev->stats.rx_bytes += skb->len; + dev->stats.rx_bytes += len; next: pos = dma_ring_incr(pos, RX_DESC_NUM); }
From: YueHaibing yuehaibing@huawei.com
[ Upstream commit 743117a997bbd4840e827295c07e59bcd7f7caa3 ]
Fix the potential risk of OOB if skb_linearize() fails in tipc_link_proto_rcv().
Fixes: 5cbb28a4bf65 ("tipc: linearize arriving NAME_DISTR and LINK_PROTO buffers") Signed-off-by: YueHaibing yuehaibing@huawei.com Link: https://lore.kernel.org/r/20221203094635.29024-1-yuehaibing@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/tipc/link.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/tipc/link.c b/net/tipc/link.c index 8f2ee71c63c6..b653d16ab21f 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -1971,7 +1971,9 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, if (tipc_own_addr(l->net) > msg_prevnode(hdr)) l->net_plane = msg_net_plane(hdr);
- skb_linearize(skb); + if (skb_linearize(skb)) + goto exit; + hdr = buf_msg(skb); data = msg_data(hdr);
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit f96a3d74554df537b6db5c99c27c80e7afadc8d1 ]
Cited commit added the table ID to the FIB info structure, but did not prevent structures with different table IDs from being consolidated. This can lead to routes being flushed from a VRF when an address is deleted from a different VRF.
Fix by taking the table ID into account when looking for a matching FIB info. This is already done for FIB info structures backed by a nexthop object in fib_find_info_nh().
Add test cases that fail before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [FAIL] TEST: Route in default VRF not removed [ OK ] RTNETLINK answers: File exists TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [FAIL]
Tests passed: 6 Tests failed: 2
And pass after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ]
Tests passed: 8 Tests failed: 0
Fixes: 5a56a0b3a45d ("net: Don't delete routes in different VRFs") Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/fib_semantics.c | 1 + tools/testing/selftests/net/fib_tests.sh | 1727 ---------------------- 2 files changed, 1 insertion(+), 1727 deletions(-) delete mode 100755 tools/testing/selftests/net/fib_tests.sh
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 908913d75847..f45b9daf62cf 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -420,6 +420,7 @@ static struct fib_info *fib_find_info(struct fib_info *nfi) nfi->fib_prefsrc == fi->fib_prefsrc && nfi->fib_priority == fi->fib_priority && nfi->fib_type == fi->fib_type && + nfi->fib_tb_id == fi->fib_tb_id && memcmp(nfi->fib_metrics, fi->fib_metrics, sizeof(u32) * RTAX_MAX) == 0 && !((nfi->fib_flags ^ fi->fib_flags) & ~RTNH_COMPARE_MASK) && diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh deleted file mode 100755 index 6986086035d6..000000000000 --- a/tools/testing/selftests/net/fib_tests.sh +++ /dev/null @@ -1,1727 +0,0 @@ -#!/bin/bash -# SPDX-License-Identifier: GPL-2.0 - -# This test is for checking IPv4 and IPv6 FIB behavior in response to -# different events. - -ret=0 -# Kselftest framework requirement - SKIP code is 4. -ksft_skip=4 - -# all tests in this script. Can be overridden with -t option -TESTS="unregister down carrier nexthop suppress ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics ipv4_route_v6_gw rp_filter" - -VERBOSE=0 -PAUSE_ON_FAIL=no -PAUSE=no -IP="ip -netns ns1" -NS_EXEC="ip netns exec ns1" - -which ping6 > /dev/null 2>&1 && ping6=$(which ping6) || ping6=$(which ping) - -log_test() -{ - local rc=$1 - local expected=$2 - local msg="$3" - - if [ ${rc} -eq ${expected} ]; then - printf " TEST: %-60s [ OK ]\n" "${msg}" - nsuccess=$((nsuccess+1)) - else - ret=1 - nfail=$((nfail+1)) - printf " TEST: %-60s [FAIL]\n" "${msg}" - if [ "${PAUSE_ON_FAIL}" = "yes" ]; then - echo - echo "hit enter to continue, 'q' to quit" - read a - [ "$a" = "q" ] && exit 1 - fi - fi - - if [ "${PAUSE}" = "yes" ]; then - echo - echo "hit enter to continue, 'q' to quit" - read a - [ "$a" = "q" ] && exit 1 - fi -} - -setup() -{ - set -e - ip netns add ns1 - ip netns set ns1 auto - $IP link set dev lo up - ip netns exec ns1 sysctl -qw net.ipv4.ip_forward=1 - ip netns exec ns1 sysctl -qw net.ipv6.conf.all.forwarding=1 - - $IP link add dummy0 type dummy - $IP link set dev dummy0 up - $IP address add 198.51.100.1/24 dev dummy0 - $IP -6 address add 2001:db8:1::1/64 dev dummy0 - set +e - -} - -cleanup() -{ - $IP link del dev dummy0 &> /dev/null - ip netns del ns1 - ip netns del ns2 &> /dev/null -} - -get_linklocal() -{ - local dev=$1 - local addr - - addr=$($IP -6 -br addr show dev ${dev} | \ - awk '{ - for (i = 3; i <= NF; ++i) { - if ($i ~ /^fe80/) - print $i - } - }' - ) - addr=${addr//*} - - [ -z "$addr" ] && return 1 - - echo $addr - - return 0 -} - -fib_unreg_unicast_test() -{ - echo - echo "Single path route test" - - setup - - echo " Start point" - $IP route get fibmatch 198.51.100.2 &> /dev/null - log_test $? 0 "IPv4 fibmatch" - $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null - log_test $? 0 "IPv6 fibmatch" - - set -e - $IP link del dev dummy0 - set +e - - echo " Nexthop device deleted" - $IP route get fibmatch 198.51.100.2 &> /dev/null - log_test $? 2 "IPv4 fibmatch - no route" - $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null - log_test $? 2 "IPv6 fibmatch - no route" - - cleanup -} - -fib_unreg_multipath_test() -{ - - echo - echo "Multipath route test" - - setup - - set -e - $IP link add dummy1 type dummy - $IP link set dev dummy1 up - $IP address add 192.0.2.1/24 dev dummy1 - $IP -6 address add 2001:db8:2::1/64 dev dummy1 - - $IP route add 203.0.113.0/24 \ - nexthop via 198.51.100.2 dev dummy0 \ - nexthop via 192.0.2.2 dev dummy1 - $IP -6 route add 2001:db8:3::/64 \ - nexthop via 2001:db8:1::2 dev dummy0 \ - nexthop via 2001:db8:2::2 dev dummy1 - set +e - - echo " Start point" - $IP route get fibmatch 203.0.113.1 &> /dev/null - log_test $? 0 "IPv4 fibmatch" - $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null - log_test $? 0 "IPv6 fibmatch" - - set -e - $IP link del dev dummy0 - set +e - - echo " One nexthop device deleted" - $IP route get fibmatch 203.0.113.1 &> /dev/null - log_test $? 2 "IPv4 - multipath route removed on delete" - - $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null - # In IPv6 we do not flush the entire multipath route. - log_test $? 0 "IPv6 - multipath down to single path" - - set -e - $IP link del dev dummy1 - set +e - - echo " Second nexthop device deleted" - $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null - log_test $? 2 "IPv6 - no route" - - cleanup -} - -fib_unreg_test() -{ - fib_unreg_unicast_test - fib_unreg_multipath_test -} - -fib_down_unicast_test() -{ - echo - echo "Single path, admin down" - - setup - - echo " Start point" - $IP route get fibmatch 198.51.100.2 &> /dev/null - log_test $? 0 "IPv4 fibmatch" - $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null - log_test $? 0 "IPv6 fibmatch" - - set -e - $IP link set dev dummy0 down - set +e - - echo " Route deleted on down" - $IP route get fibmatch 198.51.100.2 &> /dev/null - log_test $? 2 "IPv4 fibmatch" - $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null - log_test $? 2 "IPv6 fibmatch" - - cleanup -} - -fib_down_multipath_test_do() -{ - local down_dev=$1 - local up_dev=$2 - - $IP route get fibmatch 203.0.113.1 \ - oif $down_dev &> /dev/null - log_test $? 2 "IPv4 fibmatch on down device" - $IP -6 route get fibmatch 2001:db8:3::1 \ - oif $down_dev &> /dev/null - log_test $? 2 "IPv6 fibmatch on down device" - - $IP route get fibmatch 203.0.113.1 \ - oif $up_dev &> /dev/null - log_test $? 0 "IPv4 fibmatch on up device" - $IP -6 route get fibmatch 2001:db8:3::1 \ - oif $up_dev &> /dev/null - log_test $? 0 "IPv6 fibmatch on up device" - - $IP route get fibmatch 203.0.113.1 | \ - grep $down_dev | grep -q "dead linkdown" - log_test $? 0 "IPv4 flags on down device" - $IP -6 route get fibmatch 2001:db8:3::1 | \ - grep $down_dev | grep -q "dead linkdown" - log_test $? 0 "IPv6 flags on down device" - - $IP route get fibmatch 203.0.113.1 | \ - grep $up_dev | grep -q "dead linkdown" - log_test $? 1 "IPv4 flags on up device" - $IP -6 route get fibmatch 2001:db8:3::1 | \ - grep $up_dev | grep -q "dead linkdown" - log_test $? 1 "IPv6 flags on up device" -} - -fib_down_multipath_test() -{ - echo - echo "Admin down multipath" - - setup - - set -e - $IP link add dummy1 type dummy - $IP link set dev dummy1 up - - $IP address add 192.0.2.1/24 dev dummy1 - $IP -6 address add 2001:db8:2::1/64 dev dummy1 - - $IP route add 203.0.113.0/24 \ - nexthop via 198.51.100.2 dev dummy0 \ - nexthop via 192.0.2.2 dev dummy1 - $IP -6 route add 2001:db8:3::/64 \ - nexthop via 2001:db8:1::2 dev dummy0 \ - nexthop via 2001:db8:2::2 dev dummy1 - set +e - - echo " Verify start point" - $IP route get fibmatch 203.0.113.1 &> /dev/null - log_test $? 0 "IPv4 fibmatch" - - $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null - log_test $? 0 "IPv6 fibmatch" - - set -e - $IP link set dev dummy0 down - set +e - - echo " One device down, one up" - fib_down_multipath_test_do "dummy0" "dummy1" - - set -e - $IP link set dev dummy0 up - $IP link set dev dummy1 down - set +e - - echo " Other device down and up" - fib_down_multipath_test_do "dummy1" "dummy0" - - set -e - $IP link set dev dummy0 down - set +e - - echo " Both devices down" - $IP route get fibmatch 203.0.113.1 &> /dev/null - log_test $? 2 "IPv4 fibmatch" - $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null - log_test $? 2 "IPv6 fibmatch" - - $IP link del dev dummy1 - cleanup -} - -fib_down_test() -{ - fib_down_unicast_test - fib_down_multipath_test -} - -# Local routes should not be affected when carrier changes. -fib_carrier_local_test() -{ - echo - echo "Local carrier tests - single path" - - setup - - set -e - $IP link set dev dummy0 carrier on - set +e - - echo " Start point" - $IP route get fibmatch 198.51.100.1 &> /dev/null - log_test $? 0 "IPv4 fibmatch" - $IP -6 route get fibmatch 2001:db8:1::1 &> /dev/null - log_test $? 0 "IPv6 fibmatch" - - $IP route get fibmatch 198.51.100.1 | \ - grep -q "linkdown" - log_test $? 1 "IPv4 - no linkdown flag" - $IP -6 route get fibmatch 2001:db8:1::1 | \ - grep -q "linkdown" - log_test $? 1 "IPv6 - no linkdown flag" - - set -e - $IP link set dev dummy0 carrier off - sleep 1 - set +e - - echo " Carrier off on nexthop" - $IP route get fibmatch 198.51.100.1 &> /dev/null - log_test $? 0 "IPv4 fibmatch" - $IP -6 route get fibmatch 2001:db8:1::1 &> /dev/null - log_test $? 0 "IPv6 fibmatch" - - $IP route get fibmatch 198.51.100.1 | \ - grep -q "linkdown" - log_test $? 1 "IPv4 - linkdown flag set" - $IP -6 route get fibmatch 2001:db8:1::1 | \ - grep -q "linkdown" - log_test $? 1 "IPv6 - linkdown flag set" - - set -e - $IP address add 192.0.2.1/24 dev dummy0 - $IP -6 address add 2001:db8:2::1/64 dev dummy0 - set +e - - echo " Route to local address with carrier down" - $IP route get fibmatch 192.0.2.1 &> /dev/null - log_test $? 0 "IPv4 fibmatch" - $IP -6 route get fibmatch 2001:db8:2::1 &> /dev/null - log_test $? 0 "IPv6 fibmatch" - - $IP route get fibmatch 192.0.2.1 | \ - grep -q "linkdown" - log_test $? 1 "IPv4 linkdown flag set" - $IP -6 route get fibmatch 2001:db8:2::1 | \ - grep -q "linkdown" - log_test $? 1 "IPv6 linkdown flag set" - - cleanup -} - -fib_carrier_unicast_test() -{ - ret=0 - - echo - echo "Single path route carrier test" - - setup - - set -e - $IP link set dev dummy0 carrier on - set +e - - echo " Start point" - $IP route get fibmatch 198.51.100.2 &> /dev/null - log_test $? 0 "IPv4 fibmatch" - $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null - log_test $? 0 "IPv6 fibmatch" - - $IP route get fibmatch 198.51.100.2 | \ - grep -q "linkdown" - log_test $? 1 "IPv4 no linkdown flag" - $IP -6 route get fibmatch 2001:db8:1::2 | \ - grep -q "linkdown" - log_test $? 1 "IPv6 no linkdown flag" - - set -e - $IP link set dev dummy0 carrier off - sleep 1 - set +e - - echo " Carrier down" - $IP route get fibmatch 198.51.100.2 &> /dev/null - log_test $? 0 "IPv4 fibmatch" - $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null - log_test $? 0 "IPv6 fibmatch" - - $IP route get fibmatch 198.51.100.2 | \ - grep -q "linkdown" - log_test $? 0 "IPv4 linkdown flag set" - $IP -6 route get fibmatch 2001:db8:1::2 | \ - grep -q "linkdown" - log_test $? 0 "IPv6 linkdown flag set" - - set -e - $IP address add 192.0.2.1/24 dev dummy0 - $IP -6 address add 2001:db8:2::1/64 dev dummy0 - set +e - - echo " Second address added with carrier down" - $IP route get fibmatch 192.0.2.2 &> /dev/null - log_test $? 0 "IPv4 fibmatch" - $IP -6 route get fibmatch 2001:db8:2::2 &> /dev/null - log_test $? 0 "IPv6 fibmatch" - - $IP route get fibmatch 192.0.2.2 | \ - grep -q "linkdown" - log_test $? 0 "IPv4 linkdown flag set" - $IP -6 route get fibmatch 2001:db8:2::2 | \ - grep -q "linkdown" - log_test $? 0 "IPv6 linkdown flag set" - - cleanup -} - -fib_carrier_test() -{ - fib_carrier_local_test - fib_carrier_unicast_test -} - -fib_rp_filter_test() -{ - echo - echo "IPv4 rp_filter tests" - - setup - - set -e - ip netns add ns2 - ip netns set ns2 auto - - ip -netns ns2 link set dev lo up - - $IP link add name veth1 type veth peer name veth2 - $IP link set dev veth2 netns ns2 - $IP address add 192.0.2.1/24 dev veth1 - ip -netns ns2 address add 192.0.2.1/24 dev veth2 - $IP link set dev veth1 up - ip -netns ns2 link set dev veth2 up - - $IP link set dev lo address 52:54:00:6a:c7:5e - $IP link set dev veth1 address 52:54:00:6a:c7:5e - ip -netns ns2 link set dev lo address 52:54:00:6a:c7:5e - ip -netns ns2 link set dev veth2 address 52:54:00:6a:c7:5e - - # 1. (ns2) redirect lo's egress to veth2's egress - ip netns exec ns2 tc qdisc add dev lo parent root handle 1: fq_codel - ip netns exec ns2 tc filter add dev lo parent 1: protocol arp basic \ - action mirred egress redirect dev veth2 - ip netns exec ns2 tc filter add dev lo parent 1: protocol ip basic \ - action mirred egress redirect dev veth2 - - # 2. (ns1) redirect veth1's ingress to lo's ingress - $NS_EXEC tc qdisc add dev veth1 ingress - $NS_EXEC tc filter add dev veth1 ingress protocol arp basic \ - action mirred ingress redirect dev lo - $NS_EXEC tc filter add dev veth1 ingress protocol ip basic \ - action mirred ingress redirect dev lo - - # 3. (ns1) redirect lo's egress to veth1's egress - $NS_EXEC tc qdisc add dev lo parent root handle 1: fq_codel - $NS_EXEC tc filter add dev lo parent 1: protocol arp basic \ - action mirred egress redirect dev veth1 - $NS_EXEC tc filter add dev lo parent 1: protocol ip basic \ - action mirred egress redirect dev veth1 - - # 4. (ns2) redirect veth2's ingress to lo's ingress - ip netns exec ns2 tc qdisc add dev veth2 ingress - ip netns exec ns2 tc filter add dev veth2 ingress protocol arp basic \ - action mirred ingress redirect dev lo - ip netns exec ns2 tc filter add dev veth2 ingress protocol ip basic \ - action mirred ingress redirect dev lo - - $NS_EXEC sysctl -qw net.ipv4.conf.all.rp_filter=1 - $NS_EXEC sysctl -qw net.ipv4.conf.all.accept_local=1 - $NS_EXEC sysctl -qw net.ipv4.conf.all.route_localnet=1 - ip netns exec ns2 sysctl -qw net.ipv4.conf.all.rp_filter=1 - ip netns exec ns2 sysctl -qw net.ipv4.conf.all.accept_local=1 - ip netns exec ns2 sysctl -qw net.ipv4.conf.all.route_localnet=1 - set +e - - run_cmd "ip netns exec ns2 ping -w1 -c1 192.0.2.1" - log_test $? 0 "rp_filter passes local packets" - - run_cmd "ip netns exec ns2 ping -w1 -c1 127.0.0.1" - log_test $? 0 "rp_filter passes loopback packets" - - cleanup -} - -################################################################################ -# Tests on nexthop spec - -# run 'ip route add' with given spec -add_rt() -{ - local desc="$1" - local erc=$2 - local vrf=$3 - local pfx=$4 - local gw=$5 - local dev=$6 - local cmd out rc - - [ "$vrf" = "-" ] && vrf="default" - [ -n "$gw" ] && gw="via $gw" - [ -n "$dev" ] && dev="dev $dev" - - cmd="$IP route add vrf $vrf $pfx $gw $dev" - if [ "$VERBOSE" = "1" ]; then - printf "\n COMMAND: $cmd\n" - fi - - out=$(eval $cmd 2>&1) - rc=$? - if [ "$VERBOSE" = "1" -a -n "$out" ]; then - echo " $out" - fi - log_test $rc $erc "$desc" -} - -fib4_nexthop() -{ - echo - echo "IPv4 nexthop tests" - - echo "<<< write me >>>" -} - -fib6_nexthop() -{ - local lldummy=$(get_linklocal dummy0) - local llv1=$(get_linklocal dummy0) - - if [ -z "$lldummy" ]; then - echo "Failed to get linklocal address for dummy0" - return 1 - fi - if [ -z "$llv1" ]; then - echo "Failed to get linklocal address for veth1" - return 1 - fi - - echo - echo "IPv6 nexthop tests" - - add_rt "Directly connected nexthop, unicast address" 0 \ - - 2001:db8:101::/64 2001:db8:1::2 - add_rt "Directly connected nexthop, unicast address with device" 0 \ - - 2001:db8:102::/64 2001:db8:1::2 "dummy0" - add_rt "Gateway is linklocal address" 0 \ - - 2001:db8:103::1/64 $llv1 "veth0" - - # fails because LL address requires a device - add_rt "Gateway is linklocal address, no device" 2 \ - - 2001:db8:104::1/64 $llv1 - - # local address can not be a gateway - add_rt "Gateway can not be local unicast address" 2 \ - - 2001:db8:105::/64 2001:db8:1::1 - add_rt "Gateway can not be local unicast address, with device" 2 \ - - 2001:db8:106::/64 2001:db8:1::1 "dummy0" - add_rt "Gateway can not be a local linklocal address" 2 \ - - 2001:db8:107::1/64 $lldummy "dummy0" - - # VRF tests - add_rt "Gateway can be local address in a VRF" 0 \ - - 2001:db8:108::/64 2001:db8:51::2 - add_rt "Gateway can be local address in a VRF, with device" 0 \ - - 2001:db8:109::/64 2001:db8:51::2 "veth0" - add_rt "Gateway can be local linklocal address in a VRF" 0 \ - - 2001:db8:110::1/64 $llv1 "veth0" - - add_rt "Redirect to VRF lookup" 0 \ - - 2001:db8:111::/64 "" "red" - - add_rt "VRF route, gateway can be local address in default VRF" 0 \ - red 2001:db8:112::/64 2001:db8:51::1 - - # local address in same VRF fails - add_rt "VRF route, gateway can not be a local address" 2 \ - red 2001:db8:113::1/64 2001:db8:2::1 - add_rt "VRF route, gateway can not be a local addr with device" 2 \ - red 2001:db8:114::1/64 2001:db8:2::1 "dummy1" -} - -# Default VRF: -# dummy0 - 198.51.100.1/24 2001:db8:1::1/64 -# veth0 - 192.0.2.1/24 2001:db8:51::1/64 -# -# VRF red: -# dummy1 - 192.168.2.1/24 2001:db8:2::1/64 -# veth1 - 192.0.2.2/24 2001:db8:51::2/64 -# -# [ dummy0 veth0 ]--[ veth1 dummy1 ] - -fib_nexthop_test() -{ - setup - - set -e - - $IP -4 rule add pref 32765 table local - $IP -4 rule del pref 0 - $IP -6 rule add pref 32765 table local - $IP -6 rule del pref 0 - - $IP link add red type vrf table 1 - $IP link set red up - $IP -4 route add vrf red unreachable default metric 4278198272 - $IP -6 route add vrf red unreachable default metric 4278198272 - - $IP link add veth0 type veth peer name veth1 - $IP link set dev veth0 up - $IP address add 192.0.2.1/24 dev veth0 - $IP -6 address add 2001:db8:51::1/64 dev veth0 - - $IP link set dev veth1 vrf red up - $IP address add 192.0.2.2/24 dev veth1 - $IP -6 address add 2001:db8:51::2/64 dev veth1 - - $IP link add dummy1 type dummy - $IP link set dev dummy1 vrf red up - $IP address add 192.168.2.1/24 dev dummy1 - $IP -6 address add 2001:db8:2::1/64 dev dummy1 - set +e - - sleep 1 - fib4_nexthop - fib6_nexthop - - ( - $IP link del dev dummy1 - $IP link del veth0 - $IP link del red - ) 2>/dev/null - cleanup -} - -fib_suppress_test() -{ - echo - echo "FIB rule with suppress_prefixlength" - setup - - $IP link add dummy1 type dummy - $IP link set dummy1 up - $IP -6 route add default dev dummy1 - $IP -6 rule add table main suppress_prefixlength 0 - ping -f -c 1000 -W 1 1234::1 >/dev/null 2>&1 - $IP -6 rule del table main suppress_prefixlength 0 - $IP link del dummy1 - - # If we got here without crashing, we're good. - log_test 0 0 "FIB rule suppress test" - - cleanup -} - -################################################################################ -# Tests on route add and replace - -run_cmd() -{ - local cmd="$1" - local out - local stderr="2>/dev/null" - - if [ "$VERBOSE" = "1" ]; then - printf " COMMAND: $cmd\n" - stderr= - fi - - out=$(eval $cmd $stderr) - rc=$? - if [ "$VERBOSE" = "1" -a -n "$out" ]; then - echo " $out" - fi - - [ "$VERBOSE" = "1" ] && echo - - return $rc -} - -check_expected() -{ - local out="$1" - local expected="$2" - local rc=0 - - [ "${out}" = "${expected}" ] && return 0 - - if [ -z "${out}" ]; then - if [ "$VERBOSE" = "1" ]; then - printf "\nNo route entry found\n" - printf "Expected:\n" - printf " ${expected}\n" - fi - return 1 - fi - - # tricky way to convert output to 1-line without ip's - # messy ''; this drops all extra white space - out=$(echo ${out}) - if [ "${out}" != "${expected}" ]; then - rc=1 - if [ "${VERBOSE}" = "1" ]; then - printf " Unexpected route entry. Have:\n" - printf " ${out}\n" - printf " Expected:\n" - printf " ${expected}\n\n" - fi - fi - - return $rc -} - -# add route for a prefix, flushing any existing routes first -# expected to be the first step of a test -add_route6() -{ - local pfx="$1" - local nh="$2" - local out - - if [ "$VERBOSE" = "1" ]; then - echo - echo " ##################################################" - echo - fi - - run_cmd "$IP -6 ro flush ${pfx}" - [ $? -ne 0 ] && exit 1 - - out=$($IP -6 ro ls match ${pfx}) - if [ -n "$out" ]; then - echo "Failed to flush routes for prefix used for tests." - exit 1 - fi - - run_cmd "$IP -6 ro add ${pfx} ${nh}" - if [ $? -ne 0 ]; then - echo "Failed to add initial route for test." - exit 1 - fi -} - -# add initial route - used in replace route tests -add_initial_route6() -{ - add_route6 "2001:db8:104::/64" "$1" -} - -check_route6() -{ - local pfx - local expected="$1" - local out - local rc=0 - - set -- $expected - pfx=$1 - - out=$($IP -6 ro ls match ${pfx} | sed -e 's/ pref medium//') - check_expected "${out}" "${expected}" -} - -route_cleanup() -{ - $IP li del red 2>/dev/null - $IP li del dummy1 2>/dev/null - $IP li del veth1 2>/dev/null - $IP li del veth3 2>/dev/null - - cleanup &> /dev/null -} - -route_setup() -{ - route_cleanup - setup - - [ "${VERBOSE}" = "1" ] && set -x - set -e - - ip netns add ns2 - ip netns set ns2 auto - ip -netns ns2 link set dev lo up - ip netns exec ns2 sysctl -qw net.ipv4.ip_forward=1 - ip netns exec ns2 sysctl -qw net.ipv6.conf.all.forwarding=1 - - $IP li add veth1 type veth peer name veth2 - $IP li add veth3 type veth peer name veth4 - - $IP li set veth1 up - $IP li set veth3 up - $IP li set veth2 netns ns2 up - $IP li set veth4 netns ns2 up - ip -netns ns2 li add dummy1 type dummy - ip -netns ns2 li set dummy1 up - - $IP -6 addr add 2001:db8:101::1/64 dev veth1 nodad - $IP -6 addr add 2001:db8:103::1/64 dev veth3 nodad - $IP addr add 172.16.101.1/24 dev veth1 - $IP addr add 172.16.103.1/24 dev veth3 - - ip -netns ns2 -6 addr add 2001:db8:101::2/64 dev veth2 nodad - ip -netns ns2 -6 addr add 2001:db8:103::2/64 dev veth4 nodad - ip -netns ns2 -6 addr add 2001:db8:104::1/64 dev dummy1 nodad - - ip -netns ns2 addr add 172.16.101.2/24 dev veth2 - ip -netns ns2 addr add 172.16.103.2/24 dev veth4 - ip -netns ns2 addr add 172.16.104.1/24 dev dummy1 - - set +e -} - -# assumption is that basic add of a single path route works -# otherwise just adding an address on an interface is broken -ipv6_rt_add() -{ - local rc - - echo - echo "IPv6 route add / append tests" - - # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL - add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" - run_cmd "$IP -6 ro add 2001:db8:104::/64 via 2001:db8:103::2" - log_test $? 2 "Attempt to add duplicate route - gw" - - # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL - add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" - run_cmd "$IP -6 ro add 2001:db8:104::/64 dev veth3" - log_test $? 2 "Attempt to add duplicate route - dev only" - - # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL - add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" - run_cmd "$IP -6 ro add unreachable 2001:db8:104::/64" - log_test $? 2 "Attempt to add duplicate route - reject route" - - # route append with same prefix adds a new route - # - iproute2 sets NLM_F_CREATE | NLM_F_APPEND - add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" - run_cmd "$IP -6 ro append 2001:db8:104::/64 via 2001:db8:103::2" - check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" - log_test $? 0 "Append nexthop to existing route - gw" - - # insert mpath directly - add_route6 "2001:db8:104::/64" "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" - check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" - log_test $? 0 "Add multipath route" - - add_route6 "2001:db8:104::/64" "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" - run_cmd "$IP -6 ro add 2001:db8:104::/64 nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" - log_test $? 2 "Attempt to add duplicate multipath route" - - # insert of a second route without append but different metric - add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" - run_cmd "$IP -6 ro add 2001:db8:104::/64 via 2001:db8:103::2 metric 512" - rc=$? - if [ $rc -eq 0 ]; then - run_cmd "$IP -6 ro add 2001:db8:104::/64 via 2001:db8:103::3 metric 256" - rc=$? - fi - log_test $rc 0 "Route add with different metrics" - - run_cmd "$IP -6 ro del 2001:db8:104::/64 metric 512" - rc=$? - if [ $rc -eq 0 ]; then - check_route6 "2001:db8:104::/64 via 2001:db8:103::3 dev veth3 metric 256 2001:db8:104::/64 via 2001:db8:101::2 dev veth1 metric 1024" - rc=$? - fi - log_test $rc 0 "Route delete with metric" -} - -ipv6_rt_replace_single() -{ - # single path with single path - # - add_initial_route6 "via 2001:db8:101::2" - run_cmd "$IP -6 ro replace 2001:db8:104::/64 via 2001:db8:103::2" - check_route6 "2001:db8:104::/64 via 2001:db8:103::2 dev veth3 metric 1024" - log_test $? 0 "Single path with single path" - - # single path with multipath - # - add_initial_route6 "nexthop via 2001:db8:101::2" - run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:103::2" - check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::3 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" - log_test $? 0 "Single path with multipath" - - # single path with single path using MULTIPATH attribute - # - add_initial_route6 "via 2001:db8:101::2" - run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:103::2" - check_route6 "2001:db8:104::/64 via 2001:db8:103::2 dev veth3 metric 1024" - log_test $? 0 "Single path with single path via multipath attribute" - - # route replace fails - invalid nexthop - add_initial_route6 "via 2001:db8:101::2" - run_cmd "$IP -6 ro replace 2001:db8:104::/64 via 2001:db8:104::2" - if [ $? -eq 0 ]; then - # previous command is expected to fail so if it returns 0 - # that means the test failed. - log_test 0 1 "Invalid nexthop" - else - check_route6 "2001:db8:104::/64 via 2001:db8:101::2 dev veth1 metric 1024" - log_test $? 0 "Invalid nexthop" - fi - - # replace non-existent route - # - note use of change versus replace since ip adds NLM_F_CREATE - # for replace - add_initial_route6 "via 2001:db8:101::2" - run_cmd "$IP -6 ro change 2001:db8:105::/64 via 2001:db8:101::2" - log_test $? 2 "Single path - replace of non-existent route" -} - -ipv6_rt_replace_mpath() -{ - # multipath with multipath - add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" - run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:103::3" - check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::3 dev veth1 weight 1 nexthop via 2001:db8:103::3 dev veth3 weight 1" - log_test $? 0 "Multipath with multipath" - - # multipath with single - add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" - run_cmd "$IP -6 ro replace 2001:db8:104::/64 via 2001:db8:101::3" - check_route6 "2001:db8:104::/64 via 2001:db8:101::3 dev veth1 metric 1024" - log_test $? 0 "Multipath with single path" - - # multipath with single - add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" - run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3" - check_route6 "2001:db8:104::/64 via 2001:db8:101::3 dev veth1 metric 1024" - log_test $? 0 "Multipath with single path via multipath attribute" - - # multipath with dev-only - add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" - run_cmd "$IP -6 ro replace 2001:db8:104::/64 dev veth1" - check_route6 "2001:db8:104::/64 dev veth1 metric 1024" - log_test $? 0 "Multipath with dev-only" - - # route replace fails - invalid nexthop 1 - add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" - run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:111::3 nexthop via 2001:db8:103::3" - check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" - log_test $? 0 "Multipath - invalid first nexthop" - - # route replace fails - invalid nexthop 2 - add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" - run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:113::3" - check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" - log_test $? 0 "Multipath - invalid second nexthop" - - # multipath non-existent route - add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" - run_cmd "$IP -6 ro change 2001:db8:105::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:103::3" - log_test $? 2 "Multipath - replace of non-existent route" -} - -ipv6_rt_replace() -{ - echo - echo "IPv6 route replace tests" - - ipv6_rt_replace_single - ipv6_rt_replace_mpath -} - -ipv6_route_test() -{ - route_setup - - ipv6_rt_add - ipv6_rt_replace - - route_cleanup -} - -ip_addr_metric_check() -{ - ip addr help 2>&1 | grep -q metric - if [ $? -ne 0 ]; then - echo "iproute2 command does not support metric for addresses. Skipping test" - return 1 - fi - - return 0 -} - -ipv6_addr_metric_test() -{ - local rc - - echo - echo "IPv6 prefix route tests" - - ip_addr_metric_check || return 1 - - setup - - set -e - $IP li add dummy1 type dummy - $IP li add dummy2 type dummy - $IP li set dummy1 up - $IP li set dummy2 up - - # default entry is metric 256 - run_cmd "$IP -6 addr add dev dummy1 2001:db8:104::1/64" - run_cmd "$IP -6 addr add dev dummy2 2001:db8:104::2/64" - set +e - - check_route6 "2001:db8:104::/64 dev dummy1 proto kernel metric 256 2001:db8:104::/64 dev dummy2 proto kernel metric 256" - log_test $? 0 "Default metric" - - set -e - run_cmd "$IP -6 addr flush dev dummy1" - run_cmd "$IP -6 addr add dev dummy1 2001:db8:104::1/64 metric 257" - set +e - - check_route6 "2001:db8:104::/64 dev dummy2 proto kernel metric 256 2001:db8:104::/64 dev dummy1 proto kernel metric 257" - log_test $? 0 "User specified metric on first device" - - set -e - run_cmd "$IP -6 addr flush dev dummy2" - run_cmd "$IP -6 addr add dev dummy2 2001:db8:104::2/64 metric 258" - set +e - - check_route6 "2001:db8:104::/64 dev dummy1 proto kernel metric 257 2001:db8:104::/64 dev dummy2 proto kernel metric 258" - log_test $? 0 "User specified metric on second device" - - run_cmd "$IP -6 addr del dev dummy1 2001:db8:104::1/64 metric 257" - rc=$? - if [ $rc -eq 0 ]; then - check_route6 "2001:db8:104::/64 dev dummy2 proto kernel metric 258" - rc=$? - fi - log_test $rc 0 "Delete of address on first device" - - run_cmd "$IP -6 addr change dev dummy2 2001:db8:104::2/64 metric 259" - rc=$? - if [ $rc -eq 0 ]; then - check_route6 "2001:db8:104::/64 dev dummy2 proto kernel metric 259" - rc=$? - fi - log_test $rc 0 "Modify metric of address" - - # verify prefix route removed on down - run_cmd "ip netns exec ns1 sysctl -qw net.ipv6.conf.all.keep_addr_on_down=1" - run_cmd "$IP li set dev dummy2 down" - rc=$? - if [ $rc -eq 0 ]; then - out=$($IP -6 ro ls match 2001:db8:104::/64) - check_expected "${out}" "" - rc=$? - fi - log_test $rc 0 "Prefix route removed on link down" - - # verify prefix route re-inserted with assigned metric - run_cmd "$IP li set dev dummy2 up" - rc=$? - if [ $rc -eq 0 ]; then - check_route6 "2001:db8:104::/64 dev dummy2 proto kernel metric 259" - rc=$? - fi - log_test $rc 0 "Prefix route with metric on link up" - - # verify peer metric added correctly - set -e - run_cmd "$IP -6 addr flush dev dummy2" - run_cmd "$IP -6 addr add dev dummy2 2001:db8:104::1 peer 2001:db8:104::2 metric 260" - set +e - - check_route6 "2001:db8:104::1 dev dummy2 proto kernel metric 260" - log_test $? 0 "Set metric with peer route on local side" - check_route6 "2001:db8:104::2 dev dummy2 proto kernel metric 260" - log_test $? 0 "Set metric with peer route on peer side" - - set -e - run_cmd "$IP -6 addr change dev dummy2 2001:db8:104::1 peer 2001:db8:104::3 metric 261" - set +e - - check_route6 "2001:db8:104::1 dev dummy2 proto kernel metric 261" - log_test $? 0 "Modify metric and peer address on local side" - check_route6 "2001:db8:104::3 dev dummy2 proto kernel metric 261" - log_test $? 0 "Modify metric and peer address on peer side" - - $IP li del dummy1 - $IP li del dummy2 - cleanup -} - -ipv6_route_metrics_test() -{ - local rc - - echo - echo "IPv6 routes with metrics" - - route_setup - - # - # single path with metrics - # - run_cmd "$IP -6 ro add 2001:db8:111::/64 via 2001:db8:101::2 mtu 1400" - rc=$? - if [ $rc -eq 0 ]; then - check_route6 "2001:db8:111::/64 via 2001:db8:101::2 dev veth1 metric 1024 mtu 1400" - rc=$? - fi - log_test $rc 0 "Single path route with mtu metric" - - - # - # multipath via separate routes with metrics - # - run_cmd "$IP -6 ro add 2001:db8:112::/64 via 2001:db8:101::2 mtu 1400" - run_cmd "$IP -6 ro append 2001:db8:112::/64 via 2001:db8:103::2" - rc=$? - if [ $rc -eq 0 ]; then - check_route6 "2001:db8:112::/64 metric 1024 mtu 1400 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" - rc=$? - fi - log_test $rc 0 "Multipath route via 2 single routes with mtu metric on first" - - # second route is coalesced to first to make a multipath route. - # MTU of the second path is hidden from display! - run_cmd "$IP -6 ro add 2001:db8:113::/64 via 2001:db8:101::2" - run_cmd "$IP -6 ro append 2001:db8:113::/64 via 2001:db8:103::2 mtu 1400" - rc=$? - if [ $rc -eq 0 ]; then - check_route6 "2001:db8:113::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" - rc=$? - fi - log_test $rc 0 "Multipath route via 2 single routes with mtu metric on 2nd" - - run_cmd "$IP -6 ro del 2001:db8:113::/64 via 2001:db8:101::2" - if [ $? -eq 0 ]; then - check_route6 "2001:db8:113::/64 via 2001:db8:103::2 dev veth3 metric 1024 mtu 1400" - log_test $? 0 " MTU of second leg" - fi - - # - # multipath with metrics - # - run_cmd "$IP -6 ro add 2001:db8:115::/64 mtu 1400 nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" - rc=$? - if [ $rc -eq 0 ]; then - check_route6 "2001:db8:115::/64 metric 1024 mtu 1400 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" - rc=$? - fi - log_test $rc 0 "Multipath route with mtu metric" - - $IP -6 ro add 2001:db8:104::/64 via 2001:db8:101::2 mtu 1300 - run_cmd "ip netns exec ns1 ${ping6} -w1 -c1 -s 1500 2001:db8:104::1" - log_test $? 0 "Using route with mtu metric" - - run_cmd "$IP -6 ro add 2001:db8:114::/64 via 2001:db8:101::2 congctl lock foo" - log_test $? 2 "Invalid metric (fails metric_convert)" - - route_cleanup -} - -# add route for a prefix, flushing any existing routes first -# expected to be the first step of a test -add_route() -{ - local pfx="$1" - local nh="$2" - local out - - if [ "$VERBOSE" = "1" ]; then - echo - echo " ##################################################" - echo - fi - - run_cmd "$IP ro flush ${pfx}" - [ $? -ne 0 ] && exit 1 - - out=$($IP ro ls match ${pfx}) - if [ -n "$out" ]; then - echo "Failed to flush routes for prefix used for tests." - exit 1 - fi - - run_cmd "$IP ro add ${pfx} ${nh}" - if [ $? -ne 0 ]; then - echo "Failed to add initial route for test." - exit 1 - fi -} - -# add initial route - used in replace route tests -add_initial_route() -{ - add_route "172.16.104.0/24" "$1" -} - -check_route() -{ - local pfx - local expected="$1" - local out - - set -- $expected - pfx=$1 - [ "${pfx}" = "unreachable" ] && pfx=$2 - - out=$($IP ro ls match ${pfx}) - check_expected "${out}" "${expected}" -} - -# assumption is that basic add of a single path route works -# otherwise just adding an address on an interface is broken -ipv4_rt_add() -{ - local rc - - echo - echo "IPv4 route add / append tests" - - # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL - add_route "172.16.104.0/24" "via 172.16.101.2" - run_cmd "$IP ro add 172.16.104.0/24 via 172.16.103.2" - log_test $? 2 "Attempt to add duplicate route - gw" - - # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL - add_route "172.16.104.0/24" "via 172.16.101.2" - run_cmd "$IP ro add 172.16.104.0/24 dev veth3" - log_test $? 2 "Attempt to add duplicate route - dev only" - - # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL - add_route "172.16.104.0/24" "via 172.16.101.2" - run_cmd "$IP ro add unreachable 172.16.104.0/24" - log_test $? 2 "Attempt to add duplicate route - reject route" - - # iproute2 prepend only sets NLM_F_CREATE - # - adds a new route; does NOT convert existing route to ECMP - add_route "172.16.104.0/24" "via 172.16.101.2" - run_cmd "$IP ro prepend 172.16.104.0/24 via 172.16.103.2" - check_route "172.16.104.0/24 via 172.16.103.2 dev veth3 172.16.104.0/24 via 172.16.101.2 dev veth1" - log_test $? 0 "Add new nexthop for existing prefix" - - # route append with same prefix adds a new route - # - iproute2 sets NLM_F_CREATE | NLM_F_APPEND - add_route "172.16.104.0/24" "via 172.16.101.2" - run_cmd "$IP ro append 172.16.104.0/24 via 172.16.103.2" - check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 172.16.104.0/24 via 172.16.103.2 dev veth3" - log_test $? 0 "Append nexthop to existing route - gw" - - add_route "172.16.104.0/24" "via 172.16.101.2" - run_cmd "$IP ro append 172.16.104.0/24 dev veth3" - check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 172.16.104.0/24 dev veth3 scope link" - log_test $? 0 "Append nexthop to existing route - dev only" - - add_route "172.16.104.0/24" "via 172.16.101.2" - run_cmd "$IP ro append unreachable 172.16.104.0/24" - check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 unreachable 172.16.104.0/24" - log_test $? 0 "Append nexthop to existing route - reject route" - - run_cmd "$IP ro flush 172.16.104.0/24" - run_cmd "$IP ro add unreachable 172.16.104.0/24" - run_cmd "$IP ro append 172.16.104.0/24 via 172.16.103.2" - check_route "unreachable 172.16.104.0/24 172.16.104.0/24 via 172.16.103.2 dev veth3" - log_test $? 0 "Append nexthop to existing reject route - gw" - - run_cmd "$IP ro flush 172.16.104.0/24" - run_cmd "$IP ro add unreachable 172.16.104.0/24" - run_cmd "$IP ro append 172.16.104.0/24 dev veth3" - check_route "unreachable 172.16.104.0/24 172.16.104.0/24 dev veth3 scope link" - log_test $? 0 "Append nexthop to existing reject route - dev only" - - # insert mpath directly - add_route "172.16.104.0/24" "nexthop via 172.16.101.2 nexthop via 172.16.103.2" - check_route "172.16.104.0/24 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" - log_test $? 0 "add multipath route" - - add_route "172.16.104.0/24" "nexthop via 172.16.101.2 nexthop via 172.16.103.2" - run_cmd "$IP ro add 172.16.104.0/24 nexthop via 172.16.101.2 nexthop via 172.16.103.2" - log_test $? 2 "Attempt to add duplicate multipath route" - - # insert of a second route without append but different metric - add_route "172.16.104.0/24" "via 172.16.101.2" - run_cmd "$IP ro add 172.16.104.0/24 via 172.16.103.2 metric 512" - rc=$? - if [ $rc -eq 0 ]; then - run_cmd "$IP ro add 172.16.104.0/24 via 172.16.103.3 metric 256" - rc=$? - fi - log_test $rc 0 "Route add with different metrics" - - run_cmd "$IP ro del 172.16.104.0/24 metric 512" - rc=$? - if [ $rc -eq 0 ]; then - check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 172.16.104.0/24 via 172.16.103.3 dev veth3 metric 256" - rc=$? - fi - log_test $rc 0 "Route delete with metric" -} - -ipv4_rt_replace_single() -{ - # single path with single path - # - add_initial_route "via 172.16.101.2" - run_cmd "$IP ro replace 172.16.104.0/24 via 172.16.103.2" - check_route "172.16.104.0/24 via 172.16.103.2 dev veth3" - log_test $? 0 "Single path with single path" - - # single path with multipath - # - add_initial_route "nexthop via 172.16.101.2" - run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3 nexthop via 172.16.103.2" - check_route "172.16.104.0/24 nexthop via 172.16.101.3 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" - log_test $? 0 "Single path with multipath" - - # single path with reject - # - add_initial_route "nexthop via 172.16.101.2" - run_cmd "$IP ro replace unreachable 172.16.104.0/24" - check_route "unreachable 172.16.104.0/24" - log_test $? 0 "Single path with reject route" - - # single path with single path using MULTIPATH attribute - # - add_initial_route "via 172.16.101.2" - run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.103.2" - check_route "172.16.104.0/24 via 172.16.103.2 dev veth3" - log_test $? 0 "Single path with single path via multipath attribute" - - # route replace fails - invalid nexthop - add_initial_route "via 172.16.101.2" - run_cmd "$IP ro replace 172.16.104.0/24 via 2001:db8:104::2" - if [ $? -eq 0 ]; then - # previous command is expected to fail so if it returns 0 - # that means the test failed. - log_test 0 1 "Invalid nexthop" - else - check_route "172.16.104.0/24 via 172.16.101.2 dev veth1" - log_test $? 0 "Invalid nexthop" - fi - - # replace non-existent route - # - note use of change versus replace since ip adds NLM_F_CREATE - # for replace - add_initial_route "via 172.16.101.2" - run_cmd "$IP ro change 172.16.105.0/24 via 172.16.101.2" - log_test $? 2 "Single path - replace of non-existent route" -} - -ipv4_rt_replace_mpath() -{ - # multipath with multipath - add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" - run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3 nexthop via 172.16.103.3" - check_route "172.16.104.0/24 nexthop via 172.16.101.3 dev veth1 weight 1 nexthop via 172.16.103.3 dev veth3 weight 1" - log_test $? 0 "Multipath with multipath" - - # multipath with single - add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" - run_cmd "$IP ro replace 172.16.104.0/24 via 172.16.101.3" - check_route "172.16.104.0/24 via 172.16.101.3 dev veth1" - log_test $? 0 "Multipath with single path" - - # multipath with single - add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" - run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3" - check_route "172.16.104.0/24 via 172.16.101.3 dev veth1" - log_test $? 0 "Multipath with single path via multipath attribute" - - # multipath with reject - add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" - run_cmd "$IP ro replace unreachable 172.16.104.0/24" - check_route "unreachable 172.16.104.0/24" - log_test $? 0 "Multipath with reject route" - - # route replace fails - invalid nexthop 1 - add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" - run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.111.3 nexthop via 172.16.103.3" - check_route "172.16.104.0/24 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" - log_test $? 0 "Multipath - invalid first nexthop" - - # route replace fails - invalid nexthop 2 - add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" - run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3 nexthop via 172.16.113.3" - check_route "172.16.104.0/24 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" - log_test $? 0 "Multipath - invalid second nexthop" - - # multipath non-existent route - add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" - run_cmd "$IP ro change 172.16.105.0/24 nexthop via 172.16.101.3 nexthop via 172.16.103.3" - log_test $? 2 "Multipath - replace of non-existent route" -} - -ipv4_rt_replace() -{ - echo - echo "IPv4 route replace tests" - - ipv4_rt_replace_single - ipv4_rt_replace_mpath -} - -ipv4_route_test() -{ - route_setup - - ipv4_rt_add - ipv4_rt_replace - - route_cleanup -} - -ipv4_addr_metric_test() -{ - local rc - - echo - echo "IPv4 prefix route tests" - - ip_addr_metric_check || return 1 - - setup - - set -e - $IP li add dummy1 type dummy - $IP li add dummy2 type dummy - $IP li set dummy1 up - $IP li set dummy2 up - - # default entry is metric 256 - run_cmd "$IP addr add dev dummy1 172.16.104.1/24" - run_cmd "$IP addr add dev dummy2 172.16.104.2/24" - set +e - - check_route "172.16.104.0/24 dev dummy1 proto kernel scope link src 172.16.104.1 172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2" - log_test $? 0 "Default metric" - - set -e - run_cmd "$IP addr flush dev dummy1" - run_cmd "$IP addr add dev dummy1 172.16.104.1/24 metric 257" - set +e - - check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 172.16.104.0/24 dev dummy1 proto kernel scope link src 172.16.104.1 metric 257" - log_test $? 0 "User specified metric on first device" - - set -e - run_cmd "$IP addr flush dev dummy2" - run_cmd "$IP addr add dev dummy2 172.16.104.2/24 metric 258" - set +e - - check_route "172.16.104.0/24 dev dummy1 proto kernel scope link src 172.16.104.1 metric 257 172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 metric 258" - log_test $? 0 "User specified metric on second device" - - run_cmd "$IP addr del dev dummy1 172.16.104.1/24 metric 257" - rc=$? - if [ $rc -eq 0 ]; then - check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 metric 258" - rc=$? - fi - log_test $rc 0 "Delete of address on first device" - - run_cmd "$IP addr change dev dummy2 172.16.104.2/24 metric 259" - rc=$? - if [ $rc -eq 0 ]; then - check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 metric 259" - rc=$? - fi - log_test $rc 0 "Modify metric of address" - - # verify prefix route removed on down - run_cmd "$IP li set dev dummy2 down" - rc=$? - if [ $rc -eq 0 ]; then - out=$($IP ro ls match 172.16.104.0/24) - check_expected "${out}" "" - rc=$? - fi - log_test $rc 0 "Prefix route removed on link down" - - # verify prefix route re-inserted with assigned metric - run_cmd "$IP li set dev dummy2 up" - rc=$? - if [ $rc -eq 0 ]; then - check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 metric 259" - rc=$? - fi - log_test $rc 0 "Prefix route with metric on link up" - - # explicitly check for metric changes on edge scenarios - run_cmd "$IP addr flush dev dummy2" - run_cmd "$IP addr add dev dummy2 172.16.104.0/24 metric 259" - run_cmd "$IP addr change dev dummy2 172.16.104.0/24 metric 260" - rc=$? - if [ $rc -eq 0 ]; then - check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.0 metric 260" - rc=$? - fi - log_test $rc 0 "Modify metric of .0/24 address" - - run_cmd "$IP addr flush dev dummy2" - run_cmd "$IP addr add dev dummy2 172.16.104.1/32 peer 172.16.104.2 metric 260" - rc=$? - if [ $rc -eq 0 ]; then - check_route "172.16.104.2 dev dummy2 proto kernel scope link src 172.16.104.1 metric 260" - rc=$? - fi - log_test $rc 0 "Set metric of address with peer route" - - run_cmd "$IP addr change dev dummy2 172.16.104.1/32 peer 172.16.104.3 metric 261" - rc=$? - if [ $rc -eq 0 ]; then - check_route "172.16.104.3 dev dummy2 proto kernel scope link src 172.16.104.1 metric 261" - rc=$? - fi - log_test $rc 0 "Modify metric and peer address for peer route" - - $IP li del dummy1 - $IP li del dummy2 - cleanup -} - -ipv4_route_metrics_test() -{ - local rc - - echo - echo "IPv4 route add / append tests" - - route_setup - - run_cmd "$IP ro add 172.16.111.0/24 via 172.16.101.2 mtu 1400" - rc=$? - if [ $rc -eq 0 ]; then - check_route "172.16.111.0/24 via 172.16.101.2 dev veth1 mtu 1400" - rc=$? - fi - log_test $rc 0 "Single path route with mtu metric" - - - run_cmd "$IP ro add 172.16.112.0/24 mtu 1400 nexthop via 172.16.101.2 nexthop via 172.16.103.2" - rc=$? - if [ $rc -eq 0 ]; then - check_route "172.16.112.0/24 mtu 1400 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" - rc=$? - fi - log_test $rc 0 "Multipath route with mtu metric" - - $IP ro add 172.16.104.0/24 via 172.16.101.2 mtu 1300 - run_cmd "ip netns exec ns1 ping -w1 -c1 -s 1500 172.16.104.1" - log_test $? 0 "Using route with mtu metric" - - run_cmd "$IP ro add 172.16.111.0/24 via 172.16.101.2 congctl lock foo" - log_test $? 2 "Invalid metric (fails metric_convert)" - - route_cleanup -} - -ipv4_route_v6_gw_test() -{ - local rc - - echo - echo "IPv4 route with IPv6 gateway tests" - - route_setup - sleep 2 - - # - # single path route - # - run_cmd "$IP ro add 172.16.104.0/24 via inet6 2001:db8:101::2" - rc=$? - log_test $rc 0 "Single path route with IPv6 gateway" - if [ $rc -eq 0 ]; then - check_route "172.16.104.0/24 via inet6 2001:db8:101::2 dev veth1" - fi - - run_cmd "ip netns exec ns1 ping -w1 -c1 172.16.104.1" - log_test $rc 0 "Single path route with IPv6 gateway - ping" - - run_cmd "$IP ro del 172.16.104.0/24 via inet6 2001:db8:101::2" - rc=$? - log_test $rc 0 "Single path route delete" - if [ $rc -eq 0 ]; then - check_route "172.16.112.0/24" - fi - - # - # multipath - v6 then v4 - # - run_cmd "$IP ro add 172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 nexthop via 172.16.103.2 dev veth3" - rc=$? - log_test $rc 0 "Multipath route add - v6 nexthop then v4" - if [ $rc -eq 0 ]; then - check_route "172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" - fi - - run_cmd "$IP ro del 172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 nexthop via inet6 2001:db8:101::2 dev veth1" - log_test $? 2 " Multipath route delete - nexthops in wrong order" - - run_cmd "$IP ro del 172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 nexthop via 172.16.103.2 dev veth3" - log_test $? 0 " Multipath route delete exact match" - - # - # multipath - v4 then v6 - # - run_cmd "$IP ro add 172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 nexthop via inet6 2001:db8:101::2 dev veth1" - rc=$? - log_test $rc 0 "Multipath route add - v4 nexthop then v6" - if [ $rc -eq 0 ]; then - check_route "172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 weight 1 nexthop via inet6 2001:db8:101::2 dev veth1 weight 1" - fi - - run_cmd "$IP ro del 172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 nexthop via 172.16.103.2 dev veth3" - log_test $? 2 " Multipath route delete - nexthops in wrong order" - - run_cmd "$IP ro del 172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 nexthop via inet6 2001:db8:101::2 dev veth1" - log_test $? 0 " Multipath route delete exact match" - - route_cleanup -} - -################################################################################ -# usage - -usage() -{ - cat <<EOF -usage: ${0##*/} OPTS - - -t <test> Test(s) to run (default: all) - (options: $TESTS) - -p Pause on fail - -P Pause after each test before cleanup - -v verbose mode (show commands and output) -EOF -} - -################################################################################ -# main - -while getopts :t:pPhv o -do - case $o in - t) TESTS=$OPTARG;; - p) PAUSE_ON_FAIL=yes;; - P) PAUSE=yes;; - v) VERBOSE=$(($VERBOSE + 1));; - h) usage; exit 0;; - *) usage; exit 1;; - esac -done - -PEER_CMD="ip netns exec ${PEER_NS}" - -# make sure we don't pause twice -[ "${PAUSE}" = "yes" ] && PAUSE_ON_FAIL=no - -if [ "$(id -u)" -ne 0 ];then - echo "SKIP: Need root privileges" - exit $ksft_skip; -fi - -if [ ! -x "$(command -v ip)" ]; then - echo "SKIP: Could not run test without ip tool" - exit $ksft_skip -fi - -ip route help 2>&1 | grep -q fibmatch -if [ $? -ne 0 ]; then - echo "SKIP: iproute2 too old, missing fibmatch" - exit $ksft_skip -fi - -# start clean -cleanup &> /dev/null - -for t in $TESTS -do - case $t in - fib_unreg_test|unregister) fib_unreg_test;; - fib_down_test|down) fib_down_test;; - fib_carrier_test|carrier) fib_carrier_test;; - fib_rp_filter_test|rp_filter) fib_rp_filter_test;; - fib_nexthop_test|nexthop) fib_nexthop_test;; - fib_suppress_test|suppress) fib_suppress_test;; - ipv6_route_test|ipv6_rt) ipv6_route_test;; - ipv4_route_test|ipv4_rt) ipv4_route_test;; - ipv6_addr_metric) ipv6_addr_metric_test;; - ipv4_addr_metric) ipv4_addr_metric_test;; - ipv6_route_metrics) ipv6_route_metrics_test;; - ipv4_route_metrics) ipv4_route_metrics_test;; - ipv4_route_v6_gw) ipv4_route_v6_gw_test;; - - help) echo "Test names: $TESTS"; exit 0;; - esac -done - -if [ "$TESTS" != "none" ]; then - printf "\nTests passed: %3d\n" ${nsuccess} - printf "Tests failed: %3d\n" ${nfail} -fi - -exit $ret
Hi Greg,
It seems this patch deletes the whole fib_tests.sh that causes new failure in kselftests run. Looking at the upstream patch [1] and given the context that ipv4_del_addr test is not available to kernel 5.4, I updated the patch like attached below to resolve the potentail mistake.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
Thanks, Shaoying
================================ >8 ===========================================
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 908913d75847..f45b9daf62cf 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -420,6 +420,7 @@ static struct fib_info *fib_find_info(struct fib_info *nfi) nfi->fib_prefsrc == fi->fib_prefsrc && nfi->fib_priority == fi->fib_priority && nfi->fib_type == fi->fib_type && + nfi->fib_tb_id == fi->fib_tb_id && memcmp(nfi->fib_metrics, fi->fib_metrics, sizeof(u32) * RTAX_MAX) == 0 && !((nfi->fib_flags ^ fi->fib_flags) & ~RTNH_COMPARE_MASK) &&
On Fri, Feb 03, 2023 at 06:47:29PM +0000, Shaoying Xu wrote:
Hi Greg,
It seems this patch deletes the whole fib_tests.sh that causes new failure in kselftests run. Looking at the upstream patch [1] and given the context that ipv4_del_addr test is not available to kernel 5.4, I updated the patch like attached below to resolve the potentail mistake.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
I'm sorry, but what commit exactly are you referring to here? There is no context at all in this email message :(
And if something is already in a release, I need a fix-up patch to resolve it, your patch was whitespace damaged and could not be applied to anything, sorry.
thanks,
greg k-h
On Mon, 12 Dec 2022 14:17:35 +0100 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit f96a3d74554df537b6db5c99c27c80e7afadc8d1 ]
Cited commit added the table ID to the FIB info structure, but did not prevent structures with different table IDs from being consolidated. This can lead to routes being flushed from a VRF when an address is deleted from a different VRF.
Fix by taking the table ID into account when looking for a matching FIB info. This is already done for FIB info structures backed by a nexthop object in fib_find_info_nh().
Add test cases that fail before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [FAIL] TEST: Route in default VRF not removed [ OK ] RTNETLINK answers: File exists TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [FAIL]
Tests passed: 6 Tests failed: 2
And pass after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ]
Tests passed: 8 Tests failed: 0
Fixes: 5a56a0b3a45d ("net: Don't delete routes in different VRFs") Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org
net/ipv4/fib_semantics.c | 1 + tools/testing/selftests/net/fib_tests.sh | 1727 ---------------------- 2 files changed, 1 insertion(+), 1727 deletions(-) delete mode 100755 tools/testing/selftests/net/fib_tests.sh
Hi Greg,
Sorry for last unclear message without context. I found this commit deleted the whole fib_tests.sh that causes new failure in kselftests run. Looking at the upstream patch [1] and given the context that ipv4_del_addr test is not available to kernel 5.4, I added the revert patch and new backport of this commit to resolve the potential mistake.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
Thanks, Shaoying
Ido Schimmel (1): ipv4: Fix incorrect route flushing when source address is deleted
Shaoying Xu (1): Revert "ipv4: Fix incorrect route flushing when source address is deleted"
tools/testing/selftests/net/fib_tests.sh | 1727 ++++++++++++++++++++++ 1 file changed, 1727 insertions(+) create mode 100755 tools/testing/selftests/net/fib_tests.sh
This reverts commit 2537b637eac0bd546f63e1492a34edd30878e8d4.
Signed-off-by: Shaoying Xu shaoyi@amazon.com --- net/ipv4/fib_semantics.c | 1 - tools/testing/selftests/net/fib_tests.sh | 1727 ++++++++++++++++++++++ 2 files changed, 1727 insertions(+), 1 deletion(-) create mode 100755 tools/testing/selftests/net/fib_tests.sh
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index f45b9daf62cf..908913d75847 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -420,7 +420,6 @@ static struct fib_info *fib_find_info(struct fib_info *nfi) nfi->fib_prefsrc == fi->fib_prefsrc && nfi->fib_priority == fi->fib_priority && nfi->fib_type == fi->fib_type && - nfi->fib_tb_id == fi->fib_tb_id && memcmp(nfi->fib_metrics, fi->fib_metrics, sizeof(u32) * RTAX_MAX) == 0 && !((nfi->fib_flags ^ fi->fib_flags) & ~RTNH_COMPARE_MASK) && diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh new file mode 100755 index 000000000000..6986086035d6 --- /dev/null +++ b/tools/testing/selftests/net/fib_tests.sh @@ -0,0 +1,1727 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +# This test is for checking IPv4 and IPv6 FIB behavior in response to +# different events. + +ret=0 +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + +# all tests in this script. Can be overridden with -t option +TESTS="unregister down carrier nexthop suppress ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics ipv4_route_v6_gw rp_filter" + +VERBOSE=0 +PAUSE_ON_FAIL=no +PAUSE=no +IP="ip -netns ns1" +NS_EXEC="ip netns exec ns1" + +which ping6 > /dev/null 2>&1 && ping6=$(which ping6) || ping6=$(which ping) + +log_test() +{ + local rc=$1 + local expected=$2 + local msg="$3" + + if [ ${rc} -eq ${expected} ]; then + printf " TEST: %-60s [ OK ]\n" "${msg}" + nsuccess=$((nsuccess+1)) + else + ret=1 + nfail=$((nfail+1)) + printf " TEST: %-60s [FAIL]\n" "${msg}" + if [ "${PAUSE_ON_FAIL}" = "yes" ]; then + echo + echo "hit enter to continue, 'q' to quit" + read a + [ "$a" = "q" ] && exit 1 + fi + fi + + if [ "${PAUSE}" = "yes" ]; then + echo + echo "hit enter to continue, 'q' to quit" + read a + [ "$a" = "q" ] && exit 1 + fi +} + +setup() +{ + set -e + ip netns add ns1 + ip netns set ns1 auto + $IP link set dev lo up + ip netns exec ns1 sysctl -qw net.ipv4.ip_forward=1 + ip netns exec ns1 sysctl -qw net.ipv6.conf.all.forwarding=1 + + $IP link add dummy0 type dummy + $IP link set dev dummy0 up + $IP address add 198.51.100.1/24 dev dummy0 + $IP -6 address add 2001:db8:1::1/64 dev dummy0 + set +e + +} + +cleanup() +{ + $IP link del dev dummy0 &> /dev/null + ip netns del ns1 + ip netns del ns2 &> /dev/null +} + +get_linklocal() +{ + local dev=$1 + local addr + + addr=$($IP -6 -br addr show dev ${dev} | \ + awk '{ + for (i = 3; i <= NF; ++i) { + if ($i ~ /^fe80/) + print $i + } + }' + ) + addr=${addr//*} + + [ -z "$addr" ] && return 1 + + echo $addr + + return 0 +} + +fib_unreg_unicast_test() +{ + echo + echo "Single path route test" + + setup + + echo " Start point" + $IP route get fibmatch 198.51.100.2 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + set -e + $IP link del dev dummy0 + set +e + + echo " Nexthop device deleted" + $IP route get fibmatch 198.51.100.2 &> /dev/null + log_test $? 2 "IPv4 fibmatch - no route" + $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null + log_test $? 2 "IPv6 fibmatch - no route" + + cleanup +} + +fib_unreg_multipath_test() +{ + + echo + echo "Multipath route test" + + setup + + set -e + $IP link add dummy1 type dummy + $IP link set dev dummy1 up + $IP address add 192.0.2.1/24 dev dummy1 + $IP -6 address add 2001:db8:2::1/64 dev dummy1 + + $IP route add 203.0.113.0/24 \ + nexthop via 198.51.100.2 dev dummy0 \ + nexthop via 192.0.2.2 dev dummy1 + $IP -6 route add 2001:db8:3::/64 \ + nexthop via 2001:db8:1::2 dev dummy0 \ + nexthop via 2001:db8:2::2 dev dummy1 + set +e + + echo " Start point" + $IP route get fibmatch 203.0.113.1 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + set -e + $IP link del dev dummy0 + set +e + + echo " One nexthop device deleted" + $IP route get fibmatch 203.0.113.1 &> /dev/null + log_test $? 2 "IPv4 - multipath route removed on delete" + + $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null + # In IPv6 we do not flush the entire multipath route. + log_test $? 0 "IPv6 - multipath down to single path" + + set -e + $IP link del dev dummy1 + set +e + + echo " Second nexthop device deleted" + $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null + log_test $? 2 "IPv6 - no route" + + cleanup +} + +fib_unreg_test() +{ + fib_unreg_unicast_test + fib_unreg_multipath_test +} + +fib_down_unicast_test() +{ + echo + echo "Single path, admin down" + + setup + + echo " Start point" + $IP route get fibmatch 198.51.100.2 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + set -e + $IP link set dev dummy0 down + set +e + + echo " Route deleted on down" + $IP route get fibmatch 198.51.100.2 &> /dev/null + log_test $? 2 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null + log_test $? 2 "IPv6 fibmatch" + + cleanup +} + +fib_down_multipath_test_do() +{ + local down_dev=$1 + local up_dev=$2 + + $IP route get fibmatch 203.0.113.1 \ + oif $down_dev &> /dev/null + log_test $? 2 "IPv4 fibmatch on down device" + $IP -6 route get fibmatch 2001:db8:3::1 \ + oif $down_dev &> /dev/null + log_test $? 2 "IPv6 fibmatch on down device" + + $IP route get fibmatch 203.0.113.1 \ + oif $up_dev &> /dev/null + log_test $? 0 "IPv4 fibmatch on up device" + $IP -6 route get fibmatch 2001:db8:3::1 \ + oif $up_dev &> /dev/null + log_test $? 0 "IPv6 fibmatch on up device" + + $IP route get fibmatch 203.0.113.1 | \ + grep $down_dev | grep -q "dead linkdown" + log_test $? 0 "IPv4 flags on down device" + $IP -6 route get fibmatch 2001:db8:3::1 | \ + grep $down_dev | grep -q "dead linkdown" + log_test $? 0 "IPv6 flags on down device" + + $IP route get fibmatch 203.0.113.1 | \ + grep $up_dev | grep -q "dead linkdown" + log_test $? 1 "IPv4 flags on up device" + $IP -6 route get fibmatch 2001:db8:3::1 | \ + grep $up_dev | grep -q "dead linkdown" + log_test $? 1 "IPv6 flags on up device" +} + +fib_down_multipath_test() +{ + echo + echo "Admin down multipath" + + setup + + set -e + $IP link add dummy1 type dummy + $IP link set dev dummy1 up + + $IP address add 192.0.2.1/24 dev dummy1 + $IP -6 address add 2001:db8:2::1/64 dev dummy1 + + $IP route add 203.0.113.0/24 \ + nexthop via 198.51.100.2 dev dummy0 \ + nexthop via 192.0.2.2 dev dummy1 + $IP -6 route add 2001:db8:3::/64 \ + nexthop via 2001:db8:1::2 dev dummy0 \ + nexthop via 2001:db8:2::2 dev dummy1 + set +e + + echo " Verify start point" + $IP route get fibmatch 203.0.113.1 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + + $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + set -e + $IP link set dev dummy0 down + set +e + + echo " One device down, one up" + fib_down_multipath_test_do "dummy0" "dummy1" + + set -e + $IP link set dev dummy0 up + $IP link set dev dummy1 down + set +e + + echo " Other device down and up" + fib_down_multipath_test_do "dummy1" "dummy0" + + set -e + $IP link set dev dummy0 down + set +e + + echo " Both devices down" + $IP route get fibmatch 203.0.113.1 &> /dev/null + log_test $? 2 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null + log_test $? 2 "IPv6 fibmatch" + + $IP link del dev dummy1 + cleanup +} + +fib_down_test() +{ + fib_down_unicast_test + fib_down_multipath_test +} + +# Local routes should not be affected when carrier changes. +fib_carrier_local_test() +{ + echo + echo "Local carrier tests - single path" + + setup + + set -e + $IP link set dev dummy0 carrier on + set +e + + echo " Start point" + $IP route get fibmatch 198.51.100.1 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::1 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + $IP route get fibmatch 198.51.100.1 | \ + grep -q "linkdown" + log_test $? 1 "IPv4 - no linkdown flag" + $IP -6 route get fibmatch 2001:db8:1::1 | \ + grep -q "linkdown" + log_test $? 1 "IPv6 - no linkdown flag" + + set -e + $IP link set dev dummy0 carrier off + sleep 1 + set +e + + echo " Carrier off on nexthop" + $IP route get fibmatch 198.51.100.1 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::1 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + $IP route get fibmatch 198.51.100.1 | \ + grep -q "linkdown" + log_test $? 1 "IPv4 - linkdown flag set" + $IP -6 route get fibmatch 2001:db8:1::1 | \ + grep -q "linkdown" + log_test $? 1 "IPv6 - linkdown flag set" + + set -e + $IP address add 192.0.2.1/24 dev dummy0 + $IP -6 address add 2001:db8:2::1/64 dev dummy0 + set +e + + echo " Route to local address with carrier down" + $IP route get fibmatch 192.0.2.1 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:2::1 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + $IP route get fibmatch 192.0.2.1 | \ + grep -q "linkdown" + log_test $? 1 "IPv4 linkdown flag set" + $IP -6 route get fibmatch 2001:db8:2::1 | \ + grep -q "linkdown" + log_test $? 1 "IPv6 linkdown flag set" + + cleanup +} + +fib_carrier_unicast_test() +{ + ret=0 + + echo + echo "Single path route carrier test" + + setup + + set -e + $IP link set dev dummy0 carrier on + set +e + + echo " Start point" + $IP route get fibmatch 198.51.100.2 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + $IP route get fibmatch 198.51.100.2 | \ + grep -q "linkdown" + log_test $? 1 "IPv4 no linkdown flag" + $IP -6 route get fibmatch 2001:db8:1::2 | \ + grep -q "linkdown" + log_test $? 1 "IPv6 no linkdown flag" + + set -e + $IP link set dev dummy0 carrier off + sleep 1 + set +e + + echo " Carrier down" + $IP route get fibmatch 198.51.100.2 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + $IP route get fibmatch 198.51.100.2 | \ + grep -q "linkdown" + log_test $? 0 "IPv4 linkdown flag set" + $IP -6 route get fibmatch 2001:db8:1::2 | \ + grep -q "linkdown" + log_test $? 0 "IPv6 linkdown flag set" + + set -e + $IP address add 192.0.2.1/24 dev dummy0 + $IP -6 address add 2001:db8:2::1/64 dev dummy0 + set +e + + echo " Second address added with carrier down" + $IP route get fibmatch 192.0.2.2 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:2::2 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + $IP route get fibmatch 192.0.2.2 | \ + grep -q "linkdown" + log_test $? 0 "IPv4 linkdown flag set" + $IP -6 route get fibmatch 2001:db8:2::2 | \ + grep -q "linkdown" + log_test $? 0 "IPv6 linkdown flag set" + + cleanup +} + +fib_carrier_test() +{ + fib_carrier_local_test + fib_carrier_unicast_test +} + +fib_rp_filter_test() +{ + echo + echo "IPv4 rp_filter tests" + + setup + + set -e + ip netns add ns2 + ip netns set ns2 auto + + ip -netns ns2 link set dev lo up + + $IP link add name veth1 type veth peer name veth2 + $IP link set dev veth2 netns ns2 + $IP address add 192.0.2.1/24 dev veth1 + ip -netns ns2 address add 192.0.2.1/24 dev veth2 + $IP link set dev veth1 up + ip -netns ns2 link set dev veth2 up + + $IP link set dev lo address 52:54:00:6a:c7:5e + $IP link set dev veth1 address 52:54:00:6a:c7:5e + ip -netns ns2 link set dev lo address 52:54:00:6a:c7:5e + ip -netns ns2 link set dev veth2 address 52:54:00:6a:c7:5e + + # 1. (ns2) redirect lo's egress to veth2's egress + ip netns exec ns2 tc qdisc add dev lo parent root handle 1: fq_codel + ip netns exec ns2 tc filter add dev lo parent 1: protocol arp basic \ + action mirred egress redirect dev veth2 + ip netns exec ns2 tc filter add dev lo parent 1: protocol ip basic \ + action mirred egress redirect dev veth2 + + # 2. (ns1) redirect veth1's ingress to lo's ingress + $NS_EXEC tc qdisc add dev veth1 ingress + $NS_EXEC tc filter add dev veth1 ingress protocol arp basic \ + action mirred ingress redirect dev lo + $NS_EXEC tc filter add dev veth1 ingress protocol ip basic \ + action mirred ingress redirect dev lo + + # 3. (ns1) redirect lo's egress to veth1's egress + $NS_EXEC tc qdisc add dev lo parent root handle 1: fq_codel + $NS_EXEC tc filter add dev lo parent 1: protocol arp basic \ + action mirred egress redirect dev veth1 + $NS_EXEC tc filter add dev lo parent 1: protocol ip basic \ + action mirred egress redirect dev veth1 + + # 4. (ns2) redirect veth2's ingress to lo's ingress + ip netns exec ns2 tc qdisc add dev veth2 ingress + ip netns exec ns2 tc filter add dev veth2 ingress protocol arp basic \ + action mirred ingress redirect dev lo + ip netns exec ns2 tc filter add dev veth2 ingress protocol ip basic \ + action mirred ingress redirect dev lo + + $NS_EXEC sysctl -qw net.ipv4.conf.all.rp_filter=1 + $NS_EXEC sysctl -qw net.ipv4.conf.all.accept_local=1 + $NS_EXEC sysctl -qw net.ipv4.conf.all.route_localnet=1 + ip netns exec ns2 sysctl -qw net.ipv4.conf.all.rp_filter=1 + ip netns exec ns2 sysctl -qw net.ipv4.conf.all.accept_local=1 + ip netns exec ns2 sysctl -qw net.ipv4.conf.all.route_localnet=1 + set +e + + run_cmd "ip netns exec ns2 ping -w1 -c1 192.0.2.1" + log_test $? 0 "rp_filter passes local packets" + + run_cmd "ip netns exec ns2 ping -w1 -c1 127.0.0.1" + log_test $? 0 "rp_filter passes loopback packets" + + cleanup +} + +################################################################################ +# Tests on nexthop spec + +# run 'ip route add' with given spec +add_rt() +{ + local desc="$1" + local erc=$2 + local vrf=$3 + local pfx=$4 + local gw=$5 + local dev=$6 + local cmd out rc + + [ "$vrf" = "-" ] && vrf="default" + [ -n "$gw" ] && gw="via $gw" + [ -n "$dev" ] && dev="dev $dev" + + cmd="$IP route add vrf $vrf $pfx $gw $dev" + if [ "$VERBOSE" = "1" ]; then + printf "\n COMMAND: $cmd\n" + fi + + out=$(eval $cmd 2>&1) + rc=$? + if [ "$VERBOSE" = "1" -a -n "$out" ]; then + echo " $out" + fi + log_test $rc $erc "$desc" +} + +fib4_nexthop() +{ + echo + echo "IPv4 nexthop tests" + + echo "<<< write me >>>" +} + +fib6_nexthop() +{ + local lldummy=$(get_linklocal dummy0) + local llv1=$(get_linklocal dummy0) + + if [ -z "$lldummy" ]; then + echo "Failed to get linklocal address for dummy0" + return 1 + fi + if [ -z "$llv1" ]; then + echo "Failed to get linklocal address for veth1" + return 1 + fi + + echo + echo "IPv6 nexthop tests" + + add_rt "Directly connected nexthop, unicast address" 0 \ + - 2001:db8:101::/64 2001:db8:1::2 + add_rt "Directly connected nexthop, unicast address with device" 0 \ + - 2001:db8:102::/64 2001:db8:1::2 "dummy0" + add_rt "Gateway is linklocal address" 0 \ + - 2001:db8:103::1/64 $llv1 "veth0" + + # fails because LL address requires a device + add_rt "Gateway is linklocal address, no device" 2 \ + - 2001:db8:104::1/64 $llv1 + + # local address can not be a gateway + add_rt "Gateway can not be local unicast address" 2 \ + - 2001:db8:105::/64 2001:db8:1::1 + add_rt "Gateway can not be local unicast address, with device" 2 \ + - 2001:db8:106::/64 2001:db8:1::1 "dummy0" + add_rt "Gateway can not be a local linklocal address" 2 \ + - 2001:db8:107::1/64 $lldummy "dummy0" + + # VRF tests + add_rt "Gateway can be local address in a VRF" 0 \ + - 2001:db8:108::/64 2001:db8:51::2 + add_rt "Gateway can be local address in a VRF, with device" 0 \ + - 2001:db8:109::/64 2001:db8:51::2 "veth0" + add_rt "Gateway can be local linklocal address in a VRF" 0 \ + - 2001:db8:110::1/64 $llv1 "veth0" + + add_rt "Redirect to VRF lookup" 0 \ + - 2001:db8:111::/64 "" "red" + + add_rt "VRF route, gateway can be local address in default VRF" 0 \ + red 2001:db8:112::/64 2001:db8:51::1 + + # local address in same VRF fails + add_rt "VRF route, gateway can not be a local address" 2 \ + red 2001:db8:113::1/64 2001:db8:2::1 + add_rt "VRF route, gateway can not be a local addr with device" 2 \ + red 2001:db8:114::1/64 2001:db8:2::1 "dummy1" +} + +# Default VRF: +# dummy0 - 198.51.100.1/24 2001:db8:1::1/64 +# veth0 - 192.0.2.1/24 2001:db8:51::1/64 +# +# VRF red: +# dummy1 - 192.168.2.1/24 2001:db8:2::1/64 +# veth1 - 192.0.2.2/24 2001:db8:51::2/64 +# +# [ dummy0 veth0 ]--[ veth1 dummy1 ] + +fib_nexthop_test() +{ + setup + + set -e + + $IP -4 rule add pref 32765 table local + $IP -4 rule del pref 0 + $IP -6 rule add pref 32765 table local + $IP -6 rule del pref 0 + + $IP link add red type vrf table 1 + $IP link set red up + $IP -4 route add vrf red unreachable default metric 4278198272 + $IP -6 route add vrf red unreachable default metric 4278198272 + + $IP link add veth0 type veth peer name veth1 + $IP link set dev veth0 up + $IP address add 192.0.2.1/24 dev veth0 + $IP -6 address add 2001:db8:51::1/64 dev veth0 + + $IP link set dev veth1 vrf red up + $IP address add 192.0.2.2/24 dev veth1 + $IP -6 address add 2001:db8:51::2/64 dev veth1 + + $IP link add dummy1 type dummy + $IP link set dev dummy1 vrf red up + $IP address add 192.168.2.1/24 dev dummy1 + $IP -6 address add 2001:db8:2::1/64 dev dummy1 + set +e + + sleep 1 + fib4_nexthop + fib6_nexthop + + ( + $IP link del dev dummy1 + $IP link del veth0 + $IP link del red + ) 2>/dev/null + cleanup +} + +fib_suppress_test() +{ + echo + echo "FIB rule with suppress_prefixlength" + setup + + $IP link add dummy1 type dummy + $IP link set dummy1 up + $IP -6 route add default dev dummy1 + $IP -6 rule add table main suppress_prefixlength 0 + ping -f -c 1000 -W 1 1234::1 >/dev/null 2>&1 + $IP -6 rule del table main suppress_prefixlength 0 + $IP link del dummy1 + + # If we got here without crashing, we're good. + log_test 0 0 "FIB rule suppress test" + + cleanup +} + +################################################################################ +# Tests on route add and replace + +run_cmd() +{ + local cmd="$1" + local out + local stderr="2>/dev/null" + + if [ "$VERBOSE" = "1" ]; then + printf " COMMAND: $cmd\n" + stderr= + fi + + out=$(eval $cmd $stderr) + rc=$? + if [ "$VERBOSE" = "1" -a -n "$out" ]; then + echo " $out" + fi + + [ "$VERBOSE" = "1" ] && echo + + return $rc +} + +check_expected() +{ + local out="$1" + local expected="$2" + local rc=0 + + [ "${out}" = "${expected}" ] && return 0 + + if [ -z "${out}" ]; then + if [ "$VERBOSE" = "1" ]; then + printf "\nNo route entry found\n" + printf "Expected:\n" + printf " ${expected}\n" + fi + return 1 + fi + + # tricky way to convert output to 1-line without ip's + # messy ''; this drops all extra white space + out=$(echo ${out}) + if [ "${out}" != "${expected}" ]; then + rc=1 + if [ "${VERBOSE}" = "1" ]; then + printf " Unexpected route entry. Have:\n" + printf " ${out}\n" + printf " Expected:\n" + printf " ${expected}\n\n" + fi + fi + + return $rc +} + +# add route for a prefix, flushing any existing routes first +# expected to be the first step of a test +add_route6() +{ + local pfx="$1" + local nh="$2" + local out + + if [ "$VERBOSE" = "1" ]; then + echo + echo " ##################################################" + echo + fi + + run_cmd "$IP -6 ro flush ${pfx}" + [ $? -ne 0 ] && exit 1 + + out=$($IP -6 ro ls match ${pfx}) + if [ -n "$out" ]; then + echo "Failed to flush routes for prefix used for tests." + exit 1 + fi + + run_cmd "$IP -6 ro add ${pfx} ${nh}" + if [ $? -ne 0 ]; then + echo "Failed to add initial route for test." + exit 1 + fi +} + +# add initial route - used in replace route tests +add_initial_route6() +{ + add_route6 "2001:db8:104::/64" "$1" +} + +check_route6() +{ + local pfx + local expected="$1" + local out + local rc=0 + + set -- $expected + pfx=$1 + + out=$($IP -6 ro ls match ${pfx} | sed -e 's/ pref medium//') + check_expected "${out}" "${expected}" +} + +route_cleanup() +{ + $IP li del red 2>/dev/null + $IP li del dummy1 2>/dev/null + $IP li del veth1 2>/dev/null + $IP li del veth3 2>/dev/null + + cleanup &> /dev/null +} + +route_setup() +{ + route_cleanup + setup + + [ "${VERBOSE}" = "1" ] && set -x + set -e + + ip netns add ns2 + ip netns set ns2 auto + ip -netns ns2 link set dev lo up + ip netns exec ns2 sysctl -qw net.ipv4.ip_forward=1 + ip netns exec ns2 sysctl -qw net.ipv6.conf.all.forwarding=1 + + $IP li add veth1 type veth peer name veth2 + $IP li add veth3 type veth peer name veth4 + + $IP li set veth1 up + $IP li set veth3 up + $IP li set veth2 netns ns2 up + $IP li set veth4 netns ns2 up + ip -netns ns2 li add dummy1 type dummy + ip -netns ns2 li set dummy1 up + + $IP -6 addr add 2001:db8:101::1/64 dev veth1 nodad + $IP -6 addr add 2001:db8:103::1/64 dev veth3 nodad + $IP addr add 172.16.101.1/24 dev veth1 + $IP addr add 172.16.103.1/24 dev veth3 + + ip -netns ns2 -6 addr add 2001:db8:101::2/64 dev veth2 nodad + ip -netns ns2 -6 addr add 2001:db8:103::2/64 dev veth4 nodad + ip -netns ns2 -6 addr add 2001:db8:104::1/64 dev dummy1 nodad + + ip -netns ns2 addr add 172.16.101.2/24 dev veth2 + ip -netns ns2 addr add 172.16.103.2/24 dev veth4 + ip -netns ns2 addr add 172.16.104.1/24 dev dummy1 + + set +e +} + +# assumption is that basic add of a single path route works +# otherwise just adding an address on an interface is broken +ipv6_rt_add() +{ + local rc + + echo + echo "IPv6 route add / append tests" + + # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL + add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" + run_cmd "$IP -6 ro add 2001:db8:104::/64 via 2001:db8:103::2" + log_test $? 2 "Attempt to add duplicate route - gw" + + # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL + add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" + run_cmd "$IP -6 ro add 2001:db8:104::/64 dev veth3" + log_test $? 2 "Attempt to add duplicate route - dev only" + + # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL + add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" + run_cmd "$IP -6 ro add unreachable 2001:db8:104::/64" + log_test $? 2 "Attempt to add duplicate route - reject route" + + # route append with same prefix adds a new route + # - iproute2 sets NLM_F_CREATE | NLM_F_APPEND + add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" + run_cmd "$IP -6 ro append 2001:db8:104::/64 via 2001:db8:103::2" + check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + log_test $? 0 "Append nexthop to existing route - gw" + + # insert mpath directly + add_route6 "2001:db8:104::/64" "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + log_test $? 0 "Add multipath route" + + add_route6 "2001:db8:104::/64" "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro add 2001:db8:104::/64 nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + log_test $? 2 "Attempt to add duplicate multipath route" + + # insert of a second route without append but different metric + add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" + run_cmd "$IP -6 ro add 2001:db8:104::/64 via 2001:db8:103::2 metric 512" + rc=$? + if [ $rc -eq 0 ]; then + run_cmd "$IP -6 ro add 2001:db8:104::/64 via 2001:db8:103::3 metric 256" + rc=$? + fi + log_test $rc 0 "Route add with different metrics" + + run_cmd "$IP -6 ro del 2001:db8:104::/64 metric 512" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:104::/64 via 2001:db8:103::3 dev veth3 metric 256 2001:db8:104::/64 via 2001:db8:101::2 dev veth1 metric 1024" + rc=$? + fi + log_test $rc 0 "Route delete with metric" +} + +ipv6_rt_replace_single() +{ + # single path with single path + # + add_initial_route6 "via 2001:db8:101::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 via 2001:db8:103::2" + check_route6 "2001:db8:104::/64 via 2001:db8:103::2 dev veth3 metric 1024" + log_test $? 0 "Single path with single path" + + # single path with multipath + # + add_initial_route6 "nexthop via 2001:db8:101::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:103::2" + check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::3 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + log_test $? 0 "Single path with multipath" + + # single path with single path using MULTIPATH attribute + # + add_initial_route6 "via 2001:db8:101::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:103::2" + check_route6 "2001:db8:104::/64 via 2001:db8:103::2 dev veth3 metric 1024" + log_test $? 0 "Single path with single path via multipath attribute" + + # route replace fails - invalid nexthop + add_initial_route6 "via 2001:db8:101::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 via 2001:db8:104::2" + if [ $? -eq 0 ]; then + # previous command is expected to fail so if it returns 0 + # that means the test failed. + log_test 0 1 "Invalid nexthop" + else + check_route6 "2001:db8:104::/64 via 2001:db8:101::2 dev veth1 metric 1024" + log_test $? 0 "Invalid nexthop" + fi + + # replace non-existent route + # - note use of change versus replace since ip adds NLM_F_CREATE + # for replace + add_initial_route6 "via 2001:db8:101::2" + run_cmd "$IP -6 ro change 2001:db8:105::/64 via 2001:db8:101::2" + log_test $? 2 "Single path - replace of non-existent route" +} + +ipv6_rt_replace_mpath() +{ + # multipath with multipath + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:103::3" + check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::3 dev veth1 weight 1 nexthop via 2001:db8:103::3 dev veth3 weight 1" + log_test $? 0 "Multipath with multipath" + + # multipath with single + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 via 2001:db8:101::3" + check_route6 "2001:db8:104::/64 via 2001:db8:101::3 dev veth1 metric 1024" + log_test $? 0 "Multipath with single path" + + # multipath with single + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3" + check_route6 "2001:db8:104::/64 via 2001:db8:101::3 dev veth1 metric 1024" + log_test $? 0 "Multipath with single path via multipath attribute" + + # multipath with dev-only + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 dev veth1" + check_route6 "2001:db8:104::/64 dev veth1 metric 1024" + log_test $? 0 "Multipath with dev-only" + + # route replace fails - invalid nexthop 1 + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:111::3 nexthop via 2001:db8:103::3" + check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + log_test $? 0 "Multipath - invalid first nexthop" + + # route replace fails - invalid nexthop 2 + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:113::3" + check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + log_test $? 0 "Multipath - invalid second nexthop" + + # multipath non-existent route + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro change 2001:db8:105::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:103::3" + log_test $? 2 "Multipath - replace of non-existent route" +} + +ipv6_rt_replace() +{ + echo + echo "IPv6 route replace tests" + + ipv6_rt_replace_single + ipv6_rt_replace_mpath +} + +ipv6_route_test() +{ + route_setup + + ipv6_rt_add + ipv6_rt_replace + + route_cleanup +} + +ip_addr_metric_check() +{ + ip addr help 2>&1 | grep -q metric + if [ $? -ne 0 ]; then + echo "iproute2 command does not support metric for addresses. Skipping test" + return 1 + fi + + return 0 +} + +ipv6_addr_metric_test() +{ + local rc + + echo + echo "IPv6 prefix route tests" + + ip_addr_metric_check || return 1 + + setup + + set -e + $IP li add dummy1 type dummy + $IP li add dummy2 type dummy + $IP li set dummy1 up + $IP li set dummy2 up + + # default entry is metric 256 + run_cmd "$IP -6 addr add dev dummy1 2001:db8:104::1/64" + run_cmd "$IP -6 addr add dev dummy2 2001:db8:104::2/64" + set +e + + check_route6 "2001:db8:104::/64 dev dummy1 proto kernel metric 256 2001:db8:104::/64 dev dummy2 proto kernel metric 256" + log_test $? 0 "Default metric" + + set -e + run_cmd "$IP -6 addr flush dev dummy1" + run_cmd "$IP -6 addr add dev dummy1 2001:db8:104::1/64 metric 257" + set +e + + check_route6 "2001:db8:104::/64 dev dummy2 proto kernel metric 256 2001:db8:104::/64 dev dummy1 proto kernel metric 257" + log_test $? 0 "User specified metric on first device" + + set -e + run_cmd "$IP -6 addr flush dev dummy2" + run_cmd "$IP -6 addr add dev dummy2 2001:db8:104::2/64 metric 258" + set +e + + check_route6 "2001:db8:104::/64 dev dummy1 proto kernel metric 257 2001:db8:104::/64 dev dummy2 proto kernel metric 258" + log_test $? 0 "User specified metric on second device" + + run_cmd "$IP -6 addr del dev dummy1 2001:db8:104::1/64 metric 257" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:104::/64 dev dummy2 proto kernel metric 258" + rc=$? + fi + log_test $rc 0 "Delete of address on first device" + + run_cmd "$IP -6 addr change dev dummy2 2001:db8:104::2/64 metric 259" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:104::/64 dev dummy2 proto kernel metric 259" + rc=$? + fi + log_test $rc 0 "Modify metric of address" + + # verify prefix route removed on down + run_cmd "ip netns exec ns1 sysctl -qw net.ipv6.conf.all.keep_addr_on_down=1" + run_cmd "$IP li set dev dummy2 down" + rc=$? + if [ $rc -eq 0 ]; then + out=$($IP -6 ro ls match 2001:db8:104::/64) + check_expected "${out}" "" + rc=$? + fi + log_test $rc 0 "Prefix route removed on link down" + + # verify prefix route re-inserted with assigned metric + run_cmd "$IP li set dev dummy2 up" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:104::/64 dev dummy2 proto kernel metric 259" + rc=$? + fi + log_test $rc 0 "Prefix route with metric on link up" + + # verify peer metric added correctly + set -e + run_cmd "$IP -6 addr flush dev dummy2" + run_cmd "$IP -6 addr add dev dummy2 2001:db8:104::1 peer 2001:db8:104::2 metric 260" + set +e + + check_route6 "2001:db8:104::1 dev dummy2 proto kernel metric 260" + log_test $? 0 "Set metric with peer route on local side" + check_route6 "2001:db8:104::2 dev dummy2 proto kernel metric 260" + log_test $? 0 "Set metric with peer route on peer side" + + set -e + run_cmd "$IP -6 addr change dev dummy2 2001:db8:104::1 peer 2001:db8:104::3 metric 261" + set +e + + check_route6 "2001:db8:104::1 dev dummy2 proto kernel metric 261" + log_test $? 0 "Modify metric and peer address on local side" + check_route6 "2001:db8:104::3 dev dummy2 proto kernel metric 261" + log_test $? 0 "Modify metric and peer address on peer side" + + $IP li del dummy1 + $IP li del dummy2 + cleanup +} + +ipv6_route_metrics_test() +{ + local rc + + echo + echo "IPv6 routes with metrics" + + route_setup + + # + # single path with metrics + # + run_cmd "$IP -6 ro add 2001:db8:111::/64 via 2001:db8:101::2 mtu 1400" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:111::/64 via 2001:db8:101::2 dev veth1 metric 1024 mtu 1400" + rc=$? + fi + log_test $rc 0 "Single path route with mtu metric" + + + # + # multipath via separate routes with metrics + # + run_cmd "$IP -6 ro add 2001:db8:112::/64 via 2001:db8:101::2 mtu 1400" + run_cmd "$IP -6 ro append 2001:db8:112::/64 via 2001:db8:103::2" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:112::/64 metric 1024 mtu 1400 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + rc=$? + fi + log_test $rc 0 "Multipath route via 2 single routes with mtu metric on first" + + # second route is coalesced to first to make a multipath route. + # MTU of the second path is hidden from display! + run_cmd "$IP -6 ro add 2001:db8:113::/64 via 2001:db8:101::2" + run_cmd "$IP -6 ro append 2001:db8:113::/64 via 2001:db8:103::2 mtu 1400" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:113::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + rc=$? + fi + log_test $rc 0 "Multipath route via 2 single routes with mtu metric on 2nd" + + run_cmd "$IP -6 ro del 2001:db8:113::/64 via 2001:db8:101::2" + if [ $? -eq 0 ]; then + check_route6 "2001:db8:113::/64 via 2001:db8:103::2 dev veth3 metric 1024 mtu 1400" + log_test $? 0 " MTU of second leg" + fi + + # + # multipath with metrics + # + run_cmd "$IP -6 ro add 2001:db8:115::/64 mtu 1400 nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:115::/64 metric 1024 mtu 1400 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + rc=$? + fi + log_test $rc 0 "Multipath route with mtu metric" + + $IP -6 ro add 2001:db8:104::/64 via 2001:db8:101::2 mtu 1300 + run_cmd "ip netns exec ns1 ${ping6} -w1 -c1 -s 1500 2001:db8:104::1" + log_test $? 0 "Using route with mtu metric" + + run_cmd "$IP -6 ro add 2001:db8:114::/64 via 2001:db8:101::2 congctl lock foo" + log_test $? 2 "Invalid metric (fails metric_convert)" + + route_cleanup +} + +# add route for a prefix, flushing any existing routes first +# expected to be the first step of a test +add_route() +{ + local pfx="$1" + local nh="$2" + local out + + if [ "$VERBOSE" = "1" ]; then + echo + echo " ##################################################" + echo + fi + + run_cmd "$IP ro flush ${pfx}" + [ $? -ne 0 ] && exit 1 + + out=$($IP ro ls match ${pfx}) + if [ -n "$out" ]; then + echo "Failed to flush routes for prefix used for tests." + exit 1 + fi + + run_cmd "$IP ro add ${pfx} ${nh}" + if [ $? -ne 0 ]; then + echo "Failed to add initial route for test." + exit 1 + fi +} + +# add initial route - used in replace route tests +add_initial_route() +{ + add_route "172.16.104.0/24" "$1" +} + +check_route() +{ + local pfx + local expected="$1" + local out + + set -- $expected + pfx=$1 + [ "${pfx}" = "unreachable" ] && pfx=$2 + + out=$($IP ro ls match ${pfx}) + check_expected "${out}" "${expected}" +} + +# assumption is that basic add of a single path route works +# otherwise just adding an address on an interface is broken +ipv4_rt_add() +{ + local rc + + echo + echo "IPv4 route add / append tests" + + # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro add 172.16.104.0/24 via 172.16.103.2" + log_test $? 2 "Attempt to add duplicate route - gw" + + # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro add 172.16.104.0/24 dev veth3" + log_test $? 2 "Attempt to add duplicate route - dev only" + + # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro add unreachable 172.16.104.0/24" + log_test $? 2 "Attempt to add duplicate route - reject route" + + # iproute2 prepend only sets NLM_F_CREATE + # - adds a new route; does NOT convert existing route to ECMP + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro prepend 172.16.104.0/24 via 172.16.103.2" + check_route "172.16.104.0/24 via 172.16.103.2 dev veth3 172.16.104.0/24 via 172.16.101.2 dev veth1" + log_test $? 0 "Add new nexthop for existing prefix" + + # route append with same prefix adds a new route + # - iproute2 sets NLM_F_CREATE | NLM_F_APPEND + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro append 172.16.104.0/24 via 172.16.103.2" + check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 172.16.104.0/24 via 172.16.103.2 dev veth3" + log_test $? 0 "Append nexthop to existing route - gw" + + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro append 172.16.104.0/24 dev veth3" + check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 172.16.104.0/24 dev veth3 scope link" + log_test $? 0 "Append nexthop to existing route - dev only" + + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro append unreachable 172.16.104.0/24" + check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 unreachable 172.16.104.0/24" + log_test $? 0 "Append nexthop to existing route - reject route" + + run_cmd "$IP ro flush 172.16.104.0/24" + run_cmd "$IP ro add unreachable 172.16.104.0/24" + run_cmd "$IP ro append 172.16.104.0/24 via 172.16.103.2" + check_route "unreachable 172.16.104.0/24 172.16.104.0/24 via 172.16.103.2 dev veth3" + log_test $? 0 "Append nexthop to existing reject route - gw" + + run_cmd "$IP ro flush 172.16.104.0/24" + run_cmd "$IP ro add unreachable 172.16.104.0/24" + run_cmd "$IP ro append 172.16.104.0/24 dev veth3" + check_route "unreachable 172.16.104.0/24 172.16.104.0/24 dev veth3 scope link" + log_test $? 0 "Append nexthop to existing reject route - dev only" + + # insert mpath directly + add_route "172.16.104.0/24" "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + check_route "172.16.104.0/24 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" + log_test $? 0 "add multipath route" + + add_route "172.16.104.0/24" "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro add 172.16.104.0/24 nexthop via 172.16.101.2 nexthop via 172.16.103.2" + log_test $? 2 "Attempt to add duplicate multipath route" + + # insert of a second route without append but different metric + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro add 172.16.104.0/24 via 172.16.103.2 metric 512" + rc=$? + if [ $rc -eq 0 ]; then + run_cmd "$IP ro add 172.16.104.0/24 via 172.16.103.3 metric 256" + rc=$? + fi + log_test $rc 0 "Route add with different metrics" + + run_cmd "$IP ro del 172.16.104.0/24 metric 512" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 172.16.104.0/24 via 172.16.103.3 dev veth3 metric 256" + rc=$? + fi + log_test $rc 0 "Route delete with metric" +} + +ipv4_rt_replace_single() +{ + # single path with single path + # + add_initial_route "via 172.16.101.2" + run_cmd "$IP ro replace 172.16.104.0/24 via 172.16.103.2" + check_route "172.16.104.0/24 via 172.16.103.2 dev veth3" + log_test $? 0 "Single path with single path" + + # single path with multipath + # + add_initial_route "nexthop via 172.16.101.2" + run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3 nexthop via 172.16.103.2" + check_route "172.16.104.0/24 nexthop via 172.16.101.3 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" + log_test $? 0 "Single path with multipath" + + # single path with reject + # + add_initial_route "nexthop via 172.16.101.2" + run_cmd "$IP ro replace unreachable 172.16.104.0/24" + check_route "unreachable 172.16.104.0/24" + log_test $? 0 "Single path with reject route" + + # single path with single path using MULTIPATH attribute + # + add_initial_route "via 172.16.101.2" + run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.103.2" + check_route "172.16.104.0/24 via 172.16.103.2 dev veth3" + log_test $? 0 "Single path with single path via multipath attribute" + + # route replace fails - invalid nexthop + add_initial_route "via 172.16.101.2" + run_cmd "$IP ro replace 172.16.104.0/24 via 2001:db8:104::2" + if [ $? -eq 0 ]; then + # previous command is expected to fail so if it returns 0 + # that means the test failed. + log_test 0 1 "Invalid nexthop" + else + check_route "172.16.104.0/24 via 172.16.101.2 dev veth1" + log_test $? 0 "Invalid nexthop" + fi + + # replace non-existent route + # - note use of change versus replace since ip adds NLM_F_CREATE + # for replace + add_initial_route "via 172.16.101.2" + run_cmd "$IP ro change 172.16.105.0/24 via 172.16.101.2" + log_test $? 2 "Single path - replace of non-existent route" +} + +ipv4_rt_replace_mpath() +{ + # multipath with multipath + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3 nexthop via 172.16.103.3" + check_route "172.16.104.0/24 nexthop via 172.16.101.3 dev veth1 weight 1 nexthop via 172.16.103.3 dev veth3 weight 1" + log_test $? 0 "Multipath with multipath" + + # multipath with single + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro replace 172.16.104.0/24 via 172.16.101.3" + check_route "172.16.104.0/24 via 172.16.101.3 dev veth1" + log_test $? 0 "Multipath with single path" + + # multipath with single + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3" + check_route "172.16.104.0/24 via 172.16.101.3 dev veth1" + log_test $? 0 "Multipath with single path via multipath attribute" + + # multipath with reject + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro replace unreachable 172.16.104.0/24" + check_route "unreachable 172.16.104.0/24" + log_test $? 0 "Multipath with reject route" + + # route replace fails - invalid nexthop 1 + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.111.3 nexthop via 172.16.103.3" + check_route "172.16.104.0/24 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" + log_test $? 0 "Multipath - invalid first nexthop" + + # route replace fails - invalid nexthop 2 + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3 nexthop via 172.16.113.3" + check_route "172.16.104.0/24 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" + log_test $? 0 "Multipath - invalid second nexthop" + + # multipath non-existent route + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro change 172.16.105.0/24 nexthop via 172.16.101.3 nexthop via 172.16.103.3" + log_test $? 2 "Multipath - replace of non-existent route" +} + +ipv4_rt_replace() +{ + echo + echo "IPv4 route replace tests" + + ipv4_rt_replace_single + ipv4_rt_replace_mpath +} + +ipv4_route_test() +{ + route_setup + + ipv4_rt_add + ipv4_rt_replace + + route_cleanup +} + +ipv4_addr_metric_test() +{ + local rc + + echo + echo "IPv4 prefix route tests" + + ip_addr_metric_check || return 1 + + setup + + set -e + $IP li add dummy1 type dummy + $IP li add dummy2 type dummy + $IP li set dummy1 up + $IP li set dummy2 up + + # default entry is metric 256 + run_cmd "$IP addr add dev dummy1 172.16.104.1/24" + run_cmd "$IP addr add dev dummy2 172.16.104.2/24" + set +e + + check_route "172.16.104.0/24 dev dummy1 proto kernel scope link src 172.16.104.1 172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2" + log_test $? 0 "Default metric" + + set -e + run_cmd "$IP addr flush dev dummy1" + run_cmd "$IP addr add dev dummy1 172.16.104.1/24 metric 257" + set +e + + check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 172.16.104.0/24 dev dummy1 proto kernel scope link src 172.16.104.1 metric 257" + log_test $? 0 "User specified metric on first device" + + set -e + run_cmd "$IP addr flush dev dummy2" + run_cmd "$IP addr add dev dummy2 172.16.104.2/24 metric 258" + set +e + + check_route "172.16.104.0/24 dev dummy1 proto kernel scope link src 172.16.104.1 metric 257 172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 metric 258" + log_test $? 0 "User specified metric on second device" + + run_cmd "$IP addr del dev dummy1 172.16.104.1/24 metric 257" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 metric 258" + rc=$? + fi + log_test $rc 0 "Delete of address on first device" + + run_cmd "$IP addr change dev dummy2 172.16.104.2/24 metric 259" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 metric 259" + rc=$? + fi + log_test $rc 0 "Modify metric of address" + + # verify prefix route removed on down + run_cmd "$IP li set dev dummy2 down" + rc=$? + if [ $rc -eq 0 ]; then + out=$($IP ro ls match 172.16.104.0/24) + check_expected "${out}" "" + rc=$? + fi + log_test $rc 0 "Prefix route removed on link down" + + # verify prefix route re-inserted with assigned metric + run_cmd "$IP li set dev dummy2 up" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 metric 259" + rc=$? + fi + log_test $rc 0 "Prefix route with metric on link up" + + # explicitly check for metric changes on edge scenarios + run_cmd "$IP addr flush dev dummy2" + run_cmd "$IP addr add dev dummy2 172.16.104.0/24 metric 259" + run_cmd "$IP addr change dev dummy2 172.16.104.0/24 metric 260" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.0 metric 260" + rc=$? + fi + log_test $rc 0 "Modify metric of .0/24 address" + + run_cmd "$IP addr flush dev dummy2" + run_cmd "$IP addr add dev dummy2 172.16.104.1/32 peer 172.16.104.2 metric 260" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.2 dev dummy2 proto kernel scope link src 172.16.104.1 metric 260" + rc=$? + fi + log_test $rc 0 "Set metric of address with peer route" + + run_cmd "$IP addr change dev dummy2 172.16.104.1/32 peer 172.16.104.3 metric 261" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.3 dev dummy2 proto kernel scope link src 172.16.104.1 metric 261" + rc=$? + fi + log_test $rc 0 "Modify metric and peer address for peer route" + + $IP li del dummy1 + $IP li del dummy2 + cleanup +} + +ipv4_route_metrics_test() +{ + local rc + + echo + echo "IPv4 route add / append tests" + + route_setup + + run_cmd "$IP ro add 172.16.111.0/24 via 172.16.101.2 mtu 1400" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.111.0/24 via 172.16.101.2 dev veth1 mtu 1400" + rc=$? + fi + log_test $rc 0 "Single path route with mtu metric" + + + run_cmd "$IP ro add 172.16.112.0/24 mtu 1400 nexthop via 172.16.101.2 nexthop via 172.16.103.2" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.112.0/24 mtu 1400 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" + rc=$? + fi + log_test $rc 0 "Multipath route with mtu metric" + + $IP ro add 172.16.104.0/24 via 172.16.101.2 mtu 1300 + run_cmd "ip netns exec ns1 ping -w1 -c1 -s 1500 172.16.104.1" + log_test $? 0 "Using route with mtu metric" + + run_cmd "$IP ro add 172.16.111.0/24 via 172.16.101.2 congctl lock foo" + log_test $? 2 "Invalid metric (fails metric_convert)" + + route_cleanup +} + +ipv4_route_v6_gw_test() +{ + local rc + + echo + echo "IPv4 route with IPv6 gateway tests" + + route_setup + sleep 2 + + # + # single path route + # + run_cmd "$IP ro add 172.16.104.0/24 via inet6 2001:db8:101::2" + rc=$? + log_test $rc 0 "Single path route with IPv6 gateway" + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 via inet6 2001:db8:101::2 dev veth1" + fi + + run_cmd "ip netns exec ns1 ping -w1 -c1 172.16.104.1" + log_test $rc 0 "Single path route with IPv6 gateway - ping" + + run_cmd "$IP ro del 172.16.104.0/24 via inet6 2001:db8:101::2" + rc=$? + log_test $rc 0 "Single path route delete" + if [ $rc -eq 0 ]; then + check_route "172.16.112.0/24" + fi + + # + # multipath - v6 then v4 + # + run_cmd "$IP ro add 172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 nexthop via 172.16.103.2 dev veth3" + rc=$? + log_test $rc 0 "Multipath route add - v6 nexthop then v4" + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" + fi + + run_cmd "$IP ro del 172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 nexthop via inet6 2001:db8:101::2 dev veth1" + log_test $? 2 " Multipath route delete - nexthops in wrong order" + + run_cmd "$IP ro del 172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 nexthop via 172.16.103.2 dev veth3" + log_test $? 0 " Multipath route delete exact match" + + # + # multipath - v4 then v6 + # + run_cmd "$IP ro add 172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 nexthop via inet6 2001:db8:101::2 dev veth1" + rc=$? + log_test $rc 0 "Multipath route add - v4 nexthop then v6" + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 weight 1 nexthop via inet6 2001:db8:101::2 dev veth1 weight 1" + fi + + run_cmd "$IP ro del 172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 nexthop via 172.16.103.2 dev veth3" + log_test $? 2 " Multipath route delete - nexthops in wrong order" + + run_cmd "$IP ro del 172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 nexthop via inet6 2001:db8:101::2 dev veth1" + log_test $? 0 " Multipath route delete exact match" + + route_cleanup +} + +################################################################################ +# usage + +usage() +{ + cat <<EOF +usage: ${0##*/} OPTS + + -t <test> Test(s) to run (default: all) + (options: $TESTS) + -p Pause on fail + -P Pause after each test before cleanup + -v verbose mode (show commands and output) +EOF +} + +################################################################################ +# main + +while getopts :t:pPhv o +do + case $o in + t) TESTS=$OPTARG;; + p) PAUSE_ON_FAIL=yes;; + P) PAUSE=yes;; + v) VERBOSE=$(($VERBOSE + 1));; + h) usage; exit 0;; + *) usage; exit 1;; + esac +done + +PEER_CMD="ip netns exec ${PEER_NS}" + +# make sure we don't pause twice +[ "${PAUSE}" = "yes" ] && PAUSE_ON_FAIL=no + +if [ "$(id -u)" -ne 0 ];then + echo "SKIP: Need root privileges" + exit $ksft_skip; +fi + +if [ ! -x "$(command -v ip)" ]; then + echo "SKIP: Could not run test without ip tool" + exit $ksft_skip +fi + +ip route help 2>&1 | grep -q fibmatch +if [ $? -ne 0 ]; then + echo "SKIP: iproute2 too old, missing fibmatch" + exit $ksft_skip +fi + +# start clean +cleanup &> /dev/null + +for t in $TESTS +do + case $t in + fib_unreg_test|unregister) fib_unreg_test;; + fib_down_test|down) fib_down_test;; + fib_carrier_test|carrier) fib_carrier_test;; + fib_rp_filter_test|rp_filter) fib_rp_filter_test;; + fib_nexthop_test|nexthop) fib_nexthop_test;; + fib_suppress_test|suppress) fib_suppress_test;; + ipv6_route_test|ipv6_rt) ipv6_route_test;; + ipv4_route_test|ipv4_rt) ipv4_route_test;; + ipv6_addr_metric) ipv6_addr_metric_test;; + ipv4_addr_metric) ipv4_addr_metric_test;; + ipv6_route_metrics) ipv6_route_metrics_test;; + ipv4_route_metrics) ipv4_route_metrics_test;; + ipv4_route_v6_gw) ipv4_route_v6_gw_test;; + + help) echo "Test names: $TESTS"; exit 0;; + esac +done + +if [ "$TESTS" != "none" ]; then + printf "\nTests passed: %3d\n" ${nsuccess} + printf "Tests failed: %3d\n" ${nfail} +fi + +exit $ret
On Sun, Feb 05, 2023 at 05:30:59AM +0000, Shaoying Xu wrote:
This reverts commit 2537b637eac0bd546f63e1492a34edd30878e8d4.
Signed-off-by: Shaoying Xu shaoyi@amazon.com
We need a reason why this revert happens please. Perhaps put the information you have in the previous email in this thread in here?
thanks,
greg k-h
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit f96a3d74554df537b6db5c99c27c80e7afadc8d1 ]
Cited commit added the table ID to the FIB info structure, but did not prevent structures with different table IDs from being consolidated. This can lead to routes being flushed from a VRF when an address is deleted from a different VRF.
Fix by taking the table ID into account when looking for a matching FIB info. This is already done for FIB info structures backed by a nexthop object in fib_find_info_nh().
Add test cases that fail before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [FAIL] TEST: Route in default VRF not removed [ OK ] RTNETLINK answers: File exists TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [FAIL]
Tests passed: 6 Tests failed: 2
And pass after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ]
Tests passed: 8 Tests failed: 0
Fixes: 5a56a0b3a45d ("net: Don't delete routes in different VRFs") Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Shaoying Xu shaoyi@amazon.com --- net/ipv4/fib_semantics.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 908913d75847..f45b9daf62cf 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -420,6 +420,7 @@ static struct fib_info *fib_find_info(struct fib_info *nfi) nfi->fib_prefsrc == fi->fib_prefsrc && nfi->fib_priority == fi->fib_priority && nfi->fib_type == fi->fib_type && + nfi->fib_tb_id == fi->fib_tb_id && memcmp(nfi->fib_metrics, fi->fib_metrics, sizeof(u32) * RTAX_MAX) == 0 && !((nfi->fib_flags ^ fi->fib_flags) & ~RTNH_COMPARE_MASK) &&
On Sun, Feb 05, 2023 at 05:30:58AM +0000, Shaoying Xu wrote:
On Mon, 12 Dec 2022 14:17:35 +0100 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit f96a3d74554df537b6db5c99c27c80e7afadc8d1 ]
Cited commit added the table ID to the FIB info structure, but did not prevent structures with different table IDs from being consolidated. This can lead to routes being flushed from a VRF when an address is deleted from a different VRF.
Fix by taking the table ID into account when looking for a matching FIB info. This is already done for FIB info structures backed by a nexthop object in fib_find_info_nh().
Add test cases that fail before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [FAIL] TEST: Route in default VRF not removed [ OK ] RTNETLINK answers: File exists TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [FAIL]
Tests passed: 6 Tests failed: 2
And pass after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ]
Tests passed: 8 Tests failed: 0
Fixes: 5a56a0b3a45d ("net: Don't delete routes in different VRFs") Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org
net/ipv4/fib_semantics.c | 1 + tools/testing/selftests/net/fib_tests.sh | 1727 ---------------------- 2 files changed, 1 insertion(+), 1727 deletions(-) delete mode 100755 tools/testing/selftests/net/fib_tests.sh
Hi Greg,
Sorry for last unclear message without context. I found this commit deleted the whole fib_tests.sh that causes new failure in kselftests run. Looking at the upstream patch [1] and given the context that ipv4_del_addr test is not available to kernel 5.4, I added the revert patch and new backport of this commit to resolve the potential mistake.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
Thanks, Shaoying
Ido Schimmel (1): ipv4: Fix incorrect route flushing when source address is deleted
Shaoying Xu (1): Revert "ipv4: Fix incorrect route flushing when source address is deleted"
tools/testing/selftests/net/fib_tests.sh | 1727 ++++++++++++++++++++++ 1 file changed, 1727 insertions(+) create mode 100755 tools/testing/selftests/net/fib_tests.sh
See my comments on patch 1/2 here, can you fix that up and resend these?
thanks,
greg k-h
This reverts commit 2537b637eac0bd546f63e1492a34edd30878e8d4 that deleted the whole fib_tests.sh by mistake and caused fib_tests failure in kselftests run.
Signed-off-by: Shaoying Xu shaoyi@amazon.com --- net/ipv4/fib_semantics.c | 1 - tools/testing/selftests/net/fib_tests.sh | 1727 ++++++++++++++++++++++ 2 files changed, 1727 insertions(+), 1 deletion(-) create mode 100755 tools/testing/selftests/net/fib_tests.sh
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index f45b9daf62cf..908913d75847 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -420,7 +420,6 @@ static struct fib_info *fib_find_info(struct fib_info *nfi) nfi->fib_prefsrc == fi->fib_prefsrc && nfi->fib_priority == fi->fib_priority && nfi->fib_type == fi->fib_type && - nfi->fib_tb_id == fi->fib_tb_id && memcmp(nfi->fib_metrics, fi->fib_metrics, sizeof(u32) * RTAX_MAX) == 0 && !((nfi->fib_flags ^ fi->fib_flags) & ~RTNH_COMPARE_MASK) && diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh new file mode 100755 index 000000000000..6986086035d6 --- /dev/null +++ b/tools/testing/selftests/net/fib_tests.sh @@ -0,0 +1,1727 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +# This test is for checking IPv4 and IPv6 FIB behavior in response to +# different events. + +ret=0 +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + +# all tests in this script. Can be overridden with -t option +TESTS="unregister down carrier nexthop suppress ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics ipv4_route_v6_gw rp_filter" + +VERBOSE=0 +PAUSE_ON_FAIL=no +PAUSE=no +IP="ip -netns ns1" +NS_EXEC="ip netns exec ns1" + +which ping6 > /dev/null 2>&1 && ping6=$(which ping6) || ping6=$(which ping) + +log_test() +{ + local rc=$1 + local expected=$2 + local msg="$3" + + if [ ${rc} -eq ${expected} ]; then + printf " TEST: %-60s [ OK ]\n" "${msg}" + nsuccess=$((nsuccess+1)) + else + ret=1 + nfail=$((nfail+1)) + printf " TEST: %-60s [FAIL]\n" "${msg}" + if [ "${PAUSE_ON_FAIL}" = "yes" ]; then + echo + echo "hit enter to continue, 'q' to quit" + read a + [ "$a" = "q" ] && exit 1 + fi + fi + + if [ "${PAUSE}" = "yes" ]; then + echo + echo "hit enter to continue, 'q' to quit" + read a + [ "$a" = "q" ] && exit 1 + fi +} + +setup() +{ + set -e + ip netns add ns1 + ip netns set ns1 auto + $IP link set dev lo up + ip netns exec ns1 sysctl -qw net.ipv4.ip_forward=1 + ip netns exec ns1 sysctl -qw net.ipv6.conf.all.forwarding=1 + + $IP link add dummy0 type dummy + $IP link set dev dummy0 up + $IP address add 198.51.100.1/24 dev dummy0 + $IP -6 address add 2001:db8:1::1/64 dev dummy0 + set +e + +} + +cleanup() +{ + $IP link del dev dummy0 &> /dev/null + ip netns del ns1 + ip netns del ns2 &> /dev/null +} + +get_linklocal() +{ + local dev=$1 + local addr + + addr=$($IP -6 -br addr show dev ${dev} | \ + awk '{ + for (i = 3; i <= NF; ++i) { + if ($i ~ /^fe80/) + print $i + } + }' + ) + addr=${addr//*} + + [ -z "$addr" ] && return 1 + + echo $addr + + return 0 +} + +fib_unreg_unicast_test() +{ + echo + echo "Single path route test" + + setup + + echo " Start point" + $IP route get fibmatch 198.51.100.2 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + set -e + $IP link del dev dummy0 + set +e + + echo " Nexthop device deleted" + $IP route get fibmatch 198.51.100.2 &> /dev/null + log_test $? 2 "IPv4 fibmatch - no route" + $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null + log_test $? 2 "IPv6 fibmatch - no route" + + cleanup +} + +fib_unreg_multipath_test() +{ + + echo + echo "Multipath route test" + + setup + + set -e + $IP link add dummy1 type dummy + $IP link set dev dummy1 up + $IP address add 192.0.2.1/24 dev dummy1 + $IP -6 address add 2001:db8:2::1/64 dev dummy1 + + $IP route add 203.0.113.0/24 \ + nexthop via 198.51.100.2 dev dummy0 \ + nexthop via 192.0.2.2 dev dummy1 + $IP -6 route add 2001:db8:3::/64 \ + nexthop via 2001:db8:1::2 dev dummy0 \ + nexthop via 2001:db8:2::2 dev dummy1 + set +e + + echo " Start point" + $IP route get fibmatch 203.0.113.1 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + set -e + $IP link del dev dummy0 + set +e + + echo " One nexthop device deleted" + $IP route get fibmatch 203.0.113.1 &> /dev/null + log_test $? 2 "IPv4 - multipath route removed on delete" + + $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null + # In IPv6 we do not flush the entire multipath route. + log_test $? 0 "IPv6 - multipath down to single path" + + set -e + $IP link del dev dummy1 + set +e + + echo " Second nexthop device deleted" + $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null + log_test $? 2 "IPv6 - no route" + + cleanup +} + +fib_unreg_test() +{ + fib_unreg_unicast_test + fib_unreg_multipath_test +} + +fib_down_unicast_test() +{ + echo + echo "Single path, admin down" + + setup + + echo " Start point" + $IP route get fibmatch 198.51.100.2 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + set -e + $IP link set dev dummy0 down + set +e + + echo " Route deleted on down" + $IP route get fibmatch 198.51.100.2 &> /dev/null + log_test $? 2 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null + log_test $? 2 "IPv6 fibmatch" + + cleanup +} + +fib_down_multipath_test_do() +{ + local down_dev=$1 + local up_dev=$2 + + $IP route get fibmatch 203.0.113.1 \ + oif $down_dev &> /dev/null + log_test $? 2 "IPv4 fibmatch on down device" + $IP -6 route get fibmatch 2001:db8:3::1 \ + oif $down_dev &> /dev/null + log_test $? 2 "IPv6 fibmatch on down device" + + $IP route get fibmatch 203.0.113.1 \ + oif $up_dev &> /dev/null + log_test $? 0 "IPv4 fibmatch on up device" + $IP -6 route get fibmatch 2001:db8:3::1 \ + oif $up_dev &> /dev/null + log_test $? 0 "IPv6 fibmatch on up device" + + $IP route get fibmatch 203.0.113.1 | \ + grep $down_dev | grep -q "dead linkdown" + log_test $? 0 "IPv4 flags on down device" + $IP -6 route get fibmatch 2001:db8:3::1 | \ + grep $down_dev | grep -q "dead linkdown" + log_test $? 0 "IPv6 flags on down device" + + $IP route get fibmatch 203.0.113.1 | \ + grep $up_dev | grep -q "dead linkdown" + log_test $? 1 "IPv4 flags on up device" + $IP -6 route get fibmatch 2001:db8:3::1 | \ + grep $up_dev | grep -q "dead linkdown" + log_test $? 1 "IPv6 flags on up device" +} + +fib_down_multipath_test() +{ + echo + echo "Admin down multipath" + + setup + + set -e + $IP link add dummy1 type dummy + $IP link set dev dummy1 up + + $IP address add 192.0.2.1/24 dev dummy1 + $IP -6 address add 2001:db8:2::1/64 dev dummy1 + + $IP route add 203.0.113.0/24 \ + nexthop via 198.51.100.2 dev dummy0 \ + nexthop via 192.0.2.2 dev dummy1 + $IP -6 route add 2001:db8:3::/64 \ + nexthop via 2001:db8:1::2 dev dummy0 \ + nexthop via 2001:db8:2::2 dev dummy1 + set +e + + echo " Verify start point" + $IP route get fibmatch 203.0.113.1 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + + $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + set -e + $IP link set dev dummy0 down + set +e + + echo " One device down, one up" + fib_down_multipath_test_do "dummy0" "dummy1" + + set -e + $IP link set dev dummy0 up + $IP link set dev dummy1 down + set +e + + echo " Other device down and up" + fib_down_multipath_test_do "dummy1" "dummy0" + + set -e + $IP link set dev dummy0 down + set +e + + echo " Both devices down" + $IP route get fibmatch 203.0.113.1 &> /dev/null + log_test $? 2 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null + log_test $? 2 "IPv6 fibmatch" + + $IP link del dev dummy1 + cleanup +} + +fib_down_test() +{ + fib_down_unicast_test + fib_down_multipath_test +} + +# Local routes should not be affected when carrier changes. +fib_carrier_local_test() +{ + echo + echo "Local carrier tests - single path" + + setup + + set -e + $IP link set dev dummy0 carrier on + set +e + + echo " Start point" + $IP route get fibmatch 198.51.100.1 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::1 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + $IP route get fibmatch 198.51.100.1 | \ + grep -q "linkdown" + log_test $? 1 "IPv4 - no linkdown flag" + $IP -6 route get fibmatch 2001:db8:1::1 | \ + grep -q "linkdown" + log_test $? 1 "IPv6 - no linkdown flag" + + set -e + $IP link set dev dummy0 carrier off + sleep 1 + set +e + + echo " Carrier off on nexthop" + $IP route get fibmatch 198.51.100.1 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::1 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + $IP route get fibmatch 198.51.100.1 | \ + grep -q "linkdown" + log_test $? 1 "IPv4 - linkdown flag set" + $IP -6 route get fibmatch 2001:db8:1::1 | \ + grep -q "linkdown" + log_test $? 1 "IPv6 - linkdown flag set" + + set -e + $IP address add 192.0.2.1/24 dev dummy0 + $IP -6 address add 2001:db8:2::1/64 dev dummy0 + set +e + + echo " Route to local address with carrier down" + $IP route get fibmatch 192.0.2.1 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:2::1 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + $IP route get fibmatch 192.0.2.1 | \ + grep -q "linkdown" + log_test $? 1 "IPv4 linkdown flag set" + $IP -6 route get fibmatch 2001:db8:2::1 | \ + grep -q "linkdown" + log_test $? 1 "IPv6 linkdown flag set" + + cleanup +} + +fib_carrier_unicast_test() +{ + ret=0 + + echo + echo "Single path route carrier test" + + setup + + set -e + $IP link set dev dummy0 carrier on + set +e + + echo " Start point" + $IP route get fibmatch 198.51.100.2 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + $IP route get fibmatch 198.51.100.2 | \ + grep -q "linkdown" + log_test $? 1 "IPv4 no linkdown flag" + $IP -6 route get fibmatch 2001:db8:1::2 | \ + grep -q "linkdown" + log_test $? 1 "IPv6 no linkdown flag" + + set -e + $IP link set dev dummy0 carrier off + sleep 1 + set +e + + echo " Carrier down" + $IP route get fibmatch 198.51.100.2 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + $IP route get fibmatch 198.51.100.2 | \ + grep -q "linkdown" + log_test $? 0 "IPv4 linkdown flag set" + $IP -6 route get fibmatch 2001:db8:1::2 | \ + grep -q "linkdown" + log_test $? 0 "IPv6 linkdown flag set" + + set -e + $IP address add 192.0.2.1/24 dev dummy0 + $IP -6 address add 2001:db8:2::1/64 dev dummy0 + set +e + + echo " Second address added with carrier down" + $IP route get fibmatch 192.0.2.2 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:2::2 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + $IP route get fibmatch 192.0.2.2 | \ + grep -q "linkdown" + log_test $? 0 "IPv4 linkdown flag set" + $IP -6 route get fibmatch 2001:db8:2::2 | \ + grep -q "linkdown" + log_test $? 0 "IPv6 linkdown flag set" + + cleanup +} + +fib_carrier_test() +{ + fib_carrier_local_test + fib_carrier_unicast_test +} + +fib_rp_filter_test() +{ + echo + echo "IPv4 rp_filter tests" + + setup + + set -e + ip netns add ns2 + ip netns set ns2 auto + + ip -netns ns2 link set dev lo up + + $IP link add name veth1 type veth peer name veth2 + $IP link set dev veth2 netns ns2 + $IP address add 192.0.2.1/24 dev veth1 + ip -netns ns2 address add 192.0.2.1/24 dev veth2 + $IP link set dev veth1 up + ip -netns ns2 link set dev veth2 up + + $IP link set dev lo address 52:54:00:6a:c7:5e + $IP link set dev veth1 address 52:54:00:6a:c7:5e + ip -netns ns2 link set dev lo address 52:54:00:6a:c7:5e + ip -netns ns2 link set dev veth2 address 52:54:00:6a:c7:5e + + # 1. (ns2) redirect lo's egress to veth2's egress + ip netns exec ns2 tc qdisc add dev lo parent root handle 1: fq_codel + ip netns exec ns2 tc filter add dev lo parent 1: protocol arp basic \ + action mirred egress redirect dev veth2 + ip netns exec ns2 tc filter add dev lo parent 1: protocol ip basic \ + action mirred egress redirect dev veth2 + + # 2. (ns1) redirect veth1's ingress to lo's ingress + $NS_EXEC tc qdisc add dev veth1 ingress + $NS_EXEC tc filter add dev veth1 ingress protocol arp basic \ + action mirred ingress redirect dev lo + $NS_EXEC tc filter add dev veth1 ingress protocol ip basic \ + action mirred ingress redirect dev lo + + # 3. (ns1) redirect lo's egress to veth1's egress + $NS_EXEC tc qdisc add dev lo parent root handle 1: fq_codel + $NS_EXEC tc filter add dev lo parent 1: protocol arp basic \ + action mirred egress redirect dev veth1 + $NS_EXEC tc filter add dev lo parent 1: protocol ip basic \ + action mirred egress redirect dev veth1 + + # 4. (ns2) redirect veth2's ingress to lo's ingress + ip netns exec ns2 tc qdisc add dev veth2 ingress + ip netns exec ns2 tc filter add dev veth2 ingress protocol arp basic \ + action mirred ingress redirect dev lo + ip netns exec ns2 tc filter add dev veth2 ingress protocol ip basic \ + action mirred ingress redirect dev lo + + $NS_EXEC sysctl -qw net.ipv4.conf.all.rp_filter=1 + $NS_EXEC sysctl -qw net.ipv4.conf.all.accept_local=1 + $NS_EXEC sysctl -qw net.ipv4.conf.all.route_localnet=1 + ip netns exec ns2 sysctl -qw net.ipv4.conf.all.rp_filter=1 + ip netns exec ns2 sysctl -qw net.ipv4.conf.all.accept_local=1 + ip netns exec ns2 sysctl -qw net.ipv4.conf.all.route_localnet=1 + set +e + + run_cmd "ip netns exec ns2 ping -w1 -c1 192.0.2.1" + log_test $? 0 "rp_filter passes local packets" + + run_cmd "ip netns exec ns2 ping -w1 -c1 127.0.0.1" + log_test $? 0 "rp_filter passes loopback packets" + + cleanup +} + +################################################################################ +# Tests on nexthop spec + +# run 'ip route add' with given spec +add_rt() +{ + local desc="$1" + local erc=$2 + local vrf=$3 + local pfx=$4 + local gw=$5 + local dev=$6 + local cmd out rc + + [ "$vrf" = "-" ] && vrf="default" + [ -n "$gw" ] && gw="via $gw" + [ -n "$dev" ] && dev="dev $dev" + + cmd="$IP route add vrf $vrf $pfx $gw $dev" + if [ "$VERBOSE" = "1" ]; then + printf "\n COMMAND: $cmd\n" + fi + + out=$(eval $cmd 2>&1) + rc=$? + if [ "$VERBOSE" = "1" -a -n "$out" ]; then + echo " $out" + fi + log_test $rc $erc "$desc" +} + +fib4_nexthop() +{ + echo + echo "IPv4 nexthop tests" + + echo "<<< write me >>>" +} + +fib6_nexthop() +{ + local lldummy=$(get_linklocal dummy0) + local llv1=$(get_linklocal dummy0) + + if [ -z "$lldummy" ]; then + echo "Failed to get linklocal address for dummy0" + return 1 + fi + if [ -z "$llv1" ]; then + echo "Failed to get linklocal address for veth1" + return 1 + fi + + echo + echo "IPv6 nexthop tests" + + add_rt "Directly connected nexthop, unicast address" 0 \ + - 2001:db8:101::/64 2001:db8:1::2 + add_rt "Directly connected nexthop, unicast address with device" 0 \ + - 2001:db8:102::/64 2001:db8:1::2 "dummy0" + add_rt "Gateway is linklocal address" 0 \ + - 2001:db8:103::1/64 $llv1 "veth0" + + # fails because LL address requires a device + add_rt "Gateway is linklocal address, no device" 2 \ + - 2001:db8:104::1/64 $llv1 + + # local address can not be a gateway + add_rt "Gateway can not be local unicast address" 2 \ + - 2001:db8:105::/64 2001:db8:1::1 + add_rt "Gateway can not be local unicast address, with device" 2 \ + - 2001:db8:106::/64 2001:db8:1::1 "dummy0" + add_rt "Gateway can not be a local linklocal address" 2 \ + - 2001:db8:107::1/64 $lldummy "dummy0" + + # VRF tests + add_rt "Gateway can be local address in a VRF" 0 \ + - 2001:db8:108::/64 2001:db8:51::2 + add_rt "Gateway can be local address in a VRF, with device" 0 \ + - 2001:db8:109::/64 2001:db8:51::2 "veth0" + add_rt "Gateway can be local linklocal address in a VRF" 0 \ + - 2001:db8:110::1/64 $llv1 "veth0" + + add_rt "Redirect to VRF lookup" 0 \ + - 2001:db8:111::/64 "" "red" + + add_rt "VRF route, gateway can be local address in default VRF" 0 \ + red 2001:db8:112::/64 2001:db8:51::1 + + # local address in same VRF fails + add_rt "VRF route, gateway can not be a local address" 2 \ + red 2001:db8:113::1/64 2001:db8:2::1 + add_rt "VRF route, gateway can not be a local addr with device" 2 \ + red 2001:db8:114::1/64 2001:db8:2::1 "dummy1" +} + +# Default VRF: +# dummy0 - 198.51.100.1/24 2001:db8:1::1/64 +# veth0 - 192.0.2.1/24 2001:db8:51::1/64 +# +# VRF red: +# dummy1 - 192.168.2.1/24 2001:db8:2::1/64 +# veth1 - 192.0.2.2/24 2001:db8:51::2/64 +# +# [ dummy0 veth0 ]--[ veth1 dummy1 ] + +fib_nexthop_test() +{ + setup + + set -e + + $IP -4 rule add pref 32765 table local + $IP -4 rule del pref 0 + $IP -6 rule add pref 32765 table local + $IP -6 rule del pref 0 + + $IP link add red type vrf table 1 + $IP link set red up + $IP -4 route add vrf red unreachable default metric 4278198272 + $IP -6 route add vrf red unreachable default metric 4278198272 + + $IP link add veth0 type veth peer name veth1 + $IP link set dev veth0 up + $IP address add 192.0.2.1/24 dev veth0 + $IP -6 address add 2001:db8:51::1/64 dev veth0 + + $IP link set dev veth1 vrf red up + $IP address add 192.0.2.2/24 dev veth1 + $IP -6 address add 2001:db8:51::2/64 dev veth1 + + $IP link add dummy1 type dummy + $IP link set dev dummy1 vrf red up + $IP address add 192.168.2.1/24 dev dummy1 + $IP -6 address add 2001:db8:2::1/64 dev dummy1 + set +e + + sleep 1 + fib4_nexthop + fib6_nexthop + + ( + $IP link del dev dummy1 + $IP link del veth0 + $IP link del red + ) 2>/dev/null + cleanup +} + +fib_suppress_test() +{ + echo + echo "FIB rule with suppress_prefixlength" + setup + + $IP link add dummy1 type dummy + $IP link set dummy1 up + $IP -6 route add default dev dummy1 + $IP -6 rule add table main suppress_prefixlength 0 + ping -f -c 1000 -W 1 1234::1 >/dev/null 2>&1 + $IP -6 rule del table main suppress_prefixlength 0 + $IP link del dummy1 + + # If we got here without crashing, we're good. + log_test 0 0 "FIB rule suppress test" + + cleanup +} + +################################################################################ +# Tests on route add and replace + +run_cmd() +{ + local cmd="$1" + local out + local stderr="2>/dev/null" + + if [ "$VERBOSE" = "1" ]; then + printf " COMMAND: $cmd\n" + stderr= + fi + + out=$(eval $cmd $stderr) + rc=$? + if [ "$VERBOSE" = "1" -a -n "$out" ]; then + echo " $out" + fi + + [ "$VERBOSE" = "1" ] && echo + + return $rc +} + +check_expected() +{ + local out="$1" + local expected="$2" + local rc=0 + + [ "${out}" = "${expected}" ] && return 0 + + if [ -z "${out}" ]; then + if [ "$VERBOSE" = "1" ]; then + printf "\nNo route entry found\n" + printf "Expected:\n" + printf " ${expected}\n" + fi + return 1 + fi + + # tricky way to convert output to 1-line without ip's + # messy ''; this drops all extra white space + out=$(echo ${out}) + if [ "${out}" != "${expected}" ]; then + rc=1 + if [ "${VERBOSE}" = "1" ]; then + printf " Unexpected route entry. Have:\n" + printf " ${out}\n" + printf " Expected:\n" + printf " ${expected}\n\n" + fi + fi + + return $rc +} + +# add route for a prefix, flushing any existing routes first +# expected to be the first step of a test +add_route6() +{ + local pfx="$1" + local nh="$2" + local out + + if [ "$VERBOSE" = "1" ]; then + echo + echo " ##################################################" + echo + fi + + run_cmd "$IP -6 ro flush ${pfx}" + [ $? -ne 0 ] && exit 1 + + out=$($IP -6 ro ls match ${pfx}) + if [ -n "$out" ]; then + echo "Failed to flush routes for prefix used for tests." + exit 1 + fi + + run_cmd "$IP -6 ro add ${pfx} ${nh}" + if [ $? -ne 0 ]; then + echo "Failed to add initial route for test." + exit 1 + fi +} + +# add initial route - used in replace route tests +add_initial_route6() +{ + add_route6 "2001:db8:104::/64" "$1" +} + +check_route6() +{ + local pfx + local expected="$1" + local out + local rc=0 + + set -- $expected + pfx=$1 + + out=$($IP -6 ro ls match ${pfx} | sed -e 's/ pref medium//') + check_expected "${out}" "${expected}" +} + +route_cleanup() +{ + $IP li del red 2>/dev/null + $IP li del dummy1 2>/dev/null + $IP li del veth1 2>/dev/null + $IP li del veth3 2>/dev/null + + cleanup &> /dev/null +} + +route_setup() +{ + route_cleanup + setup + + [ "${VERBOSE}" = "1" ] && set -x + set -e + + ip netns add ns2 + ip netns set ns2 auto + ip -netns ns2 link set dev lo up + ip netns exec ns2 sysctl -qw net.ipv4.ip_forward=1 + ip netns exec ns2 sysctl -qw net.ipv6.conf.all.forwarding=1 + + $IP li add veth1 type veth peer name veth2 + $IP li add veth3 type veth peer name veth4 + + $IP li set veth1 up + $IP li set veth3 up + $IP li set veth2 netns ns2 up + $IP li set veth4 netns ns2 up + ip -netns ns2 li add dummy1 type dummy + ip -netns ns2 li set dummy1 up + + $IP -6 addr add 2001:db8:101::1/64 dev veth1 nodad + $IP -6 addr add 2001:db8:103::1/64 dev veth3 nodad + $IP addr add 172.16.101.1/24 dev veth1 + $IP addr add 172.16.103.1/24 dev veth3 + + ip -netns ns2 -6 addr add 2001:db8:101::2/64 dev veth2 nodad + ip -netns ns2 -6 addr add 2001:db8:103::2/64 dev veth4 nodad + ip -netns ns2 -6 addr add 2001:db8:104::1/64 dev dummy1 nodad + + ip -netns ns2 addr add 172.16.101.2/24 dev veth2 + ip -netns ns2 addr add 172.16.103.2/24 dev veth4 + ip -netns ns2 addr add 172.16.104.1/24 dev dummy1 + + set +e +} + +# assumption is that basic add of a single path route works +# otherwise just adding an address on an interface is broken +ipv6_rt_add() +{ + local rc + + echo + echo "IPv6 route add / append tests" + + # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL + add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" + run_cmd "$IP -6 ro add 2001:db8:104::/64 via 2001:db8:103::2" + log_test $? 2 "Attempt to add duplicate route - gw" + + # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL + add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" + run_cmd "$IP -6 ro add 2001:db8:104::/64 dev veth3" + log_test $? 2 "Attempt to add duplicate route - dev only" + + # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL + add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" + run_cmd "$IP -6 ro add unreachable 2001:db8:104::/64" + log_test $? 2 "Attempt to add duplicate route - reject route" + + # route append with same prefix adds a new route + # - iproute2 sets NLM_F_CREATE | NLM_F_APPEND + add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" + run_cmd "$IP -6 ro append 2001:db8:104::/64 via 2001:db8:103::2" + check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + log_test $? 0 "Append nexthop to existing route - gw" + + # insert mpath directly + add_route6 "2001:db8:104::/64" "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + log_test $? 0 "Add multipath route" + + add_route6 "2001:db8:104::/64" "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro add 2001:db8:104::/64 nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + log_test $? 2 "Attempt to add duplicate multipath route" + + # insert of a second route without append but different metric + add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" + run_cmd "$IP -6 ro add 2001:db8:104::/64 via 2001:db8:103::2 metric 512" + rc=$? + if [ $rc -eq 0 ]; then + run_cmd "$IP -6 ro add 2001:db8:104::/64 via 2001:db8:103::3 metric 256" + rc=$? + fi + log_test $rc 0 "Route add with different metrics" + + run_cmd "$IP -6 ro del 2001:db8:104::/64 metric 512" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:104::/64 via 2001:db8:103::3 dev veth3 metric 256 2001:db8:104::/64 via 2001:db8:101::2 dev veth1 metric 1024" + rc=$? + fi + log_test $rc 0 "Route delete with metric" +} + +ipv6_rt_replace_single() +{ + # single path with single path + # + add_initial_route6 "via 2001:db8:101::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 via 2001:db8:103::2" + check_route6 "2001:db8:104::/64 via 2001:db8:103::2 dev veth3 metric 1024" + log_test $? 0 "Single path with single path" + + # single path with multipath + # + add_initial_route6 "nexthop via 2001:db8:101::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:103::2" + check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::3 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + log_test $? 0 "Single path with multipath" + + # single path with single path using MULTIPATH attribute + # + add_initial_route6 "via 2001:db8:101::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:103::2" + check_route6 "2001:db8:104::/64 via 2001:db8:103::2 dev veth3 metric 1024" + log_test $? 0 "Single path with single path via multipath attribute" + + # route replace fails - invalid nexthop + add_initial_route6 "via 2001:db8:101::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 via 2001:db8:104::2" + if [ $? -eq 0 ]; then + # previous command is expected to fail so if it returns 0 + # that means the test failed. + log_test 0 1 "Invalid nexthop" + else + check_route6 "2001:db8:104::/64 via 2001:db8:101::2 dev veth1 metric 1024" + log_test $? 0 "Invalid nexthop" + fi + + # replace non-existent route + # - note use of change versus replace since ip adds NLM_F_CREATE + # for replace + add_initial_route6 "via 2001:db8:101::2" + run_cmd "$IP -6 ro change 2001:db8:105::/64 via 2001:db8:101::2" + log_test $? 2 "Single path - replace of non-existent route" +} + +ipv6_rt_replace_mpath() +{ + # multipath with multipath + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:103::3" + check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::3 dev veth1 weight 1 nexthop via 2001:db8:103::3 dev veth3 weight 1" + log_test $? 0 "Multipath with multipath" + + # multipath with single + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 via 2001:db8:101::3" + check_route6 "2001:db8:104::/64 via 2001:db8:101::3 dev veth1 metric 1024" + log_test $? 0 "Multipath with single path" + + # multipath with single + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3" + check_route6 "2001:db8:104::/64 via 2001:db8:101::3 dev veth1 metric 1024" + log_test $? 0 "Multipath with single path via multipath attribute" + + # multipath with dev-only + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 dev veth1" + check_route6 "2001:db8:104::/64 dev veth1 metric 1024" + log_test $? 0 "Multipath with dev-only" + + # route replace fails - invalid nexthop 1 + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:111::3 nexthop via 2001:db8:103::3" + check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + log_test $? 0 "Multipath - invalid first nexthop" + + # route replace fails - invalid nexthop 2 + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:113::3" + check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + log_test $? 0 "Multipath - invalid second nexthop" + + # multipath non-existent route + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro change 2001:db8:105::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:103::3" + log_test $? 2 "Multipath - replace of non-existent route" +} + +ipv6_rt_replace() +{ + echo + echo "IPv6 route replace tests" + + ipv6_rt_replace_single + ipv6_rt_replace_mpath +} + +ipv6_route_test() +{ + route_setup + + ipv6_rt_add + ipv6_rt_replace + + route_cleanup +} + +ip_addr_metric_check() +{ + ip addr help 2>&1 | grep -q metric + if [ $? -ne 0 ]; then + echo "iproute2 command does not support metric for addresses. Skipping test" + return 1 + fi + + return 0 +} + +ipv6_addr_metric_test() +{ + local rc + + echo + echo "IPv6 prefix route tests" + + ip_addr_metric_check || return 1 + + setup + + set -e + $IP li add dummy1 type dummy + $IP li add dummy2 type dummy + $IP li set dummy1 up + $IP li set dummy2 up + + # default entry is metric 256 + run_cmd "$IP -6 addr add dev dummy1 2001:db8:104::1/64" + run_cmd "$IP -6 addr add dev dummy2 2001:db8:104::2/64" + set +e + + check_route6 "2001:db8:104::/64 dev dummy1 proto kernel metric 256 2001:db8:104::/64 dev dummy2 proto kernel metric 256" + log_test $? 0 "Default metric" + + set -e + run_cmd "$IP -6 addr flush dev dummy1" + run_cmd "$IP -6 addr add dev dummy1 2001:db8:104::1/64 metric 257" + set +e + + check_route6 "2001:db8:104::/64 dev dummy2 proto kernel metric 256 2001:db8:104::/64 dev dummy1 proto kernel metric 257" + log_test $? 0 "User specified metric on first device" + + set -e + run_cmd "$IP -6 addr flush dev dummy2" + run_cmd "$IP -6 addr add dev dummy2 2001:db8:104::2/64 metric 258" + set +e + + check_route6 "2001:db8:104::/64 dev dummy1 proto kernel metric 257 2001:db8:104::/64 dev dummy2 proto kernel metric 258" + log_test $? 0 "User specified metric on second device" + + run_cmd "$IP -6 addr del dev dummy1 2001:db8:104::1/64 metric 257" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:104::/64 dev dummy2 proto kernel metric 258" + rc=$? + fi + log_test $rc 0 "Delete of address on first device" + + run_cmd "$IP -6 addr change dev dummy2 2001:db8:104::2/64 metric 259" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:104::/64 dev dummy2 proto kernel metric 259" + rc=$? + fi + log_test $rc 0 "Modify metric of address" + + # verify prefix route removed on down + run_cmd "ip netns exec ns1 sysctl -qw net.ipv6.conf.all.keep_addr_on_down=1" + run_cmd "$IP li set dev dummy2 down" + rc=$? + if [ $rc -eq 0 ]; then + out=$($IP -6 ro ls match 2001:db8:104::/64) + check_expected "${out}" "" + rc=$? + fi + log_test $rc 0 "Prefix route removed on link down" + + # verify prefix route re-inserted with assigned metric + run_cmd "$IP li set dev dummy2 up" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:104::/64 dev dummy2 proto kernel metric 259" + rc=$? + fi + log_test $rc 0 "Prefix route with metric on link up" + + # verify peer metric added correctly + set -e + run_cmd "$IP -6 addr flush dev dummy2" + run_cmd "$IP -6 addr add dev dummy2 2001:db8:104::1 peer 2001:db8:104::2 metric 260" + set +e + + check_route6 "2001:db8:104::1 dev dummy2 proto kernel metric 260" + log_test $? 0 "Set metric with peer route on local side" + check_route6 "2001:db8:104::2 dev dummy2 proto kernel metric 260" + log_test $? 0 "Set metric with peer route on peer side" + + set -e + run_cmd "$IP -6 addr change dev dummy2 2001:db8:104::1 peer 2001:db8:104::3 metric 261" + set +e + + check_route6 "2001:db8:104::1 dev dummy2 proto kernel metric 261" + log_test $? 0 "Modify metric and peer address on local side" + check_route6 "2001:db8:104::3 dev dummy2 proto kernel metric 261" + log_test $? 0 "Modify metric and peer address on peer side" + + $IP li del dummy1 + $IP li del dummy2 + cleanup +} + +ipv6_route_metrics_test() +{ + local rc + + echo + echo "IPv6 routes with metrics" + + route_setup + + # + # single path with metrics + # + run_cmd "$IP -6 ro add 2001:db8:111::/64 via 2001:db8:101::2 mtu 1400" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:111::/64 via 2001:db8:101::2 dev veth1 metric 1024 mtu 1400" + rc=$? + fi + log_test $rc 0 "Single path route with mtu metric" + + + # + # multipath via separate routes with metrics + # + run_cmd "$IP -6 ro add 2001:db8:112::/64 via 2001:db8:101::2 mtu 1400" + run_cmd "$IP -6 ro append 2001:db8:112::/64 via 2001:db8:103::2" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:112::/64 metric 1024 mtu 1400 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + rc=$? + fi + log_test $rc 0 "Multipath route via 2 single routes with mtu metric on first" + + # second route is coalesced to first to make a multipath route. + # MTU of the second path is hidden from display! + run_cmd "$IP -6 ro add 2001:db8:113::/64 via 2001:db8:101::2" + run_cmd "$IP -6 ro append 2001:db8:113::/64 via 2001:db8:103::2 mtu 1400" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:113::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + rc=$? + fi + log_test $rc 0 "Multipath route via 2 single routes with mtu metric on 2nd" + + run_cmd "$IP -6 ro del 2001:db8:113::/64 via 2001:db8:101::2" + if [ $? -eq 0 ]; then + check_route6 "2001:db8:113::/64 via 2001:db8:103::2 dev veth3 metric 1024 mtu 1400" + log_test $? 0 " MTU of second leg" + fi + + # + # multipath with metrics + # + run_cmd "$IP -6 ro add 2001:db8:115::/64 mtu 1400 nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:115::/64 metric 1024 mtu 1400 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + rc=$? + fi + log_test $rc 0 "Multipath route with mtu metric" + + $IP -6 ro add 2001:db8:104::/64 via 2001:db8:101::2 mtu 1300 + run_cmd "ip netns exec ns1 ${ping6} -w1 -c1 -s 1500 2001:db8:104::1" + log_test $? 0 "Using route with mtu metric" + + run_cmd "$IP -6 ro add 2001:db8:114::/64 via 2001:db8:101::2 congctl lock foo" + log_test $? 2 "Invalid metric (fails metric_convert)" + + route_cleanup +} + +# add route for a prefix, flushing any existing routes first +# expected to be the first step of a test +add_route() +{ + local pfx="$1" + local nh="$2" + local out + + if [ "$VERBOSE" = "1" ]; then + echo + echo " ##################################################" + echo + fi + + run_cmd "$IP ro flush ${pfx}" + [ $? -ne 0 ] && exit 1 + + out=$($IP ro ls match ${pfx}) + if [ -n "$out" ]; then + echo "Failed to flush routes for prefix used for tests." + exit 1 + fi + + run_cmd "$IP ro add ${pfx} ${nh}" + if [ $? -ne 0 ]; then + echo "Failed to add initial route for test." + exit 1 + fi +} + +# add initial route - used in replace route tests +add_initial_route() +{ + add_route "172.16.104.0/24" "$1" +} + +check_route() +{ + local pfx + local expected="$1" + local out + + set -- $expected + pfx=$1 + [ "${pfx}" = "unreachable" ] && pfx=$2 + + out=$($IP ro ls match ${pfx}) + check_expected "${out}" "${expected}" +} + +# assumption is that basic add of a single path route works +# otherwise just adding an address on an interface is broken +ipv4_rt_add() +{ + local rc + + echo + echo "IPv4 route add / append tests" + + # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro add 172.16.104.0/24 via 172.16.103.2" + log_test $? 2 "Attempt to add duplicate route - gw" + + # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro add 172.16.104.0/24 dev veth3" + log_test $? 2 "Attempt to add duplicate route - dev only" + + # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro add unreachable 172.16.104.0/24" + log_test $? 2 "Attempt to add duplicate route - reject route" + + # iproute2 prepend only sets NLM_F_CREATE + # - adds a new route; does NOT convert existing route to ECMP + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro prepend 172.16.104.0/24 via 172.16.103.2" + check_route "172.16.104.0/24 via 172.16.103.2 dev veth3 172.16.104.0/24 via 172.16.101.2 dev veth1" + log_test $? 0 "Add new nexthop for existing prefix" + + # route append with same prefix adds a new route + # - iproute2 sets NLM_F_CREATE | NLM_F_APPEND + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro append 172.16.104.0/24 via 172.16.103.2" + check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 172.16.104.0/24 via 172.16.103.2 dev veth3" + log_test $? 0 "Append nexthop to existing route - gw" + + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro append 172.16.104.0/24 dev veth3" + check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 172.16.104.0/24 dev veth3 scope link" + log_test $? 0 "Append nexthop to existing route - dev only" + + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro append unreachable 172.16.104.0/24" + check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 unreachable 172.16.104.0/24" + log_test $? 0 "Append nexthop to existing route - reject route" + + run_cmd "$IP ro flush 172.16.104.0/24" + run_cmd "$IP ro add unreachable 172.16.104.0/24" + run_cmd "$IP ro append 172.16.104.0/24 via 172.16.103.2" + check_route "unreachable 172.16.104.0/24 172.16.104.0/24 via 172.16.103.2 dev veth3" + log_test $? 0 "Append nexthop to existing reject route - gw" + + run_cmd "$IP ro flush 172.16.104.0/24" + run_cmd "$IP ro add unreachable 172.16.104.0/24" + run_cmd "$IP ro append 172.16.104.0/24 dev veth3" + check_route "unreachable 172.16.104.0/24 172.16.104.0/24 dev veth3 scope link" + log_test $? 0 "Append nexthop to existing reject route - dev only" + + # insert mpath directly + add_route "172.16.104.0/24" "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + check_route "172.16.104.0/24 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" + log_test $? 0 "add multipath route" + + add_route "172.16.104.0/24" "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro add 172.16.104.0/24 nexthop via 172.16.101.2 nexthop via 172.16.103.2" + log_test $? 2 "Attempt to add duplicate multipath route" + + # insert of a second route without append but different metric + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro add 172.16.104.0/24 via 172.16.103.2 metric 512" + rc=$? + if [ $rc -eq 0 ]; then + run_cmd "$IP ro add 172.16.104.0/24 via 172.16.103.3 metric 256" + rc=$? + fi + log_test $rc 0 "Route add with different metrics" + + run_cmd "$IP ro del 172.16.104.0/24 metric 512" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 172.16.104.0/24 via 172.16.103.3 dev veth3 metric 256" + rc=$? + fi + log_test $rc 0 "Route delete with metric" +} + +ipv4_rt_replace_single() +{ + # single path with single path + # + add_initial_route "via 172.16.101.2" + run_cmd "$IP ro replace 172.16.104.0/24 via 172.16.103.2" + check_route "172.16.104.0/24 via 172.16.103.2 dev veth3" + log_test $? 0 "Single path with single path" + + # single path with multipath + # + add_initial_route "nexthop via 172.16.101.2" + run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3 nexthop via 172.16.103.2" + check_route "172.16.104.0/24 nexthop via 172.16.101.3 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" + log_test $? 0 "Single path with multipath" + + # single path with reject + # + add_initial_route "nexthop via 172.16.101.2" + run_cmd "$IP ro replace unreachable 172.16.104.0/24" + check_route "unreachable 172.16.104.0/24" + log_test $? 0 "Single path with reject route" + + # single path with single path using MULTIPATH attribute + # + add_initial_route "via 172.16.101.2" + run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.103.2" + check_route "172.16.104.0/24 via 172.16.103.2 dev veth3" + log_test $? 0 "Single path with single path via multipath attribute" + + # route replace fails - invalid nexthop + add_initial_route "via 172.16.101.2" + run_cmd "$IP ro replace 172.16.104.0/24 via 2001:db8:104::2" + if [ $? -eq 0 ]; then + # previous command is expected to fail so if it returns 0 + # that means the test failed. + log_test 0 1 "Invalid nexthop" + else + check_route "172.16.104.0/24 via 172.16.101.2 dev veth1" + log_test $? 0 "Invalid nexthop" + fi + + # replace non-existent route + # - note use of change versus replace since ip adds NLM_F_CREATE + # for replace + add_initial_route "via 172.16.101.2" + run_cmd "$IP ro change 172.16.105.0/24 via 172.16.101.2" + log_test $? 2 "Single path - replace of non-existent route" +} + +ipv4_rt_replace_mpath() +{ + # multipath with multipath + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3 nexthop via 172.16.103.3" + check_route "172.16.104.0/24 nexthop via 172.16.101.3 dev veth1 weight 1 nexthop via 172.16.103.3 dev veth3 weight 1" + log_test $? 0 "Multipath with multipath" + + # multipath with single + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro replace 172.16.104.0/24 via 172.16.101.3" + check_route "172.16.104.0/24 via 172.16.101.3 dev veth1" + log_test $? 0 "Multipath with single path" + + # multipath with single + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3" + check_route "172.16.104.0/24 via 172.16.101.3 dev veth1" + log_test $? 0 "Multipath with single path via multipath attribute" + + # multipath with reject + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro replace unreachable 172.16.104.0/24" + check_route "unreachable 172.16.104.0/24" + log_test $? 0 "Multipath with reject route" + + # route replace fails - invalid nexthop 1 + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.111.3 nexthop via 172.16.103.3" + check_route "172.16.104.0/24 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" + log_test $? 0 "Multipath - invalid first nexthop" + + # route replace fails - invalid nexthop 2 + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3 nexthop via 172.16.113.3" + check_route "172.16.104.0/24 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" + log_test $? 0 "Multipath - invalid second nexthop" + + # multipath non-existent route + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro change 172.16.105.0/24 nexthop via 172.16.101.3 nexthop via 172.16.103.3" + log_test $? 2 "Multipath - replace of non-existent route" +} + +ipv4_rt_replace() +{ + echo + echo "IPv4 route replace tests" + + ipv4_rt_replace_single + ipv4_rt_replace_mpath +} + +ipv4_route_test() +{ + route_setup + + ipv4_rt_add + ipv4_rt_replace + + route_cleanup +} + +ipv4_addr_metric_test() +{ + local rc + + echo + echo "IPv4 prefix route tests" + + ip_addr_metric_check || return 1 + + setup + + set -e + $IP li add dummy1 type dummy + $IP li add dummy2 type dummy + $IP li set dummy1 up + $IP li set dummy2 up + + # default entry is metric 256 + run_cmd "$IP addr add dev dummy1 172.16.104.1/24" + run_cmd "$IP addr add dev dummy2 172.16.104.2/24" + set +e + + check_route "172.16.104.0/24 dev dummy1 proto kernel scope link src 172.16.104.1 172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2" + log_test $? 0 "Default metric" + + set -e + run_cmd "$IP addr flush dev dummy1" + run_cmd "$IP addr add dev dummy1 172.16.104.1/24 metric 257" + set +e + + check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 172.16.104.0/24 dev dummy1 proto kernel scope link src 172.16.104.1 metric 257" + log_test $? 0 "User specified metric on first device" + + set -e + run_cmd "$IP addr flush dev dummy2" + run_cmd "$IP addr add dev dummy2 172.16.104.2/24 metric 258" + set +e + + check_route "172.16.104.0/24 dev dummy1 proto kernel scope link src 172.16.104.1 metric 257 172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 metric 258" + log_test $? 0 "User specified metric on second device" + + run_cmd "$IP addr del dev dummy1 172.16.104.1/24 metric 257" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 metric 258" + rc=$? + fi + log_test $rc 0 "Delete of address on first device" + + run_cmd "$IP addr change dev dummy2 172.16.104.2/24 metric 259" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 metric 259" + rc=$? + fi + log_test $rc 0 "Modify metric of address" + + # verify prefix route removed on down + run_cmd "$IP li set dev dummy2 down" + rc=$? + if [ $rc -eq 0 ]; then + out=$($IP ro ls match 172.16.104.0/24) + check_expected "${out}" "" + rc=$? + fi + log_test $rc 0 "Prefix route removed on link down" + + # verify prefix route re-inserted with assigned metric + run_cmd "$IP li set dev dummy2 up" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 metric 259" + rc=$? + fi + log_test $rc 0 "Prefix route with metric on link up" + + # explicitly check for metric changes on edge scenarios + run_cmd "$IP addr flush dev dummy2" + run_cmd "$IP addr add dev dummy2 172.16.104.0/24 metric 259" + run_cmd "$IP addr change dev dummy2 172.16.104.0/24 metric 260" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.0 metric 260" + rc=$? + fi + log_test $rc 0 "Modify metric of .0/24 address" + + run_cmd "$IP addr flush dev dummy2" + run_cmd "$IP addr add dev dummy2 172.16.104.1/32 peer 172.16.104.2 metric 260" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.2 dev dummy2 proto kernel scope link src 172.16.104.1 metric 260" + rc=$? + fi + log_test $rc 0 "Set metric of address with peer route" + + run_cmd "$IP addr change dev dummy2 172.16.104.1/32 peer 172.16.104.3 metric 261" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.3 dev dummy2 proto kernel scope link src 172.16.104.1 metric 261" + rc=$? + fi + log_test $rc 0 "Modify metric and peer address for peer route" + + $IP li del dummy1 + $IP li del dummy2 + cleanup +} + +ipv4_route_metrics_test() +{ + local rc + + echo + echo "IPv4 route add / append tests" + + route_setup + + run_cmd "$IP ro add 172.16.111.0/24 via 172.16.101.2 mtu 1400" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.111.0/24 via 172.16.101.2 dev veth1 mtu 1400" + rc=$? + fi + log_test $rc 0 "Single path route with mtu metric" + + + run_cmd "$IP ro add 172.16.112.0/24 mtu 1400 nexthop via 172.16.101.2 nexthop via 172.16.103.2" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.112.0/24 mtu 1400 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" + rc=$? + fi + log_test $rc 0 "Multipath route with mtu metric" + + $IP ro add 172.16.104.0/24 via 172.16.101.2 mtu 1300 + run_cmd "ip netns exec ns1 ping -w1 -c1 -s 1500 172.16.104.1" + log_test $? 0 "Using route with mtu metric" + + run_cmd "$IP ro add 172.16.111.0/24 via 172.16.101.2 congctl lock foo" + log_test $? 2 "Invalid metric (fails metric_convert)" + + route_cleanup +} + +ipv4_route_v6_gw_test() +{ + local rc + + echo + echo "IPv4 route with IPv6 gateway tests" + + route_setup + sleep 2 + + # + # single path route + # + run_cmd "$IP ro add 172.16.104.0/24 via inet6 2001:db8:101::2" + rc=$? + log_test $rc 0 "Single path route with IPv6 gateway" + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 via inet6 2001:db8:101::2 dev veth1" + fi + + run_cmd "ip netns exec ns1 ping -w1 -c1 172.16.104.1" + log_test $rc 0 "Single path route with IPv6 gateway - ping" + + run_cmd "$IP ro del 172.16.104.0/24 via inet6 2001:db8:101::2" + rc=$? + log_test $rc 0 "Single path route delete" + if [ $rc -eq 0 ]; then + check_route "172.16.112.0/24" + fi + + # + # multipath - v6 then v4 + # + run_cmd "$IP ro add 172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 nexthop via 172.16.103.2 dev veth3" + rc=$? + log_test $rc 0 "Multipath route add - v6 nexthop then v4" + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" + fi + + run_cmd "$IP ro del 172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 nexthop via inet6 2001:db8:101::2 dev veth1" + log_test $? 2 " Multipath route delete - nexthops in wrong order" + + run_cmd "$IP ro del 172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 nexthop via 172.16.103.2 dev veth3" + log_test $? 0 " Multipath route delete exact match" + + # + # multipath - v4 then v6 + # + run_cmd "$IP ro add 172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 nexthop via inet6 2001:db8:101::2 dev veth1" + rc=$? + log_test $rc 0 "Multipath route add - v4 nexthop then v6" + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 weight 1 nexthop via inet6 2001:db8:101::2 dev veth1 weight 1" + fi + + run_cmd "$IP ro del 172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 nexthop via 172.16.103.2 dev veth3" + log_test $? 2 " Multipath route delete - nexthops in wrong order" + + run_cmd "$IP ro del 172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 nexthop via inet6 2001:db8:101::2 dev veth1" + log_test $? 0 " Multipath route delete exact match" + + route_cleanup +} + +################################################################################ +# usage + +usage() +{ + cat <<EOF +usage: ${0##*/} OPTS + + -t <test> Test(s) to run (default: all) + (options: $TESTS) + -p Pause on fail + -P Pause after each test before cleanup + -v verbose mode (show commands and output) +EOF +} + +################################################################################ +# main + +while getopts :t:pPhv o +do + case $o in + t) TESTS=$OPTARG;; + p) PAUSE_ON_FAIL=yes;; + P) PAUSE=yes;; + v) VERBOSE=$(($VERBOSE + 1));; + h) usage; exit 0;; + *) usage; exit 1;; + esac +done + +PEER_CMD="ip netns exec ${PEER_NS}" + +# make sure we don't pause twice +[ "${PAUSE}" = "yes" ] && PAUSE_ON_FAIL=no + +if [ "$(id -u)" -ne 0 ];then + echo "SKIP: Need root privileges" + exit $ksft_skip; +fi + +if [ ! -x "$(command -v ip)" ]; then + echo "SKIP: Could not run test without ip tool" + exit $ksft_skip +fi + +ip route help 2>&1 | grep -q fibmatch +if [ $? -ne 0 ]; then + echo "SKIP: iproute2 too old, missing fibmatch" + exit $ksft_skip +fi + +# start clean +cleanup &> /dev/null + +for t in $TESTS +do + case $t in + fib_unreg_test|unregister) fib_unreg_test;; + fib_down_test|down) fib_down_test;; + fib_carrier_test|carrier) fib_carrier_test;; + fib_rp_filter_test|rp_filter) fib_rp_filter_test;; + fib_nexthop_test|nexthop) fib_nexthop_test;; + fib_suppress_test|suppress) fib_suppress_test;; + ipv6_route_test|ipv6_rt) ipv6_route_test;; + ipv4_route_test|ipv4_rt) ipv4_route_test;; + ipv6_addr_metric) ipv6_addr_metric_test;; + ipv4_addr_metric) ipv4_addr_metric_test;; + ipv6_route_metrics) ipv6_route_metrics_test;; + ipv4_route_metrics) ipv4_route_metrics_test;; + ipv4_route_v6_gw) ipv4_route_v6_gw_test;; + + help) echo "Test names: $TESTS"; exit 0;; + esac +done + +if [ "$TESTS" != "none" ]; then + printf "\nTests passed: %3d\n" ${nsuccess} + printf "Tests failed: %3d\n" ${nfail} +fi + +exit $ret
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit f96a3d74554df537b6db5c99c27c80e7afadc8d1 ]
Cited commit added the table ID to the FIB info structure, but did not prevent structures with different table IDs from being consolidated. This can lead to routes being flushed from a VRF when an address is deleted from a different VRF.
Fix by taking the table ID into account when looking for a matching FIB info. This is already done for FIB info structures backed by a nexthop object in fib_find_info_nh().
Add test cases that fail before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [FAIL] TEST: Route in default VRF not removed [ OK ] RTNETLINK answers: File exists TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [FAIL]
Tests passed: 6 Tests failed: 2
And pass after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ]
Tests passed: 8 Tests failed: 0
Fixes: 5a56a0b3a45d ("net: Don't delete routes in different VRFs") Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Shaoying Xu shaoyi@amazon.com --- net/ipv4/fib_semantics.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 908913d75847..f45b9daf62cf 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -420,6 +420,7 @@ static struct fib_info *fib_find_info(struct fib_info *nfi) nfi->fib_prefsrc == fi->fib_prefsrc && nfi->fib_priority == fi->fib_priority && nfi->fib_type == fi->fib_type && + nfi->fib_tb_id == fi->fib_tb_id && memcmp(nfi->fib_metrics, fi->fib_metrics, sizeof(u32) * RTAX_MAX) == 0 && !((nfi->fib_flags ^ fi->fib_flags) & ~RTNH_COMPARE_MASK) &&
On 2/7/23 11:28 AM, Shaoying Xu wrote:
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit f96a3d74554df537b6db5c99c27c80e7afadc8d1 ]
That commit, does not have this change:
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 908913d75847..f45b9daf62cf 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -420,6 +420,7 @@ static struct fib_info *fib_find_info(struct fib_info *nfi) nfi->fib_prefsrc == fi->fib_prefsrc && nfi->fib_priority == fi->fib_priority && nfi->fib_type == fi->fib_type &&
memcmp(nfi->fib_metrics, fi->fib_metrics, sizeof(u32) * RTAX_MAX) == 0 && !((nfi->fib_flags ^ fi->fib_flags) & ~RTNH_COMPARE_MASK) &&nfi->fib_tb_id == fi->fib_tb_id &&
On 2/7/23 7:22 PM, David Ahern wrote:
On 2/7/23 11:28 AM, Shaoying Xu wrote:
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit f96a3d74554df537b6db5c99c27c80e7afadc8d1 ]
That commit, does not have this change:
nevermind. I was looking at the other commit with that Fixes tag.
You did drop the selftests associated with the commit.
On Tue, Feb 07, 2023 at 07:23:41PM -0700, David Ahern wrote:
CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
On 2/7/23 7:22 PM, David Ahern wrote:
On 2/7/23 11:28 AM, Shaoying Xu wrote:
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit f96a3d74554df537b6db5c99c27c80e7afadc8d1 ]
That commit, does not have this change:
nevermind. I was looking at the other commit with that Fixes tag.
You did drop the selftests associated with the commit.
Yes because ipv4_del_addr test does not exist in fib_tests.sh of kernel 5.4 and similar drop in the commit ("ipv4: Fix incorrect route flushing when table ID 0 is used") [1].
[1] https://lore.kernel.org/stable/20221212130920.527377379@linuxfoundation.org/
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit c0d999348e01df03e0a7f550351f3907fabbf611 ]
Cited commit added the table ID to the FIB info structure, but did not properly initialize it when table ID 0 is used. This can lead to a route in the default VRF with a preferred source address not being flushed when the address is deleted.
Consider the following example:
# ip address add dev dummy1 192.0.2.1/28 # ip address add dev dummy1 192.0.2.17/28 # ip route add 198.51.100.0/24 via 192.0.2.2 src 192.0.2.17 metric 100 # ip route add table 0 198.51.100.0/24 via 192.0.2.2 src 192.0.2.17 metric 200 # ip route show 198.51.100.0/24 198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 100 198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 200
Both routes are installed in the default VRF, but they are using two different FIB info structures. One with a metric of 100 and table ID of 254 (main) and one with a metric of 200 and table ID of 0. Therefore, when the preferred source address is deleted from the default VRF, the second route is not flushed:
# ip address del dev dummy1 192.0.2.17/28 # ip route show 198.51.100.0/24 198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 200
Fix by storing a table ID of 254 instead of 0 in the route configuration structure.
Add a test case that fails before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Table ID 0 TEST: Route removed in default VRF when source address deleted [FAIL]
Tests passed: 8 Tests failed: 1
And passes after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Table ID 0 TEST: Route removed in default VRF when source address deleted [ OK ]
Tests passed: 9 Tests failed: 0
Fixes: 5a56a0b3a45d ("net: Don't delete routes in different VRFs") Reported-by: Donald Sharp sharpd@nvidia.com Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/fib_frontend.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index d38c8ca93ba0..be31eeacb0be 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -840,6 +840,9 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb, return -EINVAL; }
+ if (!cfg->fc_table) + cfg->fc_table = RT_TABLE_MAIN; + return 0; errout: return err;
From: Zhang Changzhong zhangchangzhong@huawei.com
[ Upstream commit 063a932b64db3317ec020c94466fe52923a15f60 ]
The greth_init_rings() function won't free the newly allocated skb when dma_mapping_error() returns error, so add dev_kfree_skb() to fix it.
Compile tested only.
Fixes: d4c41139df6e ("net: Add Aeroflex Gaisler 10/100/1G Ethernet MAC driver") Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Link: https://lore.kernel.org/r/1670134149-29516-1-git-send-email-zhangchangzhong@... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/aeroflex/greth.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/aeroflex/greth.c b/drivers/net/ethernet/aeroflex/greth.c index 907904c0a288..19e2b838750c 100644 --- a/drivers/net/ethernet/aeroflex/greth.c +++ b/drivers/net/ethernet/aeroflex/greth.c @@ -258,6 +258,7 @@ static int greth_init_rings(struct greth_private *greth) if (dma_mapping_error(greth->dev, dma_addr)) { if (netif_msg_ifup(greth)) dev_err(greth->dev, "Could not create initial DMA mapping\n"); + dev_kfree_skb(skb); goto cleanup; } greth->rx_skbuff[i] = skb;
From: Juergen Gross jgross@suse.com
[ Upstream commit 7dfa764e0223a324366a2a1fc056d4d9d4e95491 ]
Commit ad7f402ae4f4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area") introduced a (valid) build warning. There have even been reports of this problem breaking networking of Xen guests.
Fixes: ad7f402ae4f4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area") Signed-off-by: Juergen Gross jgross@suse.com Reviewed-by: Jan Beulich jbeulich@suse.com Reviewed-by: Ross Lagerwall ross.lagerwall@citrix.com Tested-by: Jason Andryuk jandryuk@gmail.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/xen-netback/netback.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 982e501173f1..036459670fc3 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -523,7 +523,7 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue, const bool sharedslot = nr_frags && frag_get_pending_idx(&shinfo->frags[0]) == copy_pending_idx(skb, copy_count(skb) - 1); - int i, err; + int i, err = 0;
for (i = 0; i < copy_count(skb); i++) { int newerr;
From: Yang Yingliang yangyingliang@huawei.com
[ Upstream commit 7d8c19bfc8ff3f78e5337107ca9246327fcb6b45 ]
It is not allowed to call kfree_skb() or consume_skb() from hardware interrupt context or with interrupts being disabled. So replace kfree_skb/dev_kfree_skb() with dev_kfree_skb_irq() and dev_consume_skb_irq() under spin_lock_irq().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Yang Yingliang yangyingliang@huawei.com Reviewed-by: Jiri Pirko jiri@nvidia.com Link: https://lore.kernel.org/r/20221207015310.2984909-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/plip/plip.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/plip/plip.c b/drivers/net/plip/plip.c index e8b7d596d749..4b50c28f01a7 100644 --- a/drivers/net/plip/plip.c +++ b/drivers/net/plip/plip.c @@ -444,12 +444,12 @@ plip_bh_timeout_error(struct net_device *dev, struct net_local *nl, } rcv->state = PLIP_PK_DONE; if (rcv->skb) { - kfree_skb(rcv->skb); + dev_kfree_skb_irq(rcv->skb); rcv->skb = NULL; } snd->state = PLIP_PK_DONE; if (snd->skb) { - dev_kfree_skb(snd->skb); + dev_consume_skb_irq(snd->skb); snd->skb = NULL; } spin_unlock_irq(&nl->lock);
From: Eric Dumazet edumazet@google.com
[ Upstream commit 803e84867de59a1e5d126666d25eb4860cfd2ebe ]
Blamed commit claimed rcu_read_lock() was held by ip6_fragment() callers.
It seems to not be always true, at least for UDP stack.
syzbot reported:
BUG: KASAN: use-after-free in ip6_dst_idev include/net/ip6_fib.h:245 [inline] BUG: KASAN: use-after-free in ip6_fragment+0x2724/0x2770 net/ipv6/ip6_output.c:951 Read of size 8 at addr ffff88801d403e80 by task syz-executor.3/7618
CPU: 1 PID: 7618 Comm: syz-executor.3 Not tainted 6.1.0-rc6-syzkaller-00012-g4312098baf37 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:284 [inline] print_report+0x15e/0x45d mm/kasan/report.c:395 kasan_report+0xbf/0x1f0 mm/kasan/report.c:495 ip6_dst_idev include/net/ip6_fib.h:245 [inline] ip6_fragment+0x2724/0x2770 net/ipv6/ip6_output.c:951 __ip6_finish_output net/ipv6/ip6_output.c:193 [inline] ip6_finish_output+0x9a3/0x1170 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip6_output+0x1f1/0x540 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:445 [inline] ip6_local_out+0xb3/0x1a0 net/ipv6/output_core.c:161 ip6_send_skb+0xbb/0x340 net/ipv6/ip6_output.c:1966 udp_v6_send_skb+0x82a/0x18a0 net/ipv6/udp.c:1286 udp_v6_push_pending_frames+0x140/0x200 net/ipv6/udp.c:1313 udpv6_sendmsg+0x18da/0x2c80 net/ipv6/udp.c:1606 inet6_sendmsg+0x9d/0xe0 net/ipv6/af_inet6.c:665 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 sock_write_iter+0x295/0x3d0 net/socket.c:1108 call_write_iter include/linux/fs.h:2191 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x9ed/0xdd0 fs/read_write.c:584 ksys_write+0x1ec/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fde3588c0d9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fde365b6168 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007fde359ac050 RCX: 00007fde3588c0d9 RDX: 000000000000ffdc RSI: 00000000200000c0 RDI: 000000000000000a RBP: 00007fde358e7ae9 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fde35acfb1f R14: 00007fde365b6300 R15: 0000000000022000 </TASK>
Allocated by task 7618: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 __kasan_slab_alloc+0x82/0x90 mm/kasan/common.c:325 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slab.h:737 [inline] slab_alloc_node mm/slub.c:3398 [inline] slab_alloc mm/slub.c:3406 [inline] __kmem_cache_alloc_lru mm/slub.c:3413 [inline] kmem_cache_alloc+0x2b4/0x3d0 mm/slub.c:3422 dst_alloc+0x14a/0x1f0 net/core/dst.c:92 ip6_dst_alloc+0x32/0xa0 net/ipv6/route.c:344 ip6_rt_pcpu_alloc net/ipv6/route.c:1369 [inline] rt6_make_pcpu_route net/ipv6/route.c:1417 [inline] ip6_pol_route+0x901/0x1190 net/ipv6/route.c:2254 pol_lookup_func include/net/ip6_fib.h:582 [inline] fib6_rule_lookup+0x52e/0x6f0 net/ipv6/fib6_rules.c:121 ip6_route_output_flags_noref+0x2e6/0x380 net/ipv6/route.c:2625 ip6_route_output_flags+0x76/0x320 net/ipv6/route.c:2638 ip6_route_output include/net/ip6_route.h:98 [inline] ip6_dst_lookup_tail+0x5ab/0x1620 net/ipv6/ip6_output.c:1092 ip6_dst_lookup_flow+0x90/0x1d0 net/ipv6/ip6_output.c:1222 ip6_sk_dst_lookup_flow+0x553/0x980 net/ipv6/ip6_output.c:1260 udpv6_sendmsg+0x151d/0x2c80 net/ipv6/udp.c:1554 inet6_sendmsg+0x9d/0xe0 net/ipv6/af_inet6.c:665 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 __sys_sendto+0x23a/0x340 net/socket.c:2117 __do_sys_sendto net/socket.c:2129 [inline] __se_sys_sendto net/socket.c:2125 [inline] __x64_sys_sendto+0xe1/0x1b0 net/socket.c:2125 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 7599: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2e/0x40 mm/kasan/generic.c:511 ____kasan_slab_free mm/kasan/common.c:236 [inline] ____kasan_slab_free+0x160/0x1c0 mm/kasan/common.c:200 kasan_slab_free include/linux/kasan.h:177 [inline] slab_free_hook mm/slub.c:1724 [inline] slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1750 slab_free mm/slub.c:3661 [inline] kmem_cache_free+0xee/0x5c0 mm/slub.c:3683 dst_destroy+0x2ea/0x400 net/core/dst.c:127 rcu_do_batch kernel/rcu/tree.c:2250 [inline] rcu_core+0x81f/0x1980 kernel/rcu/tree.c:2510 __do_softirq+0x1fb/0xadc kernel/softirq.c:571
Last potentially related work creation: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 __kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:481 call_rcu+0x9d/0x820 kernel/rcu/tree.c:2798 dst_release net/core/dst.c:177 [inline] dst_release+0x7d/0xe0 net/core/dst.c:167 refdst_drop include/net/dst.h:256 [inline] skb_dst_drop include/net/dst.h:268 [inline] skb_release_head_state+0x250/0x2a0 net/core/skbuff.c:838 skb_release_all net/core/skbuff.c:852 [inline] __kfree_skb net/core/skbuff.c:868 [inline] kfree_skb_reason+0x151/0x4b0 net/core/skbuff.c:891 kfree_skb_list_reason+0x4b/0x70 net/core/skbuff.c:901 kfree_skb_list include/linux/skbuff.h:1227 [inline] ip6_fragment+0x2026/0x2770 net/ipv6/ip6_output.c:949 __ip6_finish_output net/ipv6/ip6_output.c:193 [inline] ip6_finish_output+0x9a3/0x1170 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip6_output+0x1f1/0x540 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:445 [inline] ip6_local_out+0xb3/0x1a0 net/ipv6/output_core.c:161 ip6_send_skb+0xbb/0x340 net/ipv6/ip6_output.c:1966 udp_v6_send_skb+0x82a/0x18a0 net/ipv6/udp.c:1286 udp_v6_push_pending_frames+0x140/0x200 net/ipv6/udp.c:1313 udpv6_sendmsg+0x18da/0x2c80 net/ipv6/udp.c:1606 inet6_sendmsg+0x9d/0xe0 net/ipv6/af_inet6.c:665 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 sock_write_iter+0x295/0x3d0 net/socket.c:1108 call_write_iter include/linux/fs.h:2191 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x9ed/0xdd0 fs/read_write.c:584 ksys_write+0x1ec/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Second to last potentially related work creation: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 __kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:481 call_rcu+0x9d/0x820 kernel/rcu/tree.c:2798 dst_release net/core/dst.c:177 [inline] dst_release+0x7d/0xe0 net/core/dst.c:167 refdst_drop include/net/dst.h:256 [inline] skb_dst_drop include/net/dst.h:268 [inline] __dev_queue_xmit+0x1b9d/0x3ba0 net/core/dev.c:4211 dev_queue_xmit include/linux/netdevice.h:3008 [inline] neigh_resolve_output net/core/neighbour.c:1552 [inline] neigh_resolve_output+0x51b/0x840 net/core/neighbour.c:1532 neigh_output include/net/neighbour.h:546 [inline] ip6_finish_output2+0x56c/0x1530 net/ipv6/ip6_output.c:134 __ip6_finish_output net/ipv6/ip6_output.c:195 [inline] ip6_finish_output+0x694/0x1170 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip6_output+0x1f1/0x540 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:445 [inline] NF_HOOK include/linux/netfilter.h:302 [inline] NF_HOOK include/linux/netfilter.h:296 [inline] mld_sendpack+0xa09/0xe70 net/ipv6/mcast.c:1820 mld_send_cr net/ipv6/mcast.c:2121 [inline] mld_ifc_work+0x720/0xdc0 net/ipv6/mcast.c:2653 process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289 worker_thread+0x669/0x1090 kernel/workqueue.c:2436 kthread+0x2e8/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
The buggy address belongs to the object at ffff88801d403dc0 which belongs to the cache ip6_dst_cache of size 240 The buggy address is located 192 bytes inside of 240-byte region [ffff88801d403dc0, ffff88801d403eb0)
The buggy address belongs to the physical page: page:ffffea00007500c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1d403 memcg:ffff888022f49c81 flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000000200 ffffea0001ef6580 dead000000000002 ffff88814addf640 raw: 0000000000000000 00000000800c000c 00000001ffffffff ffff888022f49c81 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 3719, tgid 3719 (kworker/0:6), ts 136223432244, free_ts 136222971441 prep_new_page mm/page_alloc.c:2539 [inline] get_page_from_freelist+0x10b5/0x2d50 mm/page_alloc.c:4288 __alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5555 alloc_pages+0x1aa/0x270 mm/mempolicy.c:2285 alloc_slab_page mm/slub.c:1794 [inline] allocate_slab+0x213/0x300 mm/slub.c:1939 new_slab mm/slub.c:1992 [inline] ___slab_alloc+0xa91/0x1400 mm/slub.c:3180 __slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3279 slab_alloc_node mm/slub.c:3364 [inline] slab_alloc mm/slub.c:3406 [inline] __kmem_cache_alloc_lru mm/slub.c:3413 [inline] kmem_cache_alloc+0x31a/0x3d0 mm/slub.c:3422 dst_alloc+0x14a/0x1f0 net/core/dst.c:92 ip6_dst_alloc+0x32/0xa0 net/ipv6/route.c:344 icmp6_dst_alloc+0x71/0x680 net/ipv6/route.c:3261 mld_sendpack+0x5de/0xe70 net/ipv6/mcast.c:1809 mld_send_cr net/ipv6/mcast.c:2121 [inline] mld_ifc_work+0x720/0xdc0 net/ipv6/mcast.c:2653 process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289 worker_thread+0x669/0x1090 kernel/workqueue.c:2436 kthread+0x2e8/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1459 [inline] free_pcp_prepare+0x65c/0xd90 mm/page_alloc.c:1509 free_unref_page_prepare mm/page_alloc.c:3387 [inline] free_unref_page+0x1d/0x4d0 mm/page_alloc.c:3483 __unfreeze_partials+0x17c/0x1a0 mm/slub.c:2586 qlink_free mm/kasan/quarantine.c:168 [inline] qlist_free_all+0x6a/0x170 mm/kasan/quarantine.c:187 kasan_quarantine_reduce+0x184/0x210 mm/kasan/quarantine.c:294 __kasan_slab_alloc+0x66/0x90 mm/kasan/common.c:302 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slab.h:737 [inline] slab_alloc_node mm/slub.c:3398 [inline] kmem_cache_alloc_node+0x304/0x410 mm/slub.c:3443 __alloc_skb+0x214/0x300 net/core/skbuff.c:497 alloc_skb include/linux/skbuff.h:1267 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1191 [inline] netlink_sendmsg+0x9a6/0xe10 net/netlink/af_netlink.c:1896 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 __sys_sendto+0x23a/0x340 net/socket.c:2117 __do_sys_sendto net/socket.c:2129 [inline] __se_sys_sendto net/socket.c:2125 [inline] __x64_sys_sendto+0xe1/0x1b0 net/socket.c:2125 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: 1758fd4688eb ("ipv6: remove unnecessary dst_hold() in ip6_fragment()") Reported-by: syzbot+8c0ac31aa9681abb9e2d@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet edumazet@google.com Cc: Wei Wang weiwan@google.com Cc: Martin KaFai Lau kafai@fb.com Link: https://lore.kernel.org/r/20221206101351.2037285-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_output.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 5585e3a94f3c..457eb07be482 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -919,6 +919,9 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, if (err < 0) goto fail;
+ /* We prevent @rt from being freed. */ + rcu_read_lock(); + for (;;) { /* Prepare header of the next frame, * before previous one went down. */ @@ -942,6 +945,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, if (err == 0) { IP6_INC_STATS(net, ip6_dst_idev(&rt->dst), IPSTATS_MIB_FRAGOKS); + rcu_read_unlock(); return 0; }
@@ -949,6 +953,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
IP6_INC_STATS(net, ip6_dst_idev(&rt->dst), IPSTATS_MIB_FRAGFAILS); + rcu_read_unlock(); return err;
slow_path_clean:
From: Dan Carpenter error27@gmail.com
[ Upstream commit cdd97383e19d4afe29adc3376025a15ae3bab3a3 ]
In an earlier commit, I added a bounds check to prevent an out of bounds read and a WARN(). On further discussion and consideration that check was probably too aggressive. Instead of returning -EINVAL, a better fix would be to just prevent the out of bounds read but continue the process.
Background: The value of "pp->rxq_def" is a number between 0-7 by default, or even higher depending on the value of "rxq_number", which is a module parameter. If the value is more than the number of available CPUs then it will trigger the WARN() in cpu_max_bits_warn().
Fixes: e8b4fc13900b ("net: mvneta: Prevent out of bounds read in mvneta_config_rss()") Signed-off-by: Dan Carpenter error27@gmail.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Link: https://lore.kernel.org/r/Y5A7d1E5ccwHTYPf@kadam Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/mvneta.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index 67ba697518d6..2c1ee3268498 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -3717,7 +3717,7 @@ static void mvneta_percpu_elect(struct mvneta_port *pp) /* Use the cpu associated to the rxq when it is online, in all * the other cases, use the cpu 0 which can't be offline. */ - if (cpu_online(pp->rxq_def)) + if (pp->rxq_def < nr_cpu_ids && cpu_online(pp->rxq_def)) elected_cpu = pp->rxq_def;
max_cpu = num_present_cpus(); @@ -4235,9 +4235,6 @@ static int mvneta_config_rss(struct mvneta_port *pp) napi_disable(&pp->napi); }
- if (pp->indir[0] >= nr_cpu_ids) - return -EINVAL; - pp->rxq_def = pp->indir[0];
/* Update unicast mapping */
From: Frank Jungclaus frank.jungclaus@esd.eu
[ Upstream commit 918ee4911f7a41fb4505dff877c1d7f9f64eb43e ]
We don't get any further EVENT from an esd CAN USB device for changes on REC or TEC while those counters converge to 0 (with ecc == 0). So when handling the "Back to Error Active"-event force txerr = rxerr = 0, otherwise the berr-counters might stay on values like 95 forever.
Also, to make life easier during the ongoing development a netdev_dbg() has been introduced to allow dumping error events send by an esd CAN USB device.
Fixes: 96d8e90382dc ("can: Add driver for esd CAN-USB/2 device") Signed-off-by: Frank Jungclaus frank.jungclaus@esd.eu Link: https://lore.kernel.org/all/20221130202242.3998219-2-frank.jungclaus@esd.eu Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/usb/esd_usb2.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/can/usb/esd_usb2.c b/drivers/net/can/usb/esd_usb2.c index 8847942a8d97..73c5343e609b 100644 --- a/drivers/net/can/usb/esd_usb2.c +++ b/drivers/net/can/usb/esd_usb2.c @@ -227,6 +227,10 @@ static void esd_usb2_rx_event(struct esd_usb2_net_priv *priv, u8 rxerr = msg->msg.rx.data[2]; u8 txerr = msg->msg.rx.data[3];
+ netdev_dbg(priv->netdev, + "CAN_ERR_EV_EXT: dlc=%#02x state=%02x ecc=%02x rec=%02x tec=%02x\n", + msg->msg.rx.dlc, state, ecc, rxerr, txerr); + skb = alloc_can_err_skb(priv->netdev, &cf); if (skb == NULL) { stats->rx_dropped++; @@ -253,6 +257,8 @@ static void esd_usb2_rx_event(struct esd_usb2_net_priv *priv, break; default: priv->can.state = CAN_STATE_ERROR_ACTIVE; + txerr = 0; + rxerr = 0; break; } } else {
On 12/12/22 05:16, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.227 release. There are 67 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Dec 2022 13:08:57 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.227-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli f.fainelli@gmail.com
On 12/12/22 06:16, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.227 release. There are 67 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Dec 2022 13:08:57 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.227-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On Mon, Dec 12, 2022 at 02:16:35PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.227 release. There are 67 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Dec 2022 13:08:57 +0000. Anything received after that time might be too late.
Build results: total: 159 pass: 159 fail: 0 Qemu test results: total: 447 pass: 447 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On Mon, 12 Dec 2022 at 18:50, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.4.227 release. There are 67 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Dec 2022 13:08:57 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.227-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
NOTE: Following build warning found,
mm/hugetlb.c: In function 'follow_huge_pmd_pte': mm/hugetlb.c:5191:1: warning: label 'out' defined but not used [-Wunused-label] 5191 | out: | ^~~
details of commit causing this build warning. mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page commit fac35ba763ed07ba93154c95ffc0c4a55023707f upstream.
## Build * kernel: 5.4.227-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-5.4.y * git commit: 8c05f5e0777d154e70c3ab34e0fb0e1778b7e23c * git describe: v5.4.226-68-g8c05f5e0777d * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.4.y/build/v5.4.22...
## Test Regressions (compared to v5.4.226)
## Metric Regressions (compared to v5.4.226)
## Test Fixes (compared to v5.4.226)
## Metric Fixes (compared to v5.4.226)
## Test result summary total: 114898, pass: 99995, fail: 1941, skip: 12727, xfail: 235
## Build Summary * arc: 5 total, 5 passed, 0 failed * arm: 146 total, 145 passed, 1 failed * arm64: 44 total, 40 passed, 4 failed * i386: 26 total, 20 passed, 6 failed * mips: 27 total, 27 passed, 0 failed * parisc: 6 total, 6 passed, 0 failed * powerpc: 30 total, 30 passed, 0 failed * riscv: 12 total, 12 passed, 0 failed * s390: 6 total, 6 passed, 0 failed * sh: 12 total, 12 passed, 0 failed * sparc: 6 total, 6 passed, 0 failed * x86_64: 37 total, 35 passed, 2 failed
## Test suites summary * boot * fwts * igt-gpu-tools * kselftest-android * kselftest-arm64 * kselftest-arm64/arm64.btitest.bti_c_func * kselftest-arm64/arm64.btitest.bti_j_func * kselftest-arm64/arm64.btitest.bti_jc_func * kselftest-arm64/arm64.btitest.bti_none_func * kselftest-arm64/arm64.btitest.nohint_func * kselftest-arm64/arm64.btitest.paciasp_func * kselftest-arm64/arm64.nobtitest.bti_c_func * kselftest-arm64/arm64.nobtitest.bti_j_func * kselftest-arm64/arm64.nobtitest.bti_jc_func * kselftest-arm64/arm64.nobtitest.bti_none_func * kselftest-arm64/arm64.nobtitest.nohint_func * kselftest-arm64/arm64.nobtitest.paciasp_func * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers-dma-buf * kselftest-efivarfs * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-net-forwarding * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-fsx * ltp-hugetlb * ltp-io * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-open-posix-tests * ltp-pty * ltp-sched * ltp-securebits * ltp-smoke * ltp-syscalls * ltp-tracing * network-basic-tests * perf * perf/Zstd-perf.data-compression * rcutorture * v4l2-compliance * vdso
-- Linaro LKFT https://lkft.linaro.org
On Tue, Dec 13, 2022 at 02:50:21PM +0530, Naresh Kamboju wrote:
On Mon, 12 Dec 2022 at 18:50, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.4.227 release. There are 67 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Dec 2022 13:08:57 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.227-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
NOTE: Following build warning found,
mm/hugetlb.c: In function 'follow_huge_pmd_pte': mm/hugetlb.c:5191:1: warning: label 'out' defined but not used [-Wunused-label] 5191 | out: | ^~~
details of commit causing this build warning. mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page commit fac35ba763ed07ba93154c95ffc0c4a55023707f upstream.
Thanks, I'll go drop that commit now.
greg k-h
Hi Greg,
On Mon, Dec 12, 2022 at 02:16:35PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.227 release. There are 67 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Dec 2022 13:08:57 +0000. Anything received after that time might be too late.
Build test (gcc version 11.3.1 20221127): mips: 65 configs -> no failure arm: 106 configs -> no failure arm64: 2 configs -> no failure x86_64: 4 configs -> no failure alpha allmodconfig -> no failure powerpc allmodconfig -> no failure riscv allmodconfig -> no failure s390 allmodconfig -> no failure xtensa allmodconfig -> no failure
Boot test: x86_64: Booted on my test laptop. No regression. x86_64: Booted on qemu. No regression. [1]
[1]. https://openqa.qa.codethink.co.uk/tests/2338
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
linux-stable-mirror@lists.linaro.org