This is the start of the stable review cycle for the 5.10.159 release. There are 106 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Dec 2022 13:08:57 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.159-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.10.159-rc1
Frank Jungclaus frank.jungclaus@esd.eu can: esd_usb: Allow REC and TEC to return to zero
Emeel Hakim ehakim@nvidia.com macsec: add missing attribute validation for offload
Dan Carpenter error27@gmail.com net: mvneta: Fix an out of bounds check
Eric Dumazet edumazet@google.com ipv6: avoid use-after-free in ip6_fragment()
Yang Yingliang yangyingliang@huawei.com net: plip: don't call kfree_skb/dev_kfree_skb() under spin_lock_irq()
Juergen Gross jgross@suse.com xen/netback: fix build warning
Zhang Changzhong zhangchangzhong@huawei.com ethernet: aeroflex: fix potential skb leak in greth_init_rings()
Xin Long lucien.xin@gmail.com tipc: call tipc_lxc_xmit without holding node_read_lock
Zhengchao Shao shaozhengchao@huawei.com net: dsa: sja1105: fix memory leak in sja1105_setup_devlink_regions()
Ido Schimmel idosch@nvidia.com ipv4: Fix incorrect route flushing when table ID 0 is used
Ido Schimmel idosch@nvidia.com ipv4: Fix incorrect route flushing when source address is deleted
YueHaibing yuehaibing@huawei.com tipc: Fix potential OOB in tipc_link_proto_rcv()
Liu Jian liujian56@huawei.com net: hisilicon: Fix potential use-after-free in hix5hd2_rx()
Liu Jian liujian56@huawei.com net: hisilicon: Fix potential use-after-free in hisi_femac_rx()
Yongqiang Liu liuyongqiang13@huawei.com net: thunderx: Fix missing destroy_workqueue of nicvf_rx_mode_wq
Hangbin Liu liuhangbin@gmail.com ip_gre: do not report erspan version on GRE interface
Jisheng Zhang jszhang@kernel.org net: stmmac: fix "snps,axi-config" node property parsing
Pankaj Raghav p.raghav@samsung.com nvme initialize core quirks before calling nvme_init_subsystem
Kees Cook keescook@chromium.org NFC: nci: Bounds check struct nfc_target arrays
Przemyslaw Patynowski przemyslawx.patynowski@intel.com i40e: Disallow ip4 and ip6 l4_4_bytes
Sylwester Dziedziuch sylwesterx.dziedziuch@intel.com i40e: Fix for VF MAC address 0
Michal Jaron michalx.jaron@intel.com i40e: Fix not setting default xps_cpus after reset
Dan Carpenter error27@gmail.com net: mvneta: Prevent out of bounds read in mvneta_config_rss()
Lin Liu lin.liu@citrix.com xen-netfront: Fix NULL sring after live migration
Valentina Goncharenko goncharenko.vp@ispras.ru net: encx24j600: Fix invalid logic in reading of MISTAT register
Valentina Goncharenko goncharenko.vp@ispras.ru net: encx24j600: Add parentheses to fix precedence
Wei Yongjun weiyongjun1@huawei.com mac802154: fix missing INIT_LIST_HEAD in ieee802154_if_add()
Zhengchao Shao shaozhengchao@huawei.com selftests: rtnetlink: correct xfrm policy rule in kci_test_ipsec_offload
Artem Chernyshev artem.chernyshev@red-soft.ru net: dsa: ksz: Check return value
Chen Zhongjin chenzhongjin@huawei.com Bluetooth: Fix not cleanup led when bt_init fails
Wang ShaoBo bobo.shaobowang@huawei.com Bluetooth: 6LoWPAN: add missing hci_dev_put() in get_l2cap_conn()
Ronak Doshi doshir@vmware.com vmxnet3: correctly report encapsulated LRO packet
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Get user_ns from in_skb in unix_diag_get_exact().
Guillaume BRUN the.cheaterman@gmail.com drm: bridge: dw_hdmi: fix preference of RGB modes over YUV420
YueHaibing yuehaibing@huawei.com net: broadcom: Add PTP_1588_CLOCK_OPTIONAL dependency for BCMGENET under ARCH_BCM2835
Akihiko Odaki akihiko.odaki@daynix.com igb: Allocate MSI-X vector when testing
Akihiko Odaki akihiko.odaki@daynix.com e1000e: Fix TX dispatch condition
Xiongfeng Wang wangxiongfeng2@huawei.com gpio: amd8111: Fix PCI device reference count leak
Qiqi Zhang eddy.zhang@rock-chips.com drm/bridge: ti-sn65dsi86: Fix output polarity setting bug
Pablo Neira Ayuso pablo@netfilter.org netfilter: ctnetlink: fix compilation warning after data race fixes in ct mark
Hauke Mehrtens hauke@hauke-m.de ca8210: Fix crash by zero initializing data
Ziyang Xuan william.xuanziyang@huawei.com ieee802154: cc2520: Fix error return code in cc2520_hw_init()
Stefano Brivio sbrivio@redhat.com netfilter: nft_set_pipapo: Actually validate intervals in fields after the first one
Dan Carpenter dan.carpenter@oracle.com rtc: mc146818-lib: fix signedness bug in mc146818_get_time()
Mateusz Jończyk mat.jonczyk@o2.pl rtc: mc146818-lib: fix locking in mc146818_set_time
Chris Wilson chris@chris-wilson.co.uk rtc: cmos: Disable irq around direct invocation of cmos_interrupt()
Baolin Wang baolin.wang@linux.alibaba.com mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page
Oliver Hartkopp socketcan@hartkopp.net can: af_can: fix NULL pointer dereference in can_rcv_filter
ZhangPeng zhangpeng362@huawei.com HID: core: fix shift-out-of-bounds in hid_report_raw_event
Anastasia Belova abelova@astralinux.ru HID: hid-lg4ff: Add check for empty lbuf
Ankit Patel anpatel@nvidia.com HID: usbhid: Add ALWAYS_POLL quirk for some mice
Rob Clark robdclark@chromium.org drm/shmem-helper: Avoid vm_open error paths
Rob Clark robdclark@chromium.org drm/shmem-helper: Remove errant put in error path
Zack Rusin zackr@vmware.com drm/vmwgfx: Don't use screen objects when SEV is active
Thomas Huth thuth@redhat.com KVM: s390: vsie: Fix the initialization of the epoch extension (epdx) field
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: Fix crash when replugging CSR fake controllers
Ismael Ferreras Morezuelas swyterzone@gmail.com Bluetooth: btusb: Add debug message for CSR controllers
John Starks jostarks@microsoft.com mm/gup: fix gup_pud_range() for dax
Tejun Heo tj@kernel.org memcg: fix possible use-after-free in memcg_write_event_control()
Hans Verkuil hverkuil-cisco@xs4all.nl media: v4l2-dv-timings.c: fix too strict blanking sanity checks
Francesco Dolcini francesco.dolcini@toradex.com Revert "ARM: dts: imx7: Fix NAND controller size-cells"
Hans Verkuil hverkuil-cisco@xs4all.nl media: videobuf2-core: take mmap_lock in vb2_get_unmapped_area()
Juergen Gross jgross@suse.com xen/netback: don't call kfree_skb() with interrupts disabled
Juergen Gross jgross@suse.com xen/netback: do some code cleanup
Ross Lagerwall ross.lagerwall@citrix.com xen/netback: Ensure protocol headers don't fall in the non-linear area
Thomas Gleixner tglx@linutronix.de rtc: mc146818: Reduce spinlock section in mc146818_set_time()
Xiaofei Tan tanxiaofei@huawei.com rtc: cmos: Replace spin_lock_irqsave with spin_lock in hard IRQ
Mateusz Jończyk mat.jonczyk@o2.pl rtc: cmos: avoid UIP when reading alarm time
Mateusz Jończyk mat.jonczyk@o2.pl rtc: cmos: avoid UIP when writing alarm time
Mateusz Jończyk mat.jonczyk@o2.pl rtc: mc146818-lib: extract mc146818_avoid_UIP
Mateusz Jończyk mat.jonczyk@o2.pl rtc: mc146818-lib: fix RTC presence check
Mateusz Jończyk mat.jonczyk@o2.pl rtc: Check return value from mc146818_get_time()
Mateusz Jończyk mat.jonczyk@o2.pl rtc: mc146818-lib: change return values of mc146818_get_time()
Mateusz Jończyk mat.jonczyk@o2.pl rtc: cmos: remove stale REVISIT comments
Thomas Gleixner tglx@linutronix.de rtc: mc146818: Dont test for bit 0-5 in Register D
Thomas Gleixner tglx@linutronix.de rtc: mc146818: Detect and handle broken RTCs
Thomas Gleixner tglx@linutronix.de rtc: mc146818: Prevent reading garbage
Jann Horn jannh@google.com mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
Jann Horn jannh@google.com mm/khugepaged: fix GUP-fast interaction by sending IPI
Jann Horn jannh@google.com mm/khugepaged: take the right locks for page table retraction
Davide Tronchin davide.tronchin.94@gmail.com net: usb: qmi_wwan: add u-blox 0x1342 composition
Dominique Martinet asmadeus@codewreck.org 9p/xen: check logical size for buffer size
Thinh Nguyen Thinh.Nguyen@synopsys.com usb: dwc3: gadget: Disable GUSB2PHYCFG.SUSPHY for End Transfer
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp fbcon: Use kzalloc() in fbcon_prepare_logo()
Andreas Kemnade andreas@kemnade.info regulator: twl6030: fix get status of twl6032 regulators
Srinivasa Rao Mandadapu quic_srivasam@quicinc.com ASoC: soc-pcm: Add NULL check in BE reparenting
Filipe Manana fdmanana@suse.com btrfs: send: avoid unaligned encoded writes when attempting to clone range
Kees Cook keescook@chromium.org ALSA: seq: Fix function prototype mismatch in snd_seq_expand_var_event
Konrad Dybcio konrad.dybcio@linaro.org regulator: slg51000: Wait after asserting CS pin
GUO Zihua guozihua@huawei.com 9p/fd: Use P9_HDRSZ for header size
Johan Jonker jbx6244@gmail.com ARM: dts: rockchip: disable arm_global_timer on rk3066 and rk3188
Chancel Liu chancel.liu@nxp.com ASoC: wm8962: Wait for updated value of WM8962_CLOCKING1 register
Giulio Benetti giulio.benetti@benettiengineering.com ARM: 9266/1: mm: fix no-MMU ZERO_PAGE() implementation
Tomislav Novak tnovak@fb.com ARM: 9251/1: perf: Fix stacktraces for tracepoint events in THUMB2 kernels
Johan Jonker jbx6244@gmail.com ARM: dts: rockchip: rk3188: fix lcdc1-rgb24 node name
Johan Jonker jbx6244@gmail.com arm64: dts: rockchip: fix ir-receiver node names
Johan Jonker jbx6244@gmail.com ARM: dts: rockchip: fix ir-receiver node names
Sebastian Reichel sebastian.reichel@collabora.com arm: dts: rockchip: fix node name for hym8563 rtc
FUKAUMI Naoki naoki@radxa.com arm64: dts: rockchip: keep I2S1 disabled for GPIO function on ROCK Pi 4 series
Gavin Shan gshan@redhat.com mm: migrate: fix THP's mapcount on isolation
Hugh Dickins hughd@google.com mm: __isolate_lru_page_prepare() in isolate_migratepages_block()
Alex Shi alex.shi@linux.alibaba.com mm/vmscan: __isolate_lru_page_prepare() cleanup
Alex Shi alex.shi@linux.alibaba.com mm/compaction: do page isolation first in compaction
Alex Shi alex.shi@linux.alibaba.com mm/lru: introduce TestClearPageLRU()
Alex Shi alex.shi@linux.alibaba.com mm/mlock: remove __munlock_isolate_lru_page()
Alex Shi alex.shi@linux.alibaba.com mm/mlock: remove lru_lock on TestClearPageMlocked
-------------
Diffstat:
Makefile | 4 +- arch/alpha/kernel/rtc.c | 7 +- arch/arm/boot/dts/imx7s.dtsi | 4 +- arch/arm/boot/dts/rk3036-evb.dts | 2 +- arch/arm/boot/dts/rk3188-radxarock.dts | 2 +- arch/arm/boot/dts/rk3188.dtsi | 3 +- arch/arm/boot/dts/rk3288-evb-act8846.dts | 2 +- arch/arm/boot/dts/rk3288-firefly.dtsi | 2 +- arch/arm/boot/dts/rk3288-miqi.dts | 2 +- arch/arm/boot/dts/rk3288-rock2-square.dts | 2 +- arch/arm/boot/dts/rk3xxx.dtsi | 7 + arch/arm/include/asm/perf_event.h | 2 +- arch/arm/include/asm/pgtable-nommu.h | 6 - arch/arm/include/asm/pgtable.h | 16 +- arch/arm/mm/nommu.c | 19 ++ arch/arm64/boot/dts/rockchip/rk3308-roc-cc.dts | 2 +- arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi | 1 - arch/s390/kvm/vsie.c | 4 +- arch/x86/kernel/hpet.c | 8 +- drivers/base/power/trace.c | 6 +- drivers/bluetooth/btusb.c | 5 + drivers/gpio/gpio-amd8111.c | 4 + drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 6 +- drivers/gpu/drm/bridge/ti-sn65dsi86.c | 4 +- drivers/gpu/drm/drm_gem_shmem_helper.c | 18 +- drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c | 4 + drivers/hid/hid-core.c | 3 + drivers/hid/hid-ids.h | 3 + drivers/hid/hid-lg4ff.c | 6 + drivers/hid/hid-quirks.c | 3 + drivers/media/common/videobuf2/videobuf2-core.c | 102 ++++++--- drivers/media/v4l2-core/v4l2-dv-timings.c | 20 +- drivers/net/can/usb/esd_usb2.c | 6 + drivers/net/dsa/sja1105/sja1105_devlink.c | 2 + drivers/net/ethernet/aeroflex/greth.c | 1 + drivers/net/ethernet/broadcom/Kconfig | 1 + drivers/net/ethernet/cavium/thunder/nicvf_main.c | 4 +- drivers/net/ethernet/hisilicon/hisi_femac.c | 2 +- drivers/net/ethernet/hisilicon/hix5hd2_gmac.c | 2 +- drivers/net/ethernet/intel/e1000e/netdev.c | 4 +- drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 6 +- drivers/net/ethernet/intel/i40e/i40e_main.c | 19 +- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 + drivers/net/ethernet/intel/igb/igb_ethtool.c | 2 + drivers/net/ethernet/marvell/mvneta.c | 2 +- drivers/net/ethernet/microchip/encx24j600-regmap.c | 4 +- .../net/ethernet/stmicro/stmmac/stmmac_platform.c | 8 +- drivers/net/ieee802154/ca8210.c | 2 +- drivers/net/ieee802154/cc2520.c | 2 +- drivers/net/macsec.c | 1 + drivers/net/plip/plip.c | 4 +- drivers/net/usb/qmi_wwan.c | 1 + drivers/net/vmxnet3/vmxnet3_drv.c | 11 +- drivers/net/xen-netback/common.h | 14 +- drivers/net/xen-netback/interface.c | 22 +- drivers/net/xen-netback/netback.c | 229 ++++++++++++--------- drivers/net/xen-netback/rx.c | 10 +- drivers/net/xen-netfront.c | 6 + drivers/nvme/host/core.c | 8 +- drivers/regulator/slg51000-regulator.c | 2 + drivers/regulator/twl6030-regulator.c | 15 +- drivers/rtc/rtc-cmos.c | 209 ++++++++++++------- drivers/rtc/rtc-mc146818-lib.c | 169 ++++++++++++--- drivers/usb/dwc3/gadget.c | 3 +- drivers/video/fbdev/core/fbcon.c | 2 +- fs/btrfs/send.c | 24 ++- include/asm-generic/tlb.h | 4 + include/linux/cgroup.h | 1 + include/linux/hugetlb.h | 8 +- include/linux/mc146818rtc.h | 6 +- include/linux/page-flags.h | 1 + include/linux/swap.h | 1 - kernel/cgroup/cgroup-internal.h | 1 - mm/compaction.c | 95 +++++++-- mm/gup.c | 16 +- mm/hugetlb.c | 27 ++- mm/khugepaged.c | 47 ++++- mm/memcontrol.c | 15 +- mm/mlock.c | 56 ++--- mm/mmu_gather.c | 4 +- mm/vmscan.c | 155 ++++---------- net/9p/trans_fd.c | 6 +- net/9p/trans_xen.c | 9 + net/bluetooth/6lowpan.c | 1 + net/bluetooth/af_bluetooth.c | 4 +- net/bluetooth/hci_core.c | 3 +- net/can/af_can.c | 4 +- net/dsa/tag_ksz.c | 3 +- net/ipv4/fib_frontend.c | 3 + net/ipv4/fib_semantics.c | 1 + net/ipv4/ip_gre.c | 48 +++-- net/ipv6/ip6_output.c | 5 + net/mac802154/iface.c | 1 + net/netfilter/nf_conntrack_netlink.c | 19 +- net/netfilter/nft_set_pipapo.c | 5 +- net/nfc/nci/ntf.c | 6 + net/tipc/link.c | 4 +- net/tipc/node.c | 12 +- net/unix/diag.c | 20 +- sound/core/seq/seq_memory.c | 11 +- sound/soc/codecs/wm8962.c | 8 + sound/soc/soc-pcm.c | 2 + tools/testing/selftests/net/fib_tests.sh | 37 ++++ tools/testing/selftests/net/rtnetlink.sh | 2 +- 104 files changed, 1117 insertions(+), 612 deletions(-)
From: Alex Shi alex.shi@linux.alibaba.com
[ Upstream commit 3db19aa39bac33f2e850fa1ddd67be29b192e51f ]
In the func munlock_vma_page, comments mentained lru_lock needed for serialization with split_huge_pages. But the page must be PageLocked as well as pages in split_huge_page series funcs. Thus the PageLocked is enough to serialize both funcs.
Further more, Hugh Dickins pointed: before splitting in split_huge_page_to_list, the page was unmap_page() to remove pmd/ptes which protect the page from munlock. Thus, no needs to guard __split_huge_page_tail for mlock clean, just keep the lru_lock there for isolation purpose.
LKP found a preempt issue on __mod_zone_page_state which need change to mod_zone_page_state. Thanks!
Link: https://lkml.kernel.org/r/1604566549-62481-13-git-send-email-alex.shi@linux.... Signed-off-by: Alex Shi alex.shi@linux.alibaba.com Acked-by: Hugh Dickins hughd@google.com Acked-by: Johannes Weiner hannes@cmpxchg.org Acked-by: Vlastimil Babka vbabka@suse.cz Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Alexander Duyck alexander.duyck@gmail.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: "Chen, Rong A" rong.a.chen@intel.com Cc: Daniel Jordan daniel.m.jordan@oracle.com Cc: "Huang, Ying" ying.huang@intel.com Cc: Jann Horn jannh@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Kirill A. Shutemov kirill@shutemov.name Cc: Konstantin Khlebnikov khlebnikov@yandex-team.ru Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Mel Gorman mgorman@techsingularity.net Cc: Michal Hocko mhocko@kernel.org Cc: Michal Hocko mhocko@suse.com Cc: Mika Penttilä mika.penttila@nextfour.com Cc: Minchan Kim minchan@kernel.org Cc: Shakeel Butt shakeelb@google.com Cc: Tejun Heo tj@kernel.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Vladimir Davydov vdavydov.dev@gmail.com Cc: Wei Yang richard.weiyang@gmail.com Cc: Yang Shi yang.shi@linux.alibaba.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Stable-dep-of: 829ae0f81ce0 ("mm: migrate: fix THP's mapcount on isolation") Signed-off-by: Sasha Levin sashal@kernel.org --- mm/mlock.c | 26 +++++--------------------- 1 file changed, 5 insertions(+), 21 deletions(-)
diff --git a/mm/mlock.c b/mm/mlock.c index 884b1216da6a..796c726a0407 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -187,40 +187,24 @@ static void __munlock_isolation_failed(struct page *page) unsigned int munlock_vma_page(struct page *page) { int nr_pages; - pg_data_t *pgdat = page_pgdat(page);
/* For try_to_munlock() and to serialize with page migration */ BUG_ON(!PageLocked(page)); - VM_BUG_ON_PAGE(PageTail(page), page);
- /* - * Serialize with any parallel __split_huge_page_refcount() which - * might otherwise copy PageMlocked to part of the tail pages before - * we clear it in the head page. It also stabilizes thp_nr_pages(). - */ - spin_lock_irq(&pgdat->lru_lock); - if (!TestClearPageMlocked(page)) { /* Potentially, PTE-mapped THP: do not skip the rest PTEs */ - nr_pages = 1; - goto unlock_out; + return 0; }
nr_pages = thp_nr_pages(page); - __mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages); + mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
- if (__munlock_isolate_lru_page(page, true)) { - spin_unlock_irq(&pgdat->lru_lock); + if (!isolate_lru_page(page)) __munlock_isolated_page(page); - goto out; - } - __munlock_isolation_failed(page); - -unlock_out: - spin_unlock_irq(&pgdat->lru_lock); + else + __munlock_isolation_failed(page);
-out: return nr_pages - 1; }
On Mon, 12 Dec 2022, Greg Kroah-Hartman wrote:
From: Alex Shi alex.shi@linux.alibaba.com
[ Upstream commit 3db19aa39bac33f2e850fa1ddd67be29b192e51f ]
In the func munlock_vma_page, comments mentained lru_lock needed for serialization with split_huge_pages. But the page must be PageLocked as well as pages in split_huge_page series funcs. Thus the PageLocked is enough to serialize both funcs.
Further more, Hugh Dickins pointed: before splitting in split_huge_page_to_list, the page was unmap_page() to remove pmd/ptes which protect the page from munlock. Thus, no needs to guard __split_huge_page_tail for mlock clean, just keep the lru_lock there for isolation purpose.
LKP found a preempt issue on __mod_zone_page_state which need change to mod_zone_page_state. Thanks!
Link: https://lkml.kernel.org/r/1604566549-62481-13-git-send-email-alex.shi@linux.... Signed-off-by: Alex Shi alex.shi@linux.alibaba.com Acked-by: Hugh Dickins hughd@google.com Acked-by: Johannes Weiner hannes@cmpxchg.org Acked-by: Vlastimil Babka vbabka@suse.cz Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Alexander Duyck alexander.duyck@gmail.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: "Chen, Rong A" rong.a.chen@intel.com Cc: Daniel Jordan daniel.m.jordan@oracle.com Cc: "Huang, Ying" ying.huang@intel.com Cc: Jann Horn jannh@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Kirill A. Shutemov kirill@shutemov.name Cc: Konstantin Khlebnikov khlebnikov@yandex-team.ru Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Mel Gorman mgorman@techsingularity.net Cc: Michal Hocko mhocko@kernel.org Cc: Michal Hocko mhocko@suse.com Cc: Mika Penttilä mika.penttila@nextfour.com Cc: Minchan Kim minchan@kernel.org Cc: Shakeel Butt shakeelb@google.com Cc: Tejun Heo tj@kernel.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Vladimir Davydov vdavydov.dev@gmail.com Cc: Wei Yang richard.weiyang@gmail.com Cc: Yang Shi yang.shi@linux.alibaba.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Stable-dep-of: 829ae0f81ce0 ("mm: migrate: fix THP's mapcount on isolation") Signed-off-by: Sasha Levin sashal@kernel.org
NAK from me to patches 001 through 007 here: 001 through 006 are a risky subset of patches and followups to a per-memcg per-node lru_lock series from Alex Shi, which made subtle changes to locking, memcg charging, lru management, page migration etc.
The whole series could be backported to 5.10 (I did so myself for internal usage), but cherry-picking parts of it into 5.10-stable is misguided and contrary to stable principles.
Maybe there is in fact nothing wrong with the selection made: but then give linux-mm guys two or three weeks to review and test and give the thumbs up to that selection.
Much easier, quicker and safer would be to adjust 007 (I presume the reason behind 001 through 006) to fit the 5.10-stable tree: I can do that myself if you ask, but not until later this week.
Hugh
mm/mlock.c | 26 +++++--------------------- 1 file changed, 5 insertions(+), 21 deletions(-)
diff --git a/mm/mlock.c b/mm/mlock.c index 884b1216da6a..796c726a0407 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -187,40 +187,24 @@ static void __munlock_isolation_failed(struct page *page) unsigned int munlock_vma_page(struct page *page) { int nr_pages;
- pg_data_t *pgdat = page_pgdat(page);
/* For try_to_munlock() and to serialize with page migration */ BUG_ON(!PageLocked(page));
- VM_BUG_ON_PAGE(PageTail(page), page);
- /*
* Serialize with any parallel __split_huge_page_refcount() which
* might otherwise copy PageMlocked to part of the tail pages before
* we clear it in the head page. It also stabilizes thp_nr_pages().
*/
- spin_lock_irq(&pgdat->lru_lock);
- if (!TestClearPageMlocked(page)) { /* Potentially, PTE-mapped THP: do not skip the rest PTEs */
nr_pages = 1;
goto unlock_out;
}return 0;
nr_pages = thp_nr_pages(page);
- __mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
- mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
- if (__munlock_isolate_lru_page(page, true)) {
spin_unlock_irq(&pgdat->lru_lock);
- if (!isolate_lru_page(page)) __munlock_isolated_page(page);
goto out;
- }
- __munlock_isolation_failed(page);
-unlock_out:
- spin_unlock_irq(&pgdat->lru_lock);
- else
__munlock_isolation_failed(page);
-out: return nr_pages - 1; } -- 2.35.1
On Mon, Dec 12, 2022 at 12:35:57PM -0800, Hugh Dickins wrote:
On Mon, 12 Dec 2022, Greg Kroah-Hartman wrote:
From: Alex Shi alex.shi@linux.alibaba.com
[ Upstream commit 3db19aa39bac33f2e850fa1ddd67be29b192e51f ]
In the func munlock_vma_page, comments mentained lru_lock needed for serialization with split_huge_pages. But the page must be PageLocked as well as pages in split_huge_page series funcs. Thus the PageLocked is enough to serialize both funcs.
Further more, Hugh Dickins pointed: before splitting in split_huge_page_to_list, the page was unmap_page() to remove pmd/ptes which protect the page from munlock. Thus, no needs to guard __split_huge_page_tail for mlock clean, just keep the lru_lock there for isolation purpose.
LKP found a preempt issue on __mod_zone_page_state which need change to mod_zone_page_state. Thanks!
Link: https://lkml.kernel.org/r/1604566549-62481-13-git-send-email-alex.shi@linux.... Signed-off-by: Alex Shi alex.shi@linux.alibaba.com Acked-by: Hugh Dickins hughd@google.com Acked-by: Johannes Weiner hannes@cmpxchg.org Acked-by: Vlastimil Babka vbabka@suse.cz Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Alexander Duyck alexander.duyck@gmail.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: "Chen, Rong A" rong.a.chen@intel.com Cc: Daniel Jordan daniel.m.jordan@oracle.com Cc: "Huang, Ying" ying.huang@intel.com Cc: Jann Horn jannh@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Kirill A. Shutemov kirill@shutemov.name Cc: Konstantin Khlebnikov khlebnikov@yandex-team.ru Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Mel Gorman mgorman@techsingularity.net Cc: Michal Hocko mhocko@kernel.org Cc: Michal Hocko mhocko@suse.com Cc: Mika Penttilä mika.penttila@nextfour.com Cc: Minchan Kim minchan@kernel.org Cc: Shakeel Butt shakeelb@google.com Cc: Tejun Heo tj@kernel.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Vladimir Davydov vdavydov.dev@gmail.com Cc: Wei Yang richard.weiyang@gmail.com Cc: Yang Shi yang.shi@linux.alibaba.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Stable-dep-of: 829ae0f81ce0 ("mm: migrate: fix THP's mapcount on isolation") Signed-off-by: Sasha Levin sashal@kernel.org
NAK from me to patches 001 through 007 here: 001 through 006 are a risky subset of patches and followups to a per-memcg per-node lru_lock series from Alex Shi, which made subtle changes to locking, memcg charging, lru management, page migration etc.
The whole series could be backported to 5.10 (I did so myself for internal usage), but cherry-picking parts of it into 5.10-stable is misguided and contrary to stable principles.
Maybe there is in fact nothing wrong with the selection made: but then give linux-mm guys two or three weeks to review and test and give the thumbs up to that selection.
Much easier, quicker and safer would be to adjust 007 (I presume the reason behind 001 through 006) to fit the 5.10-stable tree: I can do that myself if you ask, but not until later this week.
All now dropped, thanks.
greg k-h
From: Alex Shi alex.shi@linux.alibaba.com
[ Upstream commit 13805a88a9bd3fb37f33dd8972d904de62796f3d ]
__munlock_isolate_lru_page() only has one caller, remove it to clean up and simplify code.
Link: https://lkml.kernel.org/r/1604566549-62481-14-git-send-email-alex.shi@linux.... Signed-off-by: Alex Shi alex.shi@linux.alibaba.com Acked-by: Hugh Dickins hughd@google.com Acked-by: Johannes Weiner hannes@cmpxchg.org Acked-by: Vlastimil Babka vbabka@suse.cz Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Alexander Duyck alexander.duyck@gmail.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: "Chen, Rong A" rong.a.chen@intel.com Cc: Daniel Jordan daniel.m.jordan@oracle.com Cc: "Huang, Ying" ying.huang@intel.com Cc: Jann Horn jannh@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Kirill A. Shutemov kirill@shutemov.name Cc: Konstantin Khlebnikov khlebnikov@yandex-team.ru Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Mel Gorman mgorman@techsingularity.net Cc: Michal Hocko mhocko@kernel.org Cc: Michal Hocko mhocko@suse.com Cc: Mika Penttilä mika.penttila@nextfour.com Cc: Minchan Kim minchan@kernel.org Cc: Shakeel Butt shakeelb@google.com Cc: Tejun Heo tj@kernel.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Vladimir Davydov vdavydov.dev@gmail.com Cc: Wei Yang richard.weiyang@gmail.com Cc: Yang Shi yang.shi@linux.alibaba.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Stable-dep-of: 829ae0f81ce0 ("mm: migrate: fix THP's mapcount on isolation") Signed-off-by: Sasha Levin sashal@kernel.org --- mm/mlock.c | 31 +++++++++---------------------- 1 file changed, 9 insertions(+), 22 deletions(-)
diff --git a/mm/mlock.c b/mm/mlock.c index 796c726a0407..d487aa864e86 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -105,26 +105,6 @@ void mlock_vma_page(struct page *page) } }
-/* - * Isolate a page from LRU with optional get_page() pin. - * Assumes lru_lock already held and page already pinned. - */ -static bool __munlock_isolate_lru_page(struct page *page, bool getpage) -{ - if (PageLRU(page)) { - struct lruvec *lruvec; - - lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); - if (getpage) - get_page(page); - ClearPageLRU(page); - del_page_from_lru_list(page, lruvec, page_lru(page)); - return true; - } - - return false; -} - /* * Finish munlock after successful page isolation * @@ -296,9 +276,16 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) * We already have pin from follow_page_mask() * so we can spare the get_page() here. */ - if (__munlock_isolate_lru_page(page, false)) + if (PageLRU(page)) { + struct lruvec *lruvec; + + ClearPageLRU(page); + lruvec = mem_cgroup_page_lruvec(page, + page_pgdat(page)); + del_page_from_lru_list(page, lruvec, + page_lru(page)); continue; - else + } else __munlock_isolation_failed(page); } else { delta_munlocked++;
From: Alex Shi alex.shi@linux.alibaba.com
[ Upstream commit d25b5bd8a8f420b15517c19c4626c0c009f72a63 ]
Currently lru_lock still guards both lru list and page's lru bit, that's ok. but if we want to use specific lruvec lock on the page, we need to pin down the page's lruvec/memcg during locking. Just taking lruvec lock first may be undermined by the page's memcg charge/migration. To fix this problem, we will clear the lru bit out of locking and use it as pin down action to block the page isolation in memcg changing.
So now a standard steps of page isolation is following: 1, get_page(); #pin the page avoid to be free 2, TestClearPageLRU(); #block other isolation like memcg change 3, spin_lock on lru_lock; #serialize lru list access 4, delete page from lru list;
This patch start with the first part: TestClearPageLRU, which combines PageLRU check and ClearPageLRU into a macro func TestClearPageLRU. This function will be used as page isolation precondition to prevent other isolations some where else. Then there are may !PageLRU page on lru list, need to remove BUG() checking accordingly.
There 2 rules for lru bit now: 1, the lru bit still indicate if a page on lru list, just in some temporary moment(isolating), the page may have no lru bit when it's on lru list. but the page still must be on lru list when the lru bit set. 2, have to remove lru bit before delete it from lru list.
As Andrew Morton mentioned this change would dirty cacheline for a page which isn't on the LRU. But the loss would be acceptable in Rong Chen rong.a.chen@intel.com report: https://lore.kernel.org/lkml/20200304090301.GB5972@shao2-debian/
Link: https://lkml.kernel.org/r/1604566549-62481-15-git-send-email-alex.shi@linux.... Suggested-by: Johannes Weiner hannes@cmpxchg.org Signed-off-by: Alex Shi alex.shi@linux.alibaba.com Acked-by: Hugh Dickins hughd@google.com Acked-by: Johannes Weiner hannes@cmpxchg.org Acked-by: Vlastimil Babka vbabka@suse.cz Cc: Michal Hocko mhocko@kernel.org Cc: Vladimir Davydov vdavydov.dev@gmail.com Cc: Alexander Duyck alexander.duyck@gmail.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Daniel Jordan daniel.m.jordan@oracle.com Cc: "Huang, Ying" ying.huang@intel.com Cc: Jann Horn jannh@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Kirill A. Shutemov kirill@shutemov.name Cc: Konstantin Khlebnikov khlebnikov@yandex-team.ru Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Mel Gorman mgorman@techsingularity.net Cc: Michal Hocko mhocko@suse.com Cc: Mika Penttilä mika.penttila@nextfour.com Cc: Minchan Kim minchan@kernel.org Cc: Shakeel Butt shakeelb@google.com Cc: Tejun Heo tj@kernel.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Wei Yang richard.weiyang@gmail.com Cc: Yang Shi yang.shi@linux.alibaba.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Stable-dep-of: 829ae0f81ce0 ("mm: migrate: fix THP's mapcount on isolation") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/page-flags.h | 1 + mm/mlock.c | 3 +-- mm/vmscan.c | 39 +++++++++++++++++++------------------- 3 files changed, 21 insertions(+), 22 deletions(-)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 4f6ba9379112..14a0cac9e099 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -335,6 +335,7 @@ PAGEFLAG(Referenced, referenced, PF_HEAD) PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD) __CLEARPAGEFLAG(Dirty, dirty, PF_HEAD) PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD) + TESTCLEARFLAG(LRU, lru, PF_HEAD) PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD) TESTCLEARFLAG(Active, active, PF_HEAD) PAGEFLAG(Workingset, workingset, PF_HEAD) diff --git a/mm/mlock.c b/mm/mlock.c index d487aa864e86..7b0e6334be6f 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -276,10 +276,9 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) * We already have pin from follow_page_mask() * so we can spare the get_page() here. */ - if (PageLRU(page)) { + if (TestClearPageLRU(page)) { struct lruvec *lruvec;
- ClearPageLRU(page); lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); del_page_from_lru_list(page, lruvec, diff --git a/mm/vmscan.c b/mm/vmscan.c index 51ccd80e70b6..8d62eedfc794 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1547,7 +1547,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone, */ int __isolate_lru_page(struct page *page, isolate_mode_t mode) { - int ret = -EINVAL; + int ret = -EBUSY;
/* Only take pages on the LRU. */ if (!PageLRU(page)) @@ -1557,8 +1557,6 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode) if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE)) return ret;
- ret = -EBUSY; - /* * To minimise LRU disruption, the caller can indicate that it only * wants to isolate pages it will be able to operate on without @@ -1605,8 +1603,10 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode) * sure the page is not being freed elsewhere -- the * page release code relies on it. */ - ClearPageLRU(page); - ret = 0; + if (TestClearPageLRU(page)) + ret = 0; + else + put_page(page); }
return ret; @@ -1672,8 +1672,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, page = lru_to_page(src); prefetchw_prev_lru_page(page, src, flags);
- VM_BUG_ON_PAGE(!PageLRU(page), page); - nr_pages = compound_nr(page); total_scan += nr_pages;
@@ -1770,21 +1768,18 @@ int isolate_lru_page(struct page *page) VM_BUG_ON_PAGE(!page_count(page), page); WARN_RATELIMIT(PageTail(page), "trying to isolate tail page");
- if (PageLRU(page)) { + if (TestClearPageLRU(page)) { pg_data_t *pgdat = page_pgdat(page); struct lruvec *lruvec;
- spin_lock_irq(&pgdat->lru_lock); + get_page(page); lruvec = mem_cgroup_page_lruvec(page, pgdat); - if (PageLRU(page)) { - int lru = page_lru(page); - get_page(page); - ClearPageLRU(page); - del_page_from_lru_list(page, lruvec, lru); - ret = 0; - } + spin_lock_irq(&pgdat->lru_lock); + del_page_from_lru_list(page, lruvec, page_lru(page)); spin_unlock_irq(&pgdat->lru_lock); + ret = 0; } + return ret; }
@@ -4291,6 +4286,10 @@ void check_move_unevictable_pages(struct pagevec *pvec) nr_pages = thp_nr_pages(page); pgscanned += nr_pages;
+ /* block memcg migration during page moving between lru */ + if (!TestClearPageLRU(page)) + continue; + if (pagepgdat != pgdat) { if (pgdat) spin_unlock_irq(&pgdat->lru_lock); @@ -4299,10 +4298,7 @@ void check_move_unevictable_pages(struct pagevec *pvec) } lruvec = mem_cgroup_page_lruvec(page, pgdat);
- if (!PageLRU(page) || !PageUnevictable(page)) - continue; - - if (page_evictable(page)) { + if (page_evictable(page) && PageUnevictable(page)) { enum lru_list lru = page_lru_base_type(page);
VM_BUG_ON_PAGE(PageActive(page), page); @@ -4311,12 +4307,15 @@ void check_move_unevictable_pages(struct pagevec *pvec) add_page_to_lru_list(page, lruvec, lru); pgrescued += nr_pages; } + SetPageLRU(page); }
if (pgdat) { __count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued); __count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned); spin_unlock_irq(&pgdat->lru_lock); + } else if (pgscanned) { + count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned); } } EXPORT_SYMBOL_GPL(check_move_unevictable_pages);
From: Alex Shi alex.shi@linux.alibaba.com
[ Upstream commit 9df41314390b81a541ca6e84c8340bad0959e4b5 ]
Currently, compaction would get the lru_lock and then do page isolation which works fine with pgdat->lru_lock, since any page isoltion would compete for the lru_lock. If we want to change to memcg lru_lock, we have to isolate the page before getting lru_lock, thus isoltion would block page's memcg change which relay on page isoltion too. Then we could safely use per memcg lru_lock later.
The new page isolation use previous introduced TestClearPageLRU() + pgdat lru locking which will be changed to memcg lru lock later.
Hugh Dickins hughd@google.com fixed following bugs in this patch's early version:
Fix lots of crashes under compaction load: isolate_migratepages_block() must clean up appropriately when rejecting a page, setting PageLRU again if it had been cleared; and a put_page() after get_page_unless_zero() cannot safely be done while holding locked_lruvec - it may turn out to be the final put_page(), which will take an lruvec lock when PageLRU.
And move __isolate_lru_page_prepare back after get_page_unless_zero to make trylock_page() safe: trylock_page() is not safe to use at this time: its setting PG_locked can race with the page being freed or allocated ("Bad page"), and can also erase flags being set by one of those "sole owners" of a freshly allocated page who use non-atomic __SetPageFlag().
Link: https://lkml.kernel.org/r/1604566549-62481-16-git-send-email-alex.shi@linux.... Suggested-by: Johannes Weiner hannes@cmpxchg.org Signed-off-by: Alex Shi alex.shi@linux.alibaba.com Acked-by: Hugh Dickins hughd@google.com Acked-by: Johannes Weiner hannes@cmpxchg.org Acked-by: Vlastimil Babka vbabka@suse.cz Cc: Matthew Wilcox willy@infradead.org Cc: Alexander Duyck alexander.duyck@gmail.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: "Chen, Rong A" rong.a.chen@intel.com Cc: Daniel Jordan daniel.m.jordan@oracle.com Cc: "Huang, Ying" ying.huang@intel.com Cc: Jann Horn jannh@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Kirill A. Shutemov kirill@shutemov.name Cc: Konstantin Khlebnikov khlebnikov@yandex-team.ru Cc: Mel Gorman mgorman@techsingularity.net Cc: Michal Hocko mhocko@kernel.org Cc: Michal Hocko mhocko@suse.com Cc: Mika Penttilä mika.penttila@nextfour.com Cc: Minchan Kim minchan@kernel.org Cc: Shakeel Butt shakeelb@google.com Cc: Tejun Heo tj@kernel.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Vladimir Davydov vdavydov.dev@gmail.com Cc: Wei Yang richard.weiyang@gmail.com Cc: Yang Shi yang.shi@linux.alibaba.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Stable-dep-of: 829ae0f81ce0 ("mm: migrate: fix THP's mapcount on isolation") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/swap.h | 2 +- mm/compaction.c | 42 +++++++++++++++++++++++++++++++++--------- mm/vmscan.c | 43 ++++++++++++++++++++++--------------------- 3 files changed, 56 insertions(+), 31 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h index fbc6805358da..3577d3a6ec37 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -358,7 +358,7 @@ extern void lru_cache_add_inactive_or_unevictable(struct page *page, extern unsigned long zone_reclaimable_pages(struct zone *zone); extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order, gfp_t gfp_mask, nodemask_t *mask); -extern int __isolate_lru_page(struct page *page, isolate_mode_t mode); +extern int __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode); extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, unsigned long nr_pages, gfp_t gfp_mask, diff --git a/mm/compaction.c b/mm/compaction.c index 8dfbe86bd74f..ba3e907f03b7 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -890,6 +890,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, if (!valid_page && IS_ALIGNED(low_pfn, pageblock_nr_pages)) { if (!cc->ignore_skip_hint && get_pageblock_skip(page)) { low_pfn = end_pfn; + page = NULL; goto isolate_abort; } valid_page = page; @@ -971,6 +972,21 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page)) goto isolate_fail;
+ /* + * Be careful not to clear PageLRU until after we're + * sure the page is not being freed elsewhere -- the + * page release code relies on it. + */ + if (unlikely(!get_page_unless_zero(page))) + goto isolate_fail; + + if (__isolate_lru_page_prepare(page, isolate_mode) != 0) + goto isolate_fail_put; + + /* Try isolate the page */ + if (!TestClearPageLRU(page)) + goto isolate_fail_put; + /* If we already hold the lock, we can skip some rechecking */ if (!locked) { locked = compact_lock_irqsave(&pgdat->lru_lock, @@ -983,10 +999,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, goto isolate_abort; }
- /* Recheck PageLRU and PageCompound under lock */ - if (!PageLRU(page)) - goto isolate_fail; - /* * Page become compound since the non-locked check, * and it's on LRU. It can only be a THP so the order @@ -994,16 +1006,13 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, */ if (unlikely(PageCompound(page) && !cc->alloc_contig)) { low_pfn += compound_nr(page) - 1; - goto isolate_fail; + SetPageLRU(page); + goto isolate_fail_put; } }
lruvec = mem_cgroup_page_lruvec(page, pgdat);
- /* Try isolate the page */ - if (__isolate_lru_page(page, isolate_mode) != 0) - goto isolate_fail; - /* The whole page is taken off the LRU; skip the tail pages. */ if (PageCompound(page)) low_pfn += compound_nr(page) - 1; @@ -1032,6 +1041,15 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, }
continue; + +isolate_fail_put: + /* Avoid potential deadlock in freeing page under lru_lock */ + if (locked) { + spin_unlock_irqrestore(&pgdat->lru_lock, flags); + locked = false; + } + put_page(page); + isolate_fail: if (!skip_on_failure) continue; @@ -1068,9 +1086,15 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, if (unlikely(low_pfn > end_pfn)) low_pfn = end_pfn;
+ page = NULL; + isolate_abort: if (locked) spin_unlock_irqrestore(&pgdat->lru_lock, flags); + if (page) { + SetPageLRU(page); + put_page(page); + }
/* * Updated the cached scanner pfn once the pageblock has been scanned diff --git a/mm/vmscan.c b/mm/vmscan.c index 8d62eedfc794..5ada402c8d95 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1545,7 +1545,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone, * * returns 0 on success, -ve errno on failure. */ -int __isolate_lru_page(struct page *page, isolate_mode_t mode) +int __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode) { int ret = -EBUSY;
@@ -1597,22 +1597,9 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode) if ((mode & ISOLATE_UNMAPPED) && page_mapped(page)) return ret;
- if (likely(get_page_unless_zero(page))) { - /* - * Be careful not to clear PageLRU until after we're - * sure the page is not being freed elsewhere -- the - * page release code relies on it. - */ - if (TestClearPageLRU(page)) - ret = 0; - else - put_page(page); - } - - return ret; + return 0; }
- /* * Update LRU sizes after isolating pages. The LRU size updates must * be complete before mem_cgroup_update_lru_size due to a sanity check. @@ -1692,20 +1679,34 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, * only when the page is being freed somewhere else. */ scan += nr_pages; - switch (__isolate_lru_page(page, mode)) { + switch (__isolate_lru_page_prepare(page, mode)) { case 0: + /* + * Be careful not to clear PageLRU until after we're + * sure the page is not being freed elsewhere -- the + * page release code relies on it. + */ + if (unlikely(!get_page_unless_zero(page))) + goto busy; + + if (!TestClearPageLRU(page)) { + /* + * This page may in other isolation path, + * but we still hold lru_lock. + */ + put_page(page); + goto busy; + } + nr_taken += nr_pages; nr_zone_taken[page_zonenum(page)] += nr_pages; list_move(&page->lru, dst); break;
- case -EBUSY: + default: +busy: /* else it is being freed elsewhere */ list_move(&page->lru, src); - continue; - - default: - BUG(); } }
From: Alex Shi alex.shi@linux.alibaba.com
[ Upstream commit c2135f7c570bc274035834848d9bf46ea89ba763 ]
The function just returns 2 results, so using a 'switch' to deal with its result is unnecessary. Also simplify it to a bool func as Vlastimil suggested.
Also remove 'goto' by reusing list_move(), and take Matthew Wilcox's suggestion to update comments in function.
Link: https://lkml.kernel.org/r/728874d7-2d93-4049-68c1-dcc3b2d52ccd@linux.alibaba... Signed-off-by: Alex Shi alex.shi@linux.alibaba.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Acked-by: Vlastimil Babka vbabka@suse.cz Cc: Matthew Wilcox willy@infradead.org Cc: Hugh Dickins hughd@google.com Cc: Yu Zhao yuzhao@google.com Cc: Michal Hocko mhocko@suse.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Stable-dep-of: 829ae0f81ce0 ("mm: migrate: fix THP's mapcount on isolation") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/swap.h | 2 +- mm/compaction.c | 2 +- mm/vmscan.c | 68 ++++++++++++++++++++------------------------ 3 files changed, 33 insertions(+), 39 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h index 3577d3a6ec37..394d5de5d4b4 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -358,7 +358,7 @@ extern void lru_cache_add_inactive_or_unevictable(struct page *page, extern unsigned long zone_reclaimable_pages(struct zone *zone); extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order, gfp_t gfp_mask, nodemask_t *mask); -extern int __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode); +extern bool __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode); extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, unsigned long nr_pages, gfp_t gfp_mask, diff --git a/mm/compaction.c b/mm/compaction.c index ba3e907f03b7..ea46aadc7c21 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -980,7 +980,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, if (unlikely(!get_page_unless_zero(page))) goto isolate_fail;
- if (__isolate_lru_page_prepare(page, isolate_mode) != 0) + if (!__isolate_lru_page_prepare(page, isolate_mode)) goto isolate_fail_put;
/* Try isolate the page */ diff --git a/mm/vmscan.c b/mm/vmscan.c index 5ada402c8d95..00a47845a15b 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1543,19 +1543,17 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone, * page: page to consider * mode: one of the LRU isolation modes defined above * - * returns 0 on success, -ve errno on failure. + * returns true on success, false on failure. */ -int __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode) +bool __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode) { - int ret = -EBUSY; - /* Only take pages on the LRU. */ if (!PageLRU(page)) - return ret; + return false;
/* Compaction should not handle unevictable pages but CMA can do so */ if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE)) - return ret; + return false;
/* * To minimise LRU disruption, the caller can indicate that it only @@ -1568,7 +1566,7 @@ int __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode) if (mode & ISOLATE_ASYNC_MIGRATE) { /* All the caller can do on PageWriteback is block */ if (PageWriteback(page)) - return ret; + return false;
if (PageDirty(page)) { struct address_space *mapping; @@ -1584,20 +1582,20 @@ int __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode) * from the page cache. */ if (!trylock_page(page)) - return ret; + return false;
mapping = page_mapping(page); migrate_dirty = !mapping || mapping->a_ops->migratepage; unlock_page(page); if (!migrate_dirty) - return ret; + return false; } }
if ((mode & ISOLATE_UNMAPPED) && page_mapped(page)) - return ret; + return false;
- return 0; + return true; }
/* @@ -1679,35 +1677,31 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, * only when the page is being freed somewhere else. */ scan += nr_pages; - switch (__isolate_lru_page_prepare(page, mode)) { - case 0: - /* - * Be careful not to clear PageLRU until after we're - * sure the page is not being freed elsewhere -- the - * page release code relies on it. - */ - if (unlikely(!get_page_unless_zero(page))) - goto busy; - - if (!TestClearPageLRU(page)) { - /* - * This page may in other isolation path, - * but we still hold lru_lock. - */ - put_page(page); - goto busy; - } - - nr_taken += nr_pages; - nr_zone_taken[page_zonenum(page)] += nr_pages; - list_move(&page->lru, dst); - break; + if (!__isolate_lru_page_prepare(page, mode)) { + /* It is being freed elsewhere */ + list_move(&page->lru, src); + continue; + } + /* + * Be careful not to clear PageLRU until after we're + * sure the page is not being freed elsewhere -- the + * page release code relies on it. + */ + if (unlikely(!get_page_unless_zero(page))) { + list_move(&page->lru, src); + continue; + }
- default: -busy: - /* else it is being freed elsewhere */ + if (!TestClearPageLRU(page)) { + /* Another thread is already isolating this page */ + put_page(page); list_move(&page->lru, src); + continue; } + + nr_taken += nr_pages; + nr_zone_taken[page_zonenum(page)] += nr_pages; + list_move(&page->lru, dst); }
/*
From: Hugh Dickins hughd@google.com
[ Upstream commit 89f6c88a6ab4a11deb14c270f7f1454cda4f73d6 ]
__isolate_lru_page_prepare() conflates two unrelated functions, with the flags to one disjoint from the flags to the other; and hides some of the important checks outside of isolate_migratepages_block(), where the sequence is better to be visible. It comes from the days of lumpy reclaim, before compaction, when the combination made more sense.
Move what's needed by mm/compaction.c isolate_migratepages_block() inline there, and what's needed by mm/vmscan.c isolate_lru_pages() inline there.
Shorten "isolate_mode" to "mode", so the sequence of conditions is easier to read. Declare a "mapping" variable, to save one call to page_mapping() (but not another: calling again after page is locked is necessary). Simplify isolate_lru_pages() with a "move_to" list pointer.
Link: https://lkml.kernel.org/r/879d62a8-91cc-d3c6-fb3b-69768236df68@google.com Signed-off-by: Hugh Dickins hughd@google.com Acked-by: David Rientjes rientjes@google.com Reviewed-by: Alex Shi alexs@kernel.org Cc: Alexander Duyck alexander.duyck@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Stable-dep-of: 829ae0f81ce0 ("mm: migrate: fix THP's mapcount on isolation") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/swap.h | 1 - mm/compaction.c | 51 +++++++++++++++++++--- mm/vmscan.c | 101 ++++++++----------------------------------- 3 files changed, 62 insertions(+), 91 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h index 394d5de5d4b4..a502928c29c5 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -358,7 +358,6 @@ extern void lru_cache_add_inactive_or_unevictable(struct page *page, extern unsigned long zone_reclaimable_pages(struct zone *zone); extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order, gfp_t gfp_mask, nodemask_t *mask); -extern bool __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode); extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, unsigned long nr_pages, gfp_t gfp_mask, diff --git a/mm/compaction.c b/mm/compaction.c index ea46aadc7c21..57ce6b001b10 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -784,7 +784,7 @@ static bool too_many_isolated(pg_data_t *pgdat) * @cc: Compaction control structure. * @low_pfn: The first PFN to isolate * @end_pfn: The one-past-the-last PFN to isolate, within same pageblock - * @isolate_mode: Isolation mode to be used. + * @mode: Isolation mode to be used. * * Isolate all pages that can be migrated from the range specified by * [low_pfn, end_pfn). The range is expected to be within same pageblock. @@ -798,7 +798,7 @@ static bool too_many_isolated(pg_data_t *pgdat) */ static unsigned long isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, - unsigned long end_pfn, isolate_mode_t isolate_mode) + unsigned long end_pfn, isolate_mode_t mode) { pg_data_t *pgdat = cc->zone->zone_pgdat; unsigned long nr_scanned = 0, nr_isolated = 0; @@ -806,6 +806,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, unsigned long flags = 0; bool locked = false; struct page *page = NULL, *valid_page = NULL; + struct address_space *mapping; unsigned long start_pfn = low_pfn; bool skip_on_failure = false; unsigned long next_skip_pfn = 0; @@ -949,7 +950,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, locked = false; }
- if (!isolate_movable_page(page, isolate_mode)) + if (!isolate_movable_page(page, mode)) goto isolate_success; }
@@ -961,15 +962,15 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, * so avoid taking lru_lock and isolating it unnecessarily in an * admittedly racy check. */ - if (!page_mapping(page) && - page_count(page) > page_mapcount(page)) + mapping = page_mapping(page); + if (!mapping && page_count(page) > page_mapcount(page)) goto isolate_fail;
/* * Only allow to migrate anonymous pages in GFP_NOFS context * because those do not depend on fs locks. */ - if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page)) + if (!(cc->gfp_mask & __GFP_FS) && mapping) goto isolate_fail;
/* @@ -980,9 +981,45 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, if (unlikely(!get_page_unless_zero(page))) goto isolate_fail;
- if (!__isolate_lru_page_prepare(page, isolate_mode)) + /* Only take pages on LRU: a check now makes later tests safe */ + if (!PageLRU(page)) + goto isolate_fail_put; + + /* Compaction might skip unevictable pages but CMA takes them */ + if (!(mode & ISOLATE_UNEVICTABLE) && PageUnevictable(page)) + goto isolate_fail_put; + + /* + * To minimise LRU disruption, the caller can indicate with + * ISOLATE_ASYNC_MIGRATE that it only wants to isolate pages + * it will be able to migrate without blocking - clean pages + * for the most part. PageWriteback would require blocking. + */ + if ((mode & ISOLATE_ASYNC_MIGRATE) && PageWriteback(page)) goto isolate_fail_put;
+ if ((mode & ISOLATE_ASYNC_MIGRATE) && PageDirty(page)) { + bool migrate_dirty; + + /* + * Only pages without mappings or that have a + * ->migratepage callback are possible to migrate + * without blocking. However, we can be racing with + * truncation so it's necessary to lock the page + * to stabilise the mapping as truncation holds + * the page lock until after the page is removed + * from the page cache. + */ + if (!trylock_page(page)) + goto isolate_fail_put; + + mapping = page_mapping(page); + migrate_dirty = !mapping || mapping->a_ops->migratepage; + unlock_page(page); + if (!migrate_dirty) + goto isolate_fail_put; + } + /* Try isolate the page */ if (!TestClearPageLRU(page)) goto isolate_fail_put; diff --git a/mm/vmscan.c b/mm/vmscan.c index 00a47845a15b..9cba0f890b33 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1535,69 +1535,6 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone, return nr_reclaimed; }
-/* - * Attempt to remove the specified page from its LRU. Only take this page - * if it is of the appropriate PageActive status. Pages which are being - * freed elsewhere are also ignored. - * - * page: page to consider - * mode: one of the LRU isolation modes defined above - * - * returns true on success, false on failure. - */ -bool __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode) -{ - /* Only take pages on the LRU. */ - if (!PageLRU(page)) - return false; - - /* Compaction should not handle unevictable pages but CMA can do so */ - if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE)) - return false; - - /* - * To minimise LRU disruption, the caller can indicate that it only - * wants to isolate pages it will be able to operate on without - * blocking - clean pages for the most part. - * - * ISOLATE_ASYNC_MIGRATE is used to indicate that it only wants to pages - * that it is possible to migrate without blocking - */ - if (mode & ISOLATE_ASYNC_MIGRATE) { - /* All the caller can do on PageWriteback is block */ - if (PageWriteback(page)) - return false; - - if (PageDirty(page)) { - struct address_space *mapping; - bool migrate_dirty; - - /* - * Only pages without mappings or that have a - * ->migratepage callback are possible to migrate - * without blocking. However, we can be racing with - * truncation so it's necessary to lock the page - * to stabilise the mapping as truncation holds - * the page lock until after the page is removed - * from the page cache. - */ - if (!trylock_page(page)) - return false; - - mapping = page_mapping(page); - migrate_dirty = !mapping || mapping->a_ops->migratepage; - unlock_page(page); - if (!migrate_dirty) - return false; - } - } - - if ((mode & ISOLATE_UNMAPPED) && page_mapped(page)) - return false; - - return true; -} - /* * Update LRU sizes after isolating pages. The LRU size updates must * be complete before mem_cgroup_update_lru_size due to a sanity check. @@ -1647,11 +1584,11 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, unsigned long skipped = 0; unsigned long scan, total_scan, nr_pages; LIST_HEAD(pages_skipped); - isolate_mode_t mode = (sc->may_unmap ? 0 : ISOLATE_UNMAPPED);
total_scan = 0; scan = 0; while (scan < nr_to_scan && !list_empty(src)) { + struct list_head *move_to = src; struct page *page;
page = lru_to_page(src); @@ -1661,9 +1598,9 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, total_scan += nr_pages;
if (page_zonenum(page) > sc->reclaim_idx) { - list_move(&page->lru, &pages_skipped); nr_skipped[page_zonenum(page)] += nr_pages; - continue; + move_to = &pages_skipped; + goto move; }
/* @@ -1671,37 +1608,34 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, * return with no isolated pages if the LRU mostly contains * ineligible pages. This causes the VM to not reclaim any * pages, triggering a premature OOM. - * - * Account all tail pages of THP. This would not cause - * premature OOM since __isolate_lru_page() returns -EBUSY - * only when the page is being freed somewhere else. + * Account all tail pages of THP. */ scan += nr_pages; - if (!__isolate_lru_page_prepare(page, mode)) { - /* It is being freed elsewhere */ - list_move(&page->lru, src); - continue; - } + + if (!PageLRU(page)) + goto move; + if (!sc->may_unmap && page_mapped(page)) + goto move; + /* * Be careful not to clear PageLRU until after we're * sure the page is not being freed elsewhere -- the * page release code relies on it. */ - if (unlikely(!get_page_unless_zero(page))) { - list_move(&page->lru, src); - continue; - } + if (unlikely(!get_page_unless_zero(page))) + goto move;
if (!TestClearPageLRU(page)) { /* Another thread is already isolating this page */ put_page(page); - list_move(&page->lru, src); - continue; + goto move; }
nr_taken += nr_pages; nr_zone_taken[page_zonenum(page)] += nr_pages; - list_move(&page->lru, dst); + move_to = dst; +move: + list_move(&page->lru, move_to); }
/* @@ -1725,7 +1659,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, } *nr_scanned = total_scan; trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan, - total_scan, skipped, nr_taken, mode, lru); + total_scan, skipped, nr_taken, + sc->may_unmap ? 0 : ISOLATE_UNMAPPED, lru); update_lru_sizes(lruvec, lru, nr_zone_taken); return nr_taken; }
From: Gavin Shan gshan@redhat.com
[ Upstream commit 829ae0f81ce093d674ff2256f66a714753e9ce32 ]
The issue is reported when removing memory through virtio_mem device. The transparent huge page, experienced copy-on-write fault, is wrongly regarded as pinned. The transparent huge page is escaped from being isolated in isolate_migratepages_block(). The transparent huge page can't be migrated and the corresponding memory block can't be put into offline state.
Fix it by replacing page_mapcount() with total_mapcount(). With this, the transparent huge page can be isolated and migrated, and the memory block can be put into offline state. Besides, The page's refcount is increased a bit earlier to avoid the page is released when the check is executed.
Link: https://lkml.kernel.org/r/20221124095523.31061-1-gshan@redhat.com Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations") Signed-off-by: Gavin Shan gshan@redhat.com Reported-by: Zhenyu Zhang zhenyzha@redhat.com Tested-by: Zhenyu Zhang zhenyzha@redhat.com Suggested-by: David Hildenbrand david@redhat.com Acked-by: David Hildenbrand david@redhat.com Cc: Alistair Popple apopple@nvidia.com Cc: Hugh Dickins hughd@google.com Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Matthew Wilcox willy@infradead.org Cc: William Kucharski william.kucharski@oracle.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org [5.7+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/compaction.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c index 57ce6b001b10..54d1041560c7 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -957,29 +957,29 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, goto isolate_fail; }
+ /* + * Be careful not to clear PageLRU until after we're + * sure the page is not being freed elsewhere -- the + * page release code relies on it. + */ + if (unlikely(!get_page_unless_zero(page))) + goto isolate_fail; + /* * Migration will fail if an anonymous page is pinned in memory, * so avoid taking lru_lock and isolating it unnecessarily in an * admittedly racy check. */ mapping = page_mapping(page); - if (!mapping && page_count(page) > page_mapcount(page)) - goto isolate_fail; + if (!mapping && (page_count(page) - 1) > total_mapcount(page)) + goto isolate_fail_put;
/* * Only allow to migrate anonymous pages in GFP_NOFS context * because those do not depend on fs locks. */ if (!(cc->gfp_mask & __GFP_FS) && mapping) - goto isolate_fail; - - /* - * Be careful not to clear PageLRU until after we're - * sure the page is not being freed elsewhere -- the - * page release code relies on it. - */ - if (unlikely(!get_page_unless_zero(page))) - goto isolate_fail; + goto isolate_fail_put;
/* Only take pages on LRU: a check now makes later tests safe */ if (!PageLRU(page))
From: FUKAUMI Naoki naoki@radxa.com
[ Upstream commit 849c19d14940b87332d5d59c7fc581d73f2099fd ]
I2S1 pins are exposed on 40-pin header on Radxa ROCK Pi 4 series. their default function is GPIO, so I2S1 need to be disabled.
Signed-off-by: FUKAUMI Naoki naoki@radxa.com Link: https://lore.kernel.org/r/20220924112812.1219-1-naoki@radxa.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi | 1 - 1 file changed, 1 deletion(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi index f121203081b9..64df64339119 100644 --- a/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi @@ -448,7 +448,6 @@ &i2s1 { rockchip,playback-channels = <2>; rockchip,capture-channels = <2>; - status = "okay"; };
&i2s2 {
From: Sebastian Reichel sebastian.reichel@collabora.com
[ Upstream commit 17b57beafccb4569accbfc8c11390744cf59c021 ]
Fix the node name for hym8563 in all arm rockchip devicetrees.
Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Link: https://lore.kernel.org/r/20221024165549.74574-4-sebastian.reichel@collabora... Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/rk3036-evb.dts | 2 +- arch/arm/boot/dts/rk3288-evb-act8846.dts | 2 +- arch/arm/boot/dts/rk3288-firefly.dtsi | 2 +- arch/arm/boot/dts/rk3288-miqi.dts | 2 +- arch/arm/boot/dts/rk3288-rock2-square.dts | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/arm/boot/dts/rk3036-evb.dts b/arch/arm/boot/dts/rk3036-evb.dts index 2a7e6624efb9..ea23ba98625e 100644 --- a/arch/arm/boot/dts/rk3036-evb.dts +++ b/arch/arm/boot/dts/rk3036-evb.dts @@ -31,7 +31,7 @@ &i2c1 { status = "okay";
- hym8563: hym8563@51 { + hym8563: rtc@51 { compatible = "haoyu,hym8563"; reg = <0x51>; #clock-cells = <0>; diff --git a/arch/arm/boot/dts/rk3288-evb-act8846.dts b/arch/arm/boot/dts/rk3288-evb-act8846.dts index be695b8c1f67..8a635c243127 100644 --- a/arch/arm/boot/dts/rk3288-evb-act8846.dts +++ b/arch/arm/boot/dts/rk3288-evb-act8846.dts @@ -54,7 +54,7 @@ vin-supply = <&vcc_sys>; };
- hym8563@51 { + rtc@51 { compatible = "haoyu,hym8563"; reg = <0x51>;
diff --git a/arch/arm/boot/dts/rk3288-firefly.dtsi b/arch/arm/boot/dts/rk3288-firefly.dtsi index 7fb582302b32..c560afe3af78 100644 --- a/arch/arm/boot/dts/rk3288-firefly.dtsi +++ b/arch/arm/boot/dts/rk3288-firefly.dtsi @@ -233,7 +233,7 @@ vin-supply = <&vcc_sys>; };
- hym8563: hym8563@51 { + hym8563: rtc@51 { compatible = "haoyu,hym8563"; reg = <0x51>; #clock-cells = <0>; diff --git a/arch/arm/boot/dts/rk3288-miqi.dts b/arch/arm/boot/dts/rk3288-miqi.dts index cf54d5ffff2f..fe265a834e8e 100644 --- a/arch/arm/boot/dts/rk3288-miqi.dts +++ b/arch/arm/boot/dts/rk3288-miqi.dts @@ -157,7 +157,7 @@ vin-supply = <&vcc_sys>; };
- hym8563: hym8563@51 { + hym8563: rtc@51 { compatible = "haoyu,hym8563"; reg = <0x51>; #clock-cells = <0>; diff --git a/arch/arm/boot/dts/rk3288-rock2-square.dts b/arch/arm/boot/dts/rk3288-rock2-square.dts index c4d1d142d8c6..d5ef99ebbddc 100644 --- a/arch/arm/boot/dts/rk3288-rock2-square.dts +++ b/arch/arm/boot/dts/rk3288-rock2-square.dts @@ -165,7 +165,7 @@ };
&i2c0 { - hym8563: hym8563@51 { + hym8563: rtc@51 { compatible = "haoyu,hym8563"; reg = <0x51>; #clock-cells = <0>;
From: Johan Jonker jbx6244@gmail.com
[ Upstream commit dd847fe34cdf1e89afed1af24986359f13082bfb ]
Fix ir-receiver node names on Rockchip boards, so that they match with regex: '^ir(-receiver)?(@[a-f0-9]+)?$'
Signed-off-by: Johan Jonker jbx6244@gmail.com Link: https://lore.kernel.org/r/ea5af279-f44c-afea-023d-bb37f5a0d58d@gmail.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/rk3188-radxarock.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/rk3188-radxarock.dts b/arch/arm/boot/dts/rk3188-radxarock.dts index b0fef82c0a71..39b913f8d701 100644 --- a/arch/arm/boot/dts/rk3188-radxarock.dts +++ b/arch/arm/boot/dts/rk3188-radxarock.dts @@ -67,7 +67,7 @@ #sound-dai-cells = <0>; };
- ir_recv: gpio-ir-receiver { + ir_recv: ir-receiver { compatible = "gpio-ir-receiver"; gpios = <&gpio0 RK_PB2 GPIO_ACTIVE_LOW>; pinctrl-names = "default";
From: Johan Jonker jbx6244@gmail.com
[ Upstream commit de0d04b9780a23eb928aedfb6f981285f78d58e5 ]
Fix ir-receiver node names on Rockchip boards, so that they match with regex: '^ir(-receiver)?(@[a-f0-9]+)?$'
Signed-off-by: Johan Jonker jbx6244@gmail.com Link: https://lore.kernel.org/r/e9764253-8ce8-150b-4820-41f03f845469@gmail.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/rockchip/rk3308-roc-cc.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3308-roc-cc.dts b/arch/arm64/boot/dts/rockchip/rk3308-roc-cc.dts index fbcb9531cc70..213c0759c4b8 100644 --- a/arch/arm64/boot/dts/rockchip/rk3308-roc-cc.dts +++ b/arch/arm64/boot/dts/rockchip/rk3308-roc-cc.dts @@ -13,7 +13,7 @@ stdout-path = "serial2:1500000n8"; };
- ir_rx { + ir-receiver { compatible = "gpio-ir-receiver"; gpios = <&gpio0 RK_PC0 GPIO_ACTIVE_HIGH>; pinctrl-names = "default";
From: Johan Jonker jbx6244@gmail.com
[ Upstream commit 11871e20bcb23c00966e785a124fb72bc8340af4 ]
The lcdc1-rgb24 node name is out of line with the rest of the rk3188 lcdc1 node, so fix it.
Signed-off-by: Johan Jonker jbx6244@gmail.com Link: https://lore.kernel.org/r/7b9c0a6f-626b-07e8-ae74-7e0f08b8d241@gmail.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/rk3188.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/rk3188.dtsi b/arch/arm/boot/dts/rk3188.dtsi index b6bde9d12c2b..a837a9a34e3e 100644 --- a/arch/arm/boot/dts/rk3188.dtsi +++ b/arch/arm/boot/dts/rk3188.dtsi @@ -402,7 +402,7 @@ rockchip,pins = <2 RK_PD3 1 &pcfg_pull_none>; };
- lcdc1_rgb24: ldcd1-rgb24 { + lcdc1_rgb24: lcdc1-rgb24 { rockchip,pins = <2 RK_PA0 1 &pcfg_pull_none>, <2 RK_PA1 1 &pcfg_pull_none>, <2 RK_PA2 1 &pcfg_pull_none>,
From: Tomislav Novak tnovak@fb.com
[ Upstream commit 612695bccfdbd52004551308a55bae410e7cd22f ]
Store the frame address where arm_get_current_stackframe() looks for it (ARM_r7 instead of ARM_fp if CONFIG_THUMB2_KERNEL=y). Otherwise frame->fp gets set to 0, causing unwind_frame() to fail.
# bpftrace -e 't:sched:sched_switch { @[kstack] = count(); exit(); }' Attaching 1 probe... @[ __schedule+1059 ]: 1
A typical first unwind instruction is 0x97 (SP = R7), so after executing it SP ends up being 0 and -URC_FAILURE is returned.
unwind_frame(pc = ac9da7d7 lr = 00000000 sp = c69bdda0 fp = 00000000) unwind_find_idx(ac9da7d7) unwind_exec_insn: insn = 00000097 unwind_exec_insn: fp = 00000000 sp = 00000000 lr = 00000000 pc = 00000000
With this patch:
# bpftrace -e 't:sched:sched_switch { @[kstack] = count(); exit(); }' Attaching 1 probe... @[ __schedule+1059 __schedule+1059 schedule+79 schedule_hrtimeout_range_clock+163 schedule_hrtimeout_range+17 ep_poll+471 SyS_epoll_wait+111 sys_epoll_pwait+231 __ret_fast_syscall+1 ]: 1
Link: https://lore.kernel.org/r/20220920230728.2617421-1-tnovak@fb.com/
Reviewed-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Tomislav Novak tnovak@fb.com Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/include/asm/perf_event.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/perf_event.h b/arch/arm/include/asm/perf_event.h index fe87397c3d8c..bdbc1e590891 100644 --- a/arch/arm/include/asm/perf_event.h +++ b/arch/arm/include/asm/perf_event.h @@ -17,7 +17,7 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
#define perf_arch_fetch_caller_regs(regs, __ip) { \ (regs)->ARM_pc = (__ip); \ - (regs)->ARM_fp = (unsigned long) __builtin_frame_address(0); \ + frame_pointer((regs)) = (unsigned long) __builtin_frame_address(0); \ (regs)->ARM_sp = current_stack_pointer; \ (regs)->ARM_cpsr = SVC_MODE; \ }
From: Giulio Benetti giulio.benetti@benettiengineering.com
[ Upstream commit 340a982825f76f1cff0daa605970fe47321b5ee7 ]
Actually in no-MMU SoCs(i.e. i.MXRT) ZERO_PAGE(vaddr) expands to ``` virt_to_page(0) ``` that in order expands to: ``` pfn_to_page(virt_to_pfn(0)) ``` and then virt_to_pfn(0) to: ``` ((((unsigned long)(0) - PAGE_OFFSET) >> PAGE_SHIFT) + PHYS_PFN_OFFSET) ``` where PAGE_OFFSET and PHYS_PFN_OFFSET are the DRAM offset(0x80000000) and PAGE_SHIFT is 12. This way we obtain 16MB(0x01000000) summed to the base of DRAM(0x80000000). When ZERO_PAGE(0) is then used, for example in bio_add_page(), the page gets an address that is out of DRAM bounds. So instead of using fake virtual page 0 let's allocate a dedicated zero_page during paging_init() and assign it to a global 'struct page * empty_zero_page' the same way mmu.c does and it's the same approach used in m68k with commit dc068f462179 as discussed here[0]. Then let's move ZERO_PAGE() definition to the top of pgtable.h to be in common between mmu.c and nommu.c.
[0]: https://lore.kernel.org/linux-m68k/2a462b23-5b8e-bbf4-ec7d-778434a3b9d7@goog... ad140743174d6b3070364d3c9a5179b
Signed-off-by: Giulio Benetti giulio.benetti@benettiengineering.com Reviewed-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/include/asm/pgtable-nommu.h | 6 ------ arch/arm/include/asm/pgtable.h | 16 +++++++++------- arch/arm/mm/nommu.c | 19 +++++++++++++++++++ 3 files changed, 28 insertions(+), 13 deletions(-)
diff --git a/arch/arm/include/asm/pgtable-nommu.h b/arch/arm/include/asm/pgtable-nommu.h index d16aba48fa0a..090011394477 100644 --- a/arch/arm/include/asm/pgtable-nommu.h +++ b/arch/arm/include/asm/pgtable-nommu.h @@ -44,12 +44,6 @@
typedef pte_t *pte_addr_t;
-/* - * ZERO_PAGE is a global shared page that is always zero: used - * for zero-mapped memory areas etc.. - */ -#define ZERO_PAGE(vaddr) (virt_to_page(0)) - /* * Mark the prot value as uncacheable and unbufferable. */ diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h index c02f24400369..d38d503493cb 100644 --- a/arch/arm/include/asm/pgtable.h +++ b/arch/arm/include/asm/pgtable.h @@ -10,6 +10,15 @@ #include <linux/const.h> #include <asm/proc-fns.h>
+#ifndef __ASSEMBLY__ +/* + * ZERO_PAGE is a global shared page that is always zero: used + * for zero-mapped memory areas etc.. + */ +extern struct page *empty_zero_page; +#define ZERO_PAGE(vaddr) (empty_zero_page) +#endif + #ifndef CONFIG_MMU
#include <asm-generic/pgtable-nopud.h> @@ -156,13 +165,6 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn, #define __S111 __PAGE_SHARED_EXEC
#ifndef __ASSEMBLY__ -/* - * ZERO_PAGE is a global shared page that is always zero: used - * for zero-mapped memory areas etc.. - */ -extern struct page *empty_zero_page; -#define ZERO_PAGE(vaddr) (empty_zero_page) -
extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
diff --git a/arch/arm/mm/nommu.c b/arch/arm/mm/nommu.c index 8b3d7191e2b8..959f05701738 100644 --- a/arch/arm/mm/nommu.c +++ b/arch/arm/mm/nommu.c @@ -26,6 +26,13 @@
unsigned long vectors_base;
+/* + * empty_zero_page is a special page that is used for + * zero-initialized data and COW. + */ +struct page *empty_zero_page; +EXPORT_SYMBOL(empty_zero_page); + #ifdef CONFIG_ARM_MPU struct mpu_rgn_info mpu_rgn_info; #endif @@ -148,9 +155,21 @@ void __init adjust_lowmem_bounds(void) */ void __init paging_init(const struct machine_desc *mdesc) { + void *zero_page; + early_trap_init((void *)vectors_base); mpu_setup(); + + /* allocate the zero page. */ + zero_page = memblock_alloc(PAGE_SIZE, PAGE_SIZE); + if (!zero_page) + panic("%s: Failed to allocate %lu bytes align=0x%lx\n", + __func__, PAGE_SIZE, PAGE_SIZE); + bootmem_init(); + + empty_zero_page = virt_to_page(zero_page); + flush_dcache_page(empty_zero_page); }
/*
From: Chancel Liu chancel.liu@nxp.com
[ Upstream commit 3ca507bf99611c82dafced73e921c1b10ee12869 ]
DSPCLK_DIV field in WM8962_CLOCKING1 register is used to generate correct frequency of LRCLK and BCLK. Sometimes the read-only value can't be updated timely after enabling SYSCLK. This results in wrong calculation values. Delay is introduced here to wait for newest value from register. The time of the delay should be at least 500~1000us according to test.
Signed-off-by: Chancel Liu chancel.liu@nxp.com Acked-by: Charles Keepax ckeepax@opensource.cirrus.com Link: https://lore.kernel.org/r/20221109121354.123958-1-chancel.liu@nxp.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/wm8962.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/sound/soc/codecs/wm8962.c b/sound/soc/codecs/wm8962.c index 21574447650c..57aeded978c2 100644 --- a/sound/soc/codecs/wm8962.c +++ b/sound/soc/codecs/wm8962.c @@ -2489,6 +2489,14 @@ static void wm8962_configure_bclk(struct snd_soc_component *component) snd_soc_component_update_bits(component, WM8962_CLOCKING2, WM8962_SYSCLK_ENA_MASK, WM8962_SYSCLK_ENA);
+ /* DSPCLK_DIV field in WM8962_CLOCKING1 register is used to generate + * correct frequency of LRCLK and BCLK. Sometimes the read-only value + * can't be updated timely after enabling SYSCLK. This results in wrong + * calculation values. Delay is introduced here to wait for newest + * value from register. The time of the delay should be at least + * 500~1000us according to test. + */ + usleep_range(500, 1000); dspclk = snd_soc_component_read(component, WM8962_CLOCKING1);
if (snd_soc_component_get_bias_level(component) != SND_SOC_BIAS_ON)
From: Johan Jonker jbx6244@gmail.com
[ Upstream commit da74858a475782a3f16470907814c8cc5950ad68 ]
The clock source and the sched_clock provided by the arm_global_timer on Rockchip rk3066a/rk3188 are quite unstable because their rates depend on the CPU frequency.
Recent changes to the arm_global_timer driver makes it impossible to use.
On the other side, the arm_global_timer has a higher rating than the ROCKCHIP_TIMER, it will be selected by default by the time framework while we want to use the stable Rockchip clock source.
Keep the arm_global_timer disabled in order to have the DW_APB_TIMER (rk3066a) or ROCKCHIP_TIMER (rk3188) selected by default.
Signed-off-by: Johan Jonker jbx6244@gmail.com Link: https://lore.kernel.org/r/f275ca8d-fd0a-26e5-b978-b7f3df815e0a@gmail.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/rk3188.dtsi | 1 - arch/arm/boot/dts/rk3xxx.dtsi | 7 +++++++ 2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/rk3188.dtsi b/arch/arm/boot/dts/rk3188.dtsi index a837a9a34e3e..ddf23748ead4 100644 --- a/arch/arm/boot/dts/rk3188.dtsi +++ b/arch/arm/boot/dts/rk3188.dtsi @@ -630,7 +630,6 @@
&global_timer { interrupts = <GIC_PPI 11 (GIC_CPU_MASK_SIMPLE(4) | IRQ_TYPE_EDGE_RISING)>; - status = "disabled"; };
&local_timer { diff --git a/arch/arm/boot/dts/rk3xxx.dtsi b/arch/arm/boot/dts/rk3xxx.dtsi index 859a7477909f..5edc46a5585c 100644 --- a/arch/arm/boot/dts/rk3xxx.dtsi +++ b/arch/arm/boot/dts/rk3xxx.dtsi @@ -111,6 +111,13 @@ reg = <0x1013c200 0x20>; interrupts = <GIC_PPI 11 (GIC_CPU_MASK_SIMPLE(2) | IRQ_TYPE_EDGE_RISING)>; clocks = <&cru CORE_PERI>; + status = "disabled"; + /* The clock source and the sched_clock provided by the arm_global_timer + * on Rockchip rk3066a/rk3188 are quite unstable because their rates + * depend on the CPU frequency. + * Keep the arm_global_timer disabled in order to have the + * DW_APB_TIMER (rk3066a) or ROCKCHIP_TIMER (rk3188) selected by default. + */ };
local_timer: local-timer@1013c600 {
From: GUO Zihua guozihua@huawei.com
[ Upstream commit 6854fadbeee10891ed74246bdc05031906b6c8cf ]
Cleanup hardcoded header sizes to use P9_HDRSZ instead of '7'
Link: https://lkml.kernel.org/r/20221117091159.31533-4-guozihua@huawei.com Signed-off-by: GUO Zihua guozihua@huawei.com Reviewed-by: Christian Schoenebeck linux_oss@crudebyte.com [Dominique: commit message adjusted to make sense after offset size adjustment got removed] Signed-off-by: Dominique Martinet asmadeus@codewreck.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/9p/trans_fd.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c index deb66635f0f3..e070a0b8e5ca 100644 --- a/net/9p/trans_fd.c +++ b/net/9p/trans_fd.c @@ -118,7 +118,7 @@ struct p9_conn { struct list_head unsent_req_list; struct p9_req_t *rreq; struct p9_req_t *wreq; - char tmp_buf[7]; + char tmp_buf[P9_HDRSZ]; struct p9_fcall rc; int wpos; int wsize; @@ -291,7 +291,7 @@ static void p9_read_work(struct work_struct *work) if (!m->rc.sdata) { m->rc.sdata = m->tmp_buf; m->rc.offset = 0; - m->rc.capacity = 7; /* start by reading header */ + m->rc.capacity = P9_HDRSZ; /* start by reading header */ }
clear_bit(Rpending, &m->wsched); @@ -314,7 +314,7 @@ static void p9_read_work(struct work_struct *work) p9_debug(P9_DEBUG_TRANS, "got new header\n");
/* Header size */ - m->rc.size = 7; + m->rc.size = P9_HDRSZ; err = p9_parse_header(&m->rc, &m->rc.size, NULL, NULL, 0); if (err) { p9_debug(P9_DEBUG_ERROR,
From: Konrad Dybcio konrad.dybcio@linaro.org
[ Upstream commit 0b24dfa587c6cc7484cfb170da5c7dd73451f670 ]
Sony's downstream driver [1], among some other changes, adds a seemingly random 10ms usleep_range, which turned out to be necessary for the hardware to function properly on at least Sony Xperia 1 IV. Without this, I2C transactions with the SLG51000 straight up fail.
Relax (10-10ms -> 10-11ms) and add the aforementioned sleep to make sure the hardware has some time to wake up.
(nagara-2.0.0-mlc/vendor/semc/hardware/camera-kernel-module/) [1] https://developer.sony.com/file/download/open-source-archive-for-64-0-m-4-29...
Signed-off-by: Konrad Dybcio konrad.dybcio@linaro.org Link: https://lore.kernel.org/r/20221118131035.54874-1-konrad.dybcio@linaro.org Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/slg51000-regulator.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/regulator/slg51000-regulator.c b/drivers/regulator/slg51000-regulator.c index 75a941fb3c2b..1b2eee95ad3f 100644 --- a/drivers/regulator/slg51000-regulator.c +++ b/drivers/regulator/slg51000-regulator.c @@ -457,6 +457,8 @@ static int slg51000_i2c_probe(struct i2c_client *client) chip->cs_gpiod = cs_gpiod; }
+ usleep_range(10000, 11000); + i2c_set_clientdata(client, chip); chip->chip_irq = client->irq; chip->dev = dev;
From: Kees Cook keescook@chromium.org
[ Upstream commit 05530ef7cf7c7d700f6753f058999b1b5099a026 ]
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG), indirect call targets are validated against the expected function pointer prototype to make sure the call target is valid to help mitigate ROP attacks. If they are not identical, there is a failure at run time, which manifests as either a kernel panic or thread getting killed.
seq_copy_in_user() and seq_copy_in_kernel() did not have prototypes matching snd_seq_dump_func_t. Adjust this and remove the casts. There are not resulting binary output differences.
This was found as a result of Clang's new -Wcast-function-type-strict flag, which is more sensitive than the simpler -Wcast-function-type, which only checks for type width mismatches.
Reported-by: kernel test robot lkp@intel.com Link: https://lore.kernel.org/lkml/202211041527.HD8TLSE1-lkp@intel.com Cc: Jaroslav Kysela perex@perex.cz Cc: Takashi Iwai tiwai@suse.com Cc: "Gustavo A. R. Silva" gustavoars@kernel.org Cc: alsa-devel@alsa-project.org Signed-off-by: Kees Cook keescook@chromium.org Link: https://lore.kernel.org/r/20221118232346.never.380-kees@kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/core/seq/seq_memory.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/sound/core/seq/seq_memory.c b/sound/core/seq/seq_memory.c index 65db1a7c77b7..bb76a2dd0a2f 100644 --- a/sound/core/seq/seq_memory.c +++ b/sound/core/seq/seq_memory.c @@ -112,15 +112,19 @@ EXPORT_SYMBOL(snd_seq_dump_var_event); * expand the variable length event to linear buffer space. */
-static int seq_copy_in_kernel(char **bufptr, const void *src, int size) +static int seq_copy_in_kernel(void *ptr, void *src, int size) { + char **bufptr = ptr; + memcpy(*bufptr, src, size); *bufptr += size; return 0; }
-static int seq_copy_in_user(char __user **bufptr, const void *src, int size) +static int seq_copy_in_user(void *ptr, void *src, int size) { + char __user **bufptr = ptr; + if (copy_to_user(*bufptr, src, size)) return -EFAULT; *bufptr += size; @@ -149,8 +153,7 @@ int snd_seq_expand_var_event(const struct snd_seq_event *event, int count, char return newlen; } err = snd_seq_dump_var_event(event, - in_kernel ? (snd_seq_dump_func_t)seq_copy_in_kernel : - (snd_seq_dump_func_t)seq_copy_in_user, + in_kernel ? seq_copy_in_kernel : seq_copy_in_user, &buf); return err < 0 ? err : newlen; }
From: Filipe Manana fdmanana@suse.com
[ Upstream commit a11452a3709e217492798cf3686ac2cc8eb3fb51 ]
When trying to see if we can clone a file range, there are cases where we end up sending two write operations in case the inode from the source root has an i_size that is not sector size aligned and the length from the current offset to its i_size is less than the remaining length we are trying to clone.
Issuing two write operations when we could instead issue a single write operation is not incorrect. However it is not optimal, specially if the extents are compressed and the flag BTRFS_SEND_FLAG_COMPRESSED was passed to the send ioctl. In that case we can end up sending an encoded write with an offset that is not sector size aligned, which makes the receiver fallback to decompressing the data and writing it using regular buffered IO (so re-compressing the data in case the fs is mounted with compression enabled), because encoded writes fail with -EINVAL when an offset is not sector size aligned.
The following example, which triggered a bug in the receiver code for the fallback logic of decompressing + regular buffer IO and is fixed by the patchset referred in a Link at the bottom of this changelog, is an example where we have the non-optimal behaviour due to an unaligned encoded write:
$ cat test.sh #!/bin/bash
DEV=/dev/sdj MNT=/mnt/sdj
mkfs.btrfs -f $DEV > /dev/null mount -o compress $DEV $MNT
# File foo has a size of 33K, not aligned to the sector size. xfs_io -f -c "pwrite -S 0xab 0 33K" $MNT/foo
xfs_io -f -c "pwrite -S 0xcd 0 64K" $MNT/bar
# Now clone the first 32K of file bar into foo at offset 0. xfs_io -c "reflink $MNT/bar 0 0 32K" $MNT/foo
# Snapshot the default subvolume and create a full send stream (v2). btrfs subvolume snapshot -r $MNT $MNT/snap
btrfs send --compressed-data -f /tmp/test.send $MNT/snap
echo -e "\nFile bar in the original filesystem:" od -A d -t x1 $MNT/snap/bar
umount $MNT mkfs.btrfs -f $DEV > /dev/null mount $DEV $MNT
echo -e "\nReceiving stream in a new filesystem..." btrfs receive -f /tmp/test.send $MNT
echo -e "\nFile bar in the new filesystem:" od -A d -t x1 $MNT/snap/bar
umount $MNT
Before this patch, the send stream included one regular write and one encoded write for file 'bar', with the later being not sector size aligned and causing the receiver to fallback to decompression + buffered writes. The output of the btrfs receive command in verbose mode (-vvv):
(...) mkfile o258-7-0 rename o258-7-0 -> bar utimes clone bar - source=foo source offset=0 offset=0 length=32768 write bar - offset=32768 length=1024 encoded_write bar - offset=33792, len=4096, unencoded_offset=33792, unencoded_file_len=31744, unencoded_len=65536, compression=1, encryption=0 encoded_write bar - falling back to decompress and write due to errno 22 ("Invalid argument") (...)
This patch avoids the regular write followed by an unaligned encoded write so that we end up sending a single encoded write that is aligned. So after this patch the stream content is (output of btrfs receive -vvv):
(...) mkfile o258-7-0 rename o258-7-0 -> bar utimes clone bar - source=foo source offset=0 offset=0 length=32768 encoded_write bar - offset=32768, len=4096, unencoded_offset=32768, unencoded_file_len=32768, unencoded_len=65536, compression=1, encryption=0 (...)
So we get more optimal behaviour and avoid the silent data loss bug in versions of btrfs-progs affected by the bug referred by the Link tag below (btrfs-progs v5.19, v5.19.1, v6.0 and v6.0.1).
Link: https://lore.kernel.org/linux-btrfs/cover.1668529099.git.fdmanana@suse.com/ Reviewed-by: Boris Burkov boris@bur.io Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/send.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 6b80dee17f49..4a6ba0997e39 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -5398,6 +5398,7 @@ static int clone_range(struct send_ctx *sctx, u64 ext_len; u64 clone_len; u64 clone_data_offset; + bool crossed_src_i_size = false;
if (slot >= btrfs_header_nritems(leaf)) { ret = btrfs_next_leaf(clone_root->root, path); @@ -5454,8 +5455,10 @@ static int clone_range(struct send_ctx *sctx, if (key.offset >= clone_src_i_size) break;
- if (key.offset + ext_len > clone_src_i_size) + if (key.offset + ext_len > clone_src_i_size) { ext_len = clone_src_i_size - key.offset; + crossed_src_i_size = true; + }
clone_data_offset = btrfs_file_extent_offset(leaf, ei); if (btrfs_file_extent_disk_bytenr(leaf, ei) == disk_byte) { @@ -5515,6 +5518,25 @@ static int clone_range(struct send_ctx *sctx, ret = send_clone(sctx, offset, clone_len, clone_root); } + } else if (crossed_src_i_size && clone_len < len) { + /* + * If we are at i_size of the clone source inode and we + * can not clone from it, terminate the loop. This is + * to avoid sending two write operations, one with a + * length matching clone_len and the final one after + * this loop with a length of len - clone_len. + * + * When using encoded writes (BTRFS_SEND_FLAG_COMPRESSED + * was passed to the send ioctl), this helps avoid + * sending an encoded write for an offset that is not + * sector size aligned, in case the i_size of the source + * inode is not sector size aligned. That will make the + * receiver fallback to decompression of the data and + * writing it using regular buffered IO, therefore while + * not incorrect, it's not optimal due decompression and + * possible re-compression at the receiver. + */ + break; } else { ret = send_extent_data(sctx, offset, clone_len); }
From: Srinivasa Rao Mandadapu quic_srivasam@quicinc.com
[ Upstream commit db8f91d424fe0ea6db337aca8bc05908bbce1498 ]
Add NULL check in dpcm_be_reparent API, to handle kernel NULL pointer dereference error. The issue occurred in fuzzing test.
Signed-off-by: Srinivasa Rao Mandadapu quic_srivasam@quicinc.com Link: https://lore.kernel.org/r/1669098673-29703-1-git-send-email-quic_srivasam@qu... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/soc-pcm.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c index 0e2261ee07b6..fb874f924bbe 100644 --- a/sound/soc/soc-pcm.c +++ b/sound/soc/soc-pcm.c @@ -1154,6 +1154,8 @@ static void dpcm_be_reparent(struct snd_soc_pcm_runtime *fe, return;
be_substream = snd_soc_dpcm_get_substream(be, stream); + if (!be_substream) + return;
for_each_dpcm_fe(be, stream, dpcm) { if (dpcm->fe == fe)
From: Andreas Kemnade andreas@kemnade.info
[ Upstream commit 31a6297b89aabc81b274c093a308a7f5b55081a7 ]
Status is reported as always off in the 6032 case. Status reporting now matches the logic in the setters. Once of the differences to the 6030 is that there are no groups, therefore the state needs to be read out in the lower bits.
Signed-off-by: Andreas Kemnade andreas@kemnade.info Link: https://lore.kernel.org/r/20221120221208.3093727-3-andreas@kemnade.info Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/twl6030-regulator.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/regulator/twl6030-regulator.c b/drivers/regulator/twl6030-regulator.c index 7c7e3648ea4b..f3856750944f 100644 --- a/drivers/regulator/twl6030-regulator.c +++ b/drivers/regulator/twl6030-regulator.c @@ -67,6 +67,7 @@ struct twlreg_info { #define TWL6030_CFG_STATE_SLEEP 0x03 #define TWL6030_CFG_STATE_GRP_SHIFT 5 #define TWL6030_CFG_STATE_APP_SHIFT 2 +#define TWL6030_CFG_STATE_MASK 0x03 #define TWL6030_CFG_STATE_APP_MASK (0x03 << TWL6030_CFG_STATE_APP_SHIFT) #define TWL6030_CFG_STATE_APP(v) (((v) & TWL6030_CFG_STATE_APP_MASK) >>\ TWL6030_CFG_STATE_APP_SHIFT) @@ -128,13 +129,14 @@ static int twl6030reg_is_enabled(struct regulator_dev *rdev) if (grp < 0) return grp; grp &= P1_GRP_6030; + val = twlreg_read(info, TWL_MODULE_PM_RECEIVER, VREG_STATE); + val = TWL6030_CFG_STATE_APP(val); } else { + val = twlreg_read(info, TWL_MODULE_PM_RECEIVER, VREG_STATE); + val &= TWL6030_CFG_STATE_MASK; grp = 1; }
- val = twlreg_read(info, TWL_MODULE_PM_RECEIVER, VREG_STATE); - val = TWL6030_CFG_STATE_APP(val); - return grp && (val == TWL6030_CFG_STATE_ON); }
@@ -187,7 +189,12 @@ static int twl6030reg_get_status(struct regulator_dev *rdev)
val = twlreg_read(info, TWL_MODULE_PM_RECEIVER, VREG_STATE);
- switch (TWL6030_CFG_STATE_APP(val)) { + if (info->features & TWL6032_SUBCLASS) + val &= TWL6030_CFG_STATE_MASK; + else + val = TWL6030_CFG_STATE_APP(val); + + switch (val) { case TWL6030_CFG_STATE_ON: return REGULATOR_STATUS_NORMAL;
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
[ Upstream commit a6a00d7e8ffd78d1cdb7a43f1278f081038c638f ]
A kernel built with syzbot's config file reported that
scr_memcpyw(q, save, array3_size(logo_lines, new_cols, 2))
causes uninitialized "save" to be copied.
---------- [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0 [drm] Initialized vkms 1.0.0 20180514 for vkms on minor 1 Console: switching to colour frame buffer device 128x48 ===================================================== BUG: KMSAN: uninit-value in do_update_region+0x4b8/0xba0 do_update_region+0x4b8/0xba0 update_region+0x40d/0x840 fbcon_switch+0x3364/0x35e0 redraw_screen+0xae3/0x18a0 do_bind_con_driver+0x1cb3/0x1df0 do_take_over_console+0x11cb/0x13f0 fbcon_fb_registered+0xacc/0xfd0 register_framebuffer+0x1179/0x1320 __drm_fb_helper_initial_config_and_unlock+0x23ad/0x2b40 drm_fbdev_client_hotplug+0xbea/0xda0 drm_fbdev_generic_setup+0x65e/0x9d0 vkms_init+0x9f3/0xc76 (...snipped...)
Uninit was stored to memory at: fbcon_prepare_logo+0x143b/0x1940 fbcon_init+0x2c1b/0x31c0 visual_init+0x3e7/0x820 do_bind_con_driver+0x14a4/0x1df0 do_take_over_console+0x11cb/0x13f0 fbcon_fb_registered+0xacc/0xfd0 register_framebuffer+0x1179/0x1320 __drm_fb_helper_initial_config_and_unlock+0x23ad/0x2b40 drm_fbdev_client_hotplug+0xbea/0xda0 drm_fbdev_generic_setup+0x65e/0x9d0 vkms_init+0x9f3/0xc76 (...snipped...)
Uninit was created at: __kmem_cache_alloc_node+0xb69/0x1020 __kmalloc+0x379/0x680 fbcon_prepare_logo+0x704/0x1940 fbcon_init+0x2c1b/0x31c0 visual_init+0x3e7/0x820 do_bind_con_driver+0x14a4/0x1df0 do_take_over_console+0x11cb/0x13f0 fbcon_fb_registered+0xacc/0xfd0 register_framebuffer+0x1179/0x1320 __drm_fb_helper_initial_config_and_unlock+0x23ad/0x2b40 drm_fbdev_client_hotplug+0xbea/0xda0 drm_fbdev_generic_setup+0x65e/0x9d0 vkms_init+0x9f3/0xc76 (...snipped...)
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc4-00356-g8f2975c2bb4c #924 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 ----------
Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch Link: https://patchwork.freedesktop.org/patch/msgid/cad03d25-0ea0-32c4-8173-fd1895... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/video/fbdev/core/fbcon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c index 2618d3beef64..27828435dd4f 100644 --- a/drivers/video/fbdev/core/fbcon.c +++ b/drivers/video/fbdev/core/fbcon.c @@ -609,7 +609,7 @@ static void fbcon_prepare_logo(struct vc_data *vc, struct fb_info *info, if (scr_readw(r) != vc->vc_video_erase_char) break; if (r != q && new_rows >= rows + logo_lines) { - save = kmalloc(array3_size(logo_lines, new_cols, 2), + save = kzalloc(array3_size(logo_lines, new_cols, 2), GFP_KERNEL); if (save) { int i = cols < new_cols ? cols : new_cols;
From: Thinh Nguyen Thinh.Nguyen@synopsys.com
[ Upstream commit 3aa07f72894d209fcf922ad686cbb28cf005aaad ]
If there's a disconnection while operating in eSS, there may be a delay in VBUS drop response from the connector. In that case, the internal link state may drop to operate in usb2 speed while the controller thinks the VBUS is still high. The driver must make sure to disable GUSB2PHYCFG.SUSPHY when sending endpoint command while in usb2 speed. The End Transfer command may be called, and only that command needs to go through at this point. Let's keep it simple and unconditionally disable GUSB2PHYCFG.SUSPHY whenever we issue the command.
This scenario is not seen in real hardware. In a rare case, our prototype type-c controller/interface may have a slow response triggerring this issue.
Signed-off-by: Thinh Nguyen Thinh.Nguyen@synopsys.com Link: https://lore.kernel.org/r/5651117207803c26e2f22ddf4e5ce9e865dcf7c7.166804546... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/dwc3/gadget.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index a9a43d649478..28a1194f849f 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -291,7 +291,8 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd, * * DWC_usb3 3.30a and DWC_usb31 1.90a programming guide section 3.2.2 */ - if (dwc->gadget->speed <= USB_SPEED_HIGH) { + if (dwc->gadget->speed <= USB_SPEED_HIGH || + DWC3_DEPCMD_CMD(cmd) == DWC3_DEPCMD_ENDTRANSFER) { reg = dwc3_readl(dwc->regs, DWC3_GUSB2PHYCFG(0)); if (unlikely(reg & DWC3_GUSB2PHYCFG_SUSPHY)) { saved_config |= DWC3_GUSB2PHYCFG_SUSPHY;
From: Dominique Martinet asmadeus@codewreck.org
[ Upstream commit 391c18cf776eb4569ecda1f7794f360fe0a45a26 ]
trans_xen did not check the data fits into the buffer before copying from the xen ring, but we probably should. Add a check that just skips the request and return an error to userspace if it did not fit
Tested-by: Stefano Stabellini sstabellini@kernel.org Reviewed-by: Christian Schoenebeck linux_oss@crudebyte.com Link: https://lkml.kernel.org/r/20221118135542.63400-1-asmadeus@codewreck.org Signed-off-by: Dominique Martinet asmadeus@codewreck.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/9p/trans_xen.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c index 432ac5a16f2e..6c8a33f98f09 100644 --- a/net/9p/trans_xen.c +++ b/net/9p/trans_xen.c @@ -231,6 +231,14 @@ static void p9_xen_response(struct work_struct *work) continue; }
+ if (h.size > req->rc.capacity) { + dev_warn(&priv->dev->dev, + "requested packet size too big: %d for tag %d with capacity %zd\n", + h.size, h.tag, req->rc.capacity); + req->status = REQ_STATUS_ERROR; + goto recv_error; + } + memcpy(&req->rc, &h, sizeof(h)); req->rc.offset = 0;
@@ -240,6 +248,7 @@ static void p9_xen_response(struct work_struct *work) masked_prod, &masked_cons, XEN_9PFS_RING_SIZE(ring));
+recv_error: virt_mb(); cons += h.size; ring->intf->in_cons = cons;
From: Davide Tronchin davide.tronchin.94@gmail.com
[ Upstream commit a487069e11b6527373f7c6f435d8998051d0b5d9 ]
Add RmNet support for LARA-L6.
LARA-L6 module can be configured (by AT interface) in three different USB modes: * Default mode (Vendor ID: 0x1546 Product ID: 0x1341) with 4 serial interfaces * RmNet mode (Vendor ID: 0x1546 Product ID: 0x1342) with 4 serial interfaces and 1 RmNet virtual network interface * CDC-ECM mode (Vendor ID: 0x1546 Product ID: 0x1343) with 4 serial interface and 1 CDC-ECM virtual network interface
In RmNet mode LARA-L6 exposes the following interfaces: If 0: Diagnostic If 1: AT parser If 2: AT parser If 3: AT parset/alternative functions If 4: RMNET interface
Signed-off-by: Davide Tronchin davide.tronchin.94@gmail.com Acked-by: Bjørn Mork bjorn@mork.no Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 7313e6e03c12..bce151e3706a 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1352,6 +1352,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x0489, 0xe0b4, 0)}, /* Foxconn T77W968 LTE */ {QMI_FIXED_INTF(0x0489, 0xe0b5, 0)}, /* Foxconn T77W968 LTE with eSIM support*/ {QMI_FIXED_INTF(0x2692, 0x9025, 4)}, /* Cellient MPL200 (rebranded Qualcomm 05c6:9025) */ + {QMI_QUIRK_SET_DTR(0x1546, 0x1342, 4)}, /* u-blox LARA-L6 */
/* 4. Gobi 1000 devices */ {QMI_GOBI1K_DEVICE(0x05c6, 0x9212)}, /* Acer Gobi Modem Device */
From: Jann Horn jannh@google.com
commit 8d3c106e19e8d251da31ff4cc7462e4565d65084 upstream.
pagetable walks on address ranges mapped by VMAs can be done under the mmap lock, the lock of an anon_vma attached to the VMA, or the lock of the VMA's address_space. Only one of these needs to be held, and it does not need to be held in exclusive mode.
Under those circumstances, the rules for concurrent access to page table entries are:
- Terminal page table entries (entries that don't point to another page table) can be arbitrarily changed under the page table lock, with the exception that they always need to be consistent for hardware page table walks and lockless_pages_from_mm(). This includes that they can be changed into non-terminal entries. - Non-terminal page table entries (which point to another page table) can not be modified; readers are allowed to READ_ONCE() an entry, verify that it is non-terminal, and then assume that its value will stay as-is.
Retracting a page table involves modifying a non-terminal entry, so page-table-level locks are insufficient to protect against concurrent page table traversal; it requires taking all the higher-level locks under which it is possible to start a page walk in the relevant range in exclusive mode.
The collapse_huge_page() path for anonymous THP already follows this rule, but the shmem/file THP path was getting it wrong, making it possible for concurrent rmap-based operations to cause corruption.
Link: https://lkml.kernel.org/r/20221129154730.2274278-1-jannh@google.com Link: https://lkml.kernel.org/r/20221128180252.1684965-1-jannh@google.com Link: https://lkml.kernel.org/r/20221125213714.4115729-1-jannh@google.com Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP") Signed-off-by: Jann Horn jannh@google.com Reviewed-by: Yang Shi shy828301@gmail.com Acked-by: David Hildenbrand david@redhat.com Cc: John Hubbard jhubbard@nvidia.com Cc: Peter Xu peterx@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org [manual backport: this code was refactored from two copies into a common helper between 5.15 and 6.0] Signed-off-by: Jann Horn jannh@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- mm/khugepaged.c | 31 ++++++++++++++++++++++++++----- 1 file changed, 26 insertions(+), 5 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index cf4dceb9682b..014e8b259313 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1457,6 +1457,14 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE)) return;
+ /* + * Symmetry with retract_page_tables(): Exclude MAP_PRIVATE mappings + * that got written to. Without this, we'd have to also lock the + * anon_vma if one exists. + */ + if (vma->anon_vma) + return; + hpage = find_lock_page(vma->vm_file->f_mapping, linear_page_index(vma, haddr)); if (!hpage) @@ -1469,6 +1477,19 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) if (!pmd) goto drop_hpage;
+ /* + * We need to lock the mapping so that from here on, only GUP-fast and + * hardware page walks can access the parts of the page tables that + * we're operating on. + */ + i_mmap_lock_write(vma->vm_file->f_mapping); + + /* + * This spinlock should be unnecessary: Nobody else should be accessing + * the page tables under spinlock protection here, only + * lockless_pages_from_mm() and the hardware page walker can access page + * tables while all the high-level locks are held in write mode. + */ start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
/* step 1: check all mapped PTEs are to the right huge page */ @@ -1515,12 +1536,12 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) }
/* step 4: collapse pmd */ - ptl = pmd_lock(vma->vm_mm, pmd); _pmd = pmdp_collapse_flush(vma, haddr, pmd); - spin_unlock(ptl); mm_dec_nr_ptes(mm); pte_free(mm, pmd_pgtable(_pmd));
+ i_mmap_unlock_write(vma->vm_file->f_mapping); + drop_hpage: unlock_page(hpage); put_page(hpage); @@ -1528,6 +1549,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
abort: pte_unmap_unlock(start_pte, ptl); + i_mmap_unlock_write(vma->vm_file->f_mapping); goto drop_hpage; }
@@ -1577,7 +1599,8 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * An alternative would be drop the check, but check that page * table is clear before calling pmdp_collapse_flush() under * ptl. It has higher chance to recover THP for the VMA, but - * has higher cost too. + * has higher cost too. It would also probably require locking + * the anon_vma. */ if (vma->anon_vma) continue; @@ -1599,10 +1622,8 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) */ if (mmap_write_trylock(mm)) { if (!khugepaged_test_exit(mm)) { - spinlock_t *ptl = pmd_lock(mm, pmd); /* assume page table is clear */ _pmd = pmdp_collapse_flush(vma, addr, pmd); - spin_unlock(ptl); mm_dec_nr_ptes(mm); pte_free(mm, pmd_pgtable(_pmd)); }
From: Jann Horn jannh@google.com
commit 2ba99c5e08812494bc57f319fb562f527d9bacd8 upstream.
Since commit 70cbc3cc78a99 ("mm: gup: fix the fast GUP race against THP collapse"), the lockless_pages_from_mm() fastpath rechecks the pmd_t to ensure that the page table was not removed by khugepaged in between.
However, lockless_pages_from_mm() still requires that the page table is not concurrently freed. Fix it by sending IPIs (if the architecture uses semi-RCU-style page table freeing) before freeing/reusing page tables.
Link: https://lkml.kernel.org/r/20221129154730.2274278-2-jannh@google.com Link: https://lkml.kernel.org/r/20221128180252.1684965-2-jannh@google.com Link: https://lkml.kernel.org/r/20221125213714.4115729-2-jannh@google.com Fixes: ba76149f47d8 ("thp: khugepaged") Signed-off-by: Jann Horn jannh@google.com Reviewed-by: Yang Shi shy828301@gmail.com Acked-by: David Hildenbrand david@redhat.com Cc: John Hubbard jhubbard@nvidia.com Cc: Peter Xu peterx@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org [manual backport: two of the three places in khugepaged that can free ptes were refactored into a common helper between 5.15 and 6.0] Signed-off-by: Jann Horn jannh@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/asm-generic/tlb.h | 4 ++++ mm/khugepaged.c | 3 +++ mm/mmu_gather.c | 4 +--- 3 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index a0c4b99d2899..f40c9534f20b 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -205,12 +205,16 @@ extern void tlb_remove_table(struct mmu_gather *tlb, void *table); #define tlb_needs_table_invalidate() (true) #endif
+void tlb_remove_table_sync_one(void); + #else
#ifdef tlb_needs_table_invalidate #error tlb_needs_table_invalidate() requires MMU_GATHER_RCU_TABLE_FREE #endif
+static inline void tlb_remove_table_sync_one(void) { } + #endif /* CONFIG_MMU_GATHER_RCU_TABLE_FREE */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 014e8b259313..0268b549bd60 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1154,6 +1154,7 @@ static void collapse_huge_page(struct mm_struct *mm, _pmd = pmdp_collapse_flush(vma, address, pmd); spin_unlock(pmd_ptl); mmu_notifier_invalidate_range_end(&range); + tlb_remove_table_sync_one();
spin_lock(pte_ptl); isolated = __collapse_huge_page_isolate(vma, address, pte, @@ -1538,6 +1539,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) /* step 4: collapse pmd */ _pmd = pmdp_collapse_flush(vma, haddr, pmd); mm_dec_nr_ptes(mm); + tlb_remove_table_sync_one(); pte_free(mm, pmd_pgtable(_pmd));
i_mmap_unlock_write(vma->vm_file->f_mapping); @@ -1625,6 +1627,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) /* assume page table is clear */ _pmd = pmdp_collapse_flush(vma, addr, pmd); mm_dec_nr_ptes(mm); + tlb_remove_table_sync_one(); pte_free(mm, pmd_pgtable(_pmd)); } mmap_write_unlock(mm); diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c index 03c33c93a582..205fdbb5792a 100644 --- a/mm/mmu_gather.c +++ b/mm/mmu_gather.c @@ -139,7 +139,7 @@ static void tlb_remove_table_smp_sync(void *arg) /* Simply deliver the interrupt */ }
-static void tlb_remove_table_sync_one(void) +void tlb_remove_table_sync_one(void) { /* * This isn't an RCU grace period and hence the page-tables cannot be @@ -163,8 +163,6 @@ static void tlb_remove_table_free(struct mmu_table_batch *batch)
#else /* !CONFIG_MMU_GATHER_RCU_TABLE_FREE */
-static void tlb_remove_table_sync_one(void) { } - static void tlb_remove_table_free(struct mmu_table_batch *batch) { __tlb_remove_table_free(batch);
From: Jann Horn jannh@google.com
commit f268f6cf875f3220afc77bdd0bf1bb136eb54db9 upstream.
Any codepath that zaps page table entries must invoke MMU notifiers to ensure that secondary MMUs (like KVM) don't keep accessing pages which aren't mapped anymore. Secondary MMUs don't hold their own references to pages that are mirrored over, so failing to notify them can lead to page use-after-free.
I'm marking this as addressing an issue introduced in commit f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages"), but most of the security impact of this only came in commit 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP"), which actually omitted flushes for the removal of present PTEs, not just for the removal of empty page tables.
Link: https://lkml.kernel.org/r/20221129154730.2274278-3-jannh@google.com Link: https://lkml.kernel.org/r/20221128180252.1684965-3-jannh@google.com Link: https://lkml.kernel.org/r/20221125213714.4115729-3-jannh@google.com Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages") Signed-off-by: Jann Horn jannh@google.com Acked-by: David Hildenbrand david@redhat.com Reviewed-by: Yang Shi shy828301@gmail.com Cc: John Hubbard jhubbard@nvidia.com Cc: Peter Xu peterx@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org [manual backport: this code was refactored from two copies into a common helper between 5.15 and 6.0] Signed-off-by: Jann Horn jannh@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- mm/khugepaged.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 0268b549bd60..0eb3adf4ff68 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1444,6 +1444,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) spinlock_t *ptl; int count = 0; int i; + struct mmu_notifier_range range;
if (!vma || !vma->vm_file || vma->vm_start > haddr || vma->vm_end < haddr + HPAGE_PMD_SIZE) @@ -1537,9 +1538,13 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) }
/* step 4: collapse pmd */ + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm, haddr, + haddr + HPAGE_PMD_SIZE); + mmu_notifier_invalidate_range_start(&range); _pmd = pmdp_collapse_flush(vma, haddr, pmd); mm_dec_nr_ptes(mm); tlb_remove_table_sync_one(); + mmu_notifier_invalidate_range_end(&range); pte_free(mm, pmd_pgtable(_pmd));
i_mmap_unlock_write(vma->vm_file->f_mapping); @@ -1624,11 +1629,19 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) */ if (mmap_write_trylock(mm)) { if (!khugepaged_test_exit(mm)) { + struct mmu_notifier_range range; + + mmu_notifier_range_init(&range, + MMU_NOTIFY_CLEAR, 0, + NULL, mm, addr, + addr + HPAGE_PMD_SIZE); + mmu_notifier_invalidate_range_start(&range); /* assume page table is clear */ _pmd = pmdp_collapse_flush(vma, addr, pmd); mm_dec_nr_ptes(mm); tlb_remove_table_sync_one(); pte_free(mm, pmd_pgtable(_pmd)); + mmu_notifier_invalidate_range_end(&range); } mmap_write_unlock(mm); } else {
From: Thomas Gleixner tglx@linutronix.de
[ Upstream commit 05a0302c35481e9b47fb90ba40922b0a4cae40d8 ]
The MC146818 driver is prone to read garbage from the RTC. There are several issues all related to the update cycle of the MC146818. The chip increments seconds obviously once per second and indicates that by a bit in a register. The bit goes high 244us before the actual update starts. During the update the readout of the time values is undefined.
The code just checks whether the update in progress bit (UIP) is set before reading the clock. If it's set it waits arbitrary 20ms before retrying, which is ample because the maximum update time is ~2ms.
But this check does not guarantee that the UIP bit goes high and the actual update happens during the readout. So the following can happen
0.997 UIP = False -> Interrupt/NMI/preemption 0.998 UIP -> True 0.999 Readout <- Undefined
To prevent this rework the code so it checks UIP before and after the readout and if set after the readout try again.
But that's not enough to cover the following:
0.997 UIP = False Readout seconds -> NMI (or vCPU scheduled out) 0.998 UIP -> True update completes UIP -> False 1.000 Readout minutes,.... UIP check succeeds
That can make the readout wrong up to 59 seconds.
To prevent this, read the seconds value before the first UIP check, validate it after checking UIP and after reading out the rest.
It's amazing that the original i386 code had this actually correct and the generic implementation of the MC146818 driver got it wrong in 2002 and it stayed that way until today.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/20201206220541.594826678@linutronix.de Stable-dep-of: cd17420ebea5 ("rtc: cmos: avoid UIP when writing alarm time") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-mc146818-lib.c | 64 +++++++++++++++++++++------------- 1 file changed, 39 insertions(+), 25 deletions(-)
diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c index b036ff33fbe6..8364e4141670 100644 --- a/drivers/rtc/rtc-mc146818-lib.c +++ b/drivers/rtc/rtc-mc146818-lib.c @@ -8,41 +8,41 @@ #include <linux/acpi.h> #endif
-/* - * Returns true if a clock update is in progress - */ -static inline unsigned char mc146818_is_updating(void) -{ - unsigned char uip; - unsigned long flags; - - spin_lock_irqsave(&rtc_lock, flags); - uip = (CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP); - spin_unlock_irqrestore(&rtc_lock, flags); - return uip; -} - unsigned int mc146818_get_time(struct rtc_time *time) { unsigned char ctrl; unsigned long flags; unsigned char century = 0; + bool retry;
#ifdef CONFIG_MACH_DECSTATION unsigned int real_year; #endif
+again: + spin_lock_irqsave(&rtc_lock, flags); /* - * read RTC once any update in progress is done. The update - * can take just over 2ms. We wait 20ms. There is no need to - * to poll-wait (up to 1s - eeccch) for the falling edge of RTC_UIP. - * If you need to know *exactly* when a second has started, enable - * periodic update complete interrupts, (via ioctl) and then - * immediately read /dev/rtc which will block until you get the IRQ. - * Once the read clears, read the RTC time (again via ioctl). Easy. + * Check whether there is an update in progress during which the + * readout is unspecified. The maximum update time is ~2ms. Poll + * every msec for completion. + * + * Store the second value before checking UIP so a long lasting NMI + * which happens to hit after the UIP check cannot make an update + * cycle invisible. */ - if (mc146818_is_updating()) - mdelay(20); + time->tm_sec = CMOS_READ(RTC_SECONDS); + + if (CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP) { + spin_unlock_irqrestore(&rtc_lock, flags); + mdelay(1); + goto again; + } + + /* Revalidate the above readout */ + if (time->tm_sec != CMOS_READ(RTC_SECONDS)) { + spin_unlock_irqrestore(&rtc_lock, flags); + goto again; + }
/* * Only the values that we read from the RTC are set. We leave @@ -50,8 +50,6 @@ unsigned int mc146818_get_time(struct rtc_time *time) * RTC has RTC_DAY_OF_WEEK, we ignore it, as it is only updated * by the RTC when initially set to a non-zero value. */ - spin_lock_irqsave(&rtc_lock, flags); - time->tm_sec = CMOS_READ(RTC_SECONDS); time->tm_min = CMOS_READ(RTC_MINUTES); time->tm_hour = CMOS_READ(RTC_HOURS); time->tm_mday = CMOS_READ(RTC_DAY_OF_MONTH); @@ -66,8 +64,24 @@ unsigned int mc146818_get_time(struct rtc_time *time) century = CMOS_READ(acpi_gbl_FADT.century); #endif ctrl = CMOS_READ(RTC_CONTROL); + /* + * Check for the UIP bit again. If it is set now then + * the above values may contain garbage. + */ + retry = CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP; + /* + * A NMI might have interrupted the above sequence so check whether + * the seconds value has changed which indicates that the NMI took + * longer than the UIP bit was set. Unlikely, but possible and + * there is also virt... + */ + retry |= time->tm_sec != CMOS_READ(RTC_SECONDS); + spin_unlock_irqrestore(&rtc_lock, flags);
+ if (retry) + goto again; + if (!(ctrl & RTC_DM_BINARY) || RTC_ALWAYS_BCD) { time->tm_sec = bcd2bin(time->tm_sec);
From: Thomas Gleixner tglx@linutronix.de
[ Upstream commit 211e5db19d15a721b2953ea54b8f26c2963720eb ]
The recent fix for handling the UIP bit unearthed another issue in the RTC code. If the RTC is advertised but the readout is straight 0xFF because it's not available, the old code just proceeded with crappy values, but the new code hangs because it waits for the UIP bit to become low.
Add a sanity check in the RTC CMOS probe function which reads the RTC_VALID register (Register D) which should have bit 0-6 cleared. If that's not the case then fail to register the CMOS.
Add the same check to mc146818_get_time(), warn once when the condition is true and invalidate the rtc_time data.
Reported-by: Mickaël Salaün mic@digikod.net Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Mickaël Salaün mic@linux.microsoft.com Acked-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/87tur3fx7w.fsf@nanos.tec.linutronix.de Stable-dep-of: cd17420ebea5 ("rtc: cmos: avoid UIP when writing alarm time") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-cmos.c | 8 ++++++++ drivers/rtc/rtc-mc146818-lib.c | 7 +++++++ 2 files changed, 15 insertions(+)
diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c index 58c6382a2807..cce4b62ffdd0 100644 --- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -808,6 +808,14 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
spin_lock_irq(&rtc_lock);
+ /* Ensure that the RTC is accessible. Bit 0-6 must be 0! */ + if ((CMOS_READ(RTC_VALID) & 0x7f) != 0) { + spin_unlock_irq(&rtc_lock); + dev_warn(dev, "not accessible\n"); + retval = -ENXIO; + goto cleanup1; + } + if (!(flags & CMOS_RTC_FLAGS_NOFREQ)) { /* force periodic irq to CMOS reset default of 1024Hz; * diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c index 8364e4141670..7f01dc41271d 100644 --- a/drivers/rtc/rtc-mc146818-lib.c +++ b/drivers/rtc/rtc-mc146818-lib.c @@ -21,6 +21,13 @@ unsigned int mc146818_get_time(struct rtc_time *time)
again: spin_lock_irqsave(&rtc_lock, flags); + /* Ensure that the RTC is accessible. Bit 0-6 must be 0! */ + if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x7f) != 0)) { + spin_unlock_irqrestore(&rtc_lock, flags); + memset(time, 0xff, sizeof(*time)); + return 0; + } + /* * Check whether there is an update in progress during which the * readout is unspecified. The maximum update time is ~2ms. Poll
From: Thomas Gleixner tglx@linutronix.de
[ Upstream commit ebb22a05943666155e6da04407cc6e913974c78c ]
The recent change to validate the RTC turned out to be overly tight.
While it cures the problem on the reporters machine it breaks machines with Intel chipsets which use bit 0-5 of the D register. So check only for bit 6 being 0 which is the case on these Intel machines as well.
Fixes: 211e5db19d15 ("rtc: mc146818: Detect and handle broken RTCs") Reported-by: Serge Belyshev belyshev@depni.sinp.msu.ru Reported-by: Dirk Gouders dirk@gouders.net Reported-by: Borislav Petkov bp@suse.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Dirk Gouders dirk@gouders.net Tested-by: Len Brown len.brown@intel.com Tested-by: Borislav Petkov bp@suse.de Acked-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/87zh0nbnha.fsf@nanos.tec.linutronix.de Stable-dep-of: cd17420ebea5 ("rtc: cmos: avoid UIP when writing alarm time") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-cmos.c | 4 ++-- drivers/rtc/rtc-mc146818-lib.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c index cce4b62ffdd0..8e8ce40f6440 100644 --- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -808,8 +808,8 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
spin_lock_irq(&rtc_lock);
- /* Ensure that the RTC is accessible. Bit 0-6 must be 0! */ - if ((CMOS_READ(RTC_VALID) & 0x7f) != 0) { + /* Ensure that the RTC is accessible. Bit 6 must be 0! */ + if ((CMOS_READ(RTC_VALID) & 0x40) != 0) { spin_unlock_irq(&rtc_lock); dev_warn(dev, "not accessible\n"); retval = -ENXIO; diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c index 7f01dc41271d..6ed2cd5d2bba 100644 --- a/drivers/rtc/rtc-mc146818-lib.c +++ b/drivers/rtc/rtc-mc146818-lib.c @@ -21,8 +21,8 @@ unsigned int mc146818_get_time(struct rtc_time *time)
again: spin_lock_irqsave(&rtc_lock, flags); - /* Ensure that the RTC is accessible. Bit 0-6 must be 0! */ - if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x7f) != 0)) { + /* Ensure that the RTC is accessible. Bit 6 must be 0! */ + if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x40) != 0)) { spin_unlock_irqrestore(&rtc_lock, flags); memset(time, 0xff, sizeof(*time)); return 0;
From: Mateusz Jończyk mat.jonczyk@o2.pl
[ Upstream commit e1aba37569f0aa9c993f740828871e48eea79f98 ]
It appears mc146818_get_time() and mc146818_set_time() now correctly use the century register as specified in the ACPI FADT table. It is not clear what else could be done here.
These comments were introduced by commit 7be2c7c96aff ("[PATCH] RTC framework driver for CMOS RTCs") in 2007, which originally referenced function get_rtc_time() in include/asm-generic/rtc.h .
Signed-off-by: Mateusz Jończyk mat.jonczyk@o2.pl Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/20210716210437.29622-1-mat.jonczyk@o2.pl Stable-dep-of: cd17420ebea5 ("rtc: cmos: avoid UIP when writing alarm time") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-cmos.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c index 8e8ce40f6440..ed4f512eabf0 100644 --- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -229,19 +229,13 @@ static int cmos_read_time(struct device *dev, struct rtc_time *t) if (!pm_trace_rtc_valid()) return -EIO;
- /* REVISIT: if the clock has a "century" register, use - * that instead of the heuristic in mc146818_get_time(). - * That'll make Y3K compatility (year > 2070) easy! - */ mc146818_get_time(t); return 0; }
static int cmos_set_time(struct device *dev, struct rtc_time *t) { - /* REVISIT: set the "century" register if available - * - * NOTE: this ignores the issue whereby updating the seconds + /* NOTE: this ignores the issue whereby updating the seconds * takes effect exactly 500ms after we write the register. * (Also queueing and other delays before we get this far.) */
From: Mateusz Jończyk mat.jonczyk@o2.pl
[ Upstream commit d35786b3a28dee20b12962ae2dd365892a99ed1a ]
No function is checking mc146818_get_time() return values yet, so correct them to make them more customary.
Signed-off-by: Mateusz Jończyk mat.jonczyk@o2.pl Cc: Alessandro Zummo a.zummo@towertech.it Cc: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/20211210200131.153887-3-mat.jonczyk@o2.pl Stable-dep-of: cd17420ebea5 ("rtc: cmos: avoid UIP when writing alarm time") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-mc146818-lib.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c index 6ed2cd5d2bba..6262f0680f13 100644 --- a/drivers/rtc/rtc-mc146818-lib.c +++ b/drivers/rtc/rtc-mc146818-lib.c @@ -25,7 +25,7 @@ unsigned int mc146818_get_time(struct rtc_time *time) if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x40) != 0)) { spin_unlock_irqrestore(&rtc_lock, flags); memset(time, 0xff, sizeof(*time)); - return 0; + return -EIO; }
/* @@ -116,7 +116,7 @@ unsigned int mc146818_get_time(struct rtc_time *time)
time->tm_mon--;
- return RTC_24H; + return 0; } EXPORT_SYMBOL_GPL(mc146818_get_time);
From: Mateusz Jończyk mat.jonczyk@o2.pl
[ Upstream commit 0dd8d6cb9eddfe637bcd821bbfd40ebd5a0737b9 ]
There are 4 users of mc146818_get_time() and none of them was checking the return value from this function. Change this.
Print the appropriate warnings in callers of mc146818_get_time() instead of in the function mc146818_get_time() itself, in order not to add strings to rtc-mc146818-lib.c, which is kind of a library.
The callers of alpha_rtc_read_time() and cmos_read_time() may use the contents of (struct rtc_time *) even when the functions return a failure code. Therefore, set the contents of (struct rtc_time *) to 0x00, which looks more sensible then 0xff and aligns with the (possibly stale?) comment in cmos_read_time:
/* * If pm_trace abused the RTC for storage, set the timespec to 0, * which tells the caller that this RTC value is unusable. */
For consistency, do this in mc146818_get_time().
Note: hpet_rtc_interrupt() may call mc146818_get_time() many times a second. It is very unlikely, though, that the RTC suddenly stops working and mc146818_get_time() would consistently fail.
Only compile-tested on alpha.
Signed-off-by: Mateusz Jończyk mat.jonczyk@o2.pl Cc: Richard Henderson rth@twiddle.net Cc: Ivan Kokshaysky ink@jurassic.park.msu.ru Cc: Matt Turner mattst88@gmail.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Ingo Molnar mingo@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Alessandro Zummo a.zummo@towertech.it Cc: Alexandre Belloni alexandre.belloni@bootlin.com Cc: linux-alpha@vger.kernel.org Cc: x86@kernel.org Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/20211210200131.153887-4-mat.jonczyk@o2.pl Stable-dep-of: cd17420ebea5 ("rtc: cmos: avoid UIP when writing alarm time") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/alpha/kernel/rtc.c | 7 ++++++- arch/x86/kernel/hpet.c | 8 ++++++-- drivers/base/power/trace.c | 6 +++++- drivers/rtc/rtc-cmos.c | 9 ++++++++- drivers/rtc/rtc-mc146818-lib.c | 2 +- 5 files changed, 26 insertions(+), 6 deletions(-)
diff --git a/arch/alpha/kernel/rtc.c b/arch/alpha/kernel/rtc.c index 1b1d5963ac55..48ffbfbd0624 100644 --- a/arch/alpha/kernel/rtc.c +++ b/arch/alpha/kernel/rtc.c @@ -80,7 +80,12 @@ init_rtc_epoch(void) static int alpha_rtc_read_time(struct device *dev, struct rtc_time *tm) { - mc146818_get_time(tm); + int ret = mc146818_get_time(tm); + + if (ret < 0) { + dev_err_ratelimited(dev, "unable to read current time\n"); + return ret; + }
/* Adjust for non-default epochs. It's easier to depend on the generic __get_rtc_time and adjust the epoch here than create diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c index 4ab7a9757e52..574df24a8e5a 100644 --- a/arch/x86/kernel/hpet.c +++ b/arch/x86/kernel/hpet.c @@ -1325,8 +1325,12 @@ irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id) hpet_rtc_timer_reinit(); memset(&curr_time, 0, sizeof(struct rtc_time));
- if (hpet_rtc_flags & (RTC_UIE | RTC_AIE)) - mc146818_get_time(&curr_time); + if (hpet_rtc_flags & (RTC_UIE | RTC_AIE)) { + if (unlikely(mc146818_get_time(&curr_time) < 0)) { + pr_err_ratelimited("unable to read current time from RTC\n"); + return IRQ_HANDLED; + } + }
if (hpet_rtc_flags & RTC_UIE && curr_time.tm_sec != hpet_prev_update_sec) { diff --git a/drivers/base/power/trace.c b/drivers/base/power/trace.c index 94665037f4a3..72b7a92337b1 100644 --- a/drivers/base/power/trace.c +++ b/drivers/base/power/trace.c @@ -120,7 +120,11 @@ static unsigned int read_magic_time(void) struct rtc_time time; unsigned int val;
- mc146818_get_time(&time); + if (mc146818_get_time(&time) < 0) { + pr_err("Unable to read current time from RTC\n"); + return 0; + } + pr_info("RTC time: %ptRt, date: %ptRd\n", &time, &time); val = time.tm_year; /* 100 years */ if (val > 100) diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c index ed4f512eabf0..f8358bb2ae31 100644 --- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -222,6 +222,8 @@ static inline void cmos_write_bank2(unsigned char val, unsigned char addr)
static int cmos_read_time(struct device *dev, struct rtc_time *t) { + int ret; + /* * If pm_trace abused the RTC for storage, set the timespec to 0, * which tells the caller that this RTC value is unusable. @@ -229,7 +231,12 @@ static int cmos_read_time(struct device *dev, struct rtc_time *t) if (!pm_trace_rtc_valid()) return -EIO;
- mc146818_get_time(t); + ret = mc146818_get_time(t); + if (ret < 0) { + dev_err_ratelimited(dev, "unable to read current time\n"); + return ret; + } + return 0; }
diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c index 6262f0680f13..3ae5c690f22b 100644 --- a/drivers/rtc/rtc-mc146818-lib.c +++ b/drivers/rtc/rtc-mc146818-lib.c @@ -24,7 +24,7 @@ unsigned int mc146818_get_time(struct rtc_time *time) /* Ensure that the RTC is accessible. Bit 6 must be 0! */ if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x40) != 0)) { spin_unlock_irqrestore(&rtc_lock, flags); - memset(time, 0xff, sizeof(*time)); + memset(time, 0, sizeof(*time)); return -EIO; }
From: Mateusz Jończyk mat.jonczyk@o2.pl
[ Upstream commit ea6fa4961aab8f90a8aa03575a98b4bda368d4b6 ]
To prevent an infinite loop in mc146818_get_time(), commit 211e5db19d15 ("rtc: mc146818: Detect and handle broken RTCs") added a check for RTC availability. Together with a later fix, it checked if bit 6 in register 0x0d is cleared.
This, however, caused a false negative on a motherboard with an AMD SB710 southbridge; according to the specification [1], bit 6 of register 0x0d of this chipset is a scratchbit. This caused a regression in Linux 5.11 - the RTC was determined broken by the kernel and not used by rtc-cmos.c [3]. This problem was also reported in Fedora [4].
As a better alternative, check whether the UIP ("Update-in-progress") bit is set for longer then 10ms. If that is the case, then apparently the RTC is either absent (and all register reads return 0xff) or broken. Also limit the number of loop iterations in mc146818_get_time() to 10 to prevent an infinite loop there.
The functions mc146818_get_time() and mc146818_does_rtc_work() will be refactored later in this patch series, in order to fix a separate problem with reading / setting the RTC alarm time. This is done so to avoid a confusion about what is being fixed when.
In a previous approach to this problem, I implemented a check whether the RTC_HOURS register contains a value <= 24. This, however, sometimes did not work correctly on my Intel Kaby Lake laptop. According to Intel's documentation [2], "the time and date RAM locations (0-9) are disconnected from the external bus" during the update cycle so reading this register without checking the UIP bit is incorrect.
[1] AMD SB700/710/750 Register Reference Guide, page 308, https://developer.amd.com/wordpress/media/2012/10/43009_sb7xx_rrg_pub_1.00.p...
[2] 7th Generation Intel ® Processor Family I/O for U/Y Platforms [...] Datasheet Volume 1 of 2, page 209 Intel's Document Number: 334658-006, https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/7th-...
[3] Functions in arch/x86/kernel/rtc.c apparently were using it.
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1936688
Fixes: 211e5db19d15 ("rtc: mc146818: Detect and handle broken RTCs") Fixes: ebb22a059436 ("rtc: mc146818: Dont test for bit 0-5 in Register D") Signed-off-by: Mateusz Jończyk mat.jonczyk@o2.pl Cc: Thomas Gleixner tglx@linutronix.de Cc: Alessandro Zummo a.zummo@towertech.it Cc: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/20211210200131.153887-5-mat.jonczyk@o2.pl Stable-dep-of: cd17420ebea5 ("rtc: cmos: avoid UIP when writing alarm time") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-cmos.c | 10 ++++------ drivers/rtc/rtc-mc146818-lib.c | 34 ++++++++++++++++++++++++++++++---- include/linux/mc146818rtc.h | 1 + 3 files changed, 35 insertions(+), 10 deletions(-)
diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c index f8358bb2ae31..93ffb9eaf63a 100644 --- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -807,16 +807,14 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
rename_region(ports, dev_name(&cmos_rtc.rtc->dev));
- spin_lock_irq(&rtc_lock); - - /* Ensure that the RTC is accessible. Bit 6 must be 0! */ - if ((CMOS_READ(RTC_VALID) & 0x40) != 0) { - spin_unlock_irq(&rtc_lock); - dev_warn(dev, "not accessible\n"); + if (!mc146818_does_rtc_work()) { + dev_warn(dev, "broken or not accessible\n"); retval = -ENXIO; goto cleanup1; }
+ spin_lock_irq(&rtc_lock); + if (!(flags & CMOS_RTC_FLAGS_NOFREQ)) { /* force periodic irq to CMOS reset default of 1024Hz; * diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c index 3ae5c690f22b..94df6056c5c0 100644 --- a/drivers/rtc/rtc-mc146818-lib.c +++ b/drivers/rtc/rtc-mc146818-lib.c @@ -8,10 +8,36 @@ #include <linux/acpi.h> #endif
+/* + * If the UIP (Update-in-progress) bit of the RTC is set for more then + * 10ms, the RTC is apparently broken or not present. + */ +bool mc146818_does_rtc_work(void) +{ + int i; + unsigned char val; + unsigned long flags; + + for (i = 0; i < 10; i++) { + spin_lock_irqsave(&rtc_lock, flags); + val = CMOS_READ(RTC_FREQ_SELECT); + spin_unlock_irqrestore(&rtc_lock, flags); + + if ((val & RTC_UIP) == 0) + return true; + + mdelay(1); + } + + return false; +} +EXPORT_SYMBOL_GPL(mc146818_does_rtc_work); + unsigned int mc146818_get_time(struct rtc_time *time) { unsigned char ctrl; unsigned long flags; + unsigned int iter_count = 0; unsigned char century = 0; bool retry;
@@ -20,13 +46,13 @@ unsigned int mc146818_get_time(struct rtc_time *time) #endif
again: - spin_lock_irqsave(&rtc_lock, flags); - /* Ensure that the RTC is accessible. Bit 6 must be 0! */ - if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x40) != 0)) { - spin_unlock_irqrestore(&rtc_lock, flags); + if (iter_count > 10) { memset(time, 0, sizeof(*time)); return -EIO; } + iter_count++; + + spin_lock_irqsave(&rtc_lock, flags);
/* * Check whether there is an update in progress during which the diff --git a/include/linux/mc146818rtc.h b/include/linux/mc146818rtc.h index 1e0205811394..c246ce191915 100644 --- a/include/linux/mc146818rtc.h +++ b/include/linux/mc146818rtc.h @@ -125,6 +125,7 @@ struct cmos_rtc_board_info { #define RTC_IO_EXTENT_USED RTC_IO_EXTENT #endif /* ARCH_RTC_LOCATION */
+bool mc146818_does_rtc_work(void); unsigned int mc146818_get_time(struct rtc_time *time); int mc146818_set_time(struct rtc_time *time);
From: Mateusz Jończyk mat.jonczyk@o2.pl
[ Upstream commit ec5895c0f2d87b9bf4185db1915e40fa6fcfc0ac ]
Function mc146818_get_time() contains an elaborate mechanism of reading the RTC time while no RTC update is in progress. It turns out that reading the RTC alarm clock also requires avoiding the RTC update. Therefore, the mechanism in mc146818_get_time() should be reused - so extract it into a separate function.
The logic in mc146818_avoid_UIP() is same as in mc146818_get_time() except that after every
if (CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP) {
there is now "mdelay(1)".
To avoid producing a very unreadable patch, mc146818_get_time() will be refactored to use mc146818_avoid_UIP() in the next patch.
Signed-off-by: Mateusz Jończyk mat.jonczyk@o2.pl Cc: Alessandro Zummo a.zummo@towertech.it Cc: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/20211210200131.153887-6-mat.jonczyk@o2.pl Stable-dep-of: cd17420ebea5 ("rtc: cmos: avoid UIP when writing alarm time") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-mc146818-lib.c | 70 ++++++++++++++++++++++++++++++++++ include/linux/mc146818rtc.h | 3 ++ 2 files changed, 73 insertions(+)
diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c index 94df6056c5c0..46527a5d3912 100644 --- a/drivers/rtc/rtc-mc146818-lib.c +++ b/drivers/rtc/rtc-mc146818-lib.c @@ -8,6 +8,76 @@ #include <linux/acpi.h> #endif
+/* + * Execute a function while the UIP (Update-in-progress) bit of the RTC is + * unset. + * + * Warning: callback may be executed more then once. + */ +bool mc146818_avoid_UIP(void (*callback)(unsigned char seconds, void *param), + void *param) +{ + int i; + unsigned long flags; + unsigned char seconds; + + for (i = 0; i < 10; i++) { + spin_lock_irqsave(&rtc_lock, flags); + + /* + * Check whether there is an update in progress during which the + * readout is unspecified. The maximum update time is ~2ms. Poll + * every msec for completion. + * + * Store the second value before checking UIP so a long lasting + * NMI which happens to hit after the UIP check cannot make + * an update cycle invisible. + */ + seconds = CMOS_READ(RTC_SECONDS); + + if (CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP) { + spin_unlock_irqrestore(&rtc_lock, flags); + mdelay(1); + continue; + } + + /* Revalidate the above readout */ + if (seconds != CMOS_READ(RTC_SECONDS)) { + spin_unlock_irqrestore(&rtc_lock, flags); + continue; + } + + if (callback) + callback(seconds, param); + + /* + * Check for the UIP bit again. If it is set now then + * the above values may contain garbage. + */ + if (CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP) { + spin_unlock_irqrestore(&rtc_lock, flags); + mdelay(1); + continue; + } + + /* + * A NMI might have interrupted the above sequence so check + * whether the seconds value has changed which indicates that + * the NMI took longer than the UIP bit was set. Unlikely, but + * possible and there is also virt... + */ + if (seconds != CMOS_READ(RTC_SECONDS)) { + spin_unlock_irqrestore(&rtc_lock, flags); + continue; + } + spin_unlock_irqrestore(&rtc_lock, flags); + + return true; + } + return false; +} +EXPORT_SYMBOL_GPL(mc146818_avoid_UIP); + /* * If the UIP (Update-in-progress) bit of the RTC is set for more then * 10ms, the RTC is apparently broken or not present. diff --git a/include/linux/mc146818rtc.h b/include/linux/mc146818rtc.h index c246ce191915..fb042e0e7d76 100644 --- a/include/linux/mc146818rtc.h +++ b/include/linux/mc146818rtc.h @@ -129,4 +129,7 @@ bool mc146818_does_rtc_work(void); unsigned int mc146818_get_time(struct rtc_time *time); int mc146818_set_time(struct rtc_time *time);
+bool mc146818_avoid_UIP(void (*callback)(unsigned char seconds, void *param), + void *param); + #endif /* _MC146818RTC_H */
From: Mateusz Jończyk mat.jonczyk@o2.pl
[ Upstream commit cd17420ebea580c22dd3a93f7237de3d2cfafc37 ]
Some Intel chipsets disconnect the time and date RTC registers when the clock update is in progress: during this time reads may return bogus values and writes fail silently. This includes the RTC alarm registers. [1]
cmos_set_alarm() did not take account for that, fix it.
[1] 7th Generation Intel ® Processor Family I/O for U/Y Platforms [...] Datasheet, Volume 1 of 2 (Intel's Document Number: 334658-006) Page 208 https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/7th-... "If a RAM read from the ten time and date bytes is attempted during an update cycle, the value read do not necessarily represent the true contents of those locations. Any RAM writes under the same conditions are ignored."
Signed-off-by: Mateusz Jończyk mat.jonczyk@o2.pl Cc: Alessandro Zummo a.zummo@towertech.it Cc: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/20211210200131.153887-10-mat.jonczyk@o2.pl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-cmos.c | 107 +++++++++++++++++++++++++---------------- 1 file changed, 66 insertions(+), 41 deletions(-)
diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c index 93ffb9eaf63a..601e3967e1f0 100644 --- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -444,10 +444,57 @@ static int cmos_validate_alarm(struct device *dev, struct rtc_wkalrm *t) return 0; }
+struct cmos_set_alarm_callback_param { + struct cmos_rtc *cmos; + unsigned char mon, mday, hrs, min, sec; + struct rtc_wkalrm *t; +}; + +/* Note: this function may be executed by mc146818_avoid_UIP() more then + * once + */ +static void cmos_set_alarm_callback(unsigned char __always_unused seconds, + void *param_in) +{ + struct cmos_set_alarm_callback_param *p = + (struct cmos_set_alarm_callback_param *)param_in; + + /* next rtc irq must not be from previous alarm setting */ + cmos_irq_disable(p->cmos, RTC_AIE); + + /* update alarm */ + CMOS_WRITE(p->hrs, RTC_HOURS_ALARM); + CMOS_WRITE(p->min, RTC_MINUTES_ALARM); + CMOS_WRITE(p->sec, RTC_SECONDS_ALARM); + + /* the system may support an "enhanced" alarm */ + if (p->cmos->day_alrm) { + CMOS_WRITE(p->mday, p->cmos->day_alrm); + if (p->cmos->mon_alrm) + CMOS_WRITE(p->mon, p->cmos->mon_alrm); + } + + if (use_hpet_alarm()) { + /* + * FIXME the HPET alarm glue currently ignores day_alrm + * and mon_alrm ... + */ + hpet_set_alarm_time(p->t->time.tm_hour, p->t->time.tm_min, + p->t->time.tm_sec); + } + + if (p->t->enabled) + cmos_irq_enable(p->cmos, RTC_AIE); +} + static int cmos_set_alarm(struct device *dev, struct rtc_wkalrm *t) { struct cmos_rtc *cmos = dev_get_drvdata(dev); - unsigned char mon, mday, hrs, min, sec, rtc_control; + struct cmos_set_alarm_callback_param p = { + .cmos = cmos, + .t = t + }; + unsigned char rtc_control; int ret;
/* This not only a rtc_op, but also called directly */ @@ -458,11 +505,11 @@ static int cmos_set_alarm(struct device *dev, struct rtc_wkalrm *t) if (ret < 0) return ret;
- mon = t->time.tm_mon + 1; - mday = t->time.tm_mday; - hrs = t->time.tm_hour; - min = t->time.tm_min; - sec = t->time.tm_sec; + p.mon = t->time.tm_mon + 1; + p.mday = t->time.tm_mday; + p.hrs = t->time.tm_hour; + p.min = t->time.tm_min; + p.sec = t->time.tm_sec;
spin_lock_irq(&rtc_lock); rtc_control = CMOS_READ(RTC_CONTROL); @@ -470,43 +517,21 @@ static int cmos_set_alarm(struct device *dev, struct rtc_wkalrm *t)
if (!(rtc_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) { /* Writing 0xff means "don't care" or "match all". */ - mon = (mon <= 12) ? bin2bcd(mon) : 0xff; - mday = (mday >= 1 && mday <= 31) ? bin2bcd(mday) : 0xff; - hrs = (hrs < 24) ? bin2bcd(hrs) : 0xff; - min = (min < 60) ? bin2bcd(min) : 0xff; - sec = (sec < 60) ? bin2bcd(sec) : 0xff; + p.mon = (p.mon <= 12) ? bin2bcd(p.mon) : 0xff; + p.mday = (p.mday >= 1 && p.mday <= 31) ? bin2bcd(p.mday) : 0xff; + p.hrs = (p.hrs < 24) ? bin2bcd(p.hrs) : 0xff; + p.min = (p.min < 60) ? bin2bcd(p.min) : 0xff; + p.sec = (p.sec < 60) ? bin2bcd(p.sec) : 0xff; }
- spin_lock_irq(&rtc_lock); - - /* next rtc irq must not be from previous alarm setting */ - cmos_irq_disable(cmos, RTC_AIE); - - /* update alarm */ - CMOS_WRITE(hrs, RTC_HOURS_ALARM); - CMOS_WRITE(min, RTC_MINUTES_ALARM); - CMOS_WRITE(sec, RTC_SECONDS_ALARM); - - /* the system may support an "enhanced" alarm */ - if (cmos->day_alrm) { - CMOS_WRITE(mday, cmos->day_alrm); - if (cmos->mon_alrm) - CMOS_WRITE(mon, cmos->mon_alrm); - } - - if (use_hpet_alarm()) { - /* - * FIXME the HPET alarm glue currently ignores day_alrm - * and mon_alrm ... - */ - hpet_set_alarm_time(t->time.tm_hour, t->time.tm_min, - t->time.tm_sec); - } - - if (t->enabled) - cmos_irq_enable(cmos, RTC_AIE); - - spin_unlock_irq(&rtc_lock); + /* + * Some Intel chipsets disconnect the alarm registers when the clock + * update is in progress - during this time writes fail silently. + * + * Use mc146818_avoid_UIP() to avoid this. + */ + if (!mc146818_avoid_UIP(cmos_set_alarm_callback, &p)) + return -EIO;
cmos->alarm_expires = rtc_tm_to_time64(&t->time);
From: Mateusz Jończyk mat.jonczyk@o2.pl
[ Upstream commit cdedc45c579faf8cc6608d3ef81576ee0d512aa4 ]
Some Intel chipsets disconnect the time and date RTC registers when the clock update is in progress: during this time reads may return bogus values and writes fail silently. This includes the RTC alarm registers. [1]
cmos_read_alarm() did not take account for that, which caused alarm time reads to sometimes return bogus values. This can be shown with a test patch that I am attaching to this patch series.
Fix this, by using mc146818_avoid_UIP().
[1] 7th Generation Intel ® Processor Family I/O for U/Y Platforms [...] Datasheet, Volume 1 of 2 (Intel's Document Number: 334658-006) Page 208 https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/7th-... "If a RAM read from the ten time and date bytes is attempted during an update cycle, the value read do not necessarily represent the true contents of those locations. Any RAM writes under the same conditions are ignored."
Signed-off-by: Mateusz Jończyk mat.jonczyk@o2.pl Cc: Alessandro Zummo a.zummo@towertech.it Cc: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/20211210200131.153887-9-mat.jonczyk@o2.pl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-cmos.c | 72 ++++++++++++++++++++++++++++-------------- 1 file changed, 49 insertions(+), 23 deletions(-)
diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c index 601e3967e1f0..d419eb988b22 100644 --- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -249,10 +249,46 @@ static int cmos_set_time(struct device *dev, struct rtc_time *t) return mc146818_set_time(t); }
+struct cmos_read_alarm_callback_param { + struct cmos_rtc *cmos; + struct rtc_time *time; + unsigned char rtc_control; +}; + +static void cmos_read_alarm_callback(unsigned char __always_unused seconds, + void *param_in) +{ + struct cmos_read_alarm_callback_param *p = + (struct cmos_read_alarm_callback_param *)param_in; + struct rtc_time *time = p->time; + + time->tm_sec = CMOS_READ(RTC_SECONDS_ALARM); + time->tm_min = CMOS_READ(RTC_MINUTES_ALARM); + time->tm_hour = CMOS_READ(RTC_HOURS_ALARM); + + if (p->cmos->day_alrm) { + /* ignore upper bits on readback per ACPI spec */ + time->tm_mday = CMOS_READ(p->cmos->day_alrm) & 0x3f; + if (!time->tm_mday) + time->tm_mday = -1; + + if (p->cmos->mon_alrm) { + time->tm_mon = CMOS_READ(p->cmos->mon_alrm); + if (!time->tm_mon) + time->tm_mon = -1; + } + } + + p->rtc_control = CMOS_READ(RTC_CONTROL); +} + static int cmos_read_alarm(struct device *dev, struct rtc_wkalrm *t) { struct cmos_rtc *cmos = dev_get_drvdata(dev); - unsigned char rtc_control; + struct cmos_read_alarm_callback_param p = { + .cmos = cmos, + .time = &t->time, + };
/* This not only a rtc_op, but also called directly */ if (!is_valid_irq(cmos->irq)) @@ -263,28 +299,18 @@ static int cmos_read_alarm(struct device *dev, struct rtc_wkalrm *t) * the future. */
- spin_lock_irq(&rtc_lock); - t->time.tm_sec = CMOS_READ(RTC_SECONDS_ALARM); - t->time.tm_min = CMOS_READ(RTC_MINUTES_ALARM); - t->time.tm_hour = CMOS_READ(RTC_HOURS_ALARM); - - if (cmos->day_alrm) { - /* ignore upper bits on readback per ACPI spec */ - t->time.tm_mday = CMOS_READ(cmos->day_alrm) & 0x3f; - if (!t->time.tm_mday) - t->time.tm_mday = -1; - - if (cmos->mon_alrm) { - t->time.tm_mon = CMOS_READ(cmos->mon_alrm); - if (!t->time.tm_mon) - t->time.tm_mon = -1; - } - } - - rtc_control = CMOS_READ(RTC_CONTROL); - spin_unlock_irq(&rtc_lock); + /* Some Intel chipsets disconnect the alarm registers when the clock + * update is in progress - during this time reads return bogus values + * and writes may fail silently. See for example "7th Generation Intel® + * Processor Family I/O for U/Y Platforms [...] Datasheet", section + * 27.7.1 + * + * Use the mc146818_avoid_UIP() function to avoid this. + */ + if (!mc146818_avoid_UIP(cmos_read_alarm_callback, &p)) + return -EIO;
- if (!(rtc_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) { + if (!(p.rtc_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) { if (((unsigned)t->time.tm_sec) < 0x60) t->time.tm_sec = bcd2bin(t->time.tm_sec); else @@ -313,7 +339,7 @@ static int cmos_read_alarm(struct device *dev, struct rtc_wkalrm *t) } }
- t->enabled = !!(rtc_control & RTC_AIE); + t->enabled = !!(p.rtc_control & RTC_AIE); t->pending = 0;
return 0;
From: Xiaofei Tan tanxiaofei@huawei.com
[ Upstream commit 6950d046eb6eabbc271fda416460c05f7a85698a ]
It is redundant to do irqsave and irqrestore in hardIRQ context, where it has been in a irq-disabled context.
Signed-off-by: Xiaofei Tan tanxiaofei@huawei.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/1612355981-6764-2-git-send-email-tanxiaofei@huawei... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-cmos.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c index d419eb988b22..21f2bdd025b6 100644 --- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -704,11 +704,10 @@ static struct cmos_rtc cmos_rtc;
static irqreturn_t cmos_interrupt(int irq, void *p) { - unsigned long flags; u8 irqstat; u8 rtc_control;
- spin_lock_irqsave(&rtc_lock, flags); + spin_lock(&rtc_lock);
/* When the HPET interrupt handler calls us, the interrupt * status is passed as arg1 instead of the irq number. But @@ -742,7 +741,7 @@ static irqreturn_t cmos_interrupt(int irq, void *p) hpet_mask_rtc_irq_bit(RTC_AIE); CMOS_READ(RTC_INTR_FLAGS); } - spin_unlock_irqrestore(&rtc_lock, flags); + spin_unlock(&rtc_lock);
if (is_intr(irqstat)) { rtc_update_irq(p, 1, irqstat);
From: Thomas Gleixner tglx@linutronix.de
[ Upstream commit dcf257e92622ba0e25fdc4b6699683e7ae67e2a1 ]
No need to hold the lock and disable interrupts for doing math.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/20201206220541.709243630@linutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-mc146818-lib.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c index 46527a5d3912..1ca866461d10 100644 --- a/drivers/rtc/rtc-mc146818-lib.c +++ b/drivers/rtc/rtc-mc146818-lib.c @@ -249,7 +249,6 @@ int mc146818_set_time(struct rtc_time *time) if (yrs > 255) /* They are unsigned */ return -EINVAL;
- spin_lock_irqsave(&rtc_lock, flags); #ifdef CONFIG_MACH_DECSTATION real_yrs = yrs; leap_yr = ((!((yrs + 1900) % 4) && ((yrs + 1900) % 100)) || @@ -278,10 +277,8 @@ int mc146818_set_time(struct rtc_time *time) /* These limits and adjustments are independent of * whether the chip is in binary mode or not. */ - if (yrs > 169) { - spin_unlock_irqrestore(&rtc_lock, flags); + if (yrs > 169) return -EINVAL; - }
if (yrs >= 100) yrs -= 100; @@ -297,6 +294,7 @@ int mc146818_set_time(struct rtc_time *time) century = bin2bcd(century); }
+ spin_lock_irqsave(&rtc_lock, flags); save_control = CMOS_READ(RTC_CONTROL); CMOS_WRITE((save_control|RTC_SET), RTC_CONTROL); save_freq_select = CMOS_READ(RTC_FREQ_SELECT);
From: Ross Lagerwall ross.lagerwall@citrix.com
[ Upstream commit ad7f402ae4f466647c3a669b8a6f3e5d4271c84a ]
In some cases, the frontend may send a packet where the protocol headers are spread across multiple slots. This would result in netback creating an skb where the protocol headers spill over into the non-linear area. Some drivers and NICs don't handle this properly resulting in an interface reset or worse.
This issue was introduced by the removal of an unconditional skb pull in the tx path to improve performance. Fix this without reintroducing the pull by setting up grant copy ops for as many slots as needed to reach the XEN_NETBACK_TX_COPY_LEN size. Adjust the rest of the code to handle multiple copy operations per skb.
This is XSA-423 / CVE-2022-3643.
Fixes: 7e5d7753956b ("xen-netback: remove unconditional __pskb_pull_tail() in guest Tx path") Signed-off-by: Ross Lagerwall ross.lagerwall@citrix.com Reviewed-by: Paul Durrant paul@xen.org Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/xen-netback/netback.c | 223 ++++++++++++++++-------------- 1 file changed, 123 insertions(+), 100 deletions(-)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index b0cbc7fead74..06fd61b71d37 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -330,10 +330,13 @@ static int xenvif_count_requests(struct xenvif_queue *queue,
struct xenvif_tx_cb { - u16 pending_idx; + u16 copy_pending_idx[XEN_NETBK_LEGACY_SLOTS_MAX + 1]; + u8 copy_count; };
#define XENVIF_TX_CB(skb) ((struct xenvif_tx_cb *)(skb)->cb) +#define copy_pending_idx(skb, i) (XENVIF_TX_CB(skb)->copy_pending_idx[i]) +#define copy_count(skb) (XENVIF_TX_CB(skb)->copy_count)
static inline void xenvif_tx_create_map_op(struct xenvif_queue *queue, u16 pending_idx, @@ -368,31 +371,93 @@ static inline struct sk_buff *xenvif_alloc_skb(unsigned int size) return skb; }
-static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *queue, - struct sk_buff *skb, - struct xen_netif_tx_request *txp, - struct gnttab_map_grant_ref *gop, - unsigned int frag_overflow, - struct sk_buff *nskb) +static void xenvif_get_requests(struct xenvif_queue *queue, + struct sk_buff *skb, + struct xen_netif_tx_request *first, + struct xen_netif_tx_request *txfrags, + unsigned *copy_ops, + unsigned *map_ops, + unsigned int frag_overflow, + struct sk_buff *nskb, + unsigned int extra_count, + unsigned int data_len) { struct skb_shared_info *shinfo = skb_shinfo(skb); skb_frag_t *frags = shinfo->frags; - u16 pending_idx = XENVIF_TX_CB(skb)->pending_idx; - int start; + u16 pending_idx; pending_ring_idx_t index; unsigned int nr_slots; + struct gnttab_copy *cop = queue->tx_copy_ops + *copy_ops; + struct gnttab_map_grant_ref *gop = queue->tx_map_ops + *map_ops; + struct xen_netif_tx_request *txp = first; + + nr_slots = shinfo->nr_frags + 1; + + copy_count(skb) = 0; + + /* Create copy ops for exactly data_len bytes into the skb head. */ + __skb_put(skb, data_len); + while (data_len > 0) { + int amount = data_len > txp->size ? txp->size : data_len; + + cop->source.u.ref = txp->gref; + cop->source.domid = queue->vif->domid; + cop->source.offset = txp->offset; + + cop->dest.domid = DOMID_SELF; + cop->dest.offset = (offset_in_page(skb->data + + skb_headlen(skb) - + data_len)) & ~XEN_PAGE_MASK; + cop->dest.u.gmfn = virt_to_gfn(skb->data + skb_headlen(skb) + - data_len); + + cop->len = amount; + cop->flags = GNTCOPY_source_gref;
- nr_slots = shinfo->nr_frags; + index = pending_index(queue->pending_cons); + pending_idx = queue->pending_ring[index]; + callback_param(queue, pending_idx).ctx = NULL; + copy_pending_idx(skb, copy_count(skb)) = pending_idx; + copy_count(skb)++; + + cop++; + data_len -= amount;
- /* Skip first skb fragment if it is on same page as header fragment. */ - start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx); + if (amount == txp->size) { + /* The copy op covered the full tx_request */ + + memcpy(&queue->pending_tx_info[pending_idx].req, + txp, sizeof(*txp)); + queue->pending_tx_info[pending_idx].extra_count = + (txp == first) ? extra_count : 0; + + if (txp == first) + txp = txfrags; + else + txp++; + queue->pending_cons++; + nr_slots--; + } else { + /* The copy op partially covered the tx_request. + * The remainder will be mapped. + */ + txp->offset += amount; + txp->size -= amount; + } + }
- for (shinfo->nr_frags = start; shinfo->nr_frags < nr_slots; - shinfo->nr_frags++, txp++, gop++) { + for (shinfo->nr_frags = 0; shinfo->nr_frags < nr_slots; + shinfo->nr_frags++, gop++) { index = pending_index(queue->pending_cons++); pending_idx = queue->pending_ring[index]; - xenvif_tx_create_map_op(queue, pending_idx, txp, 0, gop); + xenvif_tx_create_map_op(queue, pending_idx, txp, + txp == first ? extra_count : 0, gop); frag_set_pending_idx(&frags[shinfo->nr_frags], pending_idx); + + if (txp == first) + txp = txfrags; + else + txp++; }
if (frag_overflow) { @@ -413,7 +478,8 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *que skb_shinfo(skb)->frag_list = nskb; }
- return gop; + (*copy_ops) = cop - queue->tx_copy_ops; + (*map_ops) = gop - queue->tx_map_ops; }
static inline void xenvif_grant_handle_set(struct xenvif_queue *queue, @@ -449,7 +515,7 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue, struct gnttab_copy **gopp_copy) { struct gnttab_map_grant_ref *gop_map = *gopp_map; - u16 pending_idx = XENVIF_TX_CB(skb)->pending_idx; + u16 pending_idx; /* This always points to the shinfo of the skb being checked, which * could be either the first or the one on the frag_list */ @@ -460,24 +526,37 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue, struct skb_shared_info *first_shinfo = NULL; int nr_frags = shinfo->nr_frags; const bool sharedslot = nr_frags && - frag_get_pending_idx(&shinfo->frags[0]) == pending_idx; + frag_get_pending_idx(&shinfo->frags[0]) == + copy_pending_idx(skb, copy_count(skb) - 1); int i, err;
- /* Check status of header. */ - err = (*gopp_copy)->status; - if (unlikely(err)) { - if (net_ratelimit()) - netdev_dbg(queue->vif->dev, - "Grant copy of header failed! status: %d pending_idx: %u ref: %u\n", - (*gopp_copy)->status, - pending_idx, - (*gopp_copy)->source.u.ref); - /* The first frag might still have this slot mapped */ - if (!sharedslot) - xenvif_idx_release(queue, pending_idx, - XEN_NETIF_RSP_ERROR); + for (i = 0; i < copy_count(skb); i++) { + int newerr; + + /* Check status of header. */ + pending_idx = copy_pending_idx(skb, i); + + newerr = (*gopp_copy)->status; + if (likely(!newerr)) { + /* The first frag might still have this slot mapped */ + if (i < copy_count(skb) - 1 || !sharedslot) + xenvif_idx_release(queue, pending_idx, + XEN_NETIF_RSP_OKAY); + } else { + err = newerr; + if (net_ratelimit()) + netdev_dbg(queue->vif->dev, + "Grant copy of header failed! status: %d pending_idx: %u ref: %u\n", + (*gopp_copy)->status, + pending_idx, + (*gopp_copy)->source.u.ref); + /* The first frag might still have this slot mapped */ + if (i < copy_count(skb) - 1 || !sharedslot) + xenvif_idx_release(queue, pending_idx, + XEN_NETIF_RSP_ERROR); + } + (*gopp_copy)++; } - (*gopp_copy)++;
check_frags: for (i = 0; i < nr_frags; i++, gop_map++) { @@ -524,14 +603,6 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue, if (err) continue;
- /* First error: if the header haven't shared a slot with the - * first frag, release it as well. - */ - if (!sharedslot) - xenvif_idx_release(queue, - XENVIF_TX_CB(skb)->pending_idx, - XEN_NETIF_RSP_OKAY); - /* Invalidate preceding fragments of this skb. */ for (j = 0; j < i; j++) { pending_idx = frag_get_pending_idx(&shinfo->frags[j]); @@ -801,7 +872,6 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue, unsigned *copy_ops, unsigned *map_ops) { - struct gnttab_map_grant_ref *gop = queue->tx_map_ops; struct sk_buff *skb, *nskb; int ret; unsigned int frag_overflow; @@ -883,8 +953,12 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue, continue; }
+ data_len = (txreq.size > XEN_NETBACK_TX_COPY_LEN) ? + XEN_NETBACK_TX_COPY_LEN : txreq.size; + ret = xenvif_count_requests(queue, &txreq, extra_count, txfrags, work_to_do); + if (unlikely(ret < 0)) break;
@@ -910,9 +984,8 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue, index = pending_index(queue->pending_cons); pending_idx = queue->pending_ring[index];
- data_len = (txreq.size > XEN_NETBACK_TX_COPY_LEN && - ret < XEN_NETBK_LEGACY_SLOTS_MAX) ? - XEN_NETBACK_TX_COPY_LEN : txreq.size; + if (ret >= XEN_NETBK_LEGACY_SLOTS_MAX - 1 && data_len < txreq.size) + data_len = txreq.size;
skb = xenvif_alloc_skb(data_len); if (unlikely(skb == NULL)) { @@ -923,8 +996,6 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue, }
skb_shinfo(skb)->nr_frags = ret; - if (data_len < txreq.size) - skb_shinfo(skb)->nr_frags++; /* At this point shinfo->nr_frags is in fact the number of * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX. */ @@ -986,54 +1057,19 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue, type); }
- XENVIF_TX_CB(skb)->pending_idx = pending_idx; - - __skb_put(skb, data_len); - queue->tx_copy_ops[*copy_ops].source.u.ref = txreq.gref; - queue->tx_copy_ops[*copy_ops].source.domid = queue->vif->domid; - queue->tx_copy_ops[*copy_ops].source.offset = txreq.offset; - - queue->tx_copy_ops[*copy_ops].dest.u.gmfn = - virt_to_gfn(skb->data); - queue->tx_copy_ops[*copy_ops].dest.domid = DOMID_SELF; - queue->tx_copy_ops[*copy_ops].dest.offset = - offset_in_page(skb->data) & ~XEN_PAGE_MASK; - - queue->tx_copy_ops[*copy_ops].len = data_len; - queue->tx_copy_ops[*copy_ops].flags = GNTCOPY_source_gref; - - (*copy_ops)++; - - if (data_len < txreq.size) { - frag_set_pending_idx(&skb_shinfo(skb)->frags[0], - pending_idx); - xenvif_tx_create_map_op(queue, pending_idx, &txreq, - extra_count, gop); - gop++; - } else { - frag_set_pending_idx(&skb_shinfo(skb)->frags[0], - INVALID_PENDING_IDX); - memcpy(&queue->pending_tx_info[pending_idx].req, - &txreq, sizeof(txreq)); - queue->pending_tx_info[pending_idx].extra_count = - extra_count; - } - - queue->pending_cons++; - - gop = xenvif_get_requests(queue, skb, txfrags, gop, - frag_overflow, nskb); + xenvif_get_requests(queue, skb, &txreq, txfrags, copy_ops, + map_ops, frag_overflow, nskb, extra_count, + data_len);
__skb_queue_tail(&queue->tx_queue, skb);
queue->tx.req_cons = idx;
- if (((gop-queue->tx_map_ops) >= ARRAY_SIZE(queue->tx_map_ops)) || + if ((*map_ops >= ARRAY_SIZE(queue->tx_map_ops)) || (*copy_ops >= ARRAY_SIZE(queue->tx_copy_ops))) break; }
- (*map_ops) = gop - queue->tx_map_ops; return; }
@@ -1112,9 +1148,8 @@ static int xenvif_tx_submit(struct xenvif_queue *queue) while ((skb = __skb_dequeue(&queue->tx_queue)) != NULL) { struct xen_netif_tx_request *txp; u16 pending_idx; - unsigned data_len;
- pending_idx = XENVIF_TX_CB(skb)->pending_idx; + pending_idx = copy_pending_idx(skb, 0); txp = &queue->pending_tx_info[pending_idx].req;
/* Check the remap error code. */ @@ -1133,18 +1168,6 @@ static int xenvif_tx_submit(struct xenvif_queue *queue) continue; }
- data_len = skb->len; - callback_param(queue, pending_idx).ctx = NULL; - if (data_len < txp->size) { - /* Append the packet payload as a fragment. */ - txp->offset += data_len; - txp->size -= data_len; - } else { - /* Schedule a response immediately. */ - xenvif_idx_release(queue, pending_idx, - XEN_NETIF_RSP_OKAY); - } - if (txp->flags & XEN_NETTXF_csum_blank) skb->ip_summed = CHECKSUM_PARTIAL; else if (txp->flags & XEN_NETTXF_data_validated) @@ -1330,7 +1353,7 @@ static inline void xenvif_tx_dealloc_action(struct xenvif_queue *queue) /* Called after netfront has transmitted */ int xenvif_tx_action(struct xenvif_queue *queue, int budget) { - unsigned nr_mops, nr_cops = 0; + unsigned nr_mops = 0, nr_cops = 0; int work_done, ret;
if (unlikely(!tx_work_todo(queue)))
From: Juergen Gross jgross@suse.com
[ Upstream commit 5834e72eda0b7e5767eb107259d98eef19ebd11f ]
Remove some unused macros and functions, make local functions static.
Signed-off-by: Juergen Gross jgross@suse.com Acked-by: Wei Liu wei.liu@kernel.org Link: https://lore.kernel.org/r/20220608043726.9380-1-jgross@suse.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 74e7e1efdad4 ("xen/netback: don't call kfree_skb() with interrupts disabled") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/xen-netback/common.h | 12 ------------ drivers/net/xen-netback/interface.c | 16 +--------------- drivers/net/xen-netback/netback.c | 4 +++- drivers/net/xen-netback/rx.c | 2 +- 4 files changed, 5 insertions(+), 29 deletions(-)
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h index 6a9178896c90..945647128c0e 100644 --- a/drivers/net/xen-netback/common.h +++ b/drivers/net/xen-netback/common.h @@ -48,7 +48,6 @@ #include <linux/debugfs.h>
typedef unsigned int pending_ring_idx_t; -#define INVALID_PENDING_RING_IDX (~0U)
struct pending_tx_info { struct xen_netif_tx_request req; /* tx request */ @@ -82,8 +81,6 @@ struct xenvif_rx_meta { /* Discriminate from any valid pending_idx value. */ #define INVALID_PENDING_IDX 0xFFFF
-#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE - #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
/* The maximum number of frags is derived from the size of a grant (same @@ -367,11 +364,6 @@ void xenvif_free(struct xenvif *vif); int xenvif_xenbus_init(void); void xenvif_xenbus_fini(void);
-int xenvif_schedulable(struct xenvif *vif); - -int xenvif_queue_stopped(struct xenvif_queue *queue); -void xenvif_wake_queue(struct xenvif_queue *queue); - /* (Un)Map communication rings. */ void xenvif_unmap_frontend_data_rings(struct xenvif_queue *queue); int xenvif_map_frontend_data_rings(struct xenvif_queue *queue, @@ -394,7 +386,6 @@ int xenvif_dealloc_kthread(void *data); irqreturn_t xenvif_ctrl_irq_fn(int irq, void *data);
bool xenvif_have_rx_work(struct xenvif_queue *queue, bool test_kthread); -void xenvif_rx_action(struct xenvif_queue *queue); void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb);
void xenvif_carrier_on(struct xenvif *vif); @@ -402,9 +393,6 @@ void xenvif_carrier_on(struct xenvif *vif); /* Callback from stack when TX packet can be released */ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);
-/* Unmap a pending page and release it back to the guest */ -void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx); - static inline pending_ring_idx_t nr_pending_reqs(struct xenvif_queue *queue) { return MAX_PENDING_REQS - diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c index 7ce9807fc24c..645a804ab788 100644 --- a/drivers/net/xen-netback/interface.c +++ b/drivers/net/xen-netback/interface.c @@ -70,7 +70,7 @@ void xenvif_skb_zerocopy_complete(struct xenvif_queue *queue) wake_up(&queue->dealloc_wq); }
-int xenvif_schedulable(struct xenvif *vif) +static int xenvif_schedulable(struct xenvif *vif) { return netif_running(vif->dev) && test_bit(VIF_STATUS_CONNECTED, &vif->status) && @@ -178,20 +178,6 @@ irqreturn_t xenvif_interrupt(int irq, void *dev_id) return IRQ_HANDLED; }
-int xenvif_queue_stopped(struct xenvif_queue *queue) -{ - struct net_device *dev = queue->vif->dev; - unsigned int id = queue->id; - return netif_tx_queue_stopped(netdev_get_tx_queue(dev, id)); -} - -void xenvif_wake_queue(struct xenvif_queue *queue) -{ - struct net_device *dev = queue->vif->dev; - unsigned int id = queue->id; - netif_tx_wake_queue(netdev_get_tx_queue(dev, id)); -} - static u16 xenvif_select_queue(struct net_device *dev, struct sk_buff *skb, struct net_device *sb_dev) { diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 06fd61b71d37..fed0f7458e18 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -112,6 +112,8 @@ static void make_tx_response(struct xenvif_queue *queue, s8 st); static void push_tx_responses(struct xenvif_queue *queue);
+static void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx); + static inline int tx_work_todo(struct xenvif_queue *queue);
static inline unsigned long idx_to_pfn(struct xenvif_queue *queue, @@ -1440,7 +1442,7 @@ static void push_tx_responses(struct xenvif_queue *queue) notify_remote_via_irq(queue->tx_irq); }
-void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx) +static void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx) { int ret; struct gnttab_unmap_grant_ref tx_unmap_op; diff --git a/drivers/net/xen-netback/rx.c b/drivers/net/xen-netback/rx.c index a0335407be42..932762177110 100644 --- a/drivers/net/xen-netback/rx.c +++ b/drivers/net/xen-netback/rx.c @@ -486,7 +486,7 @@ static void xenvif_rx_skb(struct xenvif_queue *queue)
#define RX_BATCH_SIZE 64
-void xenvif_rx_action(struct xenvif_queue *queue) +static void xenvif_rx_action(struct xenvif_queue *queue) { struct sk_buff_head completed_skbs; unsigned int work_done = 0;
From: Juergen Gross jgross@suse.com
[ Upstream commit 74e7e1efdad45580cc3839f2a155174cf158f9b5 ]
It is not allowed to call kfree_skb() from hardware interrupt context or with interrupts being disabled. So remove kfree_skb() from the spin_lock_irqsave() section and use the already existing "drop" label in xenvif_start_xmit() for dropping the SKB. At the same time replace the dev_kfree_skb() call there with a call of dev_kfree_skb_any(), as xenvif_start_xmit() can be called with disabled interrupts.
This is XSA-424 / CVE-2022-42328 / CVE-2022-42329.
Fixes: be81992f9086 ("xen/netback: don't queue unlimited number of packages") Reported-by: Yang Yingliang yangyingliang@huawei.com Signed-off-by: Juergen Gross jgross@suse.com Reviewed-by: Jan Beulich jbeulich@suse.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/xen-netback/common.h | 2 +- drivers/net/xen-netback/interface.c | 6 ++++-- drivers/net/xen-netback/rx.c | 8 +++++--- 3 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h index 945647128c0e..1ba974969216 100644 --- a/drivers/net/xen-netback/common.h +++ b/drivers/net/xen-netback/common.h @@ -386,7 +386,7 @@ int xenvif_dealloc_kthread(void *data); irqreturn_t xenvif_ctrl_irq_fn(int irq, void *data);
bool xenvif_have_rx_work(struct xenvif_queue *queue, bool test_kthread); -void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb); +bool xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb);
void xenvif_carrier_on(struct xenvif *vif);
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c index 645a804ab788..97cf5bc48902 100644 --- a/drivers/net/xen-netback/interface.c +++ b/drivers/net/xen-netback/interface.c @@ -255,14 +255,16 @@ xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev) if (vif->hash.alg == XEN_NETIF_CTRL_HASH_ALGORITHM_NONE) skb_clear_hash(skb);
- xenvif_rx_queue_tail(queue, skb); + if (!xenvif_rx_queue_tail(queue, skb)) + goto drop; + xenvif_kick_thread(queue);
return NETDEV_TX_OK;
drop: vif->dev->stats.tx_dropped++; - dev_kfree_skb(skb); + dev_kfree_skb_any(skb); return NETDEV_TX_OK; }
diff --git a/drivers/net/xen-netback/rx.c b/drivers/net/xen-netback/rx.c index 932762177110..0ba754ebc5ba 100644 --- a/drivers/net/xen-netback/rx.c +++ b/drivers/net/xen-netback/rx.c @@ -82,9 +82,10 @@ static bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue) return false; }
-void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb) +bool xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb) { unsigned long flags; + bool ret = true;
spin_lock_irqsave(&queue->rx_queue.lock, flags);
@@ -92,8 +93,7 @@ void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb) struct net_device *dev = queue->vif->dev;
netif_tx_stop_queue(netdev_get_tx_queue(dev, queue->id)); - kfree_skb(skb); - queue->vif->dev->stats.rx_dropped++; + ret = false; } else { if (skb_queue_empty(&queue->rx_queue)) xenvif_update_needed_slots(queue, skb); @@ -104,6 +104,8 @@ void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb) }
spin_unlock_irqrestore(&queue->rx_queue.lock, flags); + + return ret; }
static struct sk_buff *xenvif_rx_dequeue(struct xenvif_queue *queue)
From: Hans Verkuil hverkuil-cisco@xs4all.nl
[ Upstream commit 098e5edc5d048a8df8691fd9fde895af100be42b ]
While vb2_mmap took the mmap_lock mutex, vb2_get_unmapped_area didn't. Add this.
Also take this opportunity to move the 'q->memory != VB2_MEMORY_MMAP' check and vb2_fileio_is_active() check into __find_plane_by_offset() so both vb2_mmap and vb2_get_unmapped_area do the same checks.
Since q->memory is checked while mmap_lock is held, also take that lock in reqbufs and create_bufs when it is set, and set it back to MEMORY_UNKNOWN on error.
Fixes: f035eb4e976e ("[media] videobuf2: fix lockdep warning") Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Acked-by: Tomasz Figa tfiga@chromium.org Reviewed-by: Ricardo Ribalda ribalda@chromium.org Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../media/common/videobuf2/videobuf2-core.c | 102 +++++++++++++----- 1 file changed, 73 insertions(+), 29 deletions(-)
diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c index 72350343a56a..3bafde87a125 100644 --- a/drivers/media/common/videobuf2/videobuf2-core.c +++ b/drivers/media/common/videobuf2/videobuf2-core.c @@ -787,7 +787,13 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, num_buffers = max_t(unsigned int, *count, q->min_buffers_needed); num_buffers = min_t(unsigned int, num_buffers, VB2_MAX_FRAME); memset(q->alloc_devs, 0, sizeof(q->alloc_devs)); + /* + * Set this now to ensure that drivers see the correct q->memory value + * in the queue_setup op. + */ + mutex_lock(&q->mmap_lock); q->memory = memory; + mutex_unlock(&q->mmap_lock);
/* * Ask the driver how many buffers and planes per buffer it requires. @@ -796,22 +802,27 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, ret = call_qop(q, queue_setup, q, &num_buffers, &num_planes, plane_sizes, q->alloc_devs); if (ret) - return ret; + goto error;
/* Check that driver has set sane values */ - if (WARN_ON(!num_planes)) - return -EINVAL; + if (WARN_ON(!num_planes)) { + ret = -EINVAL; + goto error; + }
for (i = 0; i < num_planes; i++) - if (WARN_ON(!plane_sizes[i])) - return -EINVAL; + if (WARN_ON(!plane_sizes[i])) { + ret = -EINVAL; + goto error; + }
/* Finally, allocate buffers and video memory */ allocated_buffers = __vb2_queue_alloc(q, memory, num_buffers, num_planes, plane_sizes); if (allocated_buffers == 0) { dprintk(q, 1, "memory allocation failed\n"); - return -ENOMEM; + ret = -ENOMEM; + goto error; }
/* @@ -852,7 +863,8 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, if (ret < 0) { /* * Note: __vb2_queue_free() will subtract 'allocated_buffers' - * from q->num_buffers. + * from q->num_buffers and it will reset q->memory to + * VB2_MEMORY_UNKNOWN. */ __vb2_queue_free(q, allocated_buffers); mutex_unlock(&q->mmap_lock); @@ -868,6 +880,12 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, q->waiting_for_buffers = !q->is_output;
return 0; + +error: + mutex_lock(&q->mmap_lock); + q->memory = VB2_MEMORY_UNKNOWN; + mutex_unlock(&q->mmap_lock); + return ret; } EXPORT_SYMBOL_GPL(vb2_core_reqbufs);
@@ -878,6 +896,7 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory, { unsigned int num_planes = 0, num_buffers, allocated_buffers; unsigned plane_sizes[VB2_MAX_PLANES] = { }; + bool no_previous_buffers = !q->num_buffers; int ret;
if (q->num_buffers == VB2_MAX_FRAME) { @@ -885,13 +904,19 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory, return -ENOBUFS; }
- if (!q->num_buffers) { + if (no_previous_buffers) { if (q->waiting_in_dqbuf && *count) { dprintk(q, 1, "another dup()ped fd is waiting for a buffer\n"); return -EBUSY; } memset(q->alloc_devs, 0, sizeof(q->alloc_devs)); + /* + * Set this now to ensure that drivers see the correct q->memory + * value in the queue_setup op. + */ + mutex_lock(&q->mmap_lock); q->memory = memory; + mutex_unlock(&q->mmap_lock); q->waiting_for_buffers = !q->is_output; } else { if (q->memory != memory) { @@ -914,14 +939,15 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory, ret = call_qop(q, queue_setup, q, &num_buffers, &num_planes, plane_sizes, q->alloc_devs); if (ret) - return ret; + goto error;
/* Finally, allocate buffers and video memory */ allocated_buffers = __vb2_queue_alloc(q, memory, num_buffers, num_planes, plane_sizes); if (allocated_buffers == 0) { dprintk(q, 1, "memory allocation failed\n"); - return -ENOMEM; + ret = -ENOMEM; + goto error; }
/* @@ -952,7 +978,8 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory, if (ret < 0) { /* * Note: __vb2_queue_free() will subtract 'allocated_buffers' - * from q->num_buffers. + * from q->num_buffers and it will reset q->memory to + * VB2_MEMORY_UNKNOWN. */ __vb2_queue_free(q, allocated_buffers); mutex_unlock(&q->mmap_lock); @@ -967,6 +994,14 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory, *count = allocated_buffers;
return 0; + +error: + if (no_previous_buffers) { + mutex_lock(&q->mmap_lock); + q->memory = VB2_MEMORY_UNKNOWN; + mutex_unlock(&q->mmap_lock); + } + return ret; } EXPORT_SYMBOL_GPL(vb2_core_create_bufs);
@@ -2120,6 +2155,22 @@ static int __find_plane_by_offset(struct vb2_queue *q, unsigned long off, struct vb2_buffer *vb; unsigned int buffer, plane;
+ /* + * Sanity checks to ensure the lock is held, MEMORY_MMAP is + * used and fileio isn't active. + */ + lockdep_assert_held(&q->mmap_lock); + + if (q->memory != VB2_MEMORY_MMAP) { + dprintk(q, 1, "queue is not currently set up for mmap\n"); + return -EINVAL; + } + + if (vb2_fileio_is_active(q)) { + dprintk(q, 1, "file io in progress\n"); + return -EBUSY; + } + /* * Go over all buffers and their planes, comparing the given offset * with an offset assigned to each plane. If a match is found, @@ -2219,11 +2270,6 @@ int vb2_mmap(struct vb2_queue *q, struct vm_area_struct *vma) int ret; unsigned long length;
- if (q->memory != VB2_MEMORY_MMAP) { - dprintk(q, 1, "queue is not currently set up for mmap\n"); - return -EINVAL; - } - /* * Check memory area access mode. */ @@ -2245,14 +2291,9 @@ int vb2_mmap(struct vb2_queue *q, struct vm_area_struct *vma)
mutex_lock(&q->mmap_lock);
- if (vb2_fileio_is_active(q)) { - dprintk(q, 1, "mmap: file io in progress\n"); - ret = -EBUSY; - goto unlock; - } - /* - * Find the plane corresponding to the offset passed by userspace. + * Find the plane corresponding to the offset passed by userspace. This + * will return an error if not MEMORY_MMAP or file I/O is in progress. */ ret = __find_plane_by_offset(q, off, &buffer, &plane); if (ret) @@ -2305,22 +2346,25 @@ unsigned long vb2_get_unmapped_area(struct vb2_queue *q, void *vaddr; int ret;
- if (q->memory != VB2_MEMORY_MMAP) { - dprintk(q, 1, "queue is not currently set up for mmap\n"); - return -EINVAL; - } + mutex_lock(&q->mmap_lock);
/* - * Find the plane corresponding to the offset passed by userspace. + * Find the plane corresponding to the offset passed by userspace. This + * will return an error if not MEMORY_MMAP or file I/O is in progress. */ ret = __find_plane_by_offset(q, off, &buffer, &plane); if (ret) - return ret; + goto unlock;
vb = q->bufs[buffer];
vaddr = vb2_plane_vaddr(vb, plane); + mutex_unlock(&q->mmap_lock); return vaddr ? (unsigned long)vaddr : -EINVAL; + +unlock: + mutex_unlock(&q->mmap_lock); + return ret; } EXPORT_SYMBOL_GPL(vb2_get_unmapped_area); #endif
From: Francesco Dolcini francesco.dolcini@toradex.com
commit ef19964da8a668c683f1d38274f6fb756e047945 upstream.
This reverts commit 753395ea1e45c724150070b5785900b6a44bd5fb.
It introduced a boot regression on colibri-imx7, and potentially any other i.MX7 boards with MTD partition list generated into the fdt by U-Boot.
While the commit we are reverting here is not obviously wrong, it fixes only a dt binding checker warning that is non-functional, while it introduces a boot regression and there is no obvious fix ready.
Fixes: 753395ea1e45 ("ARM: dts: imx7: Fix NAND controller size-cells") Signed-off-by: Francesco Dolcini francesco.dolcini@toradex.com Reviewed-by: Miquel Raynal miquel.raynal@bootlin.com Acked-by: Marek Vasut marex@denx.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/Y4dgBTGNWpM6SQXI@francesco-nb.int.toradex.com/ Link: https://lore.kernel.org/all/20221205144917.6514168a@xps-13/ Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/imx7s.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm/boot/dts/imx7s.dtsi +++ b/arch/arm/boot/dts/imx7s.dtsi @@ -1221,10 +1221,10 @@ clocks = <&clks IMX7D_NAND_USDHC_BUS_RAWNAND_CLK>; };
- gpmi: nand-controller@33002000 { + gpmi: nand-controller@33002000{ compatible = "fsl,imx7d-gpmi-nand"; #address-cells = <1>; - #size-cells = <0>; + #size-cells = <1>; reg = <0x33002000 0x2000>, <0x33004000 0x4000>; reg-names = "gpmi-nand", "bch"; interrupts = <GIC_SPI 14 IRQ_TYPE_LEVEL_HIGH>;
From: Hans Verkuil hverkuil-cisco@xs4all.nl
commit 5eef2141776da02772c44ec406d6871a790761ee upstream.
Sanity checks were added to verify the v4l2_bt_timings blanking fields in order to avoid integer overflows when userspace passes weird values.
But that assumed that userspace would correctly fill in the front porch, backporch and sync values, but sometimes all you know is the total blanking, which is then assigned to just one of these fields.
And that can fail with these checks.
So instead set a maximum for the total horizontal and vertical blanking and check that each field remains below that.
That is still sufficient to avoid integer overflows, but it also allows for more flexibility in how userspace fills in these fields.
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Fixes: 4b6d66a45ed3 ("media: v4l2-dv-timings: add sanity checks for blanking values") Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/v4l2-core/v4l2-dv-timings.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-)
--- a/drivers/media/v4l2-core/v4l2-dv-timings.c +++ b/drivers/media/v4l2-core/v4l2-dv-timings.c @@ -145,6 +145,8 @@ bool v4l2_valid_dv_timings(const struct const struct v4l2_bt_timings *bt = &t->bt; const struct v4l2_bt_timings_cap *cap = &dvcap->bt; u32 caps = cap->capabilities; + const u32 max_vert = 10240; + u32 max_hor = 3 * bt->width;
if (t->type != V4L2_DV_BT_656_1120) return false; @@ -166,14 +168,20 @@ bool v4l2_valid_dv_timings(const struct if (!bt->interlaced && (bt->il_vbackporch || bt->il_vsync || bt->il_vfrontporch)) return false; - if (bt->hfrontporch > 2 * bt->width || - bt->hsync > 1024 || bt->hbackporch > 1024) + /* + * Some video receivers cannot properly separate the frontporch, + * backporch and sync values, and instead they only have the total + * blanking. That can be assigned to any of these three fields. + * So just check that none of these are way out of range. + */ + if (bt->hfrontporch > max_hor || + bt->hsync > max_hor || bt->hbackporch > max_hor) return false; - if (bt->vfrontporch > 4096 || - bt->vsync > 128 || bt->vbackporch > 4096) + if (bt->vfrontporch > max_vert || + bt->vsync > max_vert || bt->vbackporch > max_vert) return false; - if (bt->interlaced && (bt->il_vfrontporch > 4096 || - bt->il_vsync > 128 || bt->il_vbackporch > 4096)) + if (bt->interlaced && (bt->il_vfrontporch > max_vert || + bt->il_vsync > max_vert || bt->il_vbackporch > max_vert)) return false; return fnc == NULL || fnc(t, fnc_handle); }
From: Tejun Heo tj@kernel.org
commit 4a7ba45b1a435e7097ca0f79a847d0949d0eb088 upstream.
memcg_write_event_control() accesses the dentry->d_name of the specified control fd to route the write call. As a cgroup interface file can't be renamed, it's safe to access d_name as long as the specified file is a regular cgroup file. Also, as these cgroup interface files can't be removed before the directory, it's safe to access the parent too.
Prior to 347c4a874710 ("memcg: remove cgroup_event->cft"), there was a call to __file_cft() which verified that the specified file is a regular cgroupfs file before further accesses. The cftype pointer returned from __file_cft() was no longer necessary and the commit inadvertently dropped the file type check with it allowing any file to slip through. With the invarients broken, the d_name and parent accesses can now race against renames and removals of arbitrary files and cause use-after-free's.
Fix the bug by resurrecting the file type check in __file_cft(). Now that cgroupfs is implemented through kernfs, checking the file operations needs to go through a layer of indirection. Instead, let's check the superblock and dentry type.
Link: https://lkml.kernel.org/r/Y5FRm/cfcKPGzWwl@slm.duckdns.org Fixes: 347c4a874710 ("memcg: remove cgroup_event->cft") Signed-off-by: Tejun Heo tj@kernel.org Reported-by: Jann Horn jannh@google.com Acked-by: Roman Gushchin roman.gushchin@linux.dev Acked-by: Johannes Weiner hannes@cmpxchg.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Michal Hocko mhocko@kernel.org Cc: Muchun Song songmuchun@bytedance.com Cc: Shakeel Butt shakeelb@google.com Cc: stable@vger.kernel.org [3.14+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/cgroup.h | 1 + kernel/cgroup/cgroup-internal.h | 1 - mm/memcontrol.c | 15 +++++++++++++-- 3 files changed, 14 insertions(+), 3 deletions(-)
--- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -68,6 +68,7 @@ struct css_task_iter { struct list_head iters_node; /* css_set->task_iters */ };
+extern struct file_system_type cgroup_fs_type; extern struct cgroup_root cgrp_dfl_root; extern struct css_set init_css_set;
--- a/kernel/cgroup/cgroup-internal.h +++ b/kernel/cgroup/cgroup-internal.h @@ -169,7 +169,6 @@ extern struct mutex cgroup_mutex; extern spinlock_t css_set_lock; extern struct cgroup_subsys *cgroup_subsys[]; extern struct list_head cgroup_roots; -extern struct file_system_type cgroup_fs_type;
/* iterate across the hierarchies */ #define for_each_root(root) \ --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4866,6 +4866,7 @@ static ssize_t memcg_write_event_control unsigned int efd, cfd; struct fd efile; struct fd cfile; + struct dentry *cdentry; const char *name; char *endp; int ret; @@ -4917,6 +4918,16 @@ static ssize_t memcg_write_event_control goto out_put_cfile;
/* + * The control file must be a regular cgroup1 file. As a regular cgroup + * file can't be renamed, it's safe to access its name afterwards. + */ + cdentry = cfile.file->f_path.dentry; + if (cdentry->d_sb->s_type != &cgroup_fs_type || !d_is_reg(cdentry)) { + ret = -EINVAL; + goto out_put_cfile; + } + + /* * Determine the event callbacks and set them in @event. This used * to be done via struct cftype but cgroup core no longer knows * about these events. The following is crude but the whole thing @@ -4924,7 +4935,7 @@ static ssize_t memcg_write_event_control * * DO NOT ADD NEW FILES. */ - name = cfile.file->f_path.dentry->d_name.name; + name = cdentry->d_name.name;
if (!strcmp(name, "memory.usage_in_bytes")) { event->register_event = mem_cgroup_usage_register_event; @@ -4948,7 +4959,7 @@ static ssize_t memcg_write_event_control * automatically removed on cgroup destruction but the removal is * asynchronous, so take an extra ref on @css. */ - cfile_css = css_tryget_online_from_dir(cfile.file->f_path.dentry->d_parent, + cfile_css = css_tryget_online_from_dir(cdentry->d_parent, &memory_cgrp_subsys); ret = -EINVAL; if (IS_ERR(cfile_css))
From: John Starks jostarks@microsoft.com
commit fcd0ccd836ffad73d98a66f6fea7b16f735ea920 upstream.
For dax pud, pud_huge() returns true on x86. So the function works as long as hugetlb is configured. However, dax doesn't depend on hugetlb. Commit 414fd080d125 ("mm/gup: fix gup_pmd_range() for dax") fixed devmap-backed huge PMDs, but missed devmap-backed huge PUDs. Fix this as well.
This fixes the below kernel panic:
general protection fault, probably for non-canonical address 0x69e7c000cc478: 0000 [#1] SMP < snip > Call Trace: <TASK> get_user_pages_fast+0x1f/0x40 iov_iter_get_pages+0xc6/0x3b0 ? mempool_alloc+0x5d/0x170 bio_iov_iter_get_pages+0x82/0x4e0 ? bvec_alloc+0x91/0xc0 ? bio_alloc_bioset+0x19a/0x2a0 blkdev_direct_IO+0x282/0x480 ? __io_complete_rw_common+0xc0/0xc0 ? filemap_range_has_page+0x82/0xc0 generic_file_direct_write+0x9d/0x1a0 ? inode_update_time+0x24/0x30 __generic_file_write_iter+0xbd/0x1e0 blkdev_write_iter+0xb4/0x150 ? io_import_iovec+0x8d/0x340 io_write+0xf9/0x300 io_issue_sqe+0x3c3/0x1d30 ? sysvec_reschedule_ipi+0x6c/0x80 __io_queue_sqe+0x33/0x240 ? fget+0x76/0xa0 io_submit_sqes+0xe6a/0x18d0 ? __fget_light+0xd1/0x100 __x64_sys_io_uring_enter+0x199/0x880 ? __context_tracking_enter+0x1f/0x70 ? irqentry_exit_to_user_mode+0x24/0x30 ? irqentry_exit+0x1d/0x30 ? __context_tracking_exit+0xe/0x70 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x61/0xcb RIP: 0033:0x7fc97c11a7be < snip > </TASK> ---[ end trace 48b2e0e67debcaeb ]--- RIP: 0010:internal_get_user_pages_fast+0x340/0x990 < snip > Kernel panic - not syncing: Fatal exception Kernel Offset: disabled
Link: https://lkml.kernel.org/r/1670392853-28252-1-git-send-email-ssengar@linux.mi... Fixes: 414fd080d125 ("mm/gup: fix gup_pmd_range() for dax") Signed-off-by: John Starks jostarks@microsoft.com Signed-off-by: Saurabh Sengar ssengar@linux.microsoft.com Cc: Jan Kara jack@suse.cz Cc: Yu Zhao yuzhao@google.com Cc: Jason Gunthorpe jgg@nvidia.com Cc: John Hubbard jhubbard@nvidia.com Cc: David Hildenbrand david@redhat.com Cc: Dan Williams dan.j.williams@intel.com Cc: Alistair Popple apopple@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/gup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/gup.c +++ b/mm/gup.c @@ -2564,7 +2564,7 @@ static int gup_pud_range(p4d_t *p4dp, p4 next = pud_addr_end(addr, end); if (unlikely(!pud_present(pud))) return 0; - if (unlikely(pud_huge(pud))) { + if (unlikely(pud_huge(pud) || pud_devmap(pud))) { if (!gup_huge_pud(pud, pudp, addr, next, flags, pages, nr)) return 0;
From: Ismael Ferreras Morezuelas swyterzone@gmail.com
commit 955aebd445e2b49622f2184b7abb82b05c060549 upstream.
The rationale of showing this is that it's potentially critical information to diagnose and find more CSR compatibility bugs in the future and it will save a lot of headaches.
Given that clones come from a wide array of vendors (some are actually Barrot, some are something else) and these numbers are what let us find differences between actual and fake ones, it will be immensely helpful to scour the Internet looking for this pattern and building an actual database to find correlations and improve the checks.
Cc: stable@vger.kernel.org Cc: Hans de Goede hdegoede@redhat.com Signed-off-by: Ismael Ferreras Morezuelas swyterzone@gmail.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/bluetooth/btusb.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -1833,6 +1833,11 @@ static int btusb_setup_csr(struct hci_de
rp = (struct hci_rp_read_local_version *)skb->data;
+ bt_dev_info(hdev, "CSR: Setting up dongle with HCI ver=%u rev=%04x; LMP ver=%u subver=%04x; manufacturer=%u", + le16_to_cpu(rp->hci_ver), le16_to_cpu(rp->hci_rev), + le16_to_cpu(rp->lmp_ver), le16_to_cpu(rp->lmp_subver), + le16_to_cpu(rp->manufacturer)); + /* Detect a wide host of Chinese controllers that aren't CSR. * * Known fake bcdDevices: 0x0100, 0x0134, 0x1915, 0x2520, 0x7558, 0x8891
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
commit b5ca338751ad4783ec8d37b5d99c3e37b7813e59 upstream.
It seems fake CSR 5.0 clones can cause the suspend notifier to be registered twice causing the following kernel panic:
[ 71.986122] Call Trace: [ 71.986124] <TASK> [ 71.986125] blocking_notifier_chain_register+0x33/0x60 [ 71.986130] hci_register_dev+0x316/0x3d0 [bluetooth 99b5497ea3d09708fa1366c1dc03288bf3cca8da] [ 71.986154] btusb_probe+0x979/0xd85 [btusb e1e0605a4f4c01984a4b9c8ac58c3666ae287477] [ 71.986159] ? __pm_runtime_set_status+0x1a9/0x300 [ 71.986162] ? ktime_get_mono_fast_ns+0x3e/0x90 [ 71.986167] usb_probe_interface+0xe3/0x2b0 [ 71.986171] really_probe+0xdb/0x380 [ 71.986174] ? pm_runtime_barrier+0x54/0x90 [ 71.986177] __driver_probe_device+0x78/0x170 [ 71.986180] driver_probe_device+0x1f/0x90 [ 71.986183] __device_attach_driver+0x89/0x110 [ 71.986186] ? driver_allows_async_probing+0x70/0x70 [ 71.986189] bus_for_each_drv+0x8c/0xe0 [ 71.986192] __device_attach+0xb2/0x1e0 [ 71.986195] bus_probe_device+0x92/0xb0 [ 71.986198] device_add+0x422/0x9a0 [ 71.986201] ? sysfs_merge_group+0xd4/0x110 [ 71.986205] usb_set_configuration+0x57a/0x820 [ 71.986208] usb_generic_driver_probe+0x4f/0x70 [ 71.986211] usb_probe_device+0x3a/0x110 [ 71.986213] really_probe+0xdb/0x380 [ 71.986216] ? pm_runtime_barrier+0x54/0x90 [ 71.986219] __driver_probe_device+0x78/0x170 [ 71.986221] driver_probe_device+0x1f/0x90 [ 71.986224] __device_attach_driver+0x89/0x110 [ 71.986227] ? driver_allows_async_probing+0x70/0x70 [ 71.986230] bus_for_each_drv+0x8c/0xe0 [ 71.986232] __device_attach+0xb2/0x1e0 [ 71.986235] bus_probe_device+0x92/0xb0 [ 71.986237] device_add+0x422/0x9a0 [ 71.986239] ? _dev_info+0x7d/0x98 [ 71.986242] ? blake2s_update+0x4c/0xc0 [ 71.986246] usb_new_device.cold+0x148/0x36d [ 71.986250] hub_event+0xa8a/0x1910 [ 71.986255] process_one_work+0x1c4/0x380 [ 71.986259] worker_thread+0x51/0x390 [ 71.986262] ? rescuer_thread+0x3b0/0x3b0 [ 71.986264] kthread+0xdb/0x110 [ 71.986266] ? kthread_complete_and_exit+0x20/0x20 [ 71.986268] ret_from_fork+0x1f/0x30 [ 71.986273] </TASK> [ 71.986274] ---[ end trace 0000000000000000 ]--- [ 71.986284] btusb: probe of 2-1.6:1.0 failed with error -17
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216683 Cc: stable@vger.kernel.org Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Tested-by: Leonardo Eugênio lelgenio@disroot.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/hci_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -3799,7 +3799,8 @@ int hci_register_dev(struct hci_dev *hde hci_sock_dev_event(hdev, HCI_DEV_REG); hci_dev_hold(hdev);
- if (!test_bit(HCI_QUIRK_NO_SUSPEND_NOTIFIER, &hdev->quirks)) { + if (!hdev->suspend_notifier.notifier_call && + !test_bit(HCI_QUIRK_NO_SUSPEND_NOTIFIER, &hdev->quirks)) { hdev->suspend_notifier.notifier_call = hci_suspend_notifier; error = register_pm_notifier(&hdev->suspend_notifier); if (error)
From: Thomas Huth thuth@redhat.com
commit 0dd4cdccdab3d74bd86b868768a7dca216bcce7e upstream.
We recently experienced some weird huge time jumps in nested guests when rebooting them in certain cases. After adding some debug code to the epoch handling in vsie.c (thanks to David Hildenbrand for the idea!), it was obvious that the "epdx" field (the multi-epoch extension) did not get set to 0xff in case the "epoch" field was negative. Seems like the code misses to copy the value from the epdx field from the guest to the shadow control block. By doing so, the weird time jumps are gone in our scenarios.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2140899 Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support") Signed-off-by: Thomas Huth thuth@redhat.com Reviewed-by: Christian Borntraeger borntraeger@linux.ibm.com Acked-by: David Hildenbrand david@redhat.com Reviewed-by: Claudio Imbrenda imbrenda@linux.ibm.com Reviewed-by: Janosch Frank frankja@linux.ibm.com Cc: stable@vger.kernel.org # 4.19+ Link: https://lore.kernel.org/r/20221123090833.292938-1-thuth@redhat.com Message-Id: 20221123090833.292938-1-thuth@redhat.com Signed-off-by: Janosch Frank frankja@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kvm/vsie.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/s390/kvm/vsie.c +++ b/arch/s390/kvm/vsie.c @@ -535,8 +535,10 @@ static int shadow_scb(struct kvm_vcpu *v if (test_kvm_cpu_feat(vcpu->kvm, KVM_S390_VM_CPU_FEAT_CEI)) scb_s->eca |= scb_o->eca & ECA_CEI; /* Epoch Extension */ - if (test_kvm_facility(vcpu->kvm, 139)) + if (test_kvm_facility(vcpu->kvm, 139)) { scb_s->ecd |= scb_o->ecd & ECD_MEF; + scb_s->epdx = scb_o->epdx; + }
/* etoken */ if (test_kvm_facility(vcpu->kvm, 156))
From: Zack Rusin zackr@vmware.com
commit 6e90293618ed476d6b11f82ce724efbb9e9a071b upstream.
When SEV is enabled gmr's and mob's are explicitly disabled because the encrypted system memory can not be used by the hypervisor.
The driver was disabling GMR's but the presentation code, which depends on GMR's, wasn't honoring it which lead to black screen on hosts with SEV enabled.
Make sure screen objects presentation is not used when guest memory regions have been disabled to fix presentation on SEV enabled hosts.
Fixes: 3b0d6458c705 ("drm/vmwgfx: Refuse DMA operation when SEV encryption is active") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Zack Rusin zackr@vmware.com Reported-by: Nicholas Hunt nhunt@vmware.com Reviewed-by: Martin Krastev krastevm@vmware.com Link: https://patchwork.freedesktop.org/patch/msgid/20221201175341.491884-1-zack@k... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c @@ -949,6 +949,10 @@ int vmw_kms_sou_init_display(struct vmw_ struct drm_device *dev = dev_priv->dev; int i, ret;
+ /* Screen objects won't work if GMR's aren't available */ + if (!dev_priv->has_gmr) + return -ENOSYS; + if (!(dev_priv->capabilities & SVGA_CAP_SCREEN_OBJECT_2)) { DRM_INFO("Not using screen objects," " missing cap SCREEN_OBJECT_2\n");
From: Rob Clark robdclark@chromium.org
commit 24013314be6ee4ee456114a671e9fa3461323de8 upstream.
drm_gem_shmem_mmap() doesn't own this reference, resulting in the GEM object getting prematurely freed leading to a later use-after-free.
Link: https://syzkaller.appspot.com/bug?extid=c8ae65286134dd1b800d Reported-by: syzbot+c8ae65286134dd1b800d@syzkaller.appspotmail.com Fixes: 2194a63a818d ("drm: Add library for shmem backed GEM objects") Cc: stable@vger.kernel.org Signed-off-by: Rob Clark robdclark@chromium.org Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch Signed-off-by: Javier Martinez Canillas javierm@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20221130185748.357410-2-robdcl... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/drm_gem_shmem_helper.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -616,10 +616,8 @@ int drm_gem_shmem_mmap(struct drm_gem_ob shmem = to_drm_gem_shmem_obj(obj);
ret = drm_gem_shmem_get_pages(shmem); - if (ret) { - drm_gem_vm_close(vma); + if (ret) return ret; - }
vma->vm_flags |= VM_MIXEDMAP | VM_DONTEXPAND; vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
From: Rob Clark robdclark@chromium.org
commit 09bf649a74573cb596e211418a4f8008f265c5a9 upstream.
vm_open() is not allowed to fail. Fortunately we are guaranteed that the pages are already pinned, thanks to the initial mmap which is now being cloned into a forked process, and only need to increment the refcnt. So just increment it directly. Previously if a signal was delivered at the wrong time to the forking process, the mutex_lock_interruptible() could fail resulting in the pages_use_count not being incremented.
Fixes: 2194a63a818d ("drm: Add library for shmem backed GEM objects") Cc: stable@vger.kernel.org Signed-off-by: Rob Clark robdclark@chromium.org Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch Signed-off-by: Javier Martinez Canillas javierm@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20221130185748.357410-3-robdcl... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/drm_gem_shmem_helper.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -563,12 +563,20 @@ static void drm_gem_shmem_vm_open(struct { struct drm_gem_object *obj = vma->vm_private_data; struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj); - int ret;
WARN_ON(shmem->base.import_attach);
- ret = drm_gem_shmem_get_pages(shmem); - WARN_ON_ONCE(ret != 0); + mutex_lock(&shmem->pages_lock); + + /* + * We should have already pinned the pages when the buffer was first + * mmap'd, vm_open() just grabs an additional reference for the new + * mm the vma is getting copied into (ie. on fork()). + */ + if (!WARN_ON_ONCE(!shmem->pages_use_count)) + shmem->pages_use_count++; + + mutex_unlock(&shmem->pages_lock);
drm_gem_vm_open(vma); }
From: Ankit Patel anpatel@nvidia.com
commit f6d910a89a2391e5ce1f275d205023880a33d3f8 upstream.
Some additional USB mouse devices are needing ALWAYS_POLL quirk without which they disconnect and reconnect every 60s.
Add below devices to the known quirk list. CHERRY VID 0x046a, PID 0x000c MICROSOFT VID 0x045e, PID 0x0783 PRIMAX VID 0x0461, PID 0x4e2a
Signed-off-by: Ankit Patel anpatel@nvidia.com Signed-off-by: Haotien Hsu haotienh@nvidia.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/hid-ids.h | 3 +++ drivers/hid/hid-quirks.c | 3 +++ 2 files changed, 6 insertions(+)
--- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -257,6 +257,7 @@ #define USB_DEVICE_ID_CH_AXIS_295 0x001c
#define USB_VENDOR_ID_CHERRY 0x046a +#define USB_DEVICE_ID_CHERRY_MOUSE_000C 0x000c #define USB_DEVICE_ID_CHERRY_CYMOTION 0x0023 #define USB_DEVICE_ID_CHERRY_CYMOTION_SOLAR 0x0027
@@ -874,6 +875,7 @@ #define USB_DEVICE_ID_MS_XBOX_ONE_S_CONTROLLER 0x02fd #define USB_DEVICE_ID_MS_PIXART_MOUSE 0x00cb #define USB_DEVICE_ID_8BITDO_SN30_PRO_PLUS 0x02e0 +#define USB_DEVICE_ID_MS_MOUSE_0783 0x0783
#define USB_VENDOR_ID_MOJO 0x8282 #define USB_DEVICE_ID_RETRO_ADAPTER 0x3201 @@ -1302,6 +1304,7 @@
#define USB_VENDOR_ID_PRIMAX 0x0461 #define USB_DEVICE_ID_PRIMAX_MOUSE_4D22 0x4d22 +#define USB_DEVICE_ID_PRIMAX_MOUSE_4E2A 0x4e2a #define USB_DEVICE_ID_PRIMAX_KEYBOARD 0x4e05 #define USB_DEVICE_ID_PRIMAX_REZEL 0x4e72 #define USB_DEVICE_ID_PRIMAX_PIXART_MOUSE_4D0F 0x4d0f --- a/drivers/hid/hid-quirks.c +++ b/drivers/hid/hid-quirks.c @@ -54,6 +54,7 @@ static const struct hid_device_id hid_qu { HID_USB_DEVICE(USB_VENDOR_ID_CH, USB_DEVICE_ID_CH_FLIGHT_SIM_YOKE), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_CH, USB_DEVICE_ID_CH_PRO_PEDALS), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_CH, USB_DEVICE_ID_CH_PRO_THROTTLE), HID_QUIRK_NOGET }, + { HID_USB_DEVICE(USB_VENDOR_ID_CHERRY, USB_DEVICE_ID_CHERRY_MOUSE_000C), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_K65RGB), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_K65RGB_RAPIDFIRE), HID_QUIRK_NO_INIT_REPORTS | HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_K70RGB), HID_QUIRK_NO_INIT_REPORTS }, @@ -122,6 +123,7 @@ static const struct hid_device_id hid_qu { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_MOUSE_C05A), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_MOUSE_C06A), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_MCS, USB_DEVICE_ID_MCS_GAMEPADBLOCK), HID_QUIRK_MULTI_INPUT }, + { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_MOUSE_0783), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_PIXART_MOUSE), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_POWER_COVER), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_SURFACE3_COVER), HID_QUIRK_NO_INIT_REPORTS }, @@ -146,6 +148,7 @@ static const struct hid_device_id hid_qu { HID_USB_DEVICE(USB_VENDOR_ID_PIXART, USB_DEVICE_ID_PIXART_OPTICAL_TOUCH_SCREEN), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_PIXART, USB_DEVICE_ID_PIXART_USB_OPTICAL_MOUSE), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_PRIMAX, USB_DEVICE_ID_PRIMAX_MOUSE_4D22), HID_QUIRK_ALWAYS_POLL }, + { HID_USB_DEVICE(USB_VENDOR_ID_PRIMAX, USB_DEVICE_ID_PRIMAX_MOUSE_4E2A), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_PRIMAX, USB_DEVICE_ID_PRIMAX_PIXART_MOUSE_4D0F), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_PRIMAX, USB_DEVICE_ID_PRIMAX_PIXART_MOUSE_4D65), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_PRIMAX, USB_DEVICE_ID_PRIMAX_PIXART_MOUSE_4E22), HID_QUIRK_ALWAYS_POLL },
From: Anastasia Belova abelova@astralinux.ru
commit d180b6496143cd360c5d5f58ae4b9a8229c1f344 upstream.
If an empty buf is received, lbuf is also empty. So lbuf is accessed by index -1.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: f31a2de3fe36 ("HID: hid-lg4ff: Allow switching of Logitech gaming wheels between compatibility modes") Signed-off-by: Anastasia Belova abelova@astralinux.ru Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/hid-lg4ff.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/hid/hid-lg4ff.c +++ b/drivers/hid/hid-lg4ff.c @@ -872,6 +872,12 @@ static ssize_t lg4ff_alternate_modes_sto return -ENOMEM;
i = strlen(lbuf); + + if (i == 0) { + kfree(lbuf); + return -EINVAL; + } + if (lbuf[i-1] == '\n') { if (i == 1) { kfree(lbuf);
From: ZhangPeng zhangpeng362@huawei.com
commit ec61b41918587be530398b0d1c9a0d16619397e5 upstream.
Syzbot reported shift-out-of-bounds in hid_report_raw_event.
microsoft 0003:045E:07DA.0001: hid_field_extract() called with n (128) > 32! (swapper/0) ====================================================================== UBSAN: shift-out-of-bounds in drivers/hid/hid-core.c:1323:20 shift exponent 127 is too large for 32-bit type 'int' CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc4-syzkaller-00159-g4bbf3422df78 #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e3/0x2cb lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:151 [inline] __ubsan_handle_shift_out_of_bounds+0x3a6/0x420 lib/ubsan.c:322 snto32 drivers/hid/hid-core.c:1323 [inline] hid_input_fetch_field drivers/hid/hid-core.c:1572 [inline] hid_process_report drivers/hid/hid-core.c:1665 [inline] hid_report_raw_event+0xd56/0x18b0 drivers/hid/hid-core.c:1998 hid_input_report+0x408/0x4f0 drivers/hid/hid-core.c:2066 hid_irq_in+0x459/0x690 drivers/hid/usbhid/hid-core.c:284 __usb_hcd_giveback_urb+0x369/0x530 drivers/usb/core/hcd.c:1671 dummy_timer+0x86b/0x3110 drivers/usb/gadget/udc/dummy_hcd.c:1988 call_timer_fn+0xf5/0x210 kernel/time/timer.c:1474 expire_timers kernel/time/timer.c:1519 [inline] __run_timers+0x76a/0x980 kernel/time/timer.c:1790 run_timer_softirq+0x63/0xf0 kernel/time/timer.c:1803 __do_softirq+0x277/0x75b kernel/softirq.c:571 __irq_exit_rcu+0xec/0x170 kernel/softirq.c:650 irq_exit_rcu+0x5/0x20 kernel/softirq.c:662 sysvec_apic_timer_interrupt+0x91/0xb0 arch/x86/kernel/apic/apic.c:1107 ======================================================================
If the size of the integer (unsigned n) is bigger than 32 in snto32(), shift exponent will be too large for 32-bit type 'int', resulting in a shift-out-of-bounds bug. Fix this by adding a check on the size of the integer (unsigned n) in snto32(). To add support for n greater than 32 bits, set n to 32, if n is greater than 32.
Reported-by: syzbot+8b1641d2f14732407e23@syzkaller.appspotmail.com Fixes: dde5845a529f ("[PATCH] Generic HID layer - code split") Signed-off-by: ZhangPeng zhangpeng362@huawei.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/hid-core.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/hid/hid-core.c +++ b/drivers/hid/hid-core.c @@ -1310,6 +1310,9 @@ static s32 snto32(__u32 value, unsigned if (!value || !n) return 0;
+ if (n > 32) + n = 32; + switch (n) { case 8: return ((__s8)value); case 16: return ((__s16)value);
From: Oliver Hartkopp socketcan@hartkopp.net
commit 0acc442309a0a1b01bcdaa135e56e6398a49439c upstream.
Analogue to commit 8aa59e355949 ("can: af_can: fix NULL pointer dereference in can_rx_register()") we need to check for a missing initialization of ml_priv in the receive path of CAN frames.
Since commit 4e096a18867a ("net: introduce CAN specific pointer in the struct net_device") the check for dev->type to be ARPHRD_CAN is not sufficient anymore since bonding or tun netdevices claim to be CAN devices but do not initialize ml_priv accordingly.
Fixes: 4e096a18867a ("net: introduce CAN specific pointer in the struct net_device") Reported-by: syzbot+2d7f58292cb5b29eb5ad@syzkaller.appspotmail.com Reported-by: Wei Chen harperchen1110@gmail.com Signed-off-by: Oliver Hartkopp socketcan@hartkopp.net Link: https://lore.kernel.org/all/20221206201259.3028-1-socketcan@hartkopp.net Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/can/af_can.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/can/af_can.c +++ b/net/can/af_can.c @@ -680,7 +680,7 @@ static int can_rcv(struct sk_buff *skb, { struct canfd_frame *cfd = (struct canfd_frame *)skb->data;
- if (unlikely(dev->type != ARPHRD_CAN || skb->len != CAN_MTU)) { + if (unlikely(dev->type != ARPHRD_CAN || !can_get_ml_priv(dev) || skb->len != CAN_MTU)) { pr_warn_once("PF_CAN: dropped non conform CAN skbuff: dev type %d, len %d\n", dev->type, skb->len); goto free_skb; @@ -706,7 +706,7 @@ static int canfd_rcv(struct sk_buff *skb { struct canfd_frame *cfd = (struct canfd_frame *)skb->data;
- if (unlikely(dev->type != ARPHRD_CAN || skb->len != CANFD_MTU)) { + if (unlikely(dev->type != ARPHRD_CAN || !can_get_ml_priv(dev) || skb->len != CANFD_MTU)) { pr_warn_once("PF_CAN: dropped non conform CAN FD skbuff: dev type %d, len %d\n", dev->type, skb->len); goto free_skb;
From: Baolin Wang baolin.wang@linux.alibaba.com
commit fac35ba763ed07ba93154c95ffc0c4a55023707f upstream.
On some architectures (like ARM64), it can support CONT-PTE/PMD size hugetlb, which means it can support not only PMD/PUD size hugetlb (2M and 1G), but also CONT-PTE/PMD size(64K and 32M) if a 4K page size specified.
So when looking up a CONT-PTE size hugetlb page by follow_page(), it will use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size hugetlb in follow_page_pte(). However this pte entry lock is incorrect for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get the correct lock, which is mm->page_table_lock.
That means the pte entry of the CONT-PTE size hugetlb under current pte lock is unstable in follow_page_pte(), we can continue to migrate or poison the pte entry of the CONT-PTE size hugetlb, which can cause some potential race issues, even though they are under the 'pte lock'.
For example, suppose thread A is trying to look up a CONT-PTE size hugetlb page by move_pages() syscall under the lock, however antoher thread B can migrate the CONT-PTE hugetlb page at the same time, which will cause thread A to get an incorrect page, if thread A also wants to do page migration, then data inconsistency error occurs.
Moreover we have the same issue for CONT-PMD size hugetlb in follow_huge_pmd().
To fix above issues, rename the follow_huge_pmd() as follow_huge_pmd_pte() to handle PMD and PTE level size hugetlb, which uses huge_pte_lock() to get the correct pte entry lock to make the pte entry stable.
Mike said:
Support for CONT_PMD/_PTE was added with bb9dd3df8ee9 ("arm64: hugetlb: refactor find_num_contig()"). Patch series "Support for contiguous pte hugepages", v4. However, I do not believe these code paths were executed until migration support was added with 5480280d3f2d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages") I would go with 5480280d3f2d for the Fixes: targe.
Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.166201756... Fixes: 5480280d3f2d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages") Signed-off-by: Baolin Wang baolin.wang@linux.alibaba.com Suggested-by: Mike Kravetz mike.kravetz@oracle.com Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Cc: David Hildenbrand david@redhat.com Cc: Muchun Song songmuchun@bytedance.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Samuel Mendoza-Jonas samjonas@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/hugetlb.h | 8 ++++---- mm/gup.c | 14 +++++++++++++- mm/hugetlb.c | 27 +++++++++++++-------------- 3 files changed, 30 insertions(+), 19 deletions(-)
--- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -174,8 +174,8 @@ struct page *follow_huge_addr(struct mm_ struct page *follow_huge_pd(struct vm_area_struct *vma, unsigned long address, hugepd_t hpd, int flags, int pdshift); -struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address, - pmd_t *pmd, int flags); +struct page *follow_huge_pmd_pte(struct vm_area_struct *vma, unsigned long address, + int flags); struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address, pud_t *pud, int flags); struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address, @@ -261,8 +261,8 @@ static inline struct page *follow_huge_p return NULL; }
-static inline struct page *follow_huge_pmd(struct mm_struct *mm, - unsigned long address, pmd_t *pmd, int flags) +static inline struct page *follow_huge_pmd_pte(struct vm_area_struct *vma, + unsigned long address, int flags) { return NULL; } --- a/mm/gup.c +++ b/mm/gup.c @@ -405,6 +405,18 @@ static struct page *follow_page_pte(stru if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) == (FOLL_PIN | FOLL_GET))) return ERR_PTR(-EINVAL); + + /* + * Considering PTE level hugetlb, like continuous-PTE hugetlb on + * ARM64 architecture. + */ + if (is_vm_hugetlb_page(vma)) { + page = follow_huge_pmd_pte(vma, address, flags); + if (page) + return page; + return no_page_table(vma, flags); + } + retry: if (unlikely(pmd_bad(*pmd))) return no_page_table(vma, flags); @@ -560,7 +572,7 @@ static struct page *follow_pmd_mask(stru if (pmd_none(pmdval)) return no_page_table(vma, flags); if (pmd_huge(pmdval) && is_vm_hugetlb_page(vma)) { - page = follow_huge_pmd(mm, address, pmd, flags); + page = follow_huge_pmd_pte(vma, address, flags); if (page) return page; return no_page_table(vma, flags); --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5585,12 +5585,13 @@ follow_huge_pd(struct vm_area_struct *vm }
struct page * __weak -follow_huge_pmd(struct mm_struct *mm, unsigned long address, - pmd_t *pmd, int flags) +follow_huge_pmd_pte(struct vm_area_struct *vma, unsigned long address, int flags) { + struct hstate *h = hstate_vma(vma); + struct mm_struct *mm = vma->vm_mm; struct page *page = NULL; spinlock_t *ptl; - pte_t pte; + pte_t *ptep, pte;
/* FOLL_GET and FOLL_PIN are mutually exclusive. */ if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) == @@ -5598,17 +5599,15 @@ follow_huge_pmd(struct mm_struct *mm, un return NULL;
retry: - ptl = pmd_lockptr(mm, pmd); - spin_lock(ptl); - /* - * make sure that the address range covered by this pmd is not - * unmapped from other threads. - */ - if (!pmd_huge(*pmd)) - goto out; - pte = huge_ptep_get((pte_t *)pmd); + ptep = huge_pte_offset(mm, address, huge_page_size(h)); + if (!ptep) + return NULL; + + ptl = huge_pte_lock(h, mm, ptep); + pte = huge_ptep_get(ptep); if (pte_present(pte)) { - page = pmd_page(*pmd) + ((address & ~PMD_MASK) >> PAGE_SHIFT); + page = pte_page(pte) + + ((address & ~huge_page_mask(h)) >> PAGE_SHIFT); /* * try_grab_page() should always succeed here, because: a) we * hold the pmd (ptl) lock, and b) we've just checked that the @@ -5624,7 +5623,7 @@ retry: } else { if (is_hugetlb_entry_migration(pte)) { spin_unlock(ptl); - __migration_entry_wait(mm, (pte_t *)pmd, ptl); + __migration_entry_wait(mm, ptep, ptl); goto retry; } /*
From: Chris Wilson chris@chris-wilson.co.uk
[ Upstream commit 13be2efc390acd2a46a69a359f6efc00ca434599 ]
As previously noted in commit 66e4f4a9cc38 ("rtc: cmos: Use spin_lock_irqsave() in cmos_interrupt()"):
<4>[ 254.192378] WARNING: inconsistent lock state <4>[ 254.192384] 5.12.0-rc1-CI-CI_DRM_9834+ #1 Not tainted <4>[ 254.192396] -------------------------------- <4>[ 254.192400] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. <4>[ 254.192409] rtcwake/5309 [HC0[0]:SC0[0]:HE1:SE1] takes: <4>[ 254.192429] ffffffff8263c5f8 (rtc_lock){?...}-{2:2}, at: cmos_interrupt+0x18/0x100 <4>[ 254.192481] {IN-HARDIRQ-W} state was registered at: <4>[ 254.192488] lock_acquire+0xd1/0x3d0 <4>[ 254.192504] _raw_spin_lock+0x2a/0x40 <4>[ 254.192519] cmos_interrupt+0x18/0x100 <4>[ 254.192536] rtc_handler+0x1f/0xc0 <4>[ 254.192553] acpi_ev_fixed_event_detect+0x109/0x13c <4>[ 254.192574] acpi_ev_sci_xrupt_handler+0xb/0x28 <4>[ 254.192596] acpi_irq+0x13/0x30 <4>[ 254.192620] __handle_irq_event_percpu+0x43/0x2c0 <4>[ 254.192641] handle_irq_event_percpu+0x2b/0x70 <4>[ 254.192661] handle_irq_event+0x2f/0x50 <4>[ 254.192680] handle_fasteoi_irq+0x9e/0x150 <4>[ 254.192693] __common_interrupt+0x76/0x140 <4>[ 254.192715] common_interrupt+0x96/0xc0 <4>[ 254.192732] asm_common_interrupt+0x1e/0x40 <4>[ 254.192750] _raw_spin_unlock_irqrestore+0x38/0x60 <4>[ 254.192767] resume_irqs+0xba/0xf0 <4>[ 254.192786] dpm_resume_noirq+0x245/0x3d0 <4>[ 254.192811] suspend_devices_and_enter+0x230/0xaa0 <4>[ 254.192835] pm_suspend.cold.8+0x301/0x34a <4>[ 254.192859] state_store+0x7b/0xe0 <4>[ 254.192879] kernfs_fop_write_iter+0x11d/0x1c0 <4>[ 254.192899] new_sync_write+0x11d/0x1b0 <4>[ 254.192916] vfs_write+0x265/0x390 <4>[ 254.192933] ksys_write+0x5a/0xd0 <4>[ 254.192949] do_syscall_64+0x33/0x80 <4>[ 254.192965] entry_SYSCALL_64_after_hwframe+0x44/0xae <4>[ 254.192986] irq event stamp: 43775 <4>[ 254.192994] hardirqs last enabled at (43775): [<ffffffff81c00c42>] asm_sysvec_apic_timer_interrupt+0x12/0x20 <4>[ 254.193023] hardirqs last disabled at (43774): [<ffffffff81aa691a>] sysvec_apic_timer_interrupt+0xa/0xb0 <4>[ 254.193049] softirqs last enabled at (42548): [<ffffffff81e00342>] __do_softirq+0x342/0x48e <4>[ 254.193074] softirqs last disabled at (42543): [<ffffffff810b45fd>] irq_exit_rcu+0xad/0xd0 <4>[ 254.193101] other info that might help us debug this: <4>[ 254.193107] Possible unsafe locking scenario:
<4>[ 254.193112] CPU0 <4>[ 254.193117] ---- <4>[ 254.193121] lock(rtc_lock); <4>[ 254.193137] <Interrupt> <4>[ 254.193142] lock(rtc_lock); <4>[ 254.193156] *** DEADLOCK ***
<4>[ 254.193161] 6 locks held by rtcwake/5309: <4>[ 254.193174] #0: ffff888104861430 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x5a/0xd0 <4>[ 254.193232] #1: ffff88810f823288 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xe7/0x1c0 <4>[ 254.193282] #2: ffff888100cef3c0 (kn->active#285 <7>[ 254.192706] i915 0000:00:02.0: [drm:intel_modeset_setup_hw_state [i915]] [CRTC:51:pipe A] hw state readout: disabled <4>[ 254.193307] ){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xf0/0x1c0 <4>[ 254.193333] #3: ffffffff82649fa8 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend.cold.8+0xce/0x34a <4>[ 254.193387] #4: ffffffff827a2108 (acpi_scan_lock){+.+.}-{3:3}, at: acpi_suspend_begin+0x47/0x70 <4>[ 254.193433] #5: ffff8881019ea178 (&dev->mutex){....}-{3:3}, at: device_resume+0x68/0x1e0 <4>[ 254.193485] stack backtrace: <4>[ 254.193492] CPU: 1 PID: 5309 Comm: rtcwake Not tainted 5.12.0-rc1-CI-CI_DRM_9834+ #1 <4>[ 254.193514] Hardware name: Google Soraka/Soraka, BIOS MrChromebox-4.10 08/25/2019 <4>[ 254.193524] Call Trace: <4>[ 254.193536] dump_stack+0x7f/0xad <4>[ 254.193567] mark_lock.part.47+0x8ca/0xce0 <4>[ 254.193604] __lock_acquire+0x39b/0x2590 <4>[ 254.193626] ? asm_sysvec_apic_timer_interrupt+0x12/0x20 <4>[ 254.193660] lock_acquire+0xd1/0x3d0 <4>[ 254.193677] ? cmos_interrupt+0x18/0x100 <4>[ 254.193716] _raw_spin_lock+0x2a/0x40 <4>[ 254.193735] ? cmos_interrupt+0x18/0x100 <4>[ 254.193758] cmos_interrupt+0x18/0x100 <4>[ 254.193785] cmos_resume+0x2ac/0x2d0 <4>[ 254.193813] ? acpi_pm_set_device_wakeup+0x1f/0x110 <4>[ 254.193842] ? pnp_bus_suspend+0x10/0x10 <4>[ 254.193864] pnp_bus_resume+0x5e/0x90 <4>[ 254.193885] dpm_run_callback+0x5f/0x240 <4>[ 254.193914] device_resume+0xb2/0x1e0 <4>[ 254.193942] ? pm_dev_err+0x25/0x25 <4>[ 254.193974] dpm_resume+0xea/0x3f0 <4>[ 254.194005] dpm_resume_end+0x8/0x10 <4>[ 254.194030] suspend_devices_and_enter+0x29b/0xaa0 <4>[ 254.194066] pm_suspend.cold.8+0x301/0x34a <4>[ 254.194094] state_store+0x7b/0xe0 <4>[ 254.194124] kernfs_fop_write_iter+0x11d/0x1c0 <4>[ 254.194151] new_sync_write+0x11d/0x1b0 <4>[ 254.194183] vfs_write+0x265/0x390 <4>[ 254.194207] ksys_write+0x5a/0xd0 <4>[ 254.194232] do_syscall_64+0x33/0x80 <4>[ 254.194251] entry_SYSCALL_64_after_hwframe+0x44/0xae <4>[ 254.194274] RIP: 0033:0x7f07d79691e7 <4>[ 254.194293] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 <4>[ 254.194312] RSP: 002b:00007ffd9cc2c768 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 <4>[ 254.194337] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f07d79691e7 <4>[ 254.194352] RDX: 0000000000000004 RSI: 0000556ebfc63590 RDI: 000000000000000b <4>[ 254.194366] RBP: 0000556ebfc63590 R08: 0000000000000000 R09: 0000000000000004 <4>[ 254.194379] R10: 0000556ebf0ec2a6 R11: 0000000000000246 R12: 0000000000000004
which breaks S3-resume on fi-kbl-soraka presumably as that's slow enough to trigger the alarm during the suspend.
Fixes: 6950d046eb6e ("rtc: cmos: Replace spin_lock_irqsave with spin_lock in hard IRQ") Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Xiaofei Tan tanxiaofei@huawei.com Cc: Alexandre Belloni alexandre.belloni@bootlin.com Cc: Alessandro Zummo a.zummo@towertech.it Cc: Ville Syrjälä ville.syrjala@linux.intel.com Reviewed-by: Ville Syrjälä ville.syrjala@linux.intel.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/20210305122140.28774-1-chris@chris-wilson.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-cmos.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c index 21f2bdd025b6..d4f6c4dd42c4 100644 --- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -1111,7 +1111,9 @@ static void cmos_check_wkalrm(struct device *dev) * ACK the rtc irq here */ if (t_now >= cmos->alarm_expires && cmos_use_acpi_alarm()) { + local_irq_disable(); cmos_interrupt(0, (void *)cmos->rtc); + local_irq_enable(); return; }
From: Mateusz Jończyk mat.jonczyk@o2.pl
[ Upstream commit 811f5559270f25c34c338d6eaa2ece2544c3d3bd ]
In mc146818_set_time(), CMOS_READ(RTC_CONTROL) was performed without the rtc_lock taken, which is required for CMOS accesses. Fix this.
Nothing in kernel modifies RTC_DM_BINARY, so a separate critical section is allowed here.
Fixes: dcf257e92622 ("rtc: mc146818: Reduce spinlock section in mc146818_set_time()") Signed-off-by: Mateusz Jończyk mat.jonczyk@o2.pl Cc: Alessandro Zummo a.zummo@towertech.it Cc: Alexandre Belloni alexandre.belloni@bootlin.com Cc: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/20220220090403.153928-1-mat.jonczyk@o2.pl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-mc146818-lib.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c index 1ca866461d10..3783aaf9dd5a 100644 --- a/drivers/rtc/rtc-mc146818-lib.c +++ b/drivers/rtc/rtc-mc146818-lib.c @@ -283,8 +283,10 @@ int mc146818_set_time(struct rtc_time *time) if (yrs >= 100) yrs -= 100;
- if (!(CMOS_READ(RTC_CONTROL) & RTC_DM_BINARY) - || RTC_ALWAYS_BCD) { + spin_lock_irqsave(&rtc_lock, flags); + save_control = CMOS_READ(RTC_CONTROL); + spin_unlock_irqrestore(&rtc_lock, flags); + if (!(save_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) { sec = bin2bcd(sec); min = bin2bcd(min); hrs = bin2bcd(hrs);
From: Dan Carpenter dan.carpenter@oracle.com
[ Upstream commit 7372971c1be5b7d4fdd8ad237798bdc1d1d54162 ]
The mc146818_get_time() function returns zero on success or negative a error code on failure. It needs to be type int.
Fixes: d35786b3a28d ("rtc: mc146818-lib: change return values of mc146818_get_time()") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Mateusz Jończyk mat.jonczyk@o2.pl Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/20220111071922.GE11243@kili Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-mc146818-lib.c | 2 +- include/linux/mc146818rtc.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c index 3783aaf9dd5a..347655d24b5d 100644 --- a/drivers/rtc/rtc-mc146818-lib.c +++ b/drivers/rtc/rtc-mc146818-lib.c @@ -103,7 +103,7 @@ bool mc146818_does_rtc_work(void) } EXPORT_SYMBOL_GPL(mc146818_does_rtc_work);
-unsigned int mc146818_get_time(struct rtc_time *time) +int mc146818_get_time(struct rtc_time *time) { unsigned char ctrl; unsigned long flags; diff --git a/include/linux/mc146818rtc.h b/include/linux/mc146818rtc.h index fb042e0e7d76..b0da04fe087b 100644 --- a/include/linux/mc146818rtc.h +++ b/include/linux/mc146818rtc.h @@ -126,7 +126,7 @@ struct cmos_rtc_board_info { #endif /* ARCH_RTC_LOCATION */
bool mc146818_does_rtc_work(void); -unsigned int mc146818_get_time(struct rtc_time *time); +int mc146818_get_time(struct rtc_time *time); int mc146818_set_time(struct rtc_time *time);
bool mc146818_avoid_UIP(void (*callback)(unsigned char seconds, void *param),
From: Stefano Brivio sbrivio@redhat.com
[ Upstream commit 97d4d394b58777f7056ebba8ffdb4002d0563259 ]
Embarrassingly, nft_pipapo_insert() checked for interval validity in the first field only.
The start_p and end_p pointers were reset to key data from the first field at every iteration of the loop which was supposed to go over the set fields.
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Reported-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_set_pipapo.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index 949da87dbb06..30cf0673d6c1 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -1162,6 +1162,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set, struct nft_pipapo_match *m = priv->clone; u8 genmask = nft_genmask_next(net); struct nft_pipapo_field *f; + const u8 *start_p, *end_p; int i, bsize_max, err = 0;
if (nft_set_ext_exists(ext, NFT_SET_EXT_KEY_END)) @@ -1202,9 +1203,9 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set, }
/* Validate */ + start_p = start; + end_p = end; nft_pipapo_for_each_field(f, i, m) { - const u8 *start_p = start, *end_p = end; - if (f->rules >= (unsigned long)NFT_PIPAPO_RULE0_MAX) return -ENOSPC;
From: Ziyang Xuan william.xuanziyang@huawei.com
[ Upstream commit 4d002d6a2a00ac1c433899bd7625c6400a74cfba ]
In cc2520_hw_init(), if oscillator start failed, the error code should be returned.
Fixes: 0da6bc8cc341 ("ieee802154: cc2520: adds driver for TI CC2520 radio") Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Link: https://lore.kernel.org/r/20221120075046.2213633-1-william.xuanziyang@huawei... Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ieee802154/cc2520.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ieee802154/cc2520.c b/drivers/net/ieee802154/cc2520.c index 4517517215f2..a8369bfa4050 100644 --- a/drivers/net/ieee802154/cc2520.c +++ b/drivers/net/ieee802154/cc2520.c @@ -970,7 +970,7 @@ static int cc2520_hw_init(struct cc2520_private *priv)
if (timeout-- <= 0) { dev_err(&priv->spi->dev, "oscillator start failed!\n"); - return ret; + return -ETIMEDOUT; } udelay(1); } while (!(status & CC2520_STATUS_XOSC32M_STABLE));
From: Hauke Mehrtens hauke@hauke-m.de
[ Upstream commit 1e24c54da257ab93cff5826be8a793b014a5dc9c ]
The struct cas_control embeds multiple generic SPI structures and we have to make sure these structures are initialized to default values. This driver does not set all attributes. When using kmalloc before some attributes were not initialized and contained random data which caused random crashes at bootup.
Fixes: ded845a781a5 ("ieee802154: Add CA8210 IEEE 802.15.4 device driver") Signed-off-by: Hauke Mehrtens hauke@hauke-m.de Link: https://lore.kernel.org/r/20221121002201.1339636-1-hauke@hauke-m.de Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ieee802154/ca8210.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ieee802154/ca8210.c b/drivers/net/ieee802154/ca8210.c index fd9f33c833fa..95ef3b6f98dd 100644 --- a/drivers/net/ieee802154/ca8210.c +++ b/drivers/net/ieee802154/ca8210.c @@ -926,7 +926,7 @@ static int ca8210_spi_transfer(
dev_dbg(&spi->dev, "%s called\n", __func__);
- cas_ctl = kmalloc(sizeof(*cas_ctl), GFP_ATOMIC); + cas_ctl = kzalloc(sizeof(*cas_ctl), GFP_ATOMIC); if (!cas_ctl) return -ENOMEM;
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 1feeae071507ad65cf9f462a1bdd543a4bf89e71 ]
All warnings (new ones prefixed by >>):
net/netfilter/nf_conntrack_netlink.c: In function '__ctnetlink_glue_build':
net/netfilter/nf_conntrack_netlink.c:2674:13: warning: unused variable 'mark' [-Wunused-variable]
2674 | u32 mark; | ^~~~
Fixes: 52d1aa8b8249 ("netfilter: conntrack: Fix data-races around ct mark") Reported-by: kernel test robot lkp@intel.com Tested-by: Ivan Babrou ivan@ivan.computer Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_conntrack_netlink.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index c402283e7545..2efdc50f978b 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -317,8 +317,13 @@ ctnetlink_dump_timestamp(struct sk_buff *skb, const struct nf_conn *ct) }
#ifdef CONFIG_NF_CONNTRACK_MARK -static int ctnetlink_dump_mark(struct sk_buff *skb, u32 mark) +static int ctnetlink_dump_mark(struct sk_buff *skb, const struct nf_conn *ct) { + u32 mark = READ_ONCE(ct->mark); + + if (!mark) + return 0; + if (nla_put_be32(skb, CTA_MARK, htonl(mark))) goto nla_put_failure; return 0; @@ -532,7 +537,7 @@ static int ctnetlink_dump_extinfo(struct sk_buff *skb, static int ctnetlink_dump_info(struct sk_buff *skb, struct nf_conn *ct) { if (ctnetlink_dump_status(skb, ct) < 0 || - ctnetlink_dump_mark(skb, READ_ONCE(ct->mark)) < 0 || + ctnetlink_dump_mark(skb, ct) < 0 || ctnetlink_dump_secctx(skb, ct) < 0 || ctnetlink_dump_id(skb, ct) < 0 || ctnetlink_dump_use(skb, ct) < 0 || @@ -711,7 +716,6 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item) struct sk_buff *skb; unsigned int type; unsigned int flags = 0, group; - u32 mark; int err;
if (events & (1 << IPCT_DESTROY)) { @@ -812,9 +816,8 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item) }
#ifdef CONFIG_NF_CONNTRACK_MARK - mark = READ_ONCE(ct->mark); - if ((events & (1 << IPCT_MARK) || mark) && - ctnetlink_dump_mark(skb, mark) < 0) + if (events & (1 << IPCT_MARK) && + ctnetlink_dump_mark(skb, ct) < 0) goto nla_put_failure; #endif nlmsg_end(skb, nlh); @@ -2671,7 +2674,6 @@ static int __ctnetlink_glue_build(struct sk_buff *skb, struct nf_conn *ct) { const struct nf_conntrack_zone *zone; struct nlattr *nest_parms; - u32 mark;
zone = nf_ct_zone(ct);
@@ -2729,8 +2731,7 @@ static int __ctnetlink_glue_build(struct sk_buff *skb, struct nf_conn *ct) goto nla_put_failure;
#ifdef CONFIG_NF_CONNTRACK_MARK - mark = READ_ONCE(ct->mark); - if (mark && ctnetlink_dump_mark(skb, mark) < 0) + if (ctnetlink_dump_mark(skb, ct) < 0) goto nla_put_failure; #endif if (ctnetlink_dump_labels(skb, ct) < 0)
From: Qiqi Zhang eddy.zhang@rock-chips.com
[ Upstream commit 8c115864501fc09932cdfec53d9ec1cde82b4a28 ]
According to the description in ti-sn65dsi86's datasheet:
CHA_HSYNC_POLARITY: 0 = Active High Pulse. Synchronization signal is high for the sync pulse width. (default) 1 = Active Low Pulse. Synchronization signal is low for the sync pulse width.
CHA_VSYNC_POLARITY: 0 = Active High Pulse. Synchronization signal is high for the sync pulse width. (Default) 1 = Active Low Pulse. Synchronization signal is low for the sync pulse width.
We should only set these bits when the polarity is negative.
Fixes: a095f15c00e2 ("drm/bridge: add support for sn65dsi86 bridge driver") Signed-off-by: Qiqi Zhang eddy.zhang@rock-chips.com Reviewed-by: Douglas Anderson dianders@chromium.org Tested-by: Douglas Anderson dianders@chromium.org Reviewed-by: Tomi Valkeinen tomi.valkeinen@ideasonboard.com Signed-off-by: Douglas Anderson dianders@chromium.org Link: https://patchwork.freedesktop.org/patch/msgid/20221125104558.84616-1-eddy.zh... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/bridge/ti-sn65dsi86.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi86.c b/drivers/gpu/drm/bridge/ti-sn65dsi86.c index 1a58481037b3..77a447a3fb1d 100644 --- a/drivers/gpu/drm/bridge/ti-sn65dsi86.c +++ b/drivers/gpu/drm/bridge/ti-sn65dsi86.c @@ -621,9 +621,9 @@ static void ti_sn_bridge_set_video_timings(struct ti_sn_bridge *pdata) &pdata->bridge.encoder->crtc->state->adjusted_mode; u8 hsync_polarity = 0, vsync_polarity = 0;
- if (mode->flags & DRM_MODE_FLAG_PHSYNC) + if (mode->flags & DRM_MODE_FLAG_NHSYNC) hsync_polarity = CHA_HSYNC_POLARITY; - if (mode->flags & DRM_MODE_FLAG_PVSYNC) + if (mode->flags & DRM_MODE_FLAG_NVSYNC) vsync_polarity = CHA_VSYNC_POLARITY;
ti_sn_bridge_write_u16(pdata, SN_CHA_ACTIVE_LINE_LENGTH_LOW_REG,
From: Xiongfeng Wang wangxiongfeng2@huawei.com
[ Upstream commit 45fecdb9f658d9c82960c98240bc0770ade19aca ]
for_each_pci_dev() is implemented by pci_get_device(). The comment of pci_get_device() says that it will increase the reference count for the returned pci_dev and also decrease the reference count for the input pci_dev @from if it is not NULL.
If we break for_each_pci_dev() loop with pdev not NULL, we need to call pci_dev_put() to decrease the reference count. Add the missing pci_dev_put() after the 'out' label. Since pci_dev_put() can handle NULL input parameter, there is no problem for the 'Device not found' branch. For the normal path, add pci_dev_put() in amd_gpio_exit().
Fixes: f942a7de047d ("gpio: add a driver for GPIO pins found on AMD-8111 south bridge chips") Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-amd8111.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpio/gpio-amd8111.c b/drivers/gpio/gpio-amd8111.c index fdcebe59510d..68d95051dd0e 100644 --- a/drivers/gpio/gpio-amd8111.c +++ b/drivers/gpio/gpio-amd8111.c @@ -231,7 +231,10 @@ static int __init amd_gpio_init(void) ioport_unmap(gp.pm); goto out; } + return 0; + out: + pci_dev_put(pdev); return err; }
@@ -239,6 +242,7 @@ static void __exit amd_gpio_exit(void) { gpiochip_remove(&gp.chip); ioport_unmap(gp.pm); + pci_dev_put(gp.pdev); }
module_init(amd_gpio_init);
From: Akihiko Odaki akihiko.odaki@daynix.com
[ Upstream commit eed913f6919e253f35d454b2f115f2a4db2b741a ]
e1000_xmit_frame is expected to stop the queue and dispatch frames to hardware if there is not sufficient space for the next frame in the buffer, but sometimes it failed to do so because the estimated maximum size of frame was wrong. As the consequence, the later invocation of e1000_xmit_frame failed with NETDEV_TX_BUSY, and the frame in the buffer remained forever, resulting in a watchdog failure.
This change fixes the estimated size by making it match with the condition for NETDEV_TX_BUSY. Apparently, the old estimation failed to account for the following lines which determines the space requirement for not causing NETDEV_TX_BUSY: ``` /* reserve a descriptor for the offload context */ if ((mss) || (skb->ip_summed == CHECKSUM_PARTIAL)) count++; count++;
count += DIV_ROUND_UP(len, adapter->tx_fifo_limit); ```
This issue was found when running http-stress02 test included in Linux Test Project 20220930 on QEMU with the following commandline: ``` qemu-system-x86_64 -M q35,accel=kvm -m 8G -smp 8 -drive if=virtio,format=raw,file=root.img,file.locking=on -device e1000e,netdev=netdev -netdev tap,script=ifup,downscript=no,id=netdev ```
Fixes: bc7f75fa9788 ("[E1000E]: New pci-express e1000 driver (currently for ICH9 devices only)") Signed-off-by: Akihiko Odaki akihiko.odaki@daynix.com Tested-by: Gurucharan G gurucharanx.g@intel.com (A Contingent worker at Intel) Tested-by: Naama Meir naamax.meir@linux.intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/e1000e/netdev.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index d0c4de023112..ae0c9aaab48d 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -5937,9 +5937,9 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb, e1000_tx_queue(tx_ring, tx_flags, count); /* Make sure there is space in the ring for the next send. */ e1000_maybe_stop_tx(tx_ring, - (MAX_SKB_FRAGS * + ((MAX_SKB_FRAGS + 1) * DIV_ROUND_UP(PAGE_SIZE, - adapter->tx_fifo_limit) + 2)); + adapter->tx_fifo_limit) + 4));
if (!netdev_xmit_more() || netif_xmit_stopped(netdev_get_tx_queue(netdev, 0))) {
From: Akihiko Odaki akihiko.odaki@daynix.com
[ Upstream commit 28e96556baca7056d11d9fb3cdd0aba4483e00d8 ]
Without this change, the interrupt test fail with MSI-X environment:
$ sudo ethtool -t enp0s2 offline [ 43.921783] igb 0000:00:02.0: offline testing starting [ 44.855824] igb 0000:00:02.0 enp0s2: igb: enp0s2 NIC Link is Down [ 44.961249] igb 0000:00:02.0 enp0s2: igb: enp0s2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX [ 51.272202] igb 0000:00:02.0: testing shared interrupt [ 56.996975] igb 0000:00:02.0 enp0s2: igb: enp0s2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX The test result is FAIL The test extra info: Register test (offline) 0 Eeprom test (offline) 0 Interrupt test (offline) 4 Loopback test (offline) 0 Link test (on/offline) 0
Here, "4" means an expected interrupt was not delivered.
To fix this, route IRQs correctly to the first MSI-X vector by setting IVAR_MISC. Also, set bit 0 of EIMS so that the vector will not be masked. The interrupt test now runs properly with this change:
$ sudo ethtool -t enp0s2 offline [ 42.762985] igb 0000:00:02.0: offline testing starting [ 50.141967] igb 0000:00:02.0: testing shared interrupt [ 56.163957] igb 0000:00:02.0 enp0s2: igb: enp0s2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX The test result is PASS The test extra info: Register test (offline) 0 Eeprom test (offline) 0 Interrupt test (offline) 0 Loopback test (offline) 0 Link test (on/offline) 0
Fixes: 4eefa8f01314 ("igb: add single vector msi-x testing to interrupt test") Signed-off-by: Akihiko Odaki akihiko.odaki@daynix.com Reviewed-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Tested-by: Gurucharan G gurucharanx.g@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igb/igb_ethtool.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c index 28baf203459a..5e3b0a5843a8 100644 --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c @@ -1413,6 +1413,8 @@ static int igb_intr_test(struct igb_adapter *adapter, u64 *data) *data = 1; return -1; } + wr32(E1000_IVAR_MISC, E1000_IVAR_VALID << 8); + wr32(E1000_EIMS, BIT(0)); } else if (adapter->flags & IGB_FLAG_HAS_MSI) { shared_int = false; if (request_irq(irq,
From: YueHaibing yuehaibing@huawei.com
[ Upstream commit 421f8663b3a775c32f724f793264097c60028f2e ]
commit 8d820bc9d12b ("net: broadcom: Fix BCMGENET Kconfig") fixes the build that contain 99addbe31f55 ("net: broadcom: Select BROADCOM_PHY for BCMGENET") and enable BCMGENET=y but PTP_1588_CLOCK_OPTIONAL=m, which otherwise leads to a link failure. However this may trigger a runtime failure.
Fix the original issue by propagating the PTP_1588_CLOCK_OPTIONAL dependency of BROADCOM_PHY down to BCMGENET.
Fixes: 8d820bc9d12b ("net: broadcom: Fix BCMGENET Kconfig") Fixes: 99addbe31f55 ("net: broadcom: Select BROADCOM_PHY for BCMGENET") Reported-by: Naresh Kamboju naresh.kamboju@linaro.org Suggested-by: Arnd Bergmann arnd@arndb.de Signed-off-by: YueHaibing yuehaibing@huawei.com Acked-by: Arnd Bergmann arnd@arndb.de Link: https://lore.kernel.org/r/20221125115003.30308-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig index 7b79528d6eed..06aaeaadf2e9 100644 --- a/drivers/net/ethernet/broadcom/Kconfig +++ b/drivers/net/ethernet/broadcom/Kconfig @@ -63,6 +63,7 @@ config BCM63XX_ENET config BCMGENET tristate "Broadcom GENET internal MAC support" depends on HAS_IOMEM + depends on PTP_1588_CLOCK_OPTIONAL || !ARCH_BCM2835 select MII select PHYLIB select FIXED_PHY
On 2022/12/12 21:10, Greg Kroah-Hartman wrote:
From: YueHaibing yuehaibing@huawei.com
[ Upstream commit 421f8663b3a775c32f724f793264097c60028f2e ]
commit 8d820bc9d12b ("net: broadcom: Fix BCMGENET Kconfig") fixes the build that contain 99addbe31f55 ("net: broadcom: Select BROADCOM_PHY for BCMGENET") and enable BCMGENET=y but PTP_1588_CLOCK_OPTIONAL=m, which otherwise leads to a link failure. However this may trigger a runtime failure.
Fix the original issue by propagating the PTP_1588_CLOCK_OPTIONAL dependency of BROADCOM_PHY down to BCMGENET.
Fixes: 8d820bc9d12b ("net: broadcom: Fix BCMGENET Kconfig") Fixes: 99addbe31f55 ("net: broadcom: Select BROADCOM_PHY for BCMGENET") Reported-by: Naresh Kamboju naresh.kamboju@linaro.org Suggested-by: Arnd Bergmann arnd@arndb.de Signed-off-by: YueHaibing yuehaibing@huawei.com Acked-by: Arnd Bergmann arnd@arndb.de Link: https://lore.kernel.org/r/20221125115003.30308-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org
drivers/net/ethernet/broadcom/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig index 7b79528d6eed..06aaeaadf2e9 100644 --- a/drivers/net/ethernet/broadcom/Kconfig +++ b/drivers/net/ethernet/broadcom/Kconfig @@ -63,6 +63,7 @@ config BCM63XX_ENET config BCMGENET tristate "Broadcom GENET internal MAC support" depends on HAS_IOMEM
- depends on PTP_1588_CLOCK_OPTIONAL || !ARCH_BCM2835
This commit is not needed by 5.10, see
commit 7be134eb691f6a54b267dbc321530ce0221a76b1 Author: Greg Kroah-Hartman gregkh@linuxfoundation.org Date: Fri Nov 25 15:51:06 2022 +0100
Revert "net: broadcom: Fix BCMGENET Kconfig"
This reverts commit fbb4e8e6dc7b38b3007354700f03c8ad2d9a2118 which is commit 8d820bc9d12b8beebca836cceaf2bbe68216c2f8 upstream.
It causes runtime failures as reported by Naresh and Arnd writes:
Greg, please just revert fbb4e8e6dc7b ("net: broadcom: Fix BCMGENET Kconfig") in stable/linux-5.10.y: it depends on e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies"), which we probably don't want backported from 5.15 to 5.10.
select MII select PHYLIB select FIXED_PHY
On Tue, Dec 13, 2022 at 03:24:50PM +0800, YueHaibing wrote:
On 2022/12/12 21:10, Greg Kroah-Hartman wrote:
From: YueHaibing yuehaibing@huawei.com
[ Upstream commit 421f8663b3a775c32f724f793264097c60028f2e ]
commit 8d820bc9d12b ("net: broadcom: Fix BCMGENET Kconfig") fixes the build that contain 99addbe31f55 ("net: broadcom: Select BROADCOM_PHY for BCMGENET") and enable BCMGENET=y but PTP_1588_CLOCK_OPTIONAL=m, which otherwise leads to a link failure. However this may trigger a runtime failure.
Fix the original issue by propagating the PTP_1588_CLOCK_OPTIONAL dependency of BROADCOM_PHY down to BCMGENET.
Fixes: 8d820bc9d12b ("net: broadcom: Fix BCMGENET Kconfig") Fixes: 99addbe31f55 ("net: broadcom: Select BROADCOM_PHY for BCMGENET") Reported-by: Naresh Kamboju naresh.kamboju@linaro.org Suggested-by: Arnd Bergmann arnd@arndb.de Signed-off-by: YueHaibing yuehaibing@huawei.com Acked-by: Arnd Bergmann arnd@arndb.de Link: https://lore.kernel.org/r/20221125115003.30308-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org
drivers/net/ethernet/broadcom/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig index 7b79528d6eed..06aaeaadf2e9 100644 --- a/drivers/net/ethernet/broadcom/Kconfig +++ b/drivers/net/ethernet/broadcom/Kconfig @@ -63,6 +63,7 @@ config BCM63XX_ENET config BCMGENET tristate "Broadcom GENET internal MAC support" depends on HAS_IOMEM
- depends on PTP_1588_CLOCK_OPTIONAL || !ARCH_BCM2835
This commit is not needed by 5.10, see
commit 7be134eb691f6a54b267dbc321530ce0221a76b1 Author: Greg Kroah-Hartman gregkh@linuxfoundation.org Date: Fri Nov 25 15:51:06 2022 +0100
Revert "net: broadcom: Fix BCMGENET Kconfig"
This reverts commit fbb4e8e6dc7b38b3007354700f03c8ad2d9a2118 which is commit 8d820bc9d12b8beebca836cceaf2bbe68216c2f8 upstream.
It causes runtime failures as reported by Naresh and Arnd writes:
Greg, please just revert fbb4e8e6dc7b ("net: broadcom: Fix BCMGENET Kconfig") in stable/linux-5.10.y: it depends on e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies"), which we probably don't want backported from 5.15 to 5.10.
Now dropped, thanks.
greg k-h
From: Guillaume BRUN the.cheaterman@gmail.com
[ Upstream commit d3d6b1bf85aefe0ebc0624574b3bb62f0693914c ]
Cheap monitors sometimes advertise YUV modes they don't really have (HDMI specification mandates YUV support so even monitors without actual support will often wrongfully advertise it) which results in YUV matches and user forum complaints of a red tint to light colour display areas in common desktop environments.
Moving the default RGB fall-back before YUV selection results in RGB mode matching in most cases, reducing complaints.
Fixes: 6c3c719936da ("drm/bridge: synopsys: dw-hdmi: add bus format negociation") Signed-off-by: Guillaume BRUN the.cheaterman@gmail.com Tested-by: Christian Hewitt christianshewitt@gmail.com Reviewed-by: Robert Foss robert.foss@linaro.org Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20221116143523.2126-1-the.chea... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c index 356c7d0bd035..2c3c743df950 100644 --- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c +++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c @@ -2609,6 +2609,9 @@ static u32 *dw_hdmi_bridge_atomic_get_output_bus_fmts(struct drm_bridge *bridge, * if supported. In any case the default RGB888 format is added */
+ /* Default 8bit RGB fallback */ + output_fmts[i++] = MEDIA_BUS_FMT_RGB888_1X24; + if (max_bpc >= 16 && info->bpc == 16) { if (info->color_formats & DRM_COLOR_FORMAT_YCRCB444) output_fmts[i++] = MEDIA_BUS_FMT_YUV16_1X48; @@ -2642,9 +2645,6 @@ static u32 *dw_hdmi_bridge_atomic_get_output_bus_fmts(struct drm_bridge *bridge, if (info->color_formats & DRM_COLOR_FORMAT_YCRCB444) output_fmts[i++] = MEDIA_BUS_FMT_YUV8_1X24;
- /* Default 8bit RGB fallback */ - output_fmts[i++] = MEDIA_BUS_FMT_RGB888_1X24; - *num_output_fmts = i;
return output_fmts;
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit b3abe42e94900bdd045c472f9c9be620ba5ce553 ]
Wei Chen reported a NULL deref in sk_user_ns() [0][1], and Paolo diagnosed the root cause: in unix_diag_get_exact(), the newly allocated skb does not have sk. [2]
We must get the user_ns from the NETLINK_CB(in_skb).sk and pass it to sk_diag_fill().
[0]: BUG: kernel NULL pointer dereference, address: 0000000000000270 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 12bbce067 P4D 12bbce067 PUD 12bc40067 PMD 0 Oops: 0000 [#1] PREEMPT SMP CPU: 0 PID: 27942 Comm: syz-executor.0 Not tainted 6.1.0-rc5-next-20221118 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014 RIP: 0010:sk_user_ns include/net/sock.h:920 [inline] RIP: 0010:sk_diag_dump_uid net/unix/diag.c:119 [inline] RIP: 0010:sk_diag_fill+0x77d/0x890 net/unix/diag.c:170 Code: 89 ef e8 66 d4 2d fd c7 44 24 40 00 00 00 00 49 8d 7c 24 18 e8 54 d7 2d fd 49 8b 5c 24 18 48 8d bb 70 02 00 00 e8 43 d7 2d fd <48> 8b 9b 70 02 00 00 48 8d 7b 10 e8 33 d7 2d fd 48 8b 5b 10 48 8d RSP: 0018:ffffc90000d67968 EFLAGS: 00010246 RAX: ffff88812badaa48 RBX: 0000000000000000 RCX: ffffffff840d481d RDX: 0000000000000465 RSI: 0000000000000000 RDI: 0000000000000270 RBP: ffffc90000d679a8 R08: 0000000000000277 R09: 0000000000000000 R10: 0001ffffffffffff R11: 0001c90000d679a8 R12: ffff88812ac03800 R13: ffff88812c87c400 R14: ffff88812ae42210 R15: ffff888103026940 FS: 00007f08b4e6f700(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000270 CR3: 000000012c58b000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> unix_diag_get_exact net/unix/diag.c:285 [inline] unix_diag_handler_dump+0x3f9/0x500 net/unix/diag.c:317 __sock_diag_cmd net/core/sock_diag.c:235 [inline] sock_diag_rcv_msg+0x237/0x250 net/core/sock_diag.c:266 netlink_rcv_skb+0x13e/0x250 net/netlink/af_netlink.c:2564 sock_diag_rcv+0x24/0x40 net/core/sock_diag.c:277 netlink_unicast_kernel net/netlink/af_netlink.c:1330 [inline] netlink_unicast+0x5e9/0x6b0 net/netlink/af_netlink.c:1356 netlink_sendmsg+0x739/0x860 net/netlink/af_netlink.c:1932 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg net/socket.c:734 [inline] ____sys_sendmsg+0x38f/0x500 net/socket.c:2476 ___sys_sendmsg net/socket.c:2530 [inline] __sys_sendmsg+0x197/0x230 net/socket.c:2559 __do_sys_sendmsg net/socket.c:2568 [inline] __se_sys_sendmsg net/socket.c:2566 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2566 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x4697f9 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f08b4e6ec48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000000077bf80 RCX: 00000000004697f9 RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003 RBP: 00000000004d29e9 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000077bf80 R13: 0000000000000000 R14: 000000000077bf80 R15: 00007ffdb36bc6c0 </TASK> Modules linked in: CR2: 0000000000000270
[1]: https://lore.kernel.org/netdev/CAO4mrfdvyjFpokhNsiwZiP-wpdSD0AStcJwfKcKQdAAL... [2]: https://lore.kernel.org/netdev/e04315e7c90d9a75613f3993c2baf2d344eef7eb.came...
Fixes: cae9910e7344 ("net: Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.") Reported-by: syzbot syzkaller@googlegroups.com Reported-by: Wei Chen harperchen1110@gmail.com Diagnosed-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/diag.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/net/unix/diag.c b/net/unix/diag.c index 9ff64f9df1f3..951b33fa8f5c 100644 --- a/net/unix/diag.c +++ b/net/unix/diag.c @@ -113,14 +113,16 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb) return nla_put(nlskb, UNIX_DIAG_RQLEN, sizeof(rql), &rql); }
-static int sk_diag_dump_uid(struct sock *sk, struct sk_buff *nlskb) +static int sk_diag_dump_uid(struct sock *sk, struct sk_buff *nlskb, + struct user_namespace *user_ns) { - uid_t uid = from_kuid_munged(sk_user_ns(nlskb->sk), sock_i_uid(sk)); + uid_t uid = from_kuid_munged(user_ns, sock_i_uid(sk)); return nla_put(nlskb, UNIX_DIAG_UID, sizeof(uid_t), &uid); }
static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_req *req, - u32 portid, u32 seq, u32 flags, int sk_ino) + struct user_namespace *user_ns, + u32 portid, u32 seq, u32 flags, int sk_ino) { struct nlmsghdr *nlh; struct unix_diag_msg *rep; @@ -166,7 +168,7 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_r goto out_nlmsg_trim;
if ((req->udiag_show & UDIAG_SHOW_UID) && - sk_diag_dump_uid(sk, skb)) + sk_diag_dump_uid(sk, skb, user_ns)) goto out_nlmsg_trim;
nlmsg_end(skb, nlh); @@ -178,7 +180,8 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_r }
static int sk_diag_dump(struct sock *sk, struct sk_buff *skb, struct unix_diag_req *req, - u32 portid, u32 seq, u32 flags) + struct user_namespace *user_ns, + u32 portid, u32 seq, u32 flags) { int sk_ino;
@@ -189,7 +192,7 @@ static int sk_diag_dump(struct sock *sk, struct sk_buff *skb, struct unix_diag_r if (!sk_ino) return 0;
- return sk_diag_fill(sk, skb, req, portid, seq, flags, sk_ino); + return sk_diag_fill(sk, skb, req, user_ns, portid, seq, flags, sk_ino); }
static int unix_diag_dump(struct sk_buff *skb, struct netlink_callback *cb) @@ -217,7 +220,7 @@ static int unix_diag_dump(struct sk_buff *skb, struct netlink_callback *cb) goto next; if (!(req->udiag_states & (1 << sk->sk_state))) goto next; - if (sk_diag_dump(sk, skb, req, + if (sk_diag_dump(sk, skb, req, sk_user_ns(skb->sk), NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI) < 0) @@ -285,7 +288,8 @@ static int unix_diag_get_exact(struct sk_buff *in_skb, if (!rep) goto out;
- err = sk_diag_fill(sk, rep, req, NETLINK_CB(in_skb).portid, + err = sk_diag_fill(sk, rep, req, sk_user_ns(NETLINK_CB(in_skb).sk), + NETLINK_CB(in_skb).portid, nlh->nlmsg_seq, 0, req->udiag_ino); if (err < 0) { nlmsg_free(rep);
From: Ronak Doshi doshir@vmware.com
[ Upstream commit 40b8c2a1af03ba3e8da55a4490d646bfa845e71a ]
Commit dacce2be3312 ("vmxnet3: add geneve and vxlan tunnel offload support") added support for encapsulation offload. However, the pathc did not report correctly the encapsulated packet which is LRO'ed by the hypervisor.
This patch fixes this issue by using correct callback for the LRO'ed encapsulated packet.
Fixes: dacce2be3312 ("vmxnet3: add geneve and vxlan tunnel offload support") Signed-off-by: Ronak Doshi doshir@vmware.com Acked-by: Guolin Yang gyang@vmware.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/vmxnet3/vmxnet3_drv.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c index 6678a734cc4d..43a4bcdd92c1 100644 --- a/drivers/net/vmxnet3/vmxnet3_drv.c +++ b/drivers/net/vmxnet3/vmxnet3_drv.c @@ -1356,6 +1356,7 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq, }; u32 num_pkts = 0; bool skip_page_frags = false; + bool encap_lro = false; struct Vmxnet3_RxCompDesc *rcd; struct vmxnet3_rx_ctx *ctx = &rq->rx_ctx; u16 segCnt = 0, mss = 0; @@ -1496,13 +1497,18 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq, if (VMXNET3_VERSION_GE_2(adapter) && rcd->type == VMXNET3_CDTYPE_RXCOMP_LRO) { struct Vmxnet3_RxCompDescExt *rcdlro; + union Vmxnet3_GenericDesc *gdesc; + rcdlro = (struct Vmxnet3_RxCompDescExt *)rcd; + gdesc = (union Vmxnet3_GenericDesc *)rcd;
segCnt = rcdlro->segCnt; WARN_ON_ONCE(segCnt == 0); mss = rcdlro->mss; if (unlikely(segCnt <= 1)) segCnt = 0; + encap_lro = (le32_to_cpu(gdesc->dword[0]) & + (1UL << VMXNET3_RCD_HDR_INNER_SHIFT)); } else { segCnt = 0; } @@ -1570,7 +1576,7 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq, vmxnet3_rx_csum(adapter, skb, (union Vmxnet3_GenericDesc *)rcd); skb->protocol = eth_type_trans(skb, adapter->netdev); - if (!rcd->tcp || + if ((!rcd->tcp && !encap_lro) || !(adapter->netdev->features & NETIF_F_LRO)) goto not_lro;
@@ -1579,7 +1585,7 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq, SKB_GSO_TCPV4 : SKB_GSO_TCPV6; skb_shinfo(skb)->gso_size = mss; skb_shinfo(skb)->gso_segs = segCnt; - } else if (segCnt != 0 || skb->len > mtu) { + } else if ((segCnt != 0 || skb->len > mtu) && !encap_lro) { u32 hlen;
hlen = vmxnet3_get_hdr_len(adapter, skb, @@ -1608,6 +1614,7 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq, napi_gro_receive(&rq->napi, skb);
ctx->skb = NULL; + encap_lro = false; num_pkts++; }
From: Wang ShaoBo bobo.shaobowang@huawei.com
[ Upstream commit 747da1308bdd5021409974f9180f0d8ece53d142 ]
hci_get_route() takes reference, we should use hci_dev_put() to release it when not need anymore.
Fixes: 6b8d4a6a0314 ("Bluetooth: 6LoWPAN: Use connected oriented channel instead of fixed one") Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/6lowpan.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/bluetooth/6lowpan.c b/net/bluetooth/6lowpan.c index cff4944d5b66..7601ce9143c1 100644 --- a/net/bluetooth/6lowpan.c +++ b/net/bluetooth/6lowpan.c @@ -1010,6 +1010,7 @@ static int get_l2cap_conn(char *buf, bdaddr_t *addr, u8 *addr_type, hci_dev_lock(hdev); hcon = hci_conn_hash_lookup_le(hdev, addr, *addr_type); hci_dev_unlock(hdev); + hci_dev_put(hdev);
if (!hcon) return -ENOENT;
From: Chen Zhongjin chenzhongjin@huawei.com
[ Upstream commit 2f3957c7eb4e07df944169a3e50a4d6790e1c744 ]
bt_init() calls bt_leds_init() to register led, but if it fails later, bt_leds_cleanup() is not called to unregister it.
This can cause panic if the argument "bluetooth-power" in text is freed and then another led_trigger_register() tries to access it:
BUG: unable to handle page fault for address: ffffffffc06d3bc0 RIP: 0010:strcmp+0xc/0x30 Call Trace: <TASK> led_trigger_register+0x10d/0x4f0 led_trigger_register_simple+0x7d/0x100 bt_init+0x39/0xf7 [bluetooth] do_one_initcall+0xd0/0x4e0
Fixes: e64c97b53bc6 ("Bluetooth: Add combined LED trigger for controller power") Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/af_bluetooth.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c index 4ef6a54403aa..2f87f57e7a4f 100644 --- a/net/bluetooth/af_bluetooth.c +++ b/net/bluetooth/af_bluetooth.c @@ -736,7 +736,7 @@ static int __init bt_init(void)
err = bt_sysfs_init(); if (err < 0) - return err; + goto cleanup_led;
err = sock_register(&bt_sock_family_ops); if (err) @@ -772,6 +772,8 @@ static int __init bt_init(void) sock_unregister(PF_BLUETOOTH); cleanup_sysfs: bt_sysfs_cleanup(); +cleanup_led: + bt_leds_cleanup(); return err; }
From: Artem Chernyshev artem.chernyshev@red-soft.ru
[ Upstream commit 3d8fdcbf1f42e2bb9ae8b8c0b6f202278c788a22 ]
Return NULL if we got unexpected value from skb_trim_rcsum() in ksz_common_rcv()
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: bafe9ba7d908 ("net: dsa: ksz: Factor out common tag code") Signed-off-by: Artem Chernyshev artem.chernyshev@red-soft.ru Reviewed-by: Vladimir Oltean olteanv@gmail.com Reviewed-by: Florian Fainelli f.fainelli@gmail.com Link: https://lore.kernel.org/r/20221201140032.26746-1-artem.chernyshev@red-soft.r... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/dsa/tag_ksz.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c index 4820dbcedfa2..230ddf45dff0 100644 --- a/net/dsa/tag_ksz.c +++ b/net/dsa/tag_ksz.c @@ -22,7 +22,8 @@ static struct sk_buff *ksz_common_rcv(struct sk_buff *skb, if (!skb->dev) return NULL;
- pskb_trim_rcsum(skb, skb->len - len); + if (pskb_trim_rcsum(skb, skb->len - len)) + return NULL;
skb->offload_fwd_mark = true;
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit 85a0506c073332a3057f5a9635fa0d4db5a8e03b ]
When testing in kci_test_ipsec_offload, srcip is configured as $dstip, it should add xfrm policy rule in instead of out. The test result of this patch is as follows: PASS: ipsec_offload
Fixes: 2766a11161cc ("selftests: rtnetlink: add ipsec offload API test") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Acked-by: Hangbin Liu liuhangbin@gmail.com Link: https://lore.kernel.org/r/20221201082246.14131-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/rtnetlink.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh index c9ce3dfa42ee..c3a905923ef2 100755 --- a/tools/testing/selftests/net/rtnetlink.sh +++ b/tools/testing/selftests/net/rtnetlink.sh @@ -782,7 +782,7 @@ kci_test_ipsec_offload() tmpl proto esp src $srcip dst $dstip spi 9 \ mode transport reqid 42 check_err $? - ip x p add dir out src $dstip/24 dst $srcip/24 \ + ip x p add dir in src $dstip/24 dst $srcip/24 \ tmpl proto esp src $dstip dst $srcip spi 9 \ mode transport reqid 42 check_err $?
From: Wei Yongjun weiyongjun1@huawei.com
[ Upstream commit b3d72d3135d2ef68296c1ee174436efd65386f04 ]
Kernel fault injection test reports null-ptr-deref as follows:
BUG: kernel NULL pointer dereference, address: 0000000000000008 RIP: 0010:cfg802154_netdev_notifier_call+0x120/0x310 include/linux/list.h:114 Call Trace: <TASK> raw_notifier_call_chain+0x6d/0xa0 kernel/notifier.c:87 call_netdevice_notifiers_info+0x6e/0xc0 net/core/dev.c:1944 unregister_netdevice_many_notify+0x60d/0xcb0 net/core/dev.c:1982 unregister_netdevice_queue+0x154/0x1a0 net/core/dev.c:10879 register_netdevice+0x9a8/0xb90 net/core/dev.c:10083 ieee802154_if_add+0x6ed/0x7e0 net/mac802154/iface.c:659 ieee802154_register_hw+0x29c/0x330 net/mac802154/main.c:229 mcr20a_probe+0xaaa/0xcb1 drivers/net/ieee802154/mcr20a.c:1316
ieee802154_if_add() allocates wpan_dev as netdev's private data, but not init the list in struct wpan_dev. cfg802154_netdev_notifier_call() manage the list when device register/unregister, and may lead to null-ptr-deref.
Use INIT_LIST_HEAD() on it to initialize it correctly.
Fixes: fcf39e6e88e9 ("ieee802154: add wpan_dev_list") Signed-off-by: Wei Yongjun weiyongjun1@huawei.com Acked-by: Alexander Aring aahringo@redhat.com
Link: https://lore.kernel.org/r/20221130091705.1831140-1-weiyongjun@huaweicloud.co... Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac802154/iface.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/mac802154/iface.c b/net/mac802154/iface.c index 1cf5ac09edcb..a08240fe68a7 100644 --- a/net/mac802154/iface.c +++ b/net/mac802154/iface.c @@ -661,6 +661,7 @@ ieee802154_if_add(struct ieee802154_local *local, const char *name, sdata->dev = ndev; sdata->wpan_dev.wpan_phy = local->hw.phy; sdata->local = local; + INIT_LIST_HEAD(&sdata->wpan_dev.list);
/* setup type-dependent data */ ret = ieee802154_setup_sdata(sdata, type);
From: Valentina Goncharenko goncharenko.vp@ispras.ru
[ Upstream commit 167b3f2dcc62c271f3555b33df17e361bb1fa0ee ]
In functions regmap_encx24j600_phy_reg_read() and regmap_encx24j600_phy_reg_write() in the conditions of the waiting cycles for filling the variable 'ret' it is necessary to add parentheses to prevent wrong assignment due to logical operations precedence.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: d70e53262f5c ("net: Microchip encx24j600 driver") Signed-off-by: Valentina Goncharenko goncharenko.vp@ispras.ru Reviewed-by: Pavan Chebbi pavan.chebbi@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/microchip/encx24j600-regmap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/microchip/encx24j600-regmap.c b/drivers/net/ethernet/microchip/encx24j600-regmap.c index 81a8ccca7e5e..2e337c7a5773 100644 --- a/drivers/net/ethernet/microchip/encx24j600-regmap.c +++ b/drivers/net/ethernet/microchip/encx24j600-regmap.c @@ -359,7 +359,7 @@ static int regmap_encx24j600_phy_reg_read(void *context, unsigned int reg, goto err_out;
usleep_range(26, 100); - while ((ret = regmap_read(ctx->regmap, MISTAT, &mistat) != 0) && + while (((ret = regmap_read(ctx->regmap, MISTAT, &mistat)) != 0) && (mistat & BUSY)) cpu_relax();
@@ -397,7 +397,7 @@ static int regmap_encx24j600_phy_reg_write(void *context, unsigned int reg, goto err_out;
usleep_range(26, 100); - while ((ret = regmap_read(ctx->regmap, MISTAT, &mistat) != 0) && + while (((ret = regmap_read(ctx->regmap, MISTAT, &mistat)) != 0) && (mistat & BUSY)) cpu_relax();
From: Valentina Goncharenko goncharenko.vp@ispras.ru
[ Upstream commit 25f427ac7b8d89b0259f86c0c6407b329df742b2 ]
A loop for reading MISTAT register continues while regmap_read() fails and (mistat & BUSY), but if regmap_read() fails a value of mistat is undefined.
The patch proposes to check for BUSY flag only when regmap_read() succeed. Compile test only.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: d70e53262f5c ("net: Microchip encx24j600 driver") Signed-off-by: Valentina Goncharenko goncharenko.vp@ispras.ru Reviewed-by: Pavan Chebbi pavan.chebbi@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/microchip/encx24j600-regmap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/microchip/encx24j600-regmap.c b/drivers/net/ethernet/microchip/encx24j600-regmap.c index 2e337c7a5773..5693784eec5b 100644 --- a/drivers/net/ethernet/microchip/encx24j600-regmap.c +++ b/drivers/net/ethernet/microchip/encx24j600-regmap.c @@ -359,7 +359,7 @@ static int regmap_encx24j600_phy_reg_read(void *context, unsigned int reg, goto err_out;
usleep_range(26, 100); - while (((ret = regmap_read(ctx->regmap, MISTAT, &mistat)) != 0) && + while (((ret = regmap_read(ctx->regmap, MISTAT, &mistat)) == 0) && (mistat & BUSY)) cpu_relax();
@@ -397,7 +397,7 @@ static int regmap_encx24j600_phy_reg_write(void *context, unsigned int reg, goto err_out;
usleep_range(26, 100); - while (((ret = regmap_read(ctx->regmap, MISTAT, &mistat)) != 0) && + while (((ret = regmap_read(ctx->regmap, MISTAT, &mistat)) == 0) && (mistat & BUSY)) cpu_relax();
From: Lin Liu lin.liu@citrix.com
[ Upstream commit d50b7914fae04d840ce36491d22133070b18cca9 ]
A NAPI is setup for each network sring to poll data to kernel The sring with source host is destroyed before live migration and new sring with target host is setup after live migration. The NAPI for the old sring is not deleted until setup new sring with target host after migration. With busy_poll/busy_read enabled, the NAPI can be polled before got deleted when resume VM.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: xennet_poll+0xae/0xd20 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI Call Trace: finish_task_switch+0x71/0x230 timerqueue_del+0x1d/0x40 hrtimer_try_to_cancel+0xb5/0x110 xennet_alloc_rx_buffers+0x2a0/0x2a0 napi_busy_loop+0xdb/0x270 sock_poll+0x87/0x90 do_sys_poll+0x26f/0x580 tracing_map_insert+0x1d4/0x2f0 event_hist_trigger+0x14a/0x260
finish_task_switch+0x71/0x230 __schedule+0x256/0x890 recalc_sigpending+0x1b/0x50 xen_sched_clock+0x15/0x20 __rb_reserve_next+0x12d/0x140 ring_buffer_lock_reserve+0x123/0x3d0 event_triggers_call+0x87/0xb0 trace_event_buffer_commit+0x1c4/0x210 xen_clocksource_get_cycles+0x15/0x20 ktime_get_ts64+0x51/0xf0 SyS_ppoll+0x160/0x1a0 SyS_ppoll+0x160/0x1a0 do_syscall_64+0x73/0x130 entry_SYSCALL_64_after_hwframe+0x41/0xa6 ... RIP: xennet_poll+0xae/0xd20 RSP: ffffb4f041933900 CR2: 0000000000000008 ---[ end trace f8601785b354351c ]---
xen frontend should remove the NAPIs for the old srings before live migration as the bond srings are destroyed
There is a tiny window between the srings are set to NULL and the NAPIs are disabled, It is safe as the NAPI threads are still frozen at that time
Signed-off-by: Lin Liu lin.liu@citrix.com Fixes: 4ec2411980d0 ([NET]: Do not check netif_running() and carrier state in ->poll()) Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/xen-netfront.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 569f3c8e7b75..3d149890fa36 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -1868,6 +1868,12 @@ static int netfront_resume(struct xenbus_device *dev) netif_tx_unlock_bh(info->netdev);
xennet_disconnect_backend(info); + + rtnl_lock(); + if (info->queues) + xennet_destroy_queues(info); + rtnl_unlock(); + return 0; }
From: Dan Carpenter error27@gmail.com
[ Upstream commit e8b4fc13900b8e8be48debffd0dfd391772501f7 ]
The pp->indir[0] value comes from the user. It is passed to:
if (cpu_online(pp->rxq_def))
inside the mvneta_percpu_elect() function. It needs bounds checkeding to ensure that it is not beyond the end of the cpu bitmap.
Fixes: cad5d847a093 ("net: mvneta: Fix the CPU choice in mvneta_percpu_elect") Signed-off-by: Dan Carpenter error27@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/mvneta.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index 74e266c0b8e1..6bfa0ac27be3 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -4767,6 +4767,9 @@ static int mvneta_config_rss(struct mvneta_port *pp) napi_disable(&pp->napi); }
+ if (pp->indir[0] >= nr_cpu_ids) + return -EINVAL; + pp->rxq_def = pp->indir[0];
/* Update unicast mapping */
From: Michal Jaron michalx.jaron@intel.com
[ Upstream commit 82e0572b23029b380464fa9fdc125db9c1506d0a ]
During tx rings configuration default XPS queue config is set and __I40E_TX_XPS_INIT_DONE is locked. __I40E_TX_XPS_INIT_DONE state is cleared and set again with default mapping only during queues build, it means after first setup or reset with queues rebuild. (i.e. ethtool -L <interface> combined <number>) After other resets (i.e. ethtool -t <interface>) XPS_INIT_DONE is not cleared and those default maps cannot be set again. It results in cleared xps_cpus mapping until queues are not rebuild or mapping is not set by user.
Add clearing __I40E_TX_XPS_INIT_DONE state during reset to let the driver set xps_cpus to defaults again after it was cleared.
Fixes: 6f853d4f8e93 ("i40e: allow XPS with QoS enabled") Signed-off-by: Michal Jaron michalx.jaron@intel.com Signed-off-by: Kamil Maziarz kamil.maziarz@intel.com Tested-by: Gurucharan gurucharanx.g@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_main.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index d7ddf9239e51..2c60d2a93330 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -10065,6 +10065,21 @@ static int i40e_rebuild_channels(struct i40e_vsi *vsi) return 0; }
+/** + * i40e_clean_xps_state - clean xps state for every tx_ring + * @vsi: ptr to the VSI + **/ +static void i40e_clean_xps_state(struct i40e_vsi *vsi) +{ + int i; + + if (vsi->tx_rings) + for (i = 0; i < vsi->num_queue_pairs; i++) + if (vsi->tx_rings[i]) + clear_bit(__I40E_TX_XPS_INIT_DONE, + vsi->tx_rings[i]->state); +} + /** * i40e_prep_for_reset - prep for the core to reset * @pf: board private structure @@ -10096,8 +10111,10 @@ static void i40e_prep_for_reset(struct i40e_pf *pf, bool lock_acquired) rtnl_unlock();
for (v = 0; v < pf->num_alloc_vsi; v++) { - if (pf->vsi[v]) + if (pf->vsi[v]) { + i40e_clean_xps_state(pf->vsi[v]); pf->vsi[v]->seid = 0; + } }
i40e_shutdown_adminq(&pf->hw);
From: Sylwester Dziedziuch sylwesterx.dziedziuch@intel.com
[ Upstream commit 08501970472077ed5de346ad89943a37d1692e9b ]
After spawning max VFs on a PF, some VFs were not getting resources and their MAC addresses were 0. This was caused by PF sleeping before flushing HW registers which caused VIRTCHNL_VFR_VFACTIVE to not be set in time for VF.
Fix by adding a sleep after hw flush.
Fixes: e4b433f4a741 ("i40e: reset all VFs in parallel when rebuilding PF") Signed-off-by: Sylwester Dziedziuch sylwesterx.dziedziuch@intel.com Signed-off-by: Jan Sokolowski jan.sokolowski@intel.com Tested-by: Konrad Jankowski konrad0.jankowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c index 381b28a08746..bb2a79b70c3a 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -1525,6 +1525,7 @@ bool i40e_reset_vf(struct i40e_vf *vf, bool flr) i40e_cleanup_reset_vf(vf);
i40e_flush(hw); + usleep_range(20000, 40000); clear_bit(I40E_VF_STATE_RESETTING, &vf->vf_states);
return true; @@ -1648,6 +1649,7 @@ bool i40e_reset_all_vfs(struct i40e_pf *pf, bool flr) }
i40e_flush(hw); + usleep_range(20000, 40000); clear_bit(__I40E_VF_DISABLE, pf->state);
return true;
From: Przemyslaw Patynowski przemyslawx.patynowski@intel.com
[ Upstream commit d64aaf3f7869f915fd120763d75f11d6b116424d ]
Return -EOPNOTSUPP, when user requests l4_4_bytes for raw IP4 or IP6 flow director filters. Flow director does not support filtering on l4 bytes for PCTYPEs used by IP4 and IP6 filters. Without this patch, user could create filters with l4_4_bytes fields, which did not do any filtering on L4, but only on L3 fields.
Fixes: 36777d9fa24c ("i40e: check current configured input set when adding ntuple filters") Signed-off-by: Przemyslaw Patynowski przemyslawx.patynowski@intel.com Signed-off-by: Kamil Maziarz kamil.maziarz@intel.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Tested-by: Gurucharan G gurucharanx.g@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c index 144c4824b5e8..520929f4d535 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c @@ -4234,11 +4234,7 @@ static int i40e_check_fdir_input_set(struct i40e_vsi *vsi, return -EOPNOTSUPP;
/* First 4 bytes of L4 header */ - if (usr_ip4_spec->l4_4_bytes == htonl(0xFFFFFFFF)) - new_mask |= I40E_L4_SRC_MASK | I40E_L4_DST_MASK; - else if (!usr_ip4_spec->l4_4_bytes) - new_mask &= ~(I40E_L4_SRC_MASK | I40E_L4_DST_MASK); - else + if (usr_ip4_spec->l4_4_bytes) return -EOPNOTSUPP;
/* Filtering on Type of Service is not supported. */
From: Kees Cook keescook@chromium.org
[ Upstream commit e329e71013c9b5a4535b099208493c7826ee4a64 ]
While running under CONFIG_FORTIFY_SOURCE=y, syzkaller reported:
memcpy: detected field-spanning write (size 129) of single field "target->sensf_res" at net/nfc/nci/ntf.c:260 (size 18)
This appears to be a legitimate lack of bounds checking in nci_add_new_protocol(). Add the missing checks.
Reported-by: syzbot+210e196cef4711b65139@syzkaller.appspotmail.com Link: https://lore.kernel.org/lkml/0000000000001c590f05ee7b3ff4@google.com Fixes: 019c4fbaa790 ("NFC: Add NCI multiple targets support") Signed-off-by: Kees Cook keescook@chromium.org Reviewed-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Link: https://lore.kernel.org/r/20221202214410.never.693-kees@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/nfc/nci/ntf.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/net/nfc/nci/ntf.c b/net/nfc/nci/ntf.c index 33e1170817f0..f8b20cddd5c9 100644 --- a/net/nfc/nci/ntf.c +++ b/net/nfc/nci/ntf.c @@ -218,6 +218,8 @@ static int nci_add_new_protocol(struct nci_dev *ndev, target->sens_res = nfca_poll->sens_res; target->sel_res = nfca_poll->sel_res; target->nfcid1_len = nfca_poll->nfcid1_len; + if (target->nfcid1_len > ARRAY_SIZE(target->nfcid1)) + return -EPROTO; if (target->nfcid1_len > 0) { memcpy(target->nfcid1, nfca_poll->nfcid1, target->nfcid1_len); @@ -226,6 +228,8 @@ static int nci_add_new_protocol(struct nci_dev *ndev, nfcb_poll = (struct rf_tech_specific_params_nfcb_poll *)params;
target->sensb_res_len = nfcb_poll->sensb_res_len; + if (target->sensb_res_len > ARRAY_SIZE(target->sensb_res)) + return -EPROTO; if (target->sensb_res_len > 0) { memcpy(target->sensb_res, nfcb_poll->sensb_res, target->sensb_res_len); @@ -234,6 +238,8 @@ static int nci_add_new_protocol(struct nci_dev *ndev, nfcf_poll = (struct rf_tech_specific_params_nfcf_poll *)params;
target->sensf_res_len = nfcf_poll->sensf_res_len; + if (target->sensf_res_len > ARRAY_SIZE(target->sensf_res)) + return -EPROTO; if (target->sensf_res_len > 0) { memcpy(target->sensf_res, nfcf_poll->sensf_res, target->sensf_res_len);
From: Pankaj Raghav p.raghav@samsung.com
[ Upstream commit 6f2d71524bcfdeb1fcbd22a4a92a5b7b161ab224 ]
A device might have a core quirk for NVME_QUIRK_IGNORE_DEV_SUBNQN (such as Samsung X5) but it would still give a:
"missing or invalid SUBNQN field"
warning as core quirks are filled after calling nvme_init_subnqn. Fill ctrl->quirks from struct core_quirks before calling nvme_init_subsystem to fix this.
Tested on a Samsung X5.
Fixes: ab9e00cc72fa ("nvme: track subsystems") Signed-off-by: Pankaj Raghav p.raghav@samsung.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/core.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index d9c78fe85cb3..e162f1dfbafe 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -3092,10 +3092,6 @@ int nvme_init_identify(struct nvme_ctrl *ctrl) if (!ctrl->identified) { int i;
- ret = nvme_init_subsystem(ctrl, id); - if (ret) - goto out_free; - /* * Check for quirks. Quirk can depend on firmware version, * so, in principle, the set of quirks present can change @@ -3108,6 +3104,10 @@ int nvme_init_identify(struct nvme_ctrl *ctrl) if (quirk_matches(id, &core_quirks[i])) ctrl->quirks |= core_quirks[i].quirks; } + + ret = nvme_init_subsystem(ctrl, id); + if (ret) + goto out_free; } memcpy(ctrl->subsys->firmware_rev, id->fr, sizeof(ctrl->subsys->firmware_rev));
From: Jisheng Zhang jszhang@kernel.org
[ Upstream commit 61d4f140943c47c1386ed89f7260e00418dfad9d ]
In dt-binding snps,dwmac.yaml, some properties under "snps,axi-config" node are named without "axi_" prefix, but the driver expects the prefix. Since the dt-binding has been there for a long time, we'd better make driver match the binding for compatibility.
Fixes: afea03656add ("stmmac: rework DMA bus setting and introduce new platform AXI structure") Signed-off-by: Jisheng Zhang jszhang@kernel.org Link: https://lore.kernel.org/r/20221202161739.2203-1-jszhang@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c index f70d8d1ce329..1ed74cfb61fc 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c @@ -108,10 +108,10 @@ static struct stmmac_axi *stmmac_axi_setup(struct platform_device *pdev)
axi->axi_lpi_en = of_property_read_bool(np, "snps,lpi_en"); axi->axi_xit_frm = of_property_read_bool(np, "snps,xit_frm"); - axi->axi_kbbe = of_property_read_bool(np, "snps,axi_kbbe"); - axi->axi_fb = of_property_read_bool(np, "snps,axi_fb"); - axi->axi_mb = of_property_read_bool(np, "snps,axi_mb"); - axi->axi_rb = of_property_read_bool(np, "snps,axi_rb"); + axi->axi_kbbe = of_property_read_bool(np, "snps,kbbe"); + axi->axi_fb = of_property_read_bool(np, "snps,fb"); + axi->axi_mb = of_property_read_bool(np, "snps,mb"); + axi->axi_rb = of_property_read_bool(np, "snps,rb");
if (of_property_read_u32(np, "snps,wr_osr_lmt", &axi->axi_wr_osr_lmt)) axi->axi_wr_osr_lmt = 1;
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit ee496694b9eea651ae1aa4c4667d886cdf74aa3b ]
Although the type I ERSPAN is based on the barebones IP + GRE encapsulation and no extra ERSPAN header. Report erspan version on GRE interface looks unreasonable. Fix this by separating the erspan and gre fill info.
IPv6 GRE does not have this info as IPv6 only supports erspan version 1 and 2.
Reported-by: Jianlin Shi jishi@redhat.com Fixes: f989d546a2d5 ("erspan: Add type I version 0 support.") Signed-off-by: Hangbin Liu liuhangbin@gmail.com Acked-by: William Tu u9012063@gmail.com Link: https://lore.kernel.org/r/20221203032858.3130339-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/ip_gre.c | 48 ++++++++++++++++++++++++++++------------------- 1 file changed, 29 insertions(+), 19 deletions(-)
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 6ab5c50aa7a8..65ead8a74933 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -1493,24 +1493,6 @@ static int ipgre_fill_info(struct sk_buff *skb, const struct net_device *dev) struct ip_tunnel_parm *p = &t->parms; __be16 o_flags = p->o_flags;
- if (t->erspan_ver <= 2) { - if (t->erspan_ver != 0 && !t->collect_md) - o_flags |= TUNNEL_KEY; - - if (nla_put_u8(skb, IFLA_GRE_ERSPAN_VER, t->erspan_ver)) - goto nla_put_failure; - - if (t->erspan_ver == 1) { - if (nla_put_u32(skb, IFLA_GRE_ERSPAN_INDEX, t->index)) - goto nla_put_failure; - } else if (t->erspan_ver == 2) { - if (nla_put_u8(skb, IFLA_GRE_ERSPAN_DIR, t->dir)) - goto nla_put_failure; - if (nla_put_u16(skb, IFLA_GRE_ERSPAN_HWID, t->hwid)) - goto nla_put_failure; - } - } - if (nla_put_u32(skb, IFLA_GRE_LINK, p->link) || nla_put_be16(skb, IFLA_GRE_IFLAGS, gre_tnl_flags_to_gre_flags(p->i_flags)) || @@ -1551,6 +1533,34 @@ static int ipgre_fill_info(struct sk_buff *skb, const struct net_device *dev) return -EMSGSIZE; }
+static int erspan_fill_info(struct sk_buff *skb, const struct net_device *dev) +{ + struct ip_tunnel *t = netdev_priv(dev); + + if (t->erspan_ver <= 2) { + if (t->erspan_ver != 0 && !t->collect_md) + t->parms.o_flags |= TUNNEL_KEY; + + if (nla_put_u8(skb, IFLA_GRE_ERSPAN_VER, t->erspan_ver)) + goto nla_put_failure; + + if (t->erspan_ver == 1) { + if (nla_put_u32(skb, IFLA_GRE_ERSPAN_INDEX, t->index)) + goto nla_put_failure; + } else if (t->erspan_ver == 2) { + if (nla_put_u8(skb, IFLA_GRE_ERSPAN_DIR, t->dir)) + goto nla_put_failure; + if (nla_put_u16(skb, IFLA_GRE_ERSPAN_HWID, t->hwid)) + goto nla_put_failure; + } + } + + return ipgre_fill_info(skb, dev); + +nla_put_failure: + return -EMSGSIZE; +} + static void erspan_setup(struct net_device *dev) { struct ip_tunnel *t = netdev_priv(dev); @@ -1629,7 +1639,7 @@ static struct rtnl_link_ops erspan_link_ops __read_mostly = { .changelink = erspan_changelink, .dellink = ip_tunnel_dellink, .get_size = ipgre_get_size, - .fill_info = ipgre_fill_info, + .fill_info = erspan_fill_info, .get_link_net = ip_tunnel_get_link_net, };
From: Yongqiang Liu liuyongqiang13@huawei.com
[ Upstream commit 42330a32933fb42180c52022804dcf09f47a2f99 ]
The nicvf_probe() won't destroy workqueue when register_netdev() failed. Add destroy_workqueue err handle case to fix this issue.
Fixes: 2ecbe4f4a027 ("net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them.") Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Reviewed-by: Pavan Chebbi pavan.chebbi@broadcom.com Link: https://lore.kernel.org/r/20221203094125.602812-1-liuyongqiang13@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/cavium/thunder/nicvf_main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c index c00f1a7ffc15..488da767cfdf 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -2258,7 +2258,7 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent) err = register_netdev(netdev); if (err) { dev_err(dev, "Failed to register netdevice\n"); - goto err_unregister_interrupts; + goto err_destroy_workqueue; }
nic->msg_enable = debug; @@ -2267,6 +2267,8 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
return 0;
+err_destroy_workqueue: + destroy_workqueue(nic->nicvf_rx_mode_wq); err_unregister_interrupts: nicvf_unregister_interrupts(nic); err_free_netdev:
From: Liu Jian liujian56@huawei.com
[ Upstream commit 4640177049549de1a43e9bc49265f0cdfce08cfd ]
The skb is delivered to napi_gro_receive() which may free it, after calling this, dereferencing skb may trigger use-after-free.
Fixes: 542ae60af24f ("net: hisilicon: Add Fast Ethernet MAC driver") Signed-off-by: Liu Jian liujian56@huawei.com Link: https://lore.kernel.org/r/20221203094240.1240211-1-liujian56@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hisi_femac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hisi_femac.c b/drivers/net/ethernet/hisilicon/hisi_femac.c index 57c3bc4f7089..c16dfd869363 100644 --- a/drivers/net/ethernet/hisilicon/hisi_femac.c +++ b/drivers/net/ethernet/hisilicon/hisi_femac.c @@ -283,7 +283,7 @@ static int hisi_femac_rx(struct net_device *dev, int limit) skb->protocol = eth_type_trans(skb, dev); napi_gro_receive(&priv->napi, skb); dev->stats.rx_packets++; - dev->stats.rx_bytes += skb->len; + dev->stats.rx_bytes += len; next: pos = (pos + 1) % rxq->num; if (rx_pkts_num >= limit)
From: Liu Jian liujian56@huawei.com
[ Upstream commit 433c07a13f59856e4585e89e86b7d4cc59348fab ]
The skb is delivered to napi_gro_receive() which may free it, after calling this, dereferencing skb may trigger use-after-free.
Fixes: 57c5bc9ad7d7 ("net: hisilicon: add hix5hd2 mac driver") Signed-off-by: Liu Jian liujian56@huawei.com Link: https://lore.kernel.org/r/20221203094240.1240211-2-liujian56@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hix5hd2_gmac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c b/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c index 8b2bf85039f1..43f3146caf07 100644 --- a/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c +++ b/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c @@ -550,7 +550,7 @@ static int hix5hd2_rx(struct net_device *dev, int limit) skb->protocol = eth_type_trans(skb, dev); napi_gro_receive(&priv->napi, skb); dev->stats.rx_packets++; - dev->stats.rx_bytes += skb->len; + dev->stats.rx_bytes += len; next: pos = dma_ring_incr(pos, RX_DESC_NUM); }
From: YueHaibing yuehaibing@huawei.com
[ Upstream commit 743117a997bbd4840e827295c07e59bcd7f7caa3 ]
Fix the potential risk of OOB if skb_linearize() fails in tipc_link_proto_rcv().
Fixes: 5cbb28a4bf65 ("tipc: linearize arriving NAME_DISTR and LINK_PROTO buffers") Signed-off-by: YueHaibing yuehaibing@huawei.com Link: https://lore.kernel.org/r/20221203094635.29024-1-yuehaibing@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/tipc/link.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/tipc/link.c b/net/tipc/link.c index 064fdb8e50e1..c1e56d1f21b3 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -2188,7 +2188,9 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, if (tipc_own_addr(l->net) > msg_prevnode(hdr)) l->net_plane = msg_net_plane(hdr);
- skb_linearize(skb); + if (skb_linearize(skb)) + goto exit; + hdr = buf_msg(skb); data = msg_data(hdr);
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit f96a3d74554df537b6db5c99c27c80e7afadc8d1 ]
Cited commit added the table ID to the FIB info structure, but did not prevent structures with different table IDs from being consolidated. This can lead to routes being flushed from a VRF when an address is deleted from a different VRF.
Fix by taking the table ID into account when looking for a matching FIB info. This is already done for FIB info structures backed by a nexthop object in fib_find_info_nh().
Add test cases that fail before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [FAIL] TEST: Route in default VRF not removed [ OK ] RTNETLINK answers: File exists TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [FAIL]
Tests passed: 6 Tests failed: 2
And pass after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ]
Tests passed: 8 Tests failed: 0
Fixes: 5a56a0b3a45d ("net: Don't delete routes in different VRFs") Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/fib_semantics.c | 1 + tools/testing/selftests/net/fib_tests.sh | 27 ++++++++++++++++++++++++ 2 files changed, 28 insertions(+)
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 52ec0c43e6b8..ab9fcc6231b8 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -423,6 +423,7 @@ static struct fib_info *fib_find_info(struct fib_info *nfi) nfi->fib_prefsrc == fi->fib_prefsrc && nfi->fib_priority == fi->fib_priority && nfi->fib_type == fi->fib_type && + nfi->fib_tb_id == fi->fib_tb_id && memcmp(nfi->fib_metrics, fi->fib_metrics, sizeof(u32) * RTAX_MAX) == 0 && !((nfi->fib_flags ^ fi->fib_flags) & ~RTNH_COMPARE_MASK) && diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh index a7f53c2a9580..a7b40dc56cae 100755 --- a/tools/testing/selftests/net/fib_tests.sh +++ b/tools/testing/selftests/net/fib_tests.sh @@ -1622,13 +1622,19 @@ ipv4_del_addr_test()
$IP addr add dev dummy1 172.16.104.1/24 $IP addr add dev dummy1 172.16.104.11/24 + $IP addr add dev dummy1 172.16.104.12/24 $IP addr add dev dummy2 172.16.104.1/24 $IP addr add dev dummy2 172.16.104.11/24 + $IP addr add dev dummy2 172.16.104.12/24 $IP route add 172.16.105.0/24 via 172.16.104.2 src 172.16.104.11 + $IP route add 172.16.106.0/24 dev lo src 172.16.104.12 $IP route add vrf red 172.16.105.0/24 via 172.16.104.2 src 172.16.104.11 + $IP route add vrf red 172.16.106.0/24 dev lo src 172.16.104.12 set +e
# removing address from device in vrf should only remove route from vrf table + echo " Regular FIB info" + $IP addr del dev dummy2 172.16.104.11/24 $IP ro ls vrf red | grep -q 172.16.105.0/24 log_test $? 1 "Route removed from VRF when source address deleted" @@ -1646,6 +1652,27 @@ ipv4_del_addr_test() $IP ro ls vrf red | grep -q 172.16.105.0/24 log_test $? 0 "Route in VRF is not removed by address delete"
+ # removing address from device in vrf should only remove route from vrf + # table even when the associated fib info only differs in table ID + echo " Identical FIB info with different table ID" + + $IP addr del dev dummy2 172.16.104.12/24 + $IP ro ls vrf red | grep -q 172.16.106.0/24 + log_test $? 1 "Route removed from VRF when source address deleted" + + $IP ro ls | grep -q 172.16.106.0/24 + log_test $? 0 "Route in default VRF not removed" + + $IP addr add dev dummy2 172.16.104.12/24 + $IP route add vrf red 172.16.106.0/24 dev lo src 172.16.104.12 + + $IP addr del dev dummy1 172.16.104.12/24 + $IP ro ls | grep -q 172.16.106.0/24 + log_test $? 1 "Route removed in default VRF when source address deleted" + + $IP ro ls vrf red | grep -q 172.16.106.0/24 + log_test $? 0 "Route in VRF is not removed by address delete" + $IP li del dummy1 $IP li del dummy2 cleanup
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit c0d999348e01df03e0a7f550351f3907fabbf611 ]
Cited commit added the table ID to the FIB info structure, but did not properly initialize it when table ID 0 is used. This can lead to a route in the default VRF with a preferred source address not being flushed when the address is deleted.
Consider the following example:
# ip address add dev dummy1 192.0.2.1/28 # ip address add dev dummy1 192.0.2.17/28 # ip route add 198.51.100.0/24 via 192.0.2.2 src 192.0.2.17 metric 100 # ip route add table 0 198.51.100.0/24 via 192.0.2.2 src 192.0.2.17 metric 200 # ip route show 198.51.100.0/24 198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 100 198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 200
Both routes are installed in the default VRF, but they are using two different FIB info structures. One with a metric of 100 and table ID of 254 (main) and one with a metric of 200 and table ID of 0. Therefore, when the preferred source address is deleted from the default VRF, the second route is not flushed:
# ip address del dev dummy1 192.0.2.17/28 # ip route show 198.51.100.0/24 198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 200
Fix by storing a table ID of 254 instead of 0 in the route configuration structure.
Add a test case that fails before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Table ID 0 TEST: Route removed in default VRF when source address deleted [FAIL]
Tests passed: 8 Tests failed: 1
And passes after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Table ID 0 TEST: Route removed in default VRF when source address deleted [ OK ]
Tests passed: 9 Tests failed: 0
Fixes: 5a56a0b3a45d ("net: Don't delete routes in different VRFs") Reported-by: Donald Sharp sharpd@nvidia.com Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/fib_frontend.c | 3 +++ tools/testing/selftests/net/fib_tests.sh | 10 ++++++++++ 2 files changed, 13 insertions(+)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index af8a4255cf1b..5f786ef662ea 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -830,6 +830,9 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb, return -EINVAL; }
+ if (!cfg->fc_table) + cfg->fc_table = RT_TABLE_MAIN; + return 0; errout: return err; diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh index a7b40dc56cae..0f3bf90e04d3 100755 --- a/tools/testing/selftests/net/fib_tests.sh +++ b/tools/testing/selftests/net/fib_tests.sh @@ -1623,11 +1623,13 @@ ipv4_del_addr_test() $IP addr add dev dummy1 172.16.104.1/24 $IP addr add dev dummy1 172.16.104.11/24 $IP addr add dev dummy1 172.16.104.12/24 + $IP addr add dev dummy1 172.16.104.13/24 $IP addr add dev dummy2 172.16.104.1/24 $IP addr add dev dummy2 172.16.104.11/24 $IP addr add dev dummy2 172.16.104.12/24 $IP route add 172.16.105.0/24 via 172.16.104.2 src 172.16.104.11 $IP route add 172.16.106.0/24 dev lo src 172.16.104.12 + $IP route add table 0 172.16.107.0/24 via 172.16.104.2 src 172.16.104.13 $IP route add vrf red 172.16.105.0/24 via 172.16.104.2 src 172.16.104.11 $IP route add vrf red 172.16.106.0/24 dev lo src 172.16.104.12 set +e @@ -1673,6 +1675,14 @@ ipv4_del_addr_test() $IP ro ls vrf red | grep -q 172.16.106.0/24 log_test $? 0 "Route in VRF is not removed by address delete"
+ # removing address from device in default vrf should remove route from + # the default vrf even when route was inserted with a table ID of 0. + echo " Table ID 0" + + $IP addr del dev dummy1 172.16.104.13/24 + $IP ro ls | grep -q 172.16.107.0/24 + log_test $? 1 "Route removed in default VRF when source address deleted" + $IP li del dummy1 $IP li del dummy2 cleanup
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit 78a9ea43fc1a7c06a420b132d2d47cbf4344a5df ]
When dsa_devlink_region_create failed in sja1105_setup_devlink_regions(), priv->regions is not released.
Fixes: bf425b82059e ("net: dsa: sja1105: expose static config as devlink region") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Reviewed-by: Vladimir Oltean olteanv@gmail.com Reviewed-by: Florian Fainelli f.fainelli@gmail.com Link: https://lore.kernel.org/r/20221205012132.2110979-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/sja1105/sja1105_devlink.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/dsa/sja1105/sja1105_devlink.c b/drivers/net/dsa/sja1105/sja1105_devlink.c index ec2ac91abcfa..8e3d185c8460 100644 --- a/drivers/net/dsa/sja1105/sja1105_devlink.c +++ b/drivers/net/dsa/sja1105/sja1105_devlink.c @@ -95,6 +95,8 @@ static int sja1105_setup_devlink_regions(struct dsa_switch *ds) if (IS_ERR(region)) { while (--i >= 0) dsa_devlink_region_destroy(priv->regions[i]); + + kfree(priv->regions); return PTR_ERR(region); }
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 88956177db179e4eba7cd590971961857d1565b8 ]
When sending packets between nodes in netns, it calls tipc_lxc_xmit() for peer node to receive the packets where tipc_sk_mcast_rcv()/tipc_sk_rcv() might be called, and it's pretty much like in tipc_rcv().
Currently the local 'node rw lock' is held during calling tipc_lxc_xmit() to protect the peer_net not being freed by another thread. However, when receiving these packets, tipc_node_add_conn() might be called where the peer 'node rw lock' is acquired. Then a dead lock warning is triggered by lockdep detector, although it is not a real dead lock:
WARNING: possible recursive locking detected -------------------------------------------- conn_server/1086 is trying to acquire lock: ffff8880065cb020 (&n->lock#2){++--}-{2:2}, \ at: tipc_node_add_conn.cold.76+0xaa/0x211 [tipc]
but task is already holding lock: ffff8880065cd020 (&n->lock#2){++--}-{2:2}, \ at: tipc_node_xmit+0x285/0xb30 [tipc]
other info that might help us debug this: Possible unsafe locking scenario:
CPU0 ---- lock(&n->lock#2); lock(&n->lock#2);
*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by conn_server/1086: #0: ffff8880036d1e40 (sk_lock-AF_TIPC){+.+.}-{0:0}, \ at: tipc_accept+0x9c0/0x10b0 [tipc] #1: ffff8880036d5f80 (sk_lock-AF_TIPC/1){+.+.}-{0:0}, \ at: tipc_accept+0x363/0x10b0 [tipc] #2: ffff8880065cd020 (&n->lock#2){++--}-{2:2}, \ at: tipc_node_xmit+0x285/0xb30 [tipc] #3: ffff888012e13370 (slock-AF_TIPC){+...}-{2:2}, \ at: tipc_sk_rcv+0x2da/0x1b40 [tipc]
Call Trace: <TASK> dump_stack_lvl+0x44/0x5b __lock_acquire.cold.77+0x1f2/0x3d7 lock_acquire+0x1d2/0x610 _raw_write_lock_bh+0x38/0x80 tipc_node_add_conn.cold.76+0xaa/0x211 [tipc] tipc_sk_finish_conn+0x21e/0x640 [tipc] tipc_sk_filter_rcv+0x147b/0x3030 [tipc] tipc_sk_rcv+0xbb4/0x1b40 [tipc] tipc_lxc_xmit+0x225/0x26b [tipc] tipc_node_xmit.cold.82+0x4a/0x102 [tipc] __tipc_sendstream+0x879/0xff0 [tipc] tipc_accept+0x966/0x10b0 [tipc] do_accept+0x37d/0x590
This patch avoids this warning by not holding the 'node rw lock' before calling tipc_lxc_xmit(). As to protect the 'peer_net', rcu_read_lock() should be enough, as in cleanup_net() when freeing the netns, it calls synchronize_rcu() before the free is continued.
Also since tipc_lxc_xmit() is like the RX path in tipc_rcv(), it makes sense to call it under rcu_read_lock(). Note that the right lock order must be:
rcu_read_lock(); tipc_node_read_lock(n); tipc_node_read_unlock(n); tipc_lxc_xmit(); rcu_read_unlock();
instead of:
tipc_node_read_lock(n); rcu_read_lock(); tipc_node_read_unlock(n); tipc_lxc_xmit(); rcu_read_unlock();
and we have to call tipc_node_read_lock/unlock() twice in tipc_node_xmit().
Fixes: f73b12812a3d ("tipc: improve throughput between nodes in netns") Reported-by: Shuang Li shuali@redhat.com Signed-off-by: Xin Long lucien.xin@gmail.com Link: https://lore.kernel.org/r/5bdd1f8fee9db695cfff4528a48c9b9d0523fb00.167011064... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/tipc/node.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/net/tipc/node.c b/net/tipc/node.c index 60059827563a..7589f2ac6fd0 100644 --- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -1660,6 +1660,7 @@ int tipc_node_xmit(struct net *net, struct sk_buff_head *list, struct tipc_node *n; struct sk_buff_head xmitq; bool node_up = false; + struct net *peer_net; int bearer_id; int rc;
@@ -1676,18 +1677,23 @@ int tipc_node_xmit(struct net *net, struct sk_buff_head *list, return -EHOSTUNREACH; }
+ rcu_read_lock(); tipc_node_read_lock(n); node_up = node_is_up(n); - if (node_up && n->peer_net && check_net(n->peer_net)) { + peer_net = n->peer_net; + tipc_node_read_unlock(n); + if (node_up && peer_net && check_net(peer_net)) { /* xmit inner linux container */ - tipc_lxc_xmit(n->peer_net, list); + tipc_lxc_xmit(peer_net, list); if (likely(skb_queue_empty(list))) { - tipc_node_read_unlock(n); + rcu_read_unlock(); tipc_node_put(n); return 0; } } + rcu_read_unlock();
+ tipc_node_read_lock(n); bearer_id = n->active_links[selector & 1]; if (unlikely(bearer_id == INVALID_BEARER_ID)) { tipc_node_read_unlock(n);
From: Zhang Changzhong zhangchangzhong@huawei.com
[ Upstream commit 063a932b64db3317ec020c94466fe52923a15f60 ]
The greth_init_rings() function won't free the newly allocated skb when dma_mapping_error() returns error, so add dev_kfree_skb() to fix it.
Compile tested only.
Fixes: d4c41139df6e ("net: Add Aeroflex Gaisler 10/100/1G Ethernet MAC driver") Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Link: https://lore.kernel.org/r/1670134149-29516-1-git-send-email-zhangchangzhong@... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/aeroflex/greth.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/aeroflex/greth.c b/drivers/net/ethernet/aeroflex/greth.c index f4f50b3a472e..0d56cb4f5dd9 100644 --- a/drivers/net/ethernet/aeroflex/greth.c +++ b/drivers/net/ethernet/aeroflex/greth.c @@ -258,6 +258,7 @@ static int greth_init_rings(struct greth_private *greth) if (dma_mapping_error(greth->dev, dma_addr)) { if (netif_msg_ifup(greth)) dev_err(greth->dev, "Could not create initial DMA mapping\n"); + dev_kfree_skb(skb); goto cleanup; } greth->rx_skbuff[i] = skb;
From: Juergen Gross jgross@suse.com
[ Upstream commit 7dfa764e0223a324366a2a1fc056d4d9d4e95491 ]
Commit ad7f402ae4f4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area") introduced a (valid) build warning. There have even been reports of this problem breaking networking of Xen guests.
Fixes: ad7f402ae4f4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area") Signed-off-by: Juergen Gross jgross@suse.com Reviewed-by: Jan Beulich jbeulich@suse.com Reviewed-by: Ross Lagerwall ross.lagerwall@citrix.com Tested-by: Jason Andryuk jandryuk@gmail.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/xen-netback/netback.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index fed0f7458e18..f9373a88cf37 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -530,7 +530,7 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue, const bool sharedslot = nr_frags && frag_get_pending_idx(&shinfo->frags[0]) == copy_pending_idx(skb, copy_count(skb) - 1); - int i, err; + int i, err = 0;
for (i = 0; i < copy_count(skb); i++) { int newerr;
From: Yang Yingliang yangyingliang@huawei.com
[ Upstream commit 7d8c19bfc8ff3f78e5337107ca9246327fcb6b45 ]
It is not allowed to call kfree_skb() or consume_skb() from hardware interrupt context or with interrupts being disabled. So replace kfree_skb/dev_kfree_skb() with dev_kfree_skb_irq() and dev_consume_skb_irq() under spin_lock_irq().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Yang Yingliang yangyingliang@huawei.com Reviewed-by: Jiri Pirko jiri@nvidia.com Link: https://lore.kernel.org/r/20221207015310.2984909-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/plip/plip.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/plip/plip.c b/drivers/net/plip/plip.c index 5a0e5a8a8917..22f7db87ed21 100644 --- a/drivers/net/plip/plip.c +++ b/drivers/net/plip/plip.c @@ -444,12 +444,12 @@ plip_bh_timeout_error(struct net_device *dev, struct net_local *nl, } rcv->state = PLIP_PK_DONE; if (rcv->skb) { - kfree_skb(rcv->skb); + dev_kfree_skb_irq(rcv->skb); rcv->skb = NULL; } snd->state = PLIP_PK_DONE; if (snd->skb) { - dev_kfree_skb(snd->skb); + dev_consume_skb_irq(snd->skb); snd->skb = NULL; } spin_unlock_irq(&nl->lock);
From: Eric Dumazet edumazet@google.com
[ Upstream commit 803e84867de59a1e5d126666d25eb4860cfd2ebe ]
Blamed commit claimed rcu_read_lock() was held by ip6_fragment() callers.
It seems to not be always true, at least for UDP stack.
syzbot reported:
BUG: KASAN: use-after-free in ip6_dst_idev include/net/ip6_fib.h:245 [inline] BUG: KASAN: use-after-free in ip6_fragment+0x2724/0x2770 net/ipv6/ip6_output.c:951 Read of size 8 at addr ffff88801d403e80 by task syz-executor.3/7618
CPU: 1 PID: 7618 Comm: syz-executor.3 Not tainted 6.1.0-rc6-syzkaller-00012-g4312098baf37 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:284 [inline] print_report+0x15e/0x45d mm/kasan/report.c:395 kasan_report+0xbf/0x1f0 mm/kasan/report.c:495 ip6_dst_idev include/net/ip6_fib.h:245 [inline] ip6_fragment+0x2724/0x2770 net/ipv6/ip6_output.c:951 __ip6_finish_output net/ipv6/ip6_output.c:193 [inline] ip6_finish_output+0x9a3/0x1170 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip6_output+0x1f1/0x540 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:445 [inline] ip6_local_out+0xb3/0x1a0 net/ipv6/output_core.c:161 ip6_send_skb+0xbb/0x340 net/ipv6/ip6_output.c:1966 udp_v6_send_skb+0x82a/0x18a0 net/ipv6/udp.c:1286 udp_v6_push_pending_frames+0x140/0x200 net/ipv6/udp.c:1313 udpv6_sendmsg+0x18da/0x2c80 net/ipv6/udp.c:1606 inet6_sendmsg+0x9d/0xe0 net/ipv6/af_inet6.c:665 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 sock_write_iter+0x295/0x3d0 net/socket.c:1108 call_write_iter include/linux/fs.h:2191 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x9ed/0xdd0 fs/read_write.c:584 ksys_write+0x1ec/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fde3588c0d9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fde365b6168 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007fde359ac050 RCX: 00007fde3588c0d9 RDX: 000000000000ffdc RSI: 00000000200000c0 RDI: 000000000000000a RBP: 00007fde358e7ae9 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fde35acfb1f R14: 00007fde365b6300 R15: 0000000000022000 </TASK>
Allocated by task 7618: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 __kasan_slab_alloc+0x82/0x90 mm/kasan/common.c:325 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slab.h:737 [inline] slab_alloc_node mm/slub.c:3398 [inline] slab_alloc mm/slub.c:3406 [inline] __kmem_cache_alloc_lru mm/slub.c:3413 [inline] kmem_cache_alloc+0x2b4/0x3d0 mm/slub.c:3422 dst_alloc+0x14a/0x1f0 net/core/dst.c:92 ip6_dst_alloc+0x32/0xa0 net/ipv6/route.c:344 ip6_rt_pcpu_alloc net/ipv6/route.c:1369 [inline] rt6_make_pcpu_route net/ipv6/route.c:1417 [inline] ip6_pol_route+0x901/0x1190 net/ipv6/route.c:2254 pol_lookup_func include/net/ip6_fib.h:582 [inline] fib6_rule_lookup+0x52e/0x6f0 net/ipv6/fib6_rules.c:121 ip6_route_output_flags_noref+0x2e6/0x380 net/ipv6/route.c:2625 ip6_route_output_flags+0x76/0x320 net/ipv6/route.c:2638 ip6_route_output include/net/ip6_route.h:98 [inline] ip6_dst_lookup_tail+0x5ab/0x1620 net/ipv6/ip6_output.c:1092 ip6_dst_lookup_flow+0x90/0x1d0 net/ipv6/ip6_output.c:1222 ip6_sk_dst_lookup_flow+0x553/0x980 net/ipv6/ip6_output.c:1260 udpv6_sendmsg+0x151d/0x2c80 net/ipv6/udp.c:1554 inet6_sendmsg+0x9d/0xe0 net/ipv6/af_inet6.c:665 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 __sys_sendto+0x23a/0x340 net/socket.c:2117 __do_sys_sendto net/socket.c:2129 [inline] __se_sys_sendto net/socket.c:2125 [inline] __x64_sys_sendto+0xe1/0x1b0 net/socket.c:2125 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 7599: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2e/0x40 mm/kasan/generic.c:511 ____kasan_slab_free mm/kasan/common.c:236 [inline] ____kasan_slab_free+0x160/0x1c0 mm/kasan/common.c:200 kasan_slab_free include/linux/kasan.h:177 [inline] slab_free_hook mm/slub.c:1724 [inline] slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1750 slab_free mm/slub.c:3661 [inline] kmem_cache_free+0xee/0x5c0 mm/slub.c:3683 dst_destroy+0x2ea/0x400 net/core/dst.c:127 rcu_do_batch kernel/rcu/tree.c:2250 [inline] rcu_core+0x81f/0x1980 kernel/rcu/tree.c:2510 __do_softirq+0x1fb/0xadc kernel/softirq.c:571
Last potentially related work creation: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 __kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:481 call_rcu+0x9d/0x820 kernel/rcu/tree.c:2798 dst_release net/core/dst.c:177 [inline] dst_release+0x7d/0xe0 net/core/dst.c:167 refdst_drop include/net/dst.h:256 [inline] skb_dst_drop include/net/dst.h:268 [inline] skb_release_head_state+0x250/0x2a0 net/core/skbuff.c:838 skb_release_all net/core/skbuff.c:852 [inline] __kfree_skb net/core/skbuff.c:868 [inline] kfree_skb_reason+0x151/0x4b0 net/core/skbuff.c:891 kfree_skb_list_reason+0x4b/0x70 net/core/skbuff.c:901 kfree_skb_list include/linux/skbuff.h:1227 [inline] ip6_fragment+0x2026/0x2770 net/ipv6/ip6_output.c:949 __ip6_finish_output net/ipv6/ip6_output.c:193 [inline] ip6_finish_output+0x9a3/0x1170 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip6_output+0x1f1/0x540 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:445 [inline] ip6_local_out+0xb3/0x1a0 net/ipv6/output_core.c:161 ip6_send_skb+0xbb/0x340 net/ipv6/ip6_output.c:1966 udp_v6_send_skb+0x82a/0x18a0 net/ipv6/udp.c:1286 udp_v6_push_pending_frames+0x140/0x200 net/ipv6/udp.c:1313 udpv6_sendmsg+0x18da/0x2c80 net/ipv6/udp.c:1606 inet6_sendmsg+0x9d/0xe0 net/ipv6/af_inet6.c:665 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 sock_write_iter+0x295/0x3d0 net/socket.c:1108 call_write_iter include/linux/fs.h:2191 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x9ed/0xdd0 fs/read_write.c:584 ksys_write+0x1ec/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Second to last potentially related work creation: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 __kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:481 call_rcu+0x9d/0x820 kernel/rcu/tree.c:2798 dst_release net/core/dst.c:177 [inline] dst_release+0x7d/0xe0 net/core/dst.c:167 refdst_drop include/net/dst.h:256 [inline] skb_dst_drop include/net/dst.h:268 [inline] __dev_queue_xmit+0x1b9d/0x3ba0 net/core/dev.c:4211 dev_queue_xmit include/linux/netdevice.h:3008 [inline] neigh_resolve_output net/core/neighbour.c:1552 [inline] neigh_resolve_output+0x51b/0x840 net/core/neighbour.c:1532 neigh_output include/net/neighbour.h:546 [inline] ip6_finish_output2+0x56c/0x1530 net/ipv6/ip6_output.c:134 __ip6_finish_output net/ipv6/ip6_output.c:195 [inline] ip6_finish_output+0x694/0x1170 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip6_output+0x1f1/0x540 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:445 [inline] NF_HOOK include/linux/netfilter.h:302 [inline] NF_HOOK include/linux/netfilter.h:296 [inline] mld_sendpack+0xa09/0xe70 net/ipv6/mcast.c:1820 mld_send_cr net/ipv6/mcast.c:2121 [inline] mld_ifc_work+0x720/0xdc0 net/ipv6/mcast.c:2653 process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289 worker_thread+0x669/0x1090 kernel/workqueue.c:2436 kthread+0x2e8/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
The buggy address belongs to the object at ffff88801d403dc0 which belongs to the cache ip6_dst_cache of size 240 The buggy address is located 192 bytes inside of 240-byte region [ffff88801d403dc0, ffff88801d403eb0)
The buggy address belongs to the physical page: page:ffffea00007500c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1d403 memcg:ffff888022f49c81 flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000000200 ffffea0001ef6580 dead000000000002 ffff88814addf640 raw: 0000000000000000 00000000800c000c 00000001ffffffff ffff888022f49c81 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 3719, tgid 3719 (kworker/0:6), ts 136223432244, free_ts 136222971441 prep_new_page mm/page_alloc.c:2539 [inline] get_page_from_freelist+0x10b5/0x2d50 mm/page_alloc.c:4288 __alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5555 alloc_pages+0x1aa/0x270 mm/mempolicy.c:2285 alloc_slab_page mm/slub.c:1794 [inline] allocate_slab+0x213/0x300 mm/slub.c:1939 new_slab mm/slub.c:1992 [inline] ___slab_alloc+0xa91/0x1400 mm/slub.c:3180 __slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3279 slab_alloc_node mm/slub.c:3364 [inline] slab_alloc mm/slub.c:3406 [inline] __kmem_cache_alloc_lru mm/slub.c:3413 [inline] kmem_cache_alloc+0x31a/0x3d0 mm/slub.c:3422 dst_alloc+0x14a/0x1f0 net/core/dst.c:92 ip6_dst_alloc+0x32/0xa0 net/ipv6/route.c:344 icmp6_dst_alloc+0x71/0x680 net/ipv6/route.c:3261 mld_sendpack+0x5de/0xe70 net/ipv6/mcast.c:1809 mld_send_cr net/ipv6/mcast.c:2121 [inline] mld_ifc_work+0x720/0xdc0 net/ipv6/mcast.c:2653 process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289 worker_thread+0x669/0x1090 kernel/workqueue.c:2436 kthread+0x2e8/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1459 [inline] free_pcp_prepare+0x65c/0xd90 mm/page_alloc.c:1509 free_unref_page_prepare mm/page_alloc.c:3387 [inline] free_unref_page+0x1d/0x4d0 mm/page_alloc.c:3483 __unfreeze_partials+0x17c/0x1a0 mm/slub.c:2586 qlink_free mm/kasan/quarantine.c:168 [inline] qlist_free_all+0x6a/0x170 mm/kasan/quarantine.c:187 kasan_quarantine_reduce+0x184/0x210 mm/kasan/quarantine.c:294 __kasan_slab_alloc+0x66/0x90 mm/kasan/common.c:302 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slab.h:737 [inline] slab_alloc_node mm/slub.c:3398 [inline] kmem_cache_alloc_node+0x304/0x410 mm/slub.c:3443 __alloc_skb+0x214/0x300 net/core/skbuff.c:497 alloc_skb include/linux/skbuff.h:1267 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1191 [inline] netlink_sendmsg+0x9a6/0xe10 net/netlink/af_netlink.c:1896 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 __sys_sendto+0x23a/0x340 net/socket.c:2117 __do_sys_sendto net/socket.c:2129 [inline] __se_sys_sendto net/socket.c:2125 [inline] __x64_sys_sendto+0xe1/0x1b0 net/socket.c:2125 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: 1758fd4688eb ("ipv6: remove unnecessary dst_hold() in ip6_fragment()") Reported-by: syzbot+8c0ac31aa9681abb9e2d@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet edumazet@google.com Cc: Wei Wang weiwan@google.com Cc: Martin KaFai Lau kafai@fb.com Link: https://lore.kernel.org/r/20221206101351.2037285-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_output.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index fadad8e83521..e427f5040a08 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -919,6 +919,9 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, if (err < 0) goto fail;
+ /* We prevent @rt from being freed. */ + rcu_read_lock(); + for (;;) { /* Prepare header of the next frame, * before previous one went down. */ @@ -942,6 +945,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, if (err == 0) { IP6_INC_STATS(net, ip6_dst_idev(&rt->dst), IPSTATS_MIB_FRAGOKS); + rcu_read_unlock(); return 0; }
@@ -949,6 +953,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
IP6_INC_STATS(net, ip6_dst_idev(&rt->dst), IPSTATS_MIB_FRAGFAILS); + rcu_read_unlock(); return err;
slow_path_clean:
From: Dan Carpenter error27@gmail.com
[ Upstream commit cdd97383e19d4afe29adc3376025a15ae3bab3a3 ]
In an earlier commit, I added a bounds check to prevent an out of bounds read and a WARN(). On further discussion and consideration that check was probably too aggressive. Instead of returning -EINVAL, a better fix would be to just prevent the out of bounds read but continue the process.
Background: The value of "pp->rxq_def" is a number between 0-7 by default, or even higher depending on the value of "rxq_number", which is a module parameter. If the value is more than the number of available CPUs then it will trigger the WARN() in cpu_max_bits_warn().
Fixes: e8b4fc13900b ("net: mvneta: Prevent out of bounds read in mvneta_config_rss()") Signed-off-by: Dan Carpenter error27@gmail.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Link: https://lore.kernel.org/r/Y5A7d1E5ccwHTYPf@kadam Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/mvneta.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index 6bfa0ac27be3..f5567d485e91 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -4140,7 +4140,7 @@ static void mvneta_percpu_elect(struct mvneta_port *pp) /* Use the cpu associated to the rxq when it is online, in all * the other cases, use the cpu 0 which can't be offline. */ - if (cpu_online(pp->rxq_def)) + if (pp->rxq_def < nr_cpu_ids && cpu_online(pp->rxq_def)) elected_cpu = pp->rxq_def;
max_cpu = num_present_cpus(); @@ -4767,9 +4767,6 @@ static int mvneta_config_rss(struct mvneta_port *pp) napi_disable(&pp->napi); }
- if (pp->indir[0] >= nr_cpu_ids) - return -EINVAL; - pp->rxq_def = pp->indir[0];
/* Update unicast mapping */
From: Emeel Hakim ehakim@nvidia.com
[ Upstream commit 38099024e51ee37dee5f0f577ca37175c932e3f7 ]
Add missing attribute validation for IFLA_MACSEC_OFFLOAD to the netlink policy.
Fixes: 791bb3fcafce ("net: macsec: add support for specifying offload upon link creation") Signed-off-by: Emeel Hakim ehakim@nvidia.com Reviewed-by: Jiri Pirko jiri@nvidia.com Reviewed-by: Sabrina Dubroca sd@queasysnail.net Link: https://lore.kernel.org/r/20221207101618.989-1-ehakim@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/macsec.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index 3e564158c401..eb029456b594 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -3680,6 +3680,7 @@ static const struct nla_policy macsec_rtnl_policy[IFLA_MACSEC_MAX + 1] = { [IFLA_MACSEC_SCB] = { .type = NLA_U8 }, [IFLA_MACSEC_REPLAY_PROTECT] = { .type = NLA_U8 }, [IFLA_MACSEC_VALIDATION] = { .type = NLA_U8 }, + [IFLA_MACSEC_OFFLOAD] = { .type = NLA_U8 }, };
static void macsec_free_netdev(struct net_device *dev)
From: Frank Jungclaus frank.jungclaus@esd.eu
[ Upstream commit 918ee4911f7a41fb4505dff877c1d7f9f64eb43e ]
We don't get any further EVENT from an esd CAN USB device for changes on REC or TEC while those counters converge to 0 (with ecc == 0). So when handling the "Back to Error Active"-event force txerr = rxerr = 0, otherwise the berr-counters might stay on values like 95 forever.
Also, to make life easier during the ongoing development a netdev_dbg() has been introduced to allow dumping error events send by an esd CAN USB device.
Fixes: 96d8e90382dc ("can: Add driver for esd CAN-USB/2 device") Signed-off-by: Frank Jungclaus frank.jungclaus@esd.eu Link: https://lore.kernel.org/all/20221130202242.3998219-2-frank.jungclaus@esd.eu Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/usb/esd_usb2.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/can/usb/esd_usb2.c b/drivers/net/can/usb/esd_usb2.c index 8847942a8d97..73c5343e609b 100644 --- a/drivers/net/can/usb/esd_usb2.c +++ b/drivers/net/can/usb/esd_usb2.c @@ -227,6 +227,10 @@ static void esd_usb2_rx_event(struct esd_usb2_net_priv *priv, u8 rxerr = msg->msg.rx.data[2]; u8 txerr = msg->msg.rx.data[3];
+ netdev_dbg(priv->netdev, + "CAN_ERR_EV_EXT: dlc=%#02x state=%02x ecc=%02x rec=%02x tec=%02x\n", + msg->msg.rx.dlc, state, ecc, rxerr, txerr); + skb = alloc_can_err_skb(priv->netdev, &cf); if (skb == NULL) { stats->rx_dropped++; @@ -253,6 +257,8 @@ static void esd_usb2_rx_event(struct esd_usb2_net_priv *priv, break; default: priv->can.state = CAN_STATE_ERROR_ACTIVE; + txerr = 0; + rxerr = 0; break; } } else {
Hi!
This is the start of the stable review cycle for the 5.10.159 release. There are 106 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
CIP testing did not find any problems here:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-5...
Tested-by: Pavel Machek (CIP) pavel@denx.de
Best regards, Pavel
Hi!
This is the start of the stable review cycle for the 5.10.159 release. There are 106 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
YueHaibing yuehaibing@huawei.com net: broadcom: Add PTP_1588_CLOCK_OPTIONAL dependency for BCMGENET under ARCH_BCM2835
This one is not suitable for 5.10, we don't have PTP_1588_CLOCK_OPTIONAL there.
Kees Cook keescook@chromium.org ALSA: seq: Fix function prototype mismatch in snd_seq_expand_var_event
This is useful fo kCFI, but we don't have kCFI in 5.10 and others.
Best regards, Pavel
On Mon, Dec 12, 2022 at 07:31:59PM +0100, Pavel Machek wrote:
Hi!
This is the start of the stable review cycle for the 5.10.159 release. There are 106 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
YueHaibing yuehaibing@huawei.com net: broadcom: Add PTP_1588_CLOCK_OPTIONAL dependency for BCMGENET under ARCH_BCM2835
This one is not suitable for 5.10, we don't have PTP_1588_CLOCK_OPTIONAL there.
Now dropped.
Kees Cook keescook@chromium.org ALSA: seq: Fix function prototype mismatch in snd_seq_expand_var_event
This is useful fo kCFI, but we don't have kCFI in 5.10 and others.
I'll keep this one as it solves a clang warning that people are going to hit eventually on older kernels.
thanks,
greg k-h
On 12/12/22 05:09, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.159 release. There are 106 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Dec 2022 13:08:57 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.159-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli f.fainelli@gmail.com
On 12/12/22 06:09, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.159 release. There are 106 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Dec 2022 13:08:57 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.159-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On Mon, Dec 12, 2022 at 02:09:03PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.159 release. There are 106 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Dec 2022 13:08:57 +0000. Anything received after that time might be too late.
Build results: total: 162 pass: 162 fail: 0 Qemu test results: total: 475 pass: 475 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On Mon, 12 Dec 2022 at 18:43, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.10.159 release. There are 106 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Dec 2022 13:08:57 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.159-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
Regression detected on arm64 Raspberry Pi 4 Model B the NFS mount failed.
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Following changes have been noticed in the Kconfig file between good and bad. The config files attached to this email.
-CONFIG_BCMGENET=y -CONFIG_BROADCOM_PHY=y +# CONFIG_BROADCOM_PHY is not set -CONFIG_BCM7XXX_PHY=y +# CONFIG_BCM7XXX_PHY is not set -CONFIG_BCM_NET_PHYLIB=y
boot log: ---------- [ 0.000000] Linux version 5.10.159-rc1 (tuxmake@tuxmake) (aarch64-linux-gnu-gcc (Debian 11.3.0-6) 11.3.0, GNU ld (GNU Binutils for Debian) 2.39) #1 SMP PREEMPT @1670851702 [ 0.000000] Machine model: Raspberry Pi 4 Model B ... [ 113.631107] VFS: Unable to mount root fs via NFS. [ 113.636037] devtmpfs: mounted [ 113.655540] Freeing unused kernel memory: 9408K [ 113.678706] Run /sbin/init as init process [ 113.683209] Run /etc/init as init process [ 113.687486] Run /bin/init as init process [ 113.691792] Run /bin/sh as init process [ 113.695887] Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance. [ 113.710272] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.159-rc1 #1 [ 113.716806] Hardware name: Raspberry Pi 4 Model B (DT) [ 113.722015] Call trace: [ 113.724509] dump_backtrace+0x0/0x1d0 [ 113.728226] show_stack+0x20/0x30 [ 113.731590] dump_stack+0xf8/0x168 [ 113.735041] panic+0x184/0x390 [ 113.738141] kernel_init+0x10c/0x130 [ 113.741766] ret_from_fork+0x10/0x3c [ 113.745397] SMP: stopping secondary CPUs [ 113.749381] Kernel Offset: disabled [ 113.752918] CPU features: 0x28240022,61006000 [ 113.757333] Memory Limit: none [ 113.760441] ---[ end Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance. ]---
Full test log details, - https://lkft.validation.linaro.org/scheduler/job/5946533#L392 - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10.... - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10....
$ diff -Naurb good-rpi4.config bad-rpi4.config --- good-rpi4.config 2022-12-08 16:38:17.000000000 +0530 +++ bad-rpi4.config 2022-12-12 19:05:51.000000000 +0530 @@ -1,6 +1,6 @@ # # Automatically generated file; DO NOT EDIT. -# Linux/arm64 5.10.158 Kernel Configuration +# Linux/arm64 5.10.159-rc1 Kernel Configuration # CONFIG_CC_VERSION_TEXT="aarch64-linux-gnu-gcc (Debian 11.3.0-6) 11.3.0" CONFIG_CC_IS_GCC=y @@ -2355,7 +2355,6 @@ # CONFIG_AURORA_NB8800 is not set CONFIG_NET_VENDOR_BROADCOM=y # CONFIG_B44 is not set -CONFIG_BCMGENET=y # CONFIG_BNX2 is not set # CONFIG_CNIC is not set # CONFIG_TIGON3 is not set @@ -2613,13 +2612,12 @@ # CONFIG_ADIN_PHY is not set CONFIG_AQUANTIA_PHY=y # CONFIG_AX88796B_PHY is not set -CONFIG_BROADCOM_PHY=y +# CONFIG_BROADCOM_PHY is not set # CONFIG_BCM54140_PHY is not set -CONFIG_BCM7XXX_PHY=y +# CONFIG_BCM7XXX_PHY is not set # CONFIG_BCM84881_PHY is not set # CONFIG_BCM87XX_PHY is not set # CONFIG_BCM_CYGNUS_PHY is not set -CONFIG_BCM_NET_PHYLIB=y # CONFIG_CICADA_PHY is not set # CONFIG_CORTINA_PHY is not set # CONFIG_DAVICOM_PHY is not set @@ -2659,7 +2657,7 @@ CONFIG_MDIO_XGENE=y CONFIG_MDIO_BITBANG=y CONFIG_MDIO_BCM_IPROC=y -CONFIG_MDIO_BCM_UNIMAC=y +# CONFIG_MDIO_BCM_UNIMAC is not set CONFIG_MDIO_CAVIUM=y # CONFIG_MDIO_GPIO is not set # CONFIG_MDIO_HISI_FEMAC is not set
## Build * kernel: 5.10.159-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-5.10.y * git commit: d2432186ff47a7e6d420858a997aebbe1bda2dd8 * git describe: v5.10.158-107-gd2432186ff47 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10....
## Test Regressions (compared to v5.10.158) * bcm2711-rpi-4-b, log-parser-test - check-kernel-panic
## Metric Regressions (compared to v5.10.158)
## Test Fixes (compared to v5.10.158)
## Metric Fixes (compared to v5.10.158)
## Test result summary total: 134142, pass: 116844, fail: 2542, skip: 14290, xfail: 466
## Build Summary * arc: 5 total, 5 passed, 0 failed * arm: 151 total, 148 passed, 3 failed * arm64: 49 total, 46 passed, 3 failed * i386: 39 total, 37 passed, 2 failed * mips: 31 total, 29 passed, 2 failed * parisc: 8 total, 8 passed, 0 failed * powerpc: 32 total, 26 passed, 6 failed * riscv: 16 total, 14 passed, 2 failed * s390: 16 total, 16 passed, 0 failed * sh: 14 total, 12 passed, 2 failed * sparc: 8 total, 8 passed, 0 failed * x86_64: 42 total, 40 passed, 2 failed
## Test suites summary * boot * fwts * igt-gpu-tools * kselftest-android * kselftest-arm64 * kselftest-arm64/arm64.btitest.bti_c_func * kselftest-arm64/arm64.btitest.bti_j_func * kselftest-arm64/arm64.btitest.bti_jc_func * kselftest-arm64/arm64.btitest.bti_none_func * kselftest-arm64/arm64.btitest.nohint_func * kselftest-arm64/arm64.btitest.paciasp_func * kselftest-arm64/arm64.nobtitest.bti_c_func * kselftest-arm64/arm64.nobtitest.bti_j_func * kselftest-arm64/arm64.nobtitest.bti_jc_func * kselftest-arm64/arm64.nobtitest.bti_none_func * kselftest-arm64/arm64.nobtitest.nohint_func * kselftest-arm64/arm64.nobtitest.paciasp_func * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers-dma-buf * kselftest-efivarfs * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-net-forwarding * kselftest-net-mptcp * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-fsx * ltp-hugetlb * ltp-io * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-open-posix-tests * ltp-pty * ltp-sched * ltp-securebits * ltp-smoke * ltp-syscalls * ltp-tracing * network-basic-tests * packetdrill * perf * perf/Zstd-perf.data-compression * rcutorture * v4l2-compliance * vdso
-- Linaro LKFT https://lkft.linaro.org
On Tue, Dec 13, 2022, at 08:48, Naresh Kamboju wrote:
On Mon, 12 Dec 2022 at 18:43, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
Regression detected on arm64 Raspberry Pi 4 Model B the NFS mount failed.
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Following changes have been noticed in the Kconfig file between good and bad. The config files attached to this email.
-CONFIG_BCMGENET=y -CONFIG_BROADCOM_PHY=y +# CONFIG_BROADCOM_PHY is not set -CONFIG_BCM7XXX_PHY=y +# CONFIG_BCM7XXX_PHY is not set -CONFIG_BCM_NET_PHYLIB=y
Full test log details,
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10....
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10....
Where does the kernel configuration come from? Is this a plain defconfig that used to work, or do you have a board specific config file?
This is most likely caused by the added dependency on CONFIG_PTP_1588_CLOCK that would lead to the BCMGENET driver not being built-in if PTP support is in a module.
Arnd
On Tue, 13 Dec 2022 at 14:50, Arnd Bergmann arnd@arndb.de wrote:
On Tue, Dec 13, 2022, at 08:48, Naresh Kamboju wrote:
On Mon, 12 Dec 2022 at 18:43, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
Regression detected on arm64 Raspberry Pi 4 Model B the NFS mount failed.
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Following changes have been noticed in the Kconfig file between good and bad. The config files attached to this email.
-CONFIG_BCMGENET=y -CONFIG_BROADCOM_PHY=y +# CONFIG_BROADCOM_PHY is not set -CONFIG_BCM7XXX_PHY=y +# CONFIG_BCM7XXX_PHY is not set -CONFIG_BCM_NET_PHYLIB=y
Full test log details,
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10....
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10....
Where does the kernel configuration come from? Is this a plain defconfig that used to work, or do you have a board specific config file?
This is most likely caused by the added dependency on CONFIG_PTP_1588_CLOCK that would lead to the BCMGENET driver not being built-in if PTP support is in a module.
Here is the build command which is the same for working and not working kernels.
# tuxmake --runtime podman --target-arch arm64 --toolchain gcc-11 --kconfig defconfig --kconfig-add https://raw.githubusercontent.com/Linaro/meta-lkft/kirkstone/meta/recipes-ke... --kconfig-add https://raw.githubusercontent.com/Linaro/meta-lkft/kirkstone/meta/recipes-ke... --kconfig-add https://raw.githubusercontent.com/Linaro/meta-lkft/kirkstone/meta/recipes-ke... --kconfig-add https://raw.githubusercontent.com/Linaro/meta-lkft/kirkstone/meta/recipes-ke... --kconfig-add https://raw.githubusercontent.com/Linaro/meta-lkft/kirkstone/meta/recipes-ke... --kconfig-add CONFIG_ARM64_MODULE_PLTS=y --kconfig-add CONFIG_SYN_COOKIES=y --kconfig-add CONFIG_SCHEDSTATS=y CROSS_COMPILE_COMPAT=arm-linux-gnueabihf-
- Naresh
Arnd
On Tue 2022-12-13 10:20:30, Arnd Bergmann wrote:
On Tue, Dec 13, 2022, at 08:48, Naresh Kamboju wrote:
On Mon, 12 Dec 2022 at 18:43, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
Regression detected on arm64 Raspberry Pi 4 Model B the NFS mount failed.
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Following changes have been noticed in the Kconfig file between good and bad. The config files attached to this email.
-CONFIG_BCMGENET=y -CONFIG_BROADCOM_PHY=y +# CONFIG_BROADCOM_PHY is not set -CONFIG_BCM7XXX_PHY=y +# CONFIG_BCM7XXX_PHY is not set -CONFIG_BCM_NET_PHYLIB=y
Full test log details,
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10....
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10....
Where does the kernel configuration come from? Is this a plain defconfig that used to work, or do you have a board specific config file?
This is most likely caused by the added dependency on CONFIG_PTP_1588_CLOCK that would lead to the BCMGENET driver not being built-in if PTP support is in a module.
There's no PTP_1588_CLOCK_OPTIONAL in 5.10. This needs to be dropped:
|2a4912705 421f86 o: 5.10| net: broadcom: Add PTP_1588_CLOCK_OPTIONAL dependency for BCMGENET under ARCH_BCM2835
Best regards, Pavel
On Tue, Dec 13, 2022 at 10:20:30AM +0100, Arnd Bergmann wrote:
On Tue, Dec 13, 2022, at 08:48, Naresh Kamboju wrote:
On Mon, 12 Dec 2022 at 18:43, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
Regression detected on arm64 Raspberry Pi 4 Model B the NFS mount failed.
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Following changes have been noticed in the Kconfig file between good and bad. The config files attached to this email.
-CONFIG_BCMGENET=y -CONFIG_BROADCOM_PHY=y +# CONFIG_BROADCOM_PHY is not set -CONFIG_BCM7XXX_PHY=y +# CONFIG_BCM7XXX_PHY is not set -CONFIG_BCM_NET_PHYLIB=y
Full test log details,
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10....
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10....
Where does the kernel configuration come from? Is this a plain defconfig that used to work, or do you have a board specific config file?
This is most likely caused by the added dependency on CONFIG_PTP_1588_CLOCK that would lead to the BCMGENET driver not being built-in if PTP support is in a module.
I've dropped the patch that caused this and will push out a -rc2 in a bit.
thanks all!
greg k-h
Hi Greg,
On Mon, Dec 12, 2022 at 02:09:03PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.159 release. There are 106 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Dec 2022 13:08:57 +0000. Anything received after that time might be too late.
Build test (gcc version 11.3.1 20221127): mips: 63 configs -> no failure arm: 104 configs -> no failure arm64: 3 configs -> no failure x86_64: 4 configs -> no failure alpha allmodconfig -> no failure powerpc allmodconfig -> no failure riscv allmodconfig -> no failure s390 allmodconfig -> no failure xtensa allmodconfig -> no failure
Boot test: x86_64: Booted on my test laptop. No regression. x86_64: Booted on qemu. No regression. [1]
arm64: Booted on rpi4b (4GB model). Regression noticed: Failed to start Network Manager
Will try a bisect and find which commit caused it.
[1]. https://openqa.qa.codethink.co.uk/tests/2339
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
On Tue, 13 Dec 2022 at 12:06, Sudip Mukherjee (Codethink) sudipm.mukherjee@gmail.com wrote:
Hi Greg,
On Mon, Dec 12, 2022 at 02:09:03PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.159 release. There are 106 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Dec 2022 13:08:57 +0000. Anything received after that time might be too late.
Build test (gcc version 11.3.1 20221127): mips: 63 configs -> no failure arm: 104 configs -> no failure arm64: 3 configs -> no failure x86_64: 4 configs -> no failure alpha allmodconfig -> no failure powerpc allmodconfig -> no failure riscv allmodconfig -> no failure s390 allmodconfig -> no failure xtensa allmodconfig -> no failure
Boot test: x86_64: Booted on my test laptop. No regression. x86_64: Booted on qemu. No regression. [1]
arm64: Booted on rpi4b (4GB model). Regression noticed: Failed to start Network Manager
Will try a bisect and find which commit caused it.
Bisect pointed to 60aefe77fb48 ("net: broadcom: Add PTP_1588_CLOCK_OPTIONAL dependency for BCMGENET under ARCH_BCM2835")
On Tue, Dec 13, 2022 at 03:31:35PM +0000, Sudip Mukherjee wrote:
On Tue, 13 Dec 2022 at 12:06, Sudip Mukherjee (Codethink) sudipm.mukherjee@gmail.com wrote:
Hi Greg,
On Mon, Dec 12, 2022 at 02:09:03PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.159 release. There are 106 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Dec 2022 13:08:57 +0000. Anything received after that time might be too late.
Build test (gcc version 11.3.1 20221127): mips: 63 configs -> no failure arm: 104 configs -> no failure arm64: 3 configs -> no failure x86_64: 4 configs -> no failure alpha allmodconfig -> no failure powerpc allmodconfig -> no failure riscv allmodconfig -> no failure s390 allmodconfig -> no failure xtensa allmodconfig -> no failure
Boot test: x86_64: Booted on my test laptop. No regression. x86_64: Booted on qemu. No regression. [1]
arm64: Booted on rpi4b (4GB model). Regression noticed: Failed to start Network Manager
Will try a bisect and find which commit caused it.
Bisect pointed to 60aefe77fb48 ("net: broadcom: Add PTP_1588_CLOCK_OPTIONAL dependency for BCMGENET under ARCH_BCM2835")
Should be fixed in -rc2.
thanks,
greg k-h
linux-stable-mirror@lists.linaro.org