This is the start of the stable review cycle for the 5.2.14 release. There are 94 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue 10 Sep 2019 12:09:36 PM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.2.14-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.2.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.2.14-rc1
Jan Kaisrlik ja.kaisrlik@gmail.com Revert "mmc: core: do not retry CMD6 in __mmc_switch()"
John S. Gruber JohnSGruber@gmail.com x86/boot: Preserve boot_params.secure_boot from sanitizing
Linus Torvalds torvalds@linux-foundation.org Revert "x86/apic: Include the LDR when clearing out APIC registers"
Luis Henriques lhenriques@suse.com libceph: allow ceph_buffer_put() to receive a NULL ceph_buffer
Kirill A. Shutemov kirill.shutemov@linux.intel.com x86/boot/compressed/64: Fix missing initialization in find_trampoline_placement()
Andre Przywara andre.przywara@arm.com KVM: arm/arm64: VGIC: Properly initialise private IRQ affinity
Linus Walleij linus.walleij@linaro.org gpio: Fix irqchip initialization order
Selvin Xavier selvin.xavier@broadcom.com RDMA/bnxt_re: Fix stack-out-of-bounds in bnxt_qplib_rcfw_send_message
YueHaibing yuehaibing@huawei.com afs: use correct afs_call_type in yfs_fs_store_opaque_acl2
Marc Dionne marc.dionne@auristor.com afs: Fix possible oops in afs_lookup trace event
David Howells dhowells@redhat.com afs: Fix leak in afs_lookup_cell_rcu()
Andrew Jones drjones@redhat.com KVM: arm/arm64: Only skip MMIO insn once
Luis Henriques lhenriques@suse.com ceph: fix buffer free while holding i_ceph_lock in fill_inode()
Luis Henriques lhenriques@suse.com ceph: fix buffer free while holding i_ceph_lock in __ceph_build_xattrs_blob()
Luis Henriques lhenriques@suse.com ceph: fix buffer free while holding i_ceph_lock in __ceph_setxattr()
Nicolai Hähnle nicolai.haehnle@amd.com drm/amdgpu: prevent memory leaks in AMDGPU_CS ioctl
Vitaly Kuznetsov vkuznets@redhat.com selftests/kvm: make platform_info_test pass on AMD
Paolo Bonzini pbonzini@redhat.com selftests: kvm: fix state save/load on processors without XSAVE
Wenwen Wang wenwen@cs.uga.edu infiniband: hfi1: fix memory leaks
Wenwen Wang wenwen@cs.uga.edu infiniband: hfi1: fix a memory leak bug
Wenwen Wang wenwen@cs.uga.edu IB/mlx4: Fix memory leaks
zhengbin zhengbin13@huawei.com RDMA/cma: fix null-ptr-deref Read in cma_cleanup
Guilherme G. Piccoli gpiccoli@canonical.com nvme: Fix cntlid validation when not using NVMEoF
Anton Eidelman anton@lightbitslabs.com nvme-multipath: fix possible I/O hang when paths are updated
Vitaly Kuznetsov vkuznets@redhat.com Tools: hv: kvp: eliminate 'may be used uninitialized' warning
Dexuan Cui decui@microsoft.com Input: hyperv-keyboard: Use in-place iterator API in the channel callback
James Smart jsmart2021@gmail.com scsi: lpfc: Mitigate high memory pre-allocation by SCSI-MQ
Kirill A. Shutemov kirill@shutemov.name x86/boot/compressed/64: Fix boot on machines with broken E820 table
Benjamin Tissoires benjamin.tissoires@redhat.com HID: cp2112: prevent sleeping function called from invalid context
Even Xu even.xu@intel.com HID: intel-ish-hid: ipc: add EHL device id
Andrea Righi andrea.righi@canonical.com kprobes: Fix potential deadlock in kprobe_optimizer()
Sebastian Andrzej Siewior bigeasy@linutronix.de sched/core: Schedule new worker even if PI-blocked
Tho Vu tho.vu.wh@rvc.renesas.com ravb: Fix use-after-free ravb_tstamp_skb
Wenwen Wang wenwen@cs.uga.edu wimax/i2400m: fix a memory leak bug
Stephen Hemminger stephen@networkplumber.org net: cavium: fix driver name
Thomas Falcon tlfalcon@linux.ibm.com ibmvnic: Unmap DMA address of TX descriptor buffers after use
Wenwen Wang wenwen@cs.uga.edu net: kalmia: fix memory leaks
Wenwen Wang wenwen@cs.uga.edu cx82310_eth: fix a memory leak bug
Darrick J. Wong darrick.wong@oracle.com vfs: fix page locking deadlocks when deduping files
Wenwen Wang wenwen@cs.uga.edu lan78xx: Fix memory leaks
Martin Blumenstingl martin.blumenstingl@googlemail.com clk: Fix potential NULL dereference in clk_fetch_parent_index()
Stephen Boyd sboyd@kernel.org clk: Fix falling back to legacy parent string matching
Wenwen Wang wenwen@cs.uga.edu net: myri10ge: fix memory leaks
Wenwen Wang wenwen@cs.uga.edu liquidio: add cleanup in octeon_setup_iq()
Paolo Bonzini pbonzini@redhat.com selftests: kvm: fix vmx_set_nested_state_test
Paolo Bonzini pbonzini@redhat.com selftests: kvm: provide common function to enable eVMCS
Paolo Bonzini pbonzini@redhat.com selftests: kvm: do not try running the VM in vmx_set_nested_state_test
Wenwen Wang wenwen@cs.uga.edu cxgb4: fix a memory leak bug
Dmitry Fomichev dmitry.fomichev@wdc.com scsi: target: tcmu: avoid use-after-free after command timeout
Bill Kuzeja William.Kuzeja@stratus.com scsi: qla2xxx: Fix gnl.l memory leak on adapter init failure
Alexandre Courbot acourbot@chromium.org drm/mediatek: set DMA max segment size
Alexandre Courbot acourbot@chromium.org drm/mediatek: use correct device to import PRIME buffers
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_flow_offload: skip tcp rst and fin packets
YueHaibing yuehaibing@huawei.com gpio: Fix build error of function redefinition
Thomas Falcon tlfalcon@linux.ibm.com ibmveth: Convert multicast list size for little-endian system
Julian Wiedmann jwi@linux.ibm.com s390/qeth: serialize cmd reply with concurrent timeout
Fabian Henneke fabian.henneke@gmail.com Bluetooth: hidp: Let hidp_send_message return number of queued bytes
Harish Bandi c-hbandi@codeaurora.org Bluetooth: hci_qca: Send VS pre shutdown command.
Matthias Kaehlcke mka@chromium.org Bluetooth: btqca: Add a short delay before downloading the NVM
Nathan Chancellor natechancellor@gmail.com net: tc35815: Explicitly check NET_IP_ALIGN is not zero in tc35815_rx
Dexuan Cui decui@microsoft.com hv_netvsc: Fix a warning of suspicious RCU usage
Taehee Yoo ap420073@gmail.com ixgbe: fix possible deadlock in ixgbe_service_task()
Jakub Kicinski jakub.kicinski@netronome.com tools: bpftool: fix error message (prog -> object)
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_flow_table: teardown flow timeout race
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_flow_table: conntrack picks up expired flows
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: use-after-free in failing rule with bound set
Fuqian Huang huangfq.daxian@gmail.com net: tundra: tsi108: use spin_lock_irqsave instead of spin_lock_irq in IRQ context
Marek Szyprowski m.szyprowski@samsung.com clk: samsung: exynos542x: Move MSCL subsystem clocks to its sub-CMU
Sylwester Nawrocki s.nawrocki@samsung.com clk: samsung: exynos5800: Move MAU subsystem clocks to MAU sub-CMU
Sylwester Nawrocki s.nawrocki@samsung.com clk: samsung: Change signature of exynos5_subcmus_init() function
Aya Levin ayal@mellanox.com net/mlx5e: Fix error flow of CQE recovery on tx reporter
Florian Westphal fw@strlen.de netfilter: nf_flow_table: fix offload for flows that are subject to xfrm
Andrii Nakryiko andriin@fb.com libbpf: set BTF FD for prog only when there is supported .BTF.ext data
Andrii Nakryiko andriin@fb.com libbpf: fix erroneous multi-closing of BTF FD
Sven Eckelmann sven@narfation.org batman-adv: Fix netlink dumping of all mcast_flags buckets
Ka-Cheong Poon ka-cheong.poon@oracle.com net/rds: Fix info leak in rds6_inc_info_copy()
Davide Caratti dcaratti@redhat.com net/sched: pfifo_fast: fix wrong dereference when qdisc is reset
Davide Caratti dcaratti@redhat.com net/sched: pfifo_fast: fix wrong dereference in pfifo_fast_enqueue
Vladimir Oltean olteanv@gmail.com net: dsa: tag_8021q: Future-proof the reserved fields in the custom VID
Marco Hartmann marco.hartmann@nxp.com Add genphy_c45_config_aneg() function to phy-c45.c
Vladimir Oltean olteanv@gmail.com net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rate
Vladimir Oltean olteanv@gmail.com taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte
Vladimir Oltean olteanv@gmail.com taprio: Fix kernel panic in taprio_destroy
Hayes Wang hayeswang@realtek.com r8152: remove calling netif_napi_del
Hayes Wang hayeswang@realtek.com Revert "r8152: napi hangup fix after disconnect"
John Hurley john.hurley@netronome.com nfp: flower: handle neighbour events on internal ports
John Hurley john.hurley@netronome.com nfp: flower: prevent ingress block binds on internal ports
Eric Dumazet edumazet@google.com tcp: remove empty skb from write queue in error cases
Willem de Bruijn willemb@google.com tcp: inherit timestamp on mtu probe
Chen-Yu Tsai wens@csie.org net: stmmac: dwmac-rk: Don't fail if phy regulator is absent
Cong Wang xiyou.wangcong@gmail.com net_sched: fix a NULL pointer deref in ipt action
Vlad Buslov vladbu@mellanox.com net: sched: act_sample: fix psample group handling on overwrite
Feng Sun loyou85@gmail.com net: fix skb use after free in netpoll
Eric Dumazet edumazet@google.com mld: fix memory leak in mld_del_delrec()
-------------
Diffstat:
Makefile | 4 +- arch/x86/boot/compressed/pgtable_64.c | 13 +- arch/x86/include/asm/bootparam_utils.h | 1 + arch/x86/kernel/apic/apic.c | 4 - drivers/bluetooth/btqca.c | 24 +++ drivers/bluetooth/btqca.h | 7 + drivers/bluetooth/hci_qca.c | 3 + drivers/clk/clk.c | 49 +++++-- drivers/clk/samsung/clk-exynos5-subcmu.c | 16 +- drivers/clk/samsung/clk-exynos5-subcmu.h | 2 +- drivers/clk/samsung/clk-exynos5250.c | 7 +- drivers/clk/samsung/clk-exynos5420.c | 162 ++++++++++++++------- drivers/gpio/gpiolib.c | 30 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 +- drivers/gpu/drm/mediatek/mtk_drm_drv.c | 49 ++++++- drivers/gpu/drm/mediatek/mtk_drm_drv.h | 2 + drivers/hid/hid-cp2112.c | 8 +- drivers/hid/intel-ish-hid/ipc/hw-ish.h | 1 + drivers/hid/intel-ish-hid/ipc/pci-ish.c | 1 + drivers/infiniband/core/cma.c | 6 +- drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 8 +- drivers/infiniband/hw/bnxt_re/qplib_rcfw.h | 11 +- drivers/infiniband/hw/hfi1/fault.c | 12 +- drivers/infiniband/hw/mlx4/mad.c | 4 +- drivers/input/serio/hyperv-keyboard.c | 35 +---- drivers/mmc/core/mmc_ops.c | 2 +- drivers/net/ethernet/cavium/common/cavium_ptp.c | 2 +- .../net/ethernet/cavium/liquidio/request_manager.c | 4 +- drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 4 +- drivers/net/ethernet/ibm/ibmveth.c | 9 +- drivers/net/ethernet/ibm/ibmvnic.c | 11 +- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 5 +- .../ethernet/mellanox/mlx5/core/en/reporter_tx.c | 12 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 - drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 2 +- .../net/ethernet/netronome/nfp/flower/offload.c | 7 +- .../ethernet/netronome/nfp/flower/tunnel_conf.c | 8 +- drivers/net/ethernet/renesas/ravb_main.c | 8 +- drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 6 +- drivers/net/ethernet/toshiba/tc35815.c | 2 +- drivers/net/ethernet/tundra/tsi108_eth.c | 5 +- drivers/net/hyperv/netvsc_drv.c | 9 +- drivers/net/phy/phy-c45.c | 26 ++++ drivers/net/phy/phy.c | 2 +- drivers/net/usb/cx82310_eth.c | 3 +- drivers/net/usb/kalmia.c | 6 +- drivers/net/usb/lan78xx.c | 8 +- drivers/net/usb/r8152.c | 5 +- drivers/net/wimax/i2400m/fw.c | 4 +- drivers/nvme/host/core.c | 4 +- drivers/nvme/host/multipath.c | 1 + drivers/s390/net/qeth_core.h | 1 + drivers/s390/net/qeth_core_main.c | 20 +++ drivers/scsi/lpfc/lpfc.h | 1 + drivers/scsi/lpfc/lpfc_attr.c | 15 ++ drivers/scsi/lpfc/lpfc_init.c | 10 +- drivers/scsi/lpfc/lpfc_sli4.h | 5 + drivers/scsi/qla2xxx/qla_attr.c | 2 + drivers/scsi/qla2xxx/qla_os.c | 11 +- drivers/target/target_core_user.c | 9 +- fs/afs/cell.c | 4 + fs/afs/dir.c | 3 +- fs/afs/yfsclient.c | 2 +- fs/ceph/caps.c | 5 +- fs/ceph/inode.c | 7 +- fs/ceph/snap.c | 4 +- fs/ceph/super.h | 2 +- fs/ceph/xattr.c | 19 ++- fs/read_write.c | 49 ++++++- include/linux/ceph/buffer.h | 3 +- include/linux/gpio.h | 24 --- include/linux/phy.h | 1 + include/net/act_api.h | 4 +- include/net/netfilter/nf_tables.h | 9 +- include/net/psample.h | 1 + kernel/kprobes.c | 8 +- kernel/sched/core.c | 5 +- net/batman-adv/multicast.c | 2 +- net/bluetooth/hidp/core.c | 9 +- net/core/netpoll.c | 6 +- net/dsa/tag_8021q.c | 2 + net/ipv4/tcp.c | 30 ++-- net/ipv4/tcp_output.c | 3 +- net/ipv6/mcast.c | 5 +- net/netfilter/nf_flow_table_core.c | 43 ++++-- net/netfilter/nf_flow_table_ip.c | 43 ++++++ net/netfilter/nf_tables_api.c | 15 +- net/netfilter/nft_flow_offload.c | 9 +- net/psample/psample.c | 2 +- net/rds/recv.c | 5 +- net/sched/act_bpf.c | 2 +- net/sched/act_connmark.c | 2 +- net/sched/act_csum.c | 2 +- net/sched/act_gact.c | 2 +- net/sched/act_ife.c | 2 +- net/sched/act_ipt.c | 11 +- net/sched/act_mirred.c | 2 +- net/sched/act_nat.c | 2 +- net/sched/act_pedit.c | 2 +- net/sched/act_police.c | 2 +- net/sched/act_sample.c | 8 +- net/sched/act_simple.c | 2 +- net/sched/act_skbedit.c | 2 +- net/sched/act_skbmod.c | 2 +- net/sched/act_tunnel_key.c | 2 +- net/sched/act_vlan.c | 2 +- net/sched/sch_cbs.c | 19 ++- net/sched/sch_generic.c | 19 ++- net/sched/sch_taprio.c | 31 ++-- tools/bpf/bpftool/common.c | 2 +- tools/hv/hv_kvp_daemon.c | 2 +- tools/lib/bpf/libbpf.c | 15 +- tools/testing/selftests/kvm/include/evmcs.h | 2 + tools/testing/selftests/kvm/lib/x86_64/processor.c | 16 +- tools/testing/selftests/kvm/lib/x86_64/vmx.c | 20 +++ tools/testing/selftests/kvm/x86_64/evmcs_test.c | 15 +- tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c | 12 +- .../selftests/kvm/x86_64/platform_info_test.c | 2 +- .../kvm/x86_64/vmx_set_nested_state_test.c | 32 ++-- virt/kvm/arm/mmio.c | 7 + virt/kvm/arm/vgic/vgic-init.c | 30 ++-- 121 files changed, 872 insertions(+), 421 deletions(-)
From: Eric Dumazet edumazet@google.com
[ Upstream commit a84d016479896b5526a2cc54784e6ffc41c9d6f6 ]
Similar to the fix done for IPv4 in commit e5b1c6c6277d ("igmp: fix memory leak in igmpv3_del_delrec()"), we need to make sure mca_tomb and mca_sources are not blindly overwritten.
Using swap() then a call to ip6_mc_clear_src() will take care of the missing free.
BUG: memory leak unreferenced object 0xffff888117d9db00 (size 64): comm "syz-executor247", pid 6918, jiffies 4294943989 (age 25.350s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 fe 88 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000005b463030>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<000000005b463030>] slab_post_alloc_hook mm/slab.h:522 [inline] [<000000005b463030>] slab_alloc mm/slab.c:3319 [inline] [<000000005b463030>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3548 [<00000000939cbf94>] kmalloc include/linux/slab.h:552 [inline] [<00000000939cbf94>] kzalloc include/linux/slab.h:748 [inline] [<00000000939cbf94>] ip6_mc_add1_src net/ipv6/mcast.c:2236 [inline] [<00000000939cbf94>] ip6_mc_add_src+0x31f/0x420 net/ipv6/mcast.c:2356 [<00000000d8972221>] ip6_mc_source+0x4a8/0x600 net/ipv6/mcast.c:449 [<000000002b203d0d>] do_ipv6_setsockopt.isra.0+0x1b92/0x1dd0 net/ipv6/ipv6_sockglue.c:748 [<000000001f1e2d54>] ipv6_setsockopt+0x89/0xd0 net/ipv6/ipv6_sockglue.c:944 [<00000000c8f7bdf9>] udpv6_setsockopt+0x4e/0x90 net/ipv6/udp.c:1558 [<000000005a9a0c5e>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3139 [<00000000910b37b2>] __sys_setsockopt+0x10f/0x220 net/socket.c:2084 [<00000000e9108023>] __do_sys_setsockopt net/socket.c:2100 [inline] [<00000000e9108023>] __se_sys_setsockopt net/socket.c:2097 [inline] [<00000000e9108023>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2097 [<00000000f4818160>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:296 [<000000008d367e8f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when set link down") Fixes: 9c8bb163ae78 ("igmp, mld: Fix memory leak in igmpv3/mld_del_delrec()") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/mcast.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/net/ipv6/mcast.c +++ b/net/ipv6/mcast.c @@ -787,14 +787,15 @@ static void mld_del_delrec(struct inet6_ if (pmc) { im->idev = pmc->idev; if (im->mca_sfmode == MCAST_INCLUDE) { - im->mca_tomb = pmc->mca_tomb; - im->mca_sources = pmc->mca_sources; + swap(im->mca_tomb, pmc->mca_tomb); + swap(im->mca_sources, pmc->mca_sources); for (psf = im->mca_sources; psf; psf = psf->sf_next) psf->sf_crcount = idev->mc_qrv; } else { im->mca_crcount = idev->mc_qrv; } in6_dev_put(pmc->idev); + ip6_mc_clear_src(pmc); kfree(pmc); } spin_unlock_bh(&im->mca_lock);
From: Feng Sun loyou85@gmail.com
[ Upstream commit 2c1644cf6d46a8267d79ed95cb9b563839346562 ]
After commit baeababb5b85d5c4e6c917efe2a1504179438d3b ("tun: return NET_XMIT_DROP for dropped packets"), when tun_net_xmit drop packets, it will free skb and return NET_XMIT_DROP, netpoll_send_skb_on_dev will run into following use after free cases: 1. retry netpoll_start_xmit with freed skb; 2. queue freed skb in npinfo->txq. queue_process will also run into use after free case.
hit netpoll_send_skb_on_dev first case with following kernel log:
[ 117.864773] kernel BUG at mm/slub.c:306! [ 117.864773] invalid opcode: 0000 [#1] SMP PTI [ 117.864774] CPU: 3 PID: 2627 Comm: loop_printmsg Kdump: loaded Tainted: P OE 5.3.0-050300rc5-generic #201908182231 [ 117.864775] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 117.864775] RIP: 0010:kmem_cache_free+0x28d/0x2b0 [ 117.864781] Call Trace: [ 117.864781] ? tun_net_xmit+0x21c/0x460 [ 117.864781] kfree_skbmem+0x4e/0x60 [ 117.864782] kfree_skb+0x3a/0xa0 [ 117.864782] tun_net_xmit+0x21c/0x460 [ 117.864782] netpoll_start_xmit+0x11d/0x1b0 [ 117.864788] netpoll_send_skb_on_dev+0x1b8/0x200 [ 117.864789] __br_forward+0x1b9/0x1e0 [bridge] [ 117.864789] ? skb_clone+0x53/0xd0 [ 117.864790] ? __skb_clone+0x2e/0x120 [ 117.864790] deliver_clone+0x37/0x50 [bridge] [ 117.864790] maybe_deliver+0x89/0xc0 [bridge] [ 117.864791] br_flood+0x6c/0x130 [bridge] [ 117.864791] br_dev_xmit+0x315/0x3c0 [bridge] [ 117.864792] netpoll_start_xmit+0x11d/0x1b0 [ 117.864792] netpoll_send_skb_on_dev+0x1b8/0x200 [ 117.864792] netpoll_send_udp+0x2c6/0x3e8 [ 117.864793] write_msg+0xd9/0xf0 [netconsole] [ 117.864793] console_unlock+0x386/0x4e0 [ 117.864793] vprintk_emit+0x17e/0x280 [ 117.864794] vprintk_default+0x29/0x50 [ 117.864794] vprintk_func+0x4c/0xbc [ 117.864794] printk+0x58/0x6f [ 117.864795] loop_fun+0x24/0x41 [printmsg_loop] [ 117.864795] kthread+0x104/0x140 [ 117.864795] ? 0xffffffffc05b1000 [ 117.864796] ? kthread_park+0x80/0x80 [ 117.864796] ret_from_fork+0x35/0x40
Signed-off-by: Feng Sun loyou85@gmail.com Signed-off-by: Xiaojun Zhao xiaojunzhao141@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/netpoll.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -122,7 +122,7 @@ static void queue_process(struct work_st txq = netdev_get_tx_queue(dev, q_index); HARD_TX_LOCK(dev, txq, smp_processor_id()); if (netif_xmit_frozen_or_stopped(txq) || - netpoll_start_xmit(skb, dev, txq) != NETDEV_TX_OK) { + !dev_xmit_complete(netpoll_start_xmit(skb, dev, txq))) { skb_queue_head(&npinfo->txq, skb); HARD_TX_UNLOCK(dev, txq); local_irq_restore(flags); @@ -335,7 +335,7 @@ void netpoll_send_skb_on_dev(struct netp
HARD_TX_UNLOCK(dev, txq);
- if (status == NETDEV_TX_OK) + if (dev_xmit_complete(status)) break;
} @@ -352,7 +352,7 @@ void netpoll_send_skb_on_dev(struct netp
}
- if (status != NETDEV_TX_OK) { + if (!dev_xmit_complete(status)) { skb_queue_tail(&npinfo->txq, skb); schedule_delayed_work(&npinfo->tx_work,0); }
From: Vlad Buslov vladbu@mellanox.com
[ Upstream commit dbf47a2a094edf58983265e323ca4bdcdb58b5ee ]
Action sample doesn't properly handle psample_group pointer in overwrite case. Following issues need to be fixed:
- In tcf_sample_init() function RCU_INIT_POINTER() is used to set s->psample_group, even though we neither setting the pointer to NULL, nor preventing concurrent readers from accessing the pointer in some way. Use rcu_swap_protected() instead to safely reset the pointer.
- Old value of s->psample_group is not released or deallocated in any way, which results resource leak. Use psample_group_put() on non-NULL value obtained with rcu_swap_protected().
- The function psample_group_put() that released reference to struct psample_group pointed by rcu-pointer s->psample_group doesn't respect rcu grace period when deallocating it. Extend struct psample_group with rcu head and use kfree_rcu when freeing it.
Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Signed-off-by: Vlad Buslov vladbu@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/psample.h | 1 + net/psample/psample.c | 2 +- net/sched/act_sample.c | 6 +++++- 3 files changed, 7 insertions(+), 2 deletions(-)
--- a/include/net/psample.h +++ b/include/net/psample.h @@ -11,6 +11,7 @@ struct psample_group { u32 group_num; u32 refcount; u32 seq; + struct rcu_head rcu; };
struct psample_group *psample_group_get(struct net *net, u32 group_num); --- a/net/psample/psample.c +++ b/net/psample/psample.c @@ -154,7 +154,7 @@ static void psample_group_destroy(struct { psample_group_notify(group, PSAMPLE_CMD_DEL_GROUP); list_del(&group->list); - kfree(group); + kfree_rcu(group, rcu); }
static struct psample_group * --- a/net/sched/act_sample.c +++ b/net/sched/act_sample.c @@ -102,13 +102,17 @@ static int tcf_sample_init(struct net *n goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); s->rate = rate; s->psample_group_num = psample_group_num; - RCU_INIT_POINTER(s->psample_group, psample_group); + rcu_swap_protected(s->psample_group, psample_group, + lockdep_is_held(&s->tcf_lock));
if (tb[TCA_SAMPLE_TRUNC_SIZE]) { s->truncate = true; s->trunc_size = nla_get_u32(tb[TCA_SAMPLE_TRUNC_SIZE]); } spin_unlock_bh(&s->tcf_lock); + + if (psample_group) + psample_group_put(psample_group); if (goto_ch) tcf_chain_put_by_act(goto_ch);
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 981471bd3abf4d572097645d765391533aac327d ]
The net pointer in struct xt_tgdtor_param is not explicitly initialized therefore is still NULL when dereferencing it. So we have to find a way to pass the correct net pointer to ipt_destroy_target().
The best way I find is just saving the net pointer inside the per netns struct tcf_idrinfo, which could make this patch smaller.
Fixes: 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset") Reported-and-tested-by: itugrok@yahoo.com Cc: Jamal Hadi Salim jhs@mojatatu.com Cc: Jiri Pirko jiri@resnulli.us Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/act_api.h | 4 +++- net/sched/act_bpf.c | 2 +- net/sched/act_connmark.c | 2 +- net/sched/act_csum.c | 2 +- net/sched/act_gact.c | 2 +- net/sched/act_ife.c | 2 +- net/sched/act_ipt.c | 11 ++++++----- net/sched/act_mirred.c | 2 +- net/sched/act_nat.c | 2 +- net/sched/act_pedit.c | 2 +- net/sched/act_police.c | 2 +- net/sched/act_sample.c | 2 +- net/sched/act_simple.c | 2 +- net/sched/act_skbedit.c | 2 +- net/sched/act_skbmod.c | 2 +- net/sched/act_tunnel_key.c | 2 +- net/sched/act_vlan.c | 2 +- 17 files changed, 24 insertions(+), 21 deletions(-)
--- a/include/net/act_api.h +++ b/include/net/act_api.h @@ -15,6 +15,7 @@ struct tcf_idrinfo { struct mutex lock; struct idr action_idr; + struct net *net; };
struct tc_action_ops; @@ -108,7 +109,7 @@ struct tc_action_net { };
static inline -int tc_action_net_init(struct tc_action_net *tn, +int tc_action_net_init(struct net *net, struct tc_action_net *tn, const struct tc_action_ops *ops) { int err = 0; @@ -117,6 +118,7 @@ int tc_action_net_init(struct tc_action_ if (!tn->idrinfo) return -ENOMEM; tn->ops = ops; + tn->idrinfo->net = net; mutex_init(&tn->idrinfo->lock); idr_init(&tn->idrinfo->action_idr); return err; --- a/net/sched/act_bpf.c +++ b/net/sched/act_bpf.c @@ -422,7 +422,7 @@ static __net_init int bpf_init_net(struc { struct tc_action_net *tn = net_generic(net, bpf_net_id);
- return tc_action_net_init(tn, &act_bpf_ops); + return tc_action_net_init(net, tn, &act_bpf_ops); }
static void __net_exit bpf_exit_net(struct list_head *net_list) --- a/net/sched/act_connmark.c +++ b/net/sched/act_connmark.c @@ -231,7 +231,7 @@ static __net_init int connmark_init_net( { struct tc_action_net *tn = net_generic(net, connmark_net_id);
- return tc_action_net_init(tn, &act_connmark_ops); + return tc_action_net_init(net, tn, &act_connmark_ops); }
static void __net_exit connmark_exit_net(struct list_head *net_list) --- a/net/sched/act_csum.c +++ b/net/sched/act_csum.c @@ -714,7 +714,7 @@ static __net_init int csum_init_net(stru { struct tc_action_net *tn = net_generic(net, csum_net_id);
- return tc_action_net_init(tn, &act_csum_ops); + return tc_action_net_init(net, tn, &act_csum_ops); }
static void __net_exit csum_exit_net(struct list_head *net_list) --- a/net/sched/act_gact.c +++ b/net/sched/act_gact.c @@ -278,7 +278,7 @@ static __net_init int gact_init_net(stru { struct tc_action_net *tn = net_generic(net, gact_net_id);
- return tc_action_net_init(tn, &act_gact_ops); + return tc_action_net_init(net, tn, &act_gact_ops); }
static void __net_exit gact_exit_net(struct list_head *net_list) --- a/net/sched/act_ife.c +++ b/net/sched/act_ife.c @@ -890,7 +890,7 @@ static __net_init int ife_init_net(struc { struct tc_action_net *tn = net_generic(net, ife_net_id);
- return tc_action_net_init(tn, &act_ife_ops); + return tc_action_net_init(net, tn, &act_ife_ops); }
static void __net_exit ife_exit_net(struct list_head *net_list) --- a/net/sched/act_ipt.c +++ b/net/sched/act_ipt.c @@ -61,12 +61,13 @@ static int ipt_init_target(struct net *n return 0; }
-static void ipt_destroy_target(struct xt_entry_target *t) +static void ipt_destroy_target(struct xt_entry_target *t, struct net *net) { struct xt_tgdtor_param par = { .target = t->u.kernel.target, .targinfo = t->data, .family = NFPROTO_IPV4, + .net = net, }; if (par.target->destroy != NULL) par.target->destroy(&par); @@ -78,7 +79,7 @@ static void tcf_ipt_release(struct tc_ac struct tcf_ipt *ipt = to_ipt(a);
if (ipt->tcfi_t) { - ipt_destroy_target(ipt->tcfi_t); + ipt_destroy_target(ipt->tcfi_t, a->idrinfo->net); kfree(ipt->tcfi_t); } kfree(ipt->tcfi_tname); @@ -180,7 +181,7 @@ static int __tcf_ipt_init(struct net *ne
spin_lock_bh(&ipt->tcf_lock); if (ret != ACT_P_CREATED) { - ipt_destroy_target(ipt->tcfi_t); + ipt_destroy_target(ipt->tcfi_t, net); kfree(ipt->tcfi_tname); kfree(ipt->tcfi_t); } @@ -350,7 +351,7 @@ static __net_init int ipt_init_net(struc { struct tc_action_net *tn = net_generic(net, ipt_net_id);
- return tc_action_net_init(tn, &act_ipt_ops); + return tc_action_net_init(net, tn, &act_ipt_ops); }
static void __net_exit ipt_exit_net(struct list_head *net_list) @@ -399,7 +400,7 @@ static __net_init int xt_init_net(struct { struct tc_action_net *tn = net_generic(net, xt_net_id);
- return tc_action_net_init(tn, &act_xt_ops); + return tc_action_net_init(net, tn, &act_xt_ops); }
static void __net_exit xt_exit_net(struct list_head *net_list) --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -432,7 +432,7 @@ static __net_init int mirred_init_net(st { struct tc_action_net *tn = net_generic(net, mirred_net_id);
- return tc_action_net_init(tn, &act_mirred_ops); + return tc_action_net_init(net, tn, &act_mirred_ops); }
static void __net_exit mirred_exit_net(struct list_head *net_list) --- a/net/sched/act_nat.c +++ b/net/sched/act_nat.c @@ -327,7 +327,7 @@ static __net_init int nat_init_net(struc { struct tc_action_net *tn = net_generic(net, nat_net_id);
- return tc_action_net_init(tn, &act_nat_ops); + return tc_action_net_init(net, tn, &act_nat_ops); }
static void __net_exit nat_exit_net(struct list_head *net_list) --- a/net/sched/act_pedit.c +++ b/net/sched/act_pedit.c @@ -498,7 +498,7 @@ static __net_init int pedit_init_net(str { struct tc_action_net *tn = net_generic(net, pedit_net_id);
- return tc_action_net_init(tn, &act_pedit_ops); + return tc_action_net_init(net, tn, &act_pedit_ops); }
static void __net_exit pedit_exit_net(struct list_head *net_list) --- a/net/sched/act_police.c +++ b/net/sched/act_police.c @@ -371,7 +371,7 @@ static __net_init int police_init_net(st { struct tc_action_net *tn = net_generic(net, police_net_id);
- return tc_action_net_init(tn, &act_police_ops); + return tc_action_net_init(net, tn, &act_police_ops); }
static void __net_exit police_exit_net(struct list_head *net_list) --- a/net/sched/act_sample.c +++ b/net/sched/act_sample.c @@ -269,7 +269,7 @@ static __net_init int sample_init_net(st { struct tc_action_net *tn = net_generic(net, sample_net_id);
- return tc_action_net_init(tn, &act_sample_ops); + return tc_action_net_init(net, tn, &act_sample_ops); }
static void __net_exit sample_exit_net(struct list_head *net_list) --- a/net/sched/act_simple.c +++ b/net/sched/act_simple.c @@ -232,7 +232,7 @@ static __net_init int simp_init_net(stru { struct tc_action_net *tn = net_generic(net, simp_net_id);
- return tc_action_net_init(tn, &act_simp_ops); + return tc_action_net_init(net, tn, &act_simp_ops); }
static void __net_exit simp_exit_net(struct list_head *net_list) --- a/net/sched/act_skbedit.c +++ b/net/sched/act_skbedit.c @@ -336,7 +336,7 @@ static __net_init int skbedit_init_net(s { struct tc_action_net *tn = net_generic(net, skbedit_net_id);
- return tc_action_net_init(tn, &act_skbedit_ops); + return tc_action_net_init(net, tn, &act_skbedit_ops); }
static void __net_exit skbedit_exit_net(struct list_head *net_list) --- a/net/sched/act_skbmod.c +++ b/net/sched/act_skbmod.c @@ -287,7 +287,7 @@ static __net_init int skbmod_init_net(st { struct tc_action_net *tn = net_generic(net, skbmod_net_id);
- return tc_action_net_init(tn, &act_skbmod_ops); + return tc_action_net_init(net, tn, &act_skbmod_ops); }
static void __net_exit skbmod_exit_net(struct list_head *net_list) --- a/net/sched/act_tunnel_key.c +++ b/net/sched/act_tunnel_key.c @@ -600,7 +600,7 @@ static __net_init int tunnel_key_init_ne { struct tc_action_net *tn = net_generic(net, tunnel_key_net_id);
- return tc_action_net_init(tn, &act_tunnel_key_ops); + return tc_action_net_init(net, tn, &act_tunnel_key_ops); }
static void __net_exit tunnel_key_exit_net(struct list_head *net_list) --- a/net/sched/act_vlan.c +++ b/net/sched/act_vlan.c @@ -334,7 +334,7 @@ static __net_init int vlan_init_net(stru { struct tc_action_net *tn = net_generic(net, vlan_net_id);
- return tc_action_net_init(tn, &act_vlan_ops); + return tc_action_net_init(net, tn, &act_vlan_ops); }
static void __net_exit vlan_exit_net(struct list_head *net_list)
From: Chen-Yu Tsai wens@csie.org
[ Upstream commit 3b25528e1e355c803e73aa326ce657b5606cda73 ]
The devicetree binding lists the phy phy as optional. As such, the driver should not bail out if it can't find a regulator. Instead it should just skip the remaining regulator related code and continue on normally.
Skip the remainder of phy_power_on() if a regulator supply isn't available. This also gets rid of the bogus return code.
Fixes: 2e12f536635f ("net: stmmac: dwmac-rk: Use standard devicetree property for phy regulator") Signed-off-by: Chen-Yu Tsai wens@csie.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c @@ -1194,10 +1194,8 @@ static int phy_power_on(struct rk_priv_d int ret; struct device *dev = &bsp_priv->pdev->dev;
- if (!ldo) { - dev_err(dev, "no regulator found\n"); - return -1; - } + if (!ldo) + return 0;
if (enable) { ret = regulator_enable(ldo);
From: Willem de Bruijn willemb@google.com
[ Upstream commit 888a5c53c0d8be6e98bc85b677f179f77a647873 ]
TCP associates tx timestamp requests with a byte in the bytestream. If merging skbs in tcp_mtu_probe, migrate the tstamp request.
Similar to MSG_EOR, do not allow moving a timestamp from any segment in the probe but the last. This to avoid merging multiple timestamps.
Tested with the packetdrill script at https://github.com/wdebruij/packetdrill/commits/mtu_probe-1
Link: http://patchwork.ozlabs.org/patch/1143278/#2232897 Fixes: 4ed2d765dfac ("net-timestamp: TCP timestamping") Signed-off-by: Willem de Bruijn willemb@google.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_output.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2051,7 +2051,7 @@ static bool tcp_can_coalesce_send_queue_ if (len <= skb->len) break;
- if (unlikely(TCP_SKB_CB(skb)->eor)) + if (unlikely(TCP_SKB_CB(skb)->eor) || tcp_has_tx_tstamp(skb)) return false;
len -= skb->len; @@ -2168,6 +2168,7 @@ static int tcp_mtu_probe(struct sock *sk * we need to propagate it to the new skb. */ TCP_SKB_CB(nskb)->eor = TCP_SKB_CB(skb)->eor; + tcp_skb_collapse_tstamp(nskb, skb); tcp_unlink_write_queue(skb, sk); sk_wmem_free_skb(sk, skb); } else {
From: Eric Dumazet edumazet@google.com
[ Upstream commit fdfc5c8594c24c5df883583ebd286321a80e0a67 ]
Vladimir Rutsky reported stuck TCP sessions after memory pressure events. Edge Trigger epoll() user would never receive an EPOLLOUT notification allowing them to retry a sendmsg().
Jason tested the case of sk_stream_alloc_skb() returning NULL, but there are other paths that could lead both sendmsg() and sendpage() to return -1 (EAGAIN), with an empty skb queued on the write queue.
This patch makes sure we remove this empty skb so that Jason code can detect that the queue is empty, and call sk->sk_write_space(sk) accordingly.
Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Jason Baron jbaron@akamai.com Reported-by: Vladimir Rutsky rutsky@google.com Cc: Soheil Hassas Yeganeh soheil@google.com Cc: Neal Cardwell ncardwell@google.com Acked-by: Soheil Hassas Yeganeh soheil@google.com Acked-by: Neal Cardwell ncardwell@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp.c | 30 ++++++++++++++++++++---------- 1 file changed, 20 insertions(+), 10 deletions(-)
--- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -935,6 +935,22 @@ static int tcp_send_mss(struct sock *sk, return mss_now; }
+/* In some cases, both sendpage() and sendmsg() could have added + * an skb to the write queue, but failed adding payload on it. + * We need to remove it to consume less memory, but more + * importantly be able to generate EPOLLOUT for Edge Trigger epoll() + * users. + */ +static void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb) +{ + if (skb && !skb->len) { + tcp_unlink_write_queue(skb, sk); + if (tcp_write_queue_empty(sk)) + tcp_chrono_stop(sk, TCP_CHRONO_BUSY); + sk_wmem_free_skb(sk, skb); + } +} + ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset, size_t size, int flags) { @@ -1064,6 +1080,7 @@ out: return copied;
do_error: + tcp_remove_empty_skb(sk, tcp_write_queue_tail(sk)); if (copied) goto out; out_err: @@ -1388,18 +1405,11 @@ out_nopush: sock_zerocopy_put(uarg); return copied + copied_syn;
+do_error: + skb = tcp_write_queue_tail(sk); do_fault: - if (!skb->len) { - tcp_unlink_write_queue(skb, sk); - /* It is the one place in all of TCP, except connection - * reset, where we can be unlinking the send_head. - */ - if (tcp_write_queue_empty(sk)) - tcp_chrono_stop(sk, TCP_CHRONO_BUSY); - sk_wmem_free_skb(sk, skb); - } + tcp_remove_empty_skb(sk, skb);
-do_error: if (copied + copied_syn) goto out; out_err:
From: John Hurley john.hurley@netronome.com
[ Upstream commit 739d7c5752b255e89ddbb1b0474f3b88ef5cd343 ]
Internal port TC offload is implemented through user-space applications (such as OvS) by adding filters at egress via TC clsact qdiscs. Indirect block offload support in the NFP driver accepts both ingress qdisc binds and egress binds if the device is an internal port. However, clsact sends bind notification for both ingress and egress block binds which can lead to the driver registering multiple callbacks and receiving multiple notifications of new filters.
Fix this by rejecting ingress block bind callbacks when the port is internal and only adding filter callbacks for egress binds.
Fixes: 4d12ba42787b ("nfp: flower: allow offloading of matches on 'internal' ports") Signed-off-by: John Hurley john.hurley@netronome.com Reviewed-by: Jakub Kicinski jakub.kicinski@netronome.com Signed-off-by: Jakub Kicinski jakub.kicinski@netronome.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/netronome/nfp/flower/offload.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c @@ -1280,9 +1280,10 @@ nfp_flower_setup_indr_tc_block(struct ne struct nfp_flower_priv *priv = app->priv; int err;
- if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS && - !(f->binder_type == TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS && - nfp_flower_internal_port_can_offload(app, netdev))) + if ((f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS && + !nfp_flower_internal_port_can_offload(app, netdev)) || + (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS && + nfp_flower_internal_port_can_offload(app, netdev))) return -EOPNOTSUPP;
switch (f->command) {
From: John Hurley john.hurley@netronome.com
[ Upstream commit e8024cb483abb2b0290b3ef5e34c736e9de2492f ]
Recent code changes to NFP allowed the offload of neighbour entries to FW when the next hop device was an internal port. This allows for offload of tunnel encap when the end-point IP address is applied to such a port.
Unfortunately, the neighbour event handler still rejects events that are not associated with a repr dev and so the firmware neighbour table may get out of sync for internal ports.
Fix this by allowing internal port neighbour events to be correctly processed.
Fixes: 45756dfedab5 ("nfp: flower: allow tunnels to output to internal port") Signed-off-by: John Hurley john.hurley@netronome.com Reviewed-by: Simon Horman simon.horman@netronome.com Reviewed-by: Jakub Kicinski jakub.kicinski@netronome.com Signed-off-by: Jakub Kicinski jakub.kicinski@netronome.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c +++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c @@ -329,13 +329,13 @@ nfp_tun_neigh_event_handler(struct notif
flow.daddr = *(__be32 *)n->primary_key;
- /* Only concerned with route changes for representors. */ - if (!nfp_netdev_is_nfp_repr(n->dev)) - return NOTIFY_DONE; - app_priv = container_of(nb, struct nfp_flower_priv, tun.neigh_nb); app = app_priv->app;
+ if (!nfp_netdev_is_nfp_repr(n->dev) && + !nfp_flower_internal_port_can_offload(app, n->dev)) + return NOTIFY_DONE; + /* Only concerned with changes to routes already added to NFP. */ if (!nfp_tun_has_route(app, flow.daddr)) return NOTIFY_DONE;
From: Hayes Wang hayeswang@realtek.com
[ Upstream commit 49d4b14113cae1410eb4654ada5b9583bad971c4 ]
This reverts commit 0ee1f4734967af8321ecebaf9c74221ace34f2d5.
The commit 0ee1f4734967 ("r8152: napi hangup fix after disconnect") adds a check about RTL8152_UNPLUG to determine if calling napi_disable() is invalid in rtl8152_close(), when rtl8152_disconnect() is called. This avoids to use napi_disable() after calling netif_napi_del().
Howver, commit ffa9fec30ca0 ("r8152: set RTL8152_UNPLUG only for real disconnection") causes that RTL8152_UNPLUG is not always set when calling rtl8152_disconnect(). Therefore, I have to revert commit 0ee1f4734967 ("r8152: napi hangup fix after disconnect"), first. And submit another patch to fix it.
Signed-off-by: Hayes Wang hayeswang@realtek.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/r8152.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -3987,8 +3987,7 @@ static int rtl8152_close(struct net_devi #ifdef CONFIG_PM_SLEEP unregister_pm_notifier(&tp->pm_notifier); #endif - if (!test_bit(RTL8152_UNPLUG, &tp->flags)) - napi_disable(&tp->napi); + napi_disable(&tp->napi); clear_bit(WORK_ENABLE, &tp->flags); usb_kill_urb(tp->intr_urb); cancel_delayed_work_sync(&tp->schedule);
From: Hayes Wang hayeswang@realtek.com
[ Upstream commit 973dc6cfc0e2c43ff29ca5645ceaf1ae694ea110 ]
Remove unnecessary use of netif_napi_del. This also avoids to call napi_disable() after netif_napi_del().
Signed-off-by: Hayes Wang hayeswang@realtek.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/r8152.c | 2 -- 1 file changed, 2 deletions(-)
--- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -5309,7 +5309,6 @@ static int rtl8152_probe(struct usb_inte return 0;
out1: - netif_napi_del(&tp->napi); usb_set_intfdata(intf, NULL); out: free_netdev(netdev); @@ -5327,7 +5326,6 @@ static void rtl8152_disconnect(struct us if (udev->state == USB_STATE_NOTATTACHED) set_bit(RTL8152_UNPLUG, &tp->flags);
- netif_napi_del(&tp->napi); unregister_netdev(tp->netdev); cancel_delayed_work_sync(&tp->hw_phy_work); tp->rtl_ops.unload(tp);
From: Vladimir Oltean olteanv@gmail.com
taprio_init may fail earlier than this line:
list_add(&q->taprio_list, &taprio_list);
i.e. due to the net device not being multi queue.
Attempting to remove q from the global taprio_list when it is not part of it will result in a kernel panic.
Fix it by matching list_add and list_del better to one another in the order of operations. This way we can keep the deletion unconditional and with lower complexity - O(1).
Cc: Leandro Dorileo leandro.maciel.dorileo@intel.com Fixes: 7b9eba7ba0c1 ("net/sched: taprio: fix picos_per_byte miscalculation") Signed-off-by: Vladimir Oltean olteanv@gmail.com Acked-by: Vinicius Costa Gomes vinicius.gomes@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/sch_taprio.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -903,6 +903,10 @@ static int taprio_init(struct Qdisc *sch */ q->clockid = -1;
+ spin_lock(&taprio_list_lock); + list_add(&q->taprio_list, &taprio_list); + spin_unlock(&taprio_list_lock); + if (sch->parent != TC_H_ROOT) return -EOPNOTSUPP;
@@ -920,10 +924,6 @@ static int taprio_init(struct Qdisc *sch if (!opt) return -EINVAL;
- spin_lock(&taprio_list_lock); - list_add(&q->taprio_list, &taprio_list); - spin_unlock(&taprio_list_lock); - for (i = 0; i < dev->num_tx_queues; i++) { struct netdev_queue *dev_queue; struct Qdisc *qdisc;
From: Vladimir Oltean olteanv@gmail.com
The taprio budget needs to be adapted at runtime according to interface link speed. But that handling is problematic.
For one thing, installing a qdisc on an interface that doesn't have carrier is not illegal. But taprio prints the following stack trace:
[ 31.851373] ------------[ cut here ]------------ [ 31.856024] WARNING: CPU: 1 PID: 207 at net/sched/sch_taprio.c:481 taprio_dequeue+0x1a8/0x2d4 [ 31.864566] taprio: dequeue() called with unknown picos per byte. [ 31.864570] Modules linked in: [ 31.873701] CPU: 1 PID: 207 Comm: tc Not tainted 5.3.0-rc5-01199-g8838fe023cd6 #1689 [ 31.881398] Hardware name: Freescale LS1021A [ 31.885661] [<c03133a4>] (unwind_backtrace) from [<c030d8cc>] (show_stack+0x10/0x14) [ 31.893368] [<c030d8cc>] (show_stack) from [<c10ac958>] (dump_stack+0xb4/0xc8) [ 31.900555] [<c10ac958>] (dump_stack) from [<c0349d04>] (__warn+0xe0/0xf8) [ 31.907395] [<c0349d04>] (__warn) from [<c0349d64>] (warn_slowpath_fmt+0x48/0x6c) [ 31.914841] [<c0349d64>] (warn_slowpath_fmt) from [<c0f38db4>] (taprio_dequeue+0x1a8/0x2d4) [ 31.923150] [<c0f38db4>] (taprio_dequeue) from [<c0f227b0>] (__qdisc_run+0x90/0x61c) [ 31.930856] [<c0f227b0>] (__qdisc_run) from [<c0ec82ac>] (net_tx_action+0x12c/0x2bc) [ 31.938560] [<c0ec82ac>] (net_tx_action) from [<c0302298>] (__do_softirq+0x130/0x3c8) [ 31.946350] [<c0302298>] (__do_softirq) from [<c03502a0>] (irq_exit+0xbc/0xd8) [ 31.953536] [<c03502a0>] (irq_exit) from [<c03a4808>] (__handle_domain_irq+0x60/0xb4) [ 31.961328] [<c03a4808>] (__handle_domain_irq) from [<c0754478>] (gic_handle_irq+0x58/0x9c) [ 31.969638] [<c0754478>] (gic_handle_irq) from [<c0301a8c>] (__irq_svc+0x6c/0x90) [ 31.977076] Exception stack(0xe8167b20 to 0xe8167b68) [ 31.982100] 7b20: e9d4bd80 00000cc0 000000cf 00000000 e9d4bd80 c1f38958 00000cc0 c1f38960 [ 31.990234] 7b40: 00000001 000000cf 00000004 e9dc0800 00000000 e8167b70 c0f478ec c0f46d94 [ 31.998363] 7b60: 60070013 ffffffff [ 32.001833] [<c0301a8c>] (__irq_svc) from [<c0f46d94>] (netlink_trim+0x18/0xd8) [ 32.009104] [<c0f46d94>] (netlink_trim) from [<c0f478ec>] (netlink_broadcast_filtered+0x34/0x414) [ 32.017930] [<c0f478ec>] (netlink_broadcast_filtered) from [<c0f47cec>] (netlink_broadcast+0x20/0x28) [ 32.027102] [<c0f47cec>] (netlink_broadcast) from [<c0eea378>] (rtnetlink_send+0x34/0x88) [ 32.035238] [<c0eea378>] (rtnetlink_send) from [<c0f25890>] (notify_and_destroy+0x2c/0x44) [ 32.043461] [<c0f25890>] (notify_and_destroy) from [<c0f25e08>] (qdisc_graft+0x398/0x470) [ 32.051595] [<c0f25e08>] (qdisc_graft) from [<c0f27a00>] (tc_modify_qdisc+0x3a4/0x724) [ 32.059470] [<c0f27a00>] (tc_modify_qdisc) from [<c0ee4c84>] (rtnetlink_rcv_msg+0x260/0x2ec) [ 32.067864] [<c0ee4c84>] (rtnetlink_rcv_msg) from [<c0f4a988>] (netlink_rcv_skb+0xb8/0x110) [ 32.076172] [<c0f4a988>] (netlink_rcv_skb) from [<c0f4a170>] (netlink_unicast+0x1b4/0x22c) [ 32.084392] [<c0f4a170>] (netlink_unicast) from [<c0f4a5e4>] (netlink_sendmsg+0x33c/0x380) [ 32.092614] [<c0f4a5e4>] (netlink_sendmsg) from [<c0ea9f40>] (sock_sendmsg+0x14/0x24) [ 32.100403] [<c0ea9f40>] (sock_sendmsg) from [<c0eaa780>] (___sys_sendmsg+0x214/0x228) [ 32.108279] [<c0eaa780>] (___sys_sendmsg) from [<c0eabad0>] (__sys_sendmsg+0x50/0x8c) [ 32.116068] [<c0eabad0>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x54) [ 32.123938] Exception stack(0xe8167fa8 to 0xe8167ff0) [ 32.128960] 7fa0: b6fa68c8 000000f8 00000003 bea142d0 00000000 00000000 [ 32.137093] 7fc0: b6fa68c8 000000f8 0052154c 00000128 5d6468a2 00000000 00000028 00558c9c [ 32.145224] 7fe0: 00000070 bea14278 00530d64 b6e17e64 [ 32.150659] ---[ end trace 2139c9827c3e5177 ]---
This happens because the qdisc ->dequeue callback gets called. Which again is not illegal, the qdisc will dequeue even when the interface is up but doesn't have carrier (and hence SPEED_UNKNOWN), and the frames will be dropped further down the stack in dev_direct_xmit().
And, at the end of the day, for what? For calculating the initial budget of an interface which is non-operational at the moment and where frames will get dropped anyway.
So if we can't figure out the link speed, default to SPEED_10 and move along. We can also remove the runtime check now.
Cc: Leandro Dorileo leandro.maciel.dorileo@intel.com Fixes: 7b9eba7ba0c1 ("net/sched: taprio: fix picos_per_byte miscalculation") Acked-by: Vinicius Costa Gomes vinicius.gomes@intel.com Signed-off-by: Vladimir Oltean olteanv@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/sch_taprio.c | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-)
--- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -205,11 +205,6 @@ static struct sk_buff *taprio_dequeue(st u32 gate_mask; int i;
- if (atomic64_read(&q->picos_per_byte) == -1) { - WARN_ONCE(1, "taprio: dequeue() called with unknown picos per byte."); - return NULL; - } - rcu_read_lock(); entry = rcu_dereference(q->current_entry); /* if there's no entry, it means that the schedule didn't @@ -665,12 +660,20 @@ static void taprio_set_picos_per_byte(st struct taprio_sched *q) { struct ethtool_link_ksettings ecmd; - int picos_per_byte = -1; + int speed = SPEED_10; + int picos_per_byte; + int err; + + err = __ethtool_get_link_ksettings(dev, &ecmd); + if (err < 0) + goto skip; + + if (ecmd.base.speed != SPEED_UNKNOWN) + speed = ecmd.base.speed;
- if (!__ethtool_get_link_ksettings(dev, &ecmd) && - ecmd.base.speed != SPEED_UNKNOWN) - picos_per_byte = div64_s64(NSEC_PER_SEC * 1000LL * 8, - ecmd.base.speed * 1000 * 1000); +skip: + picos_per_byte = div64_s64(NSEC_PER_SEC * 1000LL * 8, + speed * 1000 * 1000);
atomic64_set(&q->picos_per_byte, picos_per_byte); netdev_dbg(dev, "taprio: set %s's picos_per_byte to: %lld, linkspeed: %d\n",
From: Vladimir Oltean olteanv@gmail.com
The discussion to be made is absolutely the same as in the case of previous patch ("taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte"). Nothing is lost when setting a default.
Cc: Leandro Dorileo leandro.maciel.dorileo@intel.com Fixes: e0a7683d30e9 ("net/sched: cbs: fix port_rate miscalculation") Acked-by: Vinicius Costa Gomes vinicius.gomes@intel.com Signed-off-by: Vladimir Oltean olteanv@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/sch_cbs.c | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-)
--- a/net/sched/sch_cbs.c +++ b/net/sched/sch_cbs.c @@ -181,11 +181,6 @@ static struct sk_buff *cbs_dequeue_soft( s64 credits; int len;
- if (atomic64_read(&q->port_rate) == -1) { - WARN_ONCE(1, "cbs: dequeue() called with unknown port rate."); - return NULL; - } - if (q->credits < 0) { credits = timediff_to_credits(now - q->last, q->idleslope);
@@ -303,11 +298,19 @@ static int cbs_enable_offload(struct net static void cbs_set_port_rate(struct net_device *dev, struct cbs_sched_data *q) { struct ethtool_link_ksettings ecmd; + int speed = SPEED_10; int port_rate = -1; + int err; + + err = __ethtool_get_link_ksettings(dev, &ecmd); + if (err < 0) + goto skip; + + if (ecmd.base.speed != SPEED_UNKNOWN) + speed = ecmd.base.speed;
- if (!__ethtool_get_link_ksettings(dev, &ecmd) && - ecmd.base.speed != SPEED_UNKNOWN) - port_rate = ecmd.base.speed * 1000 * BYTES_PER_KBIT; +skip: + port_rate = speed * 1000 * BYTES_PER_KBIT;
atomic64_set(&q->port_rate, port_rate); netdev_dbg(dev, "cbs: set %s's port_rate to: %lld, linkspeed: %d\n",
From: Marco Hartmann marco.hartmann@nxp.com
[ Upstream commit 2ebb991641d3f64b70fec0156e2b6933810177e9 ]
Commit 34786005eca3 ("net: phy: prevent PHYs w/o Clause 22 regs from calling genphy_config_aneg") introduced a check that aborts phy_config_aneg() if the phy is a C45 phy. This causes phy_state_machine() to call phy_error() so that the phy ends up in PHY_HALTED state.
Instead of returning -EOPNOTSUPP, call genphy_c45_config_aneg() (analogous to the C22 case) so that the state machine can run correctly.
genphy_c45_config_aneg() closely resembles mv3310_config_aneg() in drivers/net/phy/marvell10g.c, excluding vendor specific configurations for 1000BaseT.
Fixes: 22b56e827093 ("net: phy: replace genphy_10g_driver with genphy_c45_driver")
Signed-off-by: Marco Hartmann marco.hartmann@nxp.com Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/phy/phy-c45.c | 26 ++++++++++++++++++++++++++ drivers/net/phy/phy.c | 2 +- include/linux/phy.h | 1 + 3 files changed, 28 insertions(+), 1 deletion(-)
--- a/drivers/net/phy/phy-c45.c +++ b/drivers/net/phy/phy-c45.c @@ -523,6 +523,32 @@ int genphy_c45_read_status(struct phy_de } EXPORT_SYMBOL_GPL(genphy_c45_read_status);
+/** + * genphy_c45_config_aneg - restart auto-negotiation or forced setup + * @phydev: target phy_device struct + * + * Description: If auto-negotiation is enabled, we configure the + * advertising, and then restart auto-negotiation. If it is not + * enabled, then we force a configuration. + */ +int genphy_c45_config_aneg(struct phy_device *phydev) +{ + bool changed = false; + int ret; + + if (phydev->autoneg == AUTONEG_DISABLE) + return genphy_c45_pma_setup_forced(phydev); + + ret = genphy_c45_an_config_aneg(phydev); + if (ret < 0) + return ret; + if (ret > 0) + changed = true; + + return genphy_c45_check_and_restart_aneg(phydev, changed); +} +EXPORT_SYMBOL_GPL(genphy_c45_config_aneg); + /* The gen10g_* functions are the old Clause 45 stub */
int gen10g_config_aneg(struct phy_device *phydev) --- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c @@ -499,7 +499,7 @@ static int phy_config_aneg(struct phy_de * allowed to call genphy_config_aneg() */ if (phydev->is_c45 && !(phydev->c45_ids.devices_in_package & BIT(0))) - return -EOPNOTSUPP; + return genphy_c45_config_aneg(phydev);
return genphy_config_aneg(phydev); } --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1108,6 +1108,7 @@ int genphy_c45_an_disable_aneg(struct ph int genphy_c45_read_mdix(struct phy_device *phydev); int genphy_c45_pma_read_abilities(struct phy_device *phydev); int genphy_c45_read_status(struct phy_device *phydev); +int genphy_c45_config_aneg(struct phy_device *phydev);
/* The gen10g_* functions are the old Clause 45 stub */ int gen10g_config_aneg(struct phy_device *phydev);
From: Vladimir Oltean olteanv@gmail.com
[ Upstream commit bcccb0a535bb99616e4b992568371efab1ab14e8 ]
After witnessing the discussion in https://lkml.org/lkml/2019/8/14/151 w.r.t. ioctl extensibility, it became clear that such an issue might prevent that the 3 RSV bits inside the DSA 802.1Q tag might also suffer the same fate and be useless for further extension.
So clearly specify that the reserved bits should currently be transmitted as zero and ignored on receive. The DSA tagger already does this (and has always did), and is the only known user so far (no Wireshark dissection plugin, etc). So there should be no incompatibility to speak of.
Fixes: 0471dd429cea ("net: dsa: tag_8021q: Create a stable binary format") Signed-off-by: Vladimir Oltean olteanv@gmail.com Reviewed-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/dsa/tag_8021q.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/dsa/tag_8021q.c +++ b/net/dsa/tag_8021q.c @@ -28,6 +28,7 @@ * * RSV - VID[9]: * To be used for further expansion of SWITCH_ID or for other purposes. + * Must be transmitted as zero and ignored on receive. * * SWITCH_ID - VID[8:6]: * Index of switch within DSA tree. Must be between 0 and @@ -35,6 +36,7 @@ * * RSV - VID[5:4]: * To be used for further expansion of PORT or for other purposes. + * Must be transmitted as zero and ignored on receive. * * PORT - VID[3:0]: * Index of switch port. Must be between 0 and DSA_MAX_PORTS - 1.
From: Davide Caratti dcaratti@redhat.com
[ Upstream commit 092e22e586236bba106a82113826a68080a03506 ]
Now that 'TCQ_F_CPUSTATS' bit can be cleared, depending on the value of 'TCQ_F_NOLOCK' bit in the parent qdisc, we can't assume anymore that per-cpu counters are there in the error path of skb_array_produce(). Otherwise, the following splat can be seen:
Unable to handle kernel paging request at virtual address 0000600dea430008 Mem abort info: ESR = 0x96000005 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000005 CM = 0, WnR = 0 user pgtable: 64k pages, 48-bit VAs, pgdp = 000000007b97530e [0000600dea430008] pgd=0000000000000000, pud=0000000000000000 Internal error: Oops: 96000005 [#1] SMP [...] pstate: 10000005 (nzcV daif -PAN -UAO) pc : pfifo_fast_enqueue+0x524/0x6e8 lr : pfifo_fast_enqueue+0x46c/0x6e8 sp : ffff800d39376fe0 x29: ffff800d39376fe0 x28: 1ffff001a07d1e40 x27: ffff800d03e8f188 x26: ffff800d03e8f200 x25: 0000000000000062 x24: ffff800d393772f0 x23: 0000000000000000 x22: 0000000000000403 x21: ffff800cca569a00 x20: ffff800d03e8ee00 x19: ffff800cca569a10 x18: 00000000000000bf x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: ffff1001a726edd0 x13: 1fffe4000276a9a4 x12: 0000000000000000 x11: dfff200000000000 x10: ffff800d03e8f1a0 x9 : 0000000000000003 x8 : 0000000000000000 x7 : 00000000f1f1f1f1 x6 : ffff1001a726edea x5 : ffff800cca56a53c x4 : 1ffff001bf9a8003 x3 : 1ffff001bf9a8003 x2 : 1ffff001a07d1dcb x1 : 0000600dea430000 x0 : 0000600dea430008 Process ping (pid: 6067, stack limit = 0x00000000dc0aa557) Call trace: pfifo_fast_enqueue+0x524/0x6e8 htb_enqueue+0x660/0x10e0 [sch_htb] __dev_queue_xmit+0x123c/0x2de0 dev_queue_xmit+0x24/0x30 ip_finish_output2+0xc48/0x1720 ip_finish_output+0x548/0x9d8 ip_output+0x334/0x788 ip_local_out+0x90/0x138 ip_send_skb+0x44/0x1d0 ip_push_pending_frames+0x5c/0x78 raw_sendmsg+0xed8/0x28d0 inet_sendmsg+0xc4/0x5c0 sock_sendmsg+0xac/0x108 __sys_sendto+0x1ac/0x2a0 __arm64_sys_sendto+0xc4/0x138 el0_svc_handler+0x13c/0x298 el0_svc+0x8/0xc Code: f9402e80 d538d081 91002000 8b010000 (885f7c03)
Fix this by testing the value of 'TCQ_F_CPUSTATS' bit in 'qdisc->flags', before dereferencing 'qdisc->cpu_qstats'.
Fixes: 8a53e616de29 ("net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too") CC: Paolo Abeni pabeni@redhat.com CC: Stefano Brivio sbrivio@redhat.com Reported-by: Li Shuang shuali@redhat.com Signed-off-by: Davide Caratti dcaratti@redhat.com Acked-by: Paolo Abeni pabeni@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/sch_generic.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -624,8 +624,12 @@ static int pfifo_fast_enqueue(struct sk_
err = skb_array_produce(q, skb);
- if (unlikely(err)) - return qdisc_drop_cpu(skb, qdisc, to_free); + if (unlikely(err)) { + if (qdisc_is_percpu_stats(qdisc)) + return qdisc_drop_cpu(skb, qdisc, to_free); + else + return qdisc_drop(skb, qdisc, to_free); + }
qdisc_update_stats_at_enqueue(qdisc, pkt_len); return NET_XMIT_SUCCESS;
From: Davide Caratti dcaratti@redhat.com
[ Upstream commit 04d37cf46a773910f75fefaa9f9488f42bfe1fe2 ]
Now that 'TCQ_F_CPUSTATS' bit can be cleared, depending on the value of 'TCQ_F_NOLOCK' bit in the parent qdisc, we need to be sure that per-cpu counters are present when 'reset()' is called for pfifo_fast qdiscs. Otherwise, the following script:
# tc q a dev lo handle 1: root htb default 100 # tc c a dev lo parent 1: classid 1:100 htb \
rate 95Mbit ceil 100Mbit burst 64k
[...] # tc f a dev lo parent 1: protocol arp basic classid 1:100 [...] # tc q a dev lo parent 1:100 handle 100: pfifo_fast [...] # tc q d dev lo root
can generate the following splat:
Unable to handle kernel paging request at virtual address dfff2c01bd148000 Mem abort info: ESR = 0x96000004 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 [dfff2c01bd148000] address between user and kernel address ranges Internal error: Oops: 96000004 [#1] SMP [...] pstate: 80000005 (Nzcv daif -PAN -UAO) pc : pfifo_fast_reset+0x280/0x4d8 lr : pfifo_fast_reset+0x21c/0x4d8 sp : ffff800d09676fa0 x29: ffff800d09676fa0 x28: ffff200012ee22e4 x27: dfff200000000000 x26: 0000000000000000 x25: ffff800ca0799958 x24: ffff1001940f332b x23: 0000000000000007 x22: ffff200012ee1ab8 x21: 0000600de8a40000 x20: 0000000000000000 x19: ffff800ca0799900 x18: 0000000000000000 x17: 0000000000000002 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: ffff1001b922e6e2 x11: 1ffff001b922e6e1 x10: 0000000000000000 x9 : 1ffff001b922e6e1 x8 : dfff200000000000 x7 : 0000000000000000 x6 : 0000000000000000 x5 : 1fffe400025dc45c x4 : 1fffe400025dc357 x3 : 00000c01bd148000 x2 : 0000600de8a40000 x1 : 0000000000000007 x0 : 0000600de8a40004 Call trace: pfifo_fast_reset+0x280/0x4d8 qdisc_reset+0x6c/0x370 htb_reset+0x150/0x3b8 [sch_htb] qdisc_reset+0x6c/0x370 dev_deactivate_queue.constprop.5+0xe0/0x1a8 dev_deactivate_many+0xd8/0x908 dev_deactivate+0xe4/0x190 qdisc_graft+0x88c/0xbd0 tc_get_qdisc+0x418/0x8a8 rtnetlink_rcv_msg+0x3a8/0xa78 netlink_rcv_skb+0x18c/0x328 rtnetlink_rcv+0x28/0x38 netlink_unicast+0x3c4/0x538 netlink_sendmsg+0x538/0x9a0 sock_sendmsg+0xac/0xf8 ___sys_sendmsg+0x53c/0x658 __sys_sendmsg+0xc8/0x140 __arm64_sys_sendmsg+0x74/0xa8 el0_svc_handler+0x164/0x468 el0_svc+0x10/0x14 Code: 910012a0 92400801 d343fc03 11000c21 (38fb6863)
Fix this by testing the value of 'TCQ_F_CPUSTATS' bit in 'qdisc->flags', before dereferencing 'qdisc->cpu_qstats'.
Changes since v1: - coding style improvements, thanks to Stefano Brivio
Fixes: 8a53e616de29 ("net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too") CC: Paolo Abeni pabeni@redhat.com Reported-by: Li Shuang shuali@redhat.com Signed-off-by: Davide Caratti dcaratti@redhat.com Acked-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/sch_generic.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -692,11 +692,14 @@ static void pfifo_fast_reset(struct Qdis kfree_skb(skb); }
- for_each_possible_cpu(i) { - struct gnet_stats_queue *q = per_cpu_ptr(qdisc->cpu_qstats, i); + if (qdisc_is_percpu_stats(qdisc)) { + for_each_possible_cpu(i) { + struct gnet_stats_queue *q;
- q->backlog = 0; - q->qlen = 0; + q = per_cpu_ptr(qdisc->cpu_qstats, i); + q->backlog = 0; + q->qlen = 0; + } } }
From: Ka-Cheong Poon ka-cheong.poon@oracle.com
[ Upstream commit 7d0a06586b2686ba80c4a2da5f91cb10ffbea736 ]
The rds6_inc_info_copy() function has a couple struct members which are leaking stack information. The ->tos field should hold actual information and the ->flags field needs to be zeroed out.
Fixes: 3eb450367d08 ("rds: add type of service(tos) infrastructure") Fixes: b7ff8b1036f0 ("rds: Extend RDS API for IPv6 support") Reported-by: 黄ID蝴蝶 butterflyhuangxx@gmail.com Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Ka-Cheong Poon ka-cheong.poon@oracle.com Acked-by: Santosh Shilimkar santosh.shilimkar@oracle.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/rds/recv.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/net/rds/recv.c +++ b/net/rds/recv.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2018 Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2006, 2019 Oracle and/or its affiliates. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -811,6 +811,7 @@ void rds6_inc_info_copy(struct rds_incom
minfo6.seq = be64_to_cpu(inc->i_hdr.h_sequence); minfo6.len = be32_to_cpu(inc->i_hdr.h_len); + minfo6.tos = inc->i_conn->c_tos;
if (flip) { minfo6.laddr = *daddr; @@ -824,6 +825,8 @@ void rds6_inc_info_copy(struct rds_incom minfo6.fport = inc->i_hdr.h_dport; }
+ minfo6.flags = 0; + rds_info_copy(iter, &minfo6, sizeof(minfo6)); } #endif
[ Upstream commit fa3a03da549a889fc9dbc0d3c5908eb7882cac8f ]
The bucket variable is only updated outside the loop over the mcast_flags buckets. It will only be updated during a dumping run when the dumping has to be interrupted and a new message has to be started.
This could result in repeated or missing entries when the multicast flags are dumped to userspace.
Fixes: d2d489b7d851 ("batman-adv: Add inconsistent multicast netlink dump detection") Signed-off-by: Sven Eckelmann sven@narfation.org Signed-off-by: Simon Wunderlich sw@simonwunderlich.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/batman-adv/multicast.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c index ec54e236e3454..50fe9dfb088b6 100644 --- a/net/batman-adv/multicast.c +++ b/net/batman-adv/multicast.c @@ -1653,7 +1653,7 @@ __batadv_mcast_flags_dump(struct sk_buff *msg, u32 portid,
while (bucket_tmp < hash->size) { if (batadv_mcast_flags_dump_bucket(msg, portid, cb, hash, - *bucket, &idx_tmp)) + bucket_tmp, &idx_tmp)) break;
bucket_tmp++;
[ Upstream commit 5d01ab7bac467edfc530e6ccf953921def935c62 ]
Libbpf stores associated BTF FD per each instance of bpf_program. When program is unloaded, that FD is closed. This is wrong, because leads to a race and possibly closing of unrelated files, if application simultaneously opens new files while bpf_programs are unloaded.
It's also unnecessary, because struct btf "owns" that FD, and btf__free(), called from bpf_object__close() will close it. Thus the fix is to never have per-program BTF FD and fetch it from obj->btf, when necessary.
Fixes: 2993e0515bb4 ("tools/bpf: add support to read .BTF.ext sections") Reported-by: Andrey Ignatov rdna@fb.com Signed-off-by: Andrii Nakryiko andriin@fb.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/lib/bpf/libbpf.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 77e14d9954796..21355f6be2434 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -178,7 +178,6 @@ struct bpf_program { bpf_program_clear_priv_t clear_priv;
enum bpf_attach_type expected_attach_type; - int btf_fd; void *func_info; __u32 func_info_rec_size; __u32 func_info_cnt; @@ -305,7 +304,6 @@ void bpf_program__unload(struct bpf_program *prog) prog->instances.nr = -1; zfree(&prog->instances.fds);
- zclose(prog->btf_fd); zfree(&prog->func_info); zfree(&prog->line_info); } @@ -382,7 +380,6 @@ bpf_program__init(void *data, size_t size, char *section_name, int idx, prog->instances.fds = NULL; prog->instances.nr = -1; prog->type = BPF_PROG_TYPE_UNSPEC; - prog->btf_fd = -1;
return 0; errout: @@ -1888,9 +1885,6 @@ bpf_program_reloc_btf_ext(struct bpf_program *prog, struct bpf_object *obj, prog->line_info_rec_size = btf_ext__line_info_rec_size(obj->btf_ext); }
- if (!insn_offset) - prog->btf_fd = btf__fd(obj->btf); - return 0; }
@@ -2065,7 +2059,7 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt, char *cp, errmsg[STRERR_BUFSIZE]; int log_buf_size = BPF_LOG_BUF_SIZE; char *log_buf; - int ret; + int btf_fd, ret;
memset(&load_attr, 0, sizeof(struct bpf_load_program_attr)); load_attr.prog_type = prog->type; @@ -2077,7 +2071,8 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt, load_attr.license = license; load_attr.kern_version = kern_version; load_attr.prog_ifindex = prog->prog_ifindex; - load_attr.prog_btf_fd = prog->btf_fd >= 0 ? prog->btf_fd : 0; + btf_fd = bpf_object__btf_fd(prog->obj); + load_attr.prog_btf_fd = btf_fd >= 0 ? btf_fd : 0; load_attr.func_info = prog->func_info; load_attr.func_info_rec_size = prog->func_info_rec_size; load_attr.func_info_cnt = prog->func_info_cnt;
[ Upstream commit 3415ec643e7bd644b03026efbe2f2b36cbe9b34b ]
5d01ab7bac46 ("libbpf: fix erroneous multi-closing of BTF FD") introduced backwards-compatibility issue, manifesting itself as -E2BIG error returned on program load due to unknown non-zero btf_fd attribute value for BPF_PROG_LOAD sys_bpf() sub-command.
This patch fixes bug by ensuring that we only ever associate BTF FD with program if there is a BTF.ext data that was successfully loaded into kernel, which automatically means kernel supports func_info/line_info and associated BTF FD for progs (checked and ensured also by BTF sanitization code).
Fixes: 5d01ab7bac46 ("libbpf: fix erroneous multi-closing of BTF FD") Reported-by: Andrey Ignatov rdna@fb.com Signed-off-by: Andrii Nakryiko andriin@fb.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/lib/bpf/libbpf.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 21355f6be2434..0ccf6aa533ae9 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -2071,7 +2071,11 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt, load_attr.license = license; load_attr.kern_version = kern_version; load_attr.prog_ifindex = prog->prog_ifindex; - btf_fd = bpf_object__btf_fd(prog->obj); + /* if .BTF.ext was loaded, kernel supports associated BTF for prog */ + if (prog->obj->btf_ext) + btf_fd = bpf_object__btf_fd(prog->obj); + else + btf_fd = -1; load_attr.prog_btf_fd = btf_fd >= 0 ? btf_fd : 0; load_attr.func_info = prog->func_info; load_attr.func_info_rec_size = prog->func_info_rec_size;
[ Upstream commit 589b474a4b7ce409d6821ef17234a995841bd131 ]
This makes the previously added 'encap test' pass. Because its possible that the xfrm dst entry becomes stale while such a flow is offloaded, we need to call dst_check() -- the notifier that handles this for non-tunneled traffic isn't sufficient, because SA or or policies might have changed.
If dst becomes stale the flow offload entry will be tagged for teardown and packets will be passed to 'classic' forwarding path.
Removing the entry right away is problematic, as this would introduce a race condition with the gc worker.
In case flow is long-lived, it could eventually be offloaded again once the gc worker removes the entry from the flow table.
Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_flow_table_ip.c | 43 ++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+)
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c index cdfc33517e85b..d68c801dd614b 100644 --- a/net/netfilter/nf_flow_table_ip.c +++ b/net/netfilter/nf_flow_table_ip.c @@ -214,6 +214,25 @@ static bool nf_flow_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu) return true; }
+static int nf_flow_offload_dst_check(struct dst_entry *dst) +{ + if (unlikely(dst_xfrm(dst))) + return dst_check(dst, 0) ? 0 : -1; + + return 0; +} + +static unsigned int nf_flow_xmit_xfrm(struct sk_buff *skb, + const struct nf_hook_state *state, + struct dst_entry *dst) +{ + skb_orphan(skb); + skb_dst_set_noref(skb, dst); + skb->tstamp = 0; + dst_output(state->net, state->sk, skb); + return NF_STOLEN; +} + unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) @@ -254,6 +273,11 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb, if (nf_flow_state_check(flow, ip_hdr(skb)->protocol, skb, thoff)) return NF_ACCEPT;
+ if (nf_flow_offload_dst_check(&rt->dst)) { + flow_offload_teardown(flow); + return NF_ACCEPT; + } + if (nf_flow_nat_ip(flow, skb, thoff, dir) < 0) return NF_DROP;
@@ -261,6 +285,13 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb, iph = ip_hdr(skb); ip_decrease_ttl(iph);
+ if (unlikely(dst_xfrm(&rt->dst))) { + memset(skb->cb, 0, sizeof(struct inet_skb_parm)); + IPCB(skb)->iif = skb->dev->ifindex; + IPCB(skb)->flags = IPSKB_FORWARDED; + return nf_flow_xmit_xfrm(skb, state, &rt->dst); + } + skb->dev = outdev; nexthop = rt_nexthop(rt, flow->tuplehash[!dir].tuple.src_v4.s_addr); skb_dst_set_noref(skb, &rt->dst); @@ -467,6 +498,11 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb, sizeof(*ip6h))) return NF_ACCEPT;
+ if (nf_flow_offload_dst_check(&rt->dst)) { + flow_offload_teardown(flow); + return NF_ACCEPT; + } + if (skb_try_make_writable(skb, sizeof(*ip6h))) return NF_DROP;
@@ -477,6 +513,13 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb, ip6h = ipv6_hdr(skb); ip6h->hop_limit--;
+ if (unlikely(dst_xfrm(&rt->dst))) { + memset(skb->cb, 0, sizeof(struct inet6_skb_parm)); + IP6CB(skb)->iif = skb->dev->ifindex; + IP6CB(skb)->flags = IP6SKB_FORWARDED; + return nf_flow_xmit_xfrm(skb, state, &rt->dst); + } + skb->dev = outdev; nexthop = rt6_nexthop(rt, &flow->tuplehash[!dir].tuple.src_v6); skb_dst_set_noref(skb, &rt->dst);
[ Upstream commit 276d197e70bcc47153592f4384675b51c7d83aba ]
CQE recovery function begins with test and set of recovery bit. Add an error flow which ensures clearing of this bit when leaving the recovery function, to allow further recoveries to take place. This allows removal of clearing recovery bit on sq activate.
Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support") Signed-off-by: Aya Levin ayal@mellanox.com Reviewed-by: Tariq Toukan tariqt@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 12 ++++++++---- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 - 2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c index c1caf14bc3346..c7f86453c6384 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c @@ -80,17 +80,17 @@ static int mlx5e_tx_reporter_err_cqe_recover(struct mlx5e_txqsq *sq) if (err) { netdev_err(dev, "Failed to query SQ 0x%x state. err = %d\n", sq->sqn, err); - return err; + goto out; }
if (state != MLX5_SQC_STATE_ERR) - return 0; + goto out;
mlx5e_tx_disable_queue(sq->txq);
err = mlx5e_wait_for_sq_flush(sq); if (err) - return err; + goto out;
/* At this point, no new packets will arrive from the stack as TXQ is * marked with QUEUE_STATE_DRV_XOFF. In addition, NAPI cleared all @@ -99,13 +99,17 @@ static int mlx5e_tx_reporter_err_cqe_recover(struct mlx5e_txqsq *sq)
err = mlx5e_sq_to_ready(sq, state); if (err) - return err; + goto out;
mlx5e_reset_txqsq_cc_pc(sq); sq->stats->recover++; + clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state); mlx5e_activate_txqsq(sq);
return 0; +out: + clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state); + return err; }
static int mlx5_tx_health_report(struct devlink_health_reporter *tx_reporter, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 882d26b8095da..bbdfdaf06391a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -1279,7 +1279,6 @@ err_free_txqsq: void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq) { sq->txq = netdev_get_tx_queue(sq->channel->netdev, sq->txq_ix); - clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state); set_bit(MLX5E_SQ_STATE_ENABLED, &sq->state); netdev_tx_reset_queue(sq->txq); netif_tx_start_queue(sq->txq);
[ Upstream commit bf32e7dbfce87d518c0ca77af890eae9ab8d6ab9 ]
In order to make it easier in subsequent patch to create different subcmu lists for exynos5420 and exynos5800 SoCs the code is rewritten so we pass an array of pointers to the subcmus initialization function.
Fixes: b06a532bf1fa ("clk: samsung: Add Exynos5 sub-CMU clock driver") Tested-by: Jaafar Ali jaafarkhalaf@gmail.com Signed-off-by: Sylwester Nawrocki s.nawrocki@samsung.com Link: https://lkml.kernel.org/r/20190808144929.18685-1-s.nawrocki@samsung.com Reviewed-by: Marek Szyprowski m.szyprowski@samsung.com Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/samsung/clk-exynos5-subcmu.c | 16 +++---- drivers/clk/samsung/clk-exynos5-subcmu.h | 2 +- drivers/clk/samsung/clk-exynos5250.c | 7 ++- drivers/clk/samsung/clk-exynos5420.c | 60 ++++++++++++++---------- 4 files changed, 49 insertions(+), 36 deletions(-)
diff --git a/drivers/clk/samsung/clk-exynos5-subcmu.c b/drivers/clk/samsung/clk-exynos5-subcmu.c index 91db7894125df..65c82d922b05c 100644 --- a/drivers/clk/samsung/clk-exynos5-subcmu.c +++ b/drivers/clk/samsung/clk-exynos5-subcmu.c @@ -14,7 +14,7 @@ #include "clk-exynos5-subcmu.h"
static struct samsung_clk_provider *ctx; -static const struct exynos5_subcmu_info *cmu; +static const struct exynos5_subcmu_info **cmu; static int nr_cmus;
static void exynos5_subcmu_clk_save(void __iomem *base, @@ -56,17 +56,17 @@ static void exynos5_subcmu_defer_gate(struct samsung_clk_provider *ctx, * when OF-core populates all device-tree nodes. */ void exynos5_subcmus_init(struct samsung_clk_provider *_ctx, int _nr_cmus, - const struct exynos5_subcmu_info *_cmu) + const struct exynos5_subcmu_info **_cmu) { ctx = _ctx; cmu = _cmu; nr_cmus = _nr_cmus;
for (; _nr_cmus--; _cmu++) { - exynos5_subcmu_defer_gate(ctx, _cmu->gate_clks, - _cmu->nr_gate_clks); - exynos5_subcmu_clk_save(ctx->reg_base, _cmu->suspend_regs, - _cmu->nr_suspend_regs); + exynos5_subcmu_defer_gate(ctx, (*_cmu)->gate_clks, + (*_cmu)->nr_gate_clks); + exynos5_subcmu_clk_save(ctx->reg_base, (*_cmu)->suspend_regs, + (*_cmu)->nr_suspend_regs); } }
@@ -163,9 +163,9 @@ static int __init exynos5_clk_probe(struct platform_device *pdev) if (of_property_read_string(np, "label", &name) < 0) continue; for (i = 0; i < nr_cmus; i++) - if (strcmp(cmu[i].pd_name, name) == 0) + if (strcmp(cmu[i]->pd_name, name) == 0) exynos5_clk_register_subcmu(&pdev->dev, - &cmu[i], np); + cmu[i], np); } return 0; } diff --git a/drivers/clk/samsung/clk-exynos5-subcmu.h b/drivers/clk/samsung/clk-exynos5-subcmu.h index 755ee8aaa3de5..9ae5356f25aa4 100644 --- a/drivers/clk/samsung/clk-exynos5-subcmu.h +++ b/drivers/clk/samsung/clk-exynos5-subcmu.h @@ -21,6 +21,6 @@ struct exynos5_subcmu_info { };
void exynos5_subcmus_init(struct samsung_clk_provider *ctx, int nr_cmus, - const struct exynos5_subcmu_info *cmu); + const struct exynos5_subcmu_info **cmu);
#endif diff --git a/drivers/clk/samsung/clk-exynos5250.c b/drivers/clk/samsung/clk-exynos5250.c index f2b8968817682..931c70a4da196 100644 --- a/drivers/clk/samsung/clk-exynos5250.c +++ b/drivers/clk/samsung/clk-exynos5250.c @@ -681,6 +681,10 @@ static const struct exynos5_subcmu_info exynos5250_disp_subcmu = { .pd_name = "DISP1", };
+static const struct exynos5_subcmu_info *exynos5250_subcmus[] = { + &exynos5250_disp_subcmu, +}; + static const struct samsung_pll_rate_table vpll_24mhz_tbl[] __initconst = { /* sorted in descending order */ /* PLL_36XX_RATE(rate, m, p, s, k) */ @@ -843,7 +847,8 @@ static void __init exynos5250_clk_init(struct device_node *np)
samsung_clk_sleep_init(reg_base, exynos5250_clk_regs, ARRAY_SIZE(exynos5250_clk_regs)); - exynos5_subcmus_init(ctx, 1, &exynos5250_disp_subcmu); + exynos5_subcmus_init(ctx, ARRAY_SIZE(exynos5250_subcmus), + exynos5250_subcmus);
samsung_clk_of_add_provider(np, ctx);
diff --git a/drivers/clk/samsung/clk-exynos5420.c b/drivers/clk/samsung/clk-exynos5420.c index 12d800fd95286..a6ea5d7e63d02 100644 --- a/drivers/clk/samsung/clk-exynos5420.c +++ b/drivers/clk/samsung/clk-exynos5420.c @@ -1232,32 +1232,40 @@ static struct exynos5_subcmu_reg_dump exynos5x_mfc_suspend_regs[] = { { DIV4_RATIO, 0, 0x3 }, /* DIV dout_mfc_blk */ };
-static const struct exynos5_subcmu_info exynos5x_subcmus[] = { - { - .div_clks = exynos5x_disp_div_clks, - .nr_div_clks = ARRAY_SIZE(exynos5x_disp_div_clks), - .gate_clks = exynos5x_disp_gate_clks, - .nr_gate_clks = ARRAY_SIZE(exynos5x_disp_gate_clks), - .suspend_regs = exynos5x_disp_suspend_regs, - .nr_suspend_regs = ARRAY_SIZE(exynos5x_disp_suspend_regs), - .pd_name = "DISP", - }, { - .div_clks = exynos5x_gsc_div_clks, - .nr_div_clks = ARRAY_SIZE(exynos5x_gsc_div_clks), - .gate_clks = exynos5x_gsc_gate_clks, - .nr_gate_clks = ARRAY_SIZE(exynos5x_gsc_gate_clks), - .suspend_regs = exynos5x_gsc_suspend_regs, - .nr_suspend_regs = ARRAY_SIZE(exynos5x_gsc_suspend_regs), - .pd_name = "GSC", - }, { - .div_clks = exynos5x_mfc_div_clks, - .nr_div_clks = ARRAY_SIZE(exynos5x_mfc_div_clks), - .gate_clks = exynos5x_mfc_gate_clks, - .nr_gate_clks = ARRAY_SIZE(exynos5x_mfc_gate_clks), - .suspend_regs = exynos5x_mfc_suspend_regs, - .nr_suspend_regs = ARRAY_SIZE(exynos5x_mfc_suspend_regs), - .pd_name = "MFC", - }, +static const struct exynos5_subcmu_info exynos5x_disp_subcmu = { + .div_clks = exynos5x_disp_div_clks, + .nr_div_clks = ARRAY_SIZE(exynos5x_disp_div_clks), + .gate_clks = exynos5x_disp_gate_clks, + .nr_gate_clks = ARRAY_SIZE(exynos5x_disp_gate_clks), + .suspend_regs = exynos5x_disp_suspend_regs, + .nr_suspend_regs = ARRAY_SIZE(exynos5x_disp_suspend_regs), + .pd_name = "DISP", +}; + +static const struct exynos5_subcmu_info exynos5x_gsc_subcmu = { + .div_clks = exynos5x_gsc_div_clks, + .nr_div_clks = ARRAY_SIZE(exynos5x_gsc_div_clks), + .gate_clks = exynos5x_gsc_gate_clks, + .nr_gate_clks = ARRAY_SIZE(exynos5x_gsc_gate_clks), + .suspend_regs = exynos5x_gsc_suspend_regs, + .nr_suspend_regs = ARRAY_SIZE(exynos5x_gsc_suspend_regs), + .pd_name = "GSC", +}; + +static const struct exynos5_subcmu_info exynos5x_mfc_subcmu = { + .div_clks = exynos5x_mfc_div_clks, + .nr_div_clks = ARRAY_SIZE(exynos5x_mfc_div_clks), + .gate_clks = exynos5x_mfc_gate_clks, + .nr_gate_clks = ARRAY_SIZE(exynos5x_mfc_gate_clks), + .suspend_regs = exynos5x_mfc_suspend_regs, + .nr_suspend_regs = ARRAY_SIZE(exynos5x_mfc_suspend_regs), + .pd_name = "MFC", +}; + +static const struct exynos5_subcmu_info *exynos5x_subcmus[] = { + &exynos5x_disp_subcmu, + &exynos5x_gsc_subcmu, + &exynos5x_mfc_subcmu, };
static const struct samsung_pll_rate_table exynos5420_pll2550x_24mhz_tbl[] __initconst = {
[ Upstream commit b6adeb6bc61c2567b9efd815d61a61b34a2e51a6 ]
This patch fixes broken sound on Exynos5422/5800 platforms after system/suspend resume cycle in cases where the audio root clock is derived from MAU_EPLL_CLK.
In order to preserve state of the USER_MUX_MAU_EPLL_CLK clock mux during system suspend/resume cycle for Exynos5800 we group the MAU block input clocks in "MAU" sub-CMU and add the clock mux control bit to .suspend_regs. This ensures that user configuration of the mux is not lost after the PMU block changes the mux setting to OSC_DIV when switching off the MAU power domain.
Adding the SRC_TOP9 register to exynos5800_clk_regs[] array is not sufficient as at the time of the syscore_ops suspend call MAU power domain is already turned off and we already save and subsequently restore an incorrect register's value.
Fixes: b06a532bf1fa ("clk: samsung: Add Exynos5 sub-CMU clock driver") Reported-by: Jaafar Ali jaafarkhalaf@gmail.com Suggested-by: Marek Szyprowski m.szyprowski@samsung.com Tested-by: Jaafar Ali jaafarkhalaf@gmail.com Signed-off-by: Sylwester Nawrocki s.nawrocki@samsung.com Link: https://lkml.kernel.org/r/20190808144929.18685-2-s.nawrocki@samsung.com Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/samsung/clk-exynos5420.c | 54 ++++++++++++++++++++++------ 1 file changed, 43 insertions(+), 11 deletions(-)
diff --git a/drivers/clk/samsung/clk-exynos5420.c b/drivers/clk/samsung/clk-exynos5420.c index a6ea5d7e63d02..5eb0ce4b2648b 100644 --- a/drivers/clk/samsung/clk-exynos5420.c +++ b/drivers/clk/samsung/clk-exynos5420.c @@ -524,8 +524,6 @@ static const struct samsung_gate_clock exynos5800_gate_clks[] __initconst = { GATE_BUS_TOP, 24, 0, 0), GATE(CLK_ACLK432_SCALER, "aclk432_scaler", "mout_user_aclk432_scaler", GATE_BUS_TOP, 27, CLK_IS_CRITICAL, 0), - GATE(CLK_MAU_EPLL, "mau_epll", "mout_user_mau_epll", - SRC_MASK_TOP7, 20, CLK_SET_RATE_PARENT, 0), };
static const struct samsung_mux_clock exynos5420_mux_clks[] __initconst = { @@ -567,8 +565,13 @@ static const struct samsung_div_clock exynos5420_div_clks[] __initconst = {
static const struct samsung_gate_clock exynos5420_gate_clks[] __initconst = { GATE(CLK_SECKEY, "seckey", "aclk66_psgen", GATE_BUS_PERIS1, 1, 0, 0), + /* Maudio Block */ GATE(CLK_MAU_EPLL, "mau_epll", "mout_mau_epll_clk", SRC_MASK_TOP7, 20, CLK_SET_RATE_PARENT, 0), + GATE(CLK_SCLK_MAUDIO0, "sclk_maudio0", "dout_maudio0", + GATE_TOP_SCLK_MAU, 0, CLK_SET_RATE_PARENT, 0), + GATE(CLK_SCLK_MAUPCM0, "sclk_maupcm0", "dout_maupcm0", + GATE_TOP_SCLK_MAU, 1, CLK_SET_RATE_PARENT, 0), };
static const struct samsung_mux_clock exynos5x_mux_clks[] __initconst = { @@ -994,12 +997,6 @@ static const struct samsung_gate_clock exynos5x_gate_clks[] __initconst = { GATE(CLK_SCLK_DP1, "sclk_dp1", "dout_dp1", GATE_TOP_SCLK_DISP1, 20, CLK_SET_RATE_PARENT, 0),
- /* Maudio Block */ - GATE(CLK_SCLK_MAUDIO0, "sclk_maudio0", "dout_maudio0", - GATE_TOP_SCLK_MAU, 0, CLK_SET_RATE_PARENT, 0), - GATE(CLK_SCLK_MAUPCM0, "sclk_maupcm0", "dout_maupcm0", - GATE_TOP_SCLK_MAU, 1, CLK_SET_RATE_PARENT, 0), - /* FSYS Block */ GATE(CLK_TSI, "tsi", "aclk200_fsys", GATE_BUS_FSYS0, 0, 0, 0), GATE(CLK_PDMA0, "pdma0", "aclk200_fsys", GATE_BUS_FSYS0, 1, 0, 0), @@ -1232,6 +1229,20 @@ static struct exynos5_subcmu_reg_dump exynos5x_mfc_suspend_regs[] = { { DIV4_RATIO, 0, 0x3 }, /* DIV dout_mfc_blk */ };
+ +static const struct samsung_gate_clock exynos5800_mau_gate_clks[] __initconst = { + GATE(CLK_MAU_EPLL, "mau_epll", "mout_user_mau_epll", + SRC_MASK_TOP7, 20, CLK_SET_RATE_PARENT, 0), + GATE(CLK_SCLK_MAUDIO0, "sclk_maudio0", "dout_maudio0", + GATE_TOP_SCLK_MAU, 0, CLK_SET_RATE_PARENT, 0), + GATE(CLK_SCLK_MAUPCM0, "sclk_maupcm0", "dout_maupcm0", + GATE_TOP_SCLK_MAU, 1, CLK_SET_RATE_PARENT, 0), +}; + +static struct exynos5_subcmu_reg_dump exynos5800_mau_suspend_regs[] = { + { SRC_TOP9, 0, BIT(8) }, /* MUX mout_user_mau_epll */ +}; + static const struct exynos5_subcmu_info exynos5x_disp_subcmu = { .div_clks = exynos5x_disp_div_clks, .nr_div_clks = ARRAY_SIZE(exynos5x_disp_div_clks), @@ -1262,12 +1273,27 @@ static const struct exynos5_subcmu_info exynos5x_mfc_subcmu = { .pd_name = "MFC", };
+static const struct exynos5_subcmu_info exynos5800_mau_subcmu = { + .gate_clks = exynos5800_mau_gate_clks, + .nr_gate_clks = ARRAY_SIZE(exynos5800_mau_gate_clks), + .suspend_regs = exynos5800_mau_suspend_regs, + .nr_suspend_regs = ARRAY_SIZE(exynos5800_mau_suspend_regs), + .pd_name = "MAU", +}; + static const struct exynos5_subcmu_info *exynos5x_subcmus[] = { &exynos5x_disp_subcmu, &exynos5x_gsc_subcmu, &exynos5x_mfc_subcmu, };
+static const struct exynos5_subcmu_info *exynos5800_subcmus[] = { + &exynos5x_disp_subcmu, + &exynos5x_gsc_subcmu, + &exynos5x_mfc_subcmu, + &exynos5800_mau_subcmu, +}; + static const struct samsung_pll_rate_table exynos5420_pll2550x_24mhz_tbl[] __initconst = { PLL_35XX_RATE(24 * MHZ, 2000000000, 250, 3, 0), PLL_35XX_RATE(24 * MHZ, 1900000000, 475, 6, 0), @@ -1483,11 +1509,17 @@ static void __init exynos5x_clk_init(struct device_node *np, samsung_clk_extended_sleep_init(reg_base, exynos5x_clk_regs, ARRAY_SIZE(exynos5x_clk_regs), exynos5420_set_clksrc, ARRAY_SIZE(exynos5420_set_clksrc)); - if (soc == EXYNOS5800) + + if (soc == EXYNOS5800) { samsung_clk_sleep_init(reg_base, exynos5800_clk_regs, ARRAY_SIZE(exynos5800_clk_regs)); - exynos5_subcmus_init(ctx, ARRAY_SIZE(exynos5x_subcmus), - exynos5x_subcmus); + + exynos5_subcmus_init(ctx, ARRAY_SIZE(exynos5800_subcmus), + exynos5800_subcmus); + } else { + exynos5_subcmus_init(ctx, ARRAY_SIZE(exynos5x_subcmus), + exynos5x_subcmus); + }
samsung_clk_of_add_provider(np, ctx); }
[ Upstream commit baf7b79e1ad79a41fafd8ab8597b9a96962d822d ]
M2M scaler clocks require special handling of their parent bus clock during power domain on/off sequences. MSCL clocks were not initially added to the sub-CMU handler, because that time there was no driver for the M2M scaler device and it was not possible to test it.
This patch fixes this issue. Parent clock for M2M scaler devices is now properly preserved during MSC power domain on/off sequence. This gives M2M scaler devices proper performance: fullHD XRGB32 image 1000 rotations test takes 3.17s instead of 45.08s.
Fixes: b06a532bf1fa ("clk: samsung: Add Exynos5 sub-CMU clock driver") Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com Link: https://lkml.kernel.org/r/20190808121839.23892-1-m.szyprowski@samsung.com Acked-by: Sylwester Nawrocki s.nawrocki@samsung.com Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/samsung/clk-exynos5420.c | 48 ++++++++++++++++++++-------- 1 file changed, 34 insertions(+), 14 deletions(-)
diff --git a/drivers/clk/samsung/clk-exynos5420.c b/drivers/clk/samsung/clk-exynos5420.c index 5eb0ce4b2648b..893697e00d2aa 100644 --- a/drivers/clk/samsung/clk-exynos5420.c +++ b/drivers/clk/samsung/clk-exynos5420.c @@ -870,9 +870,6 @@ static const struct samsung_div_clock exynos5x_div_clks[] __initconst = { /* GSCL Block */ DIV(0, "dout_gscl_blk_333", "aclk333_432_gscl", DIV2_RATIO0, 6, 2),
- /* MSCL Block */ - DIV(0, "dout_mscl_blk", "aclk400_mscl", DIV2_RATIO0, 28, 2), - /* PSGEN */ DIV(0, "dout_gen_blk", "mout_user_aclk266", DIV2_RATIO0, 8, 1), DIV(0, "dout_jpg_blk", "aclk166", DIV2_RATIO0, 20, 1), @@ -1136,17 +1133,6 @@ static const struct samsung_gate_clock exynos5x_gate_clks[] __initconst = { GATE(CLK_FIMC_LITE3, "fimc_lite3", "aclk333_432_gscl", GATE_IP_GSCL1, 17, 0, 0),
- /* MSCL Block */ - GATE(CLK_MSCL0, "mscl0", "aclk400_mscl", GATE_IP_MSCL, 0, 0, 0), - GATE(CLK_MSCL1, "mscl1", "aclk400_mscl", GATE_IP_MSCL, 1, 0, 0), - GATE(CLK_MSCL2, "mscl2", "aclk400_mscl", GATE_IP_MSCL, 2, 0, 0), - GATE(CLK_SMMU_MSCL0, "smmu_mscl0", "dout_mscl_blk", - GATE_IP_MSCL, 8, 0, 0), - GATE(CLK_SMMU_MSCL1, "smmu_mscl1", "dout_mscl_blk", - GATE_IP_MSCL, 9, 0, 0), - GATE(CLK_SMMU_MSCL2, "smmu_mscl2", "dout_mscl_blk", - GATE_IP_MSCL, 10, 0, 0), - /* ISP */ GATE(CLK_SCLK_UART_ISP, "sclk_uart_isp", "dout_uart_isp", GATE_TOP_SCLK_ISP, 0, CLK_SET_RATE_PARENT, 0), @@ -1229,6 +1215,28 @@ static struct exynos5_subcmu_reg_dump exynos5x_mfc_suspend_regs[] = { { DIV4_RATIO, 0, 0x3 }, /* DIV dout_mfc_blk */ };
+static const struct samsung_gate_clock exynos5x_mscl_gate_clks[] __initconst = { + /* MSCL Block */ + GATE(CLK_MSCL0, "mscl0", "aclk400_mscl", GATE_IP_MSCL, 0, 0, 0), + GATE(CLK_MSCL1, "mscl1", "aclk400_mscl", GATE_IP_MSCL, 1, 0, 0), + GATE(CLK_MSCL2, "mscl2", "aclk400_mscl", GATE_IP_MSCL, 2, 0, 0), + GATE(CLK_SMMU_MSCL0, "smmu_mscl0", "dout_mscl_blk", + GATE_IP_MSCL, 8, 0, 0), + GATE(CLK_SMMU_MSCL1, "smmu_mscl1", "dout_mscl_blk", + GATE_IP_MSCL, 9, 0, 0), + GATE(CLK_SMMU_MSCL2, "smmu_mscl2", "dout_mscl_blk", + GATE_IP_MSCL, 10, 0, 0), +}; + +static const struct samsung_div_clock exynos5x_mscl_div_clks[] __initconst = { + DIV(0, "dout_mscl_blk", "aclk400_mscl", DIV2_RATIO0, 28, 2), +}; + +static struct exynos5_subcmu_reg_dump exynos5x_mscl_suspend_regs[] = { + { GATE_IP_MSCL, 0xffffffff, 0xffffffff }, /* MSCL gates */ + { SRC_TOP3, 0, BIT(4) }, /* MUX mout_user_aclk400_mscl */ + { DIV2_RATIO0, 0, 0x30000000 }, /* DIV dout_mscl_blk */ +};
static const struct samsung_gate_clock exynos5800_mau_gate_clks[] __initconst = { GATE(CLK_MAU_EPLL, "mau_epll", "mout_user_mau_epll", @@ -1273,6 +1281,16 @@ static const struct exynos5_subcmu_info exynos5x_mfc_subcmu = { .pd_name = "MFC", };
+static const struct exynos5_subcmu_info exynos5x_mscl_subcmu = { + .div_clks = exynos5x_mscl_div_clks, + .nr_div_clks = ARRAY_SIZE(exynos5x_mscl_div_clks), + .gate_clks = exynos5x_mscl_gate_clks, + .nr_gate_clks = ARRAY_SIZE(exynos5x_mscl_gate_clks), + .suspend_regs = exynos5x_mscl_suspend_regs, + .nr_suspend_regs = ARRAY_SIZE(exynos5x_mscl_suspend_regs), + .pd_name = "MSC", +}; + static const struct exynos5_subcmu_info exynos5800_mau_subcmu = { .gate_clks = exynos5800_mau_gate_clks, .nr_gate_clks = ARRAY_SIZE(exynos5800_mau_gate_clks), @@ -1285,12 +1303,14 @@ static const struct exynos5_subcmu_info *exynos5x_subcmus[] = { &exynos5x_disp_subcmu, &exynos5x_gsc_subcmu, &exynos5x_mfc_subcmu, + &exynos5x_mscl_subcmu, };
static const struct exynos5_subcmu_info *exynos5800_subcmus[] = { &exynos5x_disp_subcmu, &exynos5x_gsc_subcmu, &exynos5x_mfc_subcmu, + &exynos5x_mscl_subcmu, &exynos5800_mau_subcmu, };
[ Upstream commit 8c25d0887a8bd0e1ca2074ac0c6dff173787a83b ]
As spin_unlock_irq will enable interrupts. Function tsi108_stat_carry is called from interrupt handler tsi108_irq. Interrupts are enabled in interrupt handler. Use spin_lock_irqsave/spin_unlock_irqrestore instead of spin_(un)lock_irq in IRQ context to avoid this.
Signed-off-by: Fuqian Huang huangfq.daxian@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/tundra/tsi108_eth.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/tundra/tsi108_eth.c b/drivers/net/ethernet/tundra/tsi108_eth.c index 78a7de3fb622f..c62f474b6d08e 100644 --- a/drivers/net/ethernet/tundra/tsi108_eth.c +++ b/drivers/net/ethernet/tundra/tsi108_eth.c @@ -371,9 +371,10 @@ tsi108_stat_carry_one(int carry, int carry_bit, int carry_shift, static void tsi108_stat_carry(struct net_device *dev) { struct tsi108_prv_data *data = netdev_priv(dev); + unsigned long flags; u32 carry1, carry2;
- spin_lock_irq(&data->misclock); + spin_lock_irqsave(&data->misclock, flags);
carry1 = TSI_READ(TSI108_STAT_CARRY1); carry2 = TSI_READ(TSI108_STAT_CARRY2); @@ -441,7 +442,7 @@ static void tsi108_stat_carry(struct net_device *dev) TSI108_STAT_TXPAUSEDROP_CARRY, &data->tx_pause_drop);
- spin_unlock_irq(&data->misclock); + spin_unlock_irqrestore(&data->misclock, flags); }
/* Read a stat counter atomically with respect to carries.
[ Upstream commit 6a0a8d10a3661a036b55af695542a714c429ab7c ]
If a rule that has already a bound anonymous set fails to be added, the preparation phase releases the rule and the bound set. However, the transaction object from the abort path still has a reference to the set object that is stale, leading to a use-after-free when checking for the set->bound field. Add a new field to the transaction that specifies if the set is bound, so the abort path can skip releasing it since the rule command owns it and it takes care of releasing it. After this update, the set->bound field is removed.
[ 24.649883] Unable to handle kernel paging request at virtual address 0000000000040434 [ 24.657858] Mem abort info: [ 24.660686] ESR = 0x96000004 [ 24.663769] Exception class = DABT (current EL), IL = 32 bits [ 24.669725] SET = 0, FnV = 0 [ 24.672804] EA = 0, S1PTW = 0 [ 24.675975] Data abort info: [ 24.678880] ISV = 0, ISS = 0x00000004 [ 24.682743] CM = 0, WnR = 0 [ 24.685723] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000428952000 [ 24.692207] [0000000000040434] pgd=0000000000000000 [ 24.697119] Internal error: Oops: 96000004 [#1] SMP [...] [ 24.889414] Call trace: [ 24.891870] __nf_tables_abort+0x3f0/0x7a0 [ 24.895984] nf_tables_abort+0x20/0x40 [ 24.899750] nfnetlink_rcv_batch+0x17c/0x588 [ 24.904037] nfnetlink_rcv+0x13c/0x190 [ 24.907803] netlink_unicast+0x18c/0x208 [ 24.911742] netlink_sendmsg+0x1b0/0x350 [ 24.915682] sock_sendmsg+0x4c/0x68 [ 24.919185] ___sys_sendmsg+0x288/0x2c8 [ 24.923037] __sys_sendmsg+0x7c/0xd0 [ 24.926628] __arm64_sys_sendmsg+0x2c/0x38 [ 24.930744] el0_svc_common.constprop.0+0x94/0x158 [ 24.935556] el0_svc_handler+0x34/0x90 [ 24.939322] el0_svc+0x8/0xc [ 24.942216] Code: 37280300 f9404023 91014262 aa1703e0 (f9401863) [ 24.948336] ---[ end trace cebbb9dcbed3b56f ]---
Fixes: f6ac85858976 ("netfilter: nf_tables: unbind set in rule from commit path") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/netfilter/nf_tables.h | 9 +++++++-- net/netfilter/nf_tables_api.c | 15 ++++++++++----- 2 files changed, 17 insertions(+), 7 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 5b8624ae4a27f..930d062940b7d 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -419,8 +419,7 @@ struct nft_set { unsigned char *udata; /* runtime data below here */ const struct nft_set_ops *ops ____cacheline_aligned; - u16 flags:13, - bound:1, + u16 flags:14, genmask:2; u8 klen; u8 dlen; @@ -1333,12 +1332,15 @@ struct nft_trans_rule { struct nft_trans_set { struct nft_set *set; u32 set_id; + bool bound; };
#define nft_trans_set(trans) \ (((struct nft_trans_set *)trans->data)->set) #define nft_trans_set_id(trans) \ (((struct nft_trans_set *)trans->data)->set_id) +#define nft_trans_set_bound(trans) \ + (((struct nft_trans_set *)trans->data)->bound)
struct nft_trans_chain { bool update; @@ -1369,12 +1371,15 @@ struct nft_trans_table { struct nft_trans_elem { struct nft_set *set; struct nft_set_elem elem; + bool bound; };
#define nft_trans_elem_set(trans) \ (((struct nft_trans_elem *)trans->data)->set) #define nft_trans_elem(trans) \ (((struct nft_trans_elem *)trans->data)->elem) +#define nft_trans_elem_set_bound(trans) \ + (((struct nft_trans_elem *)trans->data)->bound)
struct nft_trans_obj { struct nft_object *obj; diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index bcf17fb46d965..8e4cdae2c4f14 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -136,9 +136,14 @@ static void nft_set_trans_bind(const struct nft_ctx *ctx, struct nft_set *set) return;
list_for_each_entry_reverse(trans, &net->nft.commit_list, list) { - if (trans->msg_type == NFT_MSG_NEWSET && - nft_trans_set(trans) == set) { - set->bound = true; + switch (trans->msg_type) { + case NFT_MSG_NEWSET: + if (nft_trans_set(trans) == set) + nft_trans_set_bound(trans) = true; + break; + case NFT_MSG_NEWSETELEM: + if (nft_trans_elem_set(trans) == set) + nft_trans_elem_set_bound(trans) = true; break; } } @@ -6849,7 +6854,7 @@ static int __nf_tables_abort(struct net *net) break; case NFT_MSG_NEWSET: trans->ctx.table->use--; - if (nft_trans_set(trans)->bound) { + if (nft_trans_set_bound(trans)) { nft_trans_destroy(trans); break; } @@ -6861,7 +6866,7 @@ static int __nf_tables_abort(struct net *net) nft_trans_destroy(trans); break; case NFT_MSG_NEWSETELEM: - if (nft_trans_elem_set(trans)->bound) { + if (nft_trans_elem_set_bound(trans)) { nft_trans_destroy(trans); break; }
[ Upstream commit 3e68db2f6422d711550a32cbc87abd97bb6efab3 ]
Update conntrack entry to pick up expired flows, otherwise the conntrack entry gets stuck with the internal offload timeout (one day). The TCP state also needs to be adjusted to ESTABLISHED state and tracking is set to liberal mode in order to give conntrack a chance to pick up the expired flow.
Fixes: ac2a66665e23 ("netfilter: add generic flow table infrastructure") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_flow_table_core.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c index 948b4ebbe3fbd..4254e42605135 100644 --- a/net/netfilter/nf_flow_table_core.c +++ b/net/netfilter/nf_flow_table_core.c @@ -112,7 +112,7 @@ static void flow_offload_fixup_tcp(struct ip_ct_tcp *tcp) #define NF_FLOWTABLE_TCP_PICKUP_TIMEOUT (120 * HZ) #define NF_FLOWTABLE_UDP_PICKUP_TIMEOUT (30 * HZ)
-static void flow_offload_fixup_ct_state(struct nf_conn *ct) +static void flow_offload_fixup_ct(struct nf_conn *ct) { const struct nf_conntrack_l4proto *l4proto; unsigned int timeout; @@ -209,6 +209,11 @@ int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow) } EXPORT_SYMBOL_GPL(flow_offload_add);
+static inline bool nf_flow_has_expired(const struct flow_offload *flow) +{ + return (__s32)(flow->timeout - (u32)jiffies) <= 0; +} + static void flow_offload_del(struct nf_flowtable *flow_table, struct flow_offload *flow) { @@ -224,6 +229,9 @@ static void flow_offload_del(struct nf_flowtable *flow_table, e = container_of(flow, struct flow_offload_entry, flow); clear_bit(IPS_OFFLOAD_BIT, &e->ct->status);
+ if (nf_flow_has_expired(flow)) + flow_offload_fixup_ct(e->ct); + flow_offload_free(flow); }
@@ -234,7 +242,7 @@ void flow_offload_teardown(struct flow_offload *flow) flow->flags |= FLOW_OFFLOAD_TEARDOWN;
e = container_of(flow, struct flow_offload_entry, flow); - flow_offload_fixup_ct_state(e->ct); + flow_offload_fixup_ct(e->ct); } EXPORT_SYMBOL_GPL(flow_offload_teardown);
@@ -299,11 +307,6 @@ nf_flow_table_iterate(struct nf_flowtable *flow_table, return err; }
-static inline bool nf_flow_has_expired(const struct flow_offload *flow) -{ - return (__s32)(flow->timeout - (u32)jiffies) <= 0; -} - static void nf_flow_offload_gc_step(struct flow_offload *flow, void *data) { struct nf_flowtable *flow_table = data;
[ Upstream commit 1e5b2471bcc4838df298080ae1ec042c2cbc9ce9 ]
Flows that are in teardown state (due to RST / FIN TCP packet) still have their offload flag set on. Hence, the conntrack garbage collector may race to undo the timeout adjustment that the fixup routine performs, leaving the conntrack entry in place with the internal offload timeout (one day).
Update teardown flow state to ESTABLISHED and set tracking to liberal, then once the offload bit is cleared, adjust timeout if it is more than the default fixup timeout (conntrack might already have set a lower timeout from the packet path).
Fixes: da5984e51063 ("netfilter: nf_flow_table: add support for sending flows back to the slow path") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_flow_table_core.c | 34 ++++++++++++++++++++++-------- 1 file changed, 25 insertions(+), 9 deletions(-)
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c index 4254e42605135..49248fe5847a1 100644 --- a/net/netfilter/nf_flow_table_core.c +++ b/net/netfilter/nf_flow_table_core.c @@ -112,15 +112,16 @@ static void flow_offload_fixup_tcp(struct ip_ct_tcp *tcp) #define NF_FLOWTABLE_TCP_PICKUP_TIMEOUT (120 * HZ) #define NF_FLOWTABLE_UDP_PICKUP_TIMEOUT (30 * HZ)
-static void flow_offload_fixup_ct(struct nf_conn *ct) +static inline __s32 nf_flow_timeout_delta(unsigned int timeout) +{ + return (__s32)(timeout - (u32)jiffies); +} + +static void flow_offload_fixup_ct_timeout(struct nf_conn *ct) { const struct nf_conntrack_l4proto *l4proto; + int l4num = nf_ct_protonum(ct); unsigned int timeout; - int l4num; - - l4num = nf_ct_protonum(ct); - if (l4num == IPPROTO_TCP) - flow_offload_fixup_tcp(&ct->proto.tcp);
l4proto = nf_ct_l4proto_find(l4num); if (!l4proto) @@ -133,7 +134,20 @@ static void flow_offload_fixup_ct(struct nf_conn *ct) else return;
- ct->timeout = nfct_time_stamp + timeout; + if (nf_flow_timeout_delta(ct->timeout) > (__s32)timeout) + ct->timeout = nfct_time_stamp + timeout; +} + +static void flow_offload_fixup_ct_state(struct nf_conn *ct) +{ + if (nf_ct_protonum(ct) == IPPROTO_TCP) + flow_offload_fixup_tcp(&ct->proto.tcp); +} + +static void flow_offload_fixup_ct(struct nf_conn *ct) +{ + flow_offload_fixup_ct_state(ct); + flow_offload_fixup_ct_timeout(ct); }
void flow_offload_free(struct flow_offload *flow) @@ -211,7 +225,7 @@ EXPORT_SYMBOL_GPL(flow_offload_add);
static inline bool nf_flow_has_expired(const struct flow_offload *flow) { - return (__s32)(flow->timeout - (u32)jiffies) <= 0; + return nf_flow_timeout_delta(flow->timeout) <= 0; }
static void flow_offload_del(struct nf_flowtable *flow_table, @@ -231,6 +245,8 @@ static void flow_offload_del(struct nf_flowtable *flow_table,
if (nf_flow_has_expired(flow)) flow_offload_fixup_ct(e->ct); + else if (flow->flags & FLOW_OFFLOAD_TEARDOWN) + flow_offload_fixup_ct_timeout(e->ct);
flow_offload_free(flow); } @@ -242,7 +258,7 @@ void flow_offload_teardown(struct flow_offload *flow) flow->flags |= FLOW_OFFLOAD_TEARDOWN;
e = container_of(flow, struct flow_offload_entry, flow); - flow_offload_fixup_ct(e->ct); + flow_offload_fixup_ct_state(e->ct); } EXPORT_SYMBOL_GPL(flow_offload_teardown);
[ Upstream commit b3e78adcbf991a4e8b2ebb23c9889e968ec76c5f ]
Change an error message to work for any object being pinned not just programs.
Fixes: 71bb428fe2c1 ("tools: bpf: add bpftool") Signed-off-by: Jakub Kicinski jakub.kicinski@netronome.com Reviewed-by: Quentin Monnet quentin.monnet@netronome.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Sasha Levin sashal@kernel.org --- tools/bpf/bpftool/common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c index f7261fad45c19..647d8a4044fbd 100644 --- a/tools/bpf/bpftool/common.c +++ b/tools/bpf/bpftool/common.c @@ -236,7 +236,7 @@ int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32))
fd = get_fd_by_id(id); if (fd < 0) { - p_err("can't get prog by id (%u): %s", id, strerror(errno)); + p_err("can't open object by id (%u): %s", id, strerror(errno)); return -1; }
[ Upstream commit 8b6381600d59871fbe44d36522272f961ab42410 ]
ixgbe_service_task() calls unregister_netdev() under rtnl_lock(). But unregister_netdev() internally calls rtnl_lock(). So deadlock would occur.
Fixes: 59dd45d550c5 ("ixgbe: firmware recovery mode") Signed-off-by: Taehee Yoo ap420073@gmail.com Tested-by: Andrew Bowers andrewx.bowers@intel.com Signed-off-by: Jeff Kirsher jeffrey.t.kirsher@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 57fd9ee6de665..f7c049559c1a5 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -7893,11 +7893,8 @@ static void ixgbe_service_task(struct work_struct *work) return; } if (ixgbe_check_fw_error(adapter)) { - if (!test_bit(__IXGBE_DOWN, &adapter->state)) { - rtnl_lock(); + if (!test_bit(__IXGBE_DOWN, &adapter->state)) unregister_netdev(adapter->netdev); - rtnl_unlock(); - } ixgbe_service_event_complete(adapter); return; }
[ Upstream commit 6d0d779dca73cd5acb649c54f81401f93098b298 ]
This fixes a warning of "suspicious rcu_dereference_check() usage" when nload runs.
Fixes: 776e726bfb34 ("netvsc: fix RCU warning in get_stats") Signed-off-by: Dexuan Cui decui@microsoft.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/hyperv/netvsc_drv.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index 3544e19915792..e8fce6d715ef0 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -1239,12 +1239,15 @@ static void netvsc_get_stats64(struct net_device *net, struct rtnl_link_stats64 *t) { struct net_device_context *ndev_ctx = netdev_priv(net); - struct netvsc_device *nvdev = rcu_dereference_rtnl(ndev_ctx->nvdev); + struct netvsc_device *nvdev; struct netvsc_vf_pcpu_stats vf_tot; int i;
+ rcu_read_lock(); + + nvdev = rcu_dereference(ndev_ctx->nvdev); if (!nvdev) - return; + goto out;
netdev_stats_to_stats64(t, &net->stats);
@@ -1283,6 +1286,8 @@ static void netvsc_get_stats64(struct net_device *net, t->rx_packets += packets; t->multicast += multicast; } +out: + rcu_read_unlock(); }
static int netvsc_set_mac_addr(struct net_device *ndev, void *p)
[ Upstream commit 125b7e0949d4e72b15c2b1a1590f8cece985a918 ]
clang warns:
drivers/net/ethernet/toshiba/tc35815.c:1507:30: warning: use of logical '&&' with constant operand [-Wconstant-logical-operand] if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN) ^ ~~~~~~~~~~~~ drivers/net/ethernet/toshiba/tc35815.c:1507:30: note: use '&' for a bitwise operation if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN) ^~ & drivers/net/ethernet/toshiba/tc35815.c:1507:30: note: remove constant to silence this warning if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN) ~^~~~~~~~~~~~~~~ 1 warning generated.
Explicitly check that NET_IP_ALIGN is not zero, which matches how this is checked in other parts of the tree. Because NET_IP_ALIGN is a build time constant, this check will be constant folded away during optimization.
Fixes: 82a9928db560 ("tc35815: Enable StripCRC feature") Link: https://github.com/ClangBuiltLinux/linux/issues/608 Signed-off-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/toshiba/tc35815.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/toshiba/tc35815.c b/drivers/net/ethernet/toshiba/tc35815.c index c50a9772f4aff..3b5a26b05295f 100644 --- a/drivers/net/ethernet/toshiba/tc35815.c +++ b/drivers/net/ethernet/toshiba/tc35815.c @@ -1504,7 +1504,7 @@ tc35815_rx(struct net_device *dev, int limit) pci_unmap_single(lp->pci_dev, lp->rx_skbs[cur_bd].skb_dma, RX_BUF_SIZE, PCI_DMA_FROMDEVICE); - if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN) + if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN != 0) memmove(skb->data, skb->data - NET_IP_ALIGN, pkt_len); data = skb_put(skb, pkt_len);
[ Upstream commit 8059ba0bd0e4694e51c2ee6438a77b325f06c0d5 ]
On WCN3990 downloading the NVM sometimes fails with a "TLV response size mismatch" error:
[ 174.949955] Bluetooth: btqca.c:qca_download_firmware() hci0: QCA Downloading qca/crnv21.bin [ 174.958718] Bluetooth: btqca.c:qca_tlv_send_segment() hci0: QCA TLV response size mismatch
It seems the controller needs a short time after downloading the firmware before it is ready for the NVM. A delay as short as 1 ms seems sufficient, make it 10 ms just in case. No event is received during the delay, hence we don't just silently drop an extra event.
Signed-off-by: Matthias Kaehlcke mka@chromium.org Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btqca.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c index aff1d22223bd4..0ee5acb685a10 100644 --- a/drivers/bluetooth/btqca.c +++ b/drivers/bluetooth/btqca.c @@ -350,6 +350,9 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate, return err; }
+ /* Give the controller some time to get ready to receive the NVM */ + msleep(10); + /* Download NVM configuration */ config.type = TLV_TYPE_NVM; if (qca_is_wcn399x(soc_type))
[ Upstream commit a2780889e247561744dd8efbd3478a1999b72ae3 ]
WCN399x chips are coex chips, it needs a VS pre shutdown command while turning off the BT. So that chip can inform BT is OFF to other active clients.
Signed-off-by: Harish Bandi c-hbandi@codeaurora.org Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btqca.c | 21 +++++++++++++++++++++ drivers/bluetooth/btqca.h | 7 +++++++ drivers/bluetooth/hci_qca.c | 3 +++ 3 files changed, 31 insertions(+)
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c index 0ee5acb685a10..ee25e6ae1a098 100644 --- a/drivers/bluetooth/btqca.c +++ b/drivers/bluetooth/btqca.c @@ -99,6 +99,27 @@ static int qca_send_reset(struct hci_dev *hdev) return 0; }
+int qca_send_pre_shutdown_cmd(struct hci_dev *hdev) +{ + struct sk_buff *skb; + int err; + + bt_dev_dbg(hdev, "QCA pre shutdown cmd"); + + skb = __hci_cmd_sync(hdev, QCA_PRE_SHUTDOWN_CMD, 0, + NULL, HCI_INIT_TIMEOUT); + if (IS_ERR(skb)) { + err = PTR_ERR(skb); + bt_dev_err(hdev, "QCA preshutdown_cmd failed (%d)", err); + return err; + } + + kfree_skb(skb); + + return 0; +} +EXPORT_SYMBOL_GPL(qca_send_pre_shutdown_cmd); + static void qca_tlv_check_data(struct rome_config *config, const struct firmware *fw) { diff --git a/drivers/bluetooth/btqca.h b/drivers/bluetooth/btqca.h index e9c9999596035..f2a9e576a86ce 100644 --- a/drivers/bluetooth/btqca.h +++ b/drivers/bluetooth/btqca.h @@ -13,6 +13,7 @@ #define EDL_PATCH_TLV_REQ_CMD (0x1E) #define EDL_NVM_ACCESS_SET_REQ_CMD (0x01) #define MAX_SIZE_PER_TLV_SEGMENT (243) +#define QCA_PRE_SHUTDOWN_CMD (0xFC08)
#define EDL_CMD_REQ_RES_EVT (0x00) #define EDL_PATCH_VER_RES_EVT (0x19) @@ -130,6 +131,7 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate, enum qca_btsoc_type soc_type, u32 soc_ver); int qca_read_soc_version(struct hci_dev *hdev, u32 *soc_version); int qca_set_bdaddr(struct hci_dev *hdev, const bdaddr_t *bdaddr); +int qca_send_pre_shutdown_cmd(struct hci_dev *hdev); static inline bool qca_is_wcn399x(enum qca_btsoc_type soc_type) { return soc_type == QCA_WCN3990 || soc_type == QCA_WCN3998; @@ -161,4 +163,9 @@ static inline bool qca_is_wcn399x(enum qca_btsoc_type soc_type) { return false; } + +static inline int qca_send_pre_shutdown_cmd(struct hci_dev *hdev) +{ + return -EOPNOTSUPP; +} #endif diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c index f41fb2c02e4fd..d88b024eaf566 100644 --- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -1319,6 +1319,9 @@ static int qca_power_off(struct hci_dev *hdev) { struct hci_uart *hu = hci_get_drvdata(hdev);
+ /* Perform pre shutdown command */ + qca_send_pre_shutdown_cmd(hdev); + qca_power_shutdown(hu); return 0; }
[ Upstream commit 48d9cc9d85dde37c87abb7ac9bbec6598ba44b56 ]
Let hidp_send_message return the number of successfully queued bytes instead of an unconditional 0.
With the return value fixed to 0, other drivers relying on hidp, such as hidraw, can not return meaningful values from their respective implementations of write(). In particular, with the current behavior, a hidraw device's write() will have different return values depending on whether the device is connected via USB or Bluetooth, which makes it harder to abstract away the transport layer.
Signed-off-by: Fabian Henneke fabian.henneke@gmail.com Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hidp/core.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c index 5abd423b55fa9..8d889969ae7ed 100644 --- a/net/bluetooth/hidp/core.c +++ b/net/bluetooth/hidp/core.c @@ -101,6 +101,7 @@ static int hidp_send_message(struct hidp_session *session, struct socket *sock, { struct sk_buff *skb; struct sock *sk = sock->sk; + int ret;
BT_DBG("session %p data %p size %d", session, data, size);
@@ -114,13 +115,17 @@ static int hidp_send_message(struct hidp_session *session, struct socket *sock, }
skb_put_u8(skb, hdr); - if (data && size > 0) + if (data && size > 0) { skb_put_data(skb, data, size); + ret = size; + } else { + ret = 0; + }
skb_queue_tail(transmit, skb); wake_up_interruptible(sk_sleep(sk));
- return 0; + return ret; }
static int hidp_send_ctrl_message(struct hidp_session *session,
[ Upstream commit 072f79400032f74917726cf76f4248367ea2b5b8 ]
Callbacks for a cmd reply run outside the protection of card->lock, to allow for additional cmds to be issued & enqueued in parallel.
When qeth_send_control_data() bails out for a cmd without having received a reply (eg. due to timeout), its callback may concurrently be processing a reply that just arrived. In this case, the callback potentially accesses a stale reply->reply_param area that eg. was on-stack and has already been released.
To avoid this race, add some locking so that qeth_send_control_data() can (1) wait for a concurrently running callback, and (2) zap any pending callback that still wants to run.
Signed-off-by: Julian Wiedmann jwi@linux.ibm.com Signed-off-by: Jakub Kicinski jakub.kicinski@netronome.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/s390/net/qeth_core.h | 1 + drivers/s390/net/qeth_core_main.c | 20 ++++++++++++++++++++ 2 files changed, 21 insertions(+)
diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h index 784a2e76a1b04..c5f60f95e8db9 100644 --- a/drivers/s390/net/qeth_core.h +++ b/drivers/s390/net/qeth_core.h @@ -640,6 +640,7 @@ struct qeth_seqno { struct qeth_reply { struct list_head list; struct completion received; + spinlock_t lock; int (*callback)(struct qeth_card *, struct qeth_reply *, unsigned long); u32 seqno; diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index b1823d75dd35c..6b8f99e7d8a81 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -548,6 +548,7 @@ static struct qeth_reply *qeth_alloc_reply(struct qeth_card *card) if (reply) { refcount_set(&reply->refcnt, 1); init_completion(&reply->received); + spin_lock_init(&reply->lock); } return reply; } @@ -832,6 +833,13 @@ static void qeth_issue_next_read_cb(struct qeth_card *card,
if (!reply->callback) { rc = 0; + goto no_callback; + } + + spin_lock_irqsave(&reply->lock, flags); + if (reply->rc) { + /* Bail out when the requestor has already left: */ + rc = reply->rc; } else { if (cmd) { reply->offset = (u16)((char *)cmd - (char *)iob->data); @@ -840,7 +848,9 @@ static void qeth_issue_next_read_cb(struct qeth_card *card, rc = reply->callback(card, reply, (unsigned long)iob); } } + spin_unlock_irqrestore(&reply->lock, flags);
+no_callback: if (rc <= 0) qeth_notify_reply(reply, rc); qeth_put_reply(reply); @@ -1880,6 +1890,16 @@ static int qeth_send_control_data(struct qeth_card *card, int len, rc = (timeout == -ERESTARTSYS) ? -EINTR : -ETIME;
qeth_dequeue_reply(card, reply); + + if (reply_cb) { + /* Wait until the callback for a late reply has completed: */ + spin_lock_irq(&reply->lock); + if (rc) + /* Zap any callback that's still pending: */ + reply->rc = rc; + spin_unlock_irq(&reply->lock); + } + if (!rc) rc = reply->rc; qeth_put_reply(reply);
[ Upstream commit 66cf4710b23ab2adda11155684a2c8826f4fe732 ]
The ibm,mac-address-filters property defines the maximum number of addresses the hypervisor's multicast filter list can support. It is encoded as a big-endian integer in the OF device tree, but the virtual ethernet driver does not convert it for use by little-endian systems. As a result, the driver is not behaving as it should on affected systems when a large number of multicast addresses are assigned to the device.
Reported-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: Thomas Falcon tlfalcon@linux.ibm.com Signed-off-by: Jakub Kicinski jakub.kicinski@netronome.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ibm/ibmveth.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c index d654c234aaf75..c5be4ebd84373 100644 --- a/drivers/net/ethernet/ibm/ibmveth.c +++ b/drivers/net/ethernet/ibm/ibmveth.c @@ -1605,7 +1605,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id) struct net_device *netdev; struct ibmveth_adapter *adapter; unsigned char *mac_addr_p; - unsigned int *mcastFilterSize_p; + __be32 *mcastFilterSize_p; long ret; unsigned long ret_attr;
@@ -1627,8 +1627,9 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id) return -EINVAL; }
- mcastFilterSize_p = (unsigned int *)vio_get_attribute(dev, - VETH_MCAST_FILTER_SIZE, NULL); + mcastFilterSize_p = (__be32 *)vio_get_attribute(dev, + VETH_MCAST_FILTER_SIZE, + NULL); if (!mcastFilterSize_p) { dev_err(&dev->dev, "Can't find VETH_MCAST_FILTER_SIZE " "attribute\n"); @@ -1645,7 +1646,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
adapter->vdev = dev; adapter->netdev = netdev; - adapter->mcastFilterSize = *mcastFilterSize_p; + adapter->mcastFilterSize = be32_to_cpu(*mcastFilterSize_p); adapter->pool_config = 0;
netif_napi_add(netdev, &adapter->napi, ibmveth_poll, 16);
[ Upstream commit 68e03b85474a51ec1921b4d13204782594ef7223 ]
when do randbuilding, I got this error:
In file included from drivers/hwmon/pmbus/ucd9000.c:19:0: ./include/linux/gpio/driver.h:576:1: error: redefinition of gpiochip_add_pin_range gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name, ^~~~~~~~~~~~~~~~~~~~~~ In file included from drivers/hwmon/pmbus/ucd9000.c:18:0: ./include/linux/gpio.h:245:1: note: previous definition of gpiochip_add_pin_range was here gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name, ^~~~~~~~~~~~~~~~~~~~~~
Reported-by: Hulk Robot hulkci@huawei.com Fixes: 964cb341882f ("gpio: move pincontrol calls to <linux/gpio/driver.h>") Signed-off-by: YueHaibing yuehaibing@huawei.com Link: https://lore.kernel.org/r/20190731123814.46624-1-yuehaibing@huawei.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/gpio.h | 24 ------------------------ 1 file changed, 24 deletions(-)
diff --git a/include/linux/gpio.h b/include/linux/gpio.h index 39745b8bdd65d..b3115d1a7d494 100644 --- a/include/linux/gpio.h +++ b/include/linux/gpio.h @@ -240,30 +240,6 @@ static inline int irq_to_gpio(unsigned irq) return -EINVAL; }
-static inline int -gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name, - unsigned int gpio_offset, unsigned int pin_offset, - unsigned int npins) -{ - WARN_ON(1); - return -EINVAL; -} - -static inline int -gpiochip_add_pingroup_range(struct gpio_chip *chip, - struct pinctrl_dev *pctldev, - unsigned int gpio_offset, const char *pin_group) -{ - WARN_ON(1); - return -EINVAL; -} - -static inline void -gpiochip_remove_pin_ranges(struct gpio_chip *chip) -{ - WARN_ON(1); -} - static inline int devm_gpio_request(struct device *dev, unsigned gpio, const char *label) {
[ Upstream commit dfe42be15fde16232340b8b2a57c359f51cc10d9 ]
TCP rst and fin packets do not qualify to place a flow into the flowtable. Most likely there will be no more packets after connection closure. Without this patch, this flow entry expires and connection tracking picks up the entry in ESTABLISHED state using the fixup timeout, which makes this look inconsistent to the user for a connection that is actually already closed.
Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_flow_offload.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c index aa5f571d43619..060a4ed46d5e6 100644 --- a/net/netfilter/nft_flow_offload.c +++ b/net/netfilter/nft_flow_offload.c @@ -72,11 +72,11 @@ static void nft_flow_offload_eval(const struct nft_expr *expr, { struct nft_flow_offload *priv = nft_expr_priv(expr); struct nf_flowtable *flowtable = &priv->flowtable->data; + struct tcphdr _tcph, *tcph = NULL; enum ip_conntrack_info ctinfo; struct nf_flow_route route; struct flow_offload *flow; enum ip_conntrack_dir dir; - bool is_tcp = false; struct nf_conn *ct; int ret;
@@ -89,7 +89,10 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
switch (ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.protonum) { case IPPROTO_TCP: - is_tcp = true; + tcph = skb_header_pointer(pkt->skb, pkt->xt.thoff, + sizeof(_tcph), &_tcph); + if (unlikely(!tcph || tcph->fin || tcph->rst)) + goto out; break; case IPPROTO_UDP: break; @@ -115,7 +118,7 @@ static void nft_flow_offload_eval(const struct nft_expr *expr, if (!flow) goto err_flow_alloc;
- if (is_tcp) { + if (tcph) { ct->proto.tcp.seen[0].flags |= IP_CT_TCP_FLAG_BE_LIBERAL; ct->proto.tcp.seen[1].flags |= IP_CT_TCP_FLAG_BE_LIBERAL; }
[ Upstream commit 4c6f3196e6ea111c456c6086dc3f57d4706b0b2d ]
PRIME buffers should be imported using the DMA device. To this end, use a custom import function that mimics drm_gem_prime_import_dev(), but passes the correct device.
Fixes: 119f5173628aa ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.") Signed-off-by: Alexandre Courbot acourbot@chromium.org Signed-off-by: CK Hu ck.hu@mediatek.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/mediatek/mtk_drm_drv.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c b/drivers/gpu/drm/mediatek/mtk_drm_drv.c index 95fdbd0fbcace..8b18a00a58c7e 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c +++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c @@ -320,6 +320,18 @@ static const struct file_operations mtk_drm_fops = { .compat_ioctl = drm_compat_ioctl, };
+/* + * We need to override this because the device used to import the memory is + * not dev->dev, as drm_gem_prime_import() expects. + */ +struct drm_gem_object *mtk_drm_gem_prime_import(struct drm_device *dev, + struct dma_buf *dma_buf) +{ + struct mtk_drm_private *private = dev->dev_private; + + return drm_gem_prime_import_dev(dev, dma_buf, private->dma_dev); +} + static struct drm_driver mtk_drm_driver = { .driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_PRIME | DRIVER_ATOMIC, @@ -331,7 +343,7 @@ static struct drm_driver mtk_drm_driver = { .prime_handle_to_fd = drm_gem_prime_handle_to_fd, .prime_fd_to_handle = drm_gem_prime_fd_to_handle, .gem_prime_export = drm_gem_prime_export, - .gem_prime_import = drm_gem_prime_import, + .gem_prime_import = mtk_drm_gem_prime_import, .gem_prime_get_sg_table = mtk_gem_prime_get_sg_table, .gem_prime_import_sg_table = mtk_gem_prime_import_sg_table, .gem_prime_mmap = mtk_drm_gem_mmap_buf,
[ Upstream commit 070955558e820b9a89c570b91b1f21762f62b288 ]
This driver requires imported PRIME buffers to appear contiguously in its IO address space. Make sure this is the case by setting the maximum DMA segment size to a more suitable value than the default 64KB.
Signed-off-by: Alexandre Courbot acourbot@chromium.org Reviewed-by: Tomasz Figa tfiga@chromium.org Signed-off-by: CK Hu ck.hu@mediatek.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/mediatek/mtk_drm_drv.c | 35 ++++++++++++++++++++++++-- drivers/gpu/drm/mediatek/mtk_drm_drv.h | 2 ++ 2 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c b/drivers/gpu/drm/mediatek/mtk_drm_drv.c index 8b18a00a58c7e..c021d4c8324f5 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c +++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c @@ -213,6 +213,7 @@ static int mtk_drm_kms_init(struct drm_device *drm) struct mtk_drm_private *private = drm->dev_private; struct platform_device *pdev; struct device_node *np; + struct device *dma_dev; int ret;
if (!iommu_present(&platform_bus_type)) @@ -275,7 +276,29 @@ static int mtk_drm_kms_init(struct drm_device *drm) goto err_component_unbind; }
- private->dma_dev = &pdev->dev; + dma_dev = &pdev->dev; + private->dma_dev = dma_dev; + + /* + * Configure the DMA segment size to make sure we get contiguous IOVA + * when importing PRIME buffers. + */ + if (!dma_dev->dma_parms) { + private->dma_parms_allocated = true; + dma_dev->dma_parms = + devm_kzalloc(drm->dev, sizeof(*dma_dev->dma_parms), + GFP_KERNEL); + } + if (!dma_dev->dma_parms) { + ret = -ENOMEM; + goto err_component_unbind; + } + + ret = dma_set_max_seg_size(dma_dev, (unsigned int)DMA_BIT_MASK(32)); + if (ret) { + dev_err(dma_dev, "Failed to set DMA segment size\n"); + goto err_unset_dma_parms; + }
/* * We don't use the drm_irq_install() helpers provided by the DRM @@ -285,13 +308,16 @@ static int mtk_drm_kms_init(struct drm_device *drm) drm->irq_enabled = true; ret = drm_vblank_init(drm, MAX_CRTC); if (ret < 0) - goto err_component_unbind; + goto err_unset_dma_parms;
drm_kms_helper_poll_init(drm); drm_mode_config_reset(drm);
return 0;
+err_unset_dma_parms: + if (private->dma_parms_allocated) + dma_dev->dma_parms = NULL; err_component_unbind: component_unbind_all(drm->dev, drm); err_config_cleanup: @@ -302,9 +328,14 @@ err_config_cleanup:
static void mtk_drm_kms_deinit(struct drm_device *drm) { + struct mtk_drm_private *private = drm->dev_private; + drm_kms_helper_poll_fini(drm); drm_atomic_helper_shutdown(drm);
+ if (private->dma_parms_allocated) + private->dma_dev->dma_parms = NULL; + component_unbind_all(drm->dev, drm); drm_mode_config_cleanup(drm); } diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.h b/drivers/gpu/drm/mediatek/mtk_drm_drv.h index 598ff3e704465..e03fea12ff598 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_drv.h +++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.h @@ -51,6 +51,8 @@ struct mtk_drm_private { } commit;
struct drm_atomic_state *suspend_state; + + bool dma_parms_allocated; };
extern struct platform_driver mtk_ddp_driver;
[ Upstream commit 26fa656e9a0cbccddf7db132ea020d2169dbe46e ]
If HBA initialization fails unexpectedly (exiting via probe_failed:), we may fail to free vha->gnl.l. So that we don't attempt to double free, set this pointer to NULL after a free and check for NULL at probe_failed: so we know whether or not to call dma_free_coherent.
Signed-off-by: Bill Kuzeja william.kuzeja@stratus.com Acked-by: Himanshu Madhani hmadhani@marvell.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/qla2xxx/qla_attr.c | 2 ++ drivers/scsi/qla2xxx/qla_os.c | 11 ++++++++++- 2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/qla2xxx/qla_attr.c b/drivers/scsi/qla2xxx/qla_attr.c index 8d560c562e9c0..6b7b390b2e522 100644 --- a/drivers/scsi/qla2xxx/qla_attr.c +++ b/drivers/scsi/qla2xxx/qla_attr.c @@ -2956,6 +2956,8 @@ qla24xx_vport_delete(struct fc_vport *fc_vport) dma_free_coherent(&ha->pdev->dev, vha->gnl.size, vha->gnl.l, vha->gnl.ldma);
+ vha->gnl.l = NULL; + vfree(vha->scan.l);
if (vha->qpair && vha->qpair->vp_idx == vha->vp_idx) { diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c index d056f5e7cf930..794478e5f7ec8 100644 --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -3440,6 +3440,12 @@ skip_dpc: return 0;
probe_failed: + if (base_vha->gnl.l) { + dma_free_coherent(&ha->pdev->dev, base_vha->gnl.size, + base_vha->gnl.l, base_vha->gnl.ldma); + base_vha->gnl.l = NULL; + } + if (base_vha->timer_active) qla2x00_stop_timer(base_vha); base_vha->flags.online = 0; @@ -3673,7 +3679,7 @@ qla2x00_remove_one(struct pci_dev *pdev) if (!atomic_read(&pdev->enable_cnt)) { dma_free_coherent(&ha->pdev->dev, base_vha->gnl.size, base_vha->gnl.l, base_vha->gnl.ldma); - + base_vha->gnl.l = NULL; scsi_host_put(base_vha->host); kfree(ha); pci_set_drvdata(pdev, NULL); @@ -3713,6 +3719,8 @@ qla2x00_remove_one(struct pci_dev *pdev) dma_free_coherent(&ha->pdev->dev, base_vha->gnl.size, base_vha->gnl.l, base_vha->gnl.ldma);
+ base_vha->gnl.l = NULL; + vfree(base_vha->scan.l);
if (IS_QLAFX00(ha)) @@ -4817,6 +4825,7 @@ struct scsi_qla_host *qla2x00_create_host(struct scsi_host_template *sht, "Alloc failed for scan database.\n"); dma_free_coherent(&ha->pdev->dev, vha->gnl.size, vha->gnl.l, vha->gnl.ldma); + vha->gnl.l = NULL; scsi_remove_host(vha->host); return NULL; }
[ Upstream commit a86a75865ff4d8c05f355d1750a5250aec89ab15 ]
In tcmu_handle_completion() function, the variable called read_len is always initialized with a value taken from se_cmd structure. If this function is called to complete an expired (timed out) out command, the session command pointed by se_cmd is likely to be already deallocated by the target core at that moment. As the result, this access triggers a use-after-free warning from KASAN.
This patch fixes the code not to touch se_cmd when completing timed out TCMU commands. It also resets the pointer to se_cmd at the time when the TCMU_CMD_BIT_EXPIRED flag is set because it is going to become invalid after calling target_complete_cmd() later in the same function, tcmu_check_expired_cmd().
Signed-off-by: Dmitry Fomichev dmitry.fomichev@wdc.com Acked-by: Mike Christie mchristi@redhat.com Reviewed-by: Damien Le Moal damien.lemoal@wdc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/target/target_core_user.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c index b43d6385a1a09..95b2371fb67b6 100644 --- a/drivers/target/target_core_user.c +++ b/drivers/target/target_core_user.c @@ -1132,14 +1132,16 @@ static void tcmu_handle_completion(struct tcmu_cmd *cmd, struct tcmu_cmd_entry * struct se_cmd *se_cmd = cmd->se_cmd; struct tcmu_dev *udev = cmd->tcmu_dev; bool read_len_valid = false; - uint32_t read_len = se_cmd->data_length; + uint32_t read_len;
/* * cmd has been completed already from timeout, just reclaim * data area space and free cmd */ - if (test_bit(TCMU_CMD_BIT_EXPIRED, &cmd->flags)) + if (test_bit(TCMU_CMD_BIT_EXPIRED, &cmd->flags)) { + WARN_ON_ONCE(se_cmd); goto out; + }
list_del_init(&cmd->queue_entry);
@@ -1152,6 +1154,7 @@ static void tcmu_handle_completion(struct tcmu_cmd *cmd, struct tcmu_cmd_entry * goto done; }
+ read_len = se_cmd->data_length; if (se_cmd->data_direction == DMA_FROM_DEVICE && (entry->hdr.uflags & TCMU_UFLAG_READ_LEN) && entry->rsp.read_len) { read_len_valid = true; @@ -1307,6 +1310,7 @@ static int tcmu_check_expired_cmd(int id, void *p, void *data) */ scsi_status = SAM_STAT_CHECK_CONDITION; list_del_init(&cmd->queue_entry); + cmd->se_cmd = NULL; } else { list_del_init(&cmd->queue_entry); idr_remove(&udev->commands, id); @@ -2024,6 +2028,7 @@ static void tcmu_reset_ring(struct tcmu_dev *udev, u8 err_level)
idr_remove(&udev->commands, i); if (!test_bit(TCMU_CMD_BIT_EXPIRED, &cmd->flags)) { + WARN_ON(!cmd->se_cmd); list_del_init(&cmd->queue_entry); if (err_level == 1) { /*
[ Upstream commit c554336efa9bbc28d6ec14efbee3c7d63c61a34f ]
In blocked_fl_write(), 't' is not deallocated if bitmap_parse_user() fails, leading to a memory leak bug. To fix this issue, free t before returning the error.
Signed-off-by: Wenwen Wang wenwen@cs.uga.edu Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c index 02959035ed3f2..d692251ee252c 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c @@ -3236,8 +3236,10 @@ static ssize_t blocked_fl_write(struct file *filp, const char __user *ubuf, return -ENOMEM;
err = bitmap_parse_user(ubuf, count, t, adap->sge.egr_sz); - if (err) + if (err) { + kvfree(t); return err; + }
bitmap_copy(adap->sge.blocked_fl, t, adap->sge.egr_sz); kvfree(t);
[ Upstream commit 92cd0f0be3d7adb63611c28693ec0399beded837 ]
This test is only covering various edge cases of the KVM_SET_NESTED_STATE ioctl. Running the VM does not really add anything.
Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../kvm/x86_64/vmx_set_nested_state_test.c | 15 --------------- 1 file changed, 15 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_set_nested_state_test.c b/tools/testing/selftests/kvm/x86_64/vmx_set_nested_state_test.c index ed7218d166da6..a99fc66dafeb6 100644 --- a/tools/testing/selftests/kvm/x86_64/vmx_set_nested_state_test.c +++ b/tools/testing/selftests/kvm/x86_64/vmx_set_nested_state_test.c @@ -27,22 +27,13 @@
void test_nested_state(struct kvm_vm *vm, struct kvm_nested_state *state) { - volatile struct kvm_run *run; - vcpu_nested_state_set(vm, VCPU_ID, state, false); - run = vcpu_state(vm, VCPU_ID); - vcpu_run(vm, VCPU_ID); - TEST_ASSERT(run->exit_reason == KVM_EXIT_SHUTDOWN, - "Got exit_reason other than KVM_EXIT_SHUTDOWN: %u (%s),\n", - run->exit_reason, - exit_reason_str(run->exit_reason)); }
void test_nested_state_expect_errno(struct kvm_vm *vm, struct kvm_nested_state *state, int expected_errno) { - volatile struct kvm_run *run; int rv;
rv = vcpu_nested_state_set(vm, VCPU_ID, state, true); @@ -50,12 +41,6 @@ void test_nested_state_expect_errno(struct kvm_vm *vm, "Expected %s (%d) from vcpu_nested_state_set but got rv: %i errno: %s (%d)", strerror(expected_errno), expected_errno, rv, strerror(errno), errno); - run = vcpu_state(vm, VCPU_ID); - vcpu_run(vm, VCPU_ID); - TEST_ASSERT(run->exit_reason == KVM_EXIT_SHUTDOWN, - "Got exit_reason other than KVM_EXIT_SHUTDOWN: %u (%s),\n", - run->exit_reason, - exit_reason_str(run->exit_reason)); }
void test_nested_state_expect_einval(struct kvm_vm *vm,
[ Upstream commit 65efa61dc0d536d5f0602c33ee805a57cc07e9dc ]
There are two tests already enabling eVMCS and a third is coming. Add a function that enables the capability and tests the result.
Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/kvm/include/evmcs.h | 2 ++ tools/testing/selftests/kvm/lib/x86_64/vmx.c | 20 +++++++++++++++++++ .../testing/selftests/kvm/x86_64/evmcs_test.c | 15 ++------------ .../selftests/kvm/x86_64/hyperv_cpuid.c | 12 ++++------- 4 files changed, 28 insertions(+), 21 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/evmcs.h b/tools/testing/selftests/kvm/include/evmcs.h index 4059014d93ea1..4912d23844bc6 100644 --- a/tools/testing/selftests/kvm/include/evmcs.h +++ b/tools/testing/selftests/kvm/include/evmcs.h @@ -220,6 +220,8 @@ struct hv_enlightened_vmcs { struct hv_enlightened_vmcs *current_evmcs; struct hv_vp_assist_page *current_vp_assist;
+int vcpu_enable_evmcs(struct kvm_vm *vm, int vcpu_id); + static inline int enable_vp_assist(uint64_t vp_assist_pa, void *vp_assist) { u64 val = (vp_assist_pa & HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK) | diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c index fe56d159d65fd..52b6491ed7061 100644 --- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c +++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c @@ -14,6 +14,26 @@
bool enable_evmcs;
+int vcpu_enable_evmcs(struct kvm_vm *vm, int vcpu_id) +{ + uint16_t evmcs_ver; + + struct kvm_enable_cap enable_evmcs_cap = { + .cap = KVM_CAP_HYPERV_ENLIGHTENED_VMCS, + .args[0] = (unsigned long)&evmcs_ver + }; + + vcpu_ioctl(vm, vcpu_id, KVM_ENABLE_CAP, &enable_evmcs_cap); + + /* KVM should return supported EVMCS version range */ + TEST_ASSERT(((evmcs_ver >> 8) >= (evmcs_ver & 0xff)) && + (evmcs_ver & 0xff) > 0, + "Incorrect EVMCS version range: %x:%x\n", + evmcs_ver & 0xff, evmcs_ver >> 8); + + return evmcs_ver; +} + /* Allocate memory regions for nested VMX tests. * * Input Args: diff --git a/tools/testing/selftests/kvm/x86_64/evmcs_test.c b/tools/testing/selftests/kvm/x86_64/evmcs_test.c index 241919ef1eaca..9f250c39c9bb8 100644 --- a/tools/testing/selftests/kvm/x86_64/evmcs_test.c +++ b/tools/testing/selftests/kvm/x86_64/evmcs_test.c @@ -79,11 +79,6 @@ int main(int argc, char *argv[]) struct kvm_x86_state *state; struct ucall uc; int stage; - uint16_t evmcs_ver; - struct kvm_enable_cap enable_evmcs_cap = { - .cap = KVM_CAP_HYPERV_ENLIGHTENED_VMCS, - .args[0] = (unsigned long)&evmcs_ver - };
/* Create VM */ vm = vm_create_default(VCPU_ID, 0, guest_code); @@ -96,13 +91,7 @@ int main(int argc, char *argv[]) exit(KSFT_SKIP); }
- vcpu_ioctl(vm, VCPU_ID, KVM_ENABLE_CAP, &enable_evmcs_cap); - - /* KVM should return supported EVMCS version range */ - TEST_ASSERT(((evmcs_ver >> 8) >= (evmcs_ver & 0xff)) && - (evmcs_ver & 0xff) > 0, - "Incorrect EVMCS version range: %x:%x\n", - evmcs_ver & 0xff, evmcs_ver >> 8); + vcpu_enable_evmcs(vm, VCPU_ID);
run = vcpu_state(vm, VCPU_ID);
@@ -146,7 +135,7 @@ int main(int argc, char *argv[]) kvm_vm_restart(vm, O_RDWR); vm_vcpu_add(vm, VCPU_ID, 0, 0); vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid()); - vcpu_ioctl(vm, VCPU_ID, KVM_ENABLE_CAP, &enable_evmcs_cap); + vcpu_enable_evmcs(vm, VCPU_ID); vcpu_load_state(vm, VCPU_ID, state); run = vcpu_state(vm, VCPU_ID); free(state); diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c b/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c index f72b3043db0eb..ee59831fbc984 100644 --- a/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c +++ b/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c @@ -18,6 +18,7 @@ #include "test_util.h" #include "kvm_util.h" #include "processor.h" +#include "vmx.h"
#define VCPU_ID 0
@@ -106,12 +107,7 @@ int main(int argc, char *argv[]) { struct kvm_vm *vm; int rv; - uint16_t evmcs_ver; struct kvm_cpuid2 *hv_cpuid_entries; - struct kvm_enable_cap enable_evmcs_cap = { - .cap = KVM_CAP_HYPERV_ENLIGHTENED_VMCS, - .args[0] = (unsigned long)&evmcs_ver - };
/* Tell stdout not to buffer its content */ setbuf(stdout, NULL); @@ -136,14 +132,14 @@ int main(int argc, char *argv[])
free(hv_cpuid_entries);
- rv = _vcpu_ioctl(vm, VCPU_ID, KVM_ENABLE_CAP, &enable_evmcs_cap); - - if (rv) { + if (!kvm_check_cap(KVM_CAP_HYPERV_ENLIGHTENED_VMCS)) { fprintf(stderr, "Enlightened VMCS is unsupported, skip related test\n"); goto vm_free; }
+ vcpu_enable_evmcs(vm, VCPU_ID); + hv_cpuid_entries = kvm_get_supported_hv_cpuid(vm); if (!hv_cpuid_entries) return 1;
[ Upstream commit c930e19790bbbff31c018009907c813fa0925f63 ]
vmx_set_nested_state_test is trying to use the KVM_STATE_NESTED_EVMCS without enabling enlightened VMCS first. Correct the outcome of the test, and actually test that it succeeds after the capability is enabled.
Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../kvm/x86_64/vmx_set_nested_state_test.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_set_nested_state_test.c b/tools/testing/selftests/kvm/x86_64/vmx_set_nested_state_test.c index a99fc66dafeb6..853e370e8a394 100644 --- a/tools/testing/selftests/kvm/x86_64/vmx_set_nested_state_test.c +++ b/tools/testing/selftests/kvm/x86_64/vmx_set_nested_state_test.c @@ -25,6 +25,8 @@ #define VMCS12_REVISION 0x11e57ed0 #define VCPU_ID 5
+bool have_evmcs; + void test_nested_state(struct kvm_vm *vm, struct kvm_nested_state *state) { vcpu_nested_state_set(vm, VCPU_ID, state, false); @@ -75,8 +77,9 @@ void set_default_vmx_state(struct kvm_nested_state *state, int size) { memset(state, 0, size); state->flags = KVM_STATE_NESTED_GUEST_MODE | - KVM_STATE_NESTED_RUN_PENDING | - KVM_STATE_NESTED_EVMCS; + KVM_STATE_NESTED_RUN_PENDING; + if (have_evmcs) + state->flags |= KVM_STATE_NESTED_EVMCS; state->format = 0; state->size = size; state->hdr.vmx.vmxon_pa = 0x1000; @@ -126,13 +129,19 @@ void test_vmx_nested_state(struct kvm_vm *vm) /* * Setting vmxon_pa == -1ull and vmcs_pa == -1ull exits early without * setting the nested state but flags other than eVMCS must be clear. + * The eVMCS flag can be set if the enlightened VMCS capability has + * been enabled. */ set_default_vmx_state(state, state_sz); state->hdr.vmx.vmxon_pa = -1ull; state->hdr.vmx.vmcs12_pa = -1ull; test_nested_state_expect_einval(vm, state);
- state->flags = KVM_STATE_NESTED_EVMCS; + state->flags &= KVM_STATE_NESTED_EVMCS; + if (have_evmcs) { + test_nested_state_expect_einval(vm, state); + vcpu_enable_evmcs(vm, VCPU_ID); + } test_nested_state(vm, state);
/* It is invalid to have vmxon_pa == -1ull and SMM flags non-zero. */ @@ -217,6 +226,8 @@ int main(int argc, char *argv[]) struct kvm_nested_state state; struct kvm_cpuid_entry2 *entry = kvm_get_supported_cpuid_entry(1);
+ have_evmcs = kvm_check_cap(KVM_CAP_HYPERV_ENLIGHTENED_VMCS); + if (!kvm_check_cap(KVM_CAP_NESTED_STATE)) { printf("KVM_CAP_NESTED_STATE not available, skipping test\n"); exit(KSFT_SKIP);
[ Upstream commit 6f967f8b1be7001b31c46429f2ee7d275af2190f ]
If oct->fn_list.enable_io_queues() fails, no cleanup is executed, leading to memory/resource leaks. To fix this issue, invoke octeon_delete_instr_queue() before returning from the function.
Signed-off-by: Wenwen Wang wenwen@cs.uga.edu Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/cavium/liquidio/request_manager.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/cavium/liquidio/request_manager.c b/drivers/net/ethernet/cavium/liquidio/request_manager.c index fcf20a8f92d94..6a823710987da 100644 --- a/drivers/net/ethernet/cavium/liquidio/request_manager.c +++ b/drivers/net/ethernet/cavium/liquidio/request_manager.c @@ -239,8 +239,10 @@ int octeon_setup_iq(struct octeon_device *oct, }
oct->num_iqs++; - if (oct->fn_list.enable_io_queues(oct)) + if (oct->fn_list.enable_io_queues(oct)) { + octeon_delete_instr_queue(oct, iq_no); return 1; + }
return 0; }
[ Upstream commit 20fb7c7a39b5c719e2e619673b5f5729ee7d2306 ]
In myri10ge_probe(), myri10ge_alloc_slices() is invoked to allocate slices related structures. Later on, myri10ge_request_irq() is used to get an irq. However, if this process fails, the allocated slices related structures are not deallocated, leading to memory leaks. To fix this issue, revise the target label of the goto statement to 'abort_with_slices'.
Signed-off-by: Wenwen Wang wenwen@cs.uga.edu Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c index d8b7fba96d58e..337b0cbfd153e 100644 --- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c +++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c @@ -3919,7 +3919,7 @@ static int myri10ge_probe(struct pci_dev *pdev, const struct pci_device_id *ent) * setup (if available). */ status = myri10ge_request_irq(mgp); if (status != 0) - goto abort_with_firmware; + goto abort_with_slices; myri10ge_free_irq(mgp);
/* Save configuration space to be restored if the
[ Upstream commit 4f8c6aba37da199155a121c6cdc38505a9eb0259 ]
Calls to clk_core_get() will return ERR_PTR(-EINVAL) if we've started migrating a clk driver to use the DT based style of specifying parents but we haven't made any DT updates yet. This happens when we pass a non-NULL value as the 'name' argument of of_parse_clkspec(). That function returns -EINVAL in such a situation, instead of -ENOENT like we expected. The return value comes back up to clk_core_fill_parent_index() which proceeds to skip calling clk_core_lookup() because the error pointer isn't equal to -ENOENT, it's -EINVAL.
Furthermore, we blindly overwrite the error pointer returned by clk_core_get() with NULL when there isn't a legacy .name member specified in the parent map. This isn't too bad right now because we don't really care to differentiate NULL from an error, but in the future we should only try to do a legacy lookup if we know we might find something. This way DT lookups that fail don't try to lookup based on strings when there isn't any string to match, hiding the error from DT parsing.
Fix both these problems so that clk provider drivers can use the new style of parent mapping without having to also update their DT at the same time. This patch is based on an earlier patch from Taniya Das which checked for -EINVAL in addition to -ENOENT return values from clk_core_get().
Fixes: 601b6e93304a ("clk: Allow parents to be specified via clkspec index") Cc: Taniya Das tdas@codeaurora.org Cc: Jerome Brunet jbrunet@baylibre.com Cc: Chen-Yu Tsai wens@csie.org Reported-by: Taniya Das tdas@codeaurora.org Signed-off-by: Stephen Boyd sboyd@kernel.org Link: https://lkml.kernel.org/r/20190813214147.34394-1-sboyd@kernel.org Tested-by: Taniya Das tdas@codeaurora.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/clk.c | 46 ++++++++++++++++++++++++++++++++++------------ 1 file changed, 34 insertions(+), 12 deletions(-)
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index 87b410d6e51de..498cd7bbe8984 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -324,6 +324,25 @@ static struct clk_core *clk_core_lookup(const char *name) return NULL; }
+#ifdef CONFIG_OF +static int of_parse_clkspec(const struct device_node *np, int index, + const char *name, struct of_phandle_args *out_args); +static struct clk_hw * +of_clk_get_hw_from_clkspec(struct of_phandle_args *clkspec); +#else +static inline int of_parse_clkspec(const struct device_node *np, int index, + const char *name, + struct of_phandle_args *out_args) +{ + return -ENOENT; +} +static inline struct clk_hw * +of_clk_get_hw_from_clkspec(struct of_phandle_args *clkspec) +{ + return ERR_PTR(-ENOENT); +} +#endif + /** * clk_core_get - Find the clk_core parent of a clk * @core: clk to find parent of @@ -355,8 +374,9 @@ static struct clk_core *clk_core_lookup(const char *name) * }; * * Returns: -ENOENT when the provider can't be found or the clk doesn't - * exist in the provider. -EINVAL when the name can't be found. NULL when the - * provider knows about the clk but it isn't provided on this system. + * exist in the provider or the name can't be found in the DT node or + * in a clkdev lookup. NULL when the provider knows about the clk but it + * isn't provided on this system. * A valid clk_core pointer when the clk can be found in the provider. */ static struct clk_core *clk_core_get(struct clk_core *core, u8 p_index) @@ -367,17 +387,19 @@ static struct clk_core *clk_core_get(struct clk_core *core, u8 p_index) struct device *dev = core->dev; const char *dev_id = dev ? dev_name(dev) : NULL; struct device_node *np = core->of_node; + struct of_phandle_args clkspec;
- if (np && (name || index >= 0)) - hw = of_clk_get_hw(np, index, name); - - /* - * If the DT search above couldn't find the provider or the provider - * didn't know about this clk, fallback to looking up via clkdev based - * clk_lookups - */ - if (PTR_ERR(hw) == -ENOENT && name) + if (np && (name || index >= 0) && + !of_parse_clkspec(np, index, name, &clkspec)) { + hw = of_clk_get_hw_from_clkspec(&clkspec); + of_node_put(clkspec.np); + } else if (name) { + /* + * If the DT search above couldn't find the provider fallback to + * looking up via clkdev based clk_lookups. + */ hw = clk_find_hw(dev_id, name); + }
if (IS_ERR(hw)) return ERR_CAST(hw); @@ -401,7 +423,7 @@ static void clk_core_fill_parent_index(struct clk_core *core, u8 index) parent = ERR_PTR(-EPROBE_DEFER); } else { parent = clk_core_get(core, index); - if (IS_ERR(parent) && PTR_ERR(parent) == -ENOENT) + if (IS_ERR(parent) && PTR_ERR(parent) == -ENOENT && entry->name) parent = clk_core_lookup(entry->name); }
[ Upstream commit 24876f09a7dfe36a82f53d304d8c1bceb3257a0f ]
Don't compare the parent clock name with a NULL name in the clk_parent_map. This prevents a kernel crash when passing NULL core->parents[i].name to strcmp().
An example which triggered this is a mux clock with four parents when each of them is referenced in the clock driver using clk_parent_data.fw_name and then calling clk_set_parent(clk, 3rd_parent) on this mux. In this case the first parent is also the HW default so core->parents[i].hw is populated when the clock is registered. Calling clk_set_parent(clk, 3rd_parent) will then go through all parents and skip the first parent because it's hw pointer doesn't match. For the second parent no hw pointer is cached yet and clk_core_get(core, 1) returns a non-matching pointer (which is correct because we are comparing the second with the third parent). Comparing the result of clk_core_get(core, 2) with the requested parent gives a match. However we don't reach this point because right after the clk_core_get(core, 1) mismatch the old code tried to !strcmp(parent->name, NULL) (where the second argument is actually core->parents[i].name, but that was never populated by the clock driver).
Signed-off-by: Martin Blumenstingl martin.blumenstingl@googlemail.com Link: https://lkml.kernel.org/r/20190815223155.21384-1-martin.blumenstingl@googlem... Fixes: fc0c209c147f ("clk: Allow parents to be specified without string names") Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/clk.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index 498cd7bbe8984..3a4961dc58313 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -1657,7 +1657,8 @@ static int clk_fetch_parent_index(struct clk_core *core, break;
/* Fallback to comparing globally unique names */ - if (!strcmp(parent->name, core->parents[i].name)) + if (core->parents[i].name && + !strcmp(parent->name, core->parents[i].name)) break; }
[ Upstream commit b9cbf8a64865b50fd0f4a3915fa00ac7365cdf8f ]
In lan78xx_probe(), a new urb is allocated through usb_alloc_urb() and saved to 'dev->urb_intr'. However, in the following execution, if an error occurs, 'dev->urb_intr' is not deallocated, leading to memory leaks. To fix this issue, invoke usb_free_urb() to free the allocated urb before returning from the function.
Signed-off-by: Wenwen Wang wenwen@cs.uga.edu Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/lan78xx.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c index 3d92ea6fcc02b..f033fee225a11 100644 --- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -3792,7 +3792,7 @@ static int lan78xx_probe(struct usb_interface *intf, ret = register_netdev(netdev); if (ret != 0) { netif_err(dev, probe, netdev, "couldn't register the device\n"); - goto out3; + goto out4; }
usb_set_intfdata(intf, dev); @@ -3807,12 +3807,14 @@ static int lan78xx_probe(struct usb_interface *intf,
ret = lan78xx_phy_init(dev); if (ret < 0) - goto out4; + goto out5;
return 0;
-out4: +out5: unregister_netdev(netdev); +out4: + usb_free_urb(dev->urb_intr); out3: lan78xx_unbind(dev, intf); out2:
[ Upstream commit edc58dd0123b552453a74369bd0c8d890b497b4b ]
When dedupe wants to use the page cache to compare parts of two files for dedupe, we must be very careful to handle locking correctly. The current code doesn't do this. It must lock and unlock the page only once if the two pages are the same, since the overlapping range check doesn't catch this when blocksize < pagesize. If the pages are distinct but from the same file, we must observe page locking order and lock them in order of increasing offset to avoid clashing with writeback locking.
Fixes: 876bec6f9bbfcb3 ("vfs: refactor clone/dedupe_file_range common functions") Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Bill O'Donnell billodo@redhat.com Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/read_write.c | 49 +++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 41 insertions(+), 8 deletions(-)
diff --git a/fs/read_write.c b/fs/read_write.c index c543d965e2880..e8b0f1192a3a4 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -1776,10 +1776,7 @@ static int generic_remap_check_len(struct inode *inode_in, return (remap_flags & REMAP_FILE_DEDUP) ? -EBADE : -EINVAL; }
-/* - * Read a page's worth of file data into the page cache. Return the page - * locked. - */ +/* Read a page's worth of file data into the page cache. */ static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset) { struct page *page; @@ -1791,10 +1788,32 @@ static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset) put_page(page); return ERR_PTR(-EIO); } - lock_page(page); return page; }
+/* + * Lock two pages, ensuring that we lock in offset order if the pages are from + * the same file. + */ +static void vfs_lock_two_pages(struct page *page1, struct page *page2) +{ + /* Always lock in order of increasing index. */ + if (page1->index > page2->index) + swap(page1, page2); + + lock_page(page1); + if (page1 != page2) + lock_page(page2); +} + +/* Unlock two pages, being careful not to unlock the same page twice. */ +static void vfs_unlock_two_pages(struct page *page1, struct page *page2) +{ + unlock_page(page1); + if (page1 != page2) + unlock_page(page2); +} + /* * Compare extents of two files to see if they are the same. * Caller must have locked both inodes to prevent write races. @@ -1832,10 +1851,24 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, dest_page = vfs_dedupe_get_page(dest, destoff); if (IS_ERR(dest_page)) { error = PTR_ERR(dest_page); - unlock_page(src_page); put_page(src_page); goto out_error; } + + vfs_lock_two_pages(src_page, dest_page); + + /* + * Now that we've locked both pages, make sure they're still + * mapped to the file data we're interested in. If not, + * someone is invalidating pages on us and we lose. + */ + if (!PageUptodate(src_page) || !PageUptodate(dest_page) || + src_page->mapping != src->i_mapping || + dest_page->mapping != dest->i_mapping) { + same = false; + goto unlock; + } + src_addr = kmap_atomic(src_page); dest_addr = kmap_atomic(dest_page);
@@ -1847,8 +1880,8 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
kunmap_atomic(dest_addr); kunmap_atomic(src_addr); - unlock_page(dest_page); - unlock_page(src_page); +unlock: + vfs_unlock_two_pages(src_page, dest_page); put_page(dest_page); put_page(src_page);
[ Upstream commit 1eca92eef18719027d394bf1a2d276f43e7cf886 ]
In cx82310_bind(), 'dev->partial_data' is allocated through kmalloc(). Then, the execution waits for the firmware to become ready. If the firmware is not ready in time, the execution is terminated. However, the allocated 'dev->partial_data' is not deallocated on this path, leading to a memory leak bug. To fix this issue, free 'dev->partial_data' before returning the error.
Signed-off-by: Wenwen Wang wenwen@cs.uga.edu Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/cx82310_eth.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/usb/cx82310_eth.c b/drivers/net/usb/cx82310_eth.c index 5519248a791eb..32b08b18e1208 100644 --- a/drivers/net/usb/cx82310_eth.c +++ b/drivers/net/usb/cx82310_eth.c @@ -163,7 +163,8 @@ static int cx82310_bind(struct usbnet *dev, struct usb_interface *intf) } if (!timeout) { dev_err(&udev->dev, "firmware not ready in time\n"); - return -ETIMEDOUT; + ret = -ETIMEDOUT; + goto err; }
/* enable ethernet mode (?) */
[ Upstream commit f1472cb09f11ddb41d4be84f0650835cb65a9073 ]
In kalmia_init_and_get_ethernet_addr(), 'usb_buf' is allocated through kmalloc(). In the following execution, if the 'status' returned by kalmia_send_init_packet() is not 0, 'usb_buf' is not deallocated, leading to memory leaks. To fix this issue, add the 'out' label to free 'usb_buf'.
Signed-off-by: Wenwen Wang wenwen@cs.uga.edu Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/kalmia.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/usb/kalmia.c b/drivers/net/usb/kalmia.c index d62b6706a5376..fc5895f85cee2 100644 --- a/drivers/net/usb/kalmia.c +++ b/drivers/net/usb/kalmia.c @@ -113,16 +113,16 @@ kalmia_init_and_get_ethernet_addr(struct usbnet *dev, u8 *ethernet_addr) status = kalmia_send_init_packet(dev, usb_buf, ARRAY_SIZE(init_msg_1), usb_buf, 24); if (status != 0) - return status; + goto out;
memcpy(usb_buf, init_msg_2, 12); status = kalmia_send_init_packet(dev, usb_buf, ARRAY_SIZE(init_msg_2), usb_buf, 28); if (status != 0) - return status; + goto out;
memcpy(ethernet_addr, usb_buf + 10, ETH_ALEN); - +out: kfree(usb_buf); return status; }
[ Upstream commit 80f0fe0934cd3daa13a5e4d48a103f469115b160 ]
There's no need to wait until a completion is received to unmap TX descriptor buffers that have been passed to the hypervisor. Instead unmap it when the hypervisor call has completed. This patch avoids the possibility that a buffer will not be unmapped because a TX completion is lost or mishandled.
Reported-by: Abdul Haleem abdhalee@linux.vnet.ibm.com Tested-by: Devesh K. Singh devesh_singh@in.ibm.com Signed-off-by: Thomas Falcon tlfalcon@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ibm/ibmvnic.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 3da6800732656..cebd20f3128d4 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1568,6 +1568,8 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) lpar_rc = send_subcrq_indirect(adapter, handle_array[queue_num], (u64)tx_buff->indir_dma, (u64)num_entries); + dma_unmap_single(dev, tx_buff->indir_dma, + sizeof(tx_buff->indir_arr), DMA_TO_DEVICE); } else { tx_buff->num_entries = num_entries; lpar_rc = send_subcrq(adapter, handle_array[queue_num], @@ -2788,7 +2790,6 @@ static int ibmvnic_complete_tx(struct ibmvnic_adapter *adapter, union sub_crq *next; int index; int i, j; - u8 *first;
restart_loop: while (pending_scrq(adapter, scrq)) { @@ -2818,14 +2819,6 @@ restart_loop:
txbuff->data_dma[j] = 0; } - /* if sub_crq was sent indirectly */ - first = &txbuff->indir_arr[0].generic.first; - if (*first == IBMVNIC_CRQ_CMD) { - dma_unmap_single(dev, txbuff->indir_dma, - sizeof(txbuff->indir_arr), - DMA_TO_DEVICE); - *first = 0; - }
if (txbuff->last_frag) { dev_kfree_skb_any(txbuff->skb);
[ Upstream commit 3434341004a380f4e47c3a03d4320d43982162a0 ]
The driver name gets exposed in sysfs under /sys/bus/pci/drivers so it should look like other devices. Change it to be common format (instead of "Cavium PTP").
This is a trivial fix that was observed by accident because Debian kernels were building this driver into kernel (bug).
Signed-off-by: Stephen Hemminger stephen@networkplumber.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/cavium/common/cavium_ptp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/cavium/common/cavium_ptp.c b/drivers/net/ethernet/cavium/common/cavium_ptp.c index 73632b8437498..b821c9e1604cf 100644 --- a/drivers/net/ethernet/cavium/common/cavium_ptp.c +++ b/drivers/net/ethernet/cavium/common/cavium_ptp.c @@ -10,7 +10,7 @@
#include "cavium_ptp.h"
-#define DRV_NAME "Cavium PTP Driver" +#define DRV_NAME "cavium_ptp"
#define PCI_DEVICE_ID_CAVIUM_PTP 0xA00C #define PCI_DEVICE_ID_CAVIUM_RST 0xA00E
[ Upstream commit 44ef3a03252844a8753479b0cea7f29e4a804bdc ]
In i2400m_barker_db_init(), 'options_orig' is allocated through kstrdup() to hold the original command line options. Then, the options are parsed. However, if an error occurs during the parsing process, 'options_orig' is not deallocated, leading to a memory leak bug. To fix this issue, free 'options_orig' before returning the error.
Signed-off-by: Wenwen Wang wenwen@cs.uga.edu Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wimax/i2400m/fw.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wimax/i2400m/fw.c b/drivers/net/wimax/i2400m/fw.c index e9fc168bb7345..489cba9b284d1 100644 --- a/drivers/net/wimax/i2400m/fw.c +++ b/drivers/net/wimax/i2400m/fw.c @@ -351,13 +351,15 @@ int i2400m_barker_db_init(const char *_options) } result = i2400m_barker_db_add(barker); if (result < 0) - goto error_add; + goto error_parse_add; } kfree(options_orig); } return 0;
+error_parse_add: error_parse: + kfree(options_orig); error_add: kfree(i2400m_barker_db); return result;
[ Upstream commit cfef46d692efd852a0da6803f920cc756eea2855 ]
When a Tx timestamp is requested, a pointer to the skb is stored in the ravb_tstamp_skb struct. This was done without an skb_get. There exists the possibility that the skb could be freed by ravb_tx_free (when ravb_tx_free is called from ravb_start_xmit) before the timestamp was processed, leading to a use-after-free bug.
Use skb_get when filling a ravb_tstamp_skb struct, and add appropriate frees/consumes when a ravb_tstamp_skb struct is freed.
Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Tho Vu tho.vu.wh@rvc.renesas.com Signed-off-by: Kazuya Mizuguchi kazuya.mizuguchi.ks@renesas.com Signed-off-by: Simon Horman horms+renesas@verge.net.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/renesas/ravb_main.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index ef8f08931fe8b..6cacd5e893aca 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 /* Renesas Ethernet AVB device driver * - * Copyright (C) 2014-2015 Renesas Electronics Corporation + * Copyright (C) 2014-2019 Renesas Electronics Corporation * Copyright (C) 2015 Renesas Solutions Corp. * Copyright (C) 2015-2016 Cogent Embedded, Inc. source@cogentembedded.com * @@ -513,7 +513,10 @@ static void ravb_get_tx_tstamp(struct net_device *ndev) kfree(ts_skb); if (tag == tfa_tag) { skb_tstamp_tx(skb, &shhwtstamps); + dev_consume_skb_any(skb); break; + } else { + dev_kfree_skb_any(skb); } } ravb_modify(ndev, TCCR, TCCR_TFR, TCCR_TFR); @@ -1564,7 +1567,7 @@ static netdev_tx_t ravb_start_xmit(struct sk_buff *skb, struct net_device *ndev) } goto unmap; } - ts_skb->skb = skb; + ts_skb->skb = skb_get(skb); ts_skb->tag = priv->ts_skb_tag++; priv->ts_skb_tag &= 0x3ff; list_add_tail(&ts_skb->list, &priv->ts_skb_list); @@ -1693,6 +1696,7 @@ static int ravb_close(struct net_device *ndev) /* Clear the timestamp list */ list_for_each_entry_safe(ts_skb, ts_skb2, &priv->ts_skb_list, list) { list_del(&ts_skb->list); + kfree_skb(ts_skb->skb); kfree(ts_skb); }
[ Upstream commit b0fdc01354f45d43f082025636ef808968a27b36 ]
If a task is PI-blocked (blocking on sleeping spinlock) then we don't want to schedule a new kworker if we schedule out due to lock contention because !RT does not do that as well. A spinning spinlock disables preemption and a worker does not schedule out on lock contention (but spin).
On RT the RW-semaphore implementation uses an rtmutex so tsk_is_pi_blocked() will return true if a task blocks on it. In this case we will now start a new worker which may deadlock if one worker is waiting on progress from another worker. Since a RW-semaphore starts a new worker on !RT, we should do the same on RT.
XFS is able to trigger this deadlock.
Allow to schedule new worker if the current worker is PI-blocked.
Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/20190816160626.12742-1-bigeasy@linutronix.de Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/core.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 4d5962232a553..42bc2986520d7 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3469,7 +3469,7 @@ void __noreturn do_task_dead(void)
static inline void sched_submit_work(struct task_struct *tsk) { - if (!tsk->state || tsk_is_pi_blocked(tsk)) + if (!tsk->state) return;
/* @@ -3485,6 +3485,9 @@ static inline void sched_submit_work(struct task_struct *tsk) preempt_enable_no_resched(); }
+ if (tsk_is_pi_blocked(tsk)) + return; + /* * If we are going to sleep and we have plugged IO queued, * make sure to submit it to avoid deadlocks.
[ Upstream commit f1c6ece23729257fb46562ff9224cf5f61b818da ]
lockdep reports the following deadlock scenario:
WARNING: possible circular locking dependency detected
kworker/1:1/48 is trying to acquire lock: 000000008d7a62b2 (text_mutex){+.+.}, at: kprobe_optimizer+0x163/0x290
but task is already holding lock: 00000000850b5e2d (module_mutex){+.+.}, at: kprobe_optimizer+0x31/0x290
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (module_mutex){+.+.}: __mutex_lock+0xac/0x9f0 mutex_lock_nested+0x1b/0x20 set_all_modules_text_rw+0x22/0x90 ftrace_arch_code_modify_prepare+0x1c/0x20 ftrace_run_update_code+0xe/0x30 ftrace_startup_enable+0x2e/0x50 ftrace_startup+0xa7/0x100 register_ftrace_function+0x27/0x70 arm_kprobe+0xb3/0x130 enable_kprobe+0x83/0xa0 enable_trace_kprobe.part.0+0x2e/0x80 kprobe_register+0x6f/0xc0 perf_trace_event_init+0x16b/0x270 perf_kprobe_init+0xa7/0xe0 perf_kprobe_event_init+0x3e/0x70 perf_try_init_event+0x4a/0x140 perf_event_alloc+0x93a/0xde0 __do_sys_perf_event_open+0x19f/0xf30 __x64_sys_perf_event_open+0x20/0x30 do_syscall_64+0x65/0x1d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #0 (text_mutex){+.+.}: __lock_acquire+0xfcb/0x1b60 lock_acquire+0xca/0x1d0 __mutex_lock+0xac/0x9f0 mutex_lock_nested+0x1b/0x20 kprobe_optimizer+0x163/0x290 process_one_work+0x22b/0x560 worker_thread+0x50/0x3c0 kthread+0x112/0x150 ret_from_fork+0x3a/0x50
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1 ---- ---- lock(module_mutex); lock(text_mutex); lock(module_mutex); lock(text_mutex);
*** DEADLOCK ***
As a reproducer I've been using bcc's funccount.py (https://github.com/iovisor/bcc/blob/master/tools/funccount.py), for example:
# ./funccount.py '*interrupt*'
That immediately triggers the lockdep splat.
Fix by acquiring text_mutex before module_mutex in kprobe_optimizer().
Signed-off-by: Andrea Righi andrea.righi@canonical.com Acked-by: Masami Hiramatsu mhiramat@kernel.org Cc: Anil S Keshavamurthy anil.s.keshavamurthy@intel.com Cc: David S. Miller davem@davemloft.net Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Naveen N. Rao naveen.n.rao@linux.ibm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Fixes: d5b844a2cf50 ("ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()") Link: http://lkml.kernel.org/r/20190812184302.GA7010@xps-13 Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/kprobes.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 445337c107e0f..2504c269e6583 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -470,6 +470,7 @@ static DECLARE_DELAYED_WORK(optimizing_work, kprobe_optimizer); */ static void do_optimize_kprobes(void) { + lockdep_assert_held(&text_mutex); /* * The optimization/unoptimization refers online_cpus via * stop_machine() and cpu-hotplug modifies online_cpus. @@ -487,9 +488,7 @@ static void do_optimize_kprobes(void) list_empty(&optimizing_list)) return;
- mutex_lock(&text_mutex); arch_optimize_kprobes(&optimizing_list); - mutex_unlock(&text_mutex); }
/* @@ -500,6 +499,7 @@ static void do_unoptimize_kprobes(void) { struct optimized_kprobe *op, *tmp;
+ lockdep_assert_held(&text_mutex); /* See comment in do_optimize_kprobes() */ lockdep_assert_cpus_held();
@@ -507,7 +507,6 @@ static void do_unoptimize_kprobes(void) if (list_empty(&unoptimizing_list)) return;
- mutex_lock(&text_mutex); arch_unoptimize_kprobes(&unoptimizing_list, &freeing_list); /* Loop free_list for disarming */ list_for_each_entry_safe(op, tmp, &freeing_list, list) { @@ -524,7 +523,6 @@ static void do_unoptimize_kprobes(void) } else list_del_init(&op->list); } - mutex_unlock(&text_mutex); }
/* Reclaim all kprobes on the free_list */ @@ -556,6 +554,7 @@ static void kprobe_optimizer(struct work_struct *work) { mutex_lock(&kprobe_mutex); cpus_read_lock(); + mutex_lock(&text_mutex); /* Lock modules while optimizing kprobes */ mutex_lock(&module_mutex);
@@ -583,6 +582,7 @@ static void kprobe_optimizer(struct work_struct *work) do_free_cleaned_kprobes();
mutex_unlock(&module_mutex); + mutex_unlock(&text_mutex); cpus_read_unlock(); mutex_unlock(&kprobe_mutex);
[ Upstream commit b640be5bc8e4673dc8049cf74176ddedecea5597 ]
EHL is a new platform using ishtp solution, add its device id to support list.
Signed-off-by: Even Xu even.xu@intel.com Acked-by: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/intel-ish-hid/ipc/hw-ish.h | 1 + drivers/hid/intel-ish-hid/ipc/pci-ish.c | 1 + 2 files changed, 2 insertions(+)
diff --git a/drivers/hid/intel-ish-hid/ipc/hw-ish.h b/drivers/hid/intel-ish-hid/ipc/hw-ish.h index 1065692f90e20..5792a104000a9 100644 --- a/drivers/hid/intel-ish-hid/ipc/hw-ish.h +++ b/drivers/hid/intel-ish-hid/ipc/hw-ish.h @@ -24,6 +24,7 @@ #define ICL_MOBILE_DEVICE_ID 0x34FC #define SPT_H_DEVICE_ID 0xA135 #define CML_LP_DEVICE_ID 0x02FC +#define EHL_Ax_DEVICE_ID 0x4BB3
#define REVISION_ID_CHT_A0 0x6 #define REVISION_ID_CHT_Ax_SI 0x0 diff --git a/drivers/hid/intel-ish-hid/ipc/pci-ish.c b/drivers/hid/intel-ish-hid/ipc/pci-ish.c index 17ae49fba920f..8cce3cfe28e08 100644 --- a/drivers/hid/intel-ish-hid/ipc/pci-ish.c +++ b/drivers/hid/intel-ish-hid/ipc/pci-ish.c @@ -33,6 +33,7 @@ static const struct pci_device_id ish_pci_tbl[] = { {PCI_DEVICE(PCI_VENDOR_ID_INTEL, ICL_MOBILE_DEVICE_ID)}, {PCI_DEVICE(PCI_VENDOR_ID_INTEL, SPT_H_DEVICE_ID)}, {PCI_DEVICE(PCI_VENDOR_ID_INTEL, CML_LP_DEVICE_ID)}, + {PCI_DEVICE(PCI_VENDOR_ID_INTEL, EHL_Ax_DEVICE_ID)}, {0, } }; MODULE_DEVICE_TABLE(pci, ish_pci_tbl);
[ Upstream commit 2d05dba2b25ecb0f8fc3a0b4eb2232da6454a47b ]
When calling request_threaded_irq() with a CP2112, the function cp2112_gpio_irq_startup() is called in a IRQ context.
Therefore we can not sleep, and we can not call cp2112_gpio_direction_input() there.
Move the call to cp2112_gpio_direction_input() earlier to have a working driver.
Signed-off-by: Benjamin Tissoires benjamin.tissoires@redhat.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-cp2112.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/hid/hid-cp2112.c b/drivers/hid/hid-cp2112.c index 8bbe3d0cbe5d9..8fd44407a0df7 100644 --- a/drivers/hid/hid-cp2112.c +++ b/drivers/hid/hid-cp2112.c @@ -1152,8 +1152,6 @@ static unsigned int cp2112_gpio_irq_startup(struct irq_data *d)
INIT_DELAYED_WORK(&dev->gpio_poll_worker, cp2112_gpio_poll_callback);
- cp2112_gpio_direction_input(gc, d->hwirq); - if (!dev->gpio_poll) { dev->gpio_poll = true; schedule_delayed_work(&dev->gpio_poll_worker, 0); @@ -1201,6 +1199,12 @@ static int __maybe_unused cp2112_allocate_irq(struct cp2112_device *dev, return PTR_ERR(dev->desc[pin]); }
+ ret = cp2112_gpio_direction_input(&dev->gc, pin); + if (ret < 0) { + dev_err(dev->gc.parent, "Failed to set GPIO to input dir\n"); + goto err_desc; + } + ret = gpiochip_lock_as_irq(&dev->gc, pin); if (ret) { dev_err(dev->gc.parent, "Failed to lock GPIO as interrupt\n");
[ Upstream commit 0a46fff2f9108c2c44218380a43a736cf4612541 ]
BIOS on Samsung 500C Chromebook reports very rudimentary E820 table that consists of 2 entries:
BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] usable BIOS-e820: [mem 0x00000000fffff000-0x00000000ffffffff] reserved
It breaks logic in find_trampoline_placement(): bios_start lands on the end of the first 4k page and trampoline start gets placed below 0.
Detect underflow and don't touch bios_start for such cases. It makes kernel ignore E820 table on machines that doesn't have two usable pages below BIOS_START_MAX.
Fixes: 1b3a62643660 ("x86/boot/compressed/64: Validate trampoline placement against E820") Signed-off-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Signed-off-by: Borislav Petkov bp@suse.de Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Thomas Gleixner tglx@linutronix.de Cc: x86-ml x86@kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=203463 Link: https://lkml.kernel.org/r/20190813131654.24378-1-kirill.shutemov@linux.intel... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/boot/compressed/pgtable_64.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c index f8debf7aeb4c1..f0537a1f7fc25 100644 --- a/arch/x86/boot/compressed/pgtable_64.c +++ b/arch/x86/boot/compressed/pgtable_64.c @@ -73,6 +73,8 @@ static unsigned long find_trampoline_placement(void)
/* Find the first usable memory region under bios_start. */ for (i = boot_params->e820_entries - 1; i >= 0; i--) { + unsigned long new; + entry = &boot_params->e820_table[i];
/* Skip all entries above bios_start. */ @@ -85,15 +87,20 @@ static unsigned long find_trampoline_placement(void)
/* Adjust bios_start to the end of the entry if needed. */ if (bios_start > entry->addr + entry->size) - bios_start = entry->addr + entry->size; + new = entry->addr + entry->size;
/* Keep bios_start page-aligned. */ - bios_start = round_down(bios_start, PAGE_SIZE); + new = round_down(new, PAGE_SIZE);
/* Skip the entry if it's too small. */ - if (bios_start - TRAMPOLINE_32BIT_SIZE < entry->addr) + if (new - TRAMPOLINE_32BIT_SIZE < entry->addr) continue;
+ /* Protect against underflow. */ + if (new - TRAMPOLINE_32BIT_SIZE > bios_start) + break; + + bios_start = new; break; }
[ Upstream commit 77ffd3465ba837e9dc714e17b014e77b2eae765a ]
When SCSI-MQ is enabled, the SCSI-MQ layers will do pre-allocation of MQ resources based on shost values set by the driver. In newer cases of the driver, which attempts to set nr_hw_queues to the cpu count, the multipliers become excessive, with a single shost having SCSI-MQ pre-allocation reaching into the multiple GBytes range. NPIV, which creates additional shosts, only multiply this overhead. On lower-memory systems, this can exhaust system memory very quickly, resulting in a system crash or failures in the driver or elsewhere due to low memory conditions.
After testing several scenarios, the situation can be mitigated by limiting the value set in shost->nr_hw_queues to 4. Although the shost values were changed, the driver still had per-cpu hardware queues of its own that allowed parallelization per-cpu. Testing revealed that even with the smallish number for nr_hw_queues for SCSI-MQ, performance levels remained near maximum with the within-driver affiinitization.
A module parameter was created to allow the value set for the nr_hw_queues to be tunable.
Signed-off-by: Dick Kennedy dick.kennedy@broadcom.com Signed-off-by: James Smart jsmart2021@gmail.com Reviewed-by: Ming Lei ming.lei@redhat.com Reviewed-by: Ewan D. Milne emilne@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/lpfc/lpfc.h | 1 + drivers/scsi/lpfc/lpfc_attr.c | 15 +++++++++++++++ drivers/scsi/lpfc/lpfc_init.c | 10 ++++++---- drivers/scsi/lpfc/lpfc_sli4.h | 5 +++++ 4 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h index aafcffaa25f71..4604e1bc334c0 100644 --- a/drivers/scsi/lpfc/lpfc.h +++ b/drivers/scsi/lpfc/lpfc.h @@ -822,6 +822,7 @@ struct lpfc_hba { uint32_t cfg_cq_poll_threshold; uint32_t cfg_cq_max_proc_limit; uint32_t cfg_fcp_cpu_map; + uint32_t cfg_fcp_mq_threshold; uint32_t cfg_hdw_queue; uint32_t cfg_irq_chann; uint32_t cfg_suppress_rsp; diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c index d4c65e2109e2f..353da12d797ba 100644 --- a/drivers/scsi/lpfc/lpfc_attr.c +++ b/drivers/scsi/lpfc/lpfc_attr.c @@ -5640,6 +5640,19 @@ LPFC_ATTR_RW(nvme_oas, 0, 0, 1, LPFC_ATTR_RW(nvme_embed_cmd, 1, 0, 2, "Embed NVME Command in WQE");
+/* + * lpfc_fcp_mq_threshold: Set the maximum number of Hardware Queues + * the driver will advertise it supports to the SCSI layer. + * + * 0 = Set nr_hw_queues by the number of CPUs or HW queues. + * 1,128 = Manually specify the maximum nr_hw_queue value to be set, + * + * Value range is [0,128]. Default value is 8. + */ +LPFC_ATTR_R(fcp_mq_threshold, LPFC_FCP_MQ_THRESHOLD_DEF, + LPFC_FCP_MQ_THRESHOLD_MIN, LPFC_FCP_MQ_THRESHOLD_MAX, + "Set the number of SCSI Queues advertised"); + /* * lpfc_hdw_queue: Set the number of Hardware Queues the driver * will advertise it supports to the NVME and SCSI layers. This also @@ -5961,6 +5974,7 @@ struct device_attribute *lpfc_hba_attrs[] = { &dev_attr_lpfc_cq_poll_threshold, &dev_attr_lpfc_cq_max_proc_limit, &dev_attr_lpfc_fcp_cpu_map, + &dev_attr_lpfc_fcp_mq_threshold, &dev_attr_lpfc_hdw_queue, &dev_attr_lpfc_irq_chann, &dev_attr_lpfc_suppress_rsp, @@ -7042,6 +7056,7 @@ lpfc_get_cfgparam(struct lpfc_hba *phba) /* Initialize first burst. Target vs Initiator are different. */ lpfc_nvme_enable_fb_init(phba, lpfc_nvme_enable_fb); lpfc_nvmet_fb_size_init(phba, lpfc_nvmet_fb_size); + lpfc_fcp_mq_threshold_init(phba, lpfc_fcp_mq_threshold); lpfc_hdw_queue_init(phba, lpfc_hdw_queue); lpfc_irq_chann_init(phba, lpfc_irq_chann); lpfc_enable_bbcr_init(phba, lpfc_enable_bbcr); diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index eaaef682de251..2fd8f15f99975 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -4308,10 +4308,12 @@ lpfc_create_port(struct lpfc_hba *phba, int instance, struct device *dev) shost->max_cmd_len = 16;
if (phba->sli_rev == LPFC_SLI_REV4) { - if (phba->cfg_fcp_io_sched == LPFC_FCP_SCHED_BY_HDWQ) - shost->nr_hw_queues = phba->cfg_hdw_queue; - else - shost->nr_hw_queues = phba->sli4_hba.num_present_cpu; + if (!phba->cfg_fcp_mq_threshold || + phba->cfg_fcp_mq_threshold > phba->cfg_hdw_queue) + phba->cfg_fcp_mq_threshold = phba->cfg_hdw_queue; + + shost->nr_hw_queues = min_t(int, 2 * num_possible_nodes(), + phba->cfg_fcp_mq_threshold);
shost->dma_boundary = phba->sli4_hba.pc_sli4_params.sge_supp_len-1; diff --git a/drivers/scsi/lpfc/lpfc_sli4.h b/drivers/scsi/lpfc/lpfc_sli4.h index 8e4fd1a98023c..986594ec40e2a 100644 --- a/drivers/scsi/lpfc/lpfc_sli4.h +++ b/drivers/scsi/lpfc/lpfc_sli4.h @@ -44,6 +44,11 @@ #define LPFC_HBA_HDWQ_MAX 128 #define LPFC_HBA_HDWQ_DEF 0
+/* FCP MQ queue count limiting */ +#define LPFC_FCP_MQ_THRESHOLD_MIN 0 +#define LPFC_FCP_MQ_THRESHOLD_MAX 128 +#define LPFC_FCP_MQ_THRESHOLD_DEF 8 + /* Common buffer size to accomidate SCSI and NVME IO buffers */ #define LPFC_COMMON_IO_BUF_SZ 768
[ Upstream commit d09bc83640d524b8467a660db7b1d15e6562a1de ]
Simplify the ring buffer handling with the in-place API.
Also avoid the dynamic allocation and the memory leak in the channel callback function.
Signed-off-by: Dexuan Cui decui@microsoft.com Acked-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/input/serio/hyperv-keyboard.c | 35 +++++---------------------- 1 file changed, 6 insertions(+), 29 deletions(-)
diff --git a/drivers/input/serio/hyperv-keyboard.c b/drivers/input/serio/hyperv-keyboard.c index 8e457e50f837e..770e36d0c66fb 100644 --- a/drivers/input/serio/hyperv-keyboard.c +++ b/drivers/input/serio/hyperv-keyboard.c @@ -237,40 +237,17 @@ static void hv_kbd_handle_received_packet(struct hv_device *hv_dev,
static void hv_kbd_on_channel_callback(void *context) { + struct vmpacket_descriptor *desc; struct hv_device *hv_dev = context; - void *buffer; - int bufferlen = 0x100; /* Start with sensible size */ u32 bytes_recvd; u64 req_id; - int error;
- buffer = kmalloc(bufferlen, GFP_ATOMIC); - if (!buffer) - return; - - while (1) { - error = vmbus_recvpacket_raw(hv_dev->channel, buffer, bufferlen, - &bytes_recvd, &req_id); - switch (error) { - case 0: - if (bytes_recvd == 0) { - kfree(buffer); - return; - } - - hv_kbd_handle_received_packet(hv_dev, buffer, - bytes_recvd, req_id); - break; + foreach_vmbus_pkt(desc, hv_dev->channel) { + bytes_recvd = desc->len8 * 8; + req_id = desc->trans_id;
- case -ENOBUFS: - kfree(buffer); - /* Handle large packet */ - bufferlen = bytes_recvd; - buffer = kmalloc(bytes_recvd, GFP_ATOMIC); - if (!buffer) - return; - break; - } + hv_kbd_handle_received_packet(hv_dev, desc, bytes_recvd, + req_id); } }
[ Upstream commit 89eb4d8d25722a0a0194cf7fa47ba602e32a6da7 ]
When building hv_kvp_daemon GCC-8.3 complains:
hv_kvp_daemon.c: In function ‘kvp_get_ip_info.constprop’: hv_kvp_daemon.c:812:30: warning: ‘ip_buffer’ may be used uninitialized in this function [-Wmaybe-uninitialized] struct hv_kvp_ipaddr_value *ip_buffer;
this seems to be a false positive: we only use ip_buffer when op == KVP_OP_GET_IP_INFO and it is only unset when op == KVP_OP_ENUMERATE.
Silence the warning by initializing ip_buffer to NULL.
Signed-off-by: Vitaly Kuznetsov vkuznets@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/hv/hv_kvp_daemon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c index 0ce50c319cfd6..ef8a82f29f024 100644 --- a/tools/hv/hv_kvp_daemon.c +++ b/tools/hv/hv_kvp_daemon.c @@ -809,7 +809,7 @@ kvp_get_ip_info(int family, char *if_name, int op, int sn_offset = 0; int error = 0; char *buffer; - struct hv_kvp_ipaddr_value *ip_buffer; + struct hv_kvp_ipaddr_value *ip_buffer = NULL; char cidr_mask[5]; /* /xyz */ int weight; int i;
[ Upstream commit 504db087aaccdb32af61539916409f7dca31ceb5 ]
nvme_state_set_live() making a path available triggers requeue_work in order to resubmit requests that ended up on requeue_list when no paths were available.
This requeue_work may race with concurrent nvme_ns_head_make_request() that do not observe the live path yet. Such concurrent requests may by made by either: - New IO submission. - Requeue_work triggered by nvme_failover_req() or another ana_work.
A race may cause requeue_work capture the state of requeue_list before more requests get onto the list. These requests will stay on the list forever unless requeue_work is triggered again.
In order to prevent such race, nvme_state_set_live() should synchronize_srcu(&head->srcu) before triggering the requeue_work and prevent nvme_ns_head_make_request referencing an old snapshot of the path list.
Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Anton Eidelman anton@lightbitslabs.com Signed-off-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/multipath.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 747c0d4f9ff5b..304aa8a65f2f8 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -420,6 +420,7 @@ static void nvme_mpath_set_live(struct nvme_ns *ns) srcu_read_unlock(&head->srcu, srcu_idx); }
+ synchronize_srcu(&ns->head->srcu); kblockd_schedule_work(&ns->head->requeue_work); }
[ Upstream commit a89fcca8185633993018dc081d6b021d005e6d0b ]
Commit 1b1031ca63b2 ("nvme: validate cntlid during controller initialisation") introduced a validation for controllers with duplicate cntlid that runs on nvme_init_subsystem(). The problem is that the validation relies on ctrl->cntlid, and this value is assigned (from id_ctrl value) after the call for nvme_init_subsystem() in nvme_init_identify() for non-fabrics scenario. That leads to ctrl->cntlid always being 0 in case we have a physical set of controllers in the same subsystem.
This patch fixes that by loading the discovered cntlid id_ctrl value into ctrl->cntlid before the subsystem initialization, only for the non-fabrics case. The patch was tested with emulated nvme devices (qemu) having two controllers in a single subsystem. Without the patch, we couldn't make it work failing in the duplicate check; when running with the patch, we could see the subsystem holding both controllers.
For the fabrics case we see ctrl->cntlid has a more intricate relation with the admin connect, so we didn't change that.
Fixes: 1b1031ca63b2 ("nvme: validate cntlid during controller initialisation") Signed-off-by: Guilherme G. Piccoli gpiccoli@canonical.com Reviewed-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/core.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 601509b3251ae..963b4c6309b9c 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -2549,6 +2549,9 @@ int nvme_init_identify(struct nvme_ctrl *ctrl) goto out_free; }
+ if (!(ctrl->ops->flags & NVME_F_FABRICS)) + ctrl->cntlid = le16_to_cpu(id->cntlid); + if (!ctrl->identified) { int i;
@@ -2649,7 +2652,6 @@ int nvme_init_identify(struct nvme_ctrl *ctrl) goto out_free; } } else { - ctrl->cntlid = le16_to_cpu(id->cntlid); ctrl->hmpre = le32_to_cpu(id->hmpre); ctrl->hmmin = le32_to_cpu(id->hmmin); ctrl->hmminds = le32_to_cpu(id->hmminds);
[ Upstream commit a7bfb93f0211b4a2f1ffeeb259ed6206bac30460 ]
In cma_init, if cma_configfs_init fails, need to free the previously memory and return fail, otherwise will trigger null-ptr-deref Read in cma_cleanup.
cma_cleanup cma_configfs_exit configfs_unregister_subsystem
Fixes: 045959db65c6 ("IB/cma: Add configfs for rdma_cm") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: zhengbin zhengbin13@huawei.com Reviewed-by: Parav Pandit parav@mellanox.com Link: https://lore.kernel.org/r/1566188859-103051-1-git-send-email-zhengbin13@huaw... Signed-off-by: Doug Ledford dledford@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/core/cma.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 19f1730a4f244..a68d0ccf67a43 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -4724,10 +4724,14 @@ static int __init cma_init(void) if (ret) goto err;
- cma_configfs_init(); + ret = cma_configfs_init(); + if (ret) + goto err_ib;
return 0;
+err_ib: + ib_unregister_client(&cma_client); err: unregister_netdevice_notifier(&cma_nb); ib_sa_unregister_client(&sa_client);
[ Upstream commit 5c1baaa82cea2c815a5180ded402a7cd455d1810 ]
In mlx4_ib_alloc_pv_bufs(), 'tun_qp->tx_ring' is allocated through kcalloc(). However, it is not always deallocated in the following execution if an error occurs, leading to memory leaks. To fix this issue, free 'tun_qp->tx_ring' whenever an error occurs.
Signed-off-by: Wenwen Wang wenwen@cs.uga.edu Acked-by: Leon Romanovsky leonro@mellanox.com Link: https://lore.kernel.org/r/1566159781-4642-1-git-send-email-wenwen@cs.uga.edu Signed-off-by: Doug Ledford dledford@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/mlx4/mad.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c index 68c951491a08a..57079110af9b5 100644 --- a/drivers/infiniband/hw/mlx4/mad.c +++ b/drivers/infiniband/hw/mlx4/mad.c @@ -1677,8 +1677,6 @@ tx_err: tx_buf_size, DMA_TO_DEVICE); kfree(tun_qp->tx_ring[i].buf.addr); } - kfree(tun_qp->tx_ring); - tun_qp->tx_ring = NULL; i = MLX4_NUM_TUNNEL_BUFS; err: while (i > 0) { @@ -1687,6 +1685,8 @@ err: rx_buf_size, DMA_FROM_DEVICE); kfree(tun_qp->ring[i].addr); } + kfree(tun_qp->tx_ring); + tun_qp->tx_ring = NULL; kfree(tun_qp->ring); tun_qp->ring = NULL; return -ENOMEM;
[ Upstream commit b08afa064c320e5d85cdc27228426b696c4c8dae ]
In fault_opcodes_read(), 'data' is not deallocated if debugfs_file_get() fails, leading to a memory leak. To fix this bug, introduce the 'free_data' label to free 'data' before returning the error.
Signed-off-by: Wenwen Wang wenwen@cs.uga.edu Reviewed-by: Leon Romanovsky leonro@mellanox.com Acked-by: Dennis Dalessandro dennis.dalessandro@intel.com Link: https://lore.kernel.org/r/1566156571-4335-1-git-send-email-wenwen@cs.uga.edu Signed-off-by: Doug Ledford dledford@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/hfi1/fault.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/hfi1/fault.c b/drivers/infiniband/hw/hfi1/fault.c index 93613e5def9b7..814324d172950 100644 --- a/drivers/infiniband/hw/hfi1/fault.c +++ b/drivers/infiniband/hw/hfi1/fault.c @@ -214,7 +214,7 @@ static ssize_t fault_opcodes_read(struct file *file, char __user *buf, return -ENOMEM; ret = debugfs_file_get(file->f_path.dentry); if (unlikely(ret)) - return ret; + goto free_data; bit = find_first_bit(fault->opcodes, bitsize); while (bit < bitsize) { zero = find_next_zero_bit(fault->opcodes, bitsize, bit); @@ -232,6 +232,7 @@ static ssize_t fault_opcodes_read(struct file *file, char __user *buf, data[size - 1] = '\n'; data[size] = '\0'; ret = simple_read_from_buffer(buf, len, pos, data, size); +free_data: kfree(data); return ret; }
[ Upstream commit 2323d7baab2b18d87d9bc267452e387aa9f0060a ]
In fault_opcodes_write(), 'data' is allocated through kcalloc(). However, it is not deallocated in the following execution if an error occurs, leading to memory leaks. To fix this issue, introduce the 'free_data' label to free 'data' before returning the error.
Signed-off-by: Wenwen Wang wenwen@cs.uga.edu Reviewed-by: Leon Romanovsky leonro@mellanox.com Acked-by: Dennis Dalessandro dennis.dalessandro@intel.com Link: https://lore.kernel.org/r/1566154486-3713-1-git-send-email-wenwen@cs.uga.edu Signed-off-by: Doug Ledford dledford@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/hfi1/fault.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/fault.c b/drivers/infiniband/hw/hfi1/fault.c index 814324d172950..986c12153e62e 100644 --- a/drivers/infiniband/hw/hfi1/fault.c +++ b/drivers/infiniband/hw/hfi1/fault.c @@ -141,12 +141,14 @@ static ssize_t fault_opcodes_write(struct file *file, const char __user *buf, if (!data) return -ENOMEM; copy = min(len, datalen - 1); - if (copy_from_user(data, buf, copy)) - return -EFAULT; + if (copy_from_user(data, buf, copy)) { + ret = -EFAULT; + goto free_data; + }
ret = debugfs_file_get(file->f_path.dentry); if (unlikely(ret)) - return ret; + goto free_data; ptr = data; token = ptr; for (ptr = data; *ptr; ptr = end + 1, token = ptr) { @@ -195,6 +197,7 @@ static ssize_t fault_opcodes_write(struct file *file, const char __user *buf, ret = len;
debugfs_file_put(file->f_path.dentry); +free_data: kfree(data); return ret; }
[ Upstream commit 54577e5018a8c0cb79c9a0fa118a55c68715d398 ]
state_test and smm_test are failing on older processors that do not have xcr0. This is because on those processor KVM does provide support for KVM_GET/SET_XSAVE (to avoid having to rely on the older KVM_GET/SET_FPU) but not for KVM_GET/SET_XCRS.
Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../testing/selftests/kvm/lib/x86_64/processor.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c index d2ad85fb01ac0..5f1ba3da2dbd3 100644 --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c @@ -1059,9 +1059,11 @@ struct kvm_x86_state *vcpu_save_state(struct kvm_vm *vm, uint32_t vcpuid) TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_XSAVE, r: %i", r);
- r = ioctl(vcpu->fd, KVM_GET_XCRS, &state->xcrs); - TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_XCRS, r: %i", - r); + if (kvm_check_cap(KVM_CAP_XCRS)) { + r = ioctl(vcpu->fd, KVM_GET_XCRS, &state->xcrs); + TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_XCRS, r: %i", + r); + }
r = ioctl(vcpu->fd, KVM_GET_SREGS, &state->sregs); TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_SREGS, r: %i", @@ -1102,9 +1104,11 @@ void vcpu_load_state(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_x86_state *s TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_XSAVE, r: %i", r);
- r = ioctl(vcpu->fd, KVM_SET_XCRS, &state->xcrs); - TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_XCRS, r: %i", - r); + if (kvm_check_cap(KVM_CAP_XCRS)) { + r = ioctl(vcpu->fd, KVM_SET_XCRS, &state->xcrs); + TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_XCRS, r: %i", + r); + }
r = ioctl(vcpu->fd, KVM_SET_SREGS, &state->sregs); TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_SREGS, r: %i",
[ Upstream commit e4427372398c31f57450565de277f861a4db5b3b ]
test_msr_platform_info_disabled() generates EXIT_SHUTDOWN but VMCB state is undefined after that so an attempt to launch this guest again from test_msr_platform_info_enabled() fails. Reorder the tests to make test pass.
Signed-off-by: Vitaly Kuznetsov vkuznets@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/kvm/x86_64/platform_info_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/x86_64/platform_info_test.c b/tools/testing/selftests/kvm/x86_64/platform_info_test.c index 40050e44ec0ac..f9334bd3cce9f 100644 --- a/tools/testing/selftests/kvm/x86_64/platform_info_test.c +++ b/tools/testing/selftests/kvm/x86_64/platform_info_test.c @@ -99,8 +99,8 @@ int main(int argc, char *argv[]) msr_platform_info = vcpu_get_msr(vm, VCPU_ID, MSR_PLATFORM_INFO); vcpu_set_msr(vm, VCPU_ID, MSR_PLATFORM_INFO, msr_platform_info | MSR_PLATFORM_INFO_MAX_TURBO_RATIO); - test_msr_platform_info_disabled(vm); test_msr_platform_info_enabled(vm); + test_msr_platform_info_disabled(vm); vcpu_set_msr(vm, VCPU_ID, MSR_PLATFORM_INFO, msr_platform_info);
kvm_vm_free(vm);
[ Upstream commit 1a701ea924815b0518733aa8d5d05c1f6fa87062 ]
Error out if the AMDGPU_CS ioctl is called with multiple SYNCOBJ_OUT and/or TIMELINE_SIGNAL chunks, since otherwise the last chunk wins while the allocated array as well as the reference counts of sync objects are leaked.
Signed-off-by: Nicolai Hähnle nicolai.haehnle@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index fe028561dc0e6..bc40d6eabce7d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1192,6 +1192,9 @@ static int amdgpu_cs_process_syncobj_out_dep(struct amdgpu_cs_parser *p, num_deps = chunk->length_dw * 4 / sizeof(struct drm_amdgpu_cs_chunk_sem);
+ if (p->post_deps) + return -EINVAL; + p->post_deps = kmalloc_array(num_deps, sizeof(*p->post_deps), GFP_KERNEL); p->num_post_deps = 0; @@ -1215,8 +1218,7 @@ static int amdgpu_cs_process_syncobj_out_dep(struct amdgpu_cs_parser *p,
static int amdgpu_cs_process_syncobj_timeline_out_dep(struct amdgpu_cs_parser *p, - struct amdgpu_cs_chunk - *chunk) + struct amdgpu_cs_chunk *chunk) { struct drm_amdgpu_cs_chunk_syncobj *syncobj_deps; unsigned num_deps; @@ -1226,6 +1228,9 @@ static int amdgpu_cs_process_syncobj_timeline_out_dep(struct amdgpu_cs_parser *p num_deps = chunk->length_dw * 4 / sizeof(struct drm_amdgpu_cs_chunk_syncobj);
+ if (p->post_deps) + return -EINVAL; + p->post_deps = kmalloc_array(num_deps, sizeof(*p->post_deps), GFP_KERNEL); p->num_post_deps = 0;
[ Upstream commit 86968ef21596515958d5f0a40233d02be78ecec0 ]
Calling ceph_buffer_put() in __ceph_setxattr() may end up freeing the i_xattrs.prealloc_blob buffer while holding the i_ceph_lock. This can be fixed by postponing the call until later, when the lock is released.
The following backtrace was triggered by fstests generic/117.
BUG: sleeping function called from invalid context at mm/vmalloc.c:2283 in_atomic(): 1, irqs_disabled(): 0, pid: 650, name: fsstress 3 locks held by fsstress/650: #0: 00000000870a0fe8 (sb_writers#8){.+.+}, at: mnt_want_write+0x20/0x50 #1: 00000000ba0c4c74 (&type->i_mutex_dir_key#6){++++}, at: vfs_setxattr+0x55/0xa0 #2: 000000008dfbb3f2 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: __ceph_setxattr+0x297/0x810 CPU: 1 PID: 650 Comm: fsstress Not tainted 5.2.0+ #437 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x67/0x90 ___might_sleep.cold+0x9f/0xb1 vfree+0x4b/0x60 ceph_buffer_release+0x1b/0x60 __ceph_setxattr+0x2b4/0x810 __vfs_setxattr+0x66/0x80 __vfs_setxattr_noperm+0x59/0xf0 vfs_setxattr+0x81/0xa0 setxattr+0x115/0x230 ? filename_lookup+0xc9/0x140 ? rcu_read_lock_sched_held+0x74/0x80 ? rcu_sync_lockdep_assert+0x2e/0x60 ? __sb_start_write+0x142/0x1a0 ? mnt_want_write+0x20/0x50 path_setxattr+0xba/0xd0 __x64_sys_lsetxattr+0x24/0x30 do_syscall_64+0x50/0x1c0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7ff23514359a
Signed-off-by: Luis Henriques lhenriques@suse.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/xattr.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c index 0619adbcbe14c..8382299fc2d84 100644 --- a/fs/ceph/xattr.c +++ b/fs/ceph/xattr.c @@ -1028,6 +1028,7 @@ int __ceph_setxattr(struct inode *inode, const char *name, struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc; struct ceph_cap_flush *prealloc_cf = NULL; + struct ceph_buffer *old_blob = NULL; int issued; int err; int dirty = 0; @@ -1101,13 +1102,15 @@ retry: struct ceph_buffer *blob;
spin_unlock(&ci->i_ceph_lock); - dout(" preaallocating new blob size=%d\n", required_blob_size); + ceph_buffer_put(old_blob); /* Shouldn't be required */ + dout(" pre-allocating new blob size=%d\n", required_blob_size); blob = ceph_buffer_new(required_blob_size, GFP_NOFS); if (!blob) goto do_sync_unlocked; spin_lock(&ci->i_ceph_lock); + /* prealloc_blob can't be released while holding i_ceph_lock */ if (ci->i_xattrs.prealloc_blob) - ceph_buffer_put(ci->i_xattrs.prealloc_blob); + old_blob = ci->i_xattrs.prealloc_blob; ci->i_xattrs.prealloc_blob = blob; goto retry; } @@ -1123,6 +1126,7 @@ retry: }
spin_unlock(&ci->i_ceph_lock); + ceph_buffer_put(old_blob); if (lock_snap_rwsem) up_read(&mdsc->snap_rwsem); if (dirty)
[ Upstream commit 12fe3dda7ed89c95cc0ef7abc001ad1ad3e092f8 ]
Calling ceph_buffer_put() in __ceph_build_xattrs_blob() may result in freeing the i_xattrs.blob buffer while holding the i_ceph_lock. This can be fixed by having this function returning the old blob buffer and have the callers of this function freeing it when the lock is released.
The following backtrace was triggered by fstests generic/117.
BUG: sleeping function called from invalid context at mm/vmalloc.c:2283 in_atomic(): 1, irqs_disabled(): 0, pid: 649, name: fsstress 4 locks held by fsstress/649: #0: 00000000a7478e7e (&type->s_umount_key#19){++++}, at: iterate_supers+0x77/0xf0 #1: 00000000f8de1423 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: ceph_check_caps+0x7b/0xc60 #2: 00000000562f2b27 (&s->s_mutex){+.+.}, at: ceph_check_caps+0x3bd/0xc60 #3: 00000000f83ce16a (&mdsc->snap_rwsem){++++}, at: ceph_check_caps+0x3ed/0xc60 CPU: 1 PID: 649 Comm: fsstress Not tainted 5.2.0+ #439 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x67/0x90 ___might_sleep.cold+0x9f/0xb1 vfree+0x4b/0x60 ceph_buffer_release+0x1b/0x60 __ceph_build_xattrs_blob+0x12b/0x170 __send_cap+0x302/0x540 ? __lock_acquire+0x23c/0x1e40 ? __mark_caps_flushing+0x15c/0x280 ? _raw_spin_unlock+0x24/0x30 ceph_check_caps+0x5f0/0xc60 ceph_flush_dirty_caps+0x7c/0x150 ? __ia32_sys_fdatasync+0x20/0x20 ceph_sync_fs+0x5a/0x130 iterate_supers+0x8f/0xf0 ksys_sync+0x4f/0xb0 __ia32_sys_sync+0xa/0x10 do_syscall_64+0x50/0x1c0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fc6409ab617
Signed-off-by: Luis Henriques lhenriques@suse.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/caps.c | 5 ++++- fs/ceph/snap.c | 4 +++- fs/ceph/super.h | 2 +- fs/ceph/xattr.c | 11 ++++++++--- 4 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index 7754d76791228..622467e47cde8 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -1305,6 +1305,7 @@ static int __send_cap(struct ceph_mds_client *mdsc, struct ceph_cap *cap, { struct ceph_inode_info *ci = cap->ci; struct inode *inode = &ci->vfs_inode; + struct ceph_buffer *old_blob = NULL; struct cap_msg_args arg; int held, revoking; int wake = 0; @@ -1369,7 +1370,7 @@ static int __send_cap(struct ceph_mds_client *mdsc, struct ceph_cap *cap, ci->i_requested_max_size = arg.max_size;
if (flushing & CEPH_CAP_XATTR_EXCL) { - __ceph_build_xattrs_blob(ci); + old_blob = __ceph_build_xattrs_blob(ci); arg.xattr_version = ci->i_xattrs.version; arg.xattr_buf = ci->i_xattrs.blob; } else { @@ -1404,6 +1405,8 @@ static int __send_cap(struct ceph_mds_client *mdsc, struct ceph_cap *cap,
spin_unlock(&ci->i_ceph_lock);
+ ceph_buffer_put(old_blob); + ret = send_cap_msg(&arg); if (ret < 0) { dout("error sending cap msg, must requeue %p\n", inode); diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c index 72c6c022f02bd..213bc1475e91f 100644 --- a/fs/ceph/snap.c +++ b/fs/ceph/snap.c @@ -464,6 +464,7 @@ void ceph_queue_cap_snap(struct ceph_inode_info *ci) struct inode *inode = &ci->vfs_inode; struct ceph_cap_snap *capsnap; struct ceph_snap_context *old_snapc, *new_snapc; + struct ceph_buffer *old_blob = NULL; int used, dirty;
capsnap = kzalloc(sizeof(*capsnap), GFP_NOFS); @@ -540,7 +541,7 @@ void ceph_queue_cap_snap(struct ceph_inode_info *ci) capsnap->gid = inode->i_gid;
if (dirty & CEPH_CAP_XATTR_EXCL) { - __ceph_build_xattrs_blob(ci); + old_blob = __ceph_build_xattrs_blob(ci); capsnap->xattr_blob = ceph_buffer_get(ci->i_xattrs.blob); capsnap->xattr_version = ci->i_xattrs.version; @@ -583,6 +584,7 @@ update_snapc: } spin_unlock(&ci->i_ceph_lock);
+ ceph_buffer_put(old_blob); kfree(capsnap); ceph_put_snap_context(old_snapc); } diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 1d313d0536f9d..38b42d7594b67 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -924,7 +924,7 @@ extern int ceph_getattr(const struct path *path, struct kstat *stat, int __ceph_setxattr(struct inode *, const char *, const void *, size_t, int); ssize_t __ceph_getxattr(struct inode *, const char *, void *, size_t); extern ssize_t ceph_listxattr(struct dentry *, char *, size_t); -extern void __ceph_build_xattrs_blob(struct ceph_inode_info *ci); +extern struct ceph_buffer *__ceph_build_xattrs_blob(struct ceph_inode_info *ci); extern void __ceph_destroy_xattrs(struct ceph_inode_info *ci); extern void __init ceph_xattr_init(void); extern void ceph_xattr_exit(void); diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c index 8382299fc2d84..9772db01720b9 100644 --- a/fs/ceph/xattr.c +++ b/fs/ceph/xattr.c @@ -752,12 +752,15 @@ static int __get_required_blob_size(struct ceph_inode_info *ci, int name_size,
/* * If there are dirty xattrs, reencode xattrs into the prealloc_blob - * and swap into place. + * and swap into place. It returns the old i_xattrs.blob (or NULL) so + * that it can be freed by the caller as the i_ceph_lock is likely to be + * held. */ -void __ceph_build_xattrs_blob(struct ceph_inode_info *ci) +struct ceph_buffer *__ceph_build_xattrs_blob(struct ceph_inode_info *ci) { struct rb_node *p; struct ceph_inode_xattr *xattr = NULL; + struct ceph_buffer *old_blob = NULL; void *dest;
dout("__build_xattrs_blob %p\n", &ci->vfs_inode); @@ -788,12 +791,14 @@ void __ceph_build_xattrs_blob(struct ceph_inode_info *ci) dest - ci->i_xattrs.prealloc_blob->vec.iov_base;
if (ci->i_xattrs.blob) - ceph_buffer_put(ci->i_xattrs.blob); + old_blob = ci->i_xattrs.blob; ci->i_xattrs.blob = ci->i_xattrs.prealloc_blob; ci->i_xattrs.prealloc_blob = NULL; ci->i_xattrs.dirty = false; ci->i_xattrs.version++; } + + return old_blob; }
static inline int __get_request_mask(struct inode *in) {
[ Upstream commit af8a85a41734f37b67ba8ce69d56b685bee4ac48 ]
Calling ceph_buffer_put() in fill_inode() may result in freeing the i_xattrs.blob buffer while holding the i_ceph_lock. This can be fixed by postponing the call until later, when the lock is released.
The following backtrace was triggered by fstests generic/070.
BUG: sleeping function called from invalid context at mm/vmalloc.c:2283 in_atomic(): 1, irqs_disabled(): 0, pid: 3852, name: kworker/0:4 6 locks held by kworker/0:4/3852: #0: 000000004270f6bb ((wq_completion)ceph-msgr){+.+.}, at: process_one_work+0x1b8/0x5f0 #1: 00000000eb420803 ((work_completion)(&(&con->work)->work)){+.+.}, at: process_one_work+0x1b8/0x5f0 #2: 00000000be1c53a4 (&s->s_mutex){+.+.}, at: dispatch+0x288/0x1476 #3: 00000000559cb958 (&mdsc->snap_rwsem){++++}, at: dispatch+0x2eb/0x1476 #4: 000000000d5ebbae (&req->r_fill_mutex){+.+.}, at: dispatch+0x2fc/0x1476 #5: 00000000a83d0514 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: fill_inode.isra.0+0xf8/0xf70 CPU: 0 PID: 3852 Comm: kworker/0:4 Not tainted 5.2.0+ #441 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014 Workqueue: ceph-msgr ceph_con_workfn Call Trace: dump_stack+0x67/0x90 ___might_sleep.cold+0x9f/0xb1 vfree+0x4b/0x60 ceph_buffer_release+0x1b/0x60 fill_inode.isra.0+0xa9b/0xf70 ceph_fill_trace+0x13b/0xc70 ? dispatch+0x2eb/0x1476 dispatch+0x320/0x1476 ? __mutex_unlock_slowpath+0x4d/0x2a0 ceph_con_workfn+0xc97/0x2ec0 ? process_one_work+0x1b8/0x5f0 process_one_work+0x244/0x5f0 worker_thread+0x4d/0x3e0 kthread+0x105/0x140 ? process_one_work+0x5f0/0x5f0 ? kthread_park+0x90/0x90 ret_from_fork+0x3a/0x50
Signed-off-by: Luis Henriques lhenriques@suse.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/inode.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 3c7a32779574b..ca3821b0309f7 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -743,6 +743,7 @@ static int fill_inode(struct inode *inode, struct page *locked_page, int issued, new_issued, info_caps; struct timespec64 mtime, atime, ctime; struct ceph_buffer *xattr_blob = NULL; + struct ceph_buffer *old_blob = NULL; struct ceph_string *pool_ns = NULL; struct ceph_cap *new_cap = NULL; int err = 0; @@ -883,7 +884,7 @@ static int fill_inode(struct inode *inode, struct page *locked_page, if ((ci->i_xattrs.version == 0 || !(issued & CEPH_CAP_XATTR_EXCL)) && le64_to_cpu(info->xattr_version) > ci->i_xattrs.version) { if (ci->i_xattrs.blob) - ceph_buffer_put(ci->i_xattrs.blob); + old_blob = ci->i_xattrs.blob; ci->i_xattrs.blob = xattr_blob; if (xattr_blob) memcpy(ci->i_xattrs.blob->vec.iov_base, @@ -1023,8 +1024,8 @@ static int fill_inode(struct inode *inode, struct page *locked_page, out: if (new_cap) ceph_put_cap(mdsc, new_cap); - if (xattr_blob) - ceph_buffer_put(xattr_blob); + ceph_buffer_put(old_blob); + ceph_buffer_put(xattr_blob); ceph_put_string(pool_ns); return err; }
[ Upstream commit 2113c5f62b7423e4a72b890bd479704aa85c81ba ]
If after an MMIO exit to userspace a VCPU is immediately run with an immediate_exit request, such as when a signal is delivered or an MMIO emulation completion is needed, then the VCPU completes the MMIO emulation and immediately returns to userspace. As the exit_reason does not get changed from KVM_EXIT_MMIO in these cases we have to be careful not to complete the MMIO emulation again, when the VCPU is eventually run again, because the emulation does an instruction skip (and doing too many skips would be a waste of guest code :-) We need to use additional VCPU state to track if the emulation is complete. As luck would have it, we already have 'mmio_needed', which even appears to be used in this way by other architectures already.
Fixes: 0d640732dbeb ("arm64: KVM: Skip MMIO insn after emulation") Acked-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Andrew Jones drjones@redhat.com Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- virt/kvm/arm/mmio.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c index a8a6a0c883f1b..6af5c91337f25 100644 --- a/virt/kvm/arm/mmio.c +++ b/virt/kvm/arm/mmio.c @@ -86,6 +86,12 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run) unsigned int len; int mask;
+ /* Detect an already handled MMIO return */ + if (unlikely(!vcpu->mmio_needed)) + return 0; + + vcpu->mmio_needed = 0; + if (!run->mmio.is_write) { len = run->mmio.len; if (len > sizeof(unsigned long)) @@ -188,6 +194,7 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, run->mmio.is_write = is_write; run->mmio.phys_addr = fault_ipa; run->mmio.len = len; + vcpu->mmio_needed = 1;
if (!ret) { /* We handled the access successfully in the kernel. */
[ Upstream commit a5fb8e6c02d6a518fb2b1a2b8c2471fa77b69436 ]
Fix a leak on the cell refcount in afs_lookup_cell_rcu() due to non-clearance of the default error in the case a NULL cell name is passed and the workstation default cell is used.
Also put a bit at the end to make sure we don't leak a cell ref if we're going to be returning an error.
This leak results in an assertion like the following when the kafs module is unloaded:
AFS: Assertion failed 2 == 1 is false 0x2 == 0x1 is false ------------[ cut here ]------------ kernel BUG at fs/afs/cell.c:770! ... RIP: 0010:afs_manage_cells+0x220/0x42f [kafs] ... process_one_work+0x4c2/0x82c ? pool_mayday_timeout+0x1e1/0x1e1 ? do_raw_spin_lock+0x134/0x175 worker_thread+0x336/0x4a6 ? rescuer_thread+0x4af/0x4af kthread+0x1de/0x1ee ? kthread_park+0xd4/0xd4 ret_from_fork+0x24/0x30
Fixes: 989782dcdc91 ("afs: Overhaul cell database management") Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/afs/cell.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/fs/afs/cell.c b/fs/afs/cell.c index a2a87117d2626..fd5133e26a38b 100644 --- a/fs/afs/cell.c +++ b/fs/afs/cell.c @@ -74,6 +74,7 @@ struct afs_cell *afs_lookup_cell_rcu(struct afs_net *net, cell = rcu_dereference_raw(net->ws_cell); if (cell) { afs_get_cell(cell); + ret = 0; break; } ret = -EDESTADDRREQ; @@ -108,6 +109,9 @@ struct afs_cell *afs_lookup_cell_rcu(struct afs_net *net,
done_seqretry(&net->cells_lock, seq);
+ if (ret != 0 && cell) + afs_put_cell(net, cell); + return ret == 0 ? cell : ERR_PTR(ret); }
[ Upstream commit c4c613ff08d92e72bf64a65ec35a2c3aa1cfcd06 ]
The afs_lookup trace event can cause the following:
[ 216.576777] BUG: kernel NULL pointer dereference, address: 000000000000023b [ 216.576803] #PF: supervisor read access in kernel mode [ 216.576813] #PF: error_code(0x0000) - not-present page ... [ 216.576913] RIP: 0010:trace_event_raw_event_afs_lookup+0x9e/0x1c0 [kafs]
If the inode from afs_do_lookup() is an error other than ENOENT, or if it is ENOENT and afs_try_auto_mntpt() returns an error, the trace event will try to dereference the error pointer as a valid pointer.
Use IS_ERR_OR_NULL to only pass a valid pointer for the trace, or NULL.
Ideally the trace would include the error value, but for now just avoid the oops.
Fixes: 80548b03991f ("afs: Add more tracepoints") Signed-off-by: Marc Dionne marc.dionne@auristor.com Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/afs/dir.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/afs/dir.c b/fs/afs/dir.c index 9620f19308f58..9bd5c067d55d1 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -960,7 +960,8 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry, inode ? AFS_FS_I(inode) : NULL); } else { trace_afs_lookup(dvnode, &dentry->d_name, - inode ? AFS_FS_I(inode) : NULL); + IS_ERR_OR_NULL(inode) ? NULL + : AFS_FS_I(inode)); } return d; }
[ Upstream commit 7533be858f5b9a036b9f91556a3ed70786abca8e ]
It seems that 'yfs_RXYFSStoreOpaqueACL2' should be use in yfs_fs_store_opaque_acl2().
Fixes: f5e4546347bc ("afs: Implement YFS ACL setting") Signed-off-by: YueHaibing yuehaibing@huawei.com Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/afs/yfsclient.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c index 18722aaeda33a..a1baf3f1f14d1 100644 --- a/fs/afs/yfsclient.c +++ b/fs/afs/yfsclient.c @@ -2155,7 +2155,7 @@ int yfs_fs_store_opaque_acl2(struct afs_fs_cursor *fc, const struct afs_acl *acl key_serial(fc->key), vnode->fid.vid, vnode->fid.vnode);
size = round_up(acl->size, 4); - call = afs_alloc_flat_call(net, &yfs_RXYFSStoreStatus, + call = afs_alloc_flat_call(net, &yfs_RXYFSStoreOpaqueACL2, sizeof(__be32) * 2 + sizeof(struct yfs_xdr_YFSFid) + sizeof(__be32) + size,
[ Upstream commit d37b1e534071ab1983e7c85273234b132c77591a ]
Driver copies FW commands to the HW queue as units of 16 bytes. Some of the command structures are not exact multiple of 16. So while copying the data from those structures, the stack out of bounds messages are reported by KASAN. The following error is reported.
[ 1337.530155] ================================================================== [ 1337.530277] BUG: KASAN: stack-out-of-bounds in bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530413] Read of size 16 at addr ffff888725477a48 by task rmmod/2785
[ 1337.530540] CPU: 5 PID: 2785 Comm: rmmod Tainted: G OE 5.2.0-rc6+ #75 [ 1337.530541] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014 [ 1337.530542] Call Trace: [ 1337.530548] dump_stack+0x5b/0x90 [ 1337.530556] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530560] print_address_description+0x65/0x22e [ 1337.530568] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530575] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530577] __kasan_report.cold.3+0x37/0x77 [ 1337.530581] ? _raw_write_trylock+0x10/0xe0 [ 1337.530588] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530590] kasan_report+0xe/0x20 [ 1337.530592] memcpy+0x1f/0x50 [ 1337.530600] bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530608] ? bnxt_qplib_creq_irq+0xa0/0xa0 [bnxt_re] [ 1337.530611] ? xas_create+0x3aa/0x5f0 [ 1337.530613] ? xas_start+0x77/0x110 [ 1337.530615] ? xas_clear_mark+0x34/0xd0 [ 1337.530623] bnxt_qplib_free_mrw+0x104/0x1a0 [bnxt_re] [ 1337.530631] ? bnxt_qplib_destroy_ah+0x110/0x110 [bnxt_re] [ 1337.530633] ? bit_wait_io_timeout+0xc0/0xc0 [ 1337.530641] bnxt_re_dealloc_mw+0x2c/0x60 [bnxt_re] [ 1337.530648] bnxt_re_destroy_fence_mr+0x77/0x1d0 [bnxt_re] [ 1337.530655] bnxt_re_dealloc_pd+0x25/0x60 [bnxt_re] [ 1337.530677] ib_dealloc_pd_user+0xbe/0xe0 [ib_core] [ 1337.530683] srpt_remove_one+0x5de/0x690 [ib_srpt] [ 1337.530689] ? __srpt_close_all_ch+0xc0/0xc0 [ib_srpt] [ 1337.530692] ? xa_load+0x87/0xe0 ... [ 1337.530840] do_syscall_64+0x6d/0x1f0 [ 1337.530843] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1337.530845] RIP: 0033:0x7ff5b389035b [ 1337.530848] Code: 73 01 c3 48 8b 0d 2d 0b 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fd 0a 2c 00 f7 d8 64 89 01 48 [ 1337.530849] RSP: 002b:00007fff83425c28 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 1337.530852] RAX: ffffffffffffffda RBX: 00005596443e6750 RCX: 00007ff5b389035b [ 1337.530853] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005596443e67b8 [ 1337.530854] RBP: 0000000000000000 R08: 00007fff83424ba1 R09: 0000000000000000 [ 1337.530856] R10: 00007ff5b3902960 R11: 0000000000000206 R12: 00007fff83425e50 [ 1337.530857] R13: 00007fff8342673c R14: 00005596443e6260 R15: 00005596443e6750
[ 1337.530885] The buggy address belongs to the page: [ 1337.530962] page:ffffea001c951dc0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 [ 1337.530964] flags: 0x57ffffc0000000() [ 1337.530967] raw: 0057ffffc0000000 0000000000000000 ffffffff1c950101 0000000000000000 [ 1337.530970] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 1337.530970] page dumped because: kasan: bad access detected
[ 1337.530996] Memory state around the buggy address: [ 1337.531072] ffff888725477900: 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 f2 f2 f2 [ 1337.531180] ffff888725477980: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 [ 1337.531288] >ffff888725477a00: 00 f2 f2 f2 f2 f2 f2 00 00 00 f2 00 00 00 00 00 [ 1337.531393] ^ [ 1337.531478] ffff888725477a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 1337.531585] ffff888725477b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 1337.531691] ==================================================================
Fix this by passing the exact size of each FW command to bnxt_qplib_rcfw_send_message as req->cmd_size. Before sending the command to HW, modify the req->cmd_size to number of 16 byte units.
Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Selvin Xavier selvin.xavier@broadcom.com Link: https://lore.kernel.org/r/1566468170-489-1-git-send-email-selvin.xavier@broa... Signed-off-by: Doug Ledford dledford@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 8 +++++++- drivers/infiniband/hw/bnxt_re/qplib_rcfw.h | 11 ++++++++--- 2 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c index 48b04d2f175f9..60c8f76aab33d 100644 --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c @@ -136,6 +136,13 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req, spin_unlock_irqrestore(&cmdq->lock, flags); return -EBUSY; } + + size = req->cmd_size; + /* change the cmd_size to the number of 16byte cmdq unit. + * req->cmd_size is modified here + */ + bnxt_qplib_set_cmd_slots(req); + memset(resp, 0, sizeof(*resp)); crsqe->resp = (struct creq_qp_event *)resp; crsqe->resp->cookie = req->cookie; @@ -150,7 +157,6 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
cmdq_ptr = (struct bnxt_qplib_cmdqe **)cmdq->pbl_ptr; preq = (u8 *)req; - size = req->cmd_size * BNXT_QPLIB_CMDQE_UNITS; do { /* Locate the next cmdq slot */ sw_prod = HWQ_CMP(cmdq->prod, cmdq); diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h index 2138533bb6426..dfeadc192e174 100644 --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h @@ -55,9 +55,7 @@ do { \ memset(&(req), 0, sizeof((req))); \ (req).opcode = CMDQ_BASE_OPCODE_##CMD; \ - (req).cmd_size = (sizeof((req)) + \ - BNXT_QPLIB_CMDQE_UNITS - 1) / \ - BNXT_QPLIB_CMDQE_UNITS; \ + (req).cmd_size = sizeof((req)); \ (req).flags = cpu_to_le16(cmd_flags); \ } while (0)
@@ -95,6 +93,13 @@ static inline u32 bnxt_qplib_cmdqe_cnt_per_pg(u32 depth) BNXT_QPLIB_CMDQE_UNITS); }
+/* Set the cmd_size to a factor of CMDQE unit */ +static inline void bnxt_qplib_set_cmd_slots(struct cmdq_base *req) +{ + req->cmd_size = (req->cmd_size + BNXT_QPLIB_CMDQE_UNITS - 1) / + BNXT_QPLIB_CMDQE_UNITS; +} + #define MAX_CMDQ_IDX(depth) ((depth) - 1)
static inline u32 bnxt_qplib_max_cmdq_idx_per_pg(u32 depth)
[ Upstream commit 48057ed1840fde9239b1e000bea1a0a1f07c5e99 ]
The new API for registering a gpio_irq_chip along with a gpio_chip has a different semantic ordering than the old API which added the irqchip explicitly after registering the gpio_chip.
Move the calls to add the gpio_irq_chip *last* in the function, so that the different hooks setting up OF and ACPI and machine gpio_chips are called *before* we try to register the interrupts, preserving the elder semantic order.
This cropped up in the PL061 driver which used to work fine with no special ACPI quirks, but started to misbehave using the new API.
Fixes: e0d897289813 ("gpio: Implement tighter IRQ chip integration") Cc: Thierry Reding treding@nvidia.com Cc: Grygorii Strashko grygorii.strashko@ti.com Cc: Andy Shevchenko andy.shevchenko@gmail.com Reported-by: Wei Xu xuwei5@hisilicon.com Tested-by: Wei Xu xuwei5@hisilicon.com Reported-by: Andy Shevchenko andy.shevchenko@gmail.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Link: https://lore.kernel.org/r/20190820080527.11796-1-linus.walleij@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpiolib.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 7f9f752011382..f272b51439977 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -1373,21 +1373,13 @@ int gpiochip_add_data_with_key(struct gpio_chip *chip, void *data, if (status) goto err_remove_from_list;
- status = gpiochip_irqchip_init_valid_mask(chip); - if (status) - goto err_remove_from_list; - status = gpiochip_alloc_valid_mask(chip); if (status) - goto err_remove_irqchip_mask; - - status = gpiochip_add_irqchip(chip, lock_key, request_key); - if (status) - goto err_free_gpiochip_mask; + goto err_remove_from_list;
status = of_gpiochip_add(chip); if (status) - goto err_remove_chip; + goto err_free_gpiochip_mask;
status = gpiochip_init_valid_mask(chip); if (status) @@ -1413,6 +1405,14 @@ int gpiochip_add_data_with_key(struct gpio_chip *chip, void *data,
machine_gpiochip_add(chip);
+ status = gpiochip_irqchip_init_valid_mask(chip); + if (status) + goto err_remove_acpi_chip; + + status = gpiochip_add_irqchip(chip, lock_key, request_key); + if (status) + goto err_remove_irqchip_mask; + /* * By first adding the chardev, and then adding the device, * we get a device node entry in sysfs under @@ -1424,21 +1424,21 @@ int gpiochip_add_data_with_key(struct gpio_chip *chip, void *data, if (gpiolib_initialized) { status = gpiochip_setup_dev(gdev); if (status) - goto err_remove_acpi_chip; + goto err_remove_irqchip; } return 0;
+err_remove_irqchip: + gpiochip_irqchip_remove(chip); +err_remove_irqchip_mask: + gpiochip_irqchip_free_valid_mask(chip); err_remove_acpi_chip: acpi_gpiochip_remove(chip); err_remove_of_chip: gpiochip_free_hogs(chip); of_gpiochip_remove(chip); -err_remove_chip: - gpiochip_irqchip_remove(chip); err_free_gpiochip_mask: gpiochip_free_valid_mask(chip); -err_remove_irqchip_mask: - gpiochip_irqchip_free_valid_mask(chip); err_remove_from_list: spin_lock_irqsave(&gpio_lock, flags); list_del(&gdev->list);
[ Upstream commit 2e16f3e926ed48373c98edea85c6ad0ef69425d1 ]
At the moment we initialise the target *mask* of a virtual IRQ to the VCPU it belongs to, even though this mask is only defined for GICv2 and quickly runs out of bits for many GICv3 guests. This behaviour triggers an UBSAN complaint for more than 32 VCPUs: ------ [ 5659.462377] UBSAN: Undefined behaviour in virt/kvm/arm/vgic/vgic-init.c:223:21 [ 5659.471689] shift exponent 32 is too large for 32-bit type 'unsigned int' ------ Also for GICv3 guests the reporting of TARGET in the "vgic-state" debugfs dump is wrong, due to this very same problem.
Because there is no requirement to create the VGIC device before the VCPUs (and QEMU actually does it the other way round), we can't safely initialise mpidr or targets in kvm_vgic_vcpu_init(). But since we touch every private IRQ for each VCPU anyway later (in vgic_init()), we can just move the initialisation of those fields into there, where we definitely know the VGIC type.
On the way make sure we really have either a VGICv2 or a VGICv3 device, since the existing code is just checking for "VGICv3 or not", silently ignoring the uninitialised case.
Signed-off-by: Andre Przywara andre.przywara@arm.com Reported-by: Dave Martin dave.martin@arm.com Tested-by: Julien Grall julien.grall@arm.com Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- virt/kvm/arm/vgic/vgic-init.c | 30 ++++++++++++++++++++---------- 1 file changed, 20 insertions(+), 10 deletions(-)
diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c index bdbc297d06fb4..e621b5d45b278 100644 --- a/virt/kvm/arm/vgic/vgic-init.c +++ b/virt/kvm/arm/vgic/vgic-init.c @@ -8,6 +8,7 @@ #include <linux/cpu.h> #include <linux/kvm_host.h> #include <kvm/arm_vgic.h> +#include <asm/kvm_emulate.h> #include <asm/kvm_mmu.h> #include "vgic.h"
@@ -164,12 +165,18 @@ static int kvm_vgic_dist_init(struct kvm *kvm, unsigned int nr_spis) irq->vcpu = NULL; irq->target_vcpu = vcpu0; kref_init(&irq->refcount); - if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2) { + switch (dist->vgic_model) { + case KVM_DEV_TYPE_ARM_VGIC_V2: irq->targets = 0; irq->group = 0; - } else { + break; + case KVM_DEV_TYPE_ARM_VGIC_V3: irq->mpidr = 0; irq->group = 1; + break; + default: + kfree(dist->spis); + return -EINVAL; } } return 0; @@ -209,7 +216,6 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu) irq->intid = i; irq->vcpu = NULL; irq->target_vcpu = vcpu; - irq->targets = 1U << vcpu->vcpu_id; kref_init(&irq->refcount); if (vgic_irq_is_sgi(i)) { /* SGIs */ @@ -219,11 +225,6 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu) /* PPIs */ irq->config = VGIC_CONFIG_LEVEL; } - - if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) - irq->group = 1; - else - irq->group = 0; }
if (!irqchip_in_kernel(vcpu->kvm)) @@ -286,10 +287,19 @@ int vgic_init(struct kvm *kvm)
for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++) { struct vgic_irq *irq = &vgic_cpu->private_irqs[i]; - if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) + switch (dist->vgic_model) { + case KVM_DEV_TYPE_ARM_VGIC_V3: irq->group = 1; - else + irq->mpidr = kvm_vcpu_get_mpidr_aff(vcpu); + break; + case KVM_DEV_TYPE_ARM_VGIC_V2: irq->group = 0; + irq->targets = 1U << idx; + break; + default: + ret = -EINVAL; + goto out; + } } }
[ Upstream commit c96e8483cb2da6695c8b8d0896fe7ae272a07b54 ]
Gustavo noticed that 'new' can be left uninitialized if 'bios_start' happens to be less or equal to 'entry->addr + entry->size'.
Initialize the variable at the begin of the iteration to the current value of 'bios_start'.
Fixes: 0a46fff2f910 ("x86/boot/compressed/64: Fix boot on machines with broken E820 table") Reported-by: "Gustavo A. R. Silva" gustavo@embeddedor.com Signed-off-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://lkml.kernel.org/r/20190826133326.7cxb4vbmiawffv2r@box Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/boot/compressed/pgtable_64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c index f0537a1f7fc25..76e1edf5bf12a 100644 --- a/arch/x86/boot/compressed/pgtable_64.c +++ b/arch/x86/boot/compressed/pgtable_64.c @@ -73,7 +73,7 @@ static unsigned long find_trampoline_placement(void)
/* Find the first usable memory region under bios_start. */ for (i = boot_params->e820_entries - 1; i >= 0; i--) { - unsigned long new; + unsigned long new = bios_start;
entry = &boot_params->e820_table[i];
[ Upstream commit 5c498950f730aa17c5f8a2cdcb903524e4002ed2 ]
Signed-off-by: Luis Henriques lhenriques@suse.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/ceph/buffer.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/ceph/buffer.h b/include/linux/ceph/buffer.h index 5e58bb29b1a36..11cdc7c60480f 100644 --- a/include/linux/ceph/buffer.h +++ b/include/linux/ceph/buffer.h @@ -30,7 +30,8 @@ static inline struct ceph_buffer *ceph_buffer_get(struct ceph_buffer *b)
static inline void ceph_buffer_put(struct ceph_buffer *b) { - kref_put(&b->kref, ceph_buffer_release); + if (b) + kref_put(&b->kref, ceph_buffer_release); }
extern int ceph_decode_buffer(struct ceph_buffer **b, void **p, void *end);
[ Upstream commit 950b07c14e8c59444e2359f15fd70ed5112e11a0 ]
This reverts commit 558682b5291937a70748d36fd9ba757fb25b99ae.
Chris Wilson reports that it breaks his CPU hotplug test scripts. In particular, it breaks offlining and then re-onlining the boot CPU, which we treat specially (and the BIOS does too).
The symptoms are that we can offline the CPU, but it then does not come back online again:
smpboot: CPU 0 is now offline smpboot: Booting Node 0 Processor 0 APIC 0x0 smpboot: do_boot_cpu failed(-1) to wakeup CPU#0
Thomas says he knows why it's broken (my personal suspicion: our magic handling of the "cpu0_logical_apicid" thing), but for 5.3 the right fix is to just revert it, since we've never touched the LDR bits before, and it's not worth the risk to do anything else at this stage.
[ Hotpluging of the boot CPU is special anyway, and should be off by default. See the "BOOTPARAM_HOTPLUG_CPU0" config option and the cpu0_hotplug kernel parameter.
In general you should not do it, and it has various known limitations (hibernate and suspend require the boot CPU, for example).
But it should work, even if the boot CPU is special and needs careful treatment - Linus ]
Link: https://lore.kernel.org/lkml/156785100521.13300.14461504732265570003@skylake... Reported-by: Chris Wilson chris@chris-wilson.co.uk Acked-by: Thomas Gleixner tglx@linutronix.de Cc: Bandan Das bsd@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/apic/apic.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 97c3a1c9502e7..2f067b443326e 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -1152,10 +1152,6 @@ void clear_local_APIC(void) apic_write(APIC_LVT0, v | APIC_LVT_MASKED); v = apic_read(APIC_LVT1); apic_write(APIC_LVT1, v | APIC_LVT_MASKED); - if (!x2apic_enabled()) { - v = apic_read(APIC_LDR) & ~APIC_LDR_MASK; - apic_write(APIC_LDR, v); - } if (maxlvt >= 4) { v = apic_read(APIC_LVTPC); apic_write(APIC_LVTPC, v | APIC_LVT_MASKED);
From: John S. Gruber JohnSGruber@gmail.com
commit 29d9a0b50736768f042752070e5cdf4e4d4c00df upstream.
Commit
a90118c445cc ("x86/boot: Save fields explicitly, zero out everything else")
now zeroes the secure boot setting information (enabled/disabled/...) passed by the boot loader or by the kernel's EFI handover mechanism.
The problem manifests itself with signed kernels using the EFI handoff protocol with grub and the kernel loses the information whether secure boot is enabled in the firmware, i.e., the log message "Secure boot enabled" becomes "Secure boot could not be determined".
efi_main() arch/x86/boot/compressed/eboot.c sets this field early but it is subsequently zeroed by the above referenced commit.
Include boot_params.secure_boot in the preserve field list.
[ bp: restructure commit message and massage. ]
Fixes: a90118c445cc ("x86/boot: Save fields explicitly, zero out everything else") Signed-off-by: John S. Gruber JohnSGruber@gmail.com Signed-off-by: Borislav Petkov bp@suse.de Reviewed-by: John Hubbard jhubbard@nvidia.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Juergen Gross jgross@suse.com Cc: Mark Brown broonie@kernel.org Cc: stable stable@vger.kernel.org Cc: Thomas Gleixner tglx@linutronix.de Cc: x86-ml x86@kernel.org Link: https://lkml.kernel.org/r/CAPotdmSPExAuQcy9iAHqX3js_fc4mMLQOTr5RBGvizyCOPcTQ... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/bootparam_utils.h | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/include/asm/bootparam_utils.h +++ b/arch/x86/include/asm/bootparam_utils.h @@ -70,6 +70,7 @@ static void sanitize_boot_params(struct BOOT_PARAM_PRESERVE(eddbuf_entries), BOOT_PARAM_PRESERVE(edd_mbr_sig_buf_entries), BOOT_PARAM_PRESERVE(edd_mbr_sig_buffer), + BOOT_PARAM_PRESERVE(secure_boot), BOOT_PARAM_PRESERVE(hdr), BOOT_PARAM_PRESERVE(e820_table), BOOT_PARAM_PRESERVE(eddbuf),
From: Jan Kaisrlik ja.kaisrlik@gmail.com
commit 8ad8e02c2fa70cfddc1ded53ba9001c9d444075d upstream.
Turns out the commit 3a0681c7448b ("mmc: core: do not retry CMD6 in __mmc_switch()") breaks initialization of a Toshiba THGBMNG5 eMMC card, when using the meson-gx-mmc.c driver on a custom board based on Amlogic A113D.
The CMD6 that switches the card into HS200 mode is then one that fails and according to the below printed messages from the log:
[ 1.648951] mmc0: mmc_select_hs200 failed, error -84 [ 1.648988] mmc0: error -84 whilst initialising MMC card
After some analyze, it turns out that adding a delay of ~5ms inside mmc_select_bus_width() but after mmc_compare_ext_csds() has been executed, also fixes the problem. Adding yet some more debug code, trying to figure out if potentially the card could be in a busy state, both by using CMD13 and ->card_busy() ops concluded that this was not the case.
Therefore, let's simply revert the commit that dropped support for retrying of CMD6, as this also fixes the problem.
Fixes: 3a0681c7448b ("mmc: core: do not retry CMD6 in __mmc_switch()") Cc: stable@vger.kernel.org Signed-off-by: Jan Kaisrlik ja.kaisrlik@gmail.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/core/mmc_ops.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/mmc/core/mmc_ops.c +++ b/drivers/mmc/core/mmc_ops.c @@ -564,7 +564,7 @@ int __mmc_switch(struct mmc_card *card, if (index == EXT_CSD_SANITIZE_START) cmd.sanitize_busy = true;
- err = mmc_wait_for_cmd(host, &cmd, 0); + err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES); if (err) goto out;
On Sun, 8 Sep 2019 at 18:19, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.2.14 release. There are 94 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue 10 Sep 2019 12:09:36 PM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.2.14-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.2.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary ------------------------------------------------------------------------
kernel: 5.2.14-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-5.2.y git commit: 562387856c86ea4a3aa5ba333cb9806f8065b6ab git describe: v5.2.12-97-g562387856c86 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-5.2-oe/build/v5.2.12-97-g...
No regressions (compared to build v5.2.12)
No fixes (compared to build v5.2.12)
Ran 22530 total tests in the following environments and test suites.
Environments -------------- - dragonboard-410c - hi6220-hikey - i386 - juno-r2 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - x86
Test Suites ----------- * build * install-android-platform-tools-r2600 * kselftest * libgpiod * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * network-basic-tests * perf * spectre-meltdown-checker-test * v4l2-compliance * ltp-fs-tests * ltp-open-posix-tests * kvm-unit-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none
On Mon, Sep 09, 2019 at 11:24:19AM +0530, Naresh Kamboju wrote:
On Sun, 8 Sep 2019 at 18:19, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.2.14 release. There are 94 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue 10 Sep 2019 12:09:36 PM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.2.14-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.2.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Thanks for testing all of these and letting me know.
greg k-h
On Sun, Sep 08, 2019 at 01:40:56PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.2.14 release. There are 94 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue 10 Sep 2019 12:09:36 PM UTC. Anything received after that time might be too late.
Build results: total: 159 pass: 159 fail: 0 Qemu test results: total: 390 pass: 390 fail: 0
Guenter
On Mon, Sep 09, 2019 at 12:40:07PM -0700, Guenter Roeck wrote:
On Sun, Sep 08, 2019 at 01:40:56PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.2.14 release. There are 94 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue 10 Sep 2019 12:09:36 PM UTC. Anything received after that time might be too late.
Build results: total: 159 pass: 159 fail: 0 Qemu test results: total: 390 pass: 390 fail: 0
Wonderful, thanks for testing all of these and letting me know.
greg k-h
On 08/09/2019 13:40, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.2.14 release. There are 94 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue 10 Sep 2019 12:09:36 PM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.2.14-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.2.y and the diffstat can be found below.
thanks,
greg k-h
All tests are passing for Tegra ...
Test results for stable-v5.2: 12 builds: 12 pass, 0 fail 22 boots: 22 pass, 0 fail 38 tests: 38 pass, 0 fail
Linux version: 5.2.14-rc1-g562387856c86 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Cheers Jon
On Tue, Sep 10, 2019 at 10:20:23AM +0100, Jon Hunter wrote:
On 08/09/2019 13:40, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.2.14 release. There are 94 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue 10 Sep 2019 12:09:36 PM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.2.14-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.2.y and the diffstat can be found below.
thanks,
greg k-h
All tests are passing for Tegra ...
Test results for stable-v5.2: 12 builds: 12 pass, 0 fail 22 boots: 22 pass, 0 fail 38 tests: 38 pass, 0 fail
Linux version: 5.2.14-rc1-g562387856c86 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Thanks for testing all of these and letting me know.
greg k-h
linux-stable-mirror@lists.linaro.org