This is the start of the stable review cycle for the 4.20.7 release. There are 80 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Feb 6 10:35:33 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.20.7-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.20.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.20.7-rc1
Paulo Alcantara paulo@paulo.ac cifs: Always resolve hostname before reconnecting
Alexei Naberezhnov anaberezhnov@fb.com md/raid5: fix 'out of memory' during raid cache recovery
Frank Rowand frank.rowand@sony.com of: overlay: do not duplicate properties from overlay for new nodes
Frank Rowand frank.rowand@sony.com of: overlay: use prop add changeset entry for property in new nodes
Frank Rowand frank.rowand@sony.com of: overlay: add missing of_node_get() in __of_attach_node_sysfs
Frank Rowand frank.rowand@sony.com of: overlay: add tests to validate kfrees from overlay removal
David Hildenbrand david@redhat.com mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
Naoya Horiguchi n-horiguchi@ah.jp.nec.com mm: hwpoison: use do_send_sig_info() instead of force_sig()
Shakeel Butt shakeelb@google.com mm, oom: fix use-after-free in oom_kill_process
Oscar Salvador osalvador@suse.de mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp oom, oom_reaper: do not enqueue same task twice
Andrea Arcangeli aarcange@redhat.com mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT
Andrei Vagin avagin@gmail.com kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
Eric W. Biederman ebiederm@xmission.com btrfs: On error always free subvol_name in btrfs_mount
Filipe Manana fdmanana@suse.com Btrfs: fix deadlock when allocating tree block during leaf/node split
João Paulo Rechi Vita jprvita@gmail.com platform/x86: asus-nb-wmi: Drop mapping of 0x33 and 0x34 scan codes
João Paulo Rechi Vita jprvita@gmail.com platform/x86: asus-nb-wmi: Map 0x35 to KEY_SCREENLOCK
Mike Marciniszyn mike.marciniszyn@intel.com IB/hfi1: Add limit test for RC/UC send via loopback
Michael J. Ruhl michael.j.ruhl@intel.com IB/hfi1: Remove overly conservative VM_EXEC flag check
Yishai Hadas yishaih@mellanox.com IB/uverbs: Fix OOPs in uverbs_user_mmap_disassociate
Yishai Hadas yishaih@mellanox.com IB/uverbs: Fix OOPs upon device disassociation
Takashi Iwai tiwai@suse.de ALSA: pcm: Fix tight loop of OSS capture stream
Kailang Yang kailang@realtek.com ALSA: hda/realtek - Fixed hp_pin no value
Olek Poplavsky woodenbits@gmail.com ALSA: usb-audio: Add Opus #3 to quirks for native DSD support
Chaotian Jing chaotian.jing@mediatek.com mmc: mediatek: fix incorrect register setting of hs400_cmd_int_delay
Lukas Wunner lukas@wunner.de mmc: bcm2835: Fix DMA channel leak on probe error
Andreas Gruenbacher agruenba@redhat.com gfs2: Revert "Fix loop in gfs2_rbm_find"
Neo Hou neo.hou@unisoc.com gpio: sprd: Fix incorrect irq type setting for the async EIC
Neo Hou neo.hou@unisoc.com gpio: sprd: Fix the incorrect data register
Roger Quadros rogerq@ti.com gpio: pcf857x: Fix interrupts on multiple instances
Bartosz Golaszewski bgolaszewski@baylibre.com gpiolib: fix line event timestamps for nested irqs
Axel Lin axel.lin@ingics.com gpio: altera-a10sr: Set proper output level for direction_output
James Morse james.morse@arm.com arm64: hibernate: Clean the __hyp_text to PoC after resume
James Morse james.morse@arm.com arm64: hyp-stub: Forbid kprobing of the hyp-stub
Catalin Marinas catalin.marinas@arm.com arm64: Do not issue IPIs for user executable ptes
Ard Biesheuvel ard.biesheuvel@linaro.org arm64: kaslr: ensure randomized quantities are clean also when kaslr is off
Koen Vandeputte koen.vandeputte@ncentric.com ARM: cns3xxx: Fix writing to wrong PCI config registers after alignment
Trond Myklebust trondmy@gmail.com NFS: Fix up return value on fatal errors in nfs_page_async_flush()
Kees Cook keescook@chromium.org selftests/seccomp: Enhance per-arch ptrace syscall skip tests
Gerald Schaefer gerald.schaefer@de.ibm.com iommu/vt-d: Fix memory leak in intel_iommu_put_resv_regions()
Waiman Long longman@redhat.com fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb()
Pavel Shilovsky pshilov@microsoft.com CIFS: Do not consider -ENODATA as stat failure for reads
Aurelien Aptel aaptel@suse.com CIFS: fix use-after-free of the lease keys
Pavel Shilovsky pshilov@microsoft.com CIFS: Fix trace command logging for SMB2 reads and writes
Pavel Shilovsky pshilov@microsoft.com CIFS: Fix possible oops and memory leaks in async IO
Pavel Shilovsky pshilov@microsoft.com CIFS: Do not count -ENODATA as failure for query directory
David Ahern dsahern@gmail.com ipv6: Consider sk_bound_dev_if when binding a socket to an address
Toshiaki Makita makita.toshiaki@lab.ntt.co.jp virtio_net: Differentiate sk_buff and xdp_frame on freeing
Toshiaki Makita makita.toshiaki@lab.ntt.co.jp virtio_net: Use xdp_return_frame to free xdp_frames on destroying vqs
Toshiaki Makita makita.toshiaki@lab.ntt.co.jp virtio_net: Don't process redirected XDP frames when XDP is disabled
Toshiaki Makita makita.toshiaki@lab.ntt.co.jp virtio_net: Fix out of bounds access of sq
Toshiaki Makita makita.toshiaki@lab.ntt.co.jp virtio_net: Fix not restoring real_num_rx_queues
Toshiaki Makita makita.toshiaki@lab.ntt.co.jp virtio_net: Don't call free_old_xmit_skbs for xdp_frames
Toshiaki Makita makita.toshiaki@lab.ntt.co.jp virtio_net: Don't enable NAPI when interface is down
Dave Watson davejwatson@fb.com net: tls: Save iv in tls_rec for async crypto requests
Dave Watson davejwatson@fb.com net: tls: Fix deadlock in free_resources tx
Xin Long lucien.xin@gmail.com sctp: set flow sport from saddr only when it's 0
Xin Long lucien.xin@gmail.com sctp: set chunk transport correctly when it's a new asoc
Bodong Wang bodong@mellanox.com Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager"
Nir Dotan nird@mellanox.com ip6mr: Fix notifiers call on mroute_clean_tables()
Aya Levin ayal@mellanox.com net/mlx5e: Allow MAC invalidation while spoofchk is ON
Xin Long lucien.xin@gmail.com sctp: improve the events for sctp stream adding
Lorenzo Bianconi lorenzo.bianconi@redhat.com net: ip6_gre: always reports o_key to userspace
Jason Wang jasowang@redhat.com vhost: fix OOB in get_rx_bufs()
Mathias Thore mathias.thore@infinera.com ucc_geth: Reset BQL queue when stopping device
George Amanakis gamanakis@gmail.com tun: move the call to tun_set_real_num_queues
Xin Long lucien.xin@gmail.com sctp: improve the events for sctp stream reset
Simon Horman horms+renesas@verge.net.au ravb: expand rx descriptor data to accommodate hw checksum
Josh Elsasser jelsasser@appneta.com net: set default network namespace in init_dummy_netdev()
Bernard Pidoux f6bvp@free.fr net/rose: fix NULL ax25_cb kernel panic
Cong Wang xiyou.wangcong@gmail.com netrom: switch to sock timer API
Aya Levin ayal@mellanox.com net/mlx4_core: Add masking for a few queries on HCA caps
Jakub Kicinski jakub.kicinski@netronome.com net/ipv6: don't return positive numbers when nothing was dumped
Lorenzo Bianconi lorenzo.bianconi@redhat.com net: ip_gre: use erspan key field for tunnel lookup
Lorenzo Bianconi lorenzo.bianconi@redhat.com net: ip_gre: always reports o_key to userspace
Jacob Wen jian.w.wen@oracle.com l2tp: fix reading optional fields of L2TPv3
Jacob Wen jian.w.wen@oracle.com l2tp: copy 4 more bytes to linear part if necessary
Daniel Borkmann daniel@iogearbox.net ipvlan, l3mdev: fix broken l3s mode wrt local routes
Yohei Kanemaru yohei.kanemaru@gmail.com ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation
Arnd Bergmann arnd@arndb.de drm/msm/gpu: fix building without debugfs
-------------
Diffstat:
Makefile | 4 +- arch/arm/mach-cns3xxx/pcie.c | 2 +- arch/arm64/kernel/hibernate.c | 4 +- arch/arm64/kernel/hyp-stub.S | 2 + arch/arm64/kernel/kaslr.c | 1 + arch/arm64/mm/flush.c | 6 +- drivers/gpio/gpio-altera-a10sr.c | 4 +- drivers/gpio/gpio-eic-sprd.c | 14 +- drivers/gpio/gpio-pcf857x.c | 26 ++-- drivers/gpio/gpiolib.c | 9 +- drivers/gpu/drm/msm/msm_gpu.h | 2 +- drivers/infiniband/core/uverbs_main.c | 25 ++-- drivers/infiniband/hw/hfi1/file_ops.c | 2 +- drivers/infiniband/sw/rdmavt/qp.c | 7 +- drivers/iommu/intel-iommu.c | 2 +- drivers/md/raid5-cache.c | 33 +++-- drivers/md/raid5.c | 8 +- drivers/mmc/host/bcm2835.c | 2 + drivers/mmc/host/mtk-sd.c | 2 +- drivers/net/ethernet/freescale/ucc_geth.c | 2 + drivers/net/ethernet/mellanox/mlx4/fw.c | 75 ++++++---- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 22 +-- drivers/net/ethernet/renesas/ravb_main.c | 12 +- drivers/net/ipvlan/ipvlan_main.c | 6 +- drivers/net/tun.c | 3 +- drivers/net/virtio_net.c | 169 +++++++++++++++------- drivers/of/dynamic.c | 32 +++- drivers/of/kobj.c | 4 +- drivers/of/overlay.c | 115 ++++++++++----- drivers/platform/x86/asus-nb-wmi.c | 3 +- drivers/vhost/net.c | 3 +- drivers/vhost/scsi.c | 2 +- drivers/vhost/vhost.c | 7 +- drivers/vhost/vhost.h | 4 +- drivers/vhost/vsock.c | 2 +- fs/btrfs/ctree.c | 78 ++++++---- fs/btrfs/super.c | 3 + fs/cifs/connect.c | 53 +++++++ fs/cifs/file.c | 11 +- fs/cifs/smb2pdu.c | 54 ++++--- fs/dcache.c | 6 +- fs/gfs2/rgrp.c | 2 +- fs/nfs/write.c | 9 +- include/linux/netdevice.h | 8 + include/linux/of.h | 15 +- include/linux/sched/coredump.h | 1 + include/net/l3mdev.h | 3 +- include/net/tls.h | 2 + kernel/exit.c | 12 +- mm/hugetlb.c | 3 +- mm/memory-failure.c | 3 +- mm/memory_hotplug.c | 36 +++-- mm/migrate.c | 7 +- mm/oom_kill.c | 12 +- net/core/dev.c | 3 + net/ipv4/gre_demux.c | 17 +++ net/ipv4/ip_gre.c | 16 +- net/ipv6/addrconf.c | 2 + net/ipv6/af_inet6.c | 3 + net/ipv6/ip6_gre.c | 11 +- net/ipv6/ip6mr.c | 7 +- net/ipv6/seg6_iptunnel.c | 2 + net/l2tp/l2tp_core.c | 9 +- net/l2tp/l2tp_core.h | 20 +++ net/l2tp/l2tp_ip.c | 3 + net/l2tp/l2tp_ip6.c | 3 + net/netrom/nr_timer.c | 20 +-- net/rose/rose_route.c | 5 + net/sctp/ipv6.c | 3 +- net/sctp/protocol.c | 3 +- net/sctp/sm_make_chunk.c | 11 +- net/sctp/stream.c | 58 ++++---- net/tls/tls_sw.c | 6 +- sound/core/pcm_lib.c | 9 +- sound/pci/hda/patch_realtek.c | 78 +++++----- sound/usb/quirks.c | 1 + tools/testing/selftests/seccomp/seccomp_bpf.c | 72 +++++++-- 77 files changed, 879 insertions(+), 417 deletions(-)
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
commit c878a628e0c483ec36fa70f4590e4a58e34a6e49 upstream.
When debugfs is disabled, but coredump is turned on, the adreno driver fails to build:
drivers/gpu/drm/msm/adreno/a3xx_gpu.c:460:4: error: 'struct msm_gpu_funcs' has no member named 'show' .show = adreno_show, ^~~~ drivers/gpu/drm/msm/adreno/a3xx_gpu.c:460:11: note: (near initialization for 'funcs.base') drivers/gpu/drm/msm/adreno/a3xx_gpu.c:460:11: error: initialization of 'void (*)(struct msm_gpu *, struct msm_gem_submit *, struct msm_file_private *)' from incompatible pointer type 'void (*)(struct msm_gpu *, struct msm_gpu_state *, struct drm_printer *)' [-Werror=incompatible-pointer-types] drivers/gpu/drm/msm/adreno/a3xx_gpu.c:460:11: note: (near initialization for 'funcs.base.submit') drivers/gpu/drm/msm/adreno/a4xx_gpu.c:546:4: error: 'struct msm_gpu_funcs' has no member named 'show' drivers/gpu/drm/msm/adreno/a5xx_gpu.c:1460:4: error: 'struct msm_gpu_funcs' has no member named 'show' drivers/gpu/drm/msm/adreno/a6xx_gpu.c:769:4: error: 'struct msm_gpu_funcs' has no member named 'show' drivers/gpu/drm/msm/msm_gpu.c: In function 'msm_gpu_devcoredump_read': drivers/gpu/drm/msm/msm_gpu.c:289:12: error: 'const struct msm_gpu_funcs' has no member named 'show'
Adjust the #ifdef to make it build again.
Fixes: c0fec7f562ec ("drm/msm/gpu: Capture the GPU state on a GPU hang") Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Rob Clark robdclark@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/msm/msm_gpu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/msm/msm_gpu.h +++ b/drivers/gpu/drm/msm/msm_gpu.h @@ -63,7 +63,7 @@ struct msm_gpu_funcs { struct msm_ringbuffer *(*active_ring)(struct msm_gpu *gpu); void (*recover)(struct msm_gpu *gpu); void (*destroy)(struct msm_gpu *gpu); -#ifdef CONFIG_DEBUG_FS +#if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP) /* show GPU status in debugfs: */ void (*show)(struct msm_gpu *gpu, struct msm_gpu_state *state, struct drm_printer *p);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yohei Kanemaru yohei.kanemaru@gmail.com
[ Upstream commit ef489749aae508e6f17886775c075f12ff919fb1 ]
skb->cb may contain data from previous layers (in an observed case IPv4 with L3 Master Device). In the observed scenario, the data in IPCB(skb)->frags was misinterpreted as IP6CB(skb)->frag_max_size, eventually caused an unexpected IPv6 fragmentation in ip6_fragment() through ip6_finish_output().
This patch clears IP6CB(skb), which potentially contains garbage data, on the SRH ip4ip6 encapsulation.
Fixes: 32d99d0b6702 ("ipv6: sr: add support for ip4ip6 encapsulation") Signed-off-by: Yohei Kanemaru yohei.kanemaru@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/seg6_iptunnel.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/ipv6/seg6_iptunnel.c +++ b/net/ipv6/seg6_iptunnel.c @@ -146,6 +146,8 @@ int seg6_do_srh_encap(struct sk_buff *sk } else { ip6_flow_hdr(hdr, 0, flowlabel); hdr->hop_limit = ip6_dst_hoplimit(skb_dst(skb)); + + memset(IP6CB(skb), 0, sizeof(*IP6CB(skb))); }
hdr->nexthdr = NEXTHDR_ROUTING;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
[ Upstream commit d5256083f62e2720f75bb3c5a928a0afe47d6bc3 ]
While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin, I ran into the issue that while l3 mode is working fine, l3s mode does not have any connectivity to kube-apiserver and hence all pods end up in Error state as well. The ipvlan master device sits on top of a bond device and hostns traffic to kube-apiserver (also running in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573 where the latter is the address of the bond0. While in l3 mode, a curl to https://10.152.183.1:443 or to https://139.178.29.207:37573 works fine from hostns, neither of them do in case of l3s. In the latter only a curl to https://127.0.0.1:37573 appeared to work where for local addresses of bond0 I saw kernel suddenly starting to emit ARP requests to query HW address of bond0 which remained unanswered and neighbor entries in INCOMPLETE state. These ARP requests only happen while in l3s.
Debugging this further, I found the issue is that l3s mode is piggy- backing on l3 master device, and in this case local routes are using l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant") and 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be a loopback"). I found that reverting them back into using the net->loopback_dev fixed ipvlan l3s connectivity and got everything working for the CNI.
Now judging from 4fbae7d83c98 ("ipvlan: Introduce l3s mode") and the l3mdev paper in [0] the only sole reason why ipvlan l3s is relying on l3 master device is to get the l3mdev_ip_rcv() receive hook for setting the dst entry of the input route without adding its own ipvlan specific hacks into the receive path, however, any l3 domain semantics beyond just that are breaking l3s operation. Note that ipvlan also has the ability to dynamically switch its internal operation from l3 to l3s for all ports via ipvlan_set_port_mode() at runtime. In any case, l3 vs l3s soley distinguishes itself by 'de-confusing' netfilter through switching skb->dev to ipvlan slave device late in NF_INET_LOCAL_IN before handing the skb to L4.
Minimal fix taken here is to add a IFF_L3MDEV_RX_HANDLER flag which, if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook without any additional l3mdev semantics on top. This should also have minimal impact since dev->priv_flags is already hot in cache. With this set, l3s mode is working fine and I also get things like masquerading pod traffic on the ipvlan master properly working.
[0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf
Fixes: f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant") Fixes: 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be a loopback") Fixes: 4fbae7d83c98 ("ipvlan: Introduce l3s mode") Signed-off-by: Daniel Borkmann daniel@iogearbox.net Cc: Mahesh Bandewar maheshb@google.com Cc: David Ahern dsa@cumulusnetworks.com Cc: Florian Westphal fw@strlen.de Cc: Martynas Pumputis m@lambda.lt Acked-by: David Ahern dsa@cumulusnetworks.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ipvlan/ipvlan_main.c | 6 +++--- include/linux/netdevice.h | 8 ++++++++ include/net/l3mdev.h | 3 ++- 3 files changed, 13 insertions(+), 4 deletions(-)
--- a/drivers/net/ipvlan/ipvlan_main.c +++ b/drivers/net/ipvlan/ipvlan_main.c @@ -97,12 +97,12 @@ static int ipvlan_set_port_mode(struct i err = ipvlan_register_nf_hook(read_pnet(&port->pnet)); if (!err) { mdev->l3mdev_ops = &ipvl_l3mdev_ops; - mdev->priv_flags |= IFF_L3MDEV_MASTER; + mdev->priv_flags |= IFF_L3MDEV_RX_HANDLER; } else goto fail; } else if (port->mode == IPVLAN_MODE_L3S) { /* Old mode was L3S */ - mdev->priv_flags &= ~IFF_L3MDEV_MASTER; + mdev->priv_flags &= ~IFF_L3MDEV_RX_HANDLER; ipvlan_unregister_nf_hook(read_pnet(&port->pnet)); mdev->l3mdev_ops = NULL; } @@ -162,7 +162,7 @@ static void ipvlan_port_destroy(struct n struct sk_buff *skb;
if (port->mode == IPVLAN_MODE_L3S) { - dev->priv_flags &= ~IFF_L3MDEV_MASTER; + dev->priv_flags &= ~IFF_L3MDEV_RX_HANDLER; ipvlan_unregister_nf_hook(dev_net(dev)); dev->l3mdev_ops = NULL; } --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1487,6 +1487,7 @@ struct net_device_ops { * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook * @IFF_FAILOVER: device is a failover master device * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device + * @IFF_L3MDEV_RX_HANDLER: only invoke the rx handler of L3 master device */ enum netdev_priv_flags { IFF_802_1Q_VLAN = 1<<0, @@ -1518,6 +1519,7 @@ enum netdev_priv_flags { IFF_NO_RX_HANDLER = 1<<26, IFF_FAILOVER = 1<<27, IFF_FAILOVER_SLAVE = 1<<28, + IFF_L3MDEV_RX_HANDLER = 1<<29, };
#define IFF_802_1Q_VLAN IFF_802_1Q_VLAN @@ -1548,6 +1550,7 @@ enum netdev_priv_flags { #define IFF_NO_RX_HANDLER IFF_NO_RX_HANDLER #define IFF_FAILOVER IFF_FAILOVER #define IFF_FAILOVER_SLAVE IFF_FAILOVER_SLAVE +#define IFF_L3MDEV_RX_HANDLER IFF_L3MDEV_RX_HANDLER
/** * struct net_device - The DEVICE structure. @@ -4523,6 +4526,11 @@ static inline bool netif_supports_nofcs( return dev->priv_flags & IFF_SUPP_NOFCS; }
+static inline bool netif_has_l3_rx_handler(const struct net_device *dev) +{ + return dev->priv_flags & IFF_L3MDEV_RX_HANDLER; +} + static inline bool netif_is_l3_master(const struct net_device *dev) { return dev->priv_flags & IFF_L3MDEV_MASTER; --- a/include/net/l3mdev.h +++ b/include/net/l3mdev.h @@ -142,7 +142,8 @@ struct sk_buff *l3mdev_l3_rcv(struct sk_
if (netif_is_l3_slave(skb->dev)) master = netdev_master_upper_dev_get_rcu(skb->dev); - else if (netif_is_l3_master(skb->dev)) + else if (netif_is_l3_master(skb->dev) || + netif_has_l3_rx_handler(skb->dev)) master = skb->dev;
if (master && master->l3mdev_ops->l3mdev_l3_rcv)
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jacob Wen jian.w.wen@oracle.com
[ Upstream commit 91c524708de6207f59dd3512518d8a1c7b434ee3 ]
The size of L2TPv2 header with all optional fields is 14 bytes. l2tp_udp_recv_core only moves 10 bytes to the linear part of a skb. This may lead to l2tp_recv_common read data outside of a skb.
This patch make sure that there is at least 14 bytes in the linear part of a skb to meet the maximum need of l2tp_udp_recv_core and l2tp_recv_common. The minimum size of both PPP HDLC-like frame and Ethernet frame is larger than 14 bytes, so we are safe to do so.
Also remove L2TP_HDR_SIZE_NOSEQ, it is unused now.
Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Suggested-by: Guillaume Nault gnault@redhat.com Signed-off-by: Jacob Wen jian.w.wen@oracle.com Acked-by: Guillaume Nault gnault@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/l2tp/l2tp_core.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -83,8 +83,7 @@ #define L2TP_SLFLAG_S 0x40000000 #define L2TP_SL_SEQ_MASK 0x00ffffff
-#define L2TP_HDR_SIZE_SEQ 10 -#define L2TP_HDR_SIZE_NOSEQ 6 +#define L2TP_HDR_SIZE_MAX 14
/* Default trace flags */ #define L2TP_DEFAULT_DEBUG_FLAGS 0 @@ -808,7 +807,7 @@ static int l2tp_udp_recv_core(struct l2t __skb_pull(skb, sizeof(struct udphdr));
/* Short packet? */ - if (!pskb_may_pull(skb, L2TP_HDR_SIZE_SEQ)) { + if (!pskb_may_pull(skb, L2TP_HDR_SIZE_MAX)) { l2tp_info(tunnel, L2TP_MSG_DATA, "%s: recv short packet (len=%d)\n", tunnel->name, skb->len);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jacob Wen jian.w.wen@oracle.com
[ Upstream commit 4522a70db7aa5e77526a4079628578599821b193 ]
Use pskb_may_pull() to make sure the optional fields are in skb linear parts, so we can safely read them later.
It's easy to reproduce the issue with a net driver that supports paged skb data. Just create a L2TPv3 over IP tunnel and then generates some network traffic. Once reproduced, rx err in /sys/kernel/debug/l2tp/tunnels will increase.
Changes in v4: 1. s/l2tp_v3_pull_opt/l2tp_v3_ensure_opt_in_linear/ 2. s/tunnel->version != L2TP_HDR_VER_2/tunnel->version == L2TP_HDR_VER_3/ 3. Add 'Fixes' in commit messages.
Changes in v3: 1. To keep consistency, move the code out of l2tp_recv_common. 2. Use "net" instead of "net-next", since this is a bug fix.
Changes in v2: 1. Only fix L2TPv3 to make code simple. To fix both L2TPv3 and L2TPv2, we'd better refactor l2tp_recv_common. It's complicated to do so. 2. Reloading pointers after pskb_may_pull
Fixes: f7faffa3ff8e ("l2tp: Add L2TPv3 protocol support") Fixes: 0d76751fad77 ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support") Fixes: a32e0eec7042 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6") Signed-off-by: Jacob Wen jian.w.wen@oracle.com Acked-by: Guillaume Nault gnault@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/l2tp/l2tp_core.c | 4 ++++ net/l2tp/l2tp_core.h | 20 ++++++++++++++++++++ net/l2tp/l2tp_ip.c | 3 +++ net/l2tp/l2tp_ip6.c | 3 +++ 4 files changed, 30 insertions(+)
--- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -883,6 +883,10 @@ static int l2tp_udp_recv_core(struct l2t goto error; }
+ if (tunnel->version == L2TP_HDR_VER_3 && + l2tp_v3_ensure_opt_in_linear(session, skb, &ptr, &optr)) + goto error; + l2tp_recv_common(session, skb, ptr, optr, hdrflags, length); l2tp_session_dec_refcount(session);
--- a/net/l2tp/l2tp_core.h +++ b/net/l2tp/l2tp_core.h @@ -301,6 +301,26 @@ static inline bool l2tp_tunnel_uses_xfrm } #endif
+static inline int l2tp_v3_ensure_opt_in_linear(struct l2tp_session *session, struct sk_buff *skb, + unsigned char **ptr, unsigned char **optr) +{ + int opt_len = session->peer_cookie_len + l2tp_get_l2specific_len(session); + + if (opt_len > 0) { + int off = *ptr - *optr; + + if (!pskb_may_pull(skb, off + opt_len)) + return -1; + + if (skb->data != *optr) { + *optr = skb->data; + *ptr = skb->data + off; + } + } + + return 0; +} + #define l2tp_printk(ptr, type, func, fmt, ...) \ do { \ if (((ptr)->debug) & (type)) \ --- a/net/l2tp/l2tp_ip.c +++ b/net/l2tp/l2tp_ip.c @@ -165,6 +165,9 @@ static int l2tp_ip_recv(struct sk_buff * print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, ptr, length); }
+ if (l2tp_v3_ensure_opt_in_linear(session, skb, &ptr, &optr)) + goto discard_sess; + l2tp_recv_common(session, skb, ptr, optr, 0, skb->len); l2tp_session_dec_refcount(session);
--- a/net/l2tp/l2tp_ip6.c +++ b/net/l2tp/l2tp_ip6.c @@ -178,6 +178,9 @@ static int l2tp_ip6_recv(struct sk_buff print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, ptr, length); }
+ if (l2tp_v3_ensure_opt_in_linear(session, skb, &ptr, &optr)) + goto discard_sess; + l2tp_recv_common(session, skb, ptr, optr, 0, skb->len); l2tp_session_dec_refcount(session);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lorenzo Bianconi lorenzo.bianconi@redhat.com
[ Upstream commit feaf5c796b3f0240f10d0d6d0b686715fd58a05b ]
Erspan protocol (version 1 and 2) relies on o_key to configure session id header field. However TUNNEL_KEY bit is cleared in erspan_xmit since ERSPAN protocol does not set the key field of the external GRE header and so the configured o_key is not reported to userspace. The issue can be triggered with the following reproducer:
$ip link add erspan1 type erspan local 192.168.0.1 remote 192.168.0.2 \ key 1 seq erspan_ver 1 $ip link set erspan1 up $ip -d link sh erspan1
erspan1@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UNKNOWN mode DEFAULT link/ether 52:aa:99:95:9a:b5 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500 erspan remote 192.168.0.2 local 192.168.0.1 ttl inherit ikey 0.0.0.1 iseq oseq erspan_index 0
Fix the issue adding TUNNEL_KEY bit to the o_flags parameter in ipgre_fill_info
Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Signed-off-by: Lorenzo Bianconi lorenzo.bianconi@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/ip_gre.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -1468,12 +1468,17 @@ static int ipgre_fill_info(struct sk_buf { struct ip_tunnel *t = netdev_priv(dev); struct ip_tunnel_parm *p = &t->parms; + __be16 o_flags = p->o_flags; + + if ((t->erspan_ver == 1 || t->erspan_ver == 2) && + !t->collect_md) + o_flags |= TUNNEL_KEY;
if (nla_put_u32(skb, IFLA_GRE_LINK, p->link) || nla_put_be16(skb, IFLA_GRE_IFLAGS, gre_tnl_flags_to_gre_flags(p->i_flags)) || nla_put_be16(skb, IFLA_GRE_OFLAGS, - gre_tnl_flags_to_gre_flags(p->o_flags)) || + gre_tnl_flags_to_gre_flags(o_flags)) || nla_put_be32(skb, IFLA_GRE_IKEY, p->i_key) || nla_put_be32(skb, IFLA_GRE_OKEY, p->o_key) || nla_put_in_addr(skb, IFLA_GRE_LOCAL, p->iph.saddr) ||
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lorenzo Bianconi lorenzo.bianconi@redhat.com
[ Upstream commit cb73ee40b1b381eaf3749e6dbeed567bb38e5258 ]
Use ERSPAN key header field as tunnel key in gre_parse_header routine since ERSPAN protocol sets the key field of the external GRE header to 0 resulting in a tunnel lookup fail in ip6gre_err. In addition remove key field parsing and pskb_may_pull check in erspan_rcv and ip6erspan_rcv
Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Signed-off-by: Lorenzo Bianconi lorenzo.bianconi@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/gre_demux.c | 17 +++++++++++++++++ net/ipv4/ip_gre.c | 9 --------- net/ipv6/ip6_gre.c | 4 ---- 3 files changed, 17 insertions(+), 13 deletions(-)
--- a/net/ipv4/gre_demux.c +++ b/net/ipv4/gre_demux.c @@ -25,6 +25,7 @@ #include <linux/spinlock.h> #include <net/protocol.h> #include <net/gre.h> +#include <net/erspan.h>
#include <net/icmp.h> #include <net/route.h> @@ -119,6 +120,22 @@ int gre_parse_header(struct sk_buff *skb hdr_len += 4; } tpi->hdr_len = hdr_len; + + /* ERSPAN ver 1 and 2 protocol sets GRE key field + * to 0 and sets the configured key in the + * inner erspan header field + */ + if (greh->protocol == htons(ETH_P_ERSPAN) || + greh->protocol == htons(ETH_P_ERSPAN2)) { + struct erspan_base_hdr *ershdr; + + if (!pskb_may_pull(skb, nhs + hdr_len + sizeof(*ershdr))) + return -EINVAL; + + ershdr = (struct erspan_base_hdr *)options; + tpi->key = cpu_to_be32(get_session_id(ershdr)); + } + return hdr_len; } EXPORT_SYMBOL(gre_parse_header); --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -266,20 +266,11 @@ static int erspan_rcv(struct sk_buff *sk int len;
itn = net_generic(net, erspan_net_id); - len = gre_hdr_len + sizeof(*ershdr); - - /* Check based hdr len */ - if (unlikely(!pskb_may_pull(skb, len))) - return PACKET_REJECT;
iph = ip_hdr(skb); ershdr = (struct erspan_base_hdr *)(skb->data + gre_hdr_len); ver = ershdr->ver;
- /* The original GRE header does not have key field, - * Use ERSPAN 10-bit session ID as key. - */ - tpi->key = cpu_to_be32(get_session_id(ershdr)); tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, tpi->flags | TUNNEL_KEY, iph->saddr, iph->daddr, tpi->key); --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -532,13 +532,9 @@ static int ip6erspan_rcv(struct sk_buff struct ip6_tnl *tunnel; u8 ver;
- if (unlikely(!pskb_may_pull(skb, sizeof(*ershdr)))) - return PACKET_REJECT; - ipv6h = ipv6_hdr(skb); ershdr = (struct erspan_base_hdr *)skb->data; ver = ershdr->ver; - tpi->key = cpu_to_be32(get_session_id(ershdr));
tunnel = ip6gre_tunnel_lookup(skb->dev, &ipv6h->saddr, &ipv6h->daddr, tpi->key,
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski jakub.kicinski@netronome.com
[ Upstream commit 1518039f6b5ac794313c24c76f85cead0cd60f6c ]
in6_dump_addrs() returns a positive 1 if there was nothing to dump. This return value can not be passed as return from inet6_dump_addr() as is, because it will confuse rtnetlink, resulting in NLMSG_DONE never getting set:
$ ip addr list dev lo EOF on netlink Dump terminated
v2: flip condition to avoid a new goto (DaveA)
Fixes: 7c1e8a3817c5 ("netlink: fixup regression in RTM_GETADDR") Reported-by: Brendan Galloway brendan.galloway@netronome.com Signed-off-by: Jakub Kicinski jakub.kicinski@netronome.com Reviewed-by: David Ahern dsahern@gmail.com Tested-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/addrconf.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -5120,6 +5120,8 @@ static int inet6_dump_addr(struct sk_buf if (idev) { err = in6_dump_addrs(idev, skb, cb, s_ip_idx, &fillargs); + if (err > 0) + err = 0; } goto put_tgt_net; }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aya Levin ayal@mellanox.com
[ Upstream commit a40ded6043658444ee4dd6ee374119e4e98b33fc ]
Driver reads the query HCA capabilities without the corresponding masks. Without the correct masks, the base addresses of the queues are unaligned. In addition some reserved bits were wrongly read. Using the correct masks, ensures alignment of the base addresses and allows future firmware versions safe use of the reserved bits.
Fixes: ab9c17a009ee ("mlx4_core: Modify driver initialization flow to accommodate SRIOV for Ethernet") Fixes: 0ff1fb654bec ("{NET, IB}/mlx4: Add device managed flow steering firmware API") Signed-off-by: Aya Levin ayal@mellanox.com Signed-off-by: Tariq Toukan tariqt@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx4/fw.c | 75 +++++++++++++++++++------------- 1 file changed, 46 insertions(+), 29 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c +++ b/drivers/net/ethernet/mellanox/mlx4/fw.c @@ -2064,9 +2064,11 @@ int mlx4_QUERY_HCA(struct mlx4_dev *dev, { struct mlx4_cmd_mailbox *mailbox; __be32 *outbox; + u64 qword_field; u32 dword_field; - int err; + u16 word_field; u8 byte_field; + int err; static const u8 a0_dmfs_query_hw_steering[] = { [0] = MLX4_STEERING_DMFS_A0_DEFAULT, [1] = MLX4_STEERING_DMFS_A0_DYNAMIC, @@ -2094,19 +2096,32 @@ int mlx4_QUERY_HCA(struct mlx4_dev *dev,
/* QPC/EEC/CQC/EQC/RDMARC attributes */
- MLX4_GET(param->qpc_base, outbox, INIT_HCA_QPC_BASE_OFFSET); - MLX4_GET(param->log_num_qps, outbox, INIT_HCA_LOG_QP_OFFSET); - MLX4_GET(param->srqc_base, outbox, INIT_HCA_SRQC_BASE_OFFSET); - MLX4_GET(param->log_num_srqs, outbox, INIT_HCA_LOG_SRQ_OFFSET); - MLX4_GET(param->cqc_base, outbox, INIT_HCA_CQC_BASE_OFFSET); - MLX4_GET(param->log_num_cqs, outbox, INIT_HCA_LOG_CQ_OFFSET); - MLX4_GET(param->altc_base, outbox, INIT_HCA_ALTC_BASE_OFFSET); - MLX4_GET(param->auxc_base, outbox, INIT_HCA_AUXC_BASE_OFFSET); - MLX4_GET(param->eqc_base, outbox, INIT_HCA_EQC_BASE_OFFSET); - MLX4_GET(param->log_num_eqs, outbox, INIT_HCA_LOG_EQ_OFFSET); - MLX4_GET(param->num_sys_eqs, outbox, INIT_HCA_NUM_SYS_EQS_OFFSET); - MLX4_GET(param->rdmarc_base, outbox, INIT_HCA_RDMARC_BASE_OFFSET); - MLX4_GET(param->log_rd_per_qp, outbox, INIT_HCA_LOG_RD_OFFSET); + MLX4_GET(qword_field, outbox, INIT_HCA_QPC_BASE_OFFSET); + param->qpc_base = qword_field & ~((u64)0x1f); + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_QP_OFFSET); + param->log_num_qps = byte_field & 0x1f; + MLX4_GET(qword_field, outbox, INIT_HCA_SRQC_BASE_OFFSET); + param->srqc_base = qword_field & ~((u64)0x1f); + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_SRQ_OFFSET); + param->log_num_srqs = byte_field & 0x1f; + MLX4_GET(qword_field, outbox, INIT_HCA_CQC_BASE_OFFSET); + param->cqc_base = qword_field & ~((u64)0x1f); + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_CQ_OFFSET); + param->log_num_cqs = byte_field & 0x1f; + MLX4_GET(qword_field, outbox, INIT_HCA_ALTC_BASE_OFFSET); + param->altc_base = qword_field; + MLX4_GET(qword_field, outbox, INIT_HCA_AUXC_BASE_OFFSET); + param->auxc_base = qword_field; + MLX4_GET(qword_field, outbox, INIT_HCA_EQC_BASE_OFFSET); + param->eqc_base = qword_field & ~((u64)0x1f); + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_EQ_OFFSET); + param->log_num_eqs = byte_field & 0x1f; + MLX4_GET(word_field, outbox, INIT_HCA_NUM_SYS_EQS_OFFSET); + param->num_sys_eqs = word_field & 0xfff; + MLX4_GET(qword_field, outbox, INIT_HCA_RDMARC_BASE_OFFSET); + param->rdmarc_base = qword_field & ~((u64)0x1f); + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_RD_OFFSET); + param->log_rd_per_qp = byte_field & 0x7;
MLX4_GET(dword_field, outbox, INIT_HCA_FLAGS_OFFSET); if (dword_field & (1 << INIT_HCA_DEVICE_MANAGED_FLOW_STEERING_EN)) { @@ -2125,22 +2140,21 @@ int mlx4_QUERY_HCA(struct mlx4_dev *dev, /* steering attributes */ if (param->steering_mode == MLX4_STEERING_MODE_DEVICE_MANAGED) { MLX4_GET(param->mc_base, outbox, INIT_HCA_FS_BASE_OFFSET); - MLX4_GET(param->log_mc_entry_sz, outbox, - INIT_HCA_FS_LOG_ENTRY_SZ_OFFSET); - MLX4_GET(param->log_mc_table_sz, outbox, - INIT_HCA_FS_LOG_TABLE_SZ_OFFSET); - MLX4_GET(byte_field, outbox, - INIT_HCA_FS_A0_OFFSET); + MLX4_GET(byte_field, outbox, INIT_HCA_FS_LOG_ENTRY_SZ_OFFSET); + param->log_mc_entry_sz = byte_field & 0x1f; + MLX4_GET(byte_field, outbox, INIT_HCA_FS_LOG_TABLE_SZ_OFFSET); + param->log_mc_table_sz = byte_field & 0x1f; + MLX4_GET(byte_field, outbox, INIT_HCA_FS_A0_OFFSET); param->dmfs_high_steer_mode = a0_dmfs_query_hw_steering[(byte_field >> 6) & 3]; } else { MLX4_GET(param->mc_base, outbox, INIT_HCA_MC_BASE_OFFSET); - MLX4_GET(param->log_mc_entry_sz, outbox, - INIT_HCA_LOG_MC_ENTRY_SZ_OFFSET); - MLX4_GET(param->log_mc_hash_sz, outbox, - INIT_HCA_LOG_MC_HASH_SZ_OFFSET); - MLX4_GET(param->log_mc_table_sz, outbox, - INIT_HCA_LOG_MC_TABLE_SZ_OFFSET); + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_MC_ENTRY_SZ_OFFSET); + param->log_mc_entry_sz = byte_field & 0x1f; + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_MC_HASH_SZ_OFFSET); + param->log_mc_hash_sz = byte_field & 0x1f; + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_MC_TABLE_SZ_OFFSET); + param->log_mc_table_sz = byte_field & 0x1f; }
/* CX3 is capable of extending CQEs/EQEs from 32 to 64 bytes */ @@ -2164,15 +2178,18 @@ int mlx4_QUERY_HCA(struct mlx4_dev *dev, /* TPT attributes */
MLX4_GET(param->dmpt_base, outbox, INIT_HCA_DMPT_BASE_OFFSET); - MLX4_GET(param->mw_enabled, outbox, INIT_HCA_TPT_MW_OFFSET); - MLX4_GET(param->log_mpt_sz, outbox, INIT_HCA_LOG_MPT_SZ_OFFSET); + MLX4_GET(byte_field, outbox, INIT_HCA_TPT_MW_OFFSET); + param->mw_enabled = byte_field >> 7; + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_MPT_SZ_OFFSET); + param->log_mpt_sz = byte_field & 0x3f; MLX4_GET(param->mtt_base, outbox, INIT_HCA_MTT_BASE_OFFSET); MLX4_GET(param->cmpt_base, outbox, INIT_HCA_CMPT_BASE_OFFSET);
/* UAR attributes */
MLX4_GET(param->uar_page_sz, outbox, INIT_HCA_UAR_PAGE_SZ_OFFSET); - MLX4_GET(param->log_uar_sz, outbox, INIT_HCA_LOG_UAR_SZ_OFFSET); + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_UAR_SZ_OFFSET); + param->log_uar_sz = byte_field & 0xf;
/* phv_check enable */ MLX4_GET(byte_field, outbox, INIT_HCA_CACHELINE_SZ_OFFSET);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 63346650c1a94a92be61a57416ac88c0a47c4327 ]
sk_reset_timer() and sk_stop_timer() properly handle sock refcnt for timer function. Switching to them could fix a refcounting bug reported by syzbot.
Reported-and-tested-by: syzbot+defa700d16f1bd1b9a05@syzkaller.appspotmail.com Cc: Ralf Baechle ralf@linux-mips.org Cc: linux-hams@vger.kernel.org Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netrom/nr_timer.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)
--- a/net/netrom/nr_timer.c +++ b/net/netrom/nr_timer.c @@ -52,21 +52,21 @@ void nr_start_t1timer(struct sock *sk) { struct nr_sock *nr = nr_sk(sk);
- mod_timer(&nr->t1timer, jiffies + nr->t1); + sk_reset_timer(sk, &nr->t1timer, jiffies + nr->t1); }
void nr_start_t2timer(struct sock *sk) { struct nr_sock *nr = nr_sk(sk);
- mod_timer(&nr->t2timer, jiffies + nr->t2); + sk_reset_timer(sk, &nr->t2timer, jiffies + nr->t2); }
void nr_start_t4timer(struct sock *sk) { struct nr_sock *nr = nr_sk(sk);
- mod_timer(&nr->t4timer, jiffies + nr->t4); + sk_reset_timer(sk, &nr->t4timer, jiffies + nr->t4); }
void nr_start_idletimer(struct sock *sk) @@ -74,37 +74,37 @@ void nr_start_idletimer(struct sock *sk) struct nr_sock *nr = nr_sk(sk);
if (nr->idle > 0) - mod_timer(&nr->idletimer, jiffies + nr->idle); + sk_reset_timer(sk, &nr->idletimer, jiffies + nr->idle); }
void nr_start_heartbeat(struct sock *sk) { - mod_timer(&sk->sk_timer, jiffies + 5 * HZ); + sk_reset_timer(sk, &sk->sk_timer, jiffies + 5 * HZ); }
void nr_stop_t1timer(struct sock *sk) { - del_timer(&nr_sk(sk)->t1timer); + sk_stop_timer(sk, &nr_sk(sk)->t1timer); }
void nr_stop_t2timer(struct sock *sk) { - del_timer(&nr_sk(sk)->t2timer); + sk_stop_timer(sk, &nr_sk(sk)->t2timer); }
void nr_stop_t4timer(struct sock *sk) { - del_timer(&nr_sk(sk)->t4timer); + sk_stop_timer(sk, &nr_sk(sk)->t4timer); }
void nr_stop_idletimer(struct sock *sk) { - del_timer(&nr_sk(sk)->idletimer); + sk_stop_timer(sk, &nr_sk(sk)->idletimer); }
void nr_stop_heartbeat(struct sock *sk) { - del_timer(&sk->sk_timer); + sk_stop_timer(sk, &sk->sk_timer); }
int nr_t1timer_running(struct sock *sk)
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bernard Pidoux f6bvp@free.fr
[ Upstream commit b0cf029234f9b18e10703ba5147f0389c382bccc ]
When an internally generated frame is handled by rose_xmit(), rose_route_frame() is called:
if (!rose_route_frame(skb, NULL)) { dev_kfree_skb(skb); stats->tx_errors++; return NETDEV_TX_OK; }
We have the same code sequence in Net/Rom where an internally generated frame is handled by nr_xmit() calling nr_route_frame(skb, NULL). However, in this function NULL argument is tested while it is not in rose_route_frame(). Then kernel panic occurs later on when calling ax25cmp() with a NULL ax25_cb argument as reported many times and recently with syzbot.
We need to test if ax25 is NULL before using it.
Testing: Built kernel with CONFIG_ROSE=y.
Signed-off-by: Bernard Pidoux f6bvp@free.fr Acked-by: Dmitry Vyukov dvyukov@google.com Reported-by: syzbot+1a2c456a1ea08fa5b5f7@syzkaller.appspotmail.com Cc: "David S. Miller" davem@davemloft.net Cc: Ralf Baechle ralf@linux-mips.org Cc: Bernard Pidoux f6bvp@free.fr Cc: linux-hams@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/rose/rose_route.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/net/rose/rose_route.c +++ b/net/rose/rose_route.c @@ -850,6 +850,7 @@ void rose_link_device_down(struct net_de
/* * Route a frame to an appropriate AX.25 connection. + * A NULL ax25_cb indicates an internally generated frame. */ int rose_route_frame(struct sk_buff *skb, ax25_cb *ax25) { @@ -867,6 +868,10 @@ int rose_route_frame(struct sk_buff *skb
if (skb->len < ROSE_MIN_LEN) return res; + + if (!ax25) + return rose_loopback_queue(skb, NULL); + frametype = skb->data[2]; lci = ((skb->data[0] << 8) & 0xF00) + ((skb->data[1] << 0) & 0x0FF); if (frametype == ROSE_CALL_REQUEST &&
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Elsasser jelsasser@appneta.com
[ Upstream commit 35edfdc77f683c8fd27d7732af06cf6489af60a5 ]
Assign a default net namespace to netdevs created by init_dummy_netdev(). Fixes a NULL pointer dereference caused by busy-polling a socket bound to an iwlwifi wireless device, which bumps the per-net BUSYPOLLRXPACKETS stat if napi_poll() received packets:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000190 IP: napi_busy_loop+0xd6/0x200 Call Trace: sock_poll+0x5e/0x80 do_sys_poll+0x324/0x5a0 SyS_poll+0x6c/0xf0 do_syscall_64+0x6b/0x1f0 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Fixes: 7db6b048da3b ("net: Commonize busy polling code to focus on napi_id instead of socket") Signed-off-by: Josh Elsasser jelsasser@appneta.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/dev.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/core/dev.c +++ b/net/core/dev.c @@ -8624,6 +8624,9 @@ int init_dummy_netdev(struct net_device set_bit(__LINK_STATE_PRESENT, &dev->state); set_bit(__LINK_STATE_START, &dev->state);
+ /* napi_busy_loop stats accounting wants this */ + dev_net_set(dev, &init_net); + /* Note : We dont allocate pcpu_refcnt for dummy devices, * because users of this 'device' dont need to change * its refcount.
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Simon Horman horms+renesas@verge.net.au
[ Upstream commit 12da64300fbc76b875900445f4146c3dc617d43e ]
EtherAVB may provide a checksum of packet data appended to packet data. In order to allow this checksum to be received by the host descriptor data needs to be enlarged by 2 bytes to accommodate the checksum.
In the case of MTU-sized packets without a VLAN tag the checksum were already accommodated by virtue of the space reserved for the VLAN tag. However, a packet of MTU-size with a VLAN tag consumed all packet data space provided by a descriptor leaving no space for the trailing checksum.
This was not detected by the driver which incorrectly used the last two bytes of packet data as the checksum and truncate the packet by two bytes. This resulted all such packets being dropped.
A work around is to disable RX checksum offload # ethtool -K eth0 rx off
This patch resolves this problem by increasing the size available for packet data in RX descriptors by two bytes.
Tested on R-Car E3 (r8a77990) ES1.0 based Ebisu-4D board
v2 * Use sizeof(__sum16) directly rather than adding a driver-local #define for the size of the checksum provided by the hw (2 bytes).
Fixes: 4d86d3818627 ("ravb: RX checksum offload") Signed-off-by: Simon Horman horms+renesas@verge.net.au Reviewed-by: Sergei Shtylyov sergei.shtylyov@cogentembedded.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/renesas/ravb_main.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
--- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -350,7 +350,7 @@ static int ravb_ring_init(struct net_dev int i;
priv->rx_buf_sz = (ndev->mtu <= 1492 ? PKT_BUF_SZ : ndev->mtu) + - ETH_HLEN + VLAN_HLEN; + ETH_HLEN + VLAN_HLEN + sizeof(__sum16);
/* Allocate RX and TX skb rings */ priv->rx_skb[q] = kcalloc(priv->num_rx_ring[q], @@ -533,13 +533,15 @@ static void ravb_rx_csum(struct sk_buff { u8 *hw_csum;
- /* The hardware checksum is 2 bytes appended to packet data */ - if (unlikely(skb->len < 2)) + /* The hardware checksum is contained in sizeof(__sum16) (2) bytes + * appended to packet data + */ + if (unlikely(skb->len < sizeof(__sum16))) return; - hw_csum = skb_tail_pointer(skb) - 2; + hw_csum = skb_tail_pointer(skb) - sizeof(__sum16); skb->csum = csum_unfold((__force __sum16)get_unaligned_le16(hw_csum)); skb->ip_summed = CHECKSUM_COMPLETE; - skb_trim(skb, skb->len - 2); + skb_trim(skb, skb->len - sizeof(__sum16)); }
/* Packet receive function for Ethernet AVB */
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 2e6dc4d95110becfe0ff4c3d4749c33ea166e9e7 ]
This patch is to improve sctp stream reset events in 4 places:
1. In sctp_process_strreset_outreq(), the flag should always be set with SCTP_STREAM_RESET_INCOMING_SSN instead of OUTGOING, as receiver's in stream is reset here. 2. In sctp_process_strreset_outreq(), move up SCTP_STRRESET_ERR_WRONG_SSN check, as the reset has to succeed after reconf_timer stops for the in stream reset request retransmission. 3. In sctp_process_strreset_inreq(), no event should be sent, as no in or out stream is reset here. 4. In sctp_process_strreset_resp(), SCTP_STREAM_RESET_INCOMING_SSN or OUTGOING event should always be sent for stream reset requests, no matter it fails or succeeds to process the request.
Fixes: 810544764536 ("sctp: implement receiver-side procedures for the Outgoing SSN Reset Request Parameter") Fixes: 16e1a91965b0 ("sctp: implement receiver-side procedures for the Incoming SSN Reset Request Parameter") Fixes: 11ae76e67a17 ("sctp: implement receiver-side procedures for the Reconf Response Parameter") Reported-by: Ying Xu yinxu@redhat.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/stream.c | 39 +++++++++++++++++---------------------- 1 file changed, 17 insertions(+), 22 deletions(-)
--- a/net/sctp/stream.c +++ b/net/sctp/stream.c @@ -585,9 +585,9 @@ struct sctp_chunk *sctp_process_strreset struct sctp_strreset_outreq *outreq = param.v; struct sctp_stream *stream = &asoc->stream; __u32 result = SCTP_STRRESET_DENIED; - __u16 i, nums, flags = 0; __be16 *str_p = NULL; __u32 request_seq; + __u16 i, nums;
request_seq = ntohl(outreq->request_seq);
@@ -615,6 +615,15 @@ struct sctp_chunk *sctp_process_strreset if (!(asoc->strreset_enable & SCTP_ENABLE_RESET_STREAM_REQ)) goto out;
+ nums = (ntohs(param.p->length) - sizeof(*outreq)) / sizeof(__u16); + str_p = outreq->list_of_streams; + for (i = 0; i < nums; i++) { + if (ntohs(str_p[i]) >= stream->incnt) { + result = SCTP_STRRESET_ERR_WRONG_SSN; + goto out; + } + } + if (asoc->strreset_chunk) { if (!sctp_chunk_lookup_strreset_param( asoc, outreq->response_seq, @@ -637,32 +646,19 @@ struct sctp_chunk *sctp_process_strreset sctp_chunk_put(asoc->strreset_chunk); asoc->strreset_chunk = NULL; } - - flags = SCTP_STREAM_RESET_INCOMING_SSN; }
- nums = (ntohs(param.p->length) - sizeof(*outreq)) / sizeof(__u16); - if (nums) { - str_p = outreq->list_of_streams; - for (i = 0; i < nums; i++) { - if (ntohs(str_p[i]) >= stream->incnt) { - result = SCTP_STRRESET_ERR_WRONG_SSN; - goto out; - } - } - + if (nums) for (i = 0; i < nums; i++) SCTP_SI(stream, ntohs(str_p[i]))->mid = 0; - } else { + else for (i = 0; i < stream->incnt; i++) SCTP_SI(stream, i)->mid = 0; - }
result = SCTP_STRRESET_PERFORMED;
*evp = sctp_ulpevent_make_stream_reset_event(asoc, - flags | SCTP_STREAM_RESET_OUTGOING_SSN, nums, str_p, - GFP_ATOMIC); + SCTP_STREAM_RESET_INCOMING_SSN, nums, str_p, GFP_ATOMIC);
out: sctp_update_strreset_result(asoc, result); @@ -738,9 +734,6 @@ struct sctp_chunk *sctp_process_strreset
result = SCTP_STRRESET_PERFORMED;
- *evp = sctp_ulpevent_make_stream_reset_event(asoc, - SCTP_STREAM_RESET_INCOMING_SSN, nums, str_p, GFP_ATOMIC); - out: sctp_update_strreset_result(asoc, result); err: @@ -1036,10 +1029,10 @@ struct sctp_chunk *sctp_process_strreset sout->mid_uo = 0; } } - - flags = SCTP_STREAM_RESET_OUTGOING_SSN; }
+ flags |= SCTP_STREAM_RESET_OUTGOING_SSN; + for (i = 0; i < stream->outcnt; i++) SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
@@ -1058,6 +1051,8 @@ struct sctp_chunk *sctp_process_strreset nums = (ntohs(inreq->param_hdr.length) - sizeof(*inreq)) / sizeof(__u16);
+ flags |= SCTP_STREAM_RESET_INCOMING_SSN; + *evp = sctp_ulpevent_make_stream_reset_event(asoc, flags, nums, str_p, GFP_ATOMIC); } else if (req->type == SCTP_PARAM_RESET_TSN_REQUEST) {
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: George Amanakis gamanakis@gmail.com
[ Upstream commit 3a03cb8456cc1d61c467a5375e0a10e5207b948c ]
Call tun_set_real_num_queues() after the increment of tun->numqueues since the former depends on it. Otherwise, the number of queues is not correctly accounted for, which results to warnings similar to: "vnet0 selects TX queue 11, but real number of TX queues is 11".
Fixes: 0b7959b62573 ("tun: publish tfile after it's fully initialized") Reported-and-tested-by: George Amanakis gamanakis@gmail.com Signed-off-by: George Amanakis gamanakis@gmail.com Signed-off-by: Stanislav Fomichev sdf@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/tun.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -862,8 +862,6 @@ static int tun_attach(struct tun_struct if (rtnl_dereference(tun->xdp_prog)) sock_set_flag(&tfile->sk, SOCK_XDP);
- tun_set_real_num_queues(tun); - /* device is allowed to go away first, so no need to hold extra * refcnt. */ @@ -875,6 +873,7 @@ static int tun_attach(struct tun_struct rcu_assign_pointer(tfile->tun, tun); rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile); tun->numqueues++; + tun_set_real_num_queues(tun); out: return err; }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathias Thore mathias.thore@infinera.com
[ Upstream commit e15aa3b2b1388c399c1a2ce08550d2cc4f7e3e14 ]
After a timeout event caused by for example a broadcast storm, when the MAC and PHY are reset, the BQL TX queue needs to be reset as well. Otherwise, the device will exhibit severe performance issues even after the storm has ended.
Co-authored-by: David Gounaris david.gounaris@infinera.com Signed-off-by: Mathias Thore mathias.thore@infinera.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/freescale/ucc_geth.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/ethernet/freescale/ucc_geth.c +++ b/drivers/net/ethernet/freescale/ucc_geth.c @@ -1883,6 +1883,8 @@ static void ucc_geth_free_tx(struct ucc_ u16 i, j; u8 __iomem *bd;
+ netdev_reset_queue(ugeth->ndev); + ug_info = ugeth->ug_info; uf_info = &ug_info->uf_info;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Wang jasowang@redhat.com
[ Upstream commit b46a0bf78ad7b150ef5910da83859f7f5a514ffd ]
After batched used ring updating was introduced in commit e2b3b35eb989 ("vhost_net: batch used ring update in rx"). We tend to batch heads in vq->heads for more than one packet. But the quota passed to get_rx_bufs() was not correctly limited, which can result a OOB write in vq->heads.
headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx, vhost_len, &in, vq_log, &log, likely(mergeable) ? UIO_MAXIOV : 1);
UIO_MAXIOV was still used which is wrong since we could have batched used in vq->heads, this will cause OOB if the next buffer needs more than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads after we've batched 64 (VHOST_NET_BATCH) heads: Acked-by: Stefan Hajnoczi stefanha@redhat.com
============================================================================= BUG kmalloc-8k (Tainted: G B ): Redzone overwritten -----------------------------------------------------------------------------
INFO: 0x00000000fd93b7a2-0x00000000f0713384. First byte 0xa9 instead of 0xcc INFO: Allocated in alloc_pd+0x22/0x60 age=3933677 cpu=2 pid=2674 kmem_cache_alloc_trace+0xbb/0x140 alloc_pd+0x22/0x60 gen8_ppgtt_create+0x11d/0x5f0 i915_ppgtt_create+0x16/0x80 i915_gem_create_context+0x248/0x390 i915_gem_context_create_ioctl+0x4b/0xe0 drm_ioctl_kernel+0xa5/0xf0 drm_ioctl+0x2ed/0x3a0 do_vfs_ioctl+0x9f/0x620 ksys_ioctl+0x6b/0x80 __x64_sys_ioctl+0x11/0x20 do_syscall_64+0x43/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 INFO: Slab 0x00000000d13e87af objects=3 used=3 fp=0x (null) flags=0x200000000010201 INFO: Object 0x0000000003278802 @offset=17064 fp=0x00000000e2e6652b
Fixing this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for vhost-net. This is done through set the limitation through vhost_dev_init(), then set_owner can allocate the number of iov in a per device manner.
This fixes CVE-2018-16880.
Fixes: e2b3b35eb989 ("vhost_net: batch used ring update in rx") Signed-off-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/vhost/net.c | 3 ++- drivers/vhost/scsi.c | 2 +- drivers/vhost/vhost.c | 7 ++++--- drivers/vhost/vhost.h | 4 +++- drivers/vhost/vsock.c | 2 +- 5 files changed, 11 insertions(+), 7 deletions(-)
--- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -1293,7 +1293,8 @@ static int vhost_net_open(struct inode * n->vqs[i].rx_ring = NULL; vhost_net_buf_init(&n->vqs[i].rxq); } - vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX); + vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX, + UIO_MAXIOV + VHOST_NET_BATCH);
vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, EPOLLOUT, dev); vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, EPOLLIN, dev); --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -1628,7 +1628,7 @@ static int vhost_scsi_open(struct inode vqs[i] = &vs->vqs[i].vq; vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick; } - vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ); + vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, UIO_MAXIOV);
vhost_scsi_init_inflight(vs, NULL);
--- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -390,9 +390,9 @@ static long vhost_dev_alloc_iovecs(struc vq->indirect = kmalloc_array(UIO_MAXIOV, sizeof(*vq->indirect), GFP_KERNEL); - vq->log = kmalloc_array(UIO_MAXIOV, sizeof(*vq->log), + vq->log = kmalloc_array(dev->iov_limit, sizeof(*vq->log), GFP_KERNEL); - vq->heads = kmalloc_array(UIO_MAXIOV, sizeof(*vq->heads), + vq->heads = kmalloc_array(dev->iov_limit, sizeof(*vq->heads), GFP_KERNEL); if (!vq->indirect || !vq->log || !vq->heads) goto err_nomem; @@ -414,7 +414,7 @@ static void vhost_dev_free_iovecs(struct }
void vhost_dev_init(struct vhost_dev *dev, - struct vhost_virtqueue **vqs, int nvqs) + struct vhost_virtqueue **vqs, int nvqs, int iov_limit) { struct vhost_virtqueue *vq; int i; @@ -427,6 +427,7 @@ void vhost_dev_init(struct vhost_dev *de dev->iotlb = NULL; dev->mm = NULL; dev->worker = NULL; + dev->iov_limit = iov_limit; init_llist_head(&dev->work_list); init_waitqueue_head(&dev->wait); INIT_LIST_HEAD(&dev->read_list); --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -170,9 +170,11 @@ struct vhost_dev { struct list_head read_list; struct list_head pending_list; wait_queue_head_t wait; + int iov_limit; };
-void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs, int nvqs); +void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs, + int nvqs, int iov_limit); long vhost_dev_set_owner(struct vhost_dev *dev); bool vhost_dev_has_owner(struct vhost_dev *dev); long vhost_dev_check_owner(struct vhost_dev *); --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -531,7 +531,7 @@ static int vhost_vsock_dev_open(struct i vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick; vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
- vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs)); + vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs), UIO_MAXIOV);
file->private_data = vsock; spin_lock_init(&vsock->send_pkt_list_lock);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lorenzo Bianconi lorenzo.bianconi@redhat.com
[ Upstream commit c706863bc8902d0c2d1a5a27ac8e1ead5d06b79d ]
As Erspan_v4, Erspan_v6 protocol relies on o_key to configure session id header field. However TUNNEL_KEY bit is cleared in ip6erspan_tunnel_xmit since ERSPAN protocol does not set the key field of the external GRE header and so the configured o_key is not reported to userspace. The issue can be triggered with the following reproducer:
$ip link add ip6erspan1 type ip6erspan local 2000::1 remote 2000::2 \ key 1 seq erspan_ver 1 $ip link set ip6erspan1 up ip -d link sh ip6erspan1
ip6erspan1@NONE: <BROADCAST,MULTICAST> mtu 1422 qdisc noop state DOWN mode DEFAULT link/ether ba:ff:09:24:c3:0e brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500 ip6erspan remote 2000::2 local 2000::1 encaplimit 4 flowlabel 0x00000 ikey 0.0.0.1 iseq oseq
Fix the issue adding TUNNEL_KEY bit to the o_flags parameter in ip6gre_fill_info
Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Signed-off-by: Lorenzo Bianconi lorenzo.bianconi@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_gre.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -2102,12 +2102,17 @@ static int ip6gre_fill_info(struct sk_bu { struct ip6_tnl *t = netdev_priv(dev); struct __ip6_tnl_parm *p = &t->parms; + __be16 o_flags = p->o_flags; + + if ((p->erspan_ver == 1 || p->erspan_ver == 2) && + !p->collect_md) + o_flags |= TUNNEL_KEY;
if (nla_put_u32(skb, IFLA_GRE_LINK, p->link) || nla_put_be16(skb, IFLA_GRE_IFLAGS, gre_tnl_flags_to_gre_flags(p->i_flags)) || nla_put_be16(skb, IFLA_GRE_OFLAGS, - gre_tnl_flags_to_gre_flags(p->o_flags)) || + gre_tnl_flags_to_gre_flags(o_flags)) || nla_put_be32(skb, IFLA_GRE_IKEY, p->i_key) || nla_put_be32(skb, IFLA_GRE_OKEY, p->o_key) || nla_put_in6_addr(skb, IFLA_GRE_LOCAL, &p->laddr) ||
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 8220c870cb0f4eaa4e335c9645dbd9a1c461c1dd ]
This patch is to improve sctp stream adding events in 2 places:
1. In sctp_process_strreset_addstrm_out(), move up SCTP_MAX_STREAM and in stream allocation failure checks, as the adding has to succeed after reconf_timer stops for the in stream adding request retransmission.
3. In sctp_process_strreset_addstrm_in(), no event should be sent, as no in or out stream is added here.
Fixes: 50a41591f110 ("sctp: implement receiver-side procedures for the Add Outgoing Streams Request Parameter") Fixes: c5c4ebb3ab87 ("sctp: implement receiver-side procedures for the Add Incoming Streams Request Parameter") Reported-by: Ying Xu yinxu@redhat.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/stream.c | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-)
--- a/net/sctp/stream.c +++ b/net/sctp/stream.c @@ -866,6 +866,14 @@ struct sctp_chunk *sctp_process_strreset if (!(asoc->strreset_enable & SCTP_ENABLE_CHANGE_ASSOC_REQ)) goto out;
+ in = ntohs(addstrm->number_of_streams); + incnt = stream->incnt + in; + if (!in || incnt > SCTP_MAX_STREAM) + goto out; + + if (sctp_stream_alloc_in(stream, incnt, GFP_ATOMIC)) + goto out; + if (asoc->strreset_chunk) { if (!sctp_chunk_lookup_strreset_param( asoc, 0, SCTP_PARAM_RESET_ADD_IN_STREAMS)) { @@ -889,14 +897,6 @@ struct sctp_chunk *sctp_process_strreset } }
- in = ntohs(addstrm->number_of_streams); - incnt = stream->incnt + in; - if (!in || incnt > SCTP_MAX_STREAM) - goto out; - - if (sctp_stream_alloc_in(stream, incnt, GFP_ATOMIC)) - goto out; - stream->incnt = incnt;
result = SCTP_STRRESET_PERFORMED; @@ -966,9 +966,6 @@ struct sctp_chunk *sctp_process_strreset
result = SCTP_STRRESET_PERFORMED;
- *evp = sctp_ulpevent_make_stream_change_event(asoc, - 0, 0, ntohs(addstrm->number_of_streams), GFP_ATOMIC); - out: sctp_update_strreset_result(asoc, result); err:
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aya Levin ayal@mellanox.com
[ Upstream commit 9d2cbdc5d334967c35b5f58c7bf3208e17325647 ]
Prior to this patch the driver prohibited spoof checking on invalid MAC. Now the user can set this configuration if it wishes to.
This is required since libvirt might invalidate the VF Mac by setting it to zero, while spoofcheck is ON.
Fixes: 1ab2068a4c66 ("net/mlx5: Implement vports admin state backup/restore") Signed-off-by: Aya Levin ayal@mellanox.com Reviewed-by: Eran Ben Elisha eranbe@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -1133,13 +1133,6 @@ static int esw_vport_ingress_config(stru int err = 0; u8 *smac_v;
- if (vport->info.spoofchk && !is_valid_ether_addr(vport->info.mac)) { - mlx5_core_warn(esw->dev, - "vport[%d] configure ingress rules failed, illegal mac with spoofchk\n", - vport->vport); - return -EPERM; - } - esw_vport_cleanup_ingress_rules(esw, vport);
if (!vport->info.vlan && !vport->info.qos && !vport->info.spoofchk) { @@ -1812,13 +1805,10 @@ int mlx5_eswitch_set_vport_mac(struct ml mutex_lock(&esw->state_lock); evport = &esw->vports[vport];
- if (evport->info.spoofchk && !is_valid_ether_addr(mac)) { + if (evport->info.spoofchk && !is_valid_ether_addr(mac)) mlx5_core_warn(esw->dev, - "MAC invalidation is not allowed when spoofchk is on, vport(%d)\n", + "Set invalid MAC while spoofchk is on, vport(%d)\n", vport); - err = -EPERM; - goto unlock; - }
err = mlx5_modify_nic_vport_mac_address(esw->dev, vport, mac); if (err) { @@ -1964,6 +1954,10 @@ int mlx5_eswitch_set_vport_spoofchk(stru evport = &esw->vports[vport]; pschk = evport->info.spoofchk; evport->info.spoofchk = spoofchk; + if (pschk && !is_valid_ether_addr(evport->info.mac)) + mlx5_core_warn(esw->dev, + "Spoofchk in set while MAC is invalid, vport(%d)\n", + evport->vport); if (evport->enabled && esw->mode == SRIOV_LEGACY) err = esw_vport_ingress_config(esw, evport); if (err)
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nir Dotan nird@mellanox.com
[ Upstream commit 146820cc240f4389cf33481c058d9493aef95e25 ]
When the MC route socket is closed, mroute_clean_tables() is called to cleanup existing routes. Mistakenly notifiers call was put on the cleanup of the unresolved MC route entries cache. In a case where the MC socket closes before an unresolved route expires, the notifier call leads to a crash, caused by the driver trying to increment a non initialized refcount_t object [1] and then when handling is done, to decrement it [2]. This was detected by a test recently added in commit 6d4efada3b82 ("selftests: forwarding: Add multicast routing test").
Fix that by putting notifiers call on the resolved entries traversal, instead of on the unresolved entries traversal.
[1]
[ 245.748967] refcount_t: increment on 0; use-after-free. [ 245.754829] WARNING: CPU: 3 PID: 3223 at lib/refcount.c:153 refcount_inc_checked+0x2b/0x30 ... [ 245.802357] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016 [ 245.811873] RIP: 0010:refcount_inc_checked+0x2b/0x30 ... [ 245.907487] Call Trace: [ 245.910231] mlxsw_sp_router_fib_event.cold.181+0x42/0x47 [mlxsw_spectrum] [ 245.917913] notifier_call_chain+0x45/0x7 [ 245.922484] atomic_notifier_call_chain+0x15/0x20 [ 245.927729] call_fib_notifiers+0x15/0x30 [ 245.932205] mroute_clean_tables+0x372/0x3f [ 245.936971] ip6mr_sk_done+0xb1/0xc0 [ 245.940960] ip6_mroute_setsockopt+0x1da/0x5f0 ...
[2]
[ 246.128487] refcount_t: underflow; use-after-free. [ 246.133859] WARNING: CPU: 0 PID: 7 at lib/refcount.c:187 refcount_sub_and_test_checked+0x4c/0x60 [ 246.183521] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016 ... [ 246.193062] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fibmr_event_work [mlxsw_spectrum] [ 246.202394] RIP: 0010:refcount_sub_and_test_checked+0x4c/0x60 ... [ 246.298889] Call Trace: [ 246.301617] refcount_dec_and_test_checked+0x11/0x20 [ 246.307170] mlxsw_sp_router_fibmr_event_work.cold.196+0x47/0x78 [mlxsw_spectrum] [ 246.315531] process_one_work+0x1fa/0x3f0 [ 246.320005] worker_thread+0x2f/0x3e0 [ 246.324083] kthread+0x118/0x130 [ 246.327683] ? wq_update_unbound_numa+0x1b0/0x1b0 [ 246.332926] ? kthread_park+0x80/0x80 [ 246.337013] ret_from_fork+0x1f/0x30
Fixes: 088aa3eec2ce ("ip6mr: Support fib notifications") Signed-off-by: Nir Dotan nird@mellanox.com Reviewed-by: Ido Schimmel idosch@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6mr.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -1516,6 +1516,9 @@ static void mroute_clean_tables(struct m continue; rhltable_remove(&mrt->mfc_hash, &c->mnode, ip6mr_rht_params); list_del_rcu(&c->list); + call_ip6mr_mfc_entry_notifiers(read_pnet(&mrt->net), + FIB_EVENT_ENTRY_DEL, + (struct mfc6_cache *)c, mrt->id); mr6_netlink_event(mrt, (struct mfc6_cache *)c, RTM_DELROUTE); mr_cache_put(c); } @@ -1524,10 +1527,6 @@ static void mroute_clean_tables(struct m spin_lock_bh(&mfc_unres_lock); list_for_each_entry_safe(c, tmp, &mrt->mfc_unres_queue, list) { list_del(&c->list); - call_ip6mr_mfc_entry_notifiers(read_pnet(&mrt->net), - FIB_EVENT_ENTRY_DEL, - (struct mfc6_cache *)c, - mrt->id); mr6_netlink_event(mrt, (struct mfc6_cache *)c, RTM_DELROUTE); ip6mr_destroy_unres(mrt, (struct mfc6_cache *)c);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bodong Wang bodong@mellanox.com
[ Upstream commit 4e046de0f50e04acd48eb373d6a9061ddf014e0c ]
This reverts commit 5f5991f36dce1e69dd8bd7495763eec2e28f08e7.
With the original commit, eswitch instance will not be initialized for a function which is vport group manager but not eswitch manager such as host PF on SmartNIC (BlueField) card. This will result in a kernel crash when such a vport group manager is trying to access vports in its group. E.g, PF vport manager (not eswitch manager) tries to configure the MAC of its VF vport, a kernel trace will happen similar as bellow:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 ... RIP: 0010:mlx5_eswitch_get_vport_config+0xc/0x180 [mlx5_core] ...
Fixes: 5f5991f36dce ("net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager") Signed-off-by: Bodong Wang bodong@mellanox.com Reported-by: Yuval Avnery yuvalav@mellanox.com Reviewed-by: Daniel Jurgens danielj@mellanox.com Reviewed-by: Or Gerlitz ogerlitz@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -1689,7 +1689,7 @@ int mlx5_eswitch_init(struct mlx5_core_d int vport_num; int err;
- if (!MLX5_ESWITCH_MANAGER(dev)) + if (!MLX5_VPORT_MANAGER(dev)) return 0;
esw_info(dev, @@ -1758,7 +1758,7 @@ abort:
void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw) { - if (!esw || !MLX5_ESWITCH_MANAGER(esw->dev)) + if (!esw || !MLX5_VPORT_MANAGER(esw->dev)) return;
esw_info(esw->dev, "cleanup\n");
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 4ff40b86262b73553ee47cc3784ce8ba0f220bd8 ]
In the paths:
sctp_sf_do_unexpected_init() -> sctp_make_init_ack() sctp_sf_do_dupcook_a/b()() -> sctp_sf_do_5_1D_ce()
The new chunk 'retval' transport is set from the incoming chunk 'chunk' transport. However, 'retval' transport belong to the new asoc, which is a different one from 'chunk' transport's asoc.
It will cause that the 'retval' chunk gets set with a wrong transport. Later when sending it and because of Commit b9fd683982c9 ("sctp: add sctp_packet_singleton"), sctp_packet_singleton() will set some fields, like vtag to 'retval' chunk from that wrong transport's asoc.
This patch is to fix it by setting 'retval' transport correctly which belongs to the right asoc in sctp_make_init_ack() and sctp_sf_do_5_1D_ce().
Fixes: b9fd683982c9 ("sctp: add sctp_packet_singleton") Reported-by: Ying Xu yinxu@redhat.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/sm_make_chunk.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
--- a/net/sctp/sm_make_chunk.c +++ b/net/sctp/sm_make_chunk.c @@ -495,7 +495,10 @@ struct sctp_chunk *sctp_make_init_ack(co * * [INIT ACK back to where the INIT came from.] */ - retval->transport = chunk->transport; + if (chunk->transport) + retval->transport = + sctp_assoc_lookup_paddr(asoc, + &chunk->transport->ipaddr);
retval->subh.init_hdr = sctp_addto_chunk(retval, sizeof(initack), &initack); @@ -642,8 +645,10 @@ struct sctp_chunk *sctp_make_cookie_ack( * * [COOKIE ACK back to where the COOKIE ECHO came from.] */ - if (retval && chunk) - retval->transport = chunk->transport; + if (retval && chunk && chunk->transport) + retval->transport = + sctp_assoc_lookup_paddr(asoc, + &chunk->transport->ipaddr);
return retval; }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit ecf938fe7d0088077ee1280419a2b3c5429b47c8 ]
Now sctp_transport_pmtu() passes transport->saddr into .get_dst() to set flow sport from 'saddr'. However, transport->saddr is set only when transport->dst exists in sctp_transport_route().
If sctp_transport_pmtu() is called without transport->saddr set, like when transport->dst doesn't exists, the flow sport will be set to 0 from transport->saddr, which will cause a wrong route to be got.
Commit 6e91b578bf3f ("sctp: re-use sctp_transport_pmtu in sctp_transport_route") made the issue be triggered more easily since sctp_transport_pmtu() would be called in sctp_transport_route() after that.
In gerneral, fl4->fl4_sport should always be set to htons(asoc->base.bind_addr.port), unless transport->asoc doesn't exist in sctp_v4/6_get_dst(), which is the case:
sctp_ootb_pkt_new() -> sctp_transport_route()
For that, we can simply handle it by setting flow sport from saddr only when it's 0 in sctp_v4/6_get_dst().
Fixes: 6e91b578bf3f ("sctp: re-use sctp_transport_pmtu in sctp_transport_route") Reported-by: Ying Xu yinxu@redhat.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/ipv6.c | 3 ++- net/sctp/protocol.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-)
--- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -277,7 +277,8 @@ static void sctp_v6_get_dst(struct sctp_
if (saddr) { fl6->saddr = saddr->v6.sin6_addr; - fl6->fl6_sport = saddr->v6.sin6_port; + if (!fl6->fl6_sport) + fl6->fl6_sport = saddr->v6.sin6_port;
pr_debug("src=%pI6 - ", &fl6->saddr); } --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -440,7 +440,8 @@ static void sctp_v4_get_dst(struct sctp_ } if (saddr) { fl4->saddr = saddr->v4.sin_addr.s_addr; - fl4->fl4_sport = saddr->v4.sin_port; + if (!fl4->fl4_sport) + fl4->fl4_sport = saddr->v4.sin_port; }
pr_debug("%s: dst:%pI4, src:%pI4 - ", __func__, &fl4->daddr,
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Watson davejwatson@fb.com
[ Upstream commit 1023121375c6b0b3dc00334983c762ba2b76cb19 ]
If there are outstanding async tx requests (when crypto returns EINPROGRESS), there is a potential deadlock: the tx work acquires the lock, while we cancel_delayed_work_sync() while holding the lock. Drop the lock while waiting for the work to complete.
Fixes: a42055e8d2c30 ("Add support for async encryption of records...") Signed-off-by: Dave Watson davejwatson@fb.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/tls/tls_sw.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1768,7 +1768,9 @@ void tls_sw_free_resources_tx(struct soc if (atomic_read(&ctx->encrypt_pending)) crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
+ release_sock(sk); cancel_delayed_work_sync(&ctx->tx_work.work); + lock_sock(sk);
/* Tx whatever records we can transmit and abandon the rest */ tls_tx_records(sk, -1);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Watson davejwatson@fb.com
[ Upstream commit 32eb67b93c9e3cd62cb423e30b090cdd4aa8d275 ]
aead_request_set_crypt takes an iv pointer, and we change the iv soon after setting it. Some async crypto algorithms don't save the iv, so we need to save it in the tls_rec for async requests.
Found by hardcoding x64 aesni to use async crypto manager (to test the async codepath), however I don't think this combination can happen in the wild. Presumably other hardware offloads will need this fix, but there have been no user reports.
Fixes: a42055e8d2c30 ("Add support for async encryption of records...") Signed-off-by: Dave Watson davejwatson@fb.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/tls.h | 2 ++ net/tls/tls_sw.c | 4 +++- 2 files changed, 5 insertions(+), 1 deletion(-)
--- a/include/net/tls.h +++ b/include/net/tls.h @@ -120,6 +120,8 @@ struct tls_rec { struct scatterlist sg_aead_out[2];
char aad_space[TLS_AAD_SPACE_SIZE]; + u8 iv_data[TLS_CIPHER_AES_GCM_128_IV_SIZE + + TLS_CIPHER_AES_GCM_128_SALT_SIZE]; struct aead_request aead_req; u8 aead_req_ctx[]; }; --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -439,6 +439,8 @@ static int tls_do_encryption(struct sock struct scatterlist *sge = sk_msg_elem(msg_en, start); int rc;
+ memcpy(rec->iv_data, tls_ctx->tx.iv, sizeof(rec->iv_data)); + sge->offset += tls_ctx->tx.prepend_size; sge->length -= tls_ctx->tx.prepend_size;
@@ -448,7 +450,7 @@ static int tls_do_encryption(struct sock aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE); aead_request_set_crypt(aead_req, rec->sg_aead_in, rec->sg_aead_out, - data_len, tls_ctx->tx.iv); + data_len, rec->iv_data);
aead_request_set_callback(aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG, tls_encrypt_done, sk);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp
[ Upstream commit 8be4d9a492f88b96d4d3a06c6cbedbc40ca14c83 ]
Commit 4e09ff536284 ("virtio-net: disable NAPI only when enabled during XDP set") tried to fix inappropriate NAPI enabling/disabling when !netif_running(), but was not complete.
On error path virtio_net could enable NAPI even when !netif_running(). This can cause enabling NAPI twice on virtnet_open(), which would trigger BUG_ON() in napi_enable().
Fixes: 4941d472bf95b ("virtio-net: do not reset during XDP set") Signed-off-by: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp Acked-by: Jason Wang jasowang@redhat.com Acked-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/virtio_net.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2429,8 +2429,10 @@ static int virtnet_xdp_set(struct net_de return 0;
err: - for (i = 0; i < vi->max_queue_pairs; i++) - virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi); + if (netif_running(dev)) { + for (i = 0; i < vi->max_queue_pairs; i++) + virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi); + } if (prog) bpf_prog_sub(prog, vi->max_queue_pairs - 1); return err;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp
[ Upstream commit 534da5e856334fb54cb0272a9fb3afec28ea3aed ]
When napi_tx is enabled, virtnet_poll_cleantx() called free_old_xmit_skbs() even for xdp send queue. This is bogus since the queue has xdp_frames, not sk_buffs, thus mangled device tx bytes counters because skb->len is meaningless value, and even triggered oops due to general protection fault on freeing them.
Since xdp send queues do not aquire locks, old xdp_frames should be freed only in virtnet_xdp_xmit(), so just skip free_old_xmit_skbs() for xdp send queues.
Similarly virtnet_poll_tx() called free_old_xmit_skbs(). This NAPI handler is called even without calling start_xmit() because cb for tx is by default enabled. Once the handler is called, it enabled the cb again, and then the handler would be called again. We don't need this handler for XDP, so don't enable cb as well as not calling free_old_xmit_skbs().
Also, we need to disable tx NAPI when disabling XDP, so virtnet_poll_tx() can safely access curr_queue_pairs and xdp_queue_pairs, which are not atomically updated while disabling XDP.
Fixes: b92f1e6751a6 ("virtio-net: transmit napi") Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") Signed-off-by: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp Acked-by: Jason Wang jasowang@redhat.com Acked-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/virtio_net.c | 49 +++++++++++++++++++++++++++++++---------------- 1 file changed, 33 insertions(+), 16 deletions(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1357,6 +1357,16 @@ static void free_old_xmit_skbs(struct se u64_stats_update_end(&sq->stats.syncp); }
+static bool is_xdp_raw_buffer_queue(struct virtnet_info *vi, int q) +{ + if (q < (vi->curr_queue_pairs - vi->xdp_queue_pairs)) + return false; + else if (q < vi->curr_queue_pairs) + return true; + else + return false; +} + static void virtnet_poll_cleantx(struct receive_queue *rq) { struct virtnet_info *vi = rq->vq->vdev->priv; @@ -1364,7 +1374,7 @@ static void virtnet_poll_cleantx(struct struct send_queue *sq = &vi->sq[index]; struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, index);
- if (!sq->napi.weight) + if (!sq->napi.weight || is_xdp_raw_buffer_queue(vi, index)) return;
if (__netif_tx_trylock(txq)) { @@ -1441,8 +1451,16 @@ static int virtnet_poll_tx(struct napi_s { struct send_queue *sq = container_of(napi, struct send_queue, napi); struct virtnet_info *vi = sq->vq->vdev->priv; - struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq)); + unsigned int index = vq2txq(sq->vq); + struct netdev_queue *txq;
+ if (unlikely(is_xdp_raw_buffer_queue(vi, index))) { + /* We don't need to enable cb for XDP */ + napi_complete_done(napi, 0); + return 0; + } + + txq = netdev_get_tx_queue(vi->dev, index); __netif_tx_lock(txq, raw_smp_processor_id()); free_old_xmit_skbs(sq); __netif_tx_unlock(txq); @@ -2401,9 +2419,12 @@ static int virtnet_xdp_set(struct net_de }
/* Make sure NAPI is not using any XDP TX queues for RX. */ - if (netif_running(dev)) - for (i = 0; i < vi->max_queue_pairs; i++) + if (netif_running(dev)) { + for (i = 0; i < vi->max_queue_pairs; i++) { napi_disable(&vi->rq[i].napi); + virtnet_napi_tx_disable(&vi->sq[i].napi); + } + }
netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp); err = _virtnet_set_queues(vi, curr_qp + xdp_qp); @@ -2422,16 +2443,22 @@ static int virtnet_xdp_set(struct net_de } if (old_prog) bpf_prog_put(old_prog); - if (netif_running(dev)) + if (netif_running(dev)) { virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi); + virtnet_napi_tx_enable(vi, vi->sq[i].vq, + &vi->sq[i].napi); + } }
return 0;
err: if (netif_running(dev)) { - for (i = 0; i < vi->max_queue_pairs; i++) + for (i = 0; i < vi->max_queue_pairs; i++) { virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi); + virtnet_napi_tx_enable(vi, vi->sq[i].vq, + &vi->sq[i].napi); + } } if (prog) bpf_prog_sub(prog, vi->max_queue_pairs - 1); @@ -2588,16 +2615,6 @@ static void free_receive_page_frags(stru put_page(vi->rq[i].alloc_frag.page); }
-static bool is_xdp_raw_buffer_queue(struct virtnet_info *vi, int q) -{ - if (q < (vi->curr_queue_pairs - vi->xdp_queue_pairs)) - return false; - else if (q < vi->curr_queue_pairs) - return true; - else - return false; -} - static void free_unused_bufs(struct virtnet_info *vi) { void *buf;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp
[ Upstream commit 188313c137c4f76afd0862f50dbc185b198b9e2a ]
When _virtnet_set_queues() failed we did not restore real_num_rx_queues. Fix this by placing the change of real_num_rx_queues after _virtnet_set_queues(). This order is also in line with virtnet_set_channels().
Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set") Signed-off-by: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp Acked-by: Jason Wang jasowang@redhat.com Acked-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/virtio_net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2426,10 +2426,10 @@ static int virtnet_xdp_set(struct net_de } }
- netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp); err = _virtnet_set_queues(vi, curr_qp + xdp_qp); if (err) goto err; + netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp); vi->xdp_queue_pairs = xdp_qp;
for (i = 0; i < vi->max_queue_pairs; i++) {
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp
[ Upstream commit 1667c08a9d31c7cdf09f4890816bfbf20b685495 ]
When XDP is disabled, curr_queue_pairs + smp_processor_id() can be larger than max_queue_pairs. There is no guarantee that we have enough XDP send queues dedicated for each cpu when XDP is disabled, so do not count drops on sq in that case.
Fixes: 5b8f3c8d30a6 ("virtio_net: Add XDP related stats") Signed-off-by: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp Acked-by: Jason Wang jasowang@redhat.com Acked-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/virtio_net.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -490,20 +490,17 @@ static int virtnet_xdp_xmit(struct net_d int ret, err; int i;
- sq = virtnet_xdp_sq(vi); - - if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) { - ret = -EINVAL; - drops = n; - goto out; - } - /* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this * indicate XDP resources have been successfully allocated. */ xdp_prog = rcu_dereference(rq->xdp_prog); - if (!xdp_prog) { - ret = -ENXIO; + if (!xdp_prog) + return -ENXIO; + + sq = virtnet_xdp_sq(vi); + + if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) { + ret = -EINVAL; drops = n; goto out; }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp
[ Upstream commit 03aa6d34868c07b2b1b8b2db080602d7ec528173 ]
Commit 8dcc5b0ab0ec ("virtio_net: fix ndo_xdp_xmit crash towards dev not ready for XDP") tried to avoid access to unexpected sq while XDP is disabled, but was not complete.
There was a small window which causes out of bounds sq access in virtnet_xdp_xmit() while disabling XDP.
An example case of - curr_queue_pairs = 6 (2 for SKB and 4 for XDP) - online_cpu_num = xdp_queue_paris = 4 when XDP is enabled:
CPU 0 CPU 1 (Disabling XDP) (Processing redirected XDP frames)
virtnet_xdp_xmit() virtnet_xdp_set() _virtnet_set_queues() set curr_queue_pairs (2) check if rq->xdp_prog is not NULL virtnet_xdp_sq(vi) qp = curr_queue_pairs - xdp_queue_pairs + smp_processor_id() = 2 - 4 + 1 = -1 sq = &vi->sq[qp] // out of bounds access set xdp_queue_pairs (0) rq->xdp_prog = NULL
Basically we should not change curr_queue_pairs and xdp_queue_pairs while someone can read the values. Thus, when disabling XDP, assign NULL to rq->xdp_prog first, and wait for RCU grace period, then change xxx_queue_pairs. Note that we need to keep the current order when enabling XDP though.
- v2: Make rcu_assign_pointer/synchronize_net conditional instead of _virtnet_set_queues.
Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT") Signed-off-by: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp Acked-by: Jason Wang jasowang@redhat.com Acked-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/virtio_net.c | 33 ++++++++++++++++++++++++++------- 1 file changed, 26 insertions(+), 7 deletions(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2409,6 +2409,10 @@ static int virtnet_xdp_set(struct net_de return -ENOMEM; }
+ old_prog = rtnl_dereference(vi->rq[0].xdp_prog); + if (!prog && !old_prog) + return 0; + if (prog) { prog = bpf_prog_add(prog, vi->max_queue_pairs - 1); if (IS_ERR(prog)) @@ -2423,21 +2427,30 @@ static int virtnet_xdp_set(struct net_de } }
+ if (!prog) { + for (i = 0; i < vi->max_queue_pairs; i++) { + rcu_assign_pointer(vi->rq[i].xdp_prog, prog); + if (i == 0) + virtnet_restore_guest_offloads(vi); + } + synchronize_net(); + } + err = _virtnet_set_queues(vi, curr_qp + xdp_qp); if (err) goto err; netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp); vi->xdp_queue_pairs = xdp_qp;
- for (i = 0; i < vi->max_queue_pairs; i++) { - old_prog = rtnl_dereference(vi->rq[i].xdp_prog); - rcu_assign_pointer(vi->rq[i].xdp_prog, prog); - if (i == 0) { - if (!old_prog) + if (prog) { + for (i = 0; i < vi->max_queue_pairs; i++) { + rcu_assign_pointer(vi->rq[i].xdp_prog, prog); + if (i == 0 && !old_prog) virtnet_clear_guest_offloads(vi); - if (!prog) - virtnet_restore_guest_offloads(vi); } + } + + for (i = 0; i < vi->max_queue_pairs; i++) { if (old_prog) bpf_prog_put(old_prog); if (netif_running(dev)) { @@ -2450,6 +2463,12 @@ static int virtnet_xdp_set(struct net_de return 0;
err: + if (!prog) { + virtnet_clear_guest_offloads(vi); + for (i = 0; i < vi->max_queue_pairs; i++) + rcu_assign_pointer(vi->rq[i].xdp_prog, old_prog); + } + if (netif_running(dev)) { for (i = 0; i < vi->max_queue_pairs; i++) { virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp
[ Upstream commit 07b344f494ddda9f061b396407c96df8c46c82b5 ]
put_page() can work as a fallback for freeing xdp_frames, but the appropriate way is to use xdp_return_frame().
Fixes: cac320c850ef ("virtio_net: convert to use generic xdp_frame and xdp_return_frame API") Signed-off-by: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp Acked-by: Jason Wang jasowang@redhat.com Acked-by: Jesper Dangaard Brouer brouer@redhat.com Acked-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/virtio_net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2642,7 +2642,7 @@ static void free_unused_bufs(struct virt if (!is_xdp_raw_buffer_queue(vi, i)) dev_kfree_skb(buf); else - put_page(virt_to_head_page(buf)); + xdp_return_frame(buf); } }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp
[ Upstream commit 5050471d35d1316ba32dfcbb409978337eb9e75e
I had to fold commit df133f3f9625 ("virtio_net: bulk free tx skbs") into this to make it work. ]
We do not reset or free up unused buffers when enabling/disabling XDP, so it can happen that xdp_frames are freed after disabling XDP or sk_buffs are freed after enabling XDP on xdp tx queues. Thus we need to handle both forms (xdp_frames and sk_buffs) regardless of XDP setting. One way to trigger this problem is to disable XDP when napi_tx is enabled. In that case, virtnet_xdp_set() calls virtnet_napi_enable() which kicks NAPI. The NAPI handler will call virtnet_poll_cleantx() which invokes free_old_xmit_skbs() for queues which have been used by XDP.
Note that even with this change we need to keep skipping free_old_xmit_skbs() from NAPI handlers when XDP is enabled, because XDP tx queues do not aquire queue locks.
- v2: Use napi_consume_skb() instead of dev_consume_skb_any()
Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set") Signed-off-by: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp Acked-by: Jason Wang jasowang@redhat.com Acked-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/virtio_net.c | 64 ++++++++++++++++++++++++++++++++++------------- 1 file changed, 47 insertions(+), 17 deletions(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -57,6 +57,8 @@ module_param(napi_tx, bool, 0644); #define VIRTIO_XDP_TX BIT(0) #define VIRTIO_XDP_REDIR BIT(1)
+#define VIRTIO_XDP_FLAG BIT(0) + /* RX packet size EWMA. The average packet size is used to determine the packet * buffer size when refilling RX rings. As the entire RX ring may be refilled * at once, the weight is chosen so that the EWMA will be insensitive to short- @@ -251,6 +253,21 @@ struct padded_vnet_hdr { char padding[4]; };
+static bool is_xdp_frame(void *ptr) +{ + return (unsigned long)ptr & VIRTIO_XDP_FLAG; +} + +static void *xdp_to_ptr(struct xdp_frame *ptr) +{ + return (void *)((unsigned long)ptr | VIRTIO_XDP_FLAG); +} + +static struct xdp_frame *ptr_to_xdp(void *ptr) +{ + return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG); +} + /* Converting between virtqueue no. and kernel tx/rx queue no. * 0:rx0 1:tx0 2:rx1 3:tx1 ... 2N:rxN 2N+1:txN 2N+2:cvq */ @@ -461,7 +478,8 @@ static int __virtnet_xdp_xmit_one(struct
sg_init_one(sq->sg, xdpf->data, xdpf->len);
- err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdpf, GFP_ATOMIC); + err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp_to_ptr(xdpf), + GFP_ATOMIC); if (unlikely(err)) return -ENOSPC; /* Caller handle free/refcnt */
@@ -481,13 +499,13 @@ static int virtnet_xdp_xmit(struct net_d { struct virtnet_info *vi = netdev_priv(dev); struct receive_queue *rq = vi->rq; - struct xdp_frame *xdpf_sent; struct bpf_prog *xdp_prog; struct send_queue *sq; unsigned int len; int drops = 0; int kicks = 0; int ret, err; + void *ptr; int i;
/* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this @@ -506,8 +524,12 @@ static int virtnet_xdp_xmit(struct net_d }
/* Free up any pending old buffers before queueing new ones. */ - while ((xdpf_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) - xdp_return_frame(xdpf_sent); + while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) { + if (likely(is_xdp_frame(ptr))) + xdp_return_frame(ptr_to_xdp(ptr)); + else + napi_consume_skb(ptr, false); + }
for (i = 0; i < n; i++) { struct xdp_frame *xdpf = frames[i]; @@ -1326,20 +1348,28 @@ static int virtnet_receive(struct receiv return stats.packets; }
-static void free_old_xmit_skbs(struct send_queue *sq) +static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi) { - struct sk_buff *skb; unsigned int len; unsigned int packets = 0; unsigned int bytes = 0; + void *ptr;
- while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) { - pr_debug("Sent skb %p\n", skb); + while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) { + if (likely(!is_xdp_frame(ptr))) { + struct sk_buff *skb = ptr;
- bytes += skb->len; - packets++; + pr_debug("Sent skb %p\n", skb);
- dev_consume_skb_any(skb); + bytes += skb->len; + napi_consume_skb(skb, in_napi); + } else { + struct xdp_frame *frame = ptr_to_xdp(ptr); + + bytes += frame->len; + xdp_return_frame(frame); + } + packets++; }
/* Avoid overhead when no packets have been processed @@ -1375,7 +1405,7 @@ static void virtnet_poll_cleantx(struct return;
if (__netif_tx_trylock(txq)) { - free_old_xmit_skbs(sq); + free_old_xmit_skbs(sq, true); __netif_tx_unlock(txq); }
@@ -1459,7 +1489,7 @@ static int virtnet_poll_tx(struct napi_s
txq = netdev_get_tx_queue(vi->dev, index); __netif_tx_lock(txq, raw_smp_processor_id()); - free_old_xmit_skbs(sq); + free_old_xmit_skbs(sq, true); __netif_tx_unlock(txq);
virtqueue_napi_complete(napi, sq->vq, 0); @@ -1528,7 +1558,7 @@ static netdev_tx_t start_xmit(struct sk_ bool use_napi = sq->napi.weight;
/* Free up any pending old buffers before queueing new ones. */ - free_old_xmit_skbs(sq); + free_old_xmit_skbs(sq, false);
if (use_napi && kick) virtqueue_enable_cb_delayed(sq->vq); @@ -1571,7 +1601,7 @@ static netdev_tx_t start_xmit(struct sk_ if (!use_napi && unlikely(!virtqueue_enable_cb_delayed(sq->vq))) { /* More just got used, free them then recheck. */ - free_old_xmit_skbs(sq); + free_old_xmit_skbs(sq, false); if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) { netif_start_subqueue(dev, qnum); virtqueue_disable_cb(sq->vq); @@ -2639,10 +2669,10 @@ static void free_unused_bufs(struct virt for (i = 0; i < vi->max_queue_pairs; i++) { struct virtqueue *vq = vi->sq[i].vq; while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) { - if (!is_xdp_raw_buffer_queue(vi, i)) + if (!is_xdp_frame(buf)) dev_kfree_skb(buf); else - xdp_return_frame(buf); + xdp_return_frame(ptr_to_xdp(buf)); } }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Ahern dsahern@gmail.com
[ Upstream commit c5ee066333ebc322a24a00a743ed941a0c68617e ]
IPv6 does not consider if the socket is bound to a device when binding to an address. The result is that a socket can be bound to eth0 and then bound to the address of eth1. If the device is a VRF, the result is that a socket can only be bound to an address in the default VRF.
Resolve by considering the device if sk_bound_dev_if is set.
This problem exists from the beginning of git history.
Signed-off-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/af_inet6.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -362,6 +362,9 @@ static int __inet6_bind(struct sock *sk, err = -EINVAL; goto out_unlock; } + } + + if (sk->sk_bound_dev_if) { dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if); if (!dev) { err = -ENODEV;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Shilovsky pshilov@microsoft.com
commit 8e6e72aeceaaed5aeeb1cb43d3085de7ceb14f79 upstream.
Signed-off-by: Pavel Shilovsky pshilov@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com CC: Stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/smb2pdu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -3726,8 +3726,8 @@ SMB2_query_directory(const unsigned int rsp->sync_hdr.Status == STATUS_NO_MORE_FILES) { srch_inf->endOfSearch = true; rc = 0; - } - cifs_stats_fail_inc(tcon, SMB2_QUERY_DIRECTORY_HE); + } else + cifs_stats_fail_inc(tcon, SMB2_QUERY_DIRECTORY_HE); goto qdir_exit; }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Shilovsky pshilov@microsoft.com
commit 9bda8723da2d55b1de833b98cf802b88006e5b69 upstream.
Allocation of a page array for non-cached IO was separated from allocation of rdata and wdata structures and this introduced memory leaks and a possible null pointer dereference. This patch fixes these problems.
Cc: stable@vger.kernel.org Signed-off-by: Pavel Shilovsky pshilov@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/file.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
--- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -2670,6 +2670,7 @@ cifs_write_from_iter(loff_t offset, size
rc = cifs_write_allocate_pages(wdata->pages, nr_pages); if (rc) { + kvfree(wdata->pages); kfree(wdata); add_credits_and_wake_if(server, credits, 0); break; @@ -2681,6 +2682,7 @@ cifs_write_from_iter(loff_t offset, size if (rc) { for (i = 0; i < nr_pages; i++) put_page(wdata->pages[i]); + kvfree(wdata->pages); kfree(wdata); add_credits_and_wake_if(server, credits, 0); break; @@ -3360,8 +3362,12 @@ cifs_send_async_read(loff_t offset, size }
rc = cifs_read_allocate_pages(rdata, npages); - if (rc) - goto error; + if (rc) { + kvfree(rdata->pages); + kfree(rdata); + add_credits_and_wake_if(server, credits, 0); + break; + }
rdata->tailsz = PAGE_SIZE; } @@ -3381,7 +3387,6 @@ cifs_send_async_read(loff_t offset, size if (!rdata->cfile->invalidHandle || !(rc = cifs_reopen_file(rdata->cfile, true))) rc = server->ops->async_readv(rdata); -error: if (rc) { add_credits_and_wake_if(server, rdata->credits, 0); kref_put(&rdata->refcount,
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Shilovsky pshilov@microsoft.com
commit 7d42e72fe8ee5ab70b1af843dd7d8615e6fb0abe upstream.
Currently we log success once we send an async IO request to the server. Instead we need to analyse a response and then log success or failure for a particular command. Also fix argument list for read logging.
Cc: stable@vger.kernel.org # 4.18 Signed-off-by: Pavel Shilovsky pshilov@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/smb2pdu.c | 46 ++++++++++++++++++++++++++++++---------------- 1 file changed, 30 insertions(+), 16 deletions(-)
--- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -3139,8 +3139,17 @@ smb2_readv_callback(struct mid_q_entry * rdata->mr = NULL; } #endif - if (rdata->result) + if (rdata->result) { cifs_stats_fail_inc(tcon, SMB2_READ_HE); + trace_smb3_read_err(0 /* xid */, + rdata->cfile->fid.persistent_fid, + tcon->tid, tcon->ses->Suid, rdata->offset, + rdata->bytes, rdata->result); + } else + trace_smb3_read_done(0 /* xid */, + rdata->cfile->fid.persistent_fid, + tcon->tid, tcon->ses->Suid, + rdata->offset, rdata->got_bytes);
queue_work(cifsiod_wq, &rdata->work); DeleteMidQEntry(mid); @@ -3215,13 +3224,11 @@ smb2_async_readv(struct cifs_readdata *r if (rc) { kref_put(&rdata->refcount, cifs_readdata_release); cifs_stats_fail_inc(io_parms.tcon, SMB2_READ_HE); - trace_smb3_read_err(rc, 0 /* xid */, io_parms.persistent_fid, - io_parms.tcon->tid, io_parms.tcon->ses->Suid, - io_parms.offset, io_parms.length); - } else - trace_smb3_read_done(0 /* xid */, io_parms.persistent_fid, - io_parms.tcon->tid, io_parms.tcon->ses->Suid, - io_parms.offset, io_parms.length); + trace_smb3_read_err(0 /* xid */, io_parms.persistent_fid, + io_parms.tcon->tid, + io_parms.tcon->ses->Suid, + io_parms.offset, io_parms.length, rc); + }
cifs_small_buf_release(buf); return rc; @@ -3265,10 +3272,11 @@ SMB2_read(const unsigned int xid, struct if (rc != -ENODATA) { cifs_stats_fail_inc(io_parms->tcon, SMB2_READ_HE); cifs_dbg(VFS, "Send error in read = %d\n", rc); + trace_smb3_read_err(xid, req->PersistentFileId, + io_parms->tcon->tid, ses->Suid, + io_parms->offset, io_parms->length, + rc); } - trace_smb3_read_err(rc, xid, req->PersistentFileId, - io_parms->tcon->tid, ses->Suid, - io_parms->offset, io_parms->length); free_rsp_buf(resp_buftype, rsp_iov.iov_base); return rc == -ENODATA ? 0 : rc; } else @@ -3354,8 +3362,17 @@ smb2_writev_callback(struct mid_q_entry wdata->mr = NULL; } #endif - if (wdata->result) + if (wdata->result) { cifs_stats_fail_inc(tcon, SMB2_WRITE_HE); + trace_smb3_write_err(0 /* no xid */, + wdata->cfile->fid.persistent_fid, + tcon->tid, tcon->ses->Suid, wdata->offset, + wdata->bytes, wdata->result); + } else + trace_smb3_write_done(0 /* no xid */, + wdata->cfile->fid.persistent_fid, + tcon->tid, tcon->ses->Suid, + wdata->offset, wdata->bytes);
queue_work(cifsiod_wq, &wdata->work); DeleteMidQEntry(mid); @@ -3497,10 +3514,7 @@ smb2_async_writev(struct cifs_writedata wdata->bytes, rc); kref_put(&wdata->refcount, release); cifs_stats_fail_inc(tcon, SMB2_WRITE_HE); - } else - trace_smb3_write_done(0 /* no xid */, req->PersistentFileId, - tcon->tid, tcon->ses->Suid, wdata->offset, - wdata->bytes); + }
async_writev_out: cifs_small_buf_release(req);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aurelien Aptel aaptel@suse.com
commit d339adc12a4f885b572c5412e4869af8939db854 upstream.
The request buffers are freed right before copying the pointers. Use the func args instead which are identical and still valid.
Simple reproducer (requires KASAN enabled) on a cifs mount:
echo foo > foo ; tail -f foo & rm foo
Cc: stable@vger.kernel.org # 4.20 Fixes: 179e44d49c2f ("smb3: add tracepoint for sending lease break responses to server") Signed-off-by: Aurelien Aptel aaptel@suse.com Signed-off-by: Steve French stfrench@microsoft.com Reviewed-by: Paulo Alcantara palcantara@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/smb2pdu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -4339,8 +4339,8 @@ SMB2_lease_break(const unsigned int xid, rc = cifs_send_recv(xid, ses, &rqst, &resp_buf_type, flags, &rsp_iov); cifs_small_buf_release(req);
- please_key_low = (__u64 *)req->LeaseKey; - please_key_high = (__u64 *)(req->LeaseKey+8); + please_key_low = (__u64 *)lease_key; + please_key_high = (__u64 *)(lease_key+8); if (rc) { cifs_stats_fail_inc(tcon, SMB2_OPLOCK_BREAK_HE); trace_smb3_lease_err(le32_to_cpu(lease_state), tcon->tid,
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Shilovsky pshilov@microsoft.com
commit 082aaa8700415f6471ec9c5ef0c8307ca214989a upstream.
When doing reads beyound the end of a file the server returns error STATUS_END_OF_FILE error which is mapped to -ENODATA. Currently we report it as a failure which confuses read stats. Change it to not consider -ENODATA as failure for stat purposes.
Signed-off-by: Pavel Shilovsky pshilov@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com CC: Stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/smb2pdu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -3139,7 +3139,7 @@ smb2_readv_callback(struct mid_q_entry * rdata->mr = NULL; } #endif - if (rdata->result) { + if (rdata->result && rdata->result != -ENODATA) { cifs_stats_fail_inc(tcon, SMB2_READ_HE); trace_smb3_read_err(0 /* xid */, rdata->cfile->fid.persistent_fid,
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Waiman Long longman@redhat.com
commit 1dbd449c9943e3145148cc893c2461b72ba6fef0 upstream.
The nr_dentry_unused per-cpu counter tracks dentries in both the LRU lists and the shrink lists where the DCACHE_LRU_LIST bit is set.
The shrink_dcache_sb() function moves dentries from the LRU list to a shrink list and subtracts the dentry count from nr_dentry_unused. This is incorrect as the nr_dentry_unused count will also be decremented in shrink_dentry_list() via d_shrink_del().
To fix this double decrement, the decrement in the shrink_dcache_sb() function is taken out.
Fixes: 4e717f5c1083 ("list_lru: remove special case function list_lru_dispose_all." Cc: stable@kernel.org Signed-off-by: Waiman Long longman@redhat.com Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/dcache.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)
--- a/fs/dcache.c +++ b/fs/dcache.c @@ -1188,15 +1188,11 @@ static enum lru_status dentry_lru_isolat */ void shrink_dcache_sb(struct super_block *sb) { - long freed; - do { LIST_HEAD(dispose);
- freed = list_lru_walk(&sb->s_dentry_lru, + list_lru_walk(&sb->s_dentry_lru, dentry_lru_isolate_shrink, &dispose, 1024); - - this_cpu_sub(nr_dentry_unused, freed); shrink_dentry_list(&dispose); } while (list_lru_count(&sb->s_dentry_lru) > 0); }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gerald Schaefer gerald.schaefer@de.ibm.com
commit 198bc3252ea3a45b0c5d500e6a5b91cfdd08f001 upstream.
Commit 9d3a4de4cb8d ("iommu: Disambiguate MSI region types") changed the reserved region type in intel_iommu_get_resv_regions() from IOMMU_RESV_RESERVED to IOMMU_RESV_MSI, but it forgot to also change the type in intel_iommu_put_resv_regions().
This leads to a memory leak, because now the check in intel_iommu_put_resv_regions() for IOMMU_RESV_RESERVED will never be true, and no allocated regions will be freed.
Fix this by changing the region type in intel_iommu_put_resv_regions() to IOMMU_RESV_MSI, matching the type of the allocated regions.
Fixes: 9d3a4de4cb8d ("iommu: Disambiguate MSI region types") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Gerald Schaefer gerald.schaefer@de.ibm.com Reviewed-by: Eric Auger eric.auger@redhat.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/iommu/intel-iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5204,7 +5204,7 @@ static void intel_iommu_put_resv_regions struct iommu_resv_region *entry, *next;
list_for_each_entry_safe(entry, next, head, list) { - if (entry->type == IOMMU_RESV_RESERVED) + if (entry->type == IOMMU_RESV_MSI) kfree(entry); } }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
commit ed5f13261cb65b02c611ae9971677f33581d4286 upstream.
Passing EPERM during syscall skipping was confusing since the test wasn't actually exercising the errno evaluation -- it was just passing a literal "1" (EPERM). Instead, expand the tests to check both direct value returns (positive, 45000 in this case), and errno values (negative, -ESRCH in this case) to check both fake success and fake failure during syscall skipping.
Reported-by: Colin Ian King colin.king@canonical.com Fixes: a33b2d0359a0 ("selftests/seccomp: Add tests for basic ptrace actions") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Shuah Khan shuah@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/testing/selftests/seccomp/seccomp_bpf.c | 72 ++++++++++++++++++++------ 1 file changed, 57 insertions(+), 15 deletions(-)
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -1563,7 +1563,16 @@ TEST_F(TRACE_poke, getpid_runs_normally) #ifdef SYSCALL_NUM_RET_SHARE_REG # define EXPECT_SYSCALL_RETURN(val, action) EXPECT_EQ(-1, action) #else -# define EXPECT_SYSCALL_RETURN(val, action) EXPECT_EQ(val, action) +# define EXPECT_SYSCALL_RETURN(val, action) \ + do { \ + errno = 0; \ + if (val < 0) { \ + EXPECT_EQ(-1, action); \ + EXPECT_EQ(-(val), errno); \ + } else { \ + EXPECT_EQ(val, action); \ + } \ + } while (0) #endif
/* Use PTRACE_GETREGS and PTRACE_SETREGS when available. This is useful for @@ -1602,7 +1611,7 @@ int get_syscall(struct __test_metadata *
/* Architecture-specific syscall changing routine. */ void change_syscall(struct __test_metadata *_metadata, - pid_t tracee, int syscall) + pid_t tracee, int syscall, int result) { int ret; ARCH_REGS regs; @@ -1661,7 +1670,7 @@ void change_syscall(struct __test_metada #ifdef SYSCALL_NUM_RET_SHARE_REG TH_LOG("Can't modify syscall return on this architecture"); #else - regs.SYSCALL_RET = EPERM; + regs.SYSCALL_RET = result; #endif
#ifdef HAVE_GETREGS @@ -1689,14 +1698,19 @@ void tracer_syscall(struct __test_metada case 0x1002: /* change getpid to getppid. */ EXPECT_EQ(__NR_getpid, get_syscall(_metadata, tracee)); - change_syscall(_metadata, tracee, __NR_getppid); + change_syscall(_metadata, tracee, __NR_getppid, 0); break; case 0x1003: - /* skip gettid. */ + /* skip gettid with valid return code. */ EXPECT_EQ(__NR_gettid, get_syscall(_metadata, tracee)); - change_syscall(_metadata, tracee, -1); + change_syscall(_metadata, tracee, -1, 45000); break; case 0x1004: + /* skip openat with error. */ + EXPECT_EQ(__NR_openat, get_syscall(_metadata, tracee)); + change_syscall(_metadata, tracee, -1, -ESRCH); + break; + case 0x1005: /* do nothing (allow getppid) */ EXPECT_EQ(__NR_getppid, get_syscall(_metadata, tracee)); break; @@ -1729,9 +1743,11 @@ void tracer_ptrace(struct __test_metadat nr = get_syscall(_metadata, tracee);
if (nr == __NR_getpid) - change_syscall(_metadata, tracee, __NR_getppid); + change_syscall(_metadata, tracee, __NR_getppid, 0); + if (nr == __NR_gettid) + change_syscall(_metadata, tracee, -1, 45000); if (nr == __NR_openat) - change_syscall(_metadata, tracee, -1); + change_syscall(_metadata, tracee, -1, -ESRCH); }
FIXTURE_DATA(TRACE_syscall) { @@ -1748,8 +1764,10 @@ FIXTURE_SETUP(TRACE_syscall) BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_TRACE | 0x1002), BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_gettid, 0, 1), BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_TRACE | 0x1003), - BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_getppid, 0, 1), + BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_openat, 0, 1), BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_TRACE | 0x1004), + BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_getppid, 0, 1), + BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_TRACE | 0x1005), BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW), };
@@ -1797,15 +1815,26 @@ TEST_F(TRACE_syscall, ptrace_syscall_red EXPECT_NE(self->mypid, syscall(__NR_getpid)); }
-TEST_F(TRACE_syscall, ptrace_syscall_dropped) +TEST_F(TRACE_syscall, ptrace_syscall_errno) +{ + /* Swap SECCOMP_RET_TRACE tracer for PTRACE_SYSCALL tracer. */ + teardown_trace_fixture(_metadata, self->tracer); + self->tracer = setup_trace_fixture(_metadata, tracer_ptrace, NULL, + true); + + /* Tracer should skip the open syscall, resulting in ESRCH. */ + EXPECT_SYSCALL_RETURN(-ESRCH, syscall(__NR_openat)); +} + +TEST_F(TRACE_syscall, ptrace_syscall_faked) { /* Swap SECCOMP_RET_TRACE tracer for PTRACE_SYSCALL tracer. */ teardown_trace_fixture(_metadata, self->tracer); self->tracer = setup_trace_fixture(_metadata, tracer_ptrace, NULL, true);
- /* Tracer should skip the open syscall, resulting in EPERM. */ - EXPECT_SYSCALL_RETURN(EPERM, syscall(__NR_openat)); + /* Tracer should skip the gettid syscall, resulting fake pid. */ + EXPECT_SYSCALL_RETURN(45000, syscall(__NR_gettid)); }
TEST_F(TRACE_syscall, syscall_allowed) @@ -1838,7 +1867,21 @@ TEST_F(TRACE_syscall, syscall_redirected EXPECT_NE(self->mypid, syscall(__NR_getpid)); }
-TEST_F(TRACE_syscall, syscall_dropped) +TEST_F(TRACE_syscall, syscall_errno) +{ + long ret; + + ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); + ASSERT_EQ(0, ret); + + ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->prog, 0, 0); + ASSERT_EQ(0, ret); + + /* openat has been skipped and an errno return. */ + EXPECT_SYSCALL_RETURN(-ESRCH, syscall(__NR_openat)); +} + +TEST_F(TRACE_syscall, syscall_faked) { long ret;
@@ -1849,8 +1892,7 @@ TEST_F(TRACE_syscall, syscall_dropped) ASSERT_EQ(0, ret);
/* gettid has been skipped and an altered return value stored. */ - EXPECT_SYSCALL_RETURN(EPERM, syscall(__NR_gettid)); - EXPECT_NE(self->mytid, syscall(__NR_gettid)); + EXPECT_SYSCALL_RETURN(45000, syscall(__NR_gettid)); }
TEST_F(TRACE_syscall, skip_after_RET_TRACE)
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trondmy@gmail.com
commit 8fc75bed96bb94e23ca51bd9be4daf65c57697bf upstream.
Ensure that we return the fatal error value that caused us to exit nfs_page_async_flush().
Fixes: c373fff7bd25 ("NFSv4: Don't special case "launder"") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Cc: stable@vger.kernel.org # v4.12+ Reviewed-by: Benjamin Coddington bcodding@redhat.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/nfs/write.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -621,11 +621,12 @@ static int nfs_page_async_flush(struct n nfs_set_page_writeback(page); WARN_ON_ONCE(test_bit(PG_CLEAN, &req->wb_flags));
- ret = 0; + ret = req->wb_context->error; /* If there is a fatal error that covers this write, just exit */ - if (nfs_error_is_fatal_on_server(req->wb_context->error)) + if (nfs_error_is_fatal_on_server(ret)) goto out_launder;
+ ret = 0; if (!nfs_pageio_add_request(pgio, req)) { ret = pgio->pg_error; /* @@ -635,9 +636,9 @@ static int nfs_page_async_flush(struct n nfs_context_set_write_error(req->wb_context, ret); if (nfs_error_is_fatal_on_server(ret)) goto out_launder; - } + } else + ret = -EAGAIN; nfs_redirty_request(req); - ret = -EAGAIN; } else nfs_add_stats(page_file_mapping(page)->host, NFSIOS_WRITEPAGES, 1);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Koen Vandeputte koen.vandeputte@ncentric.com
commit 65dbb423cf28232fed1732b779249d6164c5999b upstream.
Originally, cns3xxx used its own functions for mapping, reading and writing config registers.
Commit 802b7c06adc7 ("ARM: cns3xxx: Convert PCI to use generic config accessors") removed the internal PCI config write function in favor of the generic one:
cns3xxx_pci_write_config() --> pci_generic_config_write()
cns3xxx_pci_write_config() expected aligned addresses, being produced by cns3xxx_pci_map_bus() while the generic one pci_generic_config_write() actually expects the real address as both the function and hardware are capable of byte-aligned writes.
This currently leads to pci_generic_config_write() writing to the wrong registers.
For instance, upon ath9k module loading:
- driver ath9k gets loaded - The driver wants to write value 0xA8 to register PCI_LATENCY_TIMER, located at 0x0D - cns3xxx_pci_map_bus() aligns the address to 0x0C - pci_generic_config_write() effectively writes 0xA8 into register 0x0C (CACHE_LINE_SIZE)
Fix the bug by removing the alignment in the cns3xxx mapping function.
Fixes: 802b7c06adc7 ("ARM: cns3xxx: Convert PCI to use generic config accessors") Signed-off-by: Koen Vandeputte koen.vandeputte@ncentric.com [lorenzo.pieralisi@arm.com: updated commit log] Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Acked-by: Krzysztof Halasa khalasa@piap.pl Acked-by: Tim Harvey tharvey@gateworks.com Acked-by: Arnd Bergmann arnd@arndb.de CC: stable@vger.kernel.org # v4.0+ CC: Bjorn Helgaas bhelgaas@google.com CC: Olof Johansson olof@lixom.net CC: Robin Leblon robin.leblon@ncentric.com CC: Rob Herring robh@kernel.org CC: Russell King linux@armlinux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/mach-cns3xxx/pcie.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/mach-cns3xxx/pcie.c +++ b/arch/arm/mach-cns3xxx/pcie.c @@ -83,7 +83,7 @@ static void __iomem *cns3xxx_pci_map_bus } else /* remote PCI bus */ base = cnspci->cfg1_regs + ((busno & 0xf) << 20);
- return base + (where & 0xffc) + (devfn << 12); + return base + where + (devfn << 12); }
static int cns3xxx_pci_read_config(struct pci_bus *bus, unsigned int devfn,
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ard Biesheuvel ard.biesheuvel@linaro.org
commit 8ea235932314311f15ea6cf65c1393ed7e31af70 upstream.
Commit 1598ecda7b23 ("arm64: kaslr: ensure randomized quantities are clean to the PoC") added cache maintenance to ensure that global variables set by the kaslr init routine are not wiped clean due to cache invalidation occurring during the second round of page table creation.
However, if kaslr_early_init() exits early with no randomization being applied (either due to the lack of a seed, or because the user has disabled kaslr explicitly), no cache maintenance is performed, leading to the same issue we attempted to fix earlier, as far as the module_alloc_base variable is concerned.
Note that module_alloc_base cannot be initialized statically, because that would cause it to be subject to a R_AARCH64_RELATIVE relocation, causing it to be overwritten by the second round of KASLR relocation processing.
Fixes: f80fb3a3d508 ("arm64: add support for kernel ASLR") Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/kernel/kaslr.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/arm64/kernel/kaslr.c +++ b/arch/arm64/kernel/kaslr.c @@ -88,6 +88,7 @@ u64 __init kaslr_early_init(u64 dt_phys) * we end up running with module randomization disabled. */ module_alloc_base = (u64)_etext - MODULES_VSIZE; + __flush_dcache_area(&module_alloc_base, sizeof(module_alloc_base));
/* * Try to map the FDT early. If this fails, we simply bail,
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Catalin Marinas catalin.marinas@arm.com
commit 132fdc379eb143932d209a20fd581e1ce7630960 upstream.
Commit 3b8c9f1cdfc5 ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings") was aimed at fixing the I-cache invalidation for kernel mappings. However, it inadvertently caused all cache maintenance for user mappings via set_pte_at() -> __sync_icache_dcache() -> sync_icache_aliases() to call kick_all_cpus_sync().
Reported-by: Shijith Thotton sthotton@marvell.com Tested-by: Shijith Thotton sthotton@marvell.com Reported-by: Wandun Chen chenwandun@huawei.com Fixes: 3b8c9f1cdfc5 ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings") Cc: stable@vger.kernel.org # 4.19.x- Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/mm/flush.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/arch/arm64/mm/flush.c +++ b/arch/arm64/mm/flush.c @@ -33,7 +33,11 @@ void sync_icache_aliases(void *kaddr, un __clean_dcache_area_pou(kaddr, len); __flush_icache_all(); } else { - flush_icache_range(addr, addr + len); + /* + * Don't issue kick_all_cpus_sync() after I-cache invalidation + * for user mappings. + */ + __flush_icache_range(addr, addr + len); } }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Morse james.morse@arm.com
commit 8fac5cbdfe0f01254d9d265c6aa1a95f94f58595 upstream.
The hyp-stub is loaded by the kernel's early startup code at EL2 during boot, before KVM takes ownership later. The hyp-stub's text is part of the regular kernel text, meaning it can be kprobed.
A breakpoint in the hyp-stub causes the CPU to spin in el2_sync_invalid.
Add it to the __hyp_text.
Signed-off-by: James Morse james.morse@arm.com Cc: stable@vger.kernel.org Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/kernel/hyp-stub.S | 2 ++ 1 file changed, 2 insertions(+)
--- a/arch/arm64/kernel/hyp-stub.S +++ b/arch/arm64/kernel/hyp-stub.S @@ -28,6 +28,8 @@ #include <asm/virt.h>
.text + .pushsection .hyp.text, "ax" + .align 11
ENTRY(__hyp_stub_vectors)
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Morse james.morse@arm.com
commit f7daa9c8fd191724b9ab9580a7be55cd1a67d799 upstream.
During resume hibernate restores all physical memory. Any memory that is accessed with the MMU disabled needs to be cleaned to the PoC.
KVMs __hyp_text was previously ommitted as it runs with the MMU enabled, but now that the hyp-stub is located in this section, we must clean __hyp_text too.
This ensures secondary CPUs that come online after hibernate has finished resuming, and load KVM via the freshly written hyp-stub see the correct instructions.
Signed-off-by: James Morse james.morse@arm.com Cc: stable@vger.kernel.org Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/kernel/hibernate.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/arm64/kernel/hibernate.c +++ b/arch/arm64/kernel/hibernate.c @@ -299,8 +299,10 @@ int swsusp_arch_suspend(void) dcache_clean_range(__idmap_text_start, __idmap_text_end);
/* Clean kvm setup code to PoC? */ - if (el2_reset_needed()) + if (el2_reset_needed()) { dcache_clean_range(__hyp_idmap_text_start, __hyp_idmap_text_end); + dcache_clean_range(__hyp_text_start, __hyp_text_end); + }
/* make the crash dump kernel image protected again */ crash_post_resume();
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Axel Lin axel.lin@ingics.com
commit 2095a45e345e669ea77a9b34bdd7de5ceb422f93 upstream.
The altr_a10sr_gpio_direction_output should set proper output level based on the value argument.
Fixes: 26a48c4cc2f1 ("gpio: altera-a10sr: Add A10 System Resource Chip GPIO support.") Cc: stable@vger.kernel.org Signed-off-by: Axel Lin axel.lin@ingics.com Tested by: Thor Thayer thor.thayer@linux.intel.com Reviewed by: Thor Thayer thor.thayer@linux.intel.com Signed-off-by: Bartosz Golaszewski bgolaszewski@baylibre.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpio/gpio-altera-a10sr.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/gpio/gpio-altera-a10sr.c +++ b/drivers/gpio/gpio-altera-a10sr.c @@ -66,8 +66,10 @@ static int altr_a10sr_gpio_direction_inp static int altr_a10sr_gpio_direction_output(struct gpio_chip *gc, unsigned int nr, int value) { - if (nr <= (ALTR_A10SR_OUT_VALID_RANGE_HI - ALTR_A10SR_LED_VALID_SHIFT)) + if (nr <= (ALTR_A10SR_OUT_VALID_RANGE_HI - ALTR_A10SR_LED_VALID_SHIFT)) { + altr_a10sr_gpio_set(gc, nr, value); return 0; + } return -EINVAL; }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bartosz Golaszewski bgolaszewski@baylibre.com
commit 1033be58992f818dc564196ded2bcc3f360bc297 upstream.
Nested interrupts run inside the calling thread's context and the top half handler is never called which means that we never read the timestamp.
This issue came up when trying to read line events from a gpiochip using regmap_irq_chip for interrupts.
Fix it by reading the timestamp from the irq thread function if it's still 0 by the time the second handler is called.
Fixes: d58f2bf261fd ("gpio: Timestamp events in hardirq handler") Cc: stable@vger.kernel.org Signed-off-by: Bartosz Golaszewski bgolaszewski@baylibre.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpio/gpiolib.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -828,7 +828,14 @@ static irqreturn_t lineevent_irq_thread( /* Do not leak kernel stack to userspace */ memset(&ge, 0, sizeof(ge));
- ge.timestamp = le->timestamp; + /* + * We may be running from a nested threaded interrupt in which case + * we didn't get the timestamp from lineevent_irq_handler(). + */ + if (!le->timestamp) + ge.timestamp = ktime_get_real_ns(); + else + ge.timestamp = le->timestamp;
if (le->eflags & GPIOEVENT_REQUEST_RISING_EDGE && le->eflags & GPIOEVENT_REQUEST_FALLING_EDGE) {
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Roger Quadros rogerq@ti.com
commit 2486e67374aa8b7854c2de32869642c2873b3d53 upstream.
When multiple instances of pcf857x chips are present, a fix up message [1] is printed during the probe of the 2nd and later instances.
The issue is that the driver is using the same irq_chip data structure between multiple instances.
Fix this by allocating the irq_chip data structure per instance.
[1] fix up message addressed by this patch [ 1.212100] gpio gpiochip9: (pcf8575): detected irqchip that is shared with multiple gpiochips: please fix the driver.
Cc: Stable stable@vger.kernel.org Signed-off-by: Roger Quadros rogerq@ti.com Signed-off-by: Bartosz Golaszewski bgolaszewski@baylibre.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpio/gpio-pcf857x.c | 26 ++++++++++++-------------- 1 file changed, 12 insertions(+), 14 deletions(-)
--- a/drivers/gpio/gpio-pcf857x.c +++ b/drivers/gpio/gpio-pcf857x.c @@ -84,6 +84,7 @@ MODULE_DEVICE_TABLE(of, pcf857x_of_table */ struct pcf857x { struct gpio_chip chip; + struct irq_chip irqchip; struct i2c_client *client; struct mutex lock; /* protect 'out' */ unsigned out; /* software latch */ @@ -252,18 +253,6 @@ static void pcf857x_irq_bus_sync_unlock( mutex_unlock(&gpio->lock); }
-static struct irq_chip pcf857x_irq_chip = { - .name = "pcf857x", - .irq_enable = pcf857x_irq_enable, - .irq_disable = pcf857x_irq_disable, - .irq_ack = noop, - .irq_mask = noop, - .irq_unmask = noop, - .irq_set_wake = pcf857x_irq_set_wake, - .irq_bus_lock = pcf857x_irq_bus_lock, - .irq_bus_sync_unlock = pcf857x_irq_bus_sync_unlock, -}; - /*-------------------------------------------------------------------------*/
static int pcf857x_probe(struct i2c_client *client, @@ -376,8 +365,17 @@ static int pcf857x_probe(struct i2c_clie
/* Enable irqchip if we have an interrupt */ if (client->irq) { + gpio->irqchip.name = "pcf857x", + gpio->irqchip.irq_enable = pcf857x_irq_enable, + gpio->irqchip.irq_disable = pcf857x_irq_disable, + gpio->irqchip.irq_ack = noop, + gpio->irqchip.irq_mask = noop, + gpio->irqchip.irq_unmask = noop, + gpio->irqchip.irq_set_wake = pcf857x_irq_set_wake, + gpio->irqchip.irq_bus_lock = pcf857x_irq_bus_lock, + gpio->irqchip.irq_bus_sync_unlock = pcf857x_irq_bus_sync_unlock, status = gpiochip_irqchip_add_nested(&gpio->chip, - &pcf857x_irq_chip, + &gpio->irqchip, 0, handle_level_irq, IRQ_TYPE_NONE); if (status) { @@ -392,7 +390,7 @@ static int pcf857x_probe(struct i2c_clie if (status) goto fail;
- gpiochip_set_nested_irqchip(&gpio->chip, &pcf857x_irq_chip, + gpiochip_set_nested_irqchip(&gpio->chip, &gpio->irqchip, client->irq); gpio->irq_parent = client->irq; }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Neo Hou neo.hou@unisoc.com
commit 09d158d52d2bceda736797a61b6c13d7fc83707b upstream.
Since differnt type EICs have its own data register to read, thus fix the incorrect data register.
Fixes: 25518e024e3a ("gpio: Add Spreadtrum EIC driver support") Cc: stable@vger.kernel.org Signed-off-by: Neo Hou neo.hou@unisoc.com Signed-off-by: Baolin Wang baolin.wang@linaro.org Signed-off-by: Bartosz Golaszewski bgolaszewski@baylibre.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpio/gpio-eic-sprd.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
--- a/drivers/gpio/gpio-eic-sprd.c +++ b/drivers/gpio/gpio-eic-sprd.c @@ -180,7 +180,18 @@ static void sprd_eic_free(struct gpio_ch
static int sprd_eic_get(struct gpio_chip *chip, unsigned int offset) { - return sprd_eic_read(chip, offset, SPRD_EIC_DBNC_DATA); + struct sprd_eic *sprd_eic = gpiochip_get_data(chip); + + switch (sprd_eic->type) { + case SPRD_EIC_DEBOUNCE: + return sprd_eic_read(chip, offset, SPRD_EIC_DBNC_DATA); + case SPRD_EIC_ASYNC: + return sprd_eic_read(chip, offset, SPRD_EIC_ASYNC_DATA); + case SPRD_EIC_SYNC: + return sprd_eic_read(chip, offset, SPRD_EIC_SYNC_DATA); + default: + return -ENOTSUPP; + } }
static int sprd_eic_direction_input(struct gpio_chip *chip, unsigned int offset)
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Neo Hou neo.hou@unisoc.com
commit f785ffb61605734b518afa766d1b5445e9f38c8d upstream.
When setting async EIC as IRQ_TYPE_EDGE_BOTH type, we missed to set the SPRD_EIC_ASYNC_INTMODE register to 0, which means detecting edge signals.
Thus this patch fixes the issue.
Fixes: 25518e024e3a ("gpio: Add Spreadtrum EIC driver support") Cc: stable@vger.kernel.org Signed-off-by: Neo Hou neo.hou@unisoc.com Signed-off-by: Baolin Wang baolin.wang@linaro.org Signed-off-by: Bartosz Golaszewski bgolaszewski@baylibre.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpio/gpio-eic-sprd.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpio/gpio-eic-sprd.c +++ b/drivers/gpio/gpio-eic-sprd.c @@ -379,6 +379,7 @@ static int sprd_eic_irq_set_type(struct irq_set_handler_locked(data, handle_edge_irq); break; case IRQ_TYPE_EDGE_BOTH: + sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTMODE, 0); sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTBOTH, 1); irq_set_handler_locked(data, handle_edge_irq); break;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Gruenbacher agruenba@redhat.com
commit e74c98ca2d6ae4376cc15fa2a22483430909d96b upstream.
This reverts commit 2d29f6b96d8f80322ed2dd895bca590491c38d34.
It turns out that the fix can lead to a ~20 percent performance regression in initial writes to the page cache according to iozone. Let's revert this for now to have more time for a proper fix.
Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Bob Peterson rpeterso@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/gfs2/rgrp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -1780,9 +1780,9 @@ static int gfs2_rbm_find(struct gfs2_rbm goto next_iter; } if (ret == -E2BIG) { - n += rbm->bii - initial_bii; rbm->bii = 0; rbm->offset = 0; + n += (rbm->bii - initial_bii); goto res_covered_end_of_rgrp; } return ret;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lukas Wunner lukas@wunner.de
commit 8c9620b1cc9b69e82fa8d4081d646d0016b602e7 upstream.
The BCM2835 MMC host driver requests a DMA channel on probe but neglects to release the channel in the probe error path. The channel may therefore be leaked, in particular if devm_clk_get() causes probe deferral. Fix it.
Fixes: 660fc733bd74 ("mmc: bcm2835: Add new driver for the sdhost controller.") Signed-off-by: Lukas Wunner lukas@wunner.de Cc: stable@vger.kernel.org # v4.12+ Cc: Frank Pavlic f.pavlic@kunbus.de Tested-by: Stefan Wahren stefan.wahren@i2se.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/bcm2835.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/mmc/host/bcm2835.c +++ b/drivers/mmc/host/bcm2835.c @@ -1427,6 +1427,8 @@ static int bcm2835_probe(struct platform
err: dev_dbg(dev, "%s -> err %d\n", __func__, ret); + if (host->dma_chan_rxtx) + dma_release_channel(host->dma_chan_rxtx); mmc_free_host(mmc);
return ret;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chaotian Jing chaotian.jing@mediatek.com
commit 3751e008da0df4384031bd66a516c0292f915605 upstream.
to set cmd internal delay, need set PAD_TUNE register but not PAD_CMD_TUNE register.
Signed-off-by: Chaotian Jing chaotian.jing@mediatek.com Fixes: 1ede5cb88a29 ("mmc: mediatek: Use data tune for CMD line tune") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/mtk-sd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/mmc/host/mtk-sd.c +++ b/drivers/mmc/host/mtk-sd.c @@ -846,7 +846,7 @@ static void msdc_set_mclk(struct msdc_ho
if (timing == MMC_TIMING_MMC_HS400 && host->dev_comp->hs400_tune) - sdr_set_field(host->base + PAD_CMD_TUNE, + sdr_set_field(host->base + tune_reg, MSDC_PAD_TUNE_CMDRRDLY, host->hs400_cmd_int_delay); dev_dbg(host->dev, "sclk: %d, timing: %d\n", host->mmc->actual_clock,
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Olek Poplavsky woodenbits@gmail.com
commit 9e6966646b6bc5078d579151b90016522d4ff2cb upstream.
This patch adds quirk VID/PID IDs for the Opus #3 DAP (made by 'The Bit') in order to enable Native DSD support.
[ NOTE: this could be handled in the generic way with fp->dvd_raw if we add 0x10cb to the vendor whitelist, but since 0x10cb shows a different vendor name (Erantech), put to the individual entry at this time -- tiwai ]
Signed-off-by: Olek Poplavsky woodenbits@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/usb/quirks.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/usb/quirks.c +++ b/sound/usb/quirks.c @@ -1373,6 +1373,7 @@ u64 snd_usb_interface_dsd_format_quirks( return SNDRV_PCM_FMTBIT_DSD_U32_BE; break;
+ case USB_ID(0x10cb, 0x0103): /* The Bit Opus #3; with fp->dsd_raw */ case USB_ID(0x152a, 0x85de): /* SMSL D1 DAC */ case USB_ID(0x16d0, 0x09dd): /* Encore mDSD */ case USB_ID(0x0d8c, 0x0316): /* Hegel HD12 DSD */
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kailang Yang kailang@realtek.com
commit 693abe11aa6b27aed6eb8222162f8fb986325cef upstream.
Fix hp_pin always no value.
[More notes on the changes:
The hp_pin value that is referred in alc294_hp_init() is always zero at the moment the function gets called, hence this is actually useless as in the current code.
And, this kind of init sequence should be called from the codec init callback, instead of the parser function. So, the first fix in this patch to move the call call into its own init_hook.
OTOH, this function is needed to be called only once after the boot, and it'd take too long for invoking at each resume (where the init callback gets called). So we add a new flag and invoke this only once as an additional fix.
The one case is still not covered, though: S4 resume. But this change itself won't lead to any regression in that regard, so we leave S4 issue as is for now and fix it later. -- tiwai ]
Fixes: bde1a7459623 ("ALSA: hda/realtek - Fixed headphone issue for ALC700") Signed-off-by: Kailang Yang kailang@realtek.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/patch_realtek.c | 78 ++++++++++++++++++++++++------------------ 1 file changed, 45 insertions(+), 33 deletions(-)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -117,6 +117,7 @@ struct alc_spec { int codec_variant; /* flag for other variants */ unsigned int has_alc5505_dsp:1; unsigned int no_depop_delay:1; + unsigned int done_hp_init:1;
/* for PLL fix */ hda_nid_t pll_nid; @@ -3372,6 +3373,48 @@ static void alc_default_shutup(struct hd snd_hda_shutup_pins(codec); }
+static void alc294_hp_init(struct hda_codec *codec) +{ + struct alc_spec *spec = codec->spec; + hda_nid_t hp_pin = spec->gen.autocfg.hp_pins[0]; + int i, val; + + if (!hp_pin) + return; + + snd_hda_codec_write(codec, hp_pin, 0, + AC_VERB_SET_AMP_GAIN_MUTE, AMP_OUT_MUTE); + + msleep(100); + + snd_hda_codec_write(codec, hp_pin, 0, + AC_VERB_SET_PIN_WIDGET_CONTROL, 0x0); + + alc_update_coef_idx(codec, 0x6f, 0x000f, 0);/* Set HP depop to manual mode */ + alc_update_coefex_idx(codec, 0x58, 0x00, 0x8000, 0x8000); /* HP depop procedure start */ + + /* Wait for depop procedure finish */ + val = alc_read_coefex_idx(codec, 0x58, 0x01); + for (i = 0; i < 20 && val & 0x0080; i++) { + msleep(50); + val = alc_read_coefex_idx(codec, 0x58, 0x01); + } + /* Set HP depop to auto mode */ + alc_update_coef_idx(codec, 0x6f, 0x000f, 0x000b); + msleep(50); +} + +static void alc294_init(struct hda_codec *codec) +{ + struct alc_spec *spec = codec->spec; + + if (!spec->done_hp_init) { + alc294_hp_init(codec); + spec->done_hp_init = true; + } + alc_default_init(codec); +} + static void alc5505_coef_set(struct hda_codec *codec, unsigned int index_reg, unsigned int val) { @@ -7288,37 +7331,6 @@ static void alc269_fill_coef(struct hda_ alc_update_coef_idx(codec, 0x4, 0, 1<<11); }
-static void alc294_hp_init(struct hda_codec *codec) -{ - struct alc_spec *spec = codec->spec; - hda_nid_t hp_pin = spec->gen.autocfg.hp_pins[0]; - int i, val; - - if (!hp_pin) - return; - - snd_hda_codec_write(codec, hp_pin, 0, - AC_VERB_SET_AMP_GAIN_MUTE, AMP_OUT_MUTE); - - msleep(100); - - snd_hda_codec_write(codec, hp_pin, 0, - AC_VERB_SET_PIN_WIDGET_CONTROL, 0x0); - - alc_update_coef_idx(codec, 0x6f, 0x000f, 0);/* Set HP depop to manual mode */ - alc_update_coefex_idx(codec, 0x58, 0x00, 0x8000, 0x8000); /* HP depop procedure start */ - - /* Wait for depop procedure finish */ - val = alc_read_coefex_idx(codec, 0x58, 0x01); - for (i = 0; i < 20 && val & 0x0080; i++) { - msleep(50); - val = alc_read_coefex_idx(codec, 0x58, 0x01); - } - /* Set HP depop to auto mode */ - alc_update_coef_idx(codec, 0x6f, 0x000f, 0x000b); - msleep(50); -} - /* */ static int patch_alc269(struct hda_codec *codec) @@ -7444,7 +7456,7 @@ static int patch_alc269(struct hda_codec spec->codec_variant = ALC269_TYPE_ALC294; spec->gen.mixer_nid = 0; /* ALC2x4 does not have any loopback mixer path */ alc_update_coef_idx(codec, 0x6b, 0x0018, (1<<4) | (1<<3)); /* UAJ MIC Vref control by verb */ - alc294_hp_init(codec); + spec->init_hook = alc294_init; break; case 0x10ec0300: spec->codec_variant = ALC269_TYPE_ALC300; @@ -7456,7 +7468,7 @@ static int patch_alc269(struct hda_codec spec->codec_variant = ALC269_TYPE_ALC700; spec->gen.mixer_nid = 0; /* ALC700 does not have any loopback mixer path */ alc_update_coef_idx(codec, 0x4a, 1 << 15, 0); /* Combo jack auto trigger control */ - alc294_hp_init(codec); + spec->init_hook = alc294_init; break;
}
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit e190161f96b88ffae870405fd6c3fdd1d2e7f98d upstream.
When the trigger=off is passed for a PCM OSS stream, it sets the start_threshold of the given substream to the boundary size, so that it won't be automatically started. This can be problematic for a capture stream, unfortunately, as detected by syzkaller. The scenario is like the following:
- In __snd_pcm_lib_xfer() that is invoked from snd_pcm_oss_read() loop, we have a check whether the stream was already started or the stream can be auto-started. - The function at this check returns 0 with trigger=off since we explicitly disable the auto-start. - The loop continues and repeats calling __snd_pcm_lib_xfer() tightly, which may lead to an RCU stall.
This patch fixes the bug by simply allowing the wait for non-started stream in the case of OSS capture. For native usages, it's supposed to be done by the caller side (which is user-space), hence it returns zero like before.
(In theory, __snd_pcm_lib_xfer() could wait even for the native API usage cases, too; but I'd like to stay in a safer side for not breaking the existing stuff for now.)
Reported-by: syzbot+fbe0496f92a0ce7b786c@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/core/pcm_lib.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/sound/core/pcm_lib.c +++ b/sound/core/pcm_lib.c @@ -2112,6 +2112,13 @@ int pcm_lib_apply_appl_ptr(struct snd_pc return 0; }
+/* allow waiting for a capture stream that hasn't been started */ +#if IS_ENABLED(CONFIG_SND_PCM_OSS) +#define wait_capture_start(substream) ((substream)->oss.oss) +#else +#define wait_capture_start(substream) false +#endif + /* the common loop for read/write data */ snd_pcm_sframes_t __snd_pcm_lib_xfer(struct snd_pcm_substream *substream, void *data, bool interleaved, @@ -2182,7 +2189,7 @@ snd_pcm_sframes_t __snd_pcm_lib_xfer(str err = snd_pcm_start(substream); if (err < 0) goto _end_unlock; - } else { + } else if (!wait_capture_start(substream)) { /* nothing to do */ err = 0; goto _end_unlock;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yishai Hadas yishaih@mellanox.com
commit 425784aa5b029eeb80498c73a68f62c3ad1d3b3f upstream.
The async_file might be freed before the disassociation has been ended, causing qp shutdown to use after free on it.
Since uverbs_destroy_ufile_hw is not a fence, it returns if a disassociation is ongoing in another thread. It has to be written this way to avoid deadlock. However this means that the ufile FD close cannot destroy anything that may still be used by an active kref, such as the the async_file.
To fix that move the kref_put() to be in ib_uverbs_release_file().
BUG: unable to handle kernel paging request at ffffffffba682787 PGD bc80e067 P4D bc80e067 PUD bc80f063 PMD 1313df163 PTE 80000000bc682061 Oops: 0003 [#1] SMP PTI CPU: 1 PID: 32410 Comm: bash Tainted: G OE 4.20.0-rc6+ #3 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__pv_queued_spin_lock_slowpath+0x1b3/0x2a0 Code: 98 83 e2 60 49 89 df 48 8b 04 c5 80 18 72 ba 48 8d ba 80 32 02 00 ba 00 80 00 00 4c 8d 65 14 41 bd 01 00 00 00 48 01 c7 85 d2 <48> 89 2f 48 89 fb 74 14 8b 45 08 85 c0 75 42 84 d2 74 6b f3 90 83 RSP: 0018:ffffc1bbc064fb58 EFLAGS: 00010006 RAX: ffffffffba65f4e7 RBX: ffff9f209c656c00 RCX: 0000000000000001 RDX: 0000000000008000 RSI: 0000000000000000 RDI: ffffffffba682787 RBP: ffff9f217bb23280 R08: 0000000000000001 R09: 0000000000000000 R10: ffff9f209d2c7800 R11: ffffffffffffffe8 R12: ffff9f217bb23294 R13: 0000000000000001 R14: 0000000000000000 R15: ffff9f209c656c00 FS: 00007fac55aad740(0000) GS:ffff9f217bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffba682787 CR3: 000000012f8e0000 CR4: 00000000000006e0 Call Trace: _raw_spin_lock_irq+0x27/0x30 ib_uverbs_release_uevent+0x1e/0xa0 [ib_uverbs] uverbs_free_qp+0x7e/0x90 [ib_uverbs] destroy_hw_idr_uobject+0x1c/0x50 [ib_uverbs] uverbs_destroy_uobject+0x2e/0x180 [ib_uverbs] __uverbs_cleanup_ufile+0x73/0x90 [ib_uverbs] uverbs_destroy_ufile_hw+0x5d/0x120 [ib_uverbs] ib_uverbs_remove_one+0xea/0x240 [ib_uverbs] ib_unregister_device+0xfb/0x200 [ib_core] mlx5_ib_remove+0x51/0xe0 [mlx5_ib] mlx5_remove_device+0xc1/0xd0 [mlx5_core] mlx5_unregister_device+0x3d/0xb0 [mlx5_core] remove_one+0x2a/0x90 [mlx5_core] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0x16d/0x240 unbind_store+0xb2/0x100 kernfs_fop_write+0x102/0x180 __vfs_write+0x36/0x1a0 ? __alloc_fd+0xa9/0x170 ? set_close_on_exec+0x49/0x70 vfs_write+0xad/0x1a0 ksys_write+0x52/0xc0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fac551aac60
Cc: stable@vger.kernel.org # 4.2 Fixes: 036b10635739 ("IB/uverbs: Enable device removal when there are active user space applications") Signed-off-by: Yishai Hadas yishaih@mellanox.com Signed-off-by: Leon Romanovsky leonro@mellanox.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/core/uverbs_main.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -262,6 +262,9 @@ void ib_uverbs_release_file(struct kref if (atomic_dec_and_test(&file->device->refcount)) ib_uverbs_comp_dev(file->device);
+ if (file->async_file) + kref_put(&file->async_file->ref, + ib_uverbs_release_async_event_file); put_device(&file->device->dev); kfree(file); } @@ -1132,10 +1135,6 @@ static int ib_uverbs_close(struct inode list_del_init(&file->list); mutex_unlock(&file->device->lists_mutex);
- if (file->async_file) - kref_put(&file->async_file->ref, - ib_uverbs_release_async_event_file); - kref_put(&file->ref, ib_uverbs_release_file);
return 0;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yishai Hadas yishaih@mellanox.com
commit 7b21b69ab203136fdc153c7707fa6c409e523c2e upstream.
The vma->vm_mm can become impossible to get before rdma_umap_close() is called, in this case we must not try to get an mm that is already undergoing process exit. In this case there is no need to wait for anything as the VMA will be destroyed by another thread soon and is already effectively 'unreachable' by userspace.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 800000012bc50067 P4D 800000012bc50067 PUD 129db5067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 2050 Comm: bash Tainted: G W OE 4.20.0-rc6+ #3 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__rb_erase_color+0xb9/0x280 Code: 84 17 01 00 00 48 3b 68 10 0f 84 15 01 00 00 48 89 58 08 48 89 de 48 89 ef 4c 89 e3 e8 90 84 22 00 e9 60 ff ff ff 48 8b 5d 10 <f6> 03 01 0f 84 9c 00 00 00 48 8b 43 10 48 85 c0 74 09 f6 00 01 0f RSP: 0018:ffffbecfc090bab8 EFLAGS: 00010246 RAX: ffff97616346cf30 RBX: 0000000000000000 RCX: 0000000000000101 RDX: 0000000000000000 RSI: ffff97623b6ca828 RDI: ffff97621ef10828 RBP: ffff97621ef10828 R08: ffff97621ef10828 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff97623b6ca838 R13: ffffffffbb3fef50 R14: ffff97623b6ca828 R15: 0000000000000000 FS: 00007f7a5c31d740(0000) GS:ffff97623bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000011255a000 CR4: 00000000000006e0 Call Trace: unlink_file_vma+0x3b/0x50 free_pgtables+0xa1/0x110 exit_mmap+0xca/0x1a0 ? mlx5_ib_dealloc_pd+0x28/0x30 [mlx5_ib] mmput+0x54/0x140 uverbs_user_mmap_disassociate+0xcc/0x160 [ib_uverbs] uverbs_destroy_ufile_hw+0xf7/0x120 [ib_uverbs] ib_uverbs_remove_one+0xea/0x240 [ib_uverbs] ib_unregister_device+0xfb/0x200 [ib_core] mlx5_ib_remove+0x51/0xe0 [mlx5_ib] mlx5_remove_device+0xc1/0xd0 [mlx5_core] mlx5_unregister_device+0x3d/0xb0 [mlx5_core] remove_one+0x2a/0x90 [mlx5_core] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0x16d/0x240 unbind_store+0xb2/0x100 kernfs_fop_write+0x102/0x180 __vfs_write+0x36/0x1a0 ? __alloc_fd+0xa9/0x170 ? set_close_on_exec+0x49/0x70 vfs_write+0xad/0x1a0 ksys_write+0x52/0xc0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Cc: stable@vger.kernel.org # 4.19 Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory") Signed-off-by: Yishai Hadas yishaih@mellanox.com Signed-off-by: Leon Romanovsky leonro@mellanox.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/core/uverbs_main.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)
--- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -1000,11 +1000,19 @@ void uverbs_user_mmap_disassociate(struc
/* Get an arbitrary mm pointer that hasn't been cleaned yet */ mutex_lock(&ufile->umap_lock); - if (!list_empty(&ufile->umaps)) { - mm = list_first_entry(&ufile->umaps, - struct rdma_umap_priv, list) - ->vma->vm_mm; - mmget(mm); + while (!list_empty(&ufile->umaps)) { + int ret; + + priv = list_first_entry(&ufile->umaps, + struct rdma_umap_priv, list); + mm = priv->vma->vm_mm; + ret = mmget_not_zero(mm); + if (!ret) { + list_del_init(&priv->list); + mm = NULL; + continue; + } + break; } mutex_unlock(&ufile->umap_lock); if (!mm)
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael J. Ruhl michael.j.ruhl@intel.com
commit 7709b0dc265f28695487712c45f02bbd1f98415d upstream.
Applications that use the stack for execution purposes cause userspace PSM jobs to fail during mmap().
Both Fortran (non-standard format parsing) and C (callback functions located in the stack) applications can be written such that stack execution is required. The linker notes this via the gnu_stack ELF flag.
This causes READ_IMPLIES_EXEC to be set which forces all PROT_READ mmaps to have PROT_EXEC for the process.
Checking for VM_EXEC bit and failing the request with EPERM is overly conservative and will break any PSM application using executable stacks.
Cc: stable@vger.kernel.org #v4.14+ Fixes: 12220267645c ("IB/hfi: Protect against writable mmap") Reviewed-by: Mike Marciniszyn mike.marciniszyn@intel.com Reviewed-by: Dennis Dalessandro dennis.dalessandro@intel.com Reviewed-by: Ira Weiny ira.weiny@intel.com Signed-off-by: Michael J. Ruhl michael.j.ruhl@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/hw/hfi1/file_ops.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/infiniband/hw/hfi1/file_ops.c +++ b/drivers/infiniband/hw/hfi1/file_ops.c @@ -488,7 +488,7 @@ static int hfi1_file_mmap(struct file *f vmf = 1; break; case STATUS: - if (flags & (unsigned long)(VM_WRITE | VM_EXEC)) { + if (flags & VM_WRITE) { ret = -EPERM; goto done; }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mike Marciniszyn mike.marciniszyn@intel.com
commit 09ce351dff8e7636af0beb72cd4a86c3904a0500 upstream.
Fix potential memory corruption and panic in loopback for IB_WR_SEND variants.
The code blindly assumes the posted length will fit in the fetched rwqe, which is not a valid assumption.
Fix by adding a limit test, and triggering the appropriate send completion and putting the QP in an error state. This mimics the handling for non-loopback QPs.
Fixes: 15703461533a ("IB/{hfi1, qib, rdmavt}: Move ruc_loopback to rdmavt") Cc: stable@vger.kernel.org #v4.20+ Reviewed-by: Michael J. Ruhl michael.j.ruhl@intel.com Signed-off-by: Mike Marciniszyn mike.marciniszyn@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/sw/rdmavt/qp.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/drivers/infiniband/sw/rdmavt/qp.c +++ b/drivers/infiniband/sw/rdmavt/qp.c @@ -2903,6 +2903,8 @@ send: goto op_err; if (!ret) goto rnr_nak; + if (wqe->length > qp->r_len) + goto inv_err; break;
case IB_WR_RDMA_WRITE_WITH_IMM: @@ -3071,7 +3073,10 @@ op_err: goto err;
inv_err: - send_status = IB_WC_REM_INV_REQ_ERR; + send_status = + sqp->ibqp.qp_type == IB_QPT_RC ? + IB_WC_REM_INV_REQ_ERR : + IB_WC_SUCCESS; wc.status = IB_WC_LOC_QP_OP_ERR; goto err;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit b3f2f3799a972d3863d0fdc2ab6287aef6ca631f ]
When the OS registers to handle events from the display off hotkey the EC will send a notification with 0x35 for every key press, independent of the backlight state.
The behavior of this key on Windows, with the ATKACPI driver from Asus installed, is turning off the backlight of all connected displays with a fading effect, and any cursor input or key press turning the backlight back on. The key press or cursor input that wakes up the display is also passed through to the application under the cursor or under focus.
The key that matches this behavior the closest is KEY_SCREENLOCK.
Signed-off-by: João Paulo Rechi Vita jprvita@endlessm.com Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/asus-nb-wmi.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/platform/x86/asus-nb-wmi.c +++ b/drivers/platform/x86/asus-nb-wmi.c @@ -444,6 +444,7 @@ static const struct key_entry asus_nb_wm { KE_KEY, 0x32, { KEY_MUTE } }, { KE_KEY, 0x33, { KEY_DISPLAYTOGGLE } }, /* LCD on */ { KE_KEY, 0x34, { KEY_DISPLAY_OFF } }, /* LCD off */ + { KE_KEY, 0x35, { KEY_SCREENLOCK } }, { KE_KEY, 0x40, { KEY_PREVIOUSSONG } }, { KE_KEY, 0x41, { KEY_NEXTSONG } }, { KE_KEY, 0x43, { KEY_STOPCD } }, /* Stop/Eject */
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit 71b12beaf12f21a53bfe100795d0797f1035b570 ]
According to Asus firmware engineers, the meaning of these codes is only to notify the OS that the screen brightness has been turned on/off by the EC. This does not match the meaning of KEY_DISPLAYTOGGLE / KEY_DISPLAY_OFF, where userspace is expected to change the display brightness.
Signed-off-by: João Paulo Rechi Vita jprvita@endlessm.com Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/asus-nb-wmi.c | 2 -- 1 file changed, 2 deletions(-)
--- a/drivers/platform/x86/asus-nb-wmi.c +++ b/drivers/platform/x86/asus-nb-wmi.c @@ -442,8 +442,6 @@ static const struct key_entry asus_nb_wm { KE_KEY, 0x30, { KEY_VOLUMEUP } }, { KE_KEY, 0x31, { KEY_VOLUMEDOWN } }, { KE_KEY, 0x32, { KEY_MUTE } }, - { KE_KEY, 0x33, { KEY_DISPLAYTOGGLE } }, /* LCD on */ - { KE_KEY, 0x34, { KEY_DISPLAY_OFF } }, /* LCD off */ { KE_KEY, 0x35, { KEY_SCREENLOCK } }, { KE_KEY, 0x40, { KEY_PREVIOUSSONG } }, { KE_KEY, 0x41, { KEY_NEXTSONG } },
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit a6279470762c19ba97e454f90798373dccdf6148 upstream.
When splitting a leaf or node from one of the trees that are modified when flushing pending block groups (extent, chunk, device and free space trees), we need to allocate a new tree block, which in turn can result in the need to allocate a new block group. After allocating the new block group we may need to flush new block groups that were previously allocated during the course of the current transaction, which is what may cause a deadlock due to attempts to write lock twice the same leaf or node, as when splitting a leaf or node we are holding a write lock on it and its parent node.
The same type of deadlock can also happen when increasing the tree's height, since we are holding a lock on the existing root while allocating the tree block to use as the new root node.
An example trace when the deadlock happens during the leaf split path is:
[27175.293054] CPU: 0 PID: 3005 Comm: kworker/u17:6 Tainted: G W 4.19.16 #1 [27175.293942] Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15 06/25/2018 [27175.294846] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] (...) [27175.298384] RSP: 0018:ffffab2087107758 EFLAGS: 00010246 [27175.299269] RAX: 0000000000000bbd RBX: ffff9fadc7141c48 RCX: 0000000000000001 [27175.300155] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9fadc7141c48 [27175.301023] RBP: 0000000000000001 R08: ffff9faeb6ac1040 R09: ffff9fa9c0000000 [27175.301887] R10: 0000000000000000 R11: 0000000000000040 R12: ffff9fb21aac8000 [27175.302743] R13: ffff9fb1a64d6a20 R14: 0000000000000001 R15: ffff9fb1a64d6a18 [27175.303601] FS: 0000000000000000(0000) GS:ffff9fb21fa00000(0000) knlGS:0000000000000000 [27175.304468] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [27175.305339] CR2: 00007fdc8743ead8 CR3: 0000000763e0a006 CR4: 00000000003606f0 [27175.306220] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [27175.307087] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [27175.307940] Call Trace: [27175.308802] btrfs_search_slot+0x779/0x9a0 [btrfs] [27175.309669] ? update_space_info+0xba/0xe0 [btrfs] [27175.310534] btrfs_insert_empty_items+0x67/0xc0 [btrfs] [27175.311397] btrfs_insert_item+0x60/0xd0 [btrfs] [27175.312253] btrfs_create_pending_block_groups+0xee/0x210 [btrfs] [27175.313116] do_chunk_alloc+0x25f/0x300 [btrfs] [27175.313984] find_free_extent+0x706/0x10d0 [btrfs] [27175.314855] btrfs_reserve_extent+0x9b/0x1d0 [btrfs] [27175.315707] btrfs_alloc_tree_block+0x100/0x5b0 [btrfs] [27175.316548] split_leaf+0x130/0x610 [btrfs] [27175.317390] btrfs_search_slot+0x94d/0x9a0 [btrfs] [27175.318235] btrfs_insert_empty_items+0x67/0xc0 [btrfs] [27175.319087] alloc_reserved_file_extent+0x84/0x2c0 [btrfs] [27175.319938] __btrfs_run_delayed_refs+0x596/0x1150 [btrfs] [27175.320792] btrfs_run_delayed_refs+0xed/0x1b0 [btrfs] [27175.321643] delayed_ref_async_start+0x81/0x90 [btrfs] [27175.322491] normal_work_helper+0xd0/0x320 [btrfs] [27175.323328] ? move_linked_works+0x6e/0xa0 [27175.324160] process_one_work+0x191/0x370 [27175.324976] worker_thread+0x4f/0x3b0 [27175.325763] kthread+0xf8/0x130 [27175.326531] ? rescuer_thread+0x320/0x320 [27175.327284] ? kthread_create_worker_on_cpu+0x50/0x50 [27175.328027] ret_from_fork+0x35/0x40 [27175.328741] ---[ end trace 300a1b9f0ac30e26 ]---
Fix this by preventing the flushing of new blocks groups when splitting a leaf/node and when inserting a new root node for one of the trees modified by the flushing operation, similar to what is done when COWing a node/leaf from on of these trees.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202383 Reported-by: Eli V eliventer@gmail.com CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/ctree.c | 78 +++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 50 insertions(+), 28 deletions(-)
--- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -967,6 +967,48 @@ static noinline int update_ref_for_cow(s return 0; }
+static struct extent_buffer *alloc_tree_block_no_bg_flush( + struct btrfs_trans_handle *trans, + struct btrfs_root *root, + u64 parent_start, + const struct btrfs_disk_key *disk_key, + int level, + u64 hint, + u64 empty_size) +{ + struct btrfs_fs_info *fs_info = root->fs_info; + struct extent_buffer *ret; + + /* + * If we are COWing a node/leaf from the extent, chunk, device or free + * space trees, make sure that we do not finish block group creation of + * pending block groups. We do this to avoid a deadlock. + * COWing can result in allocation of a new chunk, and flushing pending + * block groups (btrfs_create_pending_block_groups()) can be triggered + * when finishing allocation of a new chunk. Creation of a pending block + * group modifies the extent, chunk, device and free space trees, + * therefore we could deadlock with ourselves since we are holding a + * lock on an extent buffer that btrfs_create_pending_block_groups() may + * try to COW later. + * For similar reasons, we also need to delay flushing pending block + * groups when splitting a leaf or node, from one of those trees, since + * we are holding a write lock on it and its parent or when inserting a + * new root node for one of those trees. + */ + if (root == fs_info->extent_root || + root == fs_info->chunk_root || + root == fs_info->dev_root || + root == fs_info->free_space_root) + trans->can_flush_pending_bgs = false; + + ret = btrfs_alloc_tree_block(trans, root, parent_start, + root->root_key.objectid, disk_key, level, + hint, empty_size); + trans->can_flush_pending_bgs = true; + + return ret; +} + /* * does the dirty work in cow of a single block. The parent block (if * supplied) is updated to point to the new cow copy. The new buffer is marked @@ -1014,28 +1056,8 @@ static noinline int __btrfs_cow_block(st if ((root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) && parent) parent_start = parent->start;
- /* - * If we are COWing a node/leaf from the extent, chunk, device or free - * space trees, make sure that we do not finish block group creation of - * pending block groups. We do this to avoid a deadlock. - * COWing can result in allocation of a new chunk, and flushing pending - * block groups (btrfs_create_pending_block_groups()) can be triggered - * when finishing allocation of a new chunk. Creation of a pending block - * group modifies the extent, chunk, device and free space trees, - * therefore we could deadlock with ourselves since we are holding a - * lock on an extent buffer that btrfs_create_pending_block_groups() may - * try to COW later. - */ - if (root == fs_info->extent_root || - root == fs_info->chunk_root || - root == fs_info->dev_root || - root == fs_info->free_space_root) - trans->can_flush_pending_bgs = false; - - cow = btrfs_alloc_tree_block(trans, root, parent_start, - root->root_key.objectid, &disk_key, level, - search_start, empty_size); - trans->can_flush_pending_bgs = true; + cow = alloc_tree_block_no_bg_flush(trans, root, parent_start, &disk_key, + level, search_start, empty_size); if (IS_ERR(cow)) return PTR_ERR(cow);
@@ -3342,8 +3364,8 @@ static noinline int insert_new_root(stru else btrfs_node_key(lower, &lower_key, 0);
- c = btrfs_alloc_tree_block(trans, root, 0, root->root_key.objectid, - &lower_key, level, root->node->start, 0); + c = alloc_tree_block_no_bg_flush(trans, root, 0, &lower_key, level, + root->node->start, 0); if (IS_ERR(c)) return PTR_ERR(c);
@@ -3472,8 +3494,8 @@ static noinline int split_node(struct bt mid = (c_nritems + 1) / 2; btrfs_node_key(c, &disk_key, mid);
- split = btrfs_alloc_tree_block(trans, root, 0, root->root_key.objectid, - &disk_key, level, c->start, 0); + split = alloc_tree_block_no_bg_flush(trans, root, 0, &disk_key, level, + c->start, 0); if (IS_ERR(split)) return PTR_ERR(split);
@@ -4257,8 +4279,8 @@ again: else btrfs_item_key(l, &disk_key, mid);
- right = btrfs_alloc_tree_block(trans, root, 0, root->root_key.objectid, - &disk_key, 0, l->start, 0); + right = alloc_tree_block_no_bg_flush(trans, root, 0, &disk_key, 0, + l->start, 0); if (IS_ERR(right)) return PTR_ERR(right);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
commit 532b618bdf237250d6d4566536d4b6ce3d0a31fe upstream.
The subvol_name is allocated in btrfs_parse_subvol_options and is consumed and freed in mount_subvol. Add a free to the error paths that don't call mount_subvol so that it is guaranteed that subvol_name is freed when an error happens.
Fixes: 312c89fbca06 ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()") Cc: stable@vger.kernel.org # v4.19+ Reviewed-by: Nikolay Borisov nborisov@suse.com Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/super.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1677,6 +1677,7 @@ static struct dentry *btrfs_mount(struct flags | SB_RDONLY, device_name, data); if (IS_ERR(mnt_root)) { root = ERR_CAST(mnt_root); + kfree(subvol_name); goto out; }
@@ -1686,12 +1687,14 @@ static struct dentry *btrfs_mount(struct if (error < 0) { root = ERR_PTR(error); mntput(mnt_root); + kfree(subvol_name); goto out; } } } if (IS_ERR(mnt_root)) { root = ERR_CAST(mnt_root); + kfree(subvol_name); goto out; }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrei Vagin avagin@gmail.com
commit 8fb335e078378c8426fabeed1ebee1fbf915690c upstream.
Currently, exit_ptrace() adds all ptraced tasks in a dead list, then zap_pid_ns_processes() waits on all tasks in a current pidns, and only then are tasks from the dead list released.
zap_pid_ns_processes() can get stuck on waiting tasks from the dead list. In this case, we will have one unkillable process with one or more dead children.
Thanks to Oleg for the advice to release tasks in find_child_reaper().
Link: http://lkml.kernel.org/r/20190110175200.12442-1-avagin@gmail.com Fixes: 7c8bd2322c7f ("exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent()") Signed-off-by: Andrei Vagin avagin@gmail.com Signed-off-by: Oleg Nesterov oleg@redhat.com Cc: "Eric W. Biederman" ebiederm@xmission.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/exit.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
--- a/kernel/exit.c +++ b/kernel/exit.c @@ -558,12 +558,14 @@ static struct task_struct *find_alive_th return NULL; }
-static struct task_struct *find_child_reaper(struct task_struct *father) +static struct task_struct *find_child_reaper(struct task_struct *father, + struct list_head *dead) __releases(&tasklist_lock) __acquires(&tasklist_lock) { struct pid_namespace *pid_ns = task_active_pid_ns(father); struct task_struct *reaper = pid_ns->child_reaper; + struct task_struct *p, *n;
if (likely(reaper != father)) return reaper; @@ -579,6 +581,12 @@ static struct task_struct *find_child_re panic("Attempted to kill init! exitcode=0x%08x\n", father->signal->group_exit_code ?: father->exit_code); } + + list_for_each_entry_safe(p, n, dead, ptrace_entry) { + list_del_init(&p->ptrace_entry); + release_task(p); + } + zap_pid_ns_processes(pid_ns); write_lock_irq(&tasklist_lock);
@@ -668,7 +676,7 @@ static void forget_original_parent(struc exit_ptrace(father, dead);
/* Can drop and reacquire tasklist_lock */ - reaper = find_child_reaper(father); + reaper = find_child_reaper(father, dead); if (list_empty(&father->children)) return;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrea Arcangeli aarcange@redhat.com
commit 1ac25013fb9e4ed595cd608a406191e93520881e upstream.
hugetlb needs the same fix as faultin_nopage (which was applied in commit 96312e61282a ("mm/gup.c: teach get_user_pages_unlocked to handle FOLL_NOWAIT")) or KVM hangs because it thinks the mmap_sem was already released by hugetlb_fault() if it returned VM_FAULT_RETRY, but it wasn't in the FOLL_NOWAIT case.
Link: http://lkml.kernel.org/r/20190109020203.26669-2-aarcange@redhat.com Fixes: ce53053ce378 ("kvm: switch get_user_page_nowait() to get_user_pages_unlocked()") Signed-off-by: Andrea Arcangeli aarcange@redhat.com Tested-by: "Dr. David Alan Gilbert" dgilbert@redhat.com Reported-by: "Dr. David Alan Gilbert" dgilbert@redhat.com Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Reviewed-by: Peter Xu peterx@redhat.com Cc: Mike Rapoport rppt@linux.vnet.ibm.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/hugetlb.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4270,7 +4270,8 @@ long follow_hugetlb_page(struct mm_struc break; } if (ret & VM_FAULT_RETRY) { - if (nonblocking) + if (nonblocking && + !(fault_flags & FAULT_FLAG_RETRY_NOWAIT)) *nonblocking = 0; *nr_pages = 0; /*
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit 9bcdeb51bd7d2ae9fe65ea4d60643d2aeef5bfe3 upstream.
Arkadiusz reported that enabling memcg's group oom killing causes strange memcg statistics where there is no task in a memcg despite the number of tasks in that memcg is not 0. It turned out that there is a bug in wake_oom_reaper() which allows enqueuing same task twice which makes impossible to decrease the number of tasks in that memcg due to a refcount leak.
This bug existed since the OOM reaper became invokable from task_will_free_mem(current) path in out_of_memory() in Linux 4.7,
T1@P1 |T2@P1 |T3@P1 |OOM reaper ----------+----------+----------+------------ # Processing an OOM victim in a different memcg domain. try_charge() mem_cgroup_out_of_memory() mutex_lock(&oom_lock) try_charge() mem_cgroup_out_of_memory() mutex_lock(&oom_lock) try_charge() mem_cgroup_out_of_memory() mutex_lock(&oom_lock) out_of_memory() oom_kill_process(P1) do_send_sig_info(SIGKILL, @P1) mark_oom_victim(T1@P1) wake_oom_reaper(T1@P1) # T1@P1 is enqueued. mutex_unlock(&oom_lock) out_of_memory() mark_oom_victim(T2@P1) wake_oom_reaper(T2@P1) # T2@P1 is enqueued. mutex_unlock(&oom_lock) out_of_memory() mark_oom_victim(T1@P1) wake_oom_reaper(T1@P1) # T1@P1 is enqueued again due to oom_reaper_list == T2@P1 && T1@P1->oom_reaper_list == NULL. mutex_unlock(&oom_lock) # Completed processing an OOM victim in a different memcg domain. spin_lock(&oom_reaper_lock) # T1P1 is dequeued. spin_unlock(&oom_reaper_lock)
but memcg's group oom killing made it easier to trigger this bug by calling wake_oom_reaper() on the same task from one out_of_memory() request.
Fix this bug using an approach used by commit 855b018325737f76 ("oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task"). As a side effect of this patch, this patch also avoids enqueuing multiple threads sharing memory via task_will_free_mem(current) path.
Link: http://lkml.kernel.org/r/e865a044-2c10-9858-f4ef-254bc71d6cc2@i-love.sakura.... Link: http://lkml.kernel.org/r/5ee34fc6-1485-34f8-8790-903ddabaa809@i-love.sakura.... Fixes: af8e15cc85a25315 ("oom, oom_reaper: do not enqueue task if it is on the oom_reaper_list head") Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Reported-by: Arkadiusz Miskiewicz arekm@maven.pl Tested-by: Arkadiusz Miskiewicz arekm@maven.pl Acked-by: Michal Hocko mhocko@suse.com Acked-by: Roman Gushchin guro@fb.com Cc: Tejun Heo tj@kernel.org Cc: Aleksa Sarai asarai@suse.de Cc: Jay Kamat jgkamat@fb.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/sched/coredump.h | 1 + mm/oom_kill.c | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-)
--- a/include/linux/sched/coredump.h +++ b/include/linux/sched/coredump.h @@ -71,6 +71,7 @@ static inline int get_dumpable(struct mm #define MMF_HUGE_ZERO_PAGE 23 /* mm has ever used the global huge zero page */ #define MMF_DISABLE_THP 24 /* disable THP for all VMAs */ #define MMF_OOM_VICTIM 25 /* mm is the oom victim */ +#define MMF_OOM_REAP_QUEUED 26 /* mm was queued for oom_reaper */ #define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\ --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -634,8 +634,8 @@ static int oom_reaper(void *unused)
static void wake_oom_reaper(struct task_struct *tsk) { - /* tsk is already queued? */ - if (tsk == oom_reaper_list || tsk->oom_reaper_list) + /* mm is already queued? */ + if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags)) return;
get_task_struct(tsk);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oscar Salvador osalvador@suse.de
commit eeb0efd071d821a88da3fbd35f2d478f40d3b2ea upstream.
This is the same sort of error we saw in commit 17e2e7d7e1b8 ("mm, page_alloc: fix has_unmovable_pages for HugePages").
Gigantic hugepages cross several memblocks, so it can be that the page we get in scan_movable_pages() is a page-tail belonging to a 1G-hugepage. If that happens, page_hstate()->size_to_hstate() will return NULL, and we will blow up in hugepage_migration_supported().
The splat is as follows:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 1350 Comm: bash Tainted: G E 5.0.0-rc1-mm1-1-default+ #27 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:__offline_pages+0x6ae/0x900 Call Trace: memory_subsys_offline+0x42/0x60 device_offline+0x80/0xa0 state_store+0xab/0xc0 kernfs_fop_write+0x102/0x180 __vfs_write+0x26/0x190 vfs_write+0xad/0x1b0 ksys_write+0x42/0x90 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Modules linked in: af_packet(E) xt_tcpudp(E) ipt_REJECT(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv4(E) ip_set(E) nfnetlink(E) ebtable_nat(E) ebtable_broute(E) bridge(E) stp(E) llc(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ebtable_filter(E) ebtables(E) iptable_filter(E) ip_tables(E) x_tables(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) bochs_drm(E) ttm(E) aesni_intel(E) drm_kms_helper(E) aes_x86_64(E) crypto_simd(E) cryptd(E) glue_helper(E) drm(E) virtio_net(E) syscopyarea(E) sysfillrect(E) net_failover(E) sysimgblt(E) pcspkr(E) failover(E) i2c_piix4(E) fb_sys_fops(E) parport_pc(E) parport(E) button(E) btrfs(E) libcrc32c(E) xor(E) zstd_decompress(E) zstd_compress(E) xxhash(E) raid6_pq(E) sd_mod(E) ata_generic(E) ata_piix(E) ahci(E) libahci(E) libata(E) crc32c_intel(E) serio_raw(E) virtio_pci(E) virtio_ring(E) virtio(E) sg(E) scsi_mod(E) autofs4(E)
[akpm@linux-foundation.org: fix brace layout, per David. Reduce indentation] Link: http://lkml.kernel.org/r/20190122154407.18417-1-osalvador@suse.de Signed-off-by: Oscar Salvador osalvador@suse.de Reviewed-by: Anthony Yznaga anthony.yznaga@oracle.com Acked-by: Michal Hocko mhocko@suse.com Reviewed-by: David Hildenbrand david@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/memory_hotplug.c | 36 ++++++++++++++++++++---------------- 1 file changed, 20 insertions(+), 16 deletions(-)
--- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1302,23 +1302,27 @@ int test_pages_in_a_zone(unsigned long s static unsigned long scan_movable_pages(unsigned long start, unsigned long end) { unsigned long pfn; - struct page *page; + for (pfn = start; pfn < end; pfn++) { - if (pfn_valid(pfn)) { - page = pfn_to_page(pfn); - if (PageLRU(page)) - return pfn; - if (__PageMovable(page)) - return pfn; - if (PageHuge(page)) { - if (hugepage_migration_supported(page_hstate(page)) && - page_huge_active(page)) - return pfn; - else - pfn = round_up(pfn + 1, - 1 << compound_order(page)) - 1; - } - } + struct page *page, *head; + unsigned long skip; + + if (!pfn_valid(pfn)) + continue; + page = pfn_to_page(pfn); + if (PageLRU(page)) + return pfn; + if (__PageMovable(page)) + return pfn; + + if (!PageHuge(page)) + continue; + head = compound_head(page); + if (hugepage_migration_supported(page_hstate(head)) && + page_huge_active(head)) + return pfn; + skip = (1 << compound_order(head)) - (page - head); + pfn += skip - 1; } return 0; }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shakeel Butt shakeelb@google.com
commit cefc7ef3c87d02fc9307835868ff721ea12cc597 upstream.
Syzbot instance running on upstream kernel found a use-after-free bug in oom_kill_process. On further inspection it seems like the process selected to be oom-killed has exited even before reaching read_lock(&tasklist_lock) in oom_kill_process(). More specifically the tsk->usage is 1 which is due to get_task_struct() in oom_evaluate_task() and the put_task_struct within for_each_thread() frees the tsk and for_each_thread() tries to access the tsk. The easiest fix is to do get/put across the for_each_thread() on the selected task.
Now the next question is should we continue with the oom-kill as the previously selected task has exited? However before adding more complexity and heuristics, let's answer why we even look at the children of oom-kill selected task? The select_bad_process() has already selected the worst process in the system/memcg. Due to race, the selected process might not be the worst at the kill time but does that matter? The userspace can use the oom_score_adj interface to prefer children to be killed before the parent. I looked at the history but it seems like this is there before git history.
Link: http://lkml.kernel.org/r/20190121215850.221745-1-shakeelb@google.com Reported-by: syzbot+7fbbfa368521945f0e3d@syzkaller.appspotmail.com Fixes: 6b0c81b3be11 ("mm, oom: reduce dependency on tasklist_lock") Signed-off-by: Shakeel Butt shakeelb@google.com Reviewed-by: Roman Gushchin guro@fb.com Acked-by: Michal Hocko mhocko@suse.com Cc: David Rientjes rientjes@google.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Tetsuo Handa penguin-kernel@i-love.sakura.ne.jp Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/oom_kill.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -962,6 +962,13 @@ static void oom_kill_process(struct oom_ * still freeing memory. */ read_lock(&tasklist_lock); + + /* + * The task 'p' might have already exited before reaching here. The + * put_task_struct() will free task_struct 'p' while the loop still try + * to access the field of 'p', so, get an extra reference. + */ + get_task_struct(p); for_each_thread(p, t) { list_for_each_entry(child, &t->children, sibling) { unsigned int child_points; @@ -981,6 +988,7 @@ static void oom_kill_process(struct oom_ } } } + put_task_struct(p); read_unlock(&tasklist_lock);
/*
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Naoya Horiguchi n-horiguchi@ah.jp.nec.com
commit 6376360ecbe525a9c17b3d081dfd88ba3e4ed65b upstream.
Currently memory_failure() is racy against process's exiting, which results in kernel crash by null pointer dereference.
The root cause is that memory_failure() uses force_sig() to forcibly kill asynchronous (meaning not in the current context) processes. As discussed in thread https://lkml.org/lkml/2010/6/8/236 years ago for OOM fixes, this is not a right thing to do. OOM solves this issue by using do_send_sig_info() as done in commit d2d393099de2 ("signal: oom_kill_task: use SEND_SIG_FORCED instead of force_sig()"), so this patch is suggesting to do the same for hwpoison. do_send_sig_info() properly accesses to siglock with lock_task_sighand(), so is free from the reported race.
I confirmed that the reported bug reproduces with inserting some delay in kill_procs(), and it never reproduces with this patch.
Note that memory_failure() can send another type of signal using force_sig_mceerr(), and the reported race shouldn't happen on it because force_sig_mceerr() is called only for synchronous processes (i.e. BUS_MCEERR_AR happens only when some process accesses to the corrupted memory.)
Link: http://lkml.kernel.org/r/20190116093046.GA29835@hori1.linux.bs1.fc.nec.co.jp Signed-off-by: Naoya Horiguchi n-horiguchi@ah.jp.nec.com Reported-by: Jane Chu jane.chu@oracle.com Reviewed-by: Dan Williams dan.j.williams@intel.com Reviewed-by: William Kucharski william.kucharski@oracle.com Cc: Oleg Nesterov oleg@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/memory-failure.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -372,7 +372,8 @@ static void kill_procs(struct list_head if (fail || tk->addr_valid == 0) { pr_err("Memory failure: %#lx: forcibly killing %s:%d because of failure to unmap corrupted page\n", pfn, tk->tsk->comm, tk->tsk->pid); - force_sig(SIGKILL, tk->tsk); + do_send_sig_info(SIGKILL, SEND_SIG_PRIV, + tk->tsk, PIDTYPE_PID); }
/*
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Hildenbrand david@redhat.com
commit e0a352fabce61f730341d119fbedf71ffdb8663f upstream.
We had a race in the old balloon compaction code before b1123ea6d3b3 ("mm: balloon: use general non-lru movable page feature") refactored it that became visible after backporting 195a8c43e93d ("virtio-balloon: deflate via a page list") without the refactoring.
The bug existed from commit d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management") till b1123ea6d3b3 ("mm: balloon: use general non-lru movable page feature"). d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management") was backported to 3.12, so the broken kernels are stable kernels [3.12 - 4.7].
There was a subtle race between dropping the page lock of the newpage in __unmap_and_move() and checking for __is_movable_balloon_page(newpage).
Just after dropping this page lock, virtio-balloon could go ahead and deflate the newpage, effectively dequeueing it and clearing PageBalloon, in turn making __is_movable_balloon_page(newpage) fail.
This resulted in dropping the reference of the newpage via putback_lru_page(newpage) instead of put_page(newpage), leading to page->lru getting modified and a !LRU page ending up in the LRU lists. With 195a8c43e93d ("virtio-balloon: deflate via a page list") backported, one would suddenly get corrupted lists in release_pages_balloon():
- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0 - list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100
Nowadays this race is no longer possible, but it is hidden behind very ugly handling of __ClearPageMovable() and __PageMovable().
__ClearPageMovable() will not make __PageMovable() fail, only PageMovable(). So the new check (__PageMovable(newpage)) will still hold even after newpage was dequeued by virtio-balloon.
If anybody would ever change that special handling, the BUG would be introduced again. So instead, make it explicit and use the information of the original isolated page before migration.
This patch can be backported fairly easy to stable kernels (in contrast to the refactoring).
Link: http://lkml.kernel.org/r/20190129233217.10747-1-david@redhat.com Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management") Signed-off-by: David Hildenbrand david@redhat.com Reported-by: Vratislav Bendel vbendel@redhat.com Acked-by: Michal Hocko mhocko@suse.com Acked-by: Rafael Aquini aquini@redhat.com Cc: Mel Gorman mgorman@techsingularity.net Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Cc: Michal Hocko mhocko@suse.com Cc: Naoya Horiguchi n-horiguchi@ah.jp.nec.com Cc: Jan Kara jack@suse.cz Cc: Andrea Arcangeli aarcange@redhat.com Cc: Dominik Brodowski linux@dominikbrodowski.net Cc: Matthew Wilcox willy@infradead.org Cc: Vratislav Bendel vbendel@redhat.com Cc: Rafael Aquini aquini@redhat.com Cc: Konstantin Khlebnikov k.khlebnikov@samsung.com Cc: Minchan Kim minchan@kernel.org Cc: stable@vger.kernel.org [3.12 - 4.7] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/migrate.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/mm/migrate.c +++ b/mm/migrate.c @@ -1108,10 +1108,13 @@ out: * If migration is successful, decrease refcount of the newpage * which will not free the page because new page owner increased * refcounter. As well, if it is LRU page, add the page to LRU - * list in here. + * list in here. Use the old state of the isolated source page to + * determine if we migrated a LRU page. newpage was already unlocked + * and possibly modified by its owner - don't rely on the page + * state. */ if (rc == MIGRATEPAGE_SUCCESS) { - if (unlikely(__PageMovable(newpage))) + if (unlikely(!is_lru)) put_page(newpage); else putback_lru_page(newpage);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Rowand frank.rowand@sony.com
commit 144552c786925314c1e7cb8f91a71dae1aca8798 upstream.
Add checks: - attempted kfree due to refcount reaching zero before overlay is removed - properties linked to an overlay node when the node is removed - node refcount > one during node removal in a changeset destroy, if the node was created by the changeset
After applying this patch, several validation warnings will be reported from the devicetree unittest during boot due to pre-existing devicetree bugs. The warnings will be similar to:
OF: ERROR: of_node_release(), unexpected properties in /testcase-data/overlay-node/test-bus/test-unittest11 OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /testcase-data-2/substation@100/ hvac-medium-2
Tested-by: Alan Tull atull@kernel.org Signed-off-by: Frank Rowand frank.rowand@sony.com Cc: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/of/dynamic.c | 29 +++++++++++++++++++++++++++++ drivers/of/overlay.c | 1 + include/linux/of.h | 15 ++++++++++----- 3 files changed, 40 insertions(+), 5 deletions(-)
--- a/drivers/of/dynamic.c +++ b/drivers/of/dynamic.c @@ -333,6 +333,25 @@ void of_node_release(struct kobject *kob if (!of_node_check_flag(node, OF_DYNAMIC)) return;
+ if (of_node_check_flag(node, OF_OVERLAY)) { + + if (!of_node_check_flag(node, OF_OVERLAY_FREE_CSET)) { + /* premature refcount of zero, do not free memory */ + pr_err("ERROR: memory leak before free overlay changeset, %pOF\n", + node); + return; + } + + /* + * If node->properties non-empty then properties were added + * to this node either by different overlay that has not + * yet been removed, or by a non-overlay mechanism. + */ + if (node->properties) + pr_err("ERROR: %s(), unexpected properties in %pOF\n", + __func__, node); + } + property_list_free(node->properties); property_list_free(node->deadprops);
@@ -437,6 +456,16 @@ struct device_node *__of_node_dup(const
static void __of_changeset_entry_destroy(struct of_changeset_entry *ce) { + if (ce->action == OF_RECONFIG_ATTACH_NODE && + of_node_check_flag(ce->np, OF_OVERLAY)) { + if (kref_read(&ce->np->kobj.kref) > 1) { + pr_err("ERROR: memory leak, expected refcount 1 instead of %d, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node %pOF\n", + kref_read(&ce->np->kobj.kref), ce->np); + } else { + of_node_set_flag(ce->np, OF_OVERLAY_FREE_CSET); + } + } + of_node_put(ce->np); list_del(&ce->node); kfree(ce); --- a/drivers/of/overlay.c +++ b/drivers/of/overlay.c @@ -373,6 +373,7 @@ static int add_changeset_node(struct ove return -ENOMEM;
tchild->parent = target_node; + of_node_set_flag(tchild, OF_OVERLAY);
ret = of_changeset_attach_node(&ovcs->cset, tchild); if (ret) --- a/include/linux/of.h +++ b/include/linux/of.h @@ -138,11 +138,16 @@ extern struct device_node *of_aliases; extern struct device_node *of_stdout; extern raw_spinlock_t devtree_lock;
-/* flag descriptions (need to be visible even when !CONFIG_OF) */ -#define OF_DYNAMIC 1 /* node and properties were allocated via kmalloc */ -#define OF_DETACHED 2 /* node has been detached from the device tree */ -#define OF_POPULATED 3 /* device already created for the node */ -#define OF_POPULATED_BUS 4 /* of_platform_populate recursed to children of this node */ +/* + * struct device_node flag descriptions + * (need to be visible even when !CONFIG_OF) + */ +#define OF_DYNAMIC 1 /* (and properties) allocated via kmalloc */ +#define OF_DETACHED 2 /* detached from the device tree */ +#define OF_POPULATED 3 /* device already created */ +#define OF_POPULATED_BUS 4 /* platform bus created for children */ +#define OF_OVERLAY 5 /* allocated for an overlay */ +#define OF_OVERLAY_FREE_CSET 6 /* in overlay cset being freed */
#define OF_BAD_ADDR ((u64)-1)
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Rowand frank.rowand@sony.com
commit 5b2c2f5a0ea3a43e0dee78059e34c7cb54136dcc upstream.
There is a matching of_node_put() in __of_detach_node_sysfs()
Remove misleading comment from function header comment for of_detach_node().
This patch may result in memory leaks from code that directly calls the dynamic node add and delete functions directly instead of using changesets.
This commit should result in powerpc systems that dynamically allocate a node, then later deallocate the node to have a memory leak when the node is deallocated.
The next commit will fix the leak.
Tested-by: Alan Tull atull@kernel.org Acked-by: Michael Ellerman mpe@ellerman.id.au (powerpc) Signed-off-by: Frank Rowand frank.rowand@sony.com Cc: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/of/dynamic.c | 3 --- drivers/of/kobj.c | 4 +++- 2 files changed, 3 insertions(+), 4 deletions(-)
--- a/drivers/of/dynamic.c +++ b/drivers/of/dynamic.c @@ -275,9 +275,6 @@ void __of_detach_node(struct device_node
/** * of_detach_node() - "Unplug" a node from the device tree. - * - * The caller must hold a reference to the node. The memory associated with - * the node is not freed until its refcount goes to zero. */ int of_detach_node(struct device_node *np) { --- a/drivers/of/kobj.c +++ b/drivers/of/kobj.c @@ -133,6 +133,9 @@ int __of_attach_node_sysfs(struct device } if (!name) return -ENOMEM; + + of_node_get(np); + rc = kobject_add(&np->kobj, parent, "%s", name); kfree(name); if (rc) @@ -159,6 +162,5 @@ void __of_detach_node_sysfs(struct devic kobject_del(&np->kobj); }
- /* finally remove the kobj_init ref */ of_node_put(np); }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Rowand frank.rowand@sony.com
commit 6b4955ba7bc05e40c8c41071cc121bc26ca65277 upstream.
The changeset entry 'update property' was used for new properties in an overlay instead of 'add property'.
The decision of whether to use 'update property' was based on whether the property already exists in the subtree where the node is being spliced into. At the top level of creating a changeset describing the overlay, the target node is in the live devicetree, so checking whether the property exists in the target node returns the correct result. As soon as the changeset creation algorithm recurses into a new node, the target is no longer in the live devicetree, but is instead in the detached overlay tree, thus all properties are incorrectly found to already exist in the target.
This fix will expose another devicetree bug that will be fixed in the following patch in the series.
When this patch is applied the errors reported by the devictree unittest will change, and the unittest results will change from:
### dt-test ### end of unittest - 210 passed, 0 failed
to
### dt-test ### end of unittest - 203 passed, 7 failed
Tested-by: Alan Tull atull@kernel.org Signed-off-by: Frank Rowand frank.rowand@sony.com Cc: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/of/overlay.c | 112 +++++++++++++++++++++++++++++++++------------------ 1 file changed, 74 insertions(+), 38 deletions(-)
--- a/drivers/of/overlay.c +++ b/drivers/of/overlay.c @@ -24,6 +24,26 @@ #include "of_private.h"
/** + * struct target - info about current target node as recursing through overlay + * @np: node where current level of overlay will be applied + * @in_livetree: @np is a node in the live devicetree + * + * Used in the algorithm to create the portion of a changeset that describes + * an overlay fragment, which is a devicetree subtree. Initially @np is a node + * in the live devicetree where the overlay subtree is targeted to be grafted + * into. When recursing to the next level of the overlay subtree, the target + * also recurses to the next level of the live devicetree, as long as overlay + * subtree node also exists in the live devicetree. When a node in the overlay + * subtree does not exist at the same level in the live devicetree, target->np + * points to a newly allocated node, and all subsequent targets in the subtree + * will be newly allocated nodes. + */ +struct target { + struct device_node *np; + bool in_livetree; +}; + +/** * struct fragment - info about fragment nodes in overlay expanded device tree * @target: target of the overlay operation * @overlay: pointer to the __overlay__ node @@ -72,8 +92,7 @@ static int devicetree_corrupt(void) }
static int build_changeset_next_level(struct overlay_changeset *ovcs, - struct device_node *target_node, - const struct device_node *overlay_node); + struct target *target, const struct device_node *overlay_node);
/* * of_resolve_phandles() finds the largest phandle in the live tree. @@ -257,14 +276,17 @@ err_free_target_path: /** * add_changeset_property() - add @overlay_prop to overlay changeset * @ovcs: overlay changeset - * @target_node: where to place @overlay_prop in live tree + * @target: where @overlay_prop will be placed * @overlay_prop: property to add or update, from overlay tree * @is_symbols_prop: 1 if @overlay_prop is from node "/__symbols__" * - * If @overlay_prop does not already exist in @target_node, add changeset entry - * to add @overlay_prop in @target_node, else add changeset entry to update + * If @overlay_prop does not already exist in live devicetree, add changeset + * entry to add @overlay_prop in @target, else add changeset entry to update * value of @overlay_prop. * + * @target may be either in the live devicetree or in a new subtree that + * is contained in the changeset. + * * Some special properties are not updated (no error returned). * * Update of property in symbols node is not allowed. @@ -273,20 +295,22 @@ err_free_target_path: * invalid @overlay. */ static int add_changeset_property(struct overlay_changeset *ovcs, - struct device_node *target_node, - struct property *overlay_prop, + struct target *target, struct property *overlay_prop, bool is_symbols_prop) { struct property *new_prop = NULL, *prop; int ret = 0;
- prop = of_find_property(target_node, overlay_prop->name, NULL); - if (!of_prop_cmp(overlay_prop->name, "name") || !of_prop_cmp(overlay_prop->name, "phandle") || !of_prop_cmp(overlay_prop->name, "linux,phandle")) return 0;
+ if (target->in_livetree) + prop = of_find_property(target->np, overlay_prop->name, NULL); + else + prop = NULL; + if (is_symbols_prop) { if (prop) return -EINVAL; @@ -299,10 +323,10 @@ static int add_changeset_property(struct return -ENOMEM;
if (!prop) - ret = of_changeset_add_property(&ovcs->cset, target_node, + ret = of_changeset_add_property(&ovcs->cset, target->np, new_prop); else - ret = of_changeset_update_property(&ovcs->cset, target_node, + ret = of_changeset_update_property(&ovcs->cset, target->np, new_prop);
if (ret) { @@ -315,14 +339,14 @@ static int add_changeset_property(struct
/** * add_changeset_node() - add @node (and children) to overlay changeset - * @ovcs: overlay changeset - * @target_node: where to place @node in live tree - * @node: node from within overlay device tree fragment + * @ovcs: overlay changeset + * @target: where @node will be placed in live tree or changeset + * @node: node from within overlay device tree fragment * - * If @node does not already exist in @target_node, add changeset entry - * to add @node in @target_node. + * If @node does not already exist in @target, add changeset entry + * to add @node in @target. * - * If @node already exists in @target_node, and the existing node has + * If @node already exists in @target, and the existing node has * a phandle, the overlay node is not allowed to have a phandle. * * If @node has child nodes, add the children recursively via @@ -355,15 +379,16 @@ static int add_changeset_property(struct * invalid @overlay. */ static int add_changeset_node(struct overlay_changeset *ovcs, - struct device_node *target_node, struct device_node *node) + struct target *target, struct device_node *node) { const char *node_kbasename; struct device_node *tchild; + struct target target_child; int ret = 0;
node_kbasename = kbasename(node->full_name);
- for_each_child_of_node(target_node, tchild) + for_each_child_of_node(target->np, tchild) if (!of_node_cmp(node_kbasename, kbasename(tchild->full_name))) break;
@@ -372,22 +397,28 @@ static int add_changeset_node(struct ove if (!tchild) return -ENOMEM;
- tchild->parent = target_node; + tchild->parent = target->np; of_node_set_flag(tchild, OF_OVERLAY);
ret = of_changeset_attach_node(&ovcs->cset, tchild); if (ret) return ret;
- ret = build_changeset_next_level(ovcs, tchild, node); + target_child.np = tchild; + target_child.in_livetree = false; + + ret = build_changeset_next_level(ovcs, &target_child, node); of_node_put(tchild); return ret; }
- if (node->phandle && tchild->phandle) + if (node->phandle && tchild->phandle) { ret = -EINVAL; - else - ret = build_changeset_next_level(ovcs, tchild, node); + } else { + target_child.np = tchild; + target_child.in_livetree = target->in_livetree; + ret = build_changeset_next_level(ovcs, &target_child, node); + } of_node_put(tchild);
return ret; @@ -396,7 +427,7 @@ static int add_changeset_node(struct ove /** * build_changeset_next_level() - add level of overlay changeset * @ovcs: overlay changeset - * @target_node: where to place @overlay_node in live tree + * @target: where to place @overlay_node in live tree * @overlay_node: node from within an overlay device tree fragment * * Add the properties (if any) and nodes (if any) from @overlay_node to the @@ -409,27 +440,26 @@ static int add_changeset_node(struct ove * invalid @overlay_node. */ static int build_changeset_next_level(struct overlay_changeset *ovcs, - struct device_node *target_node, - const struct device_node *overlay_node) + struct target *target, const struct device_node *overlay_node) { struct device_node *child; struct property *prop; int ret;
for_each_property_of_node(overlay_node, prop) { - ret = add_changeset_property(ovcs, target_node, prop, 0); + ret = add_changeset_property(ovcs, target, prop, 0); if (ret) { pr_debug("Failed to apply prop @%pOF/%s, err=%d\n", - target_node, prop->name, ret); + target->np, prop->name, ret); return ret; } }
for_each_child_of_node(overlay_node, child) { - ret = add_changeset_node(ovcs, target_node, child); + ret = add_changeset_node(ovcs, target, child); if (ret) { pr_debug("Failed to apply node @%pOF/%pOFn, err=%d\n", - target_node, child, ret); + target->np, child, ret); of_node_put(child); return ret; } @@ -442,17 +472,17 @@ static int build_changeset_next_level(st * Add the properties from __overlay__ node to the @ovcs->cset changeset. */ static int build_changeset_symbols_node(struct overlay_changeset *ovcs, - struct device_node *target_node, + struct target *target, const struct device_node *overlay_symbols_node) { struct property *prop; int ret;
for_each_property_of_node(overlay_symbols_node, prop) { - ret = add_changeset_property(ovcs, target_node, prop, 1); + ret = add_changeset_property(ovcs, target, prop, 1); if (ret) { pr_debug("Failed to apply prop @%pOF/%s, err=%d\n", - target_node, prop->name, ret); + target->np, prop->name, ret); return ret; } } @@ -475,6 +505,7 @@ static int build_changeset_symbols_node( static int build_changeset(struct overlay_changeset *ovcs) { struct fragment *fragment; + struct target target; int fragments_count, i, ret;
/* @@ -489,7 +520,9 @@ static int build_changeset(struct overla for (i = 0; i < fragments_count; i++) { fragment = &ovcs->fragments[i];
- ret = build_changeset_next_level(ovcs, fragment->target, + target.np = fragment->target; + target.in_livetree = true; + ret = build_changeset_next_level(ovcs, &target, fragment->overlay); if (ret) { pr_debug("apply failed '%pOF'\n", fragment->target); @@ -499,7 +532,10 @@ static int build_changeset(struct overla
if (ovcs->symbols_fragment) { fragment = &ovcs->fragments[ovcs->count - 1]; - ret = build_changeset_symbols_node(ovcs, fragment->target, + + target.np = fragment->target; + target.in_livetree = true; + ret = build_changeset_symbols_node(ovcs, &target, fragment->overlay); if (ret) { pr_debug("apply failed '%pOF'\n", fragment->target); @@ -517,7 +553,7 @@ static int build_changeset(struct overla * 1) "target" property containing the phandle of the target * 2) "target-path" property containing the path of the target */ -static struct device_node *find_target_node(struct device_node *info_node) +static struct device_node *find_target(struct device_node *info_node) { struct device_node *node; const char *path; @@ -623,7 +659,7 @@ static int init_overlay_changeset(struct
fragment = &fragments[cnt]; fragment->overlay = overlay_node; - fragment->target = find_target_node(node); + fragment->target = find_target(node); if (!fragment->target) { of_node_put(fragment->overlay); ret = -EINVAL;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Rowand frank.rowand@sony.com
commit 8814dc46bd9e347d4de55ec5bf8f16ea54470499 upstream.
When allocating a new node, add_changeset_node() was duplicating the properties from the respective node in the overlay instead of allocating a node with no properties.
When this patch is applied the errors reported by the devictree unittest from patch "of: overlay: add tests to validate kfrees from overlay removal" will no longer occur. These error messages are of the form:
"OF: ERROR: ..."
and the unittest results will change from:
### dt-test ### end of unittest - 203 passed, 7 failed
to
### dt-test ### end of unittest - 210 passed, 0 failed
Tested-by: Alan Tull atull@kernel.org Signed-off-by: Frank Rowand frank.rowand@sony.com Cc: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/of/overlay.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/of/overlay.c +++ b/drivers/of/overlay.c @@ -393,7 +393,7 @@ static int add_changeset_node(struct ove break;
if (!tchild) { - tchild = __of_node_dup(node, node_kbasename); + tchild = __of_node_dup(NULL, node_kbasename); if (!tchild) return -ENOMEM;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexei Naberezhnov anaberezhnov@fb.com
commit 483cbbeddd5fe2c80fd4141ff0748fa06c4ff146 upstream.
This fixes the case when md array assembly fails because of raid cache recovery unable to allocate a stripe, despite attempts to replay stripes and increase cache size. This happens because stripes released by r5c_recovery_replay_stripes and raid5_set_cache_size don't become available for allocation immediately. Released stripes first are placed on conf->released_stripes list and require md thread to merge them on conf->inactive_list before they can be allocated.
Patch allows final allocation attempt during cache recovery to wait for new stripes to become availabe for allocation.
Cc: linux-raid@vger.kernel.org Cc: Shaohua Li shli@kernel.org Cc: linux-stable stable@vger.kernel.org # 4.10+ Fixes: b4c625c67362 ("md/r5cache: r5cache recovery: part 1") Signed-off-by: Alexei Naberezhnov anaberezhnov@fb.com Signed-off-by: Song Liu songliubraving@fb.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/raid5-cache.c | 33 ++++++++++++++++++++++----------- drivers/md/raid5.c | 8 ++++++-- 2 files changed, 28 insertions(+), 13 deletions(-)
--- a/drivers/md/raid5-cache.c +++ b/drivers/md/raid5-cache.c @@ -1935,12 +1935,14 @@ out: }
static struct stripe_head * -r5c_recovery_alloc_stripe(struct r5conf *conf, - sector_t stripe_sect) +r5c_recovery_alloc_stripe( + struct r5conf *conf, + sector_t stripe_sect, + int noblock) { struct stripe_head *sh;
- sh = raid5_get_active_stripe(conf, stripe_sect, 0, 1, 0); + sh = raid5_get_active_stripe(conf, stripe_sect, 0, noblock, 0); if (!sh) return NULL; /* no more stripe available */
@@ -2150,7 +2152,7 @@ r5c_recovery_analyze_meta_block(struct r stripe_sect);
if (!sh) { - sh = r5c_recovery_alloc_stripe(conf, stripe_sect); + sh = r5c_recovery_alloc_stripe(conf, stripe_sect, 1); /* * cannot get stripe from raid5_get_active_stripe * try replay some stripes @@ -2159,20 +2161,29 @@ r5c_recovery_analyze_meta_block(struct r r5c_recovery_replay_stripes( cached_stripe_list, ctx); sh = r5c_recovery_alloc_stripe( - conf, stripe_sect); + conf, stripe_sect, 1); } if (!sh) { + int new_size = conf->min_nr_stripes * 2; pr_debug("md/raid:%s: Increasing stripe cache size to %d to recovery data on journal.\n", mdname(mddev), - conf->min_nr_stripes * 2); - raid5_set_cache_size(mddev, - conf->min_nr_stripes * 2); - sh = r5c_recovery_alloc_stripe(conf, - stripe_sect); + new_size); + ret = raid5_set_cache_size(mddev, new_size); + if (conf->min_nr_stripes <= new_size / 2) { + pr_err("md/raid:%s: Cannot increase cache size, ret=%d, new_size=%d, min_nr_stripes=%d, max_nr_stripes=%d\n", + mdname(mddev), + ret, + new_size, + conf->min_nr_stripes, + conf->max_nr_stripes); + return -ENOMEM; + } + sh = r5c_recovery_alloc_stripe( + conf, stripe_sect, 0); } if (!sh) { pr_err("md/raid:%s: Cannot get enough stripes due to memory pressure. Recovery failed.\n", - mdname(mddev)); + mdname(mddev)); return -ENOMEM; } list_add_tail(&sh->lru, cached_stripe_list); --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -6369,6 +6369,7 @@ raid5_show_stripe_cache_size(struct mdde int raid5_set_cache_size(struct mddev *mddev, int size) { + int result = 0; struct r5conf *conf = mddev->private;
if (size <= 16 || size > 32768) @@ -6385,11 +6386,14 @@ raid5_set_cache_size(struct mddev *mddev
mutex_lock(&conf->cache_size_mutex); while (size > conf->max_nr_stripes) - if (!grow_one_stripe(conf, GFP_KERNEL)) + if (!grow_one_stripe(conf, GFP_KERNEL)) { + conf->min_nr_stripes = conf->max_nr_stripes; + result = -ENOMEM; break; + } mutex_unlock(&conf->cache_size_mutex);
- return 0; + return result; } EXPORT_SYMBOL(raid5_set_cache_size);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara paulo@paulo.ac
commit 28eb24ff75c5ac130eb326b3b4d0dcecfc0f427d upstream.
In case a hostname resolves to a different IP address (e.g. long running mounts), make sure to resolve it every time prior to calling generic_ip_connect() in reconnect.
Suggested-by: Steve French stfrench@microsoft.com Signed-off-by: Paulo Alcantara palcantara@suse.de Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Pavel Shilovsky pshilov@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/connect.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+)
--- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -50,6 +50,7 @@ #include "cifs_unicode.h" #include "cifs_debug.h" #include "cifs_fs_sb.h" +#include "dns_resolve.h" #include "ntlmssp.h" #include "nterr.h" #include "rfc1002pdu.h" @@ -319,6 +320,53 @@ static int cifs_setup_volume_info(struct const char *devname, bool is_smb3);
/* + * Resolve hostname and set ip addr in tcp ses. Useful for hostnames that may + * get their ip addresses changed at some point. + * + * This should be called with server->srv_mutex held. + */ +#ifdef CONFIG_CIFS_DFS_UPCALL +static int reconn_set_ipaddr(struct TCP_Server_Info *server) +{ + int rc; + int len; + char *unc, *ipaddr = NULL; + + if (!server->hostname) + return -EINVAL; + + len = strlen(server->hostname) + 3; + + unc = kmalloc(len, GFP_KERNEL); + if (!unc) { + cifs_dbg(FYI, "%s: failed to create UNC path\n", __func__); + return -ENOMEM; + } + snprintf(unc, len, "\\%s", server->hostname); + + rc = dns_resolve_server_name_to_ip(unc, &ipaddr); + kfree(unc); + + if (rc < 0) { + cifs_dbg(FYI, "%s: failed to resolve server part of %s to IP: %d\n", + __func__, server->hostname, rc); + return rc; + } + + rc = cifs_convert_address((struct sockaddr *)&server->dstaddr, ipaddr, + strlen(ipaddr)); + kfree(ipaddr); + + return !rc ? -1 : 0; +} +#else +static inline int reconn_set_ipaddr(struct TCP_Server_Info *server) +{ + return 0; +} +#endif + +/* * cifs tcp session reconnection * * mark tcp session as reconnecting so temporarily locked @@ -418,6 +466,11 @@ cifs_reconnect(struct TCP_Server_Info *s rc = generic_ip_connect(server); if (rc) { cifs_dbg(FYI, "reconnect error %d\n", rc); + rc = reconn_set_ipaddr(server); + if (rc) { + cifs_dbg(FYI, "%s: failed to resolve hostname: %d\n", + __func__, rc); + } mutex_unlock(&server->srv_mutex); msleep(3000); } else {
On Mon, Feb 04, 2019 at 11:36:20AM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.20.7 release. There are 80 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Feb 6 10:35:33 UTC 2019. Anything received after that time might be too late.
Build results: total: 159 pass: 159 fail: 0 Qemu test results: total: 343 pass: 343 fail: 0
Guenter
On Mon, 4 Feb 2019 at 16:20, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.20.7 release. There are 80 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Feb 6 10:35:33 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.20.7-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.20.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary ------------------------------------------------------------------------
kernel: 4.20.7-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.20.y git commit: cc6e62e7b38a4a9309c07e6e5d45d8e619570823 git describe: v4.20.6-81-gcc6e62e7b38a Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.20-oe/build/v4.20.6-81-...
No regressions (compared to build v4.20.5-121-gb3b8ce0697df)
No fixes (compared to build v4.20.5-121-gb3b8ce0697df)
Ran 21126 total tests in the following environments and test suites.
Environments -------------- - dragonboard-410c - arm64 - hi6220-hikey - arm64 - i386 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - arm - x86_64
Test Suites ----------- * boot * install-android-platform-tools-r2600 * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * spectre-meltdown-checker-test * ltp-fs-tests * ltp-open-posix-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none
On Tue, Feb 05, 2019 at 11:50:57AM +0530, Naresh Kamboju wrote:
On Mon, 4 Feb 2019 at 16:20, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.20.7 release. There are 80 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Feb 6 10:35:33 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.20.7-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.20.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Thanks for testing these and letting me know.
greg k-h
linux-stable-mirror@lists.linaro.org