This is the start of the stable review cycle for the 4.13.16 release. There are 35 patches in this series; all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri Nov 24 10:11:25 UTC 2017. Anything received after that time might be too late.
The whole patch series can be found in one patch at: kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.13.16-rc1.gz or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.13.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.13.16-rc1
Jan Harkes jaharkes@cs.cmu.edu coda: fix 'kernel memory exposure attempt' in fsync
Suravee Suthikulpanit suravee.suthikulpanit@amd.com x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask
Jaewon Kim jaewon31.kim@samsung.com mm/page_ext.c: check if page_ext is not prepared
Pavel Tatashin pasha.tatashin@oracle.com mm/page_alloc.c: broken deferred calculation
Corey Minyard cminyard@mvista.com ipmi: fix unsigned long underflow
alex chen alex.chen@huawei.com ocfs2: should wait dio before inode lock in ocfs2_setattr()
Changwei Ge ge.changwei@h3c.com ocfs2: fix cluster hang after a node dies
Jann Horn jannh@google.com mm/pagewalk.c: report holes in hugetlb ranges
Neeraj Upadhyay neeraju@codeaurora.org rcu: Fix up pending cbs check in rcu_prepare_for_idle
Alexander Steffen Alexander.Steffen@infineon.com tpm-dev-common: Reject too short writes
Ji-Ze Hong (Peter Hong) hpeter@gmail.com serial: 8250_fintek: Fix finding base_port with activated SuperIO
Lukas Wunner lukas@wunner.de serial: omap: Fix EFR write on RTS deassertion
Roberto Sassu roberto.sassu@huawei.com ima: do not update security.ima if appraisal status is not INTEGRITY_PASS
Eric W. Biederman ebiederm@xmission.com net/sctp: Always set scope_id in sctp_inet6_skb_msgname
Huacai Chen chenhc@lemote.com fealnx: Fix building error on MIPS
Xin Long lucien.xin@gmail.com sctp: do not peel off an assoc from one netns to another one
Bjørn Mork bjorn@mork.no net: cdc_ncm: GetNtbFormat endian fix
Xin Long lucien.xin@gmail.com vxlan: fix the issue that neigh proxy blocks all icmpv6 packets
Jason A. Donenfeld Jason@zx2c4.com af_netlink: ensure that NLMSG_DONE never fails in dumps
Inbar Karmy inbark@mellanox.com net/mlx5e: Set page to null in case dma mapping fails
Huy Nguyen huyn@mellanox.com net/mlx5: Cancel health poll before sending panic teardown command
Cong Wang xiyou.wangcong@gmail.com vlan: fix a use-after-free in vlan_device_event()
Yuchung Cheng ycheng@google.com tcp: fix tcp_fastretrans_alert warning
Eric Dumazet edumazet@google.com tcp: gso: avoid refcount_t warning from tcp_gso_segment()
Andrey Konovalov andreyknvl@google.com net: usb: asix: fill null-ptr-deref in asix_suspend
Kristian Evensen kristian.evensen@gmail.com qmi_wwan: Add missing skb_reset_mac_header-call
Bjørn Mork bjorn@mork.no net: qmi_wwan: fix divide by 0 on bad descriptors
Bjørn Mork bjorn@mork.no net: cdc_ether: fix divide by 0 on bad descriptors
Hangbin Liu liuhangbin@gmail.com bonding: discard lowest hash bit for 802.3ad layer3+4
Guillaume Nault g.nault@alphalink.fr l2tp: don't use l2tp_tunnel_find() in l2tp_ip and l2tp_ip6
Ye Yin hustcat@gmail.com netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed
Florian Fainelli f.fainelli@gmail.com net: systemport: Correct IPG length settings
Eric Dumazet edumazet@google.com tcp: do not mangle skb->cb[] in tcp_make_synack()
Jeff Barnhill 0xeffeff@gmail.com net: vrf: correct FRA_L3MDEV encode type
Konstantin Khlebnikov khlebnikov@yandex-team.ru tcp_nv: fix division by zero in tcpnv_acked()
-------------
Diffstat:
 Makefile | 4 ++--
 arch/x86/kernel/cpu/intel_cacheinfo.c | 32 ++++++++++++++-----------
 drivers/char/ipmi/ipmi_msghandler.c | 10 ++++----
 drivers/char/tpm/tpm-dev-common.c | 6 +++++
 drivers/net/bonding/bond_main.c | 2 +-
 drivers/net/ethernet/broadcom/bcmsysport.c | 10 ++++----
 drivers/net/ethernet/fealnx.c | 6 ++---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 12 ++++------
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 7 ++++++
 drivers/net/usb/asix_devices.c | 4 ++--
 drivers/net/usb/cdc_ether.c | 2 +-
 drivers/net/usb/cdc_ncm.c | 4 ++--
 drivers/net/usb/qmi_wwan.c | 3 ++-
 drivers/net/vrf.c | 2 +-
 drivers/net/vxlan.c | 31 ++++++++++--------------
 drivers/tty/serial/8250/8250_fintek.c | 3 +++
 drivers/tty/serial/omap-serial.c | 2 +-
 fs/coda/upcall.c | 3 +--
 fs/ocfs2/dlm/dlmrecovery.c | 1 +
 fs/ocfs2/file.c | 9 +++++--
 include/linux/mmzone.h | 3 ++-
 include/linux/skbuff.h | 7 ++++++
 kernel/rcu/tree_plugin.h | 2 +-
 mm/page_alloc.c | 27 ++++++++++++++-------
 mm/page_ext.c | 4 ----
 mm/pagewalk.c | 6 ++++-
 net/8021q/vlan.c | 6 ++---
 net/core/skbuff.c | 1 +
 net/ipv4/tcp_input.c | 3 +--
 net/ipv4/tcp_nv.c | 2 +-
 net/ipv4/tcp_offload.c | 12 ++++++++--
 net/ipv4/tcp_output.c | 9 ++-----
 net/l2tp/l2tp_ip.c | 24 +++++++------------
 net/l2tp/l2tp_ip6.c | 24 +++++++------------
 net/netlink/af_netlink.c | 17 ++++++++-----
 net/netlink/af_netlink.h | 1 +
 net/sctp/ipv6.c | 5 ++--
 net/sctp/socket.c | 4 ++++
 security/integrity/ima/ima_appraise.c | 3 +++
 39 files changed, 179 insertions(+), 134 deletions(-)
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Konstantin Khlebnikov khlebnikov@yandex-team.ru
[ Upstream commit 4eebff27ca4182bbf5f039dd60d79e2d7c0a707e ]
Average RTT could become zero. This happened in real life at least twice. This patch treats zero as 1us.
Signed-off-by: Konstantin Khlebnikov khlebnikov@yandex-team.ru
Acked-by: Lawrence Brakmo Brakmo@fb.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/ipv4/tcp_nv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv4/tcp_nv.c
+++ b/net/ipv4/tcp_nv.c
@@ -263,7 +263,7 @@ static void tcpnv_acked(struct sock *sk,

 	/* rate in 100's bits per second */
 	rate64 = ((u64)sample->in_flight) * 8000000;
-	rate = (u32)div64_u64(rate64, (u64)(avg_rtt * 100));
+	rate = (u32)div64_u64(rate64, (u64)(avg_rtt ?: 1) * 100);

 	/* Remember the maximum rate seen during this RTT
 	 * Note: It may be more than one RTT. This function should be
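For readers unfamiliar with it, `avg_rtt ?: 1` is the GNU C "conditional with omitted operand": it evaluates to `avg_rtt` when that is non-zero and to 1 otherwise. Note that the cast now also applies before the multiplication, so `* 100` is done in 64 bits. Below is a minimal user-space sketch of the same computation, assuming `uint64_t` in place of `u64` and plain `/` in place of `div64_u64()`; `nv_rate()` is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the rate computation in tcpnv_acked().
 * in_flight:  bytes in flight
 * avg_rtt_us: average RTT in microseconds (may be 0)
 * Returns the rate in units of 100 bits per second.
 */
static uint32_t nv_rate(uint32_t in_flight, uint32_t avg_rtt_us)
{
	uint64_t rate64 = (uint64_t)in_flight * 8000000;
	/* Portable spelling of "avg_rtt ?: 1": treat 0us as 1us. */
	uint64_t rtt = avg_rtt_us ? avg_rtt_us : 1;

	assert(rtt != 0); /* the divisor can no longer be zero */
	/* Multiply in 64 bits, like "(u64)(avg_rtt ?: 1) * 100". */
	return (uint32_t)(rate64 / (rtt * 100));
}
```

With 1000 bytes in flight and a 1000us average RTT this gives 80000 (8 Mbit/s in units of 100 bit/s); with a zero RTT it returns the same value as for 1us instead of dividing by zero.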
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Barnhill 0xeffeff@gmail.com
[ Upstream commit 18129a24983906eaf2a2d448ce4b83e27091ebe2 ]
FRA_L3MDEV is defined as U8, but is being added as a U32 attribute. On big-endian architectures, this results in the l3mdev entry not being added to the FIB rules.
Fixes: 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create")
Signed-off-by: Jeff Barnhill 0xeffeff@gmail.com
Acked-by: David Ahern dsahern@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/vrf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -1271,7 +1271,7 @@ static int vrf_fib_rule(const struct net
 	frh->family = family;
 	frh->action = FR_ACT_TO_TBL;

-	if (nla_put_u32(skb, FRA_L3MDEV, 1))
+	if (nla_put_u8(skb, FRA_L3MDEV, 1))
 		goto nla_put_failure;

 	if (nla_put_u32(skb, FRA_PRIORITY, FIB_RULE_PREF))
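Why the attribute width matters: since FRA_L3MDEV is defined as a u8, a u8 reader such as nla_get_u8() looks only at the first byte of the payload, while nla_put_u32() writes four bytes in host byte order. A value of 1 stored as a u32 has that 1 in its first byte on little-endian hosts but in its last byte on big-endian hosts. A small user-space sketch of the effect; all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a u32 payload in host byte order (as
 * nla_put_u32() does) and read it back the way a u8 reader would,
 * i.e. by looking only at the first byte. */
static uint8_t u32_payload_read_as_u8(uint32_t value)
{
	unsigned char payload[sizeof(value)];

	memcpy(payload, &value, sizeof(value));
	return payload[0];
}

/* Hypothetical helper: a one-byte payload read as a u8 gives the
 * same result on every host. */
static uint8_t u8_payload_read_as_u8(uint8_t value)
{
	unsigned char payload[sizeof(value)];

	memcpy(payload, &value, sizeof(value));
	return payload[0];
}

/* Returns 1 on a big-endian host, 0 on a little-endian one. */
static int host_is_big_endian(void)
{
	const uint16_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 0;
}
```

On a big-endian host the u32 variant yields 0, which is consistent with the l3mdev flag being lost as described above; the u8 variant yields 1 everywhere.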
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 3b11775033dc87c3d161996c54507b15ba26414a ]
Christoph Paasch sent a patch to address the following issue:
tcp_make_synack() leaves some TCP private info in skb->cb[], then sends the packet by means other than tcp_transmit_skb().
tcp_transmit_skb() makes sure to clear skb->cb[] so as not to confuse the IPv4/IPv6 stacks, but we have no such cleanup for SYNACK.
tcp_make_synack() should not use tcp_init_nondata_skb():
tcp_init_nondata_skb() really should be limited to skbs put in write/rtx queues (the ones that are only sent via tcp_transmit_skb()).
This patch fixes the issue and should even save a few CPU cycles ;)
Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: Eric Dumazet edumazet@google.com
Reported-by: Christoph Paasch cpaasch@apple.com
Reviewed-by: Christoph Paasch cpaasch@apple.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/ipv4/tcp_output.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3207,13 +3207,8 @@ struct sk_buff *tcp_make_synack(const st
 	th->source = htons(ireq->ir_num);
 	th->dest = ireq->ir_rmt_port;
 	skb->mark = ireq->ir_mark;
-	/* Setting of flags are superfluous here for callers (and ECE is
-	 * not even correctly set)
-	 */
-	tcp_init_nondata_skb(skb, tcp_rsk(req)->snt_isn,
-			     TCPHDR_SYN | TCPHDR_ACK);
-
-	th->seq = htonl(TCP_SKB_CB(skb)->seq);
+	skb->ip_summed = CHECKSUM_PARTIAL;
+	th->seq = htonl(tcp_rsk(req)->snt_isn);
 	/* XXX data is queued and acked as is. No buffer/window check */
 	th->ack_seq = htonl(tcp_rsk(req)->rcv_nxt);
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Fainelli f.fainelli@gmail.com
[ Upstream commit 93824c80bf47ebe087414b3a40ca0ff9aab7d1fb ]
Due to a documentation mistake, the IPG length was set to 0x12 while it should have been 12 (decimal). This would affect short packet (64B typically) performance since the IPG was bigger than necessary.
Fixes: 44a4524c54af ("net: systemport: Add support for SYSTEMPORT Lite")
Signed-off-by: Florian Fainelli f.fainelli@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1743,15 +1743,17 @@ static inline void bcm_sysport_mask_all_

 static inline void gib_set_pad_extension(struct bcm_sysport_priv *priv)
 {
-	u32 __maybe_unused reg;
+	u32 reg;

-	/* Include Broadcom tag in pad extension */
+	reg = gib_readl(priv, GIB_CONTROL);
+	/* Include Broadcom tag in pad extension and fix up IPG_LENGTH */
 	if (netdev_uses_dsa(priv->netdev)) {
-		reg = gib_readl(priv, GIB_CONTROL);
 		reg &= ~(GIB_PAD_EXTENSION_MASK << GIB_PAD_EXTENSION_SHIFT);
 		reg |= ENET_BRCM_TAG_LEN << GIB_PAD_EXTENSION_SHIFT;
-		gib_writel(priv, reg, GIB_CONTROL);
 	}
+	reg &= ~(GIB_IPG_LEN_MASK << GIB_IPG_LEN_SHIFT);
+	reg |= 12 << GIB_IPG_LEN_SHIFT;
+	gib_writel(priv, reg, GIB_CONTROL);
 }

 static int bcm_sysport_open(struct net_device *dev)
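The corrected function is an ordinary read-modify-write of a register field: clear the field with its shifted mask, then OR in the new value. The bug itself was in the value, since 0x12 is 18 decimal rather than 12. A sketch of the pattern with a hypothetical field layout (DEMO_IPG_LEN_MASK and DEMO_IPG_LEN_SHIFT are made up for illustration and are not the real GIB_IPG_LEN_* values).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout, for illustration only. */
#define DEMO_IPG_LEN_MASK  0x3fu
#define DEMO_IPG_LEN_SHIFT 20

/* Replace one bit field in a 32-bit register value, keeping other bits. */
static uint32_t set_field(uint32_t reg, uint32_t mask, unsigned int shift,
			  uint32_t val)
{
	assert((val & ~mask) == 0); /* value must fit in the field */
	reg &= ~(mask << shift);    /* clear the old field */
	reg |= val << shift;        /* insert the new value */
	return reg;
}

/* Extract one bit field from a 32-bit register value. */
static uint32_t get_field(uint32_t reg, uint32_t mask, unsigned int shift)
{
	return (reg >> shift) & mask;
}
```

Writing 0x12 into the field stores 18, i.e. the larger-than-necessary IPG the commit message describes, while writing 12 leaves every bit outside the field untouched.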
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ye Yin hustcat@gmail.com
[ Upstream commit 2b5ec1a5f9738ee7bf8f5ec0526e75e00362c48f ]
When ipvs runs in two different network namespaces on the same host and one ipvs sends network traffic to the ipvs in the other namespace, the 'ipvs_property' flag makes the second ipvs take no effect. So we should clear 'ipvs_property' when the SKB's network namespace changes.
Fixes: 621e84d6f373 ("dev: introduce skb_scrub_packet()")
Signed-off-by: Ye Yin hustcat@gmail.com
Signed-off-by: Wei Zhou chouryzhou@gmail.com
Signed-off-by: Julian Anastasov ja@ssi.bg
Signed-off-by: Simon Horman horms@verge.net.au
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/linux/skbuff.h | 7 +++++++
 net/core/skbuff.c | 1 +
 2 files changed, 8 insertions(+)
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3655,6 +3655,13 @@ static inline void nf_reset_trace(struct
 #endif
 }

+static inline void ipvs_reset(struct sk_buff *skb)
+{
+#if IS_ENABLED(CONFIG_IP_VS)
+	skb->ipvs_property = 0;
+#endif
+}
+
 /* Note: This doesn't put any conntrack and bridge info in dst. */
 static inline void __nf_copy(struct sk_buff *dst, const struct sk_buff *src,
 			     bool copy)
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4476,6 +4476,7 @@ void skb_scrub_packet(struct sk_buff *sk
 	if (!xnet)
 		return;

+	ipvs_reset(skb);
 	skb_orphan(skb);
 	skb->mark = 0;
 }
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guillaume Nault g.nault@alphalink.fr
[ Upstream commit 8f7dc9ae4a7aece9fbc3e6637bdfa38b36bcdf09 ]
Using l2tp_tunnel_find() in l2tp_ip_recv() is wrong for two reasons:
* It doesn't take a reference on the returned tunnel, which makes the call racy wrt. concurrent tunnel deletion.
* The lookup is only based on the tunnel identifier, so it can return a tunnel that doesn't match the packet's addresses or protocol.
For example, a packet sent to an L2TPv3 over IPv6 tunnel can be delivered to an L2TPv2 over UDPv4 tunnel. This is worse than a simple cross-talk: when delivering the packet to an L2TP over UDP tunnel, the corresponding socket is UDP, where ->sk_backlog_rcv() is NULL. Calling sk_receive_skb() will then crash the kernel by trying to execute this callback.
And l2tp_tunnel_find() isn't even needed here. __l2tp_ip_bind_lookup() properly checks the socket binding and connection settings. It was used as a fallback mechanism for finding tunnels that didn't have their data path registered yet. But it's not limited to this case and can be used to replace l2tp_tunnel_find() in the general case.
Fix l2tp_ip6 in the same way.
Fixes: 0d76751fad77 ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support")
Fixes: a32e0eec7042 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6")
Signed-off-by: Guillaume Nault g.nault@alphalink.fr
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/l2tp/l2tp_ip.c | 24 +++++++++---------------
 net/l2tp/l2tp_ip6.c | 24 +++++++++---------------
 2 files changed, 18 insertions(+), 30 deletions(-)
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -123,6 +123,7 @@ static int l2tp_ip_recv(struct sk_buff *
 	unsigned char *ptr, *optr;
 	struct l2tp_session *session;
 	struct l2tp_tunnel *tunnel = NULL;
+	struct iphdr *iph;
 	int length;

 	if (!pskb_may_pull(skb, 4))
@@ -178,24 +179,17 @@ pass_up:
 		goto discard;

 	tunnel_id = ntohl(*(__be32 *) &skb->data[4]);
-	tunnel = l2tp_tunnel_find(net, tunnel_id);
-	if (tunnel) {
-		sk = tunnel->sock;
-		sock_hold(sk);
-	} else {
-		struct iphdr *iph = (struct iphdr *) skb_network_header(skb);
-
-		read_lock_bh(&l2tp_ip_lock);
-		sk = __l2tp_ip_bind_lookup(net, iph->daddr, iph->saddr,
-					   inet_iif(skb), tunnel_id);
-		if (!sk) {
-			read_unlock_bh(&l2tp_ip_lock);
-			goto discard;
-		}
+	iph = (struct iphdr *)skb_network_header(skb);

-		sock_hold(sk);
+	read_lock_bh(&l2tp_ip_lock);
+	sk = __l2tp_ip_bind_lookup(net, iph->daddr, iph->saddr, inet_iif(skb),
+				   tunnel_id);
+	if (!sk) {
 		read_unlock_bh(&l2tp_ip_lock);
+		goto discard;
 	}
+	sock_hold(sk);
+	read_unlock_bh(&l2tp_ip_lock);

 	if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
 		goto discard_put;
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -136,6 +136,7 @@ static int l2tp_ip6_recv(struct sk_buff
 	unsigned char *ptr, *optr;
 	struct l2tp_session *session;
 	struct l2tp_tunnel *tunnel = NULL;
+	struct ipv6hdr *iph;
 	int length;

 	if (!pskb_may_pull(skb, 4))
@@ -192,24 +193,17 @@ pass_up:
 		goto discard;

 	tunnel_id = ntohl(*(__be32 *) &skb->data[4]);
-	tunnel = l2tp_tunnel_find(net, tunnel_id);
-	if (tunnel) {
-		sk = tunnel->sock;
-		sock_hold(sk);
-	} else {
-		struct ipv6hdr *iph = ipv6_hdr(skb);
-
-		read_lock_bh(&l2tp_ip6_lock);
-		sk = __l2tp_ip6_bind_lookup(net, &iph->daddr, &iph->saddr,
-					    inet6_iif(skb), tunnel_id);
-		if (!sk) {
-			read_unlock_bh(&l2tp_ip6_lock);
-			goto discard;
-		}
+	iph = ipv6_hdr(skb);

-		sock_hold(sk);
+	read_lock_bh(&l2tp_ip6_lock);
+	sk = __l2tp_ip6_bind_lookup(net, &iph->daddr, &iph->saddr,
+				    inet6_iif(skb), tunnel_id);
+	if (!sk) {
 		read_unlock_bh(&l2tp_ip6_lock);
+		goto discard;
 	}
+	sock_hold(sk);
+	read_unlock_bh(&l2tp_ip6_lock);

 	if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb))
 		goto discard_put;
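To illustrate the difference between the two lookups: an ID-only lookup such as l2tp_tunnel_find() returns whatever tunnel has that ID, while a bind-style lookup also checks that the socket's addresses and bound device are compatible with the packet. The sketch below is a simplified, hypothetical model loosely following the "socket binding and connection settings" checks the commit message attributes to __l2tp_ip_bind_lookup() (namespace matching and locking are omitted); none of the names are kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an L2TPv3-over-IP socket binding. Zero in
 * laddr, raddr or ifindex acts as a wildcard. */
struct demo_binding {
	uint32_t conn_id; /* tunnel ID this socket serves */
	uint32_t laddr;   /* bound local address, 0 = any */
	uint32_t raddr;   /* connected peer address, 0 = any */
	int ifindex;      /* bound device, 0 = any */
};

static const struct demo_binding *
demo_bind_lookup(const struct demo_binding *tab, size_t n, uint32_t daddr,
		 uint32_t saddr, int dif, uint32_t tunnel_id)
{
	assert(tab != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct demo_binding *b = &tab[i];

		if (b->conn_id != tunnel_id)
			continue;
		if (b->laddr && b->laddr != daddr)  /* our address = packet dst */
			continue;
		if (b->raddr && b->raddr != saddr)  /* peer = packet src */
			continue;
		if (b->ifindex && b->ifindex != dif)
			continue;
		return b;
	}
	return NULL;
}
```

In the patch itself, the lookup runs under read_lock_bh(&l2tp_ip_lock) and sock_hold() is taken before the lock is dropped, which is what keeps the returned socket valid afterwards.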
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit b5f862180d7011d9575d0499fa37f0f25b423b12 ]
After commit 07f4c90062f8 ("tcp/dccp: try to not exhaust ip_local_port_range in connect()"), we will try to use even ports for connect(). Then if an application (seen clearly with iperf) opens multiple streams to the same destination IP and port, each stream will be given an even source port.
As a result, the bonding driver's simple xmit_hash_policy based on layer3+4 addressing will always hash all these streams to the same interface, and the total throughput will be limited to a single slave.
Changing the TCP code would affect TCP behavior as a whole just for the sake of bonding. Paolo Abeni suggested fixing this by changing only the bonding code, which is more reasonable and has less impact.
Fix this by discarding the lowest hash bit because it contains little entropy. After the fix we can re-balance between slaves.
Signed-off-by: Paolo Abeni pabeni@redhat.com
Signed-off-by: Hangbin Liu liuhangbin@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/bonding/bond_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3253,7 +3253,7 @@ u32 bond_xmit_hash(struct bonding *bond,
 	hash ^= (hash >> 16);
 	hash ^= (hash >> 8);

-	return hash;
+	return hash >> 1;
 }

 /*-------------------------- Device entry points ----------------------------*/
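A deliberately simplified model of why the lowest bit matters, assuming the slave is chosen as the hash modulo the number of slaves: with two slaves only bit 0 of the hash decides, so if bit 0 carries no entropy every flow lands on the same slave. Here the "hash" is hypothetically just the (always even) source port, which is far simpler than bond_xmit_hash(); shifting right by one, as the patch does, moves the decision to the next bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layer3+4 "hash" whose lowest bit carries no entropy:
 * it is simply the source port, and connect() tries to use even ports. */
static uint32_t demo_hash(uint16_t sport)
{
	return sport;
}

/* Assumed slave selection: hash modulo the number of slaves. */
static unsigned int pick_slave(uint32_t hash, unsigned int slave_count)
{
	assert(slave_count > 0);
	return hash % slave_count;
}

/* Count how many of nflows flows, using consecutive even source ports,
 * land on slave 0 of two, with or without discarding hash bit 0. */
static unsigned int flows_on_slave0(uint16_t first_port, unsigned int nflows,
				    int discard_low_bit)
{
	unsigned int on_slave0 = 0;

	for (unsigned int i = 0; i < nflows; i++) {
		uint32_t hash = demo_hash((uint16_t)(first_port + 2 * i));

		if (discard_low_bit)
			hash >>= 1; /* mirrors "return hash >> 1" */
		if (pick_slave(hash, 2) == 0)
			on_slave0++;
	}
	return on_slave0;
}
```

With eight flows on consecutive even ports starting at 40000, all eight land on slave 0 without the shift and four with it.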
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrey Konovalov andreyknvl@google.com
[ Upstream commit 8f5624629105589bcc23d0e51cc01bd8103d09a5 ]
When asix_suspend() is called, dev->driver_priv might not have been assigned a value yet, so we need to check that it's not NULL.
A similar issue is present in asix_resume(); this patch fixes it as well.
Found by syzkaller.
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
Modules linked in:
CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.14.0-rc4-43422-geccacdd69a8c #400
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: usb_hub_wq hub_event
task: ffff88006bb36300 task.stack: ffff88006bba8000
RIP: 0010:asix_suspend+0x76/0xc0 drivers/net/usb/asix_devices.c:629
RSP: 0018:ffff88006bbae718 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff880061ba3b80 RCX: 1ffff1000c34d644
RDX: 0000000000000001 RSI: 0000000000000402 RDI: 0000000000000008
RBP: ffff88006bbae738 R08: 1ffff1000d775cad R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800630a8b40
R13: 0000000000000000 R14: 0000000000000402 R15: ffff880061ba3b80
FS: 0000000000000000(0000) GS:ffff88006c600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff33cf89000 CR3: 0000000061c0a000 CR4: 00000000000006f0
Call Trace:
 usb_suspend_interface drivers/usb/core/driver.c:1209
 usb_suspend_both+0x27f/0x7e0 drivers/usb/core/driver.c:1314
 usb_runtime_suspend+0x41/0x120 drivers/usb/core/driver.c:1852
 __rpm_callback+0x339/0xb60 drivers/base/power/runtime.c:334
 rpm_callback+0x106/0x220 drivers/base/power/runtime.c:461
 rpm_suspend+0x465/0x1980 drivers/base/power/runtime.c:596
 __pm_runtime_suspend+0x11e/0x230 drivers/base/power/runtime.c:1009
 pm_runtime_put_sync_autosuspend ./include/linux/pm_runtime.h:251
 usb_new_device+0xa37/0x1020 drivers/usb/core/hub.c:2487
 hub_port_connect drivers/usb/core/hub.c:4903
 hub_port_connect_change drivers/usb/core/hub.c:5009
 port_event drivers/usb/core/hub.c:5115
 hub_event+0x194d/0x3740 drivers/usb/core/hub.c:5195
 process_one_work+0xc7f/0x1db0 kernel/workqueue.c:2119
 worker_thread+0x221/0x1850 kernel/workqueue.c:2253
 kthread+0x3a1/0x470 kernel/kthread.c:231
 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
Code: 8d 7c 24 20 48 89 fa 48 c1 ea 03 80 3c 02 00 75 5b 48 b8 00 00 00 00 00 fc ff df 4d 8b 6c 24 20 49 8d 7d 08 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 34 4d 8b 6d 08 4d 85 ed 74 0b e8 26 2b 51 fd 4c
RIP: asix_suspend+0x76/0xc0 RSP: ffff88006bbae718
---[ end trace dfc4f5649284342c ]---
Signed-off-by: Andrey Konovalov andreyknvl@google.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/usb/asix_devices.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/usb/asix_devices.c
+++ b/drivers/net/usb/asix_devices.c
@@ -626,7 +626,7 @@ static int asix_suspend(struct usb_inter
 	struct usbnet *dev = usb_get_intfdata(intf);
 	struct asix_common_private *priv = dev->driver_priv;

-	if (priv->suspend)
+	if (priv && priv->suspend)
 		priv->suspend(dev);

 	return usbnet_suspend(intf, message);
@@ -678,7 +678,7 @@ static int asix_resume(struct usb_interf
 	struct usbnet *dev = usb_get_intfdata(intf);
 	struct asix_common_private *priv = dev->driver_priv;

-	if (priv->resume)
+	if (priv && priv->resume)
 		priv->resume(dev);

 	return usbnet_resume(intf);
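The fix is the usual guard for an optional private structure: check the pointer itself before testing or calling through one of its members. A minimal user-space sketch with hypothetical types (struct demo_priv stands in for struct asix_common_private, and the counting callback exists only to make the call observable).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct asix_common_private. */
struct demo_priv {
	void (*suspend)(void *dev);
	void (*resume)(void *dev);
};

static int demo_calls;

/* Sample callback that only counts how often it is invoked. */
static void demo_count_call(void *dev)
{
	(void)dev;
	demo_calls++;
}

/* priv may still be NULL if the driver never set it up. */
static int demo_suspend(void *dev, const struct demo_priv *priv)
{
	if (priv && priv->suspend)
		priv->suspend(dev);
	return 0; /* the generic usbnet suspend would run here */
}

static int demo_resume(void *dev, const struct demo_priv *priv)
{
	if (priv && priv->resume)
		priv->resume(dev);
	return 0; /* the generic usbnet resume would run here */
}
```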
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 7ec318feeed10a64c0359ec4d10889cb4defa39a ]
When a GSO skb of truesize O is segmented into 2 new skbs of truesize N1 and N2, we want to transfer socket ownership to the new fresh skbs.
In order to avoid expensive atomic operations on a cache line subject to cache bouncing, we replace the sequence:
	refcount_add(N1, &sk->sk_wmem_alloc);
	refcount_add(N2, &sk->sk_wmem_alloc); // repeated by number of segments
	refcount_sub(O, &sk->sk_wmem_alloc);
by a single
	refcount_add(sum_of(N) - O, &sk->sk_wmem_alloc);
The problem is:
In some pathological cases, sum(N) - O might be a negative number, and the syzkaller bot was apparently able to trigger this trace [1].
atomic_t was OK with this construct, but with refcount_t we need to take care of the negative delta.
[1]
refcount_t: saturated; leaking memory.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 8404 at lib/refcount.c:77 refcount_add_not_zero+0x198/0x200 lib/refcount.c:77
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 8404 Comm: syz-executor2 Not tainted 4.14.0-rc5-mm1+ #20
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 panic+0x1e4/0x41c kernel/panic.c:183
 __warn+0x1c4/0x1e0 kernel/panic.c:546
 report_bug+0x211/0x2d0 lib/bug.c:183
 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:177
 do_trap_no_signal arch/x86/kernel/traps.c:211 [inline]
 do_trap+0x260/0x390 arch/x86/kernel/traps.c:260
 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:297
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:310
 invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
RIP: 0010:refcount_add_not_zero+0x198/0x200 lib/refcount.c:77
RSP: 0018:ffff8801c606e3a0 EFLAGS: 00010282
RAX: 0000000000000026 RBX: 0000000000001401 RCX: 0000000000000000
RDX: 0000000000000026 RSI: ffffc900036fc000 RDI: ffffed0038c0dc68
RBP: ffff8801c606e430 R08: 0000000000000001 R09: 0000000000000000
R10: ffff8801d97f5eba R11: 0000000000000000 R12: ffff8801d5acf73c
R13: 1ffff10038c0dc75 R14: 00000000ffffffff R15: 00000000fffff72f
 refcount_add+0x1b/0x60 lib/refcount.c:101
 tcp_gso_segment+0x10d0/0x16b0 net/ipv4/tcp_offload.c:155
 tcp4_gso_segment+0xd4/0x310 net/ipv4/tcp_offload.c:51
 inet_gso_segment+0x60c/0x11c0 net/ipv4/af_inet.c:1271
 skb_mac_gso_segment+0x33f/0x660 net/core/dev.c:2749
 __skb_gso_segment+0x35f/0x7f0 net/core/dev.c:2821
 skb_gso_segment include/linux/netdevice.h:3971 [inline]
 validate_xmit_skb+0x4ba/0xb20 net/core/dev.c:3074
 __dev_queue_xmit+0xe49/0x2070 net/core/dev.c:3497
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3538
 neigh_hh_output include/net/neighbour.h:471 [inline]
 neigh_output include/net/neighbour.h:479 [inline]
 ip_finish_output2+0xece/0x1460 net/ipv4/ip_output.c:229
 ip_finish_output+0x85e/0xd10 net/ipv4/ip_output.c:317
 NF_HOOK_COND include/linux/netfilter.h:238 [inline]
 ip_output+0x1cc/0x860 net/ipv4/ip_output.c:405
 dst_output include/net/dst.h:459 [inline]
 ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124
 ip_queue_xmit+0x8c6/0x18e0 net/ipv4/ip_output.c:504
 tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1137
 tcp_write_xmit+0x663/0x4de0 net/ipv4/tcp_output.c:2341
 __tcp_push_pending_frames+0xa0/0x250 net/ipv4/tcp_output.c:2513
 tcp_push_pending_frames include/net/tcp.h:1722 [inline]
 tcp_data_snd_check net/ipv4/tcp_input.c:5050 [inline]
 tcp_rcv_established+0x8c7/0x18a0 net/ipv4/tcp_input.c:5497
 tcp_v4_do_rcv+0x2ab/0x7d0 net/ipv4/tcp_ipv4.c:1460
 sk_backlog_rcv include/net/sock.h:909 [inline]
 __release_sock+0x124/0x360 net/core/sock.c:2264
 release_sock+0xa4/0x2a0 net/core/sock.c:2776
 tcp_sendmsg+0x3a/0x50 net/ipv4/tcp.c:1462
 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
 sock_sendmsg_nosec net/socket.c:632 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:642
 ___sys_sendmsg+0x31c/0x890 net/socket.c:2048
 __sys_sendmmsg+0x1e6/0x5f0 net/socket.c:2138
Fixes: 14afee4b6092 ("net: convert sock.sk_wmem_alloc from atomic_t to refcount_t")
Signed-off-by: Eric Dumazet edumazet@google.com
Reported-by: syzbot syzkaller@googlegroups.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/ipv4/tcp_offload.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -149,11 +149,19 @@ struct sk_buff *tcp_gso_segment(struct s
 	 * is freed by GSO engine
 	 */
 	if (copy_destructor) {
+		int delta;
+
 		swap(gso_skb->sk, skb->sk);
 		swap(gso_skb->destructor, skb->destructor);
 		sum_truesize += skb->truesize;
-		refcount_add(sum_truesize - gso_skb->truesize,
-			     &skb->sk->sk_wmem_alloc);
+		delta = sum_truesize - gso_skb->truesize;
+		/* In some pathological cases, delta can be negative.
+		 * We need to either use refcount_add() or refcount_sub_and_test()
+		 */
+		if (likely(delta >= 0))
+			refcount_add(delta, &skb->sk->sk_wmem_alloc);
+		else
+			WARN_ON_ONCE(refcount_sub_and_test(-delta, &skb->sk->sk_wmem_alloc));
 	}

 	delta = htonl(oldlen + (skb_tail_pointer(skb) -
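The arithmetic problem in isolation: the combined delta may be negative, and an add-only refcount call cannot accept it. The sketch below applies a signed delta to a plain unsigned counter standing in for sk->sk_wmem_alloc; it has none of refcount_t's saturation semantics, and apply_truesize_delta() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the sk_wmem_alloc update in tcp_gso_segment().
 * Returns true if the counter dropped to zero, which (like the
 * WARN_ON_ONCE() in the patch) would indicate an accounting bug here.
 */
static bool apply_truesize_delta(unsigned int *wmem_alloc, int sum_truesize,
				 int old_truesize)
{
	int delta = sum_truesize - old_truesize;

	if (delta >= 0) {
		*wmem_alloc += (unsigned int)delta;
		return false;
	}
	/* Negative delta: subtract instead of adding a wrapped-around value. */
	assert(*wmem_alloc >= (unsigned int)-delta);
	*wmem_alloc -= (unsigned int)-delta;
	return *wmem_alloc == 0;
}
```

For example, starting from 10000, a segment sum of 5000 against an original truesize of 4000 adds 1000, while a sum of 3000 subtracts 1000 rather than adding a huge unsigned value.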
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuchung Cheng ycheng@google.com
[ Upstream commit 0eb96bf754d7fa6635aa0b0f6650c74b8a6b1cc9 ]
This patch fixes the cause of a WARNING in tcp_fastretrans_alert() indicating that TCP has pending retransmissions while in the Open state.
The root cause is a bad interaction between path MTU probing, if enabled, and the RACK loss detection. Upon receiving a SACK above the sequence of the MTU probing packet, RACK could mark the probe packet lost in tcp_fastretrans_alert(), prior to calling tcp_simple_retransmit().
tcp_simple_retransmit() only enters the Loss state if it newly marks the probe packet lost. If the probe packet is already identified as lost by RACK, the sender remains in the Open state with some packets marked lost and retransmitted. Then the next SACK would trigger the warning. The likely scenario is that the probe packet was lost due to its size or network congestion. The actual impact is small: the sender may enter fast recovery one ACK later.
The simple fix is to always enter the recovery (Loss) state if any packet is marked lost during path MTU probing.
Fixes: a0370b3f3f2c ("tcp: enable RACK loss detection to trigger recovery")
Reported-by: Oleksandr Natalenko oleksandr@natalenko.name
Reported-by: Alexei Starovoitov alexei.starovoitov@gmail.com
Reported-by: Roman Gushchin guro@fb.com
Signed-off-by: Yuchung Cheng ycheng@google.com
Reviewed-by: Eric Dumazet edumazet@google.com
Acked-by: Neal Cardwell ncardwell@google.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/ipv4/tcp_input.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2613,7 +2613,6 @@ void tcp_simple_retransmit(struct sock *
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
 	unsigned int mss = tcp_current_mss(sk);
-	u32 prior_lost = tp->lost_out;

 	tcp_for_write_queue(skb, sk) {
 		if (skb == tcp_send_head(sk))
@@ -2630,7 +2629,7 @@ void tcp_simple_retransmit(struct sock *

 	tcp_clear_retrans_hints_partial(tp);

-	if (prior_lost == tp->lost_out)
+	if (!tp->lost_out)
 		return;

 	if (tcp_is_reno(tp))
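The behavioural change can be expressed as the early-return predicate of tcp_simple_retransmit(). Before the patch it returned early (skipping the Loss state) whenever the call marked nothing newly lost, which is exactly the case where RACK had already marked the probe lost; after it, it returns early only if nothing at all is marked lost. A sketch with hypothetical function names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old rule: skip entering Loss unless this call newly marked packets lost. */
static bool old_skips_loss(uint32_t prior_lost, uint32_t lost_out)
{
	return prior_lost == lost_out;
}

/* New rule: skip entering Loss only if no packet is marked lost at all. */
static bool new_skips_loss(uint32_t lost_out)
{
	return lost_out == 0;
}
```

In the RACK case described above, lost_out is already 1 before and after the loop, so the old rule skips Loss while the new rule enters it.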
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 052d41c01b3a2e3371d66de569717353af489d63 ]
After refcnt reaches zero, vlan_vid_del() could free dev->vlan_info via RCU:
	RCU_INIT_POINTER(dev->vlan_info, NULL);
	call_rcu(&vlan_info->rcu, vlan_info_rcu_free);
However, the pointer 'grp' still points to that memory since it is set before vlan_vid_del():
	vlan_info = rtnl_dereference(dev->vlan_info);
	if (!vlan_info)
		goto out;
	grp = &vlan_info->grp;
Depending on when that RCU callback is scheduled, we could trigger a use-after-free in vlan_group_for_each_dev() right after this vlan_vid_del().
Fix it by moving vlan_vid_del() before setting grp. This is also symmetric to the vlan_vid_add() we call in vlan_device_event().
Reported-by: Fengguang Wu fengguang.wu@intel.com
Fixes: efc73f4bbc23 ("net: Fix memory leak - vlan_info struct")
Cc: Alexander Duyck alexander.duyck@gmail.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Girish Moodalbail girish.moodalbail@oracle.com
Signed-off-by: Cong Wang xiyou.wangcong@gmail.com
Reviewed-by: Girish Moodalbail girish.moodalbail@oracle.com
Tested-by: Fengguang Wu fengguang.wu@intel.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/8021q/vlan.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -376,6 +376,9 @@ static int vlan_device_event(struct noti
 			dev->name);
 		vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
 	}
+	if (event == NETDEV_DOWN &&
+	    (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER))
+		vlan_vid_del(dev, htons(ETH_P_8021Q), 0);

 	vlan_info = rtnl_dereference(dev->vlan_info);
 	if (!vlan_info)
@@ -423,9 +426,6 @@ static int vlan_device_event(struct noti
 		struct net_device *tmp;
 		LIST_HEAD(close_list);

-		if (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)
-			vlan_vid_del(dev, htons(ETH_P_8021Q), 0);
-
 		/* Put all VLANs for this dev in the down state too. */
 		vlan_group_for_each_dev(grp, i, vlandev) {
 			flgs = vlandev->flags;
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Huy Nguyen huyn@mellanox.com
[ Upstream commit d2aa060d40fa060e963f9a356d43481e43ba3dac ]
After the panic teardown firmware command, health_care detects the error on the PCI bus and calls mlx5_pci_err_detected(). This health_care flow is no longer needed because the panic teardown firmware command will bring down the PCI bus communication with the HCA.
The solution is to cancel the health care timer and its pending workqueue request before sending the panic teardown firmware command.
Kernel trace:
mlx5_core 0033:01:00.0: Shutdown was called
mlx5_core 0033:01:00.0: health_care:154:(pid 9304): handling bad device here
mlx5_core 0033:01:00.0: mlx5_handle_bad_state:114:(pid 9304): NIC state 1
mlx5_core 0033:01:00.0: mlx5_pci_err_detected was called
mlx5_core 0033:01:00.0: mlx5_enter_error_state:96:(pid 9304): start
mlx5_3:mlx5_ib_event:3061:(pid 9304): warning: event on port 0
mlx5_core 0033:01:00.0: mlx5_enter_error_state:104:(pid 9304): end
Unable to handle kernel paging request for data at address 0x0000003f
Faulting instruction address: 0xc0080000434b8c80
Fixes: 8812c24d28f4 ('net/mlx5: Add fast unload support in shutdown flow') Signed-off-by: Huy Nguyen huyn@mellanox.com Reviewed-by: Moshe Shemesh moshe@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/main.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1545,9 +1545,16 @@ static int mlx5_try_fast_unload(struct m
 		return -EAGAIN;
 	}
 
+	/* Panic tear down fw command will stop the PCI bus communication
+	 * with the HCA, so the health polll is no longer needed.
+	 */
+	mlx5_drain_health_wq(dev);
+	mlx5_stop_health_poll(dev);
+
 	ret = mlx5_cmd_force_teardown_hca(dev);
 	if (ret) {
 		mlx5_core_dbg(dev, "Firmware couldn't do fast unload error: %d\n", ret);
+		mlx5_start_health_poll(dev);
 		return ret;
 	}
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Inbar Karmy inbark@mellanox.com
[ Upstream commit 2e50b2619538ea0224c037f6fa746023089e0654 ]
Currently, when DMA mapping fails, put_page() is called but the page pointer is not set to NULL. Later, during the page-reuse handling in mlx5e_free_rx_descs(), mlx5e_page_release() is called a second time, improperly doing a dma_unmap (of a non-mapped address) and an extra put_page(). Prevent this by setting the page pointer to NULL when dma_map fails.
Fixes: accd58833237 ("net/mlx5e: Introduce RX Page-Reuse")
Signed-off-by: Inbar Karmy inbark@mellanox.com
Reviewed-by: Tariq Toukan tariqt@mellanox.com
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed saeedm@mellanox.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -213,22 +213,20 @@ static inline bool mlx5e_rx_cache_get(st
 static inline int mlx5e_page_alloc_mapped(struct mlx5e_rq *rq,
 					  struct mlx5e_dma_info *dma_info)
 {
-	struct page *page;
-
 	if (mlx5e_rx_cache_get(rq, dma_info))
 		return 0;
 
-	page = dev_alloc_pages(rq->buff.page_order);
-	if (unlikely(!page))
+	dma_info->page = dev_alloc_pages(rq->buff.page_order);
+	if (unlikely(!dma_info->page))
 		return -ENOMEM;
 
-	dma_info->addr = dma_map_page(rq->pdev, page, 0,
+	dma_info->addr = dma_map_page(rq->pdev, dma_info->page, 0,
 				      RQ_PAGE_SIZE(rq), rq->buff.map_dir);
 	if (unlikely(dma_mapping_error(rq->pdev, dma_info->addr))) {
-		put_page(page);
+		put_page(dma_info->page);
+		dma_info->page = NULL;
 		return -ENOMEM;
 	}
-	dma_info->page = page;
 
 	return 0;
 }
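The failure mode fixed here is a classic double release. The error path drops the page reference but leaves a stale pointer behind, and a later cleanup pass releases it again. The user-space sketch below models this. `fake_page`, `dma_info_sim`, `alloc_mapped()` and `release()` are hypothetical stand-ins, not the mlx5 API. The NULL check in `release()` is an assumption about how the cleanup path treats an empty slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct page and struct mlx5e_dma_info. */
struct fake_page {
	int refcount;
};

struct dma_info_sim {
	struct fake_page *page;
};

static void put_fake_page(struct fake_page *p)
{
	if (--p->refcount == 0)
		free(p);
}

/* Models mlx5e_page_alloc_mapped(); map_fails simulates dma_mapping_error(). */
static int alloc_mapped(struct dma_info_sim *di, bool map_fails)
{
	di->page = malloc(sizeof(*di->page));
	if (!di->page)
		return -1;
	di->page->refcount = 1;

	if (map_fails) {
		put_fake_page(di->page);
		di->page = NULL;	/* the fix: forget the page just released */
		return -1;
	}
	return 0;
}

/* Models a later cleanup pass: returns 1 if a page was released, 0 if none. */
static int release(struct dma_info_sim *di)
{
	if (!di->page)
		return 0;	/* assumed: empty slots are skipped, so no second put */
	put_fake_page(di->page);
	di->page = NULL;
	return 1;
}
```

Without the `di->page = NULL` line in the error path, `release()` would call `put_fake_page()` on freed memory. That is the user-space analogue of the extra put_page() and bogus dma_unmap described in the commit message.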
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Jason A. Donenfeld" Jason@zx2c4.com
[ Upstream commit 0642840b8bb008528dbdf929cec9f65ac4231ad0 ]
The way people generally use netlink_dump is that they fill in the skb as much as possible, breaking when nla_put returns an error. Then, they get called again and start filling out the next skb, and again, and so forth. The mechanism at work here is the ability for the iterative dumping function to detect when the skb is filled up and not fill it past the brim, waiting for a fresh skb for the rest of the data.
However, if the attributes are small and nicely packed, it is possible that a dump callback function successfully fills in attributes until the skb is of size 4080 (libmnl's default page-sized receive buffer size). The dump function completes, satisfied, and then, if it happens to be that this is actually the last skb, and no further ones are to be sent, then netlink_dump will add on the NLMSG_DONE part:
nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
It is very important that netlink_dump does this, of course. However, in this example, that call to nlmsg_put_answer will fail, because the previous filling by the dump function did not leave it enough room. And how could it possibly have done so? All of the nla_put variety of functions simply check to see if the skb has enough tailroom, independent of the context it is in.
In order to keep the important assumptions of all netlink dump users, it is therefore important to give them an skb that has this end part of the tail already reserved, so that the call to nlmsg_put_answer does not fail. Otherwise, library authors are forced to find some bizarre sized receive buffer that has a large modulo relative to the common sizes of messages received, which is ugly and buggy.
This patch thus saves the NLMSG_DONE for an additional message, for the case that things are dangerously close to the brim. This requires keeping track of the errno from ->dump() across calls.
Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/netlink/af_netlink.c |   17 +++++++++++------
 net/netlink/af_netlink.h |    1 +
 2 files changed, 12 insertions(+), 6 deletions(-)
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2128,7 +2128,7 @@ static int netlink_dump(struct sock *sk)
 	struct sk_buff *skb = NULL;
 	struct nlmsghdr *nlh;
 	struct module *module;
-	int len, err = -ENOBUFS;
+	int err = -ENOBUFS;
 	int alloc_min_size;
 	int alloc_size;
 
@@ -2175,9 +2175,11 @@ static int netlink_dump(struct sock *sk)
 	skb_reserve(skb, skb_tailroom(skb) - alloc_size);
 	netlink_skb_set_owner_r(skb, sk);
 
-	len = cb->dump(skb, cb);
+	if (nlk->dump_done_errno > 0)
+		nlk->dump_done_errno = cb->dump(skb, cb);
 
-	if (len > 0) {
+	if (nlk->dump_done_errno > 0 ||
+	    skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
 		mutex_unlock(nlk->cb_mutex);
 
 		if (sk_filter(sk, skb))
@@ -2187,13 +2189,15 @@ static int netlink_dump(struct sock *sk)
 		return 0;
 	}
 
-	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
-	if (!nlh)
+	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
+			       sizeof(nlk->dump_done_errno), NLM_F_MULTI);
+	if (WARN_ON(!nlh))
 		goto errout_skb;
 
 	nl_dump_check_consistent(cb, nlh);
 
-	memcpy(nlmsg_data(nlh), &len, sizeof(len));
+	memcpy(nlmsg_data(nlh), &nlk->dump_done_errno,
+	       sizeof(nlk->dump_done_errno));
 
 	if (sk_filter(sk, skb))
 		kfree_skb(skb);
@@ -2265,6 +2269,7 @@ int __netlink_dump_start(struct sock *ss
 	}
 
 	nlk->cb_running = true;
+	nlk->dump_done_errno = INT_MAX;
 
 	mutex_unlock(nlk->cb_mutex);
 
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -33,6 +33,7 @@ struct netlink_sock {
 	wait_queue_head_t	wait;
 	bool			bound;
 	bool			cb_running;
+	int			dump_done_errno;
 	struct netlink_callback	cb;
 	struct mutex		*cb_mutex;
 	struct mutex		cb_def_mutex;
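The reservation logic can be modelled in user space. This is a sketch only: `skb_sim`, `dumper` and `dump_round()` are hypothetical, and every record is assumed to have the same non-zero size. `DONE_SIZE` (20 bytes) is assumed to be what nlmsg_total_size(sizeof(int)) evaluates to with a 16-byte netlink header. The sketch shows the behaviour the patch introduces. Once the dump callback reports completion (0), DONE is appended only if it still fits. Otherwise the full buffer is sent and DONE goes out in the next one.

```c
#include <assert.h>
#include <stddef.h>

#define DONE_SIZE 20	/* assumed nlmsg_total_size(sizeof(int)): 16-byte header + 4-byte errno */

struct skb_sim {
	size_t cap;	/* buffer size, e.g. libmnl's 4080-byte default */
	size_t len;	/* bytes used so far */
	int has_done;	/* NLMSG_DONE appended to this buffer? */
};

struct dumper {
	size_t remaining;	/* bytes of records not yet dumped */
	size_t rec_size;	/* every record has this (non-zero) size */
	int done_errno;		/* like nlk->dump_done_errno; the kernel starts it at INT_MAX */
};

/* Stand-in for cb->dump(): packs whole records, returns >0 while data remains. */
static int dump_cb(struct dumper *d, struct skb_sim *skb)
{
	while (d->remaining > 0 && skb->cap - skb->len >= d->rec_size) {
		skb->len += d->rec_size;
		d->remaining -= d->rec_size;
	}
	return d->remaining > 0 ? 1 : 0;
}

/* One call of the fixed netlink_dump(); returns 1 if DONE was appended. */
static int dump_round(struct dumper *d, struct skb_sim *skb)
{
	skb->len = 0;
	skb->has_done = 0;

	/* Only call the dump callback while it still has data to produce. */
	if (d->done_errno > 0)
		d->done_errno = dump_cb(d, skb);

	/* More data pending, or no room left for DONE: send this buffer as-is. */
	if (d->done_errno > 0 || skb->cap - skb->len < DONE_SIZE)
		return 0;

	skb->len += DONE_SIZE;
	skb->has_done = 1;
	return 1;
}
```

Suppose 102 records of 40 bytes fill a 4080-byte buffer exactly. With the old code, nlmsg_put_answer() would have failed at that point. Here, the DONE message simply moves to a second, otherwise empty buffer.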
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 8bff3685a4bbf175a96bc6a528f13455d8d38244 ]
Commit f1fb08f6337c ("vxlan: fix ND proxy when skb doesn't have transport header offset") removed the icmp6_code and icmp6_type checks before calling neigh_reduce() when doing neighbour proxying.
As a result, all ICMPv6 packets were blocked by this, not only NS packets. In Jianlin's environment, even ping6 didn't work through it.
This patch brings the icmp6_code and icmp6_type checks back and removes the same check from neigh_reduce().
Fixes: f1fb08f6337c ("vxlan: fix ND proxy when skb doesn't have transport header offset")
Reported-by: Jianlin Shi jishi@redhat.com
Signed-off-by: Xin Long lucien.xin@gmail.com
Reviewed-by: Vincent Bernat vincent@bernat.im
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/vxlan.c |   31 +++++++++++++------------------
 1 file changed, 13 insertions(+), 18 deletions(-)
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1632,26 +1632,19 @@ static struct sk_buff *vxlan_na_create(s
 static int neigh_reduce(struct net_device *dev, struct sk_buff *skb, __be32 vni)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
-	struct nd_msg *msg;
-	const struct ipv6hdr *iphdr;
 	const struct in6_addr *daddr;
-	struct neighbour *n;
+	const struct ipv6hdr *iphdr;
 	struct inet6_dev *in6_dev;
+	struct neighbour *n;
+	struct nd_msg *msg;
 
 	in6_dev = __in6_dev_get(dev);
 	if (!in6_dev)
 		goto out;
 
-	if (!pskb_may_pull(skb, sizeof(struct ipv6hdr) + sizeof(struct nd_msg)))
-		goto out;
-
 	iphdr = ipv6_hdr(skb);
 	daddr = &iphdr->daddr;
-
 	msg = (struct nd_msg *)(iphdr + 1);
-	if (msg->icmph.icmp6_code != 0 ||
-	    msg->icmph.icmp6_type != NDISC_NEIGHBOUR_SOLICITATION)
-		goto out;
 
 	if (ipv6_addr_loopback(daddr) ||
 	    ipv6_addr_is_multicast(&msg->target))
@@ -2258,11 +2251,11 @@ tx_error:
 static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
+	struct vxlan_rdst *rdst, *fdst = NULL;
 	const struct ip_tunnel_info *info;
-	struct ethhdr *eth;
 	bool did_rsc = false;
-	struct vxlan_rdst *rdst, *fdst = NULL;
 	struct vxlan_fdb *f;
+	struct ethhdr *eth;
 	__be32 vni = 0;
 
 	info = skb_tunnel_info(skb);
@@ -2287,12 +2280,14 @@ static netdev_tx_t vxlan_xmit(struct sk_
 		if (ntohs(eth->h_proto) == ETH_P_ARP)
 			return arp_reduce(dev, skb, vni);
 #if IS_ENABLED(CONFIG_IPV6)
-		else if (ntohs(eth->h_proto) == ETH_P_IPV6) {
-			struct ipv6hdr *hdr, _hdr;
-			if ((hdr = skb_header_pointer(skb,
-						     skb_network_offset(skb),
-						     sizeof(_hdr), &_hdr)) &&
-			    hdr->nexthdr == IPPROTO_ICMPV6)
+		else if (ntohs(eth->h_proto) == ETH_P_IPV6 &&
+			 pskb_may_pull(skb, sizeof(struct ipv6hdr) +
+					    sizeof(struct nd_msg)) &&
+			 ipv6_hdr(skb)->nexthdr == IPPROTO_ICMPV6) {
+			struct nd_msg *m = (struct nd_msg *)(ipv6_hdr(skb) + 1);
+
+			if (m->icmph.icmp6_code == 0 &&
+			    m->icmph.icmp6_type == NDISC_NEIGHBOUR_SOLICITATION)
 				return neigh_reduce(dev, skb, vni);
 		}
 #endif
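The restored filter amounts to a single predicate: only ICMPv6 Neighbour Solicitations are handed to neigh_reduce(). The sketch below takes the numeric constants from the IPv6 and Neighbor Discovery specifications: ICMPv6 next header 58, NS type 135, a 40-byte IPv6 header and a 24-byte nd_msg. `should_neigh_reduce()` is a hypothetical helper. Its `pkt_len` check simplifies what pskb_may_pull() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NEXTHDR_ICMPV6		58
#define ICMPV6_NEIGH_SOLICIT	135
#define IPV6_HDR_LEN		40	/* sizeof(struct ipv6hdr) */
#define ND_MSG_LEN		24	/* sizeof(struct nd_msg): 8-byte ICMPv6 header + 16-byte target */

/* Mirrors the condition the fixed vxlan_xmit() checks before neigh_reduce(). */
static bool should_neigh_reduce(size_t pkt_len, uint8_t nexthdr,
				uint8_t icmp6_type, uint8_t icmp6_code)
{
	if (pkt_len < IPV6_HDR_LEN + ND_MSG_LEN)	/* pskb_may_pull() would fail */
		return false;
	if (nexthdr != NEXTHDR_ICMPV6)
		return false;
	return icmp6_code == 0 && icmp6_type == ICMPV6_NEIGH_SOLICIT;
}
```

An echo request (type 128) now falls through to normal forwarding instead of being passed to neigh_reduce(). That misrouting is what broke ping6 in the report.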
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit df80cd9b28b9ebaa284a41df611dbf3a2d05ca74 ]
Currently, when an association is peeled off to a socket in another netns, none of the transports in this association are rehashed; they keep using the old key in the hashtable.
Because a transport is inserted into the hashtable using sk->net as part of the hash key, closing the socket (which frees all its transports) looks the transports up under the new netns and fails to remove them from the hashtable. A later lookup of an association can then dereference those freed transports, causing a use-after-free.
This issue has existed since the very beginning. ChunYu found it with syzkaller fuzz testing, using this sequence:
  socket$inet6_sctp()
  bind$inet6()
  sendto$inet6()
  unshare(0x40000000)
  getsockopt$inet_sctp6_SCTP_GET_ASSOC_ID_LIST()
  getsockopt$inet_sctp6_SCTP_SOCKOPT_PEELOFF()
This patch blocks peeling an association off from one netns into another, so that the netns of every transport cannot go out of sync with its key in the hashtable.
Note that this patch does not fix the issue by rehashing transports, as it is difficult to handle the case where the tuple is already in use in the new netns. Besides, there is little reason to peel an association off into another netns, since IP addresses, interfaces, etc. are usually different there.
Reported-by: ChunYu Wang chunwang@redhat.com
Signed-off-by: Xin Long lucien.xin@gmail.com
Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com
Acked-by: Neil Horman nhorman@tuxdriver.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/sctp/socket.c |    4 ++++
 1 file changed, 4 insertions(+)
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4924,6 +4924,10 @@ int sctp_do_peeloff(struct sock *sk, sct
 	struct socket *sock;
 	int err = 0;
 
+	/* Do not peel off from one netns to another one. */
+	if (!net_eq(current->nsproxy->net_ns, sock_net(sk)))
+		return -EINVAL;
+
 	if (!asoc)
 		return -EINVAL;
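A toy table shows why a changed netns leaks hashtable entries. Everything here is hypothetical. It is not the real SCTP transport hashtable, only a fixed-size array with one slot per bucket, keyed by a made-up (netns id, port) pair. `peeloff_check()` models the guard the patch adds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NBUCKETS 8

/* Hypothetical transport table keyed by (netns id, port). */
static int table[NBUCKETS];	/* 0 = empty slot, otherwise the stored port */

static unsigned int bucket(int net, int port)
{
	return (unsigned int)(net * 31 + port) % NBUCKETS;
}

static void insert_transport(int net, int port)
{
	table[bucket(net, port)] = port;
}

/* Returns false when the entry is not where the key says it should be. */
static bool remove_transport(int net, int port)
{
	unsigned int b = bucket(net, port);

	if (table[b] != port)
		return false;	/* wrong bucket: the real entry is left behind */
	table[b] = 0;
	return true;
}

/* The guard the patch adds: refuse a peeloff that would change the netns. */
static int peeloff_check(int caller_net, int sock_net)
{
	return caller_net == sock_net ? 0 : -EINVAL;
}
```

Suppose an entry is inserted under netns 1 and removal is attempted under netns 2. The removal looks in a different bucket and the stale entry survives, just as the freed transports did.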
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Huacai Chen chenhc@lemote.com
[ Upstream commit cc54c1d32e6a4bb3f116721abf900513173e4d02 ]
This patch fixes a build error on MIPS. MIPS already defines a LONG macro, which conflicts with the LONG enumerator in drivers/net/ethernet/fealnx.c.
Signed-off-by: Huacai Chen chenhc@lemote.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/fealnx.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/net/ethernet/fealnx.c
+++ b/drivers/net/ethernet/fealnx.c
@@ -257,8 +257,8 @@ enum rx_desc_status_bits {
 	RXFSD = 0x00000800,	/* first descriptor */
 	RXLSD = 0x00000400,	/* last descriptor */
 	ErrorSummary = 0x80,	/* error summary */
-	RUNT = 0x40,		/* runt packet received */
-	LONG = 0x20,		/* long packet received */
+	RUNTPKT = 0x40,		/* runt packet received */
+	LONGPKT = 0x20,		/* long packet received */
 	FAE = 0x10,		/* frame align error */
 	CRC = 0x08,		/* crc error */
 	RXER = 0x04,		/* receive error */
@@ -1632,7 +1632,7 @@ static int netdev_rx(struct net_device *
 					       dev->name, rx_status);
 
 				dev->stats.rx_errors++;	/* end of a packet. */
-				if (rx_status & (LONG | RUNT))
+				if (rx_status & (LONGPKT | RUNTPKT))
 					dev->stats.rx_length_errors++;
 				if (rx_status & RXER)
 					dev->stats.rx_frame_errors++;
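The clash is ordinary preprocessor shadowing. Once an architecture header defines `LONG` as a macro, every later use of that identifier is rewritten before the compiler sees it, including an enumerator declaration. The commit only says that MIPS defines a LONG macro. The `#define LONG long` line below is a stand-in chosen for illustration, and the real MIPS expansion differs.

```c
#include <assert.h>

/* Stand-in for the architecture's macro; only the name clash matters. */
#define LONG long

/* With the old names, "LONG = 0x20," would expand to "long = 0x20," and
 * fail to compile. The renamed enumerators are untouched by the macro. */
enum rx_desc_status_bits {
	RUNTPKT = 0x40,	/* runt packet received */
	LONGPKT = 0x20,	/* long packet received */
	RXER = 0x04,	/* receive error */
};

/* Same test as the patched netdev_rx(). */
static int is_length_error(unsigned int rx_status)
{
	return (rx_status & (LONGPKT | RUNTPKT)) != 0;
}
```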
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Eric W. Biederman" ebiederm@xmission.com
[ Upstream commit 7c8a61d9ee1df0fb4747879fa67a99614eb62fec ]
Alexander Potapenko, while testing the kernel with KMSAN and syzkaller, discovered that in some configurations SCTP would leak 4 bytes of kernel stack.
Working with his reproducer, I discovered that the 4 leaked bytes are the scope ID of an IPv6 address returned by recvmsg().
With a little code inspection and a shrewd guess, I discovered that sctp_inet6_skb_msgname() initializes the scope_id field only for link-local IPv6 addresses (to the index of the interface the link-local address pertains to), instead of initializing it for all IPv6 addresses.
That is almost reasonable, as scope IDs are meaningful only for link-local addresses. In all other cases, set scope_id to 0, which is not a valid interface index, to make it clear that there is nothing useful in the scope_id field.
There should be no danger of breaking userspace, as the stack leak guaranteed that meaningless random data was previously being returned.
Fixes: 372f525b495c ("SCTP: Resync with LKSCTP tree.")
History-tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Reported-by: Alexander Potapenko glider@google.com
Tested-by: Alexander Potapenko glider@google.com
Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/sctp/ipv6.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -807,9 +807,10 @@ static void sctp_inet6_skb_msgname(struc
 		addr->v6.sin6_flowinfo = 0;
 		addr->v6.sin6_port = sh->source;
 		addr->v6.sin6_addr = ipv6_hdr(skb)->saddr;
-		if (ipv6_addr_type(&addr->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) {
+		if (ipv6_addr_type(&addr->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL)
 			addr->v6.sin6_scope_id = sctp_v6_skb_iif(skb);
-		}
+		else
+			addr->v6.sin6_scope_id = 0;
 	}
 
 	*addr_len = sctp_v6_addr_to_user(sctp_sk(skb->sk), addr);
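In user-space terms, the rule is simple. Copy the receiving interface index into sin6_scope_id only for a link-local source, and write 0 otherwise so that no stale bytes reach the caller. The sketch below uses the POSIX `struct sockaddr_in6` and `IN6_IS_ADDR_LINKLOCAL()` as analogues of the kernel's ipv6_addr_type() test. `set_scope_id()` is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper mirroring the fixed sctp_inet6_skb_msgname();
 * iif is the index of the interface the packet arrived on. */
static void set_scope_id(struct sockaddr_in6 *sa, unsigned int iif)
{
	if (IN6_IS_ADDR_LINKLOCAL(&sa->sin6_addr))
		sa->sin6_scope_id = iif;
	else
		sa->sin6_scope_id = 0;	/* 0 is never a valid interface index */
}
```

To confirm that no stale bytes survive, fill the structure with a garbage pattern such as 0xAA before the call, then check sin6_scope_id afterwards.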
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Roberto Sassu roberto.sassu@huawei.com
commit 020aae3ee58c1af0e7ffc4e2cc9fe4dc630338cb upstream.
Commit b65a9cfc2c38 ("Untangling ima mess, part 2: deal with counters") moved the call of ima_file_check() from may_open() to do_filp_open() at a point where the file descriptor is already opened.
This breaks the assumption made by IMA that file descriptors being closed belong to files whose access was granted by ima_file_check(). The consequence is that security.ima and security.evm are updated with good values, regardless of the current appraisal status.
For example, if a file does not have security.ima, IMA will create it after opening the file for writing, even if access is denied. Access to the file will be allowed afterwards.
Avoid this issue by checking the appraisal status before updating security.ima.
Signed-off-by: Roberto Sassu roberto.sassu@huawei.com
Signed-off-by: Mimi Zohar zohar@linux.vnet.ibm.com
Signed-off-by: James Morris james.l.morris@oracle.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 security/integrity/ima/ima_appraise.c |    3 +++
 1 file changed, 3 insertions(+)
--- a/security/integrity/ima/ima_appraise.c
+++ b/security/integrity/ima/ima_appraise.c
@@ -320,6 +320,9 @@ void ima_update_xattr(struct integrity_i
 	if (iint->flags & IMA_DIGSIG)
 		return;
 
+	if (iint->ima_file_status != INTEGRITY_PASS)
+		return;
+
 	rc = ima_collect_measurement(iint, file, NULL, 0, ima_hash_algo);
 	if (rc < 0)
 		return;
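The sketch below shows the guard in isolation, using a hypothetical `iint_sim` struct. The enum lists only a few of the kernel's integrity_status values. The real function goes on to hash the file and write security.ima; the sketch only sets a flag instead.

```c
#include <assert.h>
#include <stdbool.h>

/* An abridged subset of the kernel's integrity_status values. */
enum integrity_status {
	INTEGRITY_PASS,
	INTEGRITY_FAIL,
	INTEGRITY_NOLABEL,
	INTEGRITY_UNKNOWN,
};

/* Hypothetical stand-in for struct integrity_iint_cache. */
struct iint_sim {
	enum integrity_status ima_file_status;
	bool has_digsig;	/* models iint->flags & IMA_DIGSIG */
	bool xattr_written;	/* set instead of hashing the file and writing security.ima */
};

/* Sketch of ima_update_xattr() after the fix. */
static void update_xattr(struct iint_sim *iint)
{
	if (iint->has_digsig)
		return;	/* signed files are never rewritten */
	if (iint->ima_file_status != INTEGRITY_PASS)
		return;	/* new: do not label a file that failed appraisal */
	iint->xattr_written = true;
}
```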
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lukas Wunner lukas@wunner.de
commit 2a71de2f7366fb1aec632116d0549ec56d6a3940 upstream.
Commit 348f9bb31c56 ("serial: omap: Fix RTS handling") sought to enable auto RTS upon manual RTS assertion and disable it on deassertion. However, the latter seems to have been done incorrectly: it clears all bits in the Extended Features Register *except* auto RTS.
Fixes: 348f9bb31c56 ("serial: omap: Fix RTS handling")
Cc: Peter Hurley peter@hurleysoftware.com
Signed-off-by: Lukas Wunner lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/tty/serial/omap-serial.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/tty/serial/omap-serial.c
+++ b/drivers/tty/serial/omap-serial.c
@@ -693,7 +693,7 @@ static void serial_omap_set_mctrl(struct
 	if ((mctrl & TIOCM_RTS) && (port->status & UPSTAT_AUTORTS))
 		up->efr |= UART_EFR_RTS;
 	else
-		up->efr &= UART_EFR_RTS;
+		up->efr &= ~UART_EFR_RTS;
 	serial_out(up, UART_EFR, up->efr);
 	serial_out(up, UART_LCR, lcr);
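This is a classic mask slip. `x &= MASK` keeps only the masked bit, while `x &= ~MASK` clears that bit and keeps the rest. The sketch assumes the bit values from include/uapi/linux/serial_reg.h (UART_EFR_RTS = 0x40, UART_EFR_ECB = 0x10). The two helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as in include/uapi/linux/serial_reg.h. */
#define UART_EFR_RTS	0x40	/* auto-RTS flow control */
#define UART_EFR_ECB	0x10	/* enhanced control bit */

/* Old code: keeps ONLY the auto-RTS bit and clears everything else. */
static uint8_t efr_deassert_buggy(uint8_t efr)
{
	return efr & UART_EFR_RTS;
}

/* Fixed code: clears only the auto-RTS bit. */
static uint8_t efr_deassert_fixed(uint8_t efr)
{
	return efr & (uint8_t)~UART_EFR_RTS;
}
```

Start from an EFR with both ECB and auto-RTS set (0x50). The buggy version leaves 0x40 (auto-RTS still on, ECB lost), while the fixed version leaves 0x10.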
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ji-Ze Hong (Peter Hong) hpeter@gmail.com
commit fd97e66c5529046e989a0879c3bb58fddb592c71 upstream.
The SuperIO is configured at boot time by the BIOS, but some BIOSes do not deactivate the SuperIO at the end of configuration. This leads to a mismatch for pdata->base_port in probe_setup_port(). So deactivate all SuperIO devices before activating the specific base_port in fintek_8250_enter_key().
Tested on iBASE MI802.
Tested-by: Ji-Ze Hong (Peter Hong) hpeter+linux_kernel@gmail.com
Signed-off-by: Ji-Ze Hong (Peter Hong) hpeter+linux_kernel@gmail.com
Reviewed-by: Alan Cox alan@linux.intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/tty/serial/8250/8250_fintek.c |    3 +++
 1 file changed, 3 insertions(+)
--- a/drivers/tty/serial/8250/8250_fintek.c
+++ b/drivers/tty/serial/8250/8250_fintek.c
@@ -118,6 +118,9 @@ static int fintek_8250_enter_key(u16 bas
 	if (!request_muxed_region(base_port, 2, "8250_fintek"))
 		return -EBUSY;
 
+	/* Force to deactive all SuperIO in this base_port */
+	outb(EXIT_KEY, base_port + ADDR_PORT);
+
 	outb(key, base_port + ADDR_PORT);
 	outb(key, base_port + ADDR_PORT);
 	return 0;
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Steffen Alexander.Steffen@infineon.com
commit ee70bc1e7b63ac8023c9ff9475d8741e397316e7 upstream.
tpm_transmit() does not offer an explicit interface to indicate the number of valid bytes in the communication buffer. Instead, it relies on the commandSize field in the TPM header that is encoded within the buffer. Therefore, ensure that a) enough data has been written to the buffer, so that the commandSize field is present and b) the commandSize field does not announce more data than has been written to the buffer.
This should have been fixed with CVE-2011-1161 long ago, but apparently a correct version of that patch never made it into the kernel.
Signed-off-by: Alexander Steffen Alexander.Steffen@infineon.com
Reviewed-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com
Tested-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com
Signed-off-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/char/tpm/tpm-dev-common.c |    6 ++++++
 1 file changed, 6 insertions(+)
--- a/drivers/char/tpm/tpm-dev-common.c
+++ b/drivers/char/tpm/tpm-dev-common.c
@@ -110,6 +110,12 @@ ssize_t tpm_common_write(struct file *fi
 		return -EFAULT;
 	}
 
+	if (in_size < 6 ||
+	    in_size < be32_to_cpu(*((__be32 *) (priv->data_buffer + 2)))) {
+		mutex_unlock(&priv->buffer_mutex);
+		return -EINVAL;
+	}
+
 	/* atomic tpm command send and result receive. We only hold the ops
 	 * lock during this period so that the tpm can be unregistered even if
 	 * the char dev is held open.
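The new check can be sketched on a plain byte buffer. The assumed layout is the one the commit relies on: a 2-byte tag, then a 4-byte big-endian commandSize at offset 2. So 6 bytes must be present before commandSize can be read at all. `check_tpm_write()` and `be32_at()` are hypothetical helpers; the kernel applies be32_to_cpu() to the buffer directly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit value, like be32_to_cpu() on a __be32. */
static uint32_t be32_at(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Mirrors the check added to tpm_common_write(): 0 if acceptable. */
static int check_tpm_write(const uint8_t *buf, size_t in_size)
{
	if (in_size < 6)			/* commandSize not fully written */
		return -EINVAL;
	if (in_size < be32_at(buf + 2))		/* header claims more than was written */
		return -EINVAL;
	return 0;
}
```

Take a 10-byte write whose header announces 10 bytes. It passes, but the same header with only 9 bytes written is rejected, as is any write shorter than 6 bytes.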
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Neeraj Upadhyay neeraju@codeaurora.org
commit 135bd1a230bb69a68c9808a7d25467318900b80a upstream.
The pending-callbacks check in rcu_prepare_for_idle() is backwards. It should accelerate if there are pending callbacks, but the check rather uselessly accelerates only if there are no callbacks. This commit therefore inverts this check.
Fixes: 15fecf89e46a ("srcu: Abstract multi-tail callback list handling")
Signed-off-by: Neeraj Upadhyay neeraju@codeaurora.org
Signed-off-by: Paul E. McKenney paulmck@linux.vnet.ibm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/rcu/tree_plugin.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1493,7 +1493,7 @@ static void rcu_prepare_for_idle(void)
 	rdtp->last_accelerate = jiffies;
 	for_each_rcu_flavor(rsp) {
 		rdp = this_cpu_ptr(rsp->rda);
-		if (rcu_segcblist_pend_cbs(&rdp->cblist))
+		if (!rcu_segcblist_pend_cbs(&rdp->cblist))
 			continue;
 		rnp = rdp->mynode;
 		raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jann Horn jannh@google.com
commit 373c4557d2aa362702c4c2d41288fb1e54990b7c upstream.
This matters at least for the mincore syscall, which will otherwise copy uninitialized memory from the page allocator to userspace. It is probably also a correctness error for /proc/$pid/pagemap, but I haven't tested that.
Removing the `walk->hugetlb_entry` condition in walk_hugetlb_range() has no effect because the caller already checks for that.
This only reports holes in hugetlb ranges to callers who have specified a hugetlb_entry callback.
This issue was found using an AFL-based fuzzer.
v2:
 - don't crash on ->pte_hole==NULL (Andrew Morton)
 - add Cc stable (Andrew Morton)
Fixes: 1e25a271c8ac ("mincore: apply page table walker on do_mincore()")
Signed-off-by: Jann Horn jannh@google.com
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 mm/pagewalk.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -187,8 +187,12 @@ static int walk_hugetlb_range(unsigned l
 	do {
 		next = hugetlb_entry_end(h, addr, end);
 		pte = huge_pte_offset(walk->mm, addr & hmask, sz);
-		if (pte && walk->hugetlb_entry)
+
+		if (pte)
 			err = walk->hugetlb_entry(pte, hmask, addr, next, walk);
+		else if (walk->pte_hole)
+			err = walk->pte_hole(addr, next, walk);
+
 		if (err)
 			break;
 	} while (addr = next, addr != end);
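Before the fix, holes in a hugetlb range produced no callback at all. A caller like mincore, which fills one status byte per range from the callbacks, therefore left those bytes untouched. The sketch below uses a hypothetical walker. `walk_sim`, `walk_range()` and the callbacks are illustrations, not the kernel's mm_walk API, and 0x7f stands in for uninitialized memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical walker: present[i] != 0 means range i is mapped by a huge PTE. */
struct walk_sim {
	int (*hugetlb_entry)(size_t idx, void *priv);	/* required, as in the kernel caller */
	int (*pte_hole)(size_t idx, void *priv);	/* optional */
	void *priv;
};

/* Mirrors the fixed loop in walk_hugetlb_range(). */
static int walk_range(const int *present, size_t n, const struct walk_sim *w)
{
	int err = 0;

	for (size_t i = 0; i < n; i++) {
		if (present[i])
			err = w->hugetlb_entry(i, w->priv);
		else if (w->pte_hole)
			err = w->pte_hole(i, w->priv);	/* new: holes are reported */
		if (err)
			break;
	}
	return err;
}

/* mincore-style callbacks: write one status byte per range. */
static int mark_present(size_t idx, void *priv)
{
	((unsigned char *)priv)[idx] = 1;
	return 0;
}

static int mark_hole(size_t idx, void *priv)
{
	((unsigned char *)priv)[idx] = 0;
	return 0;
}
```

If a walker without a pte_hole callback runs over the same ranges, it simply skips the holes. This matches the v2 change that avoids crashing when ->pte_hole is NULL.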
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Changwei Ge ge.changwei@h3c.com
commit 1c01967116a678fed8e2c68a6ab82abc8effeddc upstream.
When a node dies, the other live nodes have to choose a new master for each existing lock resource mastered by the dead node.
In the ocfs2/dlm implementation, this is done by dlm_move_lockres_to_recovery_list(), which marks those lock resources as DLM_LOCK_RES_RECOVERING and manages them via a list from which the DLM later changes each lock resource's master.
So without invoking dlm_move_lockres_to_recovery_list(), no master will be chosen after DLM recovery completes, since no lock resource can be found through the ::resource list.
Worse, if DLM_LOCK_RES_RECOVERING is not set on lock resources mastered by a dead node, synchronization among nodes breaks down.
So invoke dlm_move_lockres_to_recovery_list() again.
Fixes: ee8f7fcbe638 ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")
Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED6E0F9@H3CMLB14-E...
Signed-off-by: Changwei Ge ge.changwei@h3c.com
Reported-by: Vitaly Mayatskih v.mayatskih@gmail.com
Tested-by: Vitaly Mayatskikh v.mayatskih@gmail.com
Cc: Mark Fasheh mfasheh@versity.com
Cc: Joel Becker jlbec@evilplan.org
Cc: Junxiao Bi junxiao.bi@oracle.com
Cc: Joseph Qi jiangqi903@gmail.com
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/ocfs2/dlm/dlmrecovery.c |    1 +
 1 file changed, 1 insertion(+)
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -2419,6 +2419,7 @@ static void dlm_do_local_recovery_cleanu
 					dlm_lockres_put(res);
 					continue;
 				}
+				dlm_move_lockres_to_recovery_list(dlm, res);
 			} else if (res->owner == dlm->node_num) {
 				dlm_free_dead_locks(dlm, res, dead_node);
 				__dlm_lockres_calc_usage(dlm, res);
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: alex chen alex.chen@huawei.com
commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
We should wait for DIO requests to finish before taking the inode lock in ocfs2_setattr(); otherwise the following deadlock can happen:

process 1 (truncate file 'A'):
ocfs2_setattr
 ocfs2_inode_lock_tracker
  ocfs2_inode_lock_full
 inode_dio_wait
  __inode_dio_wait
  --> waiting for all DIO requests to finish

process 3 (receiving the BAST messages):
dlm_proxy_ast_handler
 dlm_do_local_bast
  ocfs2_blocking_ast
   ocfs2_generic_handle_bast
    set OCFS2_LOCK_BLOCKED flag

process 2 (end_io of writing file 'A'):
dio_end_io
 dio_bio_end_aio
  dio_complete
   ocfs2_dio_end_io
    ocfs2_dio_end_io_write
     ocfs2_inode_lock
      __ocfs2_cluster_lock
       ocfs2_wait_for_mask
       --> waiting for the OCFS2_LOCK_BLOCKED flag to be cleared,
           i.e. waiting for process 1 to release the inode lock
   inode_dio_end
   --> this would decrement i_dio_count, but it is never reached,
       so the processes deadlock.
Link: http://lkml.kernel.org/r/59F81636.70508@huawei.com
Signed-off-by: Alex Chen alex.chen@huawei.com
Reviewed-by: Jun Piao piaojun@huawei.com
Reviewed-by: Joseph Qi jiangqi903@gmail.com
Acked-by: Changwei Ge ge.changwei@h3c.com
Cc: Mark Fasheh mfasheh@versity.com
Cc: Joel Becker jlbec@evilplan.org
Cc: Junxiao Bi junxiao.bi@oracle.com
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/ocfs2/file.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1168,6 +1168,13 @@ int ocfs2_setattr(struct dentry *dentry,
 	}
 	size_change = S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE;
 	if (size_change) {
+		/*
+		 * Here we should wait dio to finish before inode lock
+		 * to avoid a deadlock between ocfs2_setattr() and
+		 * ocfs2_dio_end_io_write()
+		 */
+		inode_dio_wait(inode);
+
 		status = ocfs2_rw_lock(inode, 1);
 		if (status < 0) {
 			mlog_errno(status);
@@ -1207,8 +1214,6 @@ int ocfs2_setattr(struct dentry *dentry,
 		if (status)
 			goto bail_unlock;
 
-		inode_dio_wait(inode);
-
 		if (i_size_read(inode) >= attr->ia_size) {
 			if (ocfs2_should_order_data(inode)) {
 				status = ocfs2_begin_ordered_truncate(inode,
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Corey Minyard cminyard@mvista.com
commit 392a17b10ec4320d3c0e96e2a23ebaad1123b989 upstream.
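When I set the timeout to a specific value such as 500ms, the timeout event does not happen in time, due to the overflow in check_msg_timeout():
...
	ent->timeout -= timeout_period;
	if (ent->timeout > 0)
		return;
...
timeout_period is a long, but ent->timeout is an unsigned long, so when timeout_period exceeds the remaining timeout the subtraction wraps around to a huge positive value instead of going negative, and the "> 0" test does not report expiry. This patch makes the types consistent and checks for expiry before subtracting.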
Reported-by: Weilong Chen chenweilong@huawei.com
Signed-off-by: Corey Minyard cminyard@mvista.com
Tested-by: Weilong Chen chenweilong@huawei.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/char/ipmi/ipmi_msghandler.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -4030,7 +4030,8 @@ smi_from_recv_msg(ipmi_smi_t intf, struc
 }
 
 static void check_msg_timeout(ipmi_smi_t intf, struct seq_table *ent,
-			      struct list_head *timeouts, long timeout_period,
+			      struct list_head *timeouts,
+			      unsigned long timeout_period,
 			      int slot, unsigned long *flags,
 			      unsigned int *waiting_msgs)
 {
@@ -4043,8 +4044,8 @@ static void check_msg_timeout(ipmi_smi_t
 	if (!ent->inuse)
 		return;
 
-	ent->timeout -= timeout_period;
-	if (ent->timeout > 0) {
+	if (timeout_period < ent->timeout) {
+		ent->timeout -= timeout_period;
 		(*waiting_msgs)++;
 		return;
 	}
@@ -4110,7 +4111,8 @@ static void check_msg_timeout(ipmi_smi_t
 	}
 }
 
-static unsigned int ipmi_timeout_handler(ipmi_smi_t intf, long timeout_period)
+static unsigned int ipmi_timeout_handler(ipmi_smi_t intf,
+					 unsigned long timeout_period)
 {
 	struct list_head timeouts;
 	struct ipmi_recv_msg *msg, *msg2;
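The wrap-around is easy to reproduce with plain unsigned arithmetic. The sketch below contrasts the old subtract-then-test ordering with the patch's compare-then-subtract ordering. The helper names are hypothetical, and the values are plain numbers rather than real IPMI timeouts.

```c
#include <assert.h>
#include <stdbool.h>

/* Old logic: the subtraction is unsigned, so a period larger than the
 * remaining timeout wraps around to a huge value that is still > 0. */
static bool still_waiting_buggy(unsigned long *timeout, unsigned long period)
{
	*timeout -= period;
	return *timeout > 0;
}

/* Fixed logic, as in the patch: compare first, subtract only if time remains. */
static bool still_waiting_fixed(unsigned long *timeout, unsigned long period)
{
	if (period < *timeout) {
		*timeout -= period;
		return true;
	}
	return false;	/* expired: the caller handles the timeout */
}
```

Suppose 500 units remain and 1000 elapse. The buggy version keeps the message waiting with an enormous remaining timeout, while the fixed version reports expiry.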
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Tatashin pasha.tatashin@oracle.com
commit d135e5750205a21a212a19dbb05aeb339e2cbea7 upstream.
In reset_deferred_meminit() we determine the number of pages that must not be deferred. We initialize pages for at least 2G of memory, plus pages for reserved memory in this node.
The reserved memory is determined by memblock_reserved_memory_within(), which operates on physical addresses and returns a size in bytes. However, reset_deferred_meminit() assumes that this function operates on PFNs and returns a page count.
As a result, in the best case the machine boots slower than expected, because more pages than needed are initialized in a single thread, and in the worst case it panics because fewer pages than needed are initialized early.
Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com
Fixes: 864b9a393dcb ("mm: consider memblock reservations for deferred memory initialization sizing")
Signed-off-by: Pavel Tatashin pasha.tatashin@oracle.com
Acked-by: Michal Hocko mhocko@suse.com
Cc: Mel Gorman mgorman@techsingularity.net
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/linux/mmzone.h |    3 ++-
 mm/page_alloc.c        |   27 ++++++++++++++++++---------
 2 files changed, 20 insertions(+), 10 deletions(-)
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -691,7 +691,8 @@ typedef struct pglist_data {
 	 * is the first PFN that needs to be initialised.
 	 */
 	unsigned long first_deferred_pfn;
-	unsigned long static_init_size;
+	/* Number of non-deferred pages */
+	unsigned long static_init_pgcnt;
 #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -289,28 +289,37 @@ EXPORT_SYMBOL(nr_online_nodes);
 int page_group_by_mobility_disabled __read_mostly;
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+
+/*
+ * Determine how many pages need to be initialized durig early boot
+ * (non-deferred initialization).
+ * The value of first_deferred_pfn will be set later, once non-deferred pages
+ * are initialized, but for now set it ULONG_MAX.
+ */
 static inline void reset_deferred_meminit(pg_data_t *pgdat)
 {
-	unsigned long max_initialise;
-	unsigned long reserved_lowmem;
+	phys_addr_t start_addr, end_addr;
+	unsigned long max_pgcnt;
+	unsigned long reserved;
 
 	/*
 	 * Initialise at least 2G of a node but also take into account that
 	 * two large system hashes that can take up 1GB for 0.25TB/node.
 	 */
-	max_initialise = max(2UL << (30 - PAGE_SHIFT),
-		(pgdat->node_spanned_pages >> 8));
+	max_pgcnt = max(2UL << (30 - PAGE_SHIFT),
+			(pgdat->node_spanned_pages >> 8));
 
 	/*
 	 * Compensate the all the memblock reservations (e.g. crash kernel)
 	 * from the initial estimation to make sure we will initialize enough
 	 * memory to boot.
 	 */
-	reserved_lowmem = memblock_reserved_memory_within(pgdat->node_start_pfn,
-			pgdat->node_start_pfn + max_initialise);
-	max_initialise += reserved_lowmem;
+	start_addr = PFN_PHYS(pgdat->node_start_pfn);
+	end_addr = PFN_PHYS(pgdat->node_start_pfn + max_pgcnt);
+	reserved = memblock_reserved_memory_within(start_addr, end_addr);
+	max_pgcnt += PHYS_PFN(reserved);
 
-	pgdat->static_init_size = min(max_initialise, pgdat->node_spanned_pages);
+	pgdat->static_init_pgcnt = min(max_pgcnt, pgdat->node_spanned_pages);
 	pgdat->first_deferred_pfn = ULONG_MAX;
 }
 
@@ -337,7 +346,7 @@ static inline bool update_defer_init(pg_
 	if (zone_end < pgdat_end_pfn(pgdat))
 		return true;
 	(*nr_initialised)++;
-	if ((*nr_initialised > pgdat->static_init_size) &&
+	if ((*nr_initialised > pgdat->static_init_pgcnt) &&
 	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
 		pgdat->first_deferred_pfn = pfn;
 		return false;
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jaewon Kim jaewon31.kim@samsung.com
commit e492080e640c2d1235ddf3441cae634cfffef7e1 upstream.
online_page_ext() and page_ext_init() allocate page_ext for each section, but they do not allocate it if the first PFN is !pfn_present(pfn) or !pfn_valid(pfn). In that case section->page_ext remains NULL. lookup_page_ext() checks for NULL only if CONFIG_DEBUG_VM is enabled. For a valid PFN, __set_page_owner() will try to get page_ext through lookup_page_ext(). Without CONFIG_DEBUG_VM, lookup_page_ext() misuses the NULL pointer as the value 0, which leads to an invalid address access.
Below is an example panic where PFN 0x100000 is not valid but PFN 0x13FC00 is being used for page_ext. section->page_ext is NULL, so get_entry() returned the invalid page_ext address 0x1DFA000 for PFN 0x13FC00.
To avoid this panic, remove the CONFIG_DEBUG_VM guard so that page_ext is always checked for NULL.
Unable to handle kernel paging request at virtual address 01dfa014
------------[ cut here ]------------
Kernel BUG at ffffff80082371e0
[verbose debug info unavailable]
Internal error: Oops: 96000045 [#1] PREEMPT SMP
Modules linked in:
PC is at __set_page_owner+0x48/0x78
LR is at __set_page_owner+0x44/0x78
 __set_page_owner+0x48/0x78
 get_page_from_freelist+0x880/0x8e8
 __alloc_pages_nodemask+0x14c/0xc48
 __do_page_cache_readahead+0xdc/0x264
 filemap_fault+0x2ac/0x550
 ext4_filemap_fault+0x3c/0x58
 __do_fault+0x80/0x120
 handle_mm_fault+0x704/0xbb0
 do_page_fault+0x2e8/0x394
 do_mem_abort+0x88/0x124
Pre-4.7 kernels also need commit f86e4271978b ("mm: check the return value of lookup_page_ext for all call sites").
Link: http://lkml.kernel.org/r/20171107094131.14621-1-jaewon31.kim@samsung.com Fixes: eefa864b701d ("mm/page_ext: resurrect struct page extending code for debugging") Signed-off-by: Jaewon Kim jaewon31.kim@samsung.com Acked-by: Michal Hocko mhocko@suse.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Minchan Kim minchan@kernel.org Cc: Joonsoo Kim js1304@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/page_ext.c | 4 ---- 1 file changed, 4 deletions(-)
--- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -124,7 +124,6 @@ struct page_ext *lookup_page_ext(struct struct page_ext *base;
base = NODE_DATA(page_to_nid(page))->node_page_ext; -#if defined(CONFIG_DEBUG_VM) /* * The sanity checks the page allocator does upon freeing a * page can reach here before the page_ext arrays are @@ -133,7 +132,6 @@ struct page_ext *lookup_page_ext(struct */ if (unlikely(!base)) return NULL; -#endif index = pfn - round_down(node_start_pfn(page_to_nid(page)), MAX_ORDER_NR_PAGES); return get_entry(base, index); @@ -198,7 +196,6 @@ struct page_ext *lookup_page_ext(struct { unsigned long pfn = page_to_pfn(page); struct mem_section *section = __pfn_to_section(pfn); -#if defined(CONFIG_DEBUG_VM) /* * The sanity checks the page allocator does upon freeing a * page can reach here before the page_ext arrays are @@ -207,7 +204,6 @@ struct page_ext *lookup_page_ext(struct */ if (!section->page_ext) return NULL; -#endif return get_entry(section->page_ext, pfn); }
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Suravee Suthikulpanit suravee.suthikulpanit@amd.com
commit 2b83809a5e6d619a780876fcaf68cdc42b50d28c upstream.
For systems with X86_FEATURE_TOPOEXT, the current logic uses the APIC ID to calculate shared_cpu_map. However, APIC IDs are not guaranteed to be contiguous for cores across different L3s (e.g. a family 17h system with a downcore configuration). This breaks the logic and results in an incorrect L3 shared_cpu_map.
Instead, always use the previously calculated cpu_llc_shared_mask of each CPU to derive the L3 shared_cpu_map.
Signed-off-by: Suravee Suthikulpanit suravee.suthikulpanit@amd.com Signed-off-by: Borislav Petkov bp@suse.de Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/20170731085159.9455-3-bp@alien8.de Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/intel_cacheinfo.c | 32 ++++++++++++++++++-------------- 1 file changed, 18 insertions(+), 14 deletions(-)
--- a/arch/x86/kernel/cpu/intel_cacheinfo.c +++ b/arch/x86/kernel/cpu/intel_cacheinfo.c @@ -811,7 +811,24 @@ static int __cache_amd_cpumap_setup(unsi struct cacheinfo *this_leaf; int i, sibling;
- if (boot_cpu_has(X86_FEATURE_TOPOEXT)) { + /* + * For L3, always use the pre-calculated cpu_llc_shared_mask + * to derive shared_cpu_map. + */ + if (index == 3) { + for_each_cpu(i, cpu_llc_shared_mask(cpu)) { + this_cpu_ci = get_cpu_cacheinfo(i); + if (!this_cpu_ci->info_list) + continue; + this_leaf = this_cpu_ci->info_list + index; + for_each_cpu(sibling, cpu_llc_shared_mask(cpu)) { + if (!cpu_online(sibling)) + continue; + cpumask_set_cpu(sibling, + &this_leaf->shared_cpu_map); + } + } + } else if (boot_cpu_has(X86_FEATURE_TOPOEXT)) { unsigned int apicid, nshared, first, last;
this_leaf = this_cpu_ci->info_list + index; @@ -837,19 +854,6 @@ static int __cache_amd_cpumap_setup(unsi continue; cpumask_set_cpu(sibling, &this_leaf->shared_cpu_map); - } - } - } else if (index == 3) { - for_each_cpu(i, cpu_llc_shared_mask(cpu)) { - this_cpu_ci = get_cpu_cacheinfo(i); - if (!this_cpu_ci->info_list) - continue; - this_leaf = this_cpu_ci->info_list + index; - for_each_cpu(sibling, cpu_llc_shared_mask(cpu)) { - if (!cpu_online(sibling)) - continue; - cpumask_set_cpu(sibling, - &this_leaf->shared_cpu_map); } } } else
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Harkes jaharkes@cs.cmu.edu
commit d337b66a4c52c7b04eec661d86c2ef6e168965a2 upstream.
When an application called fsync on a file in Coda, a small request with just the file identifier was allocated, but the declared length was set to the size of the union of all possible upcall requests.
This bug has been around for a very long time and is now caught by the extra checking in usercopy that was introduced in Linux-4.8.
The exposure happens when the Coda cache manager process reads the fsync upcall request, at which point it is killed. As a result, nobody is left to service further upcalls, and any process that tries to access the mounted Coda filesystem gets stuck.
Signed-off-by: Jan Harkes jaharkes@cs.cmu.edu Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/coda/upcall.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/coda/upcall.c +++ b/fs/coda/upcall.c @@ -446,8 +446,7 @@ int venus_fsync(struct super_block *sb, UPARG(CODA_FSYNC);
inp->coda_fsync.VFid = *fid; - error = coda_upcall(coda_vcp(sb), sizeof(union inputArgs), - &outsize, inp); + error = coda_upcall(coda_vcp(sb), insize, &outsize, inp);
CODA_FREE(inp, insize); return error;
On Wed, Nov 22, 2017 at 11:11:54AM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.13.16 release. There are 35 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri Nov 24 10:11:25 UTC 2017. Anything received after that time might be too late.
The whole patch series can be found in one patch at: kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.13.16-rc1.gz or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.13.y and the diffstat can be found below.
Note: barring anything very odd, this is going to be the last 4.13.y release; the branch will be end-of-life after this upcoming release. Everyone should be using 4.14.y instead.
thanks,
greg k-h
On Wed, Nov 22, 2017 at 11:11:54AM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.13.16 release. There are 35 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri Nov 24 10:11:25 UTC 2017. Anything received after that time might be too late.
Build results:
	total: 145 pass: 145 fail: 0
Qemu test results:
	total: 123 pass: 123 fail: 0
Details are available at http://kerneltests.org/builders.
Guenter
On 22 November 2017 at 15:41, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.13.16 release. There are 35 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri Nov 24 10:11:25 UTC 2017. Anything received after that time might be too late.
The whole patch series can be found in one patch at: kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.13.16-rc1.gz or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.13.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary ------------------------------------------------------------------------
kernel: 4.13.16-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.13.y
git commit: c1e1787df7b972a96d47a3f656d2c7e767054cd0
git describe: v4.13.15-36-gc1e1787df7b9
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.13-oe/build/v4.13.15-36...
No regressions (compared to build v4.13.15-28-gaa64f5ecf110)
Boards, architectures and test suites: -------------------------------------
hi6220-hikey - arm64
  * boot - pass: 20
  * kselftest - pass: 39, skip: 15
  * libhugetlbfs - pass: 90, skip: 1
  * ltp-cap_bounds-tests - pass: 2
  * ltp-containers-tests - pass: 64
  * ltp-fcntl-locktests-tests - pass: 2
  * ltp-filecaps-tests - pass: 2
  * ltp-fs-tests - pass: 60
  * ltp-fs_bind-tests - pass: 2
  * ltp-fs_perms_simple-tests - pass: 19
  * ltp-fsx-tests - pass: 2
  * ltp-hugetlb-tests - pass: 21, skip: 1
  * ltp-io-tests - pass: 3
  * ltp-ipc-tests - pass: 9
  * ltp-math-tests - pass: 11
  * ltp-nptl-tests - pass: 2
  * ltp-pty-tests - pass: 4
  * ltp-sched-tests - pass: 14
  * ltp-securebits-tests - pass: 4
  * ltp-syscalls-tests - pass: 982, skip: 121
  * ltp-timers-tests - pass: 12
juno-r2 - arm64
  * boot - pass: 20
  * kselftest - pass: 38, skip: 15
  * libhugetlbfs - pass: 90, skip: 1
  * ltp-cap_bounds-tests - pass: 2
  * ltp-containers-tests - pass: 64
  * ltp-fcntl-locktests-tests - pass: 2
  * ltp-filecaps-tests - pass: 2
  * ltp-fs-tests - pass: 60
  * ltp-fs_bind-tests - pass: 2
  * ltp-fs_perms_simple-tests - pass: 19
  * ltp-fsx-tests - pass: 2
  * ltp-hugetlb-tests - pass: 22
  * ltp-io-tests - pass: 3
  * ltp-ipc-tests - pass: 9
  * ltp-math-tests - pass: 11
  * ltp-nptl-tests - pass: 2
  * ltp-pty-tests - pass: 4
  * ltp-sched-tests - pass: 10
  * ltp-securebits-tests - pass: 4
  * ltp-syscalls-tests - pass: 939, skip: 156
  * ltp-timers-tests - pass: 12
x15 - arm
  * boot - pass: 20
  * kselftest - pass: 35, skip: 18
  * libhugetlbfs - pass: 87, skip: 1
  * ltp-cap_bounds-tests - pass: 2
  * ltp-containers-tests - pass: 64
  * ltp-fcntl-locktests-tests - pass: 2
  * ltp-filecaps-tests - pass: 2
  * ltp-fs-tests - pass: 60
  * ltp-fs_bind-tests - pass: 2
  * ltp-fs_perms_simple-tests - pass: 19
  * ltp-fsx-tests - pass: 2
  * ltp-hugetlb-tests - pass: 20, skip: 2
  * ltp-io-tests - pass: 3
  * ltp-ipc-tests - pass: 9
  * ltp-math-tests - pass: 11
  * ltp-nptl-tests - pass: 2
  * ltp-pty-tests - pass: 4
  * ltp-sched-tests - pass: 13, skip: 1
  * ltp-securebits-tests - pass: 4
  * ltp-syscalls-tests - pass: 1036, skip: 66
  * ltp-timers-tests - pass: 12
SuperServer 5019S-ML - x86_64
  * boot - pass: 20
  * kselftest - pass: 54, skip: 14
  * libhugetlbfs - pass: 76, skip: 1
  * ltp-cap_bounds-tests - pass: 2
  * ltp-containers-tests - pass: 64
  * ltp-fcntl-locktests-tests - pass: 2
  * ltp-filecaps-tests - pass: 2
  * ltp-fs-tests - pass: 61, skip: 1
  * ltp-fs_bind-tests - pass: 2
  * ltp-fs_perms_simple-tests - pass: 19
  * ltp-fsx-tests - pass: 2
  * ltp-hugetlb-tests - pass: 22
  * ltp-io-tests - pass: 3
  * ltp-ipc-tests - pass: 9
  * ltp-math-tests - pass: 11
  * ltp-nptl-tests - pass: 2
  * ltp-pty-tests - pass: 4
  * ltp-sched-tests - pass: 9, skip: 1
  * ltp-securebits-tests - pass: 4
  * ltp-syscalls-tests - pass: 957, skip: 163
  * ltp-timers-tests - pass: 12
Documentation - https://collaborate.linaro.org/display/LKFT/Email+Reports
Signed-off-by: Naresh Kamboju naresh.kamboju@linaro.org
linux-stable-mirror@lists.linaro.org