This is the start of the stable review cycle for the 4.4.173 release. There are 65 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Feb 6 10:35:30 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.173-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.4.173-rc1
Dan Carpenter dan.carpenter@oracle.com ipv4: frags: precedence bug in ip_expire()
Taehee Yoo ap420073@gmail.com ip: frags: fix crash in ip_do_fragment()
Michal Kubecek mkubecek@suse.cz net: ipv4: do not handle duplicate fragments as overlapping
Peter Oskolkov posk@google.com ip: process in-order fragments efficiently
Peter Oskolkov posk@google.com ip: add helpers to process in-order fragments faster.
Florian Westphal fw@strlen.de ipv6: defrag: drop non-last frags smaller than min mtu
Peter Oskolkov posk@google.com ip: use rb trees for IP frag queue.
Eric Dumazet edumazet@google.com inet: frags: get rif of inet_frag_evicting()
Peter Oskolkov posk@google.com net: modify skb_rbtree_purge to return the truesize of all purged skbs.
Peter Oskolkov posk@google.com ip: discard IPv4 datagrams with overlapping segments.
Dave Chinner dchinner@redhat.com fs: don't scan the inode cache before SB_BORN is set
David Hildenbrand david@redhat.com mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
Benjamin Herrenschmidt benh@kernel.crashing.org drivers: core: Remove glue dirs from sysfs earlier
Paulo Alcantara paulo@paulo.ac cifs: Always resolve hostname before reconnecting
Shakeel Butt shakeelb@google.com mm, oom: fix use-after-free in oom_kill_process
Andrei Vagin avagin@gmail.com kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
Stefan Wahren stefan.wahren@i2se.com mmc: sdhci-iproc: handle mmc_of_parse() errors during probe
João Paulo Rechi Vita jprvita@gmail.com platform/x86: asus-nb-wmi: Drop mapping of 0x33 and 0x34 scan codes
João Paulo Rechi Vita jprvita@gmail.com platform/x86: asus-nb-wmi: Map 0x35 to KEY_SCREENLOCK
Andreas Gruenbacher agruenba@redhat.com gfs2: Revert "Fix loop in gfs2_rbm_find"
James Morse james.morse@arm.com arm64: hyp-stub: Forbid kprobing of the hyp-stub
Koen Vandeputte koen.vandeputte@ncentric.com ARM: cns3xxx: Fix writing to wrong PCI config registers after alignment
Waiman Long longman@redhat.com fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb()
Pavel Shilovsky pshilov@microsoft.com CIFS: Do not count -ENODATA as failure for query directory
Jacob Wen jian.w.wen@oracle.com l2tp: fix reading optional fields of L2TPv3
Lorenzo Bianconi lorenzo.bianconi@redhat.com l2tp: remove l2specific_len dependency in l2tp_core
Mathias Thore mathias.thore@infinera.com ucc_geth: Reset BQL queue when stopping device
Bernard Pidoux f6bvp@free.fr net/rose: fix NULL ax25_cb kernel panic
Cong Wang xiyou.wangcong@gmail.com netrom: switch to sock timer API
Aya Levin ayal@mellanox.com net/mlx4_core: Add masking for a few queries on HCA caps
Jacob Wen jian.w.wen@oracle.com l2tp: copy 4 more bytes to linear part if necessary
David Ahern dsahern@gmail.com ipv6: Consider sk_bound_dev_if when binding a socket to an address
Jimmy Durand Wesolowski jdw@amazon.de fs: add the fsnotify call to vfs_iter_write
David Hildenbrand david@redhat.com s390/smp: Fix calling smp_call_ipl_cpu() from ipl CPU
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "loop: Fold __loop_release into loop_release"
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "loop: Get rid of loop_index_mutex"
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "loop: Fix double mutex_unlock(&loop_ctl_mutex) in loop_control_ioctl()"
Pan Bian bianpan2016@163.com f2fs: read page index before freeing
Shaokun Zhang zhangshaokun@hisilicon.com arm64: mm: remove page_mapping check in __sync_icache_dcache
Marc Zyngier marc.zyngier@arm.com irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size
Milian Wolff milian.wolff@kdab.com perf unwind: Take pgoff into account when reporting elf to libdwfl
Martin Vuille jpmv27@aim.com perf unwind: Unwind with libdw doesn't take symfs into account
Nicolas Pitre nicolas.pitre@linaro.org vt: invoke notifier on screen size change
Oliver Hartkopp socketcan@hartkopp.net can: bcm: check timer values before ktime conversion
Manfred Schlaegl manfred.schlaegl@ginzinger.com can: dev: __can_get_echo_skb(): fix bogous check for non-existing skb by removing it
Daniel Drake drake@endlessm.com x86/kaslr: Fix incorrect i8254 outb() parameters
Alexander Popov alex.popov@linux.com KVM: x86: Fix single-step debugging
Tom Panfil tom@steelseries.com Input: xpad - add support for SteelSeries Stratus Duo
Pavel Shilovsky pshilov@microsoft.com CIFS: Fix possible hang during async MTU reads and writes
Paul Fulghum paulkf@microgate.com tty/n_hdlc: fix __might_sleep warning
Greg Kroah-Hartman gregkh@linuxfoundation.org tty: Handle problem if line discipline does not have receive_buf
Michael Straube straube.linux@gmail.com staging: rtl8188eu: Add device code for D-Link DWA-121 rev B1
Gustavo A. R. Silva gustavo@embeddedor.com char/mwave: fix potential Spectre v1 vulnerability
Gerald Schaefer gerald.schaefer@de.ibm.com s390/smp: fix CPU hotplug deadlock with CPU rescan
Christian Borntraeger borntraeger@de.ibm.com s390/early: improve machine detection
Eugeniy Paltsev Eugeniy.Paltsev@synopsys.com ARC: perf: map generic branches to correct hardware condition
Kangjie Lu kjlu@umn.edu ASoC: atom: fix a missing check of snd_pcm_lib_malloc_pages
Charles Yeh charlesyeh522@gmail.com USB: serial: pl2303: add new PID to support PL2303TB
Max Schulze max.schulze@posteo.de USB: serial: simple: add Motorola Tetra TPG2200 device id
Vijay Viswanath vviswana@codeaurora.org mmc: Kconfig: Enable CONFIG_MMC_SDHCI_IO_ACCESSORS
Yunjian Wang wangyunjian@huawei.com net: bridge: Fix ethernet header pointer before check skb forwardable
Cong Wang xiyou.wangcong@gmail.com net_sched: refetch skb protocol for each filter
Ido Schimmel idosch@mellanox.com net: ipv4: Fix memory leak in network namespace dismantle
Ross Lagerwall ross.lagerwall@citrix.com openvswitch: Avoid OOB read when parsing flow nlattrs
Ross Lagerwall ross.lagerwall@citrix.com net: Fix usage of pskb_trim_rcsum
-------------
Diffstat:
Makefile | 4 +- arch/arc/include/asm/perf_event.h | 3 +- arch/arm/mach-cns3xxx/pcie.c | 2 +- arch/arm64/kernel/hyp-stub.S | 2 + arch/arm64/mm/flush.c | 4 - arch/s390/kernel/early.c | 4 +- arch/s390/kernel/setup.c | 2 + arch/s390/kernel/smp.c | 12 +- arch/x86/boot/compressed/aslr.c | 4 +- arch/x86/kvm/x86.c | 3 +- drivers/base/core.c | 2 + drivers/block/loop.c | 47 +-- drivers/char/mwave/mwavedd.c | 7 + drivers/input/joystick/xpad.c | 3 + drivers/irqchip/irq-gic-v3-its.c | 25 +- drivers/mmc/host/Kconfig | 1 + drivers/mmc/host/sdhci-iproc.c | 5 +- drivers/net/can/dev.c | 27 +- drivers/net/ethernet/freescale/ucc_geth.c | 2 + drivers/net/ethernet/mellanox/mlx4/fw.c | 75 +++-- drivers/net/ppp/pppoe.c | 1 + drivers/platform/x86/asus-nb-wmi.c | 3 +- drivers/s390/char/sclp_config.c | 2 + drivers/staging/rtl8188eu/os_dep/usb_intf.c | 1 + drivers/tty/n_hdlc.c | 1 + drivers/tty/tty_io.c | 3 +- drivers/tty/vt/vt.c | 1 + drivers/usb/serial/pl2303.c | 1 + drivers/usb/serial/pl2303.h | 2 + drivers/usb/serial/usb-serial-simple.c | 3 +- fs/cifs/connect.c | 53 ++++ fs/cifs/smb2ops.c | 6 +- fs/cifs/smb2pdu.c | 4 +- fs/dcache.c | 6 +- fs/f2fs/node.c | 4 +- fs/gfs2/rgrp.c | 2 +- fs/read_write.c | 4 +- fs/super.c | 30 +- include/linux/kobject.h | 17 ++ include/linux/skbuff.h | 5 +- include/net/inet_frag.h | 12 +- include/net/ip_fib.h | 2 +- include/uapi/linux/snmp.h | 1 + kernel/exit.c | 12 +- mm/migrate.c | 7 +- mm/oom_kill.c | 8 + net/bridge/br_forward.c | 7 +- net/bridge/br_netfilter_ipv6.c | 1 + net/bridge/netfilter/nft_reject_bridge.c | 1 + net/can/bcm.c | 27 ++ net/core/skbuff.c | 6 +- net/ipv4/fib_frontend.c | 4 +- net/ipv4/fib_trie.c | 14 +- net/ipv4/inet_fragment.c | 16 +- net/ipv4/ip_fragment.c | 410 ++++++++++++++++----------- net/ipv4/ip_input.c | 1 + net/ipv4/proc.c | 1 + net/ipv6/af_inet6.c | 3 + net/ipv6/netfilter/nf_conntrack_reasm.c | 6 + net/ipv6/reassembly.c | 9 +- net/l2tp/l2tp_core.c | 43 +-- net/l2tp/l2tp_core.h | 31 ++ net/l2tp/l2tp_ip.c | 3 + net/l2tp/l2tp_ip6.c | 3 + net/netrom/nr_timer.c | 20 +- net/openvswitch/flow_netlink.c | 2 +- net/rose/rose_route.c | 5 + net/sched/sch_api.c | 3 +- sound/soc/intel/atom/sst-mfld-platform-pcm.c | 8 +- tools/perf/util/unwind-libdw.c | 4 +- 70 files changed, 706 insertions(+), 347 deletions(-)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ross Lagerwall ross.lagerwall@citrix.com
[ Upstream commit 6c57f0458022298e4da1729c67bd33ce41c14e7a ]
In certain cases, pskb_trim_rcsum() may change skb pointers. Reinitialize header pointers afterwards to avoid potential use-after-frees. Add a note in the documentation of pskb_trim_rcsum(). Found by KASAN.
Signed-off-by: Ross Lagerwall ross.lagerwall@citrix.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ppp/pppoe.c | 1 + include/linux/skbuff.h | 1 + net/bridge/br_netfilter_ipv6.c | 1 + net/bridge/netfilter/nft_reject_bridge.c | 1 + net/ipv4/ip_input.c | 1 + 5 files changed, 5 insertions(+)
--- a/drivers/net/ppp/pppoe.c +++ b/drivers/net/ppp/pppoe.c @@ -442,6 +442,7 @@ static int pppoe_rcv(struct sk_buff *skb if (pskb_trim_rcsum(skb, len)) goto drop;
+ ph = pppoe_hdr(skb); pn = pppoe_pernet(dev_net(dev));
/* Note that get_item does a sock_hold(), so sk_pppox(po) --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2798,6 +2798,7 @@ static inline unsigned char *skb_push_rc * * This is exactly the same as pskb_trim except that it ensures the * checksum of received packets are still valid after the operation. + * It can change skb pointers. */
static inline int pskb_trim_rcsum(struct sk_buff *skb, unsigned int len) --- a/net/bridge/br_netfilter_ipv6.c +++ b/net/bridge/br_netfilter_ipv6.c @@ -131,6 +131,7 @@ int br_validate_ipv6(struct net *net, st IPSTATS_MIB_INDISCARDS); goto drop; } + hdr = ipv6_hdr(skb); } if (hdr->nexthdr == NEXTHDR_HOP && br_nf_check_hbh_len(skb)) goto drop; --- a/net/bridge/netfilter/nft_reject_bridge.c +++ b/net/bridge/netfilter/nft_reject_bridge.c @@ -192,6 +192,7 @@ static bool reject6_br_csum_ok(struct sk pskb_trim_rcsum(skb, ntohs(ip6h->payload_len) + sizeof(*ip6h))) return false;
+ ip6h = ipv6_hdr(skb); thoff = ipv6_skip_exthdr(skb, ((u8*)(ip6h+1) - skb->data), &proto, &fo); if (thoff < 0 || thoff >= skb->len || (fo & htons(~0x7)) != 0) return false; --- a/net/ipv4/ip_input.c +++ b/net/ipv4/ip_input.c @@ -444,6 +444,7 @@ int ip_rcv(struct sk_buff *skb, struct n goto drop; }
+ iph = ip_hdr(skb); skb->transport_header = skb->network_header + iph->ihl*4;
/* Remove any debris in the socket control block */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ross Lagerwall ross.lagerwall@citrix.com
[ Upstream commit 04a4af334b971814eedf4e4a413343ad3287d9a9 ]
For nested and variable attributes, the expected length of an attribute is not known and marked by a negative number. This results in an OOB read when the expected length is later used to check if the attribute is all zeros. Fix this by using the actual length of the attribute rather than the expected length.
Signed-off-by: Ross Lagerwall ross.lagerwall@citrix.com Acked-by: Pravin B Shelar pshelar@ovn.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/openvswitch/flow_netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -409,7 +409,7 @@ static int __parse_flow_nlattrs(const st return -EINVAL; }
- if (!nz || !is_all_zero(nla_data(nla), expected_len)) { + if (!nz || !is_all_zero(nla_data(nla), nla_len(nla))) { attrs |= 1 << type; a[type] = nla; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ido Schimmel idosch@mellanox.com
[ Upstream commit f97f4dd8b3bb9d0993d2491e0f22024c68109184 ]
IPv4 routing tables are flushed in two cases:
1. In response to events in the netdev and inetaddr notification chains 2. When a network namespace is being dismantled
In both cases only routes associated with a dead nexthop group are flushed. However, a nexthop group will only be marked as dead in case it is populated with actual nexthops using a nexthop device. This is not the case when the route in question is an error route (e.g., 'blackhole', 'unreachable').
Therefore, when a network namespace is being dismantled such routes are not flushed and leaked [1].
To reproduce: # ip netns add blue # ip -n blue route add unreachable 192.0.2.0/24 # ip netns del blue
Fix this by not skipping error routes that are not marked with RTNH_F_DEAD when flushing the routing tables.
To prevent the flushing of such routes in case #1, add a parameter to fib_table_flush() that indicates if the table is flushed as part of namespace dismantle or not.
Note that this problem does not exist in IPv6 since error routes are associated with the loopback device.
[1] unreferenced object 0xffff888066650338 (size 56): comm "ip", pid 1206, jiffies 4294786063 (age 26.235s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 b0 1c 62 61 80 88 ff ff ..........ba.... e8 8b a1 64 80 88 ff ff 00 07 00 08 fe 00 00 00 ...d............ backtrace: [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220 [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20 [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380 [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690 [<0000000014f62875>] netlink_sendmsg+0x929/0xe10 [<00000000bac9d967>] sock_sendmsg+0xc8/0x110 [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0 [<000000002e94f880>] __sys_sendmsg+0xf7/0x250 [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610 [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [<000000003a8b605b>] 0xffffffffffffffff unreferenced object 0xffff888061621c88 (size 48): comm "ip", pid 1206, jiffies 4294786063 (age 26.235s) hex dump (first 32 bytes): 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk 6b 6b 6b 6b 6b 6b 6b 6b d8 8e 26 5f 80 88 ff ff kkkkkkkk..&_.... backtrace: [<00000000733609e3>] fib_table_insert+0x978/0x1500 [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220 [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20 [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380 [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690 [<0000000014f62875>] netlink_sendmsg+0x929/0xe10 [<00000000bac9d967>] sock_sendmsg+0xc8/0x110 [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0 [<000000002e94f880>] __sys_sendmsg+0xf7/0x250 [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610 [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [<000000003a8b605b>] 0xffffffffffffffff
Fixes: 8cced9eff1d4 ("[NETNS]: Enable routing configuration in non-initial namespace.") Signed-off-by: Ido Schimmel idosch@mellanox.com Reviewed-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/ip_fib.h | 2 +- net/ipv4/fib_frontend.c | 4 ++-- net/ipv4/fib_trie.c | 14 ++++++++++++-- 3 files changed, 15 insertions(+), 5 deletions(-)
--- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -200,7 +200,7 @@ int fib_table_insert(struct fib_table *, int fib_table_delete(struct fib_table *, struct fib_config *); int fib_table_dump(struct fib_table *table, struct sk_buff *skb, struct netlink_callback *cb); -int fib_table_flush(struct fib_table *table); +int fib_table_flush(struct fib_table *table, bool flush_all); struct fib_table *fib_trie_unmerge(struct fib_table *main_tb); void fib_table_flush_external(struct fib_table *table); void fib_free_table(struct fib_table *tb); --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -187,7 +187,7 @@ static void fib_flush(struct net *net) struct fib_table *tb;
hlist_for_each_entry_safe(tb, tmp, head, tb_hlist) - flushed += fib_table_flush(tb); + flushed += fib_table_flush(tb, false); }
if (flushed) @@ -1277,7 +1277,7 @@ static void ip_fib_net_exit(struct net *
hlist_for_each_entry_safe(tb, tmp, head, tb_hlist) { hlist_del(&tb->tb_hlist); - fib_table_flush(tb); + fib_table_flush(tb, true); fib_free_table(tb); } } --- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -1806,7 +1806,7 @@ void fib_table_flush_external(struct fib }
/* Caller must hold RTNL. */ -int fib_table_flush(struct fib_table *tb) +int fib_table_flush(struct fib_table *tb, bool flush_all) { struct trie *t = (struct trie *)tb->tb_data; struct key_vector *pn = t->kv; @@ -1850,7 +1850,17 @@ int fib_table_flush(struct fib_table *tb hlist_for_each_entry_safe(fa, tmp, &n->leaf, fa_list) { struct fib_info *fi = fa->fa_info;
- if (!fi || !(fi->fib_flags & RTNH_F_DEAD)) { + if (!fi || + (!(fi->fib_flags & RTNH_F_DEAD) && + !fib_props[fa->fa_type].error)) { + slen = fa->fa_slen; + continue; + } + + /* Do not flush error routes if network namespace is + * not being dismantled + */ + if (!flush_all && fib_props[fa->fa_type].error) { slen = fa->fa_slen; continue; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit cd0c4e70fc0ccfa705cdf55efb27519ce9337a26 ]
Martin reported a set of filters don't work after changing from reclassify to continue. Looking into the code, it looks like skb protocol is not always fetched for each iteration of the filters. But, as demonstrated by Martin, TC actions could modify skb->protocol, for example act_vlan, this means we have to refetch skb protocol in each iteration, rather than using the one we fetch in the beginning of the loop.
This bug is _not_ introduced by commit 3b3ae880266d ("net: sched: consolidate tc_classify{,_compat}"), technically, if act_vlan is the only action that modifies skb protocol, then it is commit c7e2b9689ef8 ("sched: introduce vlan action") which introduced this bug.
Reported-by: Martin Olsson martin.olsson+netdev@sentorsecurity.com Cc: Jamal Hadi Salim jhs@mojatatu.com Cc: Jiri Pirko jiri@resnulli.us Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Acked-by: Jamal Hadi Salim jhs@mojatatu.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/sch_api.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -1823,7 +1823,6 @@ done: int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp, struct tcf_result *res, bool compat_mode) { - __be16 protocol = tc_skb_protocol(skb); #ifdef CONFIG_NET_CLS_ACT const struct tcf_proto *old_tp = tp; int limit = 0; @@ -1831,6 +1830,7 @@ int tc_classify(struct sk_buff *skb, con reclassify: #endif for (; tp; tp = rcu_dereference_bh(tp->next)) { + __be16 protocol = tc_skb_protocol(skb); int err;
if (tp->protocol != protocol && @@ -1857,7 +1857,6 @@ reset: }
tp = old_tp; - protocol = tc_skb_protocol(skb); goto reclassify; #endif }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yunjian Wang wangyunjian@huawei.com
[ Upstream commit 28c1382fa28f2e2d9d0d6f25ae879b5af2ecbd03 ]
The skb header should be set to ethernet header before using is_skb_forwardable. Because the ethernet header length has been considered in is_skb_forwardable(including dev->hard_header_len length).
To reproduce the issue: 1, add 2 ports on linux bridge br using following commands: $ brctl addbr br $ brctl addif br eth0 $ brctl addif br eth1 2, the MTU of eth0 and eth1 is 1500 3, send a packet(Data 1480, UDP 8, IP 20, Ethernet 14, VLAN 4) from eth0 to eth1
So the expect result is packet larger than 1500 cannot pass through eth0 and eth1. But currently, the packet passes through success, it means eth1's MTU limit doesn't take effect.
Fixes: f6367b4660dd ("bridge: use is_skb_forwardable in forward path") Cc: bridge@lists.linux-foundation.org Cc: Nkolay Aleksandrov nikolay@cumulusnetworks.com Cc: Roopa Prabhu roopa@cumulusnetworks.com Cc: Stephen Hemminger stephen@networkplumber.org Signed-off-by: Yunjian Wang wangyunjian@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bridge/br_forward.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/net/bridge/br_forward.c +++ b/net/bridge/br_forward.c @@ -39,10 +39,10 @@ static inline int should_deliver(const s
int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb) { + skb_push(skb, ETH_HLEN); if (!is_skb_forwardable(skb->dev, skb)) goto drop;
- skb_push(skb, ETH_HLEN); br_drop_fake_rtable(skb); skb_sender_cpu_clear(skb);
@@ -88,12 +88,11 @@ static void __br_deliver(const struct ne skb->dev = to->dev;
if (unlikely(netpoll_tx_running(to->br->dev))) { + skb_push(skb, ETH_HLEN); if (!is_skb_forwardable(skb->dev, skb)) kfree_skb(skb); - else { - skb_push(skb, ETH_HLEN); + else br_netpoll_send_skb(to, skb); - } return; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
commit 99d570da309813f67e9c741edeff55bafc6c1d5e upstream.
Enable CONFIG_MMC_SDHCI_IO_ACCESSORS so that SDHC controller specific register read and write APIs, if registered, can be used.
Signed-off-by: Vijay Viswanath vviswana@codeaurora.org Acked-by: Adrian Hunter adrian.hunter@intel.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Cc: Koen Vandeputte koen.vandeputte@ncentric.com Cc: Loic Poulain loic.poulain@linaro.org Signed-off-by: Georgi Djakov georgi.djakov@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/Kconfig | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/mmc/host/Kconfig +++ b/drivers/mmc/host/Kconfig @@ -409,6 +409,7 @@ config MMC_SDHCI_MSM tristate "Qualcomm SDHCI Controller Support" depends on ARCH_QCOM || (ARM && COMPILE_TEST) depends on MMC_SDHCI_PLTFM + select MMC_SDHCI_IO_ACCESSORS help This selects the Secure Digital Host Controller Interface (SDHCI) support present in Qualcomm SOCs. The controller supports
On 2/4/19 12:35, Greg Kroah-Hartman wrote:
4.4-stable review patch. If anyone has any objections, please let me know.
commit 99d570da309813f67e9c741edeff55bafc6c1d5e upstream.
Enable CONFIG_MMC_SDHCI_IO_ACCESSORS so that SDHC controller specific register read and write APIs, if registered, can be used.
Signed-off-by: Vijay Viswanath vviswana@codeaurora.org Acked-by: Adrian Hunter adrian.hunter@intel.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Cc: Koen Vandeputte koen.vandeputte@ncentric.com Cc: Loic Poulain loic.poulain@linaro.org Signed-off-by: Georgi Djakov georgi.djakov@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
drivers/mmc/host/Kconfig | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/mmc/host/Kconfig +++ b/drivers/mmc/host/Kconfig @@ -409,6 +409,7 @@ config MMC_SDHCI_MSM tristate "Qualcomm SDHCI Controller Support" depends on ARCH_QCOM || (ARM && COMPILE_TEST) depends on MMC_SDHCI_PLTFM
- select MMC_SDHCI_IO_ACCESSORS help This selects the Secure Digital Host Controller Interface (SDHCI) support present in Qualcomm SOCs. The controller supports
This patch is not needed in 4.4-stable. Please drop it.
Thanks, Georgi
On Mon, Feb 04, 2019 at 01:05:32PM +0200, Georgi Djakov wrote:
On 2/4/19 12:35, Greg Kroah-Hartman wrote:
4.4-stable review patch. If anyone has any objections, please let me know.
commit 99d570da309813f67e9c741edeff55bafc6c1d5e upstream.
Enable CONFIG_MMC_SDHCI_IO_ACCESSORS so that SDHC controller specific register read and write APIs, if registered, can be used.
Signed-off-by: Vijay Viswanath vviswana@codeaurora.org Acked-by: Adrian Hunter adrian.hunter@intel.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Cc: Koen Vandeputte koen.vandeputte@ncentric.com Cc: Loic Poulain loic.poulain@linaro.org Signed-off-by: Georgi Djakov georgi.djakov@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
drivers/mmc/host/Kconfig | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/mmc/host/Kconfig +++ b/drivers/mmc/host/Kconfig @@ -409,6 +409,7 @@ config MMC_SDHCI_MSM tristate "Qualcomm SDHCI Controller Support" depends on ARCH_QCOM || (ARM && COMPILE_TEST) depends on MMC_SDHCI_PLTFM
- select MMC_SDHCI_IO_ACCESSORS help This selects the Secure Digital Host Controller Interface (SDHCI) support present in Qualcomm SOCs. The controller supports
This patch is not needed in 4.4-stable. Please drop it.
Oops, sorry about that, now dropped, thanks.
greg k-h
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Max Schulze max.schulze@posteo.de
commit b81c2c33eab79dfd3650293b2227ee5c6036585c upstream.
Add new Motorola Tetra device id for Motorola Solutions TETRA PEI device
T: Bus=02 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 4 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=0cad ProdID=9016 Rev=24.16 S: Manufacturer=Motorola Solutions, Inc. S: Product=TETRA PEI interface C: #Ifs= 2 Cfg#= 1 Atr=80 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=usb_serial_simple I: If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=usb_serial_simple
Signed-off-by: Max Schulze max.schulze@posteo.de Cc: stable stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/serial/usb-serial-simple.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/usb/serial/usb-serial-simple.c +++ b/drivers/usb/serial/usb-serial-simple.c @@ -88,7 +88,8 @@ DEVICE(moto_modem, MOTO_IDS); /* Motorola Tetra driver */ #define MOTOROLA_TETRA_IDS() \ { USB_DEVICE(0x0cad, 0x9011) }, /* Motorola Solutions TETRA PEI */ \ - { USB_DEVICE(0x0cad, 0x9012) } /* MTP6550 */ + { USB_DEVICE(0x0cad, 0x9012) }, /* MTP6550 */ \ + { USB_DEVICE(0x0cad, 0x9016) } /* TPG2200 */ DEVICE(motorola_tetra, MOTOROLA_TETRA_IDS);
/* Novatel Wireless GPS driver */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Charles Yeh charlesyeh522@gmail.com
commit 4dcf9ddc9ad5ab649abafa98c5a4d54b1a33dabb upstream.
Add new PID to support PL2303TB (TYPE_HX)
Signed-off-by: Charles Yeh charlesyeh522@gmail.com Cc: stable stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/serial/pl2303.c | 1 + drivers/usb/serial/pl2303.h | 2 ++ 2 files changed, 3 insertions(+)
--- a/drivers/usb/serial/pl2303.c +++ b/drivers/usb/serial/pl2303.c @@ -47,6 +47,7 @@ static const struct usb_device_id id_tab { USB_DEVICE(PL2303_VENDOR_ID, PL2303_PRODUCT_ID_HCR331) }, { USB_DEVICE(PL2303_VENDOR_ID, PL2303_PRODUCT_ID_MOTOROLA) }, { USB_DEVICE(PL2303_VENDOR_ID, PL2303_PRODUCT_ID_ZTEK) }, + { USB_DEVICE(PL2303_VENDOR_ID, PL2303_PRODUCT_ID_TB) }, { USB_DEVICE(IODATA_VENDOR_ID, IODATA_PRODUCT_ID) }, { USB_DEVICE(IODATA_VENDOR_ID, IODATA_PRODUCT_ID_RSAQ5) }, { USB_DEVICE(ATEN_VENDOR_ID, ATEN_PRODUCT_ID) }, --- a/drivers/usb/serial/pl2303.h +++ b/drivers/usb/serial/pl2303.h @@ -13,6 +13,7 @@
#define PL2303_VENDOR_ID 0x067b #define PL2303_PRODUCT_ID 0x2303 +#define PL2303_PRODUCT_ID_TB 0x2304 #define PL2303_PRODUCT_ID_RSAQ2 0x04bb #define PL2303_PRODUCT_ID_DCU11 0x1234 #define PL2303_PRODUCT_ID_PHAROS 0xaaa0 @@ -25,6 +26,7 @@ #define PL2303_PRODUCT_ID_MOTOROLA 0x0307 #define PL2303_PRODUCT_ID_ZTEK 0xe1f1
+ #define ATEN_VENDOR_ID 0x0557 #define ATEN_VENDOR_ID2 0x0547 #define ATEN_PRODUCT_ID 0x2008
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kangjie Lu kjlu@umn.edu
commit 44fabd8cdaaa3acb80ad2bb3b5c61ae2136af661 upstream.
snd_pcm_lib_malloc_pages() may fail, so let's check its status and return its error code upstream.
Signed-off-by: Kangjie Lu kjlu@umn.edu Acked-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown broonie@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/soc/intel/atom/sst-mfld-platform-pcm.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/sound/soc/intel/atom/sst-mfld-platform-pcm.c +++ b/sound/soc/intel/atom/sst-mfld-platform-pcm.c @@ -398,7 +398,13 @@ static int sst_media_hw_params(struct sn struct snd_pcm_hw_params *params, struct snd_soc_dai *dai) { - snd_pcm_lib_malloc_pages(substream, params_buffer_bytes(params)); + int ret; + + ret = + snd_pcm_lib_malloc_pages(substream, + params_buffer_bytes(params)); + if (ret) + return ret; memset(substream->runtime->dma_area, 0, params_buffer_bytes(params)); return 0; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eugeniy Paltsev Eugeniy.Paltsev@synopsys.com
commit 3affbf0e154ee351add6fcc254c59c3f3947fa8f upstream.
So far we've mapped branches to "ijmp" which also counts conditional branches NOT taken. This makes us different from other architectures such as ARM which seem to be counting only taken branches.
So use "ijmptak" hardware condition which only counts (all jump instructions that are taken)
'ijmptak' event is available on both ARCompact and ARCv2 ISA based cores.
Signed-off-by: Eugeniy Paltsev Eugeniy.Paltsev@synopsys.com Cc: stable@vger.kernel.org Signed-off-by: Vineet Gupta vgupta@synopsys.com [vgupta: reworked changelog] Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arc/include/asm/perf_event.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/arc/include/asm/perf_event.h +++ b/arch/arc/include/asm/perf_event.h @@ -103,7 +103,8 @@ static const char * const arc_pmu_ev_hw_
/* counts condition */ [PERF_COUNT_HW_INSTRUCTIONS] = "iall", - [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = "ijmp", /* Excludes ZOL jumps */ + /* All jump instructions that are taken */ + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = "ijmptak", [PERF_COUNT_ARC_BPOK] = "bpok", /* NP-NT, PT-T, PNT-NT */ #ifdef CONFIG_ISA_ARCV2 [PERF_COUNT_HW_BRANCH_MISSES] = "bpmp",
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Borntraeger borntraeger@de.ibm.com
commit 03aa047ef2db4985e444af6ee1c1dd084ad9fb4c upstream.
Right now the early machine detection code check stsi 3.2.2 for "KVM" and set MACHINE_IS_VM if this is different. As the console detection uses diagnose 8 if MACHINE_IS_VM returns true this will crash Linux early for any non z/VM system that sets a different value than KVM. So instead of assuming z/VM, do not set any of MACHINE_IS_LPAR, MACHINE_IS_VM, or MACHINE_IS_KVM.
CC: stable@vger.kernel.org Reviewed-by: Heiko Carstens heiko.carstens@de.ibm.com Signed-off-by: Christian Borntraeger borntraeger@de.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/s390/kernel/early.c | 4 ++-- arch/s390/kernel/setup.c | 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-)
--- a/arch/s390/kernel/early.c +++ b/arch/s390/kernel/early.c @@ -224,10 +224,10 @@ static noinline __init void detect_machi if (stsi(vmms, 3, 2, 2) || !vmms->count) return;
- /* Running under KVM? If not we assume z/VM */ + /* Detect known hypervisors */ if (!memcmp(vmms->vm[0].cpi, "\xd2\xe5\xd4", 3)) S390_lowcore.machine_flags |= MACHINE_FLAG_KVM; - else + else if (!memcmp(vmms->vm[0].cpi, "\xa9\x61\xe5\xd4", 4)) S390_lowcore.machine_flags |= MACHINE_FLAG_VM; }
--- a/arch/s390/kernel/setup.c +++ b/arch/s390/kernel/setup.c @@ -833,6 +833,8 @@ void __init setup_arch(char **cmdline_p) pr_info("Linux is running under KVM in 64-bit mode\n"); else if (MACHINE_IS_LPAR) pr_info("Linux is running natively in 64-bit mode\n"); + else + pr_info("Linux is running as a guest in 64-bit mode\n");
/* Have one command line that is parsed and saved in /proc/cmdline */ /* boot_command_line has been already set up in early.c */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gerald Schaefer gerald.schaefer@de.ibm.com
commit b7cb707c373094ce4008d4a6ac9b6b366ec52da5 upstream.
smp_rescan_cpus() is called without the device_hotplug_lock, which can lead to a dedlock when a new CPU is found and immediately set online by a udev rule.
This was observed on an older kernel version, where the cpu_hotplug_begin() loop was still present, and it resulted in hanging chcpu and systemd-udev processes. This specific deadlock will not show on current kernels. However, there may be other possible deadlocks, and since smp_rescan_cpus() can still trigger a CPU hotplug operation, the device_hotplug_lock should be held.
For reference, this was the deadlock with the old cpu_hotplug_begin() loop:
chcpu (rescan) systemd-udevd
echo 1 > /sys/../rescan -> smp_rescan_cpus() -> (*) get_online_cpus() (increases refcount) -> smp_add_present_cpu() (new CPU found) -> register_cpu() -> device_add() -> udev "add" event triggered -----------> udev rule sets CPU online -> echo 1 > /sys/.../online -> lock_device_hotplug_sysfs() (this is missing in rescan path) -> device_online() -> (**) device_lock(new CPU dev) -> cpu_up() -> cpu_hotplug_begin() (loops until refcount == 0) -> deadlock with (*) -> bus_probe_device() -> device_attach() -> device_lock(new CPU dev) -> deadlock with (**)
Fix this by taking the device_hotplug_lock in the CPU rescan path.
Cc: stable@vger.kernel.org Signed-off-by: Gerald Schaefer gerald.schaefer@de.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/s390/kernel/smp.c | 4 ++++ drivers/s390/char/sclp_config.c | 2 ++ 2 files changed, 6 insertions(+)
--- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -1152,7 +1152,11 @@ static ssize_t __ref rescan_store(struct { int rc;
+ rc = lock_device_hotplug_sysfs(); + if (rc) + return rc; rc = smp_rescan_cpus(); + unlock_device_hotplug(); return rc ? rc : count; } static DEVICE_ATTR(rescan, 0200, NULL, rescan_store); --- a/drivers/s390/char/sclp_config.c +++ b/drivers/s390/char/sclp_config.c @@ -43,7 +43,9 @@ static void sclp_cpu_capability_notify(s
static void __ref sclp_cpu_change_notify(struct work_struct *work) { + lock_device_hotplug(); smp_rescan_cpus(); + unlock_device_hotplug(); }
static void sclp_conf_receiver_fn(struct evbuf_header *evbuf)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gustavo A. R. Silva gustavo@embeddedor.com
commit 701956d4018e5d5438570e39e8bda47edd32c489 upstream.
ipcnum is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/char/mwave/mwavedd.c:299 mwave_ioctl() warn: potential spectre issue 'pDrvData->IPCs' [w] (local cap)
Fix this by sanitizing ipcnum before using it to index pDrvData->IPCs.
Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/char/mwave/mwavedd.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/char/mwave/mwavedd.c +++ b/drivers/char/mwave/mwavedd.c @@ -59,6 +59,7 @@ #include <linux/mutex.h> #include <linux/delay.h> #include <linux/serial_8250.h> +#include <linux/nospec.h> #include "smapi.h" #include "mwavedd.h" #include "3780i.h" @@ -289,6 +290,8 @@ static long mwave_ioctl(struct file *fil ipcnum); return -EINVAL; } + ipcnum = array_index_nospec(ipcnum, + ARRAY_SIZE(pDrvData->IPCs)); PRINTK_3(TRACE_MWAVE, "mwavedd::mwave_ioctl IOCTL_MW_REGISTER_IPC" " ipcnum %x entry usIntCount %x\n", @@ -317,6 +320,8 @@ static long mwave_ioctl(struct file *fil " Invalid ipcnum %x\n", ipcnum); return -EINVAL; } + ipcnum = array_index_nospec(ipcnum, + ARRAY_SIZE(pDrvData->IPCs)); PRINTK_3(TRACE_MWAVE, "mwavedd::mwave_ioctl IOCTL_MW_GET_IPC" " ipcnum %x, usIntCount %x\n", @@ -383,6 +388,8 @@ static long mwave_ioctl(struct file *fil ipcnum); return -EINVAL; } + ipcnum = array_index_nospec(ipcnum, + ARRAY_SIZE(pDrvData->IPCs)); mutex_lock(&mwave_mutex); if (pDrvData->IPCs[ipcnum].bIsEnabled == TRUE) { pDrvData->IPCs[ipcnum].bIsEnabled = FALSE;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Straube straube.linux@gmail.com
commit 5f74a8cbb38d10615ed46bc3e37d9a4c9af8045a upstream.
This device was added to the stand-alone driver on github. Add it to the staging driver as well.
Link: https://github.com/lwfinger/rtl8188eu/commit/a0619a07cd1e Signed-off-by: Michael Straube straube.linux@gmail.com Acked-by: Larry Finger Larry.Finger@lwfinger.net Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/staging/rtl8188eu/os_dep/usb_intf.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/staging/rtl8188eu/os_dep/usb_intf.c +++ b/drivers/staging/rtl8188eu/os_dep/usb_intf.c @@ -47,6 +47,7 @@ static struct usb_device_id rtw_usb_id_t {USB_DEVICE(0x2001, 0x330F)}, /* DLink DWA-125 REV D1 */ {USB_DEVICE(0x2001, 0x3310)}, /* Dlink DWA-123 REV D1 */ {USB_DEVICE(0x2001, 0x3311)}, /* DLink GO-USB-N150 REV B1 */ + {USB_DEVICE(0x2001, 0x331B)}, /* D-Link DWA-121 rev B1 */ {USB_DEVICE(0x2357, 0x010c)}, /* TP-Link TL-WN722N v2 */ {USB_DEVICE(0x0df6, 0x0076)}, /* Sitecom N150 v2 */ {USB_DEVICE(USB_VENDER_ID_REALTEK, 0xffef)}, /* Rosewill RNX-N150NUB */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit 27cfb3a53be46a54ec5e0bd04e51995b74c90343 upstream.
Some tty line disciplines do not have a receive buf callback, so properly check for that before calling it. If they do not have this callback, just eat the character quietly, as we can't fail this call.
Reported-by: Jann Horn jannh@google.com Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/tty/tty_io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -2297,7 +2297,8 @@ static int tiocsti(struct tty_struct *tt return -EFAULT; tty_audit_tiocsti(tty, ch); ld = tty_ldisc_ref_wait(tty); - ld->ops->receive_buf(tty, &ch, &mbz, 1); + if (ld->ops->receive_buf) + ld->ops->receive_buf(tty, &ch, &mbz, 1); tty_ldisc_deref(ld); return 0; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul Fulghum paulkf@microgate.com
commit fc01d8c61ce02c034e67378cd3e645734bc18c8c upstream.
Fix __might_sleep warning[1] in tty/n_hdlc.c read due to copy_to_user call while current is TASK_INTERRUPTIBLE. This is a false positive since the code path does not depend on current state remaining TASK_INTERRUPTIBLE. The loop breaks out and sets TASK_RUNNING after calling copy_to_user.
This patch supresses the warning by setting TASK_RUNNING before calling copy_to_user.
[1] https://syzkaller.appspot.com/bug?id=17d5de7f1fcab794cb8c40032f893f52de89932...
Signed-off-by: Paul Fulghum paulkf@microgate.com Reported-by: syzbot syzbot+c244af085a0159d22879@syzkaller.appspotmail.com Cc: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Cc: Alan Cox alan@lxorguk.ukuu.org.uk Cc: stable stable@vger.kernel.org Acked-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/tty/n_hdlc.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/tty/n_hdlc.c +++ b/drivers/tty/n_hdlc.c @@ -598,6 +598,7 @@ static ssize_t n_hdlc_tty_read(struct tt /* too large for caller's buffer */ ret = -EOVERFLOW; } else { + __set_current_state(TASK_RUNNING); if (copy_to_user(buf, rbuf->buf, rbuf->count)) ret = -EFAULT; else
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Shilovsky pshilov@microsoft.com
commit acc58d0bab55a50e02c25f00bd6a210ee121595f upstream.
When doing MTU i/o we need to leave some credits for possible reopen requests and other operations happening in parallel. Currently we leave 1 credit which is not enough even for reopen only: we need at least 2 credits if durable handle reconnect fails. Also there may be other operations at the same time including compounding ones which require 3 credits at a time each. Fix this by leaving 8 credits which is big enough to cover most scenarios.
Was able to reproduce this when server was configured to give out fewer credits than usual.
The proper fix would be to reconnect a file handle first and then obtain credits for an MTU request but this leads to bigger code changes and should happen in other patches.
Cc: stable@vger.kernel.org Signed-off-by: Pavel Shilovsky pshilov@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/smb2ops.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -143,14 +143,14 @@ smb2_wait_mtu_credits(struct TCP_Server_
scredits = server->credits; /* can deadlock with reopen */ - if (scredits == 1) { + if (scredits <= 8) { *num = SMB2_MAX_BUFFER_SIZE; *credits = 0; break; }
- /* leave one credit for a possible reopen */ - scredits--; + /* leave some credits for reopen and other ops */ + scredits -= 8; *num = min_t(unsigned int, size, scredits * SMB2_MAX_BUFFER_SIZE);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Panfil tom@steelseries.com
commit fe2bfd0d40c935763812973ce15f5764f1c12833 upstream.
Add support for the SteelSeries Stratus Duo, a wireless Xbox 360 controller. The Stratus Duo ships with a USB dongle to enable wireless connectivity, but it can also function as a wired controller by connecting it directly to a PC via USB, hence the need for two USD PIDs. 0x1430 is the dongle, and 0x1431 is the controller.
Signed-off-by: Tom Panfil tom@steelseries.com Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/input/joystick/xpad.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/input/joystick/xpad.c +++ b/drivers/input/joystick/xpad.c @@ -255,6 +255,8 @@ static const struct xpad_device { { 0x0f30, 0x0202, "Joytech Advanced Controller", 0, XTYPE_XBOX }, { 0x0f30, 0x8888, "BigBen XBMiniPad Controller", 0, XTYPE_XBOX }, { 0x102c, 0xff0c, "Joytech Wireless Advanced Controller", 0, XTYPE_XBOX }, + { 0x1038, 0x1430, "SteelSeries Stratus Duo", 0, XTYPE_XBOX360 }, + { 0x1038, 0x1431, "SteelSeries Stratus Duo", 0, XTYPE_XBOX360 }, { 0x11c9, 0x55f0, "Nacon GC-100XF", 0, XTYPE_XBOX360 }, { 0x12ab, 0x0004, "Honey Bee Xbox360 dancepad", MAP_DPAD_TO_BUTTONS, XTYPE_XBOX360 }, { 0x12ab, 0x0301, "PDP AFTERGLOW AX.1", 0, XTYPE_XBOX360 }, @@ -431,6 +433,7 @@ static const struct usb_device_id xpad_t XPAD_XBOXONE_VENDOR(0x0e6f), /* 0x0e6f X-Box One controllers */ XPAD_XBOX360_VENDOR(0x0f0d), /* Hori Controllers */ XPAD_XBOXONE_VENDOR(0x0f0d), /* Hori Controllers */ + XPAD_XBOX360_VENDOR(0x1038), /* SteelSeries Controllers */ XPAD_XBOX360_VENDOR(0x11c9), /* Nacon GC100XF */ XPAD_XBOX360_VENDOR(0x12ab), /* X-Box 360 dance pads */ XPAD_XBOX360_VENDOR(0x1430), /* RedOctane X-Box 360 controllers */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Popov alex.popov@linux.com
commit 5cc244a20b86090c087073c124284381cdf47234 upstream.
The single-step debugging of KVM guests on x86 is broken: if we run gdb 'stepi' command at the breakpoint when the guest interrupts are enabled, RIP always jumps to native_apic_mem_write(). Then other nasty effects follow.
Long investigation showed that on Jun 7, 2017 the commit c8401dda2f0a00cd25c0 ("KVM: x86: fix singlestepping over syscall") introduced the kvm_run.debug corruption: kvm_vcpu_do_singlestep() can be called without X86_EFLAGS_TF set.
Let's fix it. Please consider that for -stable.
Signed-off-by: Alexander Popov alex.popov@linux.com Cc: stable@vger.kernel.org Fixes: c8401dda2f0a00cd25c0 ("KVM: x86: fix singlestepping over syscall") Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/x86.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5524,8 +5524,7 @@ restart: toggle_interruptibility(vcpu, ctxt->interruptibility); vcpu->arch.emulate_regs_need_sync_to_vcpu = false; kvm_rip_write(vcpu, ctxt->eip); - if (r == EMULATE_DONE && - (ctxt->tf || (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP))) + if (r == EMULATE_DONE && ctxt->tf) kvm_vcpu_do_singlestep(vcpu, &r); if (!ctxt->have_exception || exception_type(ctxt->exception.vector) == EXCPT_TRAP)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Drake drake@endlessm.com
commit 7e6fc2f50a3197d0e82d1c0e86282976c9e6c8a4 upstream.
The outb() function takes parameters value and port, in that order. Fix the parameters used in the kalsr i8254 fallback code.
Fixes: 5bfce5ef55cb ("x86, kaslr: Provide randomness functions") Signed-off-by: Daniel Drake drake@endlessm.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: bp@alien8.de Cc: hpa@zytor.com Cc: linux@endlessm.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190107034024.15005-1-drake@endlessm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/boot/compressed/aslr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/boot/compressed/aslr.c +++ b/arch/x86/boot/compressed/aslr.c @@ -25,8 +25,8 @@ static inline u16 i8254(void) u16 status, timer;
do { - outb(I8254_PORT_CONTROL, - I8254_CMD_READBACK | I8254_SELECT_COUNTER0); + outb(I8254_CMD_READBACK | I8254_SELECT_COUNTER0, + I8254_PORT_CONTROL); status = inb(I8254_PORT_COUNTER0); timer = inb(I8254_PORT_COUNTER0); timer |= inb(I8254_PORT_COUNTER0) << 8;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Manfred Schlaegl manfred.schlaegl@ginzinger.com
commit 7b12c8189a3dc50638e7d53714c88007268d47ef upstream.
This patch revert commit 7da11ba5c506 ("can: dev: __can_get_echo_skb(): print error message, if trying to echo non existing skb")
After introduction of this change we encountered following new error message on various i.MX plattforms (flexcan):
| flexcan 53fc8000.can can0: __can_get_echo_skb: BUG! Trying to echo non | existing skb: can_priv::echo_skb[0]
The introduction of the message was a mistake because priv->echo_skb[idx] = NULL is a perfectly valid in following case: If CAN_RAW_LOOPBACK is disabled (setsockopt) in applications, the pkt_type of the tx skb's given to can_put_echo_skb is set to PACKET_LOOPBACK. In this case can_put_echo_skb will not set priv->echo_skb[idx]. It is therefore kept NULL.
As additional argument for revert: The order of check and usage of idx was changed. idx is used to access an array element before checking it's boundaries.
Signed-off-by: Manfred Schlaegl manfred.schlaegl@ginzinger.com Fixes: 7da11ba5c506 ("can: dev: __can_get_echo_skb(): print error message, if trying to echo non existing skb") Cc: linux-stable stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/can/dev.c | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-)
--- a/drivers/net/can/dev.c +++ b/drivers/net/can/dev.c @@ -426,8 +426,6 @@ EXPORT_SYMBOL_GPL(can_put_echo_skb); struct sk_buff *__can_get_echo_skb(struct net_device *dev, unsigned int idx, u8 *len_ptr) { struct can_priv *priv = netdev_priv(dev); - struct sk_buff *skb = priv->echo_skb[idx]; - struct canfd_frame *cf;
if (idx >= priv->echo_skb_max) { netdev_err(dev, "%s: BUG! Trying to access can_priv::echo_skb out of bounds (%u/max %u)\n", @@ -435,20 +433,21 @@ struct sk_buff *__can_get_echo_skb(struc return NULL; }
- if (!skb) { - netdev_err(dev, "%s: BUG! Trying to echo non existing skb: can_priv::echo_skb[%u]\n", - __func__, idx); - return NULL; - } + if (priv->echo_skb[idx]) { + /* Using "struct canfd_frame::len" for the frame + * length is supported on both CAN and CANFD frames. + */ + struct sk_buff *skb = priv->echo_skb[idx]; + struct canfd_frame *cf = (struct canfd_frame *)skb->data; + u8 len = cf->len;
- /* Using "struct canfd_frame::len" for the frame - * length is supported on both CAN and CANFD frames. - */ - cf = (struct canfd_frame *)skb->data; - *len_ptr = cf->len; - priv->echo_skb[idx] = NULL; + *len_ptr = len; + priv->echo_skb[idx] = NULL; + + return skb; + }
- return skb; + return NULL; }
/*
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oliver Hartkopp socketcan@hartkopp.net
commit 93171ba6f1deffd82f381d36cb13177872d023f6 upstream.
Kyungtae Kim detected a potential integer overflow in bcm_[rx|tx]_setup() when the conversion into ktime multiplies the given value with NSEC_PER_USEC (1000).
Reference: https://marc.info/?l=linux-can&m=154732118819828&w=2
Add a check for the given tv_usec, so that the value stays below one second. Additionally limit the tv_sec value to a reasonable value for CAN related use-cases of 400 days and ensure all values to be positive.
Reported-by: Kyungtae Kim kt0755@gmail.com Tested-by: Oliver Hartkopp socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp socketcan@hartkopp.net Cc: linux-stable stable@vger.kernel.org # versions 2.6.26 to 4.7 Tested-by: Kyungtae Kim kt0755@gmail.com Acked-by: Andre Naujoks nautsch2@gmail.com Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/can/bcm.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+)
--- a/net/can/bcm.c +++ b/net/can/bcm.c @@ -67,6 +67,9 @@ */ #define MAX_NFRAMES 256
+/* limit timers to 400 days for sending/timeouts */ +#define BCM_TIMER_SEC_MAX (400 * 24 * 60 * 60) + /* use of last_frames[index].can_dlc */ #define RX_RECV 0x40 /* received data for this element */ #define RX_THR 0x80 /* element not been sent due to throttle feature */ @@ -136,6 +139,22 @@ static inline ktime_t bcm_timeval_to_kti return ktime_set(tv.tv_sec, tv.tv_usec * NSEC_PER_USEC); }
+/* check limitations for timeval provided by user */ +static bool bcm_is_invalid_tv(struct bcm_msg_head *msg_head) +{ + if ((msg_head->ival1.tv_sec < 0) || + (msg_head->ival1.tv_sec > BCM_TIMER_SEC_MAX) || + (msg_head->ival1.tv_usec < 0) || + (msg_head->ival1.tv_usec >= USEC_PER_SEC) || + (msg_head->ival2.tv_sec < 0) || + (msg_head->ival2.tv_sec > BCM_TIMER_SEC_MAX) || + (msg_head->ival2.tv_usec < 0) || + (msg_head->ival2.tv_usec >= USEC_PER_SEC)) + return true; + + return false; +} + #define CFSIZ sizeof(struct can_frame) #define OPSIZ sizeof(struct bcm_op) #define MHSIZ sizeof(struct bcm_msg_head) @@ -855,6 +874,10 @@ static int bcm_tx_setup(struct bcm_msg_h if (msg_head->nframes < 1 || msg_head->nframes > MAX_NFRAMES) return -EINVAL;
+ /* check timeval limitations */ + if ((msg_head->flags & SETTIMER) && bcm_is_invalid_tv(msg_head)) + return -EINVAL; + /* check the given can_id */ op = bcm_find_op(&bo->tx_ops, msg_head->can_id, ifindex);
@@ -1020,6 +1043,10 @@ static int bcm_rx_setup(struct bcm_msg_h (!(msg_head->can_id & CAN_RTR_FLAG)))) return -EINVAL;
+ /* check timeval limitations */ + if ((msg_head->flags & SETTIMER) && bcm_is_invalid_tv(msg_head)) + return -EINVAL; + /* check the given can_id */ op = bcm_find_op(&bo->rx_ops, msg_head->can_id, ifindex); if (op) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nicolas Pitre nicolas.pitre@linaro.org
commit 0c9b1965faddad7534b6974b5b36c4ad37998f8e upstream.
User space using poll() on /dev/vcs devices are not awaken when a screen size change occurs. Let's fix that.
Signed-off-by: Nicolas Pitre nico@linaro.org Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/tty/vt/vt.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -958,6 +958,7 @@ static int vc_do_resize(struct tty_struc if (CON_IS_VISIBLE(vc)) update_screen(vc); vt_event_post(VT_EVENT_RESIZE, vc->vc_num, vc->vc_num); + notify_update(vc); return err; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit 3d20c6246690219881786de10d2dda93f616d0ac ]
Path passed to libdw for unwinding doesn't include symfs path if specified, so unwinding fails because ELF file is not found.
Similar to unwinding with libunwind, pass symsrc_filename instead of long_name. If there is no symsrc_filename, fallback to long_name.
Signed-off-by: Martin Vuille jpmv27@aim.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: David Ahern dsahern@gmail.com Cc: Jiri Olsa jolsa@kernel.org Cc: Namhyung Kim namhyung@kernel.org Cc: Wang Nan wangnan0@huawei.com Link: http://lkml.kernel.org/r/20180211212420.18388-1-jpmv27@aim.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/unwind-libdw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c index 60edec383281..880e8db32484 100644 --- a/tools/perf/util/unwind-libdw.c +++ b/tools/perf/util/unwind-libdw.c @@ -47,7 +47,7 @@ static int __report_module(struct addr_location *al, u64 ip,
if (!mod) mod = dwfl_report_elf(ui->dwfl, dso->short_name, - dso->long_name, -1, al->map->start, + (dso->symsrc_filename ? dso->symsrc_filename : dso->long_name), -1, al->map->start, false);
return mod && dwfl_addrmodule(ui->dwfl, ip) == mod ? 0 : -1;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit 1fe627da30331024f453faef04d500079b901107 ]
libdwfl parses an ELF file itself and creates mappings for the individual sections. perf on the other hand sees raw mmap events which represent individual sections. When we encounter an address pointing into a mapping with pgoff != 0, we must take that into account and report the file at the non-offset base address.
This fixes unwinding with libdwfl in some cases. E.g. for a file like:
```
using namespace std;
mutex g_mutex;
double worker() { lock_guard<mutex> guard(g_mutex); uniform_real_distribution<double> uniform(-1E5, 1E5); default_random_engine engine; double s = 0; for (int i = 0; i < 1000; ++i) { s += norm(complex<double>(uniform(engine), uniform(engine))); } cout << s << endl; return s; }
int main() { vector<std::future<double>> results; for (int i = 0; i < 10000; ++i) { results.push_back(async(launch::async, worker)); } return 0; } ```
Compile it with `g++ -g -O2 -lpthread cpp-locking.cpp -o cpp-locking`, then record it with `perf record --call-graph dwarf -e sched:sched_switch`.
When you analyze it with `perf script` and libunwind, you should see:
``` cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120 ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux) 7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so) 7f38e4252d0b arena_get2.part.4+0x2fb (/usr/lib/libc-2.28.so) 7f38e4255b1c tcache_init.part.6+0xec (/usr/lib/libc-2.28.so) 7f38e42569e5 __GI___libc_malloc+0x115 (inlined) 7f38e4241790 __GI__IO_file_doallocate+0x90 (inlined) 7f38e424fbbf __GI__IO_doallocbuf+0x4f (inlined) 7f38e424ee47 __GI__IO_file_overflow+0x197 (inlined) 7f38e424df36 _IO_new_file_xsputn+0x116 (inlined) 7f38e4242bfb __GI__IO_fwrite+0xdb (inlined) 7f38e463fa6d std::basic_streambuf<char, std::char_traits<char> >::sputn(char const*, long)+0x1cd (inlined) 7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> >::_M_put(char const*, long)+0x1cd (inlined) 7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> > std::__write<char>(std::ostreambuf_iterator<char, std::char_traits<char> >, char const*, int)+0x1cd (inlined) 7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_float<double>(std::ostreambuf_iterator<c> 7f38e464bd70 std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, double) const+0x90 (inl> 7f38e464bd70 std::ostream& std::ostream::_M_insert<double>(double)+0x90 (/usr/lib/libstdc++.so.6.0.25) 563b9cb502f7 std::ostream::operator<<(double)+0xb7 (inlined) 563b9cb502f7 worker()+0xb7 (/ssd/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-locking/cpp-locking) 563b9cb506fb double std::__invoke_impl<double, double (*)()>(std::__invoke_other, double (*&&)())+0x2b (inlined) 563b9cb506fb std::__invoke_result<double (*)()>::type std::__invoke<double (*)()>(double (*&&)())+0x2b (inlined) 563b9cb506fb decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker<std::tuple<double (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>)+0x2b (inlined) 563b9cb506fb std::thread::_Invoker<std::tuple<double (*)()> >::operator()()+0x2b (inlined) 563b9cb506fb std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<double>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<double (*)()> >, dou> 563b9cb506fb std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptrstd::__future_ 563b9cb507e8 std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const+0x28 (inlined) 563b9cb507e8 std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*)+0x28 (/ssd/milian/> 7f38e46d24fe __pthread_once_slow+0xbe (/usr/lib/libpthread-2.28.so) 563b9cb51149 __gthread_once+0xe9 (inlined) 563b9cb51149 void std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*)> 563b9cb51149 std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool)+0xe9 (inlined) 563b9cb51149 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread::_Invoker<std::tuple<double (*)()> >&&)::{lambda()#1}::op> 563b9cb51149 void std::__invoke_impl<void, std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread::_Invoker<std::tuple<double> 563b9cb51149 std::__invoke_result<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread::_Invoker<std::tuple<double (*)()> >> 563b9cb51149 decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker<std::tuple<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_> 563b9cb51149 std::thread::_Invoker<std::tuple<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread::_Invoker<std::tuple<dou> 563b9cb51149 std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread> 7f38e45f0062 execute_native_thread_routine+0x12 (/usr/lib/libstdc++.so.6.0.25) 7f38e46caa9c start_thread+0xfc (/usr/lib/libpthread-2.28.so) 7f38e42ccb22 __GI___clone+0x42 (inlined) ```
Before this patch, using libdwfl, you would see:
``` cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120 ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux) 7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so) a041161e77950c5c [unknown] ([unknown]) ```
With this patch applied, we get a bit further in unwinding:
``` cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120 ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux) ffffffffb18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux) 7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so) 7f38e4252d0b arena_get2.part.4+0x2fb (/usr/lib/libc-2.28.so) 7f38e4255b1c tcache_init.part.6+0xec (/usr/lib/libc-2.28.so) 7f38e42569e5 __GI___libc_malloc+0x115 (inlined) 7f38e4241790 __GI__IO_file_doallocate+0x90 (inlined) 7f38e424fbbf __GI__IO_doallocbuf+0x4f (inlined) 7f38e424ee47 __GI__IO_file_overflow+0x197 (inlined) 7f38e424df36 _IO_new_file_xsputn+0x116 (inlined) 7f38e4242bfb __GI__IO_fwrite+0xdb (inlined) 7f38e463fa6d std::basic_streambuf<char, std::char_traits<char> >::sputn(char const*, long)+0x1cd (inlined) 7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> >::_M_put(char const*, long)+0x1cd (inlined) 7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> > std::__write<char>(std::ostreambuf_iterator<char, std::char_traits<char> >, char const*, int)+0x1cd (inlined) 7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_float<double>(std::ostreambuf_iterator<c> 7f38e464bd70 std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, double) const+0x90 (inl> 7f38e464bd70 std::ostream& std::ostream::_M_insert<double>(double)+0x90 (/usr/lib/libstdc++.so.6.0.25) 563b9cb502f7 std::ostream::operator<<(double)+0xb7 (inlined) 563b9cb502f7 worker()+0xb7 (/ssd/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-locking/cpp-locking) 6eab825c1ee3e4ff [unknown] ([unknown]) ```
Note that the backtrace is still stopping too early, when compared to the nice results obtained via libunwind. It's unclear so far what the reason for that is.
Committer note:
Further comment by Milian on the thread started on the Link: tag below:
--- The remaining issue is due to a bug in elfutils:
https://sourceware.org/ml/elfutils-devel/2018-q4/msg00089.html
With both patches applied, libunwind and elfutils produce the same output for the above scenario. ---
Signed-off-by: Milian Wolff milian.wolff@kdab.com Acked-by: Jiri Olsa jolsa@kernel.org Link: http://lkml.kernel.org/r/20181029141644.3907-1-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/unwind-libdw.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c index 880e8db32484..bf5ee8906fb2 100644 --- a/tools/perf/util/unwind-libdw.c +++ b/tools/perf/util/unwind-libdw.c @@ -41,13 +41,13 @@ static int __report_module(struct addr_location *al, u64 ip, Dwarf_Addr s;
dwfl_module_info(mod, NULL, &s, NULL, NULL, NULL, NULL, NULL); - if (s != al->map->start) + if (s != al->map->start - al->map->pgoff) mod = 0; }
if (!mod) mod = dwfl_report_elf(ui->dwfl, dso->short_name, - (dso->symsrc_filename ? dso->symsrc_filename : dso->long_name), -1, al->map->start, + (dso->symsrc_filename ? dso->symsrc_filename : dso->long_name), -1, al->map->start - al->map->pgoff, false);
return mod && dwfl_addrmodule(ui->dwfl, ip) == mod ? 0 : -1;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 8208d1708b88b412ca97f50a6d951242c88cbbac upstream.
The way we allocate events works fine in most cases, except when multiple PCI devices share an ITS-visible DevID, and that one of them is trying to use MultiMSI allocation.
In that case, our allocation is not guaranteed to be zero-based anymore, and we have to make sure we allocate it on a boundary that is compatible with the PCI Multi-MSI constraints.
Fix this by allocating the full region upfront instead of iterating over the number of MSIs. MSI-X are always allocated one by one, so this shouldn't change anything on that front.
Fixes: b48ac83d6bbc2 ("irqchip: GICv3: ITS: MSI support") Cc: stable@vger.kernel.org Reported-by: Ard Biesheuvel ard.biesheuvel@linaro.org Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Marc Zyngier marc.zyngier@arm.com [ardb: rebased onto v4.9.153, should apply cleanly onto v4.4.y as well] Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org
--- drivers/irqchip/irq-gic-v3-its.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-)
--- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -1230,13 +1230,14 @@ static void its_free_device(struct its_d kfree(its_dev); }
-static int its_alloc_device_irq(struct its_device *dev, irq_hw_number_t *hwirq) +static int its_alloc_device_irq(struct its_device *dev, int nvecs, irq_hw_number_t *hwirq) { int idx;
- idx = find_first_zero_bit(dev->event_map.lpi_map, - dev->event_map.nr_lpis); - if (idx == dev->event_map.nr_lpis) + idx = bitmap_find_free_region(dev->event_map.lpi_map, + dev->event_map.nr_lpis, + get_count_order(nvecs)); + if (idx < 0) return -ENOSPC;
*hwirq = dev->event_map.lpi_base + idx; @@ -1317,20 +1318,20 @@ static int its_irq_domain_alloc(struct i int err; int i;
- for (i = 0; i < nr_irqs; i++) { - err = its_alloc_device_irq(its_dev, &hwirq); - if (err) - return err; + err = its_alloc_device_irq(its_dev, nr_irqs, &hwirq); + if (err) + return err;
- err = its_irq_gic_domain_alloc(domain, virq + i, hwirq); + for (i = 0; i < nr_irqs; i++) { + err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i); if (err) return err;
irq_domain_set_hwirq_and_chip(domain, virq + i, - hwirq, &its_irq_chip, its_dev); + hwirq + i, &its_irq_chip, its_dev); pr_debug("ID:%d pID:%d vID:%d\n", - (int)(hwirq - its_dev->event_map.lpi_base), - (int) hwirq, virq + i); + (int)(hwirq + i - its_dev->event_map.lpi_base), + (int)(hwirq + i), virq + i); }
return 0;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shaokun Zhang zhangshaokun@hisilicon.com
commit 20c27a4270c775d7ed661491af8ac03264d60fc6 upstream.
__sync_icache_dcache unconditionally skips the cache maintenance for anonymous pages, under the assumption that flushing is only required in the presence of D-side aliases [see 7249b79f6b4cc ("arm64: Do not flush the D-cache for anonymous pages")].
Unfortunately, this breaks migration of anonymous pages holding self-modifying code, where userspace cannot be reasonably expected to reissue maintenance instructions in response to a migration.
This patch fixes the problem by removing the broken page_mapping(page) check from the cache syncing code, otherwise we may end up fetching and executing stale instructions from the PoU.
Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com Cc: Mark Rutland mark.rutland@arm.com Cc: stable@vger.kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Shaokun Zhang zhangshaokun@hisilicon.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Sasha Levin sasha.levin@oracle.com Cc: Amanieu d'Antras amanieu@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/mm/flush.c | 4 ---- 1 file changed, 4 deletions(-)
--- a/arch/arm64/mm/flush.c +++ b/arch/arm64/mm/flush.c @@ -70,10 +70,6 @@ void __sync_icache_dcache(pte_t pte, uns { struct page *page = pte_page(pte);
- /* no flushing needed for anonymous pages */ - if (!page_mapping(page)) - return; - if (!test_and_set_bit(PG_dcache_clean, &page->flags)) { __flush_dcache_area(page_address(page), PAGE_SIZE << compound_order(page));
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pan Bian bianpan2016@163.com
commit 0ea295dd853e0879a9a30ab61f923c26be35b902 upstream.
The function truncate_node frees the page with f2fs_put_page. However, the page index is read after that. So, the patch reads the index before freeing the page.
Fixes: bf39c00a9a7f ("f2fs: drop obsolete node page when it is truncated") Cc: stable@vger.kernel.org Signed-off-by: Pan Bian bianpan2016@163.com Reviewed-by: Chao Yu yuchao0@huawei.com Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/f2fs/node.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -590,6 +590,7 @@ static void truncate_node(struct dnode_o { struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode); struct node_info ni; + pgoff_t index;
get_node_info(sbi, dn->nid, &ni); if (dn->inode->i_blocks == 0) { @@ -613,10 +614,11 @@ invalidate: clear_node_page_dirty(dn->node_page); set_sbi_flag(sbi, SBI_IS_DIRTY);
+ index = dn->node_page->index; f2fs_put_page(dn->node_page, 1);
invalidate_mapping_pages(NODE_MAPPING(sbi), - dn->node_page->index, dn->node_page->index); + index, index);
dn->node_page = NULL; trace_f2fs_truncate_node(dn->inode, dn->nid, ni.blk_addr);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit 9ec298cc874d08020f45791a8396e1593c3278c1 which is commit 628bd85947091830a8c4872adfd5ed1d515a9cf2 upstream.
It is not needed in the 4.4.y tree at this point in time.
Reported-by: Jan Kara jack@suse.cz Cc: Ming Lei ming.lei@redhat.com Cc: Jan Kara jack@suse.cz Cc: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Cc: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/loop.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1936,10 +1936,12 @@ static long loop_control_ioctl(struct fi break; if (lo->lo_state != Lo_unbound) { ret = -EBUSY; + mutex_unlock(&loop_ctl_mutex); break; } if (atomic_read(&lo->lo_refcnt) > 0) { ret = -EBUSY; + mutex_unlock(&loop_ctl_mutex); break; } lo->lo_disk->private_data = NULL;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit 611f77199cd763e6b7c0462c2f199ddb3a089750 which is commit 0a42e99b58a208839626465af194cfe640ef9493 upstream.
It is not needed in the 4.4.y tree at this time.
Reported-by: Jan Kara jack@suse.cz Cc: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/loop.c | 39 +++++++++++++++++++-------------------- 1 file changed, 19 insertions(+), 20 deletions(-)
--- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -81,6 +81,7 @@ #include <asm/uaccess.h>
static DEFINE_IDR(loop_index_idr); +static DEFINE_MUTEX(loop_index_mutex); static DEFINE_MUTEX(loop_ctl_mutex);
static int max_part; @@ -1570,11 +1571,9 @@ static int lo_compat_ioctl(struct block_ static int lo_open(struct block_device *bdev, fmode_t mode) { struct loop_device *lo; - int err; + int err = 0;
- err = mutex_lock_killable(&loop_ctl_mutex); - if (err) - return err; + mutex_lock(&loop_index_mutex); lo = bdev->bd_disk->private_data; if (!lo) { err = -ENXIO; @@ -1583,7 +1582,7 @@ static int lo_open(struct block_device *
atomic_inc(&lo->lo_refcnt); out: - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&loop_index_mutex); return err; }
@@ -1592,11 +1591,12 @@ static void lo_release(struct gendisk *d struct loop_device *lo; int err;
- mutex_lock(&loop_ctl_mutex); + mutex_lock(&loop_index_mutex); lo = disk->private_data; if (atomic_dec_return(&lo->lo_refcnt)) - goto out_unlock; + goto unlock_index;
+ mutex_lock(&loop_ctl_mutex); if (lo->lo_flags & LO_FLAGS_AUTOCLEAR) { /* * In autoclear mode, stop the loop thread @@ -1604,7 +1604,7 @@ static void lo_release(struct gendisk *d */ err = loop_clr_fd(lo); if (!err) - return; + goto unlock_index; } else { /* * Otherwise keep thread (if running) and config, @@ -1613,8 +1613,9 @@ static void lo_release(struct gendisk *d loop_flush(lo); }
-out_unlock: mutex_unlock(&loop_ctl_mutex); +unlock_index: + mutex_unlock(&loop_index_mutex); }
static const struct block_device_operations lo_fops = { @@ -1896,7 +1897,7 @@ static struct kobject *loop_probe(dev_t struct kobject *kobj; int err;
- mutex_lock(&loop_ctl_mutex); + mutex_lock(&loop_index_mutex); err = loop_lookup(&lo, MINOR(dev) >> part_shift); if (err < 0) err = loop_add(&lo, MINOR(dev) >> part_shift); @@ -1904,7 +1905,7 @@ static struct kobject *loop_probe(dev_t kobj = NULL; else kobj = get_disk(lo->lo_disk); - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&loop_index_mutex);
*part = 0; return kobj; @@ -1914,13 +1915,9 @@ static long loop_control_ioctl(struct fi unsigned long parm) { struct loop_device *lo; - int ret; - - ret = mutex_lock_killable(&loop_ctl_mutex); - if (ret) - return ret; + int ret = -ENOSYS;
- ret = -ENOSYS; + mutex_lock(&loop_index_mutex); switch (cmd) { case LOOP_CTL_ADD: ret = loop_lookup(&lo, parm); @@ -1934,6 +1931,7 @@ static long loop_control_ioctl(struct fi ret = loop_lookup(&lo, parm); if (ret < 0) break; + mutex_lock(&loop_ctl_mutex); if (lo->lo_state != Lo_unbound) { ret = -EBUSY; mutex_unlock(&loop_ctl_mutex); @@ -1945,6 +1943,7 @@ static long loop_control_ioctl(struct fi break; } lo->lo_disk->private_data = NULL; + mutex_unlock(&loop_ctl_mutex); idr_remove(&loop_index_idr, lo->lo_number); loop_remove(lo); break; @@ -1954,7 +1953,7 @@ static long loop_control_ioctl(struct fi break; ret = loop_add(&lo, -1); } - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&loop_index_mutex);
return ret; } @@ -2037,10 +2036,10 @@ static int __init loop_init(void) THIS_MODULE, loop_probe, NULL, NULL);
/* pre-create number of devices given by config or max_loop */ - mutex_lock(&loop_ctl_mutex); + mutex_lock(&loop_index_mutex); for (i = 0; i < nr; i++) loop_add(&lo, i); - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&loop_index_mutex);
printk(KERN_INFO "loop: module loaded\n"); return 0;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit 4ee414c3b6021db621901f2697d35774926268f6 which is commit 967d1dc144b50ad005e5eecdfadfbcfb399ffff6 upstream.
It is not needed in the 4.4.y tree at this time.
Reported-by: Jan Kara jack@suse.cz Cc: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/loop.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-)
--- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1586,15 +1586,12 @@ out: return err; }
-static void lo_release(struct gendisk *disk, fmode_t mode) +static void __lo_release(struct loop_device *lo) { - struct loop_device *lo; int err;
- mutex_lock(&loop_index_mutex); - lo = disk->private_data; if (atomic_dec_return(&lo->lo_refcnt)) - goto unlock_index; + return;
mutex_lock(&loop_ctl_mutex); if (lo->lo_flags & LO_FLAGS_AUTOCLEAR) { @@ -1604,7 +1601,7 @@ static void lo_release(struct gendisk *d */ err = loop_clr_fd(lo); if (!err) - goto unlock_index; + return; } else { /* * Otherwise keep thread (if running) and config, @@ -1614,7 +1611,12 @@ static void lo_release(struct gendisk *d }
mutex_unlock(&loop_ctl_mutex); -unlock_index: +} + +static void lo_release(struct gendisk *disk, fmode_t mode) +{ + mutex_lock(&loop_index_mutex); + __lo_release(disk->private_data); mutex_unlock(&loop_index_mutex); }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Hildenbrand david@redhat.com
commit 60f1bf29c0b2519989927cae640cd1f50f59dc7f upstream.
When calling smp_call_ipl_cpu() from the IPL CPU, we will try to read from pcpu_devices->lowcore. However, due to prefixing, that will result in reading from absolute address 0 on that CPU. We have to go via the actual lowcore instead.
This means that right now, we will read lc->nodat_stack == 0 and therfore work on a very wrong stack.
This BUG essentially broke rebooting under QEMU TCG (which will report a low address protection exception). And checking under KVM, it is also broken under KVM. With 1 VCPU it can be easily triggered.
:/# echo 1 > /proc/sys/kernel/sysrq :/# echo b > /proc/sysrq-trigger [ 28.476745] sysrq: SysRq : Resetting [ 28.476793] Kernel stack overflow. [ 28.476817] CPU: 0 PID: 424 Comm: sh Not tainted 5.0.0-rc1+ #13 [ 28.476820] Hardware name: IBM 2964 NE1 716 (KVM/Linux) [ 28.476826] Krnl PSW : 0400c00180000000 0000000000115c0c (pcpu_delegate+0x12c/0x140) [ 28.476861] R:0 T:1 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 [ 28.476863] Krnl GPRS: ffffffffffffffff 0000000000000000 000000000010dff8 0000000000000000 [ 28.476864] 0000000000000000 0000000000000000 0000000000ab7090 000003e0006efbf0 [ 28.476864] 000000000010dff8 0000000000000000 0000000000000000 0000000000000000 [ 28.476865] 000000007fffc000 0000000000730408 000003e0006efc58 0000000000000000 [ 28.476887] Krnl Code: 0000000000115bfe: 4170f000 la %r7,0(%r15) [ 28.476887] 0000000000115c02: 41f0a000 la %r15,0(%r10) [ 28.476887] #0000000000115c06: e370f0980024 stg %r7,152(%r15) [ 28.476887] >0000000000115c0c: c0e5fffff86e brasl %r14,114ce8 [ 28.476887] 0000000000115c12: 41f07000 la %r15,0(%r7) [ 28.476887] 0000000000115c16: a7f4ffa8 brc 15,115b66 [ 28.476887] 0000000000115c1a: 0707 bcr 0,%r7 [ 28.476887] 0000000000115c1c: 0707 bcr 0,%r7 [ 28.476901] Call Trace: [ 28.476902] Last Breaking-Event-Address: [ 28.476920] [<0000000000a01c4a>] arch_call_rest_init+0x22/0x80 [ 28.476927] Kernel panic - not syncing: Corrupt kernel stack, can't continue. [ 28.476930] CPU: 0 PID: 424 Comm: sh Not tainted 5.0.0-rc1+ #13 [ 28.476932] Hardware name: IBM 2964 NE1 716 (KVM/Linux) [ 28.476932] Call Trace:
Fixes: 2f859d0dad81 ("s390/smp: reduce size of struct pcpu") Cc: stable@vger.kernel.org # 4.0+ Reported-by: Cornelia Huck cohuck@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/s390/kernel/smp.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -360,9 +360,13 @@ void smp_call_online_cpu(void (*func)(vo */ void smp_call_ipl_cpu(void (*func)(void *), void *data) { + struct _lowcore *lc = pcpu_devices->lowcore; + + if (pcpu_devices[0].address == stap()) + lc = &S390_lowcore; + pcpu_delegate(&pcpu_devices[0], func, data, - pcpu_devices->lowcore->panic_stack - - PANIC_FRAME_OFFSET + PAGE_SIZE); + lc->panic_stack - PANIC_FRAME_OFFSET + PAGE_SIZE); }
int smp_find_processor_id(u16 address)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jimmy Durand Wesolowski jdw@amazon.de
A bug has been discovered when redirecting splice output to regular files on EXT4 and tmpfs. Other filesystems might be affected. This commit fixes the issue for stable series kernel, using one of the change introduced during the rewrite and refactoring of vfs_iter_write in 4.13, specifically in the commit abbb65899aec ("fs: implement vfs_iter_write using do_iter_write").
This issue affects v4.4 and v4.9 stable series of kernels.
Without this fix for v4.4 and v4.9 stable, the following upstream commits (and their dependencies would need to be backported): * commit abbb65899aec ("fs: implement vfs_iter_write using do_iter_write") * commit 18e9710ee59c ("fs: implement vfs_iter_read using do_iter_read") * commit edab5fe38c2c ("fs: move more code into do_iter_read/do_iter_write") * commit 19c735868dd0 ("fs: remove __do_readv_writev") * commit 26c87fb7d10d ("fs: remove do_compat_readv_writev") * commit 251b42a1dc64 ("fs: remove do_readv_writev")
as well as the following dependencies: * commit bb7462b6fd64 ("vfs: use helpers for calling f_op->{read,write}_iter()") * commit 0f78d06ac1e9 ("vfs: pass type instead of fn to do_{loop,iter}_readv_writev()") * commit 7687a7a4435f ("vfs: extract common parts of {compat_,}do_readv_writev()")
In order to reduce the changes, this commit uses only the part of commit abbb65899aec ("fs: implement vfs_iter_write using do_iter_write") that fixes the issue.
This issue and the reproducer can be found on https://bugzilla.kernel.org/show_bug.cgi?id=85381
Reported-by: Richard Li richardpku@gmail.com Reported-by: Chad Miller millchad@amazon.com Reviewed-by: Stefan Nuernberger snu@amazon.de Reviewed-by: Frank Becker becke@amazon.de Signed-off-by: Jimmy Durand Wesolowski jdw@amazon.de --- fs/read_write.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/fs/read_write.c +++ b/fs/read_write.c @@ -363,8 +363,10 @@ ssize_t vfs_iter_write(struct file *file iter->type |= WRITE; ret = file->f_op->write_iter(&kiocb, iter); BUG_ON(ret == -EIOCBQUEUED); - if (ret > 0) + if (ret > 0) { *ppos = kiocb.ki_pos; + fsnotify_modify(file); + } return ret; } EXPORT_SYMBOL(vfs_iter_write);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Ahern dsahern@gmail.com
[ Upstream commit c5ee066333ebc322a24a00a743ed941a0c68617e ]
IPv6 does not consider if the socket is bound to a device when binding to an address. The result is that a socket can be bound to eth0 and then bound to the address of eth1. If the device is a VRF, the result is that a socket can only be bound to an address in the default VRF.
Resolve by considering the device if sk_bound_dev_if is set.
This problem exists from the beginning of git history.
Signed-off-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/af_inet6.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -345,6 +345,9 @@ int inet6_bind(struct socket *sock, stru err = -EINVAL; goto out_unlock; } + } + + if (sk->sk_bound_dev_if) { dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if); if (!dev) { err = -ENODEV;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jacob Wen jian.w.wen@oracle.com
[ Upstream commit 91c524708de6207f59dd3512518d8a1c7b434ee3 ]
The size of L2TPv2 header with all optional fields is 14 bytes. l2tp_udp_recv_core only moves 10 bytes to the linear part of a skb. This may lead to l2tp_recv_common read data outside of a skb.
This patch make sure that there is at least 14 bytes in the linear part of a skb to meet the maximum need of l2tp_udp_recv_core and l2tp_recv_common. The minimum size of both PPP HDLC-like frame and Ethernet frame is larger than 14 bytes, so we are safe to do so.
Also remove L2TP_HDR_SIZE_NOSEQ, it is unused now.
Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Suggested-by: Guillaume Nault gnault@redhat.com Signed-off-by: Jacob Wen jian.w.wen@oracle.com Acked-by: Guillaume Nault gnault@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/l2tp/l2tp_core.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -83,8 +83,7 @@ #define L2TP_SLFLAG_S 0x40000000 #define L2TP_SL_SEQ_MASK 0x00ffffff
-#define L2TP_HDR_SIZE_SEQ 10 -#define L2TP_HDR_SIZE_NOSEQ 6 +#define L2TP_HDR_SIZE_MAX 14
/* Default trace flags */ #define L2TP_DEFAULT_DEBUG_FLAGS 0 @@ -860,7 +859,7 @@ static int l2tp_udp_recv_core(struct l2t __skb_pull(skb, sizeof(struct udphdr));
/* Short packet? */ - if (!pskb_may_pull(skb, L2TP_HDR_SIZE_SEQ)) { + if (!pskb_may_pull(skb, L2TP_HDR_SIZE_MAX)) { l2tp_info(tunnel, L2TP_MSG_DATA, "%s: recv short packet (len=%d)\n", tunnel->name, skb->len);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aya Levin ayal@mellanox.com
[ Upstream commit a40ded6043658444ee4dd6ee374119e4e98b33fc ]
Driver reads the query HCA capabilities without the corresponding masks. Without the correct masks, the base addresses of the queues are unaligned. In addition some reserved bits were wrongly read. Using the correct masks, ensures alignment of the base addresses and allows future firmware versions safe use of the reserved bits.
Fixes: ab9c17a009ee ("mlx4_core: Modify driver initialization flow to accommodate SRIOV for Ethernet") Fixes: 0ff1fb654bec ("{NET, IB}/mlx4: Add device managed flow steering firmware API") Signed-off-by: Aya Levin ayal@mellanox.com Signed-off-by: Tariq Toukan tariqt@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx4/fw.c | 75 +++++++++++++++++++------------- 1 file changed, 46 insertions(+), 29 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c +++ b/drivers/net/ethernet/mellanox/mlx4/fw.c @@ -1906,9 +1906,11 @@ int mlx4_QUERY_HCA(struct mlx4_dev *dev, { struct mlx4_cmd_mailbox *mailbox; __be32 *outbox; + u64 qword_field; u32 dword_field; - int err; + u16 word_field; u8 byte_field; + int err; static const u8 a0_dmfs_query_hw_steering[] = { [0] = MLX4_STEERING_DMFS_A0_DEFAULT, [1] = MLX4_STEERING_DMFS_A0_DYNAMIC, @@ -1936,19 +1938,32 @@ int mlx4_QUERY_HCA(struct mlx4_dev *dev,
/* QPC/EEC/CQC/EQC/RDMARC attributes */
- MLX4_GET(param->qpc_base, outbox, INIT_HCA_QPC_BASE_OFFSET); - MLX4_GET(param->log_num_qps, outbox, INIT_HCA_LOG_QP_OFFSET); - MLX4_GET(param->srqc_base, outbox, INIT_HCA_SRQC_BASE_OFFSET); - MLX4_GET(param->log_num_srqs, outbox, INIT_HCA_LOG_SRQ_OFFSET); - MLX4_GET(param->cqc_base, outbox, INIT_HCA_CQC_BASE_OFFSET); - MLX4_GET(param->log_num_cqs, outbox, INIT_HCA_LOG_CQ_OFFSET); - MLX4_GET(param->altc_base, outbox, INIT_HCA_ALTC_BASE_OFFSET); - MLX4_GET(param->auxc_base, outbox, INIT_HCA_AUXC_BASE_OFFSET); - MLX4_GET(param->eqc_base, outbox, INIT_HCA_EQC_BASE_OFFSET); - MLX4_GET(param->log_num_eqs, outbox, INIT_HCA_LOG_EQ_OFFSET); - MLX4_GET(param->num_sys_eqs, outbox, INIT_HCA_NUM_SYS_EQS_OFFSET); - MLX4_GET(param->rdmarc_base, outbox, INIT_HCA_RDMARC_BASE_OFFSET); - MLX4_GET(param->log_rd_per_qp, outbox, INIT_HCA_LOG_RD_OFFSET); + MLX4_GET(qword_field, outbox, INIT_HCA_QPC_BASE_OFFSET); + param->qpc_base = qword_field & ~((u64)0x1f); + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_QP_OFFSET); + param->log_num_qps = byte_field & 0x1f; + MLX4_GET(qword_field, outbox, INIT_HCA_SRQC_BASE_OFFSET); + param->srqc_base = qword_field & ~((u64)0x1f); + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_SRQ_OFFSET); + param->log_num_srqs = byte_field & 0x1f; + MLX4_GET(qword_field, outbox, INIT_HCA_CQC_BASE_OFFSET); + param->cqc_base = qword_field & ~((u64)0x1f); + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_CQ_OFFSET); + param->log_num_cqs = byte_field & 0x1f; + MLX4_GET(qword_field, outbox, INIT_HCA_ALTC_BASE_OFFSET); + param->altc_base = qword_field; + MLX4_GET(qword_field, outbox, INIT_HCA_AUXC_BASE_OFFSET); + param->auxc_base = qword_field; + MLX4_GET(qword_field, outbox, INIT_HCA_EQC_BASE_OFFSET); + param->eqc_base = qword_field & ~((u64)0x1f); + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_EQ_OFFSET); + param->log_num_eqs = byte_field & 0x1f; + MLX4_GET(word_field, outbox, INIT_HCA_NUM_SYS_EQS_OFFSET); + param->num_sys_eqs = word_field & 0xfff; + MLX4_GET(qword_field, outbox, INIT_HCA_RDMARC_BASE_OFFSET); + param->rdmarc_base = qword_field & ~((u64)0x1f); + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_RD_OFFSET); + param->log_rd_per_qp = byte_field & 0x7;
MLX4_GET(dword_field, outbox, INIT_HCA_FLAGS_OFFSET); if (dword_field & (1 << INIT_HCA_DEVICE_MANAGED_FLOW_STEERING_EN)) { @@ -1967,22 +1982,21 @@ int mlx4_QUERY_HCA(struct mlx4_dev *dev, /* steering attributes */ if (param->steering_mode == MLX4_STEERING_MODE_DEVICE_MANAGED) { MLX4_GET(param->mc_base, outbox, INIT_HCA_FS_BASE_OFFSET); - MLX4_GET(param->log_mc_entry_sz, outbox, - INIT_HCA_FS_LOG_ENTRY_SZ_OFFSET); - MLX4_GET(param->log_mc_table_sz, outbox, - INIT_HCA_FS_LOG_TABLE_SZ_OFFSET); - MLX4_GET(byte_field, outbox, - INIT_HCA_FS_A0_OFFSET); + MLX4_GET(byte_field, outbox, INIT_HCA_FS_LOG_ENTRY_SZ_OFFSET); + param->log_mc_entry_sz = byte_field & 0x1f; + MLX4_GET(byte_field, outbox, INIT_HCA_FS_LOG_TABLE_SZ_OFFSET); + param->log_mc_table_sz = byte_field & 0x1f; + MLX4_GET(byte_field, outbox, INIT_HCA_FS_A0_OFFSET); param->dmfs_high_steer_mode = a0_dmfs_query_hw_steering[(byte_field >> 6) & 3]; } else { MLX4_GET(param->mc_base, outbox, INIT_HCA_MC_BASE_OFFSET); - MLX4_GET(param->log_mc_entry_sz, outbox, - INIT_HCA_LOG_MC_ENTRY_SZ_OFFSET); - MLX4_GET(param->log_mc_hash_sz, outbox, - INIT_HCA_LOG_MC_HASH_SZ_OFFSET); - MLX4_GET(param->log_mc_table_sz, outbox, - INIT_HCA_LOG_MC_TABLE_SZ_OFFSET); + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_MC_ENTRY_SZ_OFFSET); + param->log_mc_entry_sz = byte_field & 0x1f; + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_MC_HASH_SZ_OFFSET); + param->log_mc_hash_sz = byte_field & 0x1f; + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_MC_TABLE_SZ_OFFSET); + param->log_mc_table_sz = byte_field & 0x1f; }
/* CX3 is capable of extending CQEs/EQEs from 32 to 64 bytes */ @@ -2006,15 +2020,18 @@ int mlx4_QUERY_HCA(struct mlx4_dev *dev, /* TPT attributes */
MLX4_GET(param->dmpt_base, outbox, INIT_HCA_DMPT_BASE_OFFSET); - MLX4_GET(param->mw_enabled, outbox, INIT_HCA_TPT_MW_OFFSET); - MLX4_GET(param->log_mpt_sz, outbox, INIT_HCA_LOG_MPT_SZ_OFFSET); + MLX4_GET(byte_field, outbox, INIT_HCA_TPT_MW_OFFSET); + param->mw_enabled = byte_field >> 7; + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_MPT_SZ_OFFSET); + param->log_mpt_sz = byte_field & 0x3f; MLX4_GET(param->mtt_base, outbox, INIT_HCA_MTT_BASE_OFFSET); MLX4_GET(param->cmpt_base, outbox, INIT_HCA_CMPT_BASE_OFFSET);
/* UAR attributes */
MLX4_GET(param->uar_page_sz, outbox, INIT_HCA_UAR_PAGE_SZ_OFFSET); - MLX4_GET(param->log_uar_sz, outbox, INIT_HCA_LOG_UAR_SZ_OFFSET); + MLX4_GET(byte_field, outbox, INIT_HCA_LOG_UAR_SZ_OFFSET); + param->log_uar_sz = byte_field & 0xf;
/* phv_check enable */ MLX4_GET(byte_field, outbox, INIT_HCA_CACHELINE_SZ_OFFSET);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 63346650c1a94a92be61a57416ac88c0a47c4327 ]
sk_reset_timer() and sk_stop_timer() properly handle sock refcnt for timer function. Switching to them could fix a refcounting bug reported by syzbot.
Reported-and-tested-by: syzbot+defa700d16f1bd1b9a05@syzkaller.appspotmail.com Cc: Ralf Baechle ralf@linux-mips.org Cc: linux-hams@vger.kernel.org Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netrom/nr_timer.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)
--- a/net/netrom/nr_timer.c +++ b/net/netrom/nr_timer.c @@ -53,21 +53,21 @@ void nr_start_t1timer(struct sock *sk) { struct nr_sock *nr = nr_sk(sk);
- mod_timer(&nr->t1timer, jiffies + nr->t1); + sk_reset_timer(sk, &nr->t1timer, jiffies + nr->t1); }
void nr_start_t2timer(struct sock *sk) { struct nr_sock *nr = nr_sk(sk);
- mod_timer(&nr->t2timer, jiffies + nr->t2); + sk_reset_timer(sk, &nr->t2timer, jiffies + nr->t2); }
void nr_start_t4timer(struct sock *sk) { struct nr_sock *nr = nr_sk(sk);
- mod_timer(&nr->t4timer, jiffies + nr->t4); + sk_reset_timer(sk, &nr->t4timer, jiffies + nr->t4); }
void nr_start_idletimer(struct sock *sk) @@ -75,37 +75,37 @@ void nr_start_idletimer(struct sock *sk) struct nr_sock *nr = nr_sk(sk);
if (nr->idle > 0) - mod_timer(&nr->idletimer, jiffies + nr->idle); + sk_reset_timer(sk, &nr->idletimer, jiffies + nr->idle); }
void nr_start_heartbeat(struct sock *sk) { - mod_timer(&sk->sk_timer, jiffies + 5 * HZ); + sk_reset_timer(sk, &sk->sk_timer, jiffies + 5 * HZ); }
void nr_stop_t1timer(struct sock *sk) { - del_timer(&nr_sk(sk)->t1timer); + sk_stop_timer(sk, &nr_sk(sk)->t1timer); }
void nr_stop_t2timer(struct sock *sk) { - del_timer(&nr_sk(sk)->t2timer); + sk_stop_timer(sk, &nr_sk(sk)->t2timer); }
void nr_stop_t4timer(struct sock *sk) { - del_timer(&nr_sk(sk)->t4timer); + sk_stop_timer(sk, &nr_sk(sk)->t4timer); }
void nr_stop_idletimer(struct sock *sk) { - del_timer(&nr_sk(sk)->idletimer); + sk_stop_timer(sk, &nr_sk(sk)->idletimer); }
void nr_stop_heartbeat(struct sock *sk) { - del_timer(&sk->sk_timer); + sk_stop_timer(sk, &sk->sk_timer); }
int nr_t1timer_running(struct sock *sk)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bernard Pidoux f6bvp@free.fr
[ Upstream commit b0cf029234f9b18e10703ba5147f0389c382bccc ]
When an internally generated frame is handled by rose_xmit(), rose_route_frame() is called:
if (!rose_route_frame(skb, NULL)) { dev_kfree_skb(skb); stats->tx_errors++; return NETDEV_TX_OK; }
We have the same code sequence in Net/Rom where an internally generated frame is handled by nr_xmit() calling nr_route_frame(skb, NULL). However, in this function NULL argument is tested while it is not in rose_route_frame(). Then kernel panic occurs later on when calling ax25cmp() with a NULL ax25_cb argument as reported many times and recently with syzbot.
We need to test if ax25 is NULL before using it.
Testing: Built kernel with CONFIG_ROSE=y.
Signed-off-by: Bernard Pidoux f6bvp@free.fr Acked-by: Dmitry Vyukov dvyukov@google.com Reported-by: syzbot+1a2c456a1ea08fa5b5f7@syzkaller.appspotmail.com Cc: "David S. Miller" davem@davemloft.net Cc: Ralf Baechle ralf@linux-mips.org Cc: Bernard Pidoux f6bvp@free.fr Cc: linux-hams@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/rose/rose_route.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/net/rose/rose_route.c +++ b/net/rose/rose_route.c @@ -848,6 +848,7 @@ void rose_link_device_down(struct net_de
/* * Route a frame to an appropriate AX.25 connection. + * A NULL ax25_cb indicates an internally generated frame. */ int rose_route_frame(struct sk_buff *skb, ax25_cb *ax25) { @@ -865,6 +866,10 @@ int rose_route_frame(struct sk_buff *skb
if (skb->len < ROSE_MIN_LEN) return res; + + if (!ax25) + return rose_loopback_queue(skb, NULL); + frametype = skb->data[2]; lci = ((skb->data[0] << 8) & 0xF00) + ((skb->data[1] << 0) & 0x0FF); if (frametype == ROSE_CALL_REQUEST &&
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathias Thore mathias.thore@infinera.com
[ Upstream commit e15aa3b2b1388c399c1a2ce08550d2cc4f7e3e14 ]
After a timeout event caused by for example a broadcast storm, when the MAC and PHY are reset, the BQL TX queue needs to be reset as well. Otherwise, the device will exhibit severe performance issues even after the storm has ended.
Co-authored-by: David Gounaris david.gounaris@infinera.com Signed-off-by: Mathias Thore mathias.thore@infinera.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/freescale/ucc_geth.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/ethernet/freescale/ucc_geth.c +++ b/drivers/net/ethernet/freescale/ucc_geth.c @@ -1888,6 +1888,8 @@ static void ucc_geth_free_tx(struct ucc_ u16 i, j; u8 __iomem *bd;
+ netdev_reset_queue(ugeth->ndev); + ug_info = ugeth->ug_info; uf_info = &ug_info->uf_info;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lorenzo Bianconi lorenzo.bianconi@redhat.com
commit 62e7b6a57c7b9bf3c6fd99418eeec05b08a85c38 upstream.
Remove l2specific_len dependency while building l2tpv3 header or parsing the received frame since default L2-Specific Sublayer is always four bytes long and we don't need to rely on a user supplied value. Moreover in l2tp netlink code there are no sanity checks to enforce the relation between l2specific_len and l2specific_type, so sending a malformed netlink message is possible to set l2specific_type to L2TP_L2SPECTYPE_DEFAULT (or even L2TP_L2SPECTYPE_NONE) and set l2specific_len to a value greater than 4 leaking memory on the wire and sending corrupted frames.
Reviewed-by: Guillaume Nault g.nault@alphalink.fr Tested-by: Guillaume Nault g.nault@alphalink.fr Signed-off-by: Lorenzo Bianconi lorenzo.bianconi@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/l2tp/l2tp_core.c | 34 ++++++++++++++++------------------ net/l2tp/l2tp_core.h | 11 +++++++++++ 2 files changed, 27 insertions(+), 18 deletions(-)
--- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -704,11 +704,9 @@ void l2tp_recv_common(struct l2tp_sessio "%s: recv data ns=%u, session nr=%u\n", session->name, ns, session->nr); } + ptr += 4; }
- /* Advance past L2-specific header, if present */ - ptr += session->l2specific_len; - if (L2TP_SKB_CB(skb)->has_seq) { /* Received a packet with sequence numbers. If we're the LNS, * check if we sre sending sequence numbers and if not, @@ -1030,21 +1028,20 @@ static int l2tp_build_l2tpv3_header(stru memcpy(bufp, &session->cookie[0], session->cookie_len); bufp += session->cookie_len; } - if (session->l2specific_len) { - if (session->l2specific_type == L2TP_L2SPECTYPE_DEFAULT) { - u32 l2h = 0; - if (session->send_seq) { - l2h = 0x40000000 | session->ns; - session->ns++; - session->ns &= 0xffffff; - l2tp_dbg(session, L2TP_MSG_SEQ, - "%s: updated ns to %u\n", - session->name, session->ns); - } + if (session->l2specific_type == L2TP_L2SPECTYPE_DEFAULT) { + u32 l2h = 0;
- *((__be32 *) bufp) = htonl(l2h); + if (session->send_seq) { + l2h = 0x40000000 | session->ns; + session->ns++; + session->ns &= 0xffffff; + l2tp_dbg(session, L2TP_MSG_SEQ, + "%s: updated ns to %u\n", + session->name, session->ns); } - bufp += session->l2specific_len; + + *((__be32 *)bufp) = htonl(l2h); + bufp += 4; } if (session->offset) bufp += session->offset; @@ -1723,7 +1720,7 @@ int l2tp_session_delete(struct l2tp_sess EXPORT_SYMBOL_GPL(l2tp_session_delete);
/* We come here whenever a session's send_seq, cookie_len or - * l2specific_len parameters are set. + * l2specific_type parameters are set. */ void l2tp_session_set_header_len(struct l2tp_session *session, int version) { @@ -1732,7 +1729,8 @@ void l2tp_session_set_header_len(struct if (session->send_seq) session->hdr_len += 4; } else { - session->hdr_len = 4 + session->cookie_len + session->l2specific_len + session->offset; + session->hdr_len = 4 + session->cookie_len + session->offset; + session->hdr_len += l2tp_get_l2specific_len(session); if (session->tunnel->encap == L2TP_ENCAPTYPE_UDP) session->hdr_len += 4; } --- a/net/l2tp/l2tp_core.h +++ b/net/l2tp/l2tp_core.h @@ -313,6 +313,17 @@ do { \ #define l2tp_session_dec_refcount(s) l2tp_session_dec_refcount_1(s) #endif
+static inline int l2tp_get_l2specific_len(struct l2tp_session *session) +{ + switch (session->l2specific_type) { + case L2TP_L2SPECTYPE_DEFAULT: + return 4; + case L2TP_L2SPECTYPE_NONE: + default: + return 0; + } +} + #define l2tp_printk(ptr, type, func, fmt, ...) \ do { \ if (((ptr)->debug) & (type)) \
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jacob Wen jian.w.wen@oracle.com
[ Upstream commit 4522a70db7aa5e77526a4079628578599821b193 ]
Use pskb_may_pull() to make sure the optional fields are in skb linear parts, so we can safely read them later.
It's easy to reproduce the issue with a net driver that supports paged skb data. Just create a L2TPv3 over IP tunnel and then generates some network traffic. Once reproduced, rx err in /sys/kernel/debug/l2tp/tunnels will increase.
Changes in v4: 1. s/l2tp_v3_pull_opt/l2tp_v3_ensure_opt_in_linear/ 2. s/tunnel->version != L2TP_HDR_VER_2/tunnel->version == L2TP_HDR_VER_3/ 3. Add 'Fixes' in commit messages.
Changes in v3: 1. To keep consistency, move the code out of l2tp_recv_common. 2. Use "net" instead of "net-next", since this is a bug fix.
Changes in v2: 1. Only fix L2TPv3 to make code simple. To fix both L2TPv3 and L2TPv2, we'd better refactor l2tp_recv_common. It's complicated to do so. 2. Reloading pointers after pskb_may_pull
Fixes: f7faffa3ff8e ("l2tp: Add L2TPv3 protocol support") Fixes: 0d76751fad77 ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support") Fixes: a32e0eec7042 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6") Signed-off-by: Jacob Wen jian.w.wen@oracle.com Acked-by: Guillaume Nault gnault@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/l2tp/l2tp_core.c | 4 ++++ net/l2tp/l2tp_core.h | 20 ++++++++++++++++++++ net/l2tp/l2tp_ip.c | 3 +++ net/l2tp/l2tp_ip6.c | 3 +++ 4 files changed, 30 insertions(+)
--- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -930,6 +930,10 @@ static int l2tp_udp_recv_core(struct l2t goto error; }
+ if (tunnel->version == L2TP_HDR_VER_3 && + l2tp_v3_ensure_opt_in_linear(session, skb, &ptr, &optr)) + goto error; + l2tp_recv_common(session, skb, ptr, optr, hdrflags, length, payload_hook);
return 0; --- a/net/l2tp/l2tp_core.h +++ b/net/l2tp/l2tp_core.h @@ -324,6 +324,26 @@ static inline int l2tp_get_l2specific_le } }
+static inline int l2tp_v3_ensure_opt_in_linear(struct l2tp_session *session, struct sk_buff *skb, + unsigned char **ptr, unsigned char **optr) +{ + int opt_len = session->peer_cookie_len + l2tp_get_l2specific_len(session); + + if (opt_len > 0) { + int off = *ptr - *optr; + + if (!pskb_may_pull(skb, off + opt_len)) + return -1; + + if (skb->data != *optr) { + *optr = skb->data; + *ptr = skb->data + off; + } + } + + return 0; +} + #define l2tp_printk(ptr, type, func, fmt, ...) \ do { \ if (((ptr)->debug) & (type)) \ --- a/net/l2tp/l2tp_ip.c +++ b/net/l2tp/l2tp_ip.c @@ -163,6 +163,9 @@ static int l2tp_ip_recv(struct sk_buff * print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, ptr, length); }
+ if (l2tp_v3_ensure_opt_in_linear(session, skb, &ptr, &optr)) + goto discard; + l2tp_recv_common(session, skb, ptr, optr, 0, skb->len, tunnel->recv_payload_hook);
return 0; --- a/net/l2tp/l2tp_ip6.c +++ b/net/l2tp/l2tp_ip6.c @@ -174,6 +174,9 @@ static int l2tp_ip6_recv(struct sk_buff print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, ptr, length); }
+ if (l2tp_v3_ensure_opt_in_linear(session, skb, &ptr, &optr)) + goto discard; + l2tp_recv_common(session, skb, ptr, optr, 0, skb->len, tunnel->recv_payload_hook); return 0;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Shilovsky pshilov@microsoft.com
commit 8e6e72aeceaaed5aeeb1cb43d3085de7ceb14f79 upstream.
Signed-off-by: Pavel Shilovsky pshilov@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com CC: Stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/smb2pdu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -2523,8 +2523,8 @@ SMB2_query_directory(const unsigned int if (rc == -ENODATA && rsp->hdr.Status == STATUS_NO_MORE_FILES) { srch_inf->endOfSearch = true; rc = 0; - } - cifs_stats_fail_inc(tcon, SMB2_QUERY_DIRECTORY_HE); + } else + cifs_stats_fail_inc(tcon, SMB2_QUERY_DIRECTORY_HE); goto qdir_exit; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Waiman Long longman@redhat.com
commit 1dbd449c9943e3145148cc893c2461b72ba6fef0 upstream.
The nr_dentry_unused per-cpu counter tracks dentries in both the LRU lists and the shrink lists where the DCACHE_LRU_LIST bit is set.
The shrink_dcache_sb() function moves dentries from the LRU list to a shrink list and subtracts the dentry count from nr_dentry_unused. This is incorrect as the nr_dentry_unused count will also be decremented in shrink_dentry_list() via d_shrink_del().
To fix this double decrement, the decrement in the shrink_dcache_sb() function is taken out.
Fixes: 4e717f5c1083 ("list_lru: remove special case function list_lru_dispose_all." Cc: stable@kernel.org Signed-off-by: Waiman Long longman@redhat.com Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/dcache.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)
--- a/fs/dcache.c +++ b/fs/dcache.c @@ -1155,15 +1155,11 @@ static enum lru_status dentry_lru_isolat */ void shrink_dcache_sb(struct super_block *sb) { - long freed; - do { LIST_HEAD(dispose);
- freed = list_lru_walk(&sb->s_dentry_lru, + list_lru_walk(&sb->s_dentry_lru, dentry_lru_isolate_shrink, &dispose, 1024); - - this_cpu_sub(nr_dentry_unused, freed); shrink_dentry_list(&dispose); cond_resched(); } while (list_lru_count(&sb->s_dentry_lru) > 0);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Koen Vandeputte koen.vandeputte@ncentric.com
commit 65dbb423cf28232fed1732b779249d6164c5999b upstream.
Originally, cns3xxx used its own functions for mapping, reading and writing config registers.
Commit 802b7c06adc7 ("ARM: cns3xxx: Convert PCI to use generic config accessors") removed the internal PCI config write function in favor of the generic one:
cns3xxx_pci_write_config() --> pci_generic_config_write()
cns3xxx_pci_write_config() expected aligned addresses, being produced by cns3xxx_pci_map_bus() while the generic one pci_generic_config_write() actually expects the real address as both the function and hardware are capable of byte-aligned writes.
This currently leads to pci_generic_config_write() writing to the wrong registers.
For instance, upon ath9k module loading:
- driver ath9k gets loaded - The driver wants to write value 0xA8 to register PCI_LATENCY_TIMER, located at 0x0D - cns3xxx_pci_map_bus() aligns the address to 0x0C - pci_generic_config_write() effectively writes 0xA8 into register 0x0C (CACHE_LINE_SIZE)
Fix the bug by removing the alignment in the cns3xxx mapping function.
Fixes: 802b7c06adc7 ("ARM: cns3xxx: Convert PCI to use generic config accessors") Signed-off-by: Koen Vandeputte koen.vandeputte@ncentric.com [lorenzo.pieralisi@arm.com: updated commit log] Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Acked-by: Krzysztof Halasa khalasa@piap.pl Acked-by: Tim Harvey tharvey@gateworks.com Acked-by: Arnd Bergmann arnd@arndb.de CC: stable@vger.kernel.org # v4.0+ CC: Bjorn Helgaas bhelgaas@google.com CC: Olof Johansson olof@lixom.net CC: Robin Leblon robin.leblon@ncentric.com CC: Rob Herring robh@kernel.org CC: Russell King linux@armlinux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/mach-cns3xxx/pcie.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/mach-cns3xxx/pcie.c +++ b/arch/arm/mach-cns3xxx/pcie.c @@ -83,7 +83,7 @@ static void __iomem *cns3xxx_pci_map_bus } else /* remote PCI bus */ base = cnspci->cfg1_regs + ((busno & 0xf) << 20);
- return base + (where & 0xffc) + (devfn << 12); + return base + where + (devfn << 12); }
static int cns3xxx_pci_read_config(struct pci_bus *bus, unsigned int devfn,
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Morse james.morse@arm.com
commit 8fac5cbdfe0f01254d9d265c6aa1a95f94f58595 upstream.
The hyp-stub is loaded by the kernel's early startup code at EL2 during boot, before KVM takes ownership later. The hyp-stub's text is part of the regular kernel text, meaning it can be kprobed.
A breakpoint in the hyp-stub causes the CPU to spin in el2_sync_invalid.
Add it to the __hyp_text.
Signed-off-by: James Morse james.morse@arm.com Cc: stable@vger.kernel.org Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/kernel/hyp-stub.S | 2 ++ 1 file changed, 2 insertions(+)
--- a/arch/arm64/kernel/hyp-stub.S +++ b/arch/arm64/kernel/hyp-stub.S @@ -26,6 +26,8 @@ #include <asm/virt.h>
.text + .pushsection .hyp.text, "ax" + .align 11
ENTRY(__hyp_stub_vectors)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Gruenbacher agruenba@redhat.com
commit e74c98ca2d6ae4376cc15fa2a22483430909d96b upstream.
This reverts commit 2d29f6b96d8f80322ed2dd895bca590491c38d34.
It turns out that the fix can lead to a ~20 percent performance regression in initial writes to the page cache according to iozone. Let's revert this for now to have more time for a proper fix.
Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Bob Peterson rpeterso@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/gfs2/rgrp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -1720,9 +1720,9 @@ static int gfs2_rbm_find(struct gfs2_rbm goto next_iter; } if (ret == -E2BIG) { - n += rbm->bii - initial_bii; rbm->bii = 0; rbm->offset = 0; + n += (rbm->bii - initial_bii); goto res_covered_end_of_rgrp; } return ret;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit b3f2f3799a972d3863d0fdc2ab6287aef6ca631f ]
When the OS registers to handle events from the display off hotkey the EC will send a notification with 0x35 for every key press, independent of the backlight state.
The behavior of this key on Windows, with the ATKACPI driver from Asus installed, is turning off the backlight of all connected displays with a fading effect, and any cursor input or key press turning the backlight back on. The key press or cursor input that wakes up the display is also passed through to the application under the cursor or under focus.
The key that matches this behavior the closest is KEY_SCREENLOCK.
Signed-off-by: João Paulo Rechi Vita jprvita@endlessm.com Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/asus-nb-wmi.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/platform/x86/asus-nb-wmi.c +++ b/drivers/platform/x86/asus-nb-wmi.c @@ -341,6 +341,7 @@ static const struct key_entry asus_nb_wm { KE_KEY, 0x32, { KEY_MUTE } }, { KE_KEY, 0x33, { KEY_DISPLAYTOGGLE } }, /* LCD on */ { KE_KEY, 0x34, { KEY_DISPLAY_OFF } }, /* LCD off */ + { KE_KEY, 0x35, { KEY_SCREENLOCK } }, { KE_KEY, 0x40, { KEY_PREVIOUSSONG } }, { KE_KEY, 0x41, { KEY_NEXTSONG } }, { KE_KEY, 0x43, { KEY_STOPCD } }, /* Stop/Eject */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit 71b12beaf12f21a53bfe100795d0797f1035b570 ]
According to Asus firmware engineers, the meaning of these codes is only to notify the OS that the screen brightness has been turned on/off by the EC. This does not match the meaning of KEY_DISPLAYTOGGLE / KEY_DISPLAY_OFF, where userspace is expected to change the display brightness.
Signed-off-by: João Paulo Rechi Vita jprvita@endlessm.com Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/asus-nb-wmi.c | 2 -- 1 file changed, 2 deletions(-)
--- a/drivers/platform/x86/asus-nb-wmi.c +++ b/drivers/platform/x86/asus-nb-wmi.c @@ -339,8 +339,6 @@ static const struct key_entry asus_nb_wm { KE_KEY, 0x30, { KEY_VOLUMEUP } }, { KE_KEY, 0x31, { KEY_VOLUMEDOWN } }, { KE_KEY, 0x32, { KEY_MUTE } }, - { KE_KEY, 0x33, { KEY_DISPLAYTOGGLE } }, /* LCD on */ - { KE_KEY, 0x34, { KEY_DISPLAY_OFF } }, /* LCD off */ { KE_KEY, 0x35, { KEY_SCREENLOCK } }, { KE_KEY, 0x40, { KEY_PREVIOUSSONG } }, { KE_KEY, 0x41, { KEY_NEXTSONG } },
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Wahren stefan.wahren@i2se.com
commit 2bd44dadd5bfb4135162322fd0b45a174d4ad5bf upstream.
We need to handle mmc_of_parse() errors during probe.
This finally fixes the wifi regression on Raspberry Pi 3 series. In error case the wifi chip was permanently in reset because of the power sequence depending on the deferred probe of the GPIO expander.
Fixes: b580c52d58d9 ("mmc: sdhci-iproc: add IPROC SDHCI driver") Cc: stable@vger.kernel.org Signed-off-by: Stefan Wahren stefan.wahren@i2se.com Acked-by: Adrian Hunter adrian.hunter@intel.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/sdhci-iproc.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/mmc/host/sdhci-iproc.c +++ b/drivers/mmc/host/sdhci-iproc.c @@ -217,7 +217,10 @@ static int sdhci_iproc_probe(struct plat
iproc_host->data = iproc_data;
- mmc_of_parse(host->mmc); + ret = mmc_of_parse(host->mmc); + if (ret) + goto err; + sdhci_get_of_property(pdev);
/* Enable EMMC 1/8V DDR capable */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrei Vagin avagin@gmail.com
commit 8fb335e078378c8426fabeed1ebee1fbf915690c upstream.
Currently, exit_ptrace() adds all ptraced tasks in a dead list, then zap_pid_ns_processes() waits on all tasks in a current pidns, and only then are tasks from the dead list released.
zap_pid_ns_processes() can get stuck on waiting tasks from the dead list. In this case, we will have one unkillable process with one or more dead children.
Thanks to Oleg for the advice to release tasks in find_child_reaper().
Link: http://lkml.kernel.org/r/20190110175200.12442-1-avagin@gmail.com Fixes: 7c8bd2322c7f ("exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent()") Signed-off-by: Andrei Vagin avagin@gmail.com Signed-off-by: Oleg Nesterov oleg@redhat.com Cc: "Eric W. Biederman" ebiederm@xmission.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/exit.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
--- a/kernel/exit.c +++ b/kernel/exit.c @@ -450,12 +450,14 @@ static struct task_struct *find_alive_th return NULL; }
-static struct task_struct *find_child_reaper(struct task_struct *father) +static struct task_struct *find_child_reaper(struct task_struct *father, + struct list_head *dead) __releases(&tasklist_lock) __acquires(&tasklist_lock) { struct pid_namespace *pid_ns = task_active_pid_ns(father); struct task_struct *reaper = pid_ns->child_reaper; + struct task_struct *p, *n;
if (likely(reaper != father)) return reaper; @@ -471,6 +473,12 @@ static struct task_struct *find_child_re panic("Attempted to kill init! exitcode=0x%08x\n", father->signal->group_exit_code ?: father->exit_code); } + + list_for_each_entry_safe(p, n, dead, ptrace_entry) { + list_del_init(&p->ptrace_entry); + release_task(p); + } + zap_pid_ns_processes(pid_ns); write_lock_irq(&tasklist_lock);
@@ -557,7 +565,7 @@ static void forget_original_parent(struc exit_ptrace(father, dead);
/* Can drop and reacquire tasklist_lock */ - reaper = find_child_reaper(father); + reaper = find_child_reaper(father, dead); if (list_empty(&father->children)) return;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shakeel Butt shakeelb@google.com
commit cefc7ef3c87d02fc9307835868ff721ea12cc597 upstream.
Syzbot instance running on upstream kernel found a use-after-free bug in oom_kill_process. On further inspection it seems like the process selected to be oom-killed has exited even before reaching read_lock(&tasklist_lock) in oom_kill_process(). More specifically the tsk->usage is 1 which is due to get_task_struct() in oom_evaluate_task() and the put_task_struct within for_each_thread() frees the tsk and for_each_thread() tries to access the tsk. The easiest fix is to do get/put across the for_each_thread() on the selected task.
Now the next question is should we continue with the oom-kill as the previously selected task has exited? However before adding more complexity and heuristics, let's answer why we even look at the children of oom-kill selected task? The select_bad_process() has already selected the worst process in the system/memcg. Due to race, the selected process might not be the worst at the kill time but does that matter? The userspace can use the oom_score_adj interface to prefer children to be killed before the parent. I looked at the history but it seems like this is there before git history.
Link: http://lkml.kernel.org/r/20190121215850.221745-1-shakeelb@google.com Reported-by: syzbot+7fbbfa368521945f0e3d@syzkaller.appspotmail.com Fixes: 6b0c81b3be11 ("mm, oom: reduce dependency on tasklist_lock") Signed-off-by: Shakeel Butt shakeelb@google.com Reviewed-by: Roman Gushchin guro@fb.com Acked-by: Michal Hocko mhocko@suse.com Cc: David Rientjes rientjes@google.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Tetsuo Handa penguin-kernel@i-love.sakura.ne.jp Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/oom_kill.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -544,6 +544,13 @@ void oom_kill_process(struct oom_control * still freeing memory. */ read_lock(&tasklist_lock); + + /* + * The task 'p' might have already exited before reaching here. The + * put_task_struct() will free task_struct 'p' while the loop still try + * to access the field of 'p', so, get an extra reference. + */ + get_task_struct(p); for_each_thread(p, t) { list_for_each_entry(child, &t->children, sibling) { unsigned int child_points; @@ -563,6 +570,7 @@ void oom_kill_process(struct oom_control } } } + put_task_struct(p); read_unlock(&tasklist_lock);
p = find_lock_task_mm(victim);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara paulo@paulo.ac
commit 28eb24ff75c5ac130eb326b3b4d0dcecfc0f427d upstream.
In case a hostname resolves to a different IP address (e.g. long running mounts), make sure to resolve it every time prior to calling generic_ip_connect() in reconnect.
Suggested-by: Steve French stfrench@microsoft.com Signed-off-by: Paulo Alcantara palcantara@suse.de Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Pavel Shilovsky pshilov@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/connect.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+)
--- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -48,6 +48,7 @@ #include "cifs_unicode.h" #include "cifs_debug.h" #include "cifs_fs_sb.h" +#include "dns_resolve.h" #include "ntlmssp.h" #include "nterr.h" #include "rfc1002pdu.h" @@ -304,6 +305,53 @@ static int cifs_setup_volume_info(struct const char *devname);
/* + * Resolve hostname and set ip addr in tcp ses. Useful for hostnames that may + * get their ip addresses changed at some point. + * + * This should be called with server->srv_mutex held. + */ +#ifdef CONFIG_CIFS_DFS_UPCALL +static int reconn_set_ipaddr(struct TCP_Server_Info *server) +{ + int rc; + int len; + char *unc, *ipaddr = NULL; + + if (!server->hostname) + return -EINVAL; + + len = strlen(server->hostname) + 3; + + unc = kmalloc(len, GFP_KERNEL); + if (!unc) { + cifs_dbg(FYI, "%s: failed to create UNC path\n", __func__); + return -ENOMEM; + } + snprintf(unc, len, "\\%s", server->hostname); + + rc = dns_resolve_server_name_to_ip(unc, &ipaddr); + kfree(unc); + + if (rc < 0) { + cifs_dbg(FYI, "%s: failed to resolve server part of %s to IP: %d\n", + __func__, server->hostname, rc); + return rc; + } + + rc = cifs_convert_address((struct sockaddr *)&server->dstaddr, ipaddr, + strlen(ipaddr)); + kfree(ipaddr); + + return !rc ? -1 : 0; +} +#else +static inline int reconn_set_ipaddr(struct TCP_Server_Info *server) +{ + return 0; +} +#endif + +/* * cifs tcp session reconnection * * mark tcp session as reconnecting so temporarily locked @@ -400,6 +448,11 @@ cifs_reconnect(struct TCP_Server_Info *s rc = generic_ip_connect(server); if (rc) { cifs_dbg(FYI, "reconnect error %d\n", rc); + rc = reconn_set_ipaddr(server); + if (rc) { + cifs_dbg(FYI, "%s: failed to resolve hostname: %d\n", + __func__, rc); + } mutex_unlock(&server->srv_mutex); msleep(3000); } else {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Herrenschmidt benh@kernel.crashing.org
commit 726e41097920a73e4c7c33385dcc0debb1281e18 upstream.
For devices with a class, we create a "glue" directory between the parent device and the new device with the class name.
This directory is never "explicitely" removed when empty however, this is left to the implicit sysfs removal done by kobject_release() when the object loses its last reference via kobject_put().
This is problematic because as long as it's not been removed from sysfs, it is still present in the class kset and in sysfs directory structure.
The presence in the class kset exposes a use after free bug fixed by the previous patch, but the presence in sysfs means that until the kobject is released, which can take a while (especially with kobject debugging), any attempt at re-creating such as binding a new device for that class/parent pair, will result in a sysfs duplicate file name error.
This fixes it by instead doing an explicit kobject_del() when the glue dir is empty, by keeping track of the number of child devices of the gluedir.
This is made easy by the fact that all glue dir operations are done with a global mutex, and there's already a function (cleanup_glue_dir) called in all the right places taking that mutex that can be enhanced for this. It appears that this was in fact the intent of the function, but the implementation was wrong.
Signed-off-by: Benjamin Herrenschmidt benh@kernel.crashing.org Acked-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Zubin Mithra zsm@chromium.org Cc: Guenter Roeck groeck@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/base/core.c | 2 ++ include/linux/kobject.h | 17 +++++++++++++++++ 2 files changed, 19 insertions(+)
--- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -862,6 +862,8 @@ static void cleanup_glue_dir(struct devi return;
mutex_lock(&gdp_mutex); + if (!kobject_has_children(glue_dir)) + kobject_del(glue_dir); kobject_put(glue_dir); mutex_unlock(&gdp_mutex); } --- a/include/linux/kobject.h +++ b/include/linux/kobject.h @@ -113,6 +113,23 @@ extern void kobject_put(struct kobject * extern const void *kobject_namespace(struct kobject *kobj); extern char *kobject_get_path(struct kobject *kobj, gfp_t flag);
+/** + * kobject_has_children - Returns whether a kobject has children. + * @kobj: the object to test + * + * This will return whether a kobject has other kobjects as children. + * + * It does NOT account for the presence of attribute files, only sub + * directories. It also assumes there is no concurrent addition or + * removal of such children, and thus relies on external locking. + */ +static inline bool kobject_has_children(struct kobject *kobj) +{ + WARN_ON_ONCE(atomic_read(&kobj->kref.refcount) == 0); + + return kobj->sd && kobj->sd->dir.subdirs; +} + struct kobj_type { void (*release)(struct kobject *kobj); const struct sysfs_ops *sysfs_ops;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Hildenbrand david@redhat.com
commit e0a352fabce61f730341d119fbedf71ffdb8663f upstream.
We had a race in the old balloon compaction code before b1123ea6d3b3 ("mm: balloon: use general non-lru movable page feature") refactored it that became visible after backporting 195a8c43e93d ("virtio-balloon: deflate via a page list") without the refactoring.
The bug existed from commit d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management") till b1123ea6d3b3 ("mm: balloon: use general non-lru movable page feature"). d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management") was backported to 3.12, so the broken kernels are stable kernels [3.12 - 4.7].
There was a subtle race between dropping the page lock of the newpage in __unmap_and_move() and checking for __is_movable_balloon_page(newpage).
Just after dropping this page lock, virtio-balloon could go ahead and deflate the newpage, effectively dequeueing it and clearing PageBalloon, in turn making __is_movable_balloon_page(newpage) fail.
This resulted in dropping the reference of the newpage via putback_lru_page(newpage) instead of put_page(newpage), leading to page->lru getting modified and a !LRU page ending up in the LRU lists. With 195a8c43e93d ("virtio-balloon: deflate via a page list") backported, one would suddenly get corrupted lists in release_pages_balloon():
- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0 - list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100
Nowadays this race is no longer possible, but it is hidden behind very ugly handling of __ClearPageMovable() and __PageMovable().
__ClearPageMovable() will not make __PageMovable() fail, only PageMovable(). So the new check (__PageMovable(newpage)) will still hold even after newpage was dequeued by virtio-balloon.
If anybody would ever change that special handling, the BUG would be introduced again. So instead, make it explicit and use the information of the original isolated page before migration.
This patch can be backported fairly easy to stable kernels (in contrast to the refactoring).
Link: http://lkml.kernel.org/r/20190129233217.10747-1-david@redhat.com Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management") Signed-off-by: David Hildenbrand david@redhat.com Reported-by: Vratislav Bendel vbendel@redhat.com Acked-by: Michal Hocko mhocko@suse.com Acked-by: Rafael Aquini aquini@redhat.com Cc: Mel Gorman mgorman@techsingularity.net Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Cc: Michal Hocko mhocko@suse.com Cc: Naoya Horiguchi n-horiguchi@ah.jp.nec.com Cc: Jan Kara jack@suse.cz Cc: Andrea Arcangeli aarcange@redhat.com Cc: Dominik Brodowski linux@dominikbrodowski.net Cc: Matthew Wilcox willy@infradead.org Cc: Vratislav Bendel vbendel@redhat.com Cc: Rafael Aquini aquini@redhat.com Cc: Konstantin Khlebnikov k.khlebnikov@samsung.com Cc: Minchan Kim minchan@kernel.org Cc: stable@vger.kernel.org [3.12 - 4.7] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: David Hildenbrand david@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/migrate.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/mm/migrate.c +++ b/mm/migrate.c @@ -936,6 +936,7 @@ static ICE_noinline int unmap_and_move(n int rc = MIGRATEPAGE_SUCCESS; int *result = NULL; struct page *newpage; + bool is_lru = !isolated_balloon_page(page);
newpage = get_new_page(page, private, &result); if (!newpage) @@ -983,11 +984,13 @@ out: /* * If migration was not successful and there's a freeing callback, use * it. Otherwise, putback_lru_page() will drop the reference grabbed - * during isolation. + * during isolation. Use the old state of the isolated source page to + * determine if we migrated a LRU page. newpage was already unlocked + * and possibly modified by its owner - don't rely on the page state. */ if (put_new_page) put_new_page(newpage, private); - else if (unlikely(__is_movable_balloon_page(newpage))) { + else if (rc == MIGRATEPAGE_SUCCESS && unlikely(!is_lru)) { /* drop our reference, page already in the balloon */ put_page(newpage); } else
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner dchinner@redhat.com
commit 79f546a696bff2590169fb5684e23d65f4d9f591 upstream.
We recently had an oops reported on a 4.14 kernel in xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage and so the m_perag_tree lookup walked into lala land. It produces an oops down this path during the failed mount:
radix_tree_gang_lookup_tag+0xc4/0x130 xfs_perag_get_tag+0x37/0xf0 xfs_reclaim_inodes_count+0x32/0x40 xfs_fs_nr_cached_objects+0x11/0x20 super_cache_count+0x35/0xc0 shrink_slab.part.66+0xb1/0x370 shrink_node+0x7e/0x1a0 try_to_free_pages+0x199/0x470 __alloc_pages_slowpath+0x3a1/0xd20 __alloc_pages_nodemask+0x1c3/0x200 cache_grow_begin+0x20b/0x2e0 fallback_alloc+0x160/0x200 kmem_cache_alloc+0x111/0x4e0
The problem is that the superblock shrinker is running before the filesystem structures it depends on have been fully set up. i.e. the shrinker is registered in sget(), before ->fill_super() has been called, and the shrinker can call into the filesystem before fill_super() does it's setup work. Essentially we are exposed to both use-after-free and use-before-initialisation bugs here.
To fix this, add a check for the SB_BORN flag in super_cache_count. In general, this flag is not set until ->fs_mount() completes successfully, so we know that it is set after the filesystem setup has completed. This matches the trylock_super() behaviour which will not let super_cache_scan() run if SB_BORN is not set, and hence will not allow the superblock shrinker from entering the filesystem while it is being set up or after it has failed setup and is being torn down.
Cc: stable@kernel.org Signed-Off-By: Dave Chinner dchinner@redhat.com Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Aaron Lu aaron.lu@linux.alibaba.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/super.c | 30 ++++++++++++++++++++++++------ 1 file changed, 24 insertions(+), 6 deletions(-)
--- a/fs/super.c +++ b/fs/super.c @@ -118,13 +118,23 @@ static unsigned long super_cache_count(s sb = container_of(shrink, struct super_block, s_shrink);
/* - * Don't call trylock_super as it is a potential - * scalability bottleneck. The counts could get updated - * between super_cache_count and super_cache_scan anyway. - * Call to super_cache_count with shrinker_rwsem held - * ensures the safety of call to list_lru_shrink_count() and - * s_op->nr_cached_objects(). + * We don't call trylock_super() here as it is a scalability bottleneck, + * so we're exposed to partial setup state. The shrinker rwsem does not + * protect filesystem operations backing list_lru_shrink_count() or + * s_op->nr_cached_objects(). Counts can change between + * super_cache_count and super_cache_scan, so we really don't need locks + * here. + * + * However, if we are currently mounting the superblock, the underlying + * filesystem might be in a state of partial construction and hence it + * is dangerous to access it. trylock_super() uses a MS_BORN check to + * avoid this situation, so do the same here. The memory barrier is + * matched with the one in mount_fs() as we don't hold locks here. */ + if (!(sb->s_flags & MS_BORN)) + return 0; + smp_rmb(); + if (sb->s_op && sb->s_op->nr_cached_objects) total_objects = sb->s_op->nr_cached_objects(sb, sc);
@@ -1133,6 +1143,14 @@ mount_fs(struct file_system_type *type, sb = root->d_sb; BUG_ON(!sb); WARN_ON(!sb->s_bdi); + + /* + * Write barrier is for super_cache_count(). We place it before setting + * MS_BORN as the data dependency between the two functions is the + * superblock structure contents that we just set up, not the MS_BORN + * flag. + */ + smp_wmb(); sb->s_flags |= MS_BORN;
error = security_sb_kern_mount(sb, flags, secdata);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Oskolkov posk@google.com
commit 7969e5c40dfd04799d4341f1b7cd266b6e47f227 upstream.
This behavior is required in IPv6, and there is little need to tolerate overlapping fragments in IPv4. This change simplifies the code and eliminates potential DDoS attack vectors.
Tested: ran ip_defrag selftest (not yet available uptream).
Suggested-by: David S. Miller davem@davemloft.net Signed-off-by: Peter Oskolkov posk@google.com Signed-off-by: Eric Dumazet edumazet@google.com Cc: Florian Westphal fw@strlen.de Acked-by: Stephen Hemminger stephen@networkplumber.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Mao Wenan maowenan@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/uapi/linux/snmp.h | 1 net/ipv4/ip_fragment.c | 73 ++++++++++++---------------------------------- net/ipv4/proc.c | 1 3 files changed, 22 insertions(+), 53 deletions(-)
--- a/include/uapi/linux/snmp.h +++ b/include/uapi/linux/snmp.h @@ -55,6 +55,7 @@ enum IPSTATS_MIB_ECT1PKTS, /* InECT1Pkts */ IPSTATS_MIB_ECT0PKTS, /* InECT0Pkts */ IPSTATS_MIB_CEPKTS, /* InCEPkts */ + IPSTATS_MIB_REASM_OVERLAPS, /* ReasmOverlaps */ __IPSTATS_MIB_MAX };
--- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -342,6 +342,7 @@ static int ip_frag_reinit(struct ipq *qp /* Add new segment to existing queue. */ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb) { + struct net *net = container_of(qp->q.net, struct net, ipv4.frags); struct sk_buff *prev, *next; struct net_device *dev; unsigned int fragsize; @@ -422,60 +423,22 @@ static int ip_frag_queue(struct ipq *qp, }
found: - /* We found where to put this one. Check for overlap with - * preceding fragment, and, if needed, align things so that - * any overlaps are eliminated. + /* RFC5722, Section 4, amended by Errata ID : 3089 + * When reassembling an IPv6 datagram, if + * one or more its constituent fragments is determined to be an + * overlapping fragment, the entire datagram (and any constituent + * fragments) MUST be silently discarded. + * + * We do the same here for IPv4. */ - if (prev) { - int i = (FRAG_CB(prev)->offset + prev->len) - offset; - - if (i > 0) { - offset += i; - err = -EINVAL; - if (end <= offset) - goto err; - err = -ENOMEM; - if (!pskb_pull(skb, i)) - goto err; - if (skb->ip_summed != CHECKSUM_UNNECESSARY) - skb->ip_summed = CHECKSUM_NONE; - } - } - - err = -ENOMEM; - - while (next && FRAG_CB(next)->offset < end) { - int i = end - FRAG_CB(next)->offset; /* overlap is 'i' bytes */ - - if (i < next->len) { - /* Eat head of the next overlapped fragment - * and leave the loop. The next ones cannot overlap. - */ - if (!pskb_pull(next, i)) - goto err; - FRAG_CB(next)->offset += i; - qp->q.meat -= i; - if (next->ip_summed != CHECKSUM_UNNECESSARY) - next->ip_summed = CHECKSUM_NONE; - break; - } else { - struct sk_buff *free_it = next; - - /* Old fragment is completely overridden with - * new one drop it. - */ - next = next->next; - - if (prev) - prev->next = next; - else - qp->q.fragments = next; - - qp->q.meat -= free_it->len; - sub_frag_mem_limit(qp->q.net, free_it->truesize); - kfree_skb(free_it); - } - } + /* Is there an overlap with the previous fragment? */ + if (prev && + (FRAG_CB(prev)->offset + prev->len) > offset) + goto discard_qp; + + /* Is there an overlap with the next fragment? */ + if (next && FRAG_CB(next)->offset < end) + goto discard_qp;
FRAG_CB(skb)->offset = offset;
@@ -522,6 +485,10 @@ found: skb_dst_drop(skb); return -EINPROGRESS;
+discard_qp: + ipq_kill(qp); + err = -EINVAL; + IP_INC_STATS(net, IPSTATS_MIB_REASM_OVERLAPS); err: kfree_skb(skb); return err; --- a/net/ipv4/proc.c +++ b/net/ipv4/proc.c @@ -132,6 +132,7 @@ static const struct snmp_mib snmp4_ipext SNMP_MIB_ITEM("InECT1Pkts", IPSTATS_MIB_ECT1PKTS), SNMP_MIB_ITEM("InECT0Pkts", IPSTATS_MIB_ECT0PKTS), SNMP_MIB_ITEM("InCEPkts", IPSTATS_MIB_CEPKTS), + SNMP_MIB_ITEM("ReasmOverlaps", IPSTATS_MIB_REASM_OVERLAPS), SNMP_MIB_SENTINEL };
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Oskolkov posk@google.com
commit 385114dec8a49b5e5945e77ba7de6356106713f4 upstream.
Tested: see the next patch is the series.
Suggested-by: Eric Dumazet edumazet@google.com Signed-off-by: Peter Oskolkov posk@google.com Signed-off-by: Eric Dumazet edumazet@google.com Cc: Florian Westphal fw@strlen.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Mao Wenan maowenan@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/skbuff.h | 2 +- net/core/skbuff.c | 6 +++++- 2 files changed, 6 insertions(+), 2 deletions(-)
--- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2273,7 +2273,7 @@ static inline void __skb_queue_purge(str kfree_skb(skb); }
-void skb_rbtree_purge(struct rb_root *root); +unsigned int skb_rbtree_purge(struct rb_root *root);
void *netdev_alloc_frag(unsigned int fragsz);
--- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -2380,23 +2380,27 @@ EXPORT_SYMBOL(skb_queue_purge); /** * skb_rbtree_purge - empty a skb rbtree * @root: root of the rbtree to empty + * Return value: the sum of truesizes of all purged skbs. * * Delete all buffers on an &sk_buff rbtree. Each buffer is removed from * the list and one reference dropped. This function does not take * any lock. Synchronization should be handled by the caller (e.g., TCP * out-of-order queue is protected by the socket lock). */ -void skb_rbtree_purge(struct rb_root *root) +unsigned int skb_rbtree_purge(struct rb_root *root) { struct rb_node *p = rb_first(root); + unsigned int sum = 0;
while (p) { struct sk_buff *skb = rb_entry(p, struct sk_buff, rbnode);
p = rb_next(p); rb_erase(&skb->rbnode, root); + sum += skb->truesize; kfree_skb(skb); } + return sum; }
/**
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit 399d1404be660d355192ff4df5ccc3f4159ec1e4 upstream.
This refactors ip_expire() since one indentation level is removed.
Note: in the future, we should try hard to avoid the skb_clone() since this is a serious performance cost. Under DDOS, the ICMP message wont be sent because of rate limits.
Fact that ip6_expire_frag_queue() does not use skb_clone() is disturbing too. Presumably IPv6 should have the same issue than the one we fixed in commit ec4fbd64751d ("inet: frag: release spinlock before calling icmp_send()")
Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Mao Wenan maowenan@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/net/inet_frag.h | 5 --- net/ipv4/ip_fragment.c | 66 +++++++++++++++++++++++------------------------- net/ipv6/reassembly.c | 4 -- 3 files changed, 32 insertions(+), 43 deletions(-)
--- a/include/net/inet_frag.h +++ b/include/net/inet_frag.h @@ -123,11 +123,6 @@ static inline void inet_frag_put(struct inet_frag_destroy(q, f); }
-static inline bool inet_frag_evicting(struct inet_frag_queue *q) -{ - return !hlist_unhashed(&q->list_evictor); -} - /* Memory Tracking Functions. */
static inline int frag_mem_limit(struct netns_frags *nf) --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -194,8 +194,11 @@ static bool frag_expire_skip_icmp(u32 us */ static void ip_expire(unsigned long arg) { - struct ipq *qp; + struct sk_buff *clone, *head; + const struct iphdr *iph; struct net *net; + struct ipq *qp; + int err;
qp = container_of((struct inet_frag_queue *) arg, struct ipq, q); net = container_of(qp->q.net, struct net, ipv4.frags); @@ -209,45 +212,40 @@ static void ip_expire(unsigned long arg) ipq_kill(qp); IP_INC_STATS_BH(net, IPSTATS_MIB_REASMFAILS);
- if (!inet_frag_evicting(&qp->q)) { - struct sk_buff *clone, *head = qp->q.fragments; - const struct iphdr *iph; - int err; - - IP_INC_STATS_BH(net, IPSTATS_MIB_REASMTIMEOUT); + head = qp->q.fragments;
- if (!(qp->q.flags & INET_FRAG_FIRST_IN) || !qp->q.fragments) - goto out; + IP_INC_STATS_BH(net, IPSTATS_MIB_REASMTIMEOUT);
- head->dev = dev_get_by_index_rcu(net, qp->iif); - if (!head->dev) - goto out; + if (!(qp->q.flags & INET_FRAG_FIRST_IN) || !head) + goto out;
+ head->dev = dev_get_by_index_rcu(net, qp->iif); + if (!head->dev) + goto out;
- /* skb has no dst, perform route lookup again */ - iph = ip_hdr(head); - err = ip_route_input_noref(head, iph->daddr, iph->saddr, + /* skb has no dst, perform route lookup again */ + iph = ip_hdr(head); + err = ip_route_input_noref(head, iph->daddr, iph->saddr, iph->tos, head->dev); - if (err) - goto out; + if (err) + goto out; + + /* Only an end host needs to send an ICMP + * "Fragment Reassembly Timeout" message, per RFC792. + */ + if (frag_expire_skip_icmp(qp->user) && + (skb_rtable(head)->rt_type != RTN_LOCAL)) + goto out; + + clone = skb_clone(head, GFP_ATOMIC);
- /* Only an end host needs to send an ICMP - * "Fragment Reassembly Timeout" message, per RFC792. - */ - if (frag_expire_skip_icmp(qp->user) && - (skb_rtable(head)->rt_type != RTN_LOCAL)) - goto out; - - clone = skb_clone(head, GFP_ATOMIC); - - /* Send an ICMP "Fragment Reassembly Timeout" message. */ - if (clone) { - spin_unlock(&qp->q.lock); - icmp_send(clone, ICMP_TIME_EXCEEDED, - ICMP_EXC_FRAGTIME, 0); - consume_skb(clone); - goto out_rcu_unlock; - } + /* Send an ICMP "Fragment Reassembly Timeout" message. */ + if (clone) { + spin_unlock(&qp->q.lock); + icmp_send(clone, ICMP_TIME_EXCEEDED, + ICMP_EXC_FRAGTIME, 0); + consume_skb(clone); + goto out_rcu_unlock; } out: spin_unlock(&qp->q.lock); --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -146,10 +146,6 @@ void ip6_expire_frag_queue(struct net *n goto out_rcu_unlock;
IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMFAILS); - - if (inet_frag_evicting(&fq->q)) - goto out_rcu_unlock; - IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT);
/* Don't send error if the first segment did not arrive. */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Oskolkov posk@google.com
commit fa0f527358bd900ef92f925878ed6bfbd51305cc upstream.
Similar to TCP OOO RX queue, it makes sense to use rb trees to store IP fragments, so that OOO fragments are inserted faster.
Tested:
- a follow-up patch contains a rather comprehensive ip defrag self-test (functional) - ran neper `udp_stream -c -H <host> -F 100 -l 300 -T 20`: netstat --statistics Ip: 282078937 total packets received 0 forwarded 0 incoming packets discarded 946760 incoming packets delivered 18743456 requests sent out 101 fragments dropped after timeout 282077129 reassemblies required 944952 packets reassembled ok 262734239 packet reassembles failed (The numbers/stats above are somewhat better re: reassemblies vs a kernel without this patchset. More comprehensive performance testing TBD).
Reported-by: Jann Horn jannh@google.com Reported-by: Juha-Matti Tilli juha-matti.tilli@iki.fi Suggested-by: Eric Dumazet edumazet@google.com Signed-off-by: Peter Oskolkov posk@google.com Signed-off-by: Eric Dumazet edumazet@google.com Cc: Florian Westphal fw@strlen.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Mao Wenan maowenan@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/skbuff.h | 2 include/net/inet_frag.h | 3 net/ipv4/inet_fragment.c | 14 +- net/ipv4/ip_fragment.c | 190 +++++++++++++++++--------------- net/ipv6/netfilter/nf_conntrack_reasm.c | 1 net/ipv6/reassembly.c | 1 6 files changed, 120 insertions(+), 91 deletions(-)
--- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -556,7 +556,7 @@ struct sk_buff { struct skb_mstamp skb_mstamp; }; }; - struct rb_node rbnode; /* used in netem & tcp stack */ + struct rb_node rbnode; /* used in netem, ip4 defrag, and tcp stack */ }; struct sock *sk; struct net_device *dev; --- a/include/net/inet_frag.h +++ b/include/net/inet_frag.h @@ -45,7 +45,8 @@ struct inet_frag_queue { struct timer_list timer; struct hlist_node list; atomic_t refcnt; - struct sk_buff *fragments; + struct sk_buff *fragments; /* Used in IPv6. */ + struct rb_root rb_fragments; /* Used in IPv4. */ struct sk_buff *fragments_tail; ktime_t stamp; int len; --- a/net/ipv4/inet_fragment.c +++ b/net/ipv4/inet_fragment.c @@ -306,12 +306,16 @@ void inet_frag_destroy(struct inet_frag_ /* Release all fragment data. */ fp = q->fragments; nf = q->net; - while (fp) { - struct sk_buff *xp = fp->next; + if (fp) { + do { + struct sk_buff *xp = fp->next;
- sum_truesize += fp->truesize; - frag_kfree_skb(nf, f, fp); - fp = xp; + sum_truesize += fp->truesize; + kfree_skb(fp); + fp = xp; + } while (fp); + } else { + sum_truesize = skb_rbtree_purge(&q->rb_fragments); } sum = sum_truesize + f->qsize;
--- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -194,7 +194,7 @@ static bool frag_expire_skip_icmp(u32 us */ static void ip_expire(unsigned long arg) { - struct sk_buff *clone, *head; + struct sk_buff *head = NULL; const struct iphdr *iph; struct net *net; struct ipq *qp; @@ -211,14 +211,31 @@ static void ip_expire(unsigned long arg)
ipq_kill(qp); IP_INC_STATS_BH(net, IPSTATS_MIB_REASMFAILS); - - head = qp->q.fragments; - IP_INC_STATS_BH(net, IPSTATS_MIB_REASMTIMEOUT);
- if (!(qp->q.flags & INET_FRAG_FIRST_IN) || !head) + if (!qp->q.flags & INET_FRAG_FIRST_IN) goto out;
+ /* sk_buff::dev and sk_buff::rbnode are unionized. So we + * pull the head out of the tree in order to be able to + * deal with head->dev. + */ + if (qp->q.fragments) { + head = qp->q.fragments; + qp->q.fragments = head->next; + } else { + head = skb_rb_first(&qp->q.rb_fragments); + if (!head) + goto out; + rb_erase(&head->rbnode, &qp->q.rb_fragments); + memset(&head->rbnode, 0, sizeof(head->rbnode)); + barrier(); + } + if (head == qp->q.fragments_tail) + qp->q.fragments_tail = NULL; + + sub_frag_mem_limit(qp->q.net, head->truesize); + head->dev = dev_get_by_index_rcu(net, qp->iif); if (!head->dev) goto out; @@ -237,20 +254,17 @@ static void ip_expire(unsigned long arg) (skb_rtable(head)->rt_type != RTN_LOCAL)) goto out;
- clone = skb_clone(head, GFP_ATOMIC); - /* Send an ICMP "Fragment Reassembly Timeout" message. */ - if (clone) { - spin_unlock(&qp->q.lock); - icmp_send(clone, ICMP_TIME_EXCEEDED, - ICMP_EXC_FRAGTIME, 0); - consume_skb(clone); - goto out_rcu_unlock; - } + spin_unlock(&qp->q.lock); + icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0); + goto out_rcu_unlock; + out: spin_unlock(&qp->q.lock); out_rcu_unlock: rcu_read_unlock(); + if (head) + kfree_skb(head); ipq_put(qp); }
@@ -294,7 +308,7 @@ static int ip_frag_too_far(struct ipq *q end = atomic_inc_return(&peer->rid); qp->rid = end;
- rc = qp->q.fragments && (end - start) > max; + rc = qp->q.fragments_tail && (end - start) > max;
if (rc) { struct net *net; @@ -308,7 +322,6 @@ static int ip_frag_too_far(struct ipq *q
static int ip_frag_reinit(struct ipq *qp) { - struct sk_buff *fp; unsigned int sum_truesize = 0;
if (!mod_timer(&qp->q.timer, jiffies + qp->q.net->timeout)) { @@ -316,20 +329,14 @@ static int ip_frag_reinit(struct ipq *qp return -ETIMEDOUT; }
- fp = qp->q.fragments; - do { - struct sk_buff *xp = fp->next; - - sum_truesize += fp->truesize; - kfree_skb(fp); - fp = xp; - } while (fp); + sum_truesize = skb_rbtree_purge(&qp->q.rb_fragments); sub_frag_mem_limit(qp->q.net, sum_truesize);
qp->q.flags = 0; qp->q.len = 0; qp->q.meat = 0; qp->q.fragments = NULL; + qp->q.rb_fragments = RB_ROOT; qp->q.fragments_tail = NULL; qp->iif = 0; qp->ecn = 0; @@ -341,7 +348,8 @@ static int ip_frag_reinit(struct ipq *qp static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb) { struct net *net = container_of(qp->q.net, struct net, ipv4.frags); - struct sk_buff *prev, *next; + struct rb_node **rbn, *parent; + struct sk_buff *skb1; struct net_device *dev; unsigned int fragsize; int flags, offset; @@ -404,56 +412,60 @@ static int ip_frag_queue(struct ipq *qp, if (err) goto err;
- /* Find out which fragments are in front and at the back of us - * in the chain of fragments so far. We must know where to put - * this fragment, right? - */ - prev = qp->q.fragments_tail; - if (!prev || FRAG_CB(prev)->offset < offset) { - next = NULL; - goto found; - } - prev = NULL; - for (next = qp->q.fragments; next != NULL; next = next->next) { - if (FRAG_CB(next)->offset >= offset) - break; /* bingo! */ - prev = next; - } + /* Note : skb->rbnode and skb->dev share the same location. */ + dev = skb->dev; + /* Makes sure compiler wont do silly aliasing games */ + barrier();
-found: /* RFC5722, Section 4, amended by Errata ID : 3089 * When reassembling an IPv6 datagram, if * one or more its constituent fragments is determined to be an * overlapping fragment, the entire datagram (and any constituent * fragments) MUST be silently discarded. * - * We do the same here for IPv4. + * We do the same here for IPv4 (and increment an snmp counter). */ - /* Is there an overlap with the previous fragment? */ - if (prev && - (FRAG_CB(prev)->offset + prev->len) > offset) - goto discard_qp; - - /* Is there an overlap with the next fragment? */ - if (next && FRAG_CB(next)->offset < end) - goto discard_qp; - - FRAG_CB(skb)->offset = offset;
- /* Insert this fragment in the chain of fragments. */ - skb->next = next; - if (!next) + /* Find out where to put this fragment. */ + skb1 = qp->q.fragments_tail; + if (!skb1) { + /* This is the first fragment we've received. */ + rb_link_node(&skb->rbnode, NULL, &qp->q.rb_fragments.rb_node); + qp->q.fragments_tail = skb; + } else if ((FRAG_CB(skb1)->offset + skb1->len) < end) { + /* This is the common/special case: skb goes to the end. */ + /* Detect and discard overlaps. */ + if (offset < (FRAG_CB(skb1)->offset + skb1->len)) + goto discard_qp; + /* Insert after skb1. */ + rb_link_node(&skb->rbnode, &skb1->rbnode, &skb1->rbnode.rb_right); qp->q.fragments_tail = skb; - if (prev) - prev->next = skb; - else - qp->q.fragments = skb; + } else { + /* Binary search. Note that skb can become the first fragment, but + * not the last (covered above). */ + rbn = &qp->q.rb_fragments.rb_node; + do { + parent = *rbn; + skb1 = rb_to_skb(parent); + if (end <= FRAG_CB(skb1)->offset) + rbn = &parent->rb_left; + else if (offset >= FRAG_CB(skb1)->offset + skb1->len) + rbn = &parent->rb_right; + else /* Found an overlap with skb1. */ + goto discard_qp; + } while (*rbn); + /* Here we have parent properly set, and rbn pointing to + * one of its NULL left/right children. Insert skb. */ + rb_link_node(&skb->rbnode, parent, rbn); + } + rb_insert_color(&skb->rbnode, &qp->q.rb_fragments);
- dev = skb->dev; if (dev) { qp->iif = dev->ifindex; skb->dev = NULL; } + FRAG_CB(skb)->offset = offset; + qp->q.stamp = skb->tstamp; qp->q.meat += skb->len; qp->ecn |= ecn; @@ -475,7 +487,7 @@ found: unsigned long orefdst = skb->_skb_refdst;
skb->_skb_refdst = 0UL; - err = ip_frag_reasm(qp, prev, dev); + err = ip_frag_reasm(qp, skb, dev); skb->_skb_refdst = orefdst; return err; } @@ -492,15 +504,15 @@ err: return err; }
- /* Build a new IP datagram from all its fragments. */ - -static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev, +static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb, struct net_device *dev) { struct net *net = container_of(qp->q.net, struct net, ipv4.frags); struct iphdr *iph; - struct sk_buff *fp, *head = qp->q.fragments; + struct sk_buff *fp, *head = skb_rb_first(&qp->q.rb_fragments); + struct sk_buff **nextp; /* To build frag_list. */ + struct rb_node *rbn; int len; int ihlen; int err; @@ -514,25 +526,21 @@ static int ip_frag_reasm(struct ipq *qp, goto out_fail; } /* Make the one we just received the head. */ - if (prev) { - head = prev->next; - fp = skb_clone(head, GFP_ATOMIC); + if (head != skb) { + fp = skb_clone(skb, GFP_ATOMIC); if (!fp) goto out_nomem;
- fp->next = head->next; - if (!fp->next) + rb_replace_node(&skb->rbnode, &fp->rbnode, &qp->q.rb_fragments); + if (qp->q.fragments_tail == skb) qp->q.fragments_tail = fp; - prev->next = fp; - - skb_morph(head, qp->q.fragments); - head->next = qp->q.fragments->next; - - consume_skb(qp->q.fragments); - qp->q.fragments = head; + skb_morph(skb, head); + rb_replace_node(&head->rbnode, &skb->rbnode, + &qp->q.rb_fragments); + consume_skb(head); + head = skb; }
- WARN_ON(!head); WARN_ON(FRAG_CB(head)->offset != 0);
/* Allocate a new buffer for the datagram. */ @@ -557,24 +565,35 @@ static int ip_frag_reasm(struct ipq *qp, clone = alloc_skb(0, GFP_ATOMIC); if (!clone) goto out_nomem; - clone->next = head->next; - head->next = clone; skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list; skb_frag_list_init(head); for (i = 0; i < skb_shinfo(head)->nr_frags; i++) plen += skb_frag_size(&skb_shinfo(head)->frags[i]); clone->len = clone->data_len = head->data_len - plen; - head->data_len -= clone->len; - head->len -= clone->len; + skb->truesize += clone->truesize; clone->csum = 0; clone->ip_summed = head->ip_summed; add_frag_mem_limit(qp->q.net, clone->truesize); + skb_shinfo(head)->frag_list = clone; + nextp = &clone->next; + } else { + nextp = &skb_shinfo(head)->frag_list; }
- skb_shinfo(head)->frag_list = head->next; skb_push(head, head->data - skb_network_header(head));
- for (fp=head->next; fp; fp = fp->next) { + /* Traverse the tree in order, to build frag_list. */ + rbn = rb_next(&head->rbnode); + rb_erase(&head->rbnode, &qp->q.rb_fragments); + while (rbn) { + struct rb_node *rbnext = rb_next(rbn); + fp = rb_to_skb(rbn); + rb_erase(rbn, &qp->q.rb_fragments); + rbn = rbnext; + *nextp = fp; + nextp = &fp->next; + fp->prev = NULL; + memset(&fp->rbnode, 0, sizeof(fp->rbnode)); head->data_len += fp->len; head->len += fp->len; if (head->ip_summed != fp->ip_summed) @@ -585,7 +604,9 @@ static int ip_frag_reasm(struct ipq *qp, } sub_frag_mem_limit(qp->q.net, head->truesize);
+ *nextp = NULL; head->next = NULL; + head->prev = NULL; head->dev = dev; head->tstamp = qp->q.stamp; IPCB(head)->frag_max_size = max(qp->max_df_size, qp->q.max_size); @@ -613,6 +634,7 @@ static int ip_frag_reasm(struct ipq *qp,
IP_INC_STATS_BH(net, IPSTATS_MIB_REASMOKS); qp->q.fragments = NULL; + qp->q.rb_fragments = RB_ROOT; qp->q.fragments_tail = NULL; return 0;
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -472,6 +472,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq, head->csum);
fq->q.fragments = NULL; + fq->q.rb_fragments = RB_ROOT; fq->q.fragments_tail = NULL;
/* all original skbs are linked into the NFCT_FRAG6_CB(head).orig */ --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -499,6 +499,7 @@ static int ip6_frag_reasm(struct frag_qu IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMOKS); rcu_read_unlock(); fq->q.fragments = NULL; + fq->q.rb_fragments = RB_ROOT; fq->q.fragments_tail = NULL; return 1;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
commit 0ed4229b08c13c84a3c301a08defdc9e7f4467e6 upstream.
don't bother with pathological cases, they only waste cycles. IPv6 requires a minimum MTU of 1280 so we should never see fragments smaller than this (except last frag).
v3: don't use awkward "-offset + len" v2: drop IPv4 part, which added same check w. IPV4_MIN_MTU (68). There were concerns that there could be even smaller frags generated by intermediate nodes, e.g. on radio networks.
Cc: Peter Oskolkov posk@google.com Cc: Eric Dumazet edumazet@google.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Mao Wenan maowenan@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/netfilter/nf_conntrack_reasm.c | 4 ++++ net/ipv6/reassembly.c | 4 ++++ 2 files changed, 8 insertions(+)
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -602,6 +602,10 @@ struct sk_buff *nf_ct_frag6_gather(struc hdr = ipv6_hdr(clone); fhdr = (struct frag_hdr *)skb_transport_header(clone);
+ if (skb->len - skb_network_offset(skb) < IPV6_MIN_MTU && + fhdr->frag_off & htons(IP6_MF)) + goto ret_orig; + skb_orphan(skb); fq = fq_find(net, fhdr->identification, user, &hdr->saddr, &hdr->daddr, skb->dev ? skb->dev->ifindex : 0, ip6_frag_ecn(hdr)); --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -549,6 +549,10 @@ static int ipv6_frag_rcv(struct sk_buff return 1; }
+ if (skb->len - skb_network_offset(skb) < IPV6_MIN_MTU && + fhdr->frag_off & htons(IP6_MF)) + goto fail_hdr; + fq = fq_find(net, fhdr->identification, &hdr->saddr, &hdr->daddr, skb->dev ? skb->dev->ifindex : 0, ip6_frag_ecn(hdr)); if (fq) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Oskolkov posk@google.com
commit 353c9cb360874e737fb000545f783df756c06f9a upstream.
This patch introduces several helper functions/macros that will be used in the follow-up patch. No runtime changes yet.
The new logic (fully implemented in the second patch) is as follows:
* Nodes in the rb-tree will now contain not single fragments, but lists of consecutive fragments ("runs").
* At each point in time, the current "active" run at the tail is maintained/tracked. Fragments that arrive in-order, adjacent to the previous tail fragment, are added to this tail run without triggering the re-balancing of the rb-tree.
* If a fragment arrives out of order with the offset _before_ the tail run, it is inserted into the rb-tree as a single fragment.
* If a fragment arrives after the current tail fragment (with a gap), it starts a new "tail" run, as is inserted into the rb-tree at the end as the head of the new run.
skb->cb is used to store additional information needed here (suggested by Eric Dumazet).
Reported-by: Willem de Bruijn willemb@google.com Signed-off-by: Peter Oskolkov posk@google.com Cc: Eric Dumazet edumazet@google.com Cc: Florian Westphal fw@strlen.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Mao Wenan maowenan@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/inet_frag.h | 4 ++ net/ipv4/ip_fragment.c | 74 +++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 74 insertions(+), 4 deletions(-)
--- a/include/net/inet_frag.h +++ b/include/net/inet_frag.h @@ -48,6 +48,7 @@ struct inet_frag_queue { struct sk_buff *fragments; /* Used in IPv6. */ struct rb_root rb_fragments; /* Used in IPv4. */ struct sk_buff *fragments_tail; + struct sk_buff *last_run_head; ktime_t stamp; int len; int meat; @@ -118,6 +119,9 @@ struct inet_frag_queue *inet_frag_find(s void inet_frag_maybe_warn_overflow(struct inet_frag_queue *q, const char *prefix);
+/* Free all skbs in the queue; return the sum of their truesizes. */ +unsigned int inet_frag_rbtree_purge(struct rb_root *root); + static inline void inet_frag_put(struct inet_frag_queue *q, struct inet_frags *f) { if (atomic_dec_and_test(&q->refcnt)) --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -58,13 +58,57 @@ static int sysctl_ipfrag_max_dist __read_mostly = 64; static const char ip_frag_cache_name[] = "ip4-frags";
-struct ipfrag_skb_cb -{ +/* Use skb->cb to track consecutive/adjacent fragments coming at + * the end of the queue. Nodes in the rb-tree queue will + * contain "runs" of one or more adjacent fragments. + * + * Invariants: + * - next_frag is NULL at the tail of a "run"; + * - the head of a "run" has the sum of all fragment lengths in frag_run_len. + */ +struct ipfrag_skb_cb { struct inet_skb_parm h; - int offset; + int offset; + struct sk_buff *next_frag; + int frag_run_len; };
-#define FRAG_CB(skb) ((struct ipfrag_skb_cb *)((skb)->cb)) +#define FRAG_CB(skb) ((struct ipfrag_skb_cb *)((skb)->cb)) + +static void ip4_frag_init_run(struct sk_buff *skb) +{ + BUILD_BUG_ON(sizeof(struct ipfrag_skb_cb) > sizeof(skb->cb)); + + FRAG_CB(skb)->next_frag = NULL; + FRAG_CB(skb)->frag_run_len = skb->len; +} + +/* Append skb to the last "run". */ +static void ip4_frag_append_to_last_run(struct inet_frag_queue *q, + struct sk_buff *skb) +{ + RB_CLEAR_NODE(&skb->rbnode); + FRAG_CB(skb)->next_frag = NULL; + + FRAG_CB(q->last_run_head)->frag_run_len += skb->len; + FRAG_CB(q->fragments_tail)->next_frag = skb; + q->fragments_tail = skb; +} + +/* Create a new "run" with the skb. */ +static void ip4_frag_create_run(struct inet_frag_queue *q, struct sk_buff *skb) +{ + if (q->last_run_head) + rb_link_node(&skb->rbnode, &q->last_run_head->rbnode, + &q->last_run_head->rbnode.rb_right); + else + rb_link_node(&skb->rbnode, NULL, &q->rb_fragments.rb_node); + rb_insert_color(&skb->rbnode, &q->rb_fragments); + + ip4_frag_init_run(skb); + q->fragments_tail = skb; + q->last_run_head = skb; +}
/* Describe an entry in the "incomplete datagrams" queue. */ struct ipq { @@ -721,6 +765,28 @@ struct sk_buff *ip_check_defrag(struct n } EXPORT_SYMBOL(ip_check_defrag);
+unsigned int inet_frag_rbtree_purge(struct rb_root *root) +{ + struct rb_node *p = rb_first(root); + unsigned int sum = 0; + + while (p) { + struct sk_buff *skb = rb_entry(p, struct sk_buff, rbnode); + + p = rb_next(p); + rb_erase(&skb->rbnode, root); + while (skb) { + struct sk_buff *next = FRAG_CB(skb)->next_frag; + + sum += skb->truesize; + kfree_skb(skb); + skb = next; + } + } + return sum; +} +EXPORT_SYMBOL(inet_frag_rbtree_purge); + #ifdef CONFIG_SYSCTL static int zero;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Oskolkov posk@google.com
commit a4fd284a1f8fd4b6c59aa59db2185b1e17c5c11c upstream.
This patch changes the runtime behavior of IP defrag queue: incoming in-order fragments are added to the end of the current list/"run" of in-order fragments at the tail.
On some workloads, UDP stream performance is substantially improved:
RX: ./udp_stream -F 10 -T 2 -l 60 TX: ./udp_stream -c -H <host> -F 10 -T 5 -l 60
with this patchset applied on a 10Gbps receiver:
throughput=9524.18 throughput_units=Mbit/s
upstream (net-next):
throughput=4608.93 throughput_units=Mbit/s
Reported-by: Willem de Bruijn willemb@google.com Signed-off-by: Peter Oskolkov posk@google.com Cc: Eric Dumazet edumazet@google.com Cc: Florian Westphal fw@strlen.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Mao Wenan maowenan@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/inet_fragment.c | 2 net/ipv4/ip_fragment.c | 110 +++++++++++++++++++++++++++++------------------ 2 files changed, 70 insertions(+), 42 deletions(-)
--- a/net/ipv4/inet_fragment.c +++ b/net/ipv4/inet_fragment.c @@ -315,7 +315,7 @@ void inet_frag_destroy(struct inet_frag_ fp = xp; } while (fp); } else { - sum_truesize = skb_rbtree_purge(&q->rb_fragments); + sum_truesize = inet_frag_rbtree_purge(&q->rb_fragments); } sum = sum_truesize + f->qsize;
--- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -139,8 +139,8 @@ int ip_frag_mem(struct net *net) return sum_frag_mem_limit(&net->ipv4.frags); }
-static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev, - struct net_device *dev); +static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb, + struct sk_buff *prev_tail, struct net_device *dev);
struct ip4_create_arg { struct iphdr *iph; @@ -271,7 +271,12 @@ static void ip_expire(unsigned long arg) head = skb_rb_first(&qp->q.rb_fragments); if (!head) goto out; - rb_erase(&head->rbnode, &qp->q.rb_fragments); + if (FRAG_CB(head)->next_frag) + rb_replace_node(&head->rbnode, + &FRAG_CB(head)->next_frag->rbnode, + &qp->q.rb_fragments); + else + rb_erase(&head->rbnode, &qp->q.rb_fragments); memset(&head->rbnode, 0, sizeof(head->rbnode)); barrier(); } @@ -373,7 +378,7 @@ static int ip_frag_reinit(struct ipq *qp return -ETIMEDOUT; }
- sum_truesize = skb_rbtree_purge(&qp->q.rb_fragments); + sum_truesize = inet_frag_rbtree_purge(&qp->q.rb_fragments); sub_frag_mem_limit(qp->q.net, sum_truesize);
qp->q.flags = 0; @@ -382,6 +387,7 @@ static int ip_frag_reinit(struct ipq *qp qp->q.fragments = NULL; qp->q.rb_fragments = RB_ROOT; qp->q.fragments_tail = NULL; + qp->q.last_run_head = NULL; qp->iif = 0; qp->ecn = 0;
@@ -393,7 +399,7 @@ static int ip_frag_queue(struct ipq *qp, { struct net *net = container_of(qp->q.net, struct net, ipv4.frags); struct rb_node **rbn, *parent; - struct sk_buff *skb1; + struct sk_buff *skb1, *prev_tail; struct net_device *dev; unsigned int fragsize; int flags, offset; @@ -471,38 +477,41 @@ static int ip_frag_queue(struct ipq *qp, */
/* Find out where to put this fragment. */ - skb1 = qp->q.fragments_tail; - if (!skb1) { - /* This is the first fragment we've received. */ - rb_link_node(&skb->rbnode, NULL, &qp->q.rb_fragments.rb_node); - qp->q.fragments_tail = skb; - } else if ((FRAG_CB(skb1)->offset + skb1->len) < end) { - /* This is the common/special case: skb goes to the end. */ + prev_tail = qp->q.fragments_tail; + if (!prev_tail) + ip4_frag_create_run(&qp->q, skb); /* First fragment. */ + else if (FRAG_CB(prev_tail)->offset + prev_tail->len < end) { + /* This is the common case: skb goes to the end. */ /* Detect and discard overlaps. */ - if (offset < (FRAG_CB(skb1)->offset + skb1->len)) + if (offset < FRAG_CB(prev_tail)->offset + prev_tail->len) goto discard_qp; - /* Insert after skb1. */ - rb_link_node(&skb->rbnode, &skb1->rbnode, &skb1->rbnode.rb_right); - qp->q.fragments_tail = skb; + if (offset == FRAG_CB(prev_tail)->offset + prev_tail->len) + ip4_frag_append_to_last_run(&qp->q, skb); + else + ip4_frag_create_run(&qp->q, skb); } else { - /* Binary search. Note that skb can become the first fragment, but - * not the last (covered above). */ + /* Binary search. Note that skb can become the first fragment, + * but not the last (covered above). + */ rbn = &qp->q.rb_fragments.rb_node; do { parent = *rbn; skb1 = rb_to_skb(parent); if (end <= FRAG_CB(skb1)->offset) rbn = &parent->rb_left; - else if (offset >= FRAG_CB(skb1)->offset + skb1->len) + else if (offset >= FRAG_CB(skb1)->offset + + FRAG_CB(skb1)->frag_run_len) rbn = &parent->rb_right; else /* Found an overlap with skb1. */ goto discard_qp; } while (*rbn); /* Here we have parent properly set, and rbn pointing to - * one of its NULL left/right children. Insert skb. */ + * one of its NULL left/right children. Insert skb. + */ + ip4_frag_init_run(skb); rb_link_node(&skb->rbnode, parent, rbn); + rb_insert_color(&skb->rbnode, &qp->q.rb_fragments); } - rb_insert_color(&skb->rbnode, &qp->q.rb_fragments);
if (dev) { qp->iif = dev->ifindex; @@ -531,7 +540,7 @@ static int ip_frag_queue(struct ipq *qp, unsigned long orefdst = skb->_skb_refdst;
skb->_skb_refdst = 0UL; - err = ip_frag_reasm(qp, skb, dev); + err = ip_frag_reasm(qp, skb, prev_tail, dev); skb->_skb_refdst = orefdst; return err; } @@ -550,7 +559,7 @@ err:
/* Build a new IP datagram from all its fragments. */ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb, - struct net_device *dev) + struct sk_buff *prev_tail, struct net_device *dev) { struct net *net = container_of(qp->q.net, struct net, ipv4.frags); struct iphdr *iph; @@ -575,10 +584,16 @@ static int ip_frag_reasm(struct ipq *qp, if (!fp) goto out_nomem;
- rb_replace_node(&skb->rbnode, &fp->rbnode, &qp->q.rb_fragments); + FRAG_CB(fp)->next_frag = FRAG_CB(skb)->next_frag; + if (RB_EMPTY_NODE(&skb->rbnode)) + FRAG_CB(prev_tail)->next_frag = fp; + else + rb_replace_node(&skb->rbnode, &fp->rbnode, + &qp->q.rb_fragments); if (qp->q.fragments_tail == skb) qp->q.fragments_tail = fp; skb_morph(skb, head); + FRAG_CB(skb)->next_frag = FRAG_CB(head)->next_frag; rb_replace_node(&head->rbnode, &skb->rbnode, &qp->q.rb_fragments); consume_skb(head); @@ -614,7 +629,7 @@ static int ip_frag_reasm(struct ipq *qp, for (i = 0; i < skb_shinfo(head)->nr_frags; i++) plen += skb_frag_size(&skb_shinfo(head)->frags[i]); clone->len = clone->data_len = head->data_len - plen; - skb->truesize += clone->truesize; + head->truesize += clone->truesize; clone->csum = 0; clone->ip_summed = head->ip_summed; add_frag_mem_limit(qp->q.net, clone->truesize); @@ -627,24 +642,36 @@ static int ip_frag_reasm(struct ipq *qp, skb_push(head, head->data - skb_network_header(head));
/* Traverse the tree in order, to build frag_list. */ + fp = FRAG_CB(head)->next_frag; rbn = rb_next(&head->rbnode); rb_erase(&head->rbnode, &qp->q.rb_fragments); - while (rbn) { - struct rb_node *rbnext = rb_next(rbn); - fp = rb_to_skb(rbn); - rb_erase(rbn, &qp->q.rb_fragments); - rbn = rbnext; - *nextp = fp; - nextp = &fp->next; - fp->prev = NULL; - memset(&fp->rbnode, 0, sizeof(fp->rbnode)); - head->data_len += fp->len; - head->len += fp->len; - if (head->ip_summed != fp->ip_summed) - head->ip_summed = CHECKSUM_NONE; - else if (head->ip_summed == CHECKSUM_COMPLETE) - head->csum = csum_add(head->csum, fp->csum); - head->truesize += fp->truesize; + while (rbn || fp) { + /* fp points to the next sk_buff in the current run; + * rbn points to the next run. + */ + /* Go through the current run. */ + while (fp) { + *nextp = fp; + nextp = &fp->next; + fp->prev = NULL; + memset(&fp->rbnode, 0, sizeof(fp->rbnode)); + head->data_len += fp->len; + head->len += fp->len; + if (head->ip_summed != fp->ip_summed) + head->ip_summed = CHECKSUM_NONE; + else if (head->ip_summed == CHECKSUM_COMPLETE) + head->csum = csum_add(head->csum, fp->csum); + head->truesize += fp->truesize; + fp = FRAG_CB(fp)->next_frag; + } + /* Move to the next run. */ + if (rbn) { + struct rb_node *rbnext = rb_next(rbn); + + fp = rb_to_skb(rbn); + rb_erase(rbn, &qp->q.rb_fragments); + rbn = rbnext; + } } sub_frag_mem_limit(qp->q.net, head->truesize);
@@ -680,6 +707,7 @@ static int ip_frag_reasm(struct ipq *qp, qp->q.fragments = NULL; qp->q.rb_fragments = RB_ROOT; qp->q.fragments_tail = NULL; + qp->q.last_run_head = NULL; return 0;
out_nomem:
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Kubecek mkubecek@suse.cz
commit ade446403bfb79d3528d56071a84b15351a139ad upstream.
Since commit 7969e5c40dfd ("ip: discard IPv4 datagrams with overlapping segments.") IPv4 reassembly code drops the whole queue whenever an overlapping fragment is received. However, the test is written in a way which detects duplicate fragments as overlapping so that in environments with many duplicate packets, fragmented packets may be undeliverable.
Add an extra test and for (potentially) duplicate fragment, only drop the new fragment rather than the whole queue. Only starting offset and length are checked, not the contents of the fragments as that would be too expensive. For similar reason, linear list ("run") of a rbtree node is not iterated, we only check if the new fragment is a subset of the interval covered by existing consecutive fragments.
v2: instead of an exact check iterating through linear list of an rbtree node, only check if the new fragment is subset of the "run" (suggested by Eric Dumazet)
Fixes: 7969e5c40dfd ("ip: discard IPv4 datagrams with overlapping segments.") Signed-off-by: Michal Kubecek mkubecek@suse.cz Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Mao Wenan maowenan@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/ipv4/ip_fragment.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
--- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -400,10 +400,10 @@ static int ip_frag_queue(struct ipq *qp, struct net *net = container_of(qp->q.net, struct net, ipv4.frags); struct rb_node **rbn, *parent; struct sk_buff *skb1, *prev_tail; + int ihl, end, skb1_run_end; struct net_device *dev; unsigned int fragsize; int flags, offset; - int ihl, end; int err = -ENOENT; u8 ecn;
@@ -473,7 +473,9 @@ static int ip_frag_queue(struct ipq *qp, * overlapping fragment, the entire datagram (and any constituent * fragments) MUST be silently discarded. * - * We do the same here for IPv4 (and increment an snmp counter). + * We do the same here for IPv4 (and increment an snmp counter) but + * we do not want to drop the whole queue in response to a duplicate + * fragment. */
/* Find out where to put this fragment. */ @@ -497,13 +499,17 @@ static int ip_frag_queue(struct ipq *qp, do { parent = *rbn; skb1 = rb_to_skb(parent); + skb1_run_end = FRAG_CB(skb1)->offset + + FRAG_CB(skb1)->frag_run_len; if (end <= FRAG_CB(skb1)->offset) rbn = &parent->rb_left; - else if (offset >= FRAG_CB(skb1)->offset + - FRAG_CB(skb1)->frag_run_len) + else if (offset >= skb1_run_end) rbn = &parent->rb_right; - else /* Found an overlap with skb1. */ - goto discard_qp; + else if (offset >= FRAG_CB(skb1)->offset && + end <= skb1_run_end) + goto err; /* No new data, potential duplicate */ + else + goto discard_qp; /* Found an overlap */ } while (*rbn); /* Here we have parent properly set, and rbn pointing to * one of its NULL left/right children. Insert skb.
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Taehee Yoo ap420073@gmail.com
commit 5d407b071dc369c26a38398326ee2be53651cfe4 upstream.
A kernel crash occurrs when defragmented packet is fragmented in ip_do_fragment(). In defragment routine, skb_orphan() is called and skb->ip_defrag_offset is set. but skb->sk and skb->ip_defrag_offset are same union member. so that frag->sk is not NULL. Hence crash occurrs in skb->sk check routine in ip_do_fragment() when defragmented packet is fragmented.
test commands: %iptables -t nat -I POSTROUTING -j MASQUERADE %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000
splat looks like: [ 261.069429] kernel BUG at net/ipv4/ip_output.c:636! [ 261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3 [ 261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600 [ 261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c [ 261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202 [ 261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004 [ 261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8 [ 261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395 [ 261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4 [ 261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000 [ 261.174169] FS: 00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000 [ 261.183012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0 [ 261.198158] Call Trace: [ 261.199018] ? dst_output+0x180/0x180 [ 261.205011] ? save_trace+0x300/0x300 [ 261.209018] ? ip_copy_metadata+0xb00/0xb00 [ 261.213034] ? sched_clock_local+0xd4/0x140 [ 261.218158] ? kill_l4proto+0x120/0x120 [nf_conntrack] [ 261.223014] ? rt_cpu_seq_stop+0x10/0x10 [ 261.227014] ? find_held_lock+0x39/0x1c0 [ 261.233008] ip_finish_output+0x51d/0xb50 [ 261.237006] ? ip_fragment.constprop.56+0x220/0x220 [ 261.243011] ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack] [ 261.250152] ? rcu_is_watching+0x77/0x120 [ 261.255010] ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4] [ 261.261033] ? nf_hook_slow+0xb1/0x160 [ 261.265007] ip_output+0x1c7/0x710 [ 261.269005] ? ip_mc_output+0x13f0/0x13f0 [ 261.273002] ? __local_bh_enable_ip+0xe9/0x1b0 [ 261.278152] ? ip_fragment.constprop.56+0x220/0x220 [ 261.282996] ? nf_hook_slow+0xb1/0x160 [ 261.287007] raw_sendmsg+0x21f9/0x4420 [ 261.291008] ? dst_output+0x180/0x180 [ 261.297003] ? sched_clock_cpu+0x126/0x170 [ 261.301003] ? find_held_lock+0x39/0x1c0 [ 261.306155] ? stop_critical_timings+0x420/0x420 [ 261.311004] ? check_flags.part.36+0x450/0x450 [ 261.315005] ? _raw_spin_unlock_irq+0x29/0x40 [ 261.320995] ? _raw_spin_unlock_irq+0x29/0x40 [ 261.326142] ? cyc2ns_read_end+0x10/0x10 [ 261.330139] ? raw_bind+0x280/0x280 [ 261.334138] ? sched_clock_cpu+0x126/0x170 [ 261.338995] ? check_flags.part.36+0x450/0x450 [ 261.342991] ? __lock_acquire+0x4500/0x4500 [ 261.348994] ? inet_sendmsg+0x11c/0x500 [ 261.352989] ? dst_output+0x180/0x180 [ 261.357012] inet_sendmsg+0x11c/0x500 [ ... ]
v2: - clear skb->sk at reassembly routine.(Eric Dumarzet)
Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.") Suggested-by: Eric Dumazet edumazet@google.com Signed-off-by: Taehee Yoo ap420073@gmail.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Mao Wenan maowenan@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/ipv4/ip_fragment.c | 1 + net/ipv6/netfilter/nf_conntrack_reasm.c | 1 + 2 files changed, 2 insertions(+)
--- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -661,6 +661,7 @@ static int ip_frag_reasm(struct ipq *qp, nextp = &fp->next; fp->prev = NULL; memset(&fp->rbnode, 0, sizeof(fp->rbnode)); + fp->sk = NULL; head->data_len += fp->len; head->len += fp->len; if (head->ip_summed != fp->ip_summed) --- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -454,6 +454,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq, else if (head->ip_summed == CHECKSUM_COMPLETE) head->csum = csum_add(head->csum, fp->csum); head->truesize += fp->truesize; + fp->sk = NULL; } sub_frag_mem_limit(fq->q.net, head->truesize);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@oracle.com
commit 70837ffe3085c9a91488b52ca13ac84424da1042 upstream.
We accidentally removed the parentheses here, but they are required because '!' has higher precedence than '&'.
Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Mao Wenan maowenan@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/ipv4/ip_fragment.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -257,7 +257,7 @@ static void ip_expire(unsigned long arg) IP_INC_STATS_BH(net, IPSTATS_MIB_REASMFAILS); IP_INC_STATS_BH(net, IPSTATS_MIB_REASMTIMEOUT);
- if (!qp->q.flags & INET_FRAG_FIRST_IN) + if (!(qp->q.flags & INET_FRAG_FIRST_IN)) goto out;
/* sk_buff::dev and sk_buff::rbnode are unionized. So we
On Mon, Feb 04, 2019 at 11:35:53AM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.173 release. There are 65 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Feb 6 10:35:30 UTC 2019. Anything received after that time might be too late.
Build results: total: 171 pass: 171 fail: 0 Qemu test results: total: 291 pass: 291 fail: 0 (*)
Guenter
--- (*) I had to revert to gcc 5.3.0 for sh4 boot tests. With gcc 8.2.0, most tests stall early in boot. I didn't try to track down the root cause.
On Mon, Feb 04, 2019 at 02:48:17PM -0800, Guenter Roeck wrote:
On Mon, Feb 04, 2019 at 11:35:53AM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.173 release. There are 65 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Feb 6 10:35:30 UTC 2019. Anything received after that time might be too late.
Build results: total: 171 pass: 171 fail: 0 Qemu test results: total: 291 pass: 291 fail: 0 (*)
Guenter
(*) I had to revert to gcc 5.3.0 for sh4 boot tests. With gcc 8.2.0, most tests stall early in boot. I didn't try to track down the root cause.
Is this a new issue? I don't see any sh-specific patches in this series.
thnaks,
greg k-h
On 2/5/19 6:42 AM, Greg Kroah-Hartman wrote:
On Mon, Feb 04, 2019 at 02:48:17PM -0800, Guenter Roeck wrote:
On Mon, Feb 04, 2019 at 11:35:53AM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.173 release. There are 65 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Feb 6 10:35:30 UTC 2019. Anything received after that time might be too late.
Build results: total: 171 pass: 171 fail: 0 Qemu test results: total: 291 pass: 291 fail: 0 (*)
Guenter
(*) I had to revert to gcc 5.3.0 for sh4 boot tests. With gcc 8.2.0, most tests stall early in boot. I didn't try to track down the root cause.
Is this a new issue? I don't see any sh-specific patches in this series.
I had switched to gcc 8.2.0 last October since images built with gcc 5.3.0 had stopped working in -next at the time (coincidentally with the same symptoms). Ever since then, I had on-and-off problems where one or two of the test boots would fail. Usually a re-run would be successful. I originally blamed qemu and switched between versions, but that didn't help. This is the first release and branch where 8-10 of the 14 boot tests failed persistently when building the image with gcc 8.2.0. I finally had the idea to look at the compiler and, yes, switching back to an older version helped.
I have no idea if gcc or qemu or the code itself is to blame. I am not sure if anyone would be interested enough to fix the underlying problem. I just wanted to mention it in case someone does care. If so, I'll be happy to help tracking it down further.
Thanks, Guenter
On Mon, 4 Feb 2019 at 16:10, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.4.173 release. There are 65 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Feb 6 10:35:30 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.173-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary ------------------------------------------------------------------------
kernel: 4.4.173-rc2 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.4.y git commit: 256c5e4d5358cf3c649b53a0c8bef02f499e357c git describe: v4.4.172-66-g256c5e4d5358 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.172-66-...
No regressions (compared to build v4.4.172)
No fixes (compared to build v4.4.172)
Ran 17275 total tests in the following environments and test suites.
Environments -------------- - i386 - juno-r2 - arm64 - qemu_arm - qemu_i386 - qemu_x86_64 - x15 - arm - x86_64
Test Suites ----------- * boot * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * spectre-meltdown-checker-test * install-android-platform-tools-r2600 * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none
Summary ------------------------------------------------------------------------
kernel: 4.4.173-rc2 git repo: https://git.linaro.org/lkft/arm64-stable-rc.git git branch: 4.4.173-rc2-hikey-20190204-367 git commit: e7819526baa5163360b89a0c45d2e09992772ff6 git describe: 4.4.173-rc2-hikey-20190204-367 Test details: https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.1...
No regressions (compared to build 4.4.173-rc1-hikey-20190204-366)
No fixes (compared to build 4.4.173-rc1-hikey-20190204-366)
Ran 2826 total tests in the following environments and test suites.
Environments -------------- - hi6220-hikey - arm64 - qemu_arm64
Test Suites ----------- * boot * install-android-platform-tools-r2600 * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * spectre-meltdown-checker-test * ltp-fs-tests
On 04/02/2019 10:35, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.173 release. There are 65 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Feb 6 10:35:30 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.173-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y and the diffstat can be found below.
thanks,
greg k-h
All tests are passing for Tegra ...
Test results for stable-v4.4: 6 builds: 6 pass, 0 fail 12 boots: 12 pass, 0 fail 10 tests: 10 pass, 0 fail
Linux version: 4.4.173-rc2-g256c5e4 Boards tested: tegra124-jetson-tk1, tegra20-ventana, tegra30-cardhu-a04
Cheers Jon
linux-stable-mirror@lists.linaro.org