This is the start of the stable review cycle for the 4.14.16 release. There are 71 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Jan 31 12:37:59 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.16-rc1.gz
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.14.16-rc1
Rafael J. Wysocki rafael.j.wysocki@intel.com cpufreq: governor: Ensure sufficiently large sampling intervals
Daniel Borkmann daniel@iogearbox.net bpf, arm64: fix stack_depth tracking in combination with tail calls
Daniel Borkmann daniel@iogearbox.net bpf: reject stores into ctx via st and xadd
Alexei Starovoitov ast@kernel.org bpf: fix 32-bit divide by zero
Eric Dumazet edumazet@google.com bpf: fix divides by zero
Daniel Borkmann daniel@iogearbox.net bpf: avoid false sharing of map refcount with max_entries
Alexei Starovoitov ast@kernel.org bpf: introduce BPF_JIT_ALWAYS_ON config
Thomas Gleixner tglx@linutronix.de hrtimer: Reset hrtimer cpu base proper on CPU hotplug
Andy Lutomirski luto@kernel.org x86/mm/64: Fix vmapped stack syncing on very-large-memory 4-level systems
Borislav Petkov bp@suse.de x86/microcode: Fix again accessing initrd after having been freed
Jia Zhang zhang.jia@linux.alibaba.com x86/microcode/intel: Extend BDW late-loading further with LLC size check
Xiao Liang xiliang@redhat.com perf/x86/amd/power: Do not load AMD power module on !AMD platforms
Neil Horman nhorman@tuxdriver.com vmxnet3: repair memory leak
Lorenzo Colitti lorenzo@google.com net: ipv4: Make "ip route get" match iif lo rules again.
Sabrina Dubroca sd@queasysnail.net tls: reset crypto_info when do_tls_setsockopt_tx fails
Sabrina Dubroca sd@queasysnail.net tls: return -EBUSY if crypto_info is already set
Sabrina Dubroca sd@queasysnail.net tls: fix sw_ctx leak
Ilya Lesokhin ilyal@mellanox.com net/tls: Only attach to sockets in ESTABLISHED state
Xin Long lucien.xin@gmail.com netlink: reset extack earlier in netlink_rcv_skb
Jakub Kicinski jakub.kicinski@netronome.com nfp: use the correct index for link speed table
Talat Batheesh talatb@mellanox.com net/mlx5e: Fix fixpoint divide exception in mlx5e_am_stats_compare
David Ahern dsahern@gmail.com netlink: extack needs to be reset each time through loop
Xin Long lucien.xin@gmail.com sctp: reinit stream if stream outcnt has been change by sinit in sendmsg
Eric Dumazet edumazet@google.com flow_dissector: properly cap thoff field
Cong Wang xiyou.wangcong@gmail.com tun: fix a memory leak for tfile->tx_array
Yuval Mintz yuvalm@mellanox.com mlxsw: spectrum_router: Don't log an error on missing neighbor
Willem de Bruijn willemb@google.com gso: validate gso_type in GSO handlers
Alexey Kodanev alexey.kodanev@oracle.com ip6_gre: init dev->mtu and dev->hard_header_len correctly
Ivan Vecera cera@cera.cz be2net: restore properly promisc mode after queues reconfiguration
Guillaume Nault g.nault@alphalink.fr ppp: unlock all_ppp_mutex before registering device
Saeed Mahameed saeedm@mellanox.com net/mlx5: Fix get vector affinity helper function
Eran Ben Elisha eranbe@mellanox.com {net,ib}/mlx5: Don't disable local loopback multicast traffic when needed
Cong Wang xiyou.wangcong@gmail.com tipc: fix a memory leak in tipc_nl_node_get_link()
Xin Long lucien.xin@gmail.com sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
Xin Long lucien.xin@gmail.com sctp: do not allow the v4 socket to bind a v4mapped v6 address
Francois Romieu romieu@fr.zoreil.com r8169: fix memory corruption on retrieval of hardware statistics.
Guillaume Nault g.nault@alphalink.fr pppoe: take ->needed_headroom of lower device into account on xmit
David Ahern dsahern@gmail.com net: vrf: Add support for sends to local broadcast address
r.hering@avm.de r.hering@avm.de net/tls: Fix inverted error codes to avoid endless loop
Dan Streetman ddstreet@ieee.org net: tcp: close sock if net namespace is exiting
Eric Dumazet edumazet@google.com net: qdisc_pkt_len_init() should be more robust
Felix Fietkau nbd@nbd.name net: igmp: fix source address check for IGMPv3 reports
Yuiko Oshino yuiko.oshino@microchip.com lan78xx: Fix failure in USB Full Speed
Eric Dumazet edumazet@google.com ipv6: ip6_make_skb() needs to clear cork.base.dst
Mike Maloney maloney@google.com ipv6: fix udpv6 sendmsg crash caused by too small MTU
Ben Hutchings ben.hutchings@codethink.co.uk ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
Alexey Kodanev alexey.kodanev@oracle.com dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
Jim Westfall jwestfall@surrealistic.net ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
Jim Westfall jwestfall@surrealistic.net net: Allow neigh contructor functions ability to modify the primary_key
Boris Brezillon boris.brezillon@free-electrons.com drm/vc4: Fix NULL pointer dereference in vc4_save_hang_state()
Russell King rmk+kernel@armlinux.org.uk ARM: net: bpf: clarify tail_call index
Russell King rmk+kernel@armlinux.org.uk ARM: net: bpf: fix LDX instructions
Russell King rmk+kernel@armlinux.org.uk ARM: net: bpf: fix register saving
Russell King rmk+kernel@armlinux.org.uk ARM: net: bpf: correct stack layout documentation
Russell King rmk+kernel@armlinux.org.uk ARM: net: bpf: move stack documentation
Russell King rmk+kernel@armlinux.org.uk ARM: net: bpf: fix stack alignment
Russell King rmk+kernel@armlinux.org.uk ARM: net: bpf: fix tail call jumps
Russell King rmk+kernel@armlinux.org.uk ARM: net: bpf: avoid 'bx' instruction on non-Thumb capable CPUs
Martin Brandenburg martin@omnibond.com orangefs: fix deadlock; do not write i_size in read_iter
Christian Borntraeger borntraeger@de.ibm.com KVM: s390: add proper locking for CMMA migration bitmap
Josef Bacik jbacik@fb.com Btrfs: fix stale entries in readdir
Dmitry Torokhov dmitry.torokhov@gmail.com Input: trackpoint - only expose supported controls for Elan, ALPS and NXP
Aaron Ma aaron.ma@canonical.com Input: trackpoint - force 3 buttons if 0 button is reported
Mark Furneaux mark@furneaux.ca Input: xpad - add support for PDP Xbox One controllers
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "module: Add retpoline tag to VERMAGIC"
Steffen Klassert steffen.klassert@secunet.com xfrm: Fix a race in the xdst pcpu cache.
Kevin Cernekee cernekee@chromium.org netfilter: xt_osf: Add missing permission checks
Kevin Cernekee cernekee@chromium.org netfilter: nfnetlink_cthelper: Add missing permission checks
Vlastimil Babka vbabka@suse.cz mm, page_alloc: fix potential false positive in __zone_watermark_ok
Martin Brandenburg martin@omnibond.com orangefs: initialize op on loop restart in orangefs_devreq_read
Martin Brandenburg martin@omnibond.com orangefs: use list_for_each_entry_safe in purge_waiting_ops
-------------
Diffstat:
 Makefile                                           | 4 +-
 arch/arm/net/bpf_jit_32.c                          | 225 ++++++++++---------
 arch/arm64/net/bpf_jit_comp.c                      | 20 +-
 arch/s390/kvm/kvm-s390.c                           | 18 +-
 arch/x86/events/amd/power.c                        | 2 +-
 arch/x86/kernel/cpu/microcode/core.c               | 2 +-
 arch/x86/kernel/cpu/microcode/intel.c              | 20 +-
 arch/x86/mm/tlb.c                                  | 34 ++-
 drivers/cpufreq/cpufreq_governor.c                 | 19 +-
 drivers/gpu/drm/vc4/vc4_gem.c                      | 12 +-
 drivers/infiniband/hw/mlx5/main.c                  | 9 +-
 drivers/input/joystick/xpad.c                      | 19 ++
 drivers/input/mouse/trackpoint.c                   | 245 +++++++++++++--------
 drivers/input/mouse/trackpoint.h                   | 34 +--
 drivers/net/ethernet/emulex/benet/be_main.c        | 9 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 6 +
 .../net/ethernet/mellanox/mlx5/core/en_selftest.c  | 27 ++-
 drivers/net/ethernet/mellanox/mlx5/core/main.c     | 3 +-
 drivers/net/ethernet/mellanox/mlx5/core/vport.c    | 22 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 10 +-
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   | 2 +-
 drivers/net/ethernet/realtek/r8169.c               | 9 +-
 drivers/net/ppp/ppp_generic.c                      | 5 +-
 drivers/net/ppp/pppoe.c                            | 11 +-
 drivers/net/tun.c                                  | 15 +-
 drivers/net/usb/lan78xx.c                          | 1 +
 drivers/net/vmxnet3/vmxnet3_drv.c                  | 2 +-
 drivers/net/vrf.c                                  | 5 +-
 fs/btrfs/delayed-inode.c                           | 26 +--
 fs/orangefs/devorangefs-req.c                      | 3 +-
 fs/orangefs/file.c                                 | 7 +-
 fs/orangefs/orangefs-kernel.h                      | 11 -
 fs/orangefs/waitqueue.c                            | 4 +-
 include/linux/bpf.h                                | 21 +-
 include/linux/mlx5/driver.h                        | 19 +-
 include/linux/mlx5/mlx5_ifc.h                      | 5 +-
 include/linux/vermagic.h                           | 8 +-
 include/net/arp.h                                  | 3 +
 include/net/ipv6.h                                 | 1 +
 include/net/net_namespace.h                        | 10 +
 include/net/tls.h                                  | 2 +-
 init/Kconfig                                       | 7 +
 kernel/bpf/core.c                                  | 23 +-
 kernel/bpf/verifier.c                              | 37 ++++
 kernel/time/hrtimer.c                              | 3 +
 lib/test_bpf.c                                     | 11 +-
 mm/page_alloc.c                                    | 6 +-
 net/core/dev.c                                     | 19 +-
 net/core/filter.c                                  | 10 +-
 net/core/flow_dissector.c                          | 3 +-
 net/core/neighbour.c                               | 4 +-
 net/core/sysctl_net_core.c                         | 6 +
 net/dccp/ccids/ccid2.c                             | 3 +
 net/ipv4/arp.c                                     | 7 +-
 net/ipv4/esp4_offload.c                            | 3 +
 net/ipv4/igmp.c                                    | 2 +-
 net/ipv4/route.c                                   | 1 +
 net/ipv4/tcp.c                                     | 3 +
 net/ipv4/tcp_offload.c                             | 3 +
 net/ipv4/tcp_timer.c                               | 15 ++
 net/ipv4/udp_offload.c                             | 3 +
 net/ipv6/esp6_offload.c                            | 3 +
 net/ipv6/ip6_gre.c                                 | 14 +-
 net/ipv6/ip6_output.c                              | 9 +-
 net/ipv6/ipv6_sockglue.c                           | 2 +-
 net/ipv6/tcpv6_offload.c                           | 3 +
 net/ipv6/udp_offload.c                             | 3 +
 net/netfilter/nfnetlink_cthelper.c                 | 10 +
 net/netfilter/xt_osf.c                             | 7 +
 net/netlink/af_netlink.c                           | 3 +-
 net/sctp/offload.c                                 | 3 +
 net/sctp/socket.c                                  | 40 ++--
 net/socket.c                                       | 9 +
 net/tipc/node.c                                    | 26 ++-
 net/tls/tls_main.c                                 | 17 +-
 net/tls/tls_sw.c                                   | 16 +-
 net/xfrm/xfrm_policy.c                             | 8 +-
 tools/testing/selftests/bpf/test_verifier.c        | 29 ++-
 78 files changed, 843 insertions(+), 438 deletions(-)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Brandenburg <martin@omnibond.com>
commit 0afc0decf247f65b7aba666a76a0a68adf4bc435 upstream.
set_op_state_purged can delete the op.
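For context, a minimal userspace sketch of the pattern (plain C with made-up names, not the kernel list API): freeing the current node mid-walk requires caching its successor first, which is exactly what list_for_each_entry_safe() does.

	#include <stdio.h>
	#include <stdlib.h>

	struct op { int tag; struct op *next; };

	int main(void)
	{
		struct op *head = NULL, *op, *next;
		int i;

		for (i = 0; i < 3; i++) {	/* build a small list */
			op = malloc(sizeof(*op));
			op->tag = i;
			op->next = head;
			head = op;
		}

		/* A plain "op = op->next" after free(op) would read freed
		 * memory; caching the successor first is what the _safe
		 * iterator does for the kernel list. */
		for (op = head; op; op = next) {
			next = op->next;
			printf("purging op tag %d\n", op->tag);
			free(op);	/* stands in for the op being deleted */
		}
		return 0;
	}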
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/orangefs/waitqueue.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/fs/orangefs/waitqueue.c
+++ b/fs/orangefs/waitqueue.c
@@ -29,10 +29,10 @@ static void orangefs_clean_up_interrupte
  */
 void purge_waiting_ops(void)
 {
-	struct orangefs_kernel_op_s *op;
+	struct orangefs_kernel_op_s *op, *tmp;
 
 	spin_lock(&orangefs_request_list_lock);
-	list_for_each_entry(op, &orangefs_request_list, list) {
+	list_for_each_entry_safe(op, tmp, &orangefs_request_list, list) {
 		gossip_debug(GOSSIP_WAIT_DEBUG,
 			     "pvfs2-client-core: purging op tag %llu %s\n",
 			     llu(op->tag),
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Brandenburg <martin@omnibond.com>
commit a0ec1ded22e6a6bc41981fae22406835b006a66e upstream.
In orangefs_devreq_read, there is a loop which picks an op off the list of pending ops. If the loop fails to find an op, there is nothing to read, and it returns EAGAIN. If the op has been given up on, the loop is restarted via a goto. The bug is that the variable which the found op is written to is not reinitialized, so if there are no more eligible ops on the list, the code runs again on the already handled op.
This is triggered by interrupting a process while the op is being copied to the client-core. It's a fairly small window, but it's there.
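A toy illustration of the shape of this fix (userspace C, hypothetical names): the result variable of a search loop entered via "goto restart" must be cleared on every pass, otherwise a pass that finds nothing silently reuses the previous result.

	#include <stdio.h>

	struct op { int tag; int given_up; struct op *next; };

	/* Toy model of the fixed loop: cur_op is reset on every restart,
	 * so an empty list yields "nothing to read" instead of reusing
	 * the op picked on a previous pass. */
	static struct op *pick_op(struct op **list)
	{
		struct op *cur_op;

	restart:
		cur_op = NULL;			/* the one-line fix */
		if (*list)
			cur_op = *list;		/* op at the top of the list */
		if (!cur_op)
			return NULL;		/* the driver returns -EAGAIN */

		if (cur_op->given_up) {		/* raced with a signal: drop it */
			*list = cur_op->next;
			goto restart;
		}
		return cur_op;
	}

	int main(void)
	{
		struct op a = { 7, 1, NULL };	/* one op, already given up */
		struct op *list = &a;

		printf("%s\n", pick_op(&list) ? "got an op" : "EAGAIN");
		return 0;
	}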
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/orangefs/devorangefs-req.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/fs/orangefs/devorangefs-req.c
+++ b/fs/orangefs/devorangefs-req.c
@@ -162,7 +162,7 @@ static ssize_t orangefs_devreq_read(stru
 	struct orangefs_kernel_op_s *op, *temp;
 	__s32 proto_ver = ORANGEFS_KERNEL_PROTO_VERSION;
 	static __s32 magic = ORANGEFS_DEVREQ_MAGIC;
-	struct orangefs_kernel_op_s *cur_op = NULL;
+	struct orangefs_kernel_op_s *cur_op;
 	unsigned long ret;
 
 	/* We do not support blocking IO. */
@@ -186,6 +186,7 @@ static ssize_t orangefs_devreq_read(stru
 		return -EAGAIN;
 
 restart:
+	cur_op = NULL;
 	/* Get next op (if any) from top of list. */
 	spin_lock(&orangefs_request_list_lock);
 	list_for_each_entry_safe(op, temp, &orangefs_request_list, list) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vlastimil Babka <vbabka@suse.cz>
commit b050e3769c6b4013bb937e879fc43bf1847ee819 upstream.
Since commit 97a16fc82a7c ("mm, page_alloc: only enforce watermarks for order-0 allocations"), __zone_watermark_ok() check for high-order allocations will shortcut per-migratetype free list checks for ALLOC_HARDER allocations, and return true as long as there's free page of any migratetype. The intention is that ALLOC_HARDER can allocate from MIGRATE_HIGHATOMIC free lists, while normal allocations can't.
However, as a side effect, the watermark check will then also return true when there are pages only on the MIGRATE_ISOLATE list, or (prior to CMA conversion to ZONE_MOVABLE) on the MIGRATE_CMA list. Since the allocation cannot actually obtain isolated pages, and might not be able to obtain CMA pages, this can result in a false positive.
The condition should be rare and perhaps the outcome is not a fatal one. Still, it's better if the watermark check is correct. There also shouldn't be a performance tradeoff here.
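A toy model of the corrected check order (userspace C with simplified migratetype numbering, CMA omitted): isolated pages no longer satisfy an ALLOC_HARDER request, while MIGRATE_HIGHATOMIC pages still do.

	#include <stdbool.h>
	#include <stdio.h>

	enum { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RECLAIMABLE,
	       MIGRATE_PCPTYPES,			/* first 3 are "normal" */
	       MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
	       MIGRATE_ISOLATE, MIGRATE_TYPES };

	static bool watermark_ok(const int free[MIGRATE_TYPES], bool alloc_harder)
	{
		int mt;

		for (mt = 0; mt < MIGRATE_PCPTYPES; mt++)
			if (free[mt])
				return true;
		/* The shortcut now looks only at the highatomic list instead
		 * of returning true when any page of any type exists. */
		if (alloc_harder && free[MIGRATE_HIGHATOMIC])
			return true;
		return false;
	}

	int main(void)
	{
		int free[MIGRATE_TYPES] = { 0 };

		free[MIGRATE_ISOLATE] = 1;	/* only isolated pages left */
		printf("%d\n", watermark_ok(free, true));	/* prints 0 */
		return 0;
	}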
Link: http://lkml.kernel.org/r/20171102125001.23708-1-vbabka@suse.cz
Fixes: 97a16fc82a7c ("mm, page_alloc: only enforce watermarks for order-0 allocations")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/page_alloc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3011,9 +3011,6 @@ bool __zone_watermark_ok(struct zone *z,
 		if (!area->nr_free)
 			continue;
 
-		if (alloc_harder)
-			return true;
-
 		for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
 			if (!list_empty(&area->free_list[mt]))
 				return true;
@@ -3025,6 +3022,9 @@ bool __zone_watermark_ok(struct zone *z,
 			return true;
 		}
 #endif
+		if (alloc_harder &&
+		    !list_empty(&area->free_list[MIGRATE_HIGHATOMIC]))
+			return true;
 	}
 	return false;
 }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kevin Cernekee <cernekee@chromium.org>
commit 4b380c42f7d00a395feede754f0bc2292eebe6e5 upstream.
The capability check in nfnetlink_rcv() verifies that the caller has CAP_NET_ADMIN in the namespace that "owns" the netlink socket. However, nfnl_cthelper_list is shared by all net namespaces on the system. An unprivileged user can create user and net namespaces in which he holds CAP_NET_ADMIN to bypass the netlink_net_capable() check:
  $ nfct helper list
  nfct v1.4.4: netlink error: Operation not permitted

  $ vpnns -- nfct helper list
  {
  .name = ftp,
  .queuenum = 0,
  .l3protonum = 2,
  .l4protonum = 6,
  .priv_data_len = 24,
  .status = enabled,
  };
Add capable() checks in nfnetlink_cthelper, as this is cleaner than trying to generalize the solution.
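Schematically, the distinction between the two checks looks like this (kernel-context fragment, not a standalone program):

	/* Per-netns check: satisfiable by an unprivileged user who
	 * created his own user+net namespace pair. */
	if (!netlink_net_capable(skb, CAP_NET_ADMIN))
		return -EPERM;

	/* Global check added by this patch: requires CAP_NET_ADMIN in
	 * the initial namespace, the right test for state shared by all
	 * namespaces such as nfnl_cthelper_list. */
	if (!capable(CAP_NET_ADMIN))
		return -EPERM;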
Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/netfilter/nfnetlink_cthelper.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

--- a/net/netfilter/nfnetlink_cthelper.c
+++ b/net/netfilter/nfnetlink_cthelper.c
@@ -17,6 +17,7 @@
 #include <linux/types.h>
 #include <linux/list.h>
 #include <linux/errno.h>
+#include <linux/capability.h>
 #include <net/netlink.h>
 #include <net/sock.h>
 
@@ -407,6 +408,9 @@ static int nfnl_cthelper_new(struct net
 	struct nfnl_cthelper *nlcth;
 	int ret = 0;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (!tb[NFCTH_NAME] || !tb[NFCTH_TUPLE])
 		return -EINVAL;
 
@@ -611,6 +615,9 @@ static int nfnl_cthelper_get(struct net
 	struct nfnl_cthelper *nlcth;
 	bool tuple_set = false;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (nlh->nlmsg_flags & NLM_F_DUMP) {
 		struct netlink_dump_control c = {
 			.dump = nfnl_cthelper_dump_table,
@@ -678,6 +685,9 @@ static int nfnl_cthelper_del(struct net
 	struct nfnl_cthelper *nlcth, *n;
 	int j = 0, ret;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (tb[NFCTH_NAME])
 		helper_name = nla_data(tb[NFCTH_NAME]);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kevin Cernekee <cernekee@chromium.org>
commit 916a27901de01446bcf57ecca4783f6cff493309 upstream.
The capability check in nfnetlink_rcv() verifies that the caller has CAP_NET_ADMIN in the namespace that "owns" the netlink socket. However, xt_osf_fingers is shared by all net namespaces on the system. An unprivileged user can create user and net namespaces in which he holds CAP_NET_ADMIN to bypass the netlink_net_capable() check:
vpnns -- nfnl_osf -f /tmp/pf.os
vpnns -- nfnl_osf -f /tmp/pf.os -d
These non-root operations successfully modify the systemwide OS fingerprint list. Add new capable() checks so that they can't.
Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/netfilter/xt_osf.c | 7 +++++++
 1 file changed, 7 insertions(+)

--- a/net/netfilter/xt_osf.c
+++ b/net/netfilter/xt_osf.c
@@ -19,6 +19,7 @@
 #include <linux/module.h>
 #include <linux/kernel.h>
 
+#include <linux/capability.h>
 #include <linux/if.h>
 #include <linux/inetdevice.h>
 #include <linux/ip.h>
@@ -70,6 +71,9 @@ static int xt_osf_add_callback(struct ne
 	struct xt_osf_finger *kf = NULL, *sf;
 	int err = 0;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (!osf_attrs[OSF_ATTR_FINGER])
 		return -EINVAL;
 
@@ -115,6 +119,9 @@ static int xt_osf_remove_callback(struct
 	struct xt_osf_finger *sf;
 	int err = -ENOENT;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (!osf_attrs[OSF_ATTR_FINGER])
 		return -EINVAL;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steffen Klassert <steffen.klassert@secunet.com>
commit 76a4201191814a0061cb5c861fafb9ecaa764846 upstream.
We need to run xfrm_resolve_and_create_bundle() with bottom halves off. Otherwise we may reuse an already released dst_enty when the xfrm lookup functions are called from process context.
Fixes: ec30d78c14a813db39a647b6a348b4286ba4abf5 ("xfrm: add xdst pcpu cache")
Reported-by: Darius Ski <darius.ski@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/xfrm/xfrm_policy.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2056,8 +2056,11 @@ xfrm_bundle_lookup(struct net *net, cons
 		if (num_xfrms <= 0)
 			goto make_dummy_bundle;
 
+		local_bh_disable();
 		xdst = xfrm_resolve_and_create_bundle(pols, num_pols, fl, family,
-						      xflo->dst_orig);
+					      xflo->dst_orig);
+		local_bh_enable();
+
 		if (IS_ERR(xdst)) {
 			err = PTR_ERR(xdst);
 			if (err != -EAGAIN)
@@ -2144,9 +2147,12 @@ struct dst_entry *xfrm_lookup(struct net
 		goto no_transform;
 	}
 
+	local_bh_disable();
 	xdst = xfrm_resolve_and_create_bundle(
 			pols, num_pols, fl,
 			family, dst_orig);
+	local_bh_enable();
+
 	if (IS_ERR(xdst)) {
 		xfrm_pols_put(pols, num_pols);
 		err = PTR_ERR(xdst);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5132ede0fe8092b043dae09a7cc32b8ae7272baa upstream.
This reverts commit 6cfb521ac0d5b97470883ff9b7facae264b7ab12.
Turns out distros do not want retpoline as part of their "ABI", so this patch should not have been merged. Sorry Andi, this was my fault, I suggested it when your original patch was the "correct" way of doing this instead.
Reported-by: Jiri Kosina <jikos@kernel.org>
Fixes: 6cfb521ac0d5 ("module: Add retpoline tag to VERMAGIC")
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: rusty@rustcorp.com.au
Cc: arjan.van.de.ven@intel.com
Cc: jeyu@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/vermagic.h | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

--- a/include/linux/vermagic.h
+++ b/include/linux/vermagic.h
@@ -31,17 +31,11 @@
 #else
 #define MODULE_RANDSTRUCT_PLUGIN
 #endif
-#ifdef RETPOLINE
-#define MODULE_VERMAGIC_RETPOLINE "retpoline "
-#else
-#define MODULE_VERMAGIC_RETPOLINE ""
-#endif
 
 #define VERMAGIC_STRING						\
 	UTS_RELEASE " "						\
 	MODULE_VERMAGIC_SMP MODULE_VERMAGIC_PREEMPT		\
 	MODULE_VERMAGIC_MODULE_UNLOAD MODULE_VERMAGIC_MODVERSIONS \
 	MODULE_ARCH_VERMAGIC					\
-	MODULE_RANDSTRUCT_PLUGIN \
-	MODULE_VERMAGIC_RETPOLINE
+	MODULE_RANDSTRUCT_PLUGIN
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Furneaux <mark@furneaux.ca>
commit e5c9c6a885fad00aa559b49d8fc23a60e290824e upstream.
Adds support for the current lineup of Xbox One controllers from PDP (Performance Designed Products). These controllers are very picky with their initialization sequence and require an additional 2 packets before they send any input reports.
Signed-off-by: Mark Furneaux <mark@furneaux.ca>
Reviewed-by: Cameron Gutman <aicommander@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/input/joystick/xpad.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

--- a/drivers/input/joystick/xpad.c
+++ b/drivers/input/joystick/xpad.c
@@ -229,6 +229,7 @@ static const struct xpad_device {
 	{ 0x0e6f, 0x0213, "Afterglow Gamepad for Xbox 360", 0, XTYPE_XBOX360 },
 	{ 0x0e6f, 0x021f, "Rock Candy Gamepad for Xbox 360", 0, XTYPE_XBOX360 },
 	{ 0x0e6f, 0x0246, "Rock Candy Gamepad for Xbox One 2015", 0, XTYPE_XBOXONE },
+	{ 0x0e6f, 0x02ab, "PDP Controller for Xbox One", 0, XTYPE_XBOXONE },
 	{ 0x0e6f, 0x0301, "Logic3 Controller", 0, XTYPE_XBOX360 },
 	{ 0x0e6f, 0x0346, "Rock Candy Gamepad for Xbox One 2016", 0, XTYPE_XBOXONE },
 	{ 0x0e6f, 0x0401, "Logic3 Controller", 0, XTYPE_XBOX360 },
@@ -476,6 +477,22 @@ static const u8 xboxone_hori_init[] = {
 };
 
 /*
+ * This packet is required for some of the PDP pads to start
+ * sending input reports. One of those pads is (0x0e6f:0x02ab).
+ */
+static const u8 xboxone_pdp_init1[] = {
+	0x0a, 0x20, 0x00, 0x03, 0x00, 0x01, 0x14
+};
+
+/*
+ * This packet is required for some of the PDP pads to start
+ * sending input reports. One of those pads is (0x0e6f:0x02ab).
+ */
+static const u8 xboxone_pdp_init2[] = {
+	0x06, 0x20, 0x00, 0x02, 0x01, 0x00
+};
+
+/*
  * A specific rumble packet is required for some PowerA pads to start
  * sending input reports. One of those pads is (0x24c6:0x543a).
  */
@@ -505,6 +522,8 @@ static const struct xboxone_init_packet
 	XBOXONE_INIT_PKT(0x0e6f, 0x0165, xboxone_hori_init),
 	XBOXONE_INIT_PKT(0x0f0d, 0x0067, xboxone_hori_init),
 	XBOXONE_INIT_PKT(0x0000, 0x0000, xboxone_fw2015_init),
+	XBOXONE_INIT_PKT(0x0e6f, 0x02ab, xboxone_pdp_init1),
+	XBOXONE_INIT_PKT(0x0e6f, 0x02ab, xboxone_pdp_init2),
 	XBOXONE_INIT_PKT(0x24c6, 0x541a, xboxone_rumblebegin_init),
 	XBOXONE_INIT_PKT(0x24c6, 0x542a, xboxone_rumblebegin_init),
 	XBOXONE_INIT_PKT(0x24c6, 0x543a, xboxone_rumblebegin_init),
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aaron Ma <aaron.ma@canonical.com>
commit f5d07b9e98022d50720e38aa936fc11c67868ece upstream.
Lenovo introduced trackpoint compatible sticks with minimum PS/2 commands. They are supposed to reply with 0x02, 0x03, or 0x04 in response to the "Read Extended ID" command, so we would know not to try certain extended commands. Unfortunately even some trackpoints reporting the original IBM version (0x01 firmware 0x0e) now respond with incorrect data to the "Get Extended Buttons" command:

 thinkpad_acpi: ThinkPad BIOS R0DET87W (1.87 ), EC unknown
 thinkpad_acpi: Lenovo ThinkPad E470, model 20H1004SGE
psmouse serio2: trackpoint: IBM TrackPoint firmware: 0x0e, buttons: 0/0
Since there are no trackpoints without buttons, let's assume the trackpoint has 3 buttons when we get 0 response to the extended buttons query.
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=196253
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/input/mouse/trackpoint.c | 3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/input/mouse/trackpoint.c
+++ b/drivers/input/mouse/trackpoint.c
@@ -383,6 +383,9 @@ int trackpoint_detect(struct psmouse *ps
 	if (trackpoint_read(ps2dev, TP_EXT_BTN, &button_info)) {
 		psmouse_warn(psmouse, "failed to get extended button data, assuming 3 buttons\n");
 		button_info = 0x33;
+	} else if (!button_info) {
+		psmouse_warn(psmouse, "got 0 in extended button data, assuming 3 buttons\n");
+		button_info = 0x33;
 	}
 
 	psmouse->private = kzalloc(sizeof(struct trackpoint_data), GFP_KERNEL);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Torokhov <dmitry.torokhov@gmail.com>
commit 2a924d71794c530e55e73d0ce2cc77233307eaa9 upstream.
The newer trackpoints from ALPS, Elan and NXP implement a very limited subset of extended commands and controls that the original trackpoints implemented, so we should not be exposing not working controls in sysfs. The newer trackpoints also do not implement "Power On Reset" or "Read Extended Button Status", so we should not be using these commands during initialization.
While we are at it, let's change "unsigned char" to u8 for byte data or bool for booleans and use better suited error codes instead of -1.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/input/mouse/trackpoint.c | 248 +++++++++++++++++++++++----------------
 drivers/input/mouse/trackpoint.h | 34 +++--
 2 files changed, 172 insertions(+), 110 deletions(-)
--- a/drivers/input/mouse/trackpoint.c
+++ b/drivers/input/mouse/trackpoint.c
@@ -19,6 +19,13 @@
 #include "psmouse.h"
 #include "trackpoint.h"
 
+static const char * const trackpoint_variants[] = {
+	[TP_VARIANT_IBM]	= "IBM",
+	[TP_VARIANT_ALPS]	= "ALPS",
+	[TP_VARIANT_ELAN]	= "Elan",
+	[TP_VARIANT_NXP]	= "NXP",
+};
+
 /*
  * Power-on Reset: Resets all trackpoint parameters, including RAM values,
  * to defaults.
@@ -26,7 +33,7 @@
  */
 static int trackpoint_power_on_reset(struct ps2dev *ps2dev)
 {
-	unsigned char results[2];
+	u8 results[2];
 	int tries = 0;
 
 	/* Issue POR command, and repeat up to once if 0xFC00 received */
@@ -38,7 +45,7 @@ static int trackpoint_power_on_reset(str
 
 	/* Check for success response -- 0xAA00 */
 	if (results[0] != 0xAA || results[1] != 0x00)
-		return -1;
+		return -ENODEV;
 
 	return 0;
 }
@@ -46,8 +53,7 @@ static int trackpoint_power_on_reset(str
 /*
  * Device IO: read, write and toggle bit
  */
-static int trackpoint_read(struct ps2dev *ps2dev,
-			   unsigned char loc, unsigned char *results)
+static int trackpoint_read(struct ps2dev *ps2dev, u8 loc, u8 *results)
 {
 	if (ps2_command(ps2dev, NULL, MAKE_PS2_CMD(0, 0, TP_COMMAND)) ||
 	    ps2_command(ps2dev, results, MAKE_PS2_CMD(0, 1, loc))) {
@@ -57,8 +63,7 @@ static int trackpoint_read(struct ps2dev
 	return 0;
 }
 
-static int trackpoint_write(struct ps2dev *ps2dev,
-			    unsigned char loc, unsigned char val)
+static int trackpoint_write(struct ps2dev *ps2dev, u8 loc, u8 val)
 {
 	if (ps2_command(ps2dev, NULL, MAKE_PS2_CMD(0, 0, TP_COMMAND)) ||
 	    ps2_command(ps2dev, NULL, MAKE_PS2_CMD(0, 0, TP_WRITE_MEM)) ||
@@ -70,8 +75,7 @@ static int trackpoint_write(struct ps2de
 	return 0;
 }
 
-static int trackpoint_toggle_bit(struct ps2dev *ps2dev,
-				 unsigned char loc, unsigned char mask)
+static int trackpoint_toggle_bit(struct ps2dev *ps2dev, u8 loc, u8 mask)
 {
 	/* Bad things will happen if the loc param isn't in this range */
 	if (loc < 0x20 || loc >= 0x2F)
@@ -87,11 +91,11 @@ static int trackpoint_toggle_bit(struct
 	return 0;
 }
 
-static int trackpoint_update_bit(struct ps2dev *ps2dev, unsigned char loc,
-				 unsigned char mask, unsigned char value)
+static int trackpoint_update_bit(struct ps2dev *ps2dev,
+				 u8 loc, u8 mask, u8 value)
 {
 	int retval = 0;
-	unsigned char data;
+	u8 data;
 
 	trackpoint_read(ps2dev, loc, &data);
 	if (((data & mask) == mask) != !!value)
@@ -105,17 +109,18 @@ static int trackpoint_update_bit(struct
  */
 struct trackpoint_attr_data {
 	size_t field_offset;
-	unsigned char command;
-	unsigned char mask;
-	unsigned char inverted;
-	unsigned char power_on_default;
+	u8 command;
+	u8 mask;
+	bool inverted;
+	u8 power_on_default;
 };
 
-static ssize_t trackpoint_show_int_attr(struct psmouse *psmouse, void *data, char *buf)
+static ssize_t trackpoint_show_int_attr(struct psmouse *psmouse,
+					void *data, char *buf)
 {
 	struct trackpoint_data *tp = psmouse->private;
 	struct trackpoint_attr_data *attr = data;
-	unsigned char value = *(unsigned char *)((char *)tp + attr->field_offset);
+	u8 value = *(u8 *)((void *)tp + attr->field_offset);
 
 	if (attr->inverted)
 		value = !value;
@@ -128,8 +133,8 @@ static ssize_t trackpoint_set_int_attr(s
 {
 	struct trackpoint_data *tp = psmouse->private;
 	struct trackpoint_attr_data *attr = data;
-	unsigned char *field = (unsigned char *)((char *)tp + attr->field_offset);
-	unsigned char value;
+	u8 *field = (void *)tp + attr->field_offset;
+	u8 value;
 	int err;
 
 	err = kstrtou8(buf, 10, &value);
@@ -157,17 +162,14 @@ static ssize_t trackpoint_set_bit_attr(s
 {
 	struct trackpoint_data *tp = psmouse->private;
 	struct trackpoint_attr_data *attr = data;
-	unsigned char *field = (unsigned char *)((char *)tp + attr->field_offset);
-	unsigned int value;
+	bool *field = (void *)tp + attr->field_offset;
+	bool value;
 	int err;
 
-	err = kstrtouint(buf, 10, &value);
+	err = kstrtobool(buf, &value);
 	if (err)
 		return err;
 
-	if (value > 1)
-		return -EINVAL;
-
 	if (attr->inverted)
 		value = !value;
 
@@ -193,30 +195,6 @@ PSMOUSE_DEFINE_ATTR(_name, S_IWUSR | S_I
 			    &trackpoint_attr_##_name,			\
 			    trackpoint_show_int_attr, trackpoint_set_bit_attr)
 
-#define TRACKPOINT_UPDATE_BIT(_psmouse, _tp, _name)			\
-do {									\
-	struct trackpoint_attr_data *_attr = &trackpoint_attr_##_name;	\
-									\
-	trackpoint_update_bit(&_psmouse->ps2dev,			\
-			      _attr->command, _attr->mask, _tp->_name);	\
-} while (0)
-
-#define TRACKPOINT_UPDATE(_power_on, _psmouse, _tp, _name)		\
-do {									\
-	if (!_power_on ||						\
-	    _tp->_name != trackpoint_attr_##_name.power_on_default) {	\
-		if (!trackpoint_attr_##_name.mask)			\
-			trackpoint_write(&_psmouse->ps2dev,		\
-					 trackpoint_attr_##_name.command, \
-					 _tp->_name);			\
-		else							\
-			TRACKPOINT_UPDATE_BIT(_psmouse, _tp, _name);	\
-	}								\
-} while (0)
-
-#define TRACKPOINT_SET_POWER_ON_DEFAULT(_tp, _name)			\
-	(_tp->_name = trackpoint_attr_##_name.power_on_default)
-
 TRACKPOINT_INT_ATTR(sensitivity, TP_SENS, TP_DEF_SENS);
 TRACKPOINT_INT_ATTR(speed, TP_SPEED, TP_DEF_SPEED);
 TRACKPOINT_INT_ATTR(inertia, TP_INERTIA, TP_DEF_INERTIA);
@@ -229,13 +207,33 @@ TRACKPOINT_INT_ATTR(ztime, TP_Z_TIME, TP
 TRACKPOINT_INT_ATTR(jenks, TP_JENKS_CURV, TP_DEF_JENKS_CURV);
 TRACKPOINT_INT_ATTR(drift_time, TP_DRIFT_TIME, TP_DEF_DRIFT_TIME);
 
-TRACKPOINT_BIT_ATTR(press_to_select, TP_TOGGLE_PTSON, TP_MASK_PTSON, 0,
+TRACKPOINT_BIT_ATTR(press_to_select, TP_TOGGLE_PTSON, TP_MASK_PTSON, false,
 		    TP_DEF_PTSON);
-TRACKPOINT_BIT_ATTR(skipback, TP_TOGGLE_SKIPBACK, TP_MASK_SKIPBACK, 0,
+TRACKPOINT_BIT_ATTR(skipback, TP_TOGGLE_SKIPBACK, TP_MASK_SKIPBACK, false,
 		    TP_DEF_SKIPBACK);
-TRACKPOINT_BIT_ATTR(ext_dev, TP_TOGGLE_EXT_DEV, TP_MASK_EXT_DEV, 1,
+TRACKPOINT_BIT_ATTR(ext_dev, TP_TOGGLE_EXT_DEV, TP_MASK_EXT_DEV, true,
 		    TP_DEF_EXT_DEV);
 
+static bool trackpoint_is_attr_available(struct psmouse *psmouse,
+					 struct attribute *attr)
+{
+	struct trackpoint_data *tp = psmouse->private;
+
+	return tp->variant_id == TP_VARIANT_IBM ||
+		attr == &psmouse_attr_sensitivity.dattr.attr ||
+		attr == &psmouse_attr_press_to_select.dattr.attr;
+}
+
+static umode_t trackpoint_is_attr_visible(struct kobject *kobj,
+					  struct attribute *attr, int n)
+{
+	struct device *dev = container_of(kobj, struct device, kobj);
+	struct serio *serio = to_serio_port(dev);
+	struct psmouse *psmouse = serio_get_drvdata(serio);
+
+	return trackpoint_is_attr_available(psmouse, attr) ? attr->mode : 0;
+}
+
 static struct attribute *trackpoint_attrs[] = {
 	&psmouse_attr_sensitivity.dattr.attr,
 	&psmouse_attr_speed.dattr.attr,
@@ -255,24 +253,56 @@ static struct attribute *trackpoint_attr
 };
 
 static struct attribute_group trackpoint_attr_group = {
-	.attrs = trackpoint_attrs,
+	.is_visible = trackpoint_is_attr_visible,
+	.attrs = trackpoint_attrs,
 };
 
-static int trackpoint_start_protocol(struct psmouse *psmouse, unsigned char *firmware_id)
-{
-	unsigned char param[2] = { 0 };
+#define TRACKPOINT_UPDATE(_power_on, _psmouse, _tp, _name)		\
+do {									\
+	struct trackpoint_attr_data *_attr = &trackpoint_attr_##_name;	\
+									\
+	if ((!_power_on || _tp->_name != _attr->power_on_default) &&	\
+	    trackpoint_is_attr_available(_psmouse,			\
+				&psmouse_attr_##_name.dattr.attr)) {	\
+		if (!_attr->mask)					\
+			trackpoint_write(&_psmouse->ps2dev,		\
+					 _attr->command, _tp->_name);	\
+		else							\
+			trackpoint_update_bit(&_psmouse->ps2dev,	\
+					_attr->command, _attr->mask,	\
+					_tp->_name);			\
+	}								\
+} while (0)
 
-	if (ps2_command(&psmouse->ps2dev, param, MAKE_PS2_CMD(0, 2, TP_READ_ID)))
-		return -1;
+#define TRACKPOINT_SET_POWER_ON_DEFAULT(_tp, _name)			\
+do {									\
+	_tp->_name = trackpoint_attr_##_name.power_on_default;		\
+} while (0)
 
-	/* add new TP ID. */
-	if (!(param[0] & TP_MAGIC_IDENT))
-		return -1;
+static int trackpoint_start_protocol(struct psmouse *psmouse,
+				     u8 *variant_id, u8 *firmware_id)
+{
+	u8 param[2] = { 0 };
+	int error;
 
-	if (firmware_id)
-		*firmware_id = param[1];
+	error = ps2_command(&psmouse->ps2dev,
+			    param, MAKE_PS2_CMD(0, 2, TP_READ_ID));
+	if (error)
+		return error;
 
-	return 0;
+	switch (param[0]) {
+	case TP_VARIANT_IBM:
+	case TP_VARIANT_ALPS:
+	case TP_VARIANT_ELAN:
+	case TP_VARIANT_NXP:
+		if (variant_id)
+			*variant_id = param[0];
+		if (firmware_id)
+			*firmware_id = param[1];
+		return 0;
+	}
+
+	return -ENODEV;
 }
 
 /*
@@ -285,7 +315,7 @@ static int trackpoint_sync(struct psmous
 {
 	struct trackpoint_data *tp = psmouse->private;
 
-	if (!in_power_on_state) {
+	if (!in_power_on_state && tp->variant_id == TP_VARIANT_IBM) {
 		/*
 		 * Disable features that may make device unusable
 		 * with this driver.
@@ -347,7 +377,8 @@ static void trackpoint_defaults(struct t
 
 static void trackpoint_disconnect(struct psmouse *psmouse)
 {
-	sysfs_remove_group(&psmouse->ps2dev.serio->dev.kobj, &trackpoint_attr_group);
+	device_remove_group(&psmouse->ps2dev.serio->dev,
+			    &trackpoint_attr_group);
 
 	kfree(psmouse->private);
 	psmouse->private = NULL;
@@ -355,14 +386,20 @@ static void trackpoint_disconnect(struct
 
 static int trackpoint_reconnect(struct psmouse *psmouse)
 {
-	int reset_fail;
+	struct trackpoint_data *tp = psmouse->private;
+	int error;
+	bool was_reset;
 
-	if (trackpoint_start_protocol(psmouse, NULL))
-		return -1;
+	error = trackpoint_start_protocol(psmouse, NULL, NULL);
+	if (error)
+		return error;
 
-	reset_fail = trackpoint_power_on_reset(&psmouse->ps2dev);
-	if (trackpoint_sync(psmouse, !reset_fail))
-		return -1;
+	was_reset = tp->variant_id == TP_VARIANT_IBM &&
+		    trackpoint_power_on_reset(&psmouse->ps2dev) == 0;
+
+	error = trackpoint_sync(psmouse, was_reset);
+	if (error)
+		return error;
 
 	return 0;
 }
@@ -370,49 +407,66 @@ static int trackpoint_reconnect(struct p
 int trackpoint_detect(struct psmouse *psmouse, bool set_properties)
 {
 	struct ps2dev *ps2dev = &psmouse->ps2dev;
-	unsigned char firmware_id;
-	unsigned char button_info;
+	struct trackpoint_data *tp;
+	u8 variant_id;
+	u8 firmware_id;
+	u8 button_info;
 	int error;
 
-	if (trackpoint_start_protocol(psmouse, &firmware_id))
-		return -1;
+	error = trackpoint_start_protocol(psmouse, &variant_id, &firmware_id);
+	if (error)
+		return error;
 
 	if (!set_properties)
 		return 0;
 
-	if (trackpoint_read(ps2dev, TP_EXT_BTN, &button_info)) {
-		psmouse_warn(psmouse, "failed to get extended button data, assuming 3 buttons\n");
-		button_info = 0x33;
-	} else if (!button_info) {
-		psmouse_warn(psmouse, "got 0 in extended button data, assuming 3 buttons\n");
-		button_info = 0x33;
-	}
-
-	psmouse->private = kzalloc(sizeof(struct trackpoint_data), GFP_KERNEL);
-	if (!psmouse->private)
+	tp = kzalloc(sizeof(*tp), GFP_KERNEL);
+	if (!tp)
 		return -ENOMEM;
 
-	psmouse->vendor = "IBM";
+	trackpoint_defaults(tp);
+	tp->variant_id = variant_id;
+	tp->firmware_id = firmware_id;
+
+	psmouse->private = tp;
+
+	psmouse->vendor = trackpoint_variants[variant_id];
 	psmouse->name = "TrackPoint";
 
 	psmouse->reconnect = trackpoint_reconnect;
 	psmouse->disconnect = trackpoint_disconnect;
 
+	if (variant_id != TP_VARIANT_IBM) {
+		/* Newer variants do not support extended button query. */
+		button_info = 0x33;
+	} else {
+		error = trackpoint_read(ps2dev, TP_EXT_BTN, &button_info);
+		if (error) {
+			psmouse_warn(psmouse,
+				     "failed to get extended button data, assuming 3 buttons\n");
+			button_info = 0x33;
+		} else if (!button_info) {
+			psmouse_warn(psmouse,
+				     "got 0 in extended button data, assuming 3 buttons\n");
+			button_info = 0x33;
+		}
+	}
+
 	if ((button_info & 0x0f) >= 3)
-		__set_bit(BTN_MIDDLE, psmouse->dev->keybit);
+		input_set_capability(psmouse->dev, EV_KEY, BTN_MIDDLE);
 
 	__set_bit(INPUT_PROP_POINTER, psmouse->dev->propbit);
 	__set_bit(INPUT_PROP_POINTING_STICK, psmouse->dev->propbit);
 
-	trackpoint_defaults(psmouse->private);
-
-	error = trackpoint_power_on_reset(ps2dev);
-
-	/* Write defaults to TP only if reset fails. */
-	if (error)
+	if (variant_id != TP_VARIANT_IBM ||
+	    trackpoint_power_on_reset(ps2dev) != 0) {
+		/*
+		 * Write defaults to TP if we did not reset the trackpoint.
+		 */
 		trackpoint_sync(psmouse, false);
+	}
 
-	error = sysfs_create_group(&ps2dev->serio->dev.kobj, &trackpoint_attr_group);
+	error = device_add_group(&ps2dev->serio->dev, &trackpoint_attr_group);
 	if (error) {
 		psmouse_err(psmouse,
 			    "failed to create sysfs attributes, error: %d\n",
@@ -423,8 +477,8 @@ int trackpoint_detect(struct psmouse *ps
 	}
 
 	psmouse_info(psmouse,
-		     "IBM TrackPoint firmware: 0x%02x, buttons: %d/%d\n",
-		     firmware_id,
+		     "%s TrackPoint firmware: 0x%02x, buttons: %d/%d\n",
+		     psmouse->vendor, firmware_id,
 		     (button_info & 0xf0) >> 4, button_info & 0x0f);
 
 	return 0;
--- a/drivers/input/mouse/trackpoint.h
+++ b/drivers/input/mouse/trackpoint.h
@@ -21,10 +21,16 @@
 #define TP_COMMAND		0xE2	/* Commands start with this */
 
 #define TP_READ_ID		0xE1	/* Sent for device identification */
-#define TP_MAGIC_IDENT		0x03	/* Sent after a TP_READ_ID followed */
-					/* by the firmware ID */
-					/* Firmware ID includes 0x1, 0x2, 0x3 */
 
+/*
+ * Valid first byte responses to the "Read Secondary ID" (0xE1) command.
+ * 0x01 was the original IBM trackpoint, others implement very limited
+ * subset of trackpoint features.
+ */
+#define TP_VARIANT_IBM		0x01
+#define TP_VARIANT_ALPS		0x02
+#define TP_VARIANT_ELAN		0x03
+#define TP_VARIANT_NXP		0x04
 
 /*
  * Commands
@@ -136,18 +142,20 @@
 
 #define MAKE_PS2_CMD(params, results, cmd) ((params<<12) | (results<<8) | (cmd))
 
-struct trackpoint_data
-{
-	unsigned char sensitivity, speed, inertia, reach;
-	unsigned char draghys, mindrag;
-	unsigned char thresh, upthresh;
-	unsigned char ztime, jenks;
-	unsigned char drift_time;
+struct trackpoint_data {
+	u8 variant_id;
+	u8 firmware_id;
+
+	u8 sensitivity, speed, inertia, reach;
+	u8 draghys, mindrag;
+	u8 thresh, upthresh;
+	u8 ztime, jenks;
+	u8 drift_time;
 
 	/* toggles */
-	unsigned char press_to_select;
-	unsigned char skipback;
-	unsigned char ext_dev;
+	bool press_to_select;
+	bool skipback;
+	bool ext_dev;
 };
 
 #ifdef CONFIG_MOUSE_PS2_TRACKPOINT
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik <jbacik@fb.com>
commit e4fd493c0541d36953f7b9d3bfced67a1321792f upstream.
In fixing the readdir+pagefault deadlock I accidentally introduced a stale entry regression in readdir. If we get close to full for the temporary buffer, and then skip a few delayed deletions, and then try to add another entry that won't fit, we will emit the entries we found and retry. Unfortunately we delete entries from our del_list as we find them, assuming we won't need them. However our pos will be with whatever our last entry was, which could be before the delayed deletions we skipped, so the next search will add the deleted entries back into our readdir buffer. So instead don't delete entries we find in our del_list so we can make sure we always find our delayed deletions. This is a slight perf hit for readdir with lots of pending deletions, but hopefully this isn't a common occurrence. If it is we can revisit this and optimize it.
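A toy model of the fixed lookup (userspace C, simplified types): the sorted deletion list is only scanned, never consumed, so a retried readdir pass still sees every pending deletion.

	#include <stdbool.h>
	#include <stdio.h>

	struct del_item { unsigned long long offset; };

	/* Mirrors the fixed btrfs_should_delete_dir_index(): walk the
	 * sorted list and report a match without deleting anything. */
	static bool should_delete(const struct del_item *del, int n,
				  unsigned long long index)
	{
		int i;

		for (i = 0; i < n; i++) {
			if (del[i].offset > index)
				break;			/* list is sorted */
			if (del[i].offset == index)
				return true;
		}
		return false;
	}

	int main(void)
	{
		struct del_item del[] = { { 2 }, { 5 } };

		/* The second query for index 5 still succeeds because the
		 * first one consumed nothing. */
		printf("%d %d\n", should_delete(del, 2, 5),
		       should_delete(del, 2, 5));	/* prints 1 1 */
		return 0;
	}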
Fixes: 23b5ec74943f ("btrfs: fix readdir deadlock with pagefault")
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/btrfs/delayed-inode.c | 26 ++++++++------------------
 1 file changed, 8 insertions(+), 18 deletions(-)

--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1677,28 +1677,18 @@ void btrfs_readdir_put_delayed_items(str
 int btrfs_should_delete_dir_index(struct list_head *del_list,
 				  u64 index)
 {
-	struct btrfs_delayed_item *curr, *next;
-	int ret;
+	struct btrfs_delayed_item *curr;
+	int ret = 0;
 
-	if (list_empty(del_list))
-		return 0;
-
-	list_for_each_entry_safe(curr, next, del_list, readdir_list) {
+	list_for_each_entry(curr, del_list, readdir_list) {
 		if (curr->key.offset > index)
 			break;
-
-		list_del(&curr->readdir_list);
-		ret = (curr->key.offset == index);
-
-		if (refcount_dec_and_test(&curr->refs))
-			kfree(curr);
-
-		if (ret)
-			return 1;
-		else
-			continue;
+		if (curr->key.offset == index) {
+			ret = 1;
+			break;
+		}
 	}
-	return 0;
+	return ret;
 }
 
 /*
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Borntraeger <borntraeger@de.ibm.com>
commit 1de1ea7efeb9e8543212210e34518b4049ccd285 upstream.
Some parts of the cmma migration bitmap are already protected with the kvm->lock (e.g. the migration start). On the other hand the read of the cmma bits is not protected against a concurrent free, neither is the emulation of the ESSA instruction. Let's extend the locking to all related ioctls by using the slots lock for:

- kvm_s390_vm_start_migration
- kvm_s390_vm_stop_migration
- kvm_s390_set_cmma_bits
- kvm_s390_get_cmma_bits
In addition to that, we use synchronize_srcu before freeing the migration structure as all users hold kvm->srcu for read. (e.g. the ESSA handler).
Reported-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 190df4a212a7 (KVM: s390: CMMA tracking, ESSA emulation, migration mode)
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/s390/kvm/kvm-s390.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -768,7 +768,7 @@ static void kvm_s390_sync_request_broadc
 
 /*
  * Must be called with kvm->srcu held to avoid races on memslots, and with
- * kvm->lock to avoid races with ourselves and kvm_s390_vm_stop_migration.
+ * kvm->slots_lock to avoid races with ourselves and kvm_s390_vm_stop_migration.
  */
 static int kvm_s390_vm_start_migration(struct kvm *kvm)
 {
@@ -824,7 +824,7 @@ static int kvm_s390_vm_start_migration(s
 }
 
 /*
- * Must be called with kvm->lock to avoid races with ourselves and
+ * Must be called with kvm->slots_lock to avoid races with ourselves and
  * kvm_s390_vm_start_migration.
  */
 static int kvm_s390_vm_stop_migration(struct kvm *kvm)
@@ -839,6 +839,8 @@ static int kvm_s390_vm_stop_migration(st
 
 	if (kvm->arch.use_cmma) {
 		kvm_s390_sync_request_broadcast(kvm, KVM_REQ_STOP_MIGRATION);
+		/* We have to wait for the essa emulation to finish */
+		synchronize_srcu(&kvm->srcu);
 		vfree(mgs->pgste_bitmap);
 	}
 	kfree(mgs);
@@ -848,14 +850,12 @@ static int kvm_s390_vm_stop_migration(st
 static int kvm_s390_vm_set_migration(struct kvm *kvm,
 				     struct kvm_device_attr *attr)
 {
-	int idx, res = -ENXIO;
+	int res = -ENXIO;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->slots_lock);
 	switch (attr->attr) {
 	case KVM_S390_VM_MIGRATION_START:
-		idx = srcu_read_lock(&kvm->srcu);
 		res = kvm_s390_vm_start_migration(kvm);
-		srcu_read_unlock(&kvm->srcu, idx);
 		break;
 	case KVM_S390_VM_MIGRATION_STOP:
 		res = kvm_s390_vm_stop_migration(kvm);
@@ -863,7 +863,7 @@ static int kvm_s390_vm_set_migration(str
 	default:
 		break;
 	}
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->slots_lock);
 
 	return res;
 }
@@ -1753,7 +1753,9 @@ long kvm_arch_vm_ioctl(struct file *filp
 		r = -EFAULT;
 		if (copy_from_user(&args, argp, sizeof(args)))
 			break;
+		mutex_lock(&kvm->slots_lock);
 		r = kvm_s390_get_cmma_bits(kvm, &args);
+		mutex_unlock(&kvm->slots_lock);
 		if (!r) {
 			r = copy_to_user(argp, &args, sizeof(args));
 			if (r)
@@ -1767,7 +1769,9 @@ long kvm_arch_vm_ioctl(struct file *filp
 		r = -EFAULT;
 		if (copy_from_user(&args, argp, sizeof(args)))
 			break;
+		mutex_lock(&kvm->slots_lock);
 		r = kvm_s390_set_cmma_bits(kvm, &args);
+		mutex_unlock(&kvm->slots_lock);
 		break;
 	}
 	default:
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Brandenburg <martin@omnibond.com>
commit 6793f1c450b1533a5e9c2493490de771d38b24f9 upstream.
After do_readv_writev, the inode cache is invalidated anyway, so i_size will never be read. It will be fetched from the server which will also know about updates from other machines.
Fixes deadlock on 32-bit SMP.
See https://marc.info/?l=linux-fsdevel&m=151268557427760&w=2
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/orangefs/file.c            | 7 ++-----
 fs/orangefs/orangefs-kernel.h | 11 -----------
 2 files changed, 2 insertions(+), 16 deletions(-)

--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -446,7 +446,7 @@ ssize_t orangefs_inode_read(struct inode
 static ssize_t orangefs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct file *file = iocb->ki_filp;
-	loff_t pos = *(&iocb->ki_pos);
+	loff_t pos = iocb->ki_pos;
 	ssize_t rc = 0;
 
 	BUG_ON(iocb->private);
@@ -486,9 +486,6 @@ static ssize_t orangefs_file_write_iter(
 		}
 	}
 
-	if (file->f_pos > i_size_read(file->f_mapping->host))
-		orangefs_i_size_write(file->f_mapping->host, file->f_pos);
-
 	rc = generic_write_checks(iocb, iter);
 
 	if (rc <= 0) {
@@ -502,7 +499,7 @@ static ssize_t orangefs_file_write_iter(
 	 * pos to the end of the file, so we will wait till now to set
 	 * pos...
 	 */
-	pos = *(&iocb->ki_pos);
+	pos = iocb->ki_pos;
 
 	rc = do_readv_writev(ORANGEFS_IO_WRITE,
 			     file,
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -566,17 +566,6 @@ do {							\
 	sys_attr.mask = ORANGEFS_ATTR_SYS_ALL_SETABLE;	\
 } while (0)
 
-static inline void orangefs_i_size_write(struct inode *inode, loff_t i_size)
-{
-#if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
-	inode_lock(inode);
-#endif
-	i_size_write(inode, i_size);
-#if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
-	inode_unlock(inode);
-#endif
-}
-
 static inline void orangefs_set_timeout(struct dentry *dentry)
 {
 	unsigned long time = jiffies + orangefs_dcache_timeout_msecs*HZ/1000;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Russell King <rmk+kernel@armlinux.org.uk>
commit e9062481824384f00299971f923fecf6b3668001 upstream.
Avoid the 'bx' instruction on CPUs that have no support for Thumb and thus do not implement this instruction by moving the generation of this opcode to a separate function that selects between:
bx reg
and
mov pc, reg
according to the capabilities of the CPU.
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/arm/net/bpf_jit_32.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -285,16 +285,20 @@ static inline void emit_mov_i(const u8 r
 		emit_mov_i_no8m(rd, val, ctx);
 }
 
-static inline void emit_blx_r(u8 tgt_reg, struct jit_ctx *ctx)
+static void emit_bx_r(u8 tgt_reg, struct jit_ctx *ctx)
 {
-	ctx->seen |= SEEN_CALL;
-#if __LINUX_ARM_ARCH__ < 5
-	emit(ARM_MOV_R(ARM_LR, ARM_PC), ctx);
-
 	if (elf_hwcap & HWCAP_THUMB)
 		emit(ARM_BX(tgt_reg), ctx);
 	else
 		emit(ARM_MOV_R(ARM_PC, tgt_reg), ctx);
+}
+
+static inline void emit_blx_r(u8 tgt_reg, struct jit_ctx *ctx)
+{
+	ctx->seen |= SEEN_CALL;
+#if __LINUX_ARM_ARCH__ < 5
+	emit(ARM_MOV_R(ARM_LR, ARM_PC), ctx);
+	emit_bx_r(tgt_reg, ctx);
 #else
 	emit(ARM_BLX_R(tgt_reg), ctx);
 #endif
@@ -997,7 +1001,7 @@ static int emit_bpf_tail_call(struct jit
 	emit_a32_mov_i(tmp2[1], off, false, ctx);
 	emit(ARM_LDR_R(tmp[1], tmp[1], tmp2[1]), ctx);
 	emit(ARM_ADD_I(tmp[1], tmp[1], ctx->prologue_bytes), ctx);
-	emit(ARM_BX(tmp[1]), ctx);
+	emit_bx_r(tmp[1], ctx);
 
 	/* out: */
 	if (out_offset == -1)
@@ -1166,7 +1170,7 @@ static void build_epilogue(struct jit_ct
 	emit(ARM_POP(reg_set), ctx);
 	/* Return back to the callee function */
 	if (!(ctx->seen & SEEN_CALL))
-		emit(ARM_BX(ARM_LR), ctx);
+		emit_bx_r(ARM_LR, ctx);
 #endif
 }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Russell King <rmk+kernel@armlinux.org.uk>
commit f4483f2cc1fdc03488c8a1452e545545ae5bda93 upstream.
When a tail call fails, it is documented that the tail call should continue execution at the following instruction. An example tail call sequence is:
   12: (85) call bpf_tail_call#12
   13: (b7) r0 = 0
   14: (95) exit
The ARM assembler for the tail call in this case ends up branching to instruction 14 instead of instruction 13, resulting in the BPF filter returning a non-zero value:
 178:	ldr	r8, [sp, #588]	; insn 12
 17c:	ldr	r6, [r8, r6]
 180:	ldr	r8, [sp, #580]
 184:	cmp	r8, r6
 188:	bcs	0x1e8
 18c:	ldr	r6, [sp, #524]
 190:	ldr	r7, [sp, #528]
 194:	cmp	r7, #0
 198:	cmpeq	r6, #32
 19c:	bhi	0x1e8
 1a0:	adds	r6, r6, #1
 1a4:	adc	r7, r7, #0
 1a8:	str	r6, [sp, #524]
 1ac:	str	r7, [sp, #528]
 1b0:	mov	r6, #104
 1b4:	ldr	r8, [sp, #588]
 1b8:	add	r6, r8, r6
 1bc:	ldr	r8, [sp, #580]
 1c0:	lsl	r7, r8, #2
 1c4:	ldr	r6, [r6, r7]
 1c8:	cmp	r6, #0
 1cc:	beq	0x1e8
 1d0:	mov	r8, #32
 1d4:	ldr	r6, [r6, r8]
 1d8:	add	r6, r6, #44
 1dc:	bx	r6
 1e0:	mov	r0, #0	; insn 13
 1e4:	mov	r1, #0
 1e8:	add	sp, sp, #596	; insn 14
 1ec:	pop	{r4, r5, r6, r7, r8, sl, pc}
For other sequences, the tail call could end up branching midway through the following BPF instructions, or maybe off the end of the function, leading to unknown behaviours.
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/arm/net/bpf_jit_32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -949,7 +949,7 @@ static int emit_bpf_tail_call(struct jit
 	const u8 *tcc = bpf2a32[TCALL_CNT];
 	const int idx0 = ctx->idx;
 #define cur_offset (ctx->idx - idx0)
-#define jmp_offset (out_offset - (cur_offset))
+#define jmp_offset (out_offset - (cur_offset) - 2)
 	u32 off, lo, hi;
 
 	/* if (index >= array->map.max_entries)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Russell King <rmk+kernel@armlinux.org.uk>
commit d1220efd23484c72c82d5471f05daeb35b5d1916 upstream.
As per 2dede2d8e925 ("ARM EABI: stack pointer must be 64-bit aligned after a CPU exception") the stack should be aligned to a 64-bit boundary on EABI systems. Ensure that the eBPF JIT appropriately aligns the stack.
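For reference, a runnable sketch of the rounding this switches to; the kernel's ALIGN() behaves like this for power-of-two boundaries.

	#include <stdio.h>

	/* Round x up to the next multiple of the power-of-two a, as the
	 * kernel's ALIGN() does; the old STACK_ALIGN() only rounded up
	 * to 4 bytes. */
	#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

	int main(void)
	{
		unsigned int sz;

		for (sz = 60; sz <= 68; sz += 2)
			printf("ALIGN(%u, 8) = %u\n", sz, ALIGN(sz, 8u));
		return 0;
	}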
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/arm/net/bpf_jit_32.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -179,8 +179,13 @@ static void jit_fill_hole(void *area, un
 		*ptr++ = __opcode_to_mem_arm(ARM_INST_UDF);
 }
 
-/* Stack must be multiples of 16 Bytes */
-#define STACK_ALIGN(sz) (((sz) + 3) & ~3)
+#if defined(CONFIG_AEABI) && (__LINUX_ARM_ARCH__ >= 5)
+/* EABI requires the stack to be aligned to 64-bit boundaries */
+#define STACK_ALIGNMENT	8
+#else
+/* Stack must be aligned to 32-bit boundaries */
+#define STACK_ALIGNMENT	4
+#endif
 
 /* Stack space for BPF_REG_2, BPF_REG_3, BPF_REG_4,
  * BPF_REG_5, BPF_REG_7, BPF_REG_8, BPF_REG_9,
@@ -194,7 +199,7 @@ static void jit_fill_hole(void *area, un
 	 + SCRATCH_SIZE +					\
 	 + 4 /* extra for skb_copy_bits buffer */)
 
-#define STACK_SIZE	STACK_ALIGN(_STACK_SIZE)
+#define STACK_SIZE	ALIGN(_STACK_SIZE, STACK_ALIGNMENT)
 
 /* Get the offset of eBPF REGISTERs stored on scratch space. */
 #define STACK_VAR(off)	(STACK_SIZE-off-4)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Russell King <rmk+kernel@armlinux.org.uk>
commit 70ec3a6c2c11e4b0e107a65de943a082f9aff351 upstream.
Move the stack documentation towards the top of the file, where it's relevant for things like the register layout.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/arm/net/bpf_jit_32.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -27,6 +27,27 @@
 
 int bpf_jit_enable __read_mostly;
 
+/*
+ * eBPF prog stack layout
+ *
+ *                         high
+ * original ARM_SP =>     +-----+ eBPF prologue
+ *                        |FP/LR|
+ * current ARM_FP =>      +-----+
+ *                        | ... | callee saved registers
+ * eBPF fp register =>    +-----+ <= (BPF_FP)
+ *                        | ... | eBPF JIT scratch space
+ *                        |     | eBPF prog stack
+ *                        +-----+
+ *                        |RSVD | JIT scratchpad
+ * current ARM_SP =>      +-----+ <= (BPF_FP - STACK_SIZE)
+ *                        |     |
+ *                        | ... | Function call stack
+ *                        |     |
+ *                        +-----+
+ *                          low
+ */
+
 #define STACK_OFFSET(k)	(k)
 #define TMP_REG_1	(MAX_BPF_JIT_REG + 0)	/* TEMP Register 1 */
 #define TMP_REG_2	(MAX_BPF_JIT_REG + 1)	/* TEMP Register 2 */
@@ -1091,27 +1112,6 @@ static void build_prologue(struct jit_ct
 
 	u16 reg_set = 0;
 
-	/*
-	 * eBPF prog stack layout
-	 *
-	 *                         high
-	 * original ARM_SP =>     +-----+ eBPF prologue
-	 *                        |FP/LR|
-	 * current ARM_FP =>      +-----+
-	 *                        | ... | callee saved registers
-	 * eBPF fp register =>    +-----+ <= (BPF_FP)
-	 *                        | ... | eBPF JIT scratch space
-	 *                        |     | eBPF prog stack
-	 *                        +-----+
-	 *                        |RSVD | JIT scratchpad
-	 * current A64_SP =>      +-----+ <= (BPF_FP - STACK_SIZE)
-	 *                        |     |
-	 *                        | ... | Function call stack
-	 *                        |     |
-	 *                        +-----+
-	 *                          low
-	 */
-
 	/* Save callee saved registers. */
 	reg_set |= (1<<r4) | (1<<r5) | (1<<r6) | (1<<r7) | (1<<r8) | (1<<r10);
 #ifdef CONFIG_FRAME_POINTER
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Russell King <rmk+kernel@armlinux.org.uk>
commit 0005e55a79cfda88199e41a406a829c88d708c67 upstream.
The stack layout documentation incorrectly suggests that the BPF JIT scratch space starts immediately below BPF_FP. This is not correct, so let's fix the documentation to reflect reality.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/arm/net/bpf_jit_32.c | 35 +++++++++++++++++++++++++++--------
 1 file changed, 27 insertions(+), 8 deletions(-)

--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -28,24 +28,43 @@ int bpf_jit_enable __read_mostly;
 
 /*
- * eBPF prog stack layout
+ * eBPF prog stack layout:
  *
  *                         high
- * original ARM_SP =>     +-----+ eBPF prologue
- *                        |FP/LR|
- * current ARM_FP =>      +-----+
- *                        | ... | callee saved registers
- * eBPF fp register =>    +-----+ <= (BPF_FP)
+ * original ARM_SP =>     +-----+
+ *                        |     | callee saved registers
+ *                        +-----+ <= (BPF_FP + SCRATCH_SIZE)
  *                        | ... | eBPF JIT scratch space
- *                        |     | eBPF prog stack
+ * eBPF fp register =>    +-----+
+ *   (BPF_FP)             | ... | eBPF prog stack
  *                        +-----+
  *                        |RSVD | JIT scratchpad
- * current ARM_SP =>      +-----+ <= (BPF_FP - STACK_SIZE)
+ * current ARM_SP =>      +-----+ <= (BPF_FP - STACK_SIZE + SCRATCH_SIZE)
  *                        |     |
  *                        | ... | Function call stack
  *                        |     |
  *                        +-----+
  *                          low
+ *
+ * The callee saved registers depends on whether frame pointers are enabled.
+ * With frame pointers (to be compliant with the ABI):
+ *
+ *                                high
+ * original ARM_SP =>     +------------------+ \
+ *                        |        pc        | |
+ * current ARM_FP =>      +------------------+ } callee saved registers
+ *                        |r4-r8,r10,fp,ip,lr| |
+ *                        +------------------+ /
+ *                                low
+ *
+ * Without frame pointers:
+ *
+ *                                high
+ * original ARM_SP =>     +------------------+
+ *                        |        lr        | (optional)
+ *                        |     r4-r8,r10    | callee saved registers
+ *                        +------------------+
+ *                                low
  */
 
 #define STACK_OFFSET(k)	(k)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Russell King rmk+kernel@armlinux.org.uk
commit 02088d9b392f605c892894b46aa8c83e3abd0115 upstream.
When an eBPF program tail-calls another eBPF program, it enters it after the prologue to avoid having complex stack manipulations. The callee therefore unwinds a stack frame it did not set itself up; if the two programs were JITed with different stack layouts (previously the set of saved registers varied with the SEEN_CALL optimization, removed below), this can lead to kernel oopses and similar failures.
Resolve this by always using a fixed stack layout and a CPU frame-pointer register, and by using that register when reloading registers before returning.
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler") Signed-off-by: Russell King rmk+kernel@armlinux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/net/bpf_jit_32.c | 80 ++++++++++++---------------------------------- 1 file changed, 22 insertions(+), 58 deletions(-)
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -61,20 +61,24 @@ int bpf_jit_enable __read_mostly;
  *
  *                                 high
  * original ARM_SP =>     +------------------+
- *                        |          lr      | (optional)
- *                        |     r4-r8,r10    | callee saved registers
- *                        +------------------+
+ *                        | r4-r8,r10,fp,lr  | callee saved registers
+ * current ARM_FP =>      +------------------+
  *                                 low
+ *
+ * When popping registers off the stack at the end of a BPF function, we
+ * reference them via the current ARM_FP register.
  */
+#define CALLEE_MASK	(1 << ARM_R4 | 1 << ARM_R5 | 1 << ARM_R6 | \
+			 1 << ARM_R7 | 1 << ARM_R8 | 1 << ARM_R10 | \
+			 1 << ARM_FP)
+#define CALLEE_PUSH_MASK (CALLEE_MASK | 1 << ARM_LR)
+#define CALLEE_POP_MASK  (CALLEE_MASK | 1 << ARM_PC)
 
 #define STACK_OFFSET(k)	(k)
 #define TMP_REG_1	(MAX_BPF_JIT_REG + 0)	/* TEMP Register 1 */
 #define TMP_REG_2	(MAX_BPF_JIT_REG + 1)	/* TEMP Register 2 */
 #define TCALL_CNT	(MAX_BPF_JIT_REG + 2)	/* Tail Call Count */
-/* Flags used for JIT optimization */ -#define SEEN_CALL (1 << 0) - #define FLAG_IMM_OVERFLOW (1 << 0)
/* @@ -135,7 +139,6 @@ static const u8 bpf2a32[][2] = { * idx : index of current last JITed instruction. * prologue_bytes : bytes used in prologue. * epilogue_offset : offset of epilogue starting. - * seen : bit mask used for JIT optimization. * offsets : array of eBPF instruction offsets in * JITed code. * target : final JITed code. @@ -150,7 +153,6 @@ struct jit_ctx { unsigned int idx; unsigned int prologue_bytes; unsigned int epilogue_offset; - u32 seen; u32 flags; u32 *offsets; u32 *target; @@ -340,7 +342,6 @@ static void emit_bx_r(u8 tgt_reg, struct
static inline void emit_blx_r(u8 tgt_reg, struct jit_ctx *ctx) { - ctx->seen |= SEEN_CALL; #if __LINUX_ARM_ARCH__ < 5 emit(ARM_MOV_R(ARM_LR, ARM_PC), ctx); emit_bx_r(tgt_reg, ctx); @@ -403,7 +404,6 @@ static inline void emit_udivmod(u8 rd, u }
/* Call appropriate function */ - ctx->seen |= SEEN_CALL; emit_mov_i(ARM_IP, op == BPF_DIV ? (u32)jit_udiv32 : (u32)jit_mod32, ctx); emit_blx_r(ARM_IP, ctx); @@ -669,8 +669,6 @@ static inline void emit_a32_lsh_r64(cons /* Do LSH operation */ emit(ARM_SUB_I(ARM_IP, rt, 32), ctx); emit(ARM_RSB_I(tmp2[0], rt, 32), ctx); - /* As we are using ARM_LR */ - ctx->seen |= SEEN_CALL; emit(ARM_MOV_SR(ARM_LR, rm, SRTYPE_ASL, rt), ctx); emit(ARM_ORR_SR(ARM_LR, ARM_LR, rd, SRTYPE_ASL, ARM_IP), ctx); emit(ARM_ORR_SR(ARM_IP, ARM_LR, rd, SRTYPE_LSR, tmp2[0]), ctx); @@ -705,8 +703,6 @@ static inline void emit_a32_arsh_r64(con /* Do the ARSH operation */ emit(ARM_RSB_I(ARM_IP, rt, 32), ctx); emit(ARM_SUBS_I(tmp2[0], rt, 32), ctx); - /* As we are using ARM_LR */ - ctx->seen |= SEEN_CALL; emit(ARM_MOV_SR(ARM_LR, rd, SRTYPE_LSR, rt), ctx); emit(ARM_ORR_SR(ARM_LR, ARM_LR, rm, SRTYPE_ASL, ARM_IP), ctx); _emit(ARM_COND_MI, ARM_B(0), ctx); @@ -741,8 +737,6 @@ static inline void emit_a32_lsr_r64(cons /* Do LSH operation */ emit(ARM_RSB_I(ARM_IP, rt, 32), ctx); emit(ARM_SUBS_I(tmp2[0], rt, 32), ctx); - /* As we are using ARM_LR */ - ctx->seen |= SEEN_CALL; emit(ARM_MOV_SR(ARM_LR, rd, SRTYPE_LSR, rt), ctx); emit(ARM_ORR_SR(ARM_LR, ARM_LR, rm, SRTYPE_ASL, ARM_IP), ctx); emit(ARM_ORR_SR(ARM_LR, ARM_LR, rm, SRTYPE_LSR, tmp2[0]), ctx); @@ -877,8 +871,6 @@ static inline void emit_a32_mul_r64(cons /* Do Multiplication */ emit(ARM_MUL(ARM_IP, rd, rn), ctx); emit(ARM_MUL(ARM_LR, rm, rt), ctx); - /* As we are using ARM_LR */ - ctx->seen |= SEEN_CALL; emit(ARM_ADD_R(ARM_LR, ARM_IP, ARM_LR), ctx);
emit(ARM_UMULL(ARM_IP, rm, rd, rt), ctx); @@ -955,7 +947,6 @@ static inline void emit_ar_r(const u8 rd const u8 rn, struct jit_ctx *ctx, u8 op) { switch (op) { case BPF_JSET: - ctx->seen |= SEEN_CALL; emit(ARM_AND_R(ARM_IP, rt, rn), ctx); emit(ARM_AND_R(ARM_LR, rd, rm), ctx); emit(ARM_ORRS_R(ARM_IP, ARM_LR, ARM_IP), ctx); @@ -1119,33 +1110,22 @@ static void build_prologue(struct jit_ct const u8 r2 = bpf2a32[BPF_REG_1][1]; const u8 r3 = bpf2a32[BPF_REG_1][0]; const u8 r4 = bpf2a32[BPF_REG_6][1]; - const u8 r5 = bpf2a32[BPF_REG_6][0]; - const u8 r6 = bpf2a32[TMP_REG_1][1]; - const u8 r7 = bpf2a32[TMP_REG_1][0]; - const u8 r8 = bpf2a32[TMP_REG_2][1]; - const u8 r10 = bpf2a32[TMP_REG_2][0]; const u8 fplo = bpf2a32[BPF_REG_FP][1]; const u8 fphi = bpf2a32[BPF_REG_FP][0]; - const u8 sp = ARM_SP; const u8 *tcc = bpf2a32[TCALL_CNT];
- u16 reg_set = 0; - /* Save callee saved registers. */ - reg_set |= (1<<r4) | (1<<r5) | (1<<r6) | (1<<r7) | (1<<r8) | (1<<r10); #ifdef CONFIG_FRAME_POINTER - reg_set |= (1<<ARM_FP) | (1<<ARM_IP) | (1<<ARM_LR) | (1<<ARM_PC); - emit(ARM_MOV_R(ARM_IP, sp), ctx); + u16 reg_set = CALLEE_PUSH_MASK | 1 << ARM_IP | 1 << ARM_PC; + emit(ARM_MOV_R(ARM_IP, ARM_SP), ctx); emit(ARM_PUSH(reg_set), ctx); emit(ARM_SUB_I(ARM_FP, ARM_IP, 4), ctx); #else - /* Check if call instruction exists in BPF body */ - if (ctx->seen & SEEN_CALL) - reg_set |= (1<<ARM_LR); - emit(ARM_PUSH(reg_set), ctx); + emit(ARM_PUSH(CALLEE_PUSH_MASK), ctx); + emit(ARM_MOV_R(ARM_FP, ARM_SP), ctx); #endif /* Save frame pointer for later */ - emit(ARM_SUB_I(ARM_IP, sp, SCRATCH_SIZE), ctx); + emit(ARM_SUB_I(ARM_IP, ARM_SP, SCRATCH_SIZE), ctx);
ctx->stack_size = imm8m(STACK_SIZE);
@@ -1168,33 +1148,19 @@ static void build_prologue(struct jit_ct /* end of prologue */ }
+/* restore callee saved registers. */ static void build_epilogue(struct jit_ctx *ctx) { - const u8 r4 = bpf2a32[BPF_REG_6][1]; - const u8 r5 = bpf2a32[BPF_REG_6][0]; - const u8 r6 = bpf2a32[TMP_REG_1][1]; - const u8 r7 = bpf2a32[TMP_REG_1][0]; - const u8 r8 = bpf2a32[TMP_REG_2][1]; - const u8 r10 = bpf2a32[TMP_REG_2][0]; - u16 reg_set = 0; - - /* unwind function call stack */ - emit(ARM_ADD_I(ARM_SP, ARM_SP, ctx->stack_size), ctx); - - /* restore callee saved registers. */ - reg_set |= (1<<r4) | (1<<r5) | (1<<r6) | (1<<r7) | (1<<r8) | (1<<r10); #ifdef CONFIG_FRAME_POINTER - /* the first instruction of the prologue was: mov ip, sp */ - reg_set |= (1<<ARM_FP) | (1<<ARM_SP) | (1<<ARM_PC); + /* When using frame pointers, some additional registers need to + * be loaded. */ + u16 reg_set = CALLEE_POP_MASK | 1 << ARM_SP; + emit(ARM_SUB_I(ARM_SP, ARM_FP, hweight16(reg_set) * 4), ctx); emit(ARM_LDM(ARM_SP, reg_set), ctx); #else - if (ctx->seen & SEEN_CALL) - reg_set |= (1<<ARM_PC); /* Restore callee saved registers. */ - emit(ARM_POP(reg_set), ctx); - /* Return back to the callee function */ - if (!(ctx->seen & SEEN_CALL)) - emit_bx_r(ARM_LR, ctx); + emit(ARM_MOV_R(ARM_SP, ARM_FP), ctx); + emit(ARM_POP(CALLEE_POP_MASK), ctx); #endif }
@@ -1422,8 +1388,6 @@ static int build_insn(const struct bpf_i emit_rev32(rt, rt, ctx); goto emit_bswap_uxt; case 64: - /* Because of the usage of ARM_LR */ - ctx->seen |= SEEN_CALL; emit_rev32(ARM_LR, rt, ctx); emit_rev32(rt, rd, ctx); emit(ARM_MOV_R(rd, ARM_LR), ctx);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Russell King rmk+kernel@armlinux.org.uk
commit ec19e02b343db991d2d1610c409efefebf4e2ca9 upstream.
When the source and destination register are identical, our JIT does not generate correct code, which leads to kernel oopses.
Fix this by (a) generating more efficient code, and (b) making use of the temporary register earlier if we would otherwise overwrite the address register.
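For the BPF_DW case the hazard is easy to see in the emitted load sequence. A minimal sketch (register naming as in emit_ldx_r() below; annotations ours):

	emit(ARM_LDR_I(rd[1], rm, off), ctx);     /* if rd[1] == rm, this load
						   * clobbers the base register */
	emit(ARM_LDR_I(rd[0], rm, off + 4), ctx); /* ...so this second load uses
						   * the value just loaded as an
						   * address */

This is why the fix below copies rm into a temporary before emitting the loads whenever rd[1] == rm.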
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler") Signed-off-by: Russell King rmk+kernel@armlinux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/net/bpf_jit_32.c | 61 ++++++++++++++++++++++++---------------------- 1 file changed, 33 insertions(+), 28 deletions(-)
--- a/arch/arm/net/bpf_jit_32.c +++ b/arch/arm/net/bpf_jit_32.c @@ -913,33 +913,53 @@ static inline void emit_str_r(const u8 d }
/* dst = *(size*)(src + off) */ -static inline void emit_ldx_r(const u8 dst, const u8 src, bool dstk, - const s32 off, struct jit_ctx *ctx, const u8 sz){ +static inline void emit_ldx_r(const u8 dst[], const u8 src, bool dstk, + s32 off, struct jit_ctx *ctx, const u8 sz){ const u8 *tmp = bpf2a32[TMP_REG_1]; - u8 rd = dstk ? tmp[1] : dst; + const u8 *rd = dstk ? tmp : dst; u8 rm = src; + s32 off_max;
- if (off) { + if (sz == BPF_H) + off_max = 0xff; + else + off_max = 0xfff; + + if (off < 0 || off > off_max) { emit_a32_mov_i(tmp[0], off, false, ctx); emit(ARM_ADD_R(tmp[0], tmp[0], src), ctx); rm = tmp[0]; + off = 0; + } else if (rd[1] == rm) { + emit(ARM_MOV_R(tmp[0], rm), ctx); + rm = tmp[0]; } switch (sz) { - case BPF_W: - /* Load a Word */ - emit(ARM_LDR_I(rd, rm, 0), ctx); + case BPF_B: + /* Load a Byte */ + emit(ARM_LDRB_I(rd[1], rm, off), ctx); + emit_a32_mov_i(dst[0], 0, dstk, ctx); break; case BPF_H: /* Load a HalfWord */ - emit(ARM_LDRH_I(rd, rm, 0), ctx); + emit(ARM_LDRH_I(rd[1], rm, off), ctx); + emit_a32_mov_i(dst[0], 0, dstk, ctx); break; - case BPF_B: - /* Load a Byte */ - emit(ARM_LDRB_I(rd, rm, 0), ctx); + case BPF_W: + /* Load a Word */ + emit(ARM_LDR_I(rd[1], rm, off), ctx); + emit_a32_mov_i(dst[0], 0, dstk, ctx); + break; + case BPF_DW: + /* Load a Double Word */ + emit(ARM_LDR_I(rd[1], rm, off), ctx); + emit(ARM_LDR_I(rd[0], rm, off + 4), ctx); break; } if (dstk) - emit(ARM_STR_I(rd, ARM_SP, STACK_VAR(dst)), ctx); + emit(ARM_STR_I(rd[1], ARM_SP, STACK_VAR(dst[1])), ctx); + if (dstk && sz == BPF_DW) + emit(ARM_STR_I(rd[0], ARM_SP, STACK_VAR(dst[0])), ctx); }
/* Arithmatic Operation */ @@ -1440,22 +1460,7 @@ exit: rn = sstk ? tmp2[1] : src_lo; if (sstk) emit(ARM_LDR_I(rn, ARM_SP, STACK_VAR(src_lo)), ctx); - switch (BPF_SIZE(code)) { - case BPF_W: - /* Load a Word */ - case BPF_H: - /* Load a Half-Word */ - case BPF_B: - /* Load a Byte */ - emit_ldx_r(dst_lo, rn, dstk, off, ctx, BPF_SIZE(code)); - emit_a32_mov_i(dst_hi, 0, dstk, ctx); - break; - case BPF_DW: - /* Load a double word */ - emit_ldx_r(dst_lo, rn, dstk, off, ctx, BPF_W); - emit_ldx_r(dst_hi, rn, dstk, off+4, ctx, BPF_W); - break; - } + emit_ldx_r(dst, rn, dstk, off, ctx, BPF_SIZE(code)); break; /* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + imm)) */ case BPF_LD | BPF_ABS | BPF_W:
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Russell King rmk+kernel@armlinux.org.uk
commit 091f02483df7b56615b524491f404e574c5e0668 upstream.
As per 90caccdd8cc0 ("bpf: fix bpf_tail_call() x64 JIT"), the index used for array lookup is defined to be 32-bit wide. Update a misleading comment that suggests it is 64-bit wide.
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler") Signed-off-by: Russell King rmk+kernel@armlinux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/net/bpf_jit_32.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/net/bpf_jit_32.c +++ b/arch/arm/net/bpf_jit_32.c @@ -1016,7 +1016,7 @@ static int emit_bpf_tail_call(struct jit emit_a32_mov_i(tmp[1], off, false, ctx); emit(ARM_LDR_I(tmp2[1], ARM_SP, STACK_VAR(r2[1])), ctx); emit(ARM_LDR_R(tmp[1], tmp2[1], tmp[1]), ctx); - /* index (64 bit) */ + /* index is 32-bit for arrays */ emit(ARM_LDR_I(tmp2[1], ARM_SP, STACK_VAR(r3[1])), ctx); /* index >= array->map.max_entries */ emit(ARM_CMP_R(tmp2[1], tmp[1]), ctx);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boris Brezillon boris.brezillon@free-electrons.com
commit 17b11b76b87afe9f8be199d7a5f442497133e2b0 upstream.
When saving BOs in the hang state we skip one entry of the kernel_state->bo[] array, thus leaving it NULL. This leads to a NULL pointer dereference when, later in this function, we iterate over all BOs to check their ->madv state.
Fixes: ca26d28bbaa3 ("drm/vc4: improve throughput by pipelining binning and rendering jobs") Signed-off-by: Boris Brezillon boris.brezillon@free-electrons.com Signed-off-by: Eric Anholt eric@anholt.net Reviewed-by: Eric Anholt eric@anholt.net Link: https://patchwork.freedesktop.org/patch/msgid/20180118145821.22344-1-boris.b... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/vc4/vc4_gem.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
--- a/drivers/gpu/drm/vc4/vc4_gem.c +++ b/drivers/gpu/drm/vc4/vc4_gem.c @@ -146,7 +146,7 @@ vc4_save_hang_state(struct drm_device *d struct vc4_exec_info *exec[2]; struct vc4_bo *bo; unsigned long irqflags; - unsigned int i, j, unref_list_count, prev_idx; + unsigned int i, j, k, unref_list_count;
kernel_state = kcalloc(1, sizeof(*kernel_state), GFP_KERNEL); if (!kernel_state) @@ -182,24 +182,24 @@ vc4_save_hang_state(struct drm_device *d return; }
- prev_idx = 0; + k = 0; for (i = 0; i < 2; i++) { if (!exec[i]) continue;
for (j = 0; j < exec[i]->bo_count; j++) { drm_gem_object_get(&exec[i]->bo[j]->base); - kernel_state->bo[j + prev_idx] = &exec[i]->bo[j]->base; + kernel_state->bo[k++] = &exec[i]->bo[j]->base; }
list_for_each_entry(bo, &exec[i]->unref_list, unref_head) { drm_gem_object_get(&bo->base.base); - kernel_state->bo[j + prev_idx] = &bo->base.base; - j++; + kernel_state->bo[k++] = &bo->base.base; } - prev_idx = j + 1; }
+ WARN_ON_ONCE(k != state->bo_count); + if (exec[0]) state->start_bin = exec[0]->ct0ca; if (exec[1])
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jim Westfall jwestfall@surrealistic.net
[ Upstream commit 096b9854c04df86f03b38a97d40b6506e5730919 ]
Use n->primary_key instead of pkey to account for the possibility that a neigh constructor function may have modified the primary_key value.
Signed-off-by: Jim Westfall jwestfall@surrealistic.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/neighbour.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -532,7 +532,7 @@ struct neighbour *__neigh_create(struct if (atomic_read(&tbl->entries) > (1 << nht->hash_shift)) nht = neigh_hash_grow(tbl, nht->hash_shift + 1);
- hash_val = tbl->hash(pkey, dev, nht->hash_rnd) >> (32 - nht->hash_shift); + hash_val = tbl->hash(n->primary_key, dev, nht->hash_rnd) >> (32 - nht->hash_shift);
if (n->parms->dead) { rc = ERR_PTR(-EINVAL); @@ -544,7 +544,7 @@ struct neighbour *__neigh_create(struct n1 != NULL; n1 = rcu_dereference_protected(n1->next, lockdep_is_held(&tbl->lock))) { - if (dev == n1->dev && !memcmp(n1->primary_key, pkey, key_len)) { + if (dev == n1->dev && !memcmp(n1->primary_key, n->primary_key, key_len)) { if (want_ref) neigh_hold(n1); rc = n1;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jim Westfall jwestfall@surrealistic.net
[ Upstream commit cd9ff4de0107c65d69d02253bb25d6db93c3dbc1 ]
Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices to avoid making an entry for every remote ip the device needs to talk to.
This used to be the old behaviour, but it was broken by a263b3093641f ("ipv4: Make neigh lookups directly in output packet path") and the remnants were later removed in 0bb4087cbec0 ("ipv4: Fix neigh lookup keying over loopback/point-to-point devices") because the behaviour was already broken.
Signed-off-by: Jim Westfall jwestfall@surrealistic.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/arp.h | 3 +++ net/ipv4/arp.c | 7 ++++++- 2 files changed, 9 insertions(+), 1 deletion(-)
--- a/include/net/arp.h +++ b/include/net/arp.h @@ -20,6 +20,9 @@ static inline u32 arp_hashfn(const void
static inline struct neighbour *__ipv4_neigh_lookup_noref(struct net_device *dev, u32 key) { + if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT)) + key = INADDR_ANY; + return ___neigh_lookup_noref(&arp_tbl, neigh_key_eq32, arp_hashfn, &key, dev); }
--- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -223,11 +223,16 @@ static bool arp_key_eq(const struct neig
static int arp_constructor(struct neighbour *neigh) { - __be32 addr = *(__be32 *)neigh->primary_key; + __be32 addr; struct net_device *dev = neigh->dev; struct in_device *in_dev; struct neigh_parms *parms; + u32 inaddr_any = INADDR_ANY;
+ if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT)) + memcpy(neigh->primary_key, &inaddr_any, arp_tbl.key_len); + + addr = *(__be32 *)neigh->primary_key; rcu_read_lock(); in_dev = __in_dev_get_rcu(dev); if (!in_dev) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexey Kodanev alexey.kodanev@oracle.com
[ Upstream commit dd5684ecae3bd8e44b644f50e2c12c7e57fdfef5 ]
The ccid2_hc_tx_rto_expire() timer callback always restarts the timer and can therefore run indefinitely (unless it is stopped from outside). After commit 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at dismantle time") moved ccid_hc_tx_delete() (which includes the sk_stop_timer() call) from dccp_destroy_sock() to sk_destruct(), this started to happen quite often. The pending timer prevents the socket from being released, and as a result sk_destruct() is never called.
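For reference, the shape of the callback (sketched from the 4.14 sources): it unconditionally re-arms itself at the end, so the timer keeps the socket pinned until sk_stop_timer() is called:

	static void ccid2_hc_tx_rto_expire(unsigned long data)
	{
		struct sock *sk = (struct sock *)data;
		struct ccid2_hc_tx_sock *hc = ccid2_hc_tx_sk(sk);
		...
		/* restart the timer: runs again unless stopped externally */
		sk_reset_timer(sk, &hc->tx_rtotimer, jiffies + hc->tx_rto);
	}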
Found with LTP/dccp_ipsec tests running on the bonding device, which later couldn't be unloaded after the tests were completed:
unregister_netdevice: waiting for bond0 to become free. Usage count = 148
Fixes: 2a91aa396739 ("[DCCP] CCID2: Initial CCID2 (TCP-Like) implementation") Signed-off-by: Alexey Kodanev alexey.kodanev@oracle.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/dccp/ccids/ccid2.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/dccp/ccids/ccid2.c +++ b/net/dccp/ccids/ccid2.c @@ -140,6 +140,9 @@ static void ccid2_hc_tx_rto_expire(unsig
ccid2_pr_debug("RTO_EXPIRE\n");
+ if (sk->sk_state == DCCP_CLOSED) + goto out; + /* back-off timer */ hc->tx_rto <<= 1; if (hc->tx_rto > DCCP_RTO_MAX)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ben Hutchings ben.hutchings@codethink.co.uk
[ Upstream commit e9191ffb65d8e159680ce0ad2224e1acbde6985c ]
Commit 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl setting") removed the initialisation of ipv6_pinfo::autoflowlabel and added a second flag to indicate whether this field or the net namespace default should be used.
The getsockopt() handling for this case was not updated, so it currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is not explicitly enabled. Fix it to return the effective value, whether that has been set at the socket or net namespace level.
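A minimal userspace check of the effective value (IPV6_AUTOFLOWLABEL is 70 in the uapi headers; older libcs may not define it):

	#include <netinet/in.h>
	#include <stdio.h>
	#include <sys/socket.h>

	#ifndef IPV6_AUTOFLOWLABEL
	#define IPV6_AUTOFLOWLABEL 70
	#endif

	int main(void)
	{
		int fd = socket(AF_INET6, SOCK_STREAM, 0);
		int val = 0;
		socklen_t len = sizeof(val);

		/* before the fix this printed 0 unless the option had been
		 * set explicitly on the socket; now it also reflects the
		 * net.ipv6.auto_flowlabels sysctl default */
		if (getsockopt(fd, IPPROTO_IPV6, IPV6_AUTOFLOWLABEL, &val, &len) == 0)
			printf("autoflowlabel: %d\n", val);
		return 0;
	}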
Fixes: 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl ...") Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/ipv6.h | 1 + net/ipv6/ip6_output.c | 2 +- net/ipv6/ipv6_sockglue.c | 2 +- 3 files changed, 3 insertions(+), 2 deletions(-)
--- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -291,6 +291,7 @@ int ipv6_flowlabel_opt_get(struct sock * int flags); int ip6_flowlabel_init(void); void ip6_flowlabel_cleanup(void); +bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np);
static inline void fl6_sock_release(struct ip6_flowlabel *fl) { --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -166,7 +166,7 @@ int ip6_output(struct net *net, struct s !(IP6CB(skb)->flags & IP6SKB_REROUTED)); }
-static bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np) +bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np) { if (!np->autoflowlabel_set) return ip6_default_np_autolabel(net); --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -1324,7 +1324,7 @@ static int do_ipv6_getsockopt(struct soc break;
case IPV6_AUTOFLOWLABEL: - val = np->autoflowlabel; + val = ip6_autoflowlabel(sock_net(sk), np); break;
case IPV6_RECVFRAGSIZE:
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mike Maloney maloney@google.com
[ Upstream commit 749439bfac6e1a2932c582e2699f91d329658196 ]
The logic in __ip6_append_data() assumes that the MTU is at least large enough for the headers. However, a device's MTU may be adjusted after it has been added, while sendmsg() is processing data, so __ip6_append_data() can observe an arbitrary MTU. For an MTU smaller than the size of the fragmentation header, the math produces a negative 'maxfraglen', which causes problems when refragmenting any previous skb in the skb_write_queue, possibly leaving it malformed.
Instead, sendmsg() now returns -EINVAL when the computed MTU is less than IPV6_MIN_MTU.
Found by syzkaller: kernel BUG at ./include/linux/skbuff.h:2064! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 14216 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #2 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801d0b68580 task.stack: ffff8801ac6b8000 RIP: 0010:__skb_pull include/linux/skbuff.h:2064 [inline] RIP: 0010:__ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: 0018:ffff8801ac6bf570 EFLAGS: 00010216 RAX: 0000000000010000 RBX: 0000000000000028 RCX: ffffc90003cce000 RDX: 00000000000001b8 RSI: ffffffff839df06f RDI: ffff8801d9478ca0 RBP: ffff8801ac6bf780 R08: ffff8801cc3f1dbc R09: 0000000000000000 R10: ffff8801ac6bf7a0 R11: 43cb4b7b1948a9e7 R12: ffff8801cc3f1dc8 R13: ffff8801cc3f1d40 R14: 0000000000001036 R15: dffffc0000000000 FS: 00007f43d740c700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7834984000 CR3: 00000001d79b9000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ip6_finish_skb include/net/ipv6.h:911 [inline] udp_v6_push_pending_frames+0x255/0x390 net/ipv6/udp.c:1093 udpv6_sendmsg+0x280d/0x31a0 net/ipv6/udp.c:1363 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 SYSC_sendto+0x352/0x5a0 net/socket.c:1750 SyS_sendto+0x40/0x50 net/socket.c:1718 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x4512e9 RSP: 002b:00007f43d740bc08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00000000007180a8 RCX: 00000000004512e9 RDX: 000000000000002e RSI: 0000000020d08000 RDI: 0000000000000005 RBP: 0000000000000086 R08: 00000000209c1000 R09: 000000000000001c R10: 0000000000040800 R11: 0000000000000216 R12: 00000000004b9c69 R13: 00000000ffffffff R14: 0000000000000005 R15: 00000000202c2000 Code: 9e 01 fe e9 c5 e8 ff ff e8 7f 9e 01 fe e9 4a ea ff ff 48 89 f7 e8 52 9e 01 fe e9 aa eb ff ff e8 a8 b6 cf fd 0f 0b e8 a1 b6 cf fd <0f> 0b 49 8d 45 78 4d 8d 45 7c 48 89 85 78 fe ff ff 49 8d 85 ba RIP: __skb_pull include/linux/skbuff.h:2064 [inline] RSP: ffff8801ac6bf570 RIP: __ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: ffff8801ac6bf570
Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Mike Maloney maloney@google.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_output.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1206,14 +1206,16 @@ static int ip6_setup_cork(struct sock *s v6_cork->tclass = ipc6->tclass; if (rt->dst.flags & DST_XFRM_TUNNEL) mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ? - rt->dst.dev->mtu : dst_mtu(&rt->dst); + READ_ONCE(rt->dst.dev->mtu) : dst_mtu(&rt->dst); else mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ? - rt->dst.dev->mtu : dst_mtu(rt->dst.path); + READ_ONCE(rt->dst.dev->mtu) : dst_mtu(rt->dst.path); if (np->frag_size < mtu) { if (np->frag_size) mtu = np->frag_size; } + if (mtu < IPV6_MIN_MTU) + return -EINVAL; cork->base.fragsize = mtu; if (dst_allfrag(rt->dst.path)) cork->base.flags |= IPCORK_ALLFRAG;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 95ef498d977bf44ac094778fd448b98af158a3e6 ]
In my last patch, I missed the fact that cork.base.dst was not initialized in ip6_make_skb():
If ip6_setup_cork() returns an error, we might attempt a dst_release() on some random pointer.
Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_output.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1735,6 +1735,7 @@ struct sk_buff *ip6_make_skb(struct sock cork.base.flags = 0; cork.base.addr = 0; cork.base.opt = NULL; + cork.base.dst = NULL; v6_cork.opt = NULL; err = ip6_setup_cork(sk, &cork, &v6_cork, ipc6, rt, fl6); if (err) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuiko Oshino yuiko.oshino@microchip.com
[ Upstream commit a5b1379afbfabf91e3a689e82ac619a7157336b3 ]
Initialize the previously uninitialized tx_qlen to an appropriate value when USB Full Speed is used.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Signed-off-by: Yuiko Oshino yuiko.oshino@microchip.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/lan78xx.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -2396,6 +2396,7 @@ static int lan78xx_reset(struct lan78xx_ buf = DEFAULT_BURST_CAP_SIZE / FS_USB_PKT_SIZE; dev->rx_urb_size = DEFAULT_BURST_CAP_SIZE; dev->rx_qlen = 4; + dev->tx_qlen = 4; }
ret = lan78xx_write_reg(dev, BURST_CAP, buf);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
[ Upstream commit ad23b750933ea7bf962678972a286c78a8fa36aa ]
Commit "net: igmp: Use correct source address on IGMPv3 reports" introduced a check to validate the source address of locally generated IGMPv3 packets. Instead of checking the local interface address directly, it uses inet_ifa_match(fl4->saddr, ifa), which checks if the address is on the local subnet (or equal to the point-to-point address if used).
This breaks for point-to-point interfaces, so check against ifa->ifa_local directly.
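The difference is visible from the helper itself. inet_ifa_match() masks against the address range covered by ifa_mask (include/linux/inetdevice.h, paraphrased), and on a point-to-point interface ifa_address is the peer, not the local address:

	static inline bool inet_ifa_match(__be32 addr, struct in_ifaddr *ifa)
	{
		return !((addr ^ ifa->ifa_address) & ifa->ifa_mask);
	}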
Cc: Kevin Cernekee cernekee@chromium.org Fixes: a46182b00290 ("net: igmp: Use correct source address on IGMPv3 reports") Reported-by: Sebastian Gottschall s.gottschall@dd-wrt.com Signed-off-by: Felix Fietkau nbd@nbd.name Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/igmp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -332,7 +332,7 @@ static __be32 igmpv3_get_srcaddr(struct return htonl(INADDR_ANY);
for_ifa(in_dev) { - if (inet_ifa_match(fl4->saddr, ifa)) + if (fl4->saddr == ifa->ifa_local) return fl4->saddr; } endfor_ifa(in_dev);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 7c68d1a6b4db9012790af7ac0f0fdc0d2083422a ]
Without proper validation of DODGY packets, we might very well feed qdisc_pkt_len_init() with invalid GSO packets.
tcp_hdrlen() might access out-of-bound data, so let's use skb_header_pointer() and proper checks.
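A bounded copy of sizeof(struct tcphdr) bytes is sufficient here because the real header length is derived from the data-offset field, cf. __tcp_hdrlen() in include/linux/tcp.h:

	static inline unsigned int __tcp_hdrlen(const struct tcphdr *th)
	{
		return th->doff * 4;
	}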
The whole story is described in commit d0c081b49137 ("flow_dissector: properly cap thoff field").
We have the goal of validating DODGY packets earlier in the stack, so we might very well revert this fix in the future.
Signed-off-by: Eric Dumazet edumazet@google.com Cc: Willem de Bruijn willemb@google.com Cc: Jason Wang jasowang@redhat.com Reported-by: syzbot+9da69ebac7dddd804552@syzkaller.appspotmail.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/dev.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-)
--- a/net/core/dev.c +++ b/net/core/dev.c @@ -3128,10 +3128,21 @@ static void qdisc_pkt_len_init(struct sk hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
/* + transport layer */ - if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) - hdr_len += tcp_hdrlen(skb); - else - hdr_len += sizeof(struct udphdr); + if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) { + const struct tcphdr *th; + struct tcphdr _tcphdr; + + th = skb_header_pointer(skb, skb_transport_offset(skb), + sizeof(_tcphdr), &_tcphdr); + if (likely(th)) + hdr_len += __tcp_hdrlen(th); + } else { + struct udphdr _udphdr; + + if (skb_header_pointer(skb, skb_transport_offset(skb), + sizeof(_udphdr), &_udphdr)) + hdr_len += sizeof(struct udphdr); + }
if (shinfo->gso_type & SKB_GSO_DODGY) gso_segs = DIV_ROUND_UP(skb->len - hdr_len,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Streetman ddstreet@ieee.org
[ Upstream commit 4ee806d51176ba7b8ff1efd81f271d7252e03a1d ]
When a tcp socket is closed, if it detects that its net namespace is exiting, close immediately and do not wait for FIN sequence.
For normal sockets, a reference is taken to their net namespace, so it will never exit while the socket is open. However, kernel sockets do not take a reference to their net namespace, so it may begin exiting while the kernel socket is still open. In this case, if the kernel socket is a tcp socket, it will stay open trying to complete its close sequence. The sock's dst(s) hold references to their interface, and these references are all transferred to the namespace's loopback interface when the real interfaces are taken down. When the namespace then tries to take down its loopback interface, it hangs waiting for all references to the loopback interface to be released, which results in messages like:
unregister_netdevice: waiting for lo to become free. Usage count = 1
These messages continue until the socket finally times out and closes. Since the net namespace cleanup holds the net_mutex while calling its registered pernet callbacks, any new net namespace initialization is blocked until the current net namespace finishes exiting.
After this change, the tcp socket notices the exiting net namespace, and closes immediately, releasing its dst(s) and their reference to the loopback interface, which lets the net namespace continue exiting.
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811 Signed-off-by: Dan Streetman ddstreet@canonical.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/net_namespace.h | 10 ++++++++++ net/ipv4/tcp.c | 3 +++ net/ipv4/tcp_timer.c | 15 +++++++++++++++ 3 files changed, 28 insertions(+)
--- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -223,6 +223,11 @@ int net_eq(const struct net *net1, const return net1 == net2; }
+static inline int check_net(const struct net *net) +{ + return atomic_read(&net->count) != 0; +} + void net_drop_ns(void *);
#else @@ -246,6 +251,11 @@ int net_eq(const struct net *net1, const { return 1; } + +static inline int check_net(const struct net *net) +{ + return 1; +}
#define net_drop_ns NULL #endif --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2273,6 +2273,9 @@ adjudge_to_death: tcp_send_active_reset(sk, GFP_ATOMIC); __NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONMEMORY); + } else if (!check_net(sock_net(sk))) { + /* Not possible to send reset; just close */ + tcp_set_state(sk, TCP_CLOSE); } }
--- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -50,11 +50,19 @@ static void tcp_write_err(struct sock *s * to prevent DoS attacks. It is called when a retransmission timeout * or zero probe timeout occurs on orphaned socket. * + * Also close if our net namespace is exiting; in that case there is no + * hope of ever communicating again since all netns interfaces are already + * down (or about to be down), and we need to release our dst references, + * which have been moved to the netns loopback interface, so the namespace + * can finish exiting. This condition is only possible if we are a kernel + * socket, as those do not hold references to the namespace. + * * Criteria is still not confirmed experimentally and may change. * We kill the socket, if: * 1. If number of orphaned sockets exceeds an administratively configured * limit. * 2. If we have strong memory pressure. + * 3. If our net namespace is exiting. */ static int tcp_out_of_resources(struct sock *sk, bool do_reset) { @@ -83,6 +91,13 @@ static int tcp_out_of_resources(struct s __NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONMEMORY); return 1; } + + if (!check_net(sock_net(sk))) { + /* Not possible to send reset; just close */ + tcp_done(sk); + return 1; + } + return 0; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "r.hering@avm.de" r.hering@avm.de
[ Upstream commit 30be8f8dba1bd2aff73e8447d59228471233a3d4 ]
sendfile() calls can hang endlessly when using kernel TLS if a socket error occurs. Socket error codes must be inverted by kernel TLS before being returned, because they are stored with a positive sign. If returned non-inverted, they are interpreted as the number of bytes sent, causing the splice machinery behind sendfile() to loop endlessly.
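This is the same sign convention the generic socket layer encodes in sock_error() (include/net/sock.h, sketched): sk_err holds a positive errno that must be negated on return:

	static inline int sock_error(struct sock *sk)
	{
		int err;

		if (likely(!sk->sk_err))
			return 0;
		err = xchg(&sk->sk_err, 0);
		return -err;
	}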
Signed-off-by: Robert Hering r.hering@avm.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/tls.h | 2 +- net/tls/tls_sw.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-)
--- a/include/net/tls.h +++ b/include/net/tls.h @@ -168,7 +168,7 @@ static inline bool tls_is_pending_open_r
static inline void tls_err_abort(struct sock *sk) { - sk->sk_err = -EBADMSG; + sk->sk_err = EBADMSG; sk->sk_error_report(sk); }
--- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -407,7 +407,7 @@ int tls_sw_sendmsg(struct sock *sk, stru
while (msg_data_left(msg)) { if (sk->sk_err) { - ret = sk->sk_err; + ret = -sk->sk_err; goto send_end; }
@@ -560,7 +560,7 @@ int tls_sw_sendpage(struct sock *sk, str size_t copy, required_size;
if (sk->sk_err) { - ret = sk->sk_err; + ret = -sk->sk_err; goto sendpage_end; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Ahern dsahern@gmail.com
[ Upstream commit 1e19c4d689dc1e95bafd23ef68fbc0c6b9e05180 ]
Sukumar reported that sends to the local broadcast address (255.255.255.255) are broken. Check for the address in vrf driver and do not redirect to the VRF device - similar to multicast packets.
With this change sockets can use SO_BINDTODEVICE to specify an egress interface and receive responses. Note: the egress interface cannot be a VRF device but needs to be the enslaved device.
https://bugzilla.kernel.org/show_bug.cgi?id=198521
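A minimal sketch of the now-working usage, assuming an enslaved interface named eth1 (illustrative) and CAP_NET_RAW for SO_BINDTODEVICE:

	#include <arpa/inet.h>
	#include <sys/socket.h>

	int send_local_broadcast(void)
	{
		int fd = socket(AF_INET, SOCK_DGRAM, 0);
		int one = 1;
		struct sockaddr_in dst = {
			.sin_family = AF_INET,
			.sin_port   = htons(67),	/* illustrative */
		};

		dst.sin_addr.s_addr = htonl(INADDR_BROADCAST);
		/* bind to the enslaved device, not the VRF itself */
		setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, "eth1", 4);
		setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &one, sizeof(one));
		return sendto(fd, "x", 1, 0, (struct sockaddr *)&dst, sizeof(dst));
	}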
Reported-by: Sukumar Gopalakrishnan sukumarg1973@gmail.com Signed-off-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/vrf.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -674,8 +674,9 @@ static struct sk_buff *vrf_ip_out(struct struct sock *sk, struct sk_buff *skb) { - /* don't divert multicast */ - if (ipv4_is_multicast(ip_hdr(skb)->daddr)) + /* don't divert multicast or local broadcast */ + if (ipv4_is_multicast(ip_hdr(skb)->daddr) || + ipv4_is_lbcast(ip_hdr(skb)->daddr)) return skb;
if (qdisc_tx_is_default(vrf_dev))
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guillaume Nault g.nault@alphalink.fr
[ Upstream commit 02612bb05e51df8489db5e94d0cf8d1c81f87b0c ]
In pppoe_sendmsg(), reserving dev->hard_header_len bytes of headroom was probably fine before the introduction of ->needed_headroom in commit f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom").
But now, virtual devices typically advertise the size of their overhead in dev->needed_headroom, so we must also take it into account in skb_reserve(). The skb allocation size is also updated to take dev->needed_tailroom into account, and the arbitrary 32 bytes are replaced with the real size of a PPPoE header.
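For reference, LL_RESERVED_SPACE() folds both fields together and rounds up to HH_DATA_MOD alignment (include/linux/netdevice.h, paraphrased):

	#define LL_RESERVED_SPACE(dev) \
		((((dev)->hard_header_len + (dev)->needed_headroom) \
		  & ~(HH_DATA_MOD - 1)) + HH_DATA_MOD)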
This issue was discovered by syzbot, who connected a pppoe socket to a gre device which had dev->header_ops->create == ipgre_header and dev->hard_header_len == 0. Therefore, PPPoE didn't reserve any headroom, and dev_hard_header() crashed when ipgre_header() tried to prepend its header to skb->data.
skbuff: skb_under_panic: text:000000001d390b3a len:31 put:24 head:00000000d8ed776f data:000000008150e823 tail:0x7 end:0xc0 dev:gre0 ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:104! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 3670 Comm: syzkaller801466 Not tainted 4.15.0-rc7-next-20180115+ #97 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:skb_panic+0x162/0x1f0 net/core/skbuff.c:100 RSP: 0018:ffff8801d9bd7840 EFLAGS: 00010282 RAX: 0000000000000083 RBX: ffff8801d4f083c0 RCX: 0000000000000000 RDX: 0000000000000083 RSI: 1ffff1003b37ae92 RDI: ffffed003b37aefc RBP: ffff8801d9bd78a8 R08: 1ffff1003b37ae8a R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff86200de0 R13: ffffffff84a981ad R14: 0000000000000018 R15: ffff8801d2d34180 FS: 00000000019c4880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000208bc000 CR3: 00000001d9111001 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: skb_under_panic net/core/skbuff.c:114 [inline] skb_push+0xce/0xf0 net/core/skbuff.c:1714 ipgre_header+0x6d/0x4e0 net/ipv4/ip_gre.c:879 dev_hard_header include/linux/netdevice.h:2723 [inline] pppoe_sendmsg+0x58e/0x8b0 drivers/net/ppp/pppoe.c:890 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg+0xca/0x110 net/socket.c:640 sock_write_iter+0x31a/0x5d0 net/socket.c:909 call_write_iter include/linux/fs.h:1775 [inline] do_iter_readv_writev+0x525/0x7f0 fs/read_write.c:653 do_iter_write+0x154/0x540 fs/read_write.c:932 vfs_writev+0x18a/0x340 fs/read_write.c:977 do_writev+0xfc/0x2a0 fs/read_write.c:1012 SYSC_writev fs/read_write.c:1085 [inline] SyS_writev+0x27/0x30 fs/read_write.c:1082 entry_SYSCALL_64_fastpath+0x29/0xa0
Admittedly PPPoE shouldn't be allowed to run on non-Ethernet-like interfaces, but reserving space for ->needed_headroom is a more fundamental issue that needs to be addressed first.
Same problem exists for __pppoe_xmit(), which also needs to take dev->needed_headroom into account in skb_cow_head().
Fixes: f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom") Reported-by: syzbot+ed0838d0fa4c4f2b528e20286e6dc63effc7c14d@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault g.nault@alphalink.fr Reviewed-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ppp/pppoe.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
--- a/drivers/net/ppp/pppoe.c +++ b/drivers/net/ppp/pppoe.c @@ -842,6 +842,7 @@ static int pppoe_sendmsg(struct socket * struct pppoe_hdr *ph; struct net_device *dev; char *start; + int hlen;
lock_sock(sk); if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED)) { @@ -860,16 +861,16 @@ static int pppoe_sendmsg(struct socket * if (total_len > (dev->mtu + dev->hard_header_len)) goto end;
- - skb = sock_wmalloc(sk, total_len + dev->hard_header_len + 32, - 0, GFP_KERNEL); + hlen = LL_RESERVED_SPACE(dev); + skb = sock_wmalloc(sk, hlen + sizeof(*ph) + total_len + + dev->needed_tailroom, 0, GFP_KERNEL); if (!skb) { error = -ENOMEM; goto end; }
/* Reserve space for headers. */ - skb_reserve(skb, dev->hard_header_len); + skb_reserve(skb, hlen); skb_reset_network_header(skb);
skb->dev = dev; @@ -930,7 +931,7 @@ static int __pppoe_xmit(struct sock *sk, /* Copy the data if there is no space for the header or if it's * read-only. */ - if (skb_cow_head(skb, sizeof(*ph) + dev->hard_header_len)) + if (skb_cow_head(skb, LL_RESERVED_SPACE(dev) + sizeof(*ph))) goto abort;
__skb_push(skb, sizeof(*ph));
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Francois Romieu romieu@fr.zoreil.com
[ Upstream commit a78e93661c5fd30b9e1dee464b2f62f966883ef7 ]
Hardware statistics retrieval hurts when invoked in tight loops.
Avoid an extraneous write and enforce strict ordering of the writes targeted at the tally counters dump area address registers.
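The read-back added below works because MMIO writes over PCI may be posted (buffered), while reads are not; reading the register back forces all prior writes to that device to complete. In the abstract:

	writel(val, ioaddr + REG);	/* posted: may linger in a buffer */
	(void)readl(ioaddr + REG);	/* non-posted: flushes the write out */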
Signed-off-by: Francois Romieu romieu@fr.zoreil.com Tested-by: Oliver Freyermuth o.freyermuth@googlemail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/realtek/r8169.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
--- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -2239,19 +2239,14 @@ static bool rtl8169_do_counters(struct n void __iomem *ioaddr = tp->mmio_addr; dma_addr_t paddr = tp->counters_phys_addr; u32 cmd; - bool ret;
RTL_W32(CounterAddrHigh, (u64)paddr >> 32); + RTL_R32(CounterAddrHigh); cmd = (u64)paddr & DMA_BIT_MASK(32); RTL_W32(CounterAddrLow, cmd); RTL_W32(CounterAddrLow, cmd | counter_cmd);
- ret = rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000); - - RTL_W32(CounterAddrLow, 0); - RTL_W32(CounterAddrHigh, 0); - - return ret; + return rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000); }
static bool rtl8169_reset_counters(struct net_device *dev)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit c5006b8aa74599ce19104b31d322d2ea9ff887cc ]
The check in sctp_sockaddr_af is not robust enough to forbid binding a v4mapped v6 addr on a v4 socket.
Worse, a v4 socket's bind_verify would not convert this v4-mapped v6 address into a v4 address. syzbot even reported a crash when a v4 socket bound a v6 address.
This patch fixes it by doing the common sa.sa_family check first, and only then the AF_INET check for v4-mapped v6 addresses.
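A v4-mapped v6 address is an AF_INET6 sockaddr carrying ::ffff:a.b.c.d. The case now rejected looks roughly like this from userspace (sketch; the address is illustrative):

	#include <arpa/inet.h>
	#include <netinet/in.h>
	#include <sys/socket.h>

	int main(void)
	{
		int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
		struct sockaddr_in6 a = { .sin6_family = AF_INET6 };

		inet_pton(AF_INET6, "::ffff:192.0.2.1", &a.sin6_addr);
		/* now fails in sctp_sockaddr_af(): AF_INET6 is not supported
		 * by a v4 socket's PF, and no v4 conversion happens */
		return bind(fd, (struct sockaddr *)&a, sizeof(a));
	}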
Fixes: 7dab83de50c7 ("sctp: Support ipv6only AF_INET6 sockets.") Reported-by: syzbot+7b7b518b1228d2743963@syzkaller.appspotmail.com Acked-by: Neil Horman nhorman@tuxdriver.com Signed-off-by: Xin Long lucien.xin@gmail.com Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/socket.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-)
--- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -334,16 +334,14 @@ static struct sctp_af *sctp_sockaddr_af( if (len < sizeof (struct sockaddr)) return NULL;
+ if (!opt->pf->af_supported(addr->sa.sa_family, opt)) + return NULL; + /* V4 mapped address are really of AF_INET family */ if (addr->sa.sa_family == AF_INET6 && - ipv6_addr_v4mapped(&addr->v6.sin6_addr)) { - if (!opt->pf->af_supported(AF_INET, opt)) - return NULL; - } else { - /* Does this PF support this AF? */ - if (!opt->pf->af_supported(addr->sa.sa_family, opt)) - return NULL; - } + ipv6_addr_v4mapped(&addr->v6.sin6_addr) && + !opt->pf->af_supported(AF_INET, opt)) + return NULL;
/* If we get this far, af is valid. */ af = sctp_get_af_specific(addr->sa.sa_family);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit a0ff660058b88d12625a783ce9e5c1371c87951f ]
After commit cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep"), it may change to lock another sk if the asoc has been peeled off in sctp_wait_for_sndbuf.
However, the asoc's new sk could already have been closed elsewhere: we are in the sendmsg() context of the old sk and cannot prevent the new sk from closing. If the sk's last reference is held by this asoc, then after this asoc is put, the new sk will be freed while we still hold its lock.
This patch is to revert that commit, but fix the old issue by returning error under the old sk's lock.
Fixes: cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep") Reported-by: syzbot+ac6ea7baa4432811eb50@syzkaller.appspotmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Acked-by: Neil Horman nhorman@tuxdriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/socket.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-)
--- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -84,7 +84,7 @@ static int sctp_writeable(struct sock *sk); static void sctp_wfree(struct sk_buff *skb); static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p, - size_t msg_len, struct sock **orig_sk); + size_t msg_len); static int sctp_wait_for_packet(struct sock *sk, int *err, long *timeo_p); static int sctp_wait_for_connect(struct sctp_association *, long *timeo_p); static int sctp_wait_for_accept(struct sock *sk, long timeo); @@ -1961,7 +1961,7 @@ static int sctp_sendmsg(struct sock *sk, timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT); if (!sctp_wspace(asoc)) { /* sk can be changed by peel off when waiting for buf. */ - err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len, &sk); + err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len); if (err) { if (err == -ESRCH) { /* asoc is already dead. */ @@ -7825,12 +7825,12 @@ void sctp_sock_rfree(struct sk_buff *skb
/* Helper function to wait for space in the sndbuf. */ static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p, - size_t msg_len, struct sock **orig_sk) + size_t msg_len) { struct sock *sk = asoc->base.sk; - int err = 0; long current_timeo = *timeo_p; DEFINE_WAIT(wait); + int err = 0;
pr_debug("%s: asoc:%p, timeo:%ld, msg_len:%zu\n", __func__, asoc, *timeo_p, msg_len); @@ -7859,17 +7859,13 @@ static int sctp_wait_for_sndbuf(struct s release_sock(sk); current_timeo = schedule_timeout(current_timeo); lock_sock(sk); - if (sk != asoc->base.sk) { - release_sock(sk); - sk = asoc->base.sk; - lock_sock(sk); - } + if (sk != asoc->base.sk) + goto do_error;
*timeo_p = current_timeo; }
out: - *orig_sk = sk; finish_wait(&asoc->wait, &wait);
/* Release the association's refcnt. */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 59b36613e85fb16ebf9feaf914570879cd5c2a21 ]
When tipc_node_find_by_name() fails, the nlmsg is not freed.
While at it, switch to a goto label to free it properly.
Fixes: be9c086715c ("tipc: narrow down exposure of struct tipc_node") Reported-by: Dmitry Vyukov dvyukov@google.com Cc: Jon Maloy jon.maloy@ericsson.com Cc: Ying Xue ying.xue@windriver.com Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Acked-by: Ying Xue ying.xue@windriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/tipc/node.c | 26 ++++++++++++++------------ 1 file changed, 14 insertions(+), 12 deletions(-)
--- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -1848,36 +1848,38 @@ int tipc_nl_node_get_link(struct sk_buff
if (strcmp(name, tipc_bclink_name) == 0) { err = tipc_nl_add_bc_link(net, &msg); - if (err) { - nlmsg_free(msg.skb); - return err; - } + if (err) + goto err_free; } else { int bearer_id; struct tipc_node *node; struct tipc_link *link;
node = tipc_node_find_by_name(net, name, &bearer_id); - if (!node) - return -EINVAL; + if (!node) { + err = -EINVAL; + goto err_free; + }
tipc_node_read_lock(node); link = node->links[bearer_id].link; if (!link) { tipc_node_read_unlock(node); - nlmsg_free(msg.skb); - return -EINVAL; + err = -EINVAL; + goto err_free; }
err = __tipc_nl_add_link(net, &msg, link, 0); tipc_node_read_unlock(node); - if (err) { - nlmsg_free(msg.skb); - return err; - } + if (err) + goto err_free; }
return genlmsg_reply(msg.skb, info); + +err_free: + nlmsg_free(msg.skb); + return err; }
int tipc_nl_node_reset_link_stats(struct sk_buff *skb, struct genl_info *info)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eran Ben Elisha eranbe@mellanox.com
[ Upstream commit 8978cc921fc7fad3f4d6f91f1da01352aeeeff25 ]
There are system platform information management interfaces (such as HOST2BMC) for which we cannot disable local loopback multicast traffic.
Separate the disable_local_lb_mc and disable_local_lb_uc capability bits so the driver will not disable multicast loopback traffic when that is not supported. (It is expected that firmware will not set disable_local_lb_mc if, for example, HOST2BMC is running.)
mlx5_nic_vport_update_local_lb() makes a best effort to disable/enable UC/MC loopback traffic and returns success only if it managed to change everything the firmware allows.
Adapt mlx5_ib and mlx5e to support the new cap bits.
Fixes: 2c43c5a036be ("net/mlx5e: Enable local loopback in loopback selftest") Fixes: c85023e153e3 ("IB/mlx5: Add raw ethernet local loopback support") Fixes: bded747bb432 ("net/mlx5: Add raw ethernet local loopback firmware command") Signed-off-by: Eran Ben Elisha eranbe@mellanox.com Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/infiniband/hw/mlx5/main.c | 9 ++++-- drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c | 27 ++++++++++++------ drivers/net/ethernet/mellanox/mlx5/core/main.c | 3 -- drivers/net/ethernet/mellanox/mlx5/core/vport.c | 22 ++++++++++---- include/linux/mlx5/mlx5_ifc.h | 5 ++- 5 files changed, 44 insertions(+), 22 deletions(-)
--- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -1276,7 +1276,8 @@ static int mlx5_ib_alloc_transport_domai return err;
if ((MLX5_CAP_GEN(dev->mdev, port_type) != MLX5_CAP_PORT_TYPE_ETH) || - !MLX5_CAP_GEN(dev->mdev, disable_local_lb)) + (!MLX5_CAP_GEN(dev->mdev, disable_local_lb_uc) && + !MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc))) return err;
mutex_lock(&dev->lb_mutex); @@ -1294,7 +1295,8 @@ static void mlx5_ib_dealloc_transport_do mlx5_core_dealloc_transport_domain(dev->mdev, tdn);
if ((MLX5_CAP_GEN(dev->mdev, port_type) != MLX5_CAP_PORT_TYPE_ETH) || - !MLX5_CAP_GEN(dev->mdev, disable_local_lb)) + (!MLX5_CAP_GEN(dev->mdev, disable_local_lb_uc) && + !MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc))) return;
mutex_lock(&dev->lb_mutex); @@ -4161,7 +4163,8 @@ static void *mlx5_ib_add(struct mlx5_cor }
if ((MLX5_CAP_GEN(mdev, port_type) == MLX5_CAP_PORT_TYPE_ETH) && - MLX5_CAP_GEN(mdev, disable_local_lb)) + (MLX5_CAP_GEN(mdev, disable_local_lb_uc) || + MLX5_CAP_GEN(mdev, disable_local_lb_mc))) mutex_init(&dev->lb_mutex);
dev->ib_active = true; --- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c @@ -238,15 +238,19 @@ static int mlx5e_test_loopback_setup(str int err = 0;
/* Temporarily enable local_lb */ - if (MLX5_CAP_GEN(priv->mdev, disable_local_lb)) { - mlx5_nic_vport_query_local_lb(priv->mdev, &lbtp->local_lb); - if (!lbtp->local_lb) - mlx5_nic_vport_update_local_lb(priv->mdev, true); + err = mlx5_nic_vport_query_local_lb(priv->mdev, &lbtp->local_lb); + if (err) + return err; + + if (!lbtp->local_lb) { + err = mlx5_nic_vport_update_local_lb(priv->mdev, true); + if (err) + return err; }
err = mlx5e_refresh_tirs(priv, true); if (err) - return err; + goto out;
lbtp->loopback_ok = false; init_completion(&lbtp->comp); @@ -256,16 +260,21 @@ static int mlx5e_test_loopback_setup(str lbtp->pt.dev = priv->netdev; lbtp->pt.af_packet_priv = lbtp; dev_add_pack(&lbtp->pt); + + return 0; + +out: + if (!lbtp->local_lb) + mlx5_nic_vport_update_local_lb(priv->mdev, false); + return err; }
static void mlx5e_test_loopback_cleanup(struct mlx5e_priv *priv, struct mlx5e_lbt_priv *lbtp) { - if (MLX5_CAP_GEN(priv->mdev, disable_local_lb)) { - if (!lbtp->local_lb) - mlx5_nic_vport_update_local_lb(priv->mdev, false); - } + if (!lbtp->local_lb) + mlx5_nic_vport_update_local_lb(priv->mdev, false);
dev_remove_pack(&lbtp->pt); mlx5e_refresh_tirs(priv, false); --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -577,8 +577,7 @@ static int mlx5_core_set_hca_defaults(st int ret = 0;
/* Disable local_lb by default */ - if ((MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_ETH) && - MLX5_CAP_GEN(dev, disable_local_lb)) + if (MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_ETH) ret = mlx5_nic_vport_update_local_lb(dev, false);
return ret; --- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c @@ -908,23 +908,33 @@ int mlx5_nic_vport_update_local_lb(struc void *in; int err;
- mlx5_core_dbg(mdev, "%s local_lb\n", enable ? "enable" : "disable"); + if (!MLX5_CAP_GEN(mdev, disable_local_lb_mc) && + !MLX5_CAP_GEN(mdev, disable_local_lb_uc)) + return 0; + in = kvzalloc(inlen, GFP_KERNEL); if (!in) return -ENOMEM;
MLX5_SET(modify_nic_vport_context_in, in, - field_select.disable_mc_local_lb, 1); - MLX5_SET(modify_nic_vport_context_in, in, nic_vport_context.disable_mc_local_lb, !enable); - - MLX5_SET(modify_nic_vport_context_in, in, - field_select.disable_uc_local_lb, 1); MLX5_SET(modify_nic_vport_context_in, in, nic_vport_context.disable_uc_local_lb, !enable);
+ if (MLX5_CAP_GEN(mdev, disable_local_lb_mc)) + MLX5_SET(modify_nic_vport_context_in, in, + field_select.disable_mc_local_lb, 1); + + if (MLX5_CAP_GEN(mdev, disable_local_lb_uc)) + MLX5_SET(modify_nic_vport_context_in, in, + field_select.disable_uc_local_lb, 1); + err = mlx5_modify_nic_vport_context(mdev, in, inlen);
+ if (!err) + mlx5_core_dbg(mdev, "%s local_lb\n", + enable ? "enable" : "disable"); + kvfree(in); return err; } --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1023,8 +1023,9 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 log_max_wq_sz[0x5];
u8 nic_vport_change_event[0x1]; - u8 disable_local_lb[0x1]; - u8 reserved_at_3e2[0x9]; + u8 disable_local_lb_uc[0x1]; + u8 disable_local_lb_mc[0x1]; + u8 reserved_at_3e3[0x8]; u8 log_max_vlan_list[0x5]; u8 reserved_at_3f0[0x3]; u8 log_max_current_mc_list[0x5];
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Saeed Mahameed saeedm@mellanox.com
[ Upstream commit 05e0cc84e00c54fb152d1f4b86bc211823a83d0c ]
mlx5_get_vector_affinity() used to call pci_irq_get_affinity(). After reverting the patch that set the device affinity via the PCI_IRQ_AFFINITY API, calling pci_irq_get_affinity() became useless and broke RDMA mlx5 users. To fix this, provide an alternative way to retrieve the IRQ vector affinity using the legacy IRQ API, following the smp_affinity procfs read implementation.
Fixes: 231243c82793 ("Revert mlx5: move affinity hints assignments to generic code") Fixes: a435393acafb ("mlx5: move affinity hints assignments to generic code") Cc: Sagi Grimberg sagi@grimberg.me Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/mlx5/driver.h | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-)
--- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -36,6 +36,7 @@ #include <linux/kernel.h> #include <linux/completion.h> #include <linux/pci.h> +#include <linux/irq.h> #include <linux/spinlock_types.h> #include <linux/semaphore.h> #include <linux/slab.h> @@ -1194,7 +1195,23 @@ enum { static inline const struct cpumask * mlx5_get_vector_affinity(struct mlx5_core_dev *dev, int vector) { - return pci_irq_get_affinity(dev->pdev, MLX5_EQ_VEC_COMP_BASE + vector); + const struct cpumask *mask; + struct irq_desc *desc; + unsigned int irq; + int eqn; + int err; + + err = mlx5_vector2eqn(dev, vector, &eqn, &irq); + if (err) + return NULL; + + desc = irq_to_desc(irq); +#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK + mask = irq_data_get_effective_affinity_mask(&desc->irq_data); +#else + mask = desc->irq_common_data.affinity; +#endif + return mask; }
#endif /* MLX5_DRIVER_H */
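As a usage note: with this change the helper can return NULL (e.g. when mlx5_vector2eqn() fails), which callers now have to tolerate. A minimal sketch of a consumer; pick_cpu_for_vector() is a hypothetical name, only mlx5_get_vector_affinity() comes from the patch above:
/* Hypothetical caller: choose a CPU close to a completion vector.
 * The NULL/empty-mask fallback is the part new callers must handle.
 */
static int pick_cpu_for_vector(struct mlx5_core_dev *dev, int vector)
{
	const struct cpumask *mask = mlx5_get_vector_affinity(dev, vector);

	if (!mask || cpumask_empty(mask))
		return raw_smp_processor_id(); /* no hint, stay local */

	return cpumask_first(mask);
}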
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guillaume Nault g.nault@alphalink.fr
[ Upstream commit 0171c41835591e9aa2e384b703ef9a6ae367c610 ]
ppp_dev_uninit(), which is the .ndo_uninit() handler of PPP devices, needs to lock pn->all_ppp_mutex. Therefore we mustn't call register_netdevice() with pn->all_ppp_mutex already locked, or we'd deadlock in case register_netdevice() fails and calls .ndo_uninit().
Fortunately, we can unlock pn->all_ppp_mutex before calling register_netdevice(). This lock protects pn->units_idr, which isn't used in the device registration process.
However, keeping pn->all_ppp_mutex locked during device registration did ensure that no device in transient state would be published in pn->units_idr. In practice, unlocking it before calling register_netdevice() doesn't change this property: ppp_unit_register() is called with 'ppp_mutex' locked and all searches done in pn->units_idr hold this lock too.
Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion") Reported-and-tested-by: syzbot+367889b9c9e279219175@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault g.nault@alphalink.fr Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ppp/ppp_generic.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/net/ppp/ppp_generic.c +++ b/drivers/net/ppp/ppp_generic.c @@ -1003,17 +1003,18 @@ static int ppp_unit_register(struct ppp if (!ifname_is_set) snprintf(ppp->dev->name, IFNAMSIZ, "ppp%i", ppp->file.index);
+ mutex_unlock(&pn->all_ppp_mutex); + ret = register_netdevice(ppp->dev); if (ret < 0) goto err_unit;
atomic_inc(&ppp_unit_count);
- mutex_unlock(&pn->all_ppp_mutex); - return 0;
err_unit: + mutex_lock(&pn->all_ppp_mutex); unit_put(&pn->units_idr, ppp->file.index); err: mutex_unlock(&pn->all_ppp_mutex);
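Linearized, the locking pattern after this patch looks as follows; a sketch using names from the hunk above, with unrelated error labels elided:
/* pn->all_ppp_mutex must not be held across register_netdevice(),
 * because a registration failure calls .ndo_uninit() (ppp_dev_uninit()),
 * which takes the same mutex and would deadlock.
 */
mutex_lock(&pn->all_ppp_mutex);
/* ... reserve the unit number in pn->units_idr ... */
mutex_unlock(&pn->all_ppp_mutex);

ret = register_netdevice(ppp->dev);	/* may call ppp_dev_uninit() */
if (ret < 0) {
	mutex_lock(&pn->all_ppp_mutex);	/* re-taken only for cleanup */
	unit_put(&pn->units_idr, ppp->file.index);
	mutex_unlock(&pn->all_ppp_mutex);
}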
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Vecera cera@cera.cz
[ Upstream commit 52acf06451930eb4cefabd5ecea56e2d46c32f76 ]
Commit 622190669403 ("be2net: Request RSS capability of Rx interface depending on number of Rx rings") modified be_update_queues() so that the IFACE (the HW representation of the netdevice) is destroyed and then re-created. This causes a regression: a previously enabled promiscuous mode is not restored properly during be_open(), because the driver thinks the HW already has promiscuous mode enabled.
Note that Lancer is not affected by this bug because RX-filter flags are disabled during be_close() for this chipset.
Cc: Sathya Perla sathya.perla@broadcom.com Cc: Ajit Khaparde ajit.khaparde@broadcom.com Cc: Sriharsha Basavapatna sriharsha.basavapatna@broadcom.com Cc: Somnath Kotur somnath.kotur@broadcom.com
Fixes: 622190669403 ("be2net: Request RSS capability of Rx interface depending on number of Rx rings") Signed-off-by: Ivan Vecera ivecera@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/emulex/benet/be_main.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/drivers/net/ethernet/emulex/benet/be_main.c +++ b/drivers/net/ethernet/emulex/benet/be_main.c @@ -4634,6 +4634,15 @@ int be_update_queues(struct be_adapter *
be_schedule_worker(adapter);
+ /* + * The IF was destroyed and re-created. We need to clear + * all promiscuous flags valid for the destroyed IF. + * Without this promisc mode is not restored during + * be_open() because the driver thinks that it is + * already enabled in HW. + */ + adapter->if_flags &= ~BE_IF_FLAGS_ALL_PROMISCUOUS; + if (netif_running(netdev)) status = be_open(netdev);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexey Kodanev alexey.kodanev@oracle.com
[ Upstream commit 128bb975dc3c25d00de04e503e2fe0a780d04459 ]
Commit b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common GRE functions") moved dev->mtu initialization from ip6gre_tunnel_setup() to ip6gre_tunnel_init(). As a result, values set before ndo_init() are reset in the following cases:
* rtnl_create_link() can update dev->mtu from the IFLA_MTU parameter.
* ip6gre_tnl_link_config() is invoked before ndo_init() in the netlink and ioctl setup paths, so ndo_init() can also reset the MTU adjustments made for the lower device, i.e. dev->mtu and dev->hard_header_len.
This is not applicable to ip6gretap because it has one more call to ip6gre_tnl_link_config(tunnel, 1) in ip6gre_tap_init().
Fix the first case by updating dev->mtu with 'tb[IFLA_MTU]' parameter if a user sets it manually on a device creation, and fix the second one by moving ip6gre_tnl_link_config() call after register_netdevice().
Fixes: b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common GRE functions") Fixes: db2ec95d1ba4 ("ip6_gre: Fix MTU setting") Signed-off-by: Alexey Kodanev alexey.kodanev@oracle.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_gre.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
--- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -337,11 +337,12 @@ static struct ip6_tnl *ip6gre_tunnel_loc
nt->dev = dev; nt->net = dev_net(dev); - ip6gre_tnl_link_config(nt, 1);
if (register_netdevice(dev) < 0) goto failed_free;
+ ip6gre_tnl_link_config(nt, 1); + /* Can use a lockless transmit, unless we generate output sequences */ if (!(nt->parms.o_flags & TUNNEL_SEQ)) dev->features |= NETIF_F_LLTX; @@ -1307,7 +1308,6 @@ static void ip6gre_netlink_parms(struct
static int ip6gre_tap_init(struct net_device *dev) { - struct ip6_tnl *tunnel; int ret;
ret = ip6gre_tunnel_init_common(dev); @@ -1316,10 +1316,6 @@ static int ip6gre_tap_init(struct net_de
dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
- tunnel = netdev_priv(dev); - - ip6gre_tnl_link_config(tunnel, 1); - return 0; }
@@ -1411,12 +1407,16 @@ static int ip6gre_newlink(struct net *sr
nt->dev = dev; nt->net = dev_net(dev); - ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]);
err = register_netdevice(dev); if (err) goto out;
+ ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]); + + if (tb[IFLA_MTU]) + ip6_tnl_change_mtu(dev, nla_get_u32(tb[IFLA_MTU])); + dev_hold(dev); ip6gre_tunnel_link(ign, nt);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Willem de Bruijn willemb@google.com
[ Upstream commit 121d57af308d0cf943f08f4738d24d3966c38cd9 ]
Validate gso_type during segmentation as SKB_GSO_DODGY sources may pass packets where the gso_type does not match the contents.
Syzkaller was able to enter the SCTP gso handler with a packet of gso_type SKB_GSO_TCPV4.
On entry of transport layer gso handlers, verify that the gso_type matches the transport protocol.
Fixes: 90017accff61 ("sctp: Add GSO support") Link: http://lkml.kernel.org/r/001a1137452496ffc305617e5fe0@google.com Reported-by: syzbot+fee64147a25aecd48055@syzkaller.appspotmail.com Signed-off-by: Willem de Bruijn willemb@google.com Acked-by: Jason Wang jasowang@redhat.com Reviewed-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/esp4_offload.c | 3 +++ net/ipv4/tcp_offload.c | 3 +++ net/ipv4/udp_offload.c | 3 +++ net/ipv6/esp6_offload.c | 3 +++ net/ipv6/tcpv6_offload.c | 3 +++ net/ipv6/udp_offload.c | 3 +++ net/sctp/offload.c | 3 +++ 7 files changed, 21 insertions(+)
--- a/net/ipv4/esp4_offload.c +++ b/net/ipv4/esp4_offload.c @@ -121,6 +121,9 @@ static struct sk_buff *esp4_gso_segment( if (!xo) goto out;
+ if (!(skb_shinfo(skb)->gso_type & SKB_GSO_ESP)) + goto out; + seq = xo->seq.low;
x = skb->sp->xvec[skb->sp->len - 1]; --- a/net/ipv4/tcp_offload.c +++ b/net/ipv4/tcp_offload.c @@ -32,6 +32,9 @@ static void tcp_gso_tstamp(struct sk_buf static struct sk_buff *tcp4_gso_segment(struct sk_buff *skb, netdev_features_t features) { + if (!(skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)) + return ERR_PTR(-EINVAL); + if (!pskb_may_pull(skb, sizeof(struct tcphdr))) return ERR_PTR(-EINVAL);
--- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -203,6 +203,9 @@ static struct sk_buff *udp4_ufo_fragment goto out; }
+ if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP)) + goto out; + if (!pskb_may_pull(skb, sizeof(struct udphdr))) goto out;
--- a/net/ipv6/esp6_offload.c +++ b/net/ipv6/esp6_offload.c @@ -148,6 +148,9 @@ static struct sk_buff *esp6_gso_segment( if (!xo) goto out;
+ if (!(skb_shinfo(skb)->gso_type & SKB_GSO_ESP)) + goto out; + seq = xo->seq.low;
x = skb->sp->xvec[skb->sp->len - 1]; --- a/net/ipv6/tcpv6_offload.c +++ b/net/ipv6/tcpv6_offload.c @@ -46,6 +46,9 @@ static struct sk_buff *tcp6_gso_segment( { struct tcphdr *th;
+ if (!(skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)) + return ERR_PTR(-EINVAL); + if (!pskb_may_pull(skb, sizeof(*th))) return ERR_PTR(-EINVAL);
--- a/net/ipv6/udp_offload.c +++ b/net/ipv6/udp_offload.c @@ -42,6 +42,9 @@ static struct sk_buff *udp6_ufo_fragment const struct ipv6hdr *ipv6h; struct udphdr *uh;
+ if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP)) + goto out; + if (!pskb_may_pull(skb, sizeof(struct udphdr))) goto out;
--- a/net/sctp/offload.c +++ b/net/sctp/offload.c @@ -45,6 +45,9 @@ static struct sk_buff *sctp_gso_segment( struct sk_buff *segs = ERR_PTR(-EINVAL); struct sctphdr *sh;
+ if (!(skb_shinfo(skb)->gso_type & SKB_GSO_SCTP)) + goto out; + sh = sctp_hdr(skb); if (!pskb_may_pull(skb, sizeof(*sh))) goto out;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuval Mintz yuvalm@mellanox.com
[ Upstream commit 1ecdaea02ca6bfacf2ecda500dc1af51e9780c42 ]
The driver periodically samples all neighbors configured in the device in order to update the kernel regarding their state. When it finds an entry configured in HW that doesn't show up in neigh_lookup(), the driver logs an error message. This creates a race when removing multiple neighbors: a given entry may still be configured in HW, because its removal is still being processed, while it has already been removed from the kernel's neighbor tables.
Simply remove the error message and gracefully accept such events.
Fixes: c723c735fa6b ("mlxsw: spectrum_router: Periodically update the kernel's neigh table") Fixes: 60f040ca11b9 ("mlxsw: spectrum_router: Periodically dump active IPv6 neighbours") Signed-off-by: Yuval Mintz yuvalm@mellanox.com Reviewed-by: Ido Schimmel idosch@mellanox.com Signed-off-by: Jiri Pirko jiri@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c @@ -1531,11 +1531,8 @@ static void mlxsw_sp_router_neigh_ent_ip dipn = htonl(dip); dev = mlxsw_sp->router->rifs[rif]->dev; n = neigh_lookup(&arp_tbl, &dipn, dev); - if (!n) { - netdev_err(dev, "Failed to find matching neighbour for IP=%pI4h\n", - &dip); + if (!n) return; - }
netdev_dbg(dev, "Updating neighbour with IP=%pI4h\n", &dip); neigh_event_send(n, NULL); @@ -1562,11 +1559,8 @@ static void mlxsw_sp_router_neigh_ent_ip
dev = mlxsw_sp->router->rifs[rif]->dev; n = neigh_lookup(&nd_tbl, &dip, dev); - if (!n) { - netdev_err(dev, "Failed to find matching neighbour for IP=%pI6c\n", - &dip); + if (!n) return; - }
netdev_dbg(dev, "Updating neighbour with IP=%pI6c\n", &dip); neigh_event_send(n, NULL);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 4df0bfc79904b7169dc77dcce44598b1545721f9 ]
tfile->tun could be detached before we close the tun fd, via tun_detach_all(), so it should not be used to decide whether tfile->tx_array needs cleanup.
As Jason suggested, we probably have to clean it up unconditionally in both __tun_detach() and tun_detach_all(), but this requires checking whether it is initialized. Currently skb_array_cleanup() doesn't have such a check, so I check it in the caller and introduce a helper function; it is a bit ugly, but we can always improve it in net-next.
Reported-by: Dmitry Vyukov dvyukov@google.com Fixes: 1576d9860599 ("tun: switch to use skb array for tx") Cc: Jason Wang jasowang@redhat.com Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/tun.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)
--- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -534,6 +534,14 @@ static void tun_queue_purge(struct tun_f skb_queue_purge(&tfile->sk.sk_error_queue); }
+static void tun_cleanup_tx_array(struct tun_file *tfile) +{ + if (tfile->tx_array.ring.queue) { + skb_array_cleanup(&tfile->tx_array); + memset(&tfile->tx_array, 0, sizeof(tfile->tx_array)); + } +} + static void __tun_detach(struct tun_file *tfile, bool clean) { struct tun_file *ntfile; @@ -575,8 +583,7 @@ static void __tun_detach(struct tun_file tun->dev->reg_state == NETREG_REGISTERED) unregister_netdevice(tun->dev); } - if (tun) - skb_array_cleanup(&tfile->tx_array); + tun_cleanup_tx_array(tfile); sock_put(&tfile->sk); } } @@ -616,11 +623,13 @@ static void tun_detach_all(struct net_de /* Drop read queue */ tun_queue_purge(tfile); sock_put(&tfile->sk); + tun_cleanup_tx_array(tfile); } list_for_each_entry_safe(tfile, tmp, &tun->disabled, next) { tun_enable_queue(tfile); tun_queue_purge(tfile); sock_put(&tfile->sk); + tun_cleanup_tx_array(tfile); } BUG_ON(tun->numdisabled != 0);
@@ -2624,6 +2633,8 @@ static int tun_chr_open(struct inode *in
sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
+ memset(&tfile->tx_array, 0, sizeof(tfile->tx_array)); + return 0; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit d0c081b49137cd3200f2023c0875723be66e7ce5 ]
syzbot reported yet another crash [1] that is caused by insufficient validation of DODGY packets.
Three bugs combine here to trigger the crash:
1) Flow dissection leaves with incorrect thoff field.
2) skb_probe_transport_header() sets transport header to this invalid thoff, even if pointing after skb valid data.
3) qdisc_pkt_len_init() reads out-of-bound data because it trusts tcp_hdrlen(skb)
Possible fixes :
- Full flow dissector validation before injecting bad DODGY packets in the stack. This approach was attempted here: https://patchwork.ozlabs.org/patch/861874/
- Have more robust functions in the core. This might be needed anyway for stable versions.
This patch fixes the flow dissection issue.
[1]
CPU: 1 PID: 3144 Comm: syzkaller271204 Not tainted 4.15.0-rc4-mm1+ #49
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:53
 print_address_description+0x73/0x250 mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:355 [inline]
 kasan_report+0x23b/0x360 mm/kasan/report.c:413
 __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:432
 __tcp_hdrlen include/linux/tcp.h:35 [inline]
 tcp_hdrlen include/linux/tcp.h:40 [inline]
 qdisc_pkt_len_init net/core/dev.c:3160 [inline]
 __dev_queue_xmit+0x20d3/0x2200 net/core/dev.c:3465
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3554
 packet_snd net/packet/af_packet.c:2943 [inline]
 packet_sendmsg+0x3ad5/0x60a0 net/packet/af_packet.c:2968
 sock_sendmsg_nosec net/socket.c:628 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:638
 sock_write_iter+0x31a/0x5d0 net/socket.c:907
 call_write_iter include/linux/fs.h:1776 [inline]
 new_sync_write fs/read_write.c:469 [inline]
 __vfs_write+0x684/0x970 fs/read_write.c:482
 vfs_write+0x189/0x510 fs/read_write.c:544
 SYSC_write fs/read_write.c:589 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:581
 entry_SYSCALL_64_fastpath+0x1f/0x96
Fixes: 34fad54c2537 ("net: __skb_flow_dissect() must cap its return value") Fixes: a6e544b0a88b ("flow_dissector: Jump to exit code in __skb_flow_dissect") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Willem de Bruijn willemb@google.com Reported-by: syzbot syzkaller@googlegroups.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/flow_dissector.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -876,8 +876,8 @@ ip_proto_again: out_good: ret = true;
- key_control->thoff = (u16)nhoff; out: + key_control->thoff = min_t(u16, nhoff, skb ? skb->len : hlen); key_basic->n_proto = proto; key_basic->ip_proto = ip_proto;
@@ -885,7 +885,6 @@ out:
out_bad: ret = false; - key_control->thoff = min_t(u16, nhoff, skb ? skb->len : hlen); goto out; } EXPORT_SYMBOL(__skb_flow_dissect);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 625637bf4afa45204bd87e4218645182a919485a ]
After the introduction of the sctp_stream structure, sctp uses stream->outcnt as the number of outbound streams instead of c.sinit_num_ostreams.
However, when users pass sinit in a cmsg, only c.sinit_num_ostreams is updated in sctp_sendmsg. At that moment, stream->outcnt still holds the previous value; if it is not updated, the sinit_num_ostreams from sinit cannot really take effect.
This patch fixes it by updating stream->outcnt and re-initializing the stream if the stream outcnt has been changed by sinit in sendmsg.
Fixes: a83863174a61 ("sctp: prepare asoc stream for stream reconf") Signed-off-by: Xin Long lucien.xin@gmail.com Acked-by: Neil Horman nhorman@tuxdriver.com Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/socket.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -1880,8 +1880,14 @@ static int sctp_sendmsg(struct sock *sk, */ if (sinit) { if (sinit->sinit_num_ostreams) { - asoc->c.sinit_num_ostreams = - sinit->sinit_num_ostreams; + __u16 outcnt = sinit->sinit_num_ostreams; + + asoc->c.sinit_num_ostreams = outcnt; + /* outcnt has been changed, so re-init stream */ + err = sctp_stream_init(&asoc->stream, outcnt, 0, + GFP_KERNEL); + if (err) + goto out_free; } if (sinit->sinit_max_instreams) { asoc->c.sinit_max_instreams =
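For reference, this path is exercised from userspace with SCTP_INIT ancillary data on sendmsg(). A minimal sketch, assuming a connected SCTP socket fd and the usual <netinet/sctp.h>, <sys/socket.h> and <string.h> headers; request_streams() is a hypothetical name:
/* Request 64 outbound streams on the first sendmsg(). Before this fix
 * the association kept the stream count chosen at socket setup and
 * sinit_num_ostreams had no real effect.
 */
static void request_streams(int fd)
{
	struct sctp_initmsg init = { .sinit_num_ostreams = 64 };
	char cbuf[CMSG_SPACE(sizeof(init))] = { 0 };
	struct iovec iov = { .iov_base = (void *)"hi", .iov_len = 2 };
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = cbuf, .msg_controllen = sizeof(cbuf),
	};
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

	cmsg->cmsg_level = IPPROTO_SCTP;
	cmsg->cmsg_type = SCTP_INIT;
	cmsg->cmsg_len = CMSG_LEN(sizeof(init));
	memcpy(CMSG_DATA(cmsg), &init, sizeof(init));

	sendmsg(fd, &msg, 0);
}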
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Ahern dsahern@gmail.com
[ Upstream commit cbbdf8433a5f117b1a2119ea30fc651b61ef7570 ]
syzbot triggered the WARN_ON in netlink_ack testing the bad_attr value. The problem is that netlink_rcv_skb loops over the skb, repeatedly invoking the callback without resetting the extack, leaving potentially stale data. Initializing it each time through the loop avoids the WARN_ON.
Fixes: 2d4bc93368f5a ("netlink: extended ACK reporting") Reported-by: syzbot+315fa6766d0f7c359327@syzkaller.appspotmail.com Signed-off-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netlink/af_netlink.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2393,7 +2393,7 @@ int netlink_rcv_skb(struct sk_buff *skb, struct nlmsghdr *, struct netlink_ext_ack *)) { - struct netlink_ext_ack extack = {}; + struct netlink_ext_ack extack; struct nlmsghdr *nlh; int err;
@@ -2414,6 +2414,7 @@ int netlink_rcv_skb(struct sk_buff *skb, if (nlh->nlmsg_type < NLMSG_MIN_TYPE) goto ack;
+ memset(&extack, 0, sizeof(extack)); err = cb(skb, nlh, &extack); if (err == -EINTR) goto skip;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Talat Batheesh talatb@mellanox.com
[ Upstream commit e58edaa4863583b54409444f11b4f80dff0af1cd ]
Helmut reported a division-by-zero bug that occurs while running traffic during a physical cable-pull test.
When the cable is unplugged, the ppms becomes zero, so dividing the current ppms by the previous ppms in the next dim iteration causes a division by zero.
This patch prevents that division for both ppms and epms.
Fixes: c3164d2fc48f ("net/mlx5e: Added BW check for DIM decision mechanism") Reported-by: Helmut Grauer helmut.grauer@de.ibm.com Signed-off-by: Talat Batheesh talatb@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c @@ -197,9 +197,15 @@ static int mlx5e_am_stats_compare(struct return (curr->bpms > prev->bpms) ? MLX5E_AM_STATS_BETTER : MLX5E_AM_STATS_WORSE;
+ if (!prev->ppms) + return curr->ppms ? MLX5E_AM_STATS_BETTER : + MLX5E_AM_STATS_SAME; + if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms)) return (curr->ppms > prev->ppms) ? MLX5E_AM_STATS_BETTER : MLX5E_AM_STATS_WORSE; + if (!prev->epms) + return MLX5E_AM_STATS_SAME;
if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms)) return (curr->epms < prev->epms) ? MLX5E_AM_STATS_BETTER :
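The reason the guards are needed is the significance test itself; the comparison macro in en_rx_am.c of this era looked roughly like the following (quoted from memory, so treat it as a sketch rather than the exact source):
/* The relative-difference computation divides by the previous sample,
 * so prev->ppms == 0 (cable pulled, no packets) turned the comparison
 * into a divide by zero.
 */
#define IS_SIGNIFICANT_DIFF(val, ref) \
	(((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% diff */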
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski jakub.kicinski@netronome.com
[ Upstream commit 0d9c9f0f40ca262b67fc06a702b85f3976f5e1a1 ]
The sts variable holds the link speed as well as the state. We should be using ls to index into ls_to_ethtool.
Fixes: 265aeb511bd5 ("nfp: add support for .get_link_ksettings()") Signed-off-by: Jakub Kicinski jakub.kicinski@netronome.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c @@ -306,7 +306,7 @@ nfp_net_get_link_ksettings(struct net_de ls >= ARRAY_SIZE(ls_to_ethtool)) return 0;
- cmd->base.speed = ls_to_ethtool[sts]; + cmd->base.speed = ls_to_ethtool[ls]; cmd->base.duplex = DUPLEX_FULL;
return 0;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit cd443f1e91ca600a092e780e8250cd6a2954b763 ]
Move up the extack reset/initialization in netlink_rcv_skb, so that the 'goto ack' paths do not skip it. Otherwise netlink_ack may later use the uninitialized extack and cause a kernel crash.
Fixes: cbbdf8433a5f ("netlink: extack needs to be reset each time through loop") Reported-by: syzbot+03bee3680a37466775e7@syzkaller.appspotmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Acked-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netlink/af_netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2400,6 +2400,7 @@ int netlink_rcv_skb(struct sk_buff *skb, while (skb->len >= nlmsg_total_size(0)) { int msglen;
+ memset(&extack, 0, sizeof(extack)); nlh = nlmsg_hdr(skb); err = 0;
@@ -2414,7 +2415,6 @@ int netlink_rcv_skb(struct sk_buff *skb, if (nlh->nlmsg_type < NLMSG_MIN_TYPE) goto ack;
- memset(&extack, 0, sizeof(extack)); err = cb(skb, nlh, &extack); if (err == -EINTR) goto skip;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilya Lesokhin ilyal@mellanox.com
[ Upstream commit d91c3e17f75f218022140dee18cf515292184a8f ]
Calling accept on a TCP socket with a TLS ulp attached results in two sockets that share the same ulp context. The ulp context is freed when a socket is destroyed, so after one of the sockets is released, the second will trigger a use-after-free when it tries to access the ulp context attached to it. We restrict the TLS ulp to sockets in ESTABLISHED state to prevent the scenario above.
Fixes: 3c4d7559159b ("tls: kernel TLS support") Reported-by: syzbot+904e7cd6c5c741609228@syzkaller.appspotmail.com Signed-off-by: Ilya Lesokhin ilyal@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/tls/tls_main.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -444,6 +444,15 @@ static int tls_init(struct sock *sk) struct tls_context *ctx; int rc = 0;
+ /* The TLS ulp is currently supported only for TCP sockets + * in ESTABLISHED state. + * Supporting sockets in LISTEN state will require us + * to modify the accept implementation to clone rather then + * share the ulp context. + */ + if (sk->sk_state != TCP_ESTABLISHED) + return -ENOTSUPP; + /* allocate tls context */ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); if (!ctx) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit cf6d43ef66f416282121f436ce1bee9a25199d52 ]
During setsockopt(SOL_TCP, TLS_TX), if initialization of the software context fails in tls_set_sw_offload(), we leak sw_ctx. We also don't reset ctx->priv_ctx to NULL, so we can't even make another attempt to set it up on the same socket, as it will fail with -EEXIST.
Fixes: 3c4d7559159b ('tls: kernel TLS support') Signed-off-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/tls/tls_sw.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
--- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -697,18 +697,17 @@ int tls_set_sw_offload(struct sock *sk, } default: rc = -EINVAL; - goto out; + goto free_priv; }
ctx->prepend_size = TLS_HEADER_SIZE + nonce_size; ctx->tag_size = tag_size; ctx->overhead_size = ctx->prepend_size + ctx->tag_size; ctx->iv_size = iv_size; - ctx->iv = kmalloc(iv_size + TLS_CIPHER_AES_GCM_128_SALT_SIZE, - GFP_KERNEL); + ctx->iv = kmalloc(iv_size + TLS_CIPHER_AES_GCM_128_SALT_SIZE, GFP_KERNEL); if (!ctx->iv) { rc = -ENOMEM; - goto out; + goto free_priv; } memcpy(ctx->iv, gcm_128_info->salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE); memcpy(ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, iv, iv_size); @@ -756,7 +755,7 @@ int tls_set_sw_offload(struct sock *sk,
rc = crypto_aead_setauthsize(sw_ctx->aead_send, ctx->tag_size); if (!rc) - goto out; + return 0;
free_aead: crypto_free_aead(sw_ctx->aead_send); @@ -767,6 +766,9 @@ free_rec_seq: free_iv: kfree(ctx->iv); ctx->iv = NULL; +free_priv: + kfree(ctx->priv_ctx); + ctx->priv_ctx = NULL; out: return rc; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit 877d17c79b66466942a836403773276e34fe3614 ]
do_tls_setsockopt_tx returns 0 without doing anything when crypto_info is already set. Silent failure is confusing for users.
Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/tls/tls_main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -364,8 +364,10 @@ static int do_tls_setsockopt_tx(struct s crypto_info = &ctx->crypto_send;
/* Currently we don't support set crypto info more than one time */ - if (TLS_CRYPTO_INFO_READY(crypto_info)) + if (TLS_CRYPTO_INFO_READY(crypto_info)) { + rc = -EBUSY; goto out; + }
switch (tmp_crypto_info.cipher_type) { case TLS_CIPHER_AES_GCM_128: {
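The user-visible effect, sketched; fd is assumed to be a connected TCP socket with the TLS ULP attached, ci a populated struct tls12_crypto_info_aes_gcm_128, and SOL_TLS/TLS_TX taken from <linux/tls.h>:
setsockopt(fd, SOL_TLS, TLS_TX, &ci, sizeof(ci));	/* 0: success */
setsockopt(fd, SOL_TLS, TLS_TX, &ci, sizeof(ci));	/* was 0 (silent no-op),
							   now -1 with errno EBUSY */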
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit 6db959c82eb039a151d95a0f8b7dea643657327a ]
The current code copies directly from userspace to ctx->crypto_send, but doesn't always reinitialize it to 0 on failure. This causes any subsequent attempt to use this setsockopt to fail because of the TLS_CRYPTO_INFO_READY check, even though crypto_info is not actually ready.
This should result in a correctly set up socket after the 3rd call, but currently it does not:
size_t s = sizeof(struct tls12_crypto_info_aes_gcm_128);
struct tls12_crypto_info_aes_gcm_128 crypto_good = {
	.info.version = TLS_1_2_VERSION,
	.info.cipher_type = TLS_CIPHER_AES_GCM_128,
};

struct tls12_crypto_info_aes_gcm_128 crypto_bad_type = crypto_good;
crypto_bad_type.info.cipher_type = 42;

setsockopt(sock, SOL_TLS, TLS_TX, &crypto_bad_type, s);
setsockopt(sock, SOL_TLS, TLS_TX, &crypto_good, s - 1);
setsockopt(sock, SOL_TLS, TLS_TX, &crypto_good, s);
Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/tls/tls_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -373,7 +373,7 @@ static int do_tls_setsockopt_tx(struct s case TLS_CIPHER_AES_GCM_128: { if (optlen != sizeof(struct tls12_crypto_info_aes_gcm_128)) { rc = -EINVAL; - goto out; + goto err_crypto_info; } rc = copy_from_user( crypto_info, @@ -388,7 +388,7 @@ static int do_tls_setsockopt_tx(struct s } default: rc = -EINVAL; - goto out; + goto err_crypto_info; }
ctx->sk_write_space = sk->sk_write_space;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lorenzo Colitti lorenzo@google.com
[ Upstream commit 6503a30440962f1e1ccb8868816b4e18201218d4 ]
Commit 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup") broke "ip route get" in the presence of rules that specify iif lo.
Host-originated traffic always has iif lo, because ip_route_output_key_hash and ip6_route_output_flags set the flow iif to LOOPBACK_IFINDEX. Thus, putting "iif lo" in an ip rule is a convenient way to select only originated traffic and not forwarded traffic.
inet_rtm_getroute used to match these rules correctly because even though it sets the flow iif to 0, it called ip_route_output_key which overwrites iif with LOOPBACK_IFINDEX. But now that it calls ip_route_output_key_hash_rcu, the ifindex will remain 0 and not match the iif lo in the rule. As a result, "ip route get" will return ENETUNREACH.
Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup") Tested: https://android.googlesource.com/kernel/tests/+/master/net/test/multinetwork... passes again Signed-off-by: Lorenzo Colitti lorenzo@google.com Acked-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/route.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2762,6 +2762,7 @@ static int inet_rtm_getroute(struct sk_b if (err == 0 && rt->dst.error) err = -rt->dst.error; } else { + fl4.flowi4_iif = LOOPBACK_IFINDEX; rt = ip_route_output_key_hash_rcu(net, &fl4, &res, skb); err = 0; if (IS_ERR(rt))
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Neil Horman nhorman@tuxdriver.com
[ Upstream commit 848b159835ddef99cc4193083f7e786c3992f580 ]
With the introduction of commit b0eb57cb97e7837ebb746404c2c58c6f536f23fa, it appears that rq->buf_info is improperly handled. While it is heap-allocated when an rx queue is set up, and freed when torn down, an old line of code in vmxnet3_rq_destroy was not properly removed, leading to rq->buf_info[0] being set to NULL prior to being freed. This causes a memory leak, which eventually exhausts the system on repeated create/destroy operations (for example, when the mtu of a vmxnet3 interface is changed frequently).
The fix is straightforward: just move the NULL assignment to after the free.
Tested by myself with successful results
Applies to net, and should likely be queued for stable, please
Signed-off-by: Neil Horman nhorman@tuxdriver.com Reported-By: boyang@redhat.com CC: boyang@redhat.com CC: Shrikrishna Khare skhare@vmware.com CC: "VMware, Inc." pv-drivers@vmware.com CC: David S. Miller davem@davemloft.net Acked-by: Shrikrishna Khare skhare@vmware.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/vmxnet3/vmxnet3_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/vmxnet3/vmxnet3_drv.c +++ b/drivers/net/vmxnet3/vmxnet3_drv.c @@ -1616,7 +1616,6 @@ static void vmxnet3_rq_destroy(struct vm rq->rx_ring[i].basePA); rq->rx_ring[i].base = NULL; } - rq->buf_info[i] = NULL; }
if (rq->data_ring.base) { @@ -1638,6 +1637,7 @@ static void vmxnet3_rq_destroy(struct vm (rq->rx_ring[0].size + rq->rx_ring[1].size); dma_free_coherent(&adapter->pdev->dev, sz, rq->buf_info[0], rq->buf_info_pa); + rq->buf_info[0] = rq->buf_info[1] = NULL; } }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xiao Liang xiliang@redhat.com
commit 40d4071ce2d20840d224b4a77b5dc6f752c9ab15 upstream.
The AMD power module can be loaded on non-AMD platforms, but unloading it fails with the following Oops:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: __list_del_entry_valid+0x29/0x90
Call Trace:
 perf_pmu_unregister+0x25/0xf0
 amd_power_pmu_exit+0x1c/0xd23 [power]
 SyS_delete_module+0x1a8/0x2b0
 ? exit_to_usermode_loop+0x8f/0xb0
 entry_SYSCALL_64_fastpath+0x20/0x83
Return -ENODEV instead of 0 from the module init function if the CPU does not match.
Fixes: c7ab62bfbe0e ("perf/x86/amd/power: Add AMD accumulated power reporting mechanism") Signed-off-by: Xiao Liang xiliang@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://lkml.kernel.org/r/20180122061252.6394-1-xiliang@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/amd/power.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/events/amd/power.c +++ b/arch/x86/events/amd/power.c @@ -277,7 +277,7 @@ static int __init amd_power_pmu_init(voi int ret;
if (!x86_match_cpu(cpu_match)) - return 0; + return -ENODEV;
if (!boot_cpu_has(X86_FEATURE_ACC_POWER)) return -ENODEV;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jia Zhang zhang.jia@linux.alibaba.com
commit 7e702d17ed138cf4ae7c00e8c00681ed464587c7 upstream.
Commit b94b73733171 ("x86/microcode/intel: Extend BDW late-loading with a revision check") reduced the impact of erratum BDF90 for Broadwell model 79.
The impact can be reduced further by checking the size of the last level cache portion per core.
Tony: "The erratum says the problem only occurs on the large-cache SKUs. So we only need to avoid the update if we are on a big cache SKU that is also running old microcode."
For more details, see erratum BDF90 in document #334165 (Intel Xeon Processor E7-8800/4800 v4 Product Family Specification Update) from September 2017.
Fixes: b94b73733171 ("x86/microcode/intel: Extend BDW late-loading with a revision check") Signed-off-by: Jia Zhang zhang.jia@linux.alibaba.com Signed-off-by: Borislav Petkov bp@suse.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Tony Luck tony.luck@intel.com Link: https://lkml.kernel.org/r/1516321542-31161-1-git-send-email-zhang.jia@linux.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/microcode/intel.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/cpu/microcode/intel.c +++ b/arch/x86/kernel/cpu/microcode/intel.c @@ -45,6 +45,9 @@ static const char ucode_path[] = "kernel /* Current microcode patch used in early patching on the APs. */ static struct microcode_intel *intel_ucode_patch;
+/* last level cache size per core */ +static int llc_size_per_core; + static inline bool cpu_signatures_match(unsigned int s1, unsigned int p1, unsigned int s2, unsigned int p2) { @@ -912,12 +915,14 @@ static bool is_blacklisted(unsigned int
/* * Late loading on model 79 with microcode revision less than 0x0b000021 - * may result in a system hang. This behavior is documented in item - * BDF90, #334165 (Intel Xeon Processor E7-8800/4800 v4 Product Family). + * and LLC size per core bigger than 2.5MB may result in a system hang. + * This behavior is documented in item BDF90, #334165 (Intel Xeon + * Processor E7-8800/4800 v4 Product Family). */ if (c->x86 == 6 && c->x86_model == INTEL_FAM6_BROADWELL_X && c->x86_mask == 0x01 && + llc_size_per_core > 2621440 && c->microcode < 0x0b000021) { pr_err_once("Erratum BDF90: late loading with revision < 0x0b000021 (0x%x) disabled.\n", c->microcode); pr_err_once("Please consider either early loading through initrd/built-in or a potential BIOS update.\n"); @@ -975,6 +980,15 @@ static struct microcode_ops microcode_in .apply_microcode = apply_microcode_intel, };
+static int __init calc_llc_size_per_core(struct cpuinfo_x86 *c) +{ + u64 llc_size = c->x86_cache_size * 1024; + + do_div(llc_size, c->x86_max_cores); + + return (int)llc_size; +} + struct microcode_ops * __init init_intel_microcode(void) { struct cpuinfo_x86 *c = &boot_cpu_data; @@ -985,5 +999,7 @@ struct microcode_ops * __init init_intel return NULL; }
+ llc_size_per_core = calc_llc_size_per_core(c); + return &microcode_intel_ops; }
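As a worked example of where the threshold comes from (the SKU numbers here are illustrative, not taken from the patch):
/* A 24-core part with a 60 MB LLC reports x86_cache_size = 61440 (KB):
 *
 *   llc_size          = 61440 * 1024  = 62914560 bytes
 *   llc_size_per_core = 62914560 / 24 =  2621440 bytes (2.5 MB)
 *
 * which is exactly the 2621440 cutoff tested in is_blacklisted().
 */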
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov bp@suse.de
commit 1d080f096fe33f031d26e19b3ef0146f66b8b0f1 upstream.
Commit 24c2503255d3 ("x86/microcode: Do not access the initrd after it has been freed") fixed attempts to access initrd from the microcode loader after it has been freed. However, a similar KASAN warning was reported (stack trace edited):
smpboot: Booting Node 0 Processor 1 APIC 0x11
==================================================================
BUG: KASAN: use-after-free in find_cpio_data+0x9b5/0xa50
Read of size 1 at addr ffff880035ffd000 by task swapper/1/0

CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.8-slack #7
Hardware name: System manufacturer System Product Name/A88X-PLUS, BIOS 3003 03/10/2016
Call Trace:
 dump_stack
 print_address_description
 kasan_report
 ? find_cpio_data
 __asan_report_load1_noabort
 find_cpio_data
 find_microcode_in_initrd
 __load_ucode_amd
 load_ucode_amd_ap
 load_ucode_ap
After some investigation, it turned out that a merge was done using the wrong side to resolve, leading to picking up the previous state, before the 24c2503255d3 fix. Therefore the Fixes tag below contains a merge commit.
Revert the mismerge by catching the save_microcode_in_initrd_amd() retval and thus letting the function exit with the last return statement so that initrd_gone can be set to true.
Fixes: f26483eaedec ("Merge branch 'x86/urgent' into x86/microcode, to resolve conflicts") Reported-by: higuita@gmx.net Signed-off-by: Borislav Petkov bp@suse.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://bugzilla.kernel.org/show_bug.cgi?id=198295 Link: https://lkml.kernel.org/r/20180123104133.918-2-bp@alien8.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/microcode/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/microcode/core.c +++ b/arch/x86/kernel/cpu/microcode/core.c @@ -239,7 +239,7 @@ static int __init save_microcode_in_init break; case X86_VENDOR_AMD: if (c->x86 >= 0x10) - return save_microcode_in_initrd_amd(cpuid_eax(1)); + ret = save_microcode_in_initrd_amd(cpuid_eax(1)); break; default: break;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit 5beda7d54eafece4c974cfa9fbb9f60fb18fd20a upstream.
Neil Berrington reported a double-fault on a VM with 768GB of RAM that uses large amounts of vmalloc space with PTI enabled.
The cause is that load_new_mm_cr3() was never fixed to take the 5-level pgd folding code into account, so, on a 4-level kernel, the pgd synchronization logic compiles away to exactly nothing.
Interestingly, the problem doesn't trigger with nopti. I assume this is because the kernel is mapped with global pages if we boot with nopti. The sequence of operations when we create a new task is that we first load its mm while still running on the old stack (which crashes if the old stack is unmapped in the new mm unless the TLB saves us), then we call prepare_switch_to(), and then we switch to the new stack. prepare_switch_to() pokes the new stack directly, which will populate the mapping through vmalloc_fault(). I assume that we're getting lucky on non-PTI systems -- the old stack's TLB entry stays alive long enough to make it all the way through prepare_switch_to() and switch_to() so that we make it to a valid stack.
Fixes: b50858ce3e2a ("x86/mm/vmalloc: Add 5-level paging support") Reported-and-tested-by: Neil Berrington neil.berrington@datacore.com Signed-off-by: Andy Lutomirski luto@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Konstantin Khlebnikov khlebnikov@yandex-team.ru Cc: Dave Hansen dave.hansen@intel.com Cc: Borislav Petkov bp@alien8.de Link: https://lkml.kernel.org/r/346541c56caed61abbe693d7d2742b4a380c5001.151691452... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/mm/tlb.c | 34 +++++++++++++++++++++++++++++----- 1 file changed, 29 insertions(+), 5 deletions(-)
--- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -151,6 +151,34 @@ void switch_mm(struct mm_struct *prev, s local_irq_restore(flags); }
+static void sync_current_stack_to_mm(struct mm_struct *mm) +{ + unsigned long sp = current_stack_pointer; + pgd_t *pgd = pgd_offset(mm, sp); + + if (CONFIG_PGTABLE_LEVELS > 4) { + if (unlikely(pgd_none(*pgd))) { + pgd_t *pgd_ref = pgd_offset_k(sp); + + set_pgd(pgd, *pgd_ref); + } + } else { + /* + * "pgd" is faked. The top level entries are "p4d"s, so sync + * the p4d. This compiles to approximately the same code as + * the 5-level case. + */ + p4d_t *p4d = p4d_offset(pgd, sp); + + if (unlikely(p4d_none(*p4d))) { + pgd_t *pgd_ref = pgd_offset_k(sp); + p4d_t *p4d_ref = p4d_offset(pgd_ref, sp); + + set_p4d(p4d, *p4d_ref); + } + } +} + void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, struct task_struct *tsk) { @@ -226,11 +254,7 @@ void switch_mm_irqs_off(struct mm_struct * mapped in the new pgd, we'll double-fault. Forcibly * map it. */ - unsigned int index = pgd_index(current_stack_pointer); - pgd_t *pgd = next->pgd + index; - - if (unlikely(pgd_none(*pgd))) - set_pgd(pgd, init_mm.pgd[index]); + sync_current_stack_to_mm(next); }
/* Stop remote flushes for the previous mm */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit d5421ea43d30701e03cadc56a38854c36a8b4433 upstream.
The hrtimer interrupt code contains a hang detection and mitigation mechanism, which prevents a long-delayed hrtimer interrupt from causing continuous retriggering of interrupts that would keep the system from making progress. If a hang is detected, the timer hardware is programmed with a certain delay into the future and a flag is set in the hrtimer cpu base which prevents newly enqueued timers from reprogramming the timer hardware prior to the chosen delay. The subsequent hrtimer interrupt after the delay clears the flag and resumes normal operation.
If such a hang happens in the last hrtimer interrupt before a CPU is unplugged then the hang_detected flag is set and stays that way when the CPU is plugged in again. At that point the timer hardware is not armed and it cannot be armed because the hang_detected flag is still active, so nothing clears that flag. As a consequence the CPU does not receive hrtimer interrupts and no timers expire on that CPU which results in RCU stalls and other malfunctions.
Clear the flag along with some other less critical members of the hrtimer cpu base to ensure starting from a clean state when a CPU is plugged in.
Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the root cause of that hard to reproduce heisenbug. Once understood it's trivial and certainly justifies a brown paperbag.
Fixes: 41d2e4949377 ("hrtimer: Tune hrtimer_interrupt hang logic") Reported-by: Paul E. McKenney paulmck@linux.vnet.ibm.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Peter Zijlstra peterz@infradead.org Cc: Sebastian Sewior bigeasy@linutronix.de Cc: Anna-Maria Gleixner anna-maria@linutronix.de Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/time/hrtimer.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -655,7 +655,9 @@ static void hrtimer_reprogram(struct hrt static inline void hrtimer_init_hres(struct hrtimer_cpu_base *base) { base->expires_next = KTIME_MAX; + base->hang_detected = 0; base->hres_active = 0; + base->next_timer = NULL; }
/* @@ -1591,6 +1593,7 @@ int hrtimers_prepare_cpu(unsigned int cp timerqueue_init_head(&cpu_base->clock_base[i].active); }
+ cpu_base->active_bases = 0; cpu_base->cpu = cpu; hrtimer_init_hres(cpu_base); return 0;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexei Starovoitov ast@kernel.org
[ upstream commit 290af86629b25ffd1ed6232c4e9107da031705cb ]
The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.
A quote from the Google Project Zero blog: "At this point, it would normally be necessary to locate gadgets in the host kernel code that can be used to actually leak data by reading from an attacker-controlled location, shifting and masking the result appropriately and then using the result of that as offset to an attacker-controlled address for a load. But piecing gadgets together and figuring out which ones work in a speculation context seems annoying. So instead, we decided to use the eBPF interpreter, which is built into the host kernel - while there is no legitimate way to invoke it from inside a VM, the presence of the code in the host kernel's text section is sufficient to make it usable for the attack, just like with ordinary ROP gadgets."
To make the attacker's job harder, introduce the BPF_JIT_ALWAYS_ON config option, which removes the interpreter from the kernel in favor of JIT-only mode. So far the eBPF JIT is supported by: x64, arm64, arm32, sparc64, s390, powerpc64, mips64.
The start of the JITed program is randomized and its code page is marked read-only. In addition, "constant blinding" can be turned on with net.core.bpf_jit_harden.
v2->v3: - move __bpf_prog_ret0 under ifdef (Daniel)
v1->v2: - fix init order, test_bpf and cBPF (Daniel's feedback) - fix offloaded bpf (Jakub's feedback) - add 'return 0' dummy in case something can invoke prog->bpf_func - retarget bpf tree. For bpf-next the patch would need one extra hunk. It will be sent when the trees are merged back to net-next
Considered doing: int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT; but it seems better to land the patch as-is and in bpf-next remove bpf_jit_enable global variable from all JITs, consolidate in one place and remove this jit_init() function.
Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- init/Kconfig | 7 +++++++ kernel/bpf/core.c | 19 +++++++++++++++++++ lib/test_bpf.c | 11 +++++++---- net/core/filter.c | 6 ++---- net/core/sysctl_net_core.c | 6 ++++++ net/socket.c | 9 +++++++++ 6 files changed, 50 insertions(+), 8 deletions(-)
--- a/init/Kconfig +++ b/init/Kconfig @@ -1342,6 +1342,13 @@ config BPF_SYSCALL Enable the bpf() system call that allows to manipulate eBPF programs and maps via file descriptors.
+config BPF_JIT_ALWAYS_ON + bool "Permanently enable BPF JIT and remove BPF interpreter" + depends on BPF_SYSCALL && HAVE_EBPF_JIT && BPF_JIT + help + Enables BPF JIT and removes BPF interpreter to avoid + speculative execution of BPF instructions by the interpreter + config SHMEM bool "Use full shmem filesystem" if EXPERT default y --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -760,6 +760,7 @@ noinline u64 __bpf_call_base(u64 r1, u64 } EXPORT_SYMBOL_GPL(__bpf_call_base);
+#ifndef CONFIG_BPF_JIT_ALWAYS_ON /** * __bpf_prog_run - run eBPF program on a given context * @ctx: is the data we are operating on @@ -1310,6 +1311,14 @@ EVAL6(PROG_NAME_LIST, 224, 256, 288, 320 EVAL4(PROG_NAME_LIST, 416, 448, 480, 512) };
+#else +static unsigned int __bpf_prog_ret0(const void *ctx, + const struct bpf_insn *insn) +{ + return 0; +} +#endif + bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp) { @@ -1357,9 +1366,13 @@ static int bpf_check_tail_call(const str */ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err) { +#ifndef CONFIG_BPF_JIT_ALWAYS_ON u32 stack_depth = max_t(u32, fp->aux->stack_depth, 1);
fp->bpf_func = interpreters[(round_up(stack_depth, 32) / 32) - 1]; +#else + fp->bpf_func = __bpf_prog_ret0; +#endif
/* eBPF JITs can rewrite the program in case constant * blinding is active. However, in case of error during @@ -1368,6 +1381,12 @@ struct bpf_prog *bpf_prog_select_runtime * be JITed, but falls back to the interpreter. */ fp = bpf_int_jit_compile(fp); +#ifdef CONFIG_BPF_JIT_ALWAYS_ON + if (!fp->jited) { + *err = -ENOTSUPP; + return fp; + } +#endif bpf_prog_lock_ro(fp);
/* The tail call compatibility check can only be done at --- a/lib/test_bpf.c +++ b/lib/test_bpf.c @@ -6207,9 +6207,8 @@ static struct bpf_prog *generate_filter( return NULL; } } - /* We don't expect to fail. */ if (*err) { - pr_cont("FAIL to attach err=%d len=%d\n", + pr_cont("FAIL to prog_create err=%d len=%d\n", *err, fprog.len); return NULL; } @@ -6233,6 +6232,10 @@ static struct bpf_prog *generate_filter( * checks. */ fp = bpf_prog_select_runtime(fp, err); + if (*err) { + pr_cont("FAIL to select_runtime err=%d\n", *err); + return NULL; + } break; }
@@ -6418,8 +6421,8 @@ static __init int test_bpf(void) pass_cnt++; continue; } - - return err; + err_cnt++; + continue; }
pr_cont("jited:%u ", fp->jited); --- a/net/core/filter.c +++ b/net/core/filter.c @@ -1053,11 +1053,9 @@ static struct bpf_prog *bpf_migrate_filt */ goto out_err_free;
- /* We are guaranteed to never error here with cBPF to eBPF - * transitions, since there's no issue with type compatibility - * checks on program arrays. - */ fp = bpf_prog_select_runtime(fp, &err); + if (err) + goto out_err_free;
kfree(old_prog); return fp; --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -325,7 +325,13 @@ static struct ctl_table net_core_table[] .data = &bpf_jit_enable, .maxlen = sizeof(int), .mode = 0644, +#ifndef CONFIG_BPF_JIT_ALWAYS_ON .proc_handler = proc_dointvec +#else + .proc_handler = proc_dointvec_minmax, + .extra1 = &one, + .extra2 = &one, +#endif }, # ifdef CONFIG_HAVE_EBPF_JIT { --- a/net/socket.c +++ b/net/socket.c @@ -2642,6 +2642,15 @@ out_fs:
core_initcall(sock_init); /* early initcall */
+static int __init jit_init(void) +{ +#ifdef CONFIG_BPF_JIT_ALWAYS_ON + bpf_jit_enable = 1; +#endif + return 0; +} +pure_initcall(jit_init); + #ifdef CONFIG_PROC_FS void socket_seq_show(struct seq_file *seq) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
[ upstream commit be95a845cc4402272994ce290e3ad928aff06cb9 ]
In addition to commit b2157399cc98 ("bpf: prevent out-of-bounds speculation"), also change the layout of struct bpf_map such that false sharing of fast-path members like max_entries is avoided when the map's reference counter is altered. Therefore enforce that they are placed in separate cachelines.
pahole dump after change:
struct bpf_map {
	const struct bpf_map_ops * ops;            /*     0     8 */
	struct bpf_map *           inner_map_meta; /*     8     8 */
	void *                     security;       /*    16     8 */
	enum bpf_map_type          map_type;       /*    24     4 */
	u32                        key_size;       /*    28     4 */
	u32                        value_size;     /*    32     4 */
	u32                        max_entries;    /*    36     4 */
	u32                        map_flags;      /*    40     4 */
	u32                        pages;          /*    44     4 */
	u32                        id;             /*    48     4 */
	int                        numa_node;      /*    52     4 */
	bool                       unpriv_array;   /*    56     1 */

	/* XXX 7 bytes hole, try to pack */

	/* --- cacheline 1 boundary (64 bytes) --- */
	struct user_struct *       user;           /*    64     8 */
	atomic_t                   refcnt;         /*    72     4 */
	atomic_t                   usercnt;        /*    76     4 */
	struct work_struct         work;           /*    80    32 */
	char                       name[16];       /*   112    16 */
	/* --- cacheline 2 boundary (128 bytes) --- */

	/* size: 128, cachelines: 2, members: 17 */
	/* sum members: 121, holes: 1, sum holes: 7 */
};
Now all entries in the first cacheline are read-only throughout the lifetime of the map and are set up once during map creation. The overall struct size and number of cachelines don't change with the reordering. struct bpf_map is usually the first member of, and embedded in, the map structs of specific map implementations, so also avoid letting those members sit at the end, where they could share a cacheline with the first map values, e.g. in the array map: remote CPUs can trigger map updates just as well for those (easily dirtying members like max_entries intentionally as well) while having subsequent values in cache.
Quoting from Google's Project Zero blog [1]:
Additionally, at least on the Intel machine on which this was tested, bouncing modified cache lines between cores is slow, apparently because the MESI protocol is used for cache coherence [8]. Changing the reference counter of an eBPF array on one physical CPU core causes the cache line containing the reference counter to be bounced over to that CPU core, making reads of the reference counter on all other CPU cores slow until the changed reference counter has been written back to memory. Because the length and the reference counter of an eBPF array are stored in the same cache line, this also means that changing the reference counter on one physical CPU core causes reads of the eBPF array's length to be slow on other physical CPU cores (intentional false sharing).
While this doesn't 'control' the out-of-bounds speculation by masking the index as in commit b2157399cc98, triggering a manipulation of the map's reference counter is trivial, so let's not make it easy to affect max_entries through it.
Splitting into separate cachelines also generally makes sense from a performance perspective, in that the fast path won't take a cache miss if the map gets pinned, reused in other progs, etc. from the control path; it thereby also avoids unintentional false sharing.
[1] https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with...
Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/bpf.h | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-)
--- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -42,7 +42,14 @@ struct bpf_map_ops { };
struct bpf_map { - atomic_t refcnt; + /* 1st cacheline with read-mostly members of which some + * are also accessed in fast-path (e.g. ops, max_entries). + */ + const struct bpf_map_ops *ops ____cacheline_aligned; + struct bpf_map *inner_map_meta; +#ifdef CONFIG_SECURITY + void *security; +#endif enum bpf_map_type map_type; u32 key_size; u32 value_size; @@ -52,11 +59,15 @@ struct bpf_map { u32 id; int numa_node; bool unpriv_array; - struct user_struct *user; - const struct bpf_map_ops *ops; - struct work_struct work; + /* 7 bytes hole */ + + /* 2nd cacheline with misc members to avoid false sharing + * particularly with refcounting. + */ + struct user_struct *user ____cacheline_aligned; + atomic_t refcnt; atomic_t usercnt; - struct bpf_map *inner_map_meta; + struct work_struct work; };
/* function argument constraints */
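The underlying technique is generic. A minimal sketch outside of bpf, with a hypothetical struct that is not part of the patch:
/* Keep read-mostly fast-path fields on one cacheline and the
 * remotely-written refcount on another, so refcount traffic cannot
 * invalidate the line that hot readers depend on.
 */
struct example_map {
	/* cacheline 0: set once at creation, read on every lookup */
	const struct example_ops *ops ____cacheline_aligned;
	u32 max_entries;

	/* cacheline 1: bounced between CPUs on get/put */
	atomic_t refcnt ____cacheline_aligned;
};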
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ upstream commit c366287ebd698ef5e3de300d90cd62ee9ee7373e ]
Divides by zero are not nice; let's avoid them if possible.
Also, do_div() seems unnecessary when dealing with 32-bit operands, but that is a minor detail.
Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -949,7 +949,7 @@ select_insn: DST = tmp; CONT; ALU_MOD_X: - if (unlikely(SRC == 0)) + if (unlikely((u32)SRC == 0)) return 0; tmp = (u32) DST; DST = do_div(tmp, (u32) SRC); @@ -968,7 +968,7 @@ select_insn: DST = div64_u64(DST, SRC); CONT; ALU_DIV_X: - if (unlikely(SRC == 0)) + if (unlikely((u32)SRC == 0)) return 0; tmp = (u32) DST; do_div(tmp, (u32) SRC);
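A small standalone program makes the failure mode concrete; assuming a value whose low 32 bits are zero but whose high bits are not, the old 64-bit check passes while the 32-bit divisor handed to do_div() is zero:

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          uint64_t src = 0x100000000ULL;  /* nonzero as a u64 */

          if (src == 0)                   /* old check: does not trigger */
                  return 1;
          if ((uint32_t)src == 0)         /* fixed check: triggers */
                  printf("32-bit divisor is zero, divide rejected\n");
          return 0;
  }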
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexei Starovoitov ast@kernel.org
[ upstream commit 68fda450a7df51cff9e5a4d4a4d9d0d5f2589153 ]
Due to some JITs doing the if (src_reg == 0) check in 64-bit mode for div/mod operations, mask the upper 32 bits of the source register before doing the check.
Fixes: 622582786c9e ("net: filter: x86: internal BPF JIT")
Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.")
Reported-by: syzbot+48340bb518e88849e2e3@syzkaller.appspotmail.com
Signed-off-by: Alexei Starovoitov ast@kernel.org
Signed-off-by: Daniel Borkmann daniel@iogearbox.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/bpf/verifier.c | 18 ++++++++++++++++++
 net/core/filter.c     |  4 ++++
 2 files changed, 22 insertions(+)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -4304,6 +4304,24 @@ static int fixup_bpf_calls(struct bpf_ve int i, cnt, delta = 0;
for (i = 0; i < insn_cnt; i++, insn++) { + if (insn->code == (BPF_ALU | BPF_MOD | BPF_X) || + insn->code == (BPF_ALU | BPF_DIV | BPF_X)) { + /* due to JIT bugs clear upper 32-bits of src register + * before div/mod operation + */ + insn_buf[0] = BPF_MOV32_REG(insn->src_reg, insn->src_reg); + insn_buf[1] = *insn; + cnt = 2; + new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt); + if (!new_prog) + return -ENOMEM; + + delta += cnt - 1; + env->prog = prog = new_prog; + insn = new_prog->insnsi + i + delta; + continue; + } + if (insn->code != (BPF_JMP | BPF_CALL)) continue;
--- a/net/core/filter.c +++ b/net/core/filter.c @@ -457,6 +457,10 @@ do_pass: convert_bpf_extensions(fp, &insn)) break;
+ if (fp->code == (BPF_ALU | BPF_DIV | BPF_X) || + fp->code == (BPF_ALU | BPF_MOD | BPF_X)) + *insn++ = BPF_MOV32_REG(BPF_REG_X, BPF_REG_X); + *insn = BPF_RAW_INSN(fp->code, BPF_REG_A, BPF_REG_X, 0, fp->k); break;
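The rewrite relies on the eBPF semantics of 32-bit ALU moves: a BPF_MOV32_REG of a register onto itself zero-extends, i.e. it behaves like reg = (u32)reg. A sketch of the effect, using plain C in place of the BPF instruction:

  #include <assert.h>
  #include <stdint.h>

  int main(void)
  {
          uint64_t src = 0xdeadbeef00000000ULL;

          src = (uint32_t)src;  /* what the inserted BPF_MOV32_REG does */
          assert(src == 0);     /* a JIT's 64-bit src == 0 check now fires */
          return 0;
  }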
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
[ upstream commit f37a8cb84cce18762e8f86a70bd6a49a66ab964c ]
Alexei found that the verifier does not reject stores into the context via BPF_ST instead of BPF_STX. And while looking at it, we should also not allow the XADD variant of BPF_STX.
The context rewriter only assumes BPF_LDX_MEM- or BPF_STX_MEM-type operations, thus reject anything else so that the assumptions in the rewriter properly hold. Add test cases for the BPF selftests as well.
Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields")
Reported-by: Alexei Starovoitov ast@kernel.org
Signed-off-by: Daniel Borkmann daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov ast@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/bpf/verifier.c                       | 19 ++++++++++++++++++
 tools/testing/selftests/bpf/test_verifier.c | 29 ++++++++++++++++++++++++++--
 2 files changed, 46 insertions(+), 2 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -986,6 +986,13 @@ static bool is_pointer_value(struct bpf_ return __is_pointer_value(env->allow_ptr_leaks, &env->cur_state.regs[regno]); }
+static bool is_ctx_reg(struct bpf_verifier_env *env, int regno) +{ + const struct bpf_reg_state *reg = &env->cur_state.regs[regno]; + + return reg->type == PTR_TO_CTX; +} + static int check_pkt_ptr_alignment(const struct bpf_reg_state *reg, int off, int size, bool strict) { @@ -1258,6 +1265,12 @@ static int check_xadd(struct bpf_verifie return -EACCES; }
+ if (is_ctx_reg(env, insn->dst_reg)) { + verbose("BPF_XADD stores into R%d context is not allowed\n", + insn->dst_reg); + return -EACCES; + } + /* check whether atomic_add can read the memory */ err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off, BPF_SIZE(insn->code), BPF_READ, -1); @@ -3859,6 +3872,12 @@ static int do_check(struct bpf_verifier_ if (err) return err;
+ if (is_ctx_reg(env, insn->dst_reg)) { + verbose("BPF_ST stores into R%d context is not allowed\n", + insn->dst_reg); + return -EACCES; + } + /* check that memory (dst_reg + off) is writeable */ err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off, BPF_SIZE(insn->code), BPF_WRITE, --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -2596,6 +2596,29 @@ static struct bpf_test tests[] = { .prog_type = BPF_PROG_TYPE_SCHED_CLS, }, { + "context stores via ST", + .insns = { + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_ST_MEM(BPF_DW, BPF_REG_1, offsetof(struct __sk_buff, mark), 0), + BPF_EXIT_INSN(), + }, + .errstr = "BPF_ST stores into R1 context is not allowed", + .result = REJECT, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + }, + { + "context stores via XADD", + .insns = { + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_W, BPF_REG_1, + BPF_REG_0, offsetof(struct __sk_buff, mark), 0), + BPF_EXIT_INSN(), + }, + .errstr = "BPF_XADD stores into R1 context is not allowed", + .result = REJECT, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + }, + { "direct packet access: test1", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, @@ -4317,7 +4340,8 @@ static struct bpf_test tests[] = { .fixup_map1 = { 2 }, .errstr_unpriv = "R2 leaks addr into mem", .result_unpriv = REJECT, - .result = ACCEPT, + .result = REJECT, + .errstr = "BPF_XADD stores into R1 context is not allowed", }, { "leak pointer into ctx 2", @@ -4331,7 +4355,8 @@ static struct bpf_test tests[] = { }, .errstr_unpriv = "R10 leaks addr into mem", .result_unpriv = REJECT, - .result = ACCEPT, + .result = REJECT, + .errstr = "BPF_XADD stores into R1 context is not allowed", }, { "leak pointer into ctx 3",
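A hypothetical dispatch routine shows the assumption being protected: a context rewriter that only understands the two MEM-type opcodes would mistranslate anything else, so the verifier must reject those forms up front. The function and opcode names below are illustrative, not the kernel's:

  #include <errno.h>
  #include <stdio.h>

  enum ctx_op { OP_LDX_MEM, OP_STX_MEM, OP_ST_MEM, OP_STX_XADD };

  static int rewrite_ctx_access(enum ctx_op op)
  {
          switch (op) {
          case OP_LDX_MEM:
          case OP_STX_MEM:
                  return 0;        /* the rewriter knows these forms */
          default:
                  return -EACCES;  /* BPF_ST / BPF_STX|BPF_XADD: reject */
          }
  }

  int main(void)
  {
          printf("ST:   %d\n", rewrite_ctx_access(OP_ST_MEM));    /* -EACCES */
          printf("XADD: %d\n", rewrite_ctx_access(OP_STX_XADD));  /* -EACCES */
          return 0;
  }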
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
[ upstream commit a2284d912bfc865cdca4c00488e08a3550f9a405 ]
Using dynamic stack_depth tracking in the arm64 JIT is currently broken in combination with tail calls. In the prologue, we cache ctx->stack_size and adjust SP to set up the function call stack, and tear it down again in the epilogue. The problem is that when doing a tail call, the cached ctx->stack_size might not be the same.
One way to fix the problem with minimal overhead is to re-adjust SP in emit_bpf_tail_call() and properly adjust it to the current program's ctx->stack_size. Tested on Cavium ThunderX ARMv8.
Fixes: f1c9eed7f437 ("bpf, arm64: take advantage of stack_depth tracking")
Signed-off-by: Daniel Borkmann daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov ast@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/net/bpf_jit_comp.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)
--- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -148,7 +148,8 @@ static inline int epilogue_offset(const /* Stack must be multiples of 16B */ #define STACK_ALIGN(sz) (((sz) + 15) & ~15)
-#define PROLOGUE_OFFSET 8 +/* Tail call offset to jump into */ +#define PROLOGUE_OFFSET 7
static int build_prologue(struct jit_ctx *ctx) { @@ -200,19 +201,19 @@ static int build_prologue(struct jit_ctx /* Initialize tail_call_cnt */ emit(A64_MOVZ(1, tcc, 0, 0), ctx);
- /* 4 byte extra for skb_copy_bits buffer */ - ctx->stack_size = prog->aux->stack_depth + 4; - ctx->stack_size = STACK_ALIGN(ctx->stack_size); - - /* Set up function call stack */ - emit(A64_SUB_I(1, A64_SP, A64_SP, ctx->stack_size), ctx); - cur_offset = ctx->idx - idx0; if (cur_offset != PROLOGUE_OFFSET) { pr_err_once("PROLOGUE_OFFSET = %d, expected %d!\n", cur_offset, PROLOGUE_OFFSET); return -1; } + + /* 4 byte extra for skb_copy_bits buffer */ + ctx->stack_size = prog->aux->stack_depth + 4; + ctx->stack_size = STACK_ALIGN(ctx->stack_size); + + /* Set up function call stack */ + emit(A64_SUB_I(1, A64_SP, A64_SP, ctx->stack_size), ctx); return 0; }
@@ -260,11 +261,12 @@ static int emit_bpf_tail_call(struct jit emit(A64_LDR64(prg, tmp, prg), ctx); emit(A64_CBZ(1, prg, jmp_offset), ctx);
- /* goto *(prog->bpf_func + prologue_size); */ + /* goto *(prog->bpf_func + prologue_offset); */ off = offsetof(struct bpf_prog, bpf_func); emit_a64_mov_i64(tmp, off, ctx); emit(A64_LDR64(tmp, prg, tmp), ctx); emit(A64_ADD_I(1, tmp, tmp, sizeof(u32) * PROLOGUE_OFFSET), ctx); + emit(A64_ADD_I(1, A64_SP, A64_SP, ctx->stack_size), ctx); emit(A64_BR(tmp), ctx);
/* out: */
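The SP accounting across a tail call can be checked with a toy model; the stack sizes below are hypothetical, the point being that the caller's reservation must be undone before branching past the callee's tail-call offset, where the callee re-runs its own stack setup:

  #include <assert.h>

  int main(void)
  {
          long sp = 0;
          int stack_a = 64, stack_b = 128;  /* hypothetical stack_size values */

          sp -= stack_a;  /* prog A prologue: set up A's call stack */
          sp += stack_a;  /* the fix: new A64_ADD_I before the branch */
          sp -= stack_b;  /* prog B's stack setup, entered past PROLOGUE_OFFSET */

          assert(sp == -stack_b);  /* balanced; no drift per tail call */
          return 0;
  }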
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit 56026645e2b6f11ede34a5e6ab69d3eb56f9c8fc upstream.
After commit aa7519af450d (cpufreq: Use transition_delay_us for legacy governors as well), the sampling_rate field of struct dbs_data may be less than the tick period, which causes dbs_update() to produce incorrect results, so make the code ensure that the value of that field will always be sufficiently large.
Fixes: aa7519af450d (cpufreq: Use transition_delay_us for legacy governors as well)
Reported-by: Andy Tang andy.tang@nxp.com
Reported-by: Doug Smythies dsmythies@telus.net
Tested-by: Andy Tang andy.tang@nxp.com
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Acked-by: Viresh Kumar viresh.kumar@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/cpufreq/cpufreq_governor.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)
--- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -22,6 +22,8 @@
#include "cpufreq_governor.h"
+#define CPUFREQ_DBS_MIN_SAMPLING_INTERVAL (2 * TICK_NSEC / NSEC_PER_USEC) + static DEFINE_PER_CPU(struct cpu_dbs_info, cpu_dbs);
static DEFINE_MUTEX(gov_dbs_data_mutex); @@ -47,11 +49,15 @@ ssize_t store_sampling_rate(struct gov_a { struct dbs_data *dbs_data = to_dbs_data(attr_set); struct policy_dbs_info *policy_dbs; + unsigned int sampling_interval; int ret; - ret = sscanf(buf, "%u", &dbs_data->sampling_rate); - if (ret != 1) + + ret = sscanf(buf, "%u", &sampling_interval); + if (ret != 1 || sampling_interval < CPUFREQ_DBS_MIN_SAMPLING_INTERVAL) return -EINVAL;
+ dbs_data->sampling_rate = sampling_interval; + /* * We are operating under dbs_data->mutex and so the list and its * entries can't be freed concurrently. @@ -430,7 +436,14 @@ int cpufreq_dbs_governor_init(struct cpu if (ret) goto free_policy_dbs_info;
- dbs_data->sampling_rate = cpufreq_policy_transition_delay_us(policy); + /* + * The sampling interval should not be less than the transition latency + * of the CPU and it also cannot be too small for dbs_update() to work + * correctly. + */ + dbs_data->sampling_rate = max_t(unsigned int, + CPUFREQ_DBS_MIN_SAMPLING_INTERVAL, + cpufreq_policy_transition_delay_us(policy));
if (!have_governor_per_policy()) gov->gdbs_data = dbs_data;
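As a worked example, assuming HZ=250 (a 4 ms tick) and the usual NSEC_PER_* constants, the new floor comes out to 8000 us; a userspace re-computation, with TICK_NSEC simplified to NSEC_PER_SEC / HZ:

  #include <stdio.h>

  #define HZ            250
  #define NSEC_PER_SEC  1000000000L
  #define NSEC_PER_USEC 1000L
  #define TICK_NSEC     (NSEC_PER_SEC / HZ)   /* 4,000,000 ns */

  int main(void)
  {
          long min_us = 2 * TICK_NSEC / NSEC_PER_USEC;

          printf("CPUFREQ_DBS_MIN_SAMPLING_INTERVAL = %ld us\n", min_us);
          return 0;  /* prints 8000 */
  }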
On 01/29/2018 05:56 AM, Greg Kroah-Hartman wrote:
Compiled and booted on my test system. No dmesg regressions.
thanks,
-- Shuah
On 29 January 2018 at 18:26, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary
------------------------------------------------------------------------
kernel: 4.14.16-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.14.y
git commit: d5ba39515c18189ffc6b6273a0de22ebb8581233
git describe: v4.14.15-73-gd5ba39515c18
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.15-73...
No regressions (compared to build v4.14.15-72-g19110e135e1e)
Boards, architectures and test suites:
-------------------------------------
hi6220-hikey - arm64
* boot - pass: 20
* kselftest - pass: 46, skip: 16
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - pass: 60
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - pass: 14
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - pass: 983, skip: 121
* ltp-timers-tests - pass: 12
juno-r2 - arm64
* boot - pass: 20
* kselftest - pass: 46, skip: 16
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - pass: 60
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 22
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - pass: 14
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - pass: 987, skip: 121
* ltp-timers-tests - pass: 12
x15 - arm
* boot - pass: 20
* kselftest - pass: 43, skip: 18
* libhugetlbfs - pass: 87, skip: 1
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - pass: 60
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 20, skip: 2
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - pass: 13, skip: 1
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - pass: 1037, skip: 66
* ltp-timers-tests - pass: 12
x86_64
* boot - pass: 20
* kselftest - pass: 56, fail: 1, skip: 18
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - pass: 61, skip: 1
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 22
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - pass: 9, skip: 1
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - pass: 1016, skip: 116
* ltp-timers-tests - pass: 12
Documentation - https://collaborate.linaro.org/display/LKFT/Email+Reports
Tested-by: Naresh Kamboju naresh.kamboju@linaro.org
On Tue, Jan 30, 2018 at 03:36:08PM +0530, Naresh Kamboju wrote:
Thanks for testing all of these and letting me know.
greg k-h
On 01/29/2018 04:56 AM, Greg Kroah-Hartman wrote:
Build results:
	total: 145 pass: 145 fail: 0
Qemu test results:
	total: 126 pass: 126 fail: 0
Details are available at http://kerneltests.org/builders.
Guenter
On Tue, Jan 30, 2018 at 06:21:27AM -0800, Guenter Roeck wrote:
Thanks for testing all of these and letting me know.
greg k-h