This is the start of the stable review cycle for the 4.14.115 release. There are 53 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu 02 May 2019 11:34:49 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.115-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.14.115-rc1
ZhangXiaoxu zhangxiaoxu5@huawei.com ipv4: set the tcp_min_rtt_wlen range from 0 to one day
Eric Dumazet edumazet@google.com net/rose: fix unbound loop in rose_loopback_timer()
Kees Cook keescook@chromium.org net/rose: Convert timers to use timer_setup()
Hangbin Liu liuhangbin@gmail.com team: fix possible recursive locking when add slaves
Su Bao Cheng baocheng.su@siemens.com stmmac: pci: Adjust IOT2000 matching
Vinod Koul vkoul@kernel.org net: stmmac: move stmmac_check_ether_addr() to driver probe
Zhu Yanjun yanjun.zhu@oracle.com net: rds: exchange of 8K and 1M pool
Erez Alfasi ereza@mellanox.com net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query
Amit Cohen amitc@mellanox.com mlxsw: spectrum: Fix autoneg status in ethtool
Eric Dumazet edumazet@google.com ipv4: add sanity checks in ipv4_link_failure()
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "block/loop: Use global lock for ioctl() operation."
Jan Kara jack@suse.cz mm: Fix warning in insert_pfn()
Daniel Borkmann daniel@iogearbox.net x86/retpolines: Disable switch jump tables when retpolines are enabled
Daniel Borkmann daniel@iogearbox.net x86, retpolines: Raise limit for generating indirect calls from switch-case
Mikulas Patocka mpatocka@redhat.com dm integrity: change memcmp to strncmp in dm_integrity_ctr
Xin Long lucien.xin@gmail.com tipc: check link name with right length in tipc_nl_compat_link_set
Xin Long lucien.xin@gmail.com tipc: check bearer name with right length in tipc_nl_compat_bearer_enable
Yue Haibing yuehaibing@huawei.com fm10k: Fix a potential NULL pointer dereference
Florian Westphal fw@strlen.de netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
luca abeni luca.abeni@santannapisa.it sched/deadline: Correctly handle active 0-lag timers
Todd Kjos tkjos@android.com binder: fix handling of misaligned binder object
Andrea Claudi aclaudi@redhat.com ipvs: fix warning on unused variable
YueHaibing yuehaibing@huawei.com fs/proc/proc_sysctl.c: Fix a NULL pointer dereference
Alexander Shishkin alexander.shishkin@linux.intel.com intel_th: gth: Fix an off-by-one in output unassigning
Linus Torvalds torvalds@linux-foundation.org slip: make slhc_free() silently accept an error pointer
Xin Long lucien.xin@gmail.com tipc: handle the err returned from cmd header function
Adalbert Lazăr alazar@bitdefender.com vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock
Dan Carpenter dan.carpenter@oracle.com ext4: fix some error pointer dereferences
Kai-Heng Feng kai.heng.feng@canonical.com USB: Consolidate LPM checks to avoid enabling LPM twice
Kai-Heng Feng kai.heng.feng@canonical.com USB: Add new USB LPM helpers
Maarten Lankhorst maarten.lankhorst@linux.intel.com drm/vc4: Fix compilation error reported by kbuild test bot
Dave Airlie airlied@redhat.com Revert "drm/i915/fbdev: Actually configure untiled displays"
Maarten Lankhorst maarten.lankhorst@linux.intel.com drm/vc4: Fix memory leak during gpu reset.
Ard Biesheuvel ard.biesheuvel@linaro.org ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache
Dirk Behme dirk.behme@de.bosch.com dmaengine: sh: rcar-dmac: With cyclic DMA residue 0 is valid
Alex Williamson alex.williamson@redhat.com vfio/type1: Limit DMA mappings per container
Lucas Stach l.stach@pengutronix.de Input: synaptics-rmi4 - write config register values to the right offset
NeilBrown neilb@suse.com sunrpc: don't mark uninitialised items as VALID.
Trond Myklebust trondmy@gmail.com nfsd: Don't release the callback slot unless it was actually held
Yan, Zheng zyan@redhat.com ceph: fix ci->i_head_snapc leak
Jeff Layton jlayton@kernel.org ceph: ensure d_name stability in ceph_dentry_hash()
Jeff Layton jlayton@kernel.org ceph: only use d_name directly when parent is locked
Xie XiuQi xiexiuqi@huawei.com sched/numa: Fix a possible divide-by-zero
Josh Collier josh.d.collier@intel.com IB/rdmavt: Fix frwr memory registration
Peter Zijlstra peterz@infradead.org trace: Fix preempt_enable_no_resched() abuse
Aurelien Jarno aurelien@aurel32.net MIPS: scall64-o32: Fix indirect syscall number load
YueHaibing yuehaibing@huawei.com lib/Kconfig.debug: fix build error without CONFIG_BLOCK
Jérôme Glisse jglisse@redhat.com zram: pass down the bvec we need to read into in the work struct
Jann Horn jannh@google.com tracing: Fix buffer_ref pipe ops
Wenwen Wang wang6495@umn.edu tracing: Fix a memory leak by early error exit in trace_pid_write()
Frank Sorenson sorenson@redhat.com cifs: do not attempt cifs operation on smb2+ rename error
Masahiro Yamada yamada.masahiro@socionext.com kbuild: simplify ld-option implementation
-------------
Diffstat:
Documentation/networking/ip-sysctl.txt | 1 + Makefile | 4 +- arch/arm/boot/compressed/head.S | 16 ++++- arch/mips/kernel/scall64-o32.S | 2 +- arch/x86/Makefile | 9 +++ drivers/android/binder_alloc.c | 18 +++--- drivers/block/loop.c | 58 +++++++++--------- drivers/block/loop.h | 1 + drivers/block/zram/zram_drv.c | 5 +- drivers/dma/sh/rcar-dmac.c | 4 +- drivers/gpu/drm/i915/intel_fbdev.c | 12 ++-- drivers/gpu/drm/vc4/vc4_crtc.c | 2 +- drivers/hwtracing/intel_th/gth.c | 2 +- drivers/infiniband/sw/rdmavt/mr.c | 17 +++--- drivers/input/rmi4/rmi_f11.c | 2 +- drivers/md/dm-integrity.c | 6 +- drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 + .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/port.c | 4 -- drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 4 +- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 +- drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 8 ++- drivers/net/slip/slhc.c | 2 +- drivers/net/team/team.c | 6 ++ drivers/usb/core/driver.c | 23 +++++-- drivers/usb/core/hub.c | 16 ++--- drivers/usb/core/message.c | 3 +- drivers/usb/core/sysfs.c | 5 +- drivers/usb/core/usb.h | 10 +++- drivers/vfio/vfio_iommu_type1.c | 14 +++++ fs/ceph/dir.c | 6 +- fs/ceph/mds_client.c | 70 ++++++++++++++++++---- fs/ceph/snap.c | 7 ++- fs/cifs/inode.c | 4 ++ fs/ext4/xattr.c | 3 + fs/nfs/super.c | 3 +- fs/nfsd/nfs4callback.c | 8 ++- fs/nfsd/state.h | 1 + fs/proc/proc_sysctl.c | 6 +- fs/splice.c | 4 +- include/linux/pipe_fs_i.h | 1 + kernel/sched/deadline.c | 3 +- kernel/sched/fair.c | 4 ++ kernel/trace/ring_buffer.c | 2 +- kernel/trace/trace.c | 33 +++++----- lib/Kconfig.debug | 1 + mm/memory.c | 9 ++- net/bridge/netfilter/ebtables.c | 3 +- net/ipv4/route.c | 32 +++++++--- net/ipv4/sysctl_net_ipv4.c | 5 +- net/netfilter/ipvs/ip_vs_ctl.c | 3 +- net/rds/ib_fmr.c | 11 ++++ net/rds/ib_rdma.c | 3 - net/rose/af_rose.c | 17 +++--- net/rose/rose_link.c | 16 +++-- net/rose/rose_loopback.c | 36 +++++------ net/rose/rose_route.c | 8 +-- net/rose/rose_timer.c | 30 ++++------ net/sunrpc/cache.c | 3 + net/tipc/netlink_compat.c | 24 ++++++-- net/vmw_vsock/virtio_transport_common.c | 22 ++++--- scripts/Kbuild.include | 4 +- 62 files changed, 424 insertions(+), 220 deletions(-)
From: Masahiro Yamada yamada.masahiro@socionext.com
commit 0294e6f4a0006856e1f36b8cd8fa088d9e499e98 upstream.
Currently, linker options are tested by the coordination of $(CC) and $(LD) because $(LD) needs some object to link.
As commit 86a9df597cdd ("kbuild: fix linker feature test macros when cross compiling with Clang") addressed, we need to make sure $(CC) and $(LD) agree the underlying architecture of the passed object.
This could be a bit complex when we combine tools from different groups. For example, we can use clang for $(CC), but we still need to rely on GCC toolchain for $(LD).
So, I was searching for a way of standalone testing of linker options. A trick I found is to use '-v'; this not only prints the version string, but also tests if the given option is recognized.
If a given option is supported,
$ aarch64-linux-gnu-ld -v --fix-cortex-a53-843419 GNU ld (Linaro_Binutils-2017.11) 2.28.2.20170706 $ echo $? 0
If unsupported,
$ aarch64-linux-gnu-ld -v --fix-cortex-a53-843419 GNU ld (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 2.23.1 aarch64-linux-gnu-ld: unrecognized option '--fix-cortex-a53-843419' aarch64-linux-gnu-ld: use the --help option for usage information $ echo $? 1
Gold works likewise.
$ aarch64-linux-gnu-ld.gold -v --fix-cortex-a53-843419 GNU gold (Linaro_Binutils-2017.11 2.28.2.20170706) 1.14 masahiro@pug:~/ref/linux$ echo $? 0 $ aarch64-linux-gnu-ld.gold -v --fix-cortex-a53-999999 GNU gold (Linaro_Binutils-2017.11 2.28.2.20170706) 1.14 aarch64-linux-gnu-ld.gold: --fix-cortex-a53-999999: unknown option aarch64-linux-gnu-ld.gold: use the --help option for usage information $ echo $? 1
LLD too.
$ ld.lld -v --gc-sections LLD 7.0.0 (http://llvm.org/git/lld.git 4a0e4190e74cea19f8a8dc625ccaebdf8b5d1585) (compatible with GNU linkers) $ echo $? 0 $ ld.lld -v --fix-cortex-a53-843419 LLD 7.0.0 (http://llvm.org/git/lld.git 4a0e4190e74cea19f8a8dc625ccaebdf8b5d1585) (compatible with GNU linkers) $ echo $? 0 $ ld.lld -v --fix-cortex-a53-999999 ld.lld: error: unknown argument: --fix-cortex-a53-999999 LLD 7.0.0 (http://llvm.org/git/lld.git 4a0e4190e74cea19f8a8dc625ccaebdf8b5d1585) (compatible with GNU linkers) $ echo $? 1
Signed-off-by: Masahiro Yamada yamada.masahiro@socionext.com Tested-by: Nick Desaulniers ndesaulniers@google.com [nc: try-run-cached was added later, just use try-run, which is the current mainline state] Signed-off-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- scripts/Kbuild.include | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/scripts/Kbuild.include +++ b/scripts/Kbuild.include @@ -165,9 +165,7 @@ cc-ldoption = $(call try-run,\
# ld-option # Usage: LDFLAGS += $(call ld-option, -X) -ld-option = $(call try-run,\ - $(CC) $(KBUILD_CPPFLAGS) $(CC_OPTION_CFLAGS) -x c /dev/null -c -o "$$TMPO"; \ - $(LD) $(LDFLAGS) $(1) "$$TMPO" -o "$$TMP",$(1),$(2)) +ld-option = $(call try-run, $(LD) $(LDFLAGS) $(1) -v,$(1),$(2))
# ar-option # Usage: KBUILD_ARFLAGS := $(call ar-option,D)
From: Frank Sorenson sorenson@redhat.com
commit 652727bbe1b17993636346716ae5867627793647 upstream.
A path-based rename returning EBUSY will incorrectly try opening the file with a cifs (NT Create AndX) operation on an smb2+ mount, which causes the server to force a session close.
If the mount is smb2+, skip the fallback.
Signed-off-by: Frank Sorenson sorenson@redhat.com Signed-off-by: Steve French stfrench@microsoft.com CC: Stable stable@vger.kernel.org Reviewed-by: Ronnie Sahlberg lsahlber@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/inode.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/fs/cifs/inode.c +++ b/fs/cifs/inode.c @@ -1730,6 +1730,10 @@ cifs_do_rename(const unsigned int xid, s if (rc == 0 || rc != -EBUSY) goto do_rename_exit;
+ /* Don't fall back to using SMB on SMB 2+ mount */ + if (server->vals->protocol_id != 0) + goto do_rename_exit; + /* open-file renames don't work across directories */ if (to_dentry->d_parent != from_dentry->d_parent) goto do_rename_exit;
From: Wenwen Wang wang6495@umn.edu
commit 91862cc7867bba4ee5c8fcf0ca2f1d30427b6129 upstream.
In trace_pid_write(), the buffer for trace parser is allocated through kmalloc() in trace_parser_get_init(). Later on, after the buffer is used, it is then freed through kfree() in trace_parser_put(). However, it is possible that trace_pid_write() is terminated due to unexpected errors, e.g., ENOMEM. In that case, the allocated buffer will not be freed, which is a memory leak bug.
To fix this issue, free the allocated buffer when an error is encountered.
Link: http://lkml.kernel.org/r/1555726979-15633-1-git-send-email-wang6495@umn.edu
Fixes: f4d34a87e9c10 ("tracing: Use pid bitmap instead of a pid array for set_event_pid") Cc: stable@vger.kernel.org Signed-off-by: Wenwen Wang wang6495@umn.edu Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/trace/trace.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -494,8 +494,10 @@ int trace_pid_write(struct trace_pid_lis * not modified. */ pid_list = kmalloc(sizeof(*pid_list), GFP_KERNEL); - if (!pid_list) + if (!pid_list) { + trace_parser_put(&parser); return -ENOMEM; + }
pid_list->pid_max = READ_ONCE(pid_max);
@@ -505,6 +507,7 @@ int trace_pid_write(struct trace_pid_lis
pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3); if (!pid_list->pids) { + trace_parser_put(&parser); kfree(pid_list); return -ENOMEM; }
From: Jann Horn jannh@google.com
commit b987222654f84f7b4ca95b3a55eca784cb30235b upstream.
This fixes multiple issues in buffer_pipe_buf_ops:
- The ->steal() handler must not return zero unless the pipe buffer has the only reference to the page. But generic_pipe_buf_steal() assumes that every reference to the pipe is tracked by the page's refcount, which isn't true for these buffers - buffer_pipe_buf_get(), which duplicates a buffer, doesn't touch the page's refcount. Fix it by using generic_pipe_buf_nosteal(), which refuses every attempted theft. It should be easy to actually support ->steal, but the only current users of pipe_buf_steal() are the virtio console and FUSE, and they also only use it as an optimization. So it's probably not worth the effort. - The ->get() and ->release() handlers can be invoked concurrently on pipe buffers backed by the same struct buffer_ref. Make them safe against concurrency by using refcount_t. - The pointers stored in ->private were only zeroed out when the last reference to the buffer_ref was dropped. As far as I know, this shouldn't be necessary anyway, but if we do it, let's always do it.
Link: http://lkml.kernel.org/r/20190404215925.253531-1-jannh@google.com
Cc: Ingo Molnar mingo@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Al Viro viro@zeniv.linux.org.uk Cc: stable@vger.kernel.org Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer") Signed-off-by: Jann Horn jannh@google.com Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/splice.c | 4 ++-- include/linux/pipe_fs_i.h | 1 + kernel/trace/trace.c | 28 ++++++++++++++-------------- 3 files changed, 17 insertions(+), 16 deletions(-)
--- a/fs/splice.c +++ b/fs/splice.c @@ -332,8 +332,8 @@ const struct pipe_buf_operations default .get = generic_pipe_buf_get, };
-static int generic_pipe_buf_nosteal(struct pipe_inode_info *pipe, - struct pipe_buffer *buf) +int generic_pipe_buf_nosteal(struct pipe_inode_info *pipe, + struct pipe_buffer *buf) { return 1; } --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -182,6 +182,7 @@ void free_pipe_info(struct pipe_inode_in void generic_pipe_buf_get(struct pipe_inode_info *, struct pipe_buffer *); int generic_pipe_buf_confirm(struct pipe_inode_info *, struct pipe_buffer *); int generic_pipe_buf_steal(struct pipe_inode_info *, struct pipe_buffer *); +int generic_pipe_buf_nosteal(struct pipe_inode_info *, struct pipe_buffer *); void generic_pipe_buf_release(struct pipe_inode_info *, struct pipe_buffer *); void pipe_buf_mark_unmergeable(struct pipe_buffer *buf);
--- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -6719,19 +6719,23 @@ struct buffer_ref { struct ring_buffer *buffer; void *page; int cpu; - int ref; + refcount_t refcount; };
+static void buffer_ref_release(struct buffer_ref *ref) +{ + if (!refcount_dec_and_test(&ref->refcount)) + return; + ring_buffer_free_read_page(ref->buffer, ref->cpu, ref->page); + kfree(ref); +} + static void buffer_pipe_buf_release(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { struct buffer_ref *ref = (struct buffer_ref *)buf->private;
- if (--ref->ref) - return; - - ring_buffer_free_read_page(ref->buffer, ref->cpu, ref->page); - kfree(ref); + buffer_ref_release(ref); buf->private = 0; }
@@ -6740,7 +6744,7 @@ static void buffer_pipe_buf_get(struct p { struct buffer_ref *ref = (struct buffer_ref *)buf->private;
- ref->ref++; + refcount_inc(&ref->refcount); }
/* Pipe buffer operations for a buffer. */ @@ -6748,7 +6752,7 @@ static const struct pipe_buf_operations .can_merge = 0, .confirm = generic_pipe_buf_confirm, .release = buffer_pipe_buf_release, - .steal = generic_pipe_buf_steal, + .steal = generic_pipe_buf_nosteal, .get = buffer_pipe_buf_get, };
@@ -6761,11 +6765,7 @@ static void buffer_spd_release(struct sp struct buffer_ref *ref = (struct buffer_ref *)spd->partial[i].private;
- if (--ref->ref) - return; - - ring_buffer_free_read_page(ref->buffer, ref->cpu, ref->page); - kfree(ref); + buffer_ref_release(ref); spd->partial[i].private = 0; }
@@ -6820,7 +6820,7 @@ tracing_buffers_splice_read(struct file break; }
- ref->ref = 1; + refcount_set(&ref->refcount, 1); ref->buffer = iter->trace_buffer->buffer; ref->page = ring_buffer_alloc_read_page(ref->buffer, iter->cpu_file); if (IS_ERR(ref->page)) {
From: Jérôme Glisse jglisse@redhat.com
commit e153abc0739ff77bd89c9ba1688cdb963464af97 upstream.
When scheduling work item to read page we need to pass down the proper bvec struct which points to the page to read into. Before this patch it uses a randomly initialized bvec (only if PAGE_SIZE != 4096) which is wrong.
Note that without this patch on arch/kernel where PAGE_SIZE != 4096 userspace could read random memory through a zram block device (thought userspace probably would have no control on the address being read).
Link: http://lkml.kernel.org/r/20190408183219.26377-1-jglisse@redhat.com Signed-off-by: Jérôme Glisse jglisse@redhat.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Reviewed-by: Sergey Senozhatsky sergey.senozhatsky@gmail.com Acked-by: Minchan Kim minchan@kernel.org Cc: Nitin Gupta ngupta@vflare.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/block/zram/zram_drv.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -488,18 +488,18 @@ struct zram_work { struct zram *zram; unsigned long entry; struct bio *bio; + struct bio_vec bvec; };
#if PAGE_SIZE != 4096 static void zram_sync_read(struct work_struct *work) { - struct bio_vec bvec; struct zram_work *zw = container_of(work, struct zram_work, work); struct zram *zram = zw->zram; unsigned long entry = zw->entry; struct bio *bio = zw->bio;
- read_from_bdev_async(zram, &bvec, entry, bio); + read_from_bdev_async(zram, &zw->bvec, entry, bio); }
/* @@ -512,6 +512,7 @@ static int read_from_bdev_sync(struct zr { struct zram_work work;
+ work.bvec = *bvec; work.zram = zram; work.entry = entry; work.bio = bio;
From: YueHaibing yuehaibing@huawei.com
commit ae3d6a323347940f0548bbb4b17f0bb2e9164169 upstream.
If CONFIG_TEST_KMOD is set to M, while CONFIG_BLOCK is not set, XFS and BTRFS can not be compiled successly.
Link: http://lkml.kernel.org/r/20190410075434.35220-1-yuehaibing@huawei.com Fixes: d9c6a72d6fa2 ("kmod: add test driver to stress test the module loader") Signed-off-by: YueHaibing yuehaibing@huawei.com Reported-by: Hulk Robot hulkci@huawei.com Reviewed-by: Kees Cook keescook@chromium.org Cc: Masahiro Yamada yamada.masahiro@socionext.com Cc: Petr Mladek pmladek@suse.com Cc: Andy Shevchenko andriy.shevchenko@linux.intel.com Cc: Matthew Wilcox willy@infradead.org Cc: Joe Lawrence joe.lawrence@redhat.com Cc: Robin Murphy robin.murphy@arm.com Cc: Luis Chamberlain mcgrof@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- lib/Kconfig.debug | 1 + 1 file changed, 1 insertion(+)
--- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1884,6 +1884,7 @@ config TEST_KMOD depends on m depends on BLOCK && (64BIT || LBDAF) # for XFS, BTRFS depends on NETDEVICES && NET_CORE && INET # for TUN + depends on BLOCK select TEST_LKM select XFS_FS select TUN
From: Aurelien Jarno aurelien@aurel32.net
commit 79b4a9cf0e2ea8203ce777c8d5cfa86c71eae86e upstream.
Commit 4c21b8fd8f14 (MIPS: seccomp: Handle indirect system calls (o32)) added indirect syscall detection for O32 processes running on MIPS64, but it did not work correctly for big endian kernel/processes. The reason is that the syscall number is loaded from ARG1 using the lw instruction while this is a 64-bit value, so zero is loaded instead of the syscall number.
Fix the code by using the ld instruction instead. When running a 32-bit processes on a 64 bit CPU, the values are properly sign-extended, so it ensures the value passed to syscall_trace_enter is correct.
Recent systemd versions with seccomp enabled whitelist the getpid syscall for their internal processes (e.g. systemd-journald), but call it through syscall(SYS_getpid). This fix therefore allows O32 big endian systems with a 64-bit kernel to run recent systemd versions.
Signed-off-by: Aurelien Jarno aurelien@aurel32.net Cc: stable@vger.kernel.org # v3.15+ Reviewed-by: Philippe Mathieu-Daudé f4bug@amsat.org Signed-off-by: Paul Burton paul.burton@mips.com Cc: Ralf Baechle ralf@linux-mips.org Cc: James Hogan jhogan@kernel.org Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/mips/kernel/scall64-o32.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/mips/kernel/scall64-o32.S +++ b/arch/mips/kernel/scall64-o32.S @@ -125,7 +125,7 @@ trace_a_syscall: subu t1, v0, __NR_O32_Linux move a1, v0 bnez t1, 1f /* __NR_syscall at offset 0 */ - lw a1, PT_R4(sp) /* Arg1 for __NR_syscall case */ + ld a1, PT_R4(sp) /* Arg1 for __NR_syscall case */ .set pop
1: jal syscall_trace_enter
From: Peter Zijlstra peterz@infradead.org
commit d6097c9e4454adf1f8f2c9547c2fa6060d55d952 upstream.
Unless the very next line is schedule(), or implies it, one must not use preempt_enable_no_resched(). It can cause a preemption to go missing and thereby cause arbitrary delays, breaking the PREEMPT=y invariant.
Link: http://lkml.kernel.org/r/20190423200318.GY14281@hirez.programming.kicks-ass....
Cc: Waiman Long longman@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Ingo Molnar mingo@redhat.com Cc: Will Deacon will.deacon@arm.com Cc: Thomas Gleixner tglx@linutronix.de Cc: the arch/x86 maintainers x86@kernel.org Cc: Davidlohr Bueso dave@stgolabs.net Cc: Tim Chen tim.c.chen@linux.intel.com Cc: huang ying huang.ying.caritas@gmail.com Cc: Roman Gushchin guro@fb.com Cc: Alexei Starovoitov ast@kernel.org Cc: Daniel Borkmann daniel@iogearbox.net Cc: stable@vger.kernel.org Fixes: 2c2d7329d8af ("tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()") Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/trace/ring_buffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -700,7 +700,7 @@ u64 ring_buffer_time_stamp(struct ring_b
preempt_disable_notrace(); time = rb_time_stamp(buffer); - preempt_enable_no_resched_notrace(); + preempt_enable_notrace();
return time; }
From: Josh Collier josh.d.collier@intel.com
commit 7c39f7f671d2acc0a1f39ebbbee4303ad499bbfa upstream.
Current implementation was not properly handling frwr memory registrations. This was uncovered by commit 27f26cec761das ("xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR)") in which xprtrdma, which is used for NFS over RDMA, started failing as it was the first ULP to modify the ib_mr iova resulting in the NFS server getting REMOTE ACCESS ERROR when attempting to perform RDMA Writes to the client.
The fix is to properly capture the true iova, offset, and length in the call to ib_map_mr_sg, and then update the iova when processing the IB_WR_REG_MEM on the send queue.
Fixes: a41081aa5936 ("IB/rdmavt: Add support for ib_map_mr_sg") Cc: stable@vger.kernel.org Reviewed-by: Mike Marciniszyn mike.marciniszyn@intel.com Reviewed-by: Dennis Dalessandro dennis.dalessandro@intel.com Reviewed-by: Michael J. Ruhl michael.j.ruhl@intel.com Signed-off-by: Josh Collier josh.d.collier@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/sw/rdmavt/mr.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-)
--- a/drivers/infiniband/sw/rdmavt/mr.c +++ b/drivers/infiniband/sw/rdmavt/mr.c @@ -611,11 +611,6 @@ static int rvt_set_page(struct ib_mr *ib if (unlikely(mapped_segs == mr->mr.max_segs)) return -ENOMEM;
- if (mr->mr.length == 0) { - mr->mr.user_base = addr; - mr->mr.iova = addr; - } - m = mapped_segs / RVT_SEGSZ; n = mapped_segs % RVT_SEGSZ; mr->mr.map[m]->segs[n].vaddr = (void *)addr; @@ -633,17 +628,24 @@ static int rvt_set_page(struct ib_mr *ib * @sg_nents: number of entries in sg * @sg_offset: offset in bytes into sg * + * Overwrite rvt_mr length with mr length calculated by ib_sg_to_pages. + * * Return: number of sg elements mapped to the memory region */ int rvt_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents, unsigned int *sg_offset) { struct rvt_mr *mr = to_imr(ibmr); + int ret;
mr->mr.length = 0; mr->mr.page_shift = PAGE_SHIFT; - return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, - rvt_set_page); + ret = ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, rvt_set_page); + mr->mr.user_base = ibmr->iova; + mr->mr.iova = ibmr->iova; + mr->mr.offset = ibmr->iova - (u64)mr->mr.map[0]->segs[0].vaddr; + mr->mr.length = (size_t)ibmr->length; + return ret; }
/** @@ -674,6 +676,7 @@ int rvt_fast_reg_mr(struct rvt_qp *qp, s ibmr->rkey = key; mr->mr.lkey = key; mr->mr.access_flags = access; + mr->mr.iova = ibmr->iova; atomic_set(&mr->mr.lkey_invalid, 0);
return 0;
From: Xie XiuQi xiexiuqi@huawei.com
commit a860fa7b96e1a1c974556327aa1aee852d434c21 upstream.
sched_clock_cpu() may not be consistent between CPUs. If a task migrates to another CPU, then se.exec_start is set to that CPU's rq_clock_task() by update_stats_curr_start(). Specifically, the new value might be before the old value due to clock skew.
So then if in numa_get_avg_runtime() the expression:
'now - p->last_task_numa_placement'
ends up as -1, then the divider '*period + 1' in task_numa_placement() is 0 and things go bang. Similar to update_curr(), check if time goes backwards to avoid this.
[ peterz: Wrote new changelog. ] [ mingo: Tweaked the code comment. ]
Signed-off-by: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: cj.chengjian@huawei.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20190425080016.GX11158@hirez.programming.kicks-ass.... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/sched/fair.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2026,6 +2026,10 @@ static u64 numa_get_avg_runtime(struct t if (p->last_task_numa_placement) { delta = runtime - p->last_sum_exec_runtime; *period = now - p->last_task_numa_placement; + + /* Avoid time going backwards, prevent potential divide error: */ + if (unlikely((s64)*period < 0)) + *period = 0; } else { delta = p->se.avg.load_sum / p->se.load.weight; *period = LOAD_AVG_MAX;
From: Jeff Layton jlayton@kernel.org
commit 1bcb344086f3ecf8d6705f6d708441baa823beb3 upstream.
Ben reported tripping the BUG_ON in create_request_message during some performance testing. Analysis of the vmcore showed that the length of the r_dentry->d_name string changed after we allocated the buffer, but before we encoded it.
build_dentry_path returns pointers to d_name in the common case of non-snapped dentries, but this optimization isn't safe unless the parent directory is locked. When it isn't, have the code make a copy of the d_name while holding the d_lock.
Cc: stable@vger.kernel.org Reported-by: Ben England bengland@redhat.com Signed-off-by: Jeff Layton jlayton@kernel.org Reviewed-by: "Yan, Zheng" zyan@redhat.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ceph/mds_client.c | 61 +++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 50 insertions(+), 11 deletions(-)
--- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -1863,10 +1863,39 @@ retry: return path; }
+/* Duplicate the dentry->d_name.name safely */ +static int clone_dentry_name(struct dentry *dentry, const char **ppath, + int *ppathlen) +{ + u32 len; + char *name; + +retry: + len = READ_ONCE(dentry->d_name.len); + name = kmalloc(len + 1, GFP_NOFS); + if (!name) + return -ENOMEM; + + spin_lock(&dentry->d_lock); + if (dentry->d_name.len != len) { + spin_unlock(&dentry->d_lock); + kfree(name); + goto retry; + } + memcpy(name, dentry->d_name.name, len); + spin_unlock(&dentry->d_lock); + + name[len] = '\0'; + *ppath = name; + *ppathlen = len; + return 0; +} + static int build_dentry_path(struct dentry *dentry, struct inode *dir, const char **ppath, int *ppathlen, u64 *pino, - int *pfreepath) + bool *pfreepath, bool parent_locked) { + int ret; char *path;
rcu_read_lock(); @@ -1875,8 +1904,15 @@ static int build_dentry_path(struct dent if (dir && ceph_snap(dir) == CEPH_NOSNAP) { *pino = ceph_ino(dir); rcu_read_unlock(); - *ppath = dentry->d_name.name; - *ppathlen = dentry->d_name.len; + if (parent_locked) { + *ppath = dentry->d_name.name; + *ppathlen = dentry->d_name.len; + } else { + ret = clone_dentry_name(dentry, ppath, ppathlen); + if (ret) + return ret; + *pfreepath = true; + } return 0; } rcu_read_unlock(); @@ -1884,13 +1920,13 @@ static int build_dentry_path(struct dent if (IS_ERR(path)) return PTR_ERR(path); *ppath = path; - *pfreepath = 1; + *pfreepath = true; return 0; }
static int build_inode_path(struct inode *inode, const char **ppath, int *ppathlen, u64 *pino, - int *pfreepath) + bool *pfreepath) { struct dentry *dentry; char *path; @@ -1906,7 +1942,7 @@ static int build_inode_path(struct inode if (IS_ERR(path)) return PTR_ERR(path); *ppath = path; - *pfreepath = 1; + *pfreepath = true; return 0; }
@@ -1917,7 +1953,7 @@ static int build_inode_path(struct inode static int set_request_path_attr(struct inode *rinode, struct dentry *rdentry, struct inode *rdiri, const char *rpath, u64 rino, const char **ppath, int *pathlen, - u64 *ino, int *freepath) + u64 *ino, bool *freepath, bool parent_locked) { int r = 0;
@@ -1927,7 +1963,7 @@ static int set_request_path_attr(struct ceph_snap(rinode)); } else if (rdentry) { r = build_dentry_path(rdentry, rdiri, ppath, pathlen, ino, - freepath); + freepath, parent_locked); dout(" dentry %p %llx/%.*s\n", rdentry, *ino, *pathlen, *ppath); } else if (rpath || rino) { @@ -1953,7 +1989,7 @@ static struct ceph_msg *create_request_m const char *path2 = NULL; u64 ino1 = 0, ino2 = 0; int pathlen1 = 0, pathlen2 = 0; - int freepath1 = 0, freepath2 = 0; + bool freepath1 = false, freepath2 = false; int len; u16 releases; void *p, *end; @@ -1961,16 +1997,19 @@ static struct ceph_msg *create_request_m
ret = set_request_path_attr(req->r_inode, req->r_dentry, req->r_parent, req->r_path1, req->r_ino1.ino, - &path1, &pathlen1, &ino1, &freepath1); + &path1, &pathlen1, &ino1, &freepath1, + test_bit(CEPH_MDS_R_PARENT_LOCKED, + &req->r_req_flags)); if (ret < 0) { msg = ERR_PTR(ret); goto out; }
+ /* If r_old_dentry is set, then assume that its parent is locked */ ret = set_request_path_attr(NULL, req->r_old_dentry, req->r_old_dentry_dir, req->r_path2, req->r_ino2.ino, - &path2, &pathlen2, &ino2, &freepath2); + &path2, &pathlen2, &ino2, &freepath2, true); if (ret < 0) { msg = ERR_PTR(ret); goto out_free1;
From: Jeff Layton jlayton@kernel.org
commit 76a495d666e5043ffc315695f8241f5e94a98849 upstream.
Take the d_lock here to ensure that d_name doesn't change.
Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton jlayton@kernel.org Reviewed-by: "Yan, Zheng" zyan@redhat.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ceph/dir.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -1454,6 +1454,7 @@ void ceph_dentry_lru_del(struct dentry * unsigned ceph_dentry_hash(struct inode *dir, struct dentry *dn) { struct ceph_inode_info *dci = ceph_inode(dir); + unsigned hash;
switch (dci->i_dir_layout.dl_dir_hash) { case 0: /* for backward compat */ @@ -1461,8 +1462,11 @@ unsigned ceph_dentry_hash(struct inode * return dn->d_name.hash;
default: - return ceph_str_hash(dci->i_dir_layout.dl_dir_hash, + spin_lock(&dn->d_lock); + hash = ceph_str_hash(dci->i_dir_layout.dl_dir_hash, dn->d_name.name, dn->d_name.len); + spin_unlock(&dn->d_lock); + return hash; } }
From: Yan, Zheng zyan@redhat.com
commit 37659182bff1eeaaeadcfc8f853c6d2b6dbc3f47 upstream.
We missed two places that i_wrbuffer_ref_head, i_wr_ref, i_dirty_caps and i_flushing_caps may change. When they are all zeros, we should free i_head_snapc.
Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/38224 Reported-and-tested-by: Luis Henriques lhenriques@suse.com Signed-off-by: "Yan, Zheng" zyan@redhat.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ceph/mds_client.c | 9 +++++++++ fs/ceph/snap.c | 7 ++++++- 2 files changed, 15 insertions(+), 1 deletion(-)
--- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -1219,6 +1219,15 @@ static int remove_session_caps_cb(struct list_add(&ci->i_prealloc_cap_flush->i_list, &to_remove); ci->i_prealloc_cap_flush = NULL; } + + if (drop && + ci->i_wrbuffer_ref_head == 0 && + ci->i_wr_ref == 0 && + ci->i_dirty_caps == 0 && + ci->i_flushing_caps == 0) { + ceph_put_snap_context(ci->i_head_snapc); + ci->i_head_snapc = NULL; + } } spin_unlock(&ci->i_ceph_lock); while (!list_empty(&to_remove)) { --- a/fs/ceph/snap.c +++ b/fs/ceph/snap.c @@ -568,7 +568,12 @@ void ceph_queue_cap_snap(struct ceph_ino old_snapc = NULL;
update_snapc: - if (ci->i_head_snapc) { + if (ci->i_wrbuffer_ref_head == 0 && + ci->i_wr_ref == 0 && + ci->i_dirty_caps == 0 && + ci->i_flushing_caps == 0) { + ci->i_head_snapc = NULL; + } else { ci->i_head_snapc = ceph_get_snap_context(new_snapc); dout(" new snapc is %p\n", new_snapc); }
From: Trond Myklebust trondmy@gmail.com
commit e6abc8caa6deb14be2a206253f7e1c5e37e9515b upstream.
If there are multiple callbacks queued, waiting for the callback slot when the callback gets shut down, then they all currently end up acting as if they hold the slot, and call nfsd4_cb_sequence_done() resulting in interesting side-effects.
In addition, the 'retry_nowait' path in nfsd4_cb_sequence_done() causes a loop back to nfsd4_cb_prepare() without first freeing the slot, which causes a deadlock when nfsd41_cb_get_slot() gets called a second time.
This patch therefore adds a boolean to track whether or not the callback did pick up the slot, so that it can do the right thing in these 2 cases.
Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/nfsd/nfs4callback.c | 8 +++++++- fs/nfsd/state.h | 1 + 2 files changed, 8 insertions(+), 1 deletion(-)
--- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -939,8 +939,9 @@ static void nfsd4_cb_prepare(struct rpc_ cb->cb_seq_status = 1; cb->cb_status = 0; if (minorversion) { - if (!nfsd41_cb_get_slot(clp, task)) + if (!cb->cb_holds_slot && !nfsd41_cb_get_slot(clp, task)) return; + cb->cb_holds_slot = true; } rpc_call_start(task); } @@ -967,6 +968,9 @@ static bool nfsd4_cb_sequence_done(struc return true; }
+ if (!cb->cb_holds_slot) + goto need_restart; + switch (cb->cb_seq_status) { case 0: /* @@ -1004,6 +1008,7 @@ static bool nfsd4_cb_sequence_done(struc cb->cb_seq_status); }
+ cb->cb_holds_slot = false; clear_bit(0, &clp->cl_cb_slot_busy); rpc_wake_up_next(&clp->cl_cb_waitq); dprintk("%s: freed slot, new seqid=%d\n", __func__, @@ -1211,6 +1216,7 @@ void nfsd4_init_cb(struct nfsd4_callback cb->cb_seq_status = 1; cb->cb_status = 0; cb->cb_need_restart = false; + cb->cb_holds_slot = false; }
void nfsd4_run_cb(struct nfsd4_callback *cb) --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -69,6 +69,7 @@ struct nfsd4_callback { int cb_seq_status; int cb_status; bool cb_need_restart; + bool cb_holds_slot; };
struct nfsd4_callback_ops {
From: NeilBrown neilb@suse.com
commit d58431eacb226222430940134d97bfd72f292fcd upstream.
A recent commit added a call to cache_fresh_locked() when an expired item was found. The call sets the CACHE_VALID flag, so it is important that the item actually is valid. There are two ways it could be valid: 1/ If ->update has been called to fill in relevant content 2/ if CACHE_NEGATIVE is set, to say that content doesn't exist.
An expired item that is waiting for an update will be neither. Setting CACHE_VALID will mean that a subsequent call to cache_put() will be likely to dereference uninitialised pointers.
So we must make sure the item is valid, and we already have code to do that in try_to_negate_entry(). This takes the hash lock and so cannot be used directly, so take out the two lines that we need and use them.
Now cache_fresh_locked() is certain to be called only on a valid item.
Cc: stable@kernel.org # 2.6.35 Fixes: 4ecd55ea0742 ("sunrpc: fix cache_head leak due to queued request") Signed-off-by: NeilBrown neilb@suse.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/sunrpc/cache.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/sunrpc/cache.c +++ b/net/sunrpc/cache.c @@ -54,6 +54,7 @@ static void cache_init(struct cache_head h->last_refresh = now; }
+static inline int cache_is_valid(struct cache_head *h); static void cache_fresh_locked(struct cache_head *head, time_t expiry, struct cache_detail *detail); static void cache_fresh_unlocked(struct cache_head *head, @@ -100,6 +101,8 @@ struct cache_head *sunrpc_cache_lookup(s if (cache_is_expired(detail, tmp)) { hlist_del_init(&tmp->cache_list); detail->entries --; + if (cache_is_valid(tmp) == -EAGAIN) + set_bit(CACHE_NEGATIVE, &tmp->flags); cache_fresh_locked(tmp, 0, detail); freeme = tmp; break;
From: Lucas Stach l.stach@pengutronix.de
commit 3a349763cf11e63534b8f2d302f2d0c790566497 upstream.
Currently any changed config register values don't take effect, as the function to write them back is called with the wrong register offset.
Fixes: ff8f83708b3e (Input: synaptics-rmi4 - add support for 2D sensors and F11) Signed-off-by: Lucas Stach l.stach@pengutronix.de Reviewed-by: Philipp Zabel p.zabel@pengutronix.de Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/input/rmi4/rmi_f11.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/input/rmi4/rmi_f11.c +++ b/drivers/input/rmi4/rmi_f11.c @@ -1239,7 +1239,7 @@ static int rmi_f11_initialize(struct rmi }
rc = f11_write_control_regs(fn, &f11->sens_query, - &f11->dev_controls, fn->fd.query_base_addr); + &f11->dev_controls, fn->fd.control_base_addr); if (rc) dev_warn(&fn->dev, "Failed to write control registers\n");
From: Alex Williamson alex.williamson@redhat.com
commit 492855939bdb59c6f947b0b5b44af9ad82b7e38c upstream.
Memory backed DMA mappings are accounted against a user's locked memory limit, including multiple mappings of the same memory. This accounting bounds the number of such mappings that a user can create. However, DMA mappings that are not backed by memory, such as DMA mappings of device MMIO via mmaps, do not make use of page pinning and therefore do not count against the user's locked memory limit. These mappings still consume memory, but the memory is not well associated to the process for the purpose of oom killing a task.
To add bounding on this use case, we introduce a limit to the total number of concurrent DMA mappings that a user is allowed to create. This limit is exposed as a tunable module option where the default value of 64K is expected to be well in excess of any reasonable use case (a large virtual machine configuration would typically only make use of tens of concurrent mappings).
This fixes CVE-2019-3882.
Reviewed-by: Eric Auger eric.auger@redhat.com Tested-by: Eric Auger eric.auger@redhat.com Reviewed-by: Peter Xu peterx@redhat.com Reviewed-by: Cornelia Huck cohuck@redhat.com Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/vfio/vfio_iommu_type1.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
--- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -58,12 +58,18 @@ module_param_named(disable_hugepages, MODULE_PARM_DESC(disable_hugepages, "Disable VFIO IOMMU support for IOMMU hugepages.");
+static unsigned int dma_entry_limit __read_mostly = U16_MAX; +module_param_named(dma_entry_limit, dma_entry_limit, uint, 0644); +MODULE_PARM_DESC(dma_entry_limit, + "Maximum number of user DMA mappings per container (65535)."); + struct vfio_iommu { struct list_head domain_list; struct vfio_domain *external_domain; /* domain for external user */ struct mutex lock; struct rb_root dma_list; struct blocking_notifier_head notifier; + unsigned int dma_avail; bool v2; bool nesting; }; @@ -732,6 +738,7 @@ static void vfio_remove_dma(struct vfio_ vfio_unlink_dma(iommu, dma); put_task_struct(dma->task); kfree(dma); + iommu->dma_avail++; }
static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu) @@ -1003,12 +1010,18 @@ static int vfio_dma_do_map(struct vfio_i goto out_unlock; }
+ if (!iommu->dma_avail) { + ret = -ENOSPC; + goto out_unlock; + } + dma = kzalloc(sizeof(*dma), GFP_KERNEL); if (!dma) { ret = -ENOMEM; goto out_unlock; }
+ iommu->dma_avail--; dma->iova = iova; dma->vaddr = vaddr; dma->prot = prot; @@ -1504,6 +1517,7 @@ static void *vfio_iommu_type1_open(unsig
INIT_LIST_HEAD(&iommu->domain_list); iommu->dma_list = RB_ROOT; + iommu->dma_avail = dma_entry_limit; mutex_init(&iommu->lock); BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
From: Dirk Behme dirk.behme@de.bosch.com
commit 907bd68a2edc491849e2fdcfe52c4596627bca94 upstream.
Having a cyclic DMA, a residue 0 is not an indication of a completed DMA. In case of cyclic DMA make sure that dma_set_residue() is called and with this a residue of 0 is forwarded correctly to the caller.
Fixes: 3544d2878817 ("dmaengine: rcar-dmac: use result of updated get_residue in tx_status") Signed-off-by: Dirk Behme dirk.behme@de.bosch.com Signed-off-by: Achim Dahlhoff Achim.Dahlhoff@de.bosch.com Signed-off-by: Hiroyuki Yokoyama hiroyuki.yokoyama.vx@renesas.com Signed-off-by: Yao Lihua ylhuajnu@outlook.com Reviewed-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Reviewed-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/dma/sh/rcar-dmac.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/dma/sh/rcar-dmac.c +++ b/drivers/dma/sh/rcar-dmac.c @@ -1332,6 +1332,7 @@ static enum dma_status rcar_dmac_tx_stat enum dma_status status; unsigned long flags; unsigned int residue; + bool cyclic;
status = dma_cookie_status(chan, cookie, txstate); if (status == DMA_COMPLETE || !txstate) @@ -1339,10 +1340,11 @@ static enum dma_status rcar_dmac_tx_stat
spin_lock_irqsave(&rchan->lock, flags); residue = rcar_dmac_chan_get_residue(rchan, cookie); + cyclic = rchan->desc.running ? rchan->desc.running->cyclic : false; spin_unlock_irqrestore(&rchan->lock, flags);
/* if there's no residue, the cookie is complete */ - if (!residue) + if (!residue && !cyclic) return DMA_COMPLETE;
dma_set_residue(txstate, residue);
From: Ard Biesheuvel ard.biesheuvel@linaro.org
commit e17b1af96b2afc38e684aa2f1033387e2ed10029 upstream.
The EFI stub is entered with the caches and MMU enabled by the firmware, and once the stub is ready to hand over to the decompressor, we clean and disable the caches.
The cache clean routines use CP15 barrier instructions, which can be disabled via SCTLR. Normally, when using the provided cache handling routines to enable the caches and MMU, this bit is enabled as well. However, but since we entered the stub with the caches already enabled, this routine is not executed before we call the cache clean routines, resulting in undefined instruction exceptions if the firmware never enabled this bit.
So set the bit explicitly in the EFI entry code, but do so in a way that guarantees that the resulting code can still run on v6 cores as well (which are guaranteed to have CP15 barriers enabled)
Cc: stable@vger.kernel.org # v4.9+ Acked-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Russell King rmk+kernel@armlinux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/compressed/head.S | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-)
--- a/arch/arm/boot/compressed/head.S +++ b/arch/arm/boot/compressed/head.S @@ -1393,7 +1393,21 @@ ENTRY(efi_stub_entry)
@ Preserve return value of efi_entry() in r4 mov r4, r0 - bl cache_clean_flush + + @ our cache maintenance code relies on CP15 barrier instructions + @ but since we arrived here with the MMU and caches configured + @ by UEFI, we must check that the CP15BEN bit is set in SCTLR. + @ Note that this bit is RAO/WI on v6 and earlier, so the ISB in + @ the enable path will be executed on v7+ only. + mrc p15, 0, r1, c1, c0, 0 @ read SCTLR + tst r1, #(1 << 5) @ CP15BEN bit set? + bne 0f + orr r1, r1, #(1 << 5) @ CP15 barrier instructions + mcr p15, 0, r1, c1, c0, 0 @ write SCTLR + ARM( .inst 0xf57ff06f @ v7+ isb ) + THUMB( isb ) + +0: bl cache_clean_flush bl cache_off
@ Set parameters for booting zImage according to boot protocol
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
commit d08106796a78a4273e39e1bbdf538dc4334b2635 upstream.
__drm_atomic_helper_crtc_destroy_state does not free memory, it only cleans it up. Fix this by calling the functions own destroy function.
Fixes: 6d6e50039187 ("drm/vc4: Allocate the right amount of space for boot-time CRTC state.") Cc: Eric Anholt eric@anholt.net Cc: stable@vger.kernel.org # v4.6+ Reviewed-by: Eric Anholt eric@anholt.net Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20190301125627.7285-2-maarten.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/vc4/vc4_crtc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/vc4/vc4_crtc.c +++ b/drivers/gpu/drm/vc4/vc4_crtc.c @@ -867,7 +867,7 @@ static void vc4_crtc_reset(struct drm_crtc *crtc) { if (crtc->state) - __drm_atomic_helper_crtc_destroy_state(crtc->state); + vc4_crtc_destroy_state(crtc->state);
crtc->state = kzalloc(sizeof(struct vc4_crtc_state), GFP_KERNEL); if (crtc->state)
From: Dave Airlie airlied@redhat.com
commit 9fa246256e09dc30820524401cdbeeaadee94025 upstream.
This reverts commit d179b88deb3bf6fed4991a31fd6f0f2cad21fab5.
This commit is documented to break userspace X.org modesetting driver in certain configurations.
The X.org modesetting userspace driver is broken. No fixes are available yet. In order for this patch to be applied it either needs a config option or a workaround developed.
This has been reported a few times, saying it's a userspace problem is clearly against the regression rules.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109806 Signed-off-by: Dave Airlie airlied@redhat.com Cc: stable@vger.kernel.org # v3.19+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/i915/intel_fbdev.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
--- a/drivers/gpu/drm/i915/intel_fbdev.c +++ b/drivers/gpu/drm/i915/intel_fbdev.c @@ -326,8 +326,8 @@ static bool intel_fb_initial_config(stru bool *enabled, int width, int height) { struct drm_i915_private *dev_priv = to_i915(fb_helper->dev); + unsigned long conn_configured, conn_seq, mask; unsigned int count = min(fb_helper->connector_count, BITS_PER_LONG); - unsigned long conn_configured, conn_seq; int i, j; bool *save_enabled; bool fallback = true, ret = true; @@ -345,9 +345,10 @@ static bool intel_fb_initial_config(stru drm_modeset_backoff(&ctx);
memcpy(save_enabled, enabled, count); - conn_seq = GENMASK(count - 1, 0); + mask = GENMASK(count - 1, 0); conn_configured = 0; retry: + conn_seq = conn_configured; for (i = 0; i < count; i++) { struct drm_fb_helper_connector *fb_conn; struct drm_connector *connector; @@ -360,8 +361,7 @@ retry: if (conn_configured & BIT(i)) continue;
- /* First pass, only consider tiled connectors */ - if (conn_seq == GENMASK(count - 1, 0) && !connector->has_tile) + if (conn_seq == 0 && !connector->has_tile) continue;
if (connector->status == connector_status_connected) @@ -465,10 +465,8 @@ retry: conn_configured |= BIT(i); }
- if (conn_configured != conn_seq) { /* repeat until no more are found */ - conn_seq = conn_configured; + if ((conn_configured & mask) != mask && conn_configured != conn_seq) goto retry; - }
/* * If the BIOS didn't enable everything it could, fall back to have the
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
commit 462ce5d963f18b71c63f6b7730a35a2ee5273540 upstream.
A pointer to crtc was missing, resulting in the following build error: drivers/gpu/drm/vc4/vc4_crtc.c:1045:44: sparse: sparse: incorrect type in argument 1 (different base types) drivers/gpu/drm/vc4/vc4_crtc.c:1045:44: sparse: expected struct drm_crtc *crtc drivers/gpu/drm/vc4/vc4_crtc.c:1045:44: sparse: got struct drm_crtc_state *state drivers/gpu/drm/vc4/vc4_crtc.c:1045:39: sparse: sparse: not enough arguments for function vc4_crtc_destroy_state
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Reported-by: kbuild test robot lkp@intel.com Cc: Eric Anholt eric@anholt.net Link: https://patchwork.freedesktop.org/patch/msgid/2b6ed5e6-81b0-4276-8860-870b54... Fixes: d08106796a78 ("drm/vc4: Fix memory leak during gpu reset.") Cc: stable@vger.kernel.org # v4.6+ Acked-by: Daniel Vetter daniel.vetter@ffwll.ch Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/vc4/vc4_crtc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/vc4/vc4_crtc.c +++ b/drivers/gpu/drm/vc4/vc4_crtc.c @@ -867,7 +867,7 @@ static void vc4_crtc_reset(struct drm_crtc *crtc) { if (crtc->state) - vc4_crtc_destroy_state(crtc->state); + vc4_crtc_destroy_state(crtc, crtc->state);
crtc->state = kzalloc(sizeof(struct vc4_crtc_state), GFP_KERNEL); if (crtc->state)
From: Kai-Heng Feng kai.heng.feng@canonical.com
commit 7529b2574a7aaf902f1f8159fbc2a7caa74be559 upstream.
Use new helpers to make LPM enabling/disabling more clear.
This is a preparation to subsequent patch.
Signed-off-by: Kai-Heng Feng kai.heng.feng@canonical.com Cc: stable stable@vger.kernel.org # after much soaking Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/core/driver.c | 12 +++++++++++- drivers/usb/core/hub.c | 12 ++++++------ drivers/usb/core/message.c | 2 +- drivers/usb/core/sysfs.c | 5 ++++- drivers/usb/core/usb.h | 10 ++++++++-- 5 files changed, 30 insertions(+), 11 deletions(-)
--- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -1891,7 +1891,7 @@ int usb_runtime_idle(struct device *dev) return -EBUSY; }
-int usb_set_usb2_hardware_lpm(struct usb_device *udev, int enable) +static int usb_set_usb2_hardware_lpm(struct usb_device *udev, int enable) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); int ret = -EPERM; @@ -1908,6 +1908,16 @@ int usb_set_usb2_hardware_lpm(struct usb return ret; }
+int usb_enable_usb2_hardware_lpm(struct usb_device *udev) +{ + return usb_set_usb2_hardware_lpm(udev, 1); +} + +int usb_disable_usb2_hardware_lpm(struct usb_device *udev) +{ + return usb_set_usb2_hardware_lpm(udev, 0); +} + #endif /* CONFIG_PM */
struct bus_type usb_bus_type = { --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -3175,7 +3175,7 @@ int usb_port_suspend(struct usb_device *
/* disable USB2 hardware LPM */ if (udev->usb2_hw_lpm_enabled == 1) - usb_set_usb2_hardware_lpm(udev, 0); + usb_disable_usb2_hardware_lpm(udev);
if (usb_disable_ltm(udev)) { dev_err(&udev->dev, "Failed to disable LTM before suspend\n."); @@ -3214,7 +3214,7 @@ int usb_port_suspend(struct usb_device * err_ltm: /* Try to enable USB2 hardware LPM again */ if (udev->usb2_hw_lpm_capable == 1) - usb_set_usb2_hardware_lpm(udev, 1); + usb_enable_usb2_hardware_lpm(udev);
if (udev->do_remote_wakeup) (void) usb_disable_remote_wakeup(udev); @@ -3498,7 +3498,7 @@ int usb_port_resume(struct usb_device *u } else { /* Try to enable USB2 hardware LPM */ if (udev->usb2_hw_lpm_capable == 1) - usb_set_usb2_hardware_lpm(udev, 1); + usb_enable_usb2_hardware_lpm(udev);
/* Try to enable USB3 LTM */ usb_enable_ltm(udev); @@ -4334,7 +4334,7 @@ static void hub_set_initial_usb2_lpm_pol if ((udev->bos->ext_cap->bmAttributes & cpu_to_le32(USB_BESL_SUPPORT)) || connect_type == USB_PORT_CONNECT_TYPE_HARD_WIRED) { udev->usb2_hw_lpm_allowed = 1; - usb_set_usb2_hardware_lpm(udev, 1); + usb_enable_usb2_hardware_lpm(udev); } }
@@ -5492,7 +5492,7 @@ static int usb_reset_and_verify_device(s * It will be re-enabled by the enumeration process. */ if (udev->usb2_hw_lpm_enabled == 1) - usb_set_usb2_hardware_lpm(udev, 0); + usb_disable_usb2_hardware_lpm(udev);
/* Disable LPM and LTM while we reset the device and reinstall the alt * settings. Device-initiated LPM settings, and system exit latency @@ -5602,7 +5602,7 @@ static int usb_reset_and_verify_device(s
done: /* Now that the alt settings are re-installed, enable LTM and LPM. */ - usb_set_usb2_hardware_lpm(udev, 1); + usb_enable_usb2_hardware_lpm(udev); usb_unlocked_enable_lpm(udev); usb_enable_ltm(udev); usb_release_bos_descriptor(udev); --- a/drivers/usb/core/message.c +++ b/drivers/usb/core/message.c @@ -1183,7 +1183,7 @@ void usb_disable_device(struct usb_devic }
if (dev->usb2_hw_lpm_enabled == 1) - usb_set_usb2_hardware_lpm(dev, 0); + usb_disable_usb2_hardware_lpm(dev); usb_unlocked_disable_lpm(dev); usb_disable_ltm(dev);
--- a/drivers/usb/core/sysfs.c +++ b/drivers/usb/core/sysfs.c @@ -508,7 +508,10 @@ static ssize_t usb2_hardware_lpm_store(s
if (!ret) { udev->usb2_hw_lpm_allowed = value; - ret = usb_set_usb2_hardware_lpm(udev, value); + if (value) + ret = usb_enable_usb2_hardware_lpm(udev); + else + ret = usb_disable_usb2_hardware_lpm(udev); }
usb_unlock_device(udev); --- a/drivers/usb/core/usb.h +++ b/drivers/usb/core/usb.h @@ -89,7 +89,8 @@ extern int usb_remote_wakeup(struct usb_ extern int usb_runtime_suspend(struct device *dev); extern int usb_runtime_resume(struct device *dev); extern int usb_runtime_idle(struct device *dev); -extern int usb_set_usb2_hardware_lpm(struct usb_device *udev, int enable); +extern int usb_enable_usb2_hardware_lpm(struct usb_device *udev); +extern int usb_disable_usb2_hardware_lpm(struct usb_device *udev);
#else
@@ -109,7 +110,12 @@ static inline int usb_autoresume_device( return 0; }
-static inline int usb_set_usb2_hardware_lpm(struct usb_device *udev, int enable) +static inline int usb_enable_usb2_hardware_lpm(struct usb_device *udev) +{ + return 0; +} + +static inline int usb_disable_usb2_hardware_lpm(struct usb_device *udev) { return 0; }
From: Kai-Heng Feng kai.heng.feng@canonical.com
commit d7a6c0ce8d26412903c7981503bad9e1cc7c45d2 upstream.
USB Bluetooth controller QCA ROME (0cf3:e007) sometimes stops working after S3: [ 165.110742] Bluetooth: hci0: using NVM file: qca/nvm_usb_00000302.bin [ 168.432065] Bluetooth: hci0: Failed to send body at 4 of 1953 (-110)
After some experiments, I found that disabling LPM can workaround the issue.
On some platforms, the USB power is cut during S3, so the driver uses reset-resume to resume the device. During port resume, LPM gets enabled twice, by usb_reset_and_verify_device() and usb_port_resume().
Consolidate all checks into new LPM helpers to make sure LPM only gets enabled once.
Fixes: de68bab4fa96 ("usb: Don't enable USB 2.0 Link PM by default.”) Signed-off-by: Kai-Heng Feng kai.heng.feng@canonical.com Cc: stable stable@vger.kernel.org # after much soaking Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/core/driver.c | 11 ++++++++--- drivers/usb/core/hub.c | 12 ++++-------- drivers/usb/core/message.c | 3 +-- 3 files changed, 13 insertions(+), 13 deletions(-)
--- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -1896,9 +1896,6 @@ static int usb_set_usb2_hardware_lpm(str struct usb_hcd *hcd = bus_to_hcd(udev->bus); int ret = -EPERM;
- if (enable && !udev->usb2_hw_lpm_allowed) - return 0; - if (hcd->driver->set_usb2_hw_lpm) { ret = hcd->driver->set_usb2_hw_lpm(hcd, udev, enable); if (!ret) @@ -1910,11 +1907,19 @@ static int usb_set_usb2_hardware_lpm(str
int usb_enable_usb2_hardware_lpm(struct usb_device *udev) { + if (!udev->usb2_hw_lpm_capable || + !udev->usb2_hw_lpm_allowed || + udev->usb2_hw_lpm_enabled) + return 0; + return usb_set_usb2_hardware_lpm(udev, 1); }
int usb_disable_usb2_hardware_lpm(struct usb_device *udev) { + if (!udev->usb2_hw_lpm_enabled) + return 0; + return usb_set_usb2_hardware_lpm(udev, 0); }
--- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -3174,8 +3174,7 @@ int usb_port_suspend(struct usb_device * }
/* disable USB2 hardware LPM */ - if (udev->usb2_hw_lpm_enabled == 1) - usb_disable_usb2_hardware_lpm(udev); + usb_disable_usb2_hardware_lpm(udev);
if (usb_disable_ltm(udev)) { dev_err(&udev->dev, "Failed to disable LTM before suspend\n."); @@ -3213,8 +3212,7 @@ int usb_port_suspend(struct usb_device * usb_enable_ltm(udev); err_ltm: /* Try to enable USB2 hardware LPM again */ - if (udev->usb2_hw_lpm_capable == 1) - usb_enable_usb2_hardware_lpm(udev); + usb_enable_usb2_hardware_lpm(udev);
if (udev->do_remote_wakeup) (void) usb_disable_remote_wakeup(udev); @@ -3497,8 +3495,7 @@ int usb_port_resume(struct usb_device *u hub_port_logical_disconnect(hub, port1); } else { /* Try to enable USB2 hardware LPM */ - if (udev->usb2_hw_lpm_capable == 1) - usb_enable_usb2_hardware_lpm(udev); + usb_enable_usb2_hardware_lpm(udev);
/* Try to enable USB3 LTM */ usb_enable_ltm(udev); @@ -5491,8 +5488,7 @@ static int usb_reset_and_verify_device(s /* Disable USB2 hardware LPM. * It will be re-enabled by the enumeration process. */ - if (udev->usb2_hw_lpm_enabled == 1) - usb_disable_usb2_hardware_lpm(udev); + usb_disable_usb2_hardware_lpm(udev);
/* Disable LPM and LTM while we reset the device and reinstall the alt * settings. Device-initiated LPM settings, and system exit latency --- a/drivers/usb/core/message.c +++ b/drivers/usb/core/message.c @@ -1182,8 +1182,7 @@ void usb_disable_device(struct usb_devic dev->actconfig->interface[i] = NULL; }
- if (dev->usb2_hw_lpm_enabled == 1) - usb_disable_usb2_hardware_lpm(dev); + usb_disable_usb2_hardware_lpm(dev); usb_unlocked_disable_lpm(dev); usb_disable_ltm(dev);
From: Dan Carpenter dan.carpenter@oracle.com
commit 7159a986b4202343f6cca3bb8079ecace5816fd6 upstream.
We can't pass error pointers to brelse().
Fixes: fb265c9cb49e ("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/xattr.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -828,6 +828,7 @@ int ext4_get_inode_usage(struct inode *i bh = ext4_sb_bread(inode->i_sb, EXT4_I(inode)->i_file_acl, REQ_PRIO); if (IS_ERR(bh)) { ret = PTR_ERR(bh); + bh = NULL; goto out; }
@@ -2905,6 +2906,7 @@ int ext4_xattr_delete_inode(handle_t *ha if (error == -EIO) EXT4_ERROR_INODE(inode, "block %llu read error", EXT4_I(inode)->i_file_acl); + bh = NULL; goto cleanup; } error = ext4_xattr_check_block(inode, bh); @@ -3061,6 +3063,7 @@ ext4_xattr_block_cache_find(struct inode if (IS_ERR(bh)) { if (PTR_ERR(bh) == -ENOMEM) return NULL; + bh = NULL; EXT4_ERROR_INODE(inode, "block %lu read error", (unsigned long)ce->e_value); } else if (ext4_xattr_cmp(header, BHDR(bh)) == 0) {
From: Adalbert Lazăr alazar@bitdefender.com
commit 4c404ce23358d5d8fbdeb7a6021a9b33d3c3c167 upstream.
Previous to commit 22b5c0b63f32 ("vsock/virtio: fix kernel panic after device hot-unplug"), vsock_core_init() was called from virtio_vsock_probe(). Now, virtio_transport_reset_no_sock() can be called before vsock_core_init() has the chance to run.
[Wed Feb 27 14:17:09 2019] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110 [Wed Feb 27 14:17:09 2019] #PF error: [normal kernel read fault] [Wed Feb 27 14:17:09 2019] PGD 0 P4D 0 [Wed Feb 27 14:17:09 2019] Oops: 0000 [#1] SMP PTI [Wed Feb 27 14:17:09 2019] CPU: 3 PID: 59 Comm: kworker/3:1 Not tainted 5.0.0-rc7-390-generic-hvi #390 [Wed Feb 27 14:17:09 2019] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [Wed Feb 27 14:17:09 2019] Workqueue: virtio_vsock virtio_transport_rx_work [vmw_vsock_virtio_transport] [Wed Feb 27 14:17:09 2019] RIP: 0010:virtio_transport_reset_no_sock+0x8c/0xc0 [vmw_vsock_virtio_transport_common] [Wed Feb 27 14:17:09 2019] Code: 35 8b 4f 14 48 8b 57 08 31 f6 44 8b 4f 10 44 8b 07 48 8d 7d c8 e8 84 f8 ff ff 48 85 c0 48 89 c3 74 2a e8 f7 31 03 00 48 89 df <48> 8b 80 10 01 00 00 e8 68 fb 69 ed 48 8b 75 f0 65 48 33 34 25 28 [Wed Feb 27 14:17:09 2019] RSP: 0018:ffffb42701ab7d40 EFLAGS: 00010282 [Wed Feb 27 14:17:09 2019] RAX: 0000000000000000 RBX: ffff9d79637ee080 RCX: 0000000000000003 [Wed Feb 27 14:17:09 2019] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9d79637ee080 [Wed Feb 27 14:17:09 2019] RBP: ffffb42701ab7d78 R08: ffff9d796fae70e0 R09: ffff9d796f403500 [Wed Feb 27 14:17:09 2019] R10: ffffb42701ab7d90 R11: 0000000000000000 R12: ffff9d7969d09240 [Wed Feb 27 14:17:09 2019] R13: ffff9d79624e6840 R14: ffff9d7969d09318 R15: ffff9d796d48ff80 [Wed Feb 27 14:17:09 2019] FS: 0000000000000000(0000) GS:ffff9d796fac0000(0000) knlGS:0000000000000000 [Wed Feb 27 14:17:09 2019] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [Wed Feb 27 14:17:09 2019] CR2: 0000000000000110 CR3: 0000000427f22000 CR4: 00000000000006e0 [Wed Feb 27 14:17:09 2019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [Wed Feb 27 14:17:09 2019] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [Wed Feb 27 14:17:09 2019] Call Trace: [Wed Feb 27 14:17:09 2019] virtio_transport_recv_pkt+0x63/0x820 [vmw_vsock_virtio_transport_common] [Wed Feb 27 14:17:09 2019] ? kfree+0x17e/0x190 [Wed Feb 27 14:17:09 2019] ? detach_buf_split+0x145/0x160 [Wed Feb 27 14:17:09 2019] ? __switch_to_asm+0x40/0x70 [Wed Feb 27 14:17:09 2019] virtio_transport_rx_work+0xa0/0x106 [vmw_vsock_virtio_transport] [Wed Feb 27 14:17:09 2019] NET: Registered protocol family 40 [Wed Feb 27 14:17:09 2019] process_one_work+0x167/0x410 [Wed Feb 27 14:17:09 2019] worker_thread+0x4d/0x460 [Wed Feb 27 14:17:09 2019] kthread+0x105/0x140 [Wed Feb 27 14:17:09 2019] ? rescuer_thread+0x360/0x360 [Wed Feb 27 14:17:09 2019] ? kthread_destroy_worker+0x50/0x50 [Wed Feb 27 14:17:09 2019] ret_from_fork+0x35/0x40 [Wed Feb 27 14:17:09 2019] Modules linked in: vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common input_leds vsock serio_raw i2c_piix4 mac_hid qemu_fw_cfg autofs4 cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_net psmouse drm net_failover pata_acpi virtio_blk failover floppy
Fixes: 22b5c0b63f32 ("vsock/virtio: fix kernel panic after device hot-unplug") Reported-by: Alexandru Herghelegiu aherghelegiu@bitdefender.com Signed-off-by: Adalbert Lazăr alazar@bitdefender.com Co-developed-by: Stefan Hajnoczi stefanha@redhat.com Reviewed-by: Stefan Hajnoczi stefanha@redhat.com Reviewed-by: Stefano Garzarella sgarzare@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/vmw_vsock/virtio_transport_common.c | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-)
--- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -662,6 +662,8 @@ static int virtio_transport_reset(struct */ static int virtio_transport_reset_no_sock(struct virtio_vsock_pkt *pkt) { + const struct virtio_transport *t; + struct virtio_vsock_pkt *reply; struct virtio_vsock_pkt_info info = { .op = VIRTIO_VSOCK_OP_RST, .type = le16_to_cpu(pkt->hdr.type), @@ -672,15 +674,21 @@ static int virtio_transport_reset_no_soc if (le16_to_cpu(pkt->hdr.op) == VIRTIO_VSOCK_OP_RST) return 0;
- pkt = virtio_transport_alloc_pkt(&info, 0, - le64_to_cpu(pkt->hdr.dst_cid), - le32_to_cpu(pkt->hdr.dst_port), - le64_to_cpu(pkt->hdr.src_cid), - le32_to_cpu(pkt->hdr.src_port)); - if (!pkt) + reply = virtio_transport_alloc_pkt(&info, 0, + le64_to_cpu(pkt->hdr.dst_cid), + le32_to_cpu(pkt->hdr.dst_port), + le64_to_cpu(pkt->hdr.src_cid), + le32_to_cpu(pkt->hdr.src_port)); + if (!reply) return -ENOMEM;
- return virtio_transport_get_ops()->send_pkt(pkt); + t = virtio_transport_get_ops(); + if (!t) { + virtio_transport_free_pkt(reply); + return -ENOTCONN; + } + + return t->send_pkt(reply); }
static void virtio_transport_wait_close(struct sock *sk, long timeout)
From: Xin Long lucien.xin@gmail.com
commit 2ac695d1d602ce00b12170242f58c3d3a8e36d04 upstream.
Syzbot found a crash:
BUG: KMSAN: uninit-value in tipc_nl_compat_name_table_dump+0x54f/0xcd0 net/tipc/netlink_compat.c:872 Call Trace: tipc_nl_compat_name_table_dump+0x54f/0xcd0 net/tipc/netlink_compat.c:872 __tipc_nl_compat_dumpit+0x59e/0xda0 net/tipc/netlink_compat.c:215 tipc_nl_compat_dumpit+0x63a/0x820 net/tipc/netlink_compat.c:280 tipc_nl_compat_handle net/tipc/netlink_compat.c:1226 [inline] tipc_nl_compat_recv+0x1b5f/0x2750 net/tipc/netlink_compat.c:1265 genl_family_rcv_msg net/netlink/genetlink.c:601 [inline] genl_rcv_msg+0x185f/0x1a60 net/netlink/genetlink.c:626 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477 genl_rcv+0x63/0x80 net/netlink/genetlink.c:637 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg net/socket.c:632 [inline]
Uninit was created at: __alloc_skb+0x309/0xa20 net/core/skbuff.c:208 alloc_skb include/linux/skbuff.h:1012 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline] netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg net/socket.c:632 [inline]
It was supposed to be fixed on commit 974cb0e3e7c9 ("tipc: fix uninit-value in tipc_nl_compat_name_table_dump") by checking TLV_GET_DATA_LEN(msg->req) in cmd->header()/tipc_nl_compat_name_table_dump_header(), which is called ahead of tipc_nl_compat_name_table_dump().
However, tipc_nl_compat_dumpit() doesn't handle the error returned from cmd header function. It means even when the check added in that fix fails, it won't stop calling tipc_nl_compat_name_table_dump(), and the issue will be triggered again.
So this patch is to add the process for the err returned from cmd header function in tipc_nl_compat_dumpit().
Reported-by: syzbot+3ce8520484b0d4e260a5@syzkaller.appspotmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/tipc/netlink_compat.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/net/tipc/netlink_compat.c +++ b/net/tipc/netlink_compat.c @@ -262,8 +262,14 @@ static int tipc_nl_compat_dumpit(struct if (msg->rep_type) tipc_tlv_init(msg->rep, msg->rep_type);
- if (cmd->header) - (*cmd->header)(msg); + if (cmd->header) { + err = (*cmd->header)(msg); + if (err) { + kfree_skb(msg->rep); + msg->rep = NULL; + return err; + } + }
arg = nlmsg_new(0, GFP_KERNEL); if (!arg) {
From: Linus Torvalds torvalds@linux-foundation.org
commit baf76f0c58aec435a3a864075b8f6d8ee5d1f17e upstream.
This way, slhc_free() accepts what slhc_init() returns, whether that is an error or not.
In particular, the pattern in sl_alloc_bufs() is
slcomp = slhc_init(16, 16); ... slhc_free(slcomp);
for the error handling path, and rather than complicate that code, just make it ok to always free what was returned by the init function.
That's what the code used to do before commit 4ab42d78e37a ("ppp, slip: Validate VJ compression slot parameters completely") when slhc_init() just returned NULL for the error case, with no actual indication of the details of the error.
Reported-by: syzbot+45474c076a4927533d2e@syzkaller.appspotmail.com Fixes: 4ab42d78e37a ("ppp, slip: Validate VJ compression slot parameters completely") Acked-by: Ben Hutchings ben@decadent.org.uk Cc: David Miller davem@davemloft.net Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/slip/slhc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/slip/slhc.c +++ b/drivers/net/slip/slhc.c @@ -153,7 +153,7 @@ out_fail: void slhc_free(struct slcompress *comp) { - if ( comp == NULLSLCOMPR ) + if ( IS_ERR_OR_NULL(comp) ) return;
if ( comp->tstate != NULLSLSTATE )
From: Alexander Shishkin alexander.shishkin@linux.intel.com
commit 91d3f8a629849968dc91d6ce54f2d46abf4feb7f upstream.
Commit 9ed3f22223c3 ("intel_th: Don't reference unassigned outputs") fixes a NULL dereference for all masters except the last one ("256+"), which keeps the stale pointer after the output driver had been unassigned.
Fix the off-by-one.
Signed-off-by: Alexander Shishkin alexander.shishkin@linux.intel.com Fixes: 9ed3f22223c3 ("intel_th: Don't reference unassigned outputs") Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hwtracing/intel_th/gth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/hwtracing/intel_th/gth.c +++ b/drivers/hwtracing/intel_th/gth.c @@ -624,7 +624,7 @@ static void intel_th_gth_unassign(struct othdev->output.port = -1; othdev->output.active = false; gth->output[port].output = NULL; - for (master = 0; master < TH_CONFIGURABLE_MASTERS; master++) + for (master = 0; master <= TH_CONFIGURABLE_MASTERS; master++) if (gth->master[master] == port) gth->master[master] = -1; spin_unlock(>h->gth_lock);
From: YueHaibing yuehaibing@huawei.com
commit 89189557b47b35683a27c80ee78aef18248eefb4 upstream.
Syzkaller report this:
sysctl could not get directory: /net//bridge -12 kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 7027 Comm: syz-executor.0 Tainted: G C 5.1.0-rc3+ #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:__write_once_size include/linux/compiler.h:220 [inline] RIP: 0010:__rb_change_child include/linux/rbtree_augmented.h:144 [inline] RIP: 0010:__rb_erase_augmented include/linux/rbtree_augmented.h:186 [inline] RIP: 0010:rb_erase+0x5f4/0x19f0 lib/rbtree.c:459 Code: 00 0f 85 60 13 00 00 48 89 1a 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 75 0c 00 00 4d 85 ed 4c 89 2e 74 ce 4c 89 ea 48 RSP: 0018:ffff8881bb507778 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: ffff8881f224b5b8 RCX: ffffffff818f3f6a RDX: 000000000000000a RSI: 0000000000000050 RDI: ffff8881f224b568 RBP: 0000000000000000 R08: ffffed10376a0ef4 R09: ffffed10376a0ef4 R10: 0000000000000001 R11: ffffed10376a0ef4 R12: ffff8881f224b558 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f3e7ce13700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd60fbe9398 CR3: 00000001cb55c001 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: erase_entry fs/proc/proc_sysctl.c:178 [inline] erase_header+0xe3/0x160 fs/proc/proc_sysctl.c:207 start_unregistering fs/proc/proc_sysctl.c:331 [inline] drop_sysctl_table+0x558/0x880 fs/proc/proc_sysctl.c:1631 get_subdir fs/proc/proc_sysctl.c:1022 [inline] __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335 br_netfilter_init+0x68/0x1000 [br_netfilter] do_one_initcall+0xbc/0x47d init/main.c:901 do_init_module+0x1b5/0x547 kernel/module.c:3456 load_module+0x6405/0x8c10 kernel/module.c:3804 __do_sys_finit_module+0x162/0x190 kernel/module.c:3898 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Modules linked in: br_netfilter(+) backlight comedi(C) hid_sensor_hub max3100 ti_ads8688 udc_core fddi snd_mona leds_gpio rc_streamzap mtd pata_netcell nf_log_common rc_winfast udp_tunnel snd_usbmidi_lib snd_usb_toneport snd_usb_line6 snd_rawmidi snd_seq_device snd_hwdep videobuf2_v4l2 videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops rc_gadmei_rm008z 8250_of smm665 hid_tmff hid_saitek hwmon_vid rc_ati_tv_wonder_hd_600 rc_core pata_pdc202xx_old dn_rtmsg as3722 ad714x_i2c ad714x snd_soc_cs4265 hid_kensington panel_ilitek_ili9322 drm drm_panel_orientation_quirks ipack cdc_phonet usbcore phonet hid_jabra hid extcon_arizona can_dev industrialio_triggered_buffer kfifo_buf industrialio adm1031 i2c_mux_ltc4306 i2c_mux ipmi_msghandler mlxsw_core snd_soc_cs35l34 snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer ac97_bus snd_compress snd soundcore gpio_da9055 uio ecdh_generic mdio_thunder of_mdio fixed_phy libphy mdio_cavium iptable_security iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ide_pci_generic piix aes_x86_64 crypto_simd cryptd ide_core glue_helper input_leds psmouse intel_agp intel_gtt serio_raw ata_generic i2c_piix4 agpgart pata_acpi parport_pc parport floppy rtc_cmos sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: br_netfilter] Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace 68741688d5fbfe85 ]---
commit 23da9588037e ("fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links") forgot to handle start_unregistering() case, while header->parent is NULL, it calls erase_header() and as seen in the above syzkaller call trace, accessing &header->parent->root will trigger a NULL pointer dereference.
As that commit explained, there is also no need to call start_unregistering() if header->parent is NULL.
Link: http://lkml.kernel.org/r/20190409153622.28112-1-yuehaibing@huawei.com Fixes: 23da9588037e ("fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links") Fixes: 0e47c99d7fe25 ("sysctl: Replace root_list with links between sysctl_table_sets") Signed-off-by: YueHaibing yuehaibing@huawei.com Reported-by: Hulk Robot hulkci@huawei.com Reviewed-by: Kees Cook keescook@chromium.org Cc: Luis Chamberlain mcgrof@kernel.org Cc: Alexey Dobriyan adobriyan@gmail.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: "Eric W. Biederman" ebiederm@xmission.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/proc/proc_sysctl.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -1620,9 +1620,11 @@ static void drop_sysctl_table(struct ctl if (--header->nreg) return;
- if (parent) + if (parent) { put_links(header); - start_unregistering(header); + start_unregistering(header); + } + if (!--header->count) kfree_rcu(header, rcu);
From: Andrea Claudi aclaudi@redhat.com
commit c93a49b9769e435990c82297aa0baa31e1538790 upstream.
When CONFIG_IP_VS_IPV6 is not defined, build produced this warning:
net/netfilter/ipvs/ip_vs_ctl.c:899:6: warning: unused variable ‘ret’ [-Wunused-variable] int ret = 0; ^~~
Fix this by moving the declaration of 'ret' in the CONFIG_IP_VS_IPV6 section in the same function.
While at it, drop its unneeded initialisation.
Fixes: 098e13f5b21d ("ipvs: fix dependency on nf_defrag_ipv6") Reported-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: Andrea Claudi aclaudi@redhat.com Reviewed-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/netfilter/ipvs/ip_vs_ctl.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/netfilter/ipvs/ip_vs_ctl.c +++ b/net/netfilter/ipvs/ip_vs_ctl.c @@ -889,12 +889,13 @@ ip_vs_new_dest(struct ip_vs_service *svc { struct ip_vs_dest *dest; unsigned int atype, i; - int ret = 0;
EnterFunction(2);
#ifdef CONFIG_IP_VS_IPV6 if (udest->af == AF_INET6) { + int ret; + atype = ipv6_addr_type(&udest->addr.in6); if ((!(atype & IPV6_ADDR_UNICAST) || atype & IPV6_ADDR_LINKLOCAL) &&
From: Todd Kjos tkjos@android.com
commit 26528be6720bb40bc8844e97ee73a37e530e9c5e upstream.
Fixes crash found by syzbot: kernel BUG at drivers/android/binder_alloc.c:LINE! (2)
Reported-and-tested-by: syzbot+55de1eb4975dec156d8f@syzkaller.appspotmail.com Signed-off-by: Todd Kjos tkjos@google.com Reviewed-by: Joel Fernandes (Google) joel@joelfernandes.org Cc: stable stable@vger.kernel.org # 5.0, 4.19, 4.14 Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/android/binder_alloc.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-)
--- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -945,14 +945,13 @@ enum lru_status binder_alloc_free_page(s
index = page - alloc->pages; page_addr = (uintptr_t)alloc->buffer + index * PAGE_SIZE; + + mm = alloc->vma_vm_mm; + if (!mmget_not_zero(mm)) + goto err_mmget; + if (!down_write_trylock(&mm->mmap_sem)) + goto err_down_write_mmap_sem_failed; vma = binder_alloc_get_vma(alloc); - if (vma) { - if (!mmget_not_zero(alloc->vma_vm_mm)) - goto err_mmget; - mm = alloc->vma_vm_mm; - if (!down_write_trylock(&mm->mmap_sem)) - goto err_down_write_mmap_sem_failed; - }
list_lru_isolate(lru, item); spin_unlock(lock); @@ -965,10 +964,9 @@ enum lru_status binder_alloc_free_page(s PAGE_SIZE);
trace_binder_unmap_user_end(alloc, index); - - up_write(&mm->mmap_sem); - mmput(mm); } + up_write(&mm->mmap_sem); + mmput(mm);
trace_binder_unmap_kernel_start(alloc, index);
From: luca abeni luca.abeni@santannapisa.it
commit 1b02cd6a2d7f3e2a6a5262887d2cb2912083e42f upstream.
syzbot reported the following warning:
[ ] WARNING: CPU: 4 PID: 17089 at kernel/sched/deadline.c:255 task_non_contending+0xae0/0x1950
line 255 of deadline.c is:
WARN_ON(hrtimer_active(&dl_se->inactive_timer));
in task_non_contending().
Unfortunately, in some cases (for example, a deadline task continuosly blocking and waking immediately) it can happen that a task blocks (and task_non_contending() is called) while the 0-lag timer is still active.
In this case, the safest thing to do is to immediately decrease the running bandwidth of the task, without trying to re-arm the 0-lag timer.
Signed-off-by: luca abeni luca.abeni@santannapisa.it Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: Juri Lelli juri.lelli@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: chengjian (D) cj.chengjian@huawei.com Link: https://lkml.kernel.org/r/20190325131530.34706-1-luca.abeni@santannapisa.it Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/sched/deadline.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -217,7 +217,6 @@ static void task_non_contending(struct t if (dl_se->dl_runtime == 0) return;
- WARN_ON(hrtimer_active(&dl_se->inactive_timer)); WARN_ON(dl_se->dl_non_contending);
zerolag_time = dl_se->deadline - @@ -234,7 +233,7 @@ static void task_non_contending(struct t * If the "0-lag time" already passed, decrease the active * utilization now, instead of starting a timer */ - if (zerolag_time < 0) { + if ((zerolag_time < 0) || hrtimer_active(&dl_se->inactive_timer)) { if (dl_task(p)) sub_running_bw(dl_se->dl_bw, dl_rq); if (!dl_task(p) || p->state == TASK_DEAD) {
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit 7c2bd9a39845bfb6d72ddb55ce737650271f6f96 upstream.
syzbot is reporting uninitialized value at rpc_sockaddr2uaddr() [1]. This is because syzbot is setting AF_INET6 to "struct sockaddr_in"->sin_family (which is embedded into user-visible "struct nfs_mount_data" structure) despite nfs23_validate_mount_data() cannot pass sizeof(struct sockaddr_in6) bytes of AF_INET6 address to rpc_sockaddr2uaddr().
Since "struct nfs_mount_data" structure is user-visible, we can't change "struct nfs_mount_data" to use "struct sockaddr_storage". Therefore, assuming that everybody is using AF_INET family when passing address via "struct nfs_mount_data"->addr, reject if its sin_family is not AF_INET.
[1] https://syzkaller.appspot.com/bug?id=599993614e7cbbf66bc2656a919ab2a95fb5d75...
Reported-by: syzbot syzbot+047a11c361b872896a4f@syzkaller.appspotmail.com Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/nfs/super.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -2044,7 +2044,8 @@ static int nfs23_validate_mount_data(voi memcpy(sap, &data->addr, sizeof(data->addr)); args->nfs_server.addrlen = sizeof(data->addr); args->nfs_server.port = ntohs(data->addr.sin_port); - if (!nfs_verify_server_address(sap)) + if (sap->sa_family != AF_INET || + !nfs_verify_server_address(sap)) goto out_no_address;
if (!(data->flags & NFS_MOUNT_TCP))
From: Florian Westphal fw@strlen.de
commit 7caa56f006e9d712b44f27b32520c66420d5cbc6 upstream.
It means userspace gave us a ruleset where there is some other data after the ebtables target but before the beginning of the next rule.
Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support") Reported-by: syzbot+659574e7bcc7f7eb4df7@syzkaller.appspotmail.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/bridge/netfilter/ebtables.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -2030,7 +2030,8 @@ static int ebt_size_mwt(struct compat_eb if (match_kern) match_kern->match_size = ret;
- if (WARN_ON(type == EBT_COMPAT_TARGET && size_left)) + /* rule should have no remaining data after target */ + if (type == EBT_COMPAT_TARGET && size_left) return -EINVAL;
match32 = (struct compat_ebt_entry_mwt *) buf;
From: Yue Haibing yuehaibing@huawei.com
commit 01ca667133d019edc9f0a1f70a272447c84ec41f upstream.
Syzkaller report this:
kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 0 PID: 4378 Comm: syz-executor.0 Tainted: G C 5.0.0+ #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:__lock_acquire+0x95b/0x3200 kernel/locking/lockdep.c:3573 Code: 00 0f 85 28 1e 00 00 48 81 c4 08 01 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 4c 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 24 00 00 49 81 7d 00 e0 de 03 a6 41 bc 00 00 RSP: 0018:ffff8881e3c07a40 EFLAGS: 00010002 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000010 RSI: 0000000000000000 RDI: 0000000000000080 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: ffff8881e3c07d98 R11: ffff8881c7f21f80 R12: 0000000000000001 R13: 0000000000000080 R14: 0000000000000000 R15: 0000000000000001 FS: 00007fce2252e700(0000) GS:ffff8881f2400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fffc7eb0228 CR3: 00000001e5bea002 CR4: 00000000007606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: lock_acquire+0xff/0x2c0 kernel/locking/lockdep.c:4211 __mutex_lock_common kernel/locking/mutex.c:925 [inline] __mutex_lock+0xdf/0x1050 kernel/locking/mutex.c:1072 drain_workqueue+0x24/0x3f0 kernel/workqueue.c:2934 destroy_workqueue+0x23/0x630 kernel/workqueue.c:4319 __do_sys_delete_module kernel/module.c:1018 [inline] __se_sys_delete_module kernel/module.c:961 [inline] __x64_sys_delete_module+0x30c/0x480 kernel/module.c:961 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fce2252dc58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000140 RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fce2252e6bc R13: 00000000004bcca9 R14: 00000000006f6b48 R15: 00000000ffffffff
If alloc_workqueue fails, it should return -ENOMEM, otherwise may trigger this NULL pointer dereference while unloading drivers.
Reported-by: Hulk Robot hulkci@huawei.com Fixes: 0a38c17a21a0 ("fm10k: Remove create_workqueue") Signed-off-by: Yue Haibing yuehaibing@huawei.com Tested-by: Andrew Bowers andrewx.bowers@intel.com Signed-off-by: Jeff Kirsher jeffrey.t.kirsher@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c +++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c @@ -58,6 +58,8 @@ static int __init fm10k_init_module(void /* create driver workqueue */ fm10k_workqueue = alloc_workqueue("%s", WQ_MEM_RECLAIM, 0, fm10k_driver_name); + if (!fm10k_workqueue) + return -ENOMEM;
fm10k_dbg_init();
From: Xin Long lucien.xin@gmail.com
commit 6f07e5f06c8712acc423485f657799fc8e11e56c upstream.
Syzbot reported the following crash:
BUG: KMSAN: uninit-value in memchr+0xce/0x110 lib/string.c:961 memchr+0xce/0x110 lib/string.c:961 string_is_valid net/tipc/netlink_compat.c:176 [inline] tipc_nl_compat_bearer_enable+0x2c4/0x910 net/tipc/netlink_compat.c:401 __tipc_nl_compat_doit net/tipc/netlink_compat.c:321 [inline] tipc_nl_compat_doit+0x3aa/0xaf0 net/tipc/netlink_compat.c:354 tipc_nl_compat_handle net/tipc/netlink_compat.c:1162 [inline] tipc_nl_compat_recv+0x1ae7/0x2750 net/tipc/netlink_compat.c:1265 genl_family_rcv_msg net/netlink/genetlink.c:601 [inline] genl_rcv_msg+0x185f/0x1a60 net/netlink/genetlink.c:626 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477 genl_rcv+0x63/0x80 net/netlink/genetlink.c:637 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg net/socket.c:632 [inline]
Uninit was created at: __alloc_skb+0x309/0xa20 net/core/skbuff.c:208 alloc_skb include/linux/skbuff.h:1012 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline] netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg net/socket.c:632 [inline]
It was triggered when the bearer name size < TIPC_MAX_BEARER_NAME, it would check with a wrong len/TLV_GET_DATA_LEN(msg->req), which also includes priority and disc_domain length.
This patch is to fix it by checking it with a right length: 'TLV_GET_DATA_LEN(msg->req) - offsetof(struct tipc_bearer_config, name)'.
Reported-by: syzbot+8b707430713eb46e1e45@syzkaller.appspotmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/tipc/netlink_compat.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/net/tipc/netlink_compat.c +++ b/net/tipc/netlink_compat.c @@ -394,7 +394,12 @@ static int tipc_nl_compat_bearer_enable( if (!bearer) return -EMSGSIZE;
- len = min_t(int, TLV_GET_DATA_LEN(msg->req), TIPC_MAX_BEARER_NAME); + len = TLV_GET_DATA_LEN(msg->req); + len -= offsetof(struct tipc_bearer_config, name); + if (len <= 0) + return -EINVAL; + + len = min_t(int, len, TIPC_MAX_BEARER_NAME); if (!string_is_valid(b->name, len)) return -EINVAL;
From: Xin Long lucien.xin@gmail.com
commit 8c63bf9ab4be8b83bd8c34aacfd2f1d2c8901c8a upstream.
A similar issue as fixed by Patch "tipc: check bearer name with right length in tipc_nl_compat_bearer_enable" was also found by syzbot in tipc_nl_compat_link_set().
The length to check with should be 'TLV_GET_DATA_LEN(msg->req) - offsetof(struct tipc_link_config, name)'.
Reported-by: syzbot+de00a87b8644a582ae79@syzkaller.appspotmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/tipc/netlink_compat.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/net/tipc/netlink_compat.c +++ b/net/tipc/netlink_compat.c @@ -768,7 +768,12 @@ static int tipc_nl_compat_link_set(struc
lc = (struct tipc_link_config *)TLV_DATA(msg->req);
- len = min_t(int, TLV_GET_DATA_LEN(msg->req), TIPC_MAX_LINK_NAME); + len = TLV_GET_DATA_LEN(msg->req); + len -= offsetof(struct tipc_link_config, name); + if (len <= 0) + return -EINVAL; + + len = min_t(int, len, TIPC_MAX_LINK_NAME); if (!string_is_valid(lc->name, len)) return -EINVAL;
From: Mikulas Patocka mpatocka@redhat.com
commit 0d74e6a3b6421d98eeafbed26f29156d469bc0b5 upstream.
If the string opt_string is small, the function memcmp can access bytes that are beyond the terminating nul character. In theory, it could cause segfault, if opt_string were located just below some unmapped memory.
Change from memcmp to strncmp so that we don't read bytes beyond the end of the string.
Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-integrity.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/md/dm-integrity.c +++ b/drivers/md/dm-integrity.c @@ -2917,17 +2917,17 @@ static int dm_integrity_ctr(struct dm_ta goto bad; } ic->sectors_per_block = val >> SECTOR_SHIFT; - } else if (!memcmp(opt_string, "internal_hash:", strlen("internal_hash:"))) { + } else if (!strncmp(opt_string, "internal_hash:", strlen("internal_hash:"))) { r = get_alg_and_key(opt_string, &ic->internal_hash_alg, &ti->error, "Invalid internal_hash argument"); if (r) goto bad; - } else if (!memcmp(opt_string, "journal_crypt:", strlen("journal_crypt:"))) { + } else if (!strncmp(opt_string, "journal_crypt:", strlen("journal_crypt:"))) { r = get_alg_and_key(opt_string, &ic->journal_crypt_alg, &ti->error, "Invalid journal_crypt argument"); if (r) goto bad; - } else if (!memcmp(opt_string, "journal_mac:", strlen("journal_mac:"))) { + } else if (!strncmp(opt_string, "journal_mac:", strlen("journal_mac:"))) { r = get_alg_and_key(opt_string, &ic->journal_mac_alg, &ti->error, "Invalid journal_mac argument"); if (r)
From: Daniel Borkmann daniel@iogearbox.net
commit ce02ef06fcf7a399a6276adb83f37373d10cbbe1 upstream.
From networking side, there are numerous attempts to get rid of indirect
calls in fast-path wherever feasible in order to avoid the cost of retpolines, for example, just to name a few:
* 283c16a2dfd3 ("indirect call wrappers: helpers to speed-up indirect calls of builtin") * aaa5d90b395a ("net: use indirect call wrappers at GRO network layer") * 028e0a476684 ("net: use indirect call wrappers at GRO transport layer") * 356da6d0cde3 ("dma-mapping: bypass indirect calls for dma-direct") * 09772d92cd5a ("bpf: avoid retpoline for lookup/update/delete calls on maps") * 10870dd89e95 ("netfilter: nf_tables: add direct calls for all builtin expressions") [...]
Recent work on XDP from Björn and Magnus additionally found that manually transforming the XDP return code switch statement with more than 5 cases into if-else combination would result in a considerable speedup in XDP layer due to avoidance of indirect calls in CONFIG_RETPOLINE enabled builds. On i40e driver with XDP prog attached, a 20-26% speedup has been observed [0]. Aside from XDP, there are many other places later in the networking stack's critical path with similar switch-case processing. Rather than fixing every XDP-enabled driver and locations in stack by hand, it would be good to instead raise the limit where gcc would emit expensive indirect calls from the switch under retpolines and stick with the default as-is in case of !retpoline configured kernels. This would also have the advantage that for archs where this is not necessary, we let compiler select the underlying target optimization for these constructs and avoid potential slow-downs by if-else hand-rewrite.
In case of gcc, this setting is controlled by case-values-threshold which has an architecture global default that selects 4 or 5 (latter if target does not have a case insn that compares the bounds) where some arch back ends like arm64 or s390 override it with their own target hooks, for example, in gcc commit db7a90aa0de5 ("S/390: Disable prediction of indirect branches") the threshold pretty much disables jump tables by limit of 20 under retpoline builds. Comparing gcc's and clang's default code generation on x86-64 under O2 level with retpoline build results in the following outcome for 5 switch cases:
* gcc with -mindirect-branch=thunk-inline -mindirect-branch-register:
# gdb -batch -ex 'disassemble dispatch' ./c-switch Dump of assembler code for function dispatch: 0x0000000000400be0 <+0>: cmp $0x4,%edi 0x0000000000400be3 <+3>: ja 0x400c35 <dispatch+85> 0x0000000000400be5 <+5>: lea 0x915f8(%rip),%rdx # 0x4921e4 0x0000000000400bec <+12>: mov %edi,%edi 0x0000000000400bee <+14>: movslq (%rdx,%rdi,4),%rax 0x0000000000400bf2 <+18>: add %rdx,%rax 0x0000000000400bf5 <+21>: callq 0x400c01 <dispatch+33> 0x0000000000400bfa <+26>: pause 0x0000000000400bfc <+28>: lfence 0x0000000000400bff <+31>: jmp 0x400bfa <dispatch+26> 0x0000000000400c01 <+33>: mov %rax,(%rsp) 0x0000000000400c05 <+37>: retq 0x0000000000400c06 <+38>: nopw %cs:0x0(%rax,%rax,1) 0x0000000000400c10 <+48>: jmpq 0x400c90 <fn_3> 0x0000000000400c15 <+53>: nopl (%rax) 0x0000000000400c18 <+56>: jmpq 0x400c70 <fn_2> 0x0000000000400c1d <+61>: nopl (%rax) 0x0000000000400c20 <+64>: jmpq 0x400c50 <fn_1> 0x0000000000400c25 <+69>: nopl (%rax) 0x0000000000400c28 <+72>: jmpq 0x400c40 <fn_0> 0x0000000000400c2d <+77>: nopl (%rax) 0x0000000000400c30 <+80>: jmpq 0x400cb0 <fn_4> 0x0000000000400c35 <+85>: push %rax 0x0000000000400c36 <+86>: callq 0x40dd80 <abort> End of assembler dump.
* clang with -mretpoline emitting search tree:
# gdb -batch -ex 'disassemble dispatch' ./c-switch Dump of assembler code for function dispatch: 0x0000000000400b30 <+0>: cmp $0x1,%edi 0x0000000000400b33 <+3>: jle 0x400b44 <dispatch+20> 0x0000000000400b35 <+5>: cmp $0x2,%edi 0x0000000000400b38 <+8>: je 0x400b4d <dispatch+29> 0x0000000000400b3a <+10>: cmp $0x3,%edi 0x0000000000400b3d <+13>: jne 0x400b52 <dispatch+34> 0x0000000000400b3f <+15>: jmpq 0x400c50 <fn_3> 0x0000000000400b44 <+20>: test %edi,%edi 0x0000000000400b46 <+22>: jne 0x400b5c <dispatch+44> 0x0000000000400b48 <+24>: jmpq 0x400c20 <fn_0> 0x0000000000400b4d <+29>: jmpq 0x400c40 <fn_2> 0x0000000000400b52 <+34>: cmp $0x4,%edi 0x0000000000400b55 <+37>: jne 0x400b66 <dispatch+54> 0x0000000000400b57 <+39>: jmpq 0x400c60 <fn_4> 0x0000000000400b5c <+44>: cmp $0x1,%edi 0x0000000000400b5f <+47>: jne 0x400b66 <dispatch+54> 0x0000000000400b61 <+49>: jmpq 0x400c30 <fn_1> 0x0000000000400b66 <+54>: push %rax 0x0000000000400b67 <+55>: callq 0x40dd20 <abort> End of assembler dump.
For sake of comparison, clang without -mretpoline:
# gdb -batch -ex 'disassemble dispatch' ./c-switch Dump of assembler code for function dispatch: 0x0000000000400b30 <+0>: cmp $0x4,%edi 0x0000000000400b33 <+3>: ja 0x400b57 <dispatch+39> 0x0000000000400b35 <+5>: mov %edi,%eax 0x0000000000400b37 <+7>: jmpq *0x492148(,%rax,8) 0x0000000000400b3e <+14>: jmpq 0x400bf0 <fn_0> 0x0000000000400b43 <+19>: jmpq 0x400c30 <fn_4> 0x0000000000400b48 <+24>: jmpq 0x400c10 <fn_2> 0x0000000000400b4d <+29>: jmpq 0x400c20 <fn_3> 0x0000000000400b52 <+34>: jmpq 0x400c00 <fn_1> 0x0000000000400b57 <+39>: push %rax 0x0000000000400b58 <+40>: callq 0x40dcf0 <abort> End of assembler dump.
Raising the cases to a high number (e.g. 100) will still result in similar code generation pattern with clang and gcc as above, in other words clang generally turns off jump table emission by having an extra expansion pass under retpoline build to turn indirectbr instructions from their IR into switch instructions as a built-in -mno-jump-table lowering of a switch (in this case, even if IR input already contained an indirect branch).
For gcc, adding --param=case-values-threshold=20 as in similar fashion as s390 in order to raise the limit for x86 retpoline enabled builds results in a small vmlinux size increase of only 0.13% (before=18,027,528 after=18,051,192). For clang this option is ignored due to i) not being needed as mentioned and ii) not having above cmdline parameter. Non-retpoline-enabled builds with gcc continue to use the default case-values-threshold setting, so nothing changes here.
[0] https://lore.kernel.org/netdev/20190129095754.9390-1-bjorn.topel@gmail.com/ and "The Path to DPDK Speeds for AF_XDP", LPC 2018, networking track: - http://vger.kernel.org/lpc_net2018_talks/lpc18_pres_af_xdp_perf-v3.pdf - http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Jesper Dangaard Brouer brouer@redhat.com Acked-by: Björn Töpel bjorn.topel@intel.com Acked-by: Linus Torvalds torvalds@linux-foundation.org Cc: netdev@vger.kernel.org Cc: David S. Miller davem@davemloft.net Cc: Magnus Karlsson magnus.karlsson@intel.com Cc: Alexei Starovoitov ast@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: David Woodhouse dwmw2@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Link: https://lkml.kernel.org/r/20190221221941.29358-1-daniel@iogearbox.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/Makefile | 5 +++++ 1 file changed, 5 insertions(+)
--- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -242,6 +242,11 @@ KBUILD_CFLAGS += -fno-asynchronous-unwin # Avoid indirect branches in kernel to deal with Spectre ifdef CONFIG_RETPOLINE KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) + # Additionally, avoid generating expensive indirect jumps which + # are subject to retpolines for small number of switch cases. + # clang turns off jump table generation by default when under + # retpoline builds, however, gcc does not for x86. + KBUILD_CFLAGS += $(call cc-option,--param=case-values-threshold=20) endif
archscripts: scripts_basic
From: Daniel Borkmann daniel@iogearbox.net
commit a9d57ef15cbe327fe54416dd194ee0ea66ae53a4 upstream.
Commit ce02ef06fcf7 ("x86, retpolines: Raise limit for generating indirect calls from switch-case") raised the limit under retpolines to 20 switch cases where gcc would only then start to emit jump tables, and therefore effectively disabling the emission of slow indirect calls in this area.
After this has been brought to attention to gcc folks [0], Martin Liska has then fixed gcc to align with clang by avoiding to generate switch jump tables entirely under retpolines. This is taking effect in gcc starting from stable version 8.4.0. Given kernel supports compilation with older versions of gcc where the fix is not being available or backported anymore, we need to keep the extra KBUILD_CFLAGS around for some time and generally set the -fno-jump-tables to align with what more recent gcc is doing automatically today.
More than 20 switch cases are not expected to be fast-path critical, but it would still be good to align with gcc behavior for versions < 8.4.0 in order to have consistency across supported gcc versions. vmlinux size is slightly growing by 0.27% for older gcc. This flag is only set to work around affected gcc, no change for clang.
[0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952
Suggested-by: Martin Liska mliska@suse.cz Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: David Woodhouse dwmw2@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jesper Dangaard Brouer brouer@redhat.com Cc: Björn Töpelbjorn.topel@intel.com Cc: Magnus Karlsson magnus.karlsson@intel.com Cc: Alexei Starovoitov ast@kernel.org Cc: H.J. Lu hjl.tools@gmail.com Cc: Alexei Starovoitov ast@kernel.org Cc: David S. Miller davem@davemloft.net Link: https://lkml.kernel.org/r/20190325135620.14882-1-daniel@iogearbox.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/Makefile | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -245,8 +245,12 @@ ifdef CONFIG_RETPOLINE # Additionally, avoid generating expensive indirect jumps which # are subject to retpolines for small number of switch cases. # clang turns off jump table generation by default when under - # retpoline builds, however, gcc does not for x86. - KBUILD_CFLAGS += $(call cc-option,--param=case-values-threshold=20) + # retpoline builds, however, gcc does not for x86. This has + # only been fixed starting from gcc stable version 8.4.0 and + # onwards, but not for older ones. See gcc bug #86952. + ifndef CONFIG_CC_IS_CLANG + KBUILD_CFLAGS += $(call cc-option,-fno-jump-tables) + endif endif
archscripts: scripts_basic
From: Jan Kara jack@suse.cz
commit f2c57d91b0d96aa13ccff4e3b178038f17b00658 upstream.
In DAX mode a write pagefault can race with write(2) in the following way:
CPU0 CPU1 write fault for mapped zero page (hole) dax_iomap_rw() iomap_apply() xfs_file_iomap_begin() - allocates blocks dax_iomap_actor() invalidate_inode_pages2_range() - invalidates radix tree entries in given range dax_iomap_pte_fault() grab_mapping_entry() - no entry found, creates empty ... xfs_file_iomap_begin() - finds already allocated block ... vmf_insert_mixed_mkwrite() - WARNs and does nothing because there is still zero page mapped in PTE unmap_mapping_pages()
This race results in WARN_ON from insert_pfn() and is occasionally triggered by fstest generic/344. Note that the race is otherwise harmless as before write(2) on CPU0 is finished, we will invalidate page tables properly and thus user of mmap will see modified data from write(2) from that point on. So just restrict the warning only to the case when the PFN in PTE is not zero page.
Link: http://lkml.kernel.org/r/20180824154542.26872-1-jack@suse.cz Signed-off-by: Jan Kara jack@suse.cz Reviewed-by: Andrew Morton akpm@linux-foundation.org Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: Dan Williams dan.j.williams@intel.com Cc: Dave Jiang dave.jiang@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/memory.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/mm/memory.c +++ b/mm/memory.c @@ -1804,10 +1804,15 @@ static int insert_pfn(struct vm_area_str * in may not match the PFN we have mapped if the * mapped PFN is a writeable COW page. In the mkwrite * case we are creating a writable PTE for a shared - * mapping and we expect the PFNs to match. + * mapping and we expect the PFNs to match. If they + * don't match, we are likely racing with block + * allocation and mapping invalidation so just skip the + * update. */ - if (WARN_ON_ONCE(pte_pfn(*pte) != pfn_t_to_pfn(pfn))) + if (pte_pfn(*pte) != pfn_t_to_pfn(pfn)) { + WARN_ON_ONCE(!is_zero_pfn(pte_pfn(*pte))); goto out_unlock; + } entry = *pte; goto out_mkwrite; } else
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit 57da9a9742200f391d1cf93fea389f7ddc25ec9a which is commit 310ca162d779efee8a2dc3731439680f3e9c1e86 upstream.
Jan Kara has reported seeing problems with this patch applied, as has Salvatore Bonaccorso, so let's drop it for now.
Reported-by: Salvatore Bonaccorso carnil@debian.org Reported-by: Jan Kara jack@suse.cz Cc: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Cc: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/block/loop.c | 58 +++++++++++++++++++++++++-------------------------- drivers/block/loop.h | 1 2 files changed, 30 insertions(+), 29 deletions(-)
--- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -82,7 +82,6 @@
static DEFINE_IDR(loop_index_idr); static DEFINE_MUTEX(loop_index_mutex); -static DEFINE_MUTEX(loop_ctl_mutex);
static int max_part; static int part_shift; @@ -1019,7 +1018,7 @@ static int loop_clr_fd(struct loop_devic */ if (atomic_read(&lo->lo_refcnt) > 1) { lo->lo_flags |= LO_FLAGS_AUTOCLEAR; - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); return 0; }
@@ -1071,12 +1070,12 @@ static int loop_clr_fd(struct loop_devic if (!part_shift) lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN; loop_unprepare_queue(lo); - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); /* - * Need not hold loop_ctl_mutex to fput backing file. - * Calling fput holding loop_ctl_mutex triggers a circular + * Need not hold lo_ctl_mutex to fput backing file. + * Calling fput holding lo_ctl_mutex triggers a circular * lock dependency possibility warning as fput can take - * bd_mutex which is usually taken before loop_ctl_mutex. + * bd_mutex which is usually taken before lo_ctl_mutex. */ fput(filp); return 0; @@ -1195,7 +1194,7 @@ loop_get_status(struct loop_device *lo, int ret;
if (lo->lo_state != Lo_bound) { - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); return -ENXIO; }
@@ -1214,10 +1213,10 @@ loop_get_status(struct loop_device *lo, lo->lo_encrypt_key_size); }
- /* Drop loop_ctl_mutex while we call into the filesystem. */ + /* Drop lo_ctl_mutex while we call into the filesystem. */ path = lo->lo_backing_file->f_path; path_get(&path); - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); ret = vfs_getattr(&path, &stat, STATX_INO, AT_STATX_SYNC_AS_STAT); if (!ret) { info->lo_device = huge_encode_dev(stat.dev); @@ -1309,7 +1308,7 @@ loop_get_status_old(struct loop_device * int err;
if (!arg) { - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); return -EINVAL; } err = loop_get_status(lo, &info64); @@ -1327,7 +1326,7 @@ loop_get_status64(struct loop_device *lo int err;
if (!arg) { - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); return -EINVAL; } err = loop_get_status(lo, &info64); @@ -1402,7 +1401,7 @@ static int lo_ioctl(struct block_device struct loop_device *lo = bdev->bd_disk->private_data; int err;
- mutex_lock_nested(&loop_ctl_mutex, 1); + mutex_lock_nested(&lo->lo_ctl_mutex, 1); switch (cmd) { case LOOP_SET_FD: err = loop_set_fd(lo, mode, bdev, arg); @@ -1411,7 +1410,7 @@ static int lo_ioctl(struct block_device err = loop_change_fd(lo, bdev, arg); break; case LOOP_CLR_FD: - /* loop_clr_fd would have unlocked loop_ctl_mutex on success */ + /* loop_clr_fd would have unlocked lo_ctl_mutex on success */ err = loop_clr_fd(lo); if (!err) goto out_unlocked; @@ -1424,7 +1423,7 @@ static int lo_ioctl(struct block_device break; case LOOP_GET_STATUS: err = loop_get_status_old(lo, (struct loop_info __user *) arg); - /* loop_get_status() unlocks loop_ctl_mutex */ + /* loop_get_status() unlocks lo_ctl_mutex */ goto out_unlocked; case LOOP_SET_STATUS64: err = -EPERM; @@ -1434,7 +1433,7 @@ static int lo_ioctl(struct block_device break; case LOOP_GET_STATUS64: err = loop_get_status64(lo, (struct loop_info64 __user *) arg); - /* loop_get_status() unlocks loop_ctl_mutex */ + /* loop_get_status() unlocks lo_ctl_mutex */ goto out_unlocked; case LOOP_SET_CAPACITY: err = -EPERM; @@ -1454,7 +1453,7 @@ static int lo_ioctl(struct block_device default: err = lo->ioctl ? lo->ioctl(lo, cmd, arg) : -EINVAL; } - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex);
out_unlocked: return err; @@ -1571,7 +1570,7 @@ loop_get_status_compat(struct loop_devic int err;
if (!arg) { - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); return -EINVAL; } err = loop_get_status(lo, &info64); @@ -1588,16 +1587,16 @@ static int lo_compat_ioctl(struct block_
switch(cmd) { case LOOP_SET_STATUS: - mutex_lock(&loop_ctl_mutex); + mutex_lock(&lo->lo_ctl_mutex); err = loop_set_status_compat( lo, (const struct compat_loop_info __user *) arg); - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); break; case LOOP_GET_STATUS: - mutex_lock(&loop_ctl_mutex); + mutex_lock(&lo->lo_ctl_mutex); err = loop_get_status_compat( lo, (struct compat_loop_info __user *) arg); - /* loop_get_status() unlocks loop_ctl_mutex */ + /* loop_get_status() unlocks lo_ctl_mutex */ break; case LOOP_SET_CAPACITY: case LOOP_CLR_FD: @@ -1641,7 +1640,7 @@ static void __lo_release(struct loop_dev if (atomic_dec_return(&lo->lo_refcnt)) return;
- mutex_lock(&loop_ctl_mutex); + mutex_lock(&lo->lo_ctl_mutex); if (lo->lo_flags & LO_FLAGS_AUTOCLEAR) { /* * In autoclear mode, stop the loop thread @@ -1659,7 +1658,7 @@ static void __lo_release(struct loop_dev blk_mq_unfreeze_queue(lo->lo_queue); }
- mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); }
static void lo_release(struct gendisk *disk, fmode_t mode) @@ -1705,10 +1704,10 @@ static int unregister_transfer_cb(int id struct loop_device *lo = ptr; struct loop_func_table *xfer = data;
- mutex_lock(&loop_ctl_mutex); + mutex_lock(&lo->lo_ctl_mutex); if (lo->lo_encryption == xfer) loop_release_xfer(lo); - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); return 0; }
@@ -1881,6 +1880,7 @@ static int loop_add(struct loop_device * if (!part_shift) disk->flags |= GENHD_FL_NO_PART_SCAN; disk->flags |= GENHD_FL_EXT_DEVT; + mutex_init(&lo->lo_ctl_mutex); atomic_set(&lo->lo_refcnt, 0); lo->lo_number = i; spin_lock_init(&lo->lo_lock); @@ -1993,19 +1993,19 @@ static long loop_control_ioctl(struct fi ret = loop_lookup(&lo, parm); if (ret < 0) break; - mutex_lock(&loop_ctl_mutex); + mutex_lock(&lo->lo_ctl_mutex); if (lo->lo_state != Lo_unbound) { ret = -EBUSY; - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); break; } if (atomic_read(&lo->lo_refcnt) > 0) { ret = -EBUSY; - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); break; } lo->lo_disk->private_data = NULL; - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); idr_remove(&loop_index_idr, lo->lo_number); loop_remove(lo); break; --- a/drivers/block/loop.h +++ b/drivers/block/loop.h @@ -54,6 +54,7 @@ struct loop_device {
spinlock_t lo_lock; int lo_state; + struct mutex lo_ctl_mutex; struct kthread_worker worker; struct task_struct *worker_task; bool use_dio;
From: Eric Dumazet edumazet@google.com
[ Upstream commit 20ff83f10f113c88d0bb74589389b05250994c16 ]
Before calling __ip_options_compile(), we need to ensure the network header is a an IPv4 one, and that it is already pulled in skb->head.
RAW sockets going through a tunnel can end up calling ipv4_link_failure() with total garbage in the skb, or arbitrary lengthes.
syzbot report :
BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:355 [inline] BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0x294/0x1120 net/ipv4/ip_options.c:123 Write of size 69 at addr ffff888096abf068 by task syz-executor.4/9204
CPU: 0 PID: 9204 Comm: syz-executor.4 Not tainted 5.1.0-rc5+ #77 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 check_memory_region_inline mm/kasan/generic.c:185 [inline] check_memory_region+0x123/0x190 mm/kasan/generic.c:191 memcpy+0x38/0x50 mm/kasan/common.c:133 memcpy include/linux/string.h:355 [inline] __ip_options_echo+0x294/0x1120 net/ipv4/ip_options.c:123 __icmp_send+0x725/0x1400 net/ipv4/icmp.c:695 ipv4_link_failure+0x29f/0x550 net/ipv4/route.c:1204 dst_link_failure include/net/dst.h:427 [inline] vti6_xmit net/ipv6/ip6_vti.c:514 [inline] vti6_tnl_xmit+0x10d4/0x1c0c net/ipv6/ip6_vti.c:553 __netdev_start_xmit include/linux/netdevice.h:4414 [inline] netdev_start_xmit include/linux/netdevice.h:4423 [inline] xmit_one net/core/dev.c:3292 [inline] dev_hard_start_xmit+0x1b2/0x980 net/core/dev.c:3308 __dev_queue_xmit+0x271d/0x3060 net/core/dev.c:3878 dev_queue_xmit+0x18/0x20 net/core/dev.c:3911 neigh_direct_output+0x16/0x20 net/core/neighbour.c:1527 neigh_output include/net/neighbour.h:508 [inline] ip_finish_output2+0x949/0x1740 net/ipv4/ip_output.c:229 ip_finish_output+0x73c/0xd50 net/ipv4/ip_output.c:317 NF_HOOK_COND include/linux/netfilter.h:278 [inline] ip_output+0x21f/0x670 net/ipv4/ip_output.c:405 dst_output include/net/dst.h:444 [inline] NF_HOOK include/linux/netfilter.h:289 [inline] raw_send_hdrinc net/ipv4/raw.c:432 [inline] raw_sendmsg+0x1d2b/0x2f20 net/ipv4/raw.c:663 inet_sendmsg+0x147/0x5d0 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg+0xdd/0x130 net/socket.c:661 sock_write_iter+0x27c/0x3e0 net/socket.c:988 call_write_iter include/linux/fs.h:1866 [inline] new_sync_write+0x4c7/0x760 fs/read_write.c:474 __vfs_write+0xe4/0x110 fs/read_write.c:487 vfs_write+0x20c/0x580 fs/read_write.c:549 ksys_write+0x14f/0x2d0 fs/read_write.c:599 __do_sys_write fs/read_write.c:611 [inline] __se_sys_write fs/read_write.c:608 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:608 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x458c29 Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f293b44bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458c29 RDX: 0000000000000014 RSI: 00000000200002c0 RDI: 0000000000000003 RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f293b44c6d4 R13: 00000000004c8623 R14: 00000000004ded68 R15: 00000000ffffffff
The buggy address belongs to the page: page:ffffea00025aafc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x1fffc0000000000() raw: 01fffc0000000000 0000000000000000 ffffffff025a0101 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff888096abef80: 00 00 00 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 f2 ffff888096abf000: f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
ffff888096abf080: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
^ ffff888096abf100: 00 00 00 00 f1 f1 f1 f1 00 00 f3 f3 00 00 00 00 ffff888096abf180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Stephen Suryaputra ssuryaextr@gmail.com Acked-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/route.c | 34 ++++++++++++++++++++++++---------- 1 file changed, 24 insertions(+), 10 deletions(-)
--- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1192,25 +1192,39 @@ static struct dst_entry *ipv4_dst_check( return dst; }
-static void ipv4_link_failure(struct sk_buff *skb) +static void ipv4_send_dest_unreach(struct sk_buff *skb) { struct ip_options opt; - struct rtable *rt; int res;
/* Recompile ip options since IPCB may not be valid anymore. + * Also check we have a reasonable ipv4 header. */ - memset(&opt, 0, sizeof(opt)); - opt.optlen = ip_hdr(skb)->ihl*4 - sizeof(struct iphdr); - - rcu_read_lock(); - res = __ip_options_compile(dev_net(skb->dev), &opt, skb, NULL); - rcu_read_unlock(); - - if (res) + if (!pskb_network_may_pull(skb, sizeof(struct iphdr)) || + ip_hdr(skb)->version != 4 || ip_hdr(skb)->ihl < 5) return;
+ memset(&opt, 0, sizeof(opt)); + if (ip_hdr(skb)->ihl > 5) { + if (!pskb_network_may_pull(skb, ip_hdr(skb)->ihl * 4)) + return; + opt.optlen = ip_hdr(skb)->ihl * 4 - sizeof(struct iphdr); + + rcu_read_lock(); + res = __ip_options_compile(dev_net(skb->dev), &opt, skb, NULL); + rcu_read_unlock(); + + if (res) + return; + } __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0, &opt); +} + +static void ipv4_link_failure(struct sk_buff *skb) +{ + struct rtable *rt; + + ipv4_send_dest_unreach(skb);
rt = skb_rtable(skb); if (rt)
From: Amit Cohen amitc@mellanox.com
[ Upstream commit 151f0dddbbfe4c35c9c5b64873115aafd436af9d ]
If link is down and autoneg is set to on/off, the status in ethtool does not change.
The reason is when the link is down the function returns with zero before changing autoneg value.
Move the checking of link state (up/down) to be performed after setting autoneg value, in order to be sure that autoneg will change in any case.
Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Amit Cohen amitc@mellanox.com Signed-off-by: Ido Schimmel idosch@mellanox.com Acked-by: Jiri Pirko jiri@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c @@ -2521,11 +2521,11 @@ mlxsw_sp_port_set_link_ksettings(struct if (err) return err;
+ mlxsw_sp_port->link.autoneg = autoneg; + if (!netif_running(dev)) return 0;
- mlxsw_sp_port->link.autoneg = autoneg; - mlxsw_sp_port_admin_status_set(mlxsw_sp_port, false); mlxsw_sp_port_admin_status_set(mlxsw_sp_port, true);
From: Erez Alfasi ereza@mellanox.com
[ Upstream commit ace329f4ab3ba434be2adf618073c752d083b524 ]
Querying EEPROM high pages data for SFP module is currently not supported by our driver and yet queried, resulting in invalid FW queries.
Set the EEPROM ethtool data length to 256 for SFP module will limit the reading for page 0 only and prevent invalid FW queries.
Fixes: bb64143eee8c ("net/mlx5e: Add ethtool support for dump module EEPROM") Signed-off-by: Erez Alfasi ereza@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/port.c | 4 ---- 2 files changed, 1 insertion(+), 5 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -1622,7 +1622,7 @@ static int mlx5e_get_module_info(struct break; case MLX5_MODULE_ID_SFP: modinfo->type = ETH_MODULE_SFF_8472; - modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN; + modinfo->eeprom_len = MLX5_EEPROM_PAGE_LENGTH; break; default: netdev_err(priv->netdev, "%s: cable type not recognized:0x%x\n", --- a/drivers/net/ethernet/mellanox/mlx5/core/port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c @@ -392,10 +392,6 @@ int mlx5_query_module_eeprom(struct mlx5 size -= offset + size - MLX5_EEPROM_PAGE_LENGTH;
i2c_addr = MLX5_I2C_ADDR_LOW; - if (offset >= MLX5_EEPROM_PAGE_LENGTH) { - i2c_addr = MLX5_I2C_ADDR_HIGH; - offset -= MLX5_EEPROM_PAGE_LENGTH; - }
MLX5_SET(mcia_reg, in, l, 0); MLX5_SET(mcia_reg, in, module, module_num);
From: Zhu Yanjun yanjun.zhu@oracle.com
[ Upstream commit 4b9fc7146249a6e0e3175d0acc033fdcd2bfcb17 ]
Before the commit 490ea5967b0d ("RDS: IB: move FMR code to its own file"), when the dirty_count is greater than 9/10 of max_items of 8K pool, 1M pool is used, Vice versa. After the commit 490ea5967b0d ("RDS: IB: move FMR code to its own file"), the above is removed. When we make the following tests.
Server: rds-stress -r 1.1.1.16 -D 1M
Client: rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M
The following will appear. " connecting to 1.1.1.16:4000 negotiated options, tasks will start in 2 seconds Starting up..header from 1.1.1.166:4001 to id 4001 bogus .. tsks tx/s rx/s tx+rx K/s mbi K/s mbo K/s tx us/c rtt us cpu % 1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00 1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00 1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00 1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00 1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00 ... " So this exchange between 8K and 1M pool is added back.
Fixes: commit 490ea5967b0d ("RDS: IB: move FMR code to its own file") Signed-off-by: Zhu Yanjun yanjun.zhu@oracle.com Acked-by: Santosh Shilimkar santosh.shilimkar@oracle.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/rds/ib_fmr.c | 11 +++++++++++ net/rds/ib_rdma.c | 3 --- 2 files changed, 11 insertions(+), 3 deletions(-)
--- a/net/rds/ib_fmr.c +++ b/net/rds/ib_fmr.c @@ -44,6 +44,17 @@ struct rds_ib_mr *rds_ib_alloc_fmr(struc else pool = rds_ibdev->mr_1m_pool;
+ if (atomic_read(&pool->dirty_count) >= pool->max_items / 10) + queue_delayed_work(rds_ib_mr_wq, &pool->flush_worker, 10); + + /* Switch pools if one of the pool is reaching upper limit */ + if (atomic_read(&pool->dirty_count) >= pool->max_items * 9 / 10) { + if (pool->pool_type == RDS_IB_MR_8K_POOL) + pool = rds_ibdev->mr_1m_pool; + else + pool = rds_ibdev->mr_8k_pool; + } + ibmr = rds_ib_try_reuse_ibmr(pool); if (ibmr) return ibmr; --- a/net/rds/ib_rdma.c +++ b/net/rds/ib_rdma.c @@ -442,9 +442,6 @@ struct rds_ib_mr *rds_ib_try_reuse_ibmr( struct rds_ib_mr *ibmr = NULL; int iter = 0;
- if (atomic_read(&pool->dirty_count) >= pool->max_items_soft / 10) - queue_delayed_work(rds_ib_mr_wq, &pool->flush_worker, 10); - while (1) { ibmr = rds_ib_reuse_mr(pool); if (ibmr)
From: Vinod Koul vkoul@kernel.org
[ Upstream commit b561af36b1841088552464cdc3f6371d92f17710 ]
stmmac_check_ether_addr() checks the MAC address and assigns one in driver open(). In many cases when we create slave netdevice, the dev addr is inherited from master but the master dev addr maybe NULL at that time, so move this call to driver probe so that address is always valid.
Signed-off-by: Xiaofei Shen xiaofeis@codeaurora.org Tested-by: Xiaofei Shen xiaofeis@codeaurora.org Signed-off-by: Sneh Shah snehshah@codeaurora.org Signed-off-by: Vinod Koul vkoul@kernel.org Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -2582,8 +2582,6 @@ static int stmmac_open(struct net_device struct stmmac_priv *priv = netdev_priv(dev); int ret;
- stmmac_check_ether_addr(priv); - if (priv->hw->pcs != STMMAC_PCS_RGMII && priv->hw->pcs != STMMAC_PCS_TBI && priv->hw->pcs != STMMAC_PCS_RTBI) { @@ -4213,6 +4211,8 @@ int stmmac_dvr_probe(struct device *devi if (ret) goto error_hw_init;
+ stmmac_check_ether_addr(priv); + /* Configure real RX and TX queues */ netif_set_real_num_rx_queues(ndev, priv->plat->rx_queues_to_use); netif_set_real_num_tx_queues(ndev, priv->plat->tx_queues_to_use);
From: Su Bao Cheng baocheng.su@siemens.com
[ Upstream commit e0c1d14a1a3211dccf0540a6703ffbd5d2a75bdb ]
Since there are more IOT2040 variants with identical hardware but different asset tags, the asset tag matching should be adjusted to support them.
For the board name "SIMATIC IOT2000", currently there are 2 types of hardware, IOT2020 and IOT2040. The IOT2020 is identified by its unique asset tag. Match on it first. If we then match on the board name only, we will catch all IOT2040 variants. In the future there will be no other devices with the "SIMATIC IOT2000" DMI board name but different hardware.
Signed-off-by: Su Bao Cheng baocheng.su@siemens.com Reviewed-by: Jan Kiszka jan.kiszka@siemens.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c @@ -159,6 +159,12 @@ static const struct dmi_system_id quark_ }, .driver_data = (void *)&galileo_stmmac_dmi_data, }, + /* + * There are 2 types of SIMATIC IOT2000: IOT20202 and IOT2040. + * The asset tag "6ES7647-0AA00-0YA2" is only for IOT2020 which + * has only one pci network device while other asset tags are + * for IOT2040 which has two. + */ { .matches = { DMI_EXACT_MATCH(DMI_BOARD_NAME, "SIMATIC IOT2000"), @@ -170,8 +176,6 @@ static const struct dmi_system_id quark_ { .matches = { DMI_EXACT_MATCH(DMI_BOARD_NAME, "SIMATIC IOT2000"), - DMI_EXACT_MATCH(DMI_BOARD_ASSET_TAG, - "6ES7647-0AA00-1YA2"), }, .driver_data = (void *)&iot2040_stmmac_dmi_data, },
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 925b0c841e066b488cc3a60272472b2c56300704 ]
If we add a bond device which is already the master of the team interface, we will hold the team->lock in team_add_slave() first and then request the lock in team_set_mac_address() again. The functions are called like:
- team_add_slave() - team_port_add() - team_port_enter() - team_modeop_port_enter() - __set_port_dev_addr() - dev_set_mac_address() - bond_set_mac_address() - dev_set_mac_address() - team_set_mac_address
Although team_upper_dev_link() would check the upper devices but it is called too late. Fix it by adding a checking before processing the slave.
v2: Do not split the string in netdev_err()
Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device") Acked-by: Jiri Pirko jiri@mellanox.com Signed-off-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/team/team.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/net/team/team.c +++ b/drivers/net/team/team.c @@ -1157,6 +1157,12 @@ static int team_port_add(struct team *te return -EINVAL; }
+ if (netdev_has_upper_dev(dev, port_dev)) { + netdev_err(dev, "Device %s is already an upper device of the team interface\n", + portname); + return -EBUSY; + } + if (port_dev->features & NETIF_F_VLAN_CHALLENGED && vlan_uses_dev(dev)) { netdev_err(dev, "Device %s is VLAN challenged and team device has VLAN set up\n",
From: Kees Cook keescook@chromium.org
commit 4966babd904d7f8e9e20735f3637a98fd7ca538c upstream.
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly.
Cc: Ralf Baechle ralf@linux-mips.org Cc: "David S. Miller" davem@davemloft.net Cc: linux-hams@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/rose/af_rose.c | 17 +++++++++-------- net/rose/rose_link.c | 16 +++++++--------- net/rose/rose_loopback.c | 9 +++------ net/rose/rose_route.c | 8 ++++---- net/rose/rose_timer.c | 30 +++++++++++++----------------- 5 files changed, 36 insertions(+), 44 deletions(-)
--- a/net/rose/af_rose.c +++ b/net/rose/af_rose.c @@ -318,9 +318,11 @@ void rose_destroy_socket(struct sock *); /* * Handler for deferred kills. */ -static void rose_destroy_timer(unsigned long data) +static void rose_destroy_timer(struct timer_list *t) { - rose_destroy_socket((struct sock *)data); + struct sock *sk = from_timer(sk, t, sk_timer); + + rose_destroy_socket(sk); }
/* @@ -353,8 +355,7 @@ void rose_destroy_socket(struct sock *sk
if (sk_has_allocations(sk)) { /* Defer: outstanding buffers */ - setup_timer(&sk->sk_timer, rose_destroy_timer, - (unsigned long)sk); + timer_setup(&sk->sk_timer, rose_destroy_timer, 0); sk->sk_timer.expires = jiffies + 10 * HZ; add_timer(&sk->sk_timer); } else @@ -538,8 +539,8 @@ static int rose_create(struct net *net, sock->ops = &rose_proto_ops; sk->sk_protocol = protocol;
- init_timer(&rose->timer); - init_timer(&rose->idletimer); + timer_setup(&rose->timer, NULL, 0); + timer_setup(&rose->idletimer, NULL, 0);
rose->t1 = msecs_to_jiffies(sysctl_rose_call_request_timeout); rose->t2 = msecs_to_jiffies(sysctl_rose_reset_request_timeout); @@ -582,8 +583,8 @@ static struct sock *rose_make_new(struct sk->sk_state = TCP_ESTABLISHED; sock_copy_flags(sk, osk);
- init_timer(&rose->timer); - init_timer(&rose->idletimer); + timer_setup(&rose->timer, NULL, 0); + timer_setup(&rose->idletimer, NULL, 0);
orose = rose_sk(osk); rose->t1 = orose->t1; --- a/net/rose/rose_link.c +++ b/net/rose/rose_link.c @@ -27,8 +27,8 @@ #include <linux/interrupt.h> #include <net/rose.h>
-static void rose_ftimer_expiry(unsigned long); -static void rose_t0timer_expiry(unsigned long); +static void rose_ftimer_expiry(struct timer_list *); +static void rose_t0timer_expiry(struct timer_list *);
static void rose_transmit_restart_confirmation(struct rose_neigh *neigh); static void rose_transmit_restart_request(struct rose_neigh *neigh); @@ -37,8 +37,7 @@ void rose_start_ftimer(struct rose_neigh { del_timer(&neigh->ftimer);
- neigh->ftimer.data = (unsigned long)neigh; - neigh->ftimer.function = &rose_ftimer_expiry; + neigh->ftimer.function = (TIMER_FUNC_TYPE)rose_ftimer_expiry; neigh->ftimer.expires = jiffies + msecs_to_jiffies(sysctl_rose_link_fail_timeout);
@@ -49,8 +48,7 @@ static void rose_start_t0timer(struct ro { del_timer(&neigh->t0timer);
- neigh->t0timer.data = (unsigned long)neigh; - neigh->t0timer.function = &rose_t0timer_expiry; + neigh->t0timer.function = (TIMER_FUNC_TYPE)rose_t0timer_expiry; neigh->t0timer.expires = jiffies + msecs_to_jiffies(sysctl_rose_restart_request_timeout);
@@ -77,13 +75,13 @@ static int rose_t0timer_running(struct r return timer_pending(&neigh->t0timer); }
-static void rose_ftimer_expiry(unsigned long param) +static void rose_ftimer_expiry(struct timer_list *t) { }
-static void rose_t0timer_expiry(unsigned long param) +static void rose_t0timer_expiry(struct timer_list *t) { - struct rose_neigh *neigh = (struct rose_neigh *)param; + struct rose_neigh *neigh = from_timer(neigh, t, t0timer);
rose_transmit_restart_request(neigh);
--- a/net/rose/rose_loopback.c +++ b/net/rose/rose_loopback.c @@ -19,12 +19,13 @@ static struct sk_buff_head loopback_queu static struct timer_list loopback_timer;
static void rose_set_loopback_timer(void); +static void rose_loopback_timer(struct timer_list *unused);
void rose_loopback_init(void) { skb_queue_head_init(&loopback_queue);
- init_timer(&loopback_timer); + timer_setup(&loopback_timer, rose_loopback_timer, 0); }
static int rose_loopback_running(void) @@ -50,20 +51,16 @@ int rose_loopback_queue(struct sk_buff * return 1; }
-static void rose_loopback_timer(unsigned long);
static void rose_set_loopback_timer(void) { del_timer(&loopback_timer);
- loopback_timer.data = 0; - loopback_timer.function = &rose_loopback_timer; loopback_timer.expires = jiffies + 10; - add_timer(&loopback_timer); }
-static void rose_loopback_timer(unsigned long param) +static void rose_loopback_timer(struct timer_list *unused) { struct sk_buff *skb; struct net_device *dev; --- a/net/rose/rose_route.c +++ b/net/rose/rose_route.c @@ -104,8 +104,8 @@ static int __must_check rose_add_node(st
skb_queue_head_init(&rose_neigh->queue);
- init_timer(&rose_neigh->ftimer); - init_timer(&rose_neigh->t0timer); + timer_setup(&rose_neigh->ftimer, NULL, 0); + timer_setup(&rose_neigh->t0timer, NULL, 0);
if (rose_route->ndigis != 0) { rose_neigh->digipeat = @@ -390,8 +390,8 @@ void rose_add_loopback_neigh(void)
skb_queue_head_init(&sn->queue);
- init_timer(&sn->ftimer); - init_timer(&sn->t0timer); + timer_setup(&sn->ftimer, NULL, 0); + timer_setup(&sn->t0timer, NULL, 0);
spin_lock_bh(&rose_neigh_list_lock); sn->next = rose_neigh_list; --- a/net/rose/rose_timer.c +++ b/net/rose/rose_timer.c @@ -29,8 +29,8 @@ #include <net/rose.h>
static void rose_heartbeat_expiry(unsigned long); -static void rose_timer_expiry(unsigned long); -static void rose_idletimer_expiry(unsigned long); +static void rose_timer_expiry(struct timer_list *); +static void rose_idletimer_expiry(struct timer_list *);
void rose_start_heartbeat(struct sock *sk) { @@ -49,8 +49,7 @@ void rose_start_t1timer(struct sock *sk)
del_timer(&rose->timer);
- rose->timer.data = (unsigned long)sk; - rose->timer.function = &rose_timer_expiry; + rose->timer.function = (TIMER_FUNC_TYPE)rose_timer_expiry; rose->timer.expires = jiffies + rose->t1;
add_timer(&rose->timer); @@ -62,8 +61,7 @@ void rose_start_t2timer(struct sock *sk)
del_timer(&rose->timer);
- rose->timer.data = (unsigned long)sk; - rose->timer.function = &rose_timer_expiry; + rose->timer.function = (TIMER_FUNC_TYPE)rose_timer_expiry; rose->timer.expires = jiffies + rose->t2;
add_timer(&rose->timer); @@ -75,8 +73,7 @@ void rose_start_t3timer(struct sock *sk)
del_timer(&rose->timer);
- rose->timer.data = (unsigned long)sk; - rose->timer.function = &rose_timer_expiry; + rose->timer.function = (TIMER_FUNC_TYPE)rose_timer_expiry; rose->timer.expires = jiffies + rose->t3;
add_timer(&rose->timer); @@ -88,8 +85,7 @@ void rose_start_hbtimer(struct sock *sk)
del_timer(&rose->timer);
- rose->timer.data = (unsigned long)sk; - rose->timer.function = &rose_timer_expiry; + rose->timer.function = (TIMER_FUNC_TYPE)rose_timer_expiry; rose->timer.expires = jiffies + rose->hb;
add_timer(&rose->timer); @@ -102,8 +98,7 @@ void rose_start_idletimer(struct sock *s del_timer(&rose->idletimer);
if (rose->idle > 0) { - rose->idletimer.data = (unsigned long)sk; - rose->idletimer.function = &rose_idletimer_expiry; + rose->idletimer.function = (TIMER_FUNC_TYPE)rose_idletimer_expiry; rose->idletimer.expires = jiffies + rose->idle;
add_timer(&rose->idletimer); @@ -163,10 +158,10 @@ static void rose_heartbeat_expiry(unsign bh_unlock_sock(sk); }
-static void rose_timer_expiry(unsigned long param) +static void rose_timer_expiry(struct timer_list *t) { - struct sock *sk = (struct sock *)param; - struct rose_sock *rose = rose_sk(sk); + struct rose_sock *rose = from_timer(rose, t, timer); + struct sock *sk = &rose->sock;
bh_lock_sock(sk); switch (rose->state) { @@ -192,9 +187,10 @@ static void rose_timer_expiry(unsigned l bh_unlock_sock(sk); }
-static void rose_idletimer_expiry(unsigned long param) +static void rose_idletimer_expiry(struct timer_list *t) { - struct sock *sk = (struct sock *)param; + struct rose_sock *rose = from_timer(rose, t, idletimer); + struct sock *sk = &rose->sock;
bh_lock_sock(sk); rose_clear_queues(sk);
From: Eric Dumazet edumazet@google.com
[ Upstream commit 0453c682459583910d611a96de928f4442205493 ]
This patch adds a limit on the number of skbs that fuzzers can queue into loopback_queue. 1000 packets for rose loopback seems more than enough.
Then, since we now have multiple cpus in most linux hosts, we also need to limit the number of skbs rose_loopback_timer() can dequeue at each round.
rose_loopback_queue() can be drop-monitor friendly, calling consume_skb() or kfree_skb() appropriately.
Finally, use mod_timer() instead of del_timer() + add_timer()
syzbot report was :
rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 0-...!: (10499 ticks this GP) idle=536/1/0x4000000000000002 softirq=103291/103291 fqs=34 rcu: (t=10500 jiffies g=140321 q=323) rcu: rcu_preempt kthread starved for 10426 jiffies! g140321 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1 rcu: RCU grace-period kthread stack dump: rcu_preempt I29168 10 2 0x80000000 Call Trace: context_switch kernel/sched/core.c:2877 [inline] __schedule+0x813/0x1cc0 kernel/sched/core.c:3518 schedule+0x92/0x180 kernel/sched/core.c:3562 schedule_timeout+0x4db/0xfd0 kernel/time/timer.c:1803 rcu_gp_fqs_loop kernel/rcu/tree.c:1971 [inline] rcu_gp_kthread+0x962/0x17b0 kernel/rcu/tree.c:2128 kthread+0x357/0x430 kernel/kthread.c:253 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352 NMI backtrace for cpu 0 CPU: 0 PID: 7632 Comm: kworker/0:4 Not tainted 5.1.0-rc5+ #172 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events iterate_cleanup_work Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 nmi_cpu_backtrace.cold+0x63/0xa4 lib/nmi_backtrace.c:101 nmi_trigger_cpumask_backtrace+0x1be/0x236 lib/nmi_backtrace.c:62 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline] rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1223 print_cpu_stall kernel/rcu/tree.c:1360 [inline] check_cpu_stall kernel/rcu/tree.c:1434 [inline] rcu_pending kernel/rcu/tree.c:3103 [inline] rcu_sched_clock_irq.cold+0x500/0xa4a kernel/rcu/tree.c:2544 update_process_times+0x32/0x80 kernel/time/timer.c:1635 tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:161 tick_sched_timer+0x47/0x130 kernel/time/tick-sched.c:1271 __run_hrtimer kernel/time/hrtimer.c:1389 [inline] __hrtimer_run_queues+0x33e/0xde0 kernel/time/hrtimer.c:1451 hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1035 [inline] smp_apic_timer_interrupt+0x120/0x570 arch/x86/kernel/apic/apic.c:1060 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807 RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x50 kernel/kcov.c:95 Code: 89 25 b4 6e ec 08 41 bc f4 ff ff ff e8 cd 5d ea ff 48 c7 05 9e 6e ec 08 00 00 00 00 e9 a4 e9 ff ff 90 90 90 90 90 90 90 90 90 <55> 48 89 e5 48 8b 75 08 65 48 8b 04 25 00 ee 01 00 65 8b 15 c8 60 RSP: 0018:ffff8880ae807ce0 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13 RAX: ffff88806fd40640 RBX: dffffc0000000000 RCX: ffffffff863fbc56 RDX: 0000000000000100 RSI: ffffffff863fbc1d RDI: ffff88808cf94228 RBP: ffff8880ae807d10 R08: ffff88806fd40640 R09: ffffed1015d00f8b R10: ffffed1015d00f8a R11: 0000000000000003 R12: ffff88808cf941c0 R13: 00000000fffff034 R14: ffff8882166cd840 R15: 0000000000000000 rose_loopback_timer+0x30d/0x3f0 net/rose/rose_loopback.c:91 call_timer_fn+0x190/0x720 kernel/time/timer.c:1325 expire_timers kernel/time/timer.c:1362 [inline] __run_timers kernel/time/timer.c:1681 [inline] __run_timers kernel/time/timer.c:1649 [inline] run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694 __do_softirq+0x266/0x95a kernel/softirq.c:293 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1027
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/rose/rose_loopback.c | 27 ++++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-)
--- a/net/rose/rose_loopback.c +++ b/net/rose/rose_loopback.c @@ -16,6 +16,7 @@ #include <linux/init.h>
static struct sk_buff_head loopback_queue; +#define ROSE_LOOPBACK_LIMIT 1000 static struct timer_list loopback_timer;
static void rose_set_loopback_timer(void); @@ -35,29 +36,27 @@ static int rose_loopback_running(void)
int rose_loopback_queue(struct sk_buff *skb, struct rose_neigh *neigh) { - struct sk_buff *skbn; + struct sk_buff *skbn = NULL;
- skbn = skb_clone(skb, GFP_ATOMIC); + if (skb_queue_len(&loopback_queue) < ROSE_LOOPBACK_LIMIT) + skbn = skb_clone(skb, GFP_ATOMIC);
- kfree_skb(skb); - - if (skbn != NULL) { + if (skbn) { + consume_skb(skb); skb_queue_tail(&loopback_queue, skbn);
if (!rose_loopback_running()) rose_set_loopback_timer(); + } else { + kfree_skb(skb); }
return 1; }
- static void rose_set_loopback_timer(void) { - del_timer(&loopback_timer); - - loopback_timer.expires = jiffies + 10; - add_timer(&loopback_timer); + mod_timer(&loopback_timer, jiffies + 10); }
static void rose_loopback_timer(struct timer_list *unused) @@ -68,8 +67,12 @@ static void rose_loopback_timer(struct t struct sock *sk; unsigned short frametype; unsigned int lci_i, lci_o; + int count;
- while ((skb = skb_dequeue(&loopback_queue)) != NULL) { + for (count = 0; count < ROSE_LOOPBACK_LIMIT; count++) { + skb = skb_dequeue(&loopback_queue); + if (!skb) + return; if (skb->len < ROSE_MIN_LEN) { kfree_skb(skb); continue; @@ -106,6 +109,8 @@ static void rose_loopback_timer(struct t kfree_skb(skb); } } + if (!skb_queue_empty(&loopback_queue)) + mod_timer(&loopback_timer, jiffies + 1); }
void __exit rose_loopback_clear(void)
From: ZhangXiaoxu zhangxiaoxu5@huawei.com
[ Upstream commit 19fad20d15a6494f47f85d869f00b11343ee5c78 ]
There is a UBSAN report as below: UBSAN: Undefined behaviour in net/ipv4/tcp_input.c:2877:56 signed integer overflow: 2147483647 * 1000 cannot be represented in type 'int' CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.1.0-rc4-00058-g582549e #1 Call Trace: <IRQ> dump_stack+0x8c/0xba ubsan_epilogue+0x11/0x60 handle_overflow+0x12d/0x170 ? ttwu_do_wakeup+0x21/0x320 __ubsan_handle_mul_overflow+0x12/0x20 tcp_ack_update_rtt+0x76c/0x780 tcp_clean_rtx_queue+0x499/0x14d0 tcp_ack+0x69e/0x1240 ? __wake_up_sync_key+0x2c/0x50 ? update_group_capacity+0x50/0x680 tcp_rcv_established+0x4e2/0xe10 tcp_v4_do_rcv+0x22b/0x420 tcp_v4_rcv+0xfe8/0x1190 ip_protocol_deliver_rcu+0x36/0x180 ip_local_deliver+0x15b/0x1a0 ip_rcv+0xac/0xd0 __netif_receive_skb_one_core+0x7f/0xb0 __netif_receive_skb+0x33/0xc0 netif_receive_skb_internal+0x84/0x1c0 napi_gro_receive+0x2a0/0x300 receive_buf+0x3d4/0x2350 ? detach_buf_split+0x159/0x390 virtnet_poll+0x198/0x840 ? reweight_entity+0x243/0x4b0 net_rx_action+0x25c/0x770 __do_softirq+0x19b/0x66d irq_exit+0x1eb/0x230 do_IRQ+0x7a/0x150 common_interrupt+0xf/0xf </IRQ>
It can be reproduced by: echo 2147483647 > /proc/sys/net/ipv4/tcp_min_rtt_wlen
Fixes: f672258391b42 ("tcp: track min RTT using windowed min-filter") Signed-off-by: ZhangXiaoxu zhangxiaoxu5@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/networking/ip-sysctl.txt | 1 + net/ipv4/sysctl_net_ipv4.c | 5 ++++- 2 files changed, 5 insertions(+), 1 deletion(-)
--- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -402,6 +402,7 @@ tcp_min_rtt_wlen - INTEGER minimum RTT when it is moved to a longer path (e.g., due to traffic engineering). A longer window makes the filter more resistant to RTT inflations such as transient congestion. The unit is seconds. + Possible values: 0 - 86400 (1 day) Default: 300
tcp_moderate_rcvbuf - BOOLEAN --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -45,6 +45,7 @@ static int tcp_syn_retries_min = 1; static int tcp_syn_retries_max = MAX_TCP_SYNCNT; static int ip_ping_group_range_min[] = { 0, 0 }; static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX }; +static int one_day_secs = 24 * 3600;
/* obsolete */ static int sysctl_tcp_low_latency __read_mostly; @@ -552,7 +553,9 @@ static struct ctl_table ipv4_table[] = { .data = &sysctl_tcp_min_rtt_wlen, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dointvec_minmax, + .extra1 = &zero, + .extra2 = &one_day_secs }, { .procname = "tcp_low_latency",
stable-rc/linux-4.14.y boot: 117 boots: 2 failed, 114 passed with 1 untried/unknown (v4.14.114-54-gdb44a158d937)
Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.14.y/kernel/v4.14... Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.14.y/kernel/v4.14.114-54...
Tree: stable-rc Branch: linux-4.14.y Git Describe: v4.14.114-54-gdb44a158d937 Git Commit: db44a158d937ed88d91fa55f1df54c11490a5b57 Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Tested: 66 unique boards, 24 SoC families, 14 builds out of 201
Boot Regressions Detected:
arm:
multi_v7_defconfig: gcc-7: stih410-b2120: lab-baylibre-seattle: new failure (last pass: v4.14.114-44-g8da3ae647284)
Boot Failures Detected:
arm: multi_v7_defconfig: gcc-7: stih410-b2120: 1 failed lab
arm64: defconfig: gcc-7: rk3399-firefly: 1 failed lab
--- For more info write to info@kernelci.org
On 4/30/19 5:38 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.115 release. There are 53 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu 02 May 2019 11:34:49 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.115-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
On Tue, 30 Apr 2019 at 17:12, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.14.115 release. There are 53 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu 02 May 2019 11:34:49 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.115-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary ------------------------------------------------------------------------
kernel: 4.14.115-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.14.y git commit: db44a158d937ed88d91fa55f1df54c11490a5b57 git describe: v4.14.114-54-gdb44a158d937 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.114-5...
No regressions (compared to build v4.14.114)
No fixes (compared to build v4.14.114)
Ran 23587 total tests in the following environments and test suites.
Environments -------------- - dragonboard-410c - arm64 - hi6220-hikey - arm64 - i386 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - arm - x86_64
Test Suites ----------- * install-android-platform-tools-r2600 * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * perf * spectre-meltdown-checker-test * v4l2-compliance * kvm-unit-tests * ltp-open-posix-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none * ssuite
On 30/04/2019 12:38, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.115 release. There are 53 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu 02 May 2019 11:34:49 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.115-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
All tests are passing for Tegra ...
Test results for stable-v4.14: 8 builds: 8 pass, 0 fail 16 boots: 16 pass, 0 fail 24 tests: 24 pass, 0 fail
Linux version: 4.14.115-rc1-gdb44a15 Boards tested: tegra124-jetson-tk1, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Cheers Jon
On Tue, Apr 30, 2019 at 01:38:07PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.115 release. There are 53 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu 02 May 2019 11:34:49 AM UTC. Anything received after that time might be too late.
Build results: total: 172 pass: 172 fail: 0 Qemu test results: total: 333 pass: 333 fail: 0
Guenter
linux-stable-mirror@lists.linaro.org