This is the start of the stable review cycle for the 4.14.132 release. There are 43 patches in this series; all will be posted as responses to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu 04 Jul 2019 07:59:45 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.132-rc...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.14.132-rc1
Xin Long lucien.xin@gmail.com tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb
Will Deacon will.deacon@arm.com futex: Update comments and docs about return values of arch futex code
Daniel Borkmann daniel@iogearbox.net bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd
Will Deacon will.deacon@arm.com arm64: futex: Avoid copying out uninitialised stack in failed cmpxchg()
Martin KaFai Lau kafai@fb.com bpf: udp: ipv6: Avoid running reuseport's bpf_prog from __udp6_lib_err
Martin KaFai Lau kafai@fb.com bpf: udp: Avoid calling reuseport's bpf_prog from udp_gro
YueHaibing yuehaibing@huawei.com bonding: Always enable vlan tx offload
YueHaibing yuehaibing@huawei.com team: Always enable vlan tx offload
Fei Li lifei.shirley@bytedance.com tun: wake up waitqueues after IFF_UP is set
Xin Long lucien.xin@gmail.com tipc: check msg->req data len in tipc_nl_compat_bearer_disable
Xin Long lucien.xin@gmail.com tipc: change to use register_pernet_device
Xin Long lucien.xin@gmail.com sctp: change to hold sk after auth shkey is created successfully
Roland Hii roland.king.guan.hii@intel.com net: stmmac: fixed new system time seconds value calculation
JingYi Hou houjingyi647@gmail.com net: remove duplicate fetch in sock_getsockopt
Eric Dumazet edumazet@google.com net/packet: fix memory leak in packet_set_ring()
Stephen Suryaputra ssuryaextr@gmail.com ipv4: Use return value of inet_iif() for __raw_v4_lookup in the while loop
Neil Horman nhorman@tuxdriver.com af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET
Wang Xin xin.wang7@cn.bosch.com eeprom: at24: fix unexpected timeout under high load
Geert Uytterhoeven geert@linux-m68k.org cpu/speculation: Warn on unsupported mitigations= parameter
Trond Myklebust trondmy@gmail.com NFS/flexfiles: Use the correct TCP timeout for flexfiles I/O
Thomas Gleixner tglx@linutronix.de x86/microcode: Fix the microcode load on CPU hotplug for real
Alejandro Jimenez alejandro.j.jimenez@oracle.com x86/speculation: Allow guests to use SSBD even if host does not
Jan Kara jack@suse.cz scsi: vmw_pscsi: Fix use-after-free in pvscsi_queue_lck()
zhangyi (F) yi.zhang@huawei.com dm log writes: make sure super sector log updates are written in order
Colin Ian King colin.king@canonical.com mm/page_idle.c: fix oops because end_pfn is larger than max_pfn
Jann Horn jannh@google.com fs/binfmt_flat.c: make load_flat_shared_library() work
zhong jiang zhongjiang@huawei.com mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask
John Ogness john.ogness@linutronix.de fs/proc/array.c: allow reporting eip/esp for all coredumping threads
Sasha Levin sashal@kernel.org Revert "compiler.h: update definition of unreachable()"
Kristian Evensen kristian.evensen@gmail.com qmi_wwan: Fix out-of-bounds read
Adeodato Simó dato@net.com.org.es net/9p: include trans_common.h to fix missing prototype warning.
Dominique Martinet dominique.martinet@cea.fr 9p: p9dirent_read: check network-provided name length
Dominique Martinet dominique.martinet@cea.fr 9p/rdma: remove useless check in cm_event_handler
Dominique Martinet dominique.martinet@cea.fr 9p: acl: fix uninitialized iattr access
Dominique Martinet dominique.martinet@cea.fr 9p/rdma: do not disconnect on down_interruptible EAGAIN
Dominique Martinet dominique.martinet@cea.fr 9p/xen: fix check for xenbus_read error in front_probe
Martin Wilck mwilck@suse.com block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs
Christoph Hellwig hch@lst.de block: add a lower-level bio_add_page interface
Mike Marciniszyn mike.marciniszyn@intel.com IB/hfi1: Close PSM sdma_progress sleep window
Sasha Levin sashal@kernel.org Revert "x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP"
Arnaldo Carvalho de Melo acme@redhat.com perf header: Fix unchecked usage of strncpy()
Arnaldo Carvalho de Melo acme@redhat.com perf help: Remove needless use of strncpy()
Arnaldo Carvalho de Melo acme@redhat.com perf ui helpline: Use strlcpy() as a shorter form of strncpy() + explicit set nul
-------------
Diffstat:
 Documentation/robust-futexes.txt                  |   3 +-
 Makefile                                          |   4 +-
 arch/arm64/include/asm/futex.h                    |   4 +-
 arch/arm64/include/asm/insn.h                     |   8 ++
 arch/arm64/kernel/insn.c                          |  40 +++++++
 arch/arm64/net/bpf_jit.h                          |   4 +
 arch/arm64/net/bpf_jit_comp.c                     |  28 +++--
 arch/x86/kernel/cpu/bugs.c                        |  11 +-
 arch/x86/kernel/cpu/microcode/core.c              |  15 ++-
 block/bio.c                                       | 131 +++++++++++++++------
 drivers/infiniband/hw/hfi1/user_sdma.c            |  12 +-
 drivers/infiniband/hw/hfi1/user_sdma.h            |   1 -
 drivers/md/dm-log-writes.c                        |  23 +++-
 drivers/misc/eeprom/at24.c                        | 107 ++++++++++++-----
 drivers/net/bonding/bond_main.c                   |   2 +-
 .../net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c |   2 +-
 drivers/net/team/team.c                           |   2 +-
 drivers/net/tun.c                                 |  19 ++-
 drivers/net/usb/qmi_wwan.c                        |   4 +-
 drivers/scsi/vmw_pvscsi.c                         |   6 +-
 fs/9p/acl.c                                       |   2 +-
 fs/binfmt_flat.c                                  |  23 ++--
 fs/nfs/flexfilelayout/flexfilelayoutdev.c         |   2 +-
 fs/proc/array.c                                   |   2 +-
 include/asm-generic/futex.h                       |   8 +-
 include/linux/bio.h                               |   9 ++
 include/linux/compiler.h                          |   5 +-
 kernel/cpu.c                                      |   3 +
 kernel/trace/trace_branch.c                       |   4 -
 mm/mempolicy.c                                    |   2 +-
 mm/page_idle.c                                    |   4 +-
 net/9p/protocol.c                                 |  12 +-
 net/9p/trans_common.c                             |   1 +
 net/9p/trans_rdma.c                               |   7 +-
 net/9p/trans_xen.c                                |   4 +-
 net/core/sock.c                                   |   3 -
 net/ipv4/raw.c                                    |   2 +-
 net/ipv4/udp.c                                    |   6 +-
 net/ipv6/udp.c                                    |   4 +-
 net/packet/af_packet.c                            |  23 +++-
 net/packet/internal.h                             |   1 +
 net/sctp/endpointola.c                            |   8 +-
 net/tipc/core.c                                   |  12 +-
 net/tipc/netlink_compat.c                         |  18 ++-
 net/tipc/udp_media.c                              |   8 +-
 tools/perf/builtin-help.c                         |   2 +-
 tools/perf/ui/tui/helpline.c                      |   2 +-
 tools/perf/util/header.c                          |   2 +-
 48 files changed, 418 insertions(+), 187 deletions(-)
From: Arnaldo Carvalho de Melo acme@redhat.com
commit 4d0f16d059ddb91424480d88473f7392f24aebdc upstream.
The strncpy() function may leave the destination string buffer unterminated; better to use strlcpy(), for which we have a __weak fallback implementation on systems without it.
In this case we are actually setting the NUL byte at the right place, but since we pass the full buffer size as the limit to strncpy() rather than the size minus one, gcc ends up warning about it (see below). So let's just switch to the shorter form provided by strlcpy().
This fixes this warning on an Alpine Linux Edge system with gcc 8.2:
  ui/tui/helpline.c: In function 'tui_helpline__push':
  ui/tui/helpline.c:27:2: error: 'strncpy' specified bound 512 equals destination size [-Werror=stringop-truncation]
    strncpy(ui_helpline__current, msg, sz)[sz - 1] = '\0';
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors
Cc: Adrian Hunter adrian.hunter@intel.com
Cc: Jiri Olsa jolsa@kernel.org
Cc: Namhyung Kim namhyung@kernel.org
Fixes: e6e904687949 ("perf ui: Introduce struct ui_helpline")
Link: https://lkml.kernel.org/n/tip-d1wz0hjjsh19xbalw69qpytj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 tools/perf/ui/tui/helpline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/tools/perf/ui/tui/helpline.c
+++ b/tools/perf/ui/tui/helpline.c
@@ -24,7 +24,7 @@ static void tui_helpline__push(const cha
 	SLsmg_set_color(0);
 	SLsmg_write_nstring((char *)msg, SLtt_Screen_Cols);
 	SLsmg_refresh();
-	strncpy(ui_helpline__current, msg, sz)[sz - 1] = '\0';
+	strlcpy(ui_helpline__current, msg, sz);
 }

 static int tui_helpline__show(const char *format, va_list ap)
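The termination guarantee that motivates the switch above can be illustrated in userspace. The following is only a sketch: sketch_strlcpy() is a hypothetical stand-in written for this example, not the fallback implementation that perf carries, and it assumes the usual strlcpy() contract (copy at most size - 1 bytes, always terminate, return the source length).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for strlcpy(): copy at most size - 1 bytes from src
 * and always NUL-terminate dst when size > 0. Returns strlen(src), so a
 * return value >= size tells the caller that truncation happened.
 */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len;

	assert(dst != NULL && src != NULL);
	len = strlen(src);
	if (size > 0) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

With a 4-byte destination and the source "abcdef", this sketch leaves "abc" in the buffer and returns 6, so the caller can detect the truncation without the explicit [sz - 1] = '\0' step.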
From: Arnaldo Carvalho de Melo acme@redhat.com
commit b6313899f4ed2e76b8375cf8069556f5b94fbff0 upstream.
Since we make sure the destination buffer has at least strlen(orig) + 1 bytes, there is no need to do a strncpy(dest, orig, strlen(orig)); just use strcpy(dest, orig).
This silences this gcc 8.2 warning on Alpine Linux:
  In function 'add_man_viewer',
      inlined from 'perf_help_config' at builtin-help.c:284:3:
  builtin-help.c:192:2: error: 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
    strncpy((*p)->name, name, len);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  builtin-help.c: In function 'perf_help_config':
  builtin-help.c:187:15: note: length computed here
    size_t len = strlen(name);
                 ^~~~~~~~~~~~
Cc: Adrian Hunter adrian.hunter@intel.com
Cc: Jiri Olsa jolsa@kernel.org
Cc: Namhyung Kim namhyung@kernel.org
Fixes: 078006012401 ("perf_counter tools: add in basic glue from Git")
Link: https://lkml.kernel.org/n/tip-2f69l7drca427ob4km8i7kvo@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 tools/perf/builtin-help.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/tools/perf/builtin-help.c
+++ b/tools/perf/builtin-help.c
@@ -189,7 +189,7 @@ static void add_man_viewer(const char *n
 	while (*p)
 		p = &((*p)->next);
 	*p = zalloc(sizeof(**p) + len + 1);
-	strncpy((*p)->name, name, len);
+	strcpy((*p)->name, name);
 }

 static int supported_man_viewer(const char *name, size_t len)
From: Arnaldo Carvalho de Melo acme@redhat.com
commit 5192bde7d98c99f2cd80225649e3c2e7493722f7 upstream.
The strncpy() function may leave the destination string buffer unterminated; better to use strlcpy(), for which we have a __weak fallback implementation on systems without it.
This fixes this warning on an Alpine Linux Edge system with gcc 8.2:
  util/header.c: In function 'perf_event__synthesize_event_update_name':
  util/header.c:3625:2: error: 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
    strncpy(ev->data, evsel->name, len);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  util/header.c:3618:15: note: length computed here
    size_t len = strlen(evsel->name);
                 ^~~~~~~~~~~~~~~~~~~
Cc: Adrian Hunter adrian.hunter@intel.com
Cc: Jiri Olsa jolsa@kernel.org
Cc: Namhyung Kim namhyung@kernel.org
Fixes: a6e5281780d1 ("perf tools: Add event_update event unit type")
Link: https://lkml.kernel.org/n/tip-wycz66iy8dl2z3yifgqf894p@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 tools/perf/util/header.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -3171,7 +3171,7 @@ perf_event__synthesize_event_update_name
 	if (ev == NULL)
 		return -ENOMEM;

-	strncpy(ev->data, evsel->name, len);
+	strlcpy(ev->data, evsel->name, len + 1);
 	err = process(tool, (union perf_event*) ev, NULL, NULL);
 	free(ev);
 	return err;
This reverts commit 8190d6fbb1e9b7fa4eb41fe7aa337c46ca514e79, which was upstream commit 4a6c91fbdef846ec7250b82f2eeeb87ac5f18cf9.
On Tue, Jun 25, 2019 at 09:39:45AM +0200, Sebastian Andrzej Siewior wrote:
Please backport commit e74deb11931ff682b59d5b9d387f7115f689698e to stable _or_ revert the backport of commit 4a6c91fbdef84 ("x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP"). It uses user_access_{save|restore}() which has been introduced in the following commit.
Signed-off-by: Sasha Levin sashal@kernel.org
---
 kernel/trace/trace_branch.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/trace/trace_branch.c b/kernel/trace/trace_branch.c
index 3ea65cdff30d..4ad967453b6f 100644
--- a/kernel/trace/trace_branch.c
+++ b/kernel/trace/trace_branch.c
@@ -205,8 +205,6 @@ void trace_likely_condition(struct ftrace_likely_data *f, int val, int expect)
 void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 			  int expect, int is_constant)
 {
-	unsigned long flags = user_access_save();
-
 	/* A constant is always correct */
 	if (is_constant) {
 		f->constant++;
@@ -225,8 +223,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 		f->data.correct++;
 	else
 		f->data.incorrect++;
-
-	user_access_restore(flags);
 }
 EXPORT_SYMBOL(ftrace_likely_update);
commit da9de5f8527f4b9efc82f967d29a583318c034c7 upstream.
sdma_progress() is called outside the wait lock.
In this case, there is a race condition where sdma_progress() can return false and the sdma_engine can idle. If that happens, there will be no more sdma interrupts to cause the wakeup and the user_sdma xmit will hang.
Fix by moving the lock to enclose the sdma_progress() call.
Also, delete busycount. The need for this was removed by: commit bcad29137a97 ("IB/hfi1: Serve the most starved iowait entry first")
Ported to linux-4.14.y.
Cc: stable@vger.kernel.org
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Reviewed-by: Gary Leshner Gary.S.Leshner@intel.com
Signed-off-by: Mike Marciniszyn mike.marciniszyn@intel.com
Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com
Signed-off-by: Jason Gunthorpe jgg@mellanox.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/infiniband/hw/hfi1/user_sdma.c | 12 ++++--------
 drivers/infiniband/hw/hfi1/user_sdma.h |  1 -
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/user_sdma.c b/drivers/infiniband/hw/hfi1/user_sdma.c
index cbe5ab26d95b..75275f9e363d 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.c
+++ b/drivers/infiniband/hw/hfi1/user_sdma.c
@@ -132,25 +132,22 @@ static int defer_packet_queue(
 	struct hfi1_user_sdma_pkt_q *pq =
 		container_of(wait, struct hfi1_user_sdma_pkt_q, busy);
 	struct hfi1_ibdev *dev = &pq->dd->verbs_dev;
-	struct user_sdma_txreq *tx =
-		container_of(txreq, struct user_sdma_txreq, txreq);

-	if (sdma_progress(sde, seq, txreq)) {
-		if (tx->busycount++ < MAX_DEFER_RETRY_COUNT)
-			goto eagain;
-	}
+	write_seqlock(&dev->iowait_lock);
+	if (sdma_progress(sde, seq, txreq))
+		goto eagain;
 	/*
 	 * We are assuming that if the list is enqueued somewhere, it
 	 * is to the dmawait list since that is the only place where
 	 * it is supposed to be enqueued.
 	 */
 	xchg(&pq->state, SDMA_PKT_Q_DEFERRED);
-	write_seqlock(&dev->iowait_lock);
 	if (list_empty(&pq->busy.list))
 		iowait_queue(pkts_sent, &pq->busy, &sde->dmawait);
 	write_sequnlock(&dev->iowait_lock);
 	return -EBUSY;
 eagain:
+	write_sequnlock(&dev->iowait_lock);
 	return -EAGAIN;
 }

@@ -803,7 +800,6 @@ static int user_sdma_send_pkts(struct user_sdma_request *req, unsigned maxpkts)

 		tx->flags = 0;
 		tx->req = req;
-		tx->busycount = 0;
 		INIT_LIST_HEAD(&tx->list);

 		/*
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.h b/drivers/infiniband/hw/hfi1/user_sdma.h
index 2b5326d6db53..87b0c567f442 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.h
+++ b/drivers/infiniband/hw/hfi1/user_sdma.h
@@ -236,7 +236,6 @@ struct user_sdma_txreq {
 	struct list_head list;
 	struct user_sdma_request *req;
 	u16 flags;
-	unsigned int busycount;
 	u64 seqnum;
 };
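The locking change in this patch can be modelled in userspace. This is a rough sketch only, assuming a pthread mutex in place of the seqlock; struct sketch_engine, sketch_defer() and sketch_complete() are hypothetical names invented for the illustration and do not correspond to hfi1 code. The point it tries to show is that the progress check and the enqueue happen in one critical section, so a completion cannot slip in between them.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an engine with a wait list; not the hfi1 types. */
struct sketch_engine {
	pthread_mutex_t lock;	/* plays the role of dev->iowait_lock */
	bool has_room;		/* what sdma_progress() would report */
	int waiters;		/* entries queued for a later wakeup */
};

/*
 * Check for progress and queue for a wakeup inside one critical section,
 * so the completion path cannot run between the check and the enqueue.
 */
int sketch_defer(struct sketch_engine *e)
{
	int ret;

	assert(e != NULL);
	pthread_mutex_lock(&e->lock);
	if (e->has_room) {
		ret = -EAGAIN;	/* caller should retry right away */
	} else {
		e->waiters++;	/* caller will be woken later */
		ret = -EBUSY;
	}
	pthread_mutex_unlock(&e->lock);
	return ret;
}

/* Completion path: make room and wake everything queued, under the lock. */
int sketch_complete(struct sketch_engine *e)
{
	int woken;

	assert(e != NULL);
	pthread_mutex_lock(&e->lock);
	e->has_room = true;
	woken = e->waiters;
	e->waiters = 0;
	pthread_mutex_unlock(&e->lock);
	return woken;
}
```

If the check in sketch_defer() were done before taking the lock, sketch_complete() could run in the gap, find no waiters, and the later enqueue would never be woken, which is the hang described in the commit message.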
[ Upstream commit 0aa69fd32a5f766e997ca8ab4723c5a1146efa8b ]
For the upcoming removal of buffer heads in XFS we need to keep track of the number of outstanding writeback requests per page. For this we need to know if bio_add_page merged a region with the previous bvec or not. Instead of adding additional arguments this refactors bio_add_page to be implemented using three lower level helpers which users like XFS can use directly if they care about the merge decisions.
Signed-off-by: Christoph Hellwig hch@lst.de
Reviewed-by: Jens Axboe axboe@kernel.dk
Reviewed-by: Ming Lei ming.lei@redhat.com
Reviewed-by: Darrick J. Wong darrick.wong@oracle.com
Signed-off-by: Darrick J. Wong darrick.wong@oracle.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 block/bio.c         | 96 +++++++++++++++++++++++++++++----------------
 include/linux/bio.h |  9 +++++
 2 files changed, 72 insertions(+), 33 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index d01ab919b313..c1386ce2c014 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -773,7 +773,7 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
 		return 0;
 	}

-	if (bio->bi_vcnt >= bio->bi_max_vecs)
+	if (bio_full(bio))
 		return 0;

 	/*
@@ -821,52 +821,82 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
 EXPORT_SYMBOL(bio_add_pc_page);

 /**
- * bio_add_page - attempt to add page to bio
- * @bio: destination bio
- * @page: page to add
- * @len: vec entry length
- * @offset: vec entry offset
+ * __bio_try_merge_page - try appending data to an existing bvec.
+ * @bio: destination bio
+ * @page: page to add
+ * @len: length of the data to add
+ * @off: offset of the data in @page
  *
- * Attempt to add a page to the bio_vec maplist. This will only fail
- * if either bio->bi_vcnt == bio->bi_max_vecs or it's a cloned bio.
+ * Try to add the data at @page + @off to the last bvec of @bio. This is a
+ * a useful optimisation for file systems with a block size smaller than the
+ * page size.
+ *
+ * Return %true on success or %false on failure.
  */
-int bio_add_page(struct bio *bio, struct page *page,
-		 unsigned int len, unsigned int offset)
+bool __bio_try_merge_page(struct bio *bio, struct page *page,
+		unsigned int len, unsigned int off)
 {
-	struct bio_vec *bv;
-
-	/*
-	 * cloned bio must not modify vec list
-	 */
 	if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
-		return 0;
+		return false;

-	/*
-	 * For filesystems with a blocksize smaller than the pagesize
-	 * we will often be called with the same page as last time and
-	 * a consecutive offset. Optimize this special case.
-	 */
 	if (bio->bi_vcnt > 0) {
-		bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
+		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];

-		if (page == bv->bv_page &&
-		    offset == bv->bv_offset + bv->bv_len) {
+		if (page == bv->bv_page && off == bv->bv_offset + bv->bv_len) {
 			bv->bv_len += len;
-			goto done;
+			bio->bi_iter.bi_size += len;
+			return true;
 		}
 	}
+	return false;
+}
+EXPORT_SYMBOL_GPL(__bio_try_merge_page);

-	if (bio->bi_vcnt >= bio->bi_max_vecs)
-		return 0;
+/**
+ * __bio_add_page - add page to a bio in a new segment
+ * @bio: destination bio
+ * @page: page to add
+ * @len: length of the data to add
+ * @off: offset of the data in @page
+ *
+ * Add the data at @page + @off to @bio as a new bvec. The caller must ensure
+ * that @bio has space for another bvec.
+ */
+void __bio_add_page(struct bio *bio, struct page *page,
+		unsigned int len, unsigned int off)
+{
+	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt];

-	bv = &bio->bi_io_vec[bio->bi_vcnt];
-	bv->bv_page = page;
-	bv->bv_len = len;
-	bv->bv_offset = offset;
+	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
+	WARN_ON_ONCE(bio_full(bio));
+
+	bv->bv_page = page;
+	bv->bv_offset = off;
+	bv->bv_len = len;

-	bio->bi_vcnt++;
-done:
 	bio->bi_iter.bi_size += len;
+	bio->bi_vcnt++;
+}
+EXPORT_SYMBOL_GPL(__bio_add_page);
+
+/**
+ * bio_add_page - attempt to add page to bio
+ * @bio: destination bio
+ * @page: page to add
+ * @len: vec entry length
+ * @offset: vec entry offset
+ *
+ * Attempt to add a page to the bio_vec maplist. This will only fail
+ * if either bio->bi_vcnt == bio->bi_max_vecs or it's a cloned bio.
+ */
+int bio_add_page(struct bio *bio, struct page *page,
+		 unsigned int len, unsigned int offset)
+{
+	if (!__bio_try_merge_page(bio, page, len, offset)) {
+		if (bio_full(bio))
+			return 0;
+		__bio_add_page(bio, page, len, offset);
+	}
 	return len;
 }
 EXPORT_SYMBOL(bio_add_page);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d4b39caf081d..e260f000b9ac 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -123,6 +123,11 @@ static inline void *bio_data(struct bio *bio)
 	return NULL;
 }

+static inline bool bio_full(struct bio *bio)
+{
+	return bio->bi_vcnt >= bio->bi_max_vecs;
+}
+
 /*
  * will die
  */
@@ -459,6 +464,10 @@ void bio_chain(struct bio *, struct bio *);
 extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
 			   unsigned int, unsigned int);
+bool __bio_try_merge_page(struct bio *bio, struct page *page,
+		unsigned int len, unsigned int off);
+void __bio_add_page(struct bio *bio, struct page *page,
+		unsigned int len, unsigned int off);
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
 struct rq_map_data;
 extern struct bio *bio_map_user_iov(struct request_queue *,
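The split into "is it full", "try to merge" and "add as a new vector" can be tried out in a small userspace model. This is a sketch under simplifying assumptions: struct sketch_bio, struct sketch_vec and the sketch_*() helpers are hypothetical names for this illustration, the vector array has a fixed size of four, and the cloned-bio check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_VECS 4u

/* Hypothetical userspace model; these are not the kernel's bio types. */
struct sketch_vec {
	const void *page;
	unsigned int off;
	unsigned int len;
};

struct sketch_bio {
	struct sketch_vec vecs[SKETCH_MAX_VECS];
	unsigned int vcnt;	/* vectors in use */
	unsigned int size;	/* total bytes queued */
};

bool sketch_full(const struct sketch_bio *bio)
{
	return bio->vcnt >= SKETCH_MAX_VECS;
}

/* Extend the last vector if the new data directly follows it. */
bool sketch_try_merge(struct sketch_bio *bio, const void *page,
		      unsigned int len, unsigned int off)
{
	if (bio->vcnt > 0) {
		struct sketch_vec *v = &bio->vecs[bio->vcnt - 1];

		if (page == v->page && off == v->off + v->len) {
			v->len += len;
			bio->size += len;
			return true;
		}
	}
	return false;
}

/* Append a new vector; the caller must have checked sketch_full(). */
void sketch_add_new(struct sketch_bio *bio, const void *page,
		    unsigned int len, unsigned int off)
{
	struct sketch_vec *v;

	assert(!sketch_full(bio));
	v = &bio->vecs[bio->vcnt];
	v->page = page;
	v->off = off;
	v->len = len;
	bio->size += len;
	bio->vcnt++;
}

/* Combined entry point: merge if possible, else add, else report 0. */
unsigned int sketch_add(struct sketch_bio *bio, const void *page,
			unsigned int len, unsigned int off)
{
	if (!sketch_try_merge(bio, page, len, off)) {
		if (sketch_full(bio))
			return 0;
		sketch_add_new(bio, page, len, off);
	}
	return len;
}
```

A caller that needs to know whether a merge happened, as the commit message describes for XFS, would call sketch_try_merge() and sketch_add_new() directly instead of going through sketch_add().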
[ Upstream commit 17d51b10d7773e4618bcac64648f30f12d4078fb ]
bio_iov_iter_get_pages() currently only adds pages for the next non-zero segment from the iov_iter to the bio. That's suboptimal for callers, which typically try to pin as many pages as fit into the bio. This patch converts the current bio_iov_iter_get_pages() into a static helper, and introduces a new helper that allocates as many pages as
 1) fit into the bio,
 2) are present in the iov_iter,
 3) and can be pinned by MM.
Error is returned only if zero pages could be pinned. Because of 3), a zero return value doesn't necessarily mean all pages have been pinned. Callers that have to pin every page in the iov_iter must still call this function in a loop (this is currently the case).
This change matters most for __blkdev_direct_IO_simple(), which calls bio_iov_iter_get_pages() only once. If it obtains fewer pages than requested, it returns a "short write" or "short read", and __generic_file_write_iter() falls back to buffered writes, which may lead to data corruption.
Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io")
Reviewed-by: Christoph Hellwig hch@lst.de
Signed-off-by: Martin Wilck mwilck@suse.com
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Sasha Levin sashal@kernel.org
---
 block/bio.c | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index c1386ce2c014..1384f9790882 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -902,14 +902,16 @@ int bio_add_page(struct bio *bio, struct page *page,
 EXPORT_SYMBOL(bio_add_page);

 /**
- * bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio
+ * __bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio
  * @bio: bio to add pages to
  * @iter: iov iterator describing the region to be mapped
  *
- * Pins as many pages from *iter and appends them to @bio's bvec array. The
+ * Pins pages from *iter and appends them to @bio's bvec array. The
  * pages will have to be released using put_page() when done.
+ * For multi-segment *iter, this function only adds pages from the
+ * the next non-empty segment of the iov iterator.
  */
-int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
+static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
 	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt, idx;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
@@ -946,6 +948,33 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	iov_iter_advance(iter, size);
 	return 0;
 }
+
+/**
+ * bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio
+ * @bio: bio to add pages to
+ * @iter: iov iterator describing the region to be mapped
+ *
+ * Pins pages from *iter and appends them to @bio's bvec array. The
+ * pages will have to be released using put_page() when done.
+ * The function tries, but does not guarantee, to pin as many pages as
+ * fit into the bio, or are requested in *iter, whatever is smaller.
+ * If MM encounters an error pinning the requested pages, it stops.
+ * Error is returned only if 0 pages could be pinned.
+ */
+int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
+{
+	unsigned short orig_vcnt = bio->bi_vcnt;
+
+	do {
+		int ret = __bio_iov_iter_get_pages(bio, iter);
+
+		if (unlikely(ret))
+			return bio->bi_vcnt > orig_vcnt ? 0 : ret;
+
+	} while (iov_iter_count(iter) && !bio_full(bio));
+
+	return 0;
+}
 EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);

 struct submit_bio_ret {
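The "error only if zero pages could be pinned" rule of the new wrapper can be exercised with a toy model. This is a sketch with invented types: struct seg_iter describes each segment by a page count (a negative value stands for a pinning error), struct seg_bio only counts vectors, and seg_get_segment() is a hypothetical stand-in for the one-segment helper. As in the kernel loop, the caller is assumed to pass a non-empty iterator and a bio that is not yet full.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: each segment either pins some pages or fails. */
struct seg_iter {
	int *seg_pages;		/* pages left per segment; negative = pin error */
	size_t nsegs;
	size_t pos;		/* index of the next unconsumed segment */
};

struct seg_bio {
	unsigned int vcnt;
	unsigned int max_vecs;
};

/* One-segment helper: adds pages from the next segment only. */
int seg_get_segment(struct seg_bio *bio, struct seg_iter *it)
{
	unsigned int room = bio->max_vecs - bio->vcnt;
	int want;

	assert(it->pos < it->nsegs && room > 0);
	want = it->seg_pages[it->pos];
	if (want < 0)
		return want;	/* pinning failed, nothing added */
	if ((unsigned int)want > room) {
		it->seg_pages[it->pos] -= (int)room;	/* partial segment */
		bio->vcnt += room;
	} else {
		bio->vcnt += (unsigned int)want;
		it->pos++;	/* segment fully consumed */
	}
	return 0;
}

/* Wrapper: keep going across segments; fail only if nothing was added. */
int seg_get_pages(struct seg_bio *bio, struct seg_iter *it)
{
	unsigned int orig_vcnt = bio->vcnt;

	do {
		int ret = seg_get_segment(bio, it);

		if (ret)
			return bio->vcnt > orig_vcnt ? 0 : ret;
	} while (it->pos < it->nsegs && bio->vcnt < bio->max_vecs);

	return 0;
}
```

Because a zero return does not mean every segment was consumed, a caller that needs all pages still has to loop, checking the iterator after each call.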
[ Upstream commit 2f9ad0ac947ccbe3ffe7c6229c9330f2a7755f64 ]
If the xen bus exists but does not expose the proper interface, it is possible to get a non-zero length but still some error, leading to strcmp faulting while trying to load from an invalid memory address, e.g. fffffffffffffffe.
There is then no need to check length when there is no error, as the xenbus driver guarantees that the string is nul-terminated.
Link: http://lkml.kernel.org/r/1534236007-10170-1-git-send-email-asmadeus@codewrec...
Signed-off-by: Dominique Martinet dominique.martinet@cea.fr
Reviewed-by: Stefano Stabellini sstabellini@kernel.org
Cc: Eric Van Hensbergen ericvh@gmail.com
Cc: Latchesar Ionkov lucho@ionkov.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/9p/trans_xen.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index c10bdf63eae7..389eb635ec2c 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -392,8 +392,8 @@ static int xen_9pfs_front_probe(struct xenbus_device *dev,
 	unsigned int max_rings, max_ring_order, len = 0;

 	versions = xenbus_read(XBT_NIL, dev->otherend, "versions", &len);
-	if (!len)
-		return -EINVAL;
+	if (IS_ERR(versions))
+		return PTR_ERR(versions);
 	if (strcmp(versions, "1")) {
 		kfree(versions);
 		return -EINVAL;
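The address fffffffffffffffe mentioned above is what an errno of -2 looks like when it is returned as a pointer on a 64-bit system. A userspace imitation of that convention is sketched below; sketch_err_ptr(), sketch_is_err() and sketch_ptr_err() are stand-ins written for this example rather than the kernel macros, and sketch_read() is a hypothetical reader that can fail while still reporting a non-zero length.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace imitation of the error-pointer convention: the top 4095
 * addresses are treated as encoded negative errno values.
 */
#define SKETCH_MAX_ERRNO 4095

void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical reader: it may fail and still leave *len non-zero. */
char *sketch_read(bool fail, unsigned int *len)
{
	char *s;

	assert(len != NULL);
	*len = 1;
	if (fail)
		return sketch_err_ptr(-ENOENT);
	s = malloc(2);
	if (!s)
		return sketch_err_ptr(-ENOMEM);
	strcpy(s, "1");
	return s;
}

/* Probe in the style of the fixed code: test the pointer, not the length. */
int sketch_probe(bool fail)
{
	unsigned int len = 0;
	char *versions = sketch_read(fail, &len);
	int ret;

	if (sketch_is_err(versions))
		return (int)sketch_ptr_err(versions);
	ret = strcmp(versions, "1") ? -EINVAL : 0;
	free(versions);
	return ret;
}
```

Checking len instead of the pointer would let the failing case through to strcmp(), which would then dereference the encoded error value.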
[ Upstream commit 8b894adb2b7e1d1e64b8954569c761eaf3d51ab5 ]
9p/rdma would sometimes drop the connection and display errors in recv_done when the user does ^C. The errors were caused by recv buffers that were posted at the time of disconnect, and we just do not want to disconnect when down_interruptible is... interrupted.
Link: http://lkml.kernel.org/r/1535625307-18019-1-git-send-email-asmadeus@codewrec...
Signed-off-by: Dominique Martinet dominique.martinet@cea.fr
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/9p/trans_rdma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index f58467a49090..b7648b12bb1a 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -476,7 +476,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)

 	err = post_recv(client, rpl_context);
 	if (err) {
-		p9_debug(P9_DEBUG_FCALL, "POST RECV failed\n");
+		p9_debug(P9_DEBUG_ERROR, "POST RECV failed: %d\n", err);
 		goto recv_error;
 	}
 	/* remove posted receive buffer from request structure */
@@ -545,7 +545,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
 recv_error:
 	kfree(rpl_context);
 	spin_lock_irqsave(&rdma->req_lock, flags);
-	if (rdma->state < P9_RDMA_CLOSING) {
+	if (err != -EINTR && rdma->state < P9_RDMA_CLOSING) {
 		rdma->state = P9_RDMA_CLOSING;
 		spin_unlock_irqrestore(&rdma->req_lock, flags);
 		rdma_disconnect(rdma->cm_id);
[ Upstream commit e02a53d92e197706cad1627bd84705d4aa20a145 ]
iattr is passed to v9fs_vfs_setattr_dotl, which sends various values from iattr over the wire. Even if it tells the server to only look at the iattr.ia_valid fields, this could leak some stack data.
Link: http://lkml.kernel.org/r/1536339057-21974-2-git-send-email-asmadeus@codewrec...
Addresses-Coverity-ID: 1195601 ("Uninitalized scalar variable")
Signed-off-by: Dominique Martinet dominique.martinet@cea.fr
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/9p/acl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/9p/acl.c b/fs/9p/acl.c
index 082d227fa56b..6261719f6f2a 100644
--- a/fs/9p/acl.c
+++ b/fs/9p/acl.c
@@ -276,7 +276,7 @@ static int v9fs_xattr_set_acl(const struct xattr_handler *handler,
 	switch (handler->flags) {
 	case ACL_TYPE_ACCESS:
 		if (acl) {
-			struct iattr iattr;
+			struct iattr iattr = { 0 };
 			struct posix_acl *old_acl = acl;

 			retval = posix_acl_update_mode(inode, &iattr.ia_mode, &acl);
[ Upstream commit 473c7dd1d7b59ff8f88a5154737e3eac78a96e5b ]
The client c is always dereferenced to get the rdma struct, so c has to be a valid pointer at this point. gcc would optimize the check away, but let's make Coverity happy...
Link: http://lkml.kernel.org/r/1536339057-21974-3-git-send-email-asmadeus@codewrec...
Addresses-Coverity-ID: 102778 ("Dereference before null check")
Signed-off-by: Dominique Martinet dominique.martinet@cea.fr
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/9p/trans_rdma.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index b7648b12bb1a..16a4a31f16e0 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -276,8 +276,7 @@ p9_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
 	case RDMA_CM_EVENT_DISCONNECTED:
 		if (rdma)
 			rdma->state = P9_RDMA_CLOSED;
-		if (c)
-			c->status = Disconnected;
+		c->status = Disconnected;
 		break;

 	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
[ Upstream commit ef5305f1f72eb1cfcda25c382bb0368509c0385b ]
strcpy to dirent->d_name could overflow the buffer, use strscpy to check the provided string length and error out if the size was too big.
While we are here, make the function return an error when the pdu parsing failed, instead of returning the pdu offset as if it had been a success...
Link: http://lkml.kernel.org/r/1536339057-21974-4-git-send-email-asmadeus@codewrec...
Addresses-Coverity-ID: 139133 ("Copy into fixed size buffer")
Signed-off-by: Dominique Martinet dominique.martinet@cea.fr
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/9p/protocol.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/9p/protocol.c b/net/9p/protocol.c
index 766d1ef4640a..1885403c9a3e 100644
--- a/net/9p/protocol.c
+++ b/net/9p/protocol.c
@@ -622,13 +622,19 @@ int p9dirent_read(struct p9_client *clnt, char *buf, int len,
 	if (ret) {
 		p9_debug(P9_DEBUG_9P, "<<< p9dirent_read failed: %d\n", ret);
 		trace_9p_protocol_dump(clnt, &fake_pdu);
-		goto out;
+		return ret;
 	}

-	strcpy(dirent->d_name, nameptr);
+	ret = strscpy(dirent->d_name, nameptr, sizeof(dirent->d_name));
+	if (ret < 0) {
+		p9_debug(P9_DEBUG_ERROR,
+			 "On the wire dirent name too long: %s\n",
+			 nameptr);
+		kfree(nameptr);
+		return ret;
+	}
 	kfree(nameptr);

-out:
 	return fake_pdu.offset;
 }
 EXPORT_SYMBOL(p9dirent_read);
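The way a bounded copy turns an over-long network-provided name into an error can be shown with a small userspace sketch. sketch_strscpy() below is a hypothetical stand-in modelled on the strscpy() behaviour this patch relies on (a negative return when the source does not fit); struct sketch_dirent and its 8-byte name field are invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in modelled on strscpy(): always NUL-terminates when
 * count > 0, and returns the number of bytes copied (excluding the NUL) or
 * -E2BIG if src did not fit.
 */
long sketch_strscpy(char *dst, const char *src, size_t count)
{
	size_t len;

	assert(dst != NULL && src != NULL);
	if (count == 0)
		return -E2BIG;
	len = strlen(src);
	if (len >= count) {
		memcpy(dst, src, count - 1);
		dst[count - 1] = '\0';
		return -E2BIG;
	}
	memcpy(dst, src, len + 1);
	return (long)len;
}

/* Invented fixed-size directory entry, much smaller than a real one. */
struct sketch_dirent {
	char d_name[8];
};

/* Copy a name received from the wire, rejecting names that do not fit. */
int sketch_fill_name(struct sketch_dirent *d, const char *wire_name)
{
	long ret = sketch_strscpy(d->d_name, wire_name, sizeof(d->d_name));

	return ret < 0 ? (int)ret : 0;
}
```

Unlike strcpy(), the copy never writes past d_name, and the caller gets an error it can propagate instead of a silently truncated or overflowing name.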
[ Upstream commit 52ad259eaac0454c1ac7123e7148cf8d6e6f5301 ]
This silences -Wmissing-prototypes when defining p9_release_pages.
Link: http://lkml.kernel.org/r/b1c4df8f21689b10d451c28fe38e860722d20e71.1542089696...
Signed-off-by: Adeodato Simó dato@net.com.org.es
Signed-off-by: Dominique Martinet dominique.martinet@cea.fr
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/9p/trans_common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/9p/trans_common.c b/net/9p/trans_common.c
index 38aa6345bdfa..9c0c894b56f8 100644
--- a/net/9p/trans_common.c
+++ b/net/9p/trans_common.c
@@ -14,6 +14,7 @@

 #include <linux/mm.h>
 #include <linux/module.h>
+#include "trans_common.h"

 /**
  * p9_release_req_pages - Release pages after the transaction.
commit 904d88d743b0c94092c5117955eab695df8109e8 upstream.
The syzbot reported
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xca/0x13e lib/dump_stack.c:113
 print_address_description+0x67/0x231 mm/kasan/report.c:188
 __kasan_report.cold+0x1a/0x32 mm/kasan/report.c:317
 kasan_report+0xe/0x20 mm/kasan/common.c:614
 qmi_wwan_probe+0x342/0x360 drivers/net/usb/qmi_wwan.c:1417
 usb_probe_interface+0x305/0x7a0 drivers/usb/core/driver.c:361
 really_probe+0x281/0x660 drivers/base/dd.c:509
 driver_probe_device+0x104/0x210 drivers/base/dd.c:670
 __device_attach_driver+0x1c2/0x220 drivers/base/dd.c:777
 bus_for_each_drv+0x15c/0x1e0 drivers/base/bus.c:454
Caused by too many confusing indirections and casts. id->driver_info is a pointer stored in a long. We want the pointer here, not the address of it.
Thanks-to: Hillf Danton hdanton@sina.com
Reported-by: syzbot+b68605d7fadd21510de1@syzkaller.appspotmail.com
Cc: Kristian Evensen kristian.evensen@gmail.com
Fixes: e4bf63482c30 ("qmi_wwan: Add quirk for Quectel dynamic config")
Signed-off-by: Bjørn Mork bjorn@mork.no
[Upstream commit did not apply because I shuffled two lines in the backport. The fixes tag for 4.14 is 3a6a5107ceb3.]
Signed-off-by: Kristian Evensen kristian.evensen@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/qmi_wwan.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index c2d6c501dd85..063daa3435e4 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1395,14 +1395,14 @@ static int qmi_wwan_probe(struct usb_interface *intf, return -ENODEV; }
- info = (void *)&id->driver_info; - /* Several Quectel modems supports dynamic interface configuration, so * we need to match on class/subclass/protocol. These values are * identical for the diagnostic- and QMI-interface, but bNumEndpoints is * different. Ignore the current interface if the number of endpoints * equals the number for the diag interface (two). */ + info = (void *)id->driver_info; + if (info->data & QMI_WWAN_QUIRK_QUECTEL_DYNCFG) { if (desc->bNumEndpoints == 2) return -ENODEV;
This reverts commit 82017e26e51596ee577171a33f357377ec6513b5, which is upstream commit fe0640eb30b7da261ae84d252ed9ed3c7e68dfd8.
On Fri, Jun 28, 2019 at 8:53 AM Tony Battersby tonyb@cybernetics.com wrote:
Old versions of gcc cannot compile 4.14 since 4.14.113:
./include/asm-generic/fixmap.h:37: error: implicit declaration of function ‘__builtin_unreachable’
The stable commit that caused the problem is 82017e26e515 ("compiler.h: update definition of unreachable()") (upstream commit fe0640eb30b7). Reverting the commit fixes the problem.
Kernel 4.17 dropped support for older versions of gcc in upstream commit cafa0010cd51 ("Raise the minimum required gcc version to 4.6"). This was not backported to 4.14 since that would go against the stable kernel rules.
Upstream commit 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive") was a fix for cafa0010cd51. This was not backported to 4.14.
Upstream commit fe0640eb30b7 ("compiler.h: update definition of unreachable()") was a fix for 815f0ddb346c. This is the commit that was backported to 4.14. But it only fixed a problem introduced in the other commits, and without those commits, it ends up introducing a problem instead of fixing one. So I recommend reverting that patch in 4.14, which will enable old gcc to compile 4.14 again. If I understand correctly, I believe that clang will still be able to compile 4.14 with the patch reverted, although I haven't tried to compile with clang.
The problematic commit is not present in 4.9.x, 4.4.x, 3.18.x, or 3.16.x.
CC: Nick Desaulniers ndesaulniers@google.com CC: Tony Battersby tonyb@cybernetics.com, Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/compiler.h | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 67c3934fb9ed..a704d032713b 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -119,10 +119,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val, # define ASM_UNREACHABLE #endif #ifndef unreachable -# define unreachable() do { \ - annotate_unreachable(); \ - __builtin_unreachable(); \ -} while (0) +# define unreachable() do { annotate_reachable(); do { } while (1); } while (0) #endif
/*
From: John Ogness john.ogness@linutronix.de
commit cb8f381f1613cafe3aec30809991cd56e7135d92 upstream.
0a1eb2d474ed ("fs/proc: Stop reporting eip and esp in /proc/PID/stat") stopped reporting eip/esp and fd7d56270b52 ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping") reintroduced the feature to fix a regression with userspace core dump handlers (such as minicoredumper).
Because PF_DUMPCORE is only set for the primary thread, this didn't fix the original problem for secondary threads. Allow reporting the eip/esp for all threads by checking for PF_EXITING as well. This is set for all the other threads when they are killed. coredump_wait() waits for all the tasks to become inactive before proceeding to invoke a core dumper.
Link: http://lkml.kernel.org/r/87y32p7i7a.fsf@linutronix.de Link: http://lkml.kernel.org/r/20190522161614.628-1-jlu@pengutronix.de Fixes: fd7d56270b526ca3 ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping") Signed-off-by: John Ogness john.ogness@linutronix.de Reported-by: Jan Luebbe jlu@pengutronix.de Tested-by: Jan Luebbe jlu@pengutronix.de Cc: Alexey Dobriyan adobriyan@gmail.com Cc: Andy Lutomirski luto@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/proc/array.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -448,7 +448,7 @@ static int do_task_stat(struct seq_file * a program is not able to use ptrace(2) in that case. It is * safe because the task has stopped executing permanently. */ - if (permitted && (task->flags & PF_DUMPCORE)) { + if (permitted && (task->flags & (PF_EXITING|PF_DUMPCORE))) { if (try_get_task_stack(task)) { eip = KSTK_EIP(task); esp = KSTK_ESP(task);
From: zhong jiang zhongjiang@huawei.com
commit 29b190fa774dd1b72a1a6f19687d55dc72ea83be upstream.
mpol_rebind_nodemask() is called for MPOL_BIND and MPOL_INTERLEAVE mempolicies when the task's cpuset's mems_allowed changes. For policies created without MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES, it works by remapping the policy's allowed nodes (stored in v.nodes) using the previous value of mems_allowed (stored in w.cpuset_mems_allowed) as the domain of the map and the new mems_allowed (passed as nodes) as the range of the map (see the comment of bitmap_remap() for details).
The result of remapping is stored back as policy's nodemask in v.nodes, and the new value of mems_allowed should be stored in w.cpuset_mems_allowed to facilitate the next rebind, if it happens.
However, 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets") introduced a bug where the result of remapping is stored in w.cpuset_mems_allowed instead. Thus, a mempolicy's allowed nodes can evolve in an unexpected way after a series of rebinds caused by cpuset mems_allowed changes, possibly binding to a wrong node or to a smaller number of nodes, which may e.g. overload them. This patch fixes the bug so that rebinding again works as intended.
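The drift can be reproduced with a small user-space model. The sketch below is a simplification under stated assumptions: node masks are single 64-bit words, demo_remap() only approximates the ordinal mapping described for bitmap_remap(), and struct demo_policy and demo_rebind() are hypothetical names, not kernel code.

```c
#include <stdint.h>

/* Ordinal of 'bit' among the set bits of 'mask', or -1 if the bit is clear. */
static int ord_of_bit(uint64_t mask, int bit)
{
	int ord = 0;

	if (!((mask >> bit) & 1))
		return -1;
	for (int i = 0; i < bit; i++)
		ord += (int)((mask >> i) & 1);
	return ord;
}

/* Position of the ord-th set bit of 'mask' (ord counts from 0), or -1. */
static int bit_of_ord(uint64_t mask, int ord)
{
	for (int i = 0; i < 64; i++) {
		if ((mask >> i) & 1) {
			if (ord == 0)
				return i;
			ord--;
		}
	}
	return -1;
}

static int weight(uint64_t mask)
{
	int w = 0;

	for (int i = 0; i < 64; i++)
		w += (int)((mask >> i) & 1);
	return w;
}

/* Simplified remap: the n-th set bit of 'old' maps to the (n % w)-th set bit of 'new_mask'. */
static uint64_t demo_remap(uint64_t src, uint64_t old, uint64_t new_mask)
{
	uint64_t dst = 0;
	int w = weight(new_mask);

	for (int bit = 0; bit < 64; bit++) {
		int n;

		if (!((src >> bit) & 1))
			continue;
		n = ord_of_bit(old, bit);
		if (n < 0 || w == 0)
			dst |= UINT64_C(1) << bit;	/* unmapped bits stay in place */
		else
			dst |= UINT64_C(1) << bit_of_ord(new_mask, n % w);
	}
	return dst;
}

struct demo_policy {
	uint64_t nodes;			/* models v.nodes */
	uint64_t cpuset_mems_allowed;	/* models w.cpuset_mems_allowed */
};

/* buggy != 0 stores the remap result as the next domain, as the broken code did. */
static void demo_rebind(struct demo_policy *pol, uint64_t new_allowed, int buggy)
{
	uint64_t tmp = demo_remap(pol->nodes, pol->cpuset_mems_allowed, new_allowed);

	pol->cpuset_mems_allowed = buggy ? tmp : new_allowed;
	if (tmp == 0)
		tmp = new_allowed;
	pol->nodes = tmp;
}
```

In this model, a policy on node 1 within allowed nodes {0,1} that is moved to {2,3} and back again returns to node 1 with the fixed assignment, but ends up on node 0 with the buggy one.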
[vbabka@suse.cz: new changelog] Link: http://lkml.kernel.org/r/ef6a69c6-c052-b067-8f2c-9d615c619bb9@suse.cz Link: http://lkml.kernel.org/r/1558768043-23184-1-git-send-email-zhongjiang@huawei... Fixes: 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets") Signed-off-by: zhong jiang zhongjiang@huawei.com Reviewed-by: Vlastimil Babka vbabka@suse.cz Cc: Oscar Salvador osalvador@suse.de Cc: Anshuman Khandual khandual@linux.vnet.ibm.com Cc: Michal Hocko mhocko@suse.com Cc: Mel Gorman mgorman@techsingularity.net Cc: Andrea Arcangeli aarcange@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/mempolicy.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -305,7 +305,7 @@ static void mpol_rebind_nodemask(struct else { nodes_remap(tmp, pol->v.nodes,pol->w.cpuset_mems_allowed, *nodes); - pol->w.cpuset_mems_allowed = tmp; + pol->w.cpuset_mems_allowed = *nodes; }
if (nodes_empty(tmp))
From: Jann Horn jannh@google.com
commit 867bfa4a5fcee66f2b25639acae718e8b28b25a5 upstream.
load_flat_shared_library() is broken: It only calls load_flat_file() if prepare_binprm() returns zero, but prepare_binprm() returns the number of bytes read - so this only happens if the file is empty.
Instead, call into load_flat_file() if the number of bytes read is non-negative. (Even if the number of bytes is zero - in that case, load_flat_file() will see null bytes and return a nice -ENOEXEC.)
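The effect of the changed condition can be sketched as follows. This is a minimal model, not the kernel code: demo_load() is a hypothetical stand-in for load_flat_file() that merely rejects an empty read, and the res argument plays the role of the byte count (or negative errno) returned by the read step.

```c
#include <errno.h>

/* Hypothetical stand-in for load_flat_file(): an empty buffer is not a valid binary. */
static int demo_load(long bytes_read)
{
	return bytes_read == 0 ? -ENOEXEC : 0;
}

/* Old condition: the loader only runs when zero bytes were read. */
static int demo_load_library_old(long res)
{
	if (!res)
		res = demo_load(res);
	return (int)res;
}

/* New condition: the loader runs for any successful (non-negative) read. */
static int demo_load_library_new(long res)
{
	if (res >= 0)
		res = demo_load(res);
	return (int)res;
}
```

In this model the old variant hands a positive byte count straight back to the caller without ever loading anything, while the new variant loads on success and still propagates a negative error unchanged.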
In addition, remove the code related to bprm creds and stop using prepare_binprm() - this code is loading a library, not a main executable, and it only actually uses the members "buf", "file" and "filename" of the linux_binprm struct. Instead, call kernel_read() directly.
Link: http://lkml.kernel.org/r/20190524201817.16509-1-jannh@google.com Fixes: 287980e49ffc ("remove lots of IS_ERR_VALUE abuses") Signed-off-by: Jann Horn jannh@google.com Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: Kees Cook keescook@chromium.org Cc: Nicolas Pitre nicolas.pitre@linaro.org Cc: Arnd Bergmann arnd@arndb.de Cc: Geert Uytterhoeven geert@linux-m68k.org Cc: Russell King linux@armlinux.org.uk Cc: Greg Ungerer gerg@linux-m68k.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/binfmt_flat.c | 23 +++++++---------------- 1 file changed, 7 insertions(+), 16 deletions(-)
--- a/fs/binfmt_flat.c +++ b/fs/binfmt_flat.c @@ -856,9 +856,14 @@ err:
static int load_flat_shared_library(int id, struct lib_info *libs) { + /* + * This is a fake bprm struct; only the members "buf", "file" and + * "filename" are actually used. + */ struct linux_binprm bprm; int res; char buf[16]; + loff_t pos = 0;
memset(&bprm, 0, sizeof(bprm));
@@ -872,25 +877,11 @@ static int load_flat_shared_library(int if (IS_ERR(bprm.file)) return res;
- bprm.cred = prepare_exec_creds(); - res = -ENOMEM; - if (!bprm.cred) - goto out; - - /* We don't really care about recalculating credentials at this point - * as we're past the point of no return and are dealing with shared - * libraries. - */ - bprm.called_set_creds = 1; - - res = prepare_binprm(&bprm); + res = kernel_read(bprm.file, bprm.buf, BINPRM_BUF_SIZE, &pos);
- if (!res) + if (res >= 0) res = load_flat_file(&bprm, libs, id, NULL);
- abort_creds(bprm.cred); - -out: allow_write_access(bprm.file); fput(bprm.file);
From: Colin Ian King colin.king@canonical.com
commit 7298e3b0a149c91323b3205d325e942c3b3b9ef6 upstream.
Currently the calculation of end_pfn can round up the pfn number to more than the actual maximum number of pfns, causing an Oops. Fix this by ensuring end_pfn is never more than max_pfn.
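The rounding problem is plain arithmetic and can be checked in user space. The sketch below assumes a chunk size of 64 bits and uses a simplified round-up macro; DEMO_ALIGN, DEMO_CHUNK_BITS and both helper functions are illustrative names, not the kernel's.

```c
/* Simplified round-up to a power-of-two boundary (illustrative, not the kernel macro). */
#define DEMO_ALIGN(x, a)	(((x) + ((a) - 1)) & ~((a) - 1))
#define DEMO_CHUNK_BITS		64UL	/* assumed: one 64-bit bitmap word per chunk */

/* Old clamp: rounds max_pfn up, so the result can exceed max_pfn. */
static unsigned long demo_end_pfn_old(unsigned long pfn, unsigned long nbits,
				      unsigned long max_pfn)
{
	unsigned long end_pfn = pfn + nbits;

	if (end_pfn > max_pfn)
		end_pfn = DEMO_ALIGN(max_pfn, DEMO_CHUNK_BITS);
	return end_pfn;
}

/* New clamp: never walks past max_pfn. */
static unsigned long demo_end_pfn_new(unsigned long pfn, unsigned long nbits,
				      unsigned long max_pfn)
{
	unsigned long end_pfn = pfn + nbits;

	if (end_pfn > max_pfn)
		end_pfn = max_pfn;
	return end_pfn;
}
```

For example, with max_pfn = 1000, a request covering pfns 960 to 1023 is clamped to 1024 by the old code, which leaves 24 pfns beyond the valid range, and to 1000 by the new code. When max_pfn is already a multiple of 64 both variants agree, which would explain why only some systems hit the Oops.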
This can be easily triggered on systems where the end_pfn gets rounded up to more than max_pfn using the idle-page stress-ng stress test:
sudo stress-ng --idle-page 0
BUG: unable to handle kernel paging request at 00000000000020d8 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 11039 Comm: stress-ng-idle- Not tainted 5.0.0-5-generic #6-Ubuntu Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:page_idle_get_page+0xc8/0x1a0 Code: 0f b1 0a 75 7d 48 8b 03 48 89 c2 48 c1 e8 33 83 e0 07 48 c1 ea 36 48 8d 0c 40 4c 8d 24 88 49 c1 e4 07 4c 03 24 d5 00 89 c3 be <49> 8b 44 24 58 48 8d b8 80 a1 02 00 e8 07 d5 77 00 48 8b 53 08 48 RSP: 0018:ffffafd7c672fde8 EFLAGS: 00010202 RAX: 0000000000000005 RBX: ffffe36341fff700 RCX: 000000000000000f RDX: 0000000000000284 RSI: 0000000000000275 RDI: 0000000001fff700 RBP: ffffafd7c672fe00 R08: ffffa0bc34056410 R09: 0000000000000276 R10: ffffa0bc754e9b40 R11: ffffa0bc330f6400 R12: 0000000000002080 R13: ffffe36341fff700 R14: 0000000000080000 R15: ffffa0bc330f6400 FS: 00007f0ec1ea5740(0000) GS:ffffa0bc7db00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000020d8 CR3: 0000000077d68000 CR4: 00000000000006e0 Call Trace: page_idle_bitmap_write+0x8c/0x140 sysfs_kf_bin_write+0x5c/0x70 kernfs_fop_write+0x12e/0x1b0 __vfs_write+0x1b/0x40 vfs_write+0xab/0x1b0 ksys_write+0x55/0xc0 __x64_sys_write+0x1a/0x20 do_syscall_64+0x5a/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Link: http://lkml.kernel.org/r/20190618124352.28307-1-colin.king@canonical.com Fixes: 33c3fc71c8cf ("mm: introduce idle page tracking") Signed-off-by: Colin Ian King colin.king@canonical.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Acked-by: Vladimir Davydov vdavydov.dev@gmail.com Cc: Michal Hocko mhocko@suse.com Cc: Mike Rapoport rppt@linux.vnet.ibm.com Cc: Mel Gorman mgorman@techsingularity.net Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/page_idle.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/page_idle.c +++ b/mm/page_idle.c @@ -136,7 +136,7 @@ static ssize_t page_idle_bitmap_read(str
end_pfn = pfn + count * BITS_PER_BYTE; if (end_pfn > max_pfn) - end_pfn = ALIGN(max_pfn, BITMAP_CHUNK_BITS); + end_pfn = max_pfn;
for (; pfn < end_pfn; pfn++) { bit = pfn % BITMAP_CHUNK_BITS; @@ -181,7 +181,7 @@ static ssize_t page_idle_bitmap_write(st
end_pfn = pfn + count * BITS_PER_BYTE; if (end_pfn > max_pfn) - end_pfn = ALIGN(max_pfn, BITMAP_CHUNK_BITS); + end_pfn = max_pfn;
for (; pfn < end_pfn; pfn++) { bit = pfn % BITMAP_CHUNK_BITS;
From: zhangyi (F) yi.zhang@huawei.com
commit 211ad4b733037f66f9be0a79eade3da7ab11cbb8 upstream.
Currently, although we submit super bios in order (and super.nr_entries is incremented by each logged entry), submit_bio() is async, so the super sectors may not be written to the log device in order, and the final nr_entries may end up smaller than it should be.
This problem can be reproduced by the xfstests generic/455 with ext4:
QA output created by 455 -Silence is golden +mark 'end' does not exist
Fix this by serializing submission of super sectors to make sure each is written to the log disk in order.
Fixes: 0e9cebe724597 ("dm: add log writes target") Cc: stable@vger.kernel.org Signed-off-by: zhangyi (F) yi.zhang@huawei.com Suggested-by: Josef Bacik josef@toxicpanda.com Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-log-writes.c | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-)
--- a/drivers/md/dm-log-writes.c +++ b/drivers/md/dm-log-writes.c @@ -57,6 +57,7 @@
#define WRITE_LOG_VERSION 1ULL #define WRITE_LOG_MAGIC 0x6a736677736872ULL +#define WRITE_LOG_SUPER_SECTOR 0
/* * The disk format for this is braindead simple. @@ -112,6 +113,7 @@ struct log_writes_c { struct list_head logging_blocks; wait_queue_head_t wait; struct task_struct *log_kthread; + struct completion super_done; };
struct pending_block { @@ -177,6 +179,14 @@ static void log_end_io(struct bio *bio) bio_put(bio); }
+static void log_end_super(struct bio *bio) +{ + struct log_writes_c *lc = bio->bi_private; + + complete(&lc->super_done); + log_end_io(bio); +} + /* * Meant to be called if there is an error, it will free all the pages * associated with the block. @@ -212,7 +222,8 @@ static int write_metadata(struct log_wri bio->bi_iter.bi_size = 0; bio->bi_iter.bi_sector = sector; bio_set_dev(bio, lc->logdev->bdev); - bio->bi_end_io = log_end_io; + bio->bi_end_io = (sector == WRITE_LOG_SUPER_SECTOR) ? + log_end_super : log_end_io; bio->bi_private = lc; bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
@@ -334,11 +345,18 @@ static int log_super(struct log_writes_c super.nr_entries = cpu_to_le64(lc->logged_entries); super.sectorsize = cpu_to_le32(lc->sectorsize);
- if (write_metadata(lc, &super, sizeof(super), NULL, 0, 0)) { + if (write_metadata(lc, &super, sizeof(super), NULL, 0, + WRITE_LOG_SUPER_SECTOR)) { DMERR("Couldn't write super"); return -1; }
+ /* + * Super sector should be writen in-order, otherwise the + * nr_entries could be rewritten incorrectly by an old bio. + */ + wait_for_completion_io(&lc->super_done); + return 0; }
@@ -447,6 +465,7 @@ static int log_writes_ctr(struct dm_targ INIT_LIST_HEAD(&lc->unflushed_blocks); INIT_LIST_HEAD(&lc->logging_blocks); init_waitqueue_head(&lc->wait); + init_completion(&lc->super_done); atomic_set(&lc->io_blocks, 0); atomic_set(&lc->pending_blocks, 0);
From: Jan Kara jack@suse.cz
commit 240b4cc8fd5db138b675297d4226ec46594d9b3b upstream.
Once we unlock adapter->hw_lock in pvscsi_queue_lck(), nothing prevents the just-queued scsi_cmnd from completing and freeing the request. Thus the cmd->cmnd[0] dereference can access an already freed request, leading to kernel crashes or other issues (which one of our customers observed). Store cmd->cmnd[0] in a local variable before unlocking adapter->hw_lock to fix the issue.
CC: stable@vger.kernel.org Signed-off-by: Jan Kara jack@suse.cz Reviewed-by: Ewan D. Milne emilne@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/scsi/vmw_pvscsi.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/scsi/vmw_pvscsi.c +++ b/drivers/scsi/vmw_pvscsi.c @@ -763,6 +763,7 @@ static int pvscsi_queue_lck(struct scsi_ struct pvscsi_adapter *adapter = shost_priv(host); struct pvscsi_ctx *ctx; unsigned long flags; + unsigned char op;
spin_lock_irqsave(&adapter->hw_lock, flags);
@@ -775,13 +776,14 @@ static int pvscsi_queue_lck(struct scsi_ }
cmd->scsi_done = done; + op = cmd->cmnd[0];
dev_dbg(&cmd->device->sdev_gendev, - "queued cmd %p, ctx %p, op=%x\n", cmd, ctx, cmd->cmnd[0]); + "queued cmd %p, ctx %p, op=%x\n", cmd, ctx, op);
spin_unlock_irqrestore(&adapter->hw_lock, flags);
- pvscsi_kick_io(adapter, cmd->cmnd[0]); + pvscsi_kick_io(adapter, op);
return 0; }
From: Alejandro Jimenez alejandro.j.jimenez@oracle.com
commit c1f7fec1eb6a2c86d01bc22afce772c743451d88 upstream.
The bits set in x86_spec_ctrl_mask are used to calculate the guest's value of SPEC_CTRL that is written to the MSR before VMENTRY, and control which mitigations the guest can enable. In the case of SSBD, unless the host has enabled SSBD always on mode (by passing "spec_store_bypass_disable=on" in the kernel parameters), the SSBD bit is not set in the mask and the guest can not properly enable the SSBD always on mitigation mode.
This has been confirmed by running the SSBD PoC on a guest using the SSBD always on mitigation mode (booted with kernel parameter "spec_store_bypass_disable=on"), and verifying that the guest is vulnerable unless the host is also using SSBD always on mode. In addition, the guest OS incorrectly reports the SSB vulnerability as mitigated.
Always set the SSBD bit in x86_spec_ctrl_mask when the host CPU supports it, allowing the guest to use SSBD whether or not the host has chosen to enable the mitigation in any of its modes.
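The role of the mask can be sketched with a simplified model of the guest value calculation. The filtering expression, the bit positions and all names below are assumptions made for illustration; they are meant to resemble the behaviour described above, not to reproduce the kernel code.

```c
#include <stdint.h>

/* Illustrative bit positions for the model; treat them as assumptions. */
#define DEMO_SPEC_CTRL_IBRS	(UINT64_C(1) << 0)
#define DEMO_SPEC_CTRL_SSBD	(UINT64_C(1) << 2)

/*
 * Bits inside 'mask' are taken from what the guest asked for;
 * bits outside 'mask' are taken from the host value.
 */
static uint64_t demo_guest_spec_ctrl(uint64_t hostval, uint64_t guest_requested,
				     uint64_t mask)
{
	uint64_t guestval = hostval & ~mask;

	guestval |= guest_requested & mask;
	return guestval;
}
```

In this model, a guest that requests DEMO_SPEC_CTRL_SSBD only sees the bit in its effective value when the mask contains that bit, which is why the patch sets the mask bit independently of the host's own mitigation choice.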
Fixes: be6fcb5478e9 ("x86/bugs: Rework spec_ctrl base and mask logic") Signed-off-by: Alejandro Jimenez alejandro.j.jimenez@oracle.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Liam Merwick liam.merwick@oracle.com Reviewed-by: Mark Kanda mark.kanda@oracle.com Reviewed-by: Paolo Bonzini pbonzini@redhat.com Cc: bp@alien8.de Cc: rkrcmar@redhat.com Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1560187210-11054-1-git-send-email-alejandro.j.jime... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/bugs.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -821,6 +821,16 @@ static enum ssb_mitigation __init __ssb_ }
/* + * If SSBD is controlled by the SPEC_CTRL MSR, then set the proper + * bit in the mask to allow guests to use the mitigation even in the + * case where the host does not enable it. + */ + if (static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD) || + static_cpu_has(X86_FEATURE_AMD_SSBD)) { + x86_spec_ctrl_mask |= SPEC_CTRL_SSBD; + } + + /* * We have three CPU feature flags that are in play here: * - X86_BUG_SPEC_STORE_BYPASS - CPU is susceptible. * - X86_FEATURE_SSBD - CPU is able to turn off speculative store bypass @@ -837,7 +847,6 @@ static enum ssb_mitigation __init __ssb_ x86_amd_ssb_disable(); } else { x86_spec_ctrl_base |= SPEC_CTRL_SSBD; - x86_spec_ctrl_mask |= SPEC_CTRL_SSBD; wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base); } }
From: Thomas Gleixner tglx@linutronix.de
commit 5423f5ce5ca410b3646f355279e4e937d452e622 upstream.
A recent change moved the microcode loader hotplug callback into the early startup phase, which runs with interrupts disabled. It missed that the callbacks invoke sysfs functions which might sleep, causing nice 'might sleep' splats with proper debugging enabled.
Split the callbacks and only load the microcode in the early startup phase and move the sysfs handling back into the later threaded and preemptible bringup phase where it was before.
Fixes: 78f4e932f776 ("x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback") Signed-off-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Borislav Petkov bp@suse.de Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: stable@vger.kernel.org Cc: x86-ml x86@kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1906182228350.1766@nanos.tec.linut... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/microcode/core.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
--- a/arch/x86/kernel/cpu/microcode/core.c +++ b/arch/x86/kernel/cpu/microcode/core.c @@ -790,13 +790,16 @@ static struct syscore_ops mc_syscore_ops .resume = mc_bp_resume, };
-static int mc_cpu_online(unsigned int cpu) +static int mc_cpu_starting(unsigned int cpu) { - struct device *dev; - - dev = get_cpu_device(cpu); microcode_update_cpu(cpu); pr_debug("CPU%d added\n", cpu); + return 0; +} + +static int mc_cpu_online(unsigned int cpu) +{ + struct device *dev = get_cpu_device(cpu);
if (sysfs_create_group(&dev->kobj, &mc_attr_group)) pr_err("Failed to create group for CPU%d\n", cpu); @@ -873,7 +876,9 @@ int __init microcode_init(void) goto out_ucode_group;
register_syscore_ops(&mc_syscore_ops); - cpuhp_setup_state_nocalls(CPUHP_AP_MICROCODE_LOADER, "x86/microcode:online", + cpuhp_setup_state_nocalls(CPUHP_AP_MICROCODE_LOADER, "x86/microcode:starting", + mc_cpu_starting, NULL); + cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "x86/microcode:online", mc_cpu_online, mc_cpu_down_prep);
pr_info("Microcode Update Driver: v%s.", DRIVER_VERSION);
From: Trond Myklebust trondmy@gmail.com
commit 68f461593f76bd5f17e87cdd0bea28f4278c7268 upstream.
Fix a typo where we're confusing the default TCP retrans value (NFS_DEF_TCP_RETRANS) for the default TCP timeout value.
Fixes: 15d03055cf39f ("pNFS/flexfiles: Set reasonable default ...") Cc: stable@vger.kernel.org # 4.8+ Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/nfs/flexfilelayout/flexfilelayoutdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c +++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c @@ -18,7 +18,7 @@
#define NFSDBG_FACILITY NFSDBG_PNFS_LD
-static unsigned int dataserver_timeo = NFS_DEF_TCP_RETRANS; +static unsigned int dataserver_timeo = NFS_DEF_TCP_TIMEO; static unsigned int dataserver_retrans;
static bool ff_layout_has_available_ds(struct pnfs_layout_segment *lseg);
From: Geert Uytterhoeven geert@linux-m68k.org
commit 1bf72720281770162c87990697eae1ba2f1d917a upstream.
Currently, if the user specifies an unsupported mitigation strategy on the kernel command line, it will be ignored silently. The code will fall back to the default strategy, possibly leaving the system more vulnerable than expected.
This may happen due to e.g. a simple typo, or, for a stable kernel release, because not all mitigation strategies have been backported.
Inform the user by printing a message.
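A user-space sketch of the same idea is shown below. It is an illustration only: the enum, the function name and the return value convention are hypothetical (the kernel parser returns 0 in every case), and fprintf() stands in for pr_crit().

```c
#include <stdio.h>
#include <string.h>

enum demo_mitigations {
	DEMO_MITIGATIONS_OFF,
	DEMO_MITIGATIONS_AUTO,
	DEMO_MITIGATIONS_AUTO_NOSMT,
};

/*
 * Returns 1 if 'arg' was recognised and 0 otherwise. On unknown input
 * '*mode' keeps its previous (default) value and a warning is printed
 * instead of the value being ignored silently.
 */
static int demo_parse_mitigations(const char *arg, enum demo_mitigations *mode)
{
	if (!strcmp(arg, "off")) {
		*mode = DEMO_MITIGATIONS_OFF;
	} else if (!strcmp(arg, "auto")) {
		*mode = DEMO_MITIGATIONS_AUTO;
	} else if (!strcmp(arg, "auto,nosmt")) {
		*mode = DEMO_MITIGATIONS_AUTO_NOSMT;
	} else {
		fprintf(stderr,
			"Unsupported mitigations=%s, system may still be vulnerable\n",
			arg);
		return 0;
	}
	return 1;
}
```

A typo such as "atuo" leaves the default in place in this sketch, exactly as before the patch, but the fallback is now visible in the log.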
Fixes: 98af8452945c5565 ("cpu/speculation: Add 'mitigations=' cmdline option") Signed-off-by: Geert Uytterhoeven geert@linux-m68k.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Josh Poimboeuf jpoimboe@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Jiri Kosina jkosina@suse.cz Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Ben Hutchings ben@decadent.org.uk Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190516070935.22546-1-geert@linux-m68k.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/cpu.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -2308,6 +2308,9 @@ static int __init mitigations_parse_cmdl cpu_mitigations = CPU_MITIGATIONS_AUTO; else if (!strcmp(arg, "auto,nosmt")) cpu_mitigations = CPU_MITIGATIONS_AUTO_NOSMT; + else + pr_crit("Unsupported mitigations=%s, system may still be vulnerable\n", + arg);
return 0; }
From: Wang Xin xin.wang7@cn.bosch.com
commit 9a9e295e7c5c0409c020088b0ae017e6c2b7df6e upstream.
Within at24_loop_until_timeout the timestamp used for timeout checking is recorded after the I2C transfer and usleep_range(). Under high CPU load either the execution time for the I2C transfer or usleep_range() could actually be larger than the timeout value. In the worst case the I2C transfer is only tried once because the loop will exit due to the timeout although the EEPROM is now ready.
To fix this issue the timestamp is recorded at the beginning of each iteration. That is, before I2C transfer and sleep. Then the timeout is actually checked against the timestamp of the previous iteration. This makes sure that even if the timeout is reached, there is still one more chance to try the I2C transfer in case the EEPROM is ready.
Example:
If you have a system which combines high CPU load with repeated EEPROM writes you will run into the following scenario.
- System makes a successful regmap_bulk_write() to EEPROM.
- System wants to perform another write to EEPROM but EEPROM is still busy with the last write.
- Because of high CPU load the usleep_range() will sleep more than 25 ms (at24_write_timeout).
- Within the over-long sleeping the EEPROM finished the previous write operation and is ready again.
- at24_loop_until_timeout() will detect timeout and won't try to write.
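The scenario above can be replayed with a fake clock. The sketch below is a user-space model under stated assumptions: struct demo_dev and all demo_* functions are hypothetical, time is a plain counter in milliseconds, and jiffies wraparound (which time_before() handles in the kernel) is ignored.

```c
#include <errno.h>

struct demo_dev {
	unsigned long now;	/* fake clock in ms; starts non-zero in the examples */
	unsigned long ready_at;	/* time at which the EEPROM accepts transfers again */
	unsigned long sleep_ms;	/* how long each sleep really takes under load */
	int attempts;		/* number of transfers tried */
};

static int demo_transfer(struct demo_dev *d)
{
	d->attempts++;
	return d->now >= d->ready_at;
}

static void demo_sleep(struct demo_dev *d)
{
	d->now += d->sleep_ms;
}

/* Old shape: the timestamp is taken after the sleep, then checked before retrying. */
static int demo_old_loop(struct demo_dev *d, unsigned long timeout_ms)
{
	unsigned long tout = d->now + timeout_ms;
	unsigned long op_time = 0;

	for (; op_time ? op_time < tout : 1; demo_sleep(d), op_time = d->now) {
		if (demo_transfer(d))
			return 0;
	}
	return -ETIMEDOUT;
}

/* New shape: the timestamp is taken before the transfer of each iteration. */
static int demo_new_loop(struct demo_dev *d, unsigned long timeout_ms)
{
	unsigned long tout = d->now + timeout_ms;
	unsigned long op_time;

	do {
		op_time = d->now;
		if (demo_transfer(d))
			return 0;
		demo_sleep(d);
	} while (op_time < tout);

	return -ETIMEDOUT;
}
```

With a 25 ms timeout, a device that becomes ready after 10 ms and a sleep that takes 30 ms, the old loop gives up after a single attempt, while the new loop makes a second attempt and succeeds.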
Signed-off-by: Wang Xin xin.wang7@cn.bosch.com Signed-off-by: Mark Jonas mark.jonas@de.bosch.com Signed-off-by: Bartosz Golaszewski brgl@bgdev.pl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/misc/eeprom/at24.c | 107 ++++++++++++++++++++++++++++++++------------- 1 file changed, 77 insertions(+), 30 deletions(-)
--- a/drivers/misc/eeprom/at24.c +++ b/drivers/misc/eeprom/at24.c @@ -113,22 +113,6 @@ MODULE_PARM_DESC(write_timeout, "Time (i ((1 << AT24_SIZE_FLAGS | (_flags)) \ << AT24_SIZE_BYTELEN | ilog2(_len))
-/* - * Both reads and writes fail if the previous write didn't complete yet. This - * macro loops a few times waiting at least long enough for one entire page - * write to work while making sure that at least one iteration is run before - * checking the break condition. - * - * It takes two parameters: a variable in which the future timeout in jiffies - * will be stored and a temporary variable holding the time of the last - * iteration of processing the request. Both should be unsigned integers - * holding at least 32 bits. - */ -#define loop_until_timeout(tout, op_time) \ - for (tout = jiffies + msecs_to_jiffies(write_timeout), op_time = 0; \ - op_time ? time_before(op_time, tout) : true; \ - usleep_range(1000, 1500), op_time = jiffies) - static const struct i2c_device_id at24_ids[] = { /* needs 8 addresses as A0-A2 are ignored */ { "24c00", AT24_DEVICE_MAGIC(128 / 8, AT24_FLAG_TAKE8ADDR) }, @@ -234,7 +218,14 @@ static ssize_t at24_eeprom_read_smbus(st if (count > I2C_SMBUS_BLOCK_MAX) count = I2C_SMBUS_BLOCK_MAX;
- loop_until_timeout(timeout, read_time) { + timeout = jiffies + msecs_to_jiffies(write_timeout); + do { + /* + * The timestamp shall be taken before the actual operation + * to avoid a premature timeout in case of high CPU load. + */ + read_time = jiffies; + status = i2c_smbus_read_i2c_block_data_or_emulated(client, offset, count, buf); @@ -244,7 +235,9 @@ static ssize_t at24_eeprom_read_smbus(st
if (status == count) return count; - } + + usleep_range(1000, 1500); + } while (time_before(read_time, timeout));
return -ETIMEDOUT; } @@ -284,7 +277,14 @@ static ssize_t at24_eeprom_read_i2c(stru msg[1].buf = buf; msg[1].len = count;
- loop_until_timeout(timeout, read_time) { + timeout = jiffies + msecs_to_jiffies(write_timeout); + do { + /* + * The timestamp shall be taken before the actual operation + * to avoid a premature timeout in case of high CPU load. + */ + read_time = jiffies; + status = i2c_transfer(client->adapter, msg, 2); if (status == 2) status = count; @@ -294,7 +294,9 @@ static ssize_t at24_eeprom_read_i2c(stru
if (status == count) return count; - } + + usleep_range(1000, 1500); + } while (time_before(read_time, timeout));
return -ETIMEDOUT; } @@ -343,11 +345,20 @@ static ssize_t at24_eeprom_read_serial(s msg[1].buf = buf; msg[1].len = count;
- loop_until_timeout(timeout, read_time) { + timeout = jiffies + msecs_to_jiffies(write_timeout); + do { + /* + * The timestamp shall be taken before the actual operation + * to avoid a premature timeout in case of high CPU load. + */ + read_time = jiffies; + status = i2c_transfer(client->adapter, msg, 2); if (status == 2) return count; - } + + usleep_range(1000, 1500); + } while (time_before(read_time, timeout));
return -ETIMEDOUT; } @@ -374,11 +385,20 @@ static ssize_t at24_eeprom_read_mac(stru msg[1].buf = buf; msg[1].len = count;
- loop_until_timeout(timeout, read_time) { + timeout = jiffies + msecs_to_jiffies(write_timeout); + do { + /* + * The timestamp shall be taken before the actual operation + * to avoid a premature timeout in case of high CPU load. + */ + read_time = jiffies; + status = i2c_transfer(client->adapter, msg, 2); if (status == 2) return count; - } + + usleep_range(1000, 1500); + } while (time_before(read_time, timeout));
return -ETIMEDOUT; } @@ -420,7 +440,14 @@ static ssize_t at24_eeprom_write_smbus_b client = at24_translate_offset(at24, &offset); count = at24_adjust_write_count(at24, offset, count);
- loop_until_timeout(timeout, write_time) { + timeout = jiffies + msecs_to_jiffies(write_timeout); + do { + /* + * The timestamp shall be taken before the actual operation + * to avoid a premature timeout in case of high CPU load. + */ + write_time = jiffies; + status = i2c_smbus_write_i2c_block_data(client, offset, count, buf); if (status == 0) @@ -431,7 +458,9 @@ static ssize_t at24_eeprom_write_smbus_b
if (status == count) return count; - } + + usleep_range(1000, 1500); + } while (time_before(write_time, timeout));
return -ETIMEDOUT; } @@ -446,7 +475,14 @@ static ssize_t at24_eeprom_write_smbus_b
client = at24_translate_offset(at24, &offset);
- loop_until_timeout(timeout, write_time) { + timeout = jiffies + msecs_to_jiffies(write_timeout); + do { + /* + * The timestamp shall be taken before the actual operation + * to avoid a premature timeout in case of high CPU load. + */ + write_time = jiffies; + status = i2c_smbus_write_byte_data(client, offset, buf[0]); if (status == 0) status = count; @@ -456,7 +492,9 @@ static ssize_t at24_eeprom_write_smbus_b
if (status == count) return count; - } + + usleep_range(1000, 1500); + } while (time_before(write_time, timeout));
return -ETIMEDOUT; } @@ -485,7 +523,14 @@ static ssize_t at24_eeprom_write_i2c(str memcpy(&msg.buf[i], buf, count); msg.len = i + count;
- loop_until_timeout(timeout, write_time) { + timeout = jiffies + msecs_to_jiffies(write_timeout); + do { + /* + * The timestamp shall be taken before the actual operation + * to avoid a premature timeout in case of high CPU load. + */ + write_time = jiffies; + status = i2c_transfer(client->adapter, &msg, 1); if (status == 1) status = count; @@ -495,7 +540,9 @@ static ssize_t at24_eeprom_write_i2c(str
if (status == count) return count; - } + + usleep_range(1000, 1500); + } while (time_before(write_time, timeout));
return -ETIMEDOUT; }
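The effect of taking the timestamp before the operation can be illustrated with a small userspace simulation. This is only a sketch under stated assumptions: the fake clock, struct sim_dev, sim_try_write() and sim_write() are hypothetical names invented for illustration and are not at24 code; only the do/while shape and the time_before()-style comparison follow the hunks above.

```c
#include <assert.h>
#include <errno.h>

/* Wraparound-safe "a is before b", same idea as the kernel's time_before(). */
#define sim_time_before(a, b) ((long)((a) - (b)) < 0)

/* Hypothetical simulated device and clock; none of this is at24 code. */
struct sim_dev {
	unsigned long now;      /* fake jiffies */
	unsigned long ready_at; /* device accepts writes from this tick on */
	unsigned long stall;    /* ticks lost per attempt, e.g. to preemption */
	int attempts;
};

static int sim_try_write(struct sim_dev *d)
{
	int ok = !sim_time_before(d->now, d->ready_at);

	d->attempts++;
	d->now += d->stall;
	return ok ? 0 : -EAGAIN;
}

/* stamp_before != 0: keep the timestamp taken before the attempt. */
static int sim_write(struct sim_dev *d, unsigned long timeout_ticks,
		     int stamp_before)
{
	unsigned long timeout = d->now + timeout_ticks;
	unsigned long write_time;

	assert(timeout_ticks > 0);

	do {
		write_time = d->now;
		if (sim_try_write(d) == 0)
			return 0;
		d->now += 1;                 /* stands in for usleep_range() */
		if (!stamp_before)
			write_time = d->now; /* compare against "now" instead */
	} while (sim_time_before(write_time, timeout));

	return -ETIMEDOUT;
}
```

With a device that becomes ready at tick 20, a timeout of 25 ticks and a 30-tick stall per attempt, the stamp-before variant retries once more and succeeds, while the variant that compares against the current time gives up after a single attempt.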
From: Neil Horman nhorman@tuxdriver.com
[ Upstream commit 89ed5b519004a7706f50b70f611edbd3aaacff2c ]
When an application is run that: a) Sets its scheduler to be SCHED_FIFO and b) Opens a memory mapped AF_PACKET socket, and sends frames with the MSG_DONTWAIT flag cleared, it's possible for the application to hang forever in the kernel. This occurs because when waiting, the code in tpacket_snd calls schedule, which under normal circumstances allows other tasks to run, including ksoftirqd, which in some cases is responsible for freeing the transmitted skb (which in AF_PACKET calls a destructor that flips the status bit of the transmitted frame back to available, allowing the transmitting task to complete).
However, when the calling application is SCHED_FIFO, its priority is such that the schedule call immediately places the task back on the cpu, preventing ksoftirqd from freeing the skb, which in turn prevents the transmitting task from detecting that the transmission is complete.
We can fix this by converting the schedule call to a completion mechanism. By using a completion queue, we force the calling task, when it detects there are no more frames to send, to schedule itself off the cpu until such time as the last transmitted skb is freed, allowing forward progress to be made.
Tested by myself and the reporter, with good results
Change Notes:
V1->V2: Enhance the sleep logic to support being interruptible and to honor SK_SNDTIMEO (Willem de Bruijn)
V2->V3: Rearrange the point at which we wait for the completion queue, to avoid needing to check for ph/skb being null at the end of the loop. Also move the complete call to the skb destructor to avoid needing to modify __packet_set_status. Also gate calling complete on packet_read_pending returning zero to avoid multiple calls to complete. (Willem de Bruijn)
Move timeo computation within loop, to re-fetch the socket timeout since we also use the timeo variable to record the return code from the wait_for_complete call (Neil Horman)
V3->V4: Willem has requested that the control flow be restored to the previous state. Doing so lets us eliminate the need for the po->wait_on_complete flag variable, and lets us get rid of the packet_next_frame function, but introduces another complexity. Specifically, by using the packet pending count, if an application calls sendmsg multiple times with MSG_DONTWAIT set, each set of transmitted frames, when complete, will cause tpacket_destruct_skb to issue a complete call, for which there will never be a matching wait_for_completion call. This imbalance will cause any future call to wait_for_completion here to return early, when the frames they sent may not have completed. To correct this, we need to re-init the completion queue on every call to tpacket_snd before we enter the loop so as to ensure we wait properly for the frames we send in this iteration.
Change the timeout and interrupted gotos to out_put rather than out_status so that we don't try to free a non-existent skb. Clean up some extra newlines (Willem de Bruijn)
Reviewed-by: Willem de Bruijn willemb@google.com Signed-off-by: Neil Horman nhorman@tuxdriver.com Reported-by: Matteo Croce mcroce@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/packet/af_packet.c | 20 +++++++++++++++++--- net/packet/internal.h | 1 + 2 files changed, 18 insertions(+), 3 deletions(-)
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2438,6 +2438,9 @@ static void tpacket_destruct_skb(struct
ts = __packet_set_timestamp(po, ph, skb); __packet_set_status(po, ph, TP_STATUS_AVAILABLE | ts); + + if (!packet_read_pending(&po->tx_ring)) + complete(&po->skb_completion); }
sock_wfree(skb); @@ -2632,7 +2635,7 @@ static int tpacket_parse_header(struct p
static int tpacket_snd(struct packet_sock *po, struct msghdr *msg) { - struct sk_buff *skb; + struct sk_buff *skb = NULL; struct net_device *dev; struct virtio_net_hdr *vnet_hdr = NULL; struct sockcm_cookie sockc; @@ -2647,6 +2650,7 @@ static int tpacket_snd(struct packet_soc int len_sum = 0; int status = TP_STATUS_AVAILABLE; int hlen, tlen, copylen = 0; + long timeo = 0;
mutex_lock(&po->pg_vec_lock);
@@ -2693,12 +2697,21 @@ static int tpacket_snd(struct packet_soc if ((size_max > dev->mtu + reserve + VLAN_HLEN) && !po->has_vnet_hdr) size_max = dev->mtu + reserve + VLAN_HLEN;
+ reinit_completion(&po->skb_completion); + do { ph = packet_current_frame(po, &po->tx_ring, TP_STATUS_SEND_REQUEST); if (unlikely(ph == NULL)) { - if (need_wait && need_resched()) - schedule(); + if (need_wait && skb) { + timeo = sock_sndtimeo(&po->sk, msg->msg_flags & MSG_DONTWAIT); + timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo); + if (timeo <= 0) { + err = !timeo ? -ETIMEDOUT : -ERESTARTSYS; + goto out_put; + } + } + /* check for additional frames */ continue; }
@@ -3252,6 +3265,7 @@ static int packet_create(struct net *net sock_init_data(sock, sk);
po = pkt_sk(sk); + init_completion(&po->skb_completion); sk->sk_family = PF_PACKET; po->num = proto; po->xmit = dev_queue_xmit; --- a/net/packet/internal.h +++ b/net/packet/internal.h @@ -128,6 +128,7 @@ struct packet_sock { unsigned int tp_hdrlen; unsigned int tp_reserve; unsigned int tp_tstamp; + struct completion skb_completion; struct net_device __rcu *cached_dev; int (*xmit)(struct sk_buff *skb); struct packet_type prot_hook ____cacheline_aligned_in_smp;
From: Stephen Suryaputra ssuryaextr@gmail.com
[ Upstream commit 38c73529de13e1e10914de7030b659a2f8b01c3b ]
In commit 19e4e768064a8 ("ipv4: Fix raw socket lookup for local traffic"), the dif argument to __raw_v4_lookup() is coming from the returned value of inet_iif() but the change was done only for the first lookup. Subsequent lookups in the while loop still use skb->dev->ifindex.
Fixes: 19e4e768064a8 ("ipv4: Fix raw socket lookup for local traffic") Signed-off-by: Stephen Suryaputra ssuryaextr@gmail.com Reviewed-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/raw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -202,7 +202,7 @@ static int raw_v4_input(struct sk_buff * } sk = __raw_v4_lookup(net, sk_next(sk), iph->protocol, iph->saddr, iph->daddr, - skb->dev->ifindex, sdif); + dif, sdif); } out: read_unlock(&raw_v4_hashinfo.lock);
From: Eric Dumazet edumazet@google.com
[ Upstream commit 55655e3d1197fff16a7a05088fb0e5eba50eac55 ]
syzbot found we can leak memory in packet_set_ring(), if user application provides buggy parameters.
Fixes: 7f953ab2ba46 ("af_packet: TX_RING support for TPACKET_V3") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Sowmini Varadhan sowmini.varadhan@oracle.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/packet/af_packet.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -4354,7 +4354,7 @@ static int packet_set_ring(struct sock * req3->tp_sizeof_priv || req3->tp_feature_req_word) { err = -EINVAL; - goto out; + goto out_free_pg_vec; } } break; @@ -4418,6 +4418,7 @@ static int packet_set_ring(struct sock * prb_shutdown_retire_blk_timer(po, rb_queue); }
+out_free_pg_vec: if (pg_vec) free_pg_vec(pg_vec, order, req->tp_block_nr); out:
From: JingYi Hou houjingyi647@gmail.com
[ Upstream commit d0bae4a0e3d8c5690a885204d7eb2341a5b4884d ]
In sock_getsockopt(), 'optlen' is fetched the first time from userspace. 'len < 0' is then checked. Then in condition 'SO_MEMINFO', 'optlen' is fetched the second time from userspace.
If userspace changes it between the two fetches, this may cause security problems or unexpected behavior, and there is no reason to fetch it a second time.
To fix this, we need to remove the second fetch.
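The double-fetch pattern described above can be sketched in userspace as follows. Everything here is an assumption for illustration: struct fake_user_int, fake_get_user() and both len_* functions are hypothetical stand-ins, not the sock_getsockopt() code. The point is only that the value which passed the 'len < 0' check is not necessarily the value that ends up being used.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-in for a userspace int that another thread can change. */
struct fake_user_int {
	int values[2]; /* value seen by the first fetch, then by later ones */
	int reads;
};

static int fake_get_user(struct fake_user_int *u)
{
	int v = u->values[u->reads ? 1 : 0];

	u->reads++;
	return v;
}

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Old flow: validate the first fetch, then fetch again and use that value. */
static int len_double_fetch(struct fake_user_int *optlen, unsigned int cap)
{
	int len = fake_get_user(optlen);

	assert(cap <= INT_MAX);
	if (len < 0)
		return -EINVAL;
	len = fake_get_user(optlen); /* second fetch, never re-checked */
	return (int)min_uint((unsigned int)len, cap);
}

/* New flow: the validated local copy is the only one that is used. */
static int len_single_fetch(struct fake_user_int *optlen, unsigned int cap)
{
	int len = fake_get_user(optlen);

	assert(cap <= INT_MAX);
	if (len < 0)
		return -EINVAL;
	return (int)min_uint((unsigned int)len, cap);
}
```

If the simulated user value flips from 4 to -1 between fetches, the double-fetch variant ends up with the full cap rather than the 4 bytes that were validated, whereas the single-fetch variant keeps 4.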
Signed-off-by: JingYi Hou houjingyi647@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/sock.c | 3 --- 1 file changed, 3 deletions(-)
--- a/net/core/sock.c +++ b/net/core/sock.c @@ -1358,9 +1358,6 @@ int sock_getsockopt(struct socket *sock, { u32 meminfo[SK_MEMINFO_VARS];
- if (get_user(len, optlen)) - return -EFAULT; - sk_get_meminfo(sk, meminfo);
len = min_t(unsigned int, len, sizeof(meminfo));
From: Roland Hii roland.king.guan.hii@intel.com
[ Upstream commit a1e5388b4d5fc78688e5e9ee6641f779721d6291 ]
When ADDSUB bit is set, the system time seconds field is calculated as the complement of the seconds part of the update value.
For example, if 3.000000001 seconds need to be subtracted from the system time, this field is calculated as 2^32 - 3 = 4294967296 - 3 = 0x100000000 - 3 = 0xFFFFFFFD
Previously, the 0x100000000 was mistakenly written as 100000000.
This is further simplified from sec = (0x100000000ULL - sec); to sec = -sec;
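The arithmetic above can be checked with a few lines of userspace C. This is a sketch that assumes the seconds field is a 32-bit unsigned value; the function names are hypothetical and are not part of the stmmac driver.

```c
#include <assert.h>
#include <stdint.h>

/* Explicit form from the description: 2^32 - sec, truncated to 32 bits. */
static uint32_t sub_seconds_explicit(uint32_t sec)
{
	return (uint32_t)(0x100000000ULL - sec);
}

/* Simplified form: unsigned negation wraps modulo 2^32. */
static uint32_t sub_seconds_negate(uint32_t sec)
{
	return (uint32_t)(0u - sec);
}

/* The mistaken decimal constant (10^8 instead of 2^32). */
static uint32_t sub_seconds_buggy(uint32_t sec)
{
	return (uint32_t)(100000000ULL - sec);
}

/* Both correct forms must agree for any input. */
static void check_equivalence(uint32_t sec)
{
	assert(sub_seconds_explicit(sec) == sub_seconds_negate(sec));
}
```

For sec = 3 both correct forms give 0xFFFFFFFD, while the decimal constant gives 99999997.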
Fixes: ba1ffd74df74 ("stmmac: fix PTP support for GMAC4") Signed-off-by: Roland Hii roland.king.guan.hii@intel.com Signed-off-by: Ong Boon Leong boon.leong.ong@intel.com Signed-off-by: Voon Weifeng weifeng.voon@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c @@ -121,7 +121,7 @@ static int stmmac_adjust_systime(void __ * programmed with (2^32 – <new_sec_value>) */ if (gmac4) - sec = (100000000ULL - sec); + sec = -sec;
value = readl(ioaddr + PTP_TCR); if (value & PTP_TCR_TSCTRLSSR)
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 25bff6d5478b2a02368097015b7d8eb727c87e16 ]
Now in sctp_endpoint_init(), it holds the sk then creates auth shkey. But when the creation fails, it doesn't release the sk, which causes a sk refcnt leak.
Fix it by only holding the sk after the auth shkey is created successfully.
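The ordering issue can be sketched with a toy reference count. The types and functions here (struct fake_sock, struct fake_ep, ep_init_hold_first(), ep_init_hold_last()) are hypothetical and only model the two orderings; they are not the sctp code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy objects standing in for the sk and the endpoint. */
struct fake_sock { int refcnt; };
struct fake_ep { struct fake_sock *sk; };

static void fake_sock_hold(struct fake_sock *sk)
{
	assert(sk != NULL);
	sk->refcnt++;
}

/* Old order: hold first; the failure path returns without dropping it. */
static int ep_init_hold_first(struct fake_ep *ep, struct fake_sock *sk,
			      int key_alloc_fails)
{
	ep->sk = sk;
	fake_sock_hold(sk);
	if (key_alloc_fails)
		return -ENOMEM; /* the reference taken above is leaked */
	return 0;
}

/* New order: take the reference only once nothing can fail any more. */
static int ep_init_hold_last(struct fake_ep *ep, struct fake_sock *sk,
			     int key_alloc_fails)
{
	if (key_alloc_fails)
		return -ENOMEM;
	ep->sk = sk;
	fake_sock_hold(sk);
	return 0;
}
```

Moving the hold to the end avoids having to add a matching release to every error label.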
Fixes: a29a5bd4f5c3 ("[SCTP]: Implement SCTP-AUTH initializations.") Reported-by: syzbot+afabda3890cc2f765041@syzkaller.appspotmail.com Reported-by: syzbot+276ca1c77a19977c0130@syzkaller.appspotmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Acked-by: Neil Horman nhorman@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/endpointola.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/net/sctp/endpointola.c +++ b/net/sctp/endpointola.c @@ -126,10 +126,6 @@ static struct sctp_endpoint *sctp_endpoi /* Initialize the bind addr area */ sctp_bind_addr_init(&ep->base.bind_addr, 0);
- /* Remember who we are attached to. */ - ep->base.sk = sk; - sock_hold(ep->base.sk); - /* Create the lists of associations. */ INIT_LIST_HEAD(&ep->asocs);
@@ -167,6 +163,10 @@ static struct sctp_endpoint *sctp_endpoi ep->prsctp_enable = net->sctp.prsctp_enable; ep->reconf_enable = net->sctp.reconf_enable;
+ /* Remember who we are attached to. */ + ep->base.sk = sk; + sock_hold(ep->base.sk); + return ep;
nomem_hmacs:
From: Xin Long lucien.xin@gmail.com
[ Upstream commit c492d4c74dd3f87559883ffa0f94a8f1ae3fe5f5 ]
This patch is to fix a dst refcnt leak, which can be reproduced by doing:
# ip net a c; ip net a s; modprobe tipc # ip net e s ip l a n eth1 type veth peer n eth1 netns c # ip net e c ip l s lo up; ip net e c ip l s eth1 up # ip net e s ip l s lo up; ip net e s ip l s eth1 up # ip net e c ip a a 1.1.1.2/8 dev eth1 # ip net e s ip a a 1.1.1.1/8 dev eth1 # ip net e c tipc b e m udp n u1 localip 1.1.1.2 # ip net e s tipc b e m udp n u1 localip 1.1.1.1 # ip net d c; ip net d s; rmmod tipc
and it will get stuck and keep logging the error:
unregister_netdevice: waiting for lo to become free. Usage count = 1
The cause is that a dst is held by the udp sock's sk_rx_dst set on udp rx path with udp_early_demux == 1, and this dst (eventually holding lo dev) can't be released as bearer's removal in tipc pernet .exit happens after lo dev's removal, default_device pernet .exit.
"There are two distinct types of pernet_operations recognized: subsys and device. At creation all subsys init functions are called before device init functions, and at destruction all device exit functions are called before subsys exit function."
So by calling register_pernet_device instead to register tipc_net_ops, the pernet .exit() will be invoked earlier than loopback dev's removal when a netns is being destroyed, as fou/gue does.
Note that vxlan and geneve udp tunnels don't have this issue, as the udp sock is released in their device ndo_stop().
This fix is also necessary for tipc dst_cache, which will hold dsts on tx path and I will introduce in my next patch.
Reported-by: Li Shuang shuali@redhat.com Signed-off-by: Xin Long lucien.xin@gmail.com Acked-by: Jon Maloy jon.maloy@ericsson.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/tipc/core.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
--- a/net/tipc/core.c +++ b/net/tipc/core.c @@ -128,7 +128,7 @@ static int __init tipc_init(void) if (err) goto out_sysctl;
- err = register_pernet_subsys(&tipc_net_ops); + err = register_pernet_device(&tipc_net_ops); if (err) goto out_pernet;
@@ -136,7 +136,7 @@ static int __init tipc_init(void) if (err) goto out_socket;
- err = register_pernet_subsys(&tipc_topsrv_net_ops); + err = register_pernet_device(&tipc_topsrv_net_ops); if (err) goto out_pernet_topsrv;
@@ -147,11 +147,11 @@ static int __init tipc_init(void) pr_info("Started in single node mode\n"); return 0; out_bearer: - unregister_pernet_subsys(&tipc_topsrv_net_ops); + unregister_pernet_device(&tipc_topsrv_net_ops); out_pernet_topsrv: tipc_socket_stop(); out_socket: - unregister_pernet_subsys(&tipc_net_ops); + unregister_pernet_device(&tipc_net_ops); out_pernet: tipc_unregister_sysctl(); out_sysctl: @@ -166,9 +166,9 @@ out_netlink: static void __exit tipc_exit(void) { tipc_bearer_cleanup(); - unregister_pernet_subsys(&tipc_topsrv_net_ops); + unregister_pernet_device(&tipc_topsrv_net_ops); tipc_socket_stop(); - unregister_pernet_subsys(&tipc_net_ops); + unregister_pernet_device(&tipc_net_ops); tipc_netlink_stop(); tipc_netlink_compat_stop(); tipc_unregister_sysctl();
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 4f07b80c973348a99b5d2a32476a2e7877e94a05 ]
This patch is to fix an uninit-value issue, reported by syzbot:
BUG: KMSAN: uninit-value in memchr+0xce/0x110 lib/string.c:981 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x191/0x1f0 lib/dump_stack.c:113 kmsan_report+0x130/0x2a0 mm/kmsan/kmsan.c:622 __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:310 memchr+0xce/0x110 lib/string.c:981 string_is_valid net/tipc/netlink_compat.c:176 [inline] tipc_nl_compat_bearer_disable+0x2a1/0x480 net/tipc/netlink_compat.c:449 __tipc_nl_compat_doit net/tipc/netlink_compat.c:327 [inline] tipc_nl_compat_doit+0x3ac/0xb00 net/tipc/netlink_compat.c:360 tipc_nl_compat_handle net/tipc/netlink_compat.c:1178 [inline] tipc_nl_compat_recv+0x1b1b/0x27b0 net/tipc/netlink_compat.c:1281
TLV_GET_DATA_LEN() may return a negative int value, which will be used as size_t (becoming a big unsigned long) when passed into memchr, causing this issue.
Similar to what it does in tipc_nl_compat_bearer_enable(), this fix is to return -EINVAL when TLV_GET_DATA_LEN() is negative in tipc_nl_compat_bearer_disable(), as well as in tipc_nl_compat_link_stat_dump() and tipc_nl_compat_link_reset_stats().
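The int-to-size_t conversion and the check-then-clamp order of the fix can be sketched in userspace as follows. The helper names len_as_size() and checked_name_len() are hypothetical and chosen for illustration; they are not tipc functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* What a memchr()-style size_t parameter sees when handed a negative int. */
static size_t len_as_size(int len)
{
	return (size_t)len;
}

/*
 * Hypothetical helper mirroring the order used by the fix: reject
 * non-positive lengths first, then clamp to the maximum name length.
 */
static int checked_name_len(int data_len, int max_len)
{
	assert(max_len > 0);
	if (data_len <= 0)
		return -EINVAL;
	return data_len < max_len ? data_len : max_len;
}
```

Clamping alone does not help, since the minimum of a negative length and a positive maximum is still the negative length; the explicit check has to come first.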
v1->v2: - add the missing Fixes tags per Eric's request.
Fixes: 0762216c0ad2 ("tipc: fix uninit-value in tipc_nl_compat_bearer_enable") Fixes: 8b66fee7f8ee ("tipc: fix uninit-value in tipc_nl_compat_link_reset_stats") Reported-by: syzbot+30eaa8bf392f7fafffaf@syzkaller.appspotmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/tipc/netlink_compat.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)
--- a/net/tipc/netlink_compat.c +++ b/net/tipc/netlink_compat.c @@ -436,7 +436,11 @@ static int tipc_nl_compat_bearer_disable if (!bearer) return -EMSGSIZE;
- len = min_t(int, TLV_GET_DATA_LEN(msg->req), TIPC_MAX_BEARER_NAME); + len = TLV_GET_DATA_LEN(msg->req); + if (len <= 0) + return -EINVAL; + + len = min_t(int, len, TIPC_MAX_BEARER_NAME); if (!string_is_valid(name, len)) return -EINVAL;
@@ -528,7 +532,11 @@ static int tipc_nl_compat_link_stat_dump
name = (char *)TLV_DATA(msg->req);
- len = min_t(int, TLV_GET_DATA_LEN(msg->req), TIPC_MAX_LINK_NAME); + len = TLV_GET_DATA_LEN(msg->req); + if (len <= 0) + return -EINVAL; + + len = min_t(int, len, TIPC_MAX_BEARER_NAME); if (!string_is_valid(name, len)) return -EINVAL;
@@ -806,7 +814,11 @@ static int tipc_nl_compat_link_reset_sta if (!link) return -EMSGSIZE;
- len = min_t(int, TLV_GET_DATA_LEN(msg->req), TIPC_MAX_LINK_NAME); + len = TLV_GET_DATA_LEN(msg->req); + if (len <= 0) + return -EINVAL; + + len = min_t(int, len, TIPC_MAX_BEARER_NAME); if (!string_is_valid(name, len)) return -EINVAL;
From: Fei Li lifei.shirley@bytedance.com
[ Upstream commit 72b319dc08b4924a29f5e2560ef6d966fa54c429 ]
Currently, after setting the tap0 link up, the tun code wakes up the tx/rx wait queues in tun_net_open() when .ndo_open() is called; however, the IFF_UP flag has not been set yet. If there is already a waiter, it fails to transmit when checking the IFF_UP flag in tun_sendmsg(). The following vhost_poll_start() then adds the wq into wqh until it is woken up again. This works if the IFF_UP flag has been set by the time tun_chr_poll detects it, but not if IFF_UP is still unset at that time. Sadly the latter case is a fatal error, as the wq will never be woken up in the future unless the link is later set up manually on purpose.
Fix this by moving the wakeup process into the NETDEV_UP event notifying process, which makes sure IFF_UP has been set before all wait queues are woken up.
Signed-off-by: Fei Li lifei.shirley@bytedance.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/tun.c | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-)
--- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -831,18 +831,8 @@ static void tun_net_uninit(struct net_de /* Net device open. */ static int tun_net_open(struct net_device *dev) { - struct tun_struct *tun = netdev_priv(dev); - int i; - netif_tx_start_all_queues(dev);
- for (i = 0; i < tun->numqueues; i++) { - struct tun_file *tfile; - - tfile = rtnl_dereference(tun->tfiles[i]); - tfile->socket.sk->sk_write_space(tfile->socket.sk); - } - return 0; }
@@ -2826,6 +2816,7 @@ static int tun_device_event(struct notif { struct net_device *dev = netdev_notifier_info_to_dev(ptr); struct tun_struct *tun = netdev_priv(dev); + int i;
if (dev->rtnl_link_ops != &tun_link_ops) return NOTIFY_DONE; @@ -2835,6 +2826,14 @@ static int tun_device_event(struct notif if (tun_queue_resize(tun)) return NOTIFY_BAD; break; + case NETDEV_UP: + for (i = 0; i < tun->numqueues; i++) { + struct tun_file *tfile; + + tfile = rtnl_dereference(tun->tfiles[i]); + tfile->socket.sk->sk_write_space(tfile->socket.sk); + } + break; default: break; }
From: YueHaibing yuehaibing@huawei.com
[ Upstream commit ee4297420d56a0033a8593e80b33fcc93fda8509 ]
We should rather have vlan_tci filled all the way down to the transmitting netdevice and let it do the hw/sw vlan implementation.
Suggested-by: Jiri Pirko jiri@resnulli.us Signed-off-by: YueHaibing yuehaibing@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/team/team.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/team/team.c +++ b/drivers/net/team/team.c @@ -2131,12 +2131,12 @@ static void team_setup(struct net_device dev->features |= NETIF_F_NETNS_LOCAL;
dev->hw_features = TEAM_VLAN_FEATURES | - NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_HW_VLAN_CTAG_FILTER;
dev->hw_features |= NETIF_F_GSO_ENCAP_ALL; dev->features |= dev->hw_features; + dev->features |= NETIF_F_HW_VLAN_CTAG_TX; }
static int team_newlink(struct net *src_net, struct net_device *dev,
From: YueHaibing yuehaibing@huawei.com
[ Upstream commit 30d8177e8ac776d89d387fad547af6a0f599210e ]
We build a vlan on top of a bonding interface whose vlan offload is off; the bond mode is 802.3ad (LACP) and xmit_hash_policy is BOND_XMIT_POLICY_ENCAP34.
Because vlan tx offload is off, the vlan tci is cleared and the vlan header is pushed into the skb in validate_xmit_vlan() while sending from vlan devices. Then in bond_xmit_hash, __skb_flow_dissect() fails to get information from protocol headers encapsulated within vlan, because 'nhoff' points to the IP header, so bond hashing is based on layer 2 info, which fails to distribute packets across slaves.
This patch always enables bonding's vlan tx offload and passes the vlan packets to the slave devices with vlan tci, letting them handle the vlan implementation.
Fixes: 278339a42a1b ("bonding: propogate vlan_features to bonding master") Suggested-by: Jiri Pirko jiri@resnulli.us Signed-off-by: YueHaibing yuehaibing@huawei.com Acked-by: Jiri Pirko jiri@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/bonding/bond_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4263,12 +4263,12 @@ void bond_setup(struct net_device *bond_ bond_dev->features |= NETIF_F_NETNS_LOCAL;
bond_dev->hw_features = BOND_VLAN_FEATURES | - NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_HW_VLAN_CTAG_FILTER;
bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL; bond_dev->features |= bond_dev->hw_features; + bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX; }
/* Destroy a bonding device.
From: Martin KaFai Lau kafai@fb.com
commit 257a525fe2e49584842c504a92c27097407f778f upstream.
When the commit a6024562ffd7 ("udp: Add GRO functions to UDP socket") added udp[46]_lib_lookup_skb to the udp_gro code path, it broke the reuseport_select_sock() assumption that skb->data is pointing to the transport header.
This patch follows an earlier __udp6_lib_err() fix by passing a NULL skb to avoid calling the reuseport's bpf_prog.
Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket") Cc: Tom Herbert tom@herbertland.com Signed-off-by: Martin KaFai Lau kafai@fb.com Acked-by: Song Liu songliubraving@fb.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/ipv4/udp.c | 6 +++++- net/ipv6/udp.c | 2 +- 2 files changed, 6 insertions(+), 2 deletions(-)
--- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -563,7 +563,11 @@ static inline struct sock *__udp4_lib_lo struct sock *udp4_lib_lookup_skb(struct sk_buff *skb, __be16 sport, __be16 dport) { - return __udp4_lib_lookup_skb(skb, sport, dport, &udp_table); + const struct iphdr *iph = ip_hdr(skb); + + return __udp4_lib_lookup(dev_net(skb->dev), iph->saddr, sport, + iph->daddr, dport, inet_iif(skb), + inet_sdif(skb), &udp_table, NULL); } EXPORT_SYMBOL_GPL(udp4_lib_lookup_skb);
--- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -308,7 +308,7 @@ struct sock *udp6_lib_lookup_skb(struct
return __udp6_lib_lookup(dev_net(skb->dev), &iph->saddr, sport, &iph->daddr, dport, inet6_iif(skb), - inet6_sdif(skb), &udp_table, skb); + inet6_sdif(skb), &udp_table, NULL); } EXPORT_SYMBOL_GPL(udp6_lib_lookup_skb);
From: Martin KaFai Lau kafai@fb.com
commit 4ac30c4b3659efac031818c418beb51e630d512d upstream.
__udp6_lib_err() may be called when handling icmpv6 message. For example, the icmpv6 toobig(type=2). __udp6_lib_lookup() is then called which may call reuseport_select_sock(). reuseport_select_sock() will call into a bpf_prog (if there is one).
reuseport_select_sock() is expecting the skb->data pointing to the transport header (udphdr in this case). For example, run_bpf_filter() is pulling the transport header.
However, in the __udp6_lib_err() path, the skb->data is pointing to the ipv6hdr instead of the udphdr.
One option is to pull and push the ipv6hdr in __udp6_lib_err(). Instead of doing this, this patch follows how the original commit 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF") was done in IPv4, which has passed a NULL skb pointer to reuseport_select_sock().
Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF") Cc: Craig Gallek kraig@google.com Signed-off-by: Martin KaFai Lau kafai@fb.com Acked-by: Song Liu songliubraving@fb.com Acked-by: Craig Gallek kraig@google.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/udp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -506,7 +506,7 @@ void __udp6_lib_err(struct sk_buff *skb, struct net *net = dev_net(skb->dev);
sk = __udp6_lib_lookup(net, daddr, uh->dest, saddr, uh->source, - inet6_iif(skb), 0, udptable, skb); + inet6_iif(skb), 0, udptable, NULL); if (!sk) { __ICMP6_INC_STATS(net, __in6_dev_get(skb->dev), ICMP6_MIB_INERRORS);
From: Will Deacon will.deacon@arm.com
commit 8e4e0ac02b449297b86498ac24db5786ddd9f647 upstream.
Returning an error code from futex_atomic_cmpxchg_inatomic() indicates that the caller should not make any use of *uval, and should instead act upon the value of the error code. Although this is implemented correctly in our futex code, we needlessly copy uninitialised stack to *uval in the error case, which can easily be avoided.
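The "only write *uval on success" contract can be sketched in plain C. This is a userspace illustration under assumptions: cmpxchg_sketch() is a hypothetical function, it is not atomic, and a NULL uaddr is used merely to model a faulting user access.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, non-atomic model of a cmpxchg helper. A NULL uaddr
 * stands in for a user access that faults.
 */
static int cmpxchg_sketch(uint32_t *uval, uint32_t *uaddr,
			  uint32_t oldval, uint32_t newval)
{
	uint32_t val; /* deliberately left unset on the error path */
	int ret = 0;

	assert(uval != NULL);

	if (!uaddr) {
		ret = -EFAULT;
	} else {
		val = *uaddr;
		if (val == oldval)
			*uaddr = newval;
	}

	/* Copy out only on success, so no unset stack value escapes. */
	if (!ret)
		*uval = val;

	return ret;
}
```

On the error path the caller's variable keeps whatever it held before the call, which is consistent with the rule that *uval must not be used when an error is returned.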
Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/include/asm/futex.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/arm64/include/asm/futex.h +++ b/arch/arm64/include/asm/futex.h @@ -134,7 +134,9 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, : "memory"); uaccess_disable();
- *uval = val; + if (!ret) + *uval = val; + return ret; }
From: Daniel Borkmann daniel@iogearbox.net
commit 34b8ab091f9ef57a2bb3c8c8359a0a03a8abf2f9 upstream.
Since ARMv8.1 supplement introduced LSE atomic instructions back in 2016, let's add support for STADD and use that in favor of LDXR / STXR loop for the XADD mapping if available. STADD is encoded as an alias for LDADD with XZR as the destination register, therefore add LDADD to the instruction encoder along with STADD as a special case and use it in the JIT for CPUs that advertise LSE atomics in CPUID register. If immediate offset in the BPF XADD insn is 0, then use dst register directly instead of temporary one.
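The alias relationship can be sketched as a standalone encoder. This is an illustration under assumptions: enc_ldadd() and enc_stadd() are hypothetical names, the base value 0xB8200000 is the one used for ldadd in the insn.h hunk of this patch, and the field positions (size in bits 31:30, Rs in 20:16, Rn in 9:5, Rt in 4:0) reflect my reading of the A64 encoding rather than anything stated in the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define LDADD_BASE 0xB8200000u /* ldadd value from the insn.h hunk */
#define REG_ZR     31u         /* zero register number */

/* Hypothetical encoder: LDADD <Rs>, <Rt>, [<Rn>] */
static uint32_t enc_ldadd(unsigned int rt, unsigned int rn,
			  unsigned int rs, int is64)
{
	uint32_t insn = LDADD_BASE;

	assert(rt < 32 && rn < 32 && rs < 32);

	insn &= ~(3u << 30);            /* clear the size field */
	insn |= (is64 ? 3u : 2u) << 30; /* 0b10: 32-bit, 0b11: 64-bit */
	insn |= rs << 16;               /* value to add */
	insn |= rn << 5;                /* address register */
	insn |= rt;                     /* register receiving the old value */
	return insn;
}

/* STADD is LDADD with the zero register as the result register. */
static uint32_t enc_stadd(unsigned int rn, unsigned int rs, int is64)
{
	return enc_ldadd(REG_ZR, rn, rs, is64);
}
```

Under these assumptions, a 32-bit STADD of register 1 to the address in register 0 encodes as 0xB821001F, and a 64-bit STADD of register 2 to the address in register 3 as 0xF822007F.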
Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Jean-Philippe Brucker jean-philippe.brucker@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/include/asm/insn.h | 8 ++++++++ arch/arm64/kernel/insn.c | 40 ++++++++++++++++++++++++++++++++++++++++ arch/arm64/net/bpf_jit.h | 4 ++++ arch/arm64/net/bpf_jit_comp.c | 28 +++++++++++++++++++--------- 4 files changed, 71 insertions(+), 9 deletions(-)
--- a/arch/arm64/include/asm/insn.h +++ b/arch/arm64/include/asm/insn.h @@ -271,6 +271,7 @@ __AARCH64_INSN_FUNCS(adrp, 0x9F000000, 0 __AARCH64_INSN_FUNCS(prfm, 0x3FC00000, 0x39800000) __AARCH64_INSN_FUNCS(prfm_lit, 0xFF000000, 0xD8000000) __AARCH64_INSN_FUNCS(str_reg, 0x3FE0EC00, 0x38206800) +__AARCH64_INSN_FUNCS(ldadd, 0x3F20FC00, 0xB8200000) __AARCH64_INSN_FUNCS(ldr_reg, 0x3FE0EC00, 0x38606800) __AARCH64_INSN_FUNCS(ldr_lit, 0xBF000000, 0x18000000) __AARCH64_INSN_FUNCS(ldrsw_lit, 0xFF000000, 0x98000000) @@ -383,6 +384,13 @@ u32 aarch64_insn_gen_load_store_ex(enum enum aarch64_insn_register state, enum aarch64_insn_size_type size, enum aarch64_insn_ldst_type type); +u32 aarch64_insn_gen_ldadd(enum aarch64_insn_register result, + enum aarch64_insn_register address, + enum aarch64_insn_register value, + enum aarch64_insn_size_type size); +u32 aarch64_insn_gen_stadd(enum aarch64_insn_register address, + enum aarch64_insn_register value, + enum aarch64_insn_size_type size); u32 aarch64_insn_gen_add_sub_imm(enum aarch64_insn_register dst, enum aarch64_insn_register src, int imm, enum aarch64_insn_variant variant, --- a/arch/arm64/kernel/insn.c +++ b/arch/arm64/kernel/insn.c @@ -793,6 +793,46 @@ u32 aarch64_insn_gen_load_store_ex(enum state); }
+u32 aarch64_insn_gen_ldadd(enum aarch64_insn_register result, + enum aarch64_insn_register address, + enum aarch64_insn_register value, + enum aarch64_insn_size_type size) +{ + u32 insn = aarch64_insn_get_ldadd_value(); + + switch (size) { + case AARCH64_INSN_SIZE_32: + case AARCH64_INSN_SIZE_64: + break; + default: + pr_err("%s: unimplemented size encoding %d\n", __func__, size); + return AARCH64_BREAK_FAULT; + } + + insn = aarch64_insn_encode_ldst_size(size, insn); + + insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT, insn, + result); + + insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RN, insn, + address); + + return aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RS, insn, + value); +} + +u32 aarch64_insn_gen_stadd(enum aarch64_insn_register address, + enum aarch64_insn_register value, + enum aarch64_insn_size_type size) +{ + /* + * STADD is simply encoded as an alias for LDADD with XZR as + * the destination register. + */ + return aarch64_insn_gen_ldadd(AARCH64_INSN_REG_ZR, address, + value, size); +} + static u32 aarch64_insn_encode_prfm_imm(enum aarch64_insn_prfm_type type, enum aarch64_insn_prfm_target target, enum aarch64_insn_prfm_policy policy, --- a/arch/arm64/net/bpf_jit.h +++ b/arch/arm64/net/bpf_jit.h @@ -100,6 +100,10 @@ #define A64_STXR(sf, Rt, Rn, Rs) \ A64_LSX(sf, Rt, Rn, Rs, STORE_EX)
+/* LSE atomics */ +#define A64_STADD(sf, Rn, Rs) \ + aarch64_insn_gen_stadd(Rn, Rs, A64_SIZE(sf)) + /* Add/subtract (immediate) */ #define A64_ADDSUB_IMM(sf, Rd, Rn, imm12, type) \ aarch64_insn_gen_add_sub_imm(Rd, Rn, imm12, \ --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -330,7 +330,7 @@ static int build_insn(const struct bpf_i const int i = insn - ctx->prog->insnsi; const bool is64 = BPF_CLASS(code) == BPF_ALU64; const bool isdw = BPF_SIZE(code) == BPF_DW; - u8 jmp_cond; + u8 jmp_cond, reg; s32 jmp_offset;
#define check_imm(bits, imm) do { \ @@ -706,18 +706,28 @@ emit_cond_jmp: break; } break; + /* STX XADD: lock *(u32 *)(dst + off) += src */ case BPF_STX | BPF_XADD | BPF_W: /* STX XADD: lock *(u64 *)(dst + off) += src */ case BPF_STX | BPF_XADD | BPF_DW: - emit_a64_mov_i(1, tmp, off, ctx); - emit(A64_ADD(1, tmp, tmp, dst), ctx); - emit(A64_LDXR(isdw, tmp2, tmp), ctx); - emit(A64_ADD(isdw, tmp2, tmp2, src), ctx); - emit(A64_STXR(isdw, tmp2, tmp, tmp3), ctx); - jmp_offset = -3; - check_imm19(jmp_offset); - emit(A64_CBNZ(0, tmp3, jmp_offset), ctx); + if (!off) { + reg = dst; + } else { + emit_a64_mov_i(1, tmp, off, ctx); + emit(A64_ADD(1, tmp, tmp, dst), ctx); + reg = tmp; + } + if (cpus_have_cap(ARM64_HAS_LSE_ATOMICS)) { + emit(A64_STADD(isdw, reg, src), ctx); + } else { + emit(A64_LDXR(isdw, tmp2, reg), ctx); + emit(A64_ADD(isdw, tmp2, tmp2, src), ctx); + emit(A64_STXR(isdw, tmp2, reg, tmp3), ctx); + jmp_offset = -3; + check_imm19(jmp_offset); + emit(A64_CBNZ(0, tmp3, jmp_offset), ctx); + } break;
/* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + imm)) */
From: Will Deacon will.deacon@arm.com
commit 427503519739e779c0db8afe876c1b33f3ac60ae upstream.
The architecture implementations of 'arch_futex_atomic_op_inuser()' and 'futex_atomic_cmpxchg_inatomic()' are permitted to return only -EFAULT, -EAGAIN or -ENOSYS in the case of failure.
Update the comments in the asm-generic/ implementation and also a stray reference in the robust futex documentation.
Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- Documentation/robust-futexes.txt | 3 +-- include/asm-generic/futex.h | 8 ++++++-- 2 files changed, 7 insertions(+), 4 deletions(-)
--- a/Documentation/robust-futexes.txt +++ b/Documentation/robust-futexes.txt @@ -218,5 +218,4 @@ All other architectures should build jus the new syscalls yet.
Architectures need to implement the new futex_atomic_cmpxchg_inatomic() -inline function before writing up the syscalls (that function returns --ENOSYS right now). +inline function before writing up the syscalls. --- a/include/asm-generic/futex.h +++ b/include/asm-generic/futex.h @@ -23,7 +23,9 @@ * * Return: * 0 - On success - * <0 - On error + * -EFAULT - User access resulted in a page fault + * -EAGAIN - Atomic operation was unable to complete due to contention + * -ENOSYS - Operation not supported */ static inline int arch_futex_atomic_op_inuser(int op, u32 oparg, int *oval, u32 __user *uaddr) @@ -85,7 +87,9 @@ out_pagefault_enable: * * Return: * 0 - On success - * <0 - On error + * -EFAULT - User access resulted in a page fault + * -EAGAIN - Atomic operation was unable to complete due to contention + * -ENOSYS - Function not implemented (only if !HAVE_FUTEX_CMPXCHG) */ static inline int futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
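To illustrate why a closed set of error codes matters to callers, here is a small userspace model of a caller that distinguishes the three documented failures. Everything in it is hypothetical (model_call_with_retry(), the mock operations, the retry limit); it is a sketch of the contract described above, not the futex core's actual retry logic.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for an arch futex op: contention, then a fault, then success. */
static int mock_calls;
static int mock_futex_op(void)
{
	switch (mock_calls++) {
	case 0:
		return -EAGAIN;
	case 1:
		return -EFAULT;
	default:
		return 0;
	}
}

/* Hypothetical op on an architecture without support. */
static int mock_unsupported_op(void)
{
	return -ENOSYS;
}

/* Hypothetical caller: retry on the two transient errors, stop on anything else. */
static int model_call_with_retry(int (*op)(void), int max_tries)
{
	int ret = -EAGAIN;

	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		ret = op();
		if (ret == -EAGAIN)
			continue;	/* contention: simply try again */
		if (ret == -EFAULT)
			continue;	/* a real caller would fault the page in first */
		break;			/* 0 on success, or -ENOSYS: not retryable */
	}
	return ret;
}
```

Because the set of return values is fixed, the caller needs no default branch for unknown errors: anything that is not -EAGAIN or -EFAULT is either success or a permanent -ENOSYS.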
From: Xin Long lucien.xin@gmail.com
commit c3bcde026684c62d7a2b6f626dc7cf763833875c upstream.
udp_tunnel(6)_xmit_skb() called by tipc_udp_xmit() expects a tunnel device to count packets on dev->tstats, a percpu variable. However, TIPC is using a UDP tunnel with no tunnel device, and passes the lower dev, such as a veth device that only initializes dev->lstats (a percpu variable) when creating it.
Later iptunnel_xmit_stats() called by ip(6)tunnel_xmit() treats the dev as a tunnel device, and uses dev->tstats instead of dev->lstats. Each tstats pointer points to a bigger struct than lstats does, so when tstats->tx_bytes is increased, members of other percpu variables could be overwritten.
syzbot has reported quite a few crashes due to fib_nh_common percpu member 'nhc_pcpu_rth_output' overwritten, call traces are like:
BUG: KASAN: slab-out-of-bounds in rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556 rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556 __mkroute_output net/ipv4/route.c:2332 [inline] ip_route_output_key_hash_rcu+0x819/0x2d50 net/ipv4/route.c:2564 ip_route_output_key_hash+0x1ef/0x360 net/ipv4/route.c:2393 __ip_route_output_key include/net/route.h:125 [inline] ip_route_output_flow+0x28/0xc0 net/ipv4/route.c:2651 ip_route_output_key include/net/route.h:135 [inline] ...
or:
kasan: GPF could be caused by NULL-ptr deref or user memory access RIP: 0010:dst_dev_put+0x24/0x290 net/core/dst.c:168 <IRQ> rt_fibinfo_free_cpus net/ipv4/fib_semantics.c:200 [inline] free_fib_info_rcu+0x2e1/0x490 net/ipv4/fib_semantics.c:217 __rcu_reclaim kernel/rcu/rcu.h:240 [inline] rcu_do_batch kernel/rcu/tree.c:2437 [inline] invoke_rcu_callbacks kernel/rcu/tree.c:2716 [inline] rcu_process_callbacks+0x100a/0x1ac0 kernel/rcu/tree.c:2697 ...
The issue has existed since the tunnel stats update was moved to iptunnel_xmit by commit 039f50629b7f ("ip_tunnel: Move stats update to iptunnel_xmit()"). Fix it by passing a NULL tunnel dev to udp_tunnel(6)_xmit_skb so that the packet counting won't happen on dev->tstats.
Reported-by: syzbot+9d4c12bfd45a58738d0a@syzkaller.appspotmail.com Reported-by: syzbot+a9e23ea2aa21044c2798@syzkaller.appspotmail.com Reported-by: syzbot+c4c4b2bb358bb936ad7e@syzkaller.appspotmail.com Reported-by: syzbot+0290d2290a607e035ba1@syzkaller.appspotmail.com Reported-by: syzbot+a43d8d4e7e8a7a9e149e@syzkaller.appspotmail.com Reported-by: syzbot+a47c5f4c6c00fc1ed16e@syzkaller.appspotmail.com Fixes: 039f50629b7f ("ip_tunnel: Move stats update to iptunnel_xmit()") Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/tipc/udp_media.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
--- a/net/tipc/udp_media.c +++ b/net/tipc/udp_media.c @@ -174,7 +174,6 @@ static int tipc_udp_xmit(struct net *net goto tx_error; }
- skb->dev = rt->dst.dev; ttl = ip4_dst_hoplimit(&rt->dst); udp_tunnel_xmit_skb(rt, ub->ubsock->sk, skb, src->ipv4.s_addr, dst->ipv4.s_addr, 0, ttl, 0, src->port, @@ -193,10 +192,9 @@ static int tipc_udp_xmit(struct net *net if (err) goto tx_error; ttl = ip6_dst_hoplimit(ndst); - err = udp_tunnel6_xmit_skb(ndst, ub->ubsock->sk, skb, - ndst->dev, &src->ipv6, - &dst->ipv6, 0, ttl, 0, src->port, - dst->port, false); + err = udp_tunnel6_xmit_skb(ndst, ub->ubsock->sk, skb, NULL, + &src->ipv6, &dst->ipv6, 0, ttl, 0, + src->port, dst->port, false); #endif } return err;
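The overwrite described in the commit message comes down to viewing a small percpu allocation through a larger struct type. The sketch below models that with two simplified, hypothetical layouts (model_lstats and model_tstats are illustrative names; the real kernel structs have different members and sizes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified counters of a lower device such as veth. */
struct model_lstats {
	uint64_t packets;
	uint64_t bytes;
};

/* Hypothetical, simplified counters of a tunnel device. */
struct model_tstats {
	uint64_t rx_packets;
	uint64_t rx_bytes;
	uint64_t tx_packets;
	uint64_t tx_bytes;
};

/*
 * Returns 1 if a write to tx_bytes, made through a tstats view of an
 * lstats-sized allocation, would land beyond the end of that allocation.
 */
static int model_tx_bytes_overflows_lstats(void)
{
	size_t end_of_write = offsetof(struct model_tstats, tx_bytes) +
			      sizeof(uint64_t);

	assert(sizeof(struct model_tstats) > sizeof(struct model_lstats));
	return end_of_write > sizeof(struct model_lstats);
}
```

In this model the tx_bytes field starts past the end of the smaller struct, so incrementing it touches whatever happens to be allocated next, which matches the kind of neighbouring-variable corruption the syzbot traces point at.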
On Tue, 2019-07-02 at 10:02 +0200, Greg Kroah-Hartman wrote:
From: Xin Long lucien.xin@gmail.com
commit c3bcde026684c62d7a2b6f626dc7cf763833875c upstream.
udp_tunnel(6)_xmit_skb() called by tipc_udp_xmit() expects a tunnel device to count packets on dev->tstats, a percpu variable. However, TIPC is using a UDP tunnel with no tunnel device, and passes the lower dev, such as a veth device that only initializes dev->lstats (a percpu variable) when creating it.
Hi,
This tipc patch, added in 4.14.132, is triggering a crash for me; reverting it fixes the crash.
Does anyone know whether some other commits are missing in 4.14.x that are needed to make this work?
# modprobe tipc
# tipc node set addr 1.1.2
# tipc bearer enable media udp name UDP1 localip 192.168.1.15
[ 143.105529] Own node address <1.1.2>, network identity 4711
[ 172.087098] BUG: unable to handle kernel NULL pointer dereference at 00000000000004f0
[ 172.088375] IP: iptunnel_xmit+0x15e/0x1e0
[ 172.089072] PGD 8000000231306067 P4D 8000000231306067 PUD 2356e1067 PMD 0
[ 172.090094] Oops: 0000 [#1] SMP PTI
[ 172.090610] Modules linked in: tipc ip6_udp_tunnel udp_tunnel isofs kvm_intel kvm irqbypass sch_fq_codel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ata_piix dm_mirror dm_region_hash dm_log dm_mod dax autofs4
[ 172.093293] CPU: 1 PID: 747 Comm: tipc Not tainted 4.14.134-1.x86_64 #1
[ 172.094448] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[ 172.095703] task: ffff8b99f12c0000 task.stack: ffff9ab481198000
[ 172.096731] RIP: 0010:iptunnel_xmit+0x15e/0x1e0
[ 172.097460] RSP: 0018:ffff9ab48119ba00 EFLAGS: 00010202
[ 172.098214] RAX: 0000000000000000 RBX: ffffffffbf4d8140 RCX: 000000000000008c
[ 172.099320] RDX: 0000000000000001 RSI: 00000000fffffe01 RDI: ffffffffbe944d62
[ 172.100392] RBP: ffff8b99f1e7ed00 R08: ffff8b99ffc64520 R09: 0000000000000000
[ 172.101451] R10: 000000023426d000 R11: 0000000000000002 R12: 0000000000000000
[ 172.102607] R13: 0000000000000040 R14: 0000000000000000 R15: ffff8b99f426e0e8
[ 172.103728] FS: 00007efc82b96800(0000) GS:ffff8b99ffc40000(0000) knlGS:0000000000000000
[ 172.104976] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 172.105821] CR2: 00000000000004f0 CR3: 0000000234250001 CR4: 00000000003606e0
[ 172.106981] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 172.108120] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 172.109386] Call Trace:
[ 172.109808] tipc_udp_xmit.isra.18+0x1a7/0x1c0 [tipc]
[ 172.110687] ? __internal_add_timer+0x1a/0x50
[ 172.111369] ? __skb_clone+0x29/0x130
[ 172.111999] tipc_bearer_xmit_skb+0x4d/0x80 [tipc]
[ 172.112845] tipc_enable_bearer+0x2b9/0x3c0 [tipc]
[ 172.113637] ? __nla_put+0xc/0x20
[ 172.114213] tipc_nl_bearer_enable+0xca/0x100 [tipc]
[ 172.114952] genl_family_rcv_msg+0x190/0x390
[ 172.115748] genl_rcv_msg+0x47/0x90
[ 172.116287] ? __alloc_skb+0x72/0x1b0
[ 172.116898] ? genl_family_rcv_msg+0x390/0x390
[ 172.117669] netlink_rcv_skb+0x3d/0x100
[ 172.118361] genl_rcv+0x24/0x40
[ 172.119005] netlink_unicast+0x16d/0x230
[ 172.119777] netlink_sendmsg+0x1ae/0x3c0
[ 172.120525] SYSC_sendto+0xe6/0x140
[ 172.121248] ? SYSC_getsockname+0x81/0xa0
[ 172.121989] ? sock_alloc_file+0x97/0x120
[ 172.122645] ? sock_map_fd+0x3d/0x60
[ 172.123278] do_syscall_64+0x74/0x190
[ 172.123911] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 172.124716] RIP: 0033:0x7efc82d6ac6b
[ 172.125368] RSP: 002b:00007fff40411ae8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 172.126486] RAX: ffffffffffffffda RBX: 0000000001dfca20 RCX: 00007efc82d6ac6b
[ 172.127632] RDX: 0000000000000054 RSI: 00007fff40411b60 RDI: 0000000000000003
[ 172.128765] RBP: 00007fff40411b50 R08: 00007efc82e36000 R09: 000000000000000c
[ 172.129793] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff40411b60
[ 172.130799] R13: 00007fff40412d10 R14: 000000000040bb44 R15: 0000000000000000
[ 172.131868] Code: 01 00 00 00 85 d2 0f 44 d0 e8 1f f3 fa ff 48 8b 74 24 08 4c 89 fa 48 89 df e8 9f 94 fb ff 83 e0 fd 75 35 8b 4c 24 1c 85 c9 7e 2b <49> 8b 84 24 f0 04 00 00 65 48 03 05 aa 29 68 41 48 83 40 10 01
[ 172.134773] RIP: iptunnel_xmit+0x15e/0x1e0 RSP: ffff9ab48119ba00
[ 172.135697] CR2: 00000000000004f0
[ 172.136305] ---[ end trace 27f7522ade26797f ]---
Later iptunnel_xmit_stats() called by ip(6)tunnel_xmit() treats the dev as a tunnel device, and uses dev->tstats instead of dev->lstats. Each tstats pointer points to a bigger struct than lstats does, so when tstats->tx_bytes is increased, members of other percpu variables could be overwritten.
syzbot has reported quite a few crashes due to fib_nh_common percpu member 'nhc_pcpu_rth_output' overwritten, call traces are like:
BUG: KASAN: slab-out-of-bounds in rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556 rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556 __mkroute_output net/ipv4/route.c:2332 [inline] ip_route_output_key_hash_rcu+0x819/0x2d50 net/ipv4/route.c:2564 ip_route_output_key_hash+0x1ef/0x360 net/ipv4/route.c:2393 __ip_route_output_key include/net/route.h:125 [inline] ip_route_output_flow+0x28/0xc0 net/ipv4/route.c:2651 ip_route_output_key include/net/route.h:135 [inline] ...
or:
kasan: GPF could be caused by NULL-ptr deref or user memory access RIP: 0010:dst_dev_put+0x24/0x290 net/core/dst.c:168 <IRQ> rt_fibinfo_free_cpus net/ipv4/fib_semantics.c:200 [inline] free_fib_info_rcu+0x2e1/0x490 net/ipv4/fib_semantics.c:217 __rcu_reclaim kernel/rcu/rcu.h:240 [inline] rcu_do_batch kernel/rcu/tree.c:2437 [inline] invoke_rcu_callbacks kernel/rcu/tree.c:2716 [inline] rcu_process_callbacks+0x100a/0x1ac0 kernel/rcu/tree.c:2697 ...
The issue has existed since the tunnel stats update was moved to iptunnel_xmit by commit 039f50629b7f ("ip_tunnel: Move stats update to iptunnel_xmit()"). Fix it by passing a NULL tunnel dev to udp_tunnel(6)_xmit_skb so that the packet counting won't happen on dev->tstats.
Reported-by: syzbot+9d4c12bfd45a58738d0a@syzkaller.appspotmail.com Reported-by: syzbot+a9e23ea2aa21044c2798@syzkaller.appspotmail.com Reported-by: syzbot+c4c4b2bb358bb936ad7e@syzkaller.appspotmail.com Reported-by: syzbot+0290d2290a607e035ba1@syzkaller.appspotmail.com Reported-by: syzbot+a43d8d4e7e8a7a9e149e@syzkaller.appspotmail.com Reported-by: syzbot+a47c5f4c6c00fc1ed16e@syzkaller.appspotmail.com Fixes: 039f50629b7f ("ip_tunnel: Move stats update to iptunnel_xmit()") Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
net/tipc/udp_media.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -174,7 +174,6 @@ static int tipc_udp_xmit(struct net *net
 			goto tx_error;
 		}

-		skb->dev = rt->dst.dev;
 		ttl = ip4_dst_hoplimit(&rt->dst);
 		udp_tunnel_xmit_skb(rt, ub->ubsock->sk, skb, src->ipv4.s_addr,
 				    dst->ipv4.s_addr, 0, ttl, 0, src->port,
@@ -193,10 +192,9 @@ static int tipc_udp_xmit(struct net *net
 		if (err)
 			goto tx_error;
 		ttl = ip6_dst_hoplimit(ndst);
-		err = udp_tunnel6_xmit_skb(ndst, ub->ubsock->sk, skb,
-					   ndst->dev, &src->ipv6,
-					   &dst->ipv6, 0, ttl, 0, src->port,
-					   dst->port, false);
+		err = udp_tunnel6_xmit_skb(ndst, ub->ubsock->sk, skb, NULL,
+					   &src->ipv6, &dst->ipv6, 0, ttl, 0,
+					   src->port, dst->port, false);
 #endif
 	}
 	return err;
On Thu, Aug 01, 2019 at 10:17:30AM +0000, Rantala, Tommi T. (Nokia - FI/Espoo) wrote:
On Tue, 2019-07-02 at 10:02 +0200, Greg Kroah-Hartman wrote:
From: Xin Long lucien.xin@gmail.com
commit c3bcde026684c62d7a2b6f626dc7cf763833875c upstream.
udp_tunnel(6)_xmit_skb() called by tipc_udp_xmit() expects a tunnel device to count packets on dev->tstats, a percpu variable. However, TIPC is using a UDP tunnel with no tunnel device, and passes the lower dev, such as a veth device that only initializes dev->lstats (a percpu variable) when creating it.
Hi,
This tipc patch, added in 4.14.132, is triggering a crash for me; reverting it fixes the crash.
Does anyone know whether some other commits are missing in 4.14.x that are needed to make this work?
Do you also have a problem with 4.19.y? How about 5.2.y? If not, can you do 'git bisect' to find the patch that fixes the issue?
thanks,
greg k-h
On Fri, 2019-08-02 at 09:28 +0200, gregkh@linuxfoundation.org wrote:
On Thu, Aug 01, 2019 at 10:17:30AM +0000, Rantala, Tommi T. (Nokia - FI/Espoo) wrote:
Hi,
This tipc patch, added in 4.14.132, is triggering a crash for me; reverting it fixes the crash.
Does anyone know whether some other commits are missing in 4.14.x that are needed to make this work?
Do you also have a problem with 4.19.y? How about 5.2.y? If not, can you do 'git bisect' to find the patch that fixes the issue?
thanks,
greg k-h
Hi, please pick this to 4.14.y and 4.19.y, tested that it fixes the crash in both:
commit 5684abf7020dfc5f0b6ba1d68eda3663871fce52 Author: Xin Long lucien.xin@gmail.com Date: Mon Jun 17 21:34:13 2019 +0800
ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL
For 5.2.y nothing is needed, these commits were in v5.2-rc6 already.
-Tommi
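The subject of the requested commit suggests why the crash appears without it: once TIPC passes a NULL device, the stats update has to tolerate NULL. The following userspace model shows that guard in isolation. The names model_dev, model_stats and model_xmit_stats() are hypothetical and the logic is a simplified assumption, not the code of the commit itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified transmit counters. */
struct model_stats {
	uint64_t tx_packets;
	uint64_t tx_bytes;
};

/* Hypothetical device carrying its own counters. */
struct model_dev {
	struct model_stats tstats;
};

/* Hypothetical stand-in for the stats update at the end of a tunnel transmit. */
static void model_xmit_stats(struct model_dev *dev, int pkt_len)
{
	if (!dev)		/* caller passed NULL: skip accounting entirely */
		return;

	if (pkt_len > 0) {	/* count only successfully sent packets */
		dev->tstats.tx_packets++;
		dev->tstats.tx_bytes += (uint64_t)pkt_len;
	}
}
```

Without the early return, the NULL case would dereference a small offset from address zero, which is consistent with the faulting address 00000000000004f0 in the oops above.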
On Fri, Aug 02, 2019 at 11:03:15AM +0000, Rantala, Tommi T. (Nokia - FI/Espoo) wrote:
On Fri, 2019-08-02 at 09:28 +0200, gregkh@linuxfoundation.org wrote:
On Thu, Aug 01, 2019 at 10:17:30AM +0000, Rantala, Tommi T. (Nokia - FI/Espoo) wrote:
Hi,
This tipc patch, added in 4.14.132, is triggering a crash for me; reverting it fixes the crash.
Does anyone know whether some other commits are missing in 4.14.x that are needed to make this work?
Do you also have a problem with 4.19.y? How about 5.2.y? If not, can you do 'git bisect' to find the patch that fixes the issue?
thanks,
greg k-h
Hi, please pick this to 4.14.y and 4.19.y, tested that it fixes the crash in both:
commit 5684abf7020dfc5f0b6ba1d68eda3663871fce52 Author: Xin Long lucien.xin@gmail.com Date: Mon Jun 17 21:34:13 2019 +0800
ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL
For 5.2.y nothing is needed, these commits were in v5.2-rc6 already.
Thanks for the info, now snuck into the latest round of -rc releases.
greg k-h
On Fri, Aug 2, 2019 at 7:03 PM Rantala, Tommi T. (Nokia - FI/Espoo) tommi.t.rantala@nokia.com wrote:
On Fri, 2019-08-02 at 09:28 +0200, gregkh@linuxfoundation.org wrote:
On Thu, Aug 01, 2019 at 10:17:30AM +0000, Rantala, Tommi T. (Nokia - FI/Espoo) wrote:
Hi,
This tipc patch, added in 4.14.132, is triggering a crash for me; reverting it fixes the crash.
Does anyone know whether some other commits are missing in 4.14.x that are needed to make this work?
Do you also have a problem with 4.19.y? How about 5.2.y? If not, can you do 'git bisect' to find the patch that fixes the issue?
thanks,
greg k-h
Hi, please pick this to 4.14.y and 4.19.y, tested that it fixes the crash in both:
commit 5684abf7020dfc5f0b6ba1d68eda3663871fce52 Author: Xin Long lucien.xin@gmail.com Date: Mon Jun 17 21:34:13 2019 +0800
ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL
Thanks Rantala,
sorry for the late reply, I was on a trip.
The patch belongs to a patchset:
https://www.spinics.net/lists/netdev/msg578674.html
So this commit should also be included:
commit 6f6a8622057c92408930c31698394fae1557b188 Author: Xin Long lucien.xin@gmail.com Date: Mon Jun 17 21:34:14 2019 +0800
ip6_tunnel: allow not to count pkts on tstats by passing dev as NULL
Next time I think I should put a "Fixes:" tag in each patch.
For 5.2.y nothing is needed, these commits were in v5.2-rc6 already.
-Tommi
On Sat, Aug 03, 2019 at 08:45:03AM +0800, Xin Long wrote:
On Fri, Aug 2, 2019 at 7:03 PM Rantala, Tommi T. (Nokia - FI/Espoo) tommi.t.rantala@nokia.com wrote:
On Fri, 2019-08-02 at 09:28 +0200, gregkh@linuxfoundation.org wrote:
On Thu, Aug 01, 2019 at 10:17:30AM +0000, Rantala, Tommi T. (Nokia - FI/Espoo) wrote:
Hi,
This tipc patch, added in 4.14.132, is triggering a crash for me; reverting it fixes the crash.
Does anyone know whether some other commits are missing in 4.14.x that are needed to make this work?
Do you also have a problem with 4.19.y? How about 5.2.y? If not, can you do 'git bisect' to find the patch that fixes the issue?
thanks,
greg k-h
Hi, please pick this to 4.14.y and 4.19.y, tested that it fixes the crash in both:
commit 5684abf7020dfc5f0b6ba1d68eda3663871fce52 Author: Xin Long lucien.xin@gmail.com Date: Mon Jun 17 21:34:13 2019 +0800
ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL
Thanks Rantala,
sorry for the late reply, I was on a trip.
The patch belongs to a patchset:
https://www.spinics.net/lists/netdev/msg578674.html
So this commit should also be included:
commit 6f6a8622057c92408930c31698394fae1557b188 Author: Xin Long lucien.xin@gmail.com Date: Mon Jun 17 21:34:14 2019 +0800
ip6_tunnel: allow not to count pkts on tstats by passing dev as NULL
This commit is also included in the following kernel releases: 4.9.186 4.14.134 4.19.59 5.1.18 5.2
so this should all be taken care of, right?
If not, please let me know.
thanks,
greg k-h
stable-rc/linux-4.14.y boot: 128 boots: 3 failed, 124 passed with 1 offline (v4.14.131-43-g6fa18665b865)
Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.14.y/kernel/v4.14... Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.14.y/kernel/v4.14.131-43...
Tree: stable-rc
Branch: linux-4.14.y
Git Describe: v4.14.131-43-g6fa18665b865
Git Commit: 6fa18665b865d4e0d0bbf1a0269e79a5f0bdc2c2
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Tested: 68 unique boards, 25 SoC families, 15 builds out of 201
Boot Failures Detected:

arm:
    sunxi_defconfig:
        gcc-8:
            sun7i-a20-bananapi: 1 failed lab

    multi_v7_defconfig:
        gcc-8:
            sun7i-a20-bananapi: 1 failed lab

arm64:
    defconfig:
        gcc-8:
            rk3399-firefly: 1 failed lab

Offline Platforms:

arm:
    multi_v7_defconfig:
        gcc-8
            stih410-b2120: 1 offline lab
--- For more info write to info@kernelci.org
On Tue, 2 Jul 2019 at 13:39, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.14.132 release. There are 43 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu 04 Jul 2019 07:59:45 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.132-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary
------------------------------------------------------------------------

kernel: 4.14.132-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.14.y
git commit: 3734933c2330c5fe94ed2724033965b2eb545028
git describe: v4.14.131-44-g3734933c2330
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.131-4...
No regressions (compared to build v4.14.131)
No fixes (compared to build v4.14.131)
Ran 22541 total tests in the following environments and test suites.
Environments
--------------
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64
Test Suites
-----------
* build
* install-android-platform-tools-r2600
* kselftest
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-securebits-tests
* ltp-timers-tests
* perf
* v4l2-compliance
* libhugetlbfs
* ltp-commands-tests
* ltp-cve-tests
* ltp-math-tests
* ltp-sched-tests
* ltp-syscalls-tests
* network-basic-tests
* spectre-meltdown-checker-test
* ltp-open-posix-tests
* kvm-unit-tests
* kselftest-vsyscall-mode-none
On Tue, Jul 02, 2019 at 10:01:40AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.132 release. There are 43 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu 04 Jul 2019 07:59:45 AM UTC. Anything received after that time might be too late.
Build results:
	total: 172 pass: 172 fail: 0
Qemu test results:
	total: 346 pass: 346 fail: 0
Guenter
On Tue, Jul 02, 2019 at 10:01:40AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.132 release. There are 43 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu 04 Jul 2019 07:59:45 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.132-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Compiled, booted, and no regressions on my system.
Cheers, Kelsey
On 7/2/19 2:01 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.132 release. There are 43 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu 04 Jul 2019 07:59:45 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.132-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
On 02/07/2019 09:01, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.132 release. There are 43 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu 04 Jul 2019 07:59:45 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.132-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
All tests are passing for Tegra ...
Test results for stable-v4.14:
    8 builds:	8 pass, 0 fail
    16 boots:	16 pass, 0 fail
    24 tests:	24 pass, 0 fail

Linux version:	4.14.132-rc1-g3734933c2330
Boards tested:	tegra124-jetson-tk1, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Cheers Jon