August 2020 - Linux-stable-mirror

Fwd: [PATCH] fs/io_uring.c: Fix uninitialized variable is referenced in io_submit_sqe

by Jens Axboe

Hi, Can you queue this up for 5.4 as well? Thanks! -------- Forwarded Message -------- Subject: [PATCH] fs/io_uring.c: Fix uninitialized variable is referenced in io_submit_sqe Date: Wed, 12 Aug 2020 23:56:44 -0700 From: Liu Yong <pkfxxxing(a)gmail.com> To: Jens Axboe <axboe(a)kernel.dk> CC: linux-block(a)vger.kernel.org, linux-kernel(a)vger.kernel.org, linux-fsdevel(a)vger.kernel.org the commit <a4d61e66ee4a> ("<io_uring: prevent re-read of sqe->opcode>") caused another vulnerability. After io_get_req(), the sqe_submit struct in req is not initialized, but the following code defaults that req->submit.opcode is available. Signed-off-by: Liu Yong <pkfxxxing(a)gmail.com> --- fs/io_uring.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/io_uring.c b/fs/io_uring.c index be3d595a607f..c1aaee061dae 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2559,6 +2559,7 @@ static void io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s, goto err; } + memcpy(&req->submit, s, sizeof(*s)); ret = io_req_set_file(ctx, s, state, req); if (unlikely(ret)) { err_req: -- 2.17.1

4 years, 11 months

2
1
0 0

[PATCH stable-4.19] cgroup: add missing skcd->no_refcnt check in cgroup_sk_clone()

by Yang Yingliang

Add skcd->no_refcnt check which is missed when backporting ad0f75e5f57c ("cgroup: fix cgroup_sk_alloc() for sk_clone_lock()"). This patch is needed in stable-4.9, stable-4.14 and stable-4.19. Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com> --- kernel/cgroup/cgroup.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 6ae98c714edd..2a879d34bbe5 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -5957,6 +5957,8 @@ void cgroup_sk_clone(struct sock_cgroup_data *skcd) { /* Socket clone path */ if (skcd->val) { + if (skcd->no_refcnt) + return; /* * We might be cloning a socket which is left in an empty * cgroup and the cgroup might have already been rmdir'd. -- 2.25.1

4 years, 11 months

2
1
0 0

[PATCH stable (4.4 + 4.9)] gpio: fix oops resulting from calling of_get_named_gpio(NULL, ...)

by Uwe Kleine-König

This happens for the spi-imx driver when running a dt-enabled kernel on a non-dt machine on Linux 4.0. Among the still supported stable versions only 4.4 and 4.9 are affected. (However the spi-imx driver doesn't call of_get_named_gpio() since v4.8-rc1 (commit b36581df7e78 ("spi: imx: Using existing properties for chipselects")) any more, but the problem might still affect other users of of_get_named_gpio().) In 4.14-rc1 this problem is gone with commit 7eb6ce2f2723 ("gpio: Convert to using %pOF instead of full_name"). This commit however doesn't seem sensible to backport as it depends on ce4fecf1fe15 ("vsprintf: Add %p extension "%pOF" for device tree") which doesn't trivially apply to v4.4. [ 1.649453] Unable to handle kernel NULL pointer dereference at virtual address 0000000c [ 1.659270] pgd = c0004000 [ 1.662036] [0000000c] *pgd=00000000 [ 1.665919] Internal error: Oops - BUG: 5 [#1] PREEMPT ARM [ 1.671438] Modules linked in: [ 1.674552] CPU: 0 PID: 1 Comm: swapper Not tainted 4.0.0 #1 [ 1.680235] Hardware name: Eckelmann ECU01 [ 1.684361] task: c7840000 ti: c7842000 task.ti: c7842000 [ 1.689821] PC is at of_get_named_gpiod_flags+0xac/0xe0 [ 1.695104] LR is at of_find_property+0x38/0x7c [ 1.699674] pc : [<c025db2c>] lr : [<c03c5f54>] psr: a0000013 [ 1.699674] sp : c7843cc8 ip : c7843c38 fp : c7843d3c [ 1.711183] r10: c7884dc0 r9 : c7a8de10 r8 : 00000000 [ 1.716434] r7 : 00000000 r6 : 00000000 r5 : c065ef50 r4 : fffffffe [ 1.722986] r3 : 00000000 r2 : 00000000 r1 : c065ef50 r0 : fffffffe [ 1.729541] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel [ 1.736879] Control: 0005317f Table: 80004000 DAC: 00000017 [ 1.742652] Process swapper (pid: 1, stack limit = 0xc7842190) [ 1.748510] Stack: (0xc7843cc8 to 0xc7844000) [ 1.752906] 3cc0: c7843cd4 c003ccec 00000000 00000000 00000000 00000000 [ 1.761125] 3ce0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1.769345] 3d00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 fffffdfb [ 1.777566] 3d20: 00000000 c78b4e10 c7a8dc00 000001ff c7843d4c c7843d40 c025db70 c025da90 [ 1.785788] 3d40: c7843dcc c7843d50 c02f8938 c025db70 c7843d74 c7843d60 c79bc3c0 c79bc320 [ 1.794007] 3d60: c78bb140 c065476c c7a8de10 00000000 c78b4e10 c78b4e00 00000004 00000001 [ 1.802227] 3d80: c06d25d4 00000000 c7843dbc c7843d98 c0115a68 c0112538 00000001 c78b4e10 [ 1.810448] 3da0: c78b4e18 ffffffed c78b4e10 fffffdfb c070bc80 00000000 c06d25d4 00000000 [ 1.818669] 3dc0: c7843dec c7843dd0 c02a0670 c02f8828 c78b4e10 c073fcb0 00000000 c070bc80 [ 1.826890] 3de0: c7843e14 c7843df0 c029f064 c02a0630 00000000 c78b4e10 c070bc80 c78b4e44 [ 1.835110] 3e00: 00000000 c06c8cac c7843e34 c7843e18 c029f204 c029ef70 c029f170 00000000 [ 1.843332] 3e20: c070bc80 c029f170 c7843e5c c7843e38 c029d6f4 c029f180 c785c1cc c7873c30 [ 1.851553] 3e40: c0235728 c070bc80 c7ab9720 c0701e20 c7843e6c c7843e60 c029eb74 c029d6a4 [ 1.859774] 3e60: c7843e94 c7843e70 c029e7f4 c029eb64 c065f390 c7843e80 c070bc80 c06f0718 [ 1.867998] 3e80: c7ab8d60 c06b1528 c7843eac c7843e98 c029f810 c029e728 c06f0718 c06f0718 [ 1.876220] 3ea0: c7843ebc c7843eb0 c02a04dc c029f7ac c7843ecc c7843ec0 c06c8cc4 c02a049c [ 1.884443] 3ec0: c7843f4c c7843ed0 c00089dc c06c8cbc c0109ec0 c0109d18 c780ac00 00000001 [ 1.892665] 3ee0: c7843f00 c7843ef0 c06b1544 c0238a24 c7ffca48 c054c854 c7843f4c c7843f08 [ 1.900886] 3f00: c002e7f4 c06b1538 c003d0e0 00000006 00000006 c06af1a4 00000000 c066ccb4 [ 1.909107] 3f20: c7843f4c c06ea994 00000006 c071ff20 c06b1528 c06d25e0 c06d25d4 0000008f [ 1.917327] 3f40: c7843f94 c7843f50 c06b1e6c c0008964 00000006 00000006 c06b1528 dfe48a08 [ 1.925547] 3f60: 33f73660 3fd760c5 0b5d4bfd 00000000 c0527ef0 00000000 00000000 00000000 [ 1.933768] 3f80: 00000000 00000000 c7843fac c7843f98 c0527f00 c06b1d00 c7842000 00000000 [ 1.941988] 3fa0: 00000000 c7843fb0 c0009798 c0527f00 00000000 00000000 00000000 00000000 [ 1.950206] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1.958424] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 b3cf731f fe6afeef [ 1.966617] Backtrace: [ 1.969150] [<c025da80>] (of_get_named_gpiod_flags) from [<c025db70>] (of_get_named_gpio_flags+0x10/0x24) [ 1.978744] r7:000001ff r6:c7a8dc00 r5:c78b4e10 r4:00000000 [ 1.984548] [<c025db60>] (of_get_named_gpio_flags) from [<c02f8938>] (spi_imx_probe+0x120/0x67c) [ 1.993390] [<c02f8818>] (spi_imx_probe) from [<c02a0670>] (platform_drv_probe+0x50/0xac) [ 2.001589] r10:00000000 r9:c06d25d4 r8:00000000 r7:c070bc80 r6:fffffdfb r5:c78b4e10 [ 2.009549] r4:ffffffed [ 2.012144] [<c02a0620>] (platform_drv_probe) from [<c029f064>] (driver_probe_device+0x104/0x210) [ 2.021040] r7:c070bc80 r6:00000000 r5:c073fcb0 r4:c78b4e10 [ 2.026822] [<c029ef60>] (driver_probe_device) from [<c029f204>] (__driver_attach+0x94/0x98) [ 2.035282] r8:c06c8cac r7:00000000 r6:c78b4e44 r5:c070bc80 r4:c78b4e10 r3:00000000 [ 2.043191] [<c029f170>] (__driver_attach) from [<c029d6f4>] (bus_for_each_dev+0x60/0x90) [ 2.051394] r6:c029f170 r5:c070bc80 r4:00000000 r3:c029f170 [ 2.057185] [<c029d694>] (bus_for_each_dev) from [<c029eb74>] (driver_attach+0x20/0x28) [ 2.065212] r6:c0701e20 r5:c7ab9720 r4:c070bc80 [ 2.069931] [<c029eb54>] (driver_attach) from [<c029e7f4>] (bus_add_driver+0xdc/0x1dc) [ 2.077894] [<c029e718>] (bus_add_driver) from [<c029f810>] (driver_register+0x74/0xec) [ 2.085919] r7:c06b1528 r6:c7ab8d60 r5:c06f0718 r4:c070bc80 [ 2.091705] [<c029f79c>] (driver_register) from [<c02a04dc>] (__platform_driver_register+0x50/0x64) [ 2.100774] r5:c06f0718 r4:c06f0718 [ 2.104437] [<c02a048c>] (__platform_driver_register) from [<c06c8cc4>] (spi_imx_driver_init+0x18/0x20) [ 2.113884] [<c06c8cac>] (spi_imx_driver_init) from [<c00089dc>] (do_one_initcall+0x88/0x1b0) [ 2.122459] [<c0008954>] (do_one_initcall) from [<c06b1e6c>] (kernel_init_freeable+0x17c/0x248) [ 2.131182] r10:0000008f r9:c06d25d4 r8:c06d25e0 r7:c06b1528 r6:c071ff20 r5:00000006 [ 2.139141] r4:c06ea994 [ 2.141751] [<c06b1cf0>] (kernel_init_freeable) from [<c0527f00>] (kernel_init+0x10/0xec) [ 2.149955] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0527ef0 [ 2.157909] r4:00000000 [ 2.160508] [<c0527ef0>] (kernel_init) from [<c0009798>] (ret_from_fork+0x14/0x3c) [ 2.168099] r4:00000000 r3:c7842000 [ 2.171755] Code: eb0b2dc2 e51b0020 e24bd01c e89da8f0 (e597300c) Cc: stable(a)vger.kernel.org # v4.4.x, v4.9.x Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de> --- drivers/gpio/gpiolib-of.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpio/gpiolib-of.c b/drivers/gpio/gpiolib-of.c index 193f15d50bba..369d3ed1df6a 100644 --- a/drivers/gpio/gpiolib-of.c +++ b/drivers/gpio/gpiolib-of.c @@ -79,7 +79,7 @@ struct gpio_desc *of_get_named_gpiod_flags(struct device_node *np, &gpiospec); if (ret) { pr_debug("%s: can't parse '%s' property of node '%s[%d]'\n", - __func__, propname, np->full_name, index); + __func__, propname, np ? np->full_name : NULL, index); return ERR_PTR(ret); } -- 2.28.0

4 years, 11 months

2
1
0 0

[PATCH][for v4.4 only] udp: drop corrupt packets earlier to avoid data corruption

by Dexuan Cui

The v4.4 stable kernel lacks this bugfix: commit 327868212381 ("make skb_copy_datagram_msg() et.al. preserve ->msg_iter on error"). As a result, the v4.4 kernel can deliver corrupt data to the application when a corrupt UDP packet is closely followed by a valid UDP packet: the same invocation of the recvmsg() syscall can deliver the corrupt packet's UDP payload to the application with the UDP payload length and the "from IP/Port" of the valid packet. Details: For a UDP packet longer than 76 bytes (see the v5.8-rc6 kernel's include/linux/skbuff.h:3951), Linux delays the UDP checksum verification until the application invokes the syscall recvmsg(). In the recvmsg() syscall handler, while Linux is copying the UDP payload to the application's memory, it calculates the UDP checksum. If the calculated checksum doesn't match the received checksum, Linux drops the corrupt UDP packet, and then starts to process the next packet (if any), and if the next packet is valid (i.e. the checksum is correct), Linux will copy the valid UDP packet's payload to the application's receiver buffer. The bug is: before Linux starts to copy the valid UDP packet, the data structure used to track how many more bytes should be copied to the application memory is not reset to what it was when the application just entered the kernel by the syscall! Consequently, only a small portion or none of the valid packet's payload is copied to the application's receive buffer, and later when the application exits from the kernel, actually most of the application's receive buffer contains the payload of the corrupt packet while recvmsg() returns the length of the UDP payload of the valid packet. For the mainline kernel, the bug was fixed in commit 327868212381, but unluckily the bugfix is only backported to v4.9+. It turns out backporting 327868212381 to v4.4 means that some supporting patches must be backported first, so the overall changes seem too big, so the alternative is performs the csum validation earlier and drops the corrupt packets earlier. Signed-off-by: Eric Dumazet <edumazet(a)google.com> Signed-off-by: Dexuan Cui <decui(a)microsoft.com> --- net/ipv4/udp.c | 3 +-- net/ipv6/udp.c | 6 ++---- 2 files changed, 3 insertions(+), 6 deletions(-) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index bb30699..49ab587 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1589,8 +1589,7 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) } } - if (rcu_access_pointer(sk->sk_filter) && - udp_lib_checksum_complete(skb)) + if (udp_lib_checksum_complete(skb)) goto csum_error; if (sk_rcvqueues_full(sk, sk->sk_rcvbuf)) { diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 73f1112..2d6703d 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -686,10 +686,8 @@ int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) } } - if (rcu_access_pointer(sk->sk_filter)) { - if (udp_lib_checksum_complete(skb)) - goto csum_error; - } + if (udp_lib_checksum_complete(skb)) + goto csum_error; if (sk_rcvqueues_full(sk, sk->sk_rcvbuf)) { UDP6_INC_STATS_BH(sock_net(sk), -- 1.8.3.1

4 years, 11 months

3
4
0 0

[PATCH 4/6] can: j1939: socket: j1939_sk_bind(): make sure ml_priv is allocated

by Marc Kleine-Budde

From: Oleksij Rempel <o.rempel(a)pengutronix.de> This patch adds check to ensure that the struct net_device::ml_priv is allocated, as it is used later by the j1939 stack. The allocation is done by all mainline CAN network drivers, but when using bond or team devices this is not the case. Bail out if no ml_priv is allocated. Reported-by: syzbot+f03d384f3455d28833eb(a)syzkaller.appspotmail.com Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Cc: linux-stable <stable(a)vger.kernel.org> # >= v5.4 Signed-off-by: Oleksij Rempel <o.rempel(a)pengutronix.de> Link: https://lore.kernel.org/r/20200807105200.26441-4-o.rempel@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de> --- net/can/j1939/socket.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c index ad973370de12..b93876c57fc4 100644 --- a/net/can/j1939/socket.c +++ b/net/can/j1939/socket.c @@ -467,6 +467,14 @@ static int j1939_sk_bind(struct socket *sock, struct sockaddr *uaddr, int len) goto out_release_sock; } + if (!ndev->ml_priv) { + netdev_warn_once(ndev, + "No CAN mid layer private allocated, please fix your driver and use alloc_candev()!\n"); + dev_put(ndev); + ret = -ENODEV; + goto out_release_sock; + } + priv = j1939_netdev_start(ndev); dev_put(ndev); if (IS_ERR(priv)) { -- 2.28.0

4 years, 11 months

1
0
0 0

[PATCH 3/6] can: j1939: transport: j1939_session_tx_dat(): fix use-after-free read in j1939_tp_txtimer()

by Marc Kleine-Budde

From: Oleksij Rempel <o.rempel(a)pengutronix.de> The current stack implementation do not support ECTS requests of not aligned TP sized blocks. If ECTS will request a block with size and offset spanning two TP blocks, this will cause memcpy() to read beyond the queued skb (which does only contain one TP sized block). Sometimes KASAN will detect this read if the memory region beyond the skb was previously allocated and freed. In other situations it will stay undetected. The ETP transfer in any case will be corrupted. This patch adds a sanity check to avoid this kind of read and abort the session with error J1939_XTP_ABORT_ECTS_TOO_BIG. Reported-by: syzbot+5322482fe520b02aea30(a)syzkaller.appspotmail.com Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Cc: linux-stable <stable(a)vger.kernel.org> # >= v5.4 Signed-off-by: Oleksij Rempel <o.rempel(a)pengutronix.de> Link: https://lore.kernel.org/r/20200807105200.26441-3-o.rempel@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de> --- net/can/j1939/transport.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c index b135c5e2a86e..30957c9a8eb7 100644 --- a/net/can/j1939/transport.c +++ b/net/can/j1939/transport.c @@ -787,6 +787,18 @@ static int j1939_session_tx_dat(struct j1939_session *session) if (len > 7) len = 7; + if (offset + len > se_skb->len) { + netdev_err_once(priv->ndev, + "%s: 0x%p: requested data outside of queued buffer: offset %i, len %i, pkt.tx: %i\n", + __func__, session, skcb->offset, se_skb->len , session->pkt.tx); + return -EOVERFLOW; + } + + if (!len) { + ret = -ENOBUFS; + break; + } + memcpy(&dat[1], &tpdat[offset], len); ret = j1939_tp_tx_dat(session, dat, len + 1); if (ret < 0) { @@ -1120,6 +1132,9 @@ static enum hrtimer_restart j1939_tp_txtimer(struct hrtimer *hrtimer) * cleanup including propagation of the error to user space. */ break; + case -EOVERFLOW: + j1939_session_cancel(session, J1939_XTP_ABORT_ECTS_TOO_BIG); + break; case 0: session->tx_retry = 0; break; -- 2.28.0

4 years, 11 months

1
0
0 0

[PATCH v4] ext4: fix direct I/O read error

by Jiang Ying

This patch is used to fix ext4 direct I/O read error when the read size is not aligned with block size. Then, I will use a test to explain the error. (1) Make a file that is not aligned with block size: $dd if=/dev/zero of=./test.jar bs=1000 count=3 (2) I wrote a source file named "direct_io_read_file.c" as following: #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/file.h> #include <sys/types.h> #include <sys/stat.h> #include <string.h> #define BUF_SIZE 1024 int main() { int fd; int ret; unsigned char *buf; ret = posix_memalign((void **)&buf, 512, BUF_SIZE); if (ret) { perror("posix_memalign failed"); exit(1); } fd = open("./test.jar", O_RDONLY | O_DIRECT, 0755); if (fd < 0){ perror("open ./test.jar failed"); exit(1); } do { ret = read(fd, buf, BUF_SIZE); printf("ret=%d\n",ret); if (ret < 0) { perror("write test.jar failed"); } } while (ret > 0); free(buf); close(fd); } (3) Compile the source file: $gcc direct_io_read_file.c -D_GNU_SOURCE (4) Run the test program: $./a.out The result is as following: ret=1024 ret=1024 ret=952 ret=-1 write test.jar failed: Invalid argument. I have tested this program on XFS filesystem, XFS does not have this problem, because XFS use iomap_dio_rw() to do direct I/O read. And the comparing between read offset and file size is done in iomap_dio_rw(), the code is as following: if (pos < size) { retval = filemap_write_and_wait_range(mapping, pos, pos + iov_length(iov, nr_segs) - 1); if (!retval) { retval = mapping->a_ops->direct_IO(READ, iocb, iov, pos, nr_segs); } ... } ...only when "pos < size", direct I/O can be done, or 0 will be return. I have tested the fix patch on Ext4, it is up to the mustard of EINVAL in man2(read) as following: #include <unistd.h> ssize_t read(int fd, void *buf, size_t count); EINVAL fd is attached to an object which is unsuitable for reading; or the file was opened with the O_DIRECT flag, and either the address specified in buf, the value specified in count, or the current file offset is not suitably aligned. So I think this patch can be applied to fix ext4 direct I/O error. However Ext4 introduces direct I/O read using iomap infrastructure on kernel 5.5, the patch is commit <b1b4705d54ab> ("ext4: introduce direct I/O read using iomap infrastructure"), then Ext4 will be the same as XFS, they all use iomap_dio_rw() to do direct I/O read. So this problem does not exist on kernel 5.5 for Ext4. >From above description, we can see this problem exists on all the kernel versions between kernel 3.14 and kernel 5.4. It will cause the Applications to fail to read. For example, when the search service downloads a new full index file, the search engine is loading the previous index file and is processing the search request, it can not use buffer io that may squeeze the previous index file in use from pagecache, so the serch service must use direct I/O read. Please apply this patch on these kernel versions, or please use the method on kernel 5.5 to fix this problem. Fixes: 9fe55eea7e4b ("Fix race when checking i_size on direct i/o read") Reviewed-by: Jan Kara <jack(a)suse.cz> Co-developed-by: Wang Long <wanglong19(a)meituan.com> Signed-off-by: Wang Long <wanglong19(a)meituan.com> Signed-off-by: Jiang Ying <jiangying8582(a)126.com> --- fs/ext4/inode.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 516faa2..a66b0ac 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3821,6 +3821,11 @@ static ssize_t ext4_direct_IO_read(struct kiocb *iocb, struct iov_iter *iter) struct inode *inode = mapping->host; size_t count = iov_iter_count(iter); ssize_t ret; + loff_t offset = iocb->ki_pos; + loff_t size = i_size_read(inode); + + if (offset >= size) + return 0; /* * Shared inode_lock is enough for us - it protects against concurrent -- 1.8.3.1

4 years, 11 months

3
2
0 0

[PATCH] vdpasim: protect concurrent access to iommu iotlb

by Jason Wang

From: Max Gurtovoy <maxg(a)mellanox.com> Iommu iotlb can be accessed by different cores for performing IO using multiple virt queues. Add a spinlock to synchronize iotlb accesses. This could be easily reproduced when using more than 1 pktgen threads to inject traffic to vdpa simulator. Fixes: 2c53d0f64c06f("vdpasim: vDPA device simulator") Cc: stable(a)vger.kernel.org Signed-off-by: Max Gurtovoy <maxg(a)mellanox.com> Signed-off-by: Jason Wang <jasowang(a)redhat.com> --- drivers/vdpa/vdpa_sim/vdpa_sim.c | 31 +++++++++++++++++++++++++++---- 1 file changed, 27 insertions(+), 4 deletions(-) diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c index a9bc5e0fb353..5b5725d951ce 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c @@ -70,6 +70,8 @@ struct vdpasim { u32 status; u32 generation; u64 features; + /* spinlock to synchronize iommu table */ + spinlock_t iommu_lock; }; static struct vdpasim *vdpasim_dev; @@ -118,7 +120,9 @@ static void vdpasim_reset(struct vdpasim *vdpasim) for (i = 0; i < VDPASIM_VQ_NUM; i++) vdpasim_vq_reset(&vdpasim->vqs[i]); + spin_lock(&vdpasim->iommu_lock); vhost_iotlb_reset(vdpasim->iommu); + spin_unlock(&vdpasim->iommu_lock); vdpasim->features = 0; vdpasim->status = 0; @@ -236,8 +240,10 @@ static dma_addr_t vdpasim_map_page(struct device *dev, struct page *page, /* For simplicity, use identical mapping to avoid e.g iova * allocator. */ + spin_lock(&vdpasim->iommu_lock); ret = vhost_iotlb_add_range(iommu, pa, pa + size - 1, pa, dir_to_perm(dir)); + spin_unlock(&vdpasim->iommu_lock); if (ret) return DMA_MAPPING_ERROR; @@ -251,8 +257,10 @@ static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr, struct vdpasim *vdpasim = dev_to_sim(dev); struct vhost_iotlb *iommu = vdpasim->iommu; + spin_lock(&vdpasim->iommu_lock); vhost_iotlb_del_range(iommu, (u64)dma_addr, (u64)dma_addr + size - 1); + spin_unlock(&vdpasim->iommu_lock); } static void *vdpasim_alloc_coherent(struct device *dev, size_t size, @@ -264,9 +272,10 @@ static void *vdpasim_alloc_coherent(struct device *dev, size_t size, void *addr = kmalloc(size, flag); int ret; - if (!addr) + spin_lock(&vdpasim->iommu_lock); + if (!addr) { *dma_addr = DMA_MAPPING_ERROR; - else { + } else { u64 pa = virt_to_phys(addr); ret = vhost_iotlb_add_range(iommu, (u64)pa, @@ -279,6 +288,7 @@ static void *vdpasim_alloc_coherent(struct device *dev, size_t size, } else *dma_addr = (dma_addr_t)pa; } + spin_unlock(&vdpasim->iommu_lock); return addr; } @@ -290,8 +300,11 @@ static void vdpasim_free_coherent(struct device *dev, size_t size, struct vdpasim *vdpasim = dev_to_sim(dev); struct vhost_iotlb *iommu = vdpasim->iommu; + spin_lock(&vdpasim->iommu_lock); vhost_iotlb_del_range(iommu, (u64)dma_addr, (u64)dma_addr + size - 1); + spin_unlock(&vdpasim->iommu_lock); + kfree(phys_to_virt((uintptr_t)dma_addr)); } @@ -532,6 +545,7 @@ static int vdpasim_set_map(struct vdpa_device *vdpa, u64 start = 0ULL, last = 0ULL - 1; int ret; + spin_lock(&vdpasim->iommu_lock); vhost_iotlb_reset(vdpasim->iommu); for (map = vhost_iotlb_itree_first(iotlb, start, last); map; @@ -541,10 +555,12 @@ static int vdpasim_set_map(struct vdpa_device *vdpa, if (ret) goto err; } + spin_unlock(&vdpasim->iommu_lock); return 0; err: vhost_iotlb_reset(vdpasim->iommu); + spin_unlock(&vdpasim->iommu_lock); return ret; } @@ -552,16 +568,23 @@ static int vdpasim_dma_map(struct vdpa_device *vdpa, u64 iova, u64 size, u64 pa, u32 perm) { struct vdpasim *vdpasim = vdpa_to_sim(vdpa); + int ret; - return vhost_iotlb_add_range(vdpasim->iommu, iova, - iova + size - 1, pa, perm); + spin_lock(&vdpasim->iommu_lock); + ret = vhost_iotlb_add_range(vdpasim->iommu, iova, iova + size - 1, pa, + perm); + spin_unlock(&vdpasim->iommu_lock); + + return ret; } static int vdpasim_dma_unmap(struct vdpa_device *vdpa, u64 iova, u64 size) { struct vdpasim *vdpasim = vdpa_to_sim(vdpa); + spin_lock(&vdpasim->iommu_lock); vhost_iotlb_del_range(vdpasim->iommu, iova, iova + size - 1); + spin_unlock(&vdpasim->iommu_lock); return 0; } -- 2.20.1

4 years, 11 months

3
5
0 0

[withdrawn] bootconfig-fix-off-by-one-in-xbc_node_compose_key_after.patch removed from -mm tree

by Andrew Morton

The patch titled Subject: bootconfig: fix off-by-one in xbc_node_compose_key_after() has been removed from the -mm tree. Its filename was bootconfig-fix-off-by-one-in-xbc_node_compose_key_after.patch This patch was dropped because it was withdrawn ------------------------------------------------------ From: Steven Rostedt (VMware) <rostedt(a)goodmis.org> Subject: bootconfig: fix off-by-one in xbc_node_compose_key_after() While reviewing some patches for bootconfig, I noticed the following code in xbc_node_compose_key_after(): ret = snprintf(buf, size, "%s%s", xbc_node_get_data(node), depth ? "." : ""); if (ret < 0) return ret; if (ret > size) { size = 0; } else { size -= ret; buf += ret; } But snprintf() returns the number of bytes that would be written, not the number of bytes that are written (ignoring the nul terminator). This means that if the number of non null bytes written were to equal size, then the nul byte, which snprintf() always adds, will overwrite that last byte. ret = snprintf(buf, 5, "hello"); printf("buf = '%s' ", buf); printf("ret = %d ", ret); produces: buf = 'hell' ret = 5 The string was truncated without ret being greater than 5. Test (ret >= size) for overwrite. Link: http://lkml.kernel.org/r/20200813183050.029a6003@oasis.local.home Fixes: 76db5a27a827c ("bootconfig: Add Extra Boot Config support") Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> Cc: Masami Hiramatsu <mhiramat(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- lib/bootconfig.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/lib/bootconfig.c~bootconfig-fix-off-by-one-in-xbc_node_compose_key_after +++ a/lib/bootconfig.c @@ -248,7 +248,7 @@ int __init xbc_node_compose_key_after(st depth ? "." : ""); if (ret < 0) return ret; - if (ret > size) { + if (ret >= size) { size = 0; } else { size -= ret; _ Patches currently in -mm which might be from rostedt(a)goodmis.org are

4 years, 11 months

1
0
0 0

+ mm-page_alloc-fix-core-hung-in-free_pcppages_bulk.patch added to -mm tree

by Andrew Morton

The patch titled Subject: mm, page_alloc: fix core hung in free_pcppages_bulk() has been added to the -mm tree. Its filename is mm-page_alloc-fix-core-hung-in-free_pcppages_bulk.patch This patch should soon appear at http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-fix-core-hung-in-fre… and later at http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-fix-core-hung-in-fre… Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: Charan Teja Reddy <charante(a)codeaurora.org> Subject: mm, page_alloc: fix core hung in free_pcppages_bulk() The following race is observed with the repeated online, offline and a delay between two successive online of memory blocks of movable zone. P1 P2 Online the first memory block in the movable zone. The pcp struct values are initialized to default values,i.e., pcp->high = 0 & pcp->batch = 1. Allocate the pages from the movable zone. Try to Online the second memory block in the movable zone thus it entered the online_pages() but yet to call zone_pcp_update(). This process is entered into the exit path thus it tries to release the order-0 pages to pcp lists through free_unref_page_commit(). As pcp->high = 0, pcp->count = 1 proceed to call the function free_pcppages_bulk(). Update the pcp values thus the new pcp values are like, say, pcp->high = 378, pcp->batch = 63. Read the pcp's batch value using READ_ONCE() and pass the same to free_pcppages_bulk(), pcp values passed here are, batch = 63, count = 1. Since num of pages in the pcp lists are less than ->batch, then it will stuck in while(list_empty(list)) loop with interrupts disabled thus a core hung. Avoid this by ensuring free_pcppages_bulk() is called with proper count of pcp list pages. The mentioned race is some what easily reproducible without [1] because pcp's are not updated for the first memory block online and thus there is a enough race window for P2 between alloc+free and pcp struct values update through onlining of second memory block. With [1], the race still exists but it is very narrow as we update the pcp struct values for the first memory block online itself. This is not limited to the movable zone, it could also happen in cases with the normal zone (e.g., hotplug to a node that only has DMA memory, or no other memory yet). [1]: https://patchwork.kernel.org/patch/11696389/ Link: http://lkml.kernel.org/r/1597150703-19003-1-git-send-email-charante@codeaur… Fixes: 5f8dcc21211a ("page-allocator: split per-cpu list into one-list-per-migrate-type") Signed-off-by: Charan Teja Reddy <charante(a)codeaurora.org> Acked-by: David Hildenbrand <david(a)redhat.com> Acked-by: David Rientjes <rientjes(a)google.com> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Vlastimil Babka <vbabka(a)suse.cz> Cc: Vinayak Menon <vinmenon(a)codeaurora.org> Cc: <stable(a)vger.kernel.org> [2.6+] Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/page_alloc.c | 5 +++++ 1 file changed, 5 insertions(+) --- a/mm/page_alloc.c~mm-page_alloc-fix-core-hung-in-free_pcppages_bulk +++ a/mm/page_alloc.c @@ -1301,6 +1301,11 @@ static void free_pcppages_bulk(struct zo struct page *page, *tmp; LIST_HEAD(head); + /* + * Ensure proper count is passed which otherwise would stuck in the + * below while (list_empty(list)) loop. + */ + count = min(pcp->count, count); while (count) { struct list_head *list; _ Patches currently in -mm which might be from charante(a)codeaurora.org are mm-page_alloc-fix-core-hung-in-free_pcppages_bulk.patch

4 years, 11 months

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror August 2020