This is the start of the stable review cycle for the 3.16.69 release. There are 10 patches in this series, which will be posted as responses to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Jun 20 14:27:59 UTC 2019. Anything received after that time might be too late.
All the patches have also been committed to the linux-3.16.y-rc branch of https://git.kernel.org/pub/scm/linux/kernel/git/bwh/linux-stable-rc.git . A shortlog and diffstat can be found below.
Ben.
-------------
Dan Carpenter (1):
      drivers/virt/fsl_hypervisor.c: prevent integer overflow in ioctl
       [6a024330650e24556b8a18cc654ad00cfecf6c6c]

Eric Dumazet (4):
      tcp: add tcp_min_snd_mss sysctl
       [5f3e2bf008c2221478101ee72f5cb4654b9fc363]
      tcp: enforce tcp_min_snd_mss in tcp_mtu_probing()
       [967c05aee439e6e5d7d805e195b3a20ef5c433d6]
      tcp: limit payload size of sacked skbs
       [3b4929f65b0d8249f19a50245cd88ed1a2f78cff]
      tcp: tcp_fragment() should apply sane memory limits
       [f070ef2ac66716357066b683fb0baf55f8191a2e]

Jason Yan (1):
      scsi: megaraid_sas: return error when create DMA pool failed
       [bcf3b67d16a4c8ffae0aa79de5853435e683945c]

Jiri Kosina (1):
      mm/mincore.c: make mincore() more conservative
       [134fca9063ad4851de767d1768180e5dede9a881]

Oleg Nesterov (1):
      mm: introduce vma_is_anonymous(vma) helper
       [b5330628546616af14ff23075fbf8d4ad91f6e25]

Sriram Rajagopalan (1):
      ext4: zero out the unused memory region in the extent tree block
       [592acbf16821288ecdc4192c47e3774a4c48bb64]

Young Xiao (1):
      Bluetooth: hidp: fix buffer overflow
       [a1616a5ac99ede5d605047a9012481ce7ff18b16]
 Documentation/networking/ip-sysctl.txt    |  8 ++++++++
 Makefile                                  |  4 ++--
 drivers/scsi/megaraid/megaraid_sas_base.c |  1 +
 drivers/virt/fsl_hypervisor.c             |  3 +++
 fs/ext4/extents.c                         | 17 +++++++++++++++--
 include/linux/mm.h                        |  5 +++++
 include/linux/tcp.h                       |  3 +++
 include/net/tcp.h                         |  3 +++
 include/uapi/linux/snmp.h                 |  1 +
 mm/memory.c                               |  8 ++++----
 mm/mincore.c                              | 21 +++++++++++++++++++++
 net/bluetooth/hidp/sock.c                 |  1 +
 net/ipv4/proc.c                           |  1 +
 net/ipv4/sysctl_net_ipv4.c                | 11 +++++++++++
 net/ipv4/tcp.c                            |  1 +
 net/ipv4/tcp_input.c                      | 27 ++++++++++++++++++++++-----
 net/ipv4/tcp_output.c                     |  9 +++++++--
 net/ipv4/tcp_timer.c                      |  1 +
 18 files changed, 110 insertions(+), 15 deletions(-)
3.16.69-rc1 review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet <edumazet@google.com>
commit 967c05aee439e6e5d7d805e195b3a20ef5c433d6 upstream.
If MTU probing is enabled, tcp_mtu_probing() could very well end up with a too-small MSS.
Use the new sysctl tcp_min_snd_mss to make sure MSS search is performed in an acceptable range.
CVE-2019-11479 -- tcp mss hardcoded to 48
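For illustration only (not part of the patch), a minimal stand-alone sketch of the clamping order, with assumed values (tcp_base_mss 1024, tcp_header_len 32, tcp_min_snd_mss 48); min_i()/max_i() stand in for the kernel's min()/max() macros:

    #include <stdio.h>

    static int min_i(int a, int b) { return a < b ? a : b; }
    static int max_i(int a, int b) { return a > b ? a : b; }

    int main(void)
    {
        int mss = 20;               /* a degenerate halved search_low result */

        mss = min_i(1024, mss);     /* sysctl_tcp_base_mss cap: still 20 */
        mss = max_i(mss, 68 - 32);  /* old floor (68 - tcp_header_len): 36 */
        mss = max_i(mss, 48);       /* new floor, sysctl_tcp_min_snd_mss: 48 */
        printf("MSS search floor = %d\n", mss);
        return 0;
    }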
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Bruce Curtis <brucec@netflix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Salvatore Bonaccorso: Backport for context changes in 4.9.168]
[bwh: Backported to 3.16: The sysctl is global]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 net/ipv4/tcp_timer.c | 1 +
 1 file changed, 1 insertion(+)
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -113,6 +113,7 @@ static void tcp_mtu_probing(struct inet_
 		mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low) >> 1;
 		mss = min(sysctl_tcp_base_mss, mss);
 		mss = max(mss, 68 - tp->tcp_header_len);
+		mss = max(mss, sysctl_tcp_min_snd_mss);
 		icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss);
 		tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
 	}
3.16.69-rc1 review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet <edumazet@google.com>
commit f070ef2ac66716357066b683fb0baf55f8191a2e upstream.
Jonathan Looney reported that a malicious peer can force a sender to fragment its retransmit queue into tiny skbs, inflating memory usage and/or overflowing 32-bit counters.

TCP allows an application to queue up to sk_sndbuf bytes, so we need to give some allowance for non-malicious splitting of the retransmit queue.

A new SNMP counter is added to monitor how many times TCP refused to split an skb because the allowance was exceeded.

Note that this counter might increase in the case where applications use the SO_SNDBUF socket option to lower sk_sndbuf.
CVE-2019-11478 : tcp_fragment, prevent fragmenting a packet when the socket is already using more than half the allowed space
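For illustration only, a user-space sketch of the new guard with assumed values (this is not the kernel code, just the arithmetic):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t sk_sndbuf = 212992;       /* e.g. a default wmem limit */
        uint32_t sk_wmem_queued = 500000;  /* inflated by tiny-skb splits */

        /* Refuse to fragment once queued memory exceeds twice sk_sndbuf;
         * the TCPWqueueTooBig counter records each refusal. */
        if ((sk_wmem_queued >> 1) > sk_sndbuf)
            printf("refusing to fragment: -ENOMEM\n");
        return 0;
    }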
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Salvatore Bonaccorso: Adjust context for backport to 4.9.168]
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 include/uapi/linux/snmp.h | 1 +
 net/ipv4/proc.c           | 1 +
 net/ipv4/tcp_output.c     | 5 +++++
 3 files changed, 7 insertions(+)
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -265,6 +265,7 @@ enum
 	LINUX_MIB_TCPWANTZEROWINDOWADV,		/* TCPWantZeroWindowAdv */
 	LINUX_MIB_TCPSYNRETRANS,		/* TCPSynRetrans */
 	LINUX_MIB_TCPORIGDATASENT,		/* TCPOrigDataSent */
+	LINUX_MIB_TCPWQUEUETOOBIG,		/* TCPWqueueTooBig */
 	__LINUX_MIB_MAX
 };
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -286,6 +286,7 @@ static const struct snmp_mib snmp4_net_l
 	SNMP_MIB_ITEM("TCPWantZeroWindowAdv", LINUX_MIB_TCPWANTZEROWINDOWADV),
 	SNMP_MIB_ITEM("TCPSynRetrans", LINUX_MIB_TCPSYNRETRANS),
 	SNMP_MIB_ITEM("TCPOrigDataSent", LINUX_MIB_TCPORIGDATASENT),
+	SNMP_MIB_ITEM("TCPWqueueTooBig", LINUX_MIB_TCPWQUEUETOOBIG),
 	SNMP_MIB_SENTINEL
 };
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1090,6 +1090,11 @@ int tcp_fragment(struct sock *sk, struct
 	if (nsize < 0)
 		nsize = 0;
 
+	if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) {
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
+		return -ENOMEM;
+	}
+
 	if (skb_unclone(skb, gfp))
 		return -ENOMEM;
Hi Ben,
On 6/18/2019 7:28 AM, Ben Hutchings wrote:
> 3.16.69-rc1 review patch.  If anyone has any objections, please let me know.
>
> From: Eric Dumazet <edumazet@google.com>
>
> commit f070ef2ac66716357066b683fb0baf55f8191a2e upstream.
> [...]
Don't we also need this patch to be backported:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
Thanks!
On Mon, 2019-07-01 at 19:51 -0700, Florian Fainelli wrote:
> Hi Ben,
>
> On 6/18/2019 7:28 AM, Ben Hutchings wrote:
> > 3.16.69-rc1 review patch.  If anyone has any objections, please let me know.
> >
> > From: Eric Dumazet <edumazet@google.com>
> >
> > commit f070ef2ac66716357066b683fb0baf55f8191a2e upstream.
> > [...]
>
> Don't we also need this patch to be backported:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
I've queued that up for the next 3.16 update, thanks.
Ben.
3.16.69-rc1 review patch. If anyone has any objections, please let me know.
------------------
From: Sriram Rajagopalan <sriramr@arista.com>
commit 592acbf16821288ecdc4192c47e3774a4c48bb64 upstream.
This commit zeroes out the unused memory region in the buffer_head corresponding to the extent metablock after writing the extent header and the corresponding extent node entries.
This is done to prevent random uninitialized data from getting into the filesystem when the extent block is synced.
This fixes CVE-2019-11833.
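For illustration only, a user-space sketch of the zeroing (sizes assumed: a 4096-byte block, a 12-byte extent header, 12-byte extent entries, 40 entries in use):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char block[4096];                          /* stands in for bh->b_data */
        size_t hdr = 12, entry = 12, used = 40;    /* assumed sizes and count */
        size_t ext_size = hdr + entry * used;      /* 492 bytes in use */

        /* Before the fix the tail kept stale heap contents and could leak
         * to disk; the patch zeroes it before the block is synced. */
        memset(block + ext_size, 0, sizeof(block) - ext_size);
        printf("zeroed %zu trailing bytes\n", sizeof(block) - ext_size);
        return 0;
    }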
Signed-off-by: Sriram Rajagopalan <sriramr@arista.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 fs/ext4/extents.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1016,6 +1016,7 @@ static int ext4_ext_split(handle_t *hand
 	__le32 border;
 	ext4_fsblk_t *ablocks = NULL; /* array of allocated blocks */
 	int err = 0;
+	size_t ext_size = 0;
 
 	/* make decision: where to split? */
 	/* FIXME: now decision is simplest: at current extent */
@@ -1107,6 +1108,10 @@ static int ext4_ext_split(handle_t *hand
 		le16_add_cpu(&neh->eh_entries, m);
 	}
 
+	/* zero out unused area in the extent block */
+	ext_size = sizeof(struct ext4_extent_header) +
+		sizeof(struct ext4_extent) * le16_to_cpu(neh->eh_entries);
+	memset(bh->b_data + ext_size, 0, inode->i_sb->s_blocksize - ext_size);
 	ext4_extent_block_csum_set(inode, neh);
 	set_buffer_uptodate(bh);
 	unlock_buffer(bh);
@@ -1186,6 +1191,11 @@ static int ext4_ext_split(handle_t *hand
 			sizeof(struct ext4_extent_idx) * m);
 		le16_add_cpu(&neh->eh_entries, m);
 	}
+	/* zero out unused area in the extent block */
+	ext_size = sizeof(struct ext4_extent_header) +
+	   (sizeof(struct ext4_extent) * le16_to_cpu(neh->eh_entries));
+	memset(bh->b_data + ext_size, 0,
+		inode->i_sb->s_blocksize - ext_size);
 	ext4_extent_block_csum_set(inode, neh);
 	set_buffer_uptodate(bh);
 	unlock_buffer(bh);
@@ -1251,6 +1261,7 @@ static int ext4_ext_grow_indepth(handle_
 	struct buffer_head *bh;
 	ext4_fsblk_t newblock;
 	int err = 0;
+	size_t ext_size = 0;
 
 	newblock = ext4_ext_new_meta_block(handle, inode, NULL,
 		newext, &err, flags);
@@ -1268,9 +1279,11 @@ static int ext4_ext_grow_indepth(handle_
 		goto out;
 	}
 
+	ext_size = sizeof(EXT4_I(inode)->i_data);
 	/* move top-level index/leaf into new block */
-	memmove(bh->b_data, EXT4_I(inode)->i_data,
-		sizeof(EXT4_I(inode)->i_data));
+	memmove(bh->b_data, EXT4_I(inode)->i_data, ext_size);
+	/* zero out unused area in the extent block */
+	memset(bh->b_data + ext_size, 0, inode->i_sb->s_blocksize - ext_size);
 
 	/* set size of new block */
 	neh = ext_block_hdr(bh);
3.16.69-rc1 review patch. If anyone has any objections, please let me know.
------------------
From: Oleg Nesterov <oleg@redhat.com>
commit b5330628546616af14ff23075fbf8d4ad91f6e25 upstream.
special_mapping_fault() is absolutely broken. It seems it was always wrong, but this didn't matter until vdso/vvar started to use more than one page.
And after this change vma_is_anonymous() becomes really trivial: it simply checks vm_ops == NULL. However, I do think the helper makes sense. There are a lot of ->vm_ops != NULL checks; the helper makes the caller's code more understandable (self-documented) and is more grep-friendly.
This patch (of 3):
Preparation. Add the new simple helper, vma_is_anonymous(vma), and change handle_pte_fault() to use it. It will have more users.
The name is not accurate; say, a hpet_mmap()'ed vma is not anonymous. Perhaps it should be named vma_has_fault() instead. But it matches the logic in mmap.c/memory.c (see next changes). "True" just means that a page fault will use do_anonymous_page().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16 as dependency of "mm/mincore.c: make mincore()
 more conservative"; adjusted context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 include/linux/mm.h | 5 +++++
 mm/memory.c        | 8 ++++----
 2 files changed, 9 insertions(+), 4 deletions(-)
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1241,6 +1241,11 @@ int get_cmdline(struct task_struct *task
 
 int vma_is_stack_for_task(struct vm_area_struct *vma, struct task_struct *t);
 
+static inline bool vma_is_anonymous(struct vm_area_struct *vma)
+{
+	return !vma->vm_ops;
+}
+
 extern unsigned long move_page_tables(struct vm_area_struct *vma,
 		unsigned long old_addr, struct vm_area_struct *new_vma,
 		unsigned long new_addr, unsigned long len,
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3105,12 +3105,12 @@ static int handle_pte_fault(struct mm_st
 	entry = *pte;
 	if (!pte_present(entry)) {
 		if (pte_none(entry)) {
-			if (vma->vm_ops)
+			if (vma_is_anonymous(vma))
+				return do_anonymous_page(mm, vma, address,
+							 pte, pmd, flags);
+			else
 				return do_fault(mm, vma, address, pte,
 						pmd, flags, entry);
-
-			return do_anonymous_page(mm, vma, address,
-						 pte, pmd, flags);
 		}
 		return do_swap_page(mm, vma, address, pte, pmd, flags, entry);
3.16.69-rc1 review patch. If anyone has any objections, please let me know.
------------------
From: Young Xiao <YangX92@hotmail.com>
commit a1616a5ac99ede5d605047a9012481ce7ff18b16 upstream.
Struct ca is copied from userspace. It is not checked whether the "name" field is NULL terminated, which allows local users to obtain potentially sensitive information from kernel stack memory, via a HIDPCONNADD command.
This vulnerability is similar to CVE-2011-1079.
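For illustration only, the general fix pattern in a user-space sketch (struct layout assumed; the memset() stands in for copy_from_user()):

    #include <stdio.h>
    #include <string.h>

    struct connadd_req { char name[128]; };      /* assumed layout */

    int main(void)
    {
        struct connadd_req ca;

        memset(ca.name, 'A', sizeof(ca.name));   /* unterminated user data */
        ca.name[sizeof(ca.name) - 1] = 0;        /* force NUL termination */
        printf("bounded length: %zu\n", strlen(ca.name));  /* 127 */
        return 0;
    }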
Signed-off-by: Young Xiao <YangX92@hotmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 net/bluetooth/hidp/sock.c | 1 +
 1 file changed, 1 insertion(+)
--- a/net/bluetooth/hidp/sock.c
+++ b/net/bluetooth/hidp/sock.c
@@ -76,6 +76,7 @@ static int hidp_sock_ioctl(struct socket
 			sockfd_put(csock);
 			return err;
 		}
+		ca.name[sizeof(ca.name)-1] = 0;
 
 		err = hidp_connection_add(&ca, csock, isock);
 		if (!err && copy_to_user(argp, &ca, sizeof(ca)))
3.16.69-rc1 review patch. If anyone has any objections, please let me know.
------------------
From: Jason Yan <yanaijie@huawei.com>
commit bcf3b67d16a4c8ffae0aa79de5853435e683945c upstream.
When creating the DMA pool for cmd frames fails, we should return -ENOMEM instead of 0. Consider this case:

megasas_init_adapter_fusion()
	-->megasas_alloc_cmds()
		-->megasas_create_frame_pool
			create DMA pool failed
		-->megasas_free_cmds() [1]
	-->megasas_alloc_cmds_fusion()
		failed, then goto fail_alloc_cmds
	-->megasas_free_cmds() [2]

We call megasas_free_cmds() twice: [1] kfrees cmd_list, and [2] then uses the freed cmd_list. This causes the following problem:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ffffffc000f70000
[00000000] *pgd=0000001fbf893003, *pud=0000001fbf893003, *pmd=0000001fbf894003, *pte=006000006d000707
Internal error: Oops: 96000005 [#1] SMP
Modules linked in:
CPU: 18 PID: 1 Comm: swapper/0 Not tainted
task: ffffffdfb9290000 ti: ffffffdfb923c000 task.ti: ffffffdfb923c000
PC is at megasas_free_cmds+0x30/0x70
LR is at megasas_free_cmds+0x24/0x70
...
Call trace:
[<ffffffc0005b779c>] megasas_free_cmds+0x30/0x70
[<ffffffc0005bca74>] megasas_init_adapter_fusion+0x2f4/0x4d8
[<ffffffc0005b926c>] megasas_init_fw+0x2dc/0x760
[<ffffffc0005b9ab0>] megasas_probe_one+0x3c0/0xcd8
[<ffffffc0004a5abc>] local_pci_probe+0x4c/0xb4
[<ffffffc0004a5c40>] pci_device_probe+0x11c/0x14c
[<ffffffc00053a5e4>] driver_probe_device+0x1ec/0x430
[<ffffffc00053a92c>] __driver_attach+0xa8/0xb0
[<ffffffc000538178>] bus_for_each_dev+0x74/0xc8
[<ffffffc000539e88>] driver_attach+0x28/0x34
[<ffffffc000539a18>] bus_add_driver+0x16c/0x248
[<ffffffc00053b234>] driver_register+0x6c/0x138
[<ffffffc0004a5350>] __pci_register_driver+0x5c/0x6c
[<ffffffc000ce3868>] megasas_init+0xc0/0x1a8
[<ffffffc000082a58>] do_one_initcall+0xe8/0x1ec
[<ffffffc000ca7be8>] kernel_init_freeable+0x1c8/0x284
[<ffffffc0008d90b8>] kernel_init+0x1c/0xe4
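A condensed user-space sketch of the bug (all names assumed; calloc()/free() stand in for the driver's allocations):

    #include <stdio.h>
    #include <stdlib.h>

    struct inst { void **cmd_list; };

    static void free_cmds(struct inst *p)
    {
        free(p->cmd_list);           /* [1] and [2] both land here */
    }

    static int create_frame_pool(struct inst *p)
    {
        (void)p;
        return -1;                   /* assume DMA pool creation fails */
    }

    static int alloc_cmds(struct inst *p)
    {
        p->cmd_list = calloc(16, sizeof(void *));
        if (create_frame_pool(p)) {
            free_cmds(p);            /* [1] internal cleanup frees cmd_list */
            return -1;               /* the fix: report the failure */
        }
        return 0;                    /* pre-fix: returned 0 even on failure */
    }

    int main(void)
    {
        struct inst p;

        if (alloc_cmds(&p)) {
            fprintf(stderr, "alloc_cmds failed\n");
            return 1;                /* caller must NOT call free_cmds again */
        }
        return 0;
    }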
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/scsi/megaraid/megaraid_sas_base.c | 1 +
 1 file changed, 1 insertion(+)
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -3489,6 +3489,7 @@ int megasas_alloc_cmds(struct megasas_in
 	if (megasas_create_frame_pool(instance)) {
 		printk(KERN_DEBUG "megasas: Error creating frame DMA pool\n");
 		megasas_free_cmds(instance);
+		return -ENOMEM;
 	}
 
 	return 0;
3.16.69-rc1 review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter <dan.carpenter@oracle.com>
commit 6a024330650e24556b8a18cc654ad00cfecf6c6c upstream.
The "param.count" value is a u64 thatcomes from the user. The code later in the function assumes that param.count is at least one and if it's not then it leads to an Oops when we dereference the ZERO_SIZE_PTR.
Also the addition can have an integer overflow which would lead us to allocate a smaller "pages" array than required. I can't immediately tell what the possible run times implications are, but it's safest to prevent the overflow.
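A user-space sketch of the wraparound (page size assumed to be 4096):

    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_SIZE  4096ULL
    #define PAGE_SHIFT 12

    int main(void)
    {
        uint64_t count = UINT64_MAX - 100;  /* attacker-chosen param.count */
        uint64_t lb_offset = 0x800;

        /* The unchecked sum wraps past 2^64, so num_pages becomes tiny
         * while the requested copy length stays enormous. */
        uint64_t num_pages = (count + lb_offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
        printf("num_pages = %llu\n", (unsigned long long)num_pages);
        return 0;
    }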
Link: http://lkml.kernel.org/r/20181218082129.GE32567@kadam
Fixes: 6db7199407ca ("drivers/virt: introduce Freescale hypervisor management driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Timur Tabi <timur@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/virt/fsl_hypervisor.c | 3 +++
 1 file changed, 3 insertions(+)
--- a/drivers/virt/fsl_hypervisor.c
+++ b/drivers/virt/fsl_hypervisor.c
@@ -215,6 +215,9 @@ static long ioctl_memcpy(struct fsl_hv_i
 	 * hypervisor.
 	 */
 	lb_offset = param.local_vaddr & (PAGE_SIZE - 1);
+	if (param.count == 0 ||
+	    param.count > U64_MAX - lb_offset - PAGE_SIZE + 1)
+		return -EINVAL;
 	num_pages = (param.count + lb_offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
 
 	/* Allocate the buffers we need */
3.16.69-rc1 review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet <edumazet@google.com>
commit 5f3e2bf008c2221478101ee72f5cb4654b9fc363 upstream.
Some TCP peers announce a very small MSS option in their SYN and/or SYN/ACK messages.
This forces the stack to send packets with a very high network/cpu overhead.
Linux has enforced a minimal value of 48. Since this value includes the size of TCP options, and the options can consume up to 40 bytes, this means that each segment can include only 8 bytes of payload.
In some cases, it can be useful to increase the minimal value to a saner value.
We still leave the default at 48 (TCP_MIN_SND_MSS), for compatibility reasons.
Note that TCP_MAXSEG socket option enforces a minimal value of (TCP_MIN_MSS). David Miller increased this minimal value in commit c39508d6f118 ("tcp: Make TCP_MAXSEG minimum more correct.") from 64 to 88.
We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS.
CVE-2019-11479 -- tcp mss hardcoded to 48
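To illustrate the overhead arithmetic, a user-space sketch with assumed values:

    #include <stdio.h>

    #define MAX_TCP_OPTION_SPACE 40

    int main(void)
    {
        int mss = 48;                              /* peer-advertised minimum */
        int payload = mss - MAX_TCP_OPTION_SPACE;  /* 8 bytes per segment */

        /* Moving 1 MB at 8 bytes per segment costs ~131072 packets. */
        printf("payload=%d bytes, segments per MB=%d\n",
               payload, (1 << 20) / payload);
        return 0;
    }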
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Salvatore Bonaccorso: Backport for context changes in 4.9.168]
[bwh: Backported to 3.16: Make the sysctl global, consistent with
 net.ipv4.tcp_base_mss]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -210,6 +210,14 @@ tcp_base_mss - INTEGER
 	Path MTU discovery (MTU probing).  If MTU probing is enabled,
 	this is the initial MSS used by the connection.
 
+tcp_min_snd_mss - INTEGER
+	TCP SYN and SYNACK messages usually advertise an ADVMSS option,
+	as described in RFC 1122 and RFC 6691.
+	If this ADVMSS option is smaller than tcp_min_snd_mss,
+	it is silently capped to tcp_min_snd_mss.
+
+	Default : 48 (at least 8 bytes of payload per segment)
+
 tcp_congestion_control - STRING
 	Set the congestion control algorithm to be used for new
 	connections. The algorithm "reno" is always available, but
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -34,6 +34,8 @@ static int tcp_retr1_max = 255;
 static int ip_local_port_range_min[] = { 1, 1 };
 static int ip_local_port_range_max[] = { 65535, 65535 };
 static int tcp_adv_win_scale_min = -31;
+static int tcp_min_snd_mss_min = TCP_MIN_SND_MSS;
+static int tcp_min_snd_mss_max = 65535;
 static int tcp_adv_win_scale_max = 31;
 static int ip_ttl_min = 1;
 static int ip_ttl_max = 255;
@@ -608,6 +610,15 @@ static struct ctl_table ipv4_table[] = {
 		.proc_handler	= proc_dointvec,
 	},
 	{
+		.procname	= "tcp_min_snd_mss",
+		.data		= &sysctl_tcp_min_snd_mss,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &tcp_min_snd_mss_min,
+		.extra2		= &tcp_min_snd_mss_max,
+	},
+	{
 		.procname	= "tcp_workaround_signed_windows",
 		.data		= &sysctl_tcp_workaround_signed_windows,
 		.maxlen		= sizeof(int),
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -61,6 +61,7 @@ int sysctl_tcp_tso_win_divisor __read_mo
 
 int sysctl_tcp_mtu_probing __read_mostly = 0;
 int sysctl_tcp_base_mss __read_mostly = TCP_BASE_MSS;
+int sysctl_tcp_min_snd_mss __read_mostly = TCP_MIN_SND_MSS;
 
 /* By default, RFC2861 behavior.  */
 int sysctl_tcp_slow_start_after_idle __read_mostly = 1;
@@ -1259,8 +1260,7 @@ static inline int __tcp_mtu_to_mss(struc
 	mss_now -= icsk->icsk_ext_hdr_len;
 
 	/* Then reserve room for full set of TCP options and 8 bytes of data */
-	if (mss_now < TCP_MIN_SND_MSS)
-		mss_now = TCP_MIN_SND_MSS;
+	mss_now = max(mss_now, sysctl_tcp_min_snd_mss);
 	return mss_now;
 }
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -270,6 +270,7 @@ extern int sysctl_tcp_moderate_rcvbuf;
 extern int sysctl_tcp_tso_win_divisor;
 extern int sysctl_tcp_mtu_probing;
 extern int sysctl_tcp_base_mss;
+extern int sysctl_tcp_min_snd_mss;
 extern int sysctl_tcp_workaround_signed_windows;
 extern int sysctl_tcp_slow_start_after_idle;
 extern int sysctl_tcp_thin_linear_timeouts;
3.16.69-rc1 review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet <edumazet@google.com>
commit 3b4929f65b0d8249f19a50245cd88ed1a2f78cff upstream.
Jonathan Looney reported that TCP can trigger the following crash in tcp_shifted_skb() :
BUG_ON(tcp_skb_pcount(skb) < pcount);
This can happen if the remote peer has advertised the smallest MSS that Linux TCP accepts: 48.
An skb can hold 17 fragments, and each fragment can hold 32KB on x86, or 64KB on PowerPC.
This means that the 16-bit width of TCP_SKB_CB(skb)->tcp_gso_segs can overflow.
Note that tcp_sendmsg() builds skbs with less than 64KB of payload, so this problem needs SACK to be enabled. SACK blocks allow TCP to coalesce multiple skbs in the retransmit queue, thus filling the 17 fragments to maximal capacity.
CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs
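A worked sketch of the overflow (fragment count and sizes assumed for x86):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t skb_len  = 17 * 32768;          /* 17 frags * 32KB = 557056 */
        uint32_t gso_size = 48 - 40;             /* MSS 48 minus 40 option bytes */
        uint32_t segs     = skb_len / gso_size;  /* 69632 */

        /* tcp_gso_segs is 16 bits wide, so 69632 wraps past 65535. */
        printf("segs=%u, fits in u16: %s\n",
               segs, segs <= 65535 ? "yes" : "no");
        return 0;
    }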
Backport notes, provided by Joao Martins <joao.m.martins@oracle.com>:
v4.15, and anything since commit 737ff314563 ("tcp: use sequence distance to detect reordering"), switched from packet-based FACK tracking to sequence-based tracking.

v4.14 and older still have the old logic, and hence tcp_shift_skb_data() needs to retain its original logic and keep @fack_count in sync. In other words, we keep the increment of pcount with tcp_skb_pcount(skb) and later use that to update fack_count. To make this more explicit we track the pcount of the skb being shifted in @next_pcount, which also lets us avoid the repeated invocations of tcp_skb_pcount(skb).
Fixes: 832d11c5cd07 ("tcp: Try to restore large SKBs while SACK processing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Salvatore Bonaccorso: Adjust for context changes to backport to 4.9.168]
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 include/linux/tcp.h   |  4 ++++
 include/net/tcp.h     |  2 ++
 net/ipv4/tcp.c        |  1 +
 net/ipv4/tcp_input.c  | 26 ++++++++++++++++++++------
 net/ipv4/tcp_output.c |  6 +++---
 5 files changed, 30 insertions(+), 9 deletions(-)
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -394,4 +394,7 @@ static inline int fastopen_init_queue(st
 	return 0;
 }
 
+int tcp_skb_shift(struct sk_buff *to, struct sk_buff *from, int pcount,
+		  int shiftlen);
+
 #endif	/* _LINUX_TCP_H */
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -55,6 +55,8 @@ void tcp_time_wait(struct sock *sk, int
 
 #define MAX_TCP_HEADER	(128 + MAX_HEADER)
 #define MAX_TCP_OPTION_SPACE 40
+#define TCP_MIN_SND_MSS		48
+#define TCP_MIN_GSO_SIZE	(TCP_MIN_SND_MSS - MAX_TCP_OPTION_SPACE)
 
 /*
  * Never offer a window over 32767 without using window scaling. Some
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3169,6 +3169,7 @@ void __init tcp_init(void)
 	int max_rshare, max_wshare, cnt;
 	unsigned int i;
 
+	BUILD_BUG_ON(TCP_MIN_SND_MSS <= MAX_TCP_OPTION_SPACE);
 	BUILD_BUG_ON(sizeof(struct tcp_skb_cb) > sizeof(skb->cb));
 
 	percpu_counter_init(&tcp_sockets_allocated, 0);
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1296,7 +1296,7 @@ static bool tcp_shifted_skb(struct sock
 	TCP_SKB_CB(skb)->seq += shifted;
 
 	skb_shinfo(prev)->gso_segs += pcount;
-	BUG_ON(skb_shinfo(skb)->gso_segs < pcount);
+	WARN_ON_ONCE(tcp_skb_pcount(skb) < pcount);
 	skb_shinfo(skb)->gso_segs -= pcount;
 
 	/* When we're adding to gso_segs == 1, gso_size will be zero,
@@ -1362,6 +1362,21 @@ static int skb_can_shift(const struct sk
 	return !skb_headlen(skb) && skb_is_nonlinear(skb);
 }
 
+int tcp_skb_shift(struct sk_buff *to, struct sk_buff *from,
+		  int pcount, int shiftlen)
+{
+	/* TCP min gso_size is 8 bytes (TCP_MIN_GSO_SIZE)
+	 * Since TCP_SKB_CB(skb)->tcp_gso_segs is 16 bits, we need
+	 * to make sure not storing more than 65535 * 8 bytes per skb,
+	 * even if current MSS is bigger.
+	 */
+	if (unlikely(to->len + shiftlen >= 65535 * TCP_MIN_GSO_SIZE))
+		return 0;
+	if (unlikely(tcp_skb_pcount(to) + pcount > 65535))
+		return 0;
+	return skb_shift(to, from, shiftlen);
+}
+
 /* Try collapsing SACK blocks spanning across multiple skbs to a single
  * skb.
  */
@@ -1373,6 +1388,7 @@ static struct sk_buff *tcp_shift_skb_dat
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *prev;
 	int mss;
+	int next_pcount;
 	int pcount = 0;
 	int len;
 	int in_sack;
@@ -1467,7 +1483,7 @@ static struct sk_buff *tcp_shift_skb_dat
 	if (!after(TCP_SKB_CB(skb)->seq + len, tp->snd_una))
 		goto fallback;
 
-	if (!skb_shift(prev, skb, len))
+	if (!tcp_skb_shift(prev, skb, pcount, len))
 		goto fallback;
 	if (!tcp_shifted_skb(sk, skb, state, pcount, len, mss, dup_sack))
 		goto out;
@@ -1486,9 +1502,10 @@ static struct sk_buff *tcp_shift_skb_dat
 		goto out;
 
 	len = skb->len;
-	if (skb_shift(prev, skb, len)) {
-		pcount += tcp_skb_pcount(skb);
-		tcp_shifted_skb(sk, skb, state, tcp_skb_pcount(skb), len, mss, 0);
+	next_pcount = tcp_skb_pcount(skb);
+	if (tcp_skb_shift(prev, skb, next_pcount, len)) {
+		pcount += next_pcount;
+		tcp_shifted_skb(sk, skb, state, next_pcount, len, mss, 0);
 	}
 
 out:
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1254,8 +1254,8 @@ static inline int __tcp_mtu_to_mss(struc
 	mss_now -= icsk->icsk_ext_hdr_len;
 
 	/* Then reserve room for full set of TCP options and 8 bytes of data */
-	if (mss_now < 48)
-		mss_now = 48;
+	if (mss_now < TCP_MIN_SND_MSS)
+		mss_now = TCP_MIN_SND_MSS;
 	return mss_now;
 }
3.16.69-rc1 review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Kosina <jkosina@suse.cz>
commit 134fca9063ad4851de767d1768180e5dede9a881 upstream.
The semantics of what mincore() considers to be resident is not completely clear, but Linux has always (since 2.3.52, which is when mincore() was initially done) treated it as "page is available in page cache".
That's potentially a problem, as that [in]directly exposes meta-information about pagecache / memory mapping state even about memory not strictly belonging to the process executing the syscall, opening possibilities for sidechannel attacks.
Change the semantics of mincore() so that it only reveals pagecache information for non-anonymous mappings that belong to files that the calling process could (if it tried to) successfully open for writing; otherwise we'd be including shared non-exclusive mappings, which

- is the side channel

- is not the use case for mincore(), as that's primarily used for data, not (shared) text (see the sketch below)
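A user-space sketch of what changes for a probe (path assumed; run as a user that cannot write /etc/hosts):

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void)
    {
        int fd = open("/etc/hosts", O_RDONLY);   /* a file we cannot write */
        unsigned char vec[1];
        void *p;

        if (fd < 0)
            return 1;
        p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            return 1;
        /* With the fix, mincore() claims the page is resident whether or
         * not it is actually cached, so the side channel learns nothing. */
        if (mincore(p, 4096, vec) == 0)
            printf("reported resident: %d\n", vec[0] & 1);
        munmap(p, 4096);
        close(fd);
        return 0;
    }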
[jkosina@suse.cz: v2]
  Link: http://lkml.kernel.org/r/20190312141708.6652-2-vbabka@suse.cz
[mhocko@suse.com: restructure can_do_mincore() conditions]
  Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1903062342020.19912@cbobk.fhfr.pm
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Josh Snyder <joshs@netflix.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Originally-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally-by: Dominique Martinet <asmadeus@codewreck.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Cyril Hrubis <chrubis@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Daniel Gruss <daniel@gruss.cc>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -212,6 +212,22 @@ static void mincore_page_range(struct vm
 	} while (pgd++, addr = next, addr != end);
 }
 
+static inline bool can_do_mincore(struct vm_area_struct *vma)
+{
+	if (vma_is_anonymous(vma))
+		return true;
+	if (!vma->vm_file)
+		return false;
+	/*
+	 * Reveal pagecache information only for non-anonymous mappings that
+	 * correspond to the files the calling process could (if tried) open
+	 * for writing; otherwise we'd be including shared non-exclusive
+	 * mappings, which opens a side channel.
+	 */
+	return inode_owner_or_capable(file_inode(vma->vm_file)) ||
+		inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0;
+}
+
 /*
  * Do a chunk of "sys_mincore()".  We've already checked
  * all the arguments, we hold the mmap semaphore: we should
@@ -227,6 +243,11 @@ static long do_mincore(unsigned long add
 		return -ENOMEM;
 
 	end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
+	if (!can_do_mincore(vma)) {
+		unsigned long pages = DIV_ROUND_UP(end - addr, PAGE_SIZE);
+		memset(vec, 1, pages);
+		return pages;
+	}
 
 	if (is_vm_hugetlb_page(vma))
 		mincore_hugetlb_page_range(vma, addr, end, vec);
On Tue, Jun 18, 2019 at 03:27:59PM +0100, Ben Hutchings wrote:
> This is the start of the stable review cycle for the 3.16.69 release.
> There are 10 patches in this series, which will be posted as responses
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu Jun 20 14:27:59 UTC 2019.  Anything
> received after that time might be too late.
Build results:
	total: 136 pass: 136 fail: 0
Qemu test results:
	total: 231 pass: 231 fail: 0
Guenter
On Wed, 2019-06-19 at 14:58 -0700, Guenter Roeck wrote:
> On Tue, Jun 18, 2019 at 03:27:59PM +0100, Ben Hutchings wrote:
> > This is the start of the stable review cycle for the 3.16.69 release.
> > There are 10 patches in this series, which will be posted as responses
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Thu Jun 20 14:27:59 UTC 2019.  Anything
> > received after that time might be too late.
>
> Build results:
> 	total: 136 pass: 136 fail: 0
> Qemu test results:
> 	total: 231 pass: 231 fail: 0
Great, thanks for checking.
Ben.