The above referenced commit has problems on older non-rbTree kernels.
AFAICS, the commit has only been backported to 4.14 up to now, but the commit that fdfc5c8594c2 is fixing (namely ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty"), is in v4.2.
Christoph Paasch (2): tcp: Reset send_head when removing skb from write-queue tcp: Don't dequeue SYN/FIN-segments from write-queue
net/ipv4/tcp.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
syzkaller is not happy since commit fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases"):
CPU: 1 PID: 13814 Comm: syz-executor.4 Not tainted 4.14.143 #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011 task: ffff888040105c00 task.stack: ffff8880649c0000 RIP: 0010:tcp_sendmsg_locked+0x6b4/0x4390 net/ipv4/tcp.c:1350 RSP: 0018:ffff8880649cf718 EFLAGS: 00010206 RAX: 0000000000000014 RBX: 000000000000001e RCX: ffffc90000717000 RDX: 0000000000000077 RSI: ffffffff82e760f7 RDI: 00000000000000a0 RBP: ffff8880649cfaa8 R08: 1ffff1100c939e7a R09: ffff8880401063c8 R10: 0000000000000003 R11: 0000000000000001 R12: dffffc0000000000 R13: ffff888043d74750 R14: ffff888043d74500 R15: 000000000000001e FS: 00007f0afcb6d700(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2ca22000 CR3: 0000000040496004 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: tcp_sendmsg+0x2a/0x40 net/ipv4/tcp.c:1533 inet_sendmsg+0x173/0x4e0 net/ipv4/af_inet.c:784 sock_sendmsg_nosec net/socket.c:646 [inline] sock_sendmsg+0xc3/0x100 net/socket.c:656 SYSC_sendto+0x35d/0x5e0 net/socket.c:1766 do_syscall_64+0x241/0x680 arch/x86/entry/common.c:292 entry_SYSCALL_64_after_hwframe+0x42/0xb7
The problem is that we are removing an skb from the write-queue that could have been referenced by the sk_send_head. Thus, we need to check for the send_head's sanity after removing it.
This patch needs to be backported only to 4.14 and older (among those that applied the backport of fdfc5c8594c2).
Fixes: fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases") Cc: Eric Dumazet edumazet@google.com Cc: Jason Baron jbaron@akamai.com Cc: Vladimir Rutsky rutsky@google.com Cc: Soheil Hassas Yeganeh soheil@google.com Cc: Neal Cardwell ncardwell@google.com Signed-off-by: Christoph Paasch cpaasch@apple.com --- net/ipv4/tcp.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 5ce069ce2a97..efe767e20d01 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -924,8 +924,7 @@ static void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb) { if (skb && !skb->len) { tcp_unlink_write_queue(skb, sk); - if (tcp_write_queue_empty(sk)) - tcp_chrono_stop(sk, TCP_CHRONO_BUSY); + tcp_check_send_head(sk, skb); sk_wmem_free_skb(sk, skb); } }
Hi Christoph,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linus/master] [cannot apply to v5.3-rc8 next-20190904] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Christoph-Paasch/tcp-Reset-send_hea... config: x86_64-randconfig-s0-201937 (attached as .config) compiler: gcc-4.9 (Debian 4.9.2-10+deb8u1) 4.9.2 reproduce: # save the attached .config to linux build tree make ARCH=x86_64
If you fix the issue, kindly add following tag Reported-by: kbuild test robot lkp@intel.com
All errors (new ones prefixed by >>):
net//ipv4/tcp.c: In function 'tcp_remove_empty_skb':
net//ipv4/tcp.c:948:3: error: implicit declaration of function 'tcp_check_send_head' [-Werror=implicit-function-declaration]
tcp_check_send_head(sk, skb); ^ Cyclomatic Complexity 5 include/linux/compiler.h:__read_once_size Cyclomatic Complexity 5 include/linux/compiler.h:__write_once_size Cyclomatic Complexity 1 include/linux/kasan-checks.h:kasan_check_read Cyclomatic Complexity 1 include/linux/kasan-checks.h:kasan_check_write Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_read Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_set Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_inc Cyclomatic Complexity 2 arch/x86/include/asm/atomic.h:arch_atomic_dec_and_test Cyclomatic Complexity 1 arch/x86/include/asm/atomic64_64.h:arch_atomic64_read Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_read Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_set Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_inc Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_dec_and_test Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic64_read Cyclomatic Complexity 1 include/asm-generic/atomic-long.h:atomic_long_read Cyclomatic Complexity 2 arch/x86/include/asm/bitops.h:arch_set_bit Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:arch___set_bit Cyclomatic Complexity 2 arch/x86/include/asm/bitops.h:arch_clear_bit Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:arch___clear_bit Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:constant_test_bit Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:variable_test_bit Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:fls64 Cyclomatic Complexity 1 include/asm-generic/bitops-instrumented.h:set_bit Cyclomatic Complexity 1 include/asm-generic/bitops-instrumented.h:__set_bit Cyclomatic Complexity 1 include/asm-generic/bitops-instrumented.h:clear_bit Cyclomatic Complexity 1 include/asm-generic/bitops-instrumented.h:__clear_bit Cyclomatic Complexity 2 include/asm-generic/bitops-instrumented.h:test_bit Cyclomatic Complexity 1 include/linux/bitops.h:ror32 Cyclomatic Complexity 1 include/linux/log2.h:__ilog2_u64 Cyclomatic Complexity 1 include/linux/kernel.h:kstrtoul Cyclomatic Complexity 1 include/linux/list.h:INIT_LIST_HEAD Cyclomatic Complexity 1 include/linux/percpu-defs.h:__this_cpu_preempt_check Cyclomatic Complexity 1 include/linux/math64.h:div_u64_rem Cyclomatic Complexity 1 include/linux/math64.h:div_u64 Cyclomatic Complexity 1 arch/x86/include/asm/current.h:get_current Cyclomatic Complexity 2 arch/x86/include/asm/page_64.h:__phys_addr_nodebug Cyclomatic Complexity 69 include/asm-generic/getorder.h:get_order Cyclomatic Complexity 1 include/linux/jump_label.h:static_key_count Cyclomatic Complexity 2 include/linux/jump_label.h:static_key_slow_inc Cyclomatic Complexity 4 include/linux/jump_label.h:static_key_enable Cyclomatic Complexity 3 include/linux/string.h:memset Cyclomatic Complexity 4 include/linux/string.h:memcpy Cyclomatic Complexity 1 include/linux/err.h:IS_ERR Cyclomatic Complexity 1 include/linux/thread_info.h:test_ti_thread_flag Cyclomatic Complexity 1 include/linux/thread_info.h:check_object_size Cyclomatic Complexity 2 include/linux/thread_info.h:copy_overflow Cyclomatic Complexity 8 include/linux/thread_info.h:check_copy_size Cyclomatic Complexity 1 arch/x86/include/asm/preempt.h:preempt_count Cyclomatic Complexity 1 arch/x86/include/asm/preempt.h:should_resched Cyclomatic Complexity 1 include/linux/bottom_half.h:local_bh_disable Cyclomatic Complexity 1 include/linux/bottom_half.h:local_bh_enable Cyclomatic Complexity 1 include/linux/lockdep.h:lock_is_held Cyclomatic Complexity 1 include/linux/spinlock.h:spinlock_check Cyclomatic Complexity 1 include/linux/spinlock.h:spin_lock Cyclomatic Complexity 1 include/linux/spinlock.h:spin_unlock Cyclomatic Complexity 1 include/linux/spinlock.h:spin_unlock_bh Cyclomatic Complexity 1 include/linux/rcupdate.h:rcu_lock_acquire Cyclomatic Complexity 1 include/linux/rcupdate.h:rcu_lock_release Cyclomatic Complexity 4 include/linux/rcupdate.h:rcu_read_lock Cyclomatic Complexity 4 include/linux/rcupdate.h:rcu_read_unlock Cyclomatic Complexity 1 include/linux/time32.h:timespec64_to_timespec Cyclomatic Complexity 1 include/linux/ktime.h:ktime_to_ns Cyclomatic Complexity 1 include/linux/timekeeping.h:ktime_get_ns Cyclomatic Complexity 2 include/linux/page-flags.h:compound_head Cyclomatic Complexity 1 include/linux/gfp.h:gfpflags_allow_blocking Cyclomatic Complexity 3 include/linux/slab.h:kmalloc_type Cyclomatic Complexity 28 include/linux/slab.h:kmalloc_index Cyclomatic Complexity 1 include/linux/slab.h:__kmalloc_node Cyclomatic Complexity 1 include/linux/slab.h:kmem_cache_alloc_node_trace Cyclomatic Complexity 1 include/linux/slab.h:kmalloc_large Cyclomatic Complexity 4 include/linux/slab.h:kmalloc Cyclomatic Complexity 4 include/linux/slab.h:kmalloc_node Cyclomatic Complexity 1 include/linux/slab.h:kzalloc Cyclomatic Complexity 1 include/linux/refcount.h:refcount_read Cyclomatic Complexity 1 include/linux/sched.h:task_pid_nr Cyclomatic Complexity 1 include/linux/sched.h:task_thread_info Cyclomatic Complexity 1 include/linux/sched.h:test_tsk_thread_flag Cyclomatic Complexity 2 include/linux/uaccess.h:copy_from_user Cyclomatic Complexity 2 include/linux/uaccess.h:copy_to_user Cyclomatic Complexity 1 include/crypto/hash.h:__crypto_ahash_cast Cyclomatic Complexity 1 include/crypto/hash.h:crypto_ahash_tfm Cyclomatic Complexity 1 include/crypto/hash.h:crypto_ahash_reqtfm Cyclomatic Complexity 1 include/crypto/hash.h:crypto_ahash_reqsize Cyclomatic Complexity 1 include/crypto/hash.h:crypto_ahash_update Cyclomatic Complexity 1 include/crypto/hash.h:ahash_request_set_tfm Cyclomatic Complexity 2 include/crypto/hash.h:ahash_request_alloc Cyclomatic Complexity 1 include/crypto/hash.h:ahash_request_set_callback Cyclomatic Complexity 1 include/crypto/hash.h:ahash_request_set_crypt Cyclomatic Complexity 1 include/linux/percpu_counter.h:percpu_counter_init Cyclomatic Complexity 2 include/linux/percpu_counter.h:percpu_counter_add Cyclomatic Complexity 1 include/linux/percpu_counter.h:percpu_counter_read_positive Cyclomatic Complexity 1 include/linux/percpu_counter.h:percpu_counter_sum_positive Cyclomatic Complexity 1 include/linux/percpu_counter.h:percpu_counter_inc Cyclomatic Complexity 4 include/linux/poll.h:poll_wait Cyclomatic Complexity 3 include/linux/poll.h:poll_does_not_wait Cyclomatic Complexity 2 include/linux/uio.h:copy_to_iter Cyclomatic Complexity 2 include/linux/uio.h:copy_from_iter_full Cyclomatic Complexity 2 include/linux/uio.h:copy_from_iter_full_nocache
vim +/tcp_check_send_head +948 net//ipv4/tcp.c
937 938 /* In some cases, both sendpage() and sendmsg() could have added 939 * an skb to the write queue, but failed adding payload on it. 940 * We need to remove it to consume less memory, but more 941 * importantly be able to generate EPOLLOUT for Edge Trigger epoll() 942 * users. 943 */ 944 static void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb) 945 { 946 if (skb && !skb->len) { 947 tcp_unlink_write_queue(skb, sk);
948 tcp_check_send_head(sk, skb);
949 sk_wmem_free_skb(sk, skb); 950 } 951 } 952
--- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
If a SYN/FIN-segment is on the write-queue, skb->len is 0, but the segment actually has been transmitted. end_seq and seq of the tcp_skb_cb in that case will indicate this difference.
We should not remove such segments from the write-queue as we might be in SYN_SENT-state and a retransmission-timer is running. When that one fires, packets_out will be 1, but the write-queue would be empty, resulting in:
[ 61.280214] ------------[ cut here ]------------ [ 61.281307] WARNING: CPU: 0 PID: 0 at net/ipv4/tcp_timer.c:429 tcp_retransmit_timer+0x18f9/0x2660 [ 61.283498] Modules linked in: [ 61.284084] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.142 #58 [ 61.285214] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011 [ 61.286644] task: ffffffff8401e1c0 task.stack: ffffffff84000000 [ 61.287758] RIP: 0010:tcp_retransmit_timer+0x18f9/0x2660 [ 61.288715] RSP: 0018:ffff88806ce07cb8 EFLAGS: 00010206 [ 61.289669] RAX: ffffffff8401e1c0 RBX: ffff88805c998b00 RCX: 0000000000000006 [ 61.290968] RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffff88805c9994d8 [ 61.292314] RBP: ffff88805c99919a R08: ffff88807fff901c R09: ffff88807fff9008 [ 61.293547] R10: ffff88807fff9017 R11: ffff88807fff9010 R12: ffff88805c998b30 [ 61.294834] R13: ffffffff844b9380 R14: 0000000000000000 R15: ffff88805c99930c [ 61.296086] FS: 0000000000000000(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000 [ 61.297523] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 61.298646] CR2: 00007f721da50ff8 CR3: 0000000004014002 CR4: 00000000001606f0 [ 61.299944] Call Trace: [ 61.300403] <IRQ> [ 61.300806] ? kvm_sched_clock_read+0x21/0x30 [ 61.301689] ? sched_clock+0x5/0x10 [ 61.302433] ? sched_clock_cpu+0x18/0x170 [ 61.303173] tcp_write_timer_handler+0x2c1/0x7a0 [ 61.304038] tcp_write_timer+0x13e/0x160 [ 61.304794] call_timer_fn+0x14a/0x5f0 [ 61.305480] ? tcp_write_timer_handler+0x7a0/0x7a0 [ 61.306364] ? __next_timer_interrupt+0x140/0x140 [ 61.307229] ? _raw_spin_unlock_irq+0x24/0x40 [ 61.308033] ? tcp_write_timer_handler+0x7a0/0x7a0 [ 61.308887] ? tcp_write_timer_handler+0x7a0/0x7a0 [ 61.309760] run_timer_softirq+0xc41/0x1080 [ 61.310539] ? trigger_dyntick_cpu.isra.33+0x180/0x180 [ 61.311506] ? ktime_get+0x13f/0x1c0 [ 61.312232] ? clockevents_program_event+0x10d/0x2f0 [ 61.313158] __do_softirq+0x20b/0x96b [ 61.313889] irq_exit+0x1a7/0x1e0 [ 61.314513] smp_apic_timer_interrupt+0xfc/0x4d0 [ 61.315386] apic_timer_interrupt+0x8f/0xa0 [ 61.316129] </IRQ>
Followed by a panic.
So, before removing an skb with skb->len == 0, let's make sure that the skb is really empty by checking the end_seq and seq.
This patch needs to be backported only to 4.14 and older (among those that applied the backport of fdfc5c8594c2).
Fixes: fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases") Cc: Eric Dumazet edumazet@google.com Cc: Jason Baron jbaron@akamai.com Cc: Vladimir Rutsky rutsky@google.com Cc: Soheil Hassas Yeganeh soheil@google.com Cc: Neal Cardwell ncardwell@google.com Signed-off-by: Christoph Paasch cpaasch@apple.com --- net/ipv4/tcp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index efe767e20d01..c1f59a53f68f 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -922,7 +922,8 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags) */ static void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb) { - if (skb && !skb->len) { + if (skb && !skb->len && + TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq) { tcp_unlink_write_queue(skb, sk); tcp_check_send_head(sk, skb); sk_wmem_free_skb(sk, skb);
Hello Greg & Sasha,
On Sat, Sep 14, 2019 at 12:20 AM Christoph Paasch cpaasch@apple.com wrote:
The above referenced commit has problems on older non-rbTree kernels.
AFAICS, the commit has only been backported to 4.14 up to now, but the commit that fdfc5c8594c2 is fixing (namely ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty"), is in v4.2.
Christoph Paasch (2): tcp: Reset send_head when removing skb from write-queue tcp: Don't dequeue SYN/FIN-segments from write-queue
I'm checking in on these two patches for the 4.14 stable-queue. Especially the panic fixed by patch 2 is pretty easy to trigger :-/
Thanks, Christoph
On Thu, Sep 19, 2019 at 08:21:43AM -0700, Christoph Paasch wrote:
Hello Greg & Sasha,
On Sat, Sep 14, 2019 at 12:20 AM Christoph Paasch cpaasch@apple.com wrote:
The above referenced commit has problems on older non-rbTree kernels.
AFAICS, the commit has only been backported to 4.14 up to now, but the commit that fdfc5c8594c2 is fixing (namely ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty"), is in v4.2.
Christoph Paasch (2): tcp: Reset send_head when removing skb from write-queue tcp: Don't dequeue SYN/FIN-segments from write-queue
I'm checking in on these two patches for the 4.14 stable-queue. Especially the panic fixed by patch 2 is pretty easy to trigger :-/
Dude, it's been less than a week. And it's the middle of the merge window. And it's the week after Plumbers and Maintainer's summit.
Relax...
I'll go queue these up now, but I am worried about them, given this total mess the backports seem to have caused.
Why isn't this needed in 4.9.y and 4.4.y also?
thanks,
greg k-h
On Thu, Sep 19, 2019 at 1:07 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Thu, Sep 19, 2019 at 08:21:43AM -0700, Christoph Paasch wrote:
Hello Greg & Sasha,
On Sat, Sep 14, 2019 at 12:20 AM Christoph Paasch cpaasch@apple.com wrote:
The above referenced commit has problems on older non-rbTree kernels.
AFAICS, the commit has only been backported to 4.14 up to now, but the commit that fdfc5c8594c2 is fixing (namely ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty"), is in v4.2.
Christoph Paasch (2): tcp: Reset send_head when removing skb from write-queue tcp: Don't dequeue SYN/FIN-segments from write-queue
I'm checking in on these two patches for the 4.14 stable-queue. Especially the panic fixed by patch 2 is pretty easy to trigger :-/
Dude, it's been less than a week. And it's the middle of the merge window. And it's the week after Plumbers and Maintainer's summit.
Relax...
Sorry!
I'll go queue these up now, but I am worried about them, given this total mess the backports seem to have caused.
Why isn't this needed in 4.9.y and 4.4.y also?
From what I see, commit fdfc5c8594c2 has not been backported to 4.9
and older. But I can imagine that eventually it will have to be backported (I guess Eric can confirm or deny). If it gets backported to 4.9 and 4.4, my 2 patches will have to come with them.
Christoph
linux-stable-mirror@lists.linaro.org