Hi Pavlos,
On Tue, Mar 20, 2018 at 01:01:38PM +0100, Pavlos Parissis wrote:
Hi,
We were upgrading a production system from 4.14.20 to 4.14.28 when we got the following crash, and I was wondering if anyone has seen a similar one:
[ 346.435832] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
[ 346.473216] IP: tcp_push+0x42/0x120
[ 346.489607] PGD 8000001838949067 P4D 8000001838949067 PUD 183894a067 PMD 0
[ 346.523318] Oops: 0002 [#1] SMP PTI
[ 346.540395] Modules linked in: sctp_diag sctp dccp_diag dccp udp_diag unix_diag tcp_diag inet_diag 8021q garp mrp input_leds joydev xfs libcrc32c loop vfat fat x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel iTCO_wdt crypto_simd glue_helper cryptd iTCO_vendor_support intel_cstate lpc_ich intel_rapl_perf mfd_core hpwdt i2c_i801 hpilo pcspkr wmi sg ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter shpchp ioatdma ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops sd_mod ttm crc32c_intel ixgbe mdio hpsa tg3 i40e drm dca ptp pps_core scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod dax
[ 346.854574] CPU: 5 PID: 1533 Comm: carbon-submissi Not tainted 4.14.28-1.el7.x86_64 #1
[ 346.892452] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 04/25/2017
[ 346.931641] task: ffff88183806c5c0 task.stack: ffffc90007ea8000
[ 346.959768] RIP: 0010:tcp_push+0x42/0x120
[ 346.978914] RSP: 0018:ffffc90007eabc78 EFLAGS: 00010246
[ 347.004199] RAX: 0000000000000000 RBX: 00000000000000c2 RCX: 0000000000000001
[ 347.038684] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88184ad0c800
[ 347.073236] RBP: ffffc90007eabc78 R08: 000000000000ffcb R09: 0000000000000257
[ 347.108070] R10: ffff88184ad0c958 R11: 000000000000ffcb R12: 00000000ffffffe0
[ 347.142006] R13: 00000000ffffffe0 R14: ffff88184ad0c800 R15: ffff88184ad0c958
[ 347.176290] FS: 00007fbad3ff7700(0000) GS:ffff880c4fd40000(0000) knlGS:0000000000000000
[ 347.215545] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 347.243091] CR2: 0000000000000038 CR3: 0000001838bd4004 CR4: 00000000001606e0
[ 347.276950] Call Trace:
[ 347.288526] tcp_sendmsg_locked+0x118/0xe50
[ 347.308321] tcp_sendmsg+0x2c/0x50
[ 347.324517] inet_sendmsg+0x37/0xb0
[ 347.341379] sock_sendmsg+0x3e/0x50
[ 347.358018] sock_write_iter+0x85/0xf0
[ 347.376095] __vfs_write+0xfb/0x160
[ 347.392961] vfs_write+0xb2/0x1b0
[ 347.408915] ? syscall_trace_enter+0x1cd/0x2b0
[ 347.430458] SyS_write+0x55/0xc0
[ 347.446047] do_syscall_64+0x79/0x1b0
[ 347.463757] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 347.488102] RIP: 0033:0x7fbae295a6ad
[ 347.505100] RSP: 002b:00007fbad3ff6e60 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[ 347.541196] RAX: ffffffffffffffda RBX: 00000000000000c7 RCX: 00007fbae295a6ad
[ 347.575649] RDX: 00000000000000c7 RSI: 00007fbabc06ab60 RDI: 0000000000000013
[ 347.609762] RBP: 000000000000000a R08: 00007fbabc06ab60 R09: 00000000022a58f0
[ 347.643653] R10: 0000000000001a05 R11: 0000000000000293 R12: 00007fbabc06ab60
[ 347.677684] R13: 000000000200f040 R14: 00000000022a1840 R15: 00000000000000cb
[ 347.712207] Code: 48 8b 87 60 01 00 00 4c 8d 97 58 01 00 00 41 89 d3 ba 00 00 00 00 49 39 c2 48 0f 44 c2 89 f2 81 e2 00 80 00 00 0f 85 af 00 00 00 <80> 48 38 08 44 8b 8f 74 06 00 00 44 89 8f 7c 06 00 00 83 e6 01
[ 347.803312] RIP: tcp_push+0x42/0x120 RSP: ffffc90007eabc78
[ 347.829666] CR2: 0000000000000038
[ 347.845805] ---[ end trace 031807a627822772 ]---
[ 347.873681] Kernel panic - not syncing: Fatal exception
[ 347.898899] Kernel Offset: disabled
[ 347.920580] Rebooting in 70 seconds..
Interesting, I also experienced a spontaneous panic on my home firewall after upgrading it from 4.14.10 to 4.14.27, but I didn't have any symbols in the traces, so the dump wasn't usable. All I know is that it was a NULL dereference with a very small offset as well. It may be totally unrelated, but the coincidence is troubling, especially since I hadn't had a panic in -stable for a very long time.
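For reference, the small offset in the trace above can be read straight from the "Code:" line, where the byte in angle brackets marks the faulting instruction. Below is a minimal sketch of that reading, not a general disassembler: it only handles the single x86-64 encoding seen here (opcode 0x80 with an 8-bit displacement and an 8-bit immediate), and the names CODE_LINE, faulting_bytes and decode_grp1_imm8 are made up for illustration.

```python
# Bytes copied from the "Code:" line of the oops; <80> marks the faulting RIP.
CODE_LINE = (
    "48 8b 87 60 01 00 00 4c 8d 97 58 01 00 00 41 89 d3 ba 00 00 00 00 "
    "49 39 c2 48 0f 44 c2 89 f2 81 e2 00 80 00 00 0f 85 af 00 00 00 "
    "<80> 48 38 08 44 8b 8f 74 06 00 00 44 89 8f 7c 06 00 00 83 e6 01"
)


def faulting_bytes(code_line):
    """Return the bytes starting at the <xx> marker (hypothetical helper)."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_grp1_imm8(insn):
    """Decode only 'OP byte [reg + disp8], imm8' (opcode 0x80, ModRM mod=01).

    Anything else (SIB byte, other opcodes, prefixes) is rejected, since this
    is a sketch for one encoding and not a real decoder.
    """
    ops = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]
    regs = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
    opcode, modrm, disp8, imm8 = insn[0], insn[1], insn[2], insn[3]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x80 or mod != 1 or rm == 4:
        raise ValueError("encoding not handled by this sketch")
    if disp8 > 127:  # disp8 is a signed 8-bit displacement
        disp8 -= 256
    return ops[reg], regs[rm], disp8, imm8


op, base, disp, imm = decode_grp1_imm8(faulting_bytes(CODE_LINE))
print(f"{op} byte [{base}{disp:+#x}], {imm:#x}")
```

So the faulting instruction is a byte-wide "or" into memory at RAX+0x38. With RAX = 0 in the register dump, that gives address 0x38, which matches CR2, and the write is consistent with "Oops: 0002" (write bit set, page not present).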
Ah, I've just seen your second e-mail. So if it's the same issue as the one addressed by the patch you pointed to, the bug is 4.14-only, and so is the fix. It will likely come with the next batch of networking backports.
Cheers, Willy