On Fri, Nov 10, 2017 at 2:58 AM, Patrick McLean <chutzpah@gentoo.org> wrote:
On 2017-11-09 12:04 PM, Linus Torvalds wrote:
On Thu, Nov 9, 2017 at 11:51 AM, Patrick McLean <chutzpah@gentoo.org> wrote:
We will check our fork against the in-kernel cp210x driver to make sure we didn't miss anything, but it seems odd we would be hitting the issue so consistently in the NFS code path, rather than somewhere in USB, serial, or GPIO paths.
So since you seem to be able to reproduce this _reasonably_ easily, it's definitely worth checking that it still reproduces even without the gcc plugins.
I haven't been able to reproduce it with RANDSTRUCT disabled (and structleak enabled). I will keep trying for a little while more, but evidence seems to be pointing to that.
Something must have changed since 4.13.8 to trigger this though. This did not crop up at all until we tried 4.13.11, where we saw it pretty quickly. We have a pretty large number of machines running 4.13.6 with RANDSTRUCT enabled and running the same workload with many more clients, and have not seen this bug at all.
I couldn't find anything overly suspicious between 4.13.8 and 4.13.11; see the full list of commits since 4.13.6 at https://pastebin.com/AcxBZR7H
The ones I couldn't immediately rule out (but no smoking gun either) would be:
9970679f497a x86/cpu/AMD: Apply the Erratum 688 fix when the BIOS doesn't
ca6711747c5a assoc_array: Fix a buggy node-splitting case
2fbb8bf749b5 xfs: move two more RT specific functions into CONFIG_XFS_RT
1e1427356d8d xfs: trim writepage mapping to within eof
9df9b634f637 xfs: cancel dirty pages on invalidation
cd3f0bee1b94 xfs: handle error if xfs_btree_get_bufs fails
58cfca25f540 xfs: reinit btree pointer on attr tree inactivation walk
659a9989b68b xfs: don't change inode mode if ACL update fails
88ccd3b6884a xfs: move more RT specific code under CONFIG_XFS_RT
5733ebee586c xfs: Don't log uninitialised fields in inode structures
199a7448c097 xfs: handle racy AIO in xfs_reflink_end_cow
ee5d69c908a1 xfs: always swap the cow forks when swapping extents
2888145444f1 xfs: Capture state of the right inode in xfs_iflush_done
d0fa252b207f xfs: perag initialization should only touch m_ag_max_usable for AG 0
8da6f7fbe43c xfs: update i_size after unwritten conversion in dio completion
a9eac76e958b xfs: report zeroed or not correctly in xfs_zero_range()
67d51bdcc9f4 fs/xfs: Use %pS printk format for direct addresses
2bf3122f2130 xfs: evict CoW fork extents when performing finsert/fcollapse
a58a0826656d xfs: don't unconditionally clear the reflink flag on zero-block files
c61e905e0ee2 iomap_dio_rw: Allocate AIO completion queue before submitting dio
7610595830bb pkcs7: Prevent NULL pointer dereference, since sinfo is not always set.
24a33a0c96f3 KEYS: don't let add_key() update an uninstantiated key
ad4aa448c9b2 FS-Cache: fix dereference of NULL user_key_payload
f45b8fe12221 KEYS: Fix race between updating and finding a negative key
e56be12012c2 ecryptfs: fix dereference of NULL user_key_payload
363ce0b01fe0 fscrypt: fix dereference of NULL user_key_payload
cc757d55c903 lib/digsig: fix dereference of NULL user_key_payload
f5e97214207f x86/microcode/intel: Disable late loading on model 79
7b5e405b7878 Revert "tools/power turbostat: stop migrating, unless '-m'"
8b1e10789c84 KEYS: encrypted: fix dereference of NULL user_key_payload
a258a35a9930 mm: page_vma_mapped: ensure pmd is loaded with READ_ONCE outside of lock
e47a56cbf519 usb: xhci: Handle error condition in xhci_stop_device()
d53911e63388 usb: xhci: Reset halted endpoint if trb is noop
d1120fe38b3f xhci: Cleanup current_cmd in xhci_cleanup_command_queue()
301d332138d2 xhci: Identify USB 3.1 capable hosts by their port protocol capability
015e94ead900 usb: hub: Allow reset retry for USB2 devices on connect bounce
1916547b28bd usb: quirks: add quirk for WORLDE MINI MIDI keyboard
e3a038930502 usb: cdc_acm: Add quirk for Elatec TWN3
c2110c8dea7a USB: serial: metro-usb: add MS7820 device id
775462fd5c53 USB: core: fix out-of-bounds access bug in usb_get_bos_descriptor()
a9fdf6354267 USB: devio: Revert "USB: devio: Don't corrupt user memory"
However, you mentioned cp210x, and I noticed related changes in 4.13.8:
e21045a22395 USB: serial: console: fix use-after-free after failed setup
6c7cb458405e USB: serial: console: fix use-after-free on disconnect
4b3e3c7282d6 USB: serial: qcserial: add Dell DW5818, DW5819
c796da1d110f USB: serial: option: add support for TP-Link LTE module
e7e0b4b39663 USB: serial: cp210x: add support for ELV TFD500
1ae2c690f967 USB: serial: cp210x: fix partnum regression
78a02c93648e USB: serial: ftdi_sio: add id for Cypress WICED dev board
You could try reverting those seven; if that makes a difference, it would point to your forked driver.
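A minimal sketch of that experiment, assuming a checkout of the 4.13.y stable tree in which the abbreviated hashes above resolve. The `commits` variable and the `revert_cmds` helper are only illustrative names, and the script prints the commands rather than running them. The order follows the list above and may need adjusting if one of the reverts conflicts.

```shell
#!/bin/sh
# The seven USB serial commits listed above (abbreviated hashes as quoted
# in this mail; assumed to resolve in a 4.13.y stable checkout).
commits="e21045a22395 6c7cb458405e 4b3e3c7282d6 c796da1d110f e7e0b4b39663 1ae2c690f967 78a02c93648e"

# Hypothetical helper: print one "git revert" command per commit so the
# list can be reviewed before anything touches the tree.
revert_cmds() {
    for c in $commits; do
        echo "git revert --no-edit $c"
    done
}

revert_cmds
```

Run from inside the kernel tree, `revert_cmds | sh` would apply the reverts one commit at a time; rebuilding with the same config and rerunning the NFS workload then shows whether the crash still reproduces.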
Arnd