On Tue, 2018-02-13 at 06:46 +0100, Paolo Valente wrote:
> Any chance your gremlin grants you some OOPS?
I crash dumped it, though I'm unlikely to have time to even figure out what the hell _I'm_ doing in IO land, much less what it's doing :)
If you want it and have a spot for it, I could try an overnight upload (vmcore=14GB). Hung tasks are not in short supply.
crash> ps | grep UN
     4      2   0  ffff880187f12940  UN   0.0       0      0  [kworker/0:0H]
     6      2   0  ffff880187f144c0  UN   0.0       0      0  [mm_percpu_wq]
     8      2   1  ffff880187f16040  UN   0.0       0      0  [rcu_sched]
     9      2   4  ffff880187f16e00  UN   0.0       0      0  [rcu_bh]
    18      2   1  ffff88018740d280  UN   0.0       0      0  [kworker/1:0H]
    24      2   2  ffff880187582940  UN   0.0       0      0  [kworker/2:0H]
    30      2   3  ffff8801875e8000  UN   0.0       0      0  [kworker/3:0H]
    36      2   4  ffff8801875ed280  UN   0.0       0      0  [kworker/4:0H]
    42      2   5  ffff88018769a940  UN   0.0       0      0  [kworker/5:0H]
    48      2   6  ffff880187700000  UN   0.0       0      0  [kworker/6:0H]
    54      2   7  ffff880187705280  UN   0.0       0      0  [kworker/7:0H]
    57      2   1  ffff8801877b1b80  UN   0.0       0      0  [netns]
    58      2   0  ffff8801877b2940  UN   0.0       0      0  [kworker/0:1]
    62      2   2  ffff8801877b6040  UN   0.0       0      0  [writeback]
    66      2   3  ffff88041e881b80  UN   0.0       0      0  [crypto]
    67      2   5  ffff88041e882940  UN   0.0       0      0  [kintegrityd]
    68      2   6  ffff88041e883700  UN   0.0       0      0  [kblockd]
    69      2   2  ffff88041e8844c0  UN   0.0       0      0  [kworker/2:1]
    73      2   4  ffff88041e910dc0  UN   0.0       0      0  [edac-poller]
    74      2   7  ffff88041e911b80  UN   0.0       0      0  [devfreq_wq]
    75      2   5  ffff8801877b0000  UN   0.0       0      0  [watchdogd]
    91      2   3  ffff8803fa08d280  UN   0.0       0      0  [nvme-wq]
    92      2   0  ffff8803fa08e040  UN   0.0       0      0  [ipv6_addrconf]
   160      2   1  ffff88041e886e00  UN   0.0       0      0  [kaluad]
   163      2   5  ffff8803fa346e00  UN   0.0       0      0  [kmpath_rdacd]
   167      2   7  ffff8803fa346040  UN   0.0       0      0  [kmpathd]
   168      2   3  ffff8803fa345280  UN   0.0       0      0  [kmpath_handlerd]
   290      2   7  ffff8803f784a940  UN   0.0       0      0  [kworker/7:2]
   356      2   6  ffff8803f677ee00  UN   0.0       0      0  [ata_sff]
   526      2   3  ffff8803fc678000  UN   0.0       0      0  [scsi_tmf_0]
   528      2   0  ffff8803fc679b80  UN   0.0       0      0  [scsi_tmf_1]
   530      2   3  ffff8803fc67b700  UN   0.0       0      0  [scsi_tmf_2]
   532      2   7  ffff8803fc67d280  UN   0.0       0      0  [scsi_tmf_3]
   534      2   7  ffff8803fc67ee00  UN   0.0       0      0  [scsi_tmf_4]
   536      2   3  ffff8803f6848dc0  UN   0.0       0      0  [scsi_tmf_5]
   540      2   2  ffff8803f684c4c0  UN   0.0       0      0  [kworker/u16:6]
   541      2   5  ffff8803f684d280  UN   0.0       0      0  [nvkm-disp]
   543      2   4  ffff8803f684ee00  UN   0.0       0      0  [ttm_swap]
   546      2   3  ffff8803f3ae8000  UN   0.0       0      0  [kworker/3:1H]
   547      2   2  ffff8803f3ae8dc0  UN   0.0       0      0  [kworker/2:1H]
   550      2   6  ffff8803f3ae9b80  UN   0.0       0      0  [kworker/6:1H]
   552      2   5  ffff8803f3aeb700  UN   0.0       0      0  [scsi_tmf_6]
   591      2   1  ffff8803f3aeee00  UN   0.0       0      0  [ext4-rsv-conver]
   601      2   5  ffff8803f880ee00  UN   0.0       0      0  [kworker/5:1H]
   632      2   1  ffff8803fa023700  UN   0.0       0      0  [kworker/1:1H]
   643      2   0  ffff8803fa025280  UN   0.0       0      0  [kworker/0:1H]
   659      2   4  ffff8803f677d280  UN   0.0       0      0  [kworker/4:1H]
   663      2   7  ffff8803f6779b80  UN   0.0       0      0  [kworker/7:1H]
   701      2   0  ffff8803f9eea940  UN   0.0       0      0  [kworker/0:2]
   702      2   1  ffff8803f9eeb700  UN   0.0       0      0  [rpciod]
   703      2   7  ffff8803f9eee040  UN   0.0       0      0  [xprtiod]
  1020      2   4  ffff8803fc1c44c0  UN   0.0       0      0  [acpi_thermal_pm]
  1170      2   3  ffff8803f1708dc0  UN   0.0       0      0  [jbd2/sdd1-8]
  1171      2   6  ffff8803f170c4c0  UN   0.0       0      0  [ext4-rsv-conver]
  1176      2   3  ffff8803fac58000  UN   0.0       0      0  [ext4-rsv-conver]
  4233      2   5  ffff8803a8be8dc0  UN   0.0       0      0  [kworker/5:1]
  8808  10134   6  ffff88037e789b80  UN   0.0    8104   3248  make
 15603      2   5  ffff88041e886040  UN   0.0       0      0  [kworker/5:0]
 15815      2   1  ffff88037e788dc0  UN   0.0       0      0  [kworker/1:0]
 19344   8808   6  ffff8803c6658dc0  UN   0.0    7720   2960  make
 19633  19632   3  ffff88041e916e00  UN   0.0    9924   2296  gcc
 19651  19649   3  ffff8803a8bee040  UN   0.0    9896   2276  gcc
 19683  19682   3  ffff8803b6e29b80  UN   0.0    9904   2300  gcc
 19709  19687   2  ffff8803a8a41b80  UN   0.0    5852   1868  rm
 19728  19727   2  ffff8803a8be9b80  UN   0.0    9932   2256  gcc
 19732  19731   5  ffff880396dc0dc0  UN   0.0    9920   2296  gcc
 19739  19606   4  ffff8803f394d280  UN   0.0    4560   1608  rm
 19740      1   5  ffff8803b6ec2940  UN   0.1   49792  19732  cc1
 19744  19656   2  ffff880396ceb700  UN   0.0    4524   1828  recordmcount
 19922      1   3  ffff8803a8a40dc0  UN   0.1  388140  13996  pool
 20278      2   2  ffff880396dc2940  UN   0.0       0      0  [kworker/u16:1]
 20505      2   2  ffff8803b6eb1b80  UN   0.0       0      0  [kworker/2:2]
 20648      2   3  ffff88037e78a940  UN   0.0       0      0  [kworker/3:1]
 20766      2   6  ffff880187f11b80  UN   0.0       0      0  [kworker/u16:0]
 21007      2   6  ffff8803c97844c0  UN   0.0       0      0  [kworker/6:0]
 21385      2   4  ffff8803a8a46040  UN   0.0       0      0  [kworker/4:1]
 21562      2   1  ffff8803f7b61b80  UN   0.0       0      0  [kworker/1:2]
 21730      2   7  ffff8803d6525280  UN   0.0       0      0  [kworker/7:3]
 22775      2   4  ffff880396dbe040  UN   0.0       0      0  [kworker/4:2]
 22879      2   6  ffff880396db8000  UN   0.0       0      0  [kworker/6:1]
 22937      2   7  ffff880396dbc4c0  UN   0.0       0      0  [kworker/7:0]
 22952      2   4  ffff880396dbee00  UN   0.0       0      0  [kworker/u16:2]
 23004      2   5  ffff880396dbb700  UN   0.0       0      0  [kworker/5:2]
 23070      2   1  ffff8803d6520dc0  UN   0.0       0      0  [kworker/1:1]
 23152   4478   2  ffff8803a8a444c0  UN   0.0    5748   1656  sync
 31874      2   3  ffff88018740c4c0  UN   0.0       0      0  [kworker/3:0]
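With this many stuck tasks, it can help to separate the user-space victims from the kernel threads at a glance. What follows is a minimal sketch, not part of any existing tool, and it rests on two assumptions: that crash's `ps` columns are in the default order (PID, PPID, CPU, TASK, ST, %MEM, VSZ, RSS, COMM), and that kernel threads are the tasks whose COMM appears in square brackets. The names `Task`, `parse_ps` and `summarize` are hypothetical helpers.

```python
from collections import Counter
from typing import NamedTuple


class Task(NamedTuple):
    pid: int
    ppid: int
    cpu: int
    comm: str

    @property
    def is_kthread(self) -> bool:
        # Assumption: crash prints kernel-thread names in square brackets.
        return self.comm.startswith("[") and self.comm.endswith("]")


def parse_ps(text: str, state: str = "UN") -> list[Task]:
    """Parse crash `ps` lines, keeping only tasks in the given state."""
    tasks = []
    for line in text.splitlines():
        fields = line.split(None, 8)  # COMM is the 9th column, may contain spaces
        if len(fields) < 9 or fields[4] != state:
            continue  # skip the crash> prompt, headers, or other states
        tasks.append(Task(int(fields[0]), int(fields[1]), int(fields[2]), fields[8]))
    return tasks


def summarize(tasks: list[Task]) -> dict:
    """Count all tasks, kernel threads, and user-space tasks by name."""
    user = [t for t in tasks if not t.is_kthread]
    return {
        "total": len(tasks),
        "kthreads": len(tasks) - len(user),
        "user": Counter(t.comm for t in user),
    }


sample = """\
crash> ps | grep UN
     4      2   0  ffff880187f12940  UN   0.0       0      0  [kworker/0:0H]
  8808  10134   6  ffff88037e789b80  UN   0.0    8104   3248  make
 19633  19632   3  ffff88041e916e00  UN   0.0    9924   2296  gcc
 23152   4478   2  ffff8803a8a444c0  UN   0.0    5748   1656  sync
"""

summary = summarize(parse_ps(sample))
print(summary["total"], summary["kthreads"], dict(summary["user"]))
# 4 1 {'make': 1, 'gcc': 1, 'sync': 1}
```

Fed the full listing, this should report 86 tasks in D state. Only 13 of them are user-space: make, gcc, rm, cc1, recordmcount, pool and sync. The rest are kernel threads, most of them kworkers.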
> BTW, the only other commit in that series that affects bfq interaction
> with the rest of the system is:
> a52a69ea89dc block, bfq: limit tags for writes and async I/O
I'll see if that changes things on the side. Work beckons.
> And the other commits that do something beyond changing some
> calculation are:
> 0d52af590552 block, bfq: release oom-queue ref to root group on exit
> 52257ffbfcaf block, bfq: put async queues for root bfq groups too
>
> Thanks,
> Paolo
-Mike