On 11/21/2017 07:39 PM, Jens Axboe wrote:
> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>> Bisect points to
>>>>>>
>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>> Date: Mon Jun 26 12:20:57 2017 +0200
>>>>>>
>>>>>> blk-mq: Create hctx for each present CPU
>>>>>>
>>>>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>
>>>>>> Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>> of churn due to frequent soft offline / online operations. Instead
>>>>>> allocate one for each present CPU to avoid this and dramatically simplify
>>>>>> the code.
>>>>>>
>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>> Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>> Cc: Keith Busch <keith.busch@intel.com>
>>>>>> Cc: linux-block@vger.kernel.org
>>>>>> Cc: linux-nvme@lists.infradead.org
>>>>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>> Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>> Cc: Mike Galbraith <efault@gmx.de>
>>>>>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
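
(For context, the mapping change described above boils down to roughly the
following. This is a simplified sketch, not the actual patch: the real
function is blk_mq_map_queues() and operates on a struct blk_mq_tag_set,
and the modulo assignment below stands in for the real topology-aware
mapping.)

#include <linux/cpumask.h>

/*
 * Sketch: build the CPU -> hardware queue map over all *present*
 * CPUs instead of only the *online* ones, so that soft offline/online
 * cycles no longer force the map to be rebuilt.
 */
static void map_queues_sketch(unsigned int *mq_map, unsigned int nr_queues)
{
	unsigned int cpu;

	for_each_present_cpu(cpu)	/* was: for_each_online_cpu(cpu) */
		mq_map[cpu] = cpu % nr_queues;
}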
>>>>>
>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>> take a look.
>>>>
>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>> that if I offline a few CPUs here and register a queue, those still show
>>>> up as present (just offline) and get mapped accordingly.
>>>>
>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>> up as present and it gets hotplugged, then I can see how this condition
>>>> would trigger. What environment are you running this in? We might have
>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>> for a dead cpu and handle that.
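
(A re-introduced hotplug notifier would presumably look something like
the sketch below, using the current cpuhp API. blk_mq_hctx_notify_online()
is a hypothetical callback name here; only the dead-CPU handling exists
in the tree at this point.)

#include <linux/cpuhotplug.h>

/* Hypothetical online callback: map the newly arrived CPU to a
 * hardware queue so its software queue gets serviced. */
static int blk_mq_hctx_notify_online(unsigned int cpu)
{
	/* remap cpu -> hctx here */
	return 0;
}

/* Teardown side, analogous to the existing dead-CPU handling that
 * requeues requests left on the departing CPU's software queue. */
static int blk_mq_hctx_notify_offline(unsigned int cpu)
{
	return 0;
}

static int __init blk_mq_cpuhp_sketch_init(void)
{
	int ret;

	/* Dynamic state in the online section: the first callback runs
	 * when a CPU comes up, the second when it goes down. */
	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "block/mq:online",
				blk_mq_hctx_notify_online,
				blk_mq_hctx_notify_offline);
	return ret < 0 ? ret : 0;
}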
>>>
>>> I am not doing a hot unplug and then a replug; I use KVM and add a
>>> previously unavailable CPU.
>>>
>>> In libvirt/virsh speak:
>>> <vcpu placement='static' current='1'>4</vcpu>
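
(With that XML the guest boots with one vCPU and can grow to four; the
hot-add that triggers this would be something along the lines of
"virsh setvcpus <domain> 2 --live".)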
>>
>> So that's why we run into problems. It's not present when we load the device,
>> but becomes present and online afterwards.
>>
>> Christoph, we used to handle this just fine, your patch broke it.
>>
>> I'll see if I can come up with an appropriate fix.
>
> Can you try the below?

It does prevent the crash, but it seems that the new CPU is not used
by mq after the hotplug.

Output with 2 CPUs:
/sys/kernel/debug/block/vda
/sys/kernel/debug/block/vda/hctx0
/sys/kernel/debug/block/vda/hctx0/cpu0
/sys/kernel/debug/block/vda/hctx0/cpu0/completed
/sys/kernel/debug/block/vda/hctx0/cpu0/merged
/sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
/sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
/sys/kernel/debug/block/vda/hctx0/active
/sys/kernel/debug/block/vda/hctx0/run
/sys/kernel/debug/block/vda/hctx0/queued
/sys/kernel/debug/block/vda/hctx0/dispatched
/sys/kernel/debug/block/vda/hctx0/io_poll
/sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
/sys/kernel/debug/block/vda/hctx0/sched_tags
/sys/kernel/debug/block/vda/hctx0/tags_bitmap
/sys/kernel/debug/block/vda/hctx0/tags
/sys/kernel/debug/block/vda/hctx0/ctx_map
/sys/kernel/debug/block/vda/hctx0/busy
/sys/kernel/debug/block/vda/hctx0/dispatch
/sys/kernel/debug/block/vda/hctx0/flags
/sys/kernel/debug/block/vda/hctx0/state
/sys/kernel/debug/block/vda/sched
/sys/kernel/debug/block/vda/sched/dispatch
/sys/kernel/debug/block/vda/sched/starved
/sys/kernel/debug/block/vda/sched/batching
/sys/kernel/debug/block/vda/sched/write_next_rq
/sys/kernel/debug/block/vda/sched/write_fifo_list
/sys/kernel/debug/block/vda/sched/read_next_rq
/sys/kernel/debug/block/vda/sched/read_fifo_list
/sys/kernel/debug/block/vda/write_hints
/sys/kernel/debug/block/vda/state
/sys/kernel/debug/block/vda/requeue_list
/sys/kernel/debug/block/vda/poll_stat
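
(For comparison, if the hot-added CPU were picked up by mq, one would
expect a per-CPU directory for it under one of the hardware queues,
e.g. a hypothetical /sys/kernel/debug/block/vda/hctx0/cpu1 subtree
next to the cpu0 one above.)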

Try this, basically just a revert.