On Tue, Oct 01, 2019 at 01:37:24PM -0400, Boris Ostrovsky wrote:
On 10/1/19 11:03 AM, Juergen Gross wrote:
In case a user process using xenbus has open transactions and is killed, e.g. via ctrl-C, the following cleanup of the allocated resources might result in a deadlock due to trying to end a transaction in the xenbus worker thread:
[ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds.
[ 2551.492215] Tainted: P OE 5.0.0-29-generic #5
[ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2551.528585] xenbus D 0 37 2 0x80000080
[ 2551.528590] Call Trace:
[ 2551.528603] __schedule+0x2c0/0x870
[ 2551.528606] ? _cond_resched+0x19/0x40
[ 2551.528632] schedule+0x2c/0x70
[ 2551.528637] xs_talkv+0x1ec/0x2b0
[ 2551.528642] ? wait_woken+0x80/0x80
[ 2551.528645] xs_single+0x53/0x80
[ 2551.528648] xenbus_transaction_end+0x3b/0x70
[ 2551.528651] xenbus_file_free+0x5a/0x160
[ 2551.528654] xenbus_dev_queue_reply+0xc4/0x220
[ 2551.528657] xenbus_thread+0x7de/0x880
[ 2551.528660] ? wait_woken+0x80/0x80
[ 2551.528665] kthread+0x121/0x140
[ 2551.528667] ? xb_read+0x1d0/0x1d0
[ 2551.528670] ? kthread_park+0x90/0x90
[ 2551.528673] ret_from_fork+0x35/0x40
Fix this by doing the cleanup via a workqueue instead.
Reported-by: James Dingwall <james@dingwall.me.uk>
Fixes: fd8aa9095a95c ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
Cc: <stable@vger.kernel.org> # 4.11
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: James Dingwall <james@dingwall.me.uk>
This patch does resolve the observed issue, although for my test case (extreme and not representative of our normal workload) the worker still gets blocked for some time if the xenstore-rm is interrupted and no concurrent xenstore commands can run. I assume that the worker completes the rm and then does a rollback in the background rather than being interrupted early as a result of the userspace program being terminated.
Thanks, James
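
For readers following the thread, the idea in the commit message (do not end transactions from the thread that delivers replies, hand the cleanup to a separate worker instead) can be illustrated with a small userspace sketch. This is not the kernel patch: it uses pthreads rather than the kernel workqueue API, and every name in it (file_priv, reply_thread, worker_thread, end_transaction, run_demo) is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-file state; NOT the kernel's structure. */
struct file_priv {
    int open_transactions;
    bool cleaned_up;
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static struct file_priv *pending; /* one-slot stand-in for a work queue */
static int requests;              /* requests sent so far */
static int replies;               /* replies delivered so far */
static bool stop;

/* Send an "end transaction" request and block until its reply arrives.
 * Only reply_thread() delivers replies, so calling this from reply_thread()
 * itself would block forever. That is the deadlock being illustrated. */
static void end_transaction(void)
{
    pthread_mutex_lock(&lock);
    int mine = ++requests;
    pthread_cond_broadcast(&cond);        /* wake the reply thread */
    while (replies < mine)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
}

/* The only thread that delivers replies. On the "last reference dropped"
 * event it queues the cleanup instead of running it inline. */
static void *reply_thread(void *arg)
{
    pthread_mutex_lock(&lock);
    pending = arg;                        /* defer: hand over to the worker */
    pthread_cond_broadcast(&cond);
    while (!stop) {
        if (replies < requests) {
            replies = requests;           /* deliver outstanding replies */
            pthread_cond_broadcast(&cond);
        } else {
            pthread_cond_wait(&cond, &lock);
        }
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Runs the deferred cleanup; it may block while waiting for replies. */
static void *worker_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (pending == NULL)
        pthread_cond_wait(&cond, &lock);
    struct file_priv *p = pending;
    pending = NULL;
    pthread_mutex_unlock(&lock);

    while (p->open_transactions > 0) {
        end_transaction();
        p->open_transactions--;
    }
    p->cleaned_up = true;

    pthread_mutex_lock(&lock);
    stop = true;                          /* let the reply thread exit */
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Returns 0 if all open transactions were ended by the worker. */
int run_demo(int open_transactions)
{
    struct file_priv p = { open_transactions, false };
    pthread_t rt, wt;

    pending = NULL;
    requests = replies = 0;
    stop = false;

    pthread_create(&wt, NULL, worker_thread, NULL);
    pthread_create(&rt, NULL, reply_thread, &p);
    pthread_join(wt, NULL);
    pthread_join(rt, NULL);
    return (p.cleaned_up && p.open_transactions == 0) ? 0 : -1;
}
```

Calling run_demo(2) from a main() (built with -pthread where the toolchain needs it) should return 0. Replacing the hand-off in reply_thread() with a direct call to end_transaction() would leave that thread waiting for a reply that only it can deliver, which mirrors the hung task in the trace above.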