On 01.10.18 at 11:03, jgross@suse.com wrote:
On 01/10/2018 10:57, Jan Beulich wrote:
On 01.10.18 at 09:16, jgross@suse.com wrote:
xen_qlock_wait() isn't safe for nested calls due to interrupts. A call of xen_qlock_kick() might be ignored in case a deeper nesting level was active right before the call of xen_poll_irq():
CPU 1: CPU 2: spin_lock(lock1) spin_lock(lock1) -> xen_qlock_wait() -> xen_clear_irq_pending() Interrupt happens spin_unlock(lock1) -> xen_qlock_kick(CPU 2) spin_lock_irqsave(lock2) spin_lock_irqsave(lock2) -> xen_qlock_wait() -> xen_clear_irq_pending() clears kick for lock1 -> xen_poll_irq() spin_unlock_irq_restore(lock2) -> xen_qlock_kick(CPU 2) wakes up spin_unlock_irq_restore(lock2) IRET resumes in xen_qlock_wait() -> xen_poll_irq() never wakes up
The solution is to disable interrupts in xen_qlock_wait() and not to poll for the irq in case xen_qlock_wait() is called in nmi context.
Are precautions against NMI really worthwhile? Locks acquired both in NMI context as well as outside of it are liable to deadlock anyway, aren't they?
The locks don't need to be the same. A NMI-only lock tried to be acquired with xen_qlock_wait() for another lock having been interrupted by the NMI will be enough to risk the issue.
Ah, right. In which case Reviewed-by: Jan Beulich jbeulich@suse.com
Jan