On Wed, Nov 08, 2017 at 01:41:06PM +0100, Rafael J. Wysocki wrote:
On Wednesday, November 8, 2017 1:31:22 PM CET Ville Syrjälä wrote:
On Wed, Nov 08, 2017 at 01:23:56PM +0100, Rafael J. Wysocki wrote:
On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki rjw@rjwysocki.net wrote:
On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala ville.syrjala@linux.intel.com wrote:
From: Ville Syrjälä ville.syrjala@linux.intel.com
acpi_remove_pm_notifier() ends up calling flush_workqueue() while holding acpi_pm_notifier_lock, and that same lock is taken by by the work via acpi_pm_notify_handler(). This can deadlock.
OK, good catch!
[cut]
Cc: stable@vger.kernel.org Cc: Rafael J. Wysocki rafael.j.wysocki@intel.com Cc: Len Brown lenb@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Tejun Heo tj@kernel.org Cc: Ingo Molnar mingo@kernel.org Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications") Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com
drivers/acpi/device_pm.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index fbcc73f7a099..18af71057b44 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
#ifdef CONFIG_PM static DEFINE_MUTEX(acpi_pm_notifier_lock); +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
void acpi_pm_wakeup_event(struct device *dev) { @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev, if (!dev && !func) return AE_BAD_PARAMETER;
mutex_lock(&acpi_pm_notifier_lock);
mutex_lock(&acpi_pm_notifier_install_lock); if (adev->wakeup.flags.notifier_present) goto out;
adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
adev->wakeup.context.dev = dev;
adev->wakeup.context.func = func;
But this doesn't look good to me.
notifier_present should be checked under acpi_pm_notifier_lock.
Actually, acpi_install_notify_handler() itself need not be called under the lock, because it does sufficient checks of its own.
So say you do
mutex_lock(&acpi_pm_notifier_lock);
if (adev->wakeup.context.dev) goto out;
adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev)); adev->wakeup.context.dev = dev; adev->wakeup.context.func = func;
mutex_unlock(&acpi_pm_notifier_lock);
status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY, acpi_pm_notify_handler, NULL); if (ACPI_FAILURE(status)) goto out;
mutex_lock(&acpi_pm_notifier_lock);
And here you just set notifier_present under acpi_pm_notifier_lock.
adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
adev->wakeup.context.dev = dev;
adev->wakeup.context.func = func; adev->wakeup.flags.notifier_present = true;
mutex_unlock(&acpi_pm_notifier_lock);
out:
mutex_unlock(&acpi_pm_notifier_lock);
mutex_unlock(&acpi_pm_notifier_install_lock); return status;
}
Then on removal you can clear notifier_present first and drop the lock around the acpi_remove_notify_handler() call and nothing bad will happen.
If you call acpi_add_pm_notifier() twice in parallel, the first instance will set context.dev and the second one will see it set and bail out. The first instance will then do the rest.
If you call acpi_remove_pm_notifier() twice in a row, the first instance will see notifier_present set and will clear it, so the second one will see notifier_present unset and it will bail out.
Now, if you call acpi_remove_pm_notifier() in parallel with acpi_add_pm_notifier(), either the former will see notifier_present unset and bail out, or the latter will see context.dev unset and bail out.
It doesn't look like the outer lock is needed, or have I missed anything?
So something like the below (totally untested) should work too, shouldn't it?
There is a problem if a device is removed while acpi_add_pm_notifier() is still in progress, in which case with my patch the acpi_remove_pm_notifier() called from the removal path may bail out prematurely (that doesn't seem likely to happen, but still I don't see why it cannot happen), so I'll just use your patch. :-)
OK. I was just looking at your version and was pretty much convinced that it would work. But I'll take your word that it might not :)
Well, you don't have to. :-)
The scenario I have in mind is as follows:
acpi_add_pm_notifier() sets context.dev and context.func and drops the lock. notifier_present is still unset.
acpi_remove_pm_notifier() checks notifier_present under the lock. It is (still) unset, so the function decides that there's nothing to do.
acpi_add_pm_notifier() continues with notifier installation and the device goes away at the same time.
Right. That makes sense. Though I don't know what would prevent racing add_pm_notifier() against remove_pm_notifier() in a similar way even with the outer lock. In that case the remove_pm_notifier() would just have to get at the mutex first and then we'd still end up with the notifier installed.