Recently a performance problem was reported for a process invoking a non-trival ASL program. The method call in this case ends up repetitively triggering a call path like:
acpi_ex_store acpi_ex_store_object_to_node acpi_ex_write_data_to_field acpi_ex_insert_into_field acpi_ex_write_with_update_rule acpi_ex_field_datum_io acpi_ex_access_region acpi_ev_address_space_dispatch acpi_ex_system_memory_space_handler acpi_os_map_cleanup.part.14 _synchronize_rcu_expedited.constprop.89 schedule
The end result of frequent synchronize_rcu_expedited() invocation is tiny sub-millisecond spurts of execution where the scheduler freely migrates this apparently sleepy task. The overhead of frequent scheduler invocation multiplies the execution time by a factor of 2-3X.
For example, performance improves from 16 minutes to 7 minutes for a firmware update procedure across 24 devices.
Perhaps the rcu usage was intended to allow for not taking a sleeping lock in the acpi_os_{read,write}_memory() path which ostensibly could be called from an APEI NMI error interrupt? Neither rcu_read_lock() nor ioremap() are interrupt safe, so add a WARN_ONCE() to validate that rcu was not serving as a mechanism to avoid direct calls to ioremap(). Even the original implementation had a spin_lock_irqsave(), but that is not NMI safe.
APEI itself already has some concept of avoiding ioremap() from interrupt context (see erst_exec_move_data()), if the new warning triggers it means that APEI either needs more instrumentation like that to pre-emptively fail, or more infrastructure to arrange for pre-mapping the resources it needs in NMI context.
Cc: stable@vger.kernel.org Fixes: 620242ae8c3d ("ACPI: Maintain a list of ACPI memory mapped I/O remappings") Cc: Len Brown lenb@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Ira Weiny ira.weiny@intel.com Cc: James Morse james.morse@arm.com Cc: Erik Kaneda erik.kaneda@intel.com Cc: Myron Stowe myron.stowe@redhat.com Cc: "Rafael J. Wysocki" rjw@rjwysocki.net Cc: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Dan Williams dan.j.williams@intel.com --- drivers/acpi/osl.c | 117 +++++++++++++++++++++++++--------------------------- 1 file changed, 57 insertions(+), 60 deletions(-)
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c index 762c5d50b8fe..207528c71e9c 100644 --- a/drivers/acpi/osl.c +++ b/drivers/acpi/osl.c @@ -214,13 +214,13 @@ acpi_physical_address __init acpi_os_get_root_pointer(void) return pa; }
-/* Must be called with 'acpi_ioremap_lock' or RCU read lock held. */ static struct acpi_ioremap * acpi_map_lookup(acpi_physical_address phys, acpi_size size) { struct acpi_ioremap *map;
- list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held()) + lockdep_assert_held(&acpi_ioremap_lock); + list_for_each_entry(map, &acpi_ioremaps, list) if (map->phys <= phys && phys + size <= map->phys + map->size) return map; @@ -228,7 +228,6 @@ acpi_map_lookup(acpi_physical_address phys, acpi_size size) return NULL; }
-/* Must be called with 'acpi_ioremap_lock' or RCU read lock held. */ static void __iomem * acpi_map_vaddr_lookup(acpi_physical_address phys, unsigned int size) { @@ -263,7 +262,8 @@ acpi_map_lookup_virt(void __iomem *virt, acpi_size size) { struct acpi_ioremap *map;
- list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held()) + lockdep_assert_held(&acpi_ioremap_lock); + list_for_each_entry(map, &acpi_ioremaps, list) if (map->virt <= virt && virt + size <= map->virt + map->size) return map; @@ -360,7 +360,7 @@ void __iomem __ref map->size = pg_sz; map->refcount = 1;
- list_add_tail_rcu(&map->list, &acpi_ioremaps); + list_add_tail(&map->list, &acpi_ioremaps);
out: mutex_unlock(&acpi_ioremap_lock); @@ -374,20 +374,13 @@ void *__ref acpi_os_map_memory(acpi_physical_address phys, acpi_size size) } EXPORT_SYMBOL_GPL(acpi_os_map_memory);
-/* Must be called with mutex_lock(&acpi_ioremap_lock) */ -static unsigned long acpi_os_drop_map_ref(struct acpi_ioremap *map) -{ - unsigned long refcount = --map->refcount; - - if (!refcount) - list_del_rcu(&map->list); - return refcount; -} - -static void acpi_os_map_cleanup(struct acpi_ioremap *map) +static void acpi_os_drop_map_ref(struct acpi_ioremap *map) { - synchronize_rcu_expedited(); + lockdep_assert_held(&acpi_ioremap_lock); + if (--map->refcount > 0) + return; acpi_unmap(map->phys, map->virt); + list_del(&map->list); kfree(map); }
@@ -408,7 +401,6 @@ static void acpi_os_map_cleanup(struct acpi_ioremap *map) void __ref acpi_os_unmap_iomem(void __iomem *virt, acpi_size size) { struct acpi_ioremap *map; - unsigned long refcount;
if (!acpi_permanent_mmap) { __acpi_unmap_table(virt, size); @@ -422,11 +414,8 @@ void __ref acpi_os_unmap_iomem(void __iomem *virt, acpi_size size) WARN(true, PREFIX "%s: bad address %p\n", __func__, virt); return; } - refcount = acpi_os_drop_map_ref(map); + acpi_os_drop_map_ref(map); mutex_unlock(&acpi_ioremap_lock); - - if (!refcount) - acpi_os_map_cleanup(map); } EXPORT_SYMBOL_GPL(acpi_os_unmap_iomem);
@@ -461,7 +450,6 @@ void acpi_os_unmap_generic_address(struct acpi_generic_address *gas) { u64 addr; struct acpi_ioremap *map; - unsigned long refcount;
if (gas->space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY) return; @@ -477,11 +465,8 @@ void acpi_os_unmap_generic_address(struct acpi_generic_address *gas) mutex_unlock(&acpi_ioremap_lock); return; } - refcount = acpi_os_drop_map_ref(map); + acpi_os_drop_map_ref(map); mutex_unlock(&acpi_ioremap_lock); - - if (!refcount) - acpi_os_map_cleanup(map); } EXPORT_SYMBOL(acpi_os_unmap_generic_address);
@@ -700,55 +685,71 @@ int acpi_os_read_iomem(void __iomem *virt_addr, u64 *value, u32 width) return 0; }
+static void __iomem *acpi_os_rw_map(acpi_physical_address phys_addr, + unsigned int size, bool *did_fallback) +{ + void __iomem *virt_addr = NULL; + + if (WARN_ONCE(in_interrupt(), "ioremap in interrupt context\n")) + return NULL; + + /* Try to use a cached mapping and fallback otherwise */ + *did_fallback = false; + mutex_lock(&acpi_ioremap_lock); + virt_addr = acpi_map_vaddr_lookup(phys_addr, size); + if (virt_addr) + return virt_addr; + mutex_unlock(&acpi_ioremap_lock); + + virt_addr = acpi_os_ioremap(phys_addr, size); + *did_fallback = true; + + return virt_addr; +} + +static void acpi_os_rw_unmap(void __iomem *virt_addr, bool did_fallback) +{ + if (did_fallback) { + /* in the fallback case no lock is held */ + iounmap(virt_addr); + return; + } + + mutex_unlock(&acpi_ioremap_lock); +} + acpi_status acpi_os_read_memory(acpi_physical_address phys_addr, u64 *value, u32 width) { - void __iomem *virt_addr; unsigned int size = width / 8; - bool unmap = false; + bool did_fallback = false; + void __iomem *virt_addr; u64 dummy; int error;
- rcu_read_lock(); - virt_addr = acpi_map_vaddr_lookup(phys_addr, size); - if (!virt_addr) { - rcu_read_unlock(); - virt_addr = acpi_os_ioremap(phys_addr, size); - if (!virt_addr) - return AE_BAD_ADDRESS; - unmap = true; - } - + virt_addr = acpi_os_rw_map(phys_addr, size, &did_fallback); + if (!virt_addr) + return AE_BAD_ADDRESS; if (!value) value = &dummy;
error = acpi_os_read_iomem(virt_addr, value, width); BUG_ON(error);
- if (unmap) - iounmap(virt_addr); - else - rcu_read_unlock(); - + acpi_os_rw_unmap(virt_addr, did_fallback); return AE_OK; }
acpi_status acpi_os_write_memory(acpi_physical_address phys_addr, u64 value, u32 width) { - void __iomem *virt_addr; unsigned int size = width / 8; - bool unmap = false; + bool did_fallback = false; + void __iomem *virt_addr;
- rcu_read_lock(); - virt_addr = acpi_map_vaddr_lookup(phys_addr, size); - if (!virt_addr) { - rcu_read_unlock(); - virt_addr = acpi_os_ioremap(phys_addr, size); - if (!virt_addr) - return AE_BAD_ADDRESS; - unmap = true; - } + virt_addr = acpi_os_rw_map(phys_addr, size, &did_fallback); + if (!virt_addr) + return AE_BAD_ADDRESS;
switch (width) { case 8: @@ -767,11 +768,7 @@ acpi_os_write_memory(acpi_physical_address phys_addr, u64 value, u32 width) BUG(); }
- if (unmap) - iounmap(virt_addr); - else - rcu_read_unlock(); - + acpi_os_rw_unmap(virt_addr, did_fallback); return AE_OK; }
On Thu, May 7, 2020 at 3:21 AM Dan Williams dan.j.williams@intel.com wrote:
Recently a performance problem was reported for a process invoking a non-trival ASL program. The method call in this case ends up repetitively triggering a call path like:
acpi_ex_store acpi_ex_store_object_to_node acpi_ex_write_data_to_field acpi_ex_insert_into_field acpi_ex_write_with_update_rule acpi_ex_field_datum_io acpi_ex_access_region acpi_ev_address_space_dispatch acpi_ex_system_memory_space_handler acpi_os_map_cleanup.part.14 _synchronize_rcu_expedited.constprop.89 schedule
The end result of frequent synchronize_rcu_expedited() invocation is tiny sub-millisecond spurts of execution where the scheduler freely migrates this apparently sleepy task. The overhead of frequent scheduler invocation multiplies the execution time by a factor of 2-3X.
For example, performance improves from 16 minutes to 7 minutes for a firmware update procedure across 24 devices.
Perhaps the rcu usage was intended to allow for not taking a sleeping lock in the acpi_os_{read,write}_memory() path which ostensibly could be called from an APEI NMI error interrupt? Neither rcu_read_lock() nor ioremap() are interrupt safe, so add a WARN_ONCE() to validate that rcu was not serving as a mechanism to avoid direct calls to ioremap(). Even the original implementation had a spin_lock_irqsave(), but that is not NMI safe.
APEI itself already has some concept of avoiding ioremap() from interrupt context (see erst_exec_move_data()), if the new warning triggers it means that APEI either needs more instrumentation like that to pre-emptively fail, or more infrastructure to arrange for pre-mapping the resources it needs in NMI context.
...
+static void __iomem *acpi_os_rw_map(acpi_physical_address phys_addr,
unsigned int size, bool *did_fallback)
+{
void __iomem *virt_addr = NULL;
Assignment is not needed as far as I can see.
if (WARN_ONCE(in_interrupt(), "ioremap in interrupt context\n"))
return NULL;
/* Try to use a cached mapping and fallback otherwise */
*did_fallback = false;
mutex_lock(&acpi_ioremap_lock);
virt_addr = acpi_map_vaddr_lookup(phys_addr, size);
if (virt_addr)
return virt_addr;
mutex_unlock(&acpi_ioremap_lock);
virt_addr = acpi_os_ioremap(phys_addr, size);
*did_fallback = true;
return virt_addr;
+}
I'm wondering if Sparse is okay with this...
+static void acpi_os_rw_unmap(void __iomem *virt_addr, bool did_fallback) +{
if (did_fallback) {
/* in the fallback case no lock is held */
iounmap(virt_addr);
return;
}
mutex_unlock(&acpi_ioremap_lock);
+}
...and this functions from locking perspective.
On Thu, May 7, 2020 at 2:25 AM Andy Shevchenko andy.shevchenko@gmail.com wrote:
On Thu, May 7, 2020 at 3:21 AM Dan Williams dan.j.williams@intel.com wrote:
Recently a performance problem was reported for a process invoking a non-trival ASL program. The method call in this case ends up repetitively triggering a call path like:
acpi_ex_store acpi_ex_store_object_to_node acpi_ex_write_data_to_field acpi_ex_insert_into_field acpi_ex_write_with_update_rule acpi_ex_field_datum_io acpi_ex_access_region acpi_ev_address_space_dispatch acpi_ex_system_memory_space_handler acpi_os_map_cleanup.part.14 _synchronize_rcu_expedited.constprop.89 schedule
The end result of frequent synchronize_rcu_expedited() invocation is tiny sub-millisecond spurts of execution where the scheduler freely migrates this apparently sleepy task. The overhead of frequent scheduler invocation multiplies the execution time by a factor of 2-3X.
For example, performance improves from 16 minutes to 7 minutes for a firmware update procedure across 24 devices.
Perhaps the rcu usage was intended to allow for not taking a sleeping lock in the acpi_os_{read,write}_memory() path which ostensibly could be called from an APEI NMI error interrupt? Neither rcu_read_lock() nor ioremap() are interrupt safe, so add a WARN_ONCE() to validate that rcu was not serving as a mechanism to avoid direct calls to ioremap(). Even the original implementation had a spin_lock_irqsave(), but that is not NMI safe.
APEI itself already has some concept of avoiding ioremap() from interrupt context (see erst_exec_move_data()), if the new warning triggers it means that APEI either needs more instrumentation like that to pre-emptively fail, or more infrastructure to arrange for pre-mapping the resources it needs in NMI context.
...
+static void __iomem *acpi_os_rw_map(acpi_physical_address phys_addr,
unsigned int size, bool *did_fallback)
+{
void __iomem *virt_addr = NULL;
Assignment is not needed as far as I can see.
True, holdover from a previous version, will drop.
if (WARN_ONCE(in_interrupt(), "ioremap in interrupt context\n"))
return NULL;
/* Try to use a cached mapping and fallback otherwise */
*did_fallback = false;
mutex_lock(&acpi_ioremap_lock);
virt_addr = acpi_map_vaddr_lookup(phys_addr, size);
if (virt_addr)
return virt_addr;
mutex_unlock(&acpi_ioremap_lock);
virt_addr = acpi_os_ioremap(phys_addr, size);
*did_fallback = true;
return virt_addr;
+}
I'm wondering if Sparse is okay with this...
Seems like it is:
$ sparse --version v0.6.1-191-gc51a0382202e
$ cat out | grep osl.c CHECK drivers/acpi/osl.c drivers/acpi/osl.c:373:17: warning: cast removes address space 'asn:2' of expression
...was the only warning I got.
On 5/7/2020 1:39 AM, Dan Williams wrote:
Recently a performance problem was reported for a process invoking a non-trival ASL program. The method call in this case ends up repetitively triggering a call path like:
acpi_ex_store acpi_ex_store_object_to_node acpi_ex_write_data_to_field acpi_ex_insert_into_field acpi_ex_write_with_update_rule acpi_ex_field_datum_io acpi_ex_access_region acpi_ev_address_space_dispatch acpi_ex_system_memory_space_handler acpi_os_map_cleanup.part.14 _synchronize_rcu_expedited.constprop.89 schedule
The end result of frequent synchronize_rcu_expedited() invocation is tiny sub-millisecond spurts of execution where the scheduler freely migrates this apparently sleepy task. The overhead of frequent scheduler invocation multiplies the execution time by a factor of 2-3X.
For example, performance improves from 16 minutes to 7 minutes for a firmware update procedure across 24 devices.
Perhaps the rcu usage was intended to allow for not taking a sleeping lock in the acpi_os_{read,write}_memory() path which ostensibly could be called from an APEI NMI error interrupt? Neither rcu_read_lock() nor ioremap() are interrupt safe, so add a WARN_ONCE() to validate that rcu was not serving as a mechanism to avoid direct calls to ioremap(). Even the original implementation had a spin_lock_irqsave(), but that is not NMI safe.
APEI itself already has some concept of avoiding ioremap() from interrupt context (see erst_exec_move_data()), if the new warning triggers it means that APEI either needs more instrumentation like that to pre-emptively fail, or more infrastructure to arrange for pre-mapping the resources it needs in NMI context.
Cc: stable@vger.kernel.org Fixes: 620242ae8c3d ("ACPI: Maintain a list of ACPI memory mapped I/O remappings") Cc: Len Brown lenb@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Ira Weiny ira.weiny@intel.com Cc: James Morse james.morse@arm.com Cc: Erik Kaneda erik.kaneda@intel.com Cc: Myron Stowe myron.stowe@redhat.com Cc: "Rafael J. Wysocki" rjw@rjwysocki.net Cc: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Dan Williams dan.j.williams@intel.com
linux-acpi is kind of relevant for this too, so please CC it.
drivers/acpi/osl.c | 117 +++++++++++++++++++++++++--------------------------- 1 file changed, 57 insertions(+), 60 deletions(-)
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c index 762c5d50b8fe..207528c71e9c 100644 --- a/drivers/acpi/osl.c +++ b/drivers/acpi/osl.c @@ -214,13 +214,13 @@ acpi_physical_address __init acpi_os_get_root_pointer(void) return pa; } -/* Must be called with 'acpi_ioremap_lock' or RCU read lock held. */ static struct acpi_ioremap * acpi_map_lookup(acpi_physical_address phys, acpi_size size) { struct acpi_ioremap *map;
- list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
- lockdep_assert_held(&acpi_ioremap_lock);
- list_for_each_entry(map, &acpi_ioremaps, list) if (map->phys <= phys && phys + size <= map->phys + map->size) return map;
@@ -228,7 +228,6 @@ acpi_map_lookup(acpi_physical_address phys, acpi_size size) return NULL; } -/* Must be called with 'acpi_ioremap_lock' or RCU read lock held. */ static void __iomem * acpi_map_vaddr_lookup(acpi_physical_address phys, unsigned int size) { @@ -263,7 +262,8 @@ acpi_map_lookup_virt(void __iomem *virt, acpi_size size) { struct acpi_ioremap *map;
- list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
- lockdep_assert_held(&acpi_ioremap_lock);
- list_for_each_entry(map, &acpi_ioremaps, list) if (map->virt <= virt && virt + size <= map->virt + map->size) return map;
@@ -360,7 +360,7 @@ void __iomem __ref map->size = pg_sz; map->refcount = 1;
- list_add_tail_rcu(&map->list, &acpi_ioremaps);
- list_add_tail(&map->list, &acpi_ioremaps);
out: mutex_unlock(&acpi_ioremap_lock); @@ -374,20 +374,13 @@ void *__ref acpi_os_map_memory(acpi_physical_address phys, acpi_size size) } EXPORT_SYMBOL_GPL(acpi_os_map_memory); -/* Must be called with mutex_lock(&acpi_ioremap_lock) */ -static unsigned long acpi_os_drop_map_ref(struct acpi_ioremap *map) -{
- unsigned long refcount = --map->refcount;
- if (!refcount)
list_del_rcu(&map->list);
- return refcount;
-}
-static void acpi_os_map_cleanup(struct acpi_ioremap *map) +static void acpi_os_drop_map_ref(struct acpi_ioremap *map) {
- synchronize_rcu_expedited();
- lockdep_assert_held(&acpi_ioremap_lock);
- if (--map->refcount > 0)
acpi_unmap(map->phys, map->virt);return;
- list_del(&map->list); kfree(map); }
@@ -408,7 +401,6 @@ static void acpi_os_map_cleanup(struct acpi_ioremap *map) void __ref acpi_os_unmap_iomem(void __iomem *virt, acpi_size size) { struct acpi_ioremap *map;
- unsigned long refcount;
if (!acpi_permanent_mmap) { __acpi_unmap_table(virt, size); @@ -422,11 +414,8 @@ void __ref acpi_os_unmap_iomem(void __iomem *virt, acpi_size size) WARN(true, PREFIX "%s: bad address %p\n", __func__, virt); return; }
- refcount = acpi_os_drop_map_ref(map);
- acpi_os_drop_map_ref(map); mutex_unlock(&acpi_ioremap_lock);
- if (!refcount)
} EXPORT_SYMBOL_GPL(acpi_os_unmap_iomem);acpi_os_map_cleanup(map);
@@ -461,7 +450,6 @@ void acpi_os_unmap_generic_address(struct acpi_generic_address *gas) { u64 addr; struct acpi_ioremap *map;
- unsigned long refcount;
if (gas->space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY) return; @@ -477,11 +465,8 @@ void acpi_os_unmap_generic_address(struct acpi_generic_address *gas) mutex_unlock(&acpi_ioremap_lock); return; }
- refcount = acpi_os_drop_map_ref(map);
- acpi_os_drop_map_ref(map); mutex_unlock(&acpi_ioremap_lock);
- if (!refcount)
} EXPORT_SYMBOL(acpi_os_unmap_generic_address);acpi_os_map_cleanup(map);
@@ -700,55 +685,71 @@ int acpi_os_read_iomem(void __iomem *virt_addr, u64 *value, u32 width) return 0; } +static void __iomem *acpi_os_rw_map(acpi_physical_address phys_addr,
unsigned int size, bool *did_fallback)
+{
- void __iomem *virt_addr = NULL;
- if (WARN_ONCE(in_interrupt(), "ioremap in interrupt context\n"))
return NULL;
- /* Try to use a cached mapping and fallback otherwise */
- *did_fallback = false;
- mutex_lock(&acpi_ioremap_lock);
- virt_addr = acpi_map_vaddr_lookup(phys_addr, size);
- if (virt_addr)
return virt_addr;
- mutex_unlock(&acpi_ioremap_lock);
- virt_addr = acpi_os_ioremap(phys_addr, size);
- *did_fallback = true;
- return virt_addr;
+}
+static void acpi_os_rw_unmap(void __iomem *virt_addr, bool did_fallback) +{
- if (did_fallback) {
/* in the fallback case no lock is held */
iounmap(virt_addr);
return;
- }
- mutex_unlock(&acpi_ioremap_lock);
+}
- acpi_status acpi_os_read_memory(acpi_physical_address phys_addr, u64 *value, u32 width) {
- void __iomem *virt_addr; unsigned int size = width / 8;
- bool unmap = false;
- bool did_fallback = false;
- void __iomem *virt_addr; u64 dummy; int error;
- rcu_read_lock();
- virt_addr = acpi_map_vaddr_lookup(phys_addr, size);
- if (!virt_addr) {
rcu_read_unlock();
virt_addr = acpi_os_ioremap(phys_addr, size);
if (!virt_addr)
return AE_BAD_ADDRESS;
unmap = true;
- }
- virt_addr = acpi_os_rw_map(phys_addr, size, &did_fallback);
- if (!virt_addr)
if (!value) value = &dummy;return AE_BAD_ADDRESS;
error = acpi_os_read_iomem(virt_addr, value, width); BUG_ON(error);
- if (unmap)
iounmap(virt_addr);
- else
rcu_read_unlock();
- acpi_os_rw_unmap(virt_addr, did_fallback); return AE_OK; }
acpi_status acpi_os_write_memory(acpi_physical_address phys_addr, u64 value, u32 width) {
- void __iomem *virt_addr; unsigned int size = width / 8;
- bool unmap = false;
- bool did_fallback = false;
- void __iomem *virt_addr;
- rcu_read_lock();
- virt_addr = acpi_map_vaddr_lookup(phys_addr, size);
- if (!virt_addr) {
rcu_read_unlock();
virt_addr = acpi_os_ioremap(phys_addr, size);
if (!virt_addr)
return AE_BAD_ADDRESS;
unmap = true;
- }
- virt_addr = acpi_os_rw_map(phys_addr, size, &did_fallback);
- if (!virt_addr)
return AE_BAD_ADDRESS;
switch (width) { case 8: @@ -767,11 +768,7 @@ acpi_os_write_memory(acpi_physical_address phys_addr, u64 value, u32 width) BUG(); }
- if (unmap)
iounmap(virt_addr);
- else
rcu_read_unlock();
- acpi_os_rw_unmap(virt_addr, did_fallback); return AE_OK; }
On Thu, May 7, 2020 at 9:43 AM Rafael J. Wysocki rafael.j.wysocki@intel.com wrote:
On 5/7/2020 1:39 AM, Dan Williams wrote:
Recently a performance problem was reported for a process invoking a non-trival ASL program. The method call in this case ends up repetitively triggering a call path like:
acpi_ex_store acpi_ex_store_object_to_node acpi_ex_write_data_to_field acpi_ex_insert_into_field acpi_ex_write_with_update_rule acpi_ex_field_datum_io acpi_ex_access_region acpi_ev_address_space_dispatch acpi_ex_system_memory_space_handler acpi_os_map_cleanup.part.14 _synchronize_rcu_expedited.constprop.89 schedule
The end result of frequent synchronize_rcu_expedited() invocation is tiny sub-millisecond spurts of execution where the scheduler freely migrates this apparently sleepy task. The overhead of frequent scheduler invocation multiplies the execution time by a factor of 2-3X.
For example, performance improves from 16 minutes to 7 minutes for a firmware update procedure across 24 devices.
Perhaps the rcu usage was intended to allow for not taking a sleeping lock in the acpi_os_{read,write}_memory() path which ostensibly could be called from an APEI NMI error interrupt? Neither rcu_read_lock() nor ioremap() are interrupt safe, so add a WARN_ONCE() to validate that rcu was not serving as a mechanism to avoid direct calls to ioremap(). Even the original implementation had a spin_lock_irqsave(), but that is not NMI safe.
APEI itself already has some concept of avoiding ioremap() from interrupt context (see erst_exec_move_data()), if the new warning triggers it means that APEI either needs more instrumentation like that to pre-emptively fail, or more infrastructure to arrange for pre-mapping the resources it needs in NMI context.
Cc: stable@vger.kernel.org Fixes: 620242ae8c3d ("ACPI: Maintain a list of ACPI memory mapped I/O remappings") Cc: Len Brown lenb@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Ira Weiny ira.weiny@intel.com Cc: James Morse james.morse@arm.com Cc: Erik Kaneda erik.kaneda@intel.com Cc: Myron Stowe myron.stowe@redhat.com Cc: "Rafael J. Wysocki" rjw@rjwysocki.net Cc: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Dan Williams dan.j.williams@intel.com
linux-acpi is kind of relevant for this too, so please CC it.
Whoops, my bad. Will resend with some of Andy's cleanup suggestions.
Greeting,
FYI, we noticed the following commit (built with gcc-7):
commit: b13663bdf9701c8896bebcc7ee998f8656c1ea37 ("[PATCH] ACPI: Drop rcu usage for MMIO mappings") url: https://github.com/0day-ci/linux/commits/Dan-Williams/ACPI-Drop-rcu-usage-fo... base: https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git linux-next
in testcase: v4l2 with following parameters:
test: device ucode: 0x43
on test machine: 72 threads Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with 256G memory
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
If you fix the issue, kindly add following tag Reported-by: kernel test robot rong.a.chen@intel.com
[ 21.012858] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281 [ 21.013816] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0 [ 21.029953] tsc: Refined TSC clocksource calibration: 2294.686 MHz [ 21.013816] CPU: 55 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc3-00025-gb13663bdf9701c #1 [ 21.013816] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015 [ 21.013816] Call Trace: [ 21.042037] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x211399552f8, max_idle_ns: 440795292447 ns [ 21.013816] dump_stack+0x66/0x8b [ 21.064421] ___might_sleep+0x102/0x120 [ 21.064421] mutex_lock+0x1c/0x40 [ 21.064421] acpi_os_rw_map+0x37/0xe0 [ 21.064421] acpi_os_read_memory+0x34/0xc0 [ 21.064421] ? acpi_match_platform_list+0x84/0x100 [ 21.064421] apei_read+0x97/0xb0 [ 21.064421] __ghes_peek_estatus+0x27/0xc0 [ 21.064421] ghes_proc+0x37/0x120 [ 21.064421] ghes_probe+0x1d0/0x460 [ 21.064421] platform_drv_probe+0x37/0x90 [ 21.064421] really_probe+0xef/0x430 [ 21.064421] driver_probe_device+0x110/0x120 [ 21.064421] device_driver_attach+0x4f/0x60 [ 21.064421] __driver_attach+0x9a/0x140 [ 21.064421] ? device_driver_attach+0x60/0x60 [ 21.064421] bus_for_each_dev+0x76/0xc0 [ 21.064421] ? klist_add_tail+0x3b/0x70 [ 21.064421] bus_add_driver+0x144/0x220 [ 21.064421] ? bert_init+0x229/0x229 [ 21.064421] driver_register+0x5b/0xf0 [ 21.064421] ? bert_init+0x229/0x229 [ 21.064421] ghes_init+0x83/0xde [ 21.064421] do_one_initcall+0x46/0x220 [ 21.064421] kernel_init_freeable+0x206/0x280 [ 21.064421] ? rest_init+0xd0/0xd0 [ 21.064421] kernel_init+0xa/0x110 [ 21.064421] ret_from_fork+0x35/0x40 [ 21.211518] clocksource: Switched to clocksource tsc [ 21.212408] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC. [ 21.227478] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled [ 21.235019] 00:02: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A [ 21.244105] 00:03: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A [ 21.254257] Non-volatile memory driver v1.3 [ 21.259421] Linux agpgart interface v0.103 [ 21.272262] rdac: device handler registered [ 21.277466] hp_sw: device handler registered [ 21.282671] emc: device handler registered [ 21.288039] alua: device handler registered [ 21.293154] MACsec IEEE 802.1AE [ 21.297325] libphy: Fixed MDIO Bus: probed [ 21.302666] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI [ 21.310980] e1000: Copyright (c) 1999-2006 Intel Corporation. [ 21.317926] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k [ 21.324883] e1000e: Copyright(c) 1999 - 2015 Intel Corporation. [ 21.332069] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.6.0-k [ 21.340297] igb: Copyright (c) 2007-2014 Intel Corporation. [ 21.347058] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 5.1.0-k [ 21.356399] ixgbe: Copyright (c) 1999-2016 Intel Corporation. [ 21.363577] IOAPIC[9]: Set IRTE entry (P:1 FPD:0 Dst_Mode:1 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:EF Dest:00000001 SID:002C SQ:0 SVT:1) [ 21.379417] IOAPIC[1]: Set routing entry (9-13 -> 0xef -> IRQ 38 Mode:1 Active:1 Dest:1) [ 21.665318] ixgbe 0000:03:00.0: Multiqueue Enabled: Rx Queue count = 63, Tx Queue count = 63 XDP Queue count = 0 [ 21.761727] ixgbe 0000:03:00.0: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link) [ 21.795985] ixgbe 0000:03:00.0: MAC: 3, PHY: 0, PBA No: 000000-000 [ 21.803321] ixgbe 0000:03:00.0: 00:1e:67:f7:44:b3 [ 21.957977] ixgbe 0000:03:00.0: Intel(R) 10 Gigabit Network Connection [ 21.965867] libphy: ixgbe-mdio: probed [ 21.970646] IOAPIC[9]: Set IRTE entry (P:1 FPD:0 Dst_Mode:1 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:EF Dest:00000001 SID:002C SQ:0 SVT:1) [ 21.986482] IOAPIC[1]: Set routing entry (9-10 -> 0xef -> IRQ 105 Mode:1 Active:1 Dest:1) [ 22.265269] ixgbe 0000:03:00.1: Multiqueue Enabled: Rx Queue count = 63, Tx Queue count = 63 XDP Queue count = 0 [ 22.361656] ixgbe 0000:03:00.1: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link) [ 22.395908] ixgbe 0000:03:00.1: MAC: 3, PHY: 0, PBA No: 000000-000 [ 22.403235] ixgbe 0000:03:00.1: 00:1e:67:f7:44:b4 [ 22.556985] ixgbe 0000:03:00.1: Intel(R) 10 Gigabit Network Connection [ 22.564864] libphy: ixgbe-mdio: probed [ 22.569541] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 2.8.20-k [ 22.579179] i40e: Copyright (c) 2013 - 2019 Intel Corporation. [ 22.586811] usbcore: registered new interface driver catc [ 22.593299] usbcore: registered new interface driver kaweth [ 22.599957] pegasus: v0.9.3 (2013/04/25), Pegasus/Pegasus II USB Ethernet driver [ 22.609017] usbcore: registered new interface driver pegasus [ 22.615784] usbcore: registered new interface driver rtl8150 [ 22.622550] usbcore: registered new interface driver asix [ 22.629029] usbcore: registered new interface driver cdc_ether [ 22.635987] usbcore: registered new interface driver cdc_eem [ 22.642749] usbcore: registered new interface driver dm9601 [ 22.649423] usbcore: registered new interface driver smsc75xx [ 22.656294] usbcore: registered new interface driver smsc95xx [ 22.663163] usbcore: registered new interface driver gl620a [ 22.669828] usbcore: registered new interface driver net1080 [ 22.676587] usbcore: registered new interface driver plusb [ 22.683162] usbcore: registered new interface driver rndis_host [ 22.690218] usbcore: registered new interface driver cdc_subset [ 22.697285] usbcore: registered new interface driver zaurus [ 22.703954] usbcore: registered new interface driver MOSCHIP usb-ethernet driver [ 22.713005] usbcore: registered new interface driver int51x1 [ 22.719770] usbcore: registered new interface driver ipheth [ 22.726439] usbcore: registered new interface driver sierra_net [ 22.733563] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver [ 22.741292] ehci-pci: EHCI PCI platform driver [ 22.746931] IOAPIC[8]: Set IRTE entry (P:1 FPD:0 Dst_Mode:1 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:EF Dest:00000001 SID:F0FF SQ:0 SVT:1) [ 22.762784] IOAPIC[0]: Set routing entry (8-18 -> 0xef -> IRQ 18 Mode:1 Active:1 Dest:1) [ 22.772881] ehci-pci 0000:00:1a.0: EHCI Host Controller [ 22.779227] ehci-pci 0000:00:1a.0: new USB bus registered, assigned bus number 1
To reproduce:
git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp install job.yaml # job file is attached in this email bin/lkp run job.yaml
Thanks, Rong Chen
On 5/11/2020 11:00 AM, kernel test robot wrote:
Greeting,
FYI, we noticed the following commit (built with gcc-7):
commit: b13663bdf9701c8896bebcc7ee998f8656c1ea37 ("[PATCH] ACPI: Drop rcu usage for MMIO mappings") url: https://github.com/0day-ci/linux/commits/Dan-Williams/ACPI-Drop-rcu-usage-fo... base: https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git linux-next
in testcase: v4l2 with following parameters:
test: device ucode: 0x43
on test machine: 72 threads Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with 256G memory
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
If you fix the issue, kindly add following tag Reported-by: kernel test robot rong.a.chen@intel.com
Dan,
Has this been addressed in the v2?
[ 21.012858] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281 [ 21.013816] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0 [ 21.029953] tsc: Refined TSC clocksource calibration: 2294.686 MHz [ 21.013816] CPU: 55 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc3-00025-gb13663bdf9701c #1 [ 21.013816] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015 [ 21.013816] Call Trace: [ 21.042037] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x211399552f8, max_idle_ns: 440795292447 ns [ 21.013816] dump_stack+0x66/0x8b [ 21.064421] ___might_sleep+0x102/0x120 [ 21.064421] mutex_lock+0x1c/0x40 [ 21.064421] acpi_os_rw_map+0x37/0xe0 [ 21.064421] acpi_os_read_memory+0x34/0xc0 [ 21.064421] ? acpi_match_platform_list+0x84/0x100 [ 21.064421] apei_read+0x97/0xb0 [ 21.064421] __ghes_peek_estatus+0x27/0xc0 [ 21.064421] ghes_proc+0x37/0x120 [ 21.064421] ghes_probe+0x1d0/0x460 [ 21.064421] platform_drv_probe+0x37/0x90 [ 21.064421] really_probe+0xef/0x430 [ 21.064421] driver_probe_device+0x110/0x120 [ 21.064421] device_driver_attach+0x4f/0x60 [ 21.064421] __driver_attach+0x9a/0x140 [ 21.064421] ? device_driver_attach+0x60/0x60 [ 21.064421] bus_for_each_dev+0x76/0xc0 [ 21.064421] ? klist_add_tail+0x3b/0x70 [ 21.064421] bus_add_driver+0x144/0x220 [ 21.064421] ? bert_init+0x229/0x229 [ 21.064421] driver_register+0x5b/0xf0 [ 21.064421] ? bert_init+0x229/0x229 [ 21.064421] ghes_init+0x83/0xde [ 21.064421] do_one_initcall+0x46/0x220 [ 21.064421] kernel_init_freeable+0x206/0x280 [ 21.064421] ? rest_init+0xd0/0xd0 [ 21.064421] kernel_init+0xa/0x110 [ 21.064421] ret_from_fork+0x35/0x40 [ 21.211518] clocksource: Switched to clocksource tsc [ 21.212408] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC. [ 21.227478] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled [ 21.235019] 00:02: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A [ 21.244105] 00:03: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A [ 21.254257] Non-volatile memory driver v1.3 [ 21.259421] Linux agpgart interface v0.103 [ 21.272262] rdac: device handler registered [ 21.277466] hp_sw: device handler registered [ 21.282671] emc: device handler registered [ 21.288039] alua: device handler registered [ 21.293154] MACsec IEEE 802.1AE [ 21.297325] libphy: Fixed MDIO Bus: probed [ 21.302666] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI [ 21.310980] e1000: Copyright (c) 1999-2006 Intel Corporation. [ 21.317926] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k [ 21.324883] e1000e: Copyright(c) 1999 - 2015 Intel Corporation. [ 21.332069] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.6.0-k [ 21.340297] igb: Copyright (c) 2007-2014 Intel Corporation. [ 21.347058] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 5.1.0-k [ 21.356399] ixgbe: Copyright (c) 1999-2016 Intel Corporation. [ 21.363577] IOAPIC[9]: Set IRTE entry (P:1 FPD:0 Dst_Mode:1 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:EF Dest:00000001 SID:002C SQ:0 SVT:1) [ 21.379417] IOAPIC[1]: Set routing entry (9-13 -> 0xef -> IRQ 38 Mode:1 Active:1 Dest:1) [ 21.665318] ixgbe 0000:03:00.0: Multiqueue Enabled: Rx Queue count = 63, Tx Queue count = 63 XDP Queue count = 0 [ 21.761727] ixgbe 0000:03:00.0: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link) [ 21.795985] ixgbe 0000:03:00.0: MAC: 3, PHY: 0, PBA No: 000000-000 [ 21.803321] ixgbe 0000:03:00.0: 00:1e:67:f7:44:b3 [ 21.957977] ixgbe 0000:03:00.0: Intel(R) 10 Gigabit Network Connection [ 21.965867] libphy: ixgbe-mdio: probed [ 21.970646] IOAPIC[9]: Set IRTE entry (P:1 FPD:0 Dst_Mode:1 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:EF Dest:00000001 SID:002C SQ:0 SVT:1) [ 21.986482] IOAPIC[1]: Set routing entry (9-10 -> 0xef -> IRQ 105 Mode:1 Active:1 Dest:1) [ 22.265269] ixgbe 0000:03:00.1: Multiqueue Enabled: Rx Queue count = 63, Tx Queue count = 63 XDP Queue count = 0 [ 22.361656] ixgbe 0000:03:00.1: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link) [ 22.395908] ixgbe 0000:03:00.1: MAC: 3, PHY: 0, PBA No: 000000-000 [ 22.403235] ixgbe 0000:03:00.1: 00:1e:67:f7:44:b4 [ 22.556985] ixgbe 0000:03:00.1: Intel(R) 10 Gigabit Network Connection [ 22.564864] libphy: ixgbe-mdio: probed [ 22.569541] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 2.8.20-k [ 22.579179] i40e: Copyright (c) 2013 - 2019 Intel Corporation. [ 22.586811] usbcore: registered new interface driver catc [ 22.593299] usbcore: registered new interface driver kaweth [ 22.599957] pegasus: v0.9.3 (2013/04/25), Pegasus/Pegasus II USB Ethernet driver [ 22.609017] usbcore: registered new interface driver pegasus [ 22.615784] usbcore: registered new interface driver rtl8150 [ 22.622550] usbcore: registered new interface driver asix [ 22.629029] usbcore: registered new interface driver cdc_ether [ 22.635987] usbcore: registered new interface driver cdc_eem [ 22.642749] usbcore: registered new interface driver dm9601 [ 22.649423] usbcore: registered new interface driver smsc75xx [ 22.656294] usbcore: registered new interface driver smsc95xx [ 22.663163] usbcore: registered new interface driver gl620a [ 22.669828] usbcore: registered new interface driver net1080 [ 22.676587] usbcore: registered new interface driver plusb [ 22.683162] usbcore: registered new interface driver rndis_host [ 22.690218] usbcore: registered new interface driver cdc_subset [ 22.697285] usbcore: registered new interface driver zaurus [ 22.703954] usbcore: registered new interface driver MOSCHIP usb-ethernet driver [ 22.713005] usbcore: registered new interface driver int51x1 [ 22.719770] usbcore: registered new interface driver ipheth [ 22.726439] usbcore: registered new interface driver sierra_net [ 22.733563] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver [ 22.741292] ehci-pci: EHCI PCI platform driver [ 22.746931] IOAPIC[8]: Set IRTE entry (P:1 FPD:0 Dst_Mode:1 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:EF Dest:00000001 SID:F0FF SQ:0 SVT:1) [ 22.762784] IOAPIC[0]: Set routing entry (8-18 -> 0xef -> IRQ 18 Mode:1 Active:1 Dest:1) [ 22.772881] ehci-pci 0000:00:1a.0: EHCI Host Controller [ 22.779227] ehci-pci 0000:00:1a.0: new USB bus registered, assigned bus number 1
To reproduce:
git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp install job.yaml # job file is attached in this email bin/lkp run job.yaml
Thanks, Rong Chen
On Tue, May 12, 2020 at 9:28 AM Rafael J. Wysocki rafael.j.wysocki@intel.com wrote:
On 5/11/2020 11:00 AM, kernel test robot wrote:
Greeting,
FYI, we noticed the following commit (built with gcc-7):
commit: b13663bdf9701c8896bebcc7ee998f8656c1ea37 ("[PATCH] ACPI: Drop rcu usage for MMIO mappings") url: https://github.com/0day-ci/linux/commits/Dan-Williams/ACPI-Drop-rcu-usage-fo... base: https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git linux-next
in testcase: v4l2 with following parameters:
test: device ucode: 0x43
on test machine: 72 threads Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with 256G memory
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
If you fix the issue, kindly add following tag Reported-by: kernel test robot rong.a.chen@intel.com
Dan,
Has this been addressed in the v2?
No, this looks like a case I was concerned about, i.e. the GHES code is not being completely careful to avoid calling potentially sleeping functions with interrupts disabled. There is the nice comment that indicates that the fixmap should be used when ghes_notify_lock_irq() is held, but there seems to be no infrastructure to use / divert to the fixmap in the ghes_proc() path. That needs to be reworked first. It seems the implementation was getting lucky before to hit the cached acpi_ioremap in this path under rcu_read_lock(), but it appears it should have always been using the fixmap. Ying, James, is my read correct?
/* * Because the memory area used to transfer hardware error information * from BIOS to Linux can be determined only in NMI, IRQ or timer * handler, but general ioremap can not be used in atomic context, so * the fixmap is used instead. * * This spinlock is used to prevent the fixmap entry from being used * simultaneously. */ static DEFINE_SPINLOCK(ghes_notify_lock_irq);
[ 21.012858] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281 [ 21.013816] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0 [ 21.029953] tsc: Refined TSC clocksource calibration: 2294.686 MHz [ 21.013816] CPU: 55 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc3-00025-gb13663bdf9701c #1 [ 21.013816] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015 [ 21.013816] Call Trace: [ 21.042037] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x211399552f8, max_idle_ns: 440795292447 ns [ 21.013816] dump_stack+0x66/0x8b [ 21.064421] ___might_sleep+0x102/0x120 [ 21.064421] mutex_lock+0x1c/0x40 [ 21.064421] acpi_os_rw_map+0x37/0xe0 [ 21.064421] acpi_os_read_memory+0x34/0xc0 [ 21.064421] ? acpi_match_platform_list+0x84/0x100 [ 21.064421] apei_read+0x97/0xb0 [ 21.064421] __ghes_peek_estatus+0x27/0xc0 [ 21.064421] ghes_proc+0x37/0x120 [ 21.064421] ghes_probe+0x1d0/0x460 [ 21.064421] platform_drv_probe+0x37/0x90 [ 21.064421] really_probe+0xef/0x430 [ 21.064421] driver_probe_device+0x110/0x120 [ 21.064421] device_driver_attach+0x4f/0x60 [ 21.064421] __driver_attach+0x9a/0x140 [ 21.064421] ? device_driver_attach+0x60/0x60 [ 21.064421] bus_for_each_dev+0x76/0xc0 [ 21.064421] ? klist_add_tail+0x3b/0x70 [ 21.064421] bus_add_driver+0x144/0x220 [ 21.064421] ? bert_init+0x229/0x229 [ 21.064421] driver_register+0x5b/0xf0 [ 21.064421] ? bert_init+0x229/0x229 [ 21.064421] ghes_init+0x83/0xde [ 21.064421] do_one_initcall+0x46/0x220 [ 21.064421] kernel_init_freeable+0x206/0x280 [ 21.064421] ? rest_init+0xd0/0xd0 [ 21.064421] kernel_init+0xa/0x110 [ 21.064421] ret_from_fork+0x35/0x40 [ 21.211518] clocksource: Switched to clocksource tsc [ 21.212408] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC. [ 21.227478] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled [ 21.235019] 00:02: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A [ 21.244105] 00:03: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A [ 21.254257] Non-volatile memory driver v1.3 [ 21.259421] Linux agpgart interface v0.103 [ 21.272262] rdac: device handler registered [ 21.277466] hp_sw: device handler registered [ 21.282671] emc: device handler registered [ 21.288039] alua: device handler registered [ 21.293154] MACsec IEEE 802.1AE [ 21.297325] libphy: Fixed MDIO Bus: probed [ 21.302666] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI [ 21.310980] e1000: Copyright (c) 1999-2006 Intel Corporation. [ 21.317926] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k [ 21.324883] e1000e: Copyright(c) 1999 - 2015 Intel Corporation. [ 21.332069] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.6.0-k [ 21.340297] igb: Copyright (c) 2007-2014 Intel Corporation. [ 21.347058] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 5.1.0-k [ 21.356399] ixgbe: Copyright (c) 1999-2016 Intel Corporation. [ 21.363577] IOAPIC[9]: Set IRTE entry (P:1 FPD:0 Dst_Mode:1 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:EF Dest:00000001 SID:002C SQ:0 SVT:1) [ 21.379417] IOAPIC[1]: Set routing entry (9-13 -> 0xef -> IRQ 38 Mode:1 Active:1 Dest:1) [ 21.665318] ixgbe 0000:03:00.0: Multiqueue Enabled: Rx Queue count = 63, Tx Queue count = 63 XDP Queue count = 0 [ 21.761727] ixgbe 0000:03:00.0: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link) [ 21.795985] ixgbe 0000:03:00.0: MAC: 3, PHY: 0, PBA No: 000000-000 [ 21.803321] ixgbe 0000:03:00.0: 00:1e:67:f7:44:b3 [ 21.957977] ixgbe 0000:03:00.0: Intel(R) 10 Gigabit Network Connection [ 21.965867] libphy: ixgbe-mdio: probed [ 21.970646] IOAPIC[9]: Set IRTE entry (P:1 FPD:0 Dst_Mode:1 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:EF Dest:00000001 SID:002C SQ:0 SVT:1) [ 21.986482] IOAPIC[1]: Set routing entry (9-10 -> 0xef -> IRQ 105 Mode:1 Active:1 Dest:1) [ 22.265269] ixgbe 0000:03:00.1: Multiqueue Enabled: Rx Queue count = 63, Tx Queue count = 63 XDP Queue count = 0 [ 22.361656] ixgbe 0000:03:00.1: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link) [ 22.395908] ixgbe 0000:03:00.1: MAC: 3, PHY: 0, PBA No: 000000-000 [ 22.403235] ixgbe 0000:03:00.1: 00:1e:67:f7:44:b4 [ 22.556985] ixgbe 0000:03:00.1: Intel(R) 10 Gigabit Network Connection [ 22.564864] libphy: ixgbe-mdio: probed [ 22.569541] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 2.8.20-k [ 22.579179] i40e: Copyright (c) 2013 - 2019 Intel Corporation. [ 22.586811] usbcore: registered new interface driver catc [ 22.593299] usbcore: registered new interface driver kaweth [ 22.599957] pegasus: v0.9.3 (2013/04/25), Pegasus/Pegasus II USB Ethernet driver [ 22.609017] usbcore: registered new interface driver pegasus [ 22.615784] usbcore: registered new interface driver rtl8150 [ 22.622550] usbcore: registered new interface driver asix [ 22.629029] usbcore: registered new interface driver cdc_ether [ 22.635987] usbcore: registered new interface driver cdc_eem [ 22.642749] usbcore: registered new interface driver dm9601 [ 22.649423] usbcore: registered new interface driver smsc75xx [ 22.656294] usbcore: registered new interface driver smsc95xx [ 22.663163] usbcore: registered new interface driver gl620a [ 22.669828] usbcore: registered new interface driver net1080 [ 22.676587] usbcore: registered new interface driver plusb [ 22.683162] usbcore: registered new interface driver rndis_host [ 22.690218] usbcore: registered new interface driver cdc_subset [ 22.697285] usbcore: registered new interface driver zaurus [ 22.703954] usbcore: registered new interface driver MOSCHIP usb-ethernet driver [ 22.713005] usbcore: registered new interface driver int51x1 [ 22.719770] usbcore: registered new interface driver ipheth [ 22.726439] usbcore: registered new interface driver sierra_net [ 22.733563] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver [ 22.741292] ehci-pci: EHCI PCI platform driver [ 22.746931] IOAPIC[8]: Set IRTE entry (P:1 FPD:0 Dst_Mode:1 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:EF Dest:00000001 SID:F0FF SQ:0 SVT:1) [ 22.762784] IOAPIC[0]: Set routing entry (8-18 -> 0xef -> IRQ 18 Mode:1 Active:1 Dest:1) [ 22.772881] ehci-pci 0000:00:1a.0: EHCI Host Controller [ 22.779227] ehci-pci 0000:00:1a.0: new USB bus registered, assigned bus number 1
To reproduce:
git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp install job.yaml # job file is attached in this email bin/lkp run job.yaml
Thanks, Rong Chen
Hi guys,
On 12/05/2020 19:05, Dan Williams wrote:
On Tue, May 12, 2020 at 9:28 AM Rafael J. Wysocki rafael.j.wysocki@intel.com wrote:
Dan,
Has this been addressed in the v2?
No, this looks like a case I was concerned about, i.e. the GHES code is not being completely careful to avoid calling potentially sleeping functions with interrupts disabled. There is the nice comment that indicates that the fixmap should be used when ghes_notify_lock_irq() is held, but there seems to be no infrastructure to use / divert to the fixmap in the ghes_proc() path.
ghes_map()/ghes_unmap() use the fixmap for reading the firmware provided records, but this came through apei_read(), which claims to be IRQ and NMI safe...
That needs to be reworked first. It seems the implementation was getting lucky before to hit the cached acpi_ioremap in this path under rcu_read_lock(), but it appears it should have always been using the fixmap. Ying, James, is my read correct?
The path through this thing is pretty tortuous: The static HEST contains the address of the pointer that firmware updates to point to CPER records when they are generated. This pointer might be static (records are always in the same place), it might not.
The address in the tables is static. ghes.c maps it in ghes_new(): | rc = apei_map_generic_address(&generic->error_status_address);
which happens before the ghes_add_timer()/request_irq()/ghes_nmi_add() stuff, so we should always use the existing mapping.
__ghes_peek_estatus() reads the pointer with apei_read(), which should use the mapping from ghes_new(), then uses ghes_copy_tofrom_phys() which uses the fixmap to read the CPER records.
Does apei_map_generic_address() no longer keep the GAR/address mapped? (also possible I've totally mis-understood how ACPIs caching of mappings works!)
Thanks,
James
On Mon, May 18, 2020 at 11:08 AM James Morse james.morse@arm.com wrote:
Hi guys,
On 12/05/2020 19:05, Dan Williams wrote:
On Tue, May 12, 2020 at 9:28 AM Rafael J. Wysocki rafael.j.wysocki@intel.com wrote:
Dan,
Has this been addressed in the v2?
No, this looks like a case I was concerned about, i.e. the GHES code is not being completely careful to avoid calling potentially sleeping functions with interrupts disabled. There is the nice comment that indicates that the fixmap should be used when ghes_notify_lock_irq() is held, but there seems to be no infrastructure to use / divert to the fixmap in the ghes_proc() path.
ghes_map()/ghes_unmap() use the fixmap for reading the firmware provided records, but this came through apei_read(), which claims to be IRQ and NMI safe...
That needs to be reworked first. It seems the implementation was getting lucky before to hit the cached acpi_ioremap in this path under rcu_read_lock(), but it appears it should have always been using the fixmap. Ying, James, is my read correct?
The path through this thing is pretty tortuous: The static HEST contains the address of the pointer that firmware updates to point to CPER records when they are generated. This pointer might be static (records are always in the same place), it might not.
The address in the tables is static. ghes.c maps it in ghes_new(): | rc = apei_map_generic_address(&generic->error_status_address);
which happens before the ghes_add_timer()/request_irq()/ghes_nmi_add() stuff, so we should always use the existing mapping.
__ghes_peek_estatus() reads the pointer with apei_read(), which should use the mapping from ghes_new(), then uses ghes_copy_tofrom_phys() which uses the fixmap to read the CPER records.
Does apei_map_generic_address() no longer keep the GAR/address mapped? (also possible I've totally mis-understood how ACPIs caching of mappings works!)
Upon further investigation the problem appears to be that System-Memory OperationRegions are dynamically mapped at runtime for ASL code. This results in every unmap event triggering eviction from the cache and incurring synchronize_rcu_expedited(). The APEI code avoids this path by taking an extra reference at the beginning of time such that the rcu-walk through the cache at NMI time is guaranteed to both succeed, and not trigger an unmap event.
So now I'm looking at whether System-Memory OperationRegions can be generically pre-mapped in a similar fashion.
linux-stable-mirror@lists.linaro.org