This patch series includes some improvement to Machine check handler
for pseries. Patch 1 fixes an issue where machine check handler crashes
kernel while accessing vmalloc-ed buffer while in nmi context.
Patch 2 fixes endain bug while restoring of r3 in MCE handler.
Patch 4 implements a real mode mce handler and flushes the SLBs on SLB error.
Patch 5 display's the MCE error details on console.
Patch 6 saves and dumps the SLB contents on SLB MCE errors to improve the
debugability.
Change in V4:
- Flush the SLBs in real mode mce handler to handle SLB errors for entry 0.
- Allocate buffers per cpu to hold rtas error log and old slb contents.
- Defer the logging of rtas error log to irq work queue.
Change in V3:
- Moved patch 5 to patch 2
Change in V2:
- patch 3: Display additional info (NIP and task info) in MCE error details.
- patch 5: Fix endain bug while restoring of r3 in MCE handler.
---
Mahesh Salgaonkar (6):
powerpc/pseries: Defer the logging of rtas error to irq work queue.
powerpc/pseries: Fix endainness while restoring of r3 in MCE handler.
powerpc/pseries: Define MCE error event section.
powerpc/pseries: flush SLB contents on SLB MCE errors.
powerpc/pseries: Display machine check error details.
powerpc/pseries: Dump the SLB contents on SLB MCE errors.
arch/powerpc/include/asm/book3s/64/mmu-hash.h | 8 +
arch/powerpc/include/asm/machdep.h | 1
arch/powerpc/include/asm/paca.h | 4
arch/powerpc/include/asm/rtas.h | 116 ++++++++++++
arch/powerpc/kernel/exceptions-64s.S | 42 ++++
arch/powerpc/kernel/mce.c | 16 +-
arch/powerpc/mm/slb.c | 63 +++++++
arch/powerpc/platforms/powernv/opal.c | 1
arch/powerpc/platforms/pseries/pseries.h | 1
arch/powerpc/platforms/pseries/ras.c | 236 +++++++++++++++++++++++--
arch/powerpc/platforms/pseries/setup.c | 27 +++
11 files changed, 497 insertions(+), 18 deletions(-)
--
Signature
QUEUE_FLAG_DAX is an indication that a given block device supports
filesystem DAX and should not be set for PMEM namespaces which are in "raw"
or "sector" modes. These namespaces lack struct page and are prevented
from participating in filesystem DAX.
Signed-off-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Suggested-by: Mike Snitzer <snitzer(a)redhat.com>
Cc: stable(a)vger.kernel.org
---
drivers/nvdimm/pmem.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 68940356cad3..8b1fd7f1a224 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -414,7 +414,8 @@ static int pmem_attach_disk(struct device *dev,
blk_queue_logical_block_size(q, pmem_sector_size(ndns));
blk_queue_max_hw_sectors(q, UINT_MAX);
blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
- blk_queue_flag_set(QUEUE_FLAG_DAX, q);
+ if (pmem->pfn_flags & PFN_MAP)
+ blk_queue_flag_set(QUEUE_FLAG_DAX, q);
q->queuedata = pmem;
disk = alloc_disk_node(0, nid);
--
2.14.4
The other day I was testing one of the HP laptops at my office with an
i915/amdgpu hybrid setup and noticed that hotplugging was non-functional
on almost all of the display outputs. I eventually discovered that all
of the external outputs were connected to the amdgpu device instead of
i915, and that the hotplugs weren't being detected so long as the GPU
was in runtime suspend. After some talking with folks at AMD, I learned
that amdgpu is actually supposed to support hotplug detection in runtime
suspend so long as the OEM has implemented it properly in the firmware.
On this HP ZBook 15 G4 (the machine in question), amdgpu wasn't managing
to find the ATIF handle at all despite the fact that I could see acpi
events being sent in response to any hotplugging. After going through
dumps of the firmware, I discovered that this machine did in fact
support ATIF, but that it's ATIF method lived in an entirely different
namespace than this device's handle (the device handle was
\_SB_.PCI0.PEG0.PEGP, but ATIF lives in ATPX's handle at
\_SB_.PCI0.GFX0).
According to AMD, the reason for this appears to be that on laptops with
PRIME configurations, the ATIF and other related ACPI handles usually
live under the namespace of the intel GPU instead of the AMD GPU.
Machines that don't have more then a single GPU have the ATIF handle
located in the namespace of the AMD GPU.
So, fix this by probing ATPX's ACPI parent's namespace if we can't find
ATIF elsewhere, along with storing a pointer to the proper handle to use
for ATIF and using that instead of the device's handle.
This fixes HPD detection while in runtime suspend for this ZBook!
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 80 +++++++++++++++++-------
1 file changed, 59 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
index 22c7e8ec0b9a..b0d7978c475d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
@@ -65,6 +65,8 @@ struct amdgpu_atif_functions {
};
struct amdgpu_atif {
+ acpi_handle handle;
+
struct amdgpu_atif_notifications notifications;
struct amdgpu_atif_functions functions;
struct amdgpu_atif_notification_cfg notification_cfg;
@@ -83,8 +85,9 @@ struct amdgpu_atif {
* Executes the requested ATIF function (all asics).
* Returns a pointer to the acpi output buffer.
*/
-static union acpi_object *amdgpu_atif_call(acpi_handle handle, int function,
- struct acpi_buffer *params)
+static union acpi_object *amdgpu_atif_call(struct amdgpu_atif *atif,
+ int function,
+ struct acpi_buffer *params)
{
acpi_status status;
union acpi_object atif_arg_elements[2];
@@ -107,7 +110,8 @@ static union acpi_object *amdgpu_atif_call(acpi_handle handle, int function,
atif_arg_elements[1].integer.value = 0;
}
- status = acpi_evaluate_object(handle, "ATIF", &atif_arg, &buffer);
+ status = acpi_evaluate_object(atif->handle, NULL, &atif_arg,
+ &buffer);
/* Fail only if calling the method fails and ATIF is supported */
if (ACPI_FAILURE(status) && status != AE_NOT_FOUND) {
@@ -178,15 +182,14 @@ static void amdgpu_atif_parse_functions(struct amdgpu_atif_functions *f, u32 mas
* (all asics).
* returns 0 on success, error on failure.
*/
-static int amdgpu_atif_verify_interface(acpi_handle handle,
- struct amdgpu_atif *atif)
+static int amdgpu_atif_verify_interface(struct amdgpu_atif *atif)
{
union acpi_object *info;
struct atif_verify_interface output;
size_t size;
int err = 0;
- info = amdgpu_atif_call(handle, ATIF_FUNCTION_VERIFY_INTERFACE, NULL);
+ info = amdgpu_atif_call(atif, ATIF_FUNCTION_VERIFY_INTERFACE, NULL);
if (!info)
return -EIO;
@@ -213,6 +216,36 @@ static int amdgpu_atif_verify_interface(acpi_handle handle,
return err;
}
+static acpi_handle amdgpu_atif_probe_handle(acpi_handle dhandle)
+{
+ acpi_handle handle = NULL;
+ char acpi_method_name[255] = { 0 };
+ struct acpi_buffer buffer = { sizeof(acpi_method_name), acpi_method_name };
+ acpi_status status;
+
+ /* First check the device handle */
+ status = acpi_get_handle(dhandle, "ATIF", &handle);
+ if (ACPI_SUCCESS(status))
+ goto out;
+
+ /* Some vendors stick the ATIF methods in the same namespace as the
+ * ATPX methods
+ */
+ if (amdgpu_has_atpx()) {
+ status = acpi_get_handle(amdgpu_atpx_get_dhandle(), "ATIF",
+ &handle);
+ if (ACPI_SUCCESS(status))
+ goto out;
+ }
+
+ DRM_DEBUG_DRIVER("No ATIF handle found\n");
+ return NULL;
+out:
+ acpi_get_name(handle, ACPI_FULL_PATHNAME, &buffer);
+ DRM_DEBUG_DRIVER("Found ATIF handle %s\n", acpi_method_name);
+ return handle;
+}
+
/**
* amdgpu_atif_get_notification_params - determine notify configuration
*
@@ -225,15 +258,16 @@ static int amdgpu_atif_verify_interface(acpi_handle handle,
* where n is specified in the result if a notifier is used.
* Returns 0 on success, error on failure.
*/
-static int amdgpu_atif_get_notification_params(acpi_handle handle,
- struct amdgpu_atif_notification_cfg *n)
+static int amdgpu_atif_get_notification_params(struct amdgpu_atif *atif)
{
union acpi_object *info;
+ struct amdgpu_atif_notification_cfg *n = &atif->notification_cfg;
struct atif_system_params params;
size_t size;
int err = 0;
- info = amdgpu_atif_call(handle, ATIF_FUNCTION_GET_SYSTEM_PARAMETERS, NULL);
+ info = amdgpu_atif_call(atif, ATIF_FUNCTION_GET_SYSTEM_PARAMETERS,
+ NULL);
if (!info) {
err = -EIO;
goto out;
@@ -287,14 +321,15 @@ static int amdgpu_atif_get_notification_params(acpi_handle handle,
* (all asics).
* Returns 0 on success, error on failure.
*/
-static int amdgpu_atif_get_sbios_requests(acpi_handle handle,
- struct atif_sbios_requests *req)
+static int amdgpu_atif_get_sbios_requests(struct amdgpu_atif *atif,
+ struct atif_sbios_requests *req)
{
union acpi_object *info;
size_t size;
int count = 0;
- info = amdgpu_atif_call(handle, ATIF_FUNCTION_GET_SYSTEM_BIOS_REQUESTS, NULL);
+ info = amdgpu_atif_call(atif, ATIF_FUNCTION_GET_SYSTEM_BIOS_REQUESTS,
+ NULL);
if (!info)
return -EIO;
@@ -327,11 +362,10 @@ static int amdgpu_atif_get_sbios_requests(acpi_handle handle,
* Returns NOTIFY code
*/
static int amdgpu_atif_handler(struct amdgpu_device *adev,
- struct acpi_bus_event *event)
+ struct acpi_bus_event *event)
{
struct amdgpu_atif *atif = adev->atif;
struct atif_sbios_requests req;
- acpi_handle handle;
int count;
DRM_DEBUG_DRIVER("event, device_class = %s, type = %#x\n",
@@ -347,8 +381,7 @@ static int amdgpu_atif_handler(struct amdgpu_device *adev,
return NOTIFY_DONE;
/* Check pending SBIOS requests */
- handle = ACPI_HANDLE(&adev->pdev->dev);
- count = amdgpu_atif_get_sbios_requests(handle, &req);
+ count = amdgpu_atif_get_sbios_requests(atif, &req);
if (count <= 0)
return NOTIFY_DONE;
@@ -679,7 +712,7 @@ static int amdgpu_acpi_event(struct notifier_block *nb,
*/
int amdgpu_acpi_init(struct amdgpu_device *adev)
{
- acpi_handle handle;
+ acpi_handle handle, atif_handle;
struct amdgpu_atif *atif;
struct amdgpu_atcs *atcs = &adev->atcs;
int ret;
@@ -696,14 +729,20 @@ int amdgpu_acpi_init(struct amdgpu_device *adev)
DRM_DEBUG_DRIVER("Call to ATCS verify_interface failed: %d\n", ret);
}
- /* Call the ATIF method */
+ /* Probe for ATIF, and initialize it if found */
+ atif_handle = amdgpu_atif_probe_handle(handle);
+ if (!atif_handle)
+ goto out;
+
atif = kzalloc(sizeof(*atif), GFP_KERNEL);
if (!atif) {
DRM_WARN("Not enough memory to initialize ATIF\n");
goto out;
}
+ atif->handle = atif_handle;
- ret = amdgpu_atif_verify_interface(handle, atif);
+ /* Call the ATIF method */
+ ret = amdgpu_atif_verify_interface(atif);
if (ret) {
DRM_DEBUG_DRIVER("Call to ATIF verify_interface failed: %d\n", ret);
kfree(atif);
@@ -739,8 +778,7 @@ int amdgpu_acpi_init(struct amdgpu_device *adev)
}
if (atif->functions.system_params) {
- ret = amdgpu_atif_get_notification_params(handle,
- &atif->notification_cfg);
+ ret = amdgpu_atif_get_notification_params(atif);
if (ret) {
DRM_DEBUG_DRIVER("Call to GET_SYSTEM_PARAMS failed: %d\n",
ret);
--
2.17.1
The handles for ACPI methods like ATPX and ATIF will live under the
integrated GPU's namespace on systems with more then one GPU. ATPX is
already detected regardless of the namespace it lives in, and it will
always be in the same namespace as ATIF. So we can make things easier on
ourselves by just adding some functions to retrieve said namespace so it
can be searched later for ATIF.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 6 ++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c | 6 ++++++
2 files changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 7df5d3d11aff..7dcbac8af9a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1858,6 +1858,12 @@ static inline bool amdgpu_atpx_dgpu_req_power_for_displays(void) { return false;
static inline bool amdgpu_has_atpx(void) { return false; }
#endif
+#if defined(CONFIG_VGA_SWITCHEROO) && defined(CONFIG_ACPI)
+void *amdgpu_atpx_get_dhandle(void);
+#else
+static inline void *amdgpu_atpx_get_dhandle(void) { return NULL; }
+#endif
+
/*
* KMS
*/
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
index a3896412a019..b33f1680c9a3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
@@ -90,6 +90,12 @@ bool amdgpu_atpx_dgpu_req_power_for_displays(void) {
return amdgpu_atpx_priv.atpx.dgpu_req_power_for_displays;
}
+#if defined(CONFIG_ACPI)
+void *amdgpu_atpx_get_dhandle(void) {
+ return amdgpu_atpx_priv.dhandle;
+}
+#endif
+
/**
* amdgpu_atpx_call - call an ATPX method
*
--
2.17.1
Until the original patch that adds the set_memory_block_size_order()
function is added, then the following two patches should also not be
applied. In fact I'm surprised it built with that function undefined.
Thanks,
Mike
On 6/27/2018 7:08 PM, gregkh(a)linuxfoundation.org wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> x86/platform/UV: Use new set memory block size function
>
> to the 4.14-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> x86-platform-uv-use-new-set-memory-block-size-function.patch
> and it can be found in the queue-4.14 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
> From bbbd2b51a2aa0d76b3676271e216cf3647773397 Mon Sep 17 00:00:00 2001
> From: "mike.travis(a)hpe.com" <mike.travis(a)hpe.com>
> Date: Thu, 24 May 2018 15:17:13 -0500
> Subject: x86/platform/UV: Use new set memory block size function
>
> From: mike.travis(a)hpe.com <mike.travis(a)hpe.com>
>
> commit bbbd2b51a2aa0d76b3676271e216cf3647773397 upstream.
>
> Add a call to the new function to "adjust" the current fixed UV memory
> block size of 2GB so it can be changed to a different physical boundary.
> This accommodates changes in the Intel BIOS, and therefore UV BIOS,
> which now can align boundaries different than the previous UV standard
> of 2GB. It also flags any UV Global Address boundaries from BIOS that
> cause a change in the mem block size (boundary).
>
> The current boundary of 2GB has been used on UV since the first system
> release in 2009 with Linux 2.6 and has worked fine. But the new NVDIMM
> persistent memory modules (PMEM), along with the Intel BIOS changes to
> support these modules caused the memory block size boundary to be set
> to a lower limit. Intel only guarantees that this minimum boundary at
> 64MB though the current Linux limit is 128MB.
>
> Note that the default remains 2GB if no changes occur.
>
> Signed-off-by: Mike Travis <mike.travis(a)hpe.com>
> Reviewed-by: Andrew Banman <andrew.banman(a)hpe.com>
> Cc: Andrew Morton <akpm(a)linux-foundation.org>
> Cc: Dimitri Sivanich <dimitri.sivanich(a)hpe.com>
> Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
> Cc: Peter Zijlstra <peterz(a)infradead.org>
> Cc: Russ Anderson <russ.anderson(a)hpe.com>
> Cc: Thomas Gleixner <tglx(a)linutronix.de>
> Cc: dan.j.williams(a)intel.com
> Cc: jgross(a)suse.com
> Cc: kirill.shutemov(a)linux.intel.com
> Cc: mhocko(a)suse.com
> Cc: stable(a)vger.kernel.org
> Link: https://lkml.kernel.org/lkml/20180524201711.732785782@stormcage.americas.sg…
> Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
>
> ---
> arch/x86/kernel/apic/x2apic_uv_x.c | 49 ++++++++++++++++++++++++++++++++++---
> 1 file changed, 46 insertions(+), 3 deletions(-)
>
> --- a/arch/x86/kernel/apic/x2apic_uv_x.c
> +++ b/arch/x86/kernel/apic/x2apic_uv_x.c
> @@ -26,6 +26,7 @@
> #include <linux/delay.h>
> #include <linux/crash_dump.h>
> #include <linux/reboot.h>
> +#include <linux/memory.h>
>
> #include <asm/uv/uv_mmrs.h>
> #include <asm/uv/uv_hub.h>
> @@ -346,6 +347,40 @@ extern int uv_hub_info_version(void)
> }
> EXPORT_SYMBOL(uv_hub_info_version);
>
> +/* Default UV memory block size is 2GB */
> +static unsigned long mem_block_size = (2UL << 30);
> +
> +static __init int adj_blksize(u32 lgre)
> +{
> + unsigned long base = (unsigned long)lgre << UV_GAM_RANGE_SHFT;
> + unsigned long size;
> +
> + for (size = mem_block_size; size > MIN_MEMORY_BLOCK_SIZE; size >>= 1)
> + if (IS_ALIGNED(base, size))
> + break;
> +
> + if (size >= mem_block_size)
> + return 0;
> +
> + mem_block_size = size;
> + return 1;
> +}
> +
> +static __init void set_block_size(void)
> +{
> + unsigned int order = ffs(mem_block_size);
> +
> + if (order) {
> + /* adjust for ffs return of 1..64 */
> + set_memory_block_size_order(order - 1);
> + pr_info("UV: mem_block_size set to 0x%lx\n", mem_block_size);
> + } else {
> + /* bad or zero value, default to 1UL << 31 (2GB) */
> + pr_err("UV: mem_block_size error with 0x%lx\n", mem_block_size);
> + set_memory_block_size_order(31);
> + }
> +}
> +
> /* Build GAM range lookup table: */
> static __init void build_uv_gr_table(void)
> {
> @@ -1144,23 +1179,30 @@ static void __init decode_gam_rng_tbl(un
> << UV_GAM_RANGE_SHFT);
> int order = 0;
> char suffix[] = " KMGTPE";
> + int flag = ' ';
>
> while (size > 9999 && order < sizeof(suffix)) {
> size /= 1024;
> order++;
> }
>
> + /* adjust max block size to current range start */
> + if (gre->type == 1 || gre->type == 2)
> + if (adj_blksize(lgre))
> + flag = '*';
> +
> if (!index) {
> pr_info("UV: GAM Range Table...\n");
> - pr_info("UV: # %20s %14s %5s %4s %5s %3s %2s\n", "Range", "", "Size", "Type", "NASID", "SID", "PN");
> + pr_info("UV: # %20s %14s %6s %4s %5s %3s %2s\n", "Range", "", "Size", "Type", "NASID", "SID", "PN");
> }
> - pr_info("UV: %2d: 0x%014lx-0x%014lx %5lu%c %3d %04x %02x %02x\n",
> + pr_info("UV: %2d: 0x%014lx-0x%014lx%c %5lu%c %3d %04x %02x %02x\n",
> index++,
> (unsigned long)lgre << UV_GAM_RANGE_SHFT,
> (unsigned long)gre->limit << UV_GAM_RANGE_SHFT,
> - size, suffix[order],
> + flag, size, suffix[order],
> gre->type, gre->nasid, gre->sockid, gre->pnode);
>
> + /* update to next range start */
> lgre = gre->limit;
> if (sock_min > gre->sockid)
> sock_min = gre->sockid;
> @@ -1391,6 +1433,7 @@ static void __init uv_system_init_hub(vo
>
> build_socket_tables();
> build_uv_gr_table();
> + set_block_size();
> uv_init_hub_info(&hub_info);
> uv_possible_blades = num_possible_nodes();
> if (!_node_to_pnode)
>
>
> Patches currently in stable-queue which might be from mike.travis(a)hpe.com are
>
> queue-4.14/x86-platform-uv-add-kernel-parameter-to-set-memory-block-size.patch
> queue-4.14/x86-platform-uv-use-new-set-memory-block-size-function.patch
>
This is a note to let you know that I've just added the patch titled
serdev: fix memleak on module unload
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From bc6cf3669d22371f573ab0305b3abf13887c0786 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Wed, 13 Jun 2018 17:08:59 +0200
Subject: serdev: fix memleak on module unload
Make sure to free all resources associated with the ida on module
exit.
Fixes: cd6484e1830b ("serdev: Introduce new bus for serial attached devices")
Cc: stable <stable(a)vger.kernel.org> # 4.11
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serdev/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/tty/serdev/core.c b/drivers/tty/serdev/core.c
index df93b727e984..9e59f4788589 100644
--- a/drivers/tty/serdev/core.c
+++ b/drivers/tty/serdev/core.c
@@ -617,6 +617,7 @@ EXPORT_SYMBOL_GPL(__serdev_device_driver_register);
static void __exit serdev_exit(void)
{
bus_unregister(&serdev_bus_type);
+ ida_destroy(&ctrl_ida);
}
module_exit(serdev_exit);
--
2.18.0