The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d96c9342c23ee1d084802dcf064caa67ecaa45b Mon Sep 17 00:00:00 2001
From: Vishal Verma <vishal.l.verma(a)intel.com>
Date: Thu, 25 Oct 2018 18:37:28 -0600
Subject: [PATCH] acpi/nfit, x86/mce: Handle only uncorrectable machine checks
The MCE handler for nfit devices is called for memory errors on a
Non-Volatile DIMM and adds the error location to a 'badblocks' list.
This list is used by the various NVDIMM drivers to avoid consuming known
poison locations during IO.
The MCE handler gets called for both corrected and uncorrectable errors.
Until now, both kinds of errors have been added to the badblocks list.
However, corrected memory errors indicate that the problem has already
been fixed by hardware, and the resulting interrupt is merely a
notification to Linux.
As far as future accesses to that location are concerned, it is
perfectly fine to use, and thus doesn't need to be included in the above
badblocks list.
Add a check in the nfit MCE handler to filter out corrected mce events,
and only process uncorrectable errors.
Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error")
Reported-by: Omar Avelar <omar.avelar(a)intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma(a)intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
CC: Arnd Bergmann <arnd(a)arndb.de>
CC: Dan Williams <dan.j.williams(a)intel.com>
CC: Dave Jiang <dave.jiang(a)intel.com>
CC: elliott(a)hpe.com
CC: "H. Peter Anvin" <hpa(a)zytor.com>
CC: Ingo Molnar <mingo(a)redhat.com>
CC: Len Brown <lenb(a)kernel.org>
CC: linux-acpi(a)vger.kernel.org
CC: linux-edac <linux-edac(a)vger.kernel.org>
CC: linux-nvdimm(a)lists.01.org
CC: Qiuxu Zhuo <qiuxu.zhuo(a)intel.com>
CC: "Rafael J. Wysocki" <rjw(a)rjwysocki.net>
CC: Ross Zwisler <zwisler(a)kernel.org>
CC: stable <stable(a)vger.kernel.org>
CC: Thomas Gleixner <tglx(a)linutronix.de>
CC: Tony Luck <tony.luck(a)intel.com>
CC: x86-ml <x86(a)kernel.org>
CC: Yazen Ghannam <yazen.ghannam(a)amd.com>
Link: http://lkml.kernel.org/r/20181026003729.8420-1-vishal.l.verma@intel.com
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 4da9b1c58d28..dbd9fe2f6163 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -221,6 +221,7 @@ static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c) { return mce_am
int mce_available(struct cpuinfo_x86 *c);
bool mce_is_memory_error(struct mce *m);
+bool mce_is_correctable(struct mce *m);
DECLARE_PER_CPU(unsigned, mce_exception_count);
DECLARE_PER_CPU(unsigned, mce_poll_count);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 8c66d2fc8f81..77527b8ea982 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -534,7 +534,7 @@ bool mce_is_memory_error(struct mce *m)
}
EXPORT_SYMBOL_GPL(mce_is_memory_error);
-static bool mce_is_correctable(struct mce *m)
+bool mce_is_correctable(struct mce *m)
{
if (m->cpuvendor == X86_VENDOR_AMD && m->status & MCI_STATUS_DEFERRED)
return false;
@@ -547,6 +547,7 @@ static bool mce_is_correctable(struct mce *m)
return true;
}
+EXPORT_SYMBOL_GPL(mce_is_correctable);
static bool cec_add_mce(struct mce *m)
{
diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
index e9626bf6ca29..7a51707f87e9 100644
--- a/drivers/acpi/nfit/mce.c
+++ b/drivers/acpi/nfit/mce.c
@@ -25,8 +25,8 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
struct acpi_nfit_desc *acpi_desc;
struct nfit_spa *nfit_spa;
- /* We only care about memory errors */
- if (!mce_is_memory_error(mce))
+ /* We only care about uncorrectable memory errors */
+ if (!mce_is_memory_error(mce) || mce_is_correctable(mce))
return NOTIFY_DONE;
/*
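For readers who want to see the effect of the new check in isolation, here is a minimal user-space sketch. It is not the kernel code: struct mock_mce, the mock_* functions and the MOCK_NOTIFY_* values are hypothetical stand-ins, and the two booleans simply model what mce_is_memory_error() and mce_is_correctable() would report for a given event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's struct mce. The two fields model
 * the answers of mce_is_memory_error() and mce_is_correctable().
 */
struct mock_mce {
	bool is_memory_error;
	bool is_correctable;
};

/* Hypothetical return codes; the real handler returns NOTIFY_DONE etc. */
enum mock_notify { MOCK_NOTIFY_DONE, MOCK_NOTIFY_HANDLED };

/*
 * Mirrors the patched condition: bail out unless the event is an
 * uncorrectable memory error. Only then is a badblocks entry counted.
 */
static enum mock_notify mock_nfit_handle_mce(const struct mock_mce *mce,
					     unsigned int *badblocks)
{
	if (!mce->is_memory_error || mce->is_correctable)
		return MOCK_NOTIFY_DONE;

	(*badblocks)++;
	return MOCK_NOTIFY_HANDLED;
}

/* Feed a batch of events through the handler and count recorded entries. */
static unsigned int mock_count_badblocks(const struct mock_mce *events,
					 size_t n)
{
	unsigned int badblocks = 0;

	for (size_t i = 0; i < n; i++)
		mock_nfit_handle_mce(&events[i], &badblocks);

	return badblocks;
}
```

With one uncorrectable memory error, one corrected memory error and one non-memory event, only the first would be recorded. Before the patch, the corrected memory error would have been recorded as well.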
The patch above (commit 5d96c9342c23ee1d084802dcf064caa67ecaa45b) also does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
This reverts commit ffb80fc672c3a7b6afd0cefcb1524fb99917b2f3.
Turns out that commit is wrong. Host controllers are allowed to use
Clear Feature HALT as a means to sync data toggle between host and
peripheral.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Felipe Balbi <felipe.balbi(a)linux.intel.com>
---
drivers/usb/dwc3/gadget.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 9faad896b3a1..9f92ee03dde7 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1470,9 +1470,6 @@ int __dwc3_gadget_ep_set_halt(struct dwc3_ep *dep, int value, int protocol)
unsigned transfer_in_flight;
unsigned started;
- if (dep->flags & DWC3_EP_STALL)
- return 0;
-
if (dep->number > 1)
trb = dwc3_ep_prev_trb(dep, dep->trb_enqueue);
else
@@ -1494,8 +1491,6 @@ int __dwc3_gadget_ep_set_halt(struct dwc3_ep *dep, int value, int protocol)
else
dep->flags |= DWC3_EP_STALL;
} else {
- if (!(dep->flags & DWC3_EP_STALL))
- return 0;
ret = dwc3_send_clear_stall_ep_cmd(dep);
if (ret)
--
2.19.1
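To illustrate why the early returns are a problem, here is a minimal user-space sketch. Everything in it is hypothetical (struct mock_ep, both mock_* functions, the toggle and clear_cmds fields); it only models the idea that a Clear Halt resets the endpoint's data toggle, and it assumes the first removed check applied to the Set Halt path and the second to the Clear Halt path, as the hunks suggest.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an endpoint; not the driver's struct dwc3_ep. */
struct mock_ep {
	bool stalled;            /* models DWC3_EP_STALL in dep->flags */
	unsigned int toggle;     /* models the data toggle / sequence state */
	unsigned int clear_cmds; /* number of clear-stall commands issued */
};

/* Behaviour with the early returns in place (commit ffb80fc672c3). */
static int mock_set_halt_with_skip(struct mock_ep *ep, int value)
{
	if (value) {
		if (ep->stalled)
			return 0;
		ep->stalled = true;
	} else {
		if (!ep->stalled)
			return 0; /* Clear Halt dropped, toggle untouched */
		ep->stalled = false;
		ep->toggle = 0;
		ep->clear_cmds++;
	}
	return 0;
}

/* Behaviour after the revert: Clear Halt is always carried out. */
static int mock_set_halt(struct mock_ep *ep, int value)
{
	if (value) {
		ep->stalled = true;
	} else {
		ep->stalled = false;
		ep->toggle = 0; /* host and device agree on the toggle again */
		ep->clear_cmds++;
	}
	return 0;
}
```

In this model, a Clear Halt sent to an endpoint that was never halted leaves the toggle stale with the skip in place, while after the revert the toggle is reset and host and device are back in sync.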
Hi,
that patch is not 100% correct. You can revert it in your tree. I added
that because of a problem I found when running adb against macOS.
It's actually okay to send Clear Halt at any time, but for some reason
dwc3 was hanging when running adb against macOS.
If you can revert the patch and make sure it works against all 3 major
OSes (Linux, Windows and macOS) I'd be really glad.
liangshengjun <liangshengjun(a)hisilicon.com> writes:
> Hi Felipe,
>
> I have hit a problem with the Set/Clear Halt patch.
> Version: Linux v4.16.
> Case: a USB UVC gadget running in bulk mode is connected to a Windows 7 PC. When the PC stops the camera application, it sends a Clear Halt request to the UVC device to stop video streaming.
> With the v4.16 dwc3 driver, this Clear Halt request is skipped because DWC3_EP_STALL is not set in dep->flags; when the PC then restarts the camera application, the UVC transfer fails.
> I have confirmed that the v3.18 dwc3 driver works correctly.
>
> So how should Clear Halt be handled when no Set Halt preceded it?
>
> PS:
> commit ffb80fc672c3a7b6afd0cefcb1524fb99917b2f3
> Author: Felipe Balbi <felipe.balbi(a)linux.intel.com>
> Date: Thu Jan 19 13:38:42 2017 +0200
>
> usb: dwc3: gadget: skip Set/Clear Halt when invalid
>
> At least macOS seems to be sending
> ClearFeature(ENDPOINT_HALT) to endpoints which
> aren't Halted. This makes DWC3's CLEARSTALL command
> time out which causes several issues for the driver.
>
> Instead, let's just return 0 and bail out early.
>
> Cc: <stable(a)vger.kernel.org>
> Signed-off-by: Felipe Balbi <felipe.balbi(a)linux.intel.com>
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 6faf484..0a664d8 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -1379,6 +1379,9 @@ int __dwc3_gadget_ep_set_halt(struct dwc3_ep *dep, int value, int protocol)
> unsigned transfer_in_flight;
> unsigned started;
>
> + if (dep->flags & DWC3_EP_STALL)
> + return 0;
> +
> if (dep->number > 1)
> trb = dwc3_ep_prev_trb(dep, dep->trb_enqueue);
> else
> @@ -1400,6 +1403,8 @@ int __dwc3_gadget_ep_set_halt(struct dwc3_ep *dep, int value, int protocol)
> else
> dep->flags |= DWC3_EP_STALL;
> } else {
> + if (!(dep->flags & DWC3_EP_STALL))
> + return 0;
>
> ret = dwc3_send_clear_stall_ep_cmd(dep);
> if (ret)
>
>
> Liang Shengjun
> HISILICON TECHNOLOGIES CO., LTD.
> New R&D Center, Wuhe Road, Bantian,
> Longgang District, Shenzhen 518129 P.R. China
>
--
balbi
Hi Greg,
I noticed that 3.18.125 added commit bc07ee33284a ('Revert "drm/i915:
Fix mutex->owner inspection race under DEBUG_MUTEXES"'), which states
that the reason it can be applied is:
The core fix was applied in
commit a63b03e2d2477586440741677ecac45bcf28d7b1
Author: Chris Wilson <chris(a)chris-wilson.co.uk>
Date: Tue Jan 6 10:29:35 2015 +0000
mutex: Always clear owner field upon mutex_unlock()
(note the absence of stable@ tag)
so we can now revert our band-aid commit 226e5ae9e5f910 for -next.
but that the commit referenced wasn't also pulled in.
Please consider pulling that one too if you're going to do another 3.18
stable release.
Thanks,
Tom