The patch titled
Subject: mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()
has been removed from the -mm tree. Its filename was
mm-hwpoison-fix-race-between-hugetlb-free-demotion-and-memory_failure_hugetlb.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Subject: mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()
There is a race condition between memory_failure_hugetlb() and hugetlb
free/demotion, which causes setting PageHWPoison flag on the wrong page.
The one simple result is that wrong processes can be killed, but another
(more serious) one is that the actual error is left unhandled, so no one
prevents later access to it, and that might lead to more serious results
like consuming corrupted data.
Think about the below race window:
CPU 1 CPU 2
memory_failure_hugetlb
struct page *head = compound_head(p);
hugetlb page might be freed to
buddy, or even changed to another
compound page.
get_hwpoison_page -- page is not what we want now...
The current code first does prechecks roughly and then reconfirms after
taking refcount, but it's found that it makes code overly complicated, so
move the prechecks in a single hugetlb_lock range.
A newly introduced function, try_memory_failure_hugetlb(), always takes
hugetlb_lock (even for non-hugetlb pages). That can be improved, but
memory_failure() is rare in principle, so should not be a big problem.
Link: https://lkml.kernel.org/r/20220408135323.1559401-2-naoya.horiguchi@linux.dev
Fixes: 761ad8d7c7b5 ("mm: hwpoison: introduce memory_failure_hugetlb()")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reported-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Miaohe Lin <linmiaohe(a)huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Dan Carpenter <dan.carpenter(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/hugetlb.h | 6 +
include/linux/mm.h | 8 ++
mm/hugetlb.c | 10 ++
mm/memory-failure.c | 145 ++++++++++++++++++++++++++------------
4 files changed, 127 insertions(+), 42 deletions(-)
--- a/include/linux/hugetlb.h~mm-hwpoison-fix-race-between-hugetlb-free-demotion-and-memory_failure_hugetlb
+++ a/include/linux/hugetlb.h
@@ -169,6 +169,7 @@ long hugetlb_unreserve_pages(struct inod
long freed);
bool isolate_huge_page(struct page *page, struct list_head *list);
int get_hwpoison_huge_page(struct page *page, bool *hugetlb);
+int get_huge_page_for_hwpoison(unsigned long pfn, int flags);
void putback_active_hugepage(struct page *page);
void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
void free_huge_page(struct page *page);
@@ -377,6 +378,11 @@ static inline int get_hwpoison_huge_page
{
return 0;
}
+
+static inline int get_huge_page_for_hwpoison(unsigned long pfn, int flags)
+{
+ return 0;
+}
static inline void putback_active_hugepage(struct page *page)
{
--- a/include/linux/mm.h~mm-hwpoison-fix-race-between-hugetlb-free-demotion-and-memory_failure_hugetlb
+++ a/include/linux/mm.h
@@ -3197,6 +3197,14 @@ extern int sysctl_memory_failure_recover
extern void shake_page(struct page *p);
extern atomic_long_t num_poisoned_pages __read_mostly;
extern int soft_offline_page(unsigned long pfn, int flags);
+#ifdef CONFIG_MEMORY_FAILURE
+extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
+#else
+static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
+{
+ return 0;
+}
+#endif
#ifndef arch_memory_failure
static inline int arch_memory_failure(unsigned long pfn, int flags)
--- a/mm/hugetlb.c~mm-hwpoison-fix-race-between-hugetlb-free-demotion-and-memory_failure_hugetlb
+++ a/mm/hugetlb.c
@@ -6785,6 +6785,16 @@ int get_hwpoison_huge_page(struct page *
return ret;
}
+int get_huge_page_for_hwpoison(unsigned long pfn, int flags)
+{
+ int ret;
+
+ spin_lock_irq(&hugetlb_lock);
+ ret = __get_huge_page_for_hwpoison(pfn, flags);
+ spin_unlock_irq(&hugetlb_lock);
+ return ret;
+}
+
void putback_active_hugepage(struct page *page)
{
spin_lock_irq(&hugetlb_lock);
--- a/mm/memory-failure.c~mm-hwpoison-fix-race-between-hugetlb-free-demotion-and-memory_failure_hugetlb
+++ a/mm/memory-failure.c
@@ -1498,50 +1498,113 @@ static int try_to_split_thp_page(struct
return 0;
}
-static int memory_failure_hugetlb(unsigned long pfn, int flags)
+/*
+ * Called from hugetlb code with hugetlb_lock held.
+ *
+ * Return values:
+ * 0 - free hugepage
+ * 1 - in-use hugepage
+ * 2 - not a hugepage
+ * -EBUSY - the hugepage is busy (try to retry)
+ * -EHWPOISON - the hugepage is already hwpoisoned
+ */
+int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
+{
+ struct page *page = pfn_to_page(pfn);
+ struct page *head = compound_head(page);
+ int ret = 2; /* fallback to normal page handling */
+ bool count_increased = false;
+
+ if (!PageHeadHuge(head))
+ goto out;
+
+ if (flags & MF_COUNT_INCREASED) {
+ ret = 1;
+ count_increased = true;
+ } else if (HPageFreed(head) || HPageMigratable(head)) {
+ ret = get_page_unless_zero(head);
+ if (ret)
+ count_increased = true;
+ } else {
+ ret = -EBUSY;
+ goto out;
+ }
+
+ if (TestSetPageHWPoison(head)) {
+ ret = -EHWPOISON;
+ goto out;
+ }
+
+ return ret;
+out:
+ if (count_increased)
+ put_page(head);
+ return ret;
+}
+
+#ifdef CONFIG_HUGETLB_PAGE
+/*
+ * Taking refcount of hugetlb pages needs extra care about race conditions
+ * with basic operations like hugepage allocation/free/demotion.
+ * So some of prechecks for hwpoison (pinning, and testing/setting
+ * PageHWPoison) should be done in single hugetlb_lock range.
+ */
+static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb)
{
- struct page *p = pfn_to_page(pfn);
- struct page *head = compound_head(p);
int res;
+ struct page *p = pfn_to_page(pfn);
+ struct page *head;
unsigned long page_flags;
+ bool retry = true;
- if (TestSetPageHWPoison(head)) {
- pr_err("Memory failure: %#lx: already hardware poisoned\n",
- pfn);
- res = -EHWPOISON;
- if (flags & MF_ACTION_REQUIRED)
+ *hugetlb = 1;
+retry:
+ res = get_huge_page_for_hwpoison(pfn, flags);
+ if (res == 2) { /* fallback to normal page handling */
+ *hugetlb = 0;
+ return 0;
+ } else if (res == -EHWPOISON) {
+ pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
+ if (flags & MF_ACTION_REQUIRED) {
+ head = compound_head(p);
res = kill_accessing_process(current, page_to_pfn(head), flags);
+ }
return res;
+ } else if (res == -EBUSY) {
+ if (retry) {
+ retry = false;
+ goto retry;
+ }
+ action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
+ return res;
+ }
+
+ head = compound_head(p);
+ lock_page(head);
+
+ if (hwpoison_filter(p)) {
+ ClearPageHWPoison(head);
+ res = -EOPNOTSUPP;
+ goto out;
}
num_poisoned_pages_inc();
- if (!(flags & MF_COUNT_INCREASED)) {
- res = get_hwpoison_page(p, flags);
- if (!res) {
- lock_page(head);
- if (hwpoison_filter(p)) {
- if (TestClearPageHWPoison(head))
- num_poisoned_pages_dec();
- unlock_page(head);
- return -EOPNOTSUPP;
- }
- unlock_page(head);
- res = MF_FAILED;
- if (__page_handle_poison(p)) {
- page_ref_inc(p);
- res = MF_RECOVERED;
- }
- action_result(pfn, MF_MSG_FREE_HUGE, res);
- return res == MF_RECOVERED ? 0 : -EBUSY;
- } else if (res < 0) {
- action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
- return -EBUSY;
+ /*
+ * Handling free hugepage. The possible race with hugepage allocation
+ * or demotion can be prevented by PageHWPoison flag.
+ */
+ if (res == 0) {
+ unlock_page(head);
+ res = MF_FAILED;
+ if (__page_handle_poison(p)) {
+ page_ref_inc(p);
+ res = MF_RECOVERED;
}
+ action_result(pfn, MF_MSG_FREE_HUGE, res);
+ return res == MF_RECOVERED ? 0 : -EBUSY;
}
- lock_page(head);
-
/*
* The page could have changed compound pages due to race window.
* If this happens just bail out.
@@ -1554,14 +1617,6 @@ static int memory_failure_hugetlb(unsign
page_flags = head->flags;
- if (hwpoison_filter(p)) {
- if (TestClearPageHWPoison(head))
- num_poisoned_pages_dec();
- put_page(p);
- res = -EOPNOTSUPP;
- goto out;
- }
-
/*
* TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
* simply disable it. In order to make it work properly, we need
@@ -1588,6 +1643,12 @@ out:
unlock_page(head);
return res;
}
+#else
+static inline int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb)
+{
+ return 0;
+}
+#endif
static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
struct dev_pagemap *pgmap)
@@ -1712,6 +1773,7 @@ int memory_failure(unsigned long pfn, in
int res = 0;
unsigned long page_flags;
bool retry = true;
+ int hugetlb = 0;
if (!sysctl_memory_failure_recovery)
panic("Memory failure on page %lx", pfn);
@@ -1739,10 +1801,9 @@ int memory_failure(unsigned long pfn, in
}
try_again:
- if (PageHuge(p)) {
- res = memory_failure_hugetlb(pfn, flags);
+ res = try_memory_failure_hugetlb(pfn, flags, &hugetlb);
+ if (hugetlb)
goto unlock_mutex;
- }
if (TestSetPageHWPoison(p)) {
pr_err("Memory failure: %#lx: already hardware poisoned\n",
_
Patches currently in -mm which might be from naoya.horiguchi(a)nec.com are
mm-hwpoison-put-page-in-already-hwpoisoned-case-with-mf_count_increased.patch
revert-mm-memory-failurec-fix-race-with-changing-page-compound-again.patch
mm-hugetlb-hwpoison-separate-branch-for-free-and-in-use-hugepage.patch
Since linux-stable 5.10.112 commit:
1ff5359afa5e ("net: micrel: fix KS8851_MLL Kconfig")
it is not possible to select KS8851_MLL symbol.
This is because the commit adds dependency on Kconfig symbol
PTP_1588_CLOCK_OPTIONAL
which was added in Linux upstream commit:
e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies")
And the aforementioned commit is not part of stable 5.10.112.
This is a note to let you know that I've just added the patch titled
usb: dwc3: core: Only handle soft-reset in DCTL
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From f4fd84ae0765a80494b28c43b756a95100351a94 Mon Sep 17 00:00:00 2001
From: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Date: Thu, 21 Apr 2022 19:33:56 -0700
Subject: usb: dwc3: core: Only handle soft-reset in DCTL
Make sure not to set run_stop bit or link state change request while
initiating soft-reset. Register read-modify-write operation may
unintentionally start the controller before the initialization completes
with its previous DCTL value, which can cause initialization failure.
Fixes: f59dcab17629 ("usb: dwc3: core: improve reset sequence")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/6aecbd78328f102003d40ccf18ceeebd411d3703.16505947…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc3/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 1ca9dae57855..d28cd1a6709b 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -274,7 +274,8 @@ int dwc3_core_soft_reset(struct dwc3 *dwc)
reg = dwc3_readl(dwc->regs, DWC3_DCTL);
reg |= DWC3_DCTL_CSFTRST;
- dwc3_writel(dwc->regs, DWC3_DCTL, reg);
+ reg &= ~DWC3_DCTL_RUN_STOP;
+ dwc3_gadget_dctl_write_safe(dwc, reg);
/*
* For DWC_usb31 controller 1.90a and later, the DCTL.CSFRST bit
--
2.36.0
commit 5467801f1fcb ("gpio: Restrict usage of GPIO chip irq members before
initialization") attempted to fix a race condition that lead to a NULL
pointer, but in the process caused a regression for _AEI/_EVT declared
GPIOs. This manifests in messages showing deferred probing while trying
to allocate IRQs like so:
[ 0.688318] amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x0000 to IRQ, err -517
[ 0.688337] amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x002C to IRQ, err -517
[ 0.688348] amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x003D to IRQ, err -517
[ 0.688359] amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x003E to IRQ, err -517
[ 0.688369] amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x003A to IRQ, err -517
[ 0.688379] amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x003B to IRQ, err -517
[ 0.688389] amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x0002 to IRQ, err -517
[ 0.688399] amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x0011 to IRQ, err -517
[ 0.688410] amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x0012 to IRQ, err -517
[ 0.688420] amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x0007 to IRQ, err -517
The code for walking _AEI doesn't handle deferred probing and so this leads
to non-functional GPIO interrupts.
Fix this issue by moving the call to `acpi_gpiochip_request_interrupts` to
occur after gc->irc.initialized is set.
Cc: Shreeya Patel <shreeya.patel(a)collabora.com>
Cc: stable(a)vger.kernel.org
Fixes: 5467801f1fcb ("gpio: Restrict usage of GPIO chip irq members before initialization")
Reported-by: Mario Limonciello <mario.limonciello(a)amd.com>
Link: https://lore.kernel.org/linux-gpio/BL1PR12MB51577A77F000A008AA694675E2EF9@B…
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/gpio/gpiolib.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 085348e08986..b7694171655c 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -1601,8 +1601,6 @@ static int gpiochip_add_irqchip(struct gpio_chip *gc,
gpiochip_set_irq_hooks(gc);
- acpi_gpiochip_request_interrupts(gc);
-
/*
* Using barrier() here to prevent compiler from reordering
* gc->irq.initialized before initialization of above
@@ -1612,6 +1610,8 @@ static int gpiochip_add_irqchip(struct gpio_chip *gc,
gc->irq.initialized = true;
+ acpi_gpiochip_request_interrupts(gc);
+
return 0;
}
--
2.34.1