From: Shyjumon N <shyjumon.n(a)intel.com>
commit 1fae37accfc5872af3905d4ba71dc6ab15829be7 upstream
The Samsung SSD SM981/PM981 and Toshiba SSD KBG40ZNT256G on the Lenovo
C640 platform experience runtime resume issues when the SSDs are kept in
sleep/suspend mode for long time.
This patch applies the 'Simple Suspend' quirk to these configurations.
With this patch, the issue had not been observed in a 1+ day test.
Reviewed-by: Jon Derrick <jonathan.derrick(a)intel.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Shyjumon N <shyjumon.n(a)intel.com>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Erpeng Xu <xuerpeng(a)uniontech.com>
---
drivers/nvme/host/pci.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9c80f9f08149..b0434b687b17 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2747,6 +2747,18 @@ static unsigned long check_vendor_combination_bug(struct pci_dev *pdev)
(dmi_match(DMI_BOARD_NAME, "PRIME B350M-A") ||
dmi_match(DMI_BOARD_NAME, "PRIME Z370-A")))
return NVME_QUIRK_NO_APST;
+ } else if ((pdev->vendor == 0x144d && (pdev->device == 0xa801 ||
+ pdev->device == 0xa808 || pdev->device == 0xa809)) ||
+ (pdev->vendor == 0x1e0f && pdev->device == 0x0001)) {
+ /*
+ * Forcing to use host managed nvme power settings for
+ * lowest idle power with quick resume latency on
+ * Samsung and Toshiba SSDs based on suspend behavior
+ * on Coffee Lake board for LENOVO C640
+ */
+ if ((dmi_match(DMI_BOARD_VENDOR, "LENOVO")) &&
+ dmi_match(DMI_BOARD_NAME, "LNVNB161216"))
+ return NVME_QUIRK_SIMPLE_SUSPEND;
}
return 0;
--
2.45.2
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 7697a0fe0154468f5df35c23ebd7aa48994c2cdc
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072921-props-yam-bb2b@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
7697a0fe0154 ("LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h")
26a3b85bac08 ("loongarch: convert to generic syscall table")
505d66d1abfb ("clone3: drop __ARCH_WANT_SYS_CLONE3 macro")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7697a0fe0154468f5df35c23ebd7aa48994c2cdc Mon Sep 17 00:00:00 2001
From: Huacai Chen <chenhuacai(a)kernel.org>
Date: Sat, 20 Jul 2024 22:40:58 +0800
Subject: [PATCH] LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
Chromium sandbox apparently wants to deny statx [1] so it could properly
inspect arguments after the sandboxed process later falls back to fstat.
Because there's currently not a "fd-only" version of statx, so that the
sandbox has no way to ensure the path argument is empty without being
able to peek into the sandboxed process's memory. For architectures able
to do newfstatat though, glibc falls back to newfstatat after getting
-ENOSYS for statx, then the respective SIGSYS handler [2] takes care of
inspecting the path argument, transforming allowed newfstatat's into
fstat instead which is allowed and has the same type of return value.
But, as LoongArch is the first architecture to not have fstat nor
newfstatat, the LoongArch glibc does not attempt falling back at all
when it gets -ENOSYS for statx -- and you see the problem there!
Actually, back when the LoongArch port was under review, people were
aware of the same problem with sandboxing clone3 [3], so clone was
eventually kept. Unfortunately it seemed at that time no one had noticed
statx, so besides restoring fstat/newfstatat to LoongArch uapi (and
postponing the problem further), it seems inevitable that we would need
to tackle seccomp deep argument inspection.
However, this is obviously a decision that shouldn't be taken lightly,
so we just restore fstat/newfstatat by defining __ARCH_WANT_NEW_STAT
in unistd.h. This is the simplest solution for now, and so we hope the
community will tackle the long-standing problem of seccomp deep argument
inspection in the future [4][5].
Also add "newstat" to syscall_abis_64 in Makefile.syscalls due to
upstream asm-generic changes.
More infomation please reading this thread [6].
[1] https://chromium-review.googlesource.com/c/chromium/src/+/2823150
[2] https://chromium.googlesource.com/chromium/src/sandbox/+/c085b51940bd/linux…
[3] https://lore.kernel.org/linux-arch/20220511211231.GG7074@brightrain.aerifal…
[4] https://lwn.net/Articles/799557/
[5] https://lpc.events/event/4/contributions/560/attachments/397/640/deep-arg-i…
[6] https://lore.kernel.org/loongarch/20240226-granit-seilschaft-eccc2433014d@b…
Cc: stable(a)vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
diff --git a/arch/loongarch/include/asm/unistd.h b/arch/loongarch/include/asm/unistd.h
index fc0a481a7416..e2c0f3d86c7b 100644
--- a/arch/loongarch/include/asm/unistd.h
+++ b/arch/loongarch/include/asm/unistd.h
@@ -8,6 +8,7 @@
#include <uapi/asm/unistd.h>
+#define __ARCH_WANT_NEW_STAT
#define __ARCH_WANT_SYS_CLONE
#define NR_syscalls (__NR_syscalls)
diff --git a/arch/loongarch/kernel/Makefile.syscalls b/arch/loongarch/kernel/Makefile.syscalls
index ab7d9baa2915..523bb411a3bc 100644
--- a/arch/loongarch/kernel/Makefile.syscalls
+++ b/arch/loongarch/kernel/Makefile.syscalls
@@ -1,4 +1,3 @@
# SPDX-License-Identifier: GPL-2.0
-# No special ABIs on loongarch so far
-syscall_abis_64 +=
+syscall_abis_64 += newstat
To clarify…
On 02/07/2024 5:54 pm, Calum Mackay wrote:
> hi Petr,
>
> I noticed your LTP patch [1][2] which adjusts the nfsstat01 test on v6.9
> kernels, to account for Josef's changes [3], which restrict the NFS/RPC
> stats per-namespace.
>
> I see that Josef's changes were backported, as far back as longterm
> v5.4,
Sorry, that's not quite accurate.
Josef's NFS client changes were all backported from v6.9, as far as
longterm v5.4.y:
2057a48d0dd0 sunrpc: add a struct rpc_stats arg to rpc_create_args
d47151b79e32 nfs: expose /proc/net/sunrpc/nfs in net namespaces
1548036ef120 nfs: make the rpc_stat per net namespace
Of Josef's NFS server changes, four were backported from v6.9 to v6.8:
418b9687dece sunrpc: use the struct net as the svc proc private
d98416cc2154 nfsd: rename NFSD_NET_* to NFSD_STATS_*
93483ac5fec6 nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
4b14885411f7 nfsd: make all of the nfsd stats per-network namespace
and the others remained only in v6.9:
ab42f4d9a26f sunrpc: don't change ->sv_stats if it doesn't exist
a2214ed588fb nfsd: stop setting ->pg_stats for unused stats
f09432386766 sunrpc: pass in the sv_stats struct through svc_create_pooled
3f6ef182f144 sunrpc: remove ->pg_stats from svc_program
e41ee44cc6a4 nfsd: remove nfsd_stats, make th_cnt a global counter
16fb9808ab2c nfsd: make svc_stat per-network namespace instead of global
I'm wondering if this difference between NFS client, and NFS server,
stat behaviour, across kernel versions, may perhaps cause some user
confusion?
cheers,
calum.
> so your check for kernel version "6.9" in the test may need to be
> adjusted, if LTP is intended to be run on stable kernels?
>
> best wishes,
> calum.
>
>
> [1] https://lore.kernel.org/ltp/20240620111129.594449-1-pvorel@suse.cz/
> [2] https://patchwork.ozlabs.org/project/ltp/
> patch/20240620111129.594449-1-pvorel(a)suse.cz/
> [3] https://lore.kernel.org/linux-nfs/
> cover.1708026931.git.josef(a)toxicpanda.com/
We recently made GUP's common page table walking code to also walk hugetlb
VMAs without most hugetlb special-casing, preparing for the future of
having less hugetlb-specific page table walking code in the codebase.
Turns out that we missed one page table locking detail: page table locking
for hugetlb folios that are not mapped using a single PMD/PUD.
Assume we have hugetlb folio that spans multiple PTEs (e.g., 64 KiB
hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the
page tables, will perform a pte_offset_map_lock() to grab the PTE table
lock.
However, hugetlb that concurrently modifies these page tables would
actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the
locks would differ. Something similar can happen right now with hugetlb
folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS.
This issue can be reproduced [1], for example triggering:
[ 3105.936100] ------------[ cut here ]------------
[ 3105.939323] WARNING: CPU: 31 PID: 2732 at mm/gup.c:142 try_grab_folio+0x11c/0x188
[ 3105.944634] Modules linked in: [...]
[ 3105.974841] CPU: 31 PID: 2732 Comm: reproducer Not tainted 6.10.0-64.eln141.aarch64 #1
[ 3105.980406] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-4.fc40 05/24/2024
[ 3105.986185] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3105.991108] pc : try_grab_folio+0x11c/0x188
[ 3105.994013] lr : follow_page_pte+0xd8/0x430
[ 3105.996986] sp : ffff80008eafb8f0
[ 3105.999346] x29: ffff80008eafb900 x28: ffffffe8d481f380 x27: 00f80001207cff43
[ 3106.004414] x26: 0000000000000001 x25: 0000000000000000 x24: ffff80008eafba48
[ 3106.009520] x23: 0000ffff9372f000 x22: ffff7a54459e2000 x21: ffff7a546c1aa978
[ 3106.014529] x20: ffffffe8d481f3c0 x19: 0000000000610041 x18: 0000000000000001
[ 3106.019506] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000000
[ 3106.024494] x14: ffffb85477fdfe08 x13: 0000ffff9372ffff x12: 0000000000000000
[ 3106.029469] x11: 1fffef4a88a96be1 x10: ffff7a54454b5f0c x9 : ffffb854771b12f0
[ 3106.034324] x8 : 0008000000000000 x7 : ffff7a546c1aa980 x6 : 0008000000000080
[ 3106.038902] x5 : 00000000001207cf x4 : 0000ffff9372f000 x3 : ffffffe8d481f000
[ 3106.043420] x2 : 0000000000610041 x1 : 0000000000000001 x0 : 0000000000000000
[ 3106.047957] Call trace:
[ 3106.049522] try_grab_folio+0x11c/0x188
[ 3106.051996] follow_pmd_mask.constprop.0.isra.0+0x150/0x2e0
[ 3106.055527] follow_page_mask+0x1a0/0x2b8
[ 3106.058118] __get_user_pages+0xf0/0x348
[ 3106.060647] faultin_page_range+0xb0/0x360
[ 3106.063651] do_madvise+0x340/0x598
Let's make huge_pte_lockptr() effectively use the same PT locks as any
core-mm page table walker would. Add ptep_lockptr() to obtain the PTE
page table lock using a pte pointer -- unfortunately we cannot convert
pte_lockptr() because virt_to_page() doesn't work with kmap'ed page
tables we can have with CONFIG_HIGHPTE.
Take care of PTE tables possibly spanning multiple pages, and take care of
CONFIG_PGTABLE_LEVELS complexity when e.g., PMD_SIZE == PUD_SIZE. For
example, with CONFIG_PGTABLE_LEVELS == 2, core-mm would detect
with hugepagesize==PMD_SIZE pmd_leaf() and use the pmd_lockptr(), which
would end up just mapping to the per-MM PT lock.
There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb
folio being mapped using two PTE page tables. While hugetlb wants to take
the PMD table lock, core-mm would grab the PTE table lock of one of both
PTE page tables. In such corner cases, we have to make sure that both
locks match, which is (fortunately!) currently guaranteed for 8xx as it
does not support SMP and consequently doesn't use split PT locks.
[1] https://lore.kernel.org/all/1bbfcc7f-f222-45a5-ac44-c5a1381c596d@redhat.com/
Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
Reviewed-by: James Houghton <jthoughton(a)google.com>
Cc: <stable(a)vger.kernel.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
Third time is the charm?
Retested on arm64 and x86-64. Cross-compiled on a bunch of others.
v2 -> v3:
* Handle CONFIG_PGTABLE_LEVELS oddities as good as possible. It's a mess.
Remove the size >= P4D_SIZE check and simply default to the
&mm->page_table_lock.
* Align the PTE pointer to the start of the page table to handle PTE page
tables bigger than a single page (unclear if this could currently trigger).
* Extend patch description
v1 -> 2:
* Extend patch description
* Drop "mm: let pte_lockptr() consume a pte_t pointer"
* Introduce ptep_lockptr() in this patch
---
include/linux/hugetlb.h | 27 +++++++++++++++++++++++++--
include/linux/mm.h | 22 ++++++++++++++++++++++
2 files changed, 47 insertions(+), 2 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c9bf68c239a01..e6437a06e2346 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -944,9 +944,32 @@ static inline bool htlb_allow_alloc_fallback(int reason)
static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
struct mm_struct *mm, pte_t *pte)
{
- if (huge_page_size(h) == PMD_SIZE)
+ unsigned long size = huge_page_size(h);
+
+ VM_WARN_ON(size == PAGE_SIZE);
+
+ /*
+ * hugetlb must use the exact same PT locks as core-mm page table
+ * walkers would. When modifying a PTE table, hugetlb must take the
+ * PTE PT lock, when modifying a PMD table, hugetlb must take the PMD
+ * PT lock etc.
+ *
+ * The expectation is that any hugetlb folio smaller than a PMD is
+ * always mapped into a single PTE table and that any hugetlb folio
+ * smaller than a PUD (but at least as big as a PMD) is always mapped
+ * into a single PMD table.
+ *
+ * If that does not hold for an architecture, then that architecture
+ * must disable split PT locks such that all *_lockptr() functions
+ * will give us the same result: the per-MM PT lock.
+ */
+ if (size < PMD_SIZE && !IS_ENABLED(CONFIG_HIGHPTE))
+ /* pte_alloc_huge() only applies with !CONFIG_HIGHPTE */
+ return ptep_lockptr(mm, pte);
+ else if (size < PUD_SIZE || CONFIG_PGTABLE_LEVELS == 2)
return pmd_lockptr(mm, (pmd_t *) pte);
- VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
+ else if (size < P4D_SIZE || CONFIG_PGTABLE_LEVELS == 3)
+ return pud_lockptr(mm, (pud_t *) pte);
return &mm->page_table_lock;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b100df8cb5857..f6c7fe8f5746f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2926,6 +2926,24 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
}
+static inline struct page *ptep_pgtable_page(pte_t *pte)
+{
+ unsigned long mask = ~(PTRS_PER_PTE * sizeof(pte_t) - 1);
+
+ BUILD_BUG_ON(IS_ENABLED(CONFIG_HIGHPTE));
+ return virt_to_page((void *)((unsigned long)pte & mask));
+}
+
+static inline struct ptdesc *ptep_ptdesc(pte_t *pte)
+{
+ return page_ptdesc(ptep_pgtable_page(pte));
+}
+
+static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
+{
+ return ptlock_ptr(ptep_ptdesc(pte));
+}
+
static inline bool ptlock_init(struct ptdesc *ptdesc)
{
/*
@@ -2950,6 +2968,10 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
{
return &mm->page_table_lock;
}
+static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
+{
+ return &mm->page_table_lock;
+}
static inline void ptlock_cache_init(void) {}
static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
static inline void ptlock_free(struct ptdesc *ptdesc) {}
--
2.45.2
When operating in High-Speed, it is observed that DSTS[USBLNKST] doesn't
update link state immediately after receiving the wakeup interrupt. Since
wakeup event handler calls the resume callbacks, there is a chance that
function drivers can perform an ep queue. Which in turn tries to perform
remote wakeup from send_gadget_ep_cmd(), this happens because DSTS[[21:18]
wasn't updated to U0 yet. It is observed that the latency of DSTS can be
in order of milli-seconds. Hence update the dwc->link_state from evtinfo,
and use this variable to prevent calling remote wakup unnecessarily.
Fixes: ecba9bc9946b ("usb: dwc3: gadget: Check for L1/L2/U3 for Start Transfer")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
---
drivers/usb/dwc3/gadget.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 89fc690fdf34..3b55285118b0 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -328,7 +328,8 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
}
if (DWC3_DEPCMD_CMD(cmd) == DWC3_DEPCMD_STARTTRANSFER) {
- int link_state;
+ int link_state;
+ bool remote_wakeup = false;
/*
* Initiate remote wakeup if the link state is in U3 when
@@ -339,15 +340,26 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
link_state = dwc3_gadget_get_link_state(dwc);
switch (link_state) {
case DWC3_LINK_STATE_U2:
- if (dwc->gadget->speed >= USB_SPEED_SUPER)
+ if (dwc->gadget->speed < USB_SPEED_SUPER)
+ remote_wakeup = true;
+ break;
+ case DWC3_LINK_STATE_U3:
+ /*
+ * In HS, DSTS can take few milliseconds to update linkstate bits,
+ * so rely on dwc->link_state to identify whether gadget woke up.
+ * Don't issue remote wakuep again if link is already in U0.
+ */
+ if (dwc->link_state == DWC3_LINK_STATE_U0)
break;
- fallthrough;
- case DWC3_LINK_STATE_U3:
+ remote_wakeup = true;
+ break;
+ }
+
+ if (remote_wakeup) {
ret = __dwc3_gadget_wakeup(dwc, false);
dev_WARN_ONCE(dwc->dev, ret, "wakeup failed --> %d\n",
ret);
- break;
}
}
@@ -4214,6 +4226,7 @@ static void dwc3_gadget_conndone_interrupt(struct dwc3 *dwc)
static void dwc3_gadget_wakeup_interrupt(struct dwc3 *dwc, unsigned int evtinfo)
{
dwc->suspended = false;
+ dwc->link_state = evtinfo & DWC3_LINK_STATE_MASK;
/*
* TODO take core out of low power mode when that's
@@ -4225,8 +4238,6 @@ static void dwc3_gadget_wakeup_interrupt(struct dwc3 *dwc, unsigned int evtinfo)
dwc->gadget_driver->resume(dwc->gadget);
spin_lock(&dwc->lock);
}
-
- dwc->link_state = evtinfo & DWC3_LINK_STATE_MASK;
}
static void dwc3_gadget_linksts_change_interrupt(struct dwc3 *dwc,
--
2.25.1
From: Jakub Kicinski <kuba(a)kernel.org>
[ Upstream commit e01e3934a1b2d122919f73bc6ddbe1cdafc4bbdb ]
Similarly to previous commit, the submitting thread (recvmsg/sendmsg)
may exit as soon as the async crypto handler calls complete().
Reorder scheduling the work before calling complete().
This seems more logical in the first place, as it's
the inverse order of what the submitting thread will do.
Reported-by: valis <sec(a)valis.email>
Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Reviewed-by: Sabrina Dubroca <sd(a)queasysnail.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
(cherry picked from commit 6db22d6c7a6dc914b12c0469b94eb639b6a8a146)
[Lee: Fixed merge-conflict in Stable branches linux-6.1.y and older]
Signed-off-by: Lee Jones <lee(a)kernel.org>
---
net/tls/tls_sw.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 2bd27b77769cb..d53587ff9ddea 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -449,7 +449,6 @@ static void tls_encrypt_done(crypto_completion_data_t *data, int err)
struct scatterlist *sge;
struct sk_msg *msg_en;
struct tls_rec *rec;
- bool ready = false;
struct sock *sk;
rec = container_of(aead_req, struct tls_rec, aead_req);
@@ -486,19 +485,16 @@ static void tls_encrypt_done(crypto_completion_data_t *data, int err)
/* If received record is at head of tx_list, schedule tx */
first_rec = list_first_entry(&ctx->tx_list,
struct tls_rec, list);
- if (rec == first_rec)
- ready = true;
+ if (rec == first_rec) {
+ /* Schedule the transmission */
+ if (!test_and_set_bit(BIT_TX_SCHEDULED,
+ &ctx->tx_bitmask))
+ schedule_delayed_work(&ctx->tx_work.work, 1);
+ }
}
if (atomic_dec_and_test(&ctx->encrypt_pending))
complete(&ctx->async_wait.completion);
-
- if (!ready)
- return;
-
- /* Schedule the transmission */
- if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
- schedule_delayed_work(&ctx->tx_work.work, 1);
}
static int tls_encrypt_async_wait(struct tls_sw_context_tx *ctx)
--
2.44.0.278.ge034bb2e1d-goog
commit 11a1f4bc47362700fcbde717292158873fb847ed upstream.
Keith reports a use-after-free when a DPC event occurs concurrently to
hot-removal of the same portion of the hierarchy:
The dpc_handler() awaits readiness of the secondary bus below the
Downstream Port where the DPC event occurred. To do so, it polls the
config space of the first child device on the secondary bus. If that
child device is concurrently removed, accesses to its struct pci_dev
cause the kernel to oops.
That's because pci_bridge_wait_for_secondary_bus() neglects to hold a
reference on the child device. Before v6.3, the function was only
called on resume from system sleep or on runtime resume. Holding a
reference wasn't necessary back then because the pciehp IRQ thread
could never run concurrently. (On resume from system sleep, IRQs are
not enabled until after the resume_noirq phase. And runtime resume is
always awaited before a PCI device is removed.)
However starting with v6.3, pci_bridge_wait_for_secondary_bus() is also
called on a DPC event. Commit 53b54ad074de ("PCI/DPC: Await readiness
of secondary bus after reset"), which introduced that, failed to
appreciate that pci_bridge_wait_for_secondary_bus() now needs to hold a
reference on the child device because dpc_handler() and pciehp may
indeed run concurrently. The commit was backported to v5.10+ stable
kernels, so that's the oldest one affected.
Add the missing reference acquisition.
Abridged stack trace:
BUG: unable to handle page fault for address: 00000000091400c0
CPU: 15 PID: 2464 Comm: irq/53-pcie-dpc 6.9.0
RIP: pci_bus_read_config_dword+0x17/0x50
pci_dev_wait()
pci_bridge_wait_for_secondary_bus()
dpc_reset_link()
pcie_do_recovery()
dpc_handler()
Fixes: 53b54ad074de ("PCI/DPC: Await readiness of secondary bus after reset")
Closes: https://lore.kernel.org/r/20240612181625.3604512-3-kbusch@meta.com/
Link: https://lore.kernel.org/linux-pci/8e4bcd4116fd94f592f2bf2749f168099c480ddf.…
Reported-by: Keith Busch <kbusch(a)kernel.org>
Tested-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski(a)kernel.org>
Reviewed-by: Keith Busch <kbusch(a)kernel.org>
Reviewed-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Cc: stable(a)vger.kernel.org # v5.10+
---
drivers/pci/pci.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 530ced8f7abd..09d5fa637b98 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4817,7 +4817,7 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
int timeout)
{
struct pci_dev *child;
- int delay;
+ int delay, ret = 0;
if (pci_dev_is_disconnected(dev))
return 0;
@@ -4845,8 +4845,8 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
return 0;
}
- child = list_first_entry(&dev->subordinate->devices, struct pci_dev,
- bus_list);
+ child = pci_dev_get(list_first_entry(&dev->subordinate->devices,
+ struct pci_dev, bus_list));
up_read(&pci_bus_sem);
/*
@@ -4856,7 +4856,7 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
if (!pci_is_pcie(dev)) {
pci_dbg(dev, "waiting %d ms for secondary bus\n", 1000 + delay);
msleep(1000 + delay);
- return 0;
+ goto put_child;
}
/*
@@ -4877,7 +4877,7 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
* until the timeout expires.
*/
if (!pcie_downstream_port(dev))
- return 0;
+ goto put_child;
if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
@@ -4888,11 +4888,16 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
if (!pcie_wait_for_link_delay(dev, true, delay)) {
/* Did not train, no need to wait any further */
pci_info(dev, "Data Link Layer Link Active not set in 1000 msec\n");
- return -ENOTTY;
+ ret = -ENOTTY;
+ goto put_child;
}
}
- return pci_dev_wait(child, reset_type, timeout - delay);
+ ret = pci_dev_wait(child, reset_type, timeout - delay);
+
+put_child:
+ pci_dev_put(child);
+ return ret;
}
void pci_reset_secondary_bus(struct pci_dev *dev)
--
2.43.0
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 3415b10a03945b0da4a635e146750dfe5ce0f448
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073000-polish-wish-b988@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
3415b10a0394 ("kbuild: Fix '-S -c' in x86 stack protector scripts")
3fb0fdb3bbe7 ("x86/stackprotector/32: Make the canary into a regular percpu variable")
c9a1ff316bc9 ("x86/stackprotector: Pre-initialize canary for secondary CPUs")
dc4e0021b00b ("x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area")
e99b6f46ee5c ("x86/doublefault/32: Rename doublefault.c to doublefault_32.c")
93efbde2c331 ("x86/traps: Disentangle the 32-bit and 64-bit doublefault code")
ab851d49f6bf ("Merge branch 'x86-iopl-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3415b10a03945b0da4a635e146750dfe5ce0f448 Mon Sep 17 00:00:00 2001
From: Nathan Chancellor <nathan(a)kernel.org>
Date: Fri, 26 Jul 2024 11:05:00 -0700
Subject: [PATCH] kbuild: Fix '-S -c' in x86 stack protector scripts
After a recent change in clang to stop consuming all instances of '-S'
and '-c' [1], the stack protector scripts break due to the kernel's use
of -Werror=unused-command-line-argument to catch cases where flags are
not being properly consumed by the compiler driver:
$ echo | clang -o - -x c - -S -c -Werror=unused-command-line-argument
clang: error: argument unused during compilation: '-c' [-Werror,-Wunused-command-line-argument]
This results in CONFIG_STACKPROTECTOR getting disabled because
CONFIG_CC_HAS_SANE_STACKPROTECTOR is no longer set.
'-c' and '-S' both instruct the compiler to stop at different stages of
the pipeline ('-S' after compiling, '-c' after assembling), so having
them present together in the same command makes little sense. In this
case, the test wants to stop before assembling because it is looking at
the textual assembly output of the compiler for either '%fs' or '%gs',
so remove '-c' from the list of arguments to resolve the error.
All versions of GCC continue to work after this change, along with
versions of clang that do or do not contain the change mentioned above.
Cc: stable(a)vger.kernel.org
Fixes: 4f7fd4d7a791 ("[PATCH] Add the -fstack-protector option to the CFLAGS")
Fixes: 60a5317ff0f4 ("x86: implement x86_32 stack protector")
Link: https://github.com/llvm/llvm-project/commit/6461e537815f7fa68cef06842505353… [1]
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
diff --git a/scripts/gcc-x86_32-has-stack-protector.sh b/scripts/gcc-x86_32-has-stack-protector.sh
index 825c75c5b715..9459ca4f0f11 100755
--- a/scripts/gcc-x86_32-has-stack-protector.sh
+++ b/scripts/gcc-x86_32-has-stack-protector.sh
@@ -5,4 +5,4 @@
# -mstack-protector-guard-reg, added by
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708
-echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m32 -O0 -fstack-protector -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard - -o - 2> /dev/null | grep -q "%fs"
+echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -m32 -O0 -fstack-protector -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard - -o - 2> /dev/null | grep -q "%fs"
diff --git a/scripts/gcc-x86_64-has-stack-protector.sh b/scripts/gcc-x86_64-has-stack-protector.sh
index 75e4e22b986a..f680bb01aeeb 100755
--- a/scripts/gcc-x86_64-has-stack-protector.sh
+++ b/scripts/gcc-x86_64-has-stack-protector.sh
@@ -1,4 +1,4 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
-echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m64 -O0 -mcmodel=kernel -fno-PIE -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
+echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -m64 -O0 -mcmodel=kernel -fno-PIE -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
Some older systems still compile kernels with old gcc version.
$ grep -B3 "^GNU C" linux-5.10.223-rc1/Documentation/Changes
====================== =============== ========================================
Program Minimal version Command to check the version
====================== =============== ========================================
GNU C 4.9 gcc --version
These warnings and errors show up when compiling with gcc 4.9.2
UPD include/config/kernel.release
UPD include/generated/uapi/linux/version.h
UPD include/generated/utsrelease.h
CC scripts/mod/empty.o
In file included from ././include/linux/compiler_types.h:65:0,
from <command-line>:0:
./include/linux/compiler_attributes.h:29:29: warning: "__GCC4_has_attribute___uninitialized__" is not defined [-Wundef]
# define __has_attribute(x) __GCC4_has_attribute_##x
^
./include/linux/compiler_attributes.h:278:5: note: in expansion of macro '__has_attribute'
#if __has_attribute(__uninitialized__)
^
[SNIP]
AR arch/x86/events/built-in.a
CC arch/x86/kvm/../../../virt/kvm/kvm_main.o
In file included from ././include/linux/compiler_types.h:65:0,
from <command-line>:0:
./include/linux/compiler_attributes.h:29:29: error: "__GCC4_has_attribute___uninitialized__" is not defined [-Werror=undef]
# define __has_attribute(x) __GCC4_has_attribute_##x
^
./include/linux/compiler_attributes.h:278:5: note: in expansion of macro '__has_attribute'
#if __has_attribute(__uninitialized__)
^
cc1: all warnings being treated as errors
make[2]: *** [scripts/Makefile.build:286: arch/x86/kvm/../../../virt/kvm/kvm_main.o] Error 1
make[1]: *** [scripts/Makefile.build:503: arch/x86/kvm] Error 2
make: *** [Makefile:1832: arch/x86] Error 2
Following patch fixes this. Upstream won't need this because
newer kernels are not compilable with gcc 4.9
--- ./include/linux/compiler_attributes.h.OLD
+++ ./include/linux/compiler_attributes.h
@@ -37,6 +37,7 @@
# define __GCC4_has_attribute___nonstring__ 0
# define __GCC4_has_attribute___no_sanitize_address__ (__GNUC_MINOR__ >= 8)
# define __GCC4_has_attribute___no_sanitize_undefined__ (__GNUC_MINOR__ >= 9)
+# define __GCC4_has_attribute___uninitialized__ 0
# define __GCC4_has_attribute___fallthrough__ 0
# define __GCC4_has_attribute___warning__ 1
#endif
Fixes: upstream fd7eea27a3ae "Compiler Attributes: Add __uninitialized macro"
Signed-off-by: Jari Ruusu <jariruusu(a)protonmail.com>
--
Jari Ruusu 4096R/8132F189 12D6 4C3A DCDA 0AA4 27BD ACDF F073 3C80 8132 F189